diff mbox

BQL support in gianfar causes network hickup

Message ID 9AA65D849A88EB44B5D9B6A8BA098E23040A60D6EE6E@Exchange1.lawo.de
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Keitel, Tino (ALC NetworX GmbH) Nov. 23, 2012, 3:58 p.m. UTC
Hi,

commit d8a0f1b0af67679bba886784de10d8c21acc4e0e causes the following
trace on a Freescale RDB8313 board:

NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue 0 timed out
------------[ cut here ]------------
WARNING:
at /home/keitelt1/src/git/linux-stable/net/sched/sch_generic.c:255
Modules linked in:
NIP: c02448b0 LR: c02448b0 CTR: c01c19b8
REGS: c7ffbe40 TRAP: 0700   Not tainted  (3.7.0-rc6-rt18)
MSR: 00029032 <EE,ME,IR,DR,RI>  CR: 24002044  XER: 20000000
TASK = c03dd370[0] 'swapper' THREAD: c03fe000
GPR00: c02448b0 c7ffbef0 c03dd370 0000003f 00000001 c001aea8 00000000
00000001
GPR08: 00000001 c03e0000 00000000 0000009d 24002084 1008eb5c 07ffb000
ffffffff
GPR16: 00000004 c0362c7c c03dfbf8 00200000 c0411ed0 c0411cd0 c0411ad0
ffffffff
GPR24: 00000000 c749e1d8 00000004 c783d1b0 c0400000 c03e0000 c749e000
00000000
NIP [c02448b0] dev_watchdog+0x288/0x298
LR [c02448b0] dev_watchdog+0x288/0x298
Call Trace:
[c7ffbef0] [c02448b0] dev_watchdog+0x288/0x298 (unreliable)
[c7ffbf20] [c00267f8] call_timer_fn+0x6c/0xd8
[c7ffbf50] [c00269e4] run_timer_softirq+0x180/0x1f8
[c7ffbfa0] [c0021144] __do_softirq+0xc4/0x160
[c7ffbff0] [c000d0b8] call_do_softirq+0x14/0x24
[c03ffe90] [c00058e8] do_softirq+0x8c/0xb8
[c03ffeb0] [c0021358] irq_exit+0x98/0xb4
[c03ffec0] [c0009fb0] timer_interrupt+0x158/0x170
[c03ffee0] [c000f02c] ret_from_except+0x0/0x14
--- Exception: 901 at cpu_idle+0x94/0x100
    LR = cpu_idle+0x94/0x100
[c03fffa0] [c00088ec] cpu_idle+0x5c/0x100 (unreliable)
[c03fffc0] [c03b37b0] start_kernel+0x2dc/0x2f0
[c03ffff0] [00003438] 0x3438
Instruction dump:
7d2903a6 4e800421 80fe01fc 4bffff74 7fc3f378 4bfecb7d 7fc4f378 7fe6fb78
7c651b78 3c60c038 38637280 48090e69 <0fe00000> 39200001 993cc7c9
4bffffb8
---[ end trace 32125455035c2f70 ]---

With this commit reverted, it works fine. v3.3 is ok, v3.4 contains the
bad commit. The commit doesn't revert in a clean way in 3.7-rc6. I
attached diff without the tqi changes.

The above trace happens while a ptp client (for IEEE1588) is running, so
there is some locally generated network traffic. The client stops to
work after this, but maybe this is due to bad error handling.

Regards,
Tino

Comments

Paul Gortmaker Nov. 23, 2012, 4:34 p.m. UTC | #1
On 12-11-23 10:58 AM, Keitel, Tino (ALC NetworX GmbH) wrote:
> Hi,
> 
> commit d8a0f1b0af67679bba886784de10d8c21acc4e0e causes the following
> trace on a Freescale RDB8313 board:

Thanks for the report.

> 
> NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue 0 timed out
> ------------[ cut here ]------------
> WARNING:
> at /home/keitelt1/src/git/linux-stable/net/sched/sch_generic.c:255
> Modules linked in:
> NIP: c02448b0 LR: c02448b0 CTR: c01c19b8
> REGS: c7ffbe40 TRAP: 0700   Not tainted  (3.7.0-rc6-rt18)
                                            ^^^^^^^^^^^^^^^
I almost overlooked the above.  It would have been nice to
see more explicit information on what kernel you are running.
I say that because the above concerns me.  For several reasons.

1) it looks to be not mainline, but preempt_rt
2) There is no RT on 3.7 yet, so I'm assuming this is a custom
   forward port of the 250 odd RT patches.  (The RT is 3.6.7-rt18,
   i.e. based on the 3.6 gregKH stable tree.)

> MSR: 00029032 <EE,ME,IR,DR,RI>  CR: 24002044  XER: 20000000
> TASK = c03dd370[0] 'swapper' THREAD: c03fe000
> GPR00: c02448b0 c7ffbef0 c03dd370 0000003f 00000001 c001aea8 00000000
> 00000001
> GPR08: 00000001 c03e0000 00000000 0000009d 24002084 1008eb5c 07ffb000
> ffffffff
> GPR16: 00000004 c0362c7c c03dfbf8 00200000 c0411ed0 c0411cd0 c0411ad0
> ffffffff
> GPR24: 00000000 c749e1d8 00000004 c783d1b0 c0400000 c03e0000 c749e000
> 00000000
> NIP [c02448b0] dev_watchdog+0x288/0x298
> LR [c02448b0] dev_watchdog+0x288/0x298
> Call Trace:
> [c7ffbef0] [c02448b0] dev_watchdog+0x288/0x298 (unreliable)
> [c7ffbf20] [c00267f8] call_timer_fn+0x6c/0xd8
> [c7ffbf50] [c00269e4] run_timer_softirq+0x180/0x1f8
> [c7ffbfa0] [c0021144] __do_softirq+0xc4/0x160
> [c7ffbff0] [c000d0b8] call_do_softirq+0x14/0x24
> [c03ffe90] [c00058e8] do_softirq+0x8c/0xb8
> [c03ffeb0] [c0021358] irq_exit+0x98/0xb4
> [c03ffec0] [c0009fb0] timer_interrupt+0x158/0x170
> [c03ffee0] [c000f02c] ret_from_except+0x0/0x14
> --- Exception: 901 at cpu_idle+0x94/0x100
>     LR = cpu_idle+0x94/0x100
> [c03fffa0] [c00088ec] cpu_idle+0x5c/0x100 (unreliable)
> [c03fffc0] [c03b37b0] start_kernel+0x2dc/0x2f0
> [c03ffff0] [00003438] 0x3438
> Instruction dump:
> 7d2903a6 4e800421 80fe01fc 4bffff74 7fc3f378 4bfecb7d 7fc4f378 7fe6fb78
> 7c651b78 3c60c038 38637280 48090e69 <0fe00000> 39200001 993cc7c9
> 4bffffb8
> ---[ end trace 32125455035c2f70 ]---
> 
> With this commit reverted, it works fine. v3.3 is ok, v3.4 contains the
> bad commit. The commit doesn't revert in a clean way in 3.7-rc6. I
> attached diff without the tqi changes.

Have you reproduced this on a mainline kernel, i.e. vanilla 3.4
or vanilla 3.7-rc6?  And then done a revert on that baseline?

The patch was relatively straightforward and reviewed by Eric
who knows this stuff inside out; it isn't immediately clear
to me why it would cause problems for you.

Paul.
--

> 
> The above trace happens while a ptp client (for IEEE1588) is running, so
> there is some locally generated network traffic. The client stops to
> work after this, but maybe this is due to bad error handling.
> 
> Regards,
> Tino
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Francois Romieu Nov. 23, 2012, 7:42 p.m. UTC | #2
Paul Gortmaker <paul.gortmaker@windriver.com> :
> On 12-11-23 10:58 AM, Keitel, Tino (ALC NetworX GmbH) wrote:
[...]
> > With this commit reverted, it works fine. v3.3 is ok, v3.4 contains the
> > bad commit. The commit doesn't revert in a clean way in 3.7-rc6. I
> > attached diff without the tqi changes.

Pre v3.4.5 stable kernel will miss some bql fixes. Namely:
4f4bdaeb40df95499c1ee7ea3fbca9d76174a59e (upstream 914bec1011a25f65cdc)
1414a53d956340ca8b1b27e05ab94ba63e82ed97 (upstream 25426b794efdc70dde7)
Tino Keitel Nov. 24, 2012, 8:42 p.m. UTC | #3
Paul Gortmaker <paul.gortmaker <at> windriver.com> writes:

> 
> On 12-11-23 10:58 AM, Keitel, Tino (ALC NetworX GmbH) wrote:
> > Hi,
> > 
> > commit d8a0f1b0af67679bba886784de10d8c21acc4e0e causes the following
> > trace on a Freescale RDB8313 board:
> 
> Thanks for the report.
> 
> > 
> > NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue 0 timed out
> > ------------[ cut here ]------------
> > WARNING:
> > at /home/keitelt1/src/git/linux-stable/net/sched/sch_generic.c:255
> > Modules linked in:
> > NIP: c02448b0 LR: c02448b0 CTR: c01c19b8
> > REGS: c7ffbe40 TRAP: 0700   Not tainted  (3.7.0-rc6-rt18)
>                                             ^^^^^^^^^^^^^^^
> I almost overlooked the above.  It would have been nice to
> see more explicit information on what kernel you are running.
> I say that because the above concerns me.  For several reasons.
> 
> 1) it looks to be not mainline, but preempt_rt
> 2) There is no RT on 3.7 yet, so I'm assuming this is a custom
>    forward port of the 250 odd RT patches.  (The RT is 3.6.7-rt18,
>    i.e. based on the 3.6 gregKH stable tree.)

Sorry for the confusion. This was a 3.7.0-rc6 tree, and I forgot git clean after
trying the rt-patches and git reset --hard v3.7.0-rc6, so the localversion file
for -rt was still present, and the kernel was named 3.7.0-rc6-rt18. If I got
this right, this should be a normal kernel with just the version file modified.

I tried kernel 3.3, which doesnt have the issue. I tried 3.4, 3.6.7 and 3.7-rc6,
which all show the kernel trace and ptp client misbehaviour. I tried 3.4, 3.6.7,
3.7-rc6 and 3.6.5-rt18 with the patch I posted, and they were ok.

The patch I posted is for 3.7-rc6.

Regards,
Tino


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 19ac096..e314c6f 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -1748,13 +1748,9 @@  static void free_skb_resources(struct gfar_private *priv)
 
 	/* Go through all the buffer descriptors and free their data buffers */
 	for (i = 0; i < priv->num_tx_queues; i++) {
-		struct netdev_queue *txq;
-
 		tx_queue = priv->tx_queue[i];
-		txq = netdev_get_tx_queue(tx_queue->dev, tx_queue->qindex);
 		if (tx_queue->tx_skbuff)
 			free_skb_tx_queue(tx_queue);
-		netdev_tx_reset_queue(txq);
 	}
 
 	for (i = 0; i < priv->num_rx_queues; i++) {
@@ -2210,8 +2206,6 @@  static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		lstatus |= BD_LFLAG(TXBD_CRC | TXBD_READY) | skb_headlen(skb);
 	}
 
-	netdev_tx_sent_queue(txq, skb->len);
-
 	/* We can work in parallel with gfar_clean_tx_ring(), except
 	 * when modifying num_txbdfree. Note that we didn't grab the lock
 	 * when we were reading the num_txbdfree and checking for available
@@ -2456,7 +2450,6 @@  static void gfar_align_skb(struct sk_buff *skb)
 static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 {
 	struct net_device *dev = tx_queue->dev;
-	struct netdev_queue *txq;
 	struct gfar_private *priv = netdev_priv(dev);
 	struct gfar_priv_rx_q *rx_queue = NULL;
 	struct txbd8 *bdp, *next = NULL;
@@ -2469,12 +2462,10 @@  static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 	int i;
 	int howmany = 0;
 	int tqi = tx_queue->qindex;
-	unsigned int bytes_sent = 0;
 	u32 lstatus;
 	size_t buflen;
 
 	rx_queue = priv->rx_queue[tqi];
-	txq = netdev_get_tx_queue(dev, tqi);
 	bdp = tx_queue->dirty_tx;
 	skb_dirtytx = tx_queue->skb_dirtytx;
 
@@ -2531,8 +2522,6 @@  static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 			bdp = next_txbd(bdp, base, tx_ring_size);
 		}
 
-		bytes_sent += skb->len;
-
 		dev_kfree_skb_any(skb);
 
 		tx_queue->tx_skbuff[skb_dirtytx] = NULL;
@@ -2547,15 +2536,13 @@  static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 	}
 
 	/* If we freed a buffer, we can restart transmission, if necessary */
-	if (netif_tx_queue_stopped(txq) && tx_queue->num_txbdfree)
+	if (__netif_subqueue_stopped(dev, tx_queue->qindex) && tx_queue->num_txbdfree)
 		netif_wake_subqueue(dev, tqi);
 
 	/* Update dirty indicators */
 	tx_queue->skb_dirtytx = skb_dirtytx;
 	tx_queue->dirty_tx = bdp;
 
-	netdev_tx_completed_queue(txq, howmany, bytes_sent);
-
 	return howmany;
 }