BQL support in gianfar causes network hickup

Message ID 1353947677.7553.2.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Nov. 26, 2012, 4:34 p.m. UTC
On Mon, 2012-11-26 at 11:01 +0100, Tino Keitel wrote:
> On Sat, Nov 24, 2012 at 15:43:36 -0800, Eric Dumazet wrote:
> 
> [...]
> 
> > Hmm, I wonder if BQL makes a particular bug show up more often.
> > 
> > I see gianfar uses a very small watchdog_timeo of 1 second, while many
> > drivers use 5 seconds.
> > 
> > What happens if you change this to 5 seconds?
> 
> I still got the trace and a failing ptp client.
> 

Thanks. Is this bug easy to trigger?

I suspect a core issue and a race, likely to happen on your (non-x86)
hardware.

Could you add the following debugging patch?




Comments

Keitel, Tino (ALC NetworX GmbH) Nov. 26, 2012, 5:08 p.m. UTC | #1
On Mon, 2012-11-26 at 08:34 -0800, Eric Dumazet wrote:
> On Mon, 2012-11-26 at 11:01 +0100, Tino Keitel wrote:
> > On Sat, Nov 24, 2012 at 15:43:36 -0800, Eric Dumazet wrote:
> > 
> > [...]
> > 
> > > Hmm, I wonder if BQL makes a particular bug show up more often.
> > > 
> > > I see gianfar uses a very small watchdog_timeo of 1 second, while many
> > > drivers use 5 seconds.
> > > 
> > > What happens if you change this to 5 seconds?
> > 
> > I still got the trace and a failing ptp client.
> > 
> 
> Thanks. Is this bug easy to trigger?
> 
> I suspect a core issue and a race, likely to happen on your (non-x86)
> hardware.
> 
> Could you add the following debugging patch?

No visible difference:

NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue 0 timed out
------------[ cut here ]------------
WARNING: at /home/keitelt1/src/git/linux-stable/net/sched/sch_generic.c:255
Modules linked in:
NIP: c02448b0 LR: c02448b0 CTR: c01c19b8
REGS: c7ffbe40 TRAP: 0700   Not tainted  (3.7.0-rc6-dirty)
MSR: 00029032 <EE,ME,IR,DR,RI>  CR: 22002044  XER: 20000000
TASK = c03dd370[0] 'swapper' THREAD: c03fe000
GPR00: c02448b0 c7ffbef0 c03dd370 0000003f 00000001 c001aea8 00000000 00000001
GPR08: 00000001 c03e0000 00000000 0000009d 22002084 1008eb5c 07ffb000 ffffffff
GPR16: 00000004 c0362c7c c03dfbf8 00200000 c0411ed0 c0411cd0 c0411ad0 ffffffff
GPR24: 00000000 c749e1d8 00000004 c783c1b0 c0400000 c03e0000 c749e000 00000000
NIP [c02448b0] dev_watchdog+0x288/0x298
LR [c02448b0] dev_watchdog+0x288/0x298
Call Trace:
[c7ffbef0] [c02448b0] dev_watchdog+0x288/0x298 (unreliable)
[c7ffbf20] [c00267f8] call_timer_fn+0x6c/0xd8
[c7ffbf50] [c00269e4] run_timer_softirq+0x180/0x1f8
[c7ffbfa0] [c0021144] __do_softirq+0xc4/0x160
[c7ffbff0] [c000d0b8] call_do_softirq+0x14/0x24
[c03ffe00] [c00058e8] do_softirq+0x8c/0xb8
[c03ffe20] [c0021358] irq_exit+0x98/0xb4
[c03ffe30] [c0009fb0] timer_interrupt+0x158/0x170
[c03ffe50] [c000f02c] ret_from_except+0x0/0x14
--- Exception: 901 at _raw_spin_unlock_irq+0x3c/0x78
    LR = _raw_spin_unlock_irq+0x2c/0x78
[c03fff20] [c00434c8] finish_task_switch.constprop.69+0x5c/0xdc
[c03fff40] [c02d354c] __schedule+0x1e0/0x410
[c03fff90] [c02d3a78] schedule_preempt_disabled+0x18/0x30
[c03fffa0] [c000898c] cpu_idle+0xfc/0x100
[c03fffc0] [c03b37b0] start_kernel+0x2dc/0x2f0
[c03ffff0] [00003438] 0x3438
Instruction dump:
7d2903a6 4e800421 80fe01fc 4bffff74 7fc3f378 4bfecb7d 7fc4f378 7fe6fb78 
7c651b78 3c60c038 38637280 48090e51 <0fe00000> 39200001 993cc7c9 4bffffb8 
---[ end trace c170f56a0503cdd2 ]---

Regards,
Tino
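
For readers following the trace: the "transmit queue 0 timed out" warning
is raised when a TX queue has stayed stopped for longer than the device's
watchdog_timeo. Below is a minimal user-space sketch of that check, not
the kernel code itself. All sketch_* names and SKETCH_HZ are hypothetical,
and the condition is a simplification of what dev_watchdog() in
net/sched/sch_generic.c tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tick rate for this sketch; the real HZ is a kernel config option. */
#define SKETCH_HZ 250UL

/* Wraparound-safe "a is later than b", same idea as the kernel's time_after(). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Simplified model of one TX queue as the watchdog sees it (assumption). */
struct sketch_txq {
	bool stopped;              /* queue stopped by the driver or the stack */
	unsigned long trans_start; /* tick of the last transmit on this queue */
};

/* A queue counts as timed out only if it is stopped AND the deadline has passed. */
static bool sketch_txq_timed_out(const struct sketch_txq *q,
				 unsigned long now,
				 unsigned long watchdog_timeo)
{
	return q->stopped &&
	       sketch_time_after(now, q->trans_start + watchdog_timeo);
}

/* Compare a 1 second and a 5 second timeout on the same stuck queue. */
void sketch_demo(void)
{
	struct sketch_txq q = { .stopped = true, .trans_start = 1000 };
	unsigned long one_sec = 1 * SKETCH_HZ;
	unsigned long five_sec = 5 * SKETCH_HZ;

	/* Exactly at the deadline is not yet "after" it. */
	assert(!sketch_txq_timed_out(&q, 1000 + one_sec, one_sec));
	/* One tick later the 1 second watchdog fires ... */
	assert(sketch_txq_timed_out(&q, 1000 + one_sec + 1, one_sec));
	/* ... while a 5 second watchdog stays quiet at that point ... */
	assert(!sketch_txq_timed_out(&q, 1000 + one_sec + 1, five_sec));
	/* ... and fires later if the queue never restarts. */
	assert(sketch_txq_timed_out(&q, 1000 + five_sec + 1, five_sec));

	/* A running queue never times out, however old trans_start is. */
	q.stopped = false;
	assert(!sketch_txq_timed_out(&q, 1000 + five_sec + 1, five_sec));
}
```

In this model a longer timeout only delays the report for a queue that is
never restarted, which is consistent with the trace still appearing after
watchdog_timeo was raised to 5 seconds.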
Patch

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index aefc150..a8859ec 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -117,7 +117,7 @@  int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 	int ret = NETDEV_TX_BUSY;
 
 	/* And release qdisc */
-	spin_unlock(root_lock);
+//	spin_unlock(root_lock);
 
 	HARD_TX_LOCK(dev, txq, smp_processor_id());
 	if (!netif_xmit_frozen_or_stopped(txq))
@@ -125,7 +125,7 @@  int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 	HARD_TX_UNLOCK(dev, txq);
 
-	spin_lock(root_lock);
+//	spin_lock(root_lock);
 
 	if (dev_xmit_complete(ret)) {
 		/* Driver sent out skb successfully or skb was consumed */