Patchwork 3c59x: fix deadlock when using netconsole with 3c59x

login
register
mail settings
Submitter Neil Horman
Date Aug. 9, 2010, 4:32 p.m.
Message ID <20100809163210.GA1781@hmsreliant.think-freely.org>
Download mbox | patch
Permalink /patch/61292/
State Rejected
Delegated to: David Miller
Headers show

Comments

Neil Horman - Aug. 9, 2010, 4:32 p.m.
When using netpoll, its possible to deadlock the 3c59x driver.  Since it takes
an internal spinlock (vp->lock) that serializes boomerang_interrupt and parts
of boomerang_start_xmit, if we call pr_debug in the former, we can go through
the tx path on the same cpu, and recurse into the same driver again, deadlocking
in the transmit routine.

This patch fixes that problem by stopping the queues during interrupt
processing, so that subsequent transmits will get queued until a later point.
Its not a great solution, but we need to find some way to serialize access to
the register file on the card without enforcing a deadlock.  I think the queue
stop is the best way to do that.  And since we only print things in
boomerang_interrupt when we have debug enabled, we can mitigate the impact of
this change to only stop the queues when debug is on.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
---
 drivers/net/3c59x.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)
David Miller - Aug. 10, 2010, 11:48 p.m.
From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 9 Aug 2010 12:32:10 -0400

> When using netpoll, its possible to deadlock the 3c59x driver.  Since it takes
> an internal spinlock (vp->lock) that serializes boomerang_interrupt and parts
> of boomerang_start_xmit, if we call pr_debug in the former, we can go through
> the tx path on the same cpu, and recurse into the same driver again, deadlocking
> in the transmit routine.
> 
> This patch fixes that problem by stopping the queues during interrupt
> processing, so that subsequent transmits will get queued until a later point.
> Its not a great solution, but we need to find some way to serialize access to
> the register file on the card without enforcing a deadlock.  I think the queue
> stop is the best way to do that.  And since we only print things in
> boomerang_interrupt when we have debug enabled, we can mitigate the impact of
> this change to only stop the queues when debug is on.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Nothing serialized changes to "vortex_debug" with the tests you're
making here.  Simply turning it off when a packet arrives can deadlock
the send queue of the device.

Even if proper serializatio did exist, I still see this as an awful
solution.

And the default value of this thing is "1" so it's always going to be
doing this send queue state flipping for effectively everyone.

Please find another way to solve this problem.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman - Aug. 11, 2010, 1:16 a.m.
On Tue, Aug 10, 2010 at 04:48:03PM -0700, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Mon, 9 Aug 2010 12:32:10 -0400
> 
> > When using netpoll, its possible to deadlock the 3c59x driver.  Since it takes
> > an internal spinlock (vp->lock) that serializes boomerang_interrupt and parts
> > of boomerang_start_xmit, if we call pr_debug in the former, we can go through
> > the tx path on the same cpu, and recurse into the same driver again, deadlocking
> > in the transmit routine.
> > 
> > This patch fixes that problem by stopping the queues during interrupt
> > processing, so that subsequent transmits will get queued until a later point.
> > Its not a great solution, but we need to find some way to serialize access to
> > the register file on the card without enforcing a deadlock.  I think the queue
> > stop is the best way to do that.  And since we only print things in
> > boomerang_interrupt when we have debug enabled, we can mitigate the impact of
> > this change to only stop the queues when debug is on.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> Nothing serialized changes to "vortex_debug" with the tests you're
> making here.  Simply turning it off when a packet arrives can deadlock
> the send queue of the device.
> 
> Even if proper serializatio did exist, I still see this as an awful
> solution.
> 
> And the default value of this thing is "1" so it's always going to be
> doing this send queue state flipping for effectively everyone.
> 
> Please find another way to solve this problem.
> 

Crap-spackle, you're right.  I was thinking the module_param defaulted it to
zero, but it doesn't.  Regardless, sysfs would let us change this value and
deadlock it all to heck.

Recinded.  I'll find a better way to fix this.  Sorry for the noise
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index c754d88..e5c8757 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -2338,7 +2338,17 @@  boomerang_interrupt(int irq, void *dev_id)
 	/*
 	 * It seems dopey to put the spinlock this early, but we could race against vortex_tx_timeout
 	 * and boomerang_start_xmit
+	 * We also need to disable the tx queue in the event that we print
+	 * anything from this path.  if pr_debug is called, we run the risk of 
+	 * recursing through the transmit path, which also takes the vp->lock,
+	 * and deadlocks us.  This only happens if we're using netpoll
+	 * but we need to be ready for it
+	 * We can mitigate the perf impact here if we only
+	 * do this is vortex_debug is != 0
 	 */
+	if (vortex_debug)
+		netif_stop_queue(dev);
+		
 	spin_lock(&vp->lock);
 
 	status = ioread16(ioaddr + EL3_STATUS);
@@ -2447,6 +2457,8 @@  boomerang_interrupt(int irq, void *dev_id)
 		pr_debug("%s: exiting interrupt, status %4.4x.\n",
 			   dev->name, status);
 handler_exit:
+	if (vortex_debug)
+		netif_start_queue(dev);
 	spin_unlock(&vp->lock);
 	return IRQ_HANDLED;
 }