Patchwork Fix deadlock between boomerang_interrupt and boomerang_start_tx in 3c59x

login
register
mail settings
Submitter Neil Horman
Date Aug. 11, 2010, 3:12 p.m.
Message ID <20100811151257.GB23317@hmsreliant.think-freely.org>
Download mbox | patch
Permalink /patch/61485/
State Accepted
Delegated to: David Miller
Headers show

Comments

Neil Horman - Aug. 11, 2010, 3:12 p.m.
If netconsole is in use, there is a possibility for deadlock in 3c59x between
boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
will result in the netconsole code attempting to trnasmit an skb, which can try
to take the same spin lock, resulting in deadlock.

The fix is pretty straightforward.  This patch allocats a bit in the 3c59x
private structure to indicate that its handling an interrupt.  If we get into
the transmit routine and that bit is set, we can be sure that we have recursed
and will deadlock if we continue, so instead we just return NETDEV_TX_BUSY, so
the stack requeues the skb to try again later.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
---
 drivers/net/3c59x.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger - Aug. 11, 2010, 4:09 p.m.
On Wed, 11 Aug 2010 11:12:57 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:

> If netconsole is in use, there is a possibility for deadlock in 3c59x between
> boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
> and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
> will result in the netconsole code attempting to trnasmit an skb, which can try
> to take the same spin lock, resulting in de

I thought we agreed that any device supporting netconsole agrees
to not call printk in the transmit path. Just kill the pr_* call.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman - Aug. 11, 2010, 5:51 p.m.
On Wed, Aug 11, 2010 at 12:09:00PM -0400, Stephen Hemminger wrote:
> On Wed, 11 Aug 2010 11:12:57 -0400
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > If netconsole is in use, there is a possibility for deadlock in 3c59x between
> > boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
> > and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
> > will result in the netconsole code attempting to trnasmit an skb, which can try
> > to take the same spin lock, resulting in de
> 
> I thought we agreed that any device supporting netconsole agrees
> to not call printk in the transmit path. Just kill the pr_* call.
> 

3c59x isn't calling pr_* in the transmit path, its calling it in the interrupt
handler, but since the interrupt handler and the transmit path share a spinlock,
you get a deadlock in that case.  We could kill the several pr_debug calls in
the interrupt path, but I don't think thats really needed in this case.

Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu - Aug. 13, 2010, 3:15 p.m.
Stephen Hemminger <shemminger@vyatta.com> wrote:
>
> I thought we agreed that any device supporting netconsole agrees
> to not call printk in the transmit path. Just kill the pr_* call.

Not calling printk from the driver doesn't help since you can
still call printk from an IRQ handler.

Cheers,
Eric Dumazet - Aug. 23, 2010, 8:01 p.m.
Le mercredi 11 août 2010 à 11:12 -0400, Neil Horman a écrit :
> If netconsole is in use, there is a possibility for deadlock in 3c59x between
> boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
> and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
> will result in the netconsole code attempting to trnasmit an skb, which can try
> to take the same spin lock, resulting in deadlock.
> 
> The fix is pretty straightforward.  This patch allocats a bit in the 3c59x
> private structure to indicate that its handling an interrupt.  If we get into
> the transmit routine and that bit is set, we can be sure that we have recursed
> and will deadlock if we continue, so instead we just return NETDEV_TX_BUSY, so
> the stack requeues the skb to try again later.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> ---
>  drivers/net/3c59x.c |   15 ++++++++++++++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
> index c754d88..c685a55 100644
> --- a/drivers/net/3c59x.c
> +++ b/drivers/net/3c59x.c
> @@ -633,7 +633,8 @@ struct vortex_private {
>  		open:1,
>  		medialock:1,
>  		must_free_region:1,				/* Flag: if zero, Cardbus owns the I/O region */
> -		large_frames:1;			/* accept large frames */
> +		large_frames:1,			/* accept large frames */
> +		handling_irq:1;			/* private in_irq indicator */


It would be safer and faster to use a dedicated 'int' instead of a
bitfield.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman - Aug. 23, 2010, 8:24 p.m.
On Mon, Aug 23, 2010 at 10:01:34PM +0200, Eric Dumazet wrote:
> Le mercredi 11 août 2010 à 11:12 -0400, Neil Horman a écrit :
> > If netconsole is in use, there is a possibility for deadlock in 3c59x between
> > boomerang_interrupt and boomerang_start_xmit.  Both routines take the vp->lock,
> > and if netconsole is in use, a pr_* call from the boomerang_interrupt routine
> > will result in the netconsole code attempting to trnasmit an skb, which can try
> > to take the same spin lock, resulting in deadlock.
> > 
> > The fix is pretty straightforward.  This patch allocats a bit in the 3c59x
> > private structure to indicate that its handling an interrupt.  If we get into
> > the transmit routine and that bit is set, we can be sure that we have recursed
> > and will deadlock if we continue, so instead we just return NETDEV_TX_BUSY, so
> > the stack requeues the skb to try again later.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > ---
> >  drivers/net/3c59x.c |   15 ++++++++++++++-
> >  1 files changed, 14 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
> > index c754d88..c685a55 100644
> > --- a/drivers/net/3c59x.c
> > +++ b/drivers/net/3c59x.c
> > @@ -633,7 +633,8 @@ struct vortex_private {
> >  		open:1,
> >  		medialock:1,
> >  		must_free_region:1,				/* Flag: if zero, Cardbus owns the I/O region */
> > -		large_frames:1;			/* accept large frames */
> > +		large_frames:1,			/* accept large frames */
> > +		handling_irq:1;			/* private in_irq indicator */
> 
> 
> It would be safer and faster to use a dedicated 'int' instead of a
> bitfield.
> 
> 
> 
> 


Faster I agree with, although I'm not sure if speed is really a big issue here,
given that this is a ancient (but fairly well used) 10/100 adapter.  And we
still have space in the octet that that bitfield is living in, so I figured I'd
use that anyway.

As for safe, I'm not sure I follow you on that point.  Is there something
inherently dangerous about using a bitfield in this case?

Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Aug. 23, 2010, 8:48 p.m.
Le lundi 23 août 2010 à 16:24 -0400, Neil Horman a écrit :

> 
> Faster I agree with, although I'm not sure if speed is really a big issue here,
> given that this is a ancient (but fairly well used) 10/100 adapter.  And we
> still have space in the octet that that bitfield is living in, so I figured I'd
> use that anyway.
> 
> As for safe, I'm not sure I follow you on that point.  Is there something
> inherently dangerous about using a bitfield in this case?
> 

A bitfield is not SMP safe.

Are you sure another cpu is not changing another bit, using a non atomic
RMW sequence, and your bit change is lost ?

Quite tricky to check I suppose, so just add an "int" ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman - Aug. 23, 2010, 11:41 p.m.
On Mon, Aug 23, 2010 at 10:48:58PM +0200, Eric Dumazet wrote:
> Le lundi 23 août 2010 à 16:24 -0400, Neil Horman a écrit :
> 
> > 
> > Faster I agree with, although I'm not sure if speed is really a big issue here,
> > given that this is a ancient (but fairly well used) 10/100 adapter.  And we
> > still have space in the octet that that bitfield is living in, so I figured I'd
> > use that anyway.
> > 
> > As for safe, I'm not sure I follow you on that point.  Is there something
> > inherently dangerous about using a bitfield in this case?
> > 
> 
> A bitfield is not SMP safe.
> 
> Are you sure another cpu is not changing another bit, using a non atomic
> RMW sequence, and your bit change is lost ?
> 
> Quite tricky to check I suppose, so just add an "int" ;)
> 
> 
> 
> 

Ah, I see what your saying.  Since bitfield access is not a single instruction
you get races when multiple accesses take place at once, since the assignment is
non-atomic, nor is the read.

In this case, thats ok.  I say that because in the only time we really care what
the value of this bitfield is, is when we've recursed from the interrupt handler
to the netconsole module, back into the transmit path on the same cpu.  In that
case the read is guaranteed to be ordered after the write, since its a linear
code path.  In the case where cpu 0 is in the interrupt handler and cpu1 is
going into the transmit method for this driver, we don't really care what the
value of the bitfield is, its a don't care.  If we read it as a zero, thats ok,
we have the driver-internal sempahore still protecting us (the one that
deadlocks if you recurse via netconsole on the same cpu).  And if we read it as
a 1, thats ok too, because we simply cause the network scheduler to queue the
frame and send it again as soon as we're out of the interrupt handler.

Theres still the size vs. speed issue with the conversion to int though.  I
figured since it was already using bitfields in lots of places, there wouldn't
be much performance impact.  But if you feel really strongly about it, let me
know, and I'll make the conversion.

Neil


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Aug. 24, 2010, 8:48 a.m.
From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 23 Aug 2010 19:41:44 -0400

> Ah, I see what your saying.  Since bitfield access is not a single instruction
> you get races when multiple accesses take place at once, since the assignment is
> non-atomic, nor is the read.
> 
> In this case, thats ok.  I say that because in the only time we really care what
> the value of this bitfield is, is when we've recursed from the interrupt handler
> to the netconsole module, back into the transmit path on the same cpu.  In that
> case the read is guaranteed to be ordered after the write, since its a linear
> code path.  In the case where cpu 0 is in the interrupt handler and cpu1 is
> going into the transmit method for this driver, we don't really care what the
> value of the bitfield is, its a don't care.  If we read it as a zero, thats ok,
> we have the driver-internal sempahore still protecting us (the one that
> deadlocks if you recurse via netconsole on the same cpu).  And if we read it as
> a 1, thats ok too, because we simply cause the network scheduler to queue the
> frame and send it again as soon as we're out of the interrupt handler.

Right this should be OK.

The only write to the bit happens with the lock held.

The other bits are never modified while the device is active
and interrupts can run.

I've applied Neil's patch, thanks Neil.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index c754d88..c685a55 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -633,7 +633,8 @@  struct vortex_private {
 		open:1,
 		medialock:1,
 		must_free_region:1,				/* Flag: if zero, Cardbus owns the I/O region */
-		large_frames:1;			/* accept large frames */
+		large_frames:1,			/* accept large frames */
+		handling_irq:1;			/* private in_irq indicator */
 	int drv_flags;
 	u16 status_enable;
 	u16 intr_enable;
@@ -2133,6 +2134,15 @@  boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			   dev->name, vp->cur_tx);
 	}
 
+	/*
+	 * We can't allow a recursion from our interrupt handler back into the
+	 * tx routine, as they take the same spin lock, and that causes
+	 * deadlock.  Just return NETDEV_TX_BUSY and let the stack try again in
+	 * a bit
+	 */
+	if (vp->handling_irq)
+		return NETDEV_TX_BUSY;
+
 	if (vp->cur_tx - vp->dirty_tx >= TX_RING_SIZE) {
 		if (vortex_debug > 0)
 			pr_warning("%s: BUG! Tx Ring full, refusing to send buffer.\n",
@@ -2335,11 +2345,13 @@  boomerang_interrupt(int irq, void *dev_id)
 
 	ioaddr = vp->ioaddr;
 
+
 	/*
 	 * It seems dopey to put the spinlock this early, but we could race against vortex_tx_timeout
 	 * and boomerang_start_xmit
 	 */
 	spin_lock(&vp->lock);
+	vp->handling_irq = 1;
 
 	status = ioread16(ioaddr + EL3_STATUS);
 
@@ -2447,6 +2459,7 @@  boomerang_interrupt(int irq, void *dev_id)
 		pr_debug("%s: exiting interrupt, status %4.4x.\n",
 			   dev->name, status);
 handler_exit:
+	vp->handling_irq = 0;
 	spin_unlock(&vp->lock);
 	return IRQ_HANDLED;
 }