Patchwork e100: Fix the TX workqueue race

login
register
mail settings
Submitter Alan Cox
Date April 23, 2010, 2:34 p.m.
Message ID <20100423143356.7092.45260.stgit@localhost.localdomain>
Download mbox | patch
Permalink /patch/50816/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Alan Cox - April 23, 2010, 2:34 p.m.
I'd assumed someone would have picked up on this and fixed it using rtnl_lock
as was suggested but it seems to have fallen through the cracks ?

Anyway this is I assume what was meant ?

---

Nothing stops the workqueue being left to run in parallel with close or a
few other operations. This causes double unmaps and the like.

See kerneloops.org #1041230 for an example

Signed-off-by: Alan Cox <alan@linux.intel.com>
---

 drivers/net/e100.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - April 23, 2010, 4:20 p.m.
On 04/23/2010 10:34 AM, Alan Cox wrote:
> I'd assumed someone would have picked up on this and fixed it using rtnl_lock
> as was suggested but it seems to have fallen through the cracks ?
>
> Anyway this is I assume what was meant ?
>
> ---
>
> Nothing stops the workqueue being left to run in parallel with close or a
> few other operations. This causes double unmaps and the like.
>
> See kerneloops.org #1041230 for an example
>
> Signed-off-by: Alan Cox<alan@linux.intel.com>

Acked-by: Jeff Garzik <jgarzik@redhat.com>

Glad someone finally fixed this, it has bugged me for years...


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 23, 2010, 11:31 p.m.
From: Alan Cox <alan@linux.intel.com>
Date: Fri, 23 Apr 2010 15:34:43 +0100

> I'd assumed someone would have picked up on this and fixed it using rtnl_lock
> as was suggested but it seems to have fallen through the cracks ?
> 
> Anyway this is I assume what was meant ?

I hope this doesn't deadlock with linkwatch, as that's generally
a problem we hit with trying to take RTNL from workqueues in
the networking.

Linkwatch takes the RTNL lock, and then can make calls into the driver
in it's main work loop.

But since you don't hold any driver locks here (you can't as if we did
we couldn't take the RTNL lock here at all) so it should be OK.

I'll apply this to net-2.6, thanks Alan.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 23, 2010, 11:35 p.m.
From: David Miller <davem@davemloft.net>
Date: Fri, 23 Apr 2010 16:31:27 -0700 (PDT)

> I'll apply this to net-2.6, thanks Alan.

Nevermind...

Doesn't apply to net-2.6, but even when I fix that up it doesn't
even compile.  There is no 'dev' variable present etc.

You even use a combination of "dev" and "netdev" in the resulting
code block.

If it doesn't even build, I doubt it's been tested either.

Please resolve this and get some testing on it, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - April 24, 2010, 10:36 a.m.
On Fri, 23 Apr 2010 16:35:45 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: David Miller <davem@davemloft.net>
> Date: Fri, 23 Apr 2010 16:31:27 -0700 (PDT)
> 
> > I'll apply this to net-2.6, thanks Alan.
> 
> Nevermind...
> 
> Doesn't apply to net-2.6, but even when I fix that up it doesn't
> even compile.  There is no 'dev' variable present etc.
> 
> You even use a combination of "dev" and "netdev" in the resulting
> code block.
> 
> If it doesn't even build, I doubt it's been tested either.

Puzzling as it came from a building -next tree. Will see whats happened
next week if I get time, but I'm afraid net stuff isn't a priority - in
fact its disappointing that having diagnosed a bug months ago (which
was the hard bit) and posted a test patch months ago the maintainers
haven't fixed it.

Alan
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - April 25, 2010, 3 a.m.
From: Alan Cox <alan@linux.intel.com>
Date: Sat, 24 Apr 2010 11:36:29 +0100

> Puzzling as it came from a building -next tree. Will see whats
> happened next week if I get time, but I'm afraid net stuff isn't a
> priority - in fact its disappointing that having diagnosed a bug
> months ago (which was the hard bit) and posted a test patch months
> ago the maintainers haven't fixed it.

It's disappointing to me that someone as experienced and skilled
as yourself can't generate a clean patch which is 1) against
the appropriate tree for a bug fix and 2) actually compiles.

Or is this too much to ask? :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 3e8d000..859e833 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2280,8 +2280,13 @@  static void e100_tx_timeout_task(struct work_struct *work)
 
 	netif_printk(nic, tx_err, KERN_DEBUG, nic->netdev,
 		     "scb.status=0x%02X\n", ioread8(&nic->csr->scb.status));
-	e100_down(netdev_priv(netdev));
-	e100_up(netdev_priv(netdev));
+
+	rtnl_lock();
+	if (netif_running(dev)) {
+		e100_down(netdev_priv(netdev));
+		e100_up(netdev_priv(netdev));
+	}
+	rtnl_unlock();
 }
 
 static int e100_loopback_test(struct nic *nic, enum loopback loopback_mode)