Message ID | 1334249091-7605-1-git-send-email-mjr@cs.wisc.edu |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
On 04/12/12 09:44, mjr@cs.wisc.edu wrote:
> From: Matt Renzelmann <mjr@cs.wisc.edu>
>
> An unexpected/spurious interrupt may cause the irq_work queue to
> execute during or after module unload, which can cause a crash. It
> should be canceled.
>
> Signed-off-by: Matt Renzelmann <mjr@cs.wisc.edu>
> ---
>  drivers/net/ethernet/micrel/ks8851.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c
> index c722aa6..ab46953 100644
> --- a/drivers/net/ethernet/micrel/ks8851.c
> +++ b/drivers/net/ethernet/micrel/ks8851.c
> @@ -1540,6 +1540,7 @@ static int __devexit ks8851_remove(struct spi_device *spi)
>  	dev_info(&spi->dev, "remove\n");
>
>  	unregister_netdev(priv->netdev);
> +	cancel_work_sync(&priv->irq_work);
>  	free_irq(spi->irq, priv);
>  	free_netdev(priv->netdev);
>

Is this actually solving anything? Presumably cancel_work_sync() could
run and then another spurious interrupt could come in after that
function returns and we would have the same problem again. We should
probably free the irq before unregistering the netdev so that
ks8851_net_stop() would run after the interrupt is no longer registered,
and the flush_work() in there would finish the last work. But then we
have a problem where we're enabling the irq in the irq_work callback
after the irq has been freed. Ugh.

I also see a potential deadlock in ks8851_net_stop(). ks8851_net_stop()
holds the ks->lock while calling flush_work() which could deadlock if an
interrupt comes and schedules an irq_work between the time
ks8851_net_stop() grabs the mutex and calls flush_work().
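The ordering argument in the reply above can be illustrated with a small userspace model. This is only a sketch, not driver code: `struct fake_dev` and every `fake_*` function are hypothetical stand-ins for the kernel objects, and the model deliberately ignores the further problem raised in the reply (the irq_work callback enabling an irq that has already been freed).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the teardown state. All names are hypothetical. */
struct fake_dev {
	bool irq_registered;	/* handler still installed (before free_irq) */
	bool work_pending;	/* irq_work queued but not yet run */
};

/* A (possibly spurious) interrupt queues work only while the handler exists. */
static void fake_interrupt(struct fake_dev *d)
{
	if (d->irq_registered)
		d->work_pending = true;
}

/* Stand-in for cancel_work_sync(): drops whatever is queued right now. */
static void fake_cancel_work_sync(struct fake_dev *d)
{
	d->work_pending = false;
}

/* Stand-in for free_irq(): no further interrupts reach the handler. */
static void fake_free_irq(struct fake_dev *d)
{
	d->irq_registered = false;
}

/* Order used by the patch: cancel, then free, with an interrupt in between. */
bool pending_after_cancel_then_free(void)
{
	struct fake_dev d = { .irq_registered = true, .work_pending = true };

	fake_cancel_work_sync(&d);
	fake_interrupt(&d);	/* lands inside the remaining window */
	fake_free_irq(&d);
	return d.work_pending;	/* work has been queued again */
}

/* Reverse order: free, then cancel. The same interrupt queues nothing. */
bool pending_after_free_then_cancel(void)
{
	struct fake_dev d = { .irq_registered = true, .work_pending = true };

	fake_free_irq(&d);
	fake_interrupt(&d);	/* handler is gone, nothing is queued */
	fake_cancel_work_sync(&d);
	return d.work_pending;
}
```

In this model the patch's ordering still leaves work queued when an interrupt arrives between the two calls, which matches the "window of vulnerability" described in the thread; the reverse ordering closes that window only within the simplifications stated above.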
>
> Is this actually solving anything? Presumably cancel_work_sync() could
> run and then another spurious interrupt could come in after that
> function returns and we would have the same problem again. We should
> probably free the irq before unregistering the netdev so that
> ks8851_net_stop() would run after the interrupt is no longer registered,
> and the flush_work() in there would finish the last work. But then we
> have a problem where we're enabling the irq in the irq_work callback
> after the irq has been freed. Ugh.
>
> I also see a potential deadlock in ks8851_net_stop(). ks8851_net_stop()
> holds the ks->lock while calling flush_work() which could deadlock if an
> interrupt comes and schedules an irq_work between the time
> ks8851_net_stop() grabs the mutex and calls flush_work().
>

I agree on all counts -- the patch is buggy, though it does at least "shrink"
the window of vulnerability. Frankly, I don't believe I'm qualified to write an
appropriate patch for this driver, at least without spending considerably more
time on it.

FWIW, I found this problem with a new driver-testing tool we've developed called
SymDrive, and my goal is primarily to determine if the bug is real or not. The
tool is imperfect and we are trying to validate its operation.

That said, if there is an issue here, and we can come up with an appropriate
fix, then I'd be happy to write a patch for it.
On 04/12/12 13:34, Matt Renzelmann wrote:
> I agree on all counts -- the patch is buggy, though it does at least "shrink"
> the window of vulnerability. Frankly, I don't believe I'm qualified to write an
> appropriate patch for this driver, at least without spending considerably more
> time on it.
>
> FWIW, I found this problem with a new driver-testing tool we've developed called
> SymDrive, and my goal is primarily to determine if the bug is real or not. The
> tool is imperfect and we are trying to validate its operation.

The bug is real if your interrupt controller is broken :-) One could
argue that it's outside the scope of this driver to handle broken
interrupt controllers or buggy genirq code, but being defensive sounds
like a good idea.

>
> That said, if there is an issue here, and we can come up with an appropriate
> fix, then I'd be happy to write a patch for it.
>

I'll see what I can do in the next few days about the deadlock I mentioned.
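The deadlock referred to here (waiting for the work while holding the lock the work itself needs) can be sketched in userspace with pthreads. This is an analogy under stated assumptions, not a proposed driver fix: `struct fake_ks` and the `fake_*` functions are hypothetical, the mutex stands in for ks->lock, and `pthread_join()` stands in for flush_work(). The sketch shows the non-deadlocking shape, where the lock is released before waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the driver state. All names are hypothetical. */
struct fake_ks {
	pthread_mutex_t lock;	/* plays the role of ks->lock */
	pthread_t work;		/* plays the role of the queued irq_work */
	bool stopping;		/* set once the stop path has begun */
	int work_runs;		/* how many times the work body ran */
};

/* Analogue of the irq_work callback: it needs the lock to make progress. */
static void *fake_irq_work(void *arg)
{
	struct fake_ks *ks = arg;

	pthread_mutex_lock(&ks->lock);
	if (!ks->stopping)
		ks->work_runs++;
	pthread_mutex_unlock(&ks->lock);
	return NULL;
}

/*
 * Analogue of the stop path. The lock is dropped before waiting for the
 * work; waiting with it held would hang as soon as the work blocks on
 * the same lock.
 */
static int fake_net_stop(struct fake_ks *ks)
{
	pthread_mutex_lock(&ks->lock);
	ks->stopping = true;
	pthread_mutex_unlock(&ks->lock);

	return pthread_join(ks->work, NULL);	/* stands in for flush_work() */
}

/* Start one work item, then stop. Returns 0 if the stop path completed. */
int fake_stop_demo(void)
{
	struct fake_ks ks = { .stopping = false, .work_runs = 0 };
	int ret;

	pthread_mutex_init(&ks.lock, NULL);
	if (pthread_create(&ks.work, NULL, fake_irq_work, &ks) != 0) {
		pthread_mutex_destroy(&ks.lock);
		return -1;
	}

	ret = fake_net_stop(&ks);
	assert(ks.work_runs <= 1);	/* the work body ran at most once */
	pthread_mutex_destroy(&ks.lock);
	return ret;
}
```

Moving the `pthread_join()` call above the `pthread_mutex_unlock()` in `fake_net_stop()` reproduces the hang whenever the work thread has not yet taken the lock, which is the userspace counterpart of the scenario described in the thread. Whether the driver can safely drop ks->lock before flush_work() is a separate question this sketch does not answer.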
diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c
index c722aa6..ab46953 100644
--- a/drivers/net/ethernet/micrel/ks8851.c
+++ b/drivers/net/ethernet/micrel/ks8851.c
@@ -1540,6 +1540,7 @@ static int __devexit ks8851_remove(struct spi_device *spi)
 	dev_info(&spi->dev, "remove\n");

 	unregister_netdev(priv->netdev);
+	cancel_work_sync(&priv->irq_work);
 	free_irq(spi->irq, priv);
 	free_netdev(priv->netdev);