Patchwork ks8851: Cancel any pending IRQ work

Submitter mjr@cs.wisc.edu
Date April 12, 2012, 4:44 p.m.
Message ID <1334249091-7605-1-git-send-email-mjr@cs.wisc.edu>
Permalink /patch/152139/
State Changes Requested
Delegated to: David Miller

Comments

mjr@cs.wisc.edu - April 12, 2012, 4:44 p.m.
From: Matt Renzelmann <mjr@cs.wisc.edu>

An unexpected/spurious interrupt may cause the irq_work queue to
execute during or after module unload, which can cause a crash.  It
should be canceled.

Signed-off-by: Matt Renzelmann <mjr@cs.wisc.edu>
---
 drivers/net/ethernet/micrel/ks8851.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
Stephen Boyd - April 12, 2012, 8:19 p.m.
On 04/12/12 09:44, mjr@cs.wisc.edu wrote:
> From: Matt Renzelmann <mjr@cs.wisc.edu>
>
> An unexpected/spurious interrupt may cause the irq_work queue to
> execute during or after module unload, which can cause a crash.  It
> should be canceled.
>
> Signed-off-by: Matt Renzelmann <mjr@cs.wisc.edu>
> ---
>  drivers/net/ethernet/micrel/ks8851.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c
> index c722aa6..ab46953 100644
> --- a/drivers/net/ethernet/micrel/ks8851.c
> +++ b/drivers/net/ethernet/micrel/ks8851.c
> @@ -1540,6 +1540,7 @@ static int __devexit ks8851_remove(struct spi_device *spi)
>  		dev_info(&spi->dev, "remove\n");
>  
>  	unregister_netdev(priv->netdev);
> +	cancel_work_sync(&priv->irq_work);
>  	free_irq(spi->irq, priv);
>  	free_netdev(priv->netdev);
>  

Is this actually solving anything? Presumably cancel_work_sync() could
run and then another spurious interrupt could come in after that
function returns and we would have the same problem again. We should
probably free the irq before unregistering the netdev so that
ks8851_net_stop() would run after the interrupt is no longer registered,
and the flush_work() in there would finish the last work. But then we
have a problem where we're enabling the irq in the irq_work callback
after the irq has been freed. Ugh.
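
The ordering concern above can be illustrated with a small userspace model. This is only a sketch: every name in it is hypothetical and nothing here is taken from ks8851.c. It treats flush and cancel alike (both simply clear the pending flag), and it deliberately ignores the second problem mentioned above, namely the work callback re-enabling an IRQ that has already been freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of driver teardown state; not real kernel structures. */
struct teardown_model {
	bool irq_registered; /* handler still installed (free_irq not yet called) */
	bool work_pending;   /* irq_work queued but not yet run */
	bool priv_freed;     /* private data released (free_netdev called) */
};

/* A spurious interrupt can only queue work while the handler is installed. */
static void spurious_irq(struct teardown_model *m)
{
	if (m->irq_registered)
		m->work_pending = true;
}

/* Stand-in for cancel_work_sync()/flush_work(): no work remains queued. */
static void drain_work(struct teardown_model *m)
{
	m->work_pending = false;
}

/* True if work is still queued against memory that has been freed. */
static bool work_outlives_priv(const struct teardown_model *m)
{
	return m->priv_freed && m->work_pending;
}

/* Order used by the patch: drain the work, then free the IRQ.
 * irq_in_window injects a spurious interrupt between the two steps. */
bool drain_then_free_irq(bool irq_in_window)
{
	struct teardown_model m = { .irq_registered = true };

	drain_work(&m);
	if (irq_in_window)
		spurious_irq(&m);
	m.irq_registered = false; /* free_irq() */
	m.priv_freed = true;      /* free_netdev() */
	return work_outlives_priv(&m);
}

/* Alternative order: free the IRQ first, then drain the work. */
bool free_irq_then_drain(bool irq_in_window)
{
	struct teardown_model m = { .irq_registered = true };

	m.irq_registered = false; /* free_irq() */
	if (irq_in_window)
		spurious_irq(&m);     /* ignored: no handler is installed */
	drain_work(&m);
	m.priv_freed = true;      /* free_netdev() */
	return work_outlives_priv(&m);
}
```

In this model, drain_then_free_irq(true) reports leftover work, while free_irq_then_drain(true) does not. That matches the point that the patch narrows the window without closing it.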

I also see a potential deadlock in ks8851_net_stop(). ks8851_net_stop()
holds the ks->lock while calling flush_work() which could deadlock if an
interrupt comes and schedules an irq_work between the time
ks8851_net_stop() grabs the mutex and calls flush_work().
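
A minimal userspace sketch of that deadlock, assuming (as the scenario above implies) that the irq_work handler takes the same lock that ks8851_net_stop() holds. The names are hypothetical and the model is single-threaded: instead of actually blocking, it reports whether the flush could ever complete. It is not a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; not the driver's real data structures. */
struct stop_model {
	bool lock_held;    /* the stop path currently holds the mutex */
	bool work_pending; /* an interrupt has scheduled irq_work */
};

/* The work handler needs the mutex. Returns true if no work is left
 * afterwards, false if the handler would block on the held mutex. */
static bool try_run_work(struct stop_model *m)
{
	if (!m->work_pending)
		return true;  /* nothing queued, flush returns at once */
	if (m->lock_held)
		return false; /* handler sleeps on the mutex: flush never ends */
	m->work_pending = false;
	return true;
}

/* Stop path that flushes while still holding the mutex.
 * Returns true if the flush can complete. */
bool flush_while_locked(bool irq_after_lock)
{
	struct stop_model m = { false, false };
	bool flushed;

	m.lock_held = true;              /* stop path takes the mutex */
	if (irq_after_lock)
		m.work_pending = true;       /* interrupt schedules irq_work */
	flushed = try_run_work(&m);      /* flush waits for the handler */
	m.lock_held = false;
	return flushed;
}

/* Same sequence, but the mutex is dropped before flushing. */
bool flush_after_unlock(bool irq_after_lock)
{
	struct stop_model m = { false, false };

	m.lock_held = true;
	if (irq_after_lock)
		m.work_pending = true;
	m.lock_held = false;             /* mutex released first */
	return try_run_work(&m);
}
```

Here flush_while_locked(true) returns false, which corresponds to the flush waiting on a handler that is itself waiting for the mutex. flush_after_unlock(true) returns true in the model, though whether that ordering is safe in the real driver depends on what else the lock protects.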
mjr@cs.wisc.edu - April 12, 2012, 8:34 p.m.
> 
> Is this actually solving anything? Presumably cancel_work_sync() could
> run and then another spurious interrupt could come in after that
> function returns and we would have the same problem again. We should
> probably free the irq before unregistering the netdev so that
> ks8851_net_stop() would run after the interrupt is no longer registered,
> and the flush_work() in there would finish the last work. But then we
> have a problem where we're enabling the irq in the irq_work callback
> after the irq has been freed. Ugh.
> 
> I also see a potential deadlock in ks8851_net_stop(). ks8851_net_stop()
> holds the ks->lock while calling flush_work() which could deadlock if an
> interrupt comes and schedules an irq_work between the time
> ks8851_net_stop() grabs the mutex and calls flush_work().
> 

I agree on all counts -- the patch is buggy, though it does at least "shrink"
the window of vulnerability.  Frankly, I don't believe I'm qualified to write an
appropriate patch for this driver, at least without spending considerably more
time on it.

FWIW, I found this problem with a new driver-testing tool we've developed called
SymDrive, and my goal is primarily to determine if the bug is real or not.  The
tool is imperfect and we are trying to validate its operation.

That said, if there is an issue here, and we can come up with an appropriate
fix, then I'd be happy to write a patch for it.

Stephen Boyd - April 13, 2012, 6:32 p.m.
On 04/12/12 13:34, Matt Renzelmann wrote:
> I agree on all counts -- the patch is buggy, though it does at least "shrink"
> the window of vulnerability.  Frankly, I don't believe I'm qualified to write an
> appropriate patch for this driver, at least without spending considerably more
> time on it.
>
> FWIW, I found this problem with a new driver-testing tool we've developed called
> SymDrive, and my goal is primarily to determine if the bug is real or not.  The
> tool is imperfect and we are trying to validate its operation.

The bug is real if your interrupt controller is broken :-) One could
argue that it's outside the scope of this driver to handle broken
interrupt controllers or buggy genirq code, but being defensive sounds
like a good idea.

>
> That said, if there is an issue here, and we can come up with an appropriate
> fix, then I'd be happy to write a patch for it.
>

I'll see what I can do in the next few days about the deadlock I mentioned.

Patch

diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c
index c722aa6..ab46953 100644
--- a/drivers/net/ethernet/micrel/ks8851.c
+++ b/drivers/net/ethernet/micrel/ks8851.c
@@ -1540,6 +1540,7 @@ static int __devexit ks8851_remove(struct spi_device *spi)
 		dev_info(&spi->dev, "remove\n");
 
 	unregister_netdev(priv->netdev);
+	cancel_work_sync(&priv->irq_work);
 	free_irq(spi->irq, priv);
 	free_netdev(priv->netdev);