pci: aer: wait till the workqueue completes before free memory

Message ID 20151217143243.GA9654@linutronix.de
State Changes Requested

Commit Message

Sebastian Andrzej Siewior Dec. 17, 2015, 2:32 p.m. UTC
I start a binary which should flash the FPGA, re-enumerate the PCI bus
and find a new device. It works most of the time. With SLUB debug it
crashes on each iteration with something like this (compressed output):

| pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
| Unable to handle kernel paging request for data at address 0x27ef9e3e
| Faulting instruction address: 0x602f5328
| Oops: Kernel access of bad area, sig: 11 [#1]
| Workqueue: events aer_isr
| GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
| NIP [602f5328] pci_walk_bus+0xd4/0x104

Register 25 holds the use-after-free magic (6b6b6b6b). As it turns out,
the old PCIe device generates an error just before it leaves, aer_irq()
fires and schedules a work item. free_irq() is then invoked and all
resources are freed *before* the aer_isr() work item has completed.
So to fix this, I flush the work item after free_irq() to ensure that no
work is pending or still running when the resources go away.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Bjorn, this could deserve a stable tag. However it seems to have been
like that even in v2.6.20.

 drivers/pci/pcie/aer/aerdrv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
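
The "use-after-free magic" in GPR25 of the oops above is the byte pattern
that SLUB debugging writes over freed objects (POISON_FREE, 0x6b, in
include/linux/poison.h). A minimal user-space sketch of that check follows;
the helper name is hypothetical and exists only for illustration, it is not
a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte that SLUB debug writes over freed objects
 * (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE 0x6b

/* Hypothetical helper: true if every byte of a 32-bit register value
 * equals POISON_FREE, i.e. the value was probably loaded from memory
 * that had already been freed. */
bool looks_like_freed_slub_memory(uint32_t word)
{
	for (int i = 0; i < 4; i++) {
		if (((word >> (8 * i)) & 0xff) != POISON_FREE)
			return false;
	}
	return true;
}
```

Applied to the register dump, the second word of the GPR24 line (6b6b6b6b)
matches the pattern, while the first (dd6aa000) does not.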

Comments

Bjorn Helgaas Jan. 6, 2016, 11:27 p.m. UTC | #1
Hi Sebastian,

On Thu, Dec 17, 2015 at 03:32:43PM +0100, Sebastian Andrzej Siewior wrote:
> I start a binary which should flash the FPGA, re-enumerate the PCI bus
> and find a new device. It works most of the time. With SLUB debug it
> crashes on each iteration with something like this (compressed output):
> 
> | pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
> | Unable to handle kernel paging request for data at address 0x27ef9e3e
> | Faulting instruction address: 0x602f5328
> | Oops: Kernel access of bad area, sig: 11 [#1]
> | Workqueue: events aer_isr
> | GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
> | NIP [602f5328] pci_walk_bus+0xd4/0x104
> 
> Register 25 holds the use-after-free magic (6b6b6b6b). As it turns out,
> the old PCIe device generates an error just before it leaves, aer_irq()
> fires and schedules a work item. free_irq() is then invoked and all
> resources are freed *before* the aer_isr() work item has completed.
> So to fix this, I flush the work item after free_irq() to ensure that no
> work is pending or still running when the resources go away.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> Bjorn, this could deserve a stable tag. However it seems to have been
> like that even in v2.6.20.
> 
>  drivers/pci/pcie/aer/aerdrv.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
> index 0bf82a20a0fb..7acd27348098 100644
> --- a/drivers/pci/pcie/aer/aerdrv.c
> +++ b/drivers/pci/pcie/aer/aerdrv.c
> @@ -282,8 +282,10 @@ static void aer_remove(struct pcie_device *dev)
>  
>  	if (rpc) {
>  		/* If register interrupt service, it must be free. */
> -		if (rpc->isr)
> +		if (rpc->isr) {
>  			free_irq(dev->irq, dev);
> +			flush_work(&rpc->dpc_handler);
> +		}
>  
>  		wait_event(rpc->wait_release, rpc->prod_idx == rpc->cons_idx);

Your change looks reasonable.  But I'm curious about the wait_event()
just below it.  That *looks* like it's intended to do the same thing
as your flush_work().

Can you explain why the wait_event() isn't working?  If we add the
flush_work(), can we remove the wait_event() stuff?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Sebastian Andrzej Siewior Jan. 15, 2016, 6:03 p.m. UTC | #2
* Bjorn Helgaas | 2016-01-06 17:27:58 [-0600]:

>Hi Sebastian,
Hi Bjorn,

>Your change looks reasonable.  But I'm curious about the wait_event()
>just below it.  That *looks* like it's intended to do the same thing
>as your flush_work().
Indeed.

>Can you explain why the wait_event() isn't working?  If we add the

aer_isr() invokes get_e_source(), which increments rpc->cons_idx. So
the wait_event() condition is already true after that, but the function
has not terminated yet: it goes on to invoke aer_isr_one_error().
That means if one CPU runs the ISR + workqueue task and another CPU runs
aer_remove(), then the latter CPU evaluates the condition to true and
continues the cleanup while the former is still in aer_isr_one_error(),
wondering where the memory went.

>flush_work(), can we remove the wait_event() stuff?

I think so, since its only purpose is to sync against removal, which does
not work on SMP. So let me remove this and the wait_release member.

>Bjorn

Sebastian

Patch

diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
index 0bf82a20a0fb..7acd27348098 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -282,8 +282,10 @@  static void aer_remove(struct pcie_device *dev)
 
 	if (rpc) {
 		/* If register interrupt service, it must be free. */
-		if (rpc->isr)
+		if (rpc->isr) {
 			free_irq(dev->irq, dev);
+			flush_work(&rpc->dpc_handler);
+		}
 
 		wait_event(rpc->wait_release, rpc->prod_idx == rpc->cons_idx);