[1/2] cxl: Keep IRQ mappings on context teardown

Message ID 1461301069-12331-1-git-send-email-mikey@neuling.org (mailing list archive)
State Accepted

Commit Message

Michael Neuling April 22, 2016, 4:57 a.m. UTC
Keep IRQ mappings on context teardown.  This won't leak IRQs because, if
we allocate the mapping again, the generic code will return the same
mapping that was used last time.

Doing this works around a race in the generic code. Masking the
interrupt introduces a race which can crash the kernel or result in
an IRQ that is never EOIed. The loss of EOI results in all subsequent
mappings to the same HW IRQ never receiving an interrupt.

We've seen this race with cxl test cases which are doing heavy context
startup and teardown at the same time as heavy interrupt load.

A fix to the generic code is being investigated also.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org # 3.8
---
 drivers/misc/cxl/irq.c | 1 -
 1 file changed, 1 deletion(-)
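
The "won't leak IRQs" argument in the commit message can be illustrated
with a small userspace model. This is only a sketch, not kernel code: it
assumes the generic mapping code looks up an existing hwirq mapping before
allocating a new one, as the commit message describes. Every toy_* name
below is hypothetical and exists only for this illustration.

```c
#include <assert.h>

#define TOY_MAX_HWIRQ 16

/* Toy stand-in for an IRQ domain: hwirq -> virq, 0 means "unmapped". */
static unsigned int toy_map[TOY_MAX_HWIRQ];
static unsigned int toy_next_virq = 1;

/*
 * Assumed behaviour of the generic code: if the hwirq already has a
 * mapping, hand back the same virq instead of allocating a new one.
 */
static unsigned int toy_create_mapping(unsigned int hwirq)
{
	if (hwirq >= TOY_MAX_HWIRQ)
		return 0;
	if (toy_map[hwirq] == 0)
		toy_map[hwirq] = toy_next_virq++;
	return toy_map[hwirq];
}

/*
 * Teardown as it looks after this patch: the handler would be released
 * here, but the hwirq -> virq mapping is deliberately left in place.
 */
static void toy_unmap_irq(unsigned int virq)
{
	assert(virq != 0); /* caller must pass a mapped virq */
}

/* Number of virqs handed out so far; grows only on a genuinely new hwirq. */
static unsigned int toy_mappings_in_use(void)
{
	return toy_next_virq - 1;
}
```

In this model, calling toy_create_mapping(5), then toy_unmap_irq() on the
result, then toy_create_mapping(5) again yields the same virq both times,
and toy_mappings_in_use() stays at 1. Repeated context startup and
teardown therefore reuses one mapping per hwirq rather than accumulating
new ones.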

Comments

Andrew Donnellan April 22, 2016, 5:14 a.m. UTC | #1
On 22/04/16 14:57, Michael Neuling wrote:
> Keep IRQ mappings on context teardown.  This won't leak IRQs as if we
> allocate the mapping again, the generic code will give the same
> mapping used last time.
>
> Doing this works around a race in the generic code. Masking the
> interrupt introduces a race which can crash the kernel or result in
> an IRQ that is never EOIed. The loss of EOI results in all subsequent
> mappings to the same HW IRQ never receiving an interrupt.
>
> We've seen this race with cxl test cases which are doing heavy context
> startup and teardown at the same time as heavy interrupt load.
>
> A fix to the generic code is being investigated also.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> cc: stable@vger.kernel.org # 3.8

Tested on top of 4.6-rc3 using the genwqe-echo test utility[0].

Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

[0] https://github.com/ibm-genwqe/genwqe-user/blob/master/tools/genwqe_echo.c
Ian Munsie April 22, 2016, 5:18 a.m. UTC | #2
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Vaibhav Jain April 22, 2016, 6:03 a.m. UTC | #3
Michael Neuling <mikey@neuling.org> writes:

> Keep IRQ mappings on context teardown.  This won't leak IRQs as if we
> allocate the mapping again, the generic code will give the same
> mapping used last time.
>
> Doing this works around a race in the generic code. Masking the
> interrupt introduces a race which can crash the kernel or result in
> an IRQ that is never EOIed. The loss of EOI results in all subsequent
> mappings to the same HW IRQ never receiving an interrupt.
>
> We've seen this race with cxl test cases which are doing heavy context
> startup and teardown at the same time as heavy interrupt load.
>
> A fix to the generic code is being investigated also.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> cc: stable@vger.kernel.org # 3.8

Tested on top of the following skiboot patches, which fix potential races
in the PHB code.

http://patchwork.ozlabs.org/patch/581764/
http://patchwork.ozlabs.org/patch/581765/

Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Michael Ellerman April 28, 2016, 1:52 a.m. UTC | #4
On Fri, 2016-04-22 at 04:57:48 UTC, Michael Neuling wrote:
> Keep IRQ mappings on context teardown.  This won't leak IRQs as if we
> allocate the mapping again, the generic code will give the same
> mapping used last time.
> 
> Doing this works around a race in the generic code. Masking the
> interrupt introduces a race which can crash the kernel or result in
> an IRQ that is never EOIed. The loss of EOI results in all subsequent
> mappings to the same HW IRQ never receiving an interrupt.
> 
> We've seen this race with cxl test cases which are doing heavy context
> startup and teardown at the same time as heavy interrupt load.
> 
> A fix to the generic code is being investigated also.
> 
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> cc: stable@vger.kernel.org # 3.8
> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Acked-by: Ian Munsie <imunsie@au1.ibm.com>
> Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/d6776bba44d9752f6cdf640046

cheers

Patch

diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index be646dc..8def455 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -203,7 +203,6 @@  unsigned int cxl_map_irq(struct cxl *adapter, irq_hw_number_t hwirq,
 void cxl_unmap_irq(unsigned int virq, void *cookie)
 {
 	free_irq(virq, cookie);
-	irq_dispose_mapping(virq);
 }
 
 int cxl_register_one_irq(struct cxl *adapter,