powerpc: MSI: Fix race condition in tearing down MSI interrupts

Message ID 20150910043621.GA21735@iris.ozlabs.ibm.com (mailing list archive)
State Accepted

Commit Message

Paul Mackerras Sept. 10, 2015, 4:36 a.m. UTC
This fixes a race which can result in the same virtual IRQ number
being assigned to two different MSI interrupts.  The most visible
consequence of that is usually a warning and stack trace from the
sysfs code about an attempt to create a duplicate entry in sysfs.

The race happens when one CPU (say CPU 0) is disposing of an MSI
while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
(for example) pnv_teardown_msi_irqs(), which calls
msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
to calling irq_dispose_mapping() to free up the virtual IRQ number,
CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
MSI, and gets the same hardware IRQ number that CPU 0 just freed.
CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
which sees that there is currently a mapping for that hardware IRQ
number and returns the corresponding virtual IRQ number (which is
the same virtual IRQ number that CPU 0 was using).  CPU 0 then
calls irq_dispose_mapping() and frees that virtual IRQ number.
Now, if another CPU comes along and calls irq_create_mapping(), it
is likely to get the virtual IRQ number that was just freed,
resulting in the same virtual IRQ number apparently being used for
two different hardware interrupts.
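
For illustration only, the interleaving above can be replayed in a small
single-threaded toy model.  This is a sketch, not kernel code: every toy_*
name is hypothetical, the hwirq bitmap and the virq allocator are plain
arrays, and it assumes both allocators hand out the lowest free number.
The three "CPUs" are just consecutive steps in one function.

```c
/* Toy replay of the race; all toy_* names are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_HWIRQS 4
#define TOY_NR_VIRQS  8
#define TOY_NO_IRQ    0	/* virq 0 means "no mapping" in this model */

static bool toy_hw_used[TOY_NR_HWIRQS];		/* stands in for the MSI bitmap */
static bool toy_virq_used[TOY_NR_VIRQS];	/* virq allocator state */
static int  toy_hw_to_virq[TOY_NR_HWIRQS];	/* hwirq -> virq, 0 if unmapped */

static void toy_reset(void)
{
	for (int i = 0; i < TOY_NR_HWIRQS; i++) {
		toy_hw_used[i] = false;
		toy_hw_to_virq[i] = TOY_NO_IRQ;
	}
	for (int i = 0; i < TOY_NR_VIRQS; i++)
		toy_virq_used[i] = false;
}

/* Return the lowest free hwirq, or -1 if the bitmap is full. */
static int toy_bitmap_alloc(void)
{
	for (int hw = 0; hw < TOY_NR_HWIRQS; hw++) {
		if (!toy_hw_used[hw]) {
			toy_hw_used[hw] = true;
			return hw;
		}
	}
	return -1;
}

static void toy_bitmap_free(int hw)
{
	assert(hw >= 0 && hw < TOY_NR_HWIRQS);
	toy_hw_used[hw] = false;
}

/* Reuse an existing mapping if there is one, else take the lowest free virq. */
static int toy_create_mapping(int hw)
{
	if (toy_hw_to_virq[hw] != TOY_NO_IRQ)
		return toy_hw_to_virq[hw];
	for (int v = 1; v < TOY_NR_VIRQS; v++) {
		if (!toy_virq_used[v]) {
			toy_virq_used[v] = true;
			toy_hw_to_virq[hw] = v;
			return v;
		}
	}
	return TOY_NO_IRQ;
}

/* Drop whatever hwirq points at this virq and release the virq. */
static void toy_dispose_mapping(int virq)
{
	for (int hw = 0; hw < TOY_NR_HWIRQS; hw++)
		if (toy_hw_to_virq[hw] == virq)
			toy_hw_to_virq[hw] = TOY_NO_IRQ;
	toy_virq_used[virq] = false;
}

/* Returns true if two different hwirqs end up sharing one virq. */
bool toy_replay_buggy_order(void)
{
	toy_reset();
	int hw0 = toy_bitmap_alloc();		/* CPU 0 owns an MSI ...        */
	int virq0 = toy_create_mapping(hw0);	/* ... mapped to a virq         */

	toy_bitmap_free(hw0);			/* CPU 0: frees the hwirq first */
	int hw1 = toy_bitmap_alloc();		/* CPU 1: gets the same hwirq   */
	int virq1 = toy_create_mapping(hw1);	/* CPU 1: finds the old mapping */
	toy_dispose_mapping(virq0);		/* CPU 0: frees the shared virq */
	int hw2 = toy_bitmap_alloc();		/* CPU 2: a different hwirq     */
	int virq2 = toy_create_mapping(hw2);	/* CPU 2: gets the freed virq   */

	return hw1 != hw2 && virq1 == virq2;
}
```

In this model toy_replay_buggy_order() reports a collision because CPU 1
inherits the mapping that CPU 0 is about to dispose of.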

To fix this race, we just move the call to msi_bitmap_free_hwirqs()
to after the call to irq_dispose_mapping().  Since virq_to_hw()
doesn't work for the virtual IRQ number after irq_dispose_mapping()
has been called, we need to call it before irq_dispose_mapping() and
remember the result for the msi_bitmap_free_hwirqs() call.
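
The reordered teardown can be sketched in the same spirit.  Again this is
a toy model under stated assumptions, not the patch itself: the sketch_*
helpers are hypothetical, the mapping is a plain array, and the lookup is
assumed to return -1 once the mapping has been disposed of.

```c
/* Toy model of the reordered teardown; all sketch_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_IRQS  8
#define SKETCH_NO_HWIRQ (-1)

static bool sketch_hw_used[SKETCH_NR_IRQS];	/* stands in for the MSI bitmap */
static int  sketch_virq_hw[SKETCH_NR_IRQS];	/* virq -> hwirq, -1 if unmapped */

void sketch_init(void)
{
	for (int i = 0; i < SKETCH_NR_IRQS; i++) {
		sketch_hw_used[i] = false;
		sketch_virq_hw[i] = SKETCH_NO_HWIRQ;
	}
}

/* Set up one MSI: mark the hwirq used and map virq -> hwirq. */
void sketch_setup(int virq, int hwirq)
{
	sketch_hw_used[hwirq] = true;
	sketch_virq_hw[virq] = hwirq;
}

int sketch_virq_to_hw(int virq)
{
	return sketch_virq_hw[virq];
}

bool sketch_hw_in_use(int hwirq)
{
	return sketch_hw_used[hwirq];
}

static void sketch_dispose_mapping(int virq)
{
	sketch_virq_hw[virq] = SKETCH_NO_HWIRQ;
}

static void sketch_bitmap_free(int hwirq)
{
	sketch_hw_used[hwirq] = false;
}

/* Fixed order: remember the hwirq, dispose of the mapping, then free it. */
int sketch_teardown(int virq)
{
	int hwirq = sketch_virq_to_hw(virq);	/* must be read before dispose */

	sketch_dispose_mapping(virq);
	/* The lookup is no longer usable here, hence the saved copy. */
	assert(sketch_virq_to_hw(virq) == SKETCH_NO_HWIRQ);
	sketch_bitmap_free(hwirq);		/* hwirq becomes reusable last */
	return hwirq;
}
```

Because the hwirq stays marked as used until the mapping is gone, a
concurrent allocation in this model cannot pick it up and find a mapping
that is about to be removed.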

The pattern of calling msi_bitmap_free_hwirqs() before
irq_dispose_mapping() appears in 5 places under arch/powerpc, and
appears to have originated in commit 05af7bd2d75e ("[POWERPC] MPIC
U3/U4 MSI backend") from 2007.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/platforms/pasemi/msi.c  | 5 +++--
 arch/powerpc/platforms/powernv/pci.c | 5 +++--
 arch/powerpc/sysdev/fsl_msi.c        | 5 +++--
 arch/powerpc/sysdev/mpic_u3msi.c     | 5 +++--
 arch/powerpc/sysdev/ppc4xx_msi.c     | 5 +++--
 5 files changed, 15 insertions(+), 10 deletions(-)

Comments

Michael Ellerman Sept. 10, 2015, 4:55 a.m. UTC | #1
On Thu, 2015-09-10 at 14:36 +1000, Paul Mackerras wrote:
> This fixes a race which can result in the same virtual IRQ number
> being assigned to two different MSI interrupts.  The most visible
> consequence of that is usually a warning and stack trace from the
> sysfs code about an attempt to create a duplicate entry in sysfs.
> 
> The race happens when one CPU (say CPU 0) is disposing of an MSI
> while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
> (for example) pnv_teardown_msi_irqs(), which calls
> msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
> hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
> to calling irq_dispose_mapping() to free up the virtual IRQ number,
> CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
> MSI, and gets the same hardware IRQ number that CPU 0 just freed.
> CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
> which sees that there is currently a mapping for that hardware IRQ
> number and returns the corresponding virtual IRQ number (which is
> the same virtual IRQ number that CPU 0 was using).  CPU 0 then
> calls irq_dispose_mapping() and frees that virtual IRQ number.
> Now, if another CPU comes along and calls irq_create_mapping(), it
> is likely to get the virtual IRQ number that was just freed,
> resulting in the same virtual IRQ number apparently being used for
> two different hardware interrupts.
> 
> To fix this race, we just move the call to msi_bitmap_free_hwirqs()
> to after the call to irq_dispose_mapping().  Since virq_to_hw()
> doesn't work for the virtual IRQ number after irq_dispose_mapping()
> has been called, we need to call it before irq_dispose_mapping() and
> remember the result for the msi_bitmap_free_hwirqs() call.
> 
> The pattern of calling msi_bitmap_free_hwirqs() before
> irq_dispose_mapping() appears in 5 places under arch/powerpc, and
> appears to have originated in commit 05af7bd2d75e ("[POWERPC] MPIC
> U3/U4 MSI backend") from 2007.

Guilty.

Any reason this shouldn't go to stable too?

>  arch/powerpc/platforms/pasemi/msi.c  | 5 +++--
>  arch/powerpc/platforms/powernv/pci.c | 5 +++--
>  arch/powerpc/sysdev/fsl_msi.c        | 5 +++--
>  arch/powerpc/sysdev/mpic_u3msi.c     | 5 +++--
>  arch/powerpc/sysdev/ppc4xx_msi.c     | 5 +++--
>  5 files changed, 15 insertions(+), 10 deletions(-)

I assume you've tested on powernv, but none of the other platforms?

cheers

Paul Mackerras Sept. 10, 2015, 5:26 a.m. UTC | #2
On Thu, Sep 10, 2015 at 02:55:08PM +1000, Michael Ellerman wrote:
> On Thu, 2015-09-10 at 14:36 +1000, Paul Mackerras wrote:
> > This fixes a race which can result in the same virtual IRQ number
> > being assigned to two different MSI interrupts.  The most visible
> > consequence of that is usually a warning and stack trace from the
> > sysfs code about an attempt to create a duplicate entry in sysfs.
> > 
> > The race happens when one CPU (say CPU 0) is disposing of an MSI
> > while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
> > (for example) pnv_teardown_msi_irqs(), which calls
> > msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
> > hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
> > to calling irq_dispose_mapping() to free up the virtual IRQ number,
> > CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
> > MSI, and gets the same hardware IRQ number that CPU 0 just freed.
> > CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
> > which sees that there is currently a mapping for that hardware IRQ
> > number and returns the corresponding virtual IRQ number (which is
> > the same virtual IRQ number that CPU 0 was using).  CPU 0 then
> > calls irq_dispose_mapping() and frees that virtual IRQ number.
> > Now, if another CPU comes along and calls irq_create_mapping(), it
> > is likely to get the virtual IRQ number that was just freed,
> > resulting in the same virtual IRQ number apparently being used for
> > two different hardware interrupts.
> > 
> > To fix this race, we just move the call to msi_bitmap_free_hwirqs()
> > to after the call to irq_dispose_mapping().  Since virq_to_hw()
> > doesn't work for the virtual IRQ number after irq_dispose_mapping()
> > has been called, we need to call it before irq_dispose_mapping() and
> > remember the result for the msi_bitmap_free_hwirqs() call.
> > 
> > The pattern of calling msi_bitmap_free_hwirqs() before
> > irq_dispose_mapping() appears in 5 places under arch/powerpc, and
> > appears to have originated in commit 05af7bd2d75e ("[POWERPC] MPIC
> > U3/U4 MSI backend") from 2007.
> 
> Guilty.
> 
> Any reason this shouldn't go to stable too?

The patch doesn't apply cleanly as-is to previous kernel versions due
to commit 2921d1790eee ("powerpc/PCI: Use for_pci_msi_entry() to
access MSI device list") changing some of the context lines.  Other
than that, yes, it probably should go to stable too.

> >  arch/powerpc/platforms/pasemi/msi.c  | 5 +++--
> >  arch/powerpc/platforms/powernv/pci.c | 5 +++--
> >  arch/powerpc/sysdev/fsl_msi.c        | 5 +++--
> >  arch/powerpc/sysdev/mpic_u3msi.c     | 5 +++--
> >  arch/powerpc/sysdev/ppc4xx_msi.c     | 5 +++--
> >  5 files changed, 15 insertions(+), 10 deletions(-)
> 
> I assume you've tested on powernv, but none of the other platforms?

Compile tested on all relevant platforms, but booted only on powernv.
Alexey verified that on powernv he no longer sees the splat from
sysfs with the patch applied.

Paul.

Michael Ellerman Sept. 14, 2015, 2:55 a.m. UTC | #3
On Thu, 2015-09-10 at 04:36:21 UTC, Paul Mackerras wrote:
> This fixes a race which can result in the same virtual IRQ number
> being assigned to two different MSI interrupts.  The most visible
> consequence of that is usually a warning and stack trace from the
> sysfs code about an attempt to create a duplicate entry in sysfs.

<snip>

> The pattern of calling msi_bitmap_free_hwirqs() before
> irq_dispose_mapping() appears in 5 places under arch/powerpc, and
> appears to have originated in commit 05af7bd2d75e ("[POWERPC] MPIC
> U3/U4 MSI backend") from 2007.
> 
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Paul Mackerras <paulus@samba.org>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/e297c939b745e420ef0b9dc9

cheers

Patch

diff --git a/arch/powerpc/platforms/pasemi/msi.c b/arch/powerpc/platforms/pasemi/msi.c
index e66ef19..b304a9f 100644
--- a/arch/powerpc/platforms/pasemi/msi.c
+++ b/arch/powerpc/platforms/pasemi/msi.c
@@ -63,6 +63,7 @@  static struct irq_chip mpic_pasemi_msi_chip = {
 static void pasemi_msi_teardown_msi_irqs(struct pci_dev *pdev)
 {
 	struct msi_desc *entry;
+	irq_hw_number_t hwirq;
 
 	pr_debug("pasemi_msi_teardown_msi_irqs, pdev %p\n", pdev);
 
@@ -70,10 +71,10 @@  static void pasemi_msi_teardown_msi_irqs(struct pci_dev *pdev)
 		if (entry->irq == NO_IRQ)
 			continue;
 
+		hwirq = virq_to_hw(entry->irq);
 		irq_set_msi_desc(entry->irq, NULL);
-		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap,
-				       virq_to_hw(entry->irq), ALLOC_CHUNK);
 		irq_dispose_mapping(entry->irq);
+		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, ALLOC_CHUNK);
 	}
 
 	return;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 9b2480b..f2dd772 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -99,6 +99,7 @@  void pnv_teardown_msi_irqs(struct pci_dev *pdev)
 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 	struct pnv_phb *phb = hose->private_data;
 	struct msi_desc *entry;
+	irq_hw_number_t hwirq;
 
 	if (WARN_ON(!phb))
 		return;
@@ -106,10 +107,10 @@  void pnv_teardown_msi_irqs(struct pci_dev *pdev)
 	for_each_pci_msi_entry(entry, pdev) {
 		if (entry->irq == NO_IRQ)
 			continue;
+		hwirq = virq_to_hw(entry->irq);
 		irq_set_msi_desc(entry->irq, NULL);
-		msi_bitmap_free_hwirqs(&phb->msi_bmp,
-			virq_to_hw(entry->irq) - phb->msi_base, 1);
 		irq_dispose_mapping(entry->irq);
+		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, 1);
 	}
 }
 #endif /* CONFIG_PCI_MSI */
diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 5916da1..48a576a 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -128,15 +128,16 @@  static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
 {
 	struct msi_desc *entry;
 	struct fsl_msi *msi_data;
+	irq_hw_number_t hwirq;
 
 	for_each_pci_msi_entry(entry, pdev) {
 		if (entry->irq == NO_IRQ)
 			continue;
+		hwirq = virq_to_hw(entry->irq);
 		msi_data = irq_get_chip_data(entry->irq);
 		irq_set_msi_desc(entry->irq, NULL);
-		msi_bitmap_free_hwirqs(&msi_data->bitmap,
-				       virq_to_hw(entry->irq), 1);
 		irq_dispose_mapping(entry->irq);
+		msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
 	}
 
 	return;
diff --git a/arch/powerpc/sysdev/mpic_u3msi.c b/arch/powerpc/sysdev/mpic_u3msi.c
index 70fbd56..2cbc7e2 100644
--- a/arch/powerpc/sysdev/mpic_u3msi.c
+++ b/arch/powerpc/sysdev/mpic_u3msi.c
@@ -107,15 +107,16 @@  static u64 find_u4_magic_addr(struct pci_dev *pdev, unsigned int hwirq)
 static void u3msi_teardown_msi_irqs(struct pci_dev *pdev)
 {
 	struct msi_desc *entry;
+	irq_hw_number_t hwirq;
 
 	for_each_pci_msi_entry(entry, pdev) {
 		if (entry->irq == NO_IRQ)
 			continue;
 
+		hwirq = virq_to_hw(entry->irq);
 		irq_set_msi_desc(entry->irq, NULL);
-		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap,
-				       virq_to_hw(entry->irq), 1);
 		irq_dispose_mapping(entry->irq);
+		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, 1);
 	}
 
 	return;
diff --git a/arch/powerpc/sysdev/ppc4xx_msi.c b/arch/powerpc/sysdev/ppc4xx_msi.c
index 24d0470..8fb8061 100644
--- a/arch/powerpc/sysdev/ppc4xx_msi.c
+++ b/arch/powerpc/sysdev/ppc4xx_msi.c
@@ -124,16 +124,17 @@  void ppc4xx_teardown_msi_irqs(struct pci_dev *dev)
 {
 	struct msi_desc *entry;
 	struct ppc4xx_msi *msi_data = &ppc4xx_msi;
+	irq_hw_number_t hwirq;
 
 	dev_dbg(&dev->dev, "PCIE-MSI: tearing down msi irqs\n");
 
 	for_each_pci_msi_entry(entry, dev) {
 		if (entry->irq == NO_IRQ)
 			continue;
+		hwirq = virq_to_hw(entry->irq);
 		irq_set_msi_desc(entry->irq, NULL);
-		msi_bitmap_free_hwirqs(&msi_data->bitmap,
-				virq_to_hw(entry->irq), 1);
 		irq_dispose_mapping(entry->irq);
+		msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
 	}
 }