Patchwork Only disable/enable LSI interrupts in EEH

login
register
mail settings
Submitter Mike Mason
Date Feb. 10, 2009, 3:07 a.m.
Message ID <4990EF8B.7060203@us.ibm.com>
Download mbox | patch
Permalink /patch/22853/
State Superseded
Delegated to: Michael Ellerman
Headers show

Comments

Mike Mason - Feb. 10, 2009, 3:07 a.m.
The EEH code disables and enables interrupts during the
device recovery process.  This is unnecessary for MSI
and MSI-X interrupts because they are effectively disabled
by the DMA Stopped state when an EEH error occurs.  The 
current code is also incorrect for MSI-X interrupts.  It
doesn't take into account that MSI-X interrupts are tracked
in a different way than LSI/MSI interrupts.  This 
patch ensures only LSI interrupts are disabled/enabled.

The patch also includes a couple minor formatting fixes.


Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Linas Vepstas - Feb. 10, 2009, 5:14 p.m.
2009/2/9 Mike Mason <mmlnx@us.ibm.com>:
> The EEH code disables and enables interrupts during the
> device recovery process.  This is unnecessary for MSI
> and MSI-X interrupts because they are effectively disabled
> by the DMA Stopped state when an EEH error occurs.  The current code is also
> incorrect for MSI-X interrupts.  It
> doesn't take into account that MSI-X interrupts are tracked
> in a different way than LSI/MSI interrupts.  This patch ensures only LSI
> interrupts are disabled/enabled.
>
> The patch also includes a couple minor formatting fixes.
>
>
> Signed-off-by: Mike Mason <mmlnx@us.ibm.com>

Looks good to me.
Acked-by: Linas Vepstas <linasvepstas@gmail.com>

On a somewhat-related note: there was an issue (I forget
the details) where the kernel needed to shadow some sort
of MSI state so that it could be correctly, um, kept-track-of,
after an EEH reset (it didn't need to be restored, because
firmware did this(?)).  After some digging around and
discussion, we concluded that some generic PPC MSI
code needed to be altered to track this state, and/or
the main kernel MSI code needed to be changed to
(not?) track this state.  Mike Ellerman seemed to best
grasp this area ... was this ever fixed?

Or perhaps this is an alternate fix for that bug? It may
well have been that calling the MSI disable triggered
the problem, I don't remember now.

--linas
Michael Ellerman - Feb. 10, 2009, 11:38 p.m.
On Tue, 2009-02-10 at 11:14 -0600, Linas Vepstas wrote:
> 2009/2/9 Mike Mason <mmlnx@us.ibm.com>:
> > The EEH code disables and enables interrupts during the
> > device recovery process.  This is unnecessary for MSI
> > and MSI-X interrupts because they are effectively disabled
> > by the DMA Stopped state when an EEH error occurs.  The current code is also
> > incorrect for MSI-X interrupts.  It
> > doesn't take into account that MSI-X interrupts are tracked
> > in a different way than LSI/MSI interrupts.  This patch ensures only LSI
> > interrupts are disabled/enabled.
> >
> > The patch also includes a couple minor formatting fixes.
> >
> >
> > Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
> 
> Looks good to me.
> Acked-by: Linas Vepstas <linasvepstas@gmail.com>
> 
> On a somewhat-related note: there was an issue (I forget
> the details) where the kernel needed to shadow some sort
> of MSI state so that it could be correctly, um, kept-track-of,
> after an EEH reset (it didn't need to be restored, because
> firmware did this(?)).  After some digging around and
> discussion, we concluded that some generic PPC MSI
> code needed to be altered to track this state, and/or
> the main kernel MSI code needed to be changed to
> (not?) track this state.  Mike Ellerman seemed to best
> grasp this area ... was this ever fixed?
> 
> Or perhaps this is an alternate fix for that bug? It may
> well have been that calling the MSI disable triggered
> the problem, I don't remember now.

I'm pretty sure you're referring to this patch, which you acked :)

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1db3e890aed3ac39cded30d6e94618bda086f7ce

I don't know of anything else that fits your description?

cheers
Linas Vepstas - Feb. 10, 2009, 11:46 p.m.
2009/2/10 Michael Ellerman <michael@ellerman.id.au>:
> On Tue, 2009-02-10 at 11:14 -0600, Linas Vepstas wrote:
>> On a somewhat-related note: there was an issue (I forget
>> the details) where the kernel needed to shadow some sort
>> of MSI state so that it could be correctly, um, kept-track-of,
>> after an EEH reset (it didn't need to be restored, because
>> firmware did this(?)).  After some digging around and
>> discussion, we concluded that some generic PPC MSI
>> code needed to be altered to track this state, and/or
>> the main kernel MSI code needed to be changed to
>> (not?) track this state.  Mike Ellerman seemed to best
>> grasp this area ... was this ever fixed?
>>
>> Or perhaps this is an alternate fix for that bug? It may
>> well have been that calling the MSI disable triggered
>> the problem, I don't remember now.
>
> I'm pretty sure you're referring to this patch, which you acked :)
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1db3e890aed3ac39cded30d6e94618bda086f7ce
>
> I don't know of anything else that fits your description?

Yes, that's the one. I wasn't sure if it ever made it in or
not, and I just wanted to make sure it wasn't what was
biting you.

--linas

Patch

--- linux-2.6.18.ppc64-orig/arch/powerpc/platforms/pseries/eeh_driver.c	2009-01-16 10:59:59.000000000 -0800
+++ linux-2.6.18.ppc64/arch/powerpc/platforms/pseries/eeh_driver.c	2009-02-07 12:29:14.000000000 -0800
@@ -67,7 +67,7 @@  static int irq_in_use(unsigned int irq)
 {
 	int rc = 0;
 	unsigned long flags;
-   struct irq_desc *desc = irq_desc + irq;
+	struct irq_desc *desc = irq_desc + irq;
 
 	spin_lock_irqsave(&desc->lock, flags);
 	if (desc->action)
@@ -76,6 +76,40 @@  static int irq_in_use(unsigned int irq)
 	return rc;
 }
 
+/**
+ * eeh_disable_irq - disable interrupt for the recovering device
+ */
+void eeh_disable_irq(struct pci_dev *dev)
+{
+	struct device_node *dn = pci_device_to_OF_node(dev);
+
+	/* Don't disable MSI and MSI-X interrupts. They are
+	 * effectively disabled by the DMA Stopped state 
+	 * when an EEH error occurs.
+	*/
+	if (dev->msi_enabled || dev->msix_enabled)
+		return;
+
+	if (!irq_in_use(dev->irq))
+		return;
+
+	PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED;
+	disable_irq_nosync(dev->irq);
+}
+
+/**
+ * eeh_enable_irq - enable interrupt for the recovering device
+ */
+void eeh_enable_irq(struct pci_dev *dev)
+{
+	struct device_node *dn = pci_device_to_OF_node(dev);
+
+	if ((PCI_DN(dn)->eeh_mode) & EEH_MODE_IRQ_DISABLED) {
+		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_IRQ_DISABLED;
+		enable_irq(dev->irq);
+	}
+}
+
 /* ------------------------------------------------------- */
 /**
  * eeh_report_error - report pci error to each device driver
@@ -95,11 +129,8 @@  static void eeh_report_error(struct pci_
 	if (!driver)
 		return;
 
-	if (irq_in_use (dev->irq)) {
-		struct device_node *dn = pci_device_to_OF_node(dev);
-		PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED;
-		disable_irq_nosync(dev->irq);
-	}
+	eeh_disable_irq(dev);
+
 	if (!driver->err_handler ||
 	    !driver->err_handler->error_detected)
 		return;
@@ -144,15 +175,12 @@  static void eeh_report_reset(struct pci_
 {
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver = dev->driver;
-	struct device_node *dn = pci_device_to_OF_node(dev);
 
 	if (!driver)
 		return;
 
-	if ((PCI_DN(dn)->eeh_mode) & EEH_MODE_IRQ_DISABLED) {
-		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_IRQ_DISABLED;
-		enable_irq(dev->irq);
-	}
+	eeh_enable_irq(dev);
+
 	if (!driver->err_handler ||
 	    !driver->err_handler->slot_reset)
 		return;
@@ -171,17 +199,14 @@  static void eeh_report_reset(struct pci_
 static void eeh_report_resume(struct pci_dev *dev, void *userdata)
 {
 	struct pci_driver *driver = dev->driver;
-	struct device_node *dn = pci_device_to_OF_node(dev);
 
 	dev->error_state = pci_channel_io_normal;
 
 	if (!driver)
 		return;
 
-	if ((PCI_DN(dn)->eeh_mode) & EEH_MODE_IRQ_DISABLED) {
-		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_IRQ_DISABLED;
-		enable_irq(dev->irq);
-	}
+	eeh_enable_irq(dev);
+
 	if (!driver->err_handler ||
 	    !driver->err_handler->resume)
 		return;
@@ -205,15 +230,12 @@  static void eeh_report_failure(struct pc
 	if (!driver)
 		return;
 
-	if (irq_in_use (dev->irq)) {
-		struct device_node *dn = pci_device_to_OF_node(dev);
-		PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED;
-		disable_irq_nosync(dev->irq);
-	}
-	if (!driver->err_handler)
-		return;
-	if (!driver->err_handler->error_detected)
+	eeh_disable_irq(dev);
+
+	if (!driver->err_handler ||
+	    !driver->err_handler->error_detected)
 		return;
+
 	driver->err_handler->error_detected(dev, pci_channel_io_perm_failure);
 }