diff mbox

PCI: vmd: Free up IRQs on suspend path

Message ID 20170720202858.1918-1-scott.bauer@intel.com
State Changes Requested
Headers show

Commit Message

Scott Bauer July 20, 2017, 8:28 p.m. UTC
This patch frees up the IRQs we request on the suspend path,
and reallocates them on the resume path.

Fixes:
[  559.964386] CPU 111 disable failed: CPU has 9 vectors assigned and there are only 0 available.
[  559.966824] Error taking CPU111 down: -34
[  559.966825] Non-boot CPUs are not disabled
[  559.966826] Enabling non-boot CPUs ...

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
---
 drivers/pci/host/vmd.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Scott Bauer July 20, 2017, 9:34 p.m. UTC | #1
On Tue, Jul 25, 2017 at 08:15:32PM -0600, Jon Derrick wrote:
> On 07/20/2017 02:28 PM, Scott Bauer wrote:
> > This patch frees up the IRQs we request on the suspend path,
> > and reallocates them on the resume path.
> > 
> > Fixes:
> > [  559.964386] CPU 111 disable failed: CPU has 9 vectors assigned and there are only 0 available.
> > [  559.966824] Error taking CPU111 down: -34
> > [  559.966825] Non-boot CPUs are not disabled
> > [  559.966826] Enabling non-boot CPUs ...
> > 
> > Signed-off-by: Scott Bauer <scott.bauer@intel.com>
> > ---
> >  drivers/pci/host/vmd.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/drivers/pci/host/vmd.c b/drivers/pci/host/vmd.c
> > index 6088c3083194..88bf0c754fb2 100644
> > --- a/drivers/pci/host/vmd.c
> > +++ b/drivers/pci/host/vmd.c
> > @@ -755,6 +755,11 @@ static void vmd_remove(struct pci_dev *dev)
> >  static int vmd_suspend(struct device *dev)
> >  {
> >  	struct pci_dev *pdev = to_pci_dev(dev);
> > +	struct vmd_dev *vmd = pci_get_drvdata(pdev);
> > +	int i;
> > +
> > +	for (i = 0; i < vmd->msix_count; i++)
> > +                devm_free_irq(dev, pci_irq_vector(pdev, i), &vmd->irqs[i]);
> >  
> >  	pci_save_state(pdev);
> >  	return 0;
> > @@ -763,7 +768,16 @@ static int vmd_suspend(struct device *dev)
> >  static int vmd_resume(struct device *dev)
> >  {
> >  	struct pci_dev *pdev = to_pci_dev(dev);
> > +	struct vmd_dev *vmd = pci_get_drvdata(pdev);
> > +	int err, i;
> >  
> > +	for (i = 0; i < vmd->msix_count; i++) {
> > +		err = devm_request_irq(dev, pci_irq_vector(pdev, i),
> > +				       vmd_irq, IRQF_NO_THREAD,
> The flag here looks redundant to me because the fn calls into
> request_threaded_irq with no thread_fn. But since it's a bit more
> verbose, can you change the other devm_request_irq to use IRQF_NO_THREAD?

I actually yanked the request irq from the other call. It was changed
to NO_THREAD in April:

(3eefa790c968) PCI: host: Mark PCIe/PCI (MSI) cascade ISR as IRQF_NO_THREAD
Scott Bauer July 20, 2017, 9:56 p.m. UTC | #2
On Tue, Jul 25, 2017 at 08:36:10PM -0600, Jon Derrick wrote:
> On 07/20/2017 03:34 PM, Scott Bauer wrote:
> [snip]
> >>> +	for (i = 0; i < vmd->msix_count; i++) {
> >>> +		err = devm_request_irq(dev, pci_irq_vector(pdev, i),
> >>> +				       vmd_irq, IRQF_NO_THREAD,
> >> The flag here looks redundant to me because the fn calls into
> >> request_threaded_irq with no thread_fn. But since it's a bit more
> >> verbose, can you change the other devm_request_irq to use IRQF_NO_THREAD?
> > 
> > I actually yanked the request irq from the other call. It was changed
> > to NO_THREAD in April:
> > 
> > (3eefa790c968) PCI: host: Mark PCIe/PCI (MSI) cascade ISR as IRQF_NO_THREAD
> > 
> 
> Ah so it was. Sorry was looking at v4.12
> 
> 
> I think we'll need to do more than free the irq handlers because if an
> interrupt occurs after that, I think it'll get kicked to handle_bad_irq.
> I think we just need to add pci_disable_device after save and
> pcim_enable_device after restore.

I orignally had this in the patch I was testing on SLES and took it out when
it seemingly didn't make a difference. I didnt test yanking a drive or anything
behind the domain while it was suspended, so I wouldn't have hit your scenario.

I'll add it back in. Also, I think I need to CC stable on this one too since
this is missing all the way since the original inception of this driver.
Keith Busch July 20, 2017, 10:41 p.m. UTC | #3
On Thu, Jul 20, 2017 at 03:56:54PM -0600, Scott Bauer wrote:
> On Tue, Jul 25, 2017 at 08:36:10PM -0600, Jon Derrick wrote:
> > 
> > I think we'll need to do more than free the irq handlers because if an
> > interrupt occurs after that, I think it'll get kicked to handle_bad_irq.
> > I think we just need to add pci_disable_device after save and
> > pcim_enable_device after restore.
> 
> I orignally had this in the patch I was testing on SLES and took it out when
> it seemingly didn't make a difference. I didnt test yanking a drive or anything
> behind the domain while it was suspended, so I wouldn't have hit your scenario.
> 
> I'll add it back in. Also, I think I need to CC stable on this one too since
> this is missing all the way since the original inception of this driver.

I think your patch should be okay as-is. After freeing VMD's IRQs,
the kernel will automatically clear the MSI-x control bit to disable it
from firing.
Jon Derrick July 26, 2017, 2:15 a.m. UTC | #4
On 07/20/2017 02:28 PM, Scott Bauer wrote:
> This patch frees up the IRQs we request on the suspend path,
> and reallocates them on the resume path.
> 
> Fixes:
> [  559.964386] CPU 111 disable failed: CPU has 9 vectors assigned and there are only 0 available.
> [  559.966824] Error taking CPU111 down: -34
> [  559.966825] Non-boot CPUs are not disabled
> [  559.966826] Enabling non-boot CPUs ...
> 
> Signed-off-by: Scott Bauer <scott.bauer@intel.com>
> ---
>  drivers/pci/host/vmd.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/pci/host/vmd.c b/drivers/pci/host/vmd.c
> index 6088c3083194..88bf0c754fb2 100644
> --- a/drivers/pci/host/vmd.c
> +++ b/drivers/pci/host/vmd.c
> @@ -755,6 +755,11 @@ static void vmd_remove(struct pci_dev *dev)
>  static int vmd_suspend(struct device *dev)
>  {
>  	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct vmd_dev *vmd = pci_get_drvdata(pdev);
> +	int i;
> +
> +	for (i = 0; i < vmd->msix_count; i++)
> +                devm_free_irq(dev, pci_irq_vector(pdev, i), &vmd->irqs[i]);
>  
>  	pci_save_state(pdev);
>  	return 0;
> @@ -763,7 +768,16 @@ static int vmd_suspend(struct device *dev)
>  static int vmd_resume(struct device *dev)
>  {
>  	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct vmd_dev *vmd = pci_get_drvdata(pdev);
> +	int err, i;
>  
> +	for (i = 0; i < vmd->msix_count; i++) {
> +		err = devm_request_irq(dev, pci_irq_vector(pdev, i),
> +				       vmd_irq, IRQF_NO_THREAD,
The flag here looks redundant to me because the fn calls into
request_threaded_irq with no thread_fn. But since it's a bit more
verbose, can you change the other devm_request_irq to use IRQF_NO_THREAD?

> +				       "vmd", &vmd->irqs[i]);
> +		if (err)
> +			return err;
> +	}
>  	pci_restore_state(pdev);
>  	return 0;
>  }
>
Jon Derrick July 26, 2017, 2:36 a.m. UTC | #5
On 07/20/2017 03:34 PM, Scott Bauer wrote:
[snip]
>>> +	for (i = 0; i < vmd->msix_count; i++) {
>>> +		err = devm_request_irq(dev, pci_irq_vector(pdev, i),
>>> +				       vmd_irq, IRQF_NO_THREAD,
>> The flag here looks redundant to me because the fn calls into
>> request_threaded_irq with no thread_fn. But since it's a bit more
>> verbose, can you change the other devm_request_irq to use IRQF_NO_THREAD?
> 
> I actually yanked the request irq from the other call. It was changed
> to NO_THREAD in April:
> 
> (3eefa790c968) PCI: host: Mark PCIe/PCI (MSI) cascade ISR as IRQF_NO_THREAD
> 

Ah so it was. Sorry was looking at v4.12


I think we'll need to do more than free the irq handlers because if an
interrupt occurs after that, I think it'll get kicked to handle_bad_irq.
I think we just need to add pci_disable_device after save and
pcim_enable_device after restore.
diff mbox

Patch

diff --git a/drivers/pci/host/vmd.c b/drivers/pci/host/vmd.c
index 6088c3083194..88bf0c754fb2 100644
--- a/drivers/pci/host/vmd.c
+++ b/drivers/pci/host/vmd.c
@@ -755,6 +755,11 @@  static void vmd_remove(struct pci_dev *dev)
 static int vmd_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
+	struct vmd_dev *vmd = pci_get_drvdata(pdev);
+	int i;
+
+	for (i = 0; i < vmd->msix_count; i++)
+                devm_free_irq(dev, pci_irq_vector(pdev, i), &vmd->irqs[i]);
 
 	pci_save_state(pdev);
 	return 0;
@@ -763,7 +768,16 @@  static int vmd_suspend(struct device *dev)
 static int vmd_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
+	struct vmd_dev *vmd = pci_get_drvdata(pdev);
+	int err, i;
 
+	for (i = 0; i < vmd->msix_count; i++) {
+		err = devm_request_irq(dev, pci_irq_vector(pdev, i),
+				       vmd_irq, IRQF_NO_THREAD,
+				       "vmd", &vmd->irqs[i]);
+		if (err)
+			return err;
+	}
 	pci_restore_state(pdev);
 	return 0;
 }