[1/2] PCI/DPC: Disable interrupt generation during suspend

Message ID 20180314114125.71132-1-mika.westerberg@linux.intel.com
State Not Applicable
Delegated to: Bjorn Helgaas
Headers show
Series
  • [1/2] PCI/DPC: Disable interrupt generation during suspend
Related show

Commit Message

Mika Westerberg March 14, 2018, 11:41 a.m.
When system is resumed if the interrupt generation is enabled it may
trigger immediately when interrupts are enabled depending on what is in
the status register. This may generate spurious DPC conditions and
unnecessary removal of devices.

Make this work better with system suspend and disable interrupt
generation when the system is suspended.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pcie/pcie-dpc.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

Comments

Rafael J. Wysocki March 14, 2018, 11:48 a.m. | #1
On Wed, Mar 14, 2018 at 12:41 PM, Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> When system is resumed if the interrupt generation is enabled it may
> trigger immediately when interrupts are enabled depending on what is in
> the status register. This may generate spurious DPC conditions and
> unnecessary removal of devices.
>
> Make this work better with system suspend and disable interrupt
> generation when the system is suspended.
>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pcie/pcie-dpc.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
> index 38e40c6c576f..14b96983dbbd 100644
> --- a/drivers/pci/pcie/pcie-dpc.c
> +++ b/drivers/pci/pcie/pcie-dpc.c
> @@ -280,23 +280,44 @@ static int dpc_probe(struct pcie_device *dev)
>         return status;
>  }
>
> -static void dpc_remove(struct pcie_device *dev)
> +static void dpc_interrupt_enable(struct dpc_dev *dpc, bool enable)
>  {
> -       struct dpc_dev *dpc = get_service_data(dev);
> -       struct pci_dev *pdev = dev->port;
> +       struct pci_dev *pdev = dpc->dev->port;
>         u16 ctl;
>
>         pci_read_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, &ctl);
> -       ctl &= ~(PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN);
> +       if (enable)
> +               ctl |= PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN;
> +       else
> +               ctl &= ~(PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN);
>         pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, ctl);
>  }
>
> +static void dpc_remove(struct pcie_device *dev)
> +{
> +       dpc_interrupt_enable(get_service_data(dev), false);
> +}
> +
> +static int dpc_suspend(struct pcie_device *dev)
> +{
> +       dpc_interrupt_enable(get_service_data(dev), false);
> +       return 0;
> +}
> +
> +static int dpc_resume(struct pcie_device *dev)
> +{
> +       dpc_interrupt_enable(get_service_data(dev), true);
> +       return 0;
> +}

Have you considered putting these things into the ->suspend_late and
->resume_early callbacks, respectively?

That might be slightly better as runtime resume is still enabled when
the ->suspend and ->resume callbacks run.

> +
>  static struct pcie_port_service_driver dpcdriver = {
>         .name           = "dpc",
>         .port_type      = PCIE_ANY_PORT,
>         .service        = PCIE_PORT_SERVICE_DPC,
>         .probe          = dpc_probe,
>         .remove         = dpc_remove,
> +       .suspend        = dpc_suspend,
> +       .resume         = dpc_resume,
>  };
>
>  static int __init dpc_service_init(void)
> --
Mika Westerberg March 14, 2018, 12:05 p.m. | #2
On Wed, Mar 14, 2018 at 12:48:49PM +0100, Rafael J. Wysocki wrote:
> Have you considered putting these things into the ->suspend_late and
> ->resume_early callbacks, respectively?
> 
> That might be slightly better as runtime resume is still enabled when
> the ->suspend and ->resume callbacks run.

There is no ->suspend_late or ->resume_early callbacks in struct
pcie_port_service_driver so I followed what drivers/pci/pcie/pme.c is
doing. I guess we could add those callbacks as well if you think they
are better suited here.
Lukas Wunner March 14, 2018, 12:33 p.m. | #3
On Wed, Mar 14, 2018 at 02:05:47PM +0200, Mika Westerberg wrote:
> On Wed, Mar 14, 2018 at 12:48:49PM +0100, Rafael J. Wysocki wrote:
> > Have you considered putting these things into the ->suspend_late and
> > ->resume_early callbacks, respectively?
> > 
> > That might be slightly better as runtime resume is still enabled when
> > the ->suspend and ->resume callbacks run.
> 
> There is no ->suspend_late or ->resume_early callbacks in struct
> pcie_port_service_driver so I followed what drivers/pci/pcie/pme.c is
> doing. I guess we could add those callbacks as well if you think they
> are better suited here.

Maybe these two commits I did 2 years ago but never upstreamed are of help:

https://github.com/l1k/linux/commit/4f59372f830899e89fea9ba7c090680900d5998e
https://github.com/l1k/linux/commit/ce63b6ed01602e77c56043ee6dfa52ca9d257f88

I'm not sure if they still compile & work, sorry. :)

Thanks,

Lukas
Mika Westerberg March 20, 2018, 10:45 a.m. | #4
On Wed, Mar 14, 2018 at 01:33:32PM +0100, Lukas Wunner wrote:
> On Wed, Mar 14, 2018 at 02:05:47PM +0200, Mika Westerberg wrote:
> > On Wed, Mar 14, 2018 at 12:48:49PM +0100, Rafael J. Wysocki wrote:
> > > Have you considered putting these things into the ->suspend_late and
> > > ->resume_early callbacks, respectively?
> > > 
> > > That might be slightly better as runtime resume is still enabled when
> > > the ->suspend and ->resume callbacks run.
> > 
> > There is no ->suspend_late or ->resume_early callbacks in struct
> > pcie_port_service_driver so I followed what drivers/pci/pcie/pme.c is
> > doing. I guess we could add those callbacks as well if you think they
> > are better suited here.
> 
> Maybe these two commits I did 2 years ago but never upstreamed are of help:
> 
> https://github.com/l1k/linux/commit/4f59372f830899e89fea9ba7c090680900d5998e
> https://github.com/l1k/linux/commit/ce63b6ed01602e77c56043ee6dfa52ca9d257f88
> 
> I'm not sure if they still compile & work, sorry. :)

Thanks for the pointers.

I took a look and then realized that we most probably do not need the
custom portdrv specific hooks at all. They are basically doing the same
than what PM core is (iterate over children and call service driver PM
hook). I don't see any reason why we could not rely on the PM core ops
instead of these custom ones but maybe I'm missing something.
Lukas Wunner March 20, 2018, 11:35 a.m. | #5
On Tue, Mar 20, 2018 at 12:45:08PM +0200, Mika Westerberg wrote:
> > On Wed, Mar 14, 2018 at 02:05:47PM +0200, Mika Westerberg wrote:
> > > On Wed, Mar 14, 2018 at 12:48:49PM +0100, Rafael J. Wysocki wrote:
> > > > Have you considered putting these things into the ->suspend_late and
> > > > ->resume_early callbacks, respectively?
> > > > 
> > > > That might be slightly better as runtime resume is still enabled when
> > > > the ->suspend and ->resume callbacks run.
> > > 
> > > There is no ->suspend_late or ->resume_early callbacks in struct
> > > pcie_port_service_driver so I followed what drivers/pci/pcie/pme.c is
> > > doing. I guess we could add those callbacks as well if you think they
> > > are better suited here.
> 
> I took a look and then realized that we most probably do not need the
> custom portdrv specific hooks at all. They are basically doing the same
> than what PM core is (iterate over children and call service driver PM
> hook). I don't see any reason why we could not rely on the PM core ops
> instead of these custom ones but maybe I'm missing something.

Can't think of a reason why this solution shouldn't work, I must have
been blind not to see it.

Lukas
Lukas Wunner March 22, 2018, 10:45 a.m. | #6
On Tue, Mar 20, 2018 at 12:35:56PM +0100, Lukas Wunner wrote:
> On Tue, Mar 20, 2018 at 12:45:08PM +0200, Mika Westerberg wrote:
> > > On Wed, Mar 14, 2018 at 02:05:47PM +0200, Mika Westerberg wrote:
> > > > On Wed, Mar 14, 2018 at 12:48:49PM +0100, Rafael J. Wysocki wrote:
> > > > > Have you considered putting these things into the ->suspend_late and
> > > > > ->resume_early callbacks, respectively?
> > > > > 
> > > > > That might be slightly better as runtime resume is still enabled when
> > > > > the ->suspend and ->resume callbacks run.
> > > > 
> > > > There is no ->suspend_late or ->resume_early callbacks in struct
> > > > pcie_port_service_driver so I followed what drivers/pci/pcie/pme.c is
> > > > doing. I guess we could add those callbacks as well if you think they
> > > > are better suited here.
> > 
> > I took a look and then realized that we most probably do not need the
> > custom portdrv specific hooks at all. They are basically doing the same
> > than what PM core is (iterate over children and call service driver PM
> > hook). I don't see any reason why we could not rely on the PM core ops
> > instead of these custom ones but maybe I'm missing something.
> 
> Can't think of a reason why this solution shouldn't work

Now I've thought of one.

The port may have more children besides the port service devices,
namely all the PCI devices below the port.  The PM core doesn't
impose a specific ordering on suspend/resume but will try to
parallelize among all the children.

Usually that's not what you want.  On resume, you want to resume
the port itself (including its port services) *before* resuming
the PCI child devices.  And the other way round on suspend.

Thanks,

Lukas
Mika Westerberg March 22, 2018, 4:53 p.m. | #7
On Thu, Mar 22, 2018 at 11:45:17AM +0100, Lukas Wunner wrote:
> Now I've thought of one.
> 
> The port may have more children besides the port service devices,
> namely all the PCI devices below the port.  The PM core doesn't
> impose a specific ordering on suspend/resume but will try to
> parallelize among all the children.
> 
> Usually that's not what you want.  On resume, you want to resume
> the port itself (including its port services) *before* resuming
> the PCI child devices.  And the other way round on suspend.

That's a good point.

So I guess there is no way avoiding adding suspend_late/resume_early
callbacks to the pcie port service structure. I'll do that in the next
revision.
Lukas Wunner March 22, 2018, 5:39 p.m. | #8
On Thu, Mar 22, 2018 at 06:53:17PM +0200, Mika Westerberg wrote:
> On Thu, Mar 22, 2018 at 11:45:17AM +0100, Lukas Wunner wrote:
> > Now I've thought of one.
> > 
> > The port may have more children besides the port service devices,
> > namely all the PCI devices below the port.  The PM core doesn't
> > impose a specific ordering on suspend/resume but will try to
> > parallelize among all the children.
> > 
> > Usually that's not what you want.  On resume, you want to resume
> > the port itself (including its port services) *before* resuming
> > the PCI child devices.  And the other way round on suspend.
> 
> That's a good point.
> 
> So I guess there is no way avoiding adding suspend_late/resume_early
> callbacks to the pcie port service structure. I'll do that in the next
> revision.

Well, there *are* ways to avoid it but they might not be better.

Iterating over the port services' callbacks is equivalent to ordering
the port service devices after the port's PCI device but before its
PCI child devices in devices_kset.

That can also be achieved by adding a device link from every PCI child
device (consumer) to every port service device (provider).  The result
however is a combinatorial explosion.  Say you've got 64 down stream
bridges in a PCIe switch and the upstream bridge has 3 port services,
that's 3 x 64 = 192 device links.  That's probably clumsier than
iterating over the port services.

Thanks,

Lukas
Bjorn Helgaas March 22, 2018, 7:36 p.m. | #9
On Thu, Mar 22, 2018 at 06:39:03PM +0100, Lukas Wunner wrote:
> On Thu, Mar 22, 2018 at 06:53:17PM +0200, Mika Westerberg wrote:
> > On Thu, Mar 22, 2018 at 11:45:17AM +0100, Lukas Wunner wrote:
> > > Now I've thought of one.
> > > 
> > > The port may have more children besides the port service devices,
> > > namely all the PCI devices below the port.  The PM core doesn't
> > > impose a specific ordering on suspend/resume but will try to
> > > parallelize among all the children.
> > > 
> > > Usually that's not what you want.  On resume, you want to resume
> > > the port itself (including its port services) *before* resuming
> > > the PCI child devices.  And the other way round on suspend.
> > 
> > That's a good point.
> > 
> > So I guess there is no way avoiding adding suspend_late/resume_early
> > callbacks to the pcie port service structure. I'll do that in the next
> > revision.
> 
> Well, there *are* ways to avoid it but they might not be better.
> 
> Iterating over the port services' callbacks is equivalent to ordering
> the port service devices after the port's PCI device but before its
> PCI child devices in devices_kset.
> 
> That can also be achieved by adding a device link from every PCI child
> device (consumer) to every port service device (provider).  The result
> however is a combinatorial explosion.  Say you've got 64 down stream
> bridges in a PCIe switch and the upstream bridge has 3 port services,
> that's 3 x 64 = 192 device links.  That's probably clumsier than
> iterating over the port services.

I hope we can avoid adding suspend_late/resume_early callbacks in
struct pcie_port_service_driver, and I also hope we can avoid adding
device links.  Those both sound pretty complicated.

Can you do something like the patch below, which does something
similar for PME?


commit 6c4dfc1389e1
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Fri Mar 9 11:06:54 2018 -0600

    PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver
    
    fe31e69740ed ("PCI/PCIe: Clear Root PME Status bits early during system
    resume") added a .resume_noirq() callback to the PCIe port driver to clear
    the PME Status bit during resume to work around a BIOS issue.
    
    The BIOS evidently enabled PME interrupts for ACPI-based runtime wakeups
    but did not clear the PME Status bit during resume, which meant PMEs after
    resume did not trigger interrupts because PME Status did not transition
    from cleared to set.
    
    The fix was in the PCIe port driver, so it worked when CONFIG_PCIEPORTBUS
    was set.  But I think we *always* want the fix because the platform may use
    PME interrupts even if Linux is built without the PCIe port driver.
    
    Move the fix from the port driver to the PCI core so we can work around
    this "PME doesn't work after waking from a sleep state" issue regardless of
    CONFIG_PCIEPORTBUS.
    
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 3bed6beda051..e561fa0f456c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -525,6 +525,18 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 }
 
+static void pcie_pme_root_status_cleanup(struct pci_dev *pci_dev)
+{
+	/*
+	 * Some BIOSes forget to clear Root PME Status bits after system
+	 * wakeup, which breaks ACPI-based runtime wakeup on PCI Express.
+	 * Clear those bits now just in case (shouldn't hurt).
+	 */
+	if (pci_is_pcie(pci_dev) &&
+	    pci_pcie_type(pci_dev) == PCI_EXP_TYPE_ROOT_PORT)
+		pcie_clear_root_pme_status(pci_dev);
+}
+
 /*
  * Default "suspend" method for devices that have no driver provided suspend,
  * or not even a driver at all (second part).
@@ -873,6 +885,8 @@ static int pci_pm_resume_noirq(struct device *dev)
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
+	pcie_pme_root_status_cleanup(pci_dev);
+
 	if (drv && drv->pm && drv->pm->resume_noirq)
 		error = drv->pm->resume_noirq(dev);
 
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index d6f10a97d400..ec9e936c2a5b 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -61,20 +61,6 @@ static int pcie_portdrv_restore_config(struct pci_dev *dev)
 }
 
 #ifdef CONFIG_PM
-static int pcie_port_resume_noirq(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-
-	/*
-	 * Some BIOSes forget to clear Root PME Status bits after system wakeup
-	 * which breaks ACPI-based runtime wakeup on PCI Express, so clear those
-	 * bits now just in case (shouldn't hurt).
-	 */
-	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
-		pcie_clear_root_pme_status(pdev);
-	return 0;
-}
-
 static int pcie_port_runtime_suspend(struct device *dev)
 {
 	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
@@ -102,7 +88,6 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 	.thaw		= pcie_port_device_resume,
 	.poweroff	= pcie_port_device_suspend,
 	.restore	= pcie_port_device_resume,
-	.resume_noirq	= pcie_port_resume_noirq,
 	.runtime_suspend = pcie_port_runtime_suspend,
 	.runtime_resume	= pcie_port_runtime_resume,
 	.runtime_idle	= pcie_port_runtime_idle,
Mika Westerberg March 23, 2018, 11:18 a.m. | #10
On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> I hope we can avoid adding suspend_late/resume_early callbacks in
> struct pcie_port_service_driver, and I also hope we can avoid adding
> device links.  Those both sound pretty complicated.
> 
> Can you do something like the patch below, which does something
> similar for PME?

AFAICT the core PCI PM code follows the same ordering than what PM core
does so it may be possible that not all service drivers get
resumed/suspended before other children (PCI buses). Basically this
would be the same than just using core PM ops in DPC driver (in which
case DPC specific things are still kept in DPC driver not in PCI core).

If we do not care about that then I think both options are pretty
straighforward to implement.
Bjorn Helgaas March 23, 2018, 9:08 p.m. | #11
On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > I hope we can avoid adding suspend_late/resume_early callbacks in
> > struct pcie_port_service_driver, and I also hope we can avoid adding
> > device links.  Those both sound pretty complicated.
> > 
> > Can you do something like the patch below, which does something
> > similar for PME?
> 
> AFAICT the core PCI PM code follows the same ordering than what PM core
> does so it may be possible that not all service drivers get
> resumed/suspended before other children (PCI buses). Basically this
> would be the same than just using core PM ops in DPC driver (in which
> case DPC specific things are still kept in DPC driver not in PCI core).

I'm not sure I follow this.  I assume the core PCI PM code guarantees
that a bridge is suspended after its children and resumed before them.
Are you saying that's not the case?
Rafael J. Wysocki March 23, 2018, 9:11 p.m. | #12
On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
>> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
>> > I hope we can avoid adding suspend_late/resume_early callbacks in
>> > struct pcie_port_service_driver, and I also hope we can avoid adding
>> > device links.  Those both sound pretty complicated.
>> >
>> > Can you do something like the patch below, which does something
>> > similar for PME?
>>
>> AFAICT the core PCI PM code follows the same ordering than what PM core
>> does so it may be possible that not all service drivers get
>> resumed/suspended before other children (PCI buses). Basically this
>> would be the same than just using core PM ops in DPC driver (in which
>> case DPC specific things are still kept in DPC driver not in PCI core).
>
> I'm not sure I follow this.  I assume the core PCI PM code guarantees
> that a bridge is suspended after its children and resumed before them.
> Are you saying that's not the case?

if this is a PCIe port, then there are two categories of childres:
port services and the PCI devices below it.

There are no ordering constraints between the former and the latter,
which appears to be a problem here.
Bjorn Helgaas March 23, 2018, 10:01 p.m. | #13
On Fri, Mar 23, 2018 at 10:11:53PM +0100, Rafael J. Wysocki wrote:
> On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
> >> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> >> > I hope we can avoid adding suspend_late/resume_early callbacks in
> >> > struct pcie_port_service_driver, and I also hope we can avoid adding
> >> > device links.  Those both sound pretty complicated.
> >> >
> >> > Can you do something like the patch below, which does something
> >> > similar for PME?
> >>
> >> AFAICT the core PCI PM code follows the same ordering than what PM core
> >> does so it may be possible that not all service drivers get
> >> resumed/suspended before other children (PCI buses). Basically this
> >> would be the same than just using core PM ops in DPC driver (in which
> >> case DPC specific things are still kept in DPC driver not in PCI core).
> >
> > I'm not sure I follow this.  I assume the core PCI PM code guarantees
> > that a bridge is suspended after its children and resumed before them.
> > Are you saying that's not the case?
> 
> if this is a PCIe port, then there are two categories of childres:
> port services and the PCI devices below it.
> 
> There are no ordering constraints between the former and the latter,
> which appears to be a problem here.

It seems like a pretty fundamental problem if port services can be
suspended before PCI devices below the port.  I assume that would have
to be fixed somehow in the PCI core and the port driver, but this
patch only touches the DPC service driver.

I'd really like to get rid of the port services driver altogether, and
this ordering issue seems like a good reason to do that.
Rafael J. Wysocki March 24, 2018, 10:48 a.m. | #14
On Fri, Mar 23, 2018 at 11:01 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Mar 23, 2018 at 10:11:53PM +0100, Rafael J. Wysocki wrote:
>> On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
>> >> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
>> >> > I hope we can avoid adding suspend_late/resume_early callbacks in
>> >> > struct pcie_port_service_driver, and I also hope we can avoid adding
>> >> > device links.  Those both sound pretty complicated.
>> >> >
>> >> > Can you do something like the patch below, which does something
>> >> > similar for PME?
>> >>
>> >> AFAICT the core PCI PM code follows the same ordering than what PM core
>> >> does so it may be possible that not all service drivers get
>> >> resumed/suspended before other children (PCI buses). Basically this
>> >> would be the same than just using core PM ops in DPC driver (in which
>> >> case DPC specific things are still kept in DPC driver not in PCI core).
>> >
>> > I'm not sure I follow this.  I assume the core PCI PM code guarantees
>> > that a bridge is suspended after its children and resumed before them.
>> > Are you saying that's not the case?
>>
>> if this is a PCIe port, then there are two categories of childres:
>> port services and the PCI devices below it.
>>
>> There are no ordering constraints between the former and the latter,
>> which appears to be a problem here.
>
> It seems like a pretty fundamental problem if port services can be
> suspended before PCI devices below the port.  I assume that would have
> to be fixed somehow in the PCI core and the port driver, but this
> patch only touches the DPC service driver.
>
> I'd really like to get rid of the port services driver altogether, and
> this ordering issue seems like a good reason to do that.

I agree.

In fact, I was about to say the exact same thing, but you beat me to that. :-)
Lukas Wunner March 24, 2018, 12:15 p.m. | #15
On Fri, Mar 23, 2018 at 05:01:27PM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 23, 2018 at 10:11:53PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
> > >> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > >> > I hope we can avoid adding suspend_late/resume_early callbacks in
> > >> > struct pcie_port_service_driver, and I also hope we can avoid adding
> > >> > device links.  Those both sound pretty complicated.
> > >> >
> > >> > Can you do something like the patch below, which does something
> > >> > similar for PME?
> > >>
> > >> AFAICT the core PCI PM code follows the same ordering than what PM core
> > >> does so it may be possible that not all service drivers get
> > >> resumed/suspended before other children (PCI buses). Basically this
> > >> would be the same than just using core PM ops in DPC driver (in which
> > >> case DPC specific things are still kept in DPC driver not in PCI core).
> > >
> > > I'm not sure I follow this.  I assume the core PCI PM code guarantees
> > > that a bridge is suspended after its children and resumed before them.
> > > Are you saying that's not the case?
> > 
> > if this is a PCIe port, then there are two categories of childres:
> > port services and the PCI devices below it.
> > 
> > There are no ordering constraints between the former and the latter,
> > which appears to be a problem here.
> 
> It seems like a pretty fundamental problem if port services can be
> suspended before PCI devices below the port.  I assume that would have
> to be fixed somehow in the PCI core and the port driver, but this
> patch only touches the DPC service driver.
> 
> I'd really like to get rid of the port services driver altogether, and
> this ordering issue seems like a good reason to do that.

As it is, there is nothing to fix.

The port services driver enforces that the services are resumed before
PCI child devices (and after them on suspend) by iterating over the
services' ->resume / ->suspend callbacks, see pcie_port_device_resume()
and pcie_port_device_suspend(), which in turn are the port's ->resume /
->suspend callbacks.

There is (or was, before your reorganization this cycle) an inconsistency
in the handling of ->resume / ->suspend stages vis-à-vis the ->resume_noirq /
->runtime_resume / ->runtime_>suspend / ->runtime_idle stages.  The latter
are handled directly by portdrv callbacks whereas the former are handled
by per-service callbacks which we iterate over.

I cooked up these commits two years ago to make the handling more consistent
but ended up not needing them:
https://github.com/l1k/linux/commit/4f59372f830899e89fea9ba7c090680900d5998e
https://github.com/l1k/linux/commit/ce63b6ed01602e77c56043ee6dfa52ca9d257f88

Thanks,

Lukas
Lukas Wunner March 24, 2018, 1:48 p.m. | #16
On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> I hope we can avoid adding suspend_late/resume_early callbacks in
> struct pcie_port_service_driver,

I'm fairly certain that we cannot avoid adding at least a ->resume_noirq
callback to struct pcie_port_service_driver to fix a pciehp use case:

On ->resume_noirq the PCI core walks down the hierarchy to put every
device in D0 and restore its state (with a few exceptions such as direct
complete).  However with hotplug ports, it's possible that the user has
unplugged devices while the system was asleep, or replaced them with
other devices.  That's a very real use case with Thunderbolt and we're
handling it poorly or not at all currently.  We need to check if the
devices below a hotplug port are still there or have been replaced
(can probably be recognized by looking at vendor/device IDs across the
entire sub-hierarchy) during the ->resume_noirq phase.  We could mark
them with pci_dev_set_disconnected(), then skip putting them into D0
if that flag has been set.


> @@ -102,7 +88,6 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
>  	.thaw		= pcie_port_device_resume,
>  	.poweroff	= pcie_port_device_suspend,
>  	.restore	= pcie_port_device_resume,
> -	.resume_noirq	= pcie_port_resume_noirq,
>  	.runtime_suspend = pcie_port_runtime_suspend,
>  	.runtime_resume	= pcie_port_runtime_resume,
>  	.runtime_idle	= pcie_port_runtime_idle,

So the above would have to be reverted unfortunately when we fix this
use case.

Thanks,

Lukas
Bjorn Helgaas March 24, 2018, 2:09 p.m. | #17
On Sat, Mar 24, 2018 at 02:48:07PM +0100, Lukas Wunner wrote:
> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > I hope we can avoid adding suspend_late/resume_early callbacks in
> > struct pcie_port_service_driver,
> 
> I'm fairly certain that we cannot avoid adding at least a ->resume_noirq
> callback to struct pcie_port_service_driver to fix a pciehp use case:
> 
> On ->resume_noirq the PCI core walks down the hierarchy to put every
> device in D0 and restore its state (with a few exceptions such as direct
> complete).  However with hotplug ports, it's possible that the user has
> unplugged devices while the system was asleep, or replaced them with
> other devices.  That's a very real use case with Thunderbolt and we're
> handling it poorly or not at all currently.  We need to check if the
> devices below a hotplug port are still there or have been replaced
> (can probably be recognized by looking at vendor/device IDs across the
> entire sub-hierarchy) during the ->resume_noirq phase.  We could mark
> them with pci_dev_set_disconnected(), then skip putting them into D0
> if that flag has been set.

This is definitely a real issue, and I think it affects more than just
Thunderbolt.  We've never handled undocks very well either.

I certainly agree we need a way to handle this; I would just rather do
it by integrating pciehp and the other services more directly into the
PCI core instead of using the port driver.  Maybe we'll need a
short-term ->resume_noirq, but I don't think the port driver is a good
long-term solution.

Bjorn
Mika Westerberg March 26, 2018, 9:55 a.m. | #18
On Sat, Mar 24, 2018 at 09:09:50AM -0500, Bjorn Helgaas wrote:
> On Sat, Mar 24, 2018 at 02:48:07PM +0100, Lukas Wunner wrote:
> > On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > > I hope we can avoid adding suspend_late/resume_early callbacks in
> > > struct pcie_port_service_driver,
> > 
> > I'm fairly certain that we cannot avoid adding at least a ->resume_noirq
> > callback to struct pcie_port_service_driver to fix a pciehp use case:
> > 
> > On ->resume_noirq the PCI core walks down the hierarchy to put every
> > device in D0 and restore its state (with a few exceptions such as direct
> > complete).  However with hotplug ports, it's possible that the user has
> > unplugged devices while the system was asleep, or replaced them with
> > other devices.  That's a very real use case with Thunderbolt and we're
> > handling it poorly or not at all currently.  We need to check if the
> > devices below a hotplug port are still there or have been replaced
> > (can probably be recognized by looking at vendor/device IDs across the
> > entire sub-hierarchy) during the ->resume_noirq phase.  We could mark
> > them with pci_dev_set_disconnected(), then skip putting them into D0
> > if that flag has been set.
> 
> This is definitely a real issue, and I think it affects more than just
> Thunderbolt.  We've never handled undocks very well either.
> 
> I certainly agree we need a way to handle this; I would just rather do
> it by integrating pciehp and the other services more directly into the
> PCI core instead of using the port driver.  Maybe we'll need a
> short-term ->resume_noirq, but I don't think the port driver is a good
> long-term solution.

For the context of this patch series, I think I'll drop this patch from
the series now until it is clear what the direction is. I agree moving
port driver functionality into the PCI core makes sense.

I'll send out an updated version of the first patch as it does not
require these hooks.

Patch

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index 38e40c6c576f..14b96983dbbd 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -280,23 +280,44 @@  static int dpc_probe(struct pcie_device *dev)
 	return status;
 }
 
-static void dpc_remove(struct pcie_device *dev)
+static void dpc_interrupt_enable(struct dpc_dev *dpc, bool enable)
 {
-	struct dpc_dev *dpc = get_service_data(dev);
-	struct pci_dev *pdev = dev->port;
+	struct pci_dev *pdev = dpc->dev->port;
 	u16 ctl;
 
 	pci_read_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, &ctl);
-	ctl &= ~(PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN);
+	if (enable)
+		ctl |= PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN;
+	else
+		ctl &= ~(PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN);
 	pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, ctl);
 }
 
+static void dpc_remove(struct pcie_device *dev)
+{
+	dpc_interrupt_enable(get_service_data(dev), false);
+}
+
+static int dpc_suspend(struct pcie_device *dev)
+{
+	dpc_interrupt_enable(get_service_data(dev), false);
+	return 0;
+}
+
+static int dpc_resume(struct pcie_device *dev)
+{
+	dpc_interrupt_enable(get_service_data(dev), true);
+	return 0;
+}
+
 static struct pcie_port_service_driver dpcdriver = {
 	.name		= "dpc",
 	.port_type	= PCIE_ANY_PORT,
 	.service	= PCIE_PORT_SERVICE_DPC,
 	.probe		= dpc_probe,
 	.remove		= dpc_remove,
+	.suspend	= dpc_suspend,
+	.resume		= dpc_resume,
 };
 
 static int __init dpc_service_init(void)