diff mbox

[v2,4/4] PCI: Add runtime PM support for PCIe ports

Message ID 1460111790-92836-5-git-send-email-mika.westerberg@linux.intel.com
State Superseded
Headers show

Commit Message

Mika Westerberg April 8, 2016, 10:36 a.m. UTC
Add back runtime PM support for PCIe ports that was removed in
commit fe9a743a2601 ("PCI/PM: Drop unused runtime PM support code for
PCIe ports").

First of all we cannot enable it automatically for all ports since there
has been problems previously as can be seen in [1]. In summary suspended
PCIe ports were not able to deal with hotplug reliably. One reason why this
might happen is the fact that when a PCIe port is powered down, config
space access to the devices behind the port is not possible. If the BIOS
hotplug SMI handler assumes the port is always in D0 it will not be able to
find the hotplugged devices. To be on the safe side only enable runtime PM
if the port does not claim to be hotplug capable.

Furthermore we need to check that the PCI core thinks the port can go to D3
in the first place. The PCI core sets 'bridge_d3' in that case. If both
conditions are met we enable and allow runtime PM for the PCIe port. Since
'bridge_d3' can be changed anytime we check this in driver ->runtime_idle()
and ->runtime_suspend() and only allow runtime suspend if the flag is still
set. The actual power transition to D3 and back is handled in the PCI core.

Idea to automatically unblock (allow) runtime PM for PCIe ports came from
Dave Airlie.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=53811

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pcie/portdrv_pci.c | 85 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

Comments

Lukas Wunner April 12, 2016, 5:52 p.m. UTC | #1
Hi Mika,

I'm working on runtime pm for Thunderbolt on the Mac and your patches
directly impact that. A Thunderbolt controller comprises an upstream
bridge plus multiple downstream bridges below it, the latter are the
actual hotplug ports. I'm using my own runtime pm code for the PCIe
ports at the moment but will rebase on your patches once they're
finalised.

I've just pushed v2 of my patches to GitHub, this is nearly finished,
just needs some more polish before I can post it:
https://github.com/l1k/linux/commits/thunderbolt_runpm_v2

First of all, the root port on my 2012 Ivy Bridge machine suspends to
D3hot just fine, as do the upstream and downstream ports on the 2010
"Light Ridge" Thunderbolt controller. So the 2015 cut-off in patch [2/4]
seems extremely conservative to me, perhaps unnecessarily so. At least
you've made it possible to override that by setting bridge_d3 = true,
however I'd be happier if this could be extended further back, say,
to 2010. That way it would encompass all Macs with Thunderbolt.

Secondly, you're disabling runtime pm on hotplug ports, citing a
bugzilla entry as an example why this is necessary, however there's
a patch attached to that bugzilla entry which fixes the issue:
https://bugzilla.kernel.org/show_bug.cgi?id=53811

It is therefore unclear why you're disabling it. This breaks my
patches and you've not provided a way to selectively re-enable
runtime pm for specific hotplug ports.

FWIW I've had zero issues suspending the hotplug ports on the Light
Ridge Thunderbolt controller. Hotplug detection works fine even in
D3hot, all that is necessary is to resume the port on hotplug and
unplug while we access the hotplugged device's config space.

So it looks like a hotplug port should be *allowed* to suspend,
but its parent ports (by default) should *not* as we wouldn't be
able to access the hotplug port's config space anymore. On the Mac
even the parent ports may suspend because there's a separate GPE
provided which fires on hotplug during D3cold. Just disabling
runtime pm *generally* on all hotplug ports is too strict.

The changes required for pciehp to work with runtime suspended ports
are apparent from the following patch, I can spin them out into a
separate commit if you would like to include them in your series:
https://github.com/l1k/linux/commit/97d1140926670e6498f6671d91201e6d66e78680


> +static int pcie_port_runtime_idle(struct device *dev)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	/*
> +	 * Rely the PCI core has set bridge_d3 whenever it thinks the port
> +	 * should be good to go to D3. Everything else, including moving
> +	 * the port to D3, is handled by the PCI core.
> +	 */
> +	if (pdev->bridge_d3) {
> +		pm_schedule_suspend(dev, 10);
> +		return 0;
> +	}
> +	return -EBUSY;
> +}

It's unclear why the pm_schedule_suspend() is needed here and what the
10 ms delay is for. I don't have that delay in my runtime pm code and
haven't seen any issues. If the delay is needed then I'm wondering why
this isn't implemented using autosuspend?


> +static bool pcie_port_runtime_pm_possible(struct pci_dev *pdev)
> +{
> +	/*
> +	 * Only enable runtime PM if the PCI core agrees that this port can
> +	 * even go to D3.
> +	 */
> +	if (!pdev->bridge_d3)
> +		return false;
> +
> +	/*
> +	 * Prevent runtime PM if the port is hotplug capable. Otherwise the
> +	 * hotplug SMI code might not be able to enumerate devices behind
> +	 * this port properly (the port is powered down preventing all
> +	 * config space accesses to the subordinate devices).
> +	 */
> +	if (pcie_caps_reg(pdev) & PCI_EXP_FLAGS_SLOT) {
> +		u32 sltcap;
> +
> +		pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &sltcap);
> +		if (sltcap & (PCI_EXP_SLTCAP_HPC | PCI_EXP_SLTCAP_HPS))
> +			return false;
> +	}
> +
> +	return true;
> +}

Why do you re-read the register here, seems like just checking
pdev->hotplug_bridge should be sufficient?


>  static void pcie_portdrv_remove(struct pci_dev *dev)
>  {
> +	const struct pcie_port_data *pdata = pci_get_drvdata(dev);
> +
> +	if (pdata->runtime_pm_enabled) {
> +		pm_runtime_forbid(&dev->dev);
> +		pm_runtime_get_noresume(&dev->dev);
> +	}
> +
>  	pcie_port_device_remove(dev);
>  }

I think you're missing a pci_set_drvdata(dev, NULL) here.


In my runtime pm code I've set pm_runtime_no_callbacks() for the port
service devices to prevent users from mucking around with their sysfs
files. Feel free to copy that from the above-linked patch if you want.

Best regards,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mika Westerberg April 13, 2016, 8:33 a.m. UTC | #2
On Tue, Apr 12, 2016 at 07:52:03PM +0200, Lukas Wunner wrote:
> Hi Mika,
> 
> I'm working on runtime pm for Thunderbolt on the Mac and your patches
> directly impact that. A Thunderbolt controller comprises an upstream
> bridge plus multiple downstream bridges below it, the latter are the
> actual hotplug ports. I'm using my own runtime pm code for the PCIe
> ports at the moment but will rebase on your patches once they're
> finalised.
> 
> I've just pushed v2 of my patches to GitHub, this is nearly finished,
> just needs some more polish before I can post it:
> https://github.com/l1k/linux/commits/thunderbolt_runpm_v2

Interesting :)

> First of all, the root port on my 2012 Ivy Bridge machine suspends to
> D3hot just fine, as do the upstream and downstream ports on the 2010
> "Light Ridge" Thunderbolt controller. So the 2015 cut-off in patch [2/4]
> seems extremely conservative to me, perhaps unnecessarily so. At least
> you've made it possible to override that by setting bridge_d3 = true,
> however I'd be happier if this could be extended further back, say,
> to 2010. That way it would encompass all Macs with Thunderbolt.

What about other non-Mac machines? Can we be sure that the same thing
works there?

I have no problem lowering the cut-off date but it should not start
breaking things. I think Windows is doing something similar but I'm not
sure what their cut-off date is.

> Secondly, you're disabling runtime pm on hotplug ports, citing a
> bugzilla entry as an example why this is necessary, however there's
> a patch attached to that bugzilla entry which fixes the issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=53811
> 
> It is therefore unclear why you're disabling it. This breaks my
> patches and you've not provided a way to selectively re-enable
> runtime pm for specific hotplug ports.

If I understand correctly, it actually does not fix anything. By luck it
works intermittently.

I've been testing on Alpine Ridge (which is the latest and greatest TBT3
host controller) and I'm still seeing the issue ->

> FWIW I've had zero issues suspending the hotplug ports on the Light
> Ridge Thunderbolt controller. Hotplug detection works fine even in
> D3hot, all that is necessary is to resume the port on hotplug and
> unplug while we access the hotplugged device's config space.
> 
> So it looks like a hotplug port should be *allowed* to suspend,
> but its parent ports (by default) should *not* as we wouldn't be
> able to access the hotplug port's config space anymore. On the Mac
> even the parent ports may suspend because there's a separate GPE
> provided which fires on hotplug during D3cold. Just disabling
> runtime pm *generally* on all hotplug ports is too strict.

<- Now, my understanding is that Macs do not use ACPI hotplug but
instead this is all native, correct? When you have the controller
exposed all the time, of course you can get hotplug events and handle
them in the driver.

However, problem arises when enumeration and configuration is actually
done in BIOS SMI handler, like in normal non-Mac PCs. If the port is in
D3 the handler is not smart enough to move it back to D0 and then
re-enumerate ports. It just gives up.

That is the reason we do not runtime suspend those ports.

We can allow runtime suspending PCIe ports that use PCIe native hotplug
but I do not have such hardware here so I'm unable to test that but
since you have, maybe we can co-operate on this :)

> The changes required for pciehp to work with runtime suspended ports
> are apparent from the following patch, I can spin them out into a
> separate commit if you would like to include them in your series:
> https://github.com/l1k/linux/commit/97d1140926670e6498f6671d91201e6d66e78680
> 
> 
> > +static int pcie_port_runtime_idle(struct device *dev)
> > +{
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +	/*
> > +	 * Rely the PCI core has set bridge_d3 whenever it thinks the port
> > +	 * should be good to go to D3. Everything else, including moving
> > +	 * the port to D3, is handled by the PCI core.
> > +	 */
> > +	if (pdev->bridge_d3) {
> > +		pm_schedule_suspend(dev, 10);
> > +		return 0;
> > +	}
> > +	return -EBUSY;
> > +}
> 
> It's unclear why the pm_schedule_suspend() is needed here and what the
> 10 ms delay is for. I don't have that delay in my runtime pm code and
> haven't seen any issues. If the delay is needed then I'm wondering why
> this isn't implemented using autosuspend?

This part is from the original runtime PM patch. I did not want to
change 10ms delay here. Not sure if it is needed.

However, we need to have pcie_port_runtime_idle() callback because here we
actually check if we are allowed to suspend the port. Using autosuspend
does not work because of that. Documentation/power/runtime_pm.txt says
this regarding ->runtime_idle() callback:

  The action performed by the idle callback is totally dependent on the
  subsystem (or driver) in question, but the expected and recommended
  action is to check if the device can be suspended (i.e. if all of the
  conditions necessary for suspending the device are satisfied) and to
  queue up a suspend request for the device in that case.

So delay probably is not necessary but we need the callback to check if
the port can be suspended.

> > +static bool pcie_port_runtime_pm_possible(struct pci_dev *pdev)
> > +{
> > +	/*
> > +	 * Only enable runtime PM if the PCI core agrees that this port can
> > +	 * even go to D3.
> > +	 */
> > +	if (!pdev->bridge_d3)
> > +		return false;
> > +
> > +	/*
> > +	 * Prevent runtime PM if the port is hotplug capable. Otherwise the
> > +	 * hotplug SMI code might not be able to enumerate devices behind
> > +	 * this port properly (the port is powered down preventing all
> > +	 * config space accesses to the subordinate devices).
> > +	 */
> > +	if (pcie_caps_reg(pdev) & PCI_EXP_FLAGS_SLOT) {
> > +		u32 sltcap;
> > +
> > +		pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &sltcap);
> > +		if (sltcap & (PCI_EXP_SLTCAP_HPC | PCI_EXP_SLTCAP_HPS))
> > +			return false;
> > +	}
> > +
> > +	return true;
> > +}
> 
> Why do you re-read the register here, seems like just checking
> pdev->hotplug_bridge should be sufficient?

Indeed. Although it does not check for surprise hotplug I think we can
use that field instead.

> >  static void pcie_portdrv_remove(struct pci_dev *dev)
> >  {
> > +	const struct pcie_port_data *pdata = pci_get_drvdata(dev);
> > +
> > +	if (pdata->runtime_pm_enabled) {
> > +		pm_runtime_forbid(&dev->dev);
> > +		pm_runtime_get_noresume(&dev->dev);
> > +	}
> > +
> >  	pcie_port_device_remove(dev);
> >  }
> 
> I think you're missing a pci_set_drvdata(dev, NULL) here.

Device core does that automatically.

> In my runtime pm code I've set pm_runtime_no_callbacks() for the port
> service devices to prevent users from mucking around with their sysfs
> files. Feel free to copy that from the above-linked patch if you want.

OK, I'll check those.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Noever April 13, 2016, 9:08 a.m. UTC | #3
On Wed, Apr 13, 2016 at 10:33 AM, Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> On Tue, Apr 12, 2016 at 07:52:03PM +0200, Lukas Wunner wrote:
>> Hi Mika,
>>
>> I'm working on runtime pm for Thunderbolt on the Mac and your patches
>> directly impact that. A Thunderbolt controller comprises an upstream
>> bridge plus multiple downstream bridges below it, the latter are the
>> actual hotplug ports. I'm using my own runtime pm code for the PCIe
>> ports at the moment but will rebase on your patches once they're
>> finalised.
>>
>> I've just pushed v2 of my patches to GitHub, this is nearly finished,
>> just needs some more polish before I can post it:
>> https://github.com/l1k/linux/commits/thunderbolt_runpm_v2
>
> Interesting :)
>
>> First of all, the root port on my 2012 Ivy Bridge machine suspends to
>> D3hot just fine, as do the upstream and downstream ports on the 2010
>> "Light Ridge" Thunderbolt controller. So the 2015 cut-off in patch [2/4]
>> seems extremely conservative to me, perhaps unnecessarily so. At least
>> you've made it possible to override that by setting bridge_d3 = true,
>> however I'd be happier if this could be extended further back, say,
>> to 2010. That way it would encompass all Macs with Thunderbolt.
>
> What about other non-Mac machines? Can we be sure that the same thing
> works there?
>
> I have no problem lowering the cut-off date but it should not start
> breaking things. I think Windows is doing something similar but I'm not
> sure what their cut-off date is.
>
>> Secondly, you're disabling runtime pm on hotplug ports, citing a
>> bugzilla entry as an example why this is necessary, however there's
>> a patch attached to that bugzilla entry which fixes the issue:
>> https://bugzilla.kernel.org/show_bug.cgi?id=53811
>>
>> It is therefore unclear why you're disabling it. This breaks my
>> patches and you've not provided a way to selectively re-enable
>> runtime pm for specific hotplug ports.
>
> If I understand correctly, it actually does not fix anything. By luck it
> works intermittently.
>
> I've been testing on Alpine Ridge (which is the latest and greatest TBT3
> host controller) and I'm still seeing the issue ->
>
>> FWIW I've had zero issues suspending the hotplug ports on the Light
>> Ridge Thunderbolt controller. Hotplug detection works fine even in
>> D3hot, all that is necessary is to resume the port on hotplug and
>> unplug while we access the hotplugged device's config space.
>>
>> So it looks like a hotplug port should be *allowed* to suspend,
>> but its parent ports (by default) should *not* as we wouldn't be
>> able to access the hotplug port's config space anymore. On the Mac
>> even the parent ports may suspend because there's a separate GPE
>> provided which fires on hotplug during D3cold. Just disabling
>> runtime pm *generally* on all hotplug ports is too strict.
>
> <- Now, my understanding is that Macs do not use ACPI hotplug but
> instead this is all native, correct? When you have the controller
> exposed all the time, of course you can get hotplug events and handle
> them in the driver.
>
> However, problem arises when enumeration and configuration is actually
> done in BIOS SMI handler, like in normal non-Mac PCs. If the port is in
> D3 the handler is not smart enough to move it back to D0 and then
> re-enumerate ports. It just gives up.
Is this issue specific to the "ACPI/SMI Thunderbolt" implementations
used in non-Mac PCs or is PCI hotplug in general done through SMI/ACPI
instead of the native pciehp port driver? Wouldn't it make more sense
to only disable runtime suspend on non-Mac thunderbolt ports (or ports
using ACPI hotplug, if that is detectable).

Andreas

> That is the reason we do not runtime suspend those ports.
>
> We can allow runtime suspending PCIe ports that use PCIe native hotplug
> but I do not have such hardware here so I'm unable to test that but
> since you have, maybe we can co-operate on this :)
>
>> The changes required for pciehp to work with runtime suspended ports
>> are apparent from the following patch, I can spin them out into a
>> separate commit if you would like to include them in your series:
>> https://github.com/l1k/linux/commit/97d1140926670e6498f6671d91201e6d66e78680
>>
>>
>> > +static int pcie_port_runtime_idle(struct device *dev)
>> > +{
>> > +   struct pci_dev *pdev = to_pci_dev(dev);
>> > +
>> > +   /*
>> > +    * Rely the PCI core has set bridge_d3 whenever it thinks the port
>> > +    * should be good to go to D3. Everything else, including moving
>> > +    * the port to D3, is handled by the PCI core.
>> > +    */
>> > +   if (pdev->bridge_d3) {
>> > +           pm_schedule_suspend(dev, 10);
>> > +           return 0;
>> > +   }
>> > +   return -EBUSY;
>> > +}
>>
>> It's unclear why the pm_schedule_suspend() is needed here and what the
>> 10 ms delay is for. I don't have that delay in my runtime pm code and
>> haven't seen any issues. If the delay is needed then I'm wondering why
>> this isn't implemented using autosuspend?
>
> This part is from the original runtime PM patch. I did not want to
> change 10ms delay here. Not sure if it is needed.
>
> However, we need to have pcie_port_runtime_idle() callback because here we
> actually check if we are allowed to suspend the port. Using autosuspend
> does not work because of that. Documentation/power/runtime_pm.txt says
> this regarding ->runtime_idle() callback:
>
>   The action performed by the idle callback is totally dependent on the
>   subsystem (or driver) in question, but the expected and recommended
>   action is to check if the device can be suspended (i.e. if all of the
>   conditions necessary for suspending the device are satisfied) and to
>   queue up a suspend request for the device in that case.
>
> So delay probably is not necessary but we need the callback to check if
> the port can be suspended.
>
>> > +static bool pcie_port_runtime_pm_possible(struct pci_dev *pdev)
>> > +{
>> > +   /*
>> > +    * Only enable runtime PM if the PCI core agrees that this port can
>> > +    * even go to D3.
>> > +    */
>> > +   if (!pdev->bridge_d3)
>> > +           return false;
>> > +
>> > +   /*
>> > +    * Prevent runtime PM if the port is hotplug capable. Otherwise the
>> > +    * hotplug SMI code might not be able to enumerate devices behind
>> > +    * this port properly (the port is powered down preventing all
>> > +    * config space accesses to the subordinate devices).
>> > +    */
>> > +   if (pcie_caps_reg(pdev) & PCI_EXP_FLAGS_SLOT) {
>> > +           u32 sltcap;
>> > +
>> > +           pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &sltcap);
>> > +           if (sltcap & (PCI_EXP_SLTCAP_HPC | PCI_EXP_SLTCAP_HPS))
>> > +                   return false;
>> > +   }
>> > +
>> > +   return true;
>> > +}
>>
>> Why do you re-read the register here, seems like just checking
>> pdev->hotplug_bridge should be sufficient?
>
> Indeed. Although it does not check for surprise hotplug I think we can
> use that field instead.
>
>> >  static void pcie_portdrv_remove(struct pci_dev *dev)
>> >  {
>> > +   const struct pcie_port_data *pdata = pci_get_drvdata(dev);
>> > +
>> > +   if (pdata->runtime_pm_enabled) {
>> > +           pm_runtime_forbid(&dev->dev);
>> > +           pm_runtime_get_noresume(&dev->dev);
>> > +   }
>> > +
>> >     pcie_port_device_remove(dev);
>> >  }
>>
>> I think you're missing a pci_set_drvdata(dev, NULL) here.
>
> Device core does that automatically.
>
>> In my runtime pm code I've set pm_runtime_no_callbacks() for the port
>> service devices to prevent users from mucking around with their sysfs
>> files. Feel free to copy that from the above-linked patch if you want.
>
> OK, I'll check those.
>
> Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mika Westerberg April 13, 2016, 9:16 a.m. UTC | #4
On Wed, Apr 13, 2016 at 11:08:37AM +0200, Andreas Noever wrote:
> > <- Now, my understanding is that Macs do not use ACPI hotplug but
> > instead this is all native, correct? When you have the controller
> > exposed all the time, of course you can get hotplug events and handle
> > them in the driver.
> >
> > However, problem arises when enumeration and configuration is actually
> > done in BIOS SMI handler, like in normal non-Mac PCs. If the port is in
> > D3 the handler is not smart enough to move it back to D0 and then
> > re-enumerate ports. It just gives up.
> Is this issue specific to the "ACPI/SMI Thunderbolt" implementations
> used in non-Mac PCs or is PCI hotplug in general done through SMI/ACPI
> instead of the native pciehp port driver? Wouldn't it make more sense
> to only disable runtime suspend on non-Mac thunderbolt ports (or ports
> using ACPI hotplug, if that is detectable).

I think it is specific to ACPI hotplug implementations (non-Mac PCs). We
should be able to detect which one is used. ACPI _OSC method is used to
allow/disallow native PCIe hotplug so I think we use that information
along with the slot capabilities to decide whether we enable or disable
runtime PM.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner April 18, 2016, 2:38 p.m. UTC | #5
Hi Mika,

thanks for enabling runtime pm on native pciehp ports in v3 of your
series. I've ditched my own runtime pm code now, rebased on top of
your v3 and got everything working again:
https://github.com/l1k/linux/commits/thunderbolt_runpm_v2


There's one thing I had to change in your patches: Patch [4/4] only
allows runtime pm if pci_dev->bridge_d3 is set when portdrv loads.
However bridge_d3 can change over time. Let's say the bridge is a
Thunderbolt port which has a device attached on boot which is PME
capable but not from D3cold. Then runtime pm will not be allowed.
Once that device is unplugged, bridge_d3 will be set to true,
but the port won't go to D3 because runtime pm wasn't allowed.

The solution is to allow runtime pm regardless of bridge_d3, and to rely
solely on the bridge_d3 checks in ->runtime_idle and ->runtime_suspend.
Since the only remaining check in pcie_port_runtime_pm_possible()
(whether it's an ACPI hotplug port) doesn't change over time, we can
just call that function again in ->remove and therefore struct
pcie_port_data is no longer necessary.

In case you agree with this, I've prepared a fixup patch which you
can apply with "git rebase --autosquash":
https://github.com/l1k/linux/commit/24fc9d7b34ffc88f562fa67be248f95dd488da1e


Another thing I've spotted but that wasn't needed to get my patches
working: When the bridge_d3 attribute changes to true, you need to
notify the PM core thereof by calling pm_request_idle() for the
bridge device, otherwise the bridge needlessly stays awake. This
needs to be added near the end of pci_bridge_d3_update() I guess.


I've spun out the changes necessary for pciehp to support runtime pm
into a separate commit now. You could either include that in your
series or I could post it as part of my series later on:
https://github.com/l1k/linux/commit/0800e6851264960a3148a5d81076ad079e80caff

One detail I'm not sure about: If you've got a hotplug downstream port
behind an upstream port and the upstream port goes to D3hot, is it
still possible for the downstream port to signal hotplug interrupts?
In other words, can INTx or MSI interrupts generated by the downstream
bridge still be routed via the upstream bridge if the upstream bridge
is in D3hot? I cannot easily test this, I'd have to hack the tg3 driver
for my Thunderbolt Ethernet adapter to use runtime suspend.

My guess is that the interrupts are *not* routed if the upstream bridge
is in D3. If that is true then the algorithm in pci_bridge_d3_update()
and pci_dev_check_d3cold() needs to be amended so that the hotplug bridge
may suspend but anything above it may not.


On Wed, Apr 13, 2016 at 11:33:52AM +0300, Mika Westerberg wrote:
> I did not change the cut-off date from 2015 yet to be on the safe side,
> even if older Macs seem to work just fine. Maybe it can be lowered to 2013
> or so but I would like to hear from Bjorn and Rafael what they think about
> that.

My opinion is that we should strive for maximum power saving, thus try
(at least) 2010 and blacklist individual devices as needed.


> On Tue, Apr 12, 2016 at 07:52:03PM +0200, Lukas Wunner wrote:
> > It's unclear why the pm_schedule_suspend() is needed here and what the
> > 10 ms delay is for. I don't have that delay in my runtime pm code and
> > haven't seen any issues. If the delay is needed then I'm wondering why
> > this isn't implemented using autosuspend?
> 
> This part is from the original runtime PM patch. I did not want to
> change 10ms delay here. Not sure if it is needed.

Found it, the delay is explained in the commit message of 3d8387efe1ad
("PCI/PM: Fix config reg access for D3cold and bridge suspending").
It's supposed to prevent repeatedly suspending and resuming for every
configuration register read/write.

That makes sense but I'd suggest changing this to autosuspend,
thereby allowing users to change it to their heart's content.

The delay should also be re-explained in the commit message of
patch [4/4].


> However, we need to have pcie_port_runtime_idle() callback because here we
> actually check if we are allowed to suspend the port. Using autosuspend
> does not work because of that. Documentation/power/runtime_pm.txt says
> this regarding ->runtime_idle() callback:
> 
>   The action performed by the idle callback is totally dependent on the
>   subsystem (or driver) in question, but the expected and recommended
>   action is to check if the device can be suspended (i.e. if all of the
>   conditions necessary for suspending the device are satisfied) and to
>   queue up a suspend request for the device in that case.
> 
> So delay probably is not necessary but we need the callback to check if
> the port can be suspended.

If ->runtime_idle returns 0, the PM core automatically calls
->runtime_suspend. So there's no need for pm_schedule_suspend().


One more thing, when reviewing patch [2/4], it took me a while to
grasp that you've chosen to determine *in advance* whether a bridge
is allowed to go to D3. The previous attempt at D3 support for PCIe
ports, 448bd857d48e ("PCI/PM: add PCIe runtime D3cold support"),
took the opposite approach and determined this *on demand* in
pcie_port_runtime_suspend(). It may be worth making that explicit
in the commit message so that it's easier for an uninitiated reader
to comprehend what's going on.

I also found it difficult to understand the precise meaning of the
various d3 attributes that we now have in struct pci_dev. That isn't
your fault, it's caused by how things have evolved over time, but it
may be worth to pay back this technical debt while we're at it.
The comment for no_d3cold says "D3cold is forbidden". The actual
meaning is that drivers may set this to true in their ->runtime_suspend
callback to prevent the platform from putting the device into D3cold.
So perhaps a more appropriate comment would be "Runtime d3cold forbidden
by driver".

Likewise the comment for d3cold_allowed is "D3cold is allowed by user".
In reality this attribute is set to true by default, so there's nothing
for the user to allow. Rather, the user may set it to *false* to prevent
the platform from runtime suspending the device to D3cold. A more
appropriate comment would be "Runtime d3cold not forbidden by user".
Alternatively the variable name could be changed to d3cold_forbidden,
or d3cold_usr_forbidden. (Then no_d3cold could be renamed to
d3cold_drv_forbidden.)


Best regards,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mika Westerberg April 19, 2016, 12:31 p.m. UTC | #6
On Mon, Apr 18, 2016 at 04:38:23PM +0200, Lukas Wunner wrote:
> Hi Mika,
> 
> thanks for enabling runtime pm on native pciehp ports in v3 of your
> series. I've ditched my own runtime pm code now, rebased on top of
> your v3 and got everything working again:
> https://github.com/l1k/linux/commits/thunderbolt_runpm_v2

Cool, thanks for testing :)

> There's one thing I had to change in your patches: Patch [4/4] only
> allows runtime pm if pci_dev->bridge_d3 is set when portdrv loads.
> However bridge_d3 can change over time. Let's say the bridge is a
> Thunderbolt port which has a device attached on boot which is PME
> capable but not from D3cold. Then runtime pm will not be allowed.
> Once that device is unplugged, bridge_d3 will be set to true,
> but the port won't go to D3 because runtime pm wasn't allowed.
> 
> The solution is to allow runtime pm regardless of bridge_d3, and to rely
> solely on the bridge_d3 checks in ->runtime_idle and ->runtime_suspend.
> Since the only remaining check in pcie_port_runtime_pm_possible()
> (whether it's an ACPI hotplug port) doesn't change over time, we can
> just call that function again in ->remove and therefore struct
> pcie_port_data is no longer necessary.

I thought about that also but then decided to add the check because it
could mislead the user to think their PCIe port has runtime PM enabled
even when it actually is not (because we always return -EBUSY from the
callbacks).

But then again they can enable runtime PM without these patches and the
port is never suspended anyway.

> In case you agree with this, I've prepared a fixup patch which you
> can apply with "git rebase --autosquash":
> https://github.com/l1k/linux/commit/24fc9d7b34ffc88f562fa67be248f95dd488da1e

Based on the above, I'm going to fold your commit to [4/4]. Are you OK
if I add your Signed-off-by to that patch?

> Another thing I've spotted but that wasn't needed to get my patches
> working: When the bridge_d3 attribute changes to true, you need to
> notify the PM core thereof by calling pm_request_idle() for the
> bridge device, otherwise the bridge needlessly stays awake. This
> needs to be added near the end of pci_bridge_d3_update() I guess.

I've been testing this by writing to 'd3cold_allowed' sysfs file and it
suspends just fine when it is 1. How do you reproduce this?

> I've spun out the changes necessary for pciehp to support runtime pm
> into a separate commit now. You could either include that in your
> series or I could post it as part of my series later on:
> https://github.com/l1k/linux/commit/0800e6851264960a3148a5d81076ad079e80caff

I think it may be better if you submit that as part of your series. I
have no means to test it properly here.

> One detail I'm not sure about: If you've got a hotplug downstream port
> behind an upstream port and the upstream port goes to D3hot, is it
> still possible for the downstream port to signal hotplug interrupts?
> In other words, can INTx or MSI interrupts generated by the downstream
> bridge still be routed via the upstream bridge if the upstream bridge
> is in D3hot? I cannot easily test this, I'd have to hack the tg3 driver
> for my Thunderbolt Ethernet adapter to use runtime suspend.

If the downstream port is able to trigger wake (PME) from D3cold, I
think it should work but I'm not 100% sure.

> My guess is that the interrupts are *not* routed if the upstream bridge
> is in D3. If that is true then the algorithm in pci_bridge_d3_update()
> and pci_dev_check_d3cold() needs to be amended so that the hotplug bridge
> may suspend but anything above it may not.

PMEs should be forwarded upstream even if the upstream bridge is in
D3hot.

> On Wed, Apr 13, 2016 at 11:33:52AM +0300, Mika Westerberg wrote:
> > I did not change the cut-off date from 2015 yet to be on the safe side,
> > even if older Macs seem to work just fine. Maybe it can be lowered to 2013
> > or so but I would like to hear from Bjorn and Rafael what they think about
> > that.
> 
> My opinion is that we should strive for maximum power saving, thus try
> (at least) 2010 and blacklist individual devices as needed.

IMHO we should strive for working systems and not maximum power saving
but I'm OK to change that if Bjorn or Rafael are fine with it.

> > On Tue, Apr 12, 2016 at 07:52:03PM +0200, Lukas Wunner wrote:
> > > It's unclear why the pm_schedule_suspend() is needed here and what the
> > > 10 ms delay is for. I don't have that delay in my runtime pm code and
> > > haven't seen any issues. If the delay is needed then I'm wondering why
> > > this isn't implemented using autosuspend?
> > 
> > This part is from the original runtime PM patch. I did not want to
> > change 10ms delay here. Not sure if it is needed.
> 
> Found it, the delay is explained in the commit message of 3d8387efe1ad
> ("PCI/PM: Fix config reg access for D3cold and bridge suspending").
> It's supposed to prevent repeatedly suspending and resuming for every
> configuration register read/write.

Indeed, thanks for looking it up.

> That makes sense but I'd suggest changing this to autosuspend,
> thereby allowing users to change it to their heart's content.

OK, I'll change it to use autosuspend.

> The delay should also be re-explained in the commit message of
> patch [4/4].

Sure, I'll explain it in the commit message.

> > However, we need to have pcie_port_runtime_idle() callback because here we
> > actually check if we are allowed to suspend the port. Using autosuspend
> > does not work because of that. Documentation/power/runtime_pm.txt says
> > this regarding ->runtime_idle() callback:
> > 
> >   The action performed by the idle callback is totally dependent on the
> >   subsystem (or driver) in question, but the expected and recommended
> >   action is to check if the device can be suspended (i.e. if all of the
> >   conditions necessary for suspending the device are satisfied) and to
> >   queue up a suspend request for the device in that case.
> > 
> > So delay probably is not necessary but we need the callback to check if
> > the port can be suspended.
> 
> If ->runtime_idle returns 0, the PM core automatically calls
> ->runtime_suspend. So there's no need for pm_schedule_suspend().

I see, thanks.

> One more thing, when reviewing patch [2/4], it took me a while to
> grasp that you've chosen to determine *in advance* whether a bridge
> is allowed to go to D3. The previous attempt at D3 support for PCIe
> ports, 448bd857d48e ("PCI/PM: add PCIe runtime D3cold support"),
> took the opposite approach and determined this *on demand* in
> pcie_port_runtime_suspend(). It may be worth making that explicit
> in the commit message so that it's easier for an uninitiated reader
> to comprehend what's going on.

OK, I'll mention that in the commit message then.

> I also found it difficult to understand the precise meaning of the
> various d3 attributes that we now have in struct pci_dev. That isn't
> your fault, it's caused by how things have evolved over time, but it
> may be worth to pay back this technical debt while we're at it.
> The comment for no_d3cold says "D3cold is forbidden". The actual
> meaning is that drivers may set this to true in their ->runtime_suspend
> callback to prevent the platform from putting the device into D3cold.
> So perhaps a more appropriate comment would be "Runtime d3cold forbidden
> by driver".

I also got confused by it first so I think adding "by driver" makes it
more understandable. I can cook a separate patch changing this.

> Likewise the comment for d3cold_allowed is "D3cold is allowed by user".
> In reality this attribute is set to true by default, so there's nothing
> for the user to allow. Rather, the user may set it to *false* to prevent
> the platform from runtime suspending the device to D3cold. A more
> appropriate comment would be "Runtime d3cold not forbidden by user".
> Alternatively the variable name could be changed to d3cold_forbidden,
> or d3cold_usr_forbidden. (Then no_d3cold could be renamed to
> d3cold_drv_forbidden.)

Here, I think the comment is fine (explains meaning of that field), at
least I was able to understand what it means :) Also it is not only
about runtime PM but it concerns system sleep as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner April 20, 2016, 7:22 p.m. UTC | #7
Hi Mika,

On Tue, Apr 19, 2016 at 03:31:31PM +0300, Mika Westerberg wrote:
> On Mon, Apr 18, 2016 at 04:38:23PM +0200, Lukas Wunner wrote:
> > In case you agree with this, I've prepared a fixup patch which you
> > can apply with "git rebase --autosquash":
> > https://github.com/l1k/linux/commit/24fc9d7b34ff
> 
> Based on the above, I'm going to fold your commit to [4/4]. Are you OK
> if I add your Signed-off-by to that patch?

Sure.


> > Another thing I've spotted but that wasn't needed to get my patches
> > working: When the bridge_d3 attribute changes to true, you need to
> > notify the PM core thereof by calling pm_request_idle() for the
> > bridge device, otherwise the bridge needlessly stays awake. This
> > needs to be added near the end of pci_bridge_d3_update() I guess.
> 
> I've been testing this by writing to 'd3cold_allowed' sysfs file and it
> suspends just fine when it is 1. How do you reproduce this?

Testing with d3cold_allowed is misleading in this case because
d3cold_allowed_store() already calls pm_runtime_resume(dev),
and the next time dev runtime suspends, it automatically calls
rpm_idle() on its parent. This masks that a call to pm_request_idle()
is currently missing in pci_bridge_d3_update().

However pci_bridge_d3_update() also gets called e.g. from
pci_remove_bus_device() and there's no call to pm_request_idle()
there, so a bridge would stay awake even if a child that has blocked
d3 has been removed.

I only noticed this because I force bridge_d3 = true when loading
the thunderbolt upstream bridge driver, as a workaround for the BIOS
cut-off date, and noticed that I had to call pm_request_idle() because
otherwise the bridges would stay awake.


I hadn't played with d3cold_allowed before but tested it now that
you've mentioned it. Found a bug there:

If d3cold_allowed is set to false for a device via sysfs, the bridge_d3
attribute of its parent bridges will correctly be updated to false.

If d3cold_allowed is then set to true and the device has runtime
suspended in the meantime, the bridge_d3 attribute of its parent bridges
is *not* reverted back to true as it should. The reason is that when the
device runtime suspended, its no_d3cold attribute was set to false in
pci_pm_runtime_suspend() because d3cold_allowed was false. But no_d3cold
isn't reverted back to true when setting d3cold_allowed to true and
because this attribute is checked in pci_dev_check_d3cold(), the parent
bridges' bridge_d3 attribute remains false.

no_d3cold is just a transient variable used by pci_pm_runtime_suspend()
and acpi_pci_choose_state(), which is indirectly called from it via
pci_finish_runtime_suspend(). One can fix this by reverting no_d3cold
back to false at the end of pci_pm_runtime_suspend(), like this:
https://github.com/l1k/linux/commit/9b9ffb8


> > One detail I'm not sure about: If you've got a hotplug downstream port
> > behind an upstream port and the upstream port goes to D3hot, is it
> > still possible for the downstream port to signal hotplug interrupts?
> > In other words, can INTx or MSI interrupts generated by the downstream
> > bridge still be routed via the upstream bridge if the upstream bridge
> > is in D3hot? I cannot easily test this, I'd have to hack the tg3 driver
> > for my Thunderbolt Ethernet adapter to use runtime suspend.
> 
> If the downstream port is able to trigger wake (PME) from D3cold, I
> think it should work but I'm not 100% sure.

I've been able to test this now with a hacked tg3 driver and it's as
I expected: A hotplug port may go to D3hot and will still generate
interrupts on hotplug, but all its parent ports are *not* allowed
to go to D3hot because otherwise the hotplug interrupts will no longer
come through. The algorithm in pci_bridge_d3_update() and
pci_dev_check_d3cold() needs to be amended to take that into account.
Hm, it's nontrivial I guess, allowing bridge_d3 = true for the lowest
hotplug bridge in a hierarchy but not for anything above?


> > My opinion is that we should strive for maximum power saving, thus try
> > (at least) 2010 and blacklist individual devices as needed.
> 
> IMHO we should strive for working systems and not maximum power saving
> but I'm OK to change that if Bjorn or Rafael are fine with it.

They've kept mum so far. :-)


Best regards,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki April 20, 2016, 8:23 p.m. UTC | #8
On Wed, Apr 20, 2016 at 9:22 PM, Lukas Wunner <lukas@wunner.de> wrote:
> Hi Mika,
>
> On Tue, Apr 19, 2016 at 03:31:31PM +0300, Mika Westerberg wrote:
>> On Mon, Apr 18, 2016 at 04:38:23PM +0200, Lukas Wunner wrote:
>> > In case you agree with this, I've prepared a fixup patch which you
>> > can apply with "git rebase --autosquash":
>> > https://github.com/l1k/linux/commit/24fc9d7b34ff
>>
>> Based on the above, I'm going to fold your commit to [4/4]. Are you OK
>> if I add your Signed-off-by to that patch?
>
> Sure.
>
>
>> > Another thing I've spotted but that wasn't needed to get my patches
>> > working: When the bridge_d3 attribute changes to true, you need to
>> > notify the PM core thereof by calling pm_request_idle() for the
>> > bridge device, otherwise the bridge needlessly stays awake. This
>> > needs to be added near the end of pci_bridge_d3_update() I guess.
>>
>> I've been testing this by writing to 'd3cold_allowed' sysfs file and it
>> suspends just fine when it is 1. How do you reproduce this?
>
> Testing with d3cold_allowed is misleading in this case because
> d3cold_allowed_store() already calls pm_runtime_resume(dev),
> and the next time dev runtime suspends, it automatically calls
> rpm_idle() on its parent. This masks that a call to pm_request_idle()
> is currently missing in pci_bridge_d3_update().
>
> However pci_bridge_d3_update() also gets called e.g. from
> pci_remove_bus_device() and there's no call to pm_request_idle()
> there, so a bridge would stay awake even if a child that has blocked
> d3 has been removed.
>
> I only noticed this because I force bridge_d3 = true when loading
> the thunderbolt upstream bridge driver, as a workaround for the BIOS
> cut-off date, and noticed that I had to call pm_request_idle() because
> otherwise the bridges would stay awake.
>
>
> I hadn't played with d3cold_allowed before but tested it now that
> you've mentioned it. Found a bug there:
>
> If d3cold_allowed is set to false for a device via sysfs, the bridge_d3
> attribute of its parent bridges will correctly be updated to false.
>
> If d3cold_allowed is then set to true and the device has runtime
> suspended in the meantime, the bridge_d3 attribute of its parent bridges
> is *not* reverted back to true as it should. The reason is that when the
> device runtime suspended, its no_d3cold attribute was set to false in
> pci_pm_runtime_suspend() because d3cold_allowed was false. But no_d3cold
> isn't reverted back to true when setting d3cold_allowed to true and
> because this attribute is checked in pci_dev_check_d3cold(), the parent
> bridges' bridge_d3 attribute remains false.
>
> no_d3cold is just a transient variable used by pci_pm_runtime_suspend()
> and acpi_pci_choose_state(), which is indirectly called from it via
> pci_finish_runtime_suspend(). One can fix this by reverting no_d3cold
> back to false at the end of pci_pm_runtime_suspend(), like this:
> https://github.com/l1k/linux/commit/9b9ffb8
>
>
>> > One detail I'm not sure about: If you've got a hotplug downstream port
>> > behind an upstream port and the upstream port goes to D3hot, is it
>> > still possible for the downstream port to signal hotplug interrupts?
>> > In other words, can INTx or MSI interrupts generated by the downstream
>> > bridge still be routed via the upstream bridge if the upstream bridge
>> > is in D3hot? I cannot easily test this, I'd have to hack the tg3 driver
>> > for my Thunderbolt Ethernet adapter to use runtime suspend.
>>
>> If the downstream port is able to trigger wake (PME) from D3cold, I
>> think it should work but I'm not 100% sure.
>
> I've been able to test this now with a hacked tg3 driver and it's as
> I expected: A hotplug port may go to D3hot and will still generate
> interrupts on hotplug, but all its parent ports are *not* allowed
> to go to D3hot because otherwise the hotplug interrupts will no longer
> come through. The algorithm in pci_bridge_d3_update() and
> pci_dev_check_d3cold() needs to be amended to take that into account.
> Hm, it's nontrivial I guess, allowing bridge_d3 = true for the lowest
> hotplug bridge in a hierarchy but not for anything above?
>
>
>> > My opinion is that we should strive for maximum power saving, thus try
>> > (at least) 2010 and blacklist individual devices as needed.
>>
>> IMHO we should strive for working systems and not maximum power saving
>> but I'm OK to change that if Bjorn or Rafael are fine with it.
>
> They've kept mum so far. :-)

Breaking any systems that work today is not an option.  if that
happens, the commit that has done that is a clear candidate for
reverting.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mika Westerberg April 21, 2016, 1:10 p.m. UTC | #9
On Wed, Apr 20, 2016 at 09:22:11PM +0200, Lukas Wunner wrote:
> > I've been testing this by writing to 'd3cold_allowed' sysfs file and it
> > suspends just fine when it is 1. How do you reproduce this?
> 
> Testing with d3cold_allowed is misleading in this case because
> d3cold_allowed_store() already calls pm_runtime_resume(dev),
> and the next time dev runtime suspends, it automatically calls
> rpm_idle() on its parent. This masks that a call to pm_request_idle()
> is currently missing in pci_bridge_d3_update().

I don't think pci_bridge_d3_* functions should care about device runtime
PM at all. They are meant to update whether the bridge can go to D3 or
not nothing else.

> However pci_bridge_d3_update() also gets called e.g. from
> pci_remove_bus_device() and there's no call to pm_request_idle()
> there, so a bridge would stay awake even if a child that has blocked
> d3 has been removed.

As far as I can tell removing a device ends up calling
__device_release_driver() where runtime PM is updated accordingly. That
should trigger the parent device (upstream bridge) to runtime suspend if
there is no more active children.

> I only noticed this because I force bridge_d3 = true when loading
> the thunderbolt upstream bridge driver, as a workaround for the BIOS
> cut-off date, and noticed that I had to call pm_request_idle() because
> otherwise the bridges would stay awake.

Well, that's probably the right thing to do. bridge_d3 is about whether
the bridge can go to D3, not if it should runtime suspend right now.

> I hadn't played with d3cold_allowed before but tested it now that
> you've mentioned it. Found a bug there:
> 
> If d3cold_allowed is set to false for a device via sysfs, the bridge_d3
> attribute of its parent bridges will correctly be updated to false.
> 
> If d3cold_allowed is then set to true and the device has runtime
> suspended in the meantime, the bridge_d3 attribute of its parent bridges
> is *not* reverted back to true as it should. The reason is that when the
> device runtime suspended, its no_d3cold attribute was set to false in
> pci_pm_runtime_suspend() because d3cold_allowed was false. But no_d3cold
> isn't reverted back to true when setting d3cold_allowed to true and
> because this attribute is checked in pci_dev_check_d3cold(), the parent
> bridges' bridge_d3 attribute remains false.

Yes, you are right.

> no_d3cold is just a transient variable used by pci_pm_runtime_suspend()
> and acpi_pci_choose_state(), which is indirectly called from it via
> pci_finish_runtime_suspend(). One can fix this by reverting no_d3cold
> back to false at the end of pci_pm_runtime_suspend(), like this:
> https://github.com/l1k/linux/commit/9b9ffb8

That works or alternatively we could make no_d3cold sticky by making all
changes go through pci_d3cold_enable()/disable(). That way upstream
bridges ->bridge_d3 should be in always in sync.

> > > One detail I'm not sure about: If you've got a hotplug downstream port
> > > behind an upstream port and the upstream port goes to D3hot, is it
> > > still possible for the downstream port to signal hotplug interrupts?
> > > In other words, can INTx or MSI interrupts generated by the downstream
> > > bridge still be routed via the upstream bridge if the upstream bridge
> > > is in D3hot? I cannot easily test this, I'd have to hack the tg3 driver
> > > for my Thunderbolt Ethernet adapter to use runtime suspend.
> > 
> > If the downstream port is able to trigger wake (PME) from D3cold, I
> > think it should work but I'm not 100% sure.
> 
> I've been able to test this now with a hacked tg3 driver and it's as
> I expected: A hotplug port may go to D3hot and will still generate
> interrupts on hotplug, but all its parent ports are *not* allowed
> to go to D3hot because otherwise the hotplug interrupts will no longer
> come through.

Interrupts are not possible from any other state than D0 so it is always
PME that gets sent upstream.

Once you move parent port of that downstream port to D3hot it means that
the downstream port is, in fact in D3cold and the link may be in L2 or
L3 (depending on the platform). So a hotplug capable port must be able
to trigger PME from D3cold and you need to enable that as well.

> The algorithm in pci_bridge_d3_update() and pci_dev_check_d3cold()
> needs to be amended to take that into account. Hm, it's nontrivial I
> guess, allowing bridge_d3 = true for the lowest hotplug bridge in a
> hierarchy but not for anything above?

If we need to do things like that, it will get pretty complex and we
still cannot be sure whether hotplug works. I think it is safer to go
back to what I already had and disable runtime PM from such ports.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mika Westerberg April 21, 2016, 1:12 p.m. UTC | #10
On Wed, Apr 20, 2016 at 10:23:33PM +0200, Rafael J. Wysocki wrote:
> Breaking any systems that work today is not an option.  if that
> happens, the commit that has done that is a clear candidate for
> reverting.

Understood, thanks.

BTW, do you have any preference regarding the cut-off date?
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki April 21, 2016, 7:19 p.m. UTC | #11
On Thursday, April 21, 2016 04:12:24 PM Mika Westerberg wrote:
> On Wed, Apr 20, 2016 at 10:23:33PM +0200, Rafael J. Wysocki wrote:
> > Breaking any systems that work today is not an option.  if that
> > happens, the commit that has done that is a clear candidate for
> > reverting.
> 
> Understood, thanks.
> 
> BTW, do you have any preference regarding the cut-off date?

Some time around when machines with Windows 10 started to ship should be
relatively safe.

I guess we can just pick a reasonable date in the initial patch and then
try to move it back to the past subsequently and see if that breaks things
for anyone.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Noever April 21, 2016, 11:25 p.m. UTC | #12
On Thu, Apr 21, 2016 at 9:19 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, April 21, 2016 04:12:24 PM Mika Westerberg wrote:
>> On Wed, Apr 20, 2016 at 10:23:33PM +0200, Rafael J. Wysocki wrote:
>> > Breaking any systems that work today is not an option.  if that
>> > happens, the commit that has done that is a clear candidate for
>> > reverting.
>>
>> Understood, thanks.
>>
>> BTW, do you have any preference regarding the cut-off date?
>
> Some time around when machines with Windows 10 started to ship should be
> relatively safe.
>
> I guess we can just pick a reasonable date in the initial patch and then
> try to move it back to the past subsequently and see if that breaks things
> for anyone.

Maybe add a boot option to overwrite the heuristic (in both
directions) to allow people to test this on older machines and
pinpoint breakage on newer machines?

> Thanks,
> Rafael
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki April 22, 2016, 12:26 a.m. UTC | #13
On Fri, Apr 22, 2016 at 1:25 AM, Andreas Noever
<andreas.noever@gmail.com> wrote:
> On Thu, Apr 21, 2016 at 9:19 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Thursday, April 21, 2016 04:12:24 PM Mika Westerberg wrote:
>>> On Wed, Apr 20, 2016 at 10:23:33PM +0200, Rafael J. Wysocki wrote:
>>> > Breaking any systems that work today is not an option.  if that
>>> > happens, the commit that has done that is a clear candidate for
>>> > reverting.
>>>
>>> Understood, thanks.
>>>
>>> BTW, do you have any preference regarding the cut-off date?
>>
>> Some time around when machines with Windows 10 started to ship should be
>> relatively safe.
>>
>> I guess we can just pick a reasonable date in the initial patch and then
>> try to move it back to the past subsequently and see if that breaks things
>> for anyone.
>
> Maybe add a boot option to overwrite the heuristic (in both
> directions) to allow people to test this on older machines and
> pinpoint breakage on newer machines?

Yes, a kernel command line override would be fine by me.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mika Westerberg April 22, 2016, 9:10 a.m. UTC | #14
On Fri, Apr 22, 2016 at 02:26:22AM +0200, Rafael J. Wysocki wrote:
> On Fri, Apr 22, 2016 at 1:25 AM, Andreas Noever
> <andreas.noever@gmail.com> wrote:
> > On Thu, Apr 21, 2016 at 9:19 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >> On Thursday, April 21, 2016 04:12:24 PM Mika Westerberg wrote:
> >>> On Wed, Apr 20, 2016 at 10:23:33PM +0200, Rafael J. Wysocki wrote:
> >>> > Breaking any systems that work today is not an option.  if that
> >>> > happens, the commit that has done that is a clear candidate for
> >>> > reverting.
> >>>
> >>> Understood, thanks.
> >>>
> >>> BTW, do you have any preference regarding the cut-off date?
> >>
> >> Some time around when machines with Windows 10 started to ship should be
> >> relatively safe.
> >>
> >> I guess we can just pick a reasonable date in the initial patch and then
> >> try to move it back to the past subsequently and see if that breaks things
> >> for anyone.
> >
> > Maybe add a boot option to overwrite the heuristic (in both
> > directions) to allow people to test this on older machines and
> > pinpoint breakage on newer machines?
> 
> Yes, a kernel command line override would be fine by me.

OK, so to summarize:

 - Use cut-off date 2015 which is the year Windows 10 was released

 - Add command line parameter that allows this to be overridden.
   Something like:

	pcie_port_pm=   [PCIE] PCIe port power management handling:
		off     Disable power management off all PCIe ports
		force   Forcibly enable power management of all PCIe ports
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki April 22, 2016, 12:37 p.m. UTC | #15
On Friday, April 22, 2016 12:10:26 PM Mika Westerberg wrote:
> On Fri, Apr 22, 2016 at 02:26:22AM +0200, Rafael J. Wysocki wrote:
> > On Fri, Apr 22, 2016 at 1:25 AM, Andreas Noever
> > <andreas.noever@gmail.com> wrote:
> > > On Thu, Apr 21, 2016 at 9:19 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >> On Thursday, April 21, 2016 04:12:24 PM Mika Westerberg wrote:
> > >>> On Wed, Apr 20, 2016 at 10:23:33PM +0200, Rafael J. Wysocki wrote:
> > >>> > Breaking any systems that work today is not an option.  if that
> > >>> > happens, the commit that has done that is a clear candidate for
> > >>> > reverting.
> > >>>
> > >>> Understood, thanks.
> > >>>
> > >>> BTW, do you have any preference regarding the cut-off date?
> > >>
> > >> Some time around when machines with Windows 10 started to ship should be
> > >> relatively safe.
> > >>
> > >> I guess we can just pick a reasonable date in the initial patch and then
> > >> try to move it back to the past subsequently and see if that breaks things
> > >> for anyone.
> > >
> > > Maybe add a boot option to overwrite the heuristic (in both
> > > directions) to allow people to test this on older machines and
> > > pinpoint breakage on newer machines?
> > 
> > Yes, a kernel command line override would be fine by me.
> 
> OK, so to summarize:
> 
>  - Use cut-off date 2015 which is the year Windows 10 was released
> 
>  - Add command line parameter that allows this to be overridden.
>    Something like:
> 
> 	pcie_port_pm=   [PCIE] PCIe port power management handling:
> 		off     Disable power management off all PCIe ports
> 		force   Forcibly enable power management of all PCIe ports

Correct, unless Bjorn has a differing opinion.

Bjorn?

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner April 24, 2016, 4:13 p.m. UTC | #16
Hi Mika,

On Thu, Apr 21, 2016 at 04:10:02PM +0300, Mika Westerberg wrote:
> On Wed, Apr 20, 2016 at 09:22:11PM +0200, Lukas Wunner wrote:
> > However pci_bridge_d3_update() also gets called e.g. from
> > pci_remove_bus_device() and there's no call to pm_request_idle()
> > there, so a bridge would stay awake even if a child that has blocked
> > d3 has been removed.
> 
> As far as I can tell removing a device ends up calling
> __device_release_driver() where runtime PM is updated accordingly. That
> should trigger the parent device (upstream bridge) to runtime suspend if
> there is no more active children.

Right, makes sense.


> > I've been able to test this now with a hacked tg3 driver and it's as
> > I expected: A hotplug port may go to D3hot and will still generate
> > interrupts on hotplug, but all its parent ports are *not* allowed
> > to go to D3hot because otherwise the hotplug interrupts will no longer
> > come through.
> 
> Interrupts are not possible from any other state than D0 so it is always
> PME that gets sent upstream.
> 
> Once you move parent port of that downstream port to D3hot it means that
> the downstream port is, in fact in D3cold and the link may be in L2 or
> L3 (depending on the platform). So a hotplug capable port must be able
> to trigger PME from D3cold and you need to enable that as well.

The PME WAKE# pin of Thunderbolt controllers built into Macs is wired
to the southbridge so that it wakes the entire system from sleep.
IIUC for the downstream port to deliver a side-band interrupt to
the root port so that the root port resumes from D3, WAKE# would have
to be wired to the root complex.

In the case of Thunderbolt (as compared to CardBus or whatever),
there's a separate wake pin on the controller which signals such
a side-band interrupt on hotplug, on Macs it's delivered as a GPE.


> > The algorithm in pci_bridge_d3_update() and pci_dev_check_d3cold()
> > needs to be amended to take that into account. Hm, it's nontrivial I
> > guess, allowing bridge_d3 = true for the lowest hotplug bridge in a
> > hierarchy but not for anything above?
> 
> If we need to do things like that, it will get pretty complex and we
> still cannot be sure whether hotplug works. I think it is safer to go
> back to what I already had and disable runtime PM from such ports.

Okay, maybe I'll just solve this by allowing D3 for all PCIe ports
that belong to a Thunderbolt device if DMI says we're on a Mac.

Best regards,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 6c6bb03392ea..bbe86527788c 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -58,6 +58,10 @@  __setup("pcie_ports=", pcie_port_setup);
 
 /* global data */
 
+struct pcie_port_data {
+	bool runtime_pm_enabled;
+};
+
 /**
  * pcie_clear_root_pme_status - Clear root port PME interrupt status.
  * @dev: PCIe root port or event collector.
@@ -93,6 +97,58 @@  static int pcie_port_resume_noirq(struct device *dev)
 	return 0;
 }
 
+static int pcie_port_runtime_suspend(struct device *dev)
+{
+	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
+}
+
+static int pcie_port_runtime_resume(struct device *dev)
+{
+	return 0;
+}
+
+static int pcie_port_runtime_idle(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	/*
+	 * Rely the PCI core has set bridge_d3 whenever it thinks the port
+	 * should be good to go to D3. Everything else, including moving
+	 * the port to D3, is handled by the PCI core.
+	 */
+	if (pdev->bridge_d3) {
+		pm_schedule_suspend(dev, 10);
+		return 0;
+	}
+	return -EBUSY;
+}
+
+static bool pcie_port_runtime_pm_possible(struct pci_dev *pdev)
+{
+	/*
+	 * Only enable runtime PM if the PCI core agrees that this port can
+	 * even go to D3.
+	 */
+	if (!pdev->bridge_d3)
+		return false;
+
+	/*
+	 * Prevent runtime PM if the port is hotplug capable. Otherwise the
+	 * hotplug SMI code might not be able to enumerate devices behind
+	 * this port properly (the port is powered down preventing all
+	 * config space accesses to the subordinate devices).
+	 */
+	if (pcie_caps_reg(pdev) & PCI_EXP_FLAGS_SLOT) {
+		u32 sltcap;
+
+		pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &sltcap);
+		if (sltcap & (PCI_EXP_SLTCAP_HPC | PCI_EXP_SLTCAP_HPS))
+			return false;
+	}
+
+	return true;
+}
+
 static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 	.suspend	= pcie_port_device_suspend,
 	.resume		= pcie_port_device_resume,
@@ -101,12 +157,20 @@  static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 	.poweroff	= pcie_port_device_suspend,
 	.restore	= pcie_port_device_resume,
 	.resume_noirq	= pcie_port_resume_noirq,
+	.runtime_suspend = pcie_port_runtime_suspend,
+	.runtime_resume	= pcie_port_runtime_resume,
+	.runtime_idle	= pcie_port_runtime_idle,
 };
 
 #define PCIE_PORTDRV_PM_OPS	(&pcie_portdrv_pm_ops)
 
 #else /* !PM */
 
+static inline bool pcie_port_runtime_pm_possible(struct pci_dev *pdev)
+{
+	return false;
+}
+
 #define PCIE_PORTDRV_PM_OPS	NULL
 #endif /* !PM */
 
@@ -121,6 +185,7 @@  static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 static int pcie_portdrv_probe(struct pci_dev *dev,
 					const struct pci_device_id *id)
 {
+	struct pcie_port_data *pdata;
 	int status;
 
 	if (!pci_is_pcie(dev) ||
@@ -134,11 +199,31 @@  static int pcie_portdrv_probe(struct pci_dev *dev,
 		return status;
 
 	pci_save_state(dev);
+
+	pdata = devm_kzalloc(&dev->dev, sizeof(*pdata), GFP_KERNEL);
+	if (!pdata)
+		return -ENOMEM;
+
+	pci_set_drvdata(dev, pdata);
+	if (pcie_port_runtime_pm_possible(dev)) {
+		pm_runtime_put_noidle(&dev->dev);
+		pm_runtime_allow(&dev->dev);
+
+		pdata->runtime_pm_enabled = true;
+	}
+
 	return 0;
 }
 
 static void pcie_portdrv_remove(struct pci_dev *dev)
 {
+	const struct pcie_port_data *pdata = pci_get_drvdata(dev);
+
+	if (pdata->runtime_pm_enabled) {
+		pm_runtime_forbid(&dev->dev);
+		pm_runtime_get_noresume(&dev->dev);
+	}
+
 	pcie_port_device_remove(dev);
 }