diff mbox series

[v1,2/2] PCI: Allow user to request power management of conventional and hotplug bridges

Message ID 151908204614.37696.12828004282495415825.stgit@bhelgaas-glaptop.roam.corp.google.com
State Not Applicable
Headers show
Series PCI/PM: Add comments, allow PM of conventional & hotplug bridges | expand

Commit Message

Bjorn Helgaas Feb. 19, 2018, 11:14 p.m. UTC
From: Bjorn Helgaas <bhelgaas@google.com>

Previously "pcie_port_pm=force" enabled power management of PCI bridges,
but only for PCIe ports (not conventional PCI bridges) and only for ports
that do not support hotplug.  Those limitations are there because we're not
confident that all those configurations work, not because the spec requires
them.

Change "pcie_port_pm=force" to enable power management of conventional PCI
bridges and hotplug bridges as well as PCIe ports.  As with the previous
PCIe port-only behavior, this is not expected to work in all systems.

Add a "pci=bridge_pm" parameter to reflect the increased scope.  For
backward compatibility, retain "pcie_port_pm=force" as an undocumented
equivalent.

Add "pci=no_bridge_pm" as an equivalent to "pcie_port_pm=off".  This
disables power management for all PCI bridges, which is results in the same
behavior as before, since we always disabled power management of
conventional PCI bridges, and "pcie_port_pm=off" disabled it for PCIe
ports.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/admin-guide/kernel-parameters.txt |    8 ++++----
 drivers/pci/pci.c                               |   21 ++++++++++++---------
 2 files changed, 16 insertions(+), 13 deletions(-)

Comments

Valdis Kl ē tnieks Feb. 19, 2018, 11:28 p.m. UTC | #1
On Mon, 19 Feb 2018 17:14:06 -0600, Bjorn Helgaas said:

> Change "pcie_port_pm=force" to enable power management of conventional PCI
> bridges and hotplug bridges as well as PCIe ports.  As with the previous
> PCIe port-only behavior, this is not expected to work in all systems.

This part says the behavior changes - which is itself a Bad Idea unless you
have a deprecation cut-over across several releases.  The general rule is that
you're not allowed to break somebody's kernel without a lot of warning.
Remember that there's probably a lot of embedded systems that hardcode their
boot cmdline and changing the behavior can result in a failed boot - which can
be a royal bitch to debug if the embedded system doesn't have a console.....

In addition, it doesn't match the actual patch, which documents the boot
parameter as being removed, rather than the behavior changed:

> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 1d1d53f85ddd..4660105ec851 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt

> @@ -3143,10 +3147,6 @@
>  		compat	Treat PCIe ports as PCI-to-PCI bridges, disable the PCIe
>  			ports driver.
>
> -	pcie_port_pm=	[PCIE] PCIe port power management handling:
> -		off	Disable power management of all PCIe ports
> -		force	Forcibly enable power management of all PCIe ports
> -

And *that* doesn't match the rest of the patch, which never touches the handling
of that parameter, either changing it or removing it.
Rafael J. Wysocki Feb. 20, 2018, 9:41 a.m. UTC | #2
On Tue, Feb 20, 2018 at 12:14 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> Previously "pcie_port_pm=force" enabled power management of PCI bridges,
> but only for PCIe ports (not conventional PCI bridges) and only for ports
> that do not support hotplug.  Those limitations are there because we're not
> confident that all those configurations work, not because the spec requires
> them.
>
> Change "pcie_port_pm=force" to enable power management of conventional PCI
> bridges and hotplug bridges as well as PCIe ports.  As with the previous
> PCIe port-only behavior, this is not expected to work in all systems.
>
> Add a "pci=bridge_pm" parameter to reflect the increased scope.  For
> backward compatibility, retain "pcie_port_pm=force" as an undocumented
> equivalent.
>
> Add "pci=no_bridge_pm" as an equivalent to "pcie_port_pm=off".  This
> disables power management for all PCI bridges, which is results in the same
> behavior as before, since we always disabled power management of
> conventional PCI bridges, and "pcie_port_pm=off" disabled it for PCIe
> ports.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Honestly, I wouldn't do that, at least not this way.

Somebody might be using pcie_port_pm=force already, for example, and
it works for them for PCIe, but the PCI-to-PCI part of the same system
may not.

IMO the behavior of pcie_port_pm= should be as is and I don't see
what's wrong with it being documented.

Of course, you can add pci=bridge_pm/no_bridge_pm to extend the scope,
but for what reason really?  Just to follow the letter of the spec?

> ---
>  Documentation/admin-guide/kernel-parameters.txt |    8 ++++----
>  drivers/pci/pci.c                               |   21 ++++++++++++---------
>  2 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 1d1d53f85ddd..4660105ec851 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3117,6 +3117,10 @@
>                 pcie_scan_all   Scan all possible PCIe devices.  Otherwise we
>                                 only look for one device below a PCIe downstream
>                                 port.
> +               bridge_pm       Enable runtime power management of all bridges,
> +                               including conventional PCI bridges, PCIe ports,
> +                               and hotplug bridges.
> +               no_bridge_pm    Disable runtime power management of all bridges.
>                 big_root_window Try to add a big 64bit memory window to the PCIe
>                                 root complex on AMD CPUs. Some GFX hardware
>                                 can resize a BAR to allow access to all VRAM.
> @@ -3143,10 +3147,6 @@
>                 compat  Treat PCIe ports as PCI-to-PCI bridges, disable the PCIe
>                         ports driver.
>
> -       pcie_port_pm=   [PCIE] PCIe port power management handling:
> -               off     Disable power management of all PCIe ports
> -               force   Forcibly enable power management of all PCIe ports
> -
>         pcie_pme=       [PCIE,PM] Native PCIe PME signaling options:
>                 nomsi   Do not use MSI for native PCIe PME signaling (this makes
>                         all PCIe root ports use INTx for all services).
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 75db77cf3f8f..2aa1ae9c4afa 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -111,10 +111,8 @@ unsigned int pcibios_max_latency = 255;
>  /* If set, the PCIe ARI capability will not be used. */
>  static bool pcie_ari_disabled;
>
> -/* Disable bridge_d3 for all PCIe ports */
> -static bool pci_bridge_d3_disable;
> -/* Force bridge_d3 for all PCIe ports */
> -static bool pci_bridge_d3_force;
> +static bool pci_bridge_d3_disable;     /* Disable D3 for all bridges */
> +static bool pci_bridge_d3_force;       /* Enable D3 for all bridges */
>
>  static int __init pcie_port_pm_setup(char *str)
>  {
> @@ -2260,6 +2258,12 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>  {
>         unsigned int year;
>
> +       if (pci_bridge_d3_disable)
> +               return false;
> +
> +       if (pci_bridge_d3_force)
> +               return true;
> +
>         /*
>          * In principle we should be able to put conventional PCI bridges
>          * into D3.  We only support it for PCIe because (a) we want to
> @@ -2274,8 +2278,6 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>         case PCI_EXP_TYPE_ROOT_PORT:
>         case PCI_EXP_TYPE_UPSTREAM:
>         case PCI_EXP_TYPE_DOWNSTREAM:
> -               if (pci_bridge_d3_disable)
> -                       return false;
>
>                 /*
>                  * Hotplug interrupts cannot be delivered if the link is down,
> @@ -2295,9 +2297,6 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>                 if (bridge->is_hotplug_bridge)
>                         return false;
>
> -               if (pci_bridge_d3_force)
> -                       return true;
> -
>                 /*
>                  * It should be safe to put PCIe ports from 2015 or newer
>                  * to D3.  We have vague reports of possible hardware
> @@ -5708,6 +5707,10 @@ static int __init pci_setup(char *str)
>                                 pcie_bus_config = PCIE_BUS_PEER2PEER;
>                         } else if (!strncmp(str, "pcie_scan_all", 13)) {
>                                 pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
> +                       } else if (!strncmp(str, "bridge_pm", 9)) {
> +                               pci_bridge_d3_force = true;
> +                       } else if (!strncmp(str, "no_bridge_pm", 12)) {
> +                               pci_bridge_d3_disable = true;
>                         } else {
>                                 printk(KERN_ERR "PCI: Unknown option `%s'\n",
>                                                 str);
>
Rafael J. Wysocki Feb. 20, 2018, 9:57 a.m. UTC | #3
On Tue, Feb 20, 2018 at 10:41 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Feb 20, 2018 at 12:14 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> From: Bjorn Helgaas <bhelgaas@google.com>
>>
>> Previously "pcie_port_pm=force" enabled power management of PCI bridges,
>> but only for PCIe ports (not conventional PCI bridges) and only for ports
>> that do not support hotplug.  Those limitations are there because we're not
>> confident that all those configurations work, not because the spec requires
>> them.
>>
>> Change "pcie_port_pm=force" to enable power management of conventional PCI
>> bridges and hotplug bridges as well as PCIe ports.  As with the previous
>> PCIe port-only behavior, this is not expected to work in all systems.
>>
>> Add a "pci=bridge_pm" parameter to reflect the increased scope.  For
>> backward compatibility, retain "pcie_port_pm=force" as an undocumented
>> equivalent.
>>
>> Add "pci=no_bridge_pm" as an equivalent to "pcie_port_pm=off".  This
>> disables power management for all PCI bridges, which is results in the same
>> behavior as before, since we always disabled power management of
>> conventional PCI bridges, and "pcie_port_pm=off" disabled it for PCIe
>> ports.
>>
>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> Honestly, I wouldn't do that, at least not this way.
>
> Somebody might be using pcie_port_pm=force already, for example, and
> it works for them for PCIe, but the PCI-to-PCI part of the same system
> may not.
>
> IMO the behavior of pcie_port_pm= should be as is and I don't see
> what's wrong with it being documented.

That said it may be useful to add something like this to the
documentation of it in kernel-parameters.txt:

[PCIe port power management is needed to enable SoC-wide power
management on some SoCs shipping since 2015, so it is enabled by
default on systems from 2015 and newer.  Some older systems may still
benefit from it and it may not work on some newer systems due to
hardware problems.  Use this parameter in those cases.]
Bjorn Helgaas Feb. 20, 2018, 6:15 p.m. UTC | #4
On Tue, Feb 20, 2018 at 10:41:33AM +0100, Rafael J. Wysocki wrote:
> On Tue, Feb 20, 2018 at 12:14 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > From: Bjorn Helgaas <bhelgaas@google.com>
> >
> > Previously "pcie_port_pm=force" enabled power management of PCI bridges,
> > but only for PCIe ports (not conventional PCI bridges) and only for ports
> > that do not support hotplug.  Those limitations are there because we're not
> > confident that all those configurations work, not because the spec requires
> > them.
> >
> > Change "pcie_port_pm=force" to enable power management of conventional PCI
> > bridges and hotplug bridges as well as PCIe ports.  As with the previous
> > PCIe port-only behavior, this is not expected to work in all systems.
> >
> > Add a "pci=bridge_pm" parameter to reflect the increased scope.  For
> > backward compatibility, retain "pcie_port_pm=force" as an undocumented
> > equivalent.
> >
> > Add "pci=no_bridge_pm" as an equivalent to "pcie_port_pm=off".  This
> > disables power management for all PCI bridges, which is results in the same
> > behavior as before, since we always disabled power management of
> > conventional PCI bridges, and "pcie_port_pm=off" disabled it for PCIe
> > ports.
> >
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> Honestly, I wouldn't do that, at least not this way.
> 
> Somebody might be using pcie_port_pm=force already, for example, and
> it works for them for PCIe, but the PCI-to-PCI part of the same system
> may not.

Yes, you and Valdis are right, this is over-aggressive and I'll drop
it.

> IMO the behavior of pcie_port_pm= should be as is and I don't see
> what's wrong with it being documented.
>
> Of course, you can add pci=bridge_pm/no_bridge_pm to extend the scope,
> but for what reason really?  Just to follow the letter of the spec?

Basically I was hoping to partially rectify what I think was a mistake
on my part when we merged this.  9d26d3a8f1b0 ("PCI: Put PCIe ports
into D3 during suspend") is somewhat misleading because it suggests
that PCI bridge power management can only be supported on non-hotplug
PCIe ports, when in fact this was mostly a question of testing and "we
know this works on the systems we care about so we're going to
minimize our risk by excluding others".  These constraints seem pretty
Intel-centric and it's not clear how or whether they apply to other
architectures.

Adding the comments will help with that some, but in general I don't
like to artificially limit feature support because it reduces testing
exposure and makes future maintenance more difficult.

For example, we disallow D3 for hotplug bridges.  I don't think the
spec requires that, so the fact that we put that limitation in
suggests that there was some issue we didn't fully understand, and now
it will be hard to go back and figure that out if and when we *do*
want to support D3 for hotplug bridges.

Anyway, I'll drop this one and just go with adding the comments.

Bjorn
Rafael J. Wysocki Feb. 20, 2018, 7 p.m. UTC | #5
On Tue, Feb 20, 2018 at 7:15 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Tue, Feb 20, 2018 at 10:41:33AM +0100, Rafael J. Wysocki wrote:
>> On Tue, Feb 20, 2018 at 12:14 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > From: Bjorn Helgaas <bhelgaas@google.com>
>> >
>> > Previously "pcie_port_pm=force" enabled power management of PCI bridges,
>> > but only for PCIe ports (not conventional PCI bridges) and only for ports
>> > that do not support hotplug.  Those limitations are there because we're not
>> > confident that all those configurations work, not because the spec requires
>> > them.
>> >
>> > Change "pcie_port_pm=force" to enable power management of conventional PCI
>> > bridges and hotplug bridges as well as PCIe ports.  As with the previous
>> > PCIe port-only behavior, this is not expected to work in all systems.
>> >
>> > Add a "pci=bridge_pm" parameter to reflect the increased scope.  For
>> > backward compatibility, retain "pcie_port_pm=force" as an undocumented
>> > equivalent.
>> >
>> > Add "pci=no_bridge_pm" as an equivalent to "pcie_port_pm=off".  This
>> > disables power management for all PCI bridges, which is results in the same
>> > behavior as before, since we always disabled power management of
>> > conventional PCI bridges, and "pcie_port_pm=off" disabled it for PCIe
>> > ports.
>> >
>> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>
>> Honestly, I wouldn't do that, at least not this way.
>>
>> Somebody might be using pcie_port_pm=force already, for example, and
>> it works for them for PCIe, but the PCI-to-PCI part of the same system
>> may not.
>
> Yes, you and Valdis are right, this is over-aggressive and I'll drop
> it.
>
>> IMO the behavior of pcie_port_pm= should be as is and I don't see
>> what's wrong with it being documented.
>>
>> Of course, you can add pci=bridge_pm/no_bridge_pm to extend the scope,
>> but for what reason really?  Just to follow the letter of the spec?
>
> Basically I was hoping to partially rectify what I think was a mistake
> on my part when we merged this.  9d26d3a8f1b0 ("PCI: Put PCIe ports
> into D3 during suspend") is somewhat misleading because it suggests
> that PCI bridge power management can only be supported on non-hotplug
> PCIe ports, when in fact this was mostly a question of testing and "we
> know this works on the systems we care about so we're going to
> minimize our risk by excluding others".  These constraints seem pretty
> Intel-centric and it's not clear how or whether they apply to other
> architectures.
>
> Adding the comments will help with that some, but in general I don't
> like to artificially limit feature support because it reduces testing
> exposure and makes future maintenance more difficult.
>
> For example, we disallow D3 for hotplug bridges.  I don't think the
> spec requires that, so the fact that we put that limitation in
> suggests that there was some issue we didn't fully understand, and now
> it will be hard to go back and figure that out if and when we *do*
> want to support D3 for hotplug bridges.

In this particular case we just wanted to limit the scope of changes
to what we were able to test at that time.

You seem to be arguing that the target coverage for a new feature
should always be maximum, because that makes future maintenance
easier.

While that might be the case, it places a lot of burden on the
developer who introduces the feature to also cover systems they may
not have access to and causes the test matrix to increase
significantly.

I prefer to limit the initial scope of changes to a set of systems
that can be tested and validated in a specific time frame as that is
much more friendly to developers working on the features in question.
It's just a different development strategy and it is generally
applicable regardless of which company the given developers work for
IMO.

> Anyway, I'll drop this one and just go with adding the comments.

Thanks!
Lukas Wunner Feb. 22, 2018, 1:18 p.m. UTC | #6
On Tue, Feb 20, 2018 at 12:15:54PM -0600, Bjorn Helgaas wrote:
> Basically I was hoping to partially rectify what I think was a mistake
> on my part when we merged this.  9d26d3a8f1b0 ("PCI: Put PCIe ports
> into D3 during suspend") is somewhat misleading because it suggests
> that PCI bridge power management can only be supported on non-hotplug
> PCIe ports, when in fact this was mostly a question of testing and "we
> know this works on the systems we care about so we're going to
> minimize our risk by excluding others".  These constraints seem pretty
> Intel-centric and it's not clear how or whether they apply to other
> architectures.
> 
> Adding the comments will help with that some, but in general I don't
> like to artificially limit feature support because it reduces testing
> exposure and makes future maintenance more difficult.
> 
> For example, we disallow D3 for hotplug bridges.  I don't think the
> spec requires that, so the fact that we put that limitation in
> suggests that there was some issue we didn't fully understand, and now
> it will be hard to go back and figure that out if and when we *do*
> want to support D3 for hotplug bridges.

Some x86 machines which handle hotplug in firmware, rather than natively
by the OS, require that the OS doesn't transition them to D3hot behind
the firmware's back.  That's the reason why Mika excluded them from
runtime PM:
https://bugzilla.kernel.org/show_bug.cgi?id=53811

If the OS handles hotplug natively, transitioning the ports to D3hot
should be fine in theory.  I submitted this series last May to extend
runtime PM to those:
https://www.spinics.net/lists/linux-pci/msg60962.html

However Ashok Raj tested them on a Xeon-SP system and got Hardware Errors:
https://lkml.org/lkml/2017/5/3/480

I'm not sure if I've done anything wrong in that series or if we're
dealing with an incompatibility of this particular platform with D3hot
on hotplug ports.

We do need runtime PM on hotplug ports to power off Thunderbolt
controllers when nothing is plugged in.  That saves 1.5 W, so a
noticeable amount of power.  I was going to respin the series one
of these days, I think the best I can do is continue to forbid
runtime PM on hotplug ports by default, but whitelist it for
Thunderbolt and allow manually enabling it on other platforms via
the command line.  That way, vendors are put in a position to
validate their platforms for runtime PM of hotplug ports, and
perhaps someday we can enable it for all platforms by default,
but with a BIOS cut-off date.

As for the existing 2015 cut-off for non-hotplug ports, I remember
Rafael writing that we may try to slowly push the cut-off further
back into the past and stop as soon as problems are reported.
That hasn't happened yet because noone had a need for it.

Thanks,

Lukas
Wysocki, Rafael J Feb. 22, 2018, 1:31 p.m. UTC | #7
On 2/22/2018 2:18 PM, Lukas Wunner wrote:
> On Tue, Feb 20, 2018 at 12:15:54PM -0600, Bjorn Helgaas wrote:
>> Basically I was hoping to partially rectify what I think was a mistake
>> on my part when we merged this.  9d26d3a8f1b0 ("PCI: Put PCIe ports
>> into D3 during suspend") is somewhat misleading because it suggests
>> that PCI bridge power management can only be supported on non-hotplug
>> PCIe ports, when in fact this was mostly a question of testing and "we
>> know this works on the systems we care about so we're going to
>> minimize our risk by excluding others".  These constraints seem pretty
>> Intel-centric and it's not clear how or whether they apply to other
>> architectures.
>>
>> Adding the comments will help with that some, but in general I don't
>> like to artificially limit feature support because it reduces testing
>> exposure and makes future maintenance more difficult.
>>
>> For example, we disallow D3 for hotplug bridges.  I don't think the
>> spec requires that, so the fact that we put that limitation in
>> suggests that there was some issue we didn't fully understand, and now
>> it will be hard to go back and figure that out if and when we *do*
>> want to support D3 for hotplug bridges.
> Some x86 machines which handle hotplug in firmware, rather than natively
> by the OS, require that the OS doesn't transition them to D3hot behind
> the firmware's back.  That's the reason why Mika excluded them from
> runtime PM:
> https://bugzilla.kernel.org/show_bug.cgi?id=53811
>
> If the OS handles hotplug natively, transitioning the ports to D3hot
> should be fine in theory.  I submitted this series last May to extend
> runtime PM to those:
> https://www.spinics.net/lists/linux-pci/msg60962.html
>
> However Ashok Raj tested them on a Xeon-SP system and got Hardware Errors:
> https://lkml.org/lkml/2017/5/3/480
>
> I'm not sure if I've done anything wrong in that series or if we're
> dealing with an incompatibility of this particular platform with D3hot
> on hotplug ports.

Thanks for mentioning that, and for the pointers!

> We do need runtime PM on hotplug ports to power off Thunderbolt
> controllers when nothing is plugged in.  That saves 1.5 W, so a
> noticeable amount of power.  I was going to respin the series one
> of these days, I think the best I can do is continue to forbid
> runtime PM on hotplug ports by default, but whitelist it for
> Thunderbolt and allow manually enabling it on other platforms via
> the command line.  That way, vendors are put in a position to
> validate their platforms for runtime PM of hotplug ports, and
> perhaps someday we can enable it for all platforms by default,
> but with a BIOS cut-off date.
>
> As for the existing 2015 cut-off for non-hotplug ports, I remember
> Rafael writing that we may try to slowly push the cut-off further
> back into the past and stop as soon as problems are reported.
> That hasn't happened yet because noone had a need for it.

Right.

There's more background related to this particular thing worth 
mentioning IMO.  I'll write about it later today.

Thanks,
Rafael
Rafael J. Wysocki Feb. 22, 2018, 5:24 p.m. UTC | #8
On Thu, Feb 22, 2018 at 2:31 PM, Rafael J. Wysocki
<rafael.j.wysocki@intel.com> wrote:
> On 2/22/2018 2:18 PM, Lukas Wunner wrote:
>>
>> On Tue, Feb 20, 2018 at 12:15:54PM -0600, Bjorn Helgaas wrote:
>>>
>>> Basically I was hoping to partially rectify what I think was a mistake
>>> on my part when we merged this.  9d26d3a8f1b0 ("PCI: Put PCIe ports
>>> into D3 during suspend") is somewhat misleading because it suggests
>>> that PCI bridge power management can only be supported on non-hotplug
>>> PCIe ports, when in fact this was mostly a question of testing and "we
>>> know this works on the systems we care about so we're going to
>>> minimize our risk by excluding others".  These constraints seem pretty
>>> Intel-centric and it's not clear how or whether they apply to other
>>> architectures.
>>>
>>> Adding the comments will help with that some, but in general I don't
>>> like to artificially limit feature support because it reduces testing
>>> exposure and makes future maintenance more difficult.
>>>
>>> For example, we disallow D3 for hotplug bridges.  I don't think the
>>> spec requires that, so the fact that we put that limitation in
>>> suggests that there was some issue we didn't fully understand, and now
>>> it will be hard to go back and figure that out if and when we *do*
>>> want to support D3 for hotplug bridges.
>>
>> Some x86 machines which handle hotplug in firmware, rather than natively
>> by the OS, require that the OS doesn't transition them to D3hot behind
>> the firmware's back.  That's the reason why Mika excluded them from
>> runtime PM:
>> https://bugzilla.kernel.org/show_bug.cgi?id=53811
>>
>> If the OS handles hotplug natively, transitioning the ports to D3hot
>> should be fine in theory.  I submitted this series last May to extend
>> runtime PM to those:
>> https://www.spinics.net/lists/linux-pci/msg60962.html
>>
>> However Ashok Raj tested them on a Xeon-SP system and got Hardware Errors:
>> https://lkml.org/lkml/2017/5/3/480
>>
>> I'm not sure if I've done anything wrong in that series or if we're
>> dealing with an incompatibility of this particular platform with D3hot
>> on hotplug ports.
>
>
> Thanks for mentioning that, and for the pointers!
>
>> We do need runtime PM on hotplug ports to power off Thunderbolt
>> controllers when nothing is plugged in.  That saves 1.5 W, so a
>> noticeable amount of power.  I was going to respin the series one
>> of these days, I think the best I can do is continue to forbid
>> runtime PM on hotplug ports by default, but whitelist it for
>> Thunderbolt and allow manually enabling it on other platforms via
>> the command line.  That way, vendors are put in a position to
>> validate their platforms for runtime PM of hotplug ports, and
>> perhaps someday we can enable it for all platforms by default,
>> but with a BIOS cut-off date.
>>
>> As for the existing 2015 cut-off for non-hotplug ports, I remember
>> Rafael writing that we may try to slowly push the cut-off further
>> back into the past and stop as soon as problems are reported.
>> That hasn't happened yet because noone had a need for it.
>
>
> Right.
>
> There's more background related to this particular thing worth mentioning
> IMO.  I'll write about it later today.

It is generally advisable to realize that in many cases platform
validation is relative to the specific software stack the given
platform is going to ship with.  If there are hardware (firmware,
similar) features in the platform that aren't exercised by that
software stack, they may receive limited testing coverage and they may
not be reliable in practice even though the formal specification of
the hardware etc may require them to be functional.

Now, I'm not aware of any OS doing PCI-to-PCI bridge power management
in a serious way.  It is formally there in the PCI PM spec, but it is
optional and that spec itself was separate from the PCI proper in the
past.  AFAICS, PM is mandatory in PCIe only.

For this reason, there is a huge risk related to enabling PM on
PCI-to-PCI bridges which is why we don't do that.

Also nobody hadn't really done runtime PM even on PCIe in practice
before 2009 and power management of ports started to be done in the
Windows 8 time frame, presumably because of Connected Standby.  It
generally is risky to assume it to work in hardware that shipped
earlier and that's the reason for the cut-off date: we knew that there
had to be a cut-off, but there's no reliable science on how far in the
past to put it, so we chose a date relatively close to "now" with an
option to move it back if need be.

Thanks,
Rafael
Mika Westerberg Feb. 26, 2018, 12:05 p.m. UTC | #9
On Thu, Feb 22, 2018 at 02:18:34PM +0100, Lukas Wunner wrote:
> We do need runtime PM on hotplug ports to power off Thunderbolt
> controllers when nothing is plugged in.  That saves 1.5 W, so a
> noticeable amount of power.  I was going to respin the series one
> of these days, I think the best I can do is continue to forbid
> runtime PM on hotplug ports by default, but whitelist it for
> Thunderbolt and allow manually enabling it on other platforms via
> the command line.  That way, vendors are put in a position to
> validate their platforms for runtime PM of hotplug ports, and
> perhaps someday we can enable it for all platforms by default,
> but with a BIOS cut-off date.

AFAIK Windows started to enable runtime PM (RTD3) for native PCIe
hotplug ports with the latest release (I guess it's the RS3 release) but
only when there is a special ACPI _DSD property ("HotPlugSupportInD3")
associated with the root port. I think we can take advantage of that in
Linux as well and I already have a patch series to enable runtime PM for
such ports but I haven't been able to test it properly yet.
Lukas Wunner Feb. 26, 2018, 12:22 p.m. UTC | #10
On Mon, Feb 26, 2018 at 02:05:34PM +0200, Mika Westerberg wrote:
> On Thu, Feb 22, 2018 at 02:18:34PM +0100, Lukas Wunner wrote:
> > We do need runtime PM on hotplug ports to power off Thunderbolt
> > controllers when nothing is plugged in.  That saves 1.5 W, so a
> > noticeable amount of power.  I was going to respin the series one
> > of these days, I think the best I can do is continue to forbid
> > runtime PM on hotplug ports by default, but whitelist it for
> > Thunderbolt and allow manually enabling it on other platforms via
> > the command line.  That way, vendors are put in a position to
> > validate their platforms for runtime PM of hotplug ports, and
> > perhaps someday we can enable it for all platforms by default,
> > but with a BIOS cut-off date.
> 
> AFAIK Windows started to enable runtime PM (RTD3) for native PCIe
> hotplug ports with the latest release (I guess it's the RS3 release) but
> only when there is a special ACPI _DSD property ("HotPlugSupportInD3")
> associated with the root port. I think we can take advantage of that in
> Linux as well and I already have a patch series to enable runtime PM for
> such ports but I haven't been able to test it properly yet.

Okay.  Well it would be trivial to whitelist those ports with
device_property_present("HotPlugSupportInD3").  In how far are
your patches identical with the patches I submitted last May?
I've started reworking them for v2 but that would be a waste of
time if you're working on this issue in parallel.

Thanks,

Lukas
Mika Westerberg Feb. 26, 2018, 12:35 p.m. UTC | #11
On Mon, Feb 26, 2018 at 01:22:43PM +0100, Lukas Wunner wrote:
> On Mon, Feb 26, 2018 at 02:05:34PM +0200, Mika Westerberg wrote:
> > On Thu, Feb 22, 2018 at 02:18:34PM +0100, Lukas Wunner wrote:
> > > We do need runtime PM on hotplug ports to power off Thunderbolt
> > > controllers when nothing is plugged in.  That saves 1.5 W, so a
> > > noticeable amount of power.  I was going to respin the series one
> > > of these days, I think the best I can do is continue to forbid
> > > runtime PM on hotplug ports by default, but whitelist it for
> > > Thunderbolt and allow manually enabling it on other platforms via
> > > the command line.  That way, vendors are put in a position to
> > > validate their platforms for runtime PM of hotplug ports, and
> > > perhaps someday we can enable it for all platforms by default,
> > > but with a BIOS cut-off date.
> > 
> > AFAIK Windows started to enable runtime PM (RTD3) for native PCIe
> > hotplug ports with the latest release (I guess it's the RS3 release) but
> > only when there is a special ACPI _DSD property ("HotPlugSupportInD3")
> > associated with the root port. I think we can take advantage of that in
> > Linux as well and I already have a patch series to enable runtime PM for
> > such ports but I haven't been able to test it properly yet.
> 
> Okay.  Well it would be trivial to whitelist those ports with
> device_property_present("HotPlugSupportInD3").  In how far are
> your patches identical with the patches I submitted last May?

My patches pretty much only touch the whitelist part not the other fixes
you made for pciehp in your series. We can add those on top of your
series or I can send them out separately.

> I've started reworking them for v2 but that would be a waste of
> time if you're working on this issue in parallel.

Please continue with your v2 :) I can provide testing assistance if you
need any.
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1d1d53f85ddd..4660105ec851 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3117,6 +3117,10 @@ 
 		pcie_scan_all	Scan all possible PCIe devices.  Otherwise we
 				only look for one device below a PCIe downstream
 				port.
+		bridge_pm	Enable runtime power management of all bridges,
+				including conventional PCI bridges, PCIe ports,
+				and hotplug bridges.
+		no_bridge_pm	Disable runtime power management of all bridges.
 		big_root_window	Try to add a big 64bit memory window to the PCIe
 				root complex on AMD CPUs. Some GFX hardware
 				can resize a BAR to allow access to all VRAM.
@@ -3143,10 +3147,6 @@ 
 		compat	Treat PCIe ports as PCI-to-PCI bridges, disable the PCIe
 			ports driver.
 
-	pcie_port_pm=	[PCIE] PCIe port power management handling:
-		off	Disable power management of all PCIe ports
-		force	Forcibly enable power management of all PCIe ports
-
 	pcie_pme=	[PCIE,PM] Native PCIe PME signaling options:
 		nomsi	Do not use MSI for native PCIe PME signaling (this makes
 			all PCIe root ports use INTx for all services).
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 75db77cf3f8f..2aa1ae9c4afa 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -111,10 +111,8 @@  unsigned int pcibios_max_latency = 255;
 /* If set, the PCIe ARI capability will not be used. */
 static bool pcie_ari_disabled;
 
-/* Disable bridge_d3 for all PCIe ports */
-static bool pci_bridge_d3_disable;
-/* Force bridge_d3 for all PCIe ports */
-static bool pci_bridge_d3_force;
+static bool pci_bridge_d3_disable;	/* Disable D3 for all bridges */
+static bool pci_bridge_d3_force;	/* Enable D3 for all bridges */
 
 static int __init pcie_port_pm_setup(char *str)
 {
@@ -2260,6 +2258,12 @@  bool pci_bridge_d3_possible(struct pci_dev *bridge)
 {
 	unsigned int year;
 
+	if (pci_bridge_d3_disable)
+		return false;
+
+	if (pci_bridge_d3_force)
+		return true;
+
 	/*
 	 * In principle we should be able to put conventional PCI bridges
 	 * into D3.  We only support it for PCIe because (a) we want to
@@ -2274,8 +2278,6 @@  bool pci_bridge_d3_possible(struct pci_dev *bridge)
 	case PCI_EXP_TYPE_ROOT_PORT:
 	case PCI_EXP_TYPE_UPSTREAM:
 	case PCI_EXP_TYPE_DOWNSTREAM:
-		if (pci_bridge_d3_disable)
-			return false;
 
 		/*
 		 * Hotplug interrupts cannot be delivered if the link is down,
@@ -2295,9 +2297,6 @@  bool pci_bridge_d3_possible(struct pci_dev *bridge)
 		if (bridge->is_hotplug_bridge)
 			return false;
 
-		if (pci_bridge_d3_force)
-			return true;
-
 		/*
 		 * It should be safe to put PCIe ports from 2015 or newer
 		 * to D3.  We have vague reports of possible hardware
@@ -5708,6 +5707,10 @@  static int __init pci_setup(char *str)
 				pcie_bus_config = PCIE_BUS_PEER2PEER;
 			} else if (!strncmp(str, "pcie_scan_all", 13)) {
 				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
+			} else if (!strncmp(str, "bridge_pm", 9)) {
+				pci_bridge_d3_force = true;
+			} else if (!strncmp(str, "no_bridge_pm", 12)) {
+				pci_bridge_d3_disable = true;
 			} else {
 				printk(KERN_ERR "PCI: Unknown option `%s'\n",
 						str);