
PCI/PM: Assume ports without DLL Link Active train links in 100 ms

Message ID 20200819130625.12778-1-mika.westerberg@linux.intel.com
State New

Commit Message

Mika Westerberg Aug. 19, 2020, 1:06 p.m. UTC
Kai-Heng Feng reported that it takes a long time (> 1 s) to resume
Thunderbolt-connected devices from both runtime suspend and system sleep
(s2idle).

This was because some Downstream Ports that support > 5 GT/s do not also
support Data Link Layer Link Active reporting.  Per PCIe r5.0 sec 6.6.1:

  With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
  software must wait a minimum of 100 ms after Link training completes
  before sending a Configuration Request to the device immediately below
  that Port. Software can determine when Link training completes by polling
  the Data Link Layer Link Active bit or by setting up an associated
  interrupt (see Section 6.7.3.3).

Sec 7.5.3.6 requires such Ports to support DLL Link Active reporting, but
at least the Intel JHL6240 Thunderbolt 3 Bridge [8086:15c0] and the Intel
JHL7540 Thunderbolt 3 Bridge [8086:15ea] do not.

Previously we tried to wait for Link training to complete, but since there
was no DLL Link Active reporting, all we could do was wait the worst-case
1000 ms, then another 100 ms.

Instead of using the supported speeds to determine whether to wait for Link
training, check whether the port supports DLL Link Active reporting.  The
Ports in question do not, so we'll wait only the 100 ms required for Ports
that support Link speeds <= 5 GT/s.

This of course assumes these Ports always train the Link within 100 ms even
if they are operating at > 5 GT/s, which is not required by the spec.

This version adds a special check for devices whose power management is
disabled by their driver (->pm_cap is set to zero). This is needed to
avoid regression with some NVIDIA GPUs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=208597
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206837
Link: https://lore.kernel.org/r/20200514133043.27429-1-mika.westerberg@linux.intel.com
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pci.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)
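
As a side note, "polling the Data Link Layer Link Active bit" in the spec
excerpt above boils down to reading PCI_EXP_LNKSTA and testing
PCI_EXP_LNKSTA_DLLLA. A minimal sketch of such a check (the helper name is
illustrative only and not part of this patch):

#include <linux/pci.h>

/* Sketch only: how Link training completion is observed via DLL Link Active. */
static bool pcie_link_trained(struct pci_dev *pdev)
{
	u16 lnksta = 0;

	/*
	 * PCI_EXP_LNKSTA_DLLLA is only meaningful when the Port advertises
	 * Data Link Layer Link Active Reporting Capable, which the PCI core
	 * caches in pdev->link_active_reporting.
	 */
	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
	return !!(lnksta & PCI_EXP_LNKSTA_DLLLA);
}

This is essentially what pcie_wait_for_link_delay() does in a loop, with a
10 ms poll interval and a 1000 ms timeout, as seen in the hunks below.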

Comments

Lyude Paul Aug. 19, 2020, 4:58 p.m. UTC | #1
On Wed, 2020-08-19 at 16:06 +0300, Mika Westerberg wrote:
> Kai-Heng Feng reported that it takes a long time (> 1 s) to resume
> Thunderbolt-connected devices from both runtime suspend and system sleep
> (s2idle).
> 
> This was because some Downstream Ports that support > 5 GT/s do not also
> support Data Link Layer Link Active reporting.  Per PCIe r5.0 sec 6.6.1:
> 
>   With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
>   software must wait a minimum of 100 ms after Link training completes
>   before sending a Configuration Request to the device immediately below
>   that Port. Software can determine when Link training completes by polling
>   the Data Link Layer Link Active bit or by setting up an associated
>   interrupt (see Section 6.7.3.3).
> 
> Sec 7.5.3.6 requires such Ports to support DLL Link Active reporting, but
> at least the Intel JHL6240 Thunderbolt 3 Bridge [8086:15c0] and the Intel
> JHL7540 Thunderbolt 3 Bridge [8086:15ea] do not.
> 
> Previously we tried to wait for Link training to complete, but since there
> was no DLL Link Active reporting, all we could do was wait the worst-case
> 1000 ms, then another 100 ms.
> 
> Instead of using the supported speeds to determine whether to wait for Link
> training, check whether the port supports DLL Link Active reporting.  The
> Ports in question do not, so we'll wait only the 100 ms required for Ports
> that support Link speeds <= 5 GT/s.
> 
> This of course assumes these Ports always train the Link within 100 ms even
> if they are operating at > 5 GT/s, which is not required by the spec.
> 
> This version adds a special check for devices whose power management is
> disabled by their driver (->pm_cap is set to zero). This is needed to
> avoid regression with some NVIDIA GPUs.

Hm, I'm not entirely sure that the link training delay is specific to laptops
with ->pm_cap set to 0. I think we should try figuring out whether there are
any laptops that match those characteristics before moving forward with this -
I'll take a look through the test machines I've got available today.


> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=208597
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206837
> Link: https://lore.kernel.org/r/20200514133043.27429-1-mika.westerberg@linux.intel.com
> Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pci.c | 37 +++++++++++++++++++++++++++----------
>  1 file changed, 27 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index a458c46d7e39..33eb502a60c8 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4658,7 +4658,8 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
>   * pcie_wait_for_link_delay - Wait until link is active or inactive
>   * @pdev: Bridge device
>   * @active: waiting for active or inactive?
> - * @delay: Delay to wait after link has become active (in ms)
> + * @delay: Delay to wait after link has become active (in ms). Specify %0
> + *	   for no delay.
>   *
>   * Use this to wait till link becomes active or inactive.
>   */
> @@ -4699,7 +4700,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
>  		msleep(10);
>  		timeout -= 10;
>  	}
> -	if (active && ret)
> +	if (active && ret && delay)
>  		msleep(delay);
>  	else if (ret != active)
>  		pci_info(pdev, "Data Link Layer Link Active not %s in 1000 msec\n",
> @@ -4793,8 +4794,13 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>  	 * accessing the device after reset (that is 1000 ms + 100 ms). In
>  	 * practice this should not be needed because we don't do power
>  	 * management for them (see pci_bridge_d3_possible()).
> +	 *
> +	 * Also do the same for devices that have power management disabled
> +	 * by their driver and are completely power managed through the
> +	 * root port power resource instead. This is a special case for
> +	 * nouveau.
>  	 */
> -	if (!pci_is_pcie(dev)) {
> +	if (!pci_is_pcie(dev) || !child->pm_cap) {
>  		pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay);
>  		msleep(1000 + delay);
>  		return;
> @@ -4820,17 +4826,28 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>  	if (!pcie_downstream_port(dev))
>  		return;
>  
> -	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
> -		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
> -		msleep(delay);
> -	} else {
> -		pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
> -			delay);
> -		if (!pcie_wait_for_link_delay(dev, true, delay)) {
> +	/*
> +	 * Per PCIe r5.0, sec 6.6.1, for downstream ports that support
> +	 * speeds > 5 GT/s, we must wait for link training to complete
> +	 * before the mandatory delay.
> +	 *
> +	 * We can only tell when link training completes via DLL Link
> +	 * Active, which is required for downstream ports that support
> +	 * speeds > 5 GT/s (sec 7.5.3.6).  Unfortunately some common
> +	 * devices do not implement Link Active reporting even when it's
> +	 * required, so we'll check for that directly instead of checking
> +	 * the supported link speed.  We assume devices without Link Active
> +	 * reporting can train in 100 ms regardless of speed.
> +	 */
> +	if (dev->link_active_reporting) {
> +		pci_dbg(dev, "waiting for link to train\n");
> +		if (!pcie_wait_for_link_delay(dev, true, 0)) {
>  			/* Did not train, no need to wait any further */
>  			return;
>  		}
>  	}
> +	pci_dbg(child, "waiting %d ms to become accessible\n", delay);
> +	msleep(delay);
>  
>  	if (!pci_device_is_present(child)) {
>  		pci_dbg(child, "waiting additional %d ms to become accessible\n", delay);
Mika Westerberg Aug. 19, 2020, 5:20 p.m. UTC | #2
On Wed, Aug 19, 2020 at 12:58:18PM -0400, Lyude Paul wrote:
> On Wed, 2020-08-19 at 16:06 +0300, Mika Westerberg wrote:
> > Kai-Heng Feng reported that it takes a long time (> 1 s) to resume
> > Thunderbolt-connected devices from both runtime suspend and system sleep
> > (s2idle).
> > 
> > This was because some Downstream Ports that support > 5 GT/s do not also
> > support Data Link Layer Link Active reporting.  Per PCIe r5.0 sec 6.6.1:
> > 
> >   With a Downstream Port that supports Link speeds greater than 5.0 GT/s,
> >   software must wait a minimum of 100 ms after Link training completes
> >   before sending a Configuration Request to the device immediately below
> >   that Port. Software can determine when Link training completes by polling
> >   the Data Link Layer Link Active bit or by setting up an associated
> >   interrupt (see Section 6.7.3.3).
> > 
> > Sec 7.5.3.6 requires such Ports to support DLL Link Active reporting, but
> > at least the Intel JHL6240 Thunderbolt 3 Bridge [8086:15c0] and the Intel
> > JHL7540 Thunderbolt 3 Bridge [8086:15ea] do not.
> > 
> > Previously we tried to wait for Link training to complete, but since there
> > was no DLL Link Active reporting, all we could do was wait the worst-case
> > 1000 ms, then another 100 ms.
> > 
> > Instead of using the supported speeds to determine whether to wait for Link
> > training, check whether the port supports DLL Link Active reporting.  The
> > Ports in question do not, so we'll wait only the 100 ms required for Ports
> > that support Link speeds <= 5 GT/s.
> > 
> > This of course assumes these Ports always train the Link within 100 ms even
> > if they are operating at > 5 GT/s, which is not required by the spec.
> > 
> > This version adds a special check for devices whose power management is
> > disabled by their driver (->pm_cap is set to zero). This is needed to
> > avoid regression with some NVIDIA GPUs.
> 
> Hm, I'm not entirely sure that the link training delay is specific to laptops
> with ->pm_cap set to 0. I think we should try figuring out whether there are
> any laptops that match those characteristics before moving forward with this -
> I'll take a look through the test machines I've got available today.

OK, thanks!
Lukas Wunner Aug. 20, 2020, 8:13 a.m. UTC | #3
On Wed, Aug 19, 2020 at 04:06:25PM +0300, Mika Westerberg wrote:
> Sec 7.5.3.6 requires such Ports to support DLL Link Active reporting, but
> at least the Intel JHL6240 Thunderbolt 3 Bridge [8086:15c0] and the Intel
> JHL7540 Thunderbolt 3 Bridge [8086:15ea] do not.
[...]
> +	 * Also do the same for devices that have power management disabled
> +	 * by their driver and are completely power managed through the
> +	 * root port power resource instead. This is a special case for
> +	 * nouveau.
>  	 */
> -	if (!pci_is_pcie(dev)) {
> +	if (!pci_is_pcie(dev) || !child->pm_cap) {

It sounds like the above-mentioned Thunderbolt controllers are broken,
not the Nvidia cards, so to me (as an outside observer) it would seem
more logical that a quirk for the former is needed.  The code comment
suggests that nouveau somehow has a problem, but that doesn't seem to
be the case (IIUC).  Also, it's a little ugly to have references to
specific drivers in PCI core code.

Maybe this can be fixed with quirks for the Thunderbolt controllers
which set a flag, and that flag causes the 1000 msec wait to be skipped?

Thanks,

Lukas
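
A rough sketch of the quirk approach suggested above, keyed on the two bridge
IDs named in the commit message (this is illustrative only and not code from
the series; it assumes a new, hypothetical no_dll_link_active bit in struct
pci_dev):

#include <linux/pci.h>

/*
 * Sketch only: mark Downstream Ports known to lack DLL Link Active
 * reporting despite supporting > 5 GT/s, so the resume path could skip
 * the worst-case 1000 ms wait.  Assumes struct pci_dev gained a
 * hypothetical "unsigned int no_dll_link_active:1" field.
 */
static void quirk_no_dll_link_active(struct pci_dev *pdev)
{
	pdev->no_dll_link_active = 1;
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x15c0, quirk_no_dll_link_active);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x15ea, quirk_no_dll_link_active);

pci_bridge_wait_for_secondary_bus() could then test such a flag instead of (or
in addition to) dev->link_active_reporting when deciding how long to wait.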
Lyude Paul Aug. 20, 2020, 3:36 p.m. UTC | #4
On Thu, 2020-08-20 at 10:13 +0200, Lukas Wunner wrote:
> On Wed, Aug 19, 2020 at 04:06:25PM +0300, Mika Westerberg wrote:
> > Sec 7.5.3.6 requires such Ports to support DLL Link Active reporting, but
> > at least the Intel JHL6240 Thunderbolt 3 Bridge [8086:15c0] and the Intel
> > JHL7540 Thunderbolt 3 Bridge [8086:15ea] do not.
> [...]
> > +	 * Also do the same for devices that have power management disabled
> > +	 * by their driver and are completely power managed through the
> > +	 * root port power resource instead. This is a special case for
> > +	 * nouveau.
> >  	 */
> > -	if (!pci_is_pcie(dev)) {
> > +	if (!pci_is_pcie(dev) || !child->pm_cap) {
> 
> It sounds like the above-mentioned Thunderbolt controllers are broken,
> not the Nvidia cards, so to me (as an outside observer) it would seem
> more logical that a quirk for the former is needed.  The code comment
> suggests that nouveau somehow has a problem, but that doesn't seem to
> be the case (IIUC).  Also, it's a little ugly to have references to
> specific drivers in PCI core code.
> 
> Maybe this can be fixed with quirks for the Thunderbolt controllers
> which set a flag, and that flag causes the 1000 msec wait to be skipped?
Sorry, some stuff came up yesterday so I didn't get the time to go through my
laptops and test them. I do agree with this though - I'd be worried as well that
nouveau might not be the only driver out there that needs this kind of delay

> 
> Thanks,
> 
> Lukas
>
Mika Westerberg Aug. 21, 2020, 9:32 a.m. UTC | #5
Hi,

On Thu, Aug 20, 2020 at 11:36:37AM -0400, Lyude Paul wrote:
> On Thu, 2020-08-20 at 10:13 +0200, Lukas Wunner wrote:
> > On Wed, Aug 19, 2020 at 04:06:25PM +0300, Mika Westerberg wrote:
> > > Sec 7.5.3.6 requires such Ports to support DLL Link Active reporting, but
> > > at least the Intel JHL6240 Thunderbolt 3 Bridge [8086:15c0] and the Intel
> > > JHL7540 Thunderbolt 3 Bridge [8086:15ea] do not.
> > [...]
> > > +	 * Also do the same for devices that have power management disabled
> > > +	 * by their driver and are completely power managed through the
> > > +	 * root port power resource instead. This is a special case for
> > > +	 * nouveau.
> > >  	 */
> > > -	if (!pci_is_pcie(dev)) {
> > > +	if (!pci_is_pcie(dev) || !child->pm_cap) {
> > 
> > It sounds like the above-mentioned Thunderbolt controllers are broken,
> > not the Nvidia cards, so to me (as an outside observer) it would seem
> > more logical that a quirk for the former is needed.  The code comment
> > suggests that nouveau somehow has a problem, but that doesn't seem to
> > be the case (IIUC).  Also, it's a little ugly to have references to
> > specific drivers in PCI core code.
> > 
> > Maybe this can be fixed with quirks for the Thunderbolt controllers
> > which set a flag, and that flag causes the 1000 msec wait to be skipped?
>
> Sorry, some stuff came up yesterday so I didn't get the time to go through my
> laptops and test them. I do agree with this though - I'd be worried as well that
> nouveau might not be the only driver out there that needs this kind of delay

I actually expect that nouveau is the only one, because it does some PM
tricks to get runtime PM working: it leaves the GPU device in D0 and
puts the parent root port into D3cold. The BIOS ASL code has some
assumptions there, and I think this 1000 ms delay just works around
that by luck ;-)

IIRC Bjorn suggested quirking the affected downstream ports when I
originally sent the patch, but I thought we could make this solution more
generic. Which, of course, did not work out too well.

I can look into the quirk solution instead if this is what people
prefer.
Lyude Paul Aug. 21, 2020, 11:09 p.m. UTC | #6
On Fri, 2020-08-21 at 12:32 +0300, Mika Westerberg wrote:
> Hi,
> 
> On Thu, Aug 20, 2020 at 11:36:37AM -0400, Lyude Paul wrote:
> > On Thu, 2020-08-20 at 10:13 +0200, Lukas Wunner wrote:
> > > On Wed, Aug 19, 2020 at 04:06:25PM +0300, Mika Westerberg wrote:
> > > > Sec 7.5.3.6 requires such Ports to support DLL Link Active reporting,
> > > > but
> > > > at least the Intel JHL6240 Thunderbolt 3 Bridge [8086:15c0] and the
> > > > Intel
> > > > JHL7540 Thunderbolt 3 Bridge [8086:15ea] do not.
> > > [...]
> > > > +	 * Also do the same for devices that have power management
> > > > disabled
> > > > +	 * by their driver and are completely power managed through the
> > > > +	 * root port power resource instead. This is a special case for
> > > > +	 * nouveau.
> > > >  	 */
> > > > -	if (!pci_is_pcie(dev)) {
> > > > +	if (!pci_is_pcie(dev) || !child->pm_cap) {
> > > 
> > > It sounds like the above-mentioned Thunderbolt controllers are broken,
> > > not the Nvidia cards, so to me (as an outside observer) it would seem
> > > more logical that a quirk for the former is needed.  The code comment
> > > suggests that nouveau somehow has a problem, but that doesn't seem to
> > > be the case (IIUC).  Also, it's a little ugly to have references to
> > > specific drivers in PCI core code.
> > > 
> > > Maybe this can be fixed with quirks for the Thunderbolt controllers
> > > which set a flag, and that flag causes the 1000 msec wait to be skipped?
> > 
> > Sorry, some stuff came up yesterday so I didn't get the time to go through
> > my
> > laptops and test them. I do agree with this though - I'd be worried as well
> > that
> > nouveau might not be the only driver out there that needs this kind of delay
> 
> I actually expect that nouveau is the only one, because it does some PM
> tricks to get runtime PM working: it leaves the GPU device in D0 and
> puts the parent root port into D3cold. The BIOS ASL code has some
> assumptions there, and I think this 1000 ms delay just works around
> that by luck ;-)
> 
> IIRC Bjorn suggested quirking the affected downstream ports when I
> originally sent the patch, but I thought we could make this solution more
> generic. Which, of course, did not work out too well.
> 
> I can look into the quirk solution instead if this is what people
> prefer.

yeah, probably the safest bet IMO.
>

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a458c46d7e39..33eb502a60c8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4658,7 +4658,8 @@  static int pci_pm_reset(struct pci_dev *dev, int probe)
  * pcie_wait_for_link_delay - Wait until link is active or inactive
  * @pdev: Bridge device
  * @active: waiting for active or inactive?
- * @delay: Delay to wait after link has become active (in ms)
+ * @delay: Delay to wait after link has become active (in ms). Specify %0
+ *	   for no delay.
  *
  * Use this to wait till link becomes active or inactive.
  */
@@ -4699,7 +4700,7 @@  static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 		msleep(10);
 		timeout -= 10;
 	}
-	if (active && ret)
+	if (active && ret && delay)
 		msleep(delay);
 	else if (ret != active)
 		pci_info(pdev, "Data Link Layer Link Active not %s in 1000 msec\n",
@@ -4793,8 +4794,13 @@  void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	 * accessing the device after reset (that is 1000 ms + 100 ms). In
 	 * practice this should not be needed because we don't do power
 	 * management for them (see pci_bridge_d3_possible()).
+	 *
+	 * Also do the same for devices that have power management disabled
+	 * by their driver and are completely power managed through the
+	 * root port power resource instead. This is a special case for
+	 * nouveau.
 	 */
-	if (!pci_is_pcie(dev)) {
+	if (!pci_is_pcie(dev) || !child->pm_cap) {
 		pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay);
 		msleep(1000 + delay);
 		return;
@@ -4820,17 +4826,28 @@  void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	if (!pcie_downstream_port(dev))
 		return;
 
-	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
-		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
-		msleep(delay);
-	} else {
-		pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
-			delay);
-		if (!pcie_wait_for_link_delay(dev, true, delay)) {
+	/*
+	 * Per PCIe r5.0, sec 6.6.1, for downstream ports that support
+	 * speeds > 5 GT/s, we must wait for link training to complete
+	 * before the mandatory delay.
+	 *
+	 * We can only tell when link training completes via DLL Link
+	 * Active, which is required for downstream ports that support
+	 * speeds > 5 GT/s (sec 7.5.3.6).  Unfortunately some common
+	 * devices do not implement Link Active reporting even when it's
+	 * required, so we'll check for that directly instead of checking
+	 * the supported link speed.  We assume devices without Link Active
+	 * reporting can train in 100 ms regardless of speed.
+	 */
+	if (dev->link_active_reporting) {
+		pci_dbg(dev, "waiting for link to train\n");
+		if (!pcie_wait_for_link_delay(dev, true, 0)) {
 			/* Did not train, no need to wait any further */
 			return;
 		}
 	}
+	pci_dbg(child, "waiting %d ms to become accessible\n", delay);
+	msleep(delay);
 
 	if (!pci_device_is_present(child)) {
 		pci_dbg(child, "waiting additional %d ms to become accessible\n", delay);