PCI/PM: Mark devices disconnected if their upstream PCIe link is down on resume

Message ID 20230918053041.1018876-1-mika.westerberg@linux.intel.com
State New

Commit Message

Mika Westerberg Sept. 18, 2023, 5:30 a.m. UTC
Mark Blakeney reported that when suspending a system with a Thunderbolt
dock connected and then unplugging the dock before resume (which is a
pretty normal flow with laptops), resuming takes a long time.

What happens is that the PCIe link from the root port to the PCIe switch
inside the Thunderbolt device does not train (as expected, the link is
unplugged):

[   34.903158] pcieport 0000:00:07.2: restoring config space at offset 0x24 (was 0x3bf12001, writing 0x3bf12001)
[   34.903231] pcieport 0000:00:07.0: waiting 100 ms for downstream link
[   36.140616] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up

However, at this point we still try to resume the devices below that
unplugged link:

[   36.140741] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
...
[   36.142235] pcieport 0000:01:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
...
[   36.144702] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation

And this is the link from PCIe switch downstream port to the xHCI on the
dock:

[   38.380618] xhci_hcd 0000:03:00.0: not ready 1023ms after resume; waiting
[   39.420587] xhci_hcd 0000:03:00.0: not ready 2047ms after resume; waiting
[   41.527250] xhci_hcd 0000:03:00.0: not ready 4095ms after resume; waiting
[   45.793957] xhci_hcd 0000:03:00.0: not ready 8191ms after resume; waiting
[   54.113950] xhci_hcd 0000:03:00.0: not ready 16383ms after resume; waiting
[   71.180576] xhci_hcd 0000:03:00.0: not ready 32767ms after resume; waiting
...
[  105.313963] xhci_hcd 0000:03:00.0: not ready 65535ms after resume; giving up
[  105.314037] xhci_hcd 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  105.315640] xhci_hcd 0000:03:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
...

This ends up slowing down the resume time considerably. For this reason
mark these devices as disconnected if the link above them did not train
properly.

Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pci-driver.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
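
The idea described above (stop resuming everything below a link that
did not train) can be illustrated with a small user-space model. This
is only a sketch, not the kernel code: struct model_dev and the
model_*() helpers are hypothetical stand-ins for struct pci_dev,
pci_walk_bus() and pci_dev_set_disconnected(), and the link_trained
flag plays the role of the return value of the secondary bus wait.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a PCI device in a hierarchy. */
struct model_dev {
	const char *name;
	bool disconnected;         /* models the "disconnected" state */
	struct model_dev *child;   /* first device on the subordinate bus */
	struct model_dev *sibling; /* next device on the same bus */
};

/* Mark every device on the bus starting at 'first', and all below it. */
static void model_mark_disconnected(struct model_dev *first)
{
	for (struct model_dev *d = first; d; d = d->sibling) {
		d->disconnected = true;
		model_mark_disconnected(d->child);
	}
}

/* Count devices on and below the bus that would still be resumed. */
static int model_count_resumable(const struct model_dev *first)
{
	int n = 0;

	for (const struct model_dev *d = first; d; d = d->sibling) {
		if (!d->disconnected)
			n++;
		n += model_count_resumable(d->child);
	}
	return n;
}

/*
 * Model of the bridge power-up step: if the downstream link did not
 * train, mark everything below the bridge as disconnected so that no
 * resume is attempted for it. Returns how many devices below the
 * bridge would still be resumed.
 */
static int model_bridge_power_up(struct model_dev *bridge, bool link_trained)
{
	if (!link_trained)
		model_mark_disconnected(bridge->child);
	return model_count_resumable(bridge->child);
}
```

With the topology from the log (root port 0000:00:07.0, switch ports
0000:01:00.0 and 0000:02:02.0, xHCI 0000:03:00.0), a failed link at the
root port leaves nothing below it to resume, while the root port itself
stays connected.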

Comments

Lukas Wunner Sept. 18, 2023, 8:37 a.m. UTC | #1
On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> Mark Blakeney reported that when suspending a system with a Thunderbolt
> dock connected and then unplugging the dock before resume (which is a
> pretty normal flow with laptops), resuming takes a long time.
> 
> What happens is that the PCIe link from the root port to the PCIe switch
> inside the Thunderbolt device does not train (as expected, the link is
> unplugged):
> 
> [   34.903158] pcieport 0000:00:07.2: restoring config space at offset 0x24 (was 0x3bf12001, writing 0x3bf12001)
> [   34.903231] pcieport 0000:00:07.0: waiting 100 ms for downstream link
> [   36.140616] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> 
> However, at this point we still try to resume the devices below that
> unplugged link:
> 
> [   36.140741] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
> ...
> [   36.142235] pcieport 0000:01:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> ...
> [   36.144702] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
> 
> And this is the link from PCIe switch downstream port to the xHCI on the
> dock:
> 
> [   38.380618] xhci_hcd 0000:03:00.0: not ready 1023ms after resume; waiting
> [   39.420587] xhci_hcd 0000:03:00.0: not ready 2047ms after resume; waiting
> [   41.527250] xhci_hcd 0000:03:00.0: not ready 4095ms after resume; waiting
> [   45.793957] xhci_hcd 0000:03:00.0: not ready 8191ms after resume; waiting
> [   54.113950] xhci_hcd 0000:03:00.0: not ready 16383ms after resume; waiting
> [   71.180576] xhci_hcd 0000:03:00.0: not ready 32767ms after resume; waiting
> ...
> [  105.313963] xhci_hcd 0000:03:00.0: not ready 65535ms after resume; giving up
> [  105.314037] xhci_hcd 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
> [  105.315640] xhci_hcd 0000:03:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> ...
> 
> This ends up slowing down the resume time considerably. For this reason
> mark these devices as disconnected if the link above them did not train
> properly.
> 
> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Reviewed-by: Lukas Wunner <lukas@wunner.de>
Bjorn Helgaas Sept. 21, 2023, 8:19 p.m. UTC | #2
[+cc Kamil, Chris]

On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> Mark Blakeney reported that when suspending a system with a Thunderbolt
> dock connected and then unplugging the dock before resume (which is a
> pretty normal flow with laptops), resuming takes a long time.
> 
> What happens is that the PCIe link from the root port to the PCIe switch
> inside the Thunderbolt device does not train (as expected, the link is
> unplugged):
> 
> [   34.903158] pcieport 0000:00:07.2: restoring config space at offset 0x24 (was 0x3bf12001, writing 0x3bf12001)
> [   34.903231] pcieport 0000:00:07.0: waiting 100 ms for downstream link
> [   36.140616] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> 
> However, at this point we still try to resume the devices below that
> unplugged link:
> 
> [   36.140741] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
> ...
> [   36.142235] pcieport 0000:01:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> ...
> [   36.144702] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
> 
> And this is the link from PCIe switch downstream port to the xHCI on the
> dock:
> 
> [   38.380618] xhci_hcd 0000:03:00.0: not ready 1023ms after resume; waiting
> [   39.420587] xhci_hcd 0000:03:00.0: not ready 2047ms after resume; waiting
> [   41.527250] xhci_hcd 0000:03:00.0: not ready 4095ms after resume; waiting
> [   45.793957] xhci_hcd 0000:03:00.0: not ready 8191ms after resume; waiting
> [   54.113950] xhci_hcd 0000:03:00.0: not ready 16383ms after resume; waiting
> [   71.180576] xhci_hcd 0000:03:00.0: not ready 32767ms after resume; waiting
> ...
> [  105.313963] xhci_hcd 0000:03:00.0: not ready 65535ms after resume; giving up
> [  105.314037] xhci_hcd 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
> [  105.315640] xhci_hcd 0000:03:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> ...
> 
> This ends up slowing down the resume time considerably. For this reason
> mark these devices as disconnected if the link above them did not train
> properly.
> 
> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Applied with Lukas' Reviewed-by to pm for v6.7.

e8b908146d44 appeared in v6.4.  Seems like maybe a candidate for
stable?  IIUC, resume actually does work, but takes 65+ seconds longer
than it should?

Kamil also bisected a 60+ second resume delay to e8b908146d44
(https://lore.kernel.org/r/CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@mail.gmail.com),
but IIUC at
https://lore.kernel.org/linux-pci/20230824114300.GU3465@black.fi.intel.com/T/#u
you concluded that Kamil's issue was related to firmware and actually
had nothing to do with e8b908146d44.

Do you still think Kamil's issue is unrelated to e8b908146d44 and this
patch?  If so, how do we handle Kamil's issue?  An answer like "users
of v6.4+ must upgrade their Thunderbolt firmware" seems like it would
be kind of a nightmare for users.

> ---
>  drivers/pci/pci-driver.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a79c110c7e51..51ec9e7e784f 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -572,7 +572,19 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
>  
>  static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
>  {
> -	pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
> +	int ret;
> +
> +	ret = pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
> +	if (ret) {
> +		/*
> +		 * The downstream link failed to come up, so mark the
> +		 * devices below as disconnected to make sure we don't
> +		 * attempt to resume them.
> +		 */
> +		pci_walk_bus(pci_dev->subordinate, pci_dev_set_disconnected,
> +			     NULL);
> +		return;
> +	}
>  
>  	/*
>  	 * When powering on a bridge from D3cold, the whole hierarchy may be
> -- 
> 2.40.1
>
Mika Westerberg Sept. 22, 2023, 4:42 a.m. UTC | #3
Hi Bjorn,

On Thu, Sep 21, 2023 at 03:19:45PM -0500, Bjorn Helgaas wrote:
> [+cc Kamil, Chris]
> 
> On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> > Mark Blakeney reported that when suspending a system with a Thunderbolt
> > dock connected and then unplugging the dock before resume (which is a
> > pretty normal flow with laptops), resuming takes a long time.
> > 
> > What happens is that the PCIe link from the root port to the PCIe switch
> > inside the Thunderbolt device does not train (as expected, the link is
> > unplugged):
> > 
> > [   34.903158] pcieport 0000:00:07.2: restoring config space at offset 0x24 (was 0x3bf12001, writing 0x3bf12001)
> > [   34.903231] pcieport 0000:00:07.0: waiting 100 ms for downstream link
> > [   36.140616] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> > 
> > However, at this point we still try to resume the devices below that
> > unplugged link:
> > 
> > [   36.140741] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > ...
> > [   36.142235] pcieport 0000:01:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> > ...
> > [   36.144702] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
> > 
> > And this is the link from PCIe switch downstream port to the xHCI on the
> > dock:
> > 
> > [   38.380618] xhci_hcd 0000:03:00.0: not ready 1023ms after resume; waiting
> > [   39.420587] xhci_hcd 0000:03:00.0: not ready 2047ms after resume; waiting
> > [   41.527250] xhci_hcd 0000:03:00.0: not ready 4095ms after resume; waiting
> > [   45.793957] xhci_hcd 0000:03:00.0: not ready 8191ms after resume; waiting
> > [   54.113950] xhci_hcd 0000:03:00.0: not ready 16383ms after resume; waiting
> > [   71.180576] xhci_hcd 0000:03:00.0: not ready 32767ms after resume; waiting
> > ...
> > [  105.313963] xhci_hcd 0000:03:00.0: not ready 65535ms after resume; giving up
> > [  105.314037] xhci_hcd 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > [  105.315640] xhci_hcd 0000:03:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> > ...
> > 
> > This ends up slowing down the resume time considerably. For this reason
> > mark these devices as disconnected if the link above them did not train
> > properly.
> > 
> > Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> > Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> Applied with Lukas' Reviewed-by to pm for v6.7.

Thanks!

> e8b908146d44 appeared in v6.4.  Seems like maybe a candidate for
> stable?  IIUC, resume actually does work, but takes 65+ seconds longer
> than it should?

Yes, I think it should be tagged for stable.

> Kamil also bisected a 60+ second resume delay to e8b908146d44
> (https://lore.kernel.org/r/CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@mail.gmail.com),
> but IIUC at
> https://lore.kernel.org/linux-pci/20230824114300.GU3465@black.fi.intel.com/T/#u
> you concluded that Kamil's issue was related to firmware and actually
> had nothing to do with e8b908146d44.
> 
> Do you still think Kamil's issue is unrelated to e8b908146d44 and this
> patch?  If so, how do we handle Kamil's issue?  An answer like "users
> of v6.4+ must upgrade their Thunderbolt firmware" seems like it would
> be kind of a nightmare for users.

It's a different issue. What happens in his system is that the link went
down even though the dock was still connected and this should not happen
(the firmware should bring the link up during resume). The delay was
just a "symptom".

What happens here is that the user suspends the device and deliberately
disconnects the dock.
Linux regression tracking (Thorsten Leemhuis) Sept. 22, 2023, 11:45 a.m. UTC | #4
On 21.09.23 22:19, Bjorn Helgaas wrote:
> [+cc Kamil, Chris]
> 
> On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
>> Mark Blakeney reported that when suspending a system with a Thunderbolt
>> dock connected and then unplugging the dock before resume (which is a
>> pretty normal flow with laptops), resuming takes a long time.
>>
>> What happens is that the PCIe link from the root port to the PCIe switch
>> inside the Thunderbolt device does not train (as expected, the link is
>> unplugged):
> [...]
>> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
>> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
>> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> Applied with Lukas' Reviewed-by to pm for v6.7.
>
> e8b908146d44 appeared in v6.4. 

Then why did you apply this for 6.7 and not to a branch targeting the
current cycle? Linus wants regressions introduced during roughly the
last 12 months to be handled like regressions from the current cycle,
unless there is some good reason to treat the fix differently (big risk
of other regressions, for example).

> Seems like maybe a candidate for stable? 

+1

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
Bjorn Helgaas Sept. 22, 2023, 12:41 p.m. UTC | #5
On Fri, Sep 22, 2023 at 01:45:58PM +0200, Thorsten Leemhuis wrote:
> On 21.09.23 22:19, Bjorn Helgaas wrote:
> > On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> >> Mark Blakeney reported that when suspending a system with a Thunderbolt
> >> dock connected and then unplugging the dock before resume (which is a
> >> pretty normal flow with laptops), resuming takes a long time.
> >>
> >> What happens is that the PCIe link from the root port to the PCIe switch
> >> inside the Thunderbolt device does not train (as expected, the link is
> >> unplugged):
> > [...]
> >> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> >> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> >> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> >> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > 
> > Applied with Lukas' Reviewed-by to pm for v6.7.
> >
> > e8b908146d44 appeared in v6.4. 
> 
> Then why did you apply this for 6.7 and not to a branch targeting the
> current cycle? Linus wants regressions introduced during roughly the
> last 12 months to be handled like regressions from the current cycle,

I was not aware of the last 12 months rule.  Happy to change if that's
the guideline.  My previous rule of thumb was: fixes for regressions
in the most recent merge window always go to current cycle, fixes for
older regressions case-by-case.

Bjorn
Linux regression tracking (Thorsten Leemhuis) Sept. 22, 2023, 12:53 p.m. UTC | #6
On 22.09.23 14:41, Bjorn Helgaas wrote:
> On Fri, Sep 22, 2023 at 01:45:58PM +0200, Thorsten Leemhuis wrote:
>> On 21.09.23 22:19, Bjorn Helgaas wrote:
>>> On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
>>>> Mark Blakeney reported that when suspending a system with a Thunderbolt
>>>> dock connected and then unplugging the dock before resume (which is a
>>>> pretty normal flow with laptops), resuming takes a long time.
>>>>
>>>> What happens is that the PCIe link from the root port to the PCIe switch
>>>> inside the Thunderbolt device does not train (as expected, the link is
>>>> unplugged):
>>> [...]
>>>> Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
>>>> Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
>>>> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>>>
>>> Applied with Lukas' Reviewed-by to pm for v6.7.
>>>
>>> e8b908146d44 appeared in v6.4. 
>>
>> Then why did you apply this for 6.7 and not to a branch targeting the
>> current cycle? Linus wants regressions introduced during roughly the
>> last 12 months to be handled like regressions from the current cycle,
> 
> I was not aware of the last 12 months rule.  Happy to change if that's
> the guideline.  

Thx. FWIW, if you want to know what Linus said exactly, check these out:

https://lore.kernel.org/all/CAHk-=wis_qQy4oDNynNKi5b7Qhosmxtoj1jxo5wmB6SRUwQUBQ@mail.gmail.com/
https://lore.kernel.org/all/CAHk-=wgD98pmSK3ZyHk_d9kZ2bhgN6DuNZMAJaV0WTtbkf=RDw@mail.gmail.com/

> My previous rule of thumb was: fixes for regressions
> in the most recent merge window always go to current cycle, fixes for
> older regressions case-by-case.

Yeah, there are cases where waiting is the right thing, but most of the
time it's not I'd say.

Ciao, Thorsten
Bjorn Helgaas Sept. 22, 2023, 12:59 p.m. UTC | #7
[+cc Thorsten]

On Fri, Sep 22, 2023 at 07:42:37AM +0300, Mika Westerberg wrote:
> On Thu, Sep 21, 2023 at 03:19:45PM -0500, Bjorn Helgaas wrote:
> > On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> ...

> > Kamil also bisected a 60+ second resume delay to e8b908146d44
> > (https://lore.kernel.org/r/CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@mail.gmail.com),
> > but IIUC at
> > https://lore.kernel.org/linux-pci/20230824114300.GU3465@black.fi.intel.com/T/#u
> > you concluded that Kamil's issue was related to firmware and actually
> > had nothing to do with e8b908146d44.
> > 
> > Do you still think Kamil's issue is unrelated to e8b908146d44 and this
> > patch?  If so, how do we handle Kamil's issue?  An answer like "users
> > of v6.4+ must upgrade their Thunderbolt firmware" seems like it would
> > be kind of a nightmare for users.
> 
> It's a different issue. What happens in his system is that the link went
> down even though the dock was still connected and this should not happen
> (the firmware should bring the link up during resume). The delay was
> just a "symptom".

Do you have any leads for Kamil's issue?  If we had known that
e8b908146d44 would cause that problem, we never would have applied it
in the first place.

No OS would accept that resume delay, so there must be some way to fix
that in the OS without requiring a firmware update.

If Kamil's issue is that firmware doesn't bring up the link during
resume, how *does* the link get brought up, and what does the delay
have to do with it?

Bjorn
Mika Westerberg Sept. 24, 2023, 1:44 p.m. UTC | #8
On Fri, Sep 22, 2023 at 07:59:26AM -0500, Bjorn Helgaas wrote:
> [+cc Thorsten]
> 
> On Fri, Sep 22, 2023 at 07:42:37AM +0300, Mika Westerberg wrote:
> > On Thu, Sep 21, 2023 at 03:19:45PM -0500, Bjorn Helgaas wrote:
> > > On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> > ...
> 
> > > Kamil also bisected a 60+ second resume delay to e8b908146d44
> > > (https://lore.kernel.org/r/CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@mail.gmail.com),
> > > but IIUC at
> > > https://lore.kernel.org/linux-pci/20230824114300.GU3465@black.fi.intel.com/T/#u
> > > you concluded that Kamil's issue was related to firmware and actually
> > > had nothing to do with e8b908146d44.
> > > 
> > > Do you still think Kamil's issue is unrelated to e8b908146d44 and this
> > > patch?  If so, how do we handle Kamil's issue?  An answer like "users
> > > of v6.4+ must upgrade their Thunderbolt firmware" seems like it would
> > > be kind of a nightmare for users.
> > 
> > It's a different issue. What happens in his system is that the link went
> > down even though the dock was still connected and this should not happen
> > (the firmware should bring the link up during resume). The delay was
> > just a "symptom".
> 
> Do you have any leads for Kamil's issue?  If we had known that
> e8b908146d44 would cause that problem, we never would have applied it
> in the first place.

I explained it in the other email I just sent. I should mention here
that the two issues are different.

> No OS would accept that resume delay, so there must be some way to fix
> that in the OS without requiring a firmware update.

It is not a "resume" delay. It is how long we wait for the device to
become ready before we decide it is not functional or is disconnected.
That delay is completely arbitrary.

> If Kamil's issue is that firmware doesn't bring up the link during
> resume, how *does* the link get brought up, and what does the delay
> have to do with it?

The PCIe tunnel (the "link" above) gets established after D3cold by the
Thunderbolt firmware running inside the host controller. The trigger is
typically a call to the _PR0 ACPI method, which sends a special command
through the mailbox that makes the firmware re-connect all the tunnels
that were previously connected.

The delay we are talking about here is the one the PCIe spec requires
after the device has gone through a reset; the OS must observe it before
it can send configuration requests to that device. Now, the PCIe spec
does not specify how long the OS should wait for a device on a link that
does not come up. We increased that delay to ~60s to fix another issue
on an xHCI controller but forgot that when the device is deliberately
unplugged we still wait for the ~60s, which is wasted effort and just
ends up annoying users.
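
To make the numbers in the log concrete, here is a small user-space
model of a doubling poll interval for a device that never becomes
ready. It is a sketch under assumptions, not the kernel's wait loop:
the function name, the 1000 ms logging threshold and the timeout values
used below are chosen only because they reproduce the "not ready ...ms
after resume" figures quoted in the commit message.

```c
#include <stdio.h>

/* Assumed threshold above which each further wait step is logged. */
#define MODEL_LOG_THRESHOLD_MS 1000

/*
 * Simulate polling a device that never becomes ready. The poll
 * interval starts at 1 ms and doubles after every attempt. Returns the
 * total time in ms that would have been spent sleeping before giving
 * up.
 */
static int model_wait_never_ready(int timeout_ms)
{
	int delay = 1;
	int slept = 0;

	for (;;) {
		if (delay > timeout_ms) {
			printf("not ready %dms after resume; giving up\n",
			       delay - 1);
			return slept;
		}
		if (delay > MODEL_LOG_THRESHOLD_MS)
			printf("not ready %dms after resume; waiting\n",
			       delay - 1);
		slept += delay; /* stands in for sleeping 'delay' ms */
		delay *= 2;
	}
}
```

In this model a 1000 ms timeout gives up after 1023 ms, and a 60000 ms
timeout gives up only after 65535 ms, which is why the time spent on an
unplugged device overshoots the nominal ~60s.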
Bjorn Helgaas Sept. 29, 2023, 10:45 p.m. UTC | #9
[+cc Thorsten]

On Thu, Sep 21, 2023 at 03:19:45PM -0500, Bjorn Helgaas wrote:
> On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> > Mark Blakeney reported that when suspending a system with a Thunderbolt
> > dock connected and then unplugging the dock before resume (which is a
> > pretty normal flow with laptops), resuming takes a long time.
> > ...

> > Fixes: e8b908146d44 ("PCI/PM: Increase wait time after resume")
> > Reported-by: Mark Blakeney <mark.blakeney@bullet-systems.net>
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217915
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> Applied with Lukas' Reviewed-by to pm for v6.7.
> 
> e8b908146d44 appeared in v6.4.  Seems like maybe a candidate for
> stable?  IIUC, resume actually does work, but takes 65+ seconds longer
> than it should?

I moved this to for-linus for v6.6 and added a stable tag for v6.4+.

Bjorn
Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index a79c110c7e51..51ec9e7e784f 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -572,7 +572,19 @@  static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 
 static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
 {
-	pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
+	int ret;
+
+	ret = pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
+	if (ret) {
+		/*
+		 * The downstream link failed to come up, so mark the
+		 * devices below as disconnected to make sure we don't
+		 * attempt to resume them.
+		 */
+		pci_walk_bus(pci_dev->subordinate, pci_dev_set_disconnected,
+			     NULL);
+		return;
+	}
 
 	/*
 	 * When powering on a bridge from D3cold, the whole hierarchy may be