PCI: increase D3 delay for AMD Ryzen5/7 XHCI controllers
diff mbox series

Message ID 20191014061355.29072-1-drake@endlessm.com
State New
Delegated to: Bjorn Helgaas
Headers show
Series
  • PCI: increase D3 delay for AMD Ryzen5/7 XHCI controllers
Related show

Commit Message

Daniel Drake Oct. 14, 2019, 6:13 a.m. UTC
On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U,
the XHCI controller fails to resume from runtime suspend or s2idle,
and USB becomes unusable from that point.

xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout
xhci_hcd 0000:03:00.4: PCI post-resume error -110!
xhci_hcd 0000:03:00.4: HC died; cleaning up

The D3-to-D0 transition is successful if the D3 delay is increased
to 20ms. Add an appropriate quirk for the affected hardware.

Link: http://lkml.kernel.org/r/CAD8Lp47Vh69gQjROYG69=waJgL7hs1PwnLonL9+27S_TcRhixA@mail.gmail.com
Signed-off-by: Daniel Drake <drake@endlessm.com>
---
 drivers/pci/quirks.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

Comments

Bjorn Helgaas Oct. 14, 2019, 3:43 p.m. UTC | #1
[+cc Mika]

On Mon, Oct 14, 2019 at 02:13:55PM +0800, Daniel Drake wrote:
> On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U,
> the XHCI controller fails to resume from runtime suspend or s2idle,
> and USB becomes unusable from that point.
> 
> xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
> xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout
> xhci_hcd 0000:03:00.4: PCI post-resume error -110!
> xhci_hcd 0000:03:00.4: HC died; cleaning up
> 
> The D3-to-D0 transition is successful if the D3 delay is increased
> to 20ms. Add an appropriate quirk for the affected hardware.
> 
> Link: http://lkml.kernel.org/r/CAD8Lp47Vh69gQjROYG69=waJgL7hs1PwnLonL9+27S_TcRhixA@mail.gmail.com
> Signed-off-by: Daniel Drake <drake@endlessm.com>

Can you tell if this is because the Ryzen7 XHCI controller is out of
spec, or is the Linux PCI core missing some delay?  If the latter,
fixing the core might fix other devices as well.

Mika has this patch:
https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com
for similar issues, but I think that patch fixes D3cold->D0
transitions, and your patch appears to be concerned with D3hot->D0
transitions.

> ---
>  drivers/pci/quirks.c | 30 +++++++++++++++++++++++-------
>  1 file changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 320255e5e8f8..4570439a6a6c 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1871,19 +1871,35 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x2609, quirk_intel_pcie_pm);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260a, quirk_intel_pcie_pm);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260b, quirk_intel_pcie_pm);
>  
> +static void quirk_d3_delay(struct pci_dev *dev, unsigned int delay)
> +{
> +	if (dev->d3_delay >= delay)
> +		return;
> +
> +	dev->d3_delay = delay;
> +	pci_info(dev, "extending delay after power-on from D3 to %d msec\n",
> +		 dev->d3_delay);
> +}
> +
>  static void quirk_radeon_pm(struct pci_dev *dev)
>  {
>  	if (dev->subsystem_vendor == PCI_VENDOR_ID_APPLE &&
> -	    dev->subsystem_device == 0x00e2) {
> -		if (dev->d3_delay < 20) {
> -			dev->d3_delay = 20;
> -			pci_info(dev, "extending delay after power-on from D3 to %d msec\n",
> -				 dev->d3_delay);
> -		}
> -	}
> +	    dev->subsystem_device == 0x00e2)
> +		quirk_d3_delay(dev, 20);
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm);
>  
> +/*
> + * Ryzen7 XHCI controllers fail upon resume from runtime suspend or s2idle
> + * unless an extended D3 delay is used.
> + */
> +static void quirk_ryzen_xhci_d3(struct pci_dev *dev)
> +{
> +	quirk_d3_delay(dev, 20);
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3);
> +
>  #ifdef CONFIG_X86_IO_APIC
>  static int dmi_disable_ioapicreroute(const struct dmi_system_id *d)
>  {
> -- 
> 2.20.1
>
Daniel Drake Oct. 15, 2019, 5:31 a.m. UTC | #2
On Mon, Oct 14, 2019 at 11:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> Can you tell if this is because the Ryzen7 XHCI controller is out of
> spec, or is the Linux PCI core missing some delay?  If the latter,
> fixing the core might fix other devices as well.
>
> Mika has this patch:
> https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com
> for similar issues, but I think that patch fixes D3cold->D0
> transitions, and your patch appears to be concerned with D3hot->D0
> transitions.

It's actually coming out of D3cold here, however what happens right
before this is that __pci_start_power_transition() calls
pci_platform_power_transition(D0) to leave D3cold state, then
pci_update_current_state() reads PMCSR and updates dev->current_state
to D3hot.

The 20ms delay for these XHCI controllers is needed precisely at this
point - after writing PMCSR to move to D0, and before reading it back
to check the result.
I tried moving the delay immediately before writing PMCSR, but that
doesn't work. Based on that, it seems like it's just a little out of
spec.

With Mika's patch, pcie_wait_downstream_accessible() is called for
these devices after the state transition has already failed. It also
doesn't do any delaying at that point because pci_pcie_type(pdev) ==
0.

Daniel
Rafael J. Wysocki Oct. 15, 2019, 5:52 p.m. UTC | #3
On Tue, Oct 15, 2019 at 7:31 AM Daniel Drake <drake@endlessm.com> wrote:
>
> On Mon, Oct 14, 2019 at 11:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > Can you tell if this is because the Ryzen7 XHCI controller is out of
> > spec, or is the Linux PCI core missing some delay?  If the latter,
> > fixing the core might fix other devices as well.
> >
> > Mika has this patch:
> > https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com
> > for similar issues, but I think that patch fixes D3cold->D0
> > transitions, and your patch appears to be concerned with D3hot->D0
> > transitions.
>
> It's actually coming out of D3cold here, however what happens right
> before this is that __pci_start_power_transition() calls
> pci_platform_power_transition(D0) to leave D3cold state, then
> pci_update_current_state() reads PMCSR and updates dev->current_state
> to D3hot.

Which pci_update_current_state() do you mean, exactly?

Note that pci_platform_power_transition() itself contains one, which
triggers after a successful change of the ACPI power state of the
device (which should be the case here).

Then, there is one in pci_power_up(), but before that the device's
PMCSR is read from and written to in pci_raw_set_power_state().

It looks like the reads from it after the ACPI power state change are
all successful, but the write isn't unless there is the delay between
it and the former platform_pci_set_power_state().

> The 20ms delay for these XHCI controllers is needed precisely at this
> point - after writing PMCSR to move to D0, and before reading it back
> to check the result.
> I tried moving the delay immediately before writing PMCSR, but that
> doesn't work. Based on that, it seems like it's just a little out of
> spec.

That I agree with and the platform firmware doesn't compensate for
that (which it could do, arguably).
Daniel Drake Oct. 16, 2019, 6:14 a.m. UTC | #4
On Wed, Oct 16, 2019 at 1:52 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Oct 15, 2019 at 7:31 AM Daniel Drake <drake@endlessm.com> wrote:
> > It's actually coming out of D3cold here, however what happens right
> > before this is that __pci_start_power_transition() calls
> > pci_platform_power_transition(D0) to leave D3cold state, then
> > pci_update_current_state() reads PMCSR and updates dev->current_state
> > to D3hot.
>
> Which pci_update_current_state() do you mean, exactly?
>
> Note that pci_platform_power_transition() itself contains one, which
> triggers after a successful change of the ACPI power state of the
> device (which should be the case here).

That's the one I mean.

pci_pm_default_resume_early
- pci_power_up
-- __pci_start_power_transition
--- pci_platform_power_transition
---- pci_update_current_state(D0)

At this point, PMCSR is read and dev->current_status is set to D3 accordingly.

Then, continuing in pci_power_up(), pci_raw_set_power_state(D0) is
called and the extra delay is needed there after writing PMCSR to
transition to D0.
I didn't go further along the call trace because at that point the
problem has already been triggered.

> That I agree with and the platform firmware doesn't compensate for
> that (which it could do, arguably).

I tried to get official AMD support on this but their response was
that they don't have available resources to dedicate to Linux support.
Without guidance from AMD I don't think there's any chance of a
firmware change from the platform vendor.

I think we just have to figure out how to work with it. It seems that
Windows 10 delays longer or uses some other scheme. Another
alternative that I just tried is retrying the PMCSR write & readback
if it didn't complete the transition on the first try. That works
fine, let me know if it's preferred to implement something along those
lines as a more generic workaround.

Daniel
Mika Westerberg Oct. 21, 2019, 11:33 a.m. UTC | #5
Hi,

Sorry for the delay. I was on vacation last week.

On Tue, Oct 15, 2019 at 01:31:32PM +0800, Daniel Drake wrote:
> On Mon, Oct 14, 2019 at 11:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > Can you tell if this is because the Ryzen7 XHCI controller is out of
> > spec, or is the Linux PCI core missing some delay?  If the latter,
> > fixing the core might fix other devices as well.
> >
> > Mika has this patch:
> > https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com
> > for similar issues, but I think that patch fixes D3cold->D0
> > transitions, and your patch appears to be concerned with D3hot->D0
> > transitions.
> 
> It's actually coming out of D3cold here, however what happens right
> before this is that __pci_start_power_transition() calls
> pci_platform_power_transition(D0) to leave D3cold state, then
> pci_update_current_state() reads PMCSR and updates dev->current_state
> to D3hot.
> 
> The 20ms delay for these XHCI controllers is needed precisely at this
> point - after writing PMCSR to move to D0, and before reading it back
> to check the result.
> I tried moving the delay immediately before writing PMCSR, but that
> doesn't work. Based on that, it seems like it's just a little out of
> spec.
> 
> With Mika's patch, pcie_wait_downstream_accessible() is called for
> these devices after the state transition has already failed. It also
> doesn't do any delaying at that point because pci_pcie_type(pdev) ==
> 0.

Just to be sure, did you try the patch or just looked at it? Because
what the patch does is that it does the delay when the downstream/root
port is resumed, not the xHCI itself.
Daniel Drake Oct. 22, 2019, 2:40 a.m. UTC | #6
On Mon, Oct 21, 2019 at 7:33 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> Just to be sure, did you try the patch or just looked at it? Because
> what the patch does is that it does the delay when the downstream/root
> port is resumed, not the xHCI itself.

I tried it, it didn't fix the problem.

Daniel
Mika Westerberg Oct. 22, 2019, 9:33 a.m. UTC | #7
On Tue, Oct 22, 2019 at 10:40:00AM +0800, Daniel Drake wrote:
> On Mon, Oct 21, 2019 at 7:33 PM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> > Just to be sure, did you try the patch or just looked at it? Because
> > what the patch does is that it does the delay when the downstream/root
> > port is resumed, not the xHCI itself.
> 
> I tried it, it didn't fix the problem.

:(

It may very well be that this particular xHCI controller needs more than
that 10ms from D3hot -> D0 transition. Again the PCIe spec says the 10ms
is the minimum time system software needs to delay but it does not say
what would be the maximum time the function absolutely must be properly
in D0.
Bjorn Helgaas Oct. 23, 2019, 10:40 p.m. UTC | #8
On Tue, Oct 22, 2019 at 12:33:49PM +0300, Mika Westerberg wrote:
> On Tue, Oct 22, 2019 at 10:40:00AM +0800, Daniel Drake wrote:
> > On Mon, Oct 21, 2019 at 7:33 PM Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> > > Just to be sure, did you try the patch or just looked at it? Because
> > > what the patch does is that it does the delay when the downstream/root
> > > port is resumed, not the xHCI itself.
> > 
> > I tried it, it didn't fix the problem.
> 
> :(
> 
> It may very well be that this particular xHCI controller needs more than
> that 10ms from D3hot -> D0 transition. Again the PCIe spec says the 10ms
> is the minimum time system software needs to delay but it does not say
> what would be the maximum time the function absolutely must be properly
> in D0.

Hmm, PCIe r5.0, sec 2.3.1, says devices are permitted to return
Configuration Request Retry Status (CRS) after a "reset initiated in
response to a D3hot to D0uninitialized" transition.

I think that applies to this device because your lspci [1] shows:

	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

so No_Soft_Reset is clear, which means the D3hot to D0 transition goes
to D0uninitialized.

pci_raw_set_power_state() *does* delay, generally for 10ms:

  pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
  if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
    pci_dev_d3_sleep(dev);
  pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);

but there's no mention of CRS, so I think that config read is liable
to fail with CRS if the device isn't ready, and we won't notice.

I think we need something like the patch below.  We already do
basically the same thing in pci_pm_reset().

[1] https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e7982af9a5d8..e8702388830f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -883,9 +883,10 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 	 * Mandatory power management transition delays; see PCI PM 1.1
 	 * 5.6.1 table 18
 	 */
-	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
+	if (state == PCI_D3hot || dev->current_state == PCI_D3hot) {
 		pci_dev_d3_sleep(dev);
-	else if (state == PCI_D2 || dev->current_state == PCI_D2)
+		pci_dev_wait(dev, "D3 transition", PCIE_RESET_READY_POLL_MS);
+	} else if (state == PCI_D2 || dev->current_state == PCI_D2)
 		udelay(PCI_PM_D2_DELAY);
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
Daniel Drake Oct. 24, 2019, 3:28 a.m. UTC | #9
On Thu, Oct 24, 2019 at 6:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> I think we need something like the patch below.  We already do
> basically the same thing in pci_pm_reset().
>
> [1] https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e7982af9a5d8..e8702388830f 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -883,9 +883,10 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
>          * Mandatory power management transition delays; see PCI PM 1.1
>          * 5.6.1 table 18
>          */
> -       if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
> +       if (state == PCI_D3hot || dev->current_state == PCI_D3hot) {
>                 pci_dev_d3_sleep(dev);
> -       else if (state == PCI_D2 || dev->current_state == PCI_D2)
> +               pci_dev_wait(dev, "D3 transition", PCIE_RESET_READY_POLL_MS);
> +       } else if (state == PCI_D2 || dev->current_state == PCI_D2)
>                 udelay(PCI_PM_D2_DELAY);
>
>         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);

You also need to move the pci_dev_wait function definition higher up
in the file.
Tested and that doesn't help this case unfortunately. pci_dev_wait
doesn't do anything since PCI_COMMAND value at this point is 0x100403
:(

Daniel
Bjorn Helgaas Oct. 24, 2019, 5 p.m. UTC | #10
On Thu, Oct 24, 2019 at 11:28:59AM +0800, Daniel Drake wrote:
> On Thu, Oct 24, 2019 at 6:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > I think we need something like the patch below.  We already do
> > basically the same thing in pci_pm_reset().
> >
> > [1] https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index e7982af9a5d8..e8702388830f 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -883,9 +883,10 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
> >          * Mandatory power management transition delays; see PCI PM 1.1
> >          * 5.6.1 table 18
> >          */
> > -       if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
> > +       if (state == PCI_D3hot || dev->current_state == PCI_D3hot) {
> >                 pci_dev_d3_sleep(dev);
> > -       else if (state == PCI_D2 || dev->current_state == PCI_D2)
> > +               pci_dev_wait(dev, "D3 transition", PCIE_RESET_READY_POLL_MS);
> > +       } else if (state == PCI_D2 || dev->current_state == PCI_D2)
> >                 udelay(PCI_PM_D2_DELAY);
> >
> >         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> 
> You also need to move the pci_dev_wait function definition higher up
> in the file.

OK, that would need a little tweaking.  Also, we should only need this
for the D3hot->D0 transition and probably only when No_Soft_Reset is
clear.  We shouldn't need it for the D0->D3hot transition, since
that's not a reset.

> Tested and that doesn't help this case unfortunately. pci_dev_wait
> doesn't do anything since PCI_COMMAND value at this point is 0x100403

That's really strange.  Your original message showed:

  xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
  xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)

The first line is from pci_raw_set_power_state() reading PCI_PM_CTRL,
but we can't tell whether the read failed and we got ~0, or it
succeeded and we got something with just the low two bits set.  Can
you print out the whole value so we can see what happened?

The second line is from pci_enable_resources() reading PCI_COMMAND,
and it got *0*, not 0x0403 as you got from the CRS experiment.
Daniel Drake Oct. 25, 2019, 7:11 a.m. UTC | #11
On Fri, Oct 25, 2019 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> That's really strange.  Your original message showed:
>
>   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
>   xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
>
> The first line is from pci_raw_set_power_state() reading PCI_PM_CTRL,
> but we can't tell whether the read failed and we got ~0, or it
> succeeded and we got something with just the low two bits set.  Can
> you print out the whole value so we can see what happened?
>
> The second line is from pci_enable_resources() reading PCI_COMMAND,
> and it got *0*, not 0x0403 as you got from the CRS experiment.

Thanks for persisting here. In more detail:

pci_pm_resume_noirq
- pci_pm_default_resume_early
-- pci_raw_set_power_state(D0)

At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit
read) - so no wait.
pci_raw_set_power_state writes to PM_CTRL and then reads it back with value 0x3.
>   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3

At the point of return from pci_pm_resume_noirq, an extra check I
added shows that PCI_COMMAND has value 0x403 (16-bit read).
35ms later, pci_pm_resume is entered, and I checked that at this
point, PCI_COMMAND has value 0.
It then goes on to reach pci_enable_resources().
>   xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)

The change in PCI_COMMAND value is just down to timing.
At the end of pci_pm_resume_noirq(), if I log PCI_COMMAND, wait 10ms,
and log PCI_COMMAND again, I see it transition from 0x403 to 0.

Daniel
Bjorn Helgaas Oct. 25, 2019, 4:28 p.m. UTC | #12
On Fri, Oct 25, 2019 at 03:11:49PM +0800, Daniel Drake wrote:
> On Fri, Oct 25, 2019 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > That's really strange.  Your original message showed:
> >
> >   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> >   xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
> >
> > The first line is from pci_raw_set_power_state() reading PCI_PM_CTRL,
> > but we can't tell whether the read failed and we got ~0, or it
> > succeeded and we got something with just the low two bits set.  Can
> > you print out the whole value so we can see what happened?
> >
> > The second line is from pci_enable_resources() reading PCI_COMMAND,
> > and it got *0*, not 0x0403 as you got from the CRS experiment.
> 
> Thanks for persisting here. In more detail:
> 
> pci_pm_resume_noirq
> - pci_pm_default_resume_early
> -- pci_raw_set_power_state(D0)
> 
> At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit
> read) - so no wait.

Just thinking out loud here: This is before writing PCI_PM_CTRL.  The
device should be in D3hot and 0x100403 is PCI_COMMAND_IO |
PCI_COMMAND_MEMORY | PCI_COMMAND_INTX_DISABLE (and
PCI_STATUS_CAP_LIST), which mostly matches your lspci (it's missing
PCI_COMMAND_MASTER, but maybe that got turned off during suspend).
It's a little strange that PCI_COMMAND_IO is set because 03:00.3 has
no I/O BARs, but maybe that was set by BIOS at boot-time.

> pci_raw_set_power_state writes to PM_CTRL and then reads it back
> with value 0x3.

When you write D0 to PCI_PM_CTRL the device does a soft reset, so
pci_raw_set_power_state() delays before the next access.

When you read PCI_PM_CTRL again, I think you *should* get either
0x0000 (indicating that the device is in D0) or 0xffff (if the read
failed with a Config Request Retry Status (CRS) because the device
wasn't ready yet).

I can't explain why you would read 0x0003 (not 0xffff) from
PCI_PM_CTRL.

What happens if you do a dword read from PCI_VENDOR_ID here (after the
delay but before pci_dev_wait() or reading PCI_PM_CTRL)?

We have CRS "software visibility" enabled, and the expectation in the
spec is that software will read PCI_VENDOR_ID to see whether the
device is ready: 0x0001 means the read got a CRS completion (device
isn't ready), valid Vendor ID means device is ready, and 0xffff
indicates some other error.

pci_dev_wait() reads PCI_COMMAND, not PCI_VENDOR_ID, so maybe there's
some wrinkle in how that's handled.

You might also try changing pci_enable_crs() to disable
PCI_EXP_RTCTL_CRSSVE instead of enabling it to see if that makes any
difference.  CRS SV has kind of a checkered history and I'm a little
dubious about whether it buys us anything.

> >   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> 
> At the point of return from pci_pm_resume_noirq, an extra check I
> added shows that PCI_COMMAND has value 0x403 (16-bit read).

If PCI_COMMAND is non-zero at that point, I think something's wrong.
It should be zero by the time pci_raw_set_power_state() reads
PCI_PM_CTRL after the D3 delay.  By that time, we assume the reset has
happened and the device is in D0uninitialized and fully accessible.

> 35ms later, pci_pm_resume is entered, and I checked that at this
> point, PCI_COMMAND has value 0.
> It then goes on to reach pci_enable_resources().
> >   xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
> 
> The change in PCI_COMMAND value is just down to timing.
> At the end of pci_pm_resume_noirq(), if I log PCI_COMMAND, wait 10ms,
> and log PCI_COMMAND again, I see it transition from 0x403 to 0.
> 
> Daniel
Daniel Drake Oct. 28, 2019, 6:32 a.m. UTC | #13
On Sat, Oct 26, 2019 at 12:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > pci_pm_resume_noirq
> > - pci_pm_default_resume_early
> > -- pci_raw_set_power_state(D0)
> >
> > At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit
> > read) - so no wait.
>
> Just thinking out loud here: This is before writing PCI_PM_CTRL. The

It's not - it's after writing PCI_PM_CTRL, but before reading it back.

> device should be in D3hot and 0x100403 is PCI_COMMAND_IO |
> PCI_COMMAND_MEMORY | PCI_COMMAND_INTX_DISABLE (and
> PCI_STATUS_CAP_LIST), which mostly matches your lspci (it's missing
> PCI_COMMAND_MASTER, but maybe that got turned off during suspend).
> It's a little strange that PCI_COMMAND_IO is set because 03:00.3 has
> no I/O BARs, but maybe that was set by BIOS at boot-time.

I also checked PCI_COMMAND before writing PCI_PM_CTRL, it's the same
value 0x403.
Immediately after writing PCI_PM_CTRL, it holds the same value.
10ms later (after pci_dev_d3_sleep()), it holds the same value.
Another 10ms later, it has value 0.

> > pci_raw_set_power_state writes to PM_CTRL and then reads it back
> > with value 0x3.
>
> When you write D0 to PCI_PM_CTRL the device does a soft reset, so
> pci_raw_set_power_state() delays before the next access.
>
> When you read PCI_PM_CTRL again, I think you *should* get either
> 0x0000 (indicating that the device is in D0) or 0xffff (if the read
> failed with a Config Request Retry Status (CRS) because the device
> wasn't ready yet).

PCI_PM_CTRL stats with value 0x103.
Then 0 is written and pci_dev_d3_sleep() delays 10ms.
At this point it has value 0x3.
After an additional 10ms delay, it has value 0.

> I can't explain why you would read 0x0003 (not 0xffff) from
> PCI_PM_CTRL.
>
> What happens if you do a dword read from PCI_VENDOR_ID here (after the
> delay but before pci_dev_wait() or reading PCI_PM_CTRL)?

Vendor ID remains 0x1022 at all points.

> You might also try changing pci_enable_crs() to disable
> PCI_EXP_RTCTL_CRSSVE instead of enabling it to see if that makes any
> difference.  CRS SV has kind of a checkered history and I'm a little
> dubious about whether it buys us anything.

I stubbed out that register write which would have otherwise applied
to 8 PCI devices (but not the XHCI controllers), it still fails in the
same way unless the delay is increased.

> > >   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> >
> > At the point of return from pci_pm_resume_noirq, an extra check I
> > added shows that PCI_COMMAND has value 0x403 (16-bit read).
>
> If PCI_COMMAND is non-zero at that point, I think something's wrong.
> It should be zero by the time pci_raw_set_power_state() reads
> PCI_PM_CTRL after the D3 delay.  By that time, we assume the reset has
> happened and the device is in D0uninitialized and fully accessible.

It looks like we can detect that the reset has failed (or more
precisely, not quite completed) by reading PCI_COMMAND (value not yet
0) or PCI_PM_CTRL (doesn't yet indicate D0 state, we already log a
warning for this case).

From that angle, another workaround possibility is to catch that case
and then retry the PCI_PM_CTRL write and delay once more.

Daniel
Daniel Drake Nov. 18, 2019, 8:52 a.m. UTC | #14
Hi Bjorn,

Any further ideas here? Do we go ahead with the quirk or try to find a
more generic approach?

On Mon, Oct 28, 2019 at 2:32 PM Daniel Drake <drake@endlessm.com> wrote:
> It looks like we can detect that the reset has failed (or more
> precisely, not quite completed) by reading PCI_COMMAND (value not yet
> 0) or PCI_PM_CTRL (doesn't yet indicate D0 state, we already log a
> warning for this case).
>
> From that angle, another workaround possibility is to catch that case
> and then retry the PCI_PM_CTRL write and delay once more.

Thanks
Daniel

Patch
diff mbox series

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 320255e5e8f8..4570439a6a6c 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1871,19 +1871,35 @@  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x2609, quirk_intel_pcie_pm);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260a, quirk_intel_pcie_pm);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260b, quirk_intel_pcie_pm);
 
+static void quirk_d3_delay(struct pci_dev *dev, unsigned int delay)
+{
+	if (dev->d3_delay >= delay)
+		return;
+
+	dev->d3_delay = delay;
+	pci_info(dev, "extending delay after power-on from D3 to %d msec\n",
+		 dev->d3_delay);
+}
+
 static void quirk_radeon_pm(struct pci_dev *dev)
 {
 	if (dev->subsystem_vendor == PCI_VENDOR_ID_APPLE &&
-	    dev->subsystem_device == 0x00e2) {
-		if (dev->d3_delay < 20) {
-			dev->d3_delay = 20;
-			pci_info(dev, "extending delay after power-on from D3 to %d msec\n",
-				 dev->d3_delay);
-		}
-	}
+	    dev->subsystem_device == 0x00e2)
+		quirk_d3_delay(dev, 20);
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm);
 
+/*
+ * Ryzen7 XHCI controllers fail upon resume from runtime suspend or s2idle
+ * unless an extended D3 delay is used.
+ */
+static void quirk_ryzen_xhci_d3(struct pci_dev *dev)
+{
+	quirk_d3_delay(dev, 20);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3);
+
 #ifdef CONFIG_X86_IO_APIC
 static int dmi_disable_ioapicreroute(const struct dmi_system_id *d)
 {