PCI: pciehp: Check for PCIe capabilities change after resume

Submitted by Bjorn Helgaas on Nov. 10, 2016, 6:55 p.m.

Details

Message ID 20161110185520.19461.9789.stgit@bhelgaas-glaptop.roam.corp.google.com
State New
Headers show

Commit Message

Bjorn Helgaas Nov. 10, 2016, 6:55 p.m.
Len Brown reported that resume on a Dell XPS11 laptop takes longer than it
should.  The delay is caused by pciehp scanning for a device below a Root
Port that has nothing connected to it.

At boot-time, the 00:1c.0 Root Port's PCI Express Capabilities Register
(PCIe spec r3.0, sec 7.8.2) advertises a slot (PCI_EXP_FLAGS_SLOT is set),
so pciehp claims the port.

At resume-time, PCI_EXP_FLAGS_SLOT is clear, so the Root Port no longer
advertises a slot.  But pciehp doesn't notice that change, and it reads
Slot Status to see if anything changed.  Slot Status says a device is
present (Ports not connected to slots are required to report "Card Present"
per sec 7.8.11), so pciehp tries to bring up the link and scan for the
device, which accounts for the delay.

Per sec 7.8.2, the PCIe Capabilities Register is all read-only, so I
think the fact that it changes between boot- and resume-time is a
firmware defect.

Work around this by re-reading the Capabilites at resume-time and updating
the cached copy in pci_dev->pcie_flags_reg.  Then stop using pciehp on the
port if it no longer advertises a slot.

Reported-and-tested-by: Len Brown <lenb@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99751
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp_core.c |   10 ++++++++++
 drivers/pci/pci-driver.c          |   13 +++++++++++++
 2 files changed, 23 insertions(+)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rajat Jain Nov. 10, 2016, 7:46 p.m.
On Thu, Nov 10, 2016 at 10:55 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Len Brown reported that resume on a Dell XPS11 laptop takes longer than it
> should.  The delay is caused by pciehp scanning for a device below a Root
> Port that has nothing connected to it.
>
> At boot-time, the 00:1c.0 Root Port's PCI Express Capabilities Register
> (PCIe spec r3.0, sec 7.8.2) advertises a slot (PCI_EXP_FLAGS_SLOT is set),
> so pciehp claims the port.
>
> At resume-time, PCI_EXP_FLAGS_SLOT is clear, so the Root Port no longer
> advertises a slot.  But pciehp doesn't notice that change, and it reads
> Slot Status to see if anything changed.  Slot Status says a device is
> present (Ports not connected to slots are required to report "Card Present"
> per sec 7.8.11), so pciehp tries to bring up the link and scan for the
> device, which accounts for the delay.
>
> Per sec 7.8.2, the PCIe Capabilities Register is all read-only, so I
> think the fact that it changes between boot- and resume-time is a
> firmware defect.
>
> Work around this by re-reading the Capabilites at resume-time and updating
> the cached copy in pci_dev->pcie_flags_reg.  Then stop using pciehp on the
> port if it no longer advertises a slot.
>
> Reported-and-tested-by: Len Brown <lenb@kernel.org>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99751
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

I wish we could handle such firmware bugs using quirks
(pci_fixup_resume_early), but it looks difficult for this one :-(

Reviewed-by: Rajat Jain <rajatja@google.com>

> ---
>  drivers/pci/hotplug/pciehp_core.c |   10 ++++++++++
>  drivers/pci/pci-driver.c          |   13 +++++++++++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index 7d32fa33..f5461cb 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -278,6 +278,9 @@ static void pciehp_remove(struct pcie_device *dev)
>  {
>         struct controller *ctrl = get_service_data(dev);
>
> +       if (!ctrl)
> +               return;
> +
>         cleanup_slot(ctrl);
>         pciehp_release_ctrl(ctrl);
>  }
> @@ -296,6 +299,13 @@ static int pciehp_resume(struct pcie_device *dev)
>
>         ctrl = get_service_data(dev);
>
> +       if (!(pcie_caps_reg(dev->port) & PCI_EXP_FLAGS_SLOT)) {
> +               dev_info(&dev->port->dev, "No longer supports hotplug\n");

May be add a comment here, that this is only to work around a firmware bug?

> +               pciehp_remove(dev);
> +               set_service_data(dev, NULL);
> +               return 0;
> +       }
> +
>         /* reinitialize the chipset's event detection logic */
>         pcie_enable_notification(ctrl);
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 1ccce1c..fe8e964 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -505,7 +505,20 @@ static int pci_restore_standard_config(struct pci_dev *pci_dev)
>
>  static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
>  {
> +       u16 flags;
> +
>         pci_power_up(pci_dev);
> +
> +       if (pci_dev->pcie_cap) {

may consider using accessor functions here (pci_is_pcie() /
pcie_caps_reg() etc).

> +               pci_read_config_word(pci_dev, pci_dev->pcie_cap + PCI_EXP_FLAGS,
> +                                    &flags);
> +               if (pci_dev->pcie_flags_reg != flags) {
> +                       dev_info(&pci_dev->dev, "PCIe Capabilities was %#06x, is %#06x after resume (possible firmware defect)\n",
> +                                pci_dev->pcie_flags_reg, flags);
> +                       pci_dev->pcie_flags_reg = flags;
> +               }
> +       }
> +
>         pci_restore_state(pci_dev);
>         pci_fixup_device(pci_fixup_resume_early, pci_dev);
>  }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Nov. 10, 2016, 11:18 p.m.
On Thu, Nov 10, 2016 at 12:55:20PM -0600, Bjorn Helgaas wrote:
> Len Brown reported that resume on a Dell XPS11 laptop takes longer than it
> should.  The delay is caused by pciehp scanning for a device below a Root
> Port that has nothing connected to it.
> 
> At boot-time, the 00:1c.0 Root Port's PCI Express Capabilities Register
> (PCIe spec r3.0, sec 7.8.2) advertises a slot (PCI_EXP_FLAGS_SLOT is set),
> so pciehp claims the port.
> 
> At resume-time, PCI_EXP_FLAGS_SLOT is clear, so the Root Port no longer
> advertises a slot.  But pciehp doesn't notice that change, and it reads
> Slot Status to see if anything changed.  Slot Status says a device is
> present (Ports not connected to slots are required to report "Card Present"
> per sec 7.8.11), so pciehp tries to bring up the link and scan for the
> device, which accounts for the delay.
> 
> Per sec 7.8.2, the PCIe Capabilities Register is all read-only, so I
> think the fact that it changes between boot- and resume-time is a
> firmware defect.
> 
> Work around this by re-reading the Capabilites at resume-time and updating
> the cached copy in pci_dev->pcie_flags_reg.  Then stop using pciehp on the
> port if it no longer advertises a slot.
> 
> Reported-and-tested-by: Len Brown <lenb@kernel.org>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99751
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/hotplug/pciehp_core.c |   10 ++++++++++
>  drivers/pci/pci-driver.c          |   13 +++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index 7d32fa33..f5461cb 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -278,6 +278,9 @@ static void pciehp_remove(struct pcie_device *dev)
>  {
>  	struct controller *ctrl = get_service_data(dev);
>  
> +	if (!ctrl)
> +		return;
> +
>  	cleanup_slot(ctrl);
>  	pciehp_release_ctrl(ctrl);
>  }
> @@ -296,6 +299,13 @@ static int pciehp_resume(struct pcie_device *dev)
>  
>  	ctrl = get_service_data(dev);
>  
> +	if (!(pcie_caps_reg(dev->port) & PCI_EXP_FLAGS_SLOT)) {
> +		dev_info(&dev->port->dev, "No longer supports hotplug\n");
> +		pciehp_remove(dev);
> +		set_service_data(dev, NULL);
> +		return 0;
> +	}
> +
>  	/* reinitialize the chipset's event detection logic */
>  	pcie_enable_notification(ctrl);
>  
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 1ccce1c..fe8e964 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -505,7 +505,20 @@ static int pci_restore_standard_config(struct pci_dev *pci_dev)
>  
>  static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
>  {
> +	u16 flags;
> +
>  	pci_power_up(pci_dev);
> +
> +	if (pci_dev->pcie_cap) {
> +		pci_read_config_word(pci_dev, pci_dev->pcie_cap + PCI_EXP_FLAGS,
> +				     &flags);
> +		if (pci_dev->pcie_flags_reg != flags) {
> +			dev_info(&pci_dev->dev, "PCIe Capabilities was %#06x, is %#06x after resume (possible firmware defect)\n",
> +				 pci_dev->pcie_flags_reg, flags);
> +			pci_dev->pcie_flags_reg = flags;
> +		}
> +	}
> +

Is there a reason that this must happen at ->resume_noirq time rather than
->resume time?

If not, you could move this to pciehp_resume(), no?

If yes, I think the above is suboptimal because it is executed for any
decice, even though it only concerns hotplug ports handled by pciehp.
(What if pciehp isn't compiled in at all?)  A better way would IMHO be to:

- add a ->resume_noirq callback to struct pcie_port_service_driver,
- amend pcie_port_resume_noirq() to iterate over all port services and
  call down to the ->resume_noirq callback of each, just like in
  pcie_port_device_resume(),
- declare a ->resume_noirq callback for pciehp containing the above code.

Thanks,

Lukas

>  	pci_restore_state(pci_dev);
>  	pci_fixup_device(pci_fixup_resume_early, pci_dev);
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 11, 2016, 12:24 a.m.
On Fri, Nov 11, 2016 at 12:18:08AM +0100, Lukas Wunner wrote:
> On Thu, Nov 10, 2016 at 12:55:20PM -0600, Bjorn Helgaas wrote:
> > Len Brown reported that resume on a Dell XPS11 laptop takes longer than it
> > should.  The delay is caused by pciehp scanning for a device below a Root
> > Port that has nothing connected to it.
> > 
> > At boot-time, the 00:1c.0 Root Port's PCI Express Capabilities Register
> > (PCIe spec r3.0, sec 7.8.2) advertises a slot (PCI_EXP_FLAGS_SLOT is set),
> > so pciehp claims the port.
> > 
> > At resume-time, PCI_EXP_FLAGS_SLOT is clear, so the Root Port no longer
> > advertises a slot.  But pciehp doesn't notice that change, and it reads
> > Slot Status to see if anything changed.  Slot Status says a device is
> > present (Ports not connected to slots are required to report "Card Present"
> > per sec 7.8.11), so pciehp tries to bring up the link and scan for the
> > device, which accounts for the delay.
> > 
> > Per sec 7.8.2, the PCIe Capabilities Register is all read-only, so I
> > think the fact that it changes between boot- and resume-time is a
> > firmware defect.
> > 
> > Work around this by re-reading the Capabilites at resume-time and updating
> > the cached copy in pci_dev->pcie_flags_reg.  Then stop using pciehp on the
> > port if it no longer advertises a slot.
> > 
> > Reported-and-tested-by: Len Brown <lenb@kernel.org>
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=99751
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > ---
> >  drivers/pci/hotplug/pciehp_core.c |   10 ++++++++++
> >  drivers/pci/pci-driver.c          |   13 +++++++++++++
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> > index 7d32fa33..f5461cb 100644
> > --- a/drivers/pci/hotplug/pciehp_core.c
> > +++ b/drivers/pci/hotplug/pciehp_core.c
> > @@ -278,6 +278,9 @@ static void pciehp_remove(struct pcie_device *dev)
> >  {
> >  	struct controller *ctrl = get_service_data(dev);
> >  
> > +	if (!ctrl)
> > +		return;
> > +
> >  	cleanup_slot(ctrl);
> >  	pciehp_release_ctrl(ctrl);
> >  }
> > @@ -296,6 +299,13 @@ static int pciehp_resume(struct pcie_device *dev)
> >  
> >  	ctrl = get_service_data(dev);
> >  
> > +	if (!(pcie_caps_reg(dev->port) & PCI_EXP_FLAGS_SLOT)) {
> > +		dev_info(&dev->port->dev, "No longer supports hotplug\n");
> > +		pciehp_remove(dev);
> > +		set_service_data(dev, NULL);
> > +		return 0;
> > +	}
> > +
> >  	/* reinitialize the chipset's event detection logic */
> >  	pcie_enable_notification(ctrl);
> >  
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 1ccce1c..fe8e964 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -505,7 +505,20 @@ static int pci_restore_standard_config(struct pci_dev *pci_dev)
> >  
> >  static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
> >  {
> > +	u16 flags;
> > +
> >  	pci_power_up(pci_dev);
> > +
> > +	if (pci_dev->pcie_cap) {
> > +		pci_read_config_word(pci_dev, pci_dev->pcie_cap + PCI_EXP_FLAGS,
> > +				     &flags);
> > +		if (pci_dev->pcie_flags_reg != flags) {
> > +			dev_info(&pci_dev->dev, "PCIe Capabilities was %#06x, is %#06x after resume (possible firmware defect)\n",
> > +				 pci_dev->pcie_flags_reg, flags);
> > +			pci_dev->pcie_flags_reg = flags;
> > +		}
> > +	}
> > +
> 
> Is there a reason that this must happen at ->resume_noirq time rather than
> ->resume time?

PM is black magic to me.  I put it here because Rafael said
pci_power_up() is called for every PCI device during resume.  Possibly
another location would make more sense (please suggest one, if so).

One reason I put it here rather than in pciehp is because pciehp
shouldn't be responsible for updating the pci_dev->pcie_flags_reg
cached value.  That's used by the generic pcie_caps_reg() interface,
which is in turn used in the pcie_capability_read/write*() paths to
figure out whether the PCIe capability includes slot registers.

> If not, you could move this to pciehp_resume(), no?
> 
> If yes, I think the above is suboptimal because it is executed for any
> decice, even though it only concerns hotplug ports handled by pciehp.
> (What if pciehp isn't compiled in at all?)

I'm a little gun-shy because as far as I can tell, this bit isn't
supposed to change, and I don't know what *else* might change.  We'd
be totally hosed if the Device/Port Type changed, so I guess we don't
have to worry about that.  The Interrupt Message Number is nominally
RO, but hardware is explicitly required to update it in some cases, so
I could imagine it changing at resume, and I guess we should check how
we use that.

But you're a PM expert so maybe you can answer this:
PCI_EXP_FLAGS_SLOT is "HwInit" and per spec, can only be reset by a
Fundamental Reset.  Does a suspend/resume involve a Fundamental Reset?

If so, then pciehp (and maybe other things in PCI) needs to do much
more than just this.  For example, maybe we booted with
PCI_EXP_FLAGS_SLOT clear, and it becomes *set* when we resume.  I
don't think we would handle that case at all.

> A better way would IMHO be to:
> 
> - add a ->resume_noirq callback to struct pcie_port_service_driver,
> - amend pcie_port_resume_noirq() to iterate over all port services and
>   call down to the ->resume_noirq callback of each, just like in
>   pcie_port_device_resume(),
> - declare a ->resume_noirq callback for pciehp containing the above code.
> 
> Thanks,
> 
> Lukas
> 
> >  	pci_restore_state(pci_dev);
> >  	pci_fixup_device(pci_fixup_resume_early, pci_dev);
> >  }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Nov. 11, 2016, 6:44 a.m.
On Thu, Nov 10, 2016 at 06:24:48PM -0600, Bjorn Helgaas wrote:
> On Fri, Nov 11, 2016 at 12:18:08AM +0100, Lukas Wunner wrote:
> > On Thu, Nov 10, 2016 at 12:55:20PM -0600, Bjorn Helgaas wrote:
> > > Len Brown reported that resume on a Dell XPS11 laptop takes longer than it
> > > should.  The delay is caused by pciehp scanning for a device below a Root
> > > Port that has nothing connected to it.
> > > 
> > > At boot-time, the 00:1c.0 Root Port's PCI Express Capabilities Register
> > > (PCIe spec r3.0, sec 7.8.2) advertises a slot (PCI_EXP_FLAGS_SLOT is set),
> > > so pciehp claims the port.
> > > 
> > > At resume-time, PCI_EXP_FLAGS_SLOT is clear, so the Root Port no longer
> > > advertises a slot.  But pciehp doesn't notice that change, and it reads
> > > Slot Status to see if anything changed.  Slot Status says a device is
> > > present (Ports not connected to slots are required to report "Card Present"
> > > per sec 7.8.11), so pciehp tries to bring up the link and scan for the
> > > device, which accounts for the delay.
> > > 
> > > Per sec 7.8.2, the PCIe Capabilities Register is all read-only, so I
> > > think the fact that it changes between boot- and resume-time is a
> > > firmware defect.
> > > 
> > > Work around this by re-reading the Capabilites at resume-time and updating
> > > the cached copy in pci_dev->pcie_flags_reg.  Then stop using pciehp on the
> > > port if it no longer advertises a slot.
> > > 
> > > Reported-and-tested-by: Len Brown <lenb@kernel.org>
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=99751
> > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > > ---
[snip]
> > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > > index 1ccce1c..fe8e964 100644
> > > --- a/drivers/pci/pci-driver.c
> > > +++ b/drivers/pci/pci-driver.c
> > > @@ -505,7 +505,20 @@ static int pci_restore_standard_config(struct pci_dev *pci_dev)
> > >  
> > >  static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
> > >  {
> > > +	u16 flags;
> > > +
> > >  	pci_power_up(pci_dev);
> > > +
> > > +	if (pci_dev->pcie_cap) {
> > > +		pci_read_config_word(pci_dev, pci_dev->pcie_cap + PCI_EXP_FLAGS,
> > > +				     &flags);
> > > +		if (pci_dev->pcie_flags_reg != flags) {
> > > +			dev_info(&pci_dev->dev, "PCIe Capabilities was %#06x, is %#06x after resume (possible firmware defect)\n",
> > > +				 pci_dev->pcie_flags_reg, flags);
> > > +			pci_dev->pcie_flags_reg = flags;
> > > +		}
> > > +	}
> > > +
> > 
> > Is there a reason that this must happen at ->resume_noirq time rather than
> > ->resume time?
> 
> PM is black magic to me.  I put it here because Rafael said
> pci_power_up() is called for every PCI device during resume.  Possibly
> another location would make more sense (please suggest one, if so).

I think Rajat Jain had the right idea to put this in a quirk.
A port losing the slot capability over system sleep seems like
a total anomaly and nothing that we should need to worry about
in regular PCI code.

By setting the is_hotplug_bridge to false, pciehp can be prevented
from binding to this port in the first place.

That quirk needs to be executed after is_hotplug_bridge is set
and before pciehp_resume() is called for the first time. This
simple one-liner might already be sufficient, there's already a
quirk_hotplug_bridge() function in quirks.c, this could go right
next to it:


/*
 * Intel Haswell-ULT root port 1 loses slot capability over system sleep,
 * causing pciehp to fail on resume.
 */
static void quirk_no_hotplug_bridge(struct pci_dev *dev)
{
	dev->is_hotplug_bridge = 0;
}
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9c10, quirk_no_hotplug_bridge);


> I'm a little gun-shy because as far as I can tell, this bit isn't
> supposed to change, and I don't know what *else* might change.  We'd
> be totally hosed if the Device/Port Type changed, so I guess we don't
> have to worry about that.  The Interrupt Message Number is nominally
> RO, but hardware is explicitly required to update it in some cases, so
> I could imagine it changing at resume, and I guess we should check how
> we use that.
> 
> But you're a PM expert so maybe you can answer this:
> PCI_EXP_FLAGS_SLOT is "HwInit" and per spec, can only be reset by a
> Fundamental Reset.  Does a suspend/resume involve a Fundamental Reset?

If the machine went into S3 then I'd say yes because power went away and
is reapplied.

However Linux also supports "suspend-to-idle", i.e. all devices are
put into D3 but the machine isn't suspended using the platform.
When resuming from suspend-to-idle, the devices just return to D0
and there's no reset.  The same ->suspend and ->resume hooks are
executed regardless if the machine is going to suspend-to-idle or
S3.

To me this looks like an erratum of this Haswell-ULT revision (the port
has revision e4) wherein the slot capability bit is set incorrectly
after platform suspend.  However to know for sure you'd have to ask an
Intel hardware engineer how the config space is populated when the
system is initially powered up versus resumes from S3.

Best regards,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Nov. 11, 2016, 6:47 a.m.
On Fri, Nov 11, 2016 at 07:44:17AM +0100, Lukas Wunner wrote:
> That quirk needs to be executed after is_hotplug_bridge is set
> and before pciehp_resume() is called for the first time.

Sorry I meant "before pcie portdrv binds to the port".

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Nov. 11, 2016, 3:49 p.m.
On Fri, Nov 11, 2016 at 07:44:17AM +0100, Lukas Wunner wrote:
> On Thu, Nov 10, 2016 at 06:24:48PM -0600, Bjorn Helgaas wrote:
> > On Fri, Nov 11, 2016 at 12:18:08AM +0100, Lukas Wunner wrote:
> > > On Thu, Nov 10, 2016 at 12:55:20PM -0600, Bjorn Helgaas wrote:
> > > > Len Brown reported that resume on a Dell XPS11 laptop takes longer than it
> > > > should.  The delay is caused by pciehp scanning for a device below a Root
> > > > Port that has nothing connected to it.
> > > > 
> > > > At boot-time, the 00:1c.0 Root Port's PCI Express Capabilities Register
> > > > (PCIe spec r3.0, sec 7.8.2) advertises a slot (PCI_EXP_FLAGS_SLOT is set),
> > > > so pciehp claims the port.
> > > > 
> > > > At resume-time, PCI_EXP_FLAGS_SLOT is clear, so the Root Port no longer
> > > > advertises a slot.  But pciehp doesn't notice that change, and it reads
> > > > Slot Status to see if anything changed.  Slot Status says a device is
> > > > present (Ports not connected to slots are required to report "Card Present"
> > > > per sec 7.8.11), so pciehp tries to bring up the link and scan for the
> > > > device, which accounts for the delay.
> > > > 
> > > > Per sec 7.8.2, the PCIe Capabilities Register is all read-only, so I
> > > > think the fact that it changes between boot- and resume-time is a
> > > > firmware defect.
> > > > 
> > > > Work around this by re-reading the Capabilites at resume-time and updating
> > > > the cached copy in pci_dev->pcie_flags_reg.  Then stop using pciehp on the
> > > > port if it no longer advertises a slot.
> > > > 
> > > > Reported-and-tested-by: Len Brown <lenb@kernel.org>
> > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=99751
> > > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > > > ---
> [snip]
> > > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > > > index 1ccce1c..fe8e964 100644
> > > > --- a/drivers/pci/pci-driver.c
> > > > +++ b/drivers/pci/pci-driver.c
> > > > @@ -505,7 +505,20 @@ static int pci_restore_standard_config(struct pci_dev *pci_dev)
> > > >  
> > > >  static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
> > > >  {
> > > > +	u16 flags;
> > > > +
> > > >  	pci_power_up(pci_dev);
> > > > +
> > > > +	if (pci_dev->pcie_cap) {
> > > > +		pci_read_config_word(pci_dev, pci_dev->pcie_cap + PCI_EXP_FLAGS,
> > > > +				     &flags);
> > > > +		if (pci_dev->pcie_flags_reg != flags) {
> > > > +			dev_info(&pci_dev->dev, "PCIe Capabilities was %#06x, is %#06x after resume (possible firmware defect)\n",
> > > > +				 pci_dev->pcie_flags_reg, flags);
> > > > +			pci_dev->pcie_flags_reg = flags;
> > > > +		}
> > > > +	}
> > > > +
> > > 
> > > Is there a reason that this must happen at ->resume_noirq time rather than
> > > ->resume time?
> > 
> > PM is black magic to me.  I put it here because Rafael said
> > pci_power_up() is called for every PCI device during resume.  Possibly
> > another location would make more sense (please suggest one, if so).
> 
> I think Rajat Jain had the right idea to put this in a quirk.
> A port losing the slot capability over system sleep seems like
> a total anomaly and nothing that we should need to worry about
> in regular PCI code.
> 
> By setting the is_hotplug_bridge to false, pciehp can be prevented
> from binding to this port in the first place.
> 
> That quirk needs to be executed after is_hotplug_bridge is set
> and before pciehp_resume() is called for the first time. This
> simple one-liner might already be sufficient, there's already a
> quirk_hotplug_bridge() function in quirks.c, this could go right
> next to it:
> 
> 
> /*
>  * Intel Haswell-ULT root port 1 loses slot capability over system sleep,
>  * causing pciehp to fail on resume.
>  */
> static void quirk_no_hotplug_bridge(struct pci_dev *dev)
> {
> 	dev->is_hotplug_bridge = 0;
> }
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9c10, quirk_no_hotplug_bridge);

I've sent out the above before reading all of the bugzilla comments.
Len Brown and Yinghai Lu confirm there that it's a silicon bug.

It also occured to me that this closely resembles the issue Chen Yu
was working on regarding failing poweroff on 2015 MacBook Pros:
That was caused by the exact same root port (00:1c.0) claiming to
be a hotplug port and Chen Yu's solution was identical to what
I've suggested above.  The root port in that case had device ID
0x8c10, the above has 0x9c10.  Both are Lynx Point chipsets
(meant to be used with Haswell CPUs):
https://lkml.org/lkml/2016/8/19/248

It would be great if both (apparently related) issues could be
fixed in one go, and this looks like stable material to me.
Adding Chen Yu to cc.

Thanks!

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 11, 2016, 6:28 p.m.
On Fri, Nov 11, 2016 at 04:49:19PM +0100, Lukas Wunner wrote:
> On Fri, Nov 11, 2016 at 07:44:17AM +0100, Lukas Wunner wrote:

> > /*
> >  * Intel Haswell-ULT root port 1 loses slot capability over system sleep,
> >  * causing pciehp to fail on resume.
> >  */
> > static void quirk_no_hotplug_bridge(struct pci_dev *dev)
> > {
> > 	dev->is_hotplug_bridge = 0;
> > }
> > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9c10, quirk_no_hotplug_bridge);
> 
> I've sent out the above before reading all of the bugzilla comments.
> Len Brown and Yinghai Lu confirm there that it's a silicon bug.

I'd love to be proven wrong, but I don't believe it's a silicon bug.
All we have is Yinghai's vague assertion with no erratum and no
details to back it up.

As I read it, the response Len got from the validation team (comment
#22) does not confirm a silicon bug.  It merely restates the fact that
the PCIe spec requires that Presence Detect State be hardwired to 1b
if Slot Implemented is 0b (PCIe spec r3.0, sec 7.8.11).  It also
quotes this language from an Intel spec:

  Slot Implemented (SI) - R/WO. Indicates whether the root port is
  connected to a slot.  Slot support is platform specific.  BIOS
  programs this field, and it is maintained until a platform reset.

I found this in the "Intel 8 Series/C220 Series Chipset Family
Platform Controller Hub (PCH) Datasheet", May 2014, sec 19.1.24.
Technically this spec actually covers the Dell [8086:9c10] device, not
the MacBook Pro [8086:8c10] device, but the Intel validation folks say
it applies to the Dell as well.

That suggests to me that it's a Dell BIOS bug: BIOS should have
programmed Slot Implemented the same way for initial boot and for
resume, but it did not.

We could do a quirk for [8086:9c10] as long as it was qualified by
some sort of DMI check.  I don't think we could turn off hotplug for
all [8086:9c10] root ports.  The data I see says the hardware is
working per spec, and it's consistent with the PCIe spec.

I do like the idea of a quirk much more than mucking around in pciehp.
However, I think we still should account for the PCI_EXP_FLAGS_SLOT
change somehow.  If we do nothing, the accessors will still assume the
slot registers exist after resume, but the hardware will return
different results when we read them (PCIe sec 7.8 says that except for
Presence Detect State, the slot registers should be hardwired to zero
if Slot Implemented is zero).

Slot Implemented is defined as "R/WO".  The Intel spec (sec 9) says it
becomes read-only after the first write.  If the BIOS didn't write it,
I wonder if an OS quirk that runs after resume could still write it,
or if there's some other locking mechanism involved.  If an OS quirk
could set Slot Implemented, the way it was at initial boot, everything
should just work.  Presence Detect State (sec 19.1.33) should then be
0b, indicating the slot is empty, so pciehp wouldn't try to bring up
the link.

> It also occured to me that this closely resembles the issue Chen Yu
> was working on regarding failing poweroff on 2015 MacBook Pros:
> That was caused by the exact same root port (00:1c.0) claiming to
> be a hotplug port and Chen Yu's solution was identical to what
> I've suggested above.  The root port in that case had device ID
> 0x8c10, the above has 0x9c10.  Both are Lynx Point chipsets
> (meant to be used with Haswell CPUs):
> https://lkml.org/lkml/2016/8/19/248
> 
> It would be great if both (apparently related) issues could be
> fixed in one go, and this looks like stable material to me.

The MacBook Pro issue is definitely stable material.  A 2 second
speedup in resume is nice, but fails the criteria in
Documentation/stable_kernel_rules.txt.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Nov. 14, 2016, 12:10 p.m.
On Fri, Nov 11, 2016 at 12:28:31PM -0600, Bjorn Helgaas wrote:
> On Fri, Nov 11, 2016 at 04:49:19PM +0100, Lukas Wunner wrote:
> I'd love to be proven wrong, but I don't believe it's a silicon bug.
> All we have is Yinghai's vague assertion with no erratum and no
> details to back it up.
> 
> As I read it, the response Len got from the validation team (comment
> #22) does not confirm a silicon bug.  It merely restates the fact that
> the PCIe spec requires that Presence Detect State be hardwired to 1b
> if Slot Implemented is 0b (PCIe spec r3.0, sec 7.8.11).  It also
> quotes this language from an Intel spec:
> 
>   Slot Implemented (SI) - R/WO. Indicates whether the root port is
>   connected to a slot.  Slot support is platform specific.  BIOS
>   programs this field, and it is maintained until a platform reset.
> 
> I found this in the "Intel 8 Series/C220 Series Chipset Family
> Platform Controller Hub (PCH) Datasheet", May 2014, sec 19.1.24.
> Technically this spec actually covers the Dell [8086:9c10] device, not
> the MacBook Pro [8086:8c10] device, but the Intel validation folks say
> it applies to the Dell as well.
> 
> That suggests to me that it's a Dell BIOS bug: BIOS should have
> programmed Slot Implemented the same way for initial boot and for
> resume, but it did not.

Hm, sounds plausible.


> We could do a quirk for [8086:9c10] as long as it was qualified by
> some sort of DMI check.  I don't think we could turn off hotplug for
> all [8086:9c10] root ports.  The data I see says the hardware is
> working per spec, and it's consistent with the PCIe spec.
> 
> I do like the idea of a quirk much more than mucking around in pciehp.
> However, I think we still should account for the PCI_EXP_FLAGS_SLOT
> change somehow.  If we do nothing, the accessors will still assume the
> slot registers exist after resume, but the hardware will return
> different results when we read them (PCIe sec 7.8 says that except for
> Presence Detect State, the slot registers should be hardwired to zero
> if Slot Implemented is zero).
> 
> Slot Implemented is defined as "R/WO".  The Intel spec (sec 9) says it
> becomes read-only after the first write.  If the BIOS didn't write it,
> I wonder if an OS quirk that runs after resume could still write it,
> or if there's some other locking mechanism involved.  If an OS quirk
> could set Slot Implemented, the way it was at initial boot, everything
> should just work.  Presence Detect State (sec 19.1.33) should then be
> 0b, indicating the slot is empty, so pciehp wouldn't try to bring up
> the link.

Len could try "setpci -s 00:1c.0 42.w=142" after resume to set the
Slot Implemented bit.

Then use "setpci -s 00:1c.0 42.w" to test if the bit was written.

If this works, it could go into a DECLARE_PCI_FIXUP_RESUME_EARLY() quirk.

If this doesn't work, the DECLARE_PCI_FIXUP_HEADER() would have to clear
not just the is_hotplug_bridge bit (to prevent pciehp from binding) but
also the Slot Implemented bit in the cached pcie_flags_reg word.

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch hide | download patch | download mbox

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 7d32fa33..f5461cb 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -278,6 +278,9 @@  static void pciehp_remove(struct pcie_device *dev)
 {
 	struct controller *ctrl = get_service_data(dev);
 
+	if (!ctrl)
+		return;
+
 	cleanup_slot(ctrl);
 	pciehp_release_ctrl(ctrl);
 }
@@ -296,6 +299,13 @@  static int pciehp_resume(struct pcie_device *dev)
 
 	ctrl = get_service_data(dev);
 
+	if (!(pcie_caps_reg(dev->port) & PCI_EXP_FLAGS_SLOT)) {
+		dev_info(&dev->port->dev, "No longer supports hotplug\n");
+		pciehp_remove(dev);
+		set_service_data(dev, NULL);
+		return 0;
+	}
+
 	/* reinitialize the chipset's event detection logic */
 	pcie_enable_notification(ctrl);
 
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 1ccce1c..fe8e964 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -505,7 +505,20 @@  static int pci_restore_standard_config(struct pci_dev *pci_dev)
 
 static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 {
+	u16 flags;
+
 	pci_power_up(pci_dev);
+
+	if (pci_dev->pcie_cap) {
+		pci_read_config_word(pci_dev, pci_dev->pcie_cap + PCI_EXP_FLAGS,
+				     &flags);
+		if (pci_dev->pcie_flags_reg != flags) {
+			dev_info(&pci_dev->dev, "PCIe Capabilities was %#06x, is %#06x after resume (possible firmware defect)\n",
+				 pci_dev->pcie_flags_reg, flags);
+			pci_dev->pcie_flags_reg = flags;
+		}
+	}
+
 	pci_restore_state(pci_dev);
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 }