diff mbox

PCI: pciehp: Include the Data Link Layer State Changed bit when clearing the Slot Status register's event bits

Message ID 20140617192734.30783.71260.stgit@amt.stowe
State Accepted
Headers show

Commit Message

Myron Stowe June 17, 2014, 7:27 p.m. UTC
During PCIe hot-plug initialization - pciehp_probe - data structures
related to slot capabilities are set up.  As part of this set up, ISRs are
put in place to handle slot events and all event bits are cleared out.

This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
Slot Status bit to the event bits that are cleared out during
initialization.

Reference:
  PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
  (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
---

 drivers/pci/hotplug/pciehp_hpc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bjorn Helgaas June 17, 2014, 9:07 p.m. UTC | #1
On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
> During PCIe hot-plug initialization - pciehp_probe - data structures
> related to slot capabilities are set up.  As part of this set up, ISRs are
> put in place to handle slot events and all event bits are cleared out.
>
> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
> Slot Status bit to the event bits that are cleared out during
> initialization.

I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
notifications for hot-plug and removal").  Prior to that, pcie_isr()
didn't look at the PCI_EXP_SLTSTA_DLLSC bit.

Apparently there's a non-public report of spurious messages like this
at boot-time, with no actual hotplug events:

  pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
  pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
0000:83:00, cannot hot-add
  pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00

Device 0000:83:00.0 was enumerated previously, during the normal PCI
device enumeration.  I suspect DLLSC was set by the hardware when the
link came up after power-on, and it remained set until Linux booted.
Then when we take the first pciehp interrupt, we notice DLLSC is set
and think it's a new link state change.

This might be system-specific, because some BIOSes might clear DLLSC
before handing off to the OS.  Or my theory might be all wet.

But I think we should clear DLLSC in any case.  We clear all the other
RW1C bits in Slot Status, and I think we should clear DLLSC as well.

I'd like to include a bugzilla or mailing list reference for the
spurious message, and I'd also like confirmation that this changes
actually fixes it.

Bjorn

> Reference:
>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>
> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
> ---
>
>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 42914e0..0568416 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
> -               PCI_EXP_SLTSTA_CC);
> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>
>         /* Disable software notification */
>         pcie_disable_notification(ctrl);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guenter Roeck June 17, 2014, 10:55 p.m. UTC | #2
On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:
> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>> During PCIe hot-plug initialization - pciehp_probe - data structures
>> related to slot capabilities are set up.  As part of this set up, ISRs are
>> put in place to handle slot events and all event bits are cleared out.
>>
>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>> Slot Status bit to the event bits that are cleared out during
>> initialization.
>
> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
> notifications for hot-plug and removal").  Prior to that, pcie_isr()
> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>
> Apparently there's a non-public report of spurious messages like this
> at boot-time, with no actual hotplug events:
>
>    pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>    pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
> 0000:83:00, cannot hot-add
>    pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>

I think I have seen that message once in a while in our systems.
Rajat, didn't we talk about this a while ago ?

Guenter

> Device 0000:83:00.0 was enumerated previously, during the normal PCI
> device enumeration.  I suspect DLLSC was set by the hardware when the
> link came up after power-on, and it remained set until Linux booted.
> Then when we take the first pciehp interrupt, we notice DLLSC is set
> and think it's a new link state change.
>
> This might be system-specific, because some BIOSes might clear DLLSC
> before handing off to the OS.  Or my theory might be all wet.
>
> But I think we should clear DLLSC in any case.  We clear all the other
> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>
> I'd like to include a bugzilla or mailing list reference for the
> spurious message, and I'd also like confirmation that this changes
> actually fixes it.
>
> Bjorn
>
>> Reference:
>>    PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>    (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>
>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>> ---
>>
>>   drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>   1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index 42914e0..0568416 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>          pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>                  PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>                  PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>> -               PCI_EXP_SLTSTA_CC);
>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>
>>          /* Disable software notification */
>>          pcie_disable_notification(ctrl);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain June 24, 2014, 8:34 p.m. UTC | #3
Hello,

Sorry, I missed this email.

Please see below. 

> -----Original Message-----

> From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-

> owner@vger.kernel.org] On Behalf Of Guenter Roeck

> Sent: Tuesday, June 17, 2014 3:56 PM

> To: Bjorn Helgaas; Myron Stowe

> Cc: linux-pci@vger.kernel.org; Kenji Kaneshige; linux-

> kernel@vger.kernel.org; Rajat Jain

> Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed

> bit when clearing the Slot Status register's event bits

> 

> On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:

> > On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe

> <myron.stowe@redhat.com> wrote:

> >> During PCIe hot-plug initialization - pciehp_probe - data structures

> >> related to slot capabilities are set up.  As part of this set up,

> >> ISRs are put in place to handle slot events and all event bits are cleared

> out.

> >>

> >> This patch adds the Data Link Layer State Changed

> >> (PCI_EXP_SLTSTA_DLLSC) Slot Status bit to the event bits that are

> >> cleared out during initialization.

> >

> > I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change

> > notifications for hot-plug and removal").  Prior to that, pcie_isr()

> > didn't look at the PCI_EXP_SLTSTA_DLLSC bit.

> >

> > Apparently there's a non-public report of spurious messages like this

> > at boot-time, with no actual hotplug events:

> >

> >    pciehp 0000:82:04.0:pcie24: slot(4): Link Up event

> >    pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at

> > 0000:83:00, cannot hot-add

> >    pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00

> >

> 

> I think I have seen that message once in a while in our systems.

> Rajat, didn't we talk about this a while ago ?


Essentially my hiccup was that I was not sure whether the driver should or should not take care of the link change events that have happened BEFORE the driver gets loaded. This has more impact if the pciehp is built as a kernel module.

As an example:

It is common for the platform init code built into the kernel, to take the PCI devices on the board out of reset. And that can happen after the PCI enumeration but before the pciehp driver gets loaded. Thus in this condition, with this patch, the pciehp will ignore the linkup event, thus device will not be visible. The only way (other than a rescan) to do hot-add the device would be to apply-and-remove-reset-signal to the device again. At which point pciehp may give a warning about about an attempt to remove a non-existent card, and then will proceed fine with hot-add.

The patch looks good, only wanted to make sure that we understand and agree that the pciehp should NOT process and link events that have happened before the pciehp was loaded.

Acked-by: Rajat Jain <rajatjain@juniper.net>


Thanks,

Rajat
Bjorn Helgaas June 24, 2014, 9:16 p.m. UTC | #4
On Tue, Jun 24, 2014 at 2:34 PM, Rajat Jain <rajatjain@juniper.net> wrote:
> Hello,
>
> Sorry, I missed this email.
>
> Please see below.
>
>> -----Original Message-----
>> From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-
>> owner@vger.kernel.org] On Behalf Of Guenter Roeck
>> Sent: Tuesday, June 17, 2014 3:56 PM
>> To: Bjorn Helgaas; Myron Stowe
>> Cc: linux-pci@vger.kernel.org; Kenji Kaneshige; linux-
>> kernel@vger.kernel.org; Rajat Jain
>> Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed
>> bit when clearing the Slot Status register's event bits
>>
>> On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:
>> > On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe
>> <myron.stowe@redhat.com> wrote:
>> >> During PCIe hot-plug initialization - pciehp_probe - data structures
>> >> related to slot capabilities are set up.  As part of this set up,
>> >> ISRs are put in place to handle slot events and all event bits are cleared
>> out.
>> >>
>> >> This patch adds the Data Link Layer State Changed
>> >> (PCI_EXP_SLTSTA_DLLSC) Slot Status bit to the event bits that are
>> >> cleared out during initialization.
>> >
>> > I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>> > notifications for hot-plug and removal").  Prior to that, pcie_isr()
>> > didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>> >
>> > Apparently there's a non-public report of spurious messages like this
>> > at boot-time, with no actual hotplug events:
>> >
>> >    pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>> >    pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>> > 0000:83:00, cannot hot-add
>> >    pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>> >
>>
>> I think I have seen that message once in a while in our systems.
>> Rajat, didn't we talk about this a while ago ?
>
> Essentially my hiccup was that I was not sure whether the driver should or should not take care of the link change events that have happened BEFORE the driver gets loaded. This has more impact if the pciehp is built as a kernel module.
>
> As an example:
>
> It is common for the platform init code built into the kernel, to take the PCI devices on the board out of reset. And that can happen after the PCI enumeration but before the pciehp driver gets loaded. Thus in this condition, with this patch, the pciehp will ignore the linkup event, thus device will not be visible. The only way (other than a rescan) to do hot-add the device would be to apply-and-remove-reset-signal to the device again. At which point pciehp may give a warning about about an attempt to remove a non-existent card, and then will proceed fine with hot-add.

When you saw these problems, was pciehp a module?  We changed it
(c10cc483bf3f in v3.11) so it can't be a module any more, but if there
can still be a significant window between enumeration and pciehp init,
I'd like to understand more about it.

> The patch looks good, only wanted to make sure that we understand and agree that the pciehp should NOT process and link events that have happened before the pciehp was loaded.

Currently (before this patch), we clear the Attention Button Pressed,
Power Fault Detected, MRL Sensor Changed, Presence Detect Changed, and
Command Completed bits, but we *don't* clear Data Link Layer State
Changed.

That asymmetry seems hard to justify.  For example, if a card were
inserted after enumeration but before pciehp is initialized, we'd miss
the PDC indication, so I think we would fail to notice the new device.
That seems basically the same as missing the linkup event in your
example.  In both cases, I think we *should* notice the new device, so
the fact that we don't is a problem.

But since pciehp can't be a module any more, the window between PCI
enumeration and pciehp initialization should be relatively small.  I
think the best way to fix this would be to integrate pciehp more
closely into PCI enumeration, e.g., by doing pcie_init() at the point
where we discover the bridge, before we enumerate any devices below
the bridge.  This is somewhat tangled up with acpihp, so I don't know
how complicated it would be to do this.

So I guess my argument is:

  - Ignoring events that happen before pciehp init decreases our
dependency on BIOS
  - Handling all events consistently is very important
  - We currently have a problem with missing pre-pciehp events, but
there is a way to fix this

> Acked-by: Rajat Jain <rajatjain@juniper.net>

Thanks for taking a look at this!

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain June 25, 2014, 5:26 a.m. UTC | #5
SGVsbG8sDQoNCj4gPj4NCj4gPj4gT24gMDYvMTcvMjAxNCAwMjowNyBQTSwgQmpvcm4gSGVsZ2Fh
cyB3cm90ZToNCj4gPj4gPiBPbiBUdWUsIEp1biAxNywgMjAxNCBhdCAxOjI3IFBNLCBNeXJvbiBT
dG93ZQ0KPiA+PiA8bXlyb24uc3Rvd2VAcmVkaGF0LmNvbT4gd3JvdGU6DQo+ID4+ID4+IER1cmlu
ZyBQQ0llIGhvdC1wbHVnIGluaXRpYWxpemF0aW9uIC0gcGNpZWhwX3Byb2JlIC0gZGF0YQ0KPiA+
PiA+PiBzdHJ1Y3R1cmVzIHJlbGF0ZWQgdG8gc2xvdCBjYXBhYmlsaXRpZXMgYXJlIHNldCB1cC4g
IEFzIHBhcnQgb2YNCj4gPj4gPj4gdGhpcyBzZXQgdXAsIElTUnMgYXJlIHB1dCBpbiBwbGFjZSB0
byBoYW5kbGUgc2xvdCBldmVudHMgYW5kIGFsbA0KPiA+PiA+PiBldmVudCBiaXRzIGFyZSBjbGVh
cmVkDQo+ID4+IG91dC4NCj4gPj4gPj4NCj4gPj4gPj4gVGhpcyBwYXRjaCBhZGRzIHRoZSBEYXRh
IExpbmsgTGF5ZXIgU3RhdGUgQ2hhbmdlZA0KPiA+PiA+PiAoUENJX0VYUF9TTFRTVEFfRExMU0Mp
IFNsb3QgU3RhdHVzIGJpdCB0byB0aGUgZXZlbnQgYml0cyB0aGF0IGFyZQ0KPiA+PiA+PiBjbGVh
cmVkIG91dCBkdXJpbmcgaW5pdGlhbGl6YXRpb24uDQo+ID4+ID4NCj4gPj4gPiBJIHRoaW5rIHRo
aXMgaXMgcmVsYXRlZCB0byBlNDhmMWI2N2Y2NjggKCJQQ0k6IHBjaWVocDogVXNlIGxpbmsNCj4g
Pj4gPiBjaGFuZ2Ugbm90aWZpY2F0aW9ucyBmb3IgaG90LXBsdWcgYW5kIHJlbW92YWwiKS4gIFBy
aW9yIHRvIHRoYXQsDQo+ID4+ID4gcGNpZV9pc3IoKSBkaWRuJ3QgbG9vayBhdCB0aGUgUENJX0VY
UF9TTFRTVEFfRExMU0MgYml0Lg0KPiA+PiA+DQo+ID4+ID4gQXBwYXJlbnRseSB0aGVyZSdzIGEg
bm9uLXB1YmxpYyByZXBvcnQgb2Ygc3B1cmlvdXMgbWVzc2FnZXMgbGlrZQ0KPiA+PiA+IHRoaXMg
YXQgYm9vdC10aW1lLCB3aXRoIG5vIGFjdHVhbCBob3RwbHVnIGV2ZW50czoNCj4gPj4gPg0KPiA+
PiA+ICAgIHBjaWVocCAwMDAwOjgyOjA0LjA6cGNpZTI0OiBzbG90KDQpOiBMaW5rIFVwIGV2ZW50
DQo+ID4+ID4gICAgcGNpZWhwIDAwMDA6ODI6MDQuMDpwY2llMjQ6IERldmljZSAwMDAwOjgzOjAw
LjAgYWxyZWFkeSBleGlzdHMNCj4gPj4gPiBhdCAwMDAwOjgzOjAwLCBjYW5ub3QgaG90LWFkZA0K
PiA+PiA+ICAgIHBjaWVocCAwMDAwOjgyOjA0LjA6cGNpZTI0OiBDYW5ub3QgYWRkIGRldmljZSBh
dCAwMDAwOjgzOjAwDQo+ID4+ID4NCj4gPj4NCj4gPj4gSSB0aGluayBJIGhhdmUgc2VlbiB0aGF0
IG1lc3NhZ2Ugb25jZSBpbiBhIHdoaWxlIGluIG91ciBzeXN0ZW1zLg0KPiA+PiBSYWphdCwgZGlk
bid0IHdlIHRhbGsgYWJvdXQgdGhpcyBhIHdoaWxlIGFnbyA/DQo+ID4NCj4gPiBFc3NlbnRpYWxs
eSBteSBoaWNjdXAgd2FzIHRoYXQgSSB3YXMgbm90IHN1cmUgd2hldGhlciB0aGUgZHJpdmVyIHNo
b3VsZCBvcg0KPiBzaG91bGQgbm90IHRha2UgY2FyZSBvZiB0aGUgbGluayBjaGFuZ2UgZXZlbnRz
IHRoYXQgaGF2ZSBoYXBwZW5lZCBCRUZPUkUNCj4gdGhlIGRyaXZlciBnZXRzIGxvYWRlZC4gVGhp
cyBoYXMgbW9yZSBpbXBhY3QgaWYgdGhlIHBjaWVocCBpcyBidWlsdCBhcyBhIGtlcm5lbA0KPiBt
b2R1bGUuDQo+ID4NCj4gPiBBcyBhbiBleGFtcGxlOg0KPiA+DQo+ID4gSXQgaXMgY29tbW9uIGZv
ciB0aGUgcGxhdGZvcm0gaW5pdCBjb2RlIGJ1aWx0IGludG8gdGhlIGtlcm5lbCwgdG8gdGFrZSB0
aGUgUENJDQo+IGRldmljZXMgb24gdGhlIGJvYXJkIG91dCBvZiByZXNldC4gQW5kIHRoYXQgY2Fu
IGhhcHBlbiBhZnRlciB0aGUgUENJDQo+IGVudW1lcmF0aW9uIGJ1dCBiZWZvcmUgdGhlIHBjaWVo
cCBkcml2ZXIgZ2V0cyBsb2FkZWQuIFRodXMgaW4gdGhpcyBjb25kaXRpb24sDQo+IHdpdGggdGhp
cyBwYXRjaCwgdGhlIHBjaWVocCB3aWxsIGlnbm9yZSB0aGUgbGlua3VwIGV2ZW50LCB0aHVzIGRl
dmljZSB3aWxsIG5vdA0KPiBiZSB2aXNpYmxlLiBUaGUgb25seSB3YXkgKG90aGVyIHRoYW4gYSBy
ZXNjYW4pIHRvIGRvIGhvdC1hZGQgdGhlIGRldmljZSB3b3VsZA0KPiBiZSB0byBhcHBseS1hbmQt
cmVtb3ZlLXJlc2V0LXNpZ25hbCB0byB0aGUgZGV2aWNlIGFnYWluLiBBdCB3aGljaCBwb2ludA0K
PiBwY2llaHAgbWF5IGdpdmUgYSB3YXJuaW5nIGFib3V0IGFib3V0IGFuIGF0dGVtcHQgdG8gcmVt
b3ZlIGEgbm9uLWV4aXN0ZW50DQo+IGNhcmQsIGFuZCB0aGVuIHdpbGwgcHJvY2VlZCBmaW5lIHdp
dGggaG90LWFkZC4NCj4gDQo+IFdoZW4geW91IHNhdyB0aGVzZSBwcm9ibGVtcywgd2FzIHBjaWVo
cCBhIG1vZHVsZT8NCg0KTm8sIHdlIGRpZCBub3QgYWN0dWFsbHkgaGl0ICp0aGlzKiBwcm9ibGVt
IChETExTQyBnZXR0aW5nIHNldCBiZWZvcmUgcGNpZWhwIGluaXQpLiBXZSBoYWQgb2JzZXJ2ZWQg
YW5vdGhlciBIVyBzcGVjaWZpYyBwcm9ibGVtIHdoZXJlIGJvdGggbGluayBldmVudCBhbmQgcHJl
c2VuY2UgZGV0ZWN0IHdlcmUgdG9nZ2xpbmcsIGhlbmNlIDIgaW5kZXBlbmRlbnQgaG90LXBsdWcg
aW50ZXJydXB0cyBldmVudHMgd2VyZSBjb21pbmcsIHRodXMgcmVzdWx0aW5nIGluIHNwdXJpb3Vz
IG1lc3NhZ2VzLg0KDQo+IFdlIGNoYW5nZWQgaXQNCj4gKGMxMGNjNDgzYmYzZiBpbiB2My4xMSkg
c28gaXQgY2FuJ3QgYmUgYSBtb2R1bGUgYW55IG1vcmUsIGJ1dCBpZiB0aGVyZSBjYW4gc3RpbGwN
Cj4gYmUgYSBzaWduaWZpY2FudCB3aW5kb3cgYmV0d2VlbiBlbnVtZXJhdGlvbiBhbmQgcGNpZWhw
IGluaXQsIEknZCBsaWtlIHRvDQo+IHVuZGVyc3RhbmQgbW9yZSBhYm91dCBpdC4NCg0KT2gsIG15
IGJhZC4gSSBkaWQgbm90IGV2ZXIgdHJ5IHRvIGFjdHVhbGx5IGJ1aWxkIGl0IGFzIGEgbW9kdWxl
LCBJIGp1c3QgYXNzdW1lZCB0aGF0IGl0J2QgYmUgcG9zc2libGUuIEkgd2FzIGp1c3QgdGhpbmtp
bmcgb2YgY29ybmVyIGNhc2VzIEkgZ3Vlc3MuIA0KDQo+IA0KPiA+IFRoZSBwYXRjaCBsb29rcyBn
b29kLCBvbmx5IHdhbnRlZCB0byBtYWtlIHN1cmUgdGhhdCB3ZSB1bmRlcnN0YW5kIGFuZA0KPiBh
Z3JlZSB0aGF0IHRoZSBwY2llaHAgc2hvdWxkIE5PVCBwcm9jZXNzIGFuZCBsaW5rIGV2ZW50cyB0
aGF0IGhhdmUNCj4gaGFwcGVuZWQgYmVmb3JlIHRoZSBwY2llaHAgd2FzIGxvYWRlZC4NCj4gDQo+
IEN1cnJlbnRseSAoYmVmb3JlIHRoaXMgcGF0Y2gpLCB3ZSBjbGVhciB0aGUgQXR0ZW50aW9uIEJ1
dHRvbiBQcmVzc2VkLCBQb3dlcg0KPiBGYXVsdCBEZXRlY3RlZCwgTVJMIFNlbnNvciBDaGFuZ2Vk
LCBQcmVzZW5jZSBEZXRlY3QgQ2hhbmdlZCwgYW5kDQo+IENvbW1hbmQgQ29tcGxldGVkIGJpdHMs
IGJ1dCB3ZSAqZG9uJ3QqIGNsZWFyIERhdGEgTGluayBMYXllciBTdGF0ZQ0KPiBDaGFuZ2VkLg0K
PiANCj4gVGhhdCBhc3ltbWV0cnkgc2VlbXMgaGFyZCB0byBqdXN0aWZ5Lg0KDQpBZ3JlZS4NCg0K
ID4gRm9yIGV4YW1wbGUsIGlmIGEgY2FyZCB3ZXJlIGluc2VydGVkDQo+IGFmdGVyIGVudW1lcmF0
aW9uIGJ1dCBiZWZvcmUgcGNpZWhwIGlzIGluaXRpYWxpemVkLCB3ZSdkIG1pc3MgdGhlIFBEQw0K
PiBpbmRpY2F0aW9uLCBzbyBJIHRoaW5rIHdlIHdvdWxkIGZhaWwgdG8gbm90aWNlIHRoZSBuZXcg
ZGV2aWNlLg0KPiBUaGF0IHNlZW1zIGJhc2ljYWxseSB0aGUgc2FtZSBhcyBtaXNzaW5nIHRoZSBs
aW5rdXAgZXZlbnQgaW4geW91ciBleGFtcGxlLg0KPiBJbiBib3RoIGNhc2VzLCBJIHRoaW5rIHdl
ICpzaG91bGQqIG5vdGljZSB0aGUgbmV3IGRldmljZSwgc28gdGhlIGZhY3QgdGhhdCB3ZQ0KPiBk
b24ndCBpcyBhIHByb2JsZW0uDQoNCkh1bW1tLCBvayB1bmRlcnN0b29kLiBTbyB0aGF0IG1lYW5z
IHdlIGhhdmUgZGVmaW5lZCBhIHByb2JsZW0gc3RhdGVtZW50IGF0IGxlYXN0IDotKS4gWWVzLCBJ
IHVuZGVyc3RhbmQgdGhhdCB0aGUgd2luZG93IGlzIHZlcnkgc21hbGwgaGVyZS4gQWxzbyBJIGFn
cmVlIHdpdGggeW91ciBvcGluaW9uIHdoYXRldmVyIHNvbHV0aW9uIHdlIHB1dCBpdCB0byB0YWtl
IGNhcmUgb2YgZXZlbnRzIHRoYXQgaGFwcGVuZWQgYmVmb3JlIHBjaWVocCBpbml0LCBzaG91bGQg
YmUgY29uc2lzdGVudCBmb3IgYWxsIGV2ZW50cyAobm90IGp1c3QgbGluayBjaGFuZ2VzKS4gVGhl
IHByb2JsZW0gY2FuIHdhaXQgdG8gYmUgc29sdmVkIEkgdGhpbmsuDQoNCj4gDQo+IEJ1dCBzaW5j
ZSBwY2llaHAgY2FuJ3QgYmUgYSBtb2R1bGUgYW55IG1vcmUsIHRoZSB3aW5kb3cgYmV0d2VlbiBQ
Q0kNCj4gZW51bWVyYXRpb24gYW5kIHBjaWVocCBpbml0aWFsaXphdGlvbiBzaG91bGQgYmUgcmVs
YXRpdmVseSBzbWFsbC4gIEkgdGhpbmsgdGhlDQo+IGJlc3Qgd2F5IHRvIGZpeCB0aGlzIHdvdWxk
IGJlIHRvIGludGVncmF0ZSBwY2llaHAgbW9yZSBjbG9zZWx5IGludG8gUENJDQo+IGVudW1lcmF0
aW9uLCBlLmcuLCBieSBkb2luZyBwY2llX2luaXQoKSBhdCB0aGUgcG9pbnQgd2hlcmUgd2UgZGlz
Y292ZXIgdGhlDQo+IGJyaWRnZSwgYmVmb3JlIHdlIGVudW1lcmF0ZSBhbnkgZGV2aWNlcyBiZWxv
dyB0aGUgYnJpZGdlLiAgVGhpcyBpcw0KPiBzb21ld2hhdCB0YW5nbGVkIHVwIHdpdGggYWNwaWhw
LCBzbyBJIGRvbid0IGtub3cgaG93IGNvbXBsaWNhdGVkIGl0IHdvdWxkDQo+IGJlIHRvIGRvIHRo
aXMuDQo+IA0KPiBTbyBJIGd1ZXNzIG15IGFyZ3VtZW50IGlzOg0KPiANCj4gICAtIElnbm9yaW5n
IGV2ZW50cyB0aGF0IGhhcHBlbiBiZWZvcmUgcGNpZWhwIGluaXQgZGVjcmVhc2VzIG91ciBkZXBl
bmRlbmN5DQo+IG9uIEJJT1MNCg0KQWdyZWUuDQoNCj4gICAtIEhhbmRsaW5nIGFsbCBldmVudHMg
Y29uc2lzdGVudGx5IGlzIHZlcnkgaW1wb3J0YW50DQoNCkFncmVlLg0KDQo+ICAgLSBXZSBjdXJy
ZW50bHkgaGF2ZSBhIHByb2JsZW0gd2l0aCBtaXNzaW5nIHByZS1wY2llaHAgZXZlbnRzLCBidXQg
dGhlcmUgaXMNCj4gYSB3YXkgdG8gZml4IHRoaXMNCg0KU29tZXdoYXQgYWdyZWUgKEkgZG8gbm90
IHVuZGVyc3RhbmQgd2hhdCB0aGF0IHdheSBpcyBjdXJyZW50bHkuIEJ1dCBJIHRoaW5rIGl0IGNh
biB3YWl0IGFzIHdlIGRvbid0IGhpdCB0aGF0IHByb2JsZW0gdG9kYXkgYW5kIGFsc28gdGhlIHdp
bmRvdyB3aGVyZSBpdCBjYW4gb2NjdXIgaXMgdmVyeSBzbWFsbCkuDQoNClRoYW5rcywNCg0KUmFq
YXQNCg0KPiANCj4gPiBBY2tlZC1ieTogUmFqYXQgSmFpbiA8cmFqYXRqYWluQGp1bmlwZXIubmV0
Pg0KPiANCj4gVGhhbmtzIGZvciB0YWtpbmcgYSBsb29rIGF0IHRoaXMhDQo+IA0KPiBCam9ybg0K
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas June 25, 2014, 5:51 p.m. UTC | #6
On Tue, Jun 24, 2014 at 11:26 PM, Rajat Jain <rajatjain@juniper.net> wrote:

>> > Essentially my hiccup was that I was not sure whether the driver should or
>> should not take care of the link change events that have happened BEFORE
>> the driver gets loaded. This has more impact if the pciehp is built as a kernel
>> module.
>> >
>> > As an example:
>> >
>> > It is common for the platform init code built into the kernel, to take the PCI
>> devices on the board out of reset. And that can happen after the PCI
>> enumeration but before the pciehp driver gets loaded. Thus in this condition,
>> with this patch, the pciehp will ignore the linkup event, thus device will not
>> be visible. The only way (other than a rescan) to do hot-add the device would
>> be to apply-and-remove-reset-signal to the device again. At which point
>> pciehp may give a warning about about an attempt to remove a non-existent
>> card, and then will proceed fine with hot-add.
>>
>> When you saw these problems, was pciehp a module?
>
> No, we did not actually hit *this* problem (DLLSC getting set before pciehp init). We had observed another HW specific problem where both link event and presence detect were toggling, hence 2 independent hot-plug interrupts events were coming, thus resulting in spurious messages.

OK, this sounds like a slightly different problem that we can work on
later if it turns out to be necessary.

>  > For example, if a card were inserted
>> after enumeration but before pciehp is initialized, we'd miss the PDC
>> indication, so I think we would fail to notice the new device.
>> That seems basically the same as missing the linkup event in your example.
>> In both cases, I think we *should* notice the new device, so the fact that we
>> don't is a problem.
> ...

>>   - We currently have a problem with missing pre-pciehp events, but there is
>> a way to fix this
>
> Somewhat agree (I do not understand what that way is currently. But I think it can wait as we don't hit that problem today and also the window where it can occur is very small).

My thought was that we could fix this by integrating pciehp into PCI
enumeration, so we're prepared to handle hotplug events on the bridge
before we enumerate any device below the bridge.  But I haven't
actually tried this so I don't know whether it's practical.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Myron Stowe June 30, 2014, 4:49 p.m. UTC | #7
On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>> During PCIe hot-plug initialization - pciehp_probe - data structures
>> related to slot capabilities are set up.  As part of this set up, ISRs are
>> put in place to handle slot events and all event bits are cleared out.
>>
>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>> Slot Status bit to the event bits that are cleared out during
>> initialization.
>
> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
> notifications for hot-plug and removal").  Prior to that, pcie_isr()
> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>
> Apparently there's a non-public report of spurious messages like this
> at boot-time, with no actual hotplug events:
>
>   pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>   pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
> 0000:83:00, cannot hot-add
>   pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>
> Device 0000:83:00.0 was enumerated previously, during the normal PCI
> device enumeration.  I suspect DLLSC was set by the hardware when the
> link came up after power-on, and it remained set until Linux booted.
> Then when we take the first pciehp interrupt, we notice DLLSC is set
> and think it's a new link state change.
>
> This might be system-specific, because some BIOSes might clear DLLSC
> before handing off to the OS.  Or my theory might be all wet.
>
> But I think we should clear DLLSC in any case.  We clear all the other
> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>
> I'd like to include a bugzilla or mailing list reference for the
> spurious message, and I'd also like confirmation that this changes
> actually fixes it.

I just got confirmation from the reporter that the patch fixes the
issue they were encountering.

I've also asked them numerous time if I could make the bug public as I
expect others may be hitting it and could benefit from a BZ and
analysis/explanation.  I just repeated my request - if they reply
positively I'll follow through with some documentation.

Myron

>
> Bjorn
>
>> Reference:
>>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>
>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>> ---
>>
>>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index 42914e0..0568416 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>> -               PCI_EXP_SLTSTA_CC);
>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>
>>         /* Disable software notification */
>>         pcie_disable_notification(ctrl);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas July 1, 2014, 7:29 p.m. UTC | #8
On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>>> During PCIe hot-plug initialization - pciehp_probe - data structures
>>> related to slot capabilities are set up.  As part of this set up, ISRs are
>>> put in place to handle slot events and all event bits are cleared out.
>>>
>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>>> Slot Status bit to the event bits that are cleared out during
>>> initialization.
>>
>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>> notifications for hot-plug and removal").  Prior to that, pcie_isr()
>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>>
>> Apparently there's a non-public report of spurious messages like this
>> at boot-time, with no actual hotplug events:
>>
>>   pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>>   pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>> 0000:83:00, cannot hot-add
>>   pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>>
>> Device 0000:83:00.0 was enumerated previously, during the normal PCI
>> device enumeration.  I suspect DLLSC was set by the hardware when the
>> link came up after power-on, and it remained set until Linux booted.
>> Then when we take the first pciehp interrupt, we notice DLLSC is set
>> and think it's a new link state change.
>>
>> This might be system-specific, because some BIOSes might clear DLLSC
>> before handing off to the OS.  Or my theory might be all wet.
>>
>> But I think we should clear DLLSC in any case.  We clear all the other
>> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>>
>> I'd like to include a bugzilla or mailing list reference for the
>> spurious message, and I'd also like confirmation that this changes
>> actually fixes it.
>
> I just got confirmation from the reporter that the patch fixes the
> issue they were encountering.
>
> I've also asked them numerous time if I could make the bug public as I
> expect others may be hitting it and could benefit from a BZ and
> analysis/explanation.  I just repeated my request - if they reply
> positively I'll follow through with some documentation.

I'd be glad to merge this, as soon as we get more details about the
problem it fixes.  I think it's good to include some specifics, e.g.,
"Unexpected Link Up event," etc., because it helps other people with
the same problem to find the solution.  It's OK if details about
pre-production machines and so on are removed, but it's better if
changes to generic code are supported by public evidence that
everybody can evaluate.

If you want to pull out the public stuff into a kernel.org bugzilla,
that would be fine, too.  That would actually be better because then
we don't have to depend on Red Hat maintaining it.

Bjorn

>>> Reference:
>>>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>>
>>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>>> ---
>>>
>>>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>> index 42914e0..0568416 100644
>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>>> -               PCI_EXP_SLTSTA_CC);
>>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>>
>>>         /* Disable software notification */
>>>         pcie_disable_notification(ctrl);
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Myron Stowe July 7, 2014, 8:26 p.m. UTC | #9
On Tue, Jul 1, 2014 at 1:29 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
>> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>>>> During PCIe hot-plug initialization - pciehp_probe - data structures
>>>> related to slot capabilities are set up.  As part of this set up, ISRs are
>>>> put in place to handle slot events and all event bits are cleared out.
>>>>
>>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>>>> Slot Status bit to the event bits that are cleared out during
>>>> initialization.
>>>
>>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>>> notifications for hot-plug and removal").  Prior to that, pcie_isr()
>>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>>>
>>> Apparently there's a non-public report of spurious messages like this
>>> at boot-time, with no actual hotplug events:
>>>
>>>   pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>>>   pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>>> 0000:83:00, cannot hot-add
>>>   pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>>>
>>> Device 0000:83:00.0 was enumerated previously, during the normal PCI
>>> device enumeration.  I suspect DLLSC was set by the hardware when the
>>> link came up after power-on, and it remained set until Linux booted.
>>> Then when we take the first pciehp interrupt, we notice DLLSC is set
>>> and think it's a new link state change.
>>>
>>> This might be system-specific, because some BIOSes might clear DLLSC
>>> before handing off to the OS.  Or my theory might be all wet.
>>>
>>> But I think we should clear DLLSC in any case.  We clear all the other
>>> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>>>
>>> I'd like to include a bugzilla or mailing list reference for the
>>> spurious message, and I'd also like confirmation that this changes
>>> actually fixes it.
>>
>> I just got confirmation from the reporter that the patch fixes the
>> issue they were encountering.
>>
>> I've also asked them numerous time if I could make the bug public as I
>> expect others may be hitting it and could benefit from a BZ and
>> analysis/explanation.  I just repeated my request - if they reply
>> positively I'll follow through with some documentation.
>
> I'd be glad to merge this, as soon as we get more details about the
> problem it fixes.  I think it's good to include some specifics, e.g.,
> "Unexpected Link Up event," etc., because it helps other people with
> the same problem to find the solution.  It's OK if details about
> pre-production machines and so on are removed, but it's better if
> changes to generic code are supported by public evidence that
> everybody can evaluate.
>
> If you want to pull out the public stuff into a kernel.org bugzilla,
> that would be fine, too.  That would actually be better because then
> we don't have to depend on Red Hat maintaining it.

Agreed (but I couldn't get the vendor to agree to open up the Red Hat
BZ).  I've opened https://bugzilla.kernel.org/show_bug.cgi?id=79611
and included a scrubbed 'dmesg' log there along with an analysis of
how we are hitting the issue.

>
> Bjorn
>
>>>> Reference:
>>>>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>>>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>>>
>>>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>>>> ---
>>>>
>>>>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>>> index 42914e0..0568416 100644
>>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>>>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>>>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>>>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>>>> -               PCI_EXP_SLTSTA_CC);
>>>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>>>
>>>>         /* Disable software notification */
>>>>         pcie_disable_notification(ctrl);
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas July 7, 2014, 8:56 p.m. UTC | #10
On Mon, Jul 7, 2014 at 2:26 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
> On Tue, Jul 1, 2014 at 1:29 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
>>> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@redhat.com> wrote:
>>>>> During PCIe hot-plug initialization - pciehp_probe - data structures
>>>>> related to slot capabilities are set up.  As part of this set up, ISRs are
>>>>> put in place to handle slot events and all event bits are cleared out.
>>>>>
>>>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
>>>>> Slot Status bit to the event bits that are cleared out during
>>>>> initialization.
>>>>
>>>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>>>> notifications for hot-plug and removal").  Prior to that, pcie_isr()
>>>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>>>>
>>>> Apparently there's a non-public report of spurious messages like this
>>>> at boot-time, with no actual hotplug events:
>>>>
>>>>   pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>>>>   pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>>>> 0000:83:00, cannot hot-add
>>>>   pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>>>>
>>>> Device 0000:83:00.0 was enumerated previously, during the normal PCI
>>>> device enumeration.  I suspect DLLSC was set by the hardware when the
>>>> link came up after power-on, and it remained set until Linux booted.
>>>> Then when we take the first pciehp interrupt, we notice DLLSC is set
>>>> and think it's a new link state change.
>>>>
>>>> This might be system-specific, because some BIOSes might clear DLLSC
>>>> before handing off to the OS.  Or my theory might be all wet.
>>>>
>>>> But I think we should clear DLLSC in any case.  We clear all the other
>>>> RW1C bits in Slot Status, and I think we should clear DLLSC as well.
>>>>
>>>> I'd like to include a bugzilla or mailing list reference for the
>>>> spurious message, and I'd also like confirmation that this changes
>>>> actually fixes it.
>>>
>>> I just got confirmation from the reporter that the patch fixes the
>>> issue they were encountering.
>>>
>>> I've also asked them numerous time if I could make the bug public as I
>>> expect others may be hitting it and could benefit from a BZ and
>>> analysis/explanation.  I just repeated my request - if they reply
>>> positively I'll follow through with some documentation.
>>
>> I'd be glad to merge this, as soon as we get more details about the
>> problem it fixes.  I think it's good to include some specifics, e.g.,
>> "Unexpected Link Up event," etc., because it helps other people with
>> the same problem to find the solution.  It's OK if details about
>> pre-production machines and so on are removed, but it's better if
>> changes to generic code are supported by public evidence that
>> everybody can evaluate.
>>
>> If you want to pull out the public stuff into a kernel.org bugzilla,
>> that would be fine, too.  That would actually be better because then
>> we don't have to depend on Red Hat maintaining it.
>
> Agreed (but I couldn't get the vendor to agree to open up the Red Hat
> BZ).  I've opened https://bugzilla.kernel.org/show_bug.cgi?id=79611
> and included a scrubbed 'dmesg' log there along with an analysis of
> how we are hitting the issue.

Beautiful, thanks!  I marked this for stable and put this in
pci/hotplug for v3.17.

>>>>> Reference:
>>>>>   PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
>>>>>   (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
>>>>>
>>>>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>>>>> ---
>>>>>
>>>>>  drivers/pci/hotplug/pciehp_hpc.c |    2 +-
>>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>>>> index 42914e0..0568416 100644
>>>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>>>>>         pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>>>>>                 PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>>>>>                 PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
>>>>> -               PCI_EXP_SLTSTA_CC);
>>>>> +               PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>>>>>
>>>>>         /* Disable software notification */
>>>>>         pcie_disable_notification(ctrl);
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 42914e0..0568416 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -794,7 +794,7 @@  struct controller *pcie_init(struct pcie_device *dev)
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
 		PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
 		PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
-		PCI_EXP_SLTSTA_CC);
+		PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
 
 	/* Disable software notification */
 	pcie_disable_notification(ctrl);