Message ID | 20240116143258.483235-1-vidyas@nvidia.com |
---|---|
State | New |
Headers | show |
Series | [V2] PCI: Clear errors logged in Secondary Status Register | expand |
On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote: > The enumeration process leaves the 'Received Master Abort' bit set in > the Secondary Status Register of the downstream port in the following > scenarios. > > (1) The device connected to the downstream port has ARI capability > and that makes the kernel set the 'ARI Forwarding Enable' bit in > the Device Control 2 Register of the downstream port. This > effectively makes the downstream port forward the configuration > requests targeting the devices downstream of it, even though they > don't exist in reality. It causes the downstream devices return > completions with UR set in the status in turn causing 'Received > Master Abort' bit set. > > In contrast, if the downstream device doesn't have ARI capability, > the 'ARI Forwarding Enable' bit in the downstream port is not set > and any configuration requests targeting the downstream devices > that don't exist are terminated (section 6.13 of PCI Express Base > 6.0 spec) in the downstream port itself resulting in no change of > the 'Received Master Abort' bit. > > (2) A PCIe switch is connected to the downstream port and when the > enumeration flow tries to explore the presence of devices that > don't really exist downstream of the switch, the downstream > port receives the completions with UR set causing the 'Received > Master Abort' bit set. Are these the only possible ways this error is logged? I expected them to be logged when we enumerate below a Root Port that has nothing attached, for example. Does clearing them in pci_scan_bridge_extend() cover all ways this error might be logged during enumeration? I can't remember whether all enumeration goes through this path. > Clear 'Received Master Abort' bit to keep the bridge device in a clean > state post enumeration. > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com> > --- > V2: > * Changed commit message based on Bjorn's feedback > > drivers/pci/probe.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 795534589b98..640d2871b061 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1470,6 +1470,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev, > } > > out: > + /* Clear errors in the Secondary Status Register */ > + pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff); > + > pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl); > > pm_runtime_put(&dev->dev); > -- > 2.25.1 >
On 23-01-2024 04:30, Bjorn Helgaas wrote: > External email: Use caution opening links or attachments > > > On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote: >> The enumeration process leaves the 'Received Master Abort' bit set in >> the Secondary Status Register of the downstream port in the following >> scenarios. >> >> (1) The device connected to the downstream port has ARI capability >> and that makes the kernel set the 'ARI Forwarding Enable' bit in >> the Device Control 2 Register of the downstream port. This >> effectively makes the downstream port forward the configuration >> requests targeting the devices downstream of it, even though they >> don't exist in reality. It causes the downstream devices return >> completions with UR set in the status in turn causing 'Received >> Master Abort' bit set. >> >> In contrast, if the downstream device doesn't have ARI capability, >> the 'ARI Forwarding Enable' bit in the downstream port is not set >> and any configuration requests targeting the downstream devices >> that don't exist are terminated (section 6.13 of PCI Express Base >> 6.0 spec) in the downstream port itself resulting in no change of >> the 'Received Master Abort' bit. >> >> (2) A PCIe switch is connected to the downstream port and when the >> enumeration flow tries to explore the presence of devices that >> don't really exist downstream of the switch, the downstream >> port receives the completions with UR set causing the 'Received >> Master Abort' bit set. > Are these the only possible ways this error is logged? I expected > them to be logged when we enumerate below a Root Port that has nothing > attached, for example. In this case, there won't be any TLP sent downstream. I talked about this scenario in the second paragraph of point (1) above. > Does clearing them in pci_scan_bridge_extend() cover all ways this > error might be logged during enumeration? I can't remember whether > all enumeration goes through this path. So far in my testing, clearing it in pci_scan_bridge_extend() covers all the cases. >> Clear 'Received Master Abort' bit to keep the bridge device in a clean >> state post enumeration. >> >> Signed-off-by: Vidya Sagar <vidyas@nvidia.com> >> --- >> V2: >> * Changed commit message based on Bjorn's feedback >> >> drivers/pci/probe.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >> index 795534589b98..640d2871b061 100644 >> --- a/drivers/pci/probe.c >> +++ b/drivers/pci/probe.c >> @@ -1470,6 +1470,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev, >> } >> >> out: >> + /* Clear errors in the Secondary Status Register */ >> + pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff); >> + >> pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl); >> >> pm_runtime_put(&dev->dev); >> -- >> 2.25.1 >>
Hi Bjorn, Just checking on this thread. Is there anything else you want me to clarify on? Thanks, Vidya Sagar On 14-03-2024 06:09, Vidya Sagar wrote: > > > On 23-01-2024 04:30, Bjorn Helgaas wrote: >> External email: Use caution opening links or attachments >> >> >> On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote: >>> The enumeration process leaves the 'Received Master Abort' bit set in >>> the Secondary Status Register of the downstream port in the following >>> scenarios. >>> >>> (1) The device connected to the downstream port has ARI capability >>> and that makes the kernel set the 'ARI Forwarding Enable' bit in >>> the Device Control 2 Register of the downstream port. This >>> effectively makes the downstream port forward the configuration >>> requests targeting the devices downstream of it, even though they >>> don't exist in reality. It causes the downstream devices return >>> completions with UR set in the status in turn causing 'Received >>> Master Abort' bit set. >>> >>> In contrast, if the downstream device doesn't have ARI capability, >>> the 'ARI Forwarding Enable' bit in the downstream port is not set >>> and any configuration requests targeting the downstream devices >>> that don't exist are terminated (section 6.13 of PCI Express Base >>> 6.0 spec) in the downstream port itself resulting in no change of >>> the 'Received Master Abort' bit. >>> >>> (2) A PCIe switch is connected to the downstream port and when the >>> enumeration flow tries to explore the presence of devices that >>> don't really exist downstream of the switch, the downstream >>> port receives the completions with UR set causing the 'Received >>> Master Abort' bit set. >> Are these the only possible ways this error is logged? I expected >> them to be logged when we enumerate below a Root Port that has nothing >> attached, for example. > In this case, there won't be any TLP sent downstream. I talked about > this scenario in the > second paragraph of point (1) above. >> Does clearing them in pci_scan_bridge_extend() cover all ways this >> error might be logged during enumeration? I can't remember whether >> all enumeration goes through this path. > So far in my testing, clearing it in pci_scan_bridge_extend() covers > all the cases. > >>> Clear 'Received Master Abort' bit to keep the bridge device in a clean >>> state post enumeration. >>> >>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com> >>> --- >>> V2: >>> * Changed commit message based on Bjorn's feedback >>> >>> drivers/pci/probe.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >>> index 795534589b98..640d2871b061 100644 >>> --- a/drivers/pci/probe.c >>> +++ b/drivers/pci/probe.c >>> @@ -1470,6 +1470,9 @@ static int pci_scan_bridge_extend(struct >>> pci_bus *bus, struct pci_dev *dev, >>> } >>> >>> out: >>> + /* Clear errors in the Secondary Status Register */ >>> + pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff); >>> + >>> pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl); >>> >>> pm_runtime_put(&dev->dev); >>> -- >>> 2.25.1 >>> >
Hi Bjorn, Sorry to bug you. Is this change good to be accepted? Thanks, Vidya Sagar On 01-04-2024 13:29, Vidya Sagar wrote: > Hi Bjorn, > Just checking on this thread. > Is there anything else you want me to clarify on? > > Thanks, > Vidya Sagar > > On 14-03-2024 06:09, Vidya Sagar wrote: >> >> >> On 23-01-2024 04:30, Bjorn Helgaas wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote: >>>> The enumeration process leaves the 'Received Master Abort' bit set in >>>> the Secondary Status Register of the downstream port in the following >>>> scenarios. >>>> >>>> (1) The device connected to the downstream port has ARI capability >>>> and that makes the kernel set the 'ARI Forwarding Enable' bit in >>>> the Device Control 2 Register of the downstream port. This >>>> effectively makes the downstream port forward the configuration >>>> requests targeting the devices downstream of it, even though they >>>> don't exist in reality. It causes the downstream devices return >>>> completions with UR set in the status in turn causing 'Received >>>> Master Abort' bit set. >>>> >>>> In contrast, if the downstream device doesn't have ARI capability, >>>> the 'ARI Forwarding Enable' bit in the downstream port is not set >>>> and any configuration requests targeting the downstream devices >>>> that don't exist are terminated (section 6.13 of PCI Express Base >>>> 6.0 spec) in the downstream port itself resulting in no change of >>>> the 'Received Master Abort' bit. >>>> >>>> (2) A PCIe switch is connected to the downstream port and when the >>>> enumeration flow tries to explore the presence of devices that >>>> don't really exist downstream of the switch, the downstream >>>> port receives the completions with UR set causing the 'Received >>>> Master Abort' bit set. >>> Are these the only possible ways this error is logged? I expected >>> them to be logged when we enumerate below a Root Port that has nothing >>> attached, for example. >> In this case, there won't be any TLP sent downstream. I talked about this >> scenario in the >> second paragraph of point (1) above. >>> Does clearing them in pci_scan_bridge_extend() cover all ways this >>> error might be logged during enumeration? I can't remember whether >>> all enumeration goes through this path. >> So far in my testing, clearing it in pci_scan_bridge_extend() covers all the >> cases. >> >>>> Clear 'Received Master Abort' bit to keep the bridge device in a clean >>>> state post enumeration. >>>> >>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com> >>>> --- >>>> V2: >>>> * Changed commit message based on Bjorn's feedback >>>> >>>> drivers/pci/probe.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >>>> index 795534589b98..640d2871b061 100644 >>>> --- a/drivers/pci/probe.c >>>> +++ b/drivers/pci/probe.c >>>> @@ -1470,6 +1470,9 @@ static int pci_scan_bridge_extend(struct pci_bus >>>> *bus, struct pci_dev *dev, >>>> } >>>> >>>> out: >>>> + /* Clear errors in the Secondary Status Register */ >>>> + pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff); >>>> + >>>> pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl); >>>> >>>> pm_runtime_put(&dev->dev); >>>> -- >>>> 2.25.1 >>>> >> >
On Tue, Jan 16, 2024 at 08:02:58PM +0530, Vidya Sagar wrote: > The enumeration process leaves the 'Received Master Abort' bit set in > the Secondary Status Register of the downstream port in the following > scenarios. > > (1) The device connected to the downstream port has ARI capability > and that makes the kernel set the 'ARI Forwarding Enable' bit in > the Device Control 2 Register of the downstream port. This > effectively makes the downstream port forward the configuration > requests targeting the devices downstream of it, even though they > don't exist in reality. It causes the downstream devices return > completions with UR set in the status in turn causing 'Received > Master Abort' bit set. > > In contrast, if the downstream device doesn't have ARI capability, > the 'ARI Forwarding Enable' bit in the downstream port is not set > and any configuration requests targeting the downstream devices > that don't exist are terminated (section 6.13 of PCI Express Base > 6.0 spec) in the downstream port itself resulting in no change of > the 'Received Master Abort' bit. > > (2) A PCIe switch is connected to the downstream port and when the > enumeration flow tries to explore the presence of devices that > don't really exist downstream of the switch, the downstream > port receives the completions with UR set causing the 'Received > Master Abort' bit set. > > Clear 'Received Master Abort' bit to keep the bridge device in a clean > state post enumeration. > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Applied to pci/enumeration for v6.10, thanks! I shortened the commit log because I think this happens all the time, not just in the specific cases you mentioned above: PCI: Clear Secondary Status errors after enumeration We enumerate devices by attempting config reads to the Vendor ID of each possible device. On conventional PCI, if no device responds, the read terminates with a Master Abort (PCI r3.0, sec 6.1). On PCIe, the config read is terminated as an Unsupported Request (PCIe r6.0, sec 2.3.2, 7.5.1.3.7). In either case, if the read addressed a device below a bridge, it is logged by setting "Received Master Abort" in the bridge Secondary Status register. Clear any errors logged in the Secondary Status register after enumeration. > --- > V2: > * Changed commit message based on Bjorn's feedback > > drivers/pci/probe.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 795534589b98..640d2871b061 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1470,6 +1470,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev, > } > > out: > + /* Clear errors in the Secondary Status Register */ > + pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff); > + > pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl); > > pm_runtime_put(&dev->dev); > -- > 2.25.1 >
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 795534589b98..640d2871b061 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1470,6 +1470,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev, } out: + /* Clear errors in the Secondary Status Register */ + pci_write_config_word(dev, PCI_SEC_STATUS, 0xffff); + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl); pm_runtime_put(&dev->dev);
The enumeration process leaves the 'Received Master Abort' bit set in the Secondary Status Register of the downstream port in the following scenarios. (1) The device connected to the downstream port has ARI capability and that makes the kernel set the 'ARI Forwarding Enable' bit in the Device Control 2 Register of the downstream port. This effectively makes the downstream port forward the configuration requests targeting the devices downstream of it, even though they don't exist in reality. It causes the downstream devices return completions with UR set in the status in turn causing 'Received Master Abort' bit set. In contrast, if the downstream device doesn't have ARI capability, the 'ARI Forwarding Enable' bit in the downstream port is not set and any configuration requests targeting the downstream devices that don't exist are terminated (section 6.13 of PCI Express Base 6.0 spec) in the downstream port itself resulting in no change of the 'Received Master Abort' bit. (2) A PCIe switch is connected to the downstream port and when the enumeration flow tries to explore the presence of devices that don't really exist downstream of the switch, the downstream port receives the completions with UR set causing the 'Received Master Abort' bit set. Clear 'Received Master Abort' bit to keep the bridge device in a clean state post enumeration. Signed-off-by: Vidya Sagar <vidyas@nvidia.com> --- V2: * Changed commit message based on Bjorn's feedback drivers/pci/probe.c | 3 +++ 1 file changed, 3 insertions(+)