diff mbox

[2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't

Message ID 20161102223552.14776-2-jthumshirn@suse.de
State Changes Requested
Headers show

Commit Message

Johannes Thumshirn Nov. 2, 2016, 10:35 p.m. UTC
The Read Completion Boundary (RCB) bit must only be set on a device or
endpoint if it is set on the root complex.

Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
even if it is not set on the root port. This is a violation to the PCIe
Specification and is known to bring some Mellanox Connect-X 3 HCAs into
a state where they can't map their firmware and go into error recovery.

BIOS Information
	Vendor: IBM
	Version: -[A8E120CUS-1.30]-
	Release Date: 08/22/2016

From PCI Express Base Specification 1.1,
section 2.3.1.1. Data Return for Read Requests:
The Read Completion Boundary (RCB) parameter determines the naturally
aligned address boundaries on which a Read Request may be serviced with
multiple Completions
o For a Root Complex, RCB is 64 bytes or 128 bytes
  o This value is reported through a configuration register
    (see Section 7.8)
  Note: Bridges and Endpoints may implement a corresponding command
  bit which may be set by system software to indicate the RCB value
  for the Root Complex, allowing the Bridge/Endpoint to optimize its
  behavior when the Root Complex’s RCB is 128 bytes.
o For all other system elements, RCB is 128 bytes

Table 7-16: Link Control Register:
Configuration software must only Set this bit if the Root Port
Upstream from the Endpoint or Bridge reports an RCB value of
128 bytes (a value of 1b in the Read Completion Boundary bit).
Default value of this bit is 0b.

Functions that do not implement this feature must hardwire the
bit to 0b.

Before commit 7a1562d4f:
> 41:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
>
> 40:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-

After:
> 40:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
>
> 41:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> 		LnkCtl:	ASPM Disabled; RCB 128 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
 drivers/pci/probe.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

Comments

Bjorn Helgaas Nov. 9, 2016, 5:11 p.m. UTC | #1
Hi Johannes,

On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> The Read Completion Boundary (RCB) bit must only be set on a device or
> endpoint if it is set on the root complex.
> 
> Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
> even if it is not set on the root port. This is a violation to the PCIe
> Specification and is known to bring some Mellanox Connect-X 3 HCAs into
> a state where they can't map their firmware and go into error recovery.
> 
> BIOS Information
> 	Vendor: IBM
> 	Version: -[A8E120CUS-1.30]-
> 	Release Date: 08/22/2016

This seems like a pretty serious problem (sounds like maybe the HCA is
completely useless?)

Can you point us at a bugzilla or other problem report?  It's nice to
have details of what this looks like to a user, so people who trip
over this problem have a little more chance of finding the solution.

7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
with a link") appeared in v3.18, so it's probably not a *new* problem,
so my guess is that this is v4.10 material.

> From PCI Express Base Specification 1.1,
> section 2.3.1.1. Data Return for Read Requests:
> The Read Completion Boundary (RCB) parameter determines the naturally
> aligned address boundaries on which a Read Request may be serviced with
> multiple Completions
> o For a Root Complex, RCB is 64 bytes or 128 bytes
>   o This value is reported through a configuration register
>     (see Section 7.8)
>   Note: Bridges and Endpoints may implement a corresponding command
>   bit which may be set by system software to indicate the RCB value
>   for the Root Complex, allowing the Bridge/Endpoint to optimize its
>   behavior when the Root Complex’s RCB is 128 bytes.
> o For all other system elements, RCB is 128 bytes
> 
> Table 7-16: Link Control Register:
> Configuration software must only Set this bit if the Root Port
> Upstream from the Endpoint or Bridge reports an RCB value of
> 128 bytes (a value of 1b in the Read Completion Boundary bit).
> Default value of this bit is 0b.
> 
> Functions that do not implement this feature must hardwire the
> bit to 0b.
> 
> Before commit 7a1562d4f:
> > 41:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> > 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
> >
> > 40:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
> > 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
> 
> After:
> > 40:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
> > 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
> >
> > 41:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> > 		LnkCtl:	ASPM Disabled; RCB 128 bytes Disabled- CommClk+
> > 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 
> Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/pci/probe.c | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ab00267..0a4ab9c 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1439,6 +1439,19 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
>  		dev_warn(&dev->dev, "PCI-X settings not supported\n");
>  }
>  
> +static bool pcie_get_root_rcb(struct pci_dev *dev)
> +{
> +	struct pci_dev *rp = pcie_find_root_port(dev);
> +	u16 lnkctl;
> +
> +	if (!rp)
> +		return false;
> +
> +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> +
> +	return lnkctl & PCI_EXP_LNKCTL_RCB;
> +}
> +
>  static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>  {
>  	int pos;
> @@ -1468,9 +1481,21 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>  			~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
>  
>  	/* Initialize Link Control Register */
> -	if (pcie_cap_has_lnkctl(dev))
> +	if (pcie_cap_has_lnkctl(dev)) {
> +		bool rrcb;
> +		u16 clear;
> +		u16 set;
> +
> +		rrcb = pcie_get_root_rcb(dev);
> +
> +		clear = ~hpp->pci_exp_lnkctl_and;
> +		set = hpp->pci_exp_lnkctl_or;
> +		if (!rrcb)
> +			set &= ~PCI_EXP_LNKCTL_RCB;
> +
>  		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> -			~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
> +						  clear, set);
> +	}
>  
>  	/* Find Advanced Error Reporting Enhanced Capability */
>  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> -- 
> 2.10.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Thumshirn Nov. 14, 2016, 11:56 a.m. UTC | #2
On Wed, Nov 09, 2016 at 11:11:40AM -0600, Bjorn Helgaas wrote:
> Hi Johannes,
> 
> On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > The Read Completion Boundary (RCB) bit must only be set on a device or
> > endpoint if it is set on the root complex.
> > 
> > Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
> > even if it is not set on the root port. This is a violation to the PCIe
> > Specification and is known to bring some Mellanox Connect-X 3 HCAs into
> > a state where they can't map their firmware and go into error recovery.
> > 
> > BIOS Information
> > 	Vendor: IBM
> > 	Version: -[A8E120CUS-1.30]-
> > 	Release Date: 08/22/2016
> 
> This seems like a pretty serious problem (sounds like maybe the HCA is
> completely useless?)

Correct.

> 
> Can you point us at a bugzilla or other problem report?  It's nice to
> have details of what this looks like to a user, so people who trip
> over this problem have a little more chance of finding the solution.

As we already said, our bugzilla entry for this is not accessible from the
outside, but I know Red Hat does have a bugzilla entry for the same issue as
well. Maybe this is reachable from the outside (adding Don for this, as I know
he has worked on this problem as well).

> 
> 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
> with a link") appeared in v3.18, so it's probably not a *new* problem,
> so my guess is that this is v4.10 material.

Yes 4.10 sounds good to me. I personally think, this problem hasn't
materialized yet, as this is the kind of hardware you run on a rather /stable/
kernel either you built on your own or get from an enterprise distribution and
until recently these kernels haven't been updated to something newer than
3.18.

Thanks,
	Johannes
Don Dutile Nov. 14, 2016, 4:16 p.m. UTC | #3
On 11/14/2016 06:56 AM, Johannes Thumshirn wrote:
> On Wed, Nov 09, 2016 at 11:11:40AM -0600, Bjorn Helgaas wrote:
>> Hi Johannes,
>>
>> On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
>>> The Read Completion Boundary (RCB) bit must only be set on a device or
>>> endpoint if it is set on the root complex.
>>>
>>> Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
>>> even if it is not set on the root port. This is a violation to the PCIe
>>> Specification and is known to bring some Mellanox Connect-X 3 HCAs into
>>> a state where they can't map their firmware and go into error recovery.
>>>
>>> BIOS Information
>>> 	Vendor: IBM
>>> 	Version: -[A8E120CUS-1.30]-
>>> 	Release Date: 08/22/2016
>>
>> This seems like a pretty serious problem (sounds like maybe the HCA is
>> completely useless?)
>
> Correct.
>
>>
>> Can you point us at a bugzilla or other problem report?  It's nice to
>> have details of what this looks like to a user, so people who trip
>> over this problem have a little more chance of finding the solution.
>
> As we already said, our bugzilla entry for this is not accessible from the
> outside, but I know Red Hat does have a bugzilla entry for the same issue as
> well. Maybe this is reachable from the outside (adding Don for this, as I know
> he has worked on this problem as well).
>
RHEL bz's are not accessible from the outside.
I suggest capturing the content of the RH bz issue and creating a k.o. bz
with the information.

>>
>> 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
>> with a link") appeared in v3.18, so it's probably not a *new* problem,
>> so my guess is that this is v4.10 material.
>
> Yes 4.10 sounds good to me. I personally think, this problem hasn't
> materialized yet, as this is the kind of hardware you run on a rather /stable/
> kernel either you built on your own or get from an enterprise distribution and
> until recently these kernels haven't been updated to something newer than
> 3.18.
>
> Thanks,
> 	Johannes
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Thumshirn Nov. 15, 2016, 12:58 p.m. UTC | #4
On Mon, Nov 14, 2016 at 11:16:51AM -0500, Don Dutile wrote:
> On 11/14/2016 06:56 AM, Johannes Thumshirn wrote:
> > On Wed, Nov 09, 2016 at 11:11:40AM -0600, Bjorn Helgaas wrote:
> > > Hi Johannes,
> > > 
> > > On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > > > The Read Completion Boundary (RCB) bit must only be set on a device or
> > > > endpoint if it is set on the root complex.
> > > > 
> > > > Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
> > > > even if it is not set on the root port. This is a violation to the PCIe
> > > > Specification and is known to bring some Mellanox Connect-X 3 HCAs into
> > > > a state where they can't map their firmware and go into error recovery.
> > > > 
> > > > BIOS Information
> > > > 	Vendor: IBM
> > > > 	Version: -[A8E120CUS-1.30]-
> > > > 	Release Date: 08/22/2016
> > > 
> > > This seems like a pretty serious problem (sounds like maybe the HCA is
> > > completely useless?)
> > 
> > Correct.
> > 
> > > 
> > > Can you point us at a bugzilla or other problem report?  It's nice to
> > > have details of what this looks like to a user, so people who trip
> > > over this problem have a little more chance of finding the solution.
> > 
> > As we already said, our bugzilla entry for this is not accessible from the
> > outside, but I know Red Hat does have a bugzilla entry for the same issue as
> > well. Maybe this is reachable from the outside (adding Don for this, as I know
> > he has worked on this problem as well).
> > 
> RHEL bz's are not accessible from the outside.
> I suggest capturing the content of the RH bz issue and creating a k.o. bz
> with the information.

I've created https://bugzilla.kernel.org/show_bug.cgi?id=187781 to track this
on b.k.o. Feel free to add any information you have.

@Bjorn anything else I can provide in order to get the fix applied?

Byte,
 	Johannes
diff mbox

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ab00267..0a4ab9c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1439,6 +1439,19 @@  static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
 		dev_warn(&dev->dev, "PCI-X settings not supported\n");
 }
 
+static bool pcie_get_root_rcb(struct pci_dev *dev)
+{
+	struct pci_dev *rp = pcie_find_root_port(dev);
+	u16 lnkctl;
+
+	if (!rp)
+		return false;
+
+	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
+
+	return lnkctl & PCI_EXP_LNKCTL_RCB;
+}
+
 static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 {
 	int pos;
@@ -1468,9 +1481,21 @@  static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 			~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
 
 	/* Initialize Link Control Register */
-	if (pcie_cap_has_lnkctl(dev))
+	if (pcie_cap_has_lnkctl(dev)) {
+		bool rrcb;
+		u16 clear;
+		u16 set;
+
+		rrcb = pcie_get_root_rcb(dev);
+
+		clear = ~hpp->pci_exp_lnkctl_and;
+		set = hpp->pci_exp_lnkctl_or;
+		if (!rrcb)
+			set &= ~PCI_EXP_LNKCTL_RCB;
+
 		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
-			~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
+						  clear, set);
+	}
 
 	/* Find Advanced Error Reporting Enhanced Capability */
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);