diff mbox

pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't

Message ID 1477490014-115917-1-git-send-email-jthumshirn@suse.de
State Changes Requested
Headers show

Commit Message

Johannes Thumshirn Oct. 26, 2016, 1:53 p.m. UTC
The Read Completion Boundary bit must only be set on a device or endpoint if
it is set on the upstream bridge.

Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/pci/probe.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Comments

Hannes Reinecke Oct. 26, 2016, 1:58 p.m. UTC | #1
On 10/26/2016 03:53 PM, Johannes Thumshirn wrote:
> The Read Completion Boundary bit must only be set on a device or endpoint if
> it is set on the upstream bridge.
> 
> Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/pci/probe.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
This fixes a regression where the mlx4 driver would not load (and, in
fact, crash) on certain Ivybridge servers.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
Bjorn Helgaas Oct. 26, 2016, 7:43 p.m. UTC | #2
Hi Johannes,

On Wed, Oct 26, 2016 at 03:53:34PM +0200, Johannes Thumshirn wrote:
> The Read Completion Boundary bit must only be set on a device or endpoint if
> it is set on the upstream bridge.
> 
> Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")

Can you please include a spec citation and a pointer to the bug report?

> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/pci/probe.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ab00267..5007b96 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1439,6 +1439,16 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
>  		dev_warn(&dev->dev, "PCI-X settings not supported\n");
>  }
>  
> +static bool pcie_get_upstream_rcb(struct pci_dev *dev)
> +{
> +	struct pci_dev *bridge = pci_upstream_bridge(dev);
> +	u16 lnkctl;
> +
> +	pcie_capability_read_word(bridge, PCI_EXP_LNKCTL, &lnkctl);
> +
> +	return lnkctl & PCI_EXP_LNKCTL_RCB;
> +}
> +
>  static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>  {
>  	int pos;
> @@ -1468,9 +1478,21 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>  			~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
>  
>  	/* Initialize Link Control Register */
> -	if (pcie_cap_has_lnkctl(dev))
> +	if (pcie_cap_has_lnkctl(dev)) {
> +		bool us_rcb;
> +		u16 clear;
> +		u16 set;
> +
> +		us_rcb = pcie_get_upstream_rcb(dev);
> +
> +		clear = ~hpp->pci_exp_lnkctl_and;
> +		set = hpp->pci_exp_lnkctl_or;
> +		if (!us_rcb)
> +			set &= ~PCI_EXP_LNKCTL_RCB;
> +
>  		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> -			~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
> +						  clear, set);
> +	}
>  
>  	/* Find Advanced Error Reporting Enhanced Capability */
>  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> -- 
> 1.8.5.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Reinecke Oct. 27, 2016, 5:42 a.m. UTC | #3
On 10/26/2016 09:43 PM, Bjorn Helgaas wrote:
> Hi Johannes,
> 
> On Wed, Oct 26, 2016 at 03:53:34PM +0200, Johannes Thumshirn wrote:
>> The Read Completion Boundary bit must only be set on a device or endpoint if
>> it is set on the upstream bridge.
>>
>> Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
> 
> Can you please include a spec citation and a pointer to the bug report?
> 
PCI Express Base Specification 1.1,
section 2.3.1.1. Data Return for Read Requests:

The Read Completion Boundary (RCB) parameter determines the naturally
aligned address boundaries on which a Read Request may be serviced with
multiple Completions
o For a Root Complex, RCB is 64 bytes or 128 bytes
  o This value is reported through a configuration register
    (see Section 7.8)
  Note: Bridges and Endpoints may implement a corresponding command
  bit which may be set by system software to indicate the RCB value
  for the Root Complex, allowing the Bridge/Endpoint to optimize its
  behavior when the Root Complex’s RCB is 128 bytes.
o For all other system elements, RCB is 128 bytes

In this particular case the _HPX method was causing the RCB for all PCI
devices to be set to 128 bytes, while the root bridge remained at 64 bytes.
While this is arguably a BIOS bug, earlier linux version (ie without the
mentioned patch) were running fine, so this is actually a regression.

Cheers,

Hannes
Bjorn Helgaas Oct. 27, 2016, 11:51 a.m. UTC | #4
On Thu, Oct 27, 2016 at 07:42:27AM +0200, Hannes Reinecke wrote:
> On 10/26/2016 09:43 PM, Bjorn Helgaas wrote:
> > Hi Johannes,
> > 
> > On Wed, Oct 26, 2016 at 03:53:34PM +0200, Johannes Thumshirn wrote:
> >> The Read Completion Boundary bit must only be set on a device or endpoint if
> >> it is set on the upstream bridge.
> >>
> >> Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
> > 
> > Can you please include a spec citation and a pointer to the bug report?
> > 
> PCI Express Base Specification 1.1,
> section 2.3.1.1. Data Return for Read Requests:
> 
> The Read Completion Boundary (RCB) parameter determines the naturally
> aligned address boundaries on which a Read Request may be serviced with
> multiple Completions
> o For a Root Complex, RCB is 64 bytes or 128 bytes
>   o This value is reported through a configuration register
>     (see Section 7.8)
>   Note: Bridges and Endpoints may implement a corresponding command
>   bit which may be set by system software to indicate the RCB value
>   for the Root Complex, allowing the Bridge/Endpoint to optimize its
>   behavior when the Root Complex’s RCB is 128 bytes.
> o For all other system elements, RCB is 128 bytes
> 
> In this particular case the _HPX method was causing the RCB for all PCI
> devices to be set to 128 bytes, while the root bridge remained at 64 bytes.
> While this is arguably a BIOS bug, earlier linux version (ie without the
> mentioned patch) were running fine, so this is actually a regression.

Thanks!  I can fold this into the changelog.

I assume you didn't mention a bugzilla or similar URL because this was
found internally?  I'd still like a clue about what this issue looks
like to a user, because that helps connect future problem reports with
this fix.

And I suppose that since 7a1562d4f2d0 appeared in v3.18, we maybe
should consider marking the fix for stable?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Reinecke Oct. 27, 2016, 12:06 p.m. UTC | #5
On 10/27/2016 01:51 PM, Bjorn Helgaas wrote:
> On Thu, Oct 27, 2016 at 07:42:27AM +0200, Hannes Reinecke wrote:
>> On 10/26/2016 09:43 PM, Bjorn Helgaas wrote:
>>> Hi Johannes,
>>>
>>> On Wed, Oct 26, 2016 at 03:53:34PM +0200, Johannes Thumshirn wrote:
>>>> The Read Completion Boundary bit must only be set on a device or endpoint if
>>>> it is set on the upstream bridge.
>>>>
>>>> Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
>>>
>>> Can you please include a spec citation and a pointer to the bug report?
>>>
>> PCI Express Base Specification 1.1,
>> section 2.3.1.1. Data Return for Read Requests:
>>
>> The Read Completion Boundary (RCB) parameter determines the naturally
>> aligned address boundaries on which a Read Request may be serviced with
>> multiple Completions
>> o For a Root Complex, RCB is 64 bytes or 128 bytes
>>   o This value is reported through a configuration register
>>     (see Section 7.8)
>>   Note: Bridges and Endpoints may implement a corresponding command
>>   bit which may be set by system software to indicate the RCB value
>>   for the Root Complex, allowing the Bridge/Endpoint to optimize its
>>   behavior when the Root Complex’s RCB is 128 bytes.
>> o For all other system elements, RCB is 128 bytes
>>
>> In this particular case the _HPX method was causing the RCB for all PCI
>> devices to be set to 128 bytes, while the root bridge remained at 64 bytes.
>> While this is arguably a BIOS bug, earlier linux version (ie without the
>> mentioned patch) were running fine, so this is actually a regression.
> 
> Thanks!  I can fold this into the changelog.
> 
> I assume you didn't mention a bugzilla or similar URL because this was
> found internally?  I'd still like a clue about what this issue looks
> like to a user, because that helps connect future problem reports with
> this fix.
> 
We do have a bugzilla report, but as this is a) on the SUSE internal
bugzilla and b) a customer issue I didn't include a reference here.

However, the symptoms are:

[    8.648872] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[    8.648889] mlx4_core: Initializing 0000:41:00.0
[   10.068642] mlx4_core 0000:41:00.0: command 0xfff failed: fw status = 0x1
[   10.068645] mlx4_core 0000:41:00.0: MAP_FA command failed, aborting
[   10.068659] mlx4_core 0000:41:00.0: Failed to start FW, aborting
[   10.068661] mlx4_core 0000:41:00.0: Failed to init fw, aborting.
[   11.071536] mlx4_core: probe of 0000:41:00.0 failed with error -5

> And I suppose that since 7a1562d4f2d0 appeared in v3.18, we maybe
> should consider marking the fix for stable?
> 
Yes, please.

Cheers,

Hannes
Johannes Thumshirn Oct. 27, 2016, 12:25 p.m. UTC | #6
On Thu, Oct 27, 2016 at 06:51:52AM -0500, Bjorn Helgaas wrote:
> On Thu, Oct 27, 2016 at 07:42:27AM +0200, Hannes Reinecke wrote:
> > On 10/26/2016 09:43 PM, Bjorn Helgaas wrote:
> > > Hi Johannes,
> > > 
> > > On Wed, Oct 26, 2016 at 03:53:34PM +0200, Johannes Thumshirn wrote:
> > >> The Read Completion Boundary bit must only be set on a device or endpoint if
> > >> it is set on the upstream bridge.
> > >>
> > >> Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
> > > 
> > > Can you please include a spec citation and a pointer to the bug report?
> > > 
> > PCI Express Base Specification 1.1,
> > section 2.3.1.1. Data Return for Read Requests:
> > 
> > The Read Completion Boundary (RCB) parameter determines the naturally
> > aligned address boundaries on which a Read Request may be serviced with
> > multiple Completions
> > o For a Root Complex, RCB is 64 bytes or 128 bytes
> >   o This value is reported through a configuration register
> >     (see Section 7.8)
> >   Note: Bridges and Endpoints may implement a corresponding command
> >   bit which may be set by system software to indicate the RCB value
> >   for the Root Complex, allowing the Bridge/Endpoint to optimize its
> >   behavior when the Root Complex’s RCB is 128 bytes.
> > o For all other system elements, RCB is 128 bytes
> > 
> > In this particular case the _HPX method was causing the RCB for all PCI
> > devices to be set to 128 bytes, while the root bridge remained at 64 bytes.
> > While this is arguably a BIOS bug, earlier linux version (ie without the
> > mentioned patch) were running fine, so this is actually a regression.
> 
> Thanks!  I can fold this into the changelog.
> 
> I assume you didn't mention a bugzilla or similar URL because this was
> found internally?  I'd still like a clue about what this issue looks
> like to a user, because that helps connect future problem reports with
> this fix.

Sharing the bugzilla number won't be of any help here, as it's not accessible.
The user visible issue was, that mlx4_core.ko wasn't able to issue the
MLX4_CMD_MAP_FA command to it's firmware resulting in the adapter undergoing
it's error recovery and ultimately hitting a BUG_ON() (which is subject to
another patch). Even without asserting it won't load and hence not provide any
network service. 

We haven't seen this behaviour on other drivers and it is likely a BIOS
error (the wrong setting of PCI_EXP_LNKCTL_RCB) but that wasn't an issue
before 7a1562d4f2d0 hence we do have a regression to kernels < 3.18.

> 
> And I suppose that since 7a1562d4f2d0 appeared in v3.18, we maybe
> should consider marking the fix for stable?

Yes probably, but on the other hand we haven't heard of any other cases than
this one (mlx4 NIC/HCA with a particular IBM IvyBridge Server, the next generation
didn't expose the bug).

Btw: Did you see I've sent a v2 as Alex Graf pointed out there might be a
possible NULL pointer dereference when accessing the "bridge" pointer in
pcie_get_upstream_rcb()?

Thanks,
	Johannes
diff mbox

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ab00267..5007b96 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1439,6 +1439,16 @@  static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
 		dev_warn(&dev->dev, "PCI-X settings not supported\n");
 }
 
+static bool pcie_get_upstream_rcb(struct pci_dev *dev)
+{
+	struct pci_dev *bridge = pci_upstream_bridge(dev);
+	u16 lnkctl;
+
+	pcie_capability_read_word(bridge, PCI_EXP_LNKCTL, &lnkctl);
+
+	return lnkctl & PCI_EXP_LNKCTL_RCB;
+}
+
 static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 {
 	int pos;
@@ -1468,9 +1478,21 @@  static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 			~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
 
 	/* Initialize Link Control Register */
-	if (pcie_cap_has_lnkctl(dev))
+	if (pcie_cap_has_lnkctl(dev)) {
+		bool us_rcb;
+		u16 clear;
+		u16 set;
+
+		us_rcb = pcie_get_upstream_rcb(dev);
+
+		clear = ~hpp->pci_exp_lnkctl_and;
+		set = hpp->pci_exp_lnkctl_or;
+		if (!us_rcb)
+			set &= ~PCI_EXP_LNKCTL_RCB;
+
 		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
-			~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
+						  clear, set);
+	}
 
 	/* Find Advanced Error Reporting Enhanced Capability */
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);