diff mbox

PCI: update device mps when doing pci hotplug

Message ID 1406621877-12022-1-git-send-email-wangyijing@huawei.com
State Changes Requested
Headers show

Commit Message

Yijing Wang July 29, 2014, 8:17 a.m. UTC
Currently we don't update device's mps value when doing
pci device hot-add. The hot-added device's mps will be set
to default value (128B). But the upstream port device's mps
may be larger than 128B which was set by firmware during
system bootup. In this case the new added device may not
work normally. This issue was found in huawei 5885 server
and Dell R620 server. And if we run the platform with windows,
this problem is gone. This patch try to update the hot added
device mps equal to its parent mps, if device mpss < parent mps,
print warning.

References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
Reported-by: Keith Busch <keith.busch@intel.com>
Reported-by: Jordan_Hargrave@Dell.com
Reported-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
---
 drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

Comments

Alex Williamson July 29, 2014, 4:18 p.m. UTC | #1
On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
> Currently we don't update device's mps value when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally.

Apologies if we rehash some previously discussed topics while I try to
cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
The device should be functional with a lower mps setting, right?

>  This issue was found in huawei 5885 server
> and Dell R620 server. And if we run the platform with windows,
> this problem is gone. This patch try to update the hot added
> device mps equal to its parent mps, if device mpss < parent mps,
> print warning.
> 
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> Reported-by: Keith Busch <keith.busch@intel.com>
> Reported-by: Jordan_Hargrave@Dell.com
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> ---
>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e3cf8a2..583ca52 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>  		dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>  }
>  
> +/**
> + * pcie_bus_update_set - update device mps when device doing hot-add
> + * @dev: PCI device to set
> + * 
> + * After device hot add, mps will be set to default(128B), But the 
> + * upstream port device's mps may be larger than 128B which was set 
> + * by firmware during system bootup. Then we should update the device
> + * mps to equal to its parent mps, Or the device can not work normally.
> + */
> +static void pcie_bus_update_set(struct pci_dev *dev)
> +{
> +	int mps, p_mps, mpss;
> +	struct pci_dev *parent;
> +
> +	if (!pci_is_pcie(dev) || !dev->bus->self 
> +			|| !dev->bus->self->is_hotplug_bridge)
> +		return;
> +	
> +	parent = dev->bus->self;
> +	mps = pcie_get_mps(dev);
> +	p_mps = pcie_get_mps(parent);
> +
> +	if (mps >= p_mps)
> +		return;
> +
> +	mpss = 128 << dev->pcie_mpss;
> +	if (mpss < p_mps) {
> +		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
> +				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
> +				mpss, p_mps);
> +		return;
> +	}
> +
> +	pcie_write_mps(dev, p_mps);
> +	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
> +			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
> +}

So if the device mps is less than the parent mps and the device supports
the parent mps, we update the device.  If the device cannot support the
parent mps, warn.  Why do we bypass the opportunity to reduce the device
mps if it exceeds the parent mps?

> +
>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>  {
>  	struct pci_dev *bridge = dev->bus->self;
> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>  		return 0;
>  
>  	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +		pcie_bus_update_set(dev);
>  		pcie_bus_detect_mps(dev);
>  		return 0;
>  	}

pcie_bus_update_set() and pcie_bus_detect_mps() have a lot of
redundancy, can't we merge this new functionality into the existing
function?  Also, we're in the PCIE_BUS_TUNE_OFF branch, but we seem to
be adding code which would imply PCIE_BUS_PERFORMANCE since we're
bringing the device up to an optimal mps to match the parent.  Is there
a simpler solution to simply downgrade the dev_warn in
pcie_bus_detect_mps() to dev_info and change the text?  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keith Busch July 29, 2014, 4:30 p.m. UTC | #2
On Tue, 29 Jul 2014, Alex Williamson wrote:
> On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
>> Currently we don't update device's mps value when doing
>> pci device hot-add. The hot-added device's mps will be set
>> to default value (128B). But the upstream port device's mps
>> may be larger than 128B which was set by firmware during
>> system bootup. In this case the new added device may not
>> work normally.
>
> Apologies if we rehash some previously discussed topics while I try to
> cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
> The device should be functional with a lower mps setting, right?

You'd think so, but some platforms don't work. A pci-e trace showed
TLPs exceeding MPS when parent device at 256B and the end device left
at 128B. Even if that's a platform bug, I think we still want it to work.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson July 29, 2014, 4:42 p.m. UTC | #3
On Tue, 2014-07-29 at 10:30 -0600, Keith Busch wrote:
> On Tue, 29 Jul 2014, Alex Williamson wrote:
> > On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
> >> Currently we don't update device's mps value when doing
> >> pci device hot-add. The hot-added device's mps will be set
> >> to default value (128B). But the upstream port device's mps
> >> may be larger than 128B which was set by firmware during
> >> system bootup. In this case the new added device may not
> >> work normally.
> >
> > Apologies if we rehash some previously discussed topics while I try to
> > cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
> > The device should be functional with a lower mps setting, right?
> 
> You'd think so, but some platforms don't work. A pci-e trace showed
> TLPs exceeding MPS when parent device at 256B and the end device left
> at 128B. Even if that's a platform bug, I think we still want it to work.

But if it's a platform bug for a non-compliant device, should it be
handled as a quirk rather than standard configuration?  Thanks,

Alex


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keith Busch July 29, 2014, 7:04 p.m. UTC | #4
On Tue, 29 Jul 2014, Alex Williamson wrote:
> On Tue, 2014-07-29 at 10:30 -0600, Keith Busch wrote:
>> On Tue, 29 Jul 2014, Alex Williamson wrote:
>>> On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
>>>> Currently we don't update device's mps value when doing
>>>> pci device hot-add. The hot-added device's mps will be set
>>>> to default value (128B). But the upstream port device's mps
>>>> may be larger than 128B which was set by firmware during
>>>> system bootup. In this case the new added device may not
>>>> work normally.
>>>
>>> Apologies if we rehash some previously discussed topics while I try to
>>> cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
>>> The device should be functional with a lower mps setting, right?
>>
>> You'd think so, but some platforms don't work. A pci-e trace showed
>> TLPs exceeding MPS when parent device at 256B and the end device left
>> at 128B. Even if that's a platform bug, I think we still want it to work.
>
> But if it's a platform bug for a non-compliant device, should it be
> handled as a quirk rather than standard configuration?  Thanks,

I'm not even sure it is a platform bug, but that was just a guess. The
way the devices are configured, it appears they behave inline with the
spec (from table 7-13):

   "
   Max_Payload_size -- This field sets maximum TLP payload size for the
   Function. As a Receiver, the Function must handle TLPs as large as the set
   value. As a Transmitter, the Function must not generate TLPs exceeding the
   set value.
   "

It sounds like it allows a transmitter to generate a TLP that the receiver
can't handle, but I don't know if that was the intent. :)
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 30, 2014, 3:27 a.m. UTC | #5
On 2014/7/30 0:18, Alex Williamson wrote:
> On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
>> Currently we don't update device's mps value when doing
>> pci device hot-add. The hot-added device's mps will be set
>> to default value (128B). But the upstream port device's mps
>> may be larger than 128B which was set by firmware during
>> system bootup. In this case the new added device may not
>> work normally.
> 
> Apologies if we rehash some previously discussed topics while I try to
> cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
> The device should be functional with a lower mps setting, right?

No, the device can not work, because some pcie tlp packets will be discarded.
Sorry for my poor English.
> 
>>  This issue was found in huawei 5885 server
>> and Dell R620 server. And if we run the platform with windows,
>> this problem is gone. This patch try to update the hot added
>> device mps equal to its parent mps, if device mpss < parent mps,
>> print warning.
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>> Reported-by: Keith Busch <keith.busch@intel.com>
>> Reported-by: Jordan_Hargrave@Dell.com
>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Cc: Jon Mason <jdmason@kudzu.us>
>> ---
>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index e3cf8a2..583ca52 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>  		dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>  }
>>  
>> +/**
>> + * pcie_bus_update_set - update device mps when device doing hot-add
>> + * @dev: PCI device to set
>> + * 
>> + * After device hot add, mps will be set to default(128B), But the 
>> + * upstream port device's mps may be larger than 128B which was set 
>> + * by firmware during system bootup. Then we should update the device
>> + * mps to equal to its parent mps, Or the device can not work normally.
>> + */
>> +static void pcie_bus_update_set(struct pci_dev *dev)
>> +{
>> +	int mps, p_mps, mpss;
>> +	struct pci_dev *parent;
>> +
>> +	if (!pci_is_pcie(dev) || !dev->bus->self 
>> +			|| !dev->bus->self->is_hotplug_bridge)
>> +		return;
>> +	
>> +	parent = dev->bus->self;
>> +	mps = pcie_get_mps(dev);
>> +	p_mps = pcie_get_mps(parent);
>> +
>> +	if (mps >= p_mps)
>> +		return;
>> +
>> +	mpss = 128 << dev->pcie_mpss;
>> +	if (mpss < p_mps) {
>> +		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>> +				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>> +				mpss, p_mps);
>> +		return;
>> +	}
>> +
>> +	pcie_write_mps(dev, p_mps);
>> +	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
>> +			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>> +}
> 
> So if the device mps is less than the parent mps and the device supports
> the parent mps, we update the device.  If the device cannot support the
> parent mps, warn.  Why do we bypass the opportunity to reduce the device
> mps if it exceeds the parent mps?

Exactly, the device's mps will never larger than its parent's. That's unexpected, so we leave it.

> 
>> +
>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>  {
>>  	struct pci_dev *bridge = dev->bus->self;
>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>  		return 0;
>>  
>>  	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>> +		pcie_bus_update_set(dev);
>>  		pcie_bus_detect_mps(dev);
>>  		return 0;
>>  	}
> 
> pcie_bus_update_set() and pcie_bus_detect_mps() have a lot of
> redundancy, can't we merge this new functionality into the existing
> function? 

OK, will do.

> Also, we're in the PCIE_BUS_TUNE_OFF branch, but we seem to
> be adding code which would imply PCIE_BUS_PERFORMANCE since we're
> bringing the device up to an optimal mps to match the parent.  Is there
> a simpler solution to simply downgrade the dev_warn in
> pcie_bus_detect_mps() to dev_info and change the text?  Thanks,

We just to adjust the device's mps to make the device can work, not for an
optimal mps.

> 
> Alex
> 
> 
> .
>
Ethan Zhao July 30, 2014, 3:33 a.m. UTC | #6
Yijing,
    Seems you want to workaround platform bug in generic PCIe hotplug
code, while not specific to some platforms or devices, that is not
safe/fair to other vendor's platform.
and the updating to MPS of device is out of PCIe specification.
    So the best way is to fix within platform, at least, any platform
specific way in Linux code.

Thanks,
Ethan

On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> Currently we don't update device's mps value when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally. This issue was found in huawei 5885 server
> and Dell R620 server. And if we run the platform with windows,
> this problem is gone. This patch try to update the hot added
> device mps equal to its parent mps, if device mpss < parent mps,
> print warning.
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> Reported-by: Keith Busch <keith.busch@intel.com>
> Reported-by: Jordan_Hargrave@Dell.com
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> ---
>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e3cf8a2..583ca52 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>  }
>
> +/**
> + * pcie_bus_update_set - update device mps when device doing hot-add
> + * @dev: PCI device to set
> + *
> + * After device hot add, mps will be set to default(128B), But the
> + * upstream port device's mps may be larger than 128B which was set
> + * by firmware during system bootup. Then we should update the device
> + * mps to equal to its parent mps, Or the device can not work normally.
> + */
> +static void pcie_bus_update_set(struct pci_dev *dev)
> +{
> +       int mps, p_mps, mpss;
> +       struct pci_dev *parent;
> +
> +       if (!pci_is_pcie(dev) || !dev->bus->self
> +                       || !dev->bus->self->is_hotplug_bridge)
> +               return;
> +
> +       parent = dev->bus->self;
> +       mps = pcie_get_mps(dev);
> +       p_mps = pcie_get_mps(parent);
> +
> +       if (mps >= p_mps)
> +               return;
> +
> +       mpss = 128 << dev->pcie_mpss;
> +       if (mpss < p_mps) {
> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
> +                               mpss, p_mps);
> +               return;
> +       }
> +
> +       pcie_write_mps(dev, p_mps);
> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
> +}
> +
>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>  {
>         struct pci_dev *bridge = dev->bus->self;
> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>                 return 0;
>
>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +               pcie_bus_update_set(dev);
>                 pcie_bus_detect_mps(dev);
>                 return 0;
>         }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 30, 2014, 3:35 a.m. UTC | #7
On 2014/7/30 0:42, Alex Williamson wrote:
> On Tue, 2014-07-29 at 10:30 -0600, Keith Busch wrote:
>> On Tue, 29 Jul 2014, Alex Williamson wrote:
>>> On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
>>>> Currently we don't update device's mps value when doing
>>>> pci device hot-add. The hot-added device's mps will be set
>>>> to default value (128B). But the upstream port device's mps
>>>> may be larger than 128B which was set by firmware during
>>>> system bootup. In this case the new added device may not
>>>> work normally.
>>>
>>> Apologies if we rehash some previously discussed topics while I try to
>>> cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
>>> The device should be functional with a lower mps setting, right?
>>
>> You'd think so, but some platforms don't work. A pci-e trace showed
>> TLPs exceeding MPS when parent device at 256B and the end device left
>> at 128B. Even if that's a platform bug, I think we still want it to work.
> 
> But if it's a platform bug for a non-compliant device, should it be
> handled as a quirk rather than standard configuration?  Thanks,

This is not a platform bug, in PCIe 3.0 spec 7.8.4, page 618, it said
Software can control this to achieve some purposes.


IMPLEMENTATION NOTE
Use of Max_Payload_Size
The Max_Payload_Size mechanism allows software to control the maximum payload in packets sent
by Endpoints to balance latency versus bandwidth trade-offs, particularly for isochronous traffic.
If software chooses to program the Max_Payload_Size of various System Elements to non-default
values, it must take care to ensure that each packet does not exceed the Max_Payload_Size
parameter of any System Element along the packet's path.  Otherwise, the packet will be rejected by  20
the System Element whose Max_Payload_Size parameter is too small.
Discussion of specific algorithms used to configure Max_Payload_Size to meet this requirement is
beyond the scope of this specification, but software should base its algorithm upon factors such as
the following:
‰  the Max_Payload_Size capability of each System Element within a hierarchy  25
‰  awareness of when System Elements are added or removed through Hot-Plug operations
‰  knowing which System Elements send packets to each other, what type of traffic is carried, what
type of transactions are used, or if packet sizes are constrained by other mechanisms

For the case of firmware that configures System Elements in preparation for running legacy
operating system environments, the firmware may need to avoid programming a Max_Payload_Size
above the default of 128 bytes, which is the minimum supported by Endpoints.
For example, if the operating system environment does not comprehend PCI Express, firmware
probably should not program a non-default Max_Payload_Size for a hierarchy that supports Hot- 5
Plug operations.  Otherwise, if no software is present to manage Max_Payload_Size settings when a
new element is added, improper operation may result.  Note that a newly added element may not
even support a Max_Payload_Size setting as large as the rest of the hierarchy, in which case software
may need to deny enabling the new element or reduce the Max_Payload_Size settings of other
elements.

> 
> Alex
> 
> 
> 
> .
>
Yijing Wang July 30, 2014, 3:42 a.m. UTC | #8
On 2014/7/30 11:33, Ethan Zhao wrote:
> Yijing,
>     Seems you want to workaround platform bug in generic PCIe hotplug
> code, while not specific to some platforms or devices, that is not
> safe/fair to other vendor's platform.
> and the updating to MPS of device is out of PCIe specification.
>     So the best way is to fix within platform, at least, any platform
> specific way in Linux code.

No, this is not a platform bug, and this is safe, you can refer to my last reply
to Alex.

> 
> Thanks,
> Ethan
> 
> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Currently we don't update device's mps value when doing
>> pci device hot-add. The hot-added device's mps will be set
>> to default value (128B). But the upstream port device's mps
>> may be larger than 128B which was set by firmware during
>> system bootup. In this case the new added device may not
>> work normally. This issue was found in huawei 5885 server
>> and Dell R620 server. And if we run the platform with windows,
>> this problem is gone. This patch try to update the hot added
>> device mps equal to its parent mps, if device mpss < parent mps,
>> print warning.
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>> Reported-by: Keith Busch <keith.busch@intel.com>
>> Reported-by: Jordan_Hargrave@Dell.com
>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Cc: Jon Mason <jdmason@kudzu.us>
>> ---
>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index e3cf8a2..583ca52 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>  }
>>
>> +/**
>> + * pcie_bus_update_set - update device mps when device doing hot-add
>> + * @dev: PCI device to set
>> + *
>> + * After device hot add, mps will be set to default(128B), But the
>> + * upstream port device's mps may be larger than 128B which was set
>> + * by firmware during system bootup. Then we should update the device
>> + * mps to equal to its parent mps, Or the device can not work normally.
>> + */
>> +static void pcie_bus_update_set(struct pci_dev *dev)
>> +{
>> +       int mps, p_mps, mpss;
>> +       struct pci_dev *parent;
>> +
>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>> +                       || !dev->bus->self->is_hotplug_bridge)
>> +               return;
>> +
>> +       parent = dev->bus->self;
>> +       mps = pcie_get_mps(dev);
>> +       p_mps = pcie_get_mps(parent);
>> +
>> +       if (mps >= p_mps)
>> +               return;
>> +
>> +       mpss = 128 << dev->pcie_mpss;
>> +       if (mpss < p_mps) {
>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>> +                               mpss, p_mps);
>> +               return;
>> +       }
>> +
>> +       pcie_write_mps(dev, p_mps);
>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>> +}
>> +
>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>  {
>>         struct pci_dev *bridge = dev->bus->self;
>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>                 return 0;
>>
>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>> +               pcie_bus_update_set(dev);
>>                 pcie_bus_detect_mps(dev);
>>                 return 0;
>>         }
>> --
>> 1.7.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
>
Ethan Zhao July 30, 2014, 3:58 a.m. UTC | #9
> 在 2014年7月30日,上午11:42,Yijing Wang <wangyijing@huawei.com> 写道:
> 
>> On 2014/7/30 11:33, Ethan Zhao wrote:
>> Yijing,
>>    Seems you want to workaround platform bug in generic PCIe hotplug
>> code, while not specific to some platforms or devices, that is not
>> safe/fair to other vendor's platform.
>> and the updating to MPS of device is out of PCIe specification.
>>    So the best way is to fix within platform, at least, any platform
>> specific way in Linux code.
> 
> No, this is not a platform bug, and this is safe, you can refer to my last reply
> to Alex.
if is not a platform bug, why Didn't other platforms that can do hotplug report such issue ?
If it was safe, is it possible that you have all other vendors verify it ? 
> 
>> 
>> Thanks,
>> Ethan
>> 
>>> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> Currently we don't update device's mps value when doing
>>> pci device hot-add. The hot-added device's mps will be set
>>> to default value (128B). But the upstream port device's mps
>>> may be larger than 128B which was set by firmware during
>>> system bootup. In this case the new added device may not
>>> work normally. This issue was found in huawei 5885 server
>>> and Dell R620 server. And if we run the platform with windows,
>>> this problem is gone. This patch try to update the hot added
>>> device mps equal to its parent mps, if device mpss < parent mps,
>>> print warning.
>>> 
>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>> Reported-by: Jordan_Hargrave@Dell.com
>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>> Cc: Jon Mason <jdmason@kudzu.us>
>>> ---
>>> drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>> 1 files changed, 39 insertions(+), 0 deletions(-)
>>> 
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index e3cf8a2..583ca52 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>                dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>> }
>>> 
>>> +/**
>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>> + * @dev: PCI device to set
>>> + *
>>> + * After device hot add, mps will be set to default(128B), But the
>>> + * upstream port device's mps may be larger than 128B which was set
>>> + * by firmware during system bootup. Then we should update the device
>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>> + */
>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>> +{
>>> +       int mps, p_mps, mpss;
>>> +       struct pci_dev *parent;
>>> +
>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>> +               return;
>>> +
>>> +       parent = dev->bus->self;
>>> +       mps = pcie_get_mps(dev);
>>> +       p_mps = pcie_get_mps(parent);
>>> +
>>> +       if (mps >= p_mps)
>>> +               return;
>>> +
>>> +       mpss = 128 << dev->pcie_mpss;
>>> +       if (mpss < p_mps) {
>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>> +                               mpss, p_mps);
>>> +               return;
>>> +       }
>>> +
>>> +       pcie_write_mps(dev, p_mps);
>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>> +}
>>> +
>>> static void pcie_bus_detect_mps(struct pci_dev *dev)
>>> {
>>>        struct pci_dev *bridge = dev->bus->self;
>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>                return 0;
>>> 
>>>        if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>> +               pcie_bus_update_set(dev);
>>>                pcie_bus_detect_mps(dev);
>>>                return 0;
>>>        }
>>> --
>>> 1.7.1
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> .
> 
> 
> -- 
> Thanks!
> Yijing
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 30, 2014, 4:42 a.m. UTC | #10
On 2014/7/30 11:58, Ethan Zhao wrote:
> 
> 
>> 在 2014年7月30日,上午11:42,Yijing Wang <wangyijing@huawei.com> 写道:
>>
>>> On 2014/7/30 11:33, Ethan Zhao wrote:
>>> Yijing,
>>>    Seems you want to workaround platform bug in generic PCIe hotplug
>>> code, while not specific to some platforms or devices, that is not
>>> safe/fair to other vendor's platform.
>>> and the updating to MPS of device is out of PCIe specification.
>>>    So the best way is to fix within platform, at least, any platform
>>> specific way in Linux code.
>>
>> No, this is not a platform bug, and this is safe, you can refer to my last reply
>> to Alex.
> if is not a platform bug, why Didn't other platforms that can do hotplug report such issue ?
> If it was safe, is it possible that you have all other vendors verify it ? 

This issue was triggered by firmware set the device mps to 256B. It's not related to platform
hardware.

I think this is safe, because I only modified the device newly hot-added, this device is not enabled yet.

Thanks!
Yijing.


>>>
>>>> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>> Currently we don't update device's mps value when doing
>>>> pci device hot-add. The hot-added device's mps will be set
>>>> to default value (128B). But the upstream port device's mps
>>>> may be larger than 128B which was set by firmware during
>>>> system bootup. In this case the new added device may not
>>>> work normally. This issue was found in huawei 5885 server
>>>> and Dell R620 server. And if we run the platform with windows,
>>>> this problem is gone. This patch try to update the hot added
>>>> device mps equal to its parent mps, if device mpss < parent mps,
>>>> print warning.
>>>>
>>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>> ---
>>>> drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>> 1 files changed, 39 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>> index e3cf8a2..583ca52 100644
>>>> --- a/drivers/pci/probe.c
>>>> +++ b/drivers/pci/probe.c
>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>                dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>> }
>>>>
>>>> +/**
>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>> + * @dev: PCI device to set
>>>> + *
>>>> + * After device hot add, mps will be set to default(128B), But the
>>>> + * upstream port device's mps may be larger than 128B which was set
>>>> + * by firmware during system bootup. Then we should update the device
>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>> + */
>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>> +{
>>>> +       int mps, p_mps, mpss;
>>>> +       struct pci_dev *parent;
>>>> +
>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>> +               return;
>>>> +
>>>> +       parent = dev->bus->self;
>>>> +       mps = pcie_get_mps(dev);
>>>> +       p_mps = pcie_get_mps(parent);
>>>> +
>>>> +       if (mps >= p_mps)
>>>> +               return;
>>>> +
>>>> +       mpss = 128 << dev->pcie_mpss;
>>>> +       if (mpss < p_mps) {
>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>> +                               mpss, p_mps);
>>>> +               return;
>>>> +       }
>>>> +
>>>> +       pcie_write_mps(dev, p_mps);
>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>> +}
>>>> +
>>>> static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>> {
>>>>        struct pci_dev *bridge = dev->bus->self;
>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>                return 0;
>>>>
>>>>        if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>> +               pcie_bus_update_set(dev);
>>>>                pcie_bus_detect_mps(dev);
>>>>                return 0;
>>>>        }
>>>> --
>>>> 1.7.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>> .
>>
>>
>> -- 
>> Thanks!
>> Yijing
>>
> 
> .
>
Ethan Zhao July 30, 2014, 6:26 a.m. UTC | #11
On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> Currently we don't update device's mps value when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally. This issue was found in huawei 5885 server
> and Dell R620 server. And if we run the platform with windows,
> this problem is gone. This patch try to update the hot added
> device mps equal to its parent mps, if device mpss < parent mps,
> print warning.
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671

http://marc.info/?l=e1000-devel&m=134182518500774&w=2

The bug was reported by Joe.jin@oracle.com. the related hardware
is SUN FIRE X2270 M2, it is not a hotplug sever except SAS HDD,
and the the issue is definitely a BIOS bug. not related to the hotplug
code path. was fixed by BIOS.

Thanks,
Ethan

> Reported-by: Keith Busch <keith.busch@intel.com>
> Reported-by: Jordan_Hargrave@Dell.com
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> ---
>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e3cf8a2..583ca52 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>  }
>
> +/**
> + * pcie_bus_update_set - update device mps when device doing hot-add
> + * @dev: PCI device to set
> + *
> + * After device hot add, mps will be set to default(128B), But the
> + * upstream port device's mps may be larger than 128B which was set
> + * by firmware during system bootup. Then we should update the device
> + * mps to equal to its parent mps, Or the device can not work normally.
> + */
> +static void pcie_bus_update_set(struct pci_dev *dev)
> +{
> +       int mps, p_mps, mpss;
> +       struct pci_dev *parent;
> +
> +       if (!pci_is_pcie(dev) || !dev->bus->self
> +                       || !dev->bus->self->is_hotplug_bridge)
> +               return;
> +
> +       parent = dev->bus->self;
> +       mps = pcie_get_mps(dev);
> +       p_mps = pcie_get_mps(parent);
> +
> +       if (mps >= p_mps)
> +               return;
> +
> +       mpss = 128 << dev->pcie_mpss;
> +       if (mpss < p_mps) {
> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
> +                               mpss, p_mps);
> +               return;
> +       }
> +
> +       pcie_write_mps(dev, p_mps);
> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
> +}
> +
>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>  {
>         struct pci_dev *bridge = dev->bus->self;
> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>                 return 0;
>
>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +               pcie_bus_update_set(dev);
>                 pcie_bus_detect_mps(dev);
>                 return 0;
>         }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 30, 2014, 6:57 a.m. UTC | #12
On 2014/7/30 14:26, Ethan Zhao wrote:
> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Currently we don't update device's mps value when doing
>> pci device hot-add. The hot-added device's mps will be set
>> to default value (128B). But the upstream port device's mps
>> may be larger than 128B which was set by firmware during
>> system bootup. In this case the new added device may not
>> work normally. This issue was found in huawei 5885 server
>> and Dell R620 server. And if we run the platform with windows,
>> this problem is gone. This patch try to update the hot added
>> device mps equal to its parent mps, if device mpss < parent mps,
>> print warning.
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> 
> http://marc.info/?l=e1000-devel&m=134182518500774&w=2
> 
> The bug was reported by Joe.jin@oracle.com. the related hardware
> is SUN FIRE X2270 M2, it is not a hotplug sever except SAS HDD,
> and the the issue is definitely a BIOS bug. not related to the hotplug
> code path. was fixed by BIOS.
> 

Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
But that issue is not same as this one, Keith and Jordan found the issue
after hot-plug. And my patch only modify the hotplug slot connected device.

In my idea, make the device work is important, because these platforms with windows
can run happy, why linux leave this issue to BIOS.

Anyway, this is my personal opinion.

Thanks!
Yijing.

> 
>> Reported-by: Keith Busch <keith.busch@intel.com>
>> Reported-by: Jordan_Hargrave@Dell.com
>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Cc: Jon Mason <jdmason@kudzu.us>
>> ---
>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index e3cf8a2..583ca52 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>  }
>>
>> +/**
>> + * pcie_bus_update_set - update device mps when device doing hot-add
>> + * @dev: PCI device to set
>> + *
>> + * After device hot add, mps will be set to default(128B), But the
>> + * upstream port device's mps may be larger than 128B which was set
>> + * by firmware during system bootup. Then we should update the device
>> + * mps to equal to its parent mps, Or the device can not work normally.
>> + */
>> +static void pcie_bus_update_set(struct pci_dev *dev)
>> +{
>> +       int mps, p_mps, mpss;
>> +       struct pci_dev *parent;
>> +
>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>> +                       || !dev->bus->self->is_hotplug_bridge)
>> +               return;
>> +
>> +       parent = dev->bus->self;
>> +       mps = pcie_get_mps(dev);
>> +       p_mps = pcie_get_mps(parent);
>> +
>> +       if (mps >= p_mps)
>> +               return;
>> +
>> +       mpss = 128 << dev->pcie_mpss;
>> +       if (mpss < p_mps) {
>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>> +                               mpss, p_mps);
>> +               return;
>> +       }
>> +
>> +       pcie_write_mps(dev, p_mps);
>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>> +}
>> +
>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>  {
>>         struct pci_dev *bridge = dev->bus->self;
>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>                 return 0;
>>
>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>> +               pcie_bus_update_set(dev);
>>                 pcie_bus_detect_mps(dev);
>>                 return 0;
>>         }
>> --
>> 1.7.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
>
Ethan Zhao July 30, 2014, 7:17 a.m. UTC | #13
On Wed, Jul 30, 2014 at 2:57 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> On 2014/7/30 14:26, Ethan Zhao wrote:
>> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> Currently we don't update device's mps value when doing
>>> pci device hot-add. The hot-added device's mps will be set
>>> to default value (128B). But the upstream port device's mps
>>> may be larger than 128B which was set by firmware during
>>> system bootup. In this case the new added device may not
>>> work normally. This issue was found in huawei 5885 server
>>> and Dell R620 server. And if we run the platform with windows,
>>> this problem is gone. This patch try to update the hot added
>>> device mps equal to its parent mps, if device mpss < parent mps,
>>> print warning.
>>>
>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>>
>> http://marc.info/?l=e1000-devel&m=134182518500774&w=2
>>
>> The bug was reported by Joe.jin@oracle.com. the related hardware
>> is SUN FIRE X2270 M2, it is not a hotplug sever except SAS HDD,
>> and the the issue is definitely a BIOS bug. not related to the hotplug
>> code path. was fixed by BIOS.
>>
>
> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
> But that issue is not same as this one, Keith and Jordan found the issue
> after hot-plug. And my patch only modify the hotplug slot connected device.
>
> In my idea, make the device work is important, because these platforms with windows
> can run happy, why linux leave this issue to BIOS.

   That is a reason to make it works with Linux, but does your
platform have _HPX from ACPI for those hot-added back devices ?  if it
has, maybe windows could apply _HPX to configure those devices and
work well.

   Maybe you should check with _HPX .  That is a generic way to import
configuration from firmware when hot-add device.

  Please see also
  chapter 6.2.8 _HPX (Hot Plug Parameter Extensions) of ACPI spec 4&5

  Thanks,
  Ethan


>
> Anyway, this is my personal opinion.
>
> Thanks!
> Yijing.
>
>>
>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>> Reported-by: Jordan_Hargrave@Dell.com
>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>> Cc: Jon Mason <jdmason@kudzu.us>
>>> ---
>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index e3cf8a2..583ca52 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>  }
>>>
>>> +/**
>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>> + * @dev: PCI device to set
>>> + *
>>> + * After device hot add, mps will be set to default(128B), But the
>>> + * upstream port device's mps may be larger than 128B which was set
>>> + * by firmware during system bootup. Then we should update the device
>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>> + */
>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>> +{
>>> +       int mps, p_mps, mpss;
>>> +       struct pci_dev *parent;
>>> +
>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>> +               return;
>>> +
>>> +       parent = dev->bus->self;
>>> +       mps = pcie_get_mps(dev);
>>> +       p_mps = pcie_get_mps(parent);
>>> +
>>> +       if (mps >= p_mps)
>>> +               return;
>>> +
>>> +       mpss = 128 << dev->pcie_mpss;
>>> +       if (mpss < p_mps) {
>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>> +                               mpss, p_mps);
>>> +               return;
>>> +       }
>>> +
>>> +       pcie_write_mps(dev, p_mps);
>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>> +}
>>> +
>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>  {
>>>         struct pci_dev *bridge = dev->bus->self;
>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>                 return 0;
>>>
>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>> +               pcie_bus_update_set(dev);
>>>                 pcie_bus_detect_mps(dev);
>>>                 return 0;
>>>         }
>>> --
>>> 1.7.1
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> .
>>
>
>
> --
> Thanks!
> Yijing
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 30, 2014, 8:13 a.m. UTC | #14
>> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
>> But that issue is not same as this one, Keith and Jordan found the issue
>> after hot-plug. And my patch only modify the hotplug slot connected device.
>>
>> In my idea, make the device work is important, because these platforms with windows
>> can run happy, why linux leave this issue to BIOS.
> 
>    That is a reason to make it works with Linux, but does your
> platform have _HPX from ACPI for those hot-added back devices ?  if it
> has, maybe windows could apply _HPX to configure those devices and
> work well.
> 

I checked DSDT table exported from my server, but no "_HPX" found.
Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
driver won't touch ACPI methods like "_HPX".

Thanks!
Yijing.

>>
>>>
>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>> ---
>>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>> index e3cf8a2..583ca52 100644
>>>> --- a/drivers/pci/probe.c
>>>> +++ b/drivers/pci/probe.c
>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>>  }
>>>>
>>>> +/**
>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>> + * @dev: PCI device to set
>>>> + *
>>>> + * After device hot add, mps will be set to default(128B), But the
>>>> + * upstream port device's mps may be larger than 128B which was set
>>>> + * by firmware during system bootup. Then we should update the device
>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>> + */
>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>> +{
>>>> +       int mps, p_mps, mpss;
>>>> +       struct pci_dev *parent;
>>>> +
>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>> +               return;
>>>> +
>>>> +       parent = dev->bus->self;
>>>> +       mps = pcie_get_mps(dev);
>>>> +       p_mps = pcie_get_mps(parent);
>>>> +
>>>> +       if (mps >= p_mps)
>>>> +               return;
>>>> +
>>>> +       mpss = 128 << dev->pcie_mpss;
>>>> +       if (mpss < p_mps) {
>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>> +                               mpss, p_mps);
>>>> +               return;
>>>> +       }
>>>> +
>>>> +       pcie_write_mps(dev, p_mps);
>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>> +}
>>>> +
>>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>>  {
>>>>         struct pci_dev *bridge = dev->bus->self;
>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>                 return 0;
>>>>
>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>> +               pcie_bus_update_set(dev);
>>>>                 pcie_bus_detect_mps(dev);
>>>>                 return 0;
>>>>         }
>>>> --
>>>> 1.7.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>> .
>>>
>>
>>
>> --
>> Thanks!
>> Yijing
>>
> 
> .
>
Ethan Zhao July 30, 2014, 8:38 a.m. UTC | #15
On Wed, Jul 30, 2014 at 4:13 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
>>> But that issue is not same as this one, Keith and Jordan found the issue
>>> after hot-plug. And my patch only modify the hotplug slot connected device.
>>>
>>> In my idea, make the device work is important, because these platforms with windows
>>> can run happy, why linux leave this issue to BIOS.
>>
>>    That is a reason to make it works with Linux, but does your
>> platform have _HPX from ACPI for those hot-added back devices ?  if it
>> has, maybe windows could apply _HPX to configure those devices and
>> work well.
>>
>
> I checked DSDT table exported from my server, but no "_HPX" found.

You might check the wrong place...

> Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
> driver won't touch ACPI methods like "_HPX".

That is not true, please read the code in pciehp_pci.c

 pciehp_configure_device()
    pci_configure_slot()
       pci_get_hp_params()
           acpi_run_hpx()
               acpi_evaluate_object(handle, "_HPX",...)
 ...

Native pciehp also does acpi parameter importing ... ...

Thank,s
Ethan

>
> Thanks!
> Yijing.
>
>>>
>>>>
>>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>>> ---
>>>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>> index e3cf8a2..583ca52 100644
>>>>> --- a/drivers/pci/probe.c
>>>>> +++ b/drivers/pci/probe.c
>>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>>>  }
>>>>>
>>>>> +/**
>>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>>> + * @dev: PCI device to set
>>>>> + *
>>>>> + * After device hot add, mps will be set to default(128B), But the
>>>>> + * upstream port device's mps may be larger than 128B which was set
>>>>> + * by firmware during system bootup. Then we should update the device
>>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>>> + */
>>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>>> +{
>>>>> +       int mps, p_mps, mpss;
>>>>> +       struct pci_dev *parent;
>>>>> +
>>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>>> +               return;
>>>>> +
>>>>> +       parent = dev->bus->self;
>>>>> +       mps = pcie_get_mps(dev);
>>>>> +       p_mps = pcie_get_mps(parent);
>>>>> +
>>>>> +       if (mps >= p_mps)
>>>>> +               return;
>>>>> +
>>>>> +       mpss = 128 << dev->pcie_mpss;
>>>>> +       if (mpss < p_mps) {
>>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>>> +                               mpss, p_mps);
>>>>> +               return;
>>>>> +       }
>>>>> +
>>>>> +       pcie_write_mps(dev, p_mps);
>>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>>> +}
>>>>> +
>>>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>>>  {
>>>>>         struct pci_dev *bridge = dev->bus->self;
>>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>>                 return 0;
>>>>>
>>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>> +               pcie_bus_update_set(dev);
>>>>>                 pcie_bus_detect_mps(dev);
>>>>>                 return 0;
>>>>>         }
>>>>> --
>>>>> 1.7.1
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Thanks!
>>> Yijing
>>>
>>
>> .
>>
>
>
> --
> Thanks!
> Yijing
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 30, 2014, 9:17 a.m. UTC | #16
On 2014/7/30 16:38, Ethan Zhao wrote:
> On Wed, Jul 30, 2014 at 4:13 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
>>>> But that issue is not same as this one, Keith and Jordan found the issue
>>>> after hot-plug. And my patch only modify the hotplug slot connected device.
>>>>
>>>> In my idea, make the device work is important, because these platforms with windows
>>>> can run happy, why linux leave this issue to BIOS.
>>>
>>>    That is a reason to make it works with Linux, but does your
>>> platform have _HPX from ACPI for those hot-added back devices ?  if it
>>> has, maybe windows could apply _HPX to configure those devices and
>>> work well.
>>>
>>
>> I checked DSDT table exported from my server, but no "_HPX" found.
> 
> You might check the wrong place...
> 
>> Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
>> driver won't touch ACPI methods like "_HPX".
> 
> That is not true, please read the code in pciehp_pci.c
> 
>  pciehp_configure_device()
>     pci_configure_slot()
>        pci_get_hp_params()
>            acpi_run_hpx()
>                acpi_evaluate_object(handle, "_HPX",...)
>  ...
> 

Sorry, I make a mistake. You are right, Use the _HPX can fix the issue from BIOS.
But what confused me is why linux fail to reconfigure mps after hotplug, but windows did it well.

Maybe we can cache the device mps into pci_dev, every time we call pcie_bus_configure_settings(),
if we find the device mps is not equal to the cached value, and the device is disabled, restore it.


/* PCI Express Setting Record (Type 2) */
struct hpp_type2 {
	u32 revision;
	u32 unc_err_mask_and;
	u32 unc_err_mask_or;
	u32 unc_err_sever_and;
	u32 unc_err_sever_or;
	u32 cor_err_mask_and;
	u32 cor_err_mask_or;
	u32 adv_err_cap_and;
	u32 adv_err_cap_or;
	u16 pci_exp_devctl_and;
	u16 pci_exp_devctl_or;
	u16 pci_exp_lnkctl_and;
	u16 pci_exp_lnkctl_or;
	u32 sec_unc_err_sever_and;
	u32 sec_unc_err_sever_or;
	u32 sec_unc_err_mask_and;
	u32 sec_unc_err_mask_or;
};

> Native pciehp also does acpi parameter importing ... ...
> 
> Thank,s
> Ethan
> 
>>
>> Thanks!
>> Yijing.
>>
>>>>
>>>>>
>>>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>>>> ---
>>>>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>>> index e3cf8a2..583ca52 100644
>>>>>> --- a/drivers/pci/probe.c
>>>>>> +++ b/drivers/pci/probe.c
>>>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>>>>  }
>>>>>>
>>>>>> +/**
>>>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>>>> + * @dev: PCI device to set
>>>>>> + *
>>>>>> + * After device hot add, mps will be set to default(128B), But the
>>>>>> + * upstream port device's mps may be larger than 128B which was set
>>>>>> + * by firmware during system bootup. Then we should update the device
>>>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>>>> + */
>>>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>>>> +{
>>>>>> +       int mps, p_mps, mpss;
>>>>>> +       struct pci_dev *parent;
>>>>>> +
>>>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>>>> +               return;
>>>>>> +
>>>>>> +       parent = dev->bus->self;
>>>>>> +       mps = pcie_get_mps(dev);
>>>>>> +       p_mps = pcie_get_mps(parent);
>>>>>> +
>>>>>> +       if (mps >= p_mps)
>>>>>> +               return;
>>>>>> +
>>>>>> +       mpss = 128 << dev->pcie_mpss;
>>>>>> +       if (mpss < p_mps) {
>>>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>>>> +                               mpss, p_mps);
>>>>>> +               return;
>>>>>> +       }
>>>>>> +
>>>>>> +       pcie_write_mps(dev, p_mps);
>>>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>>>> +}
>>>>>> +
>>>>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>>>>  {
>>>>>>         struct pci_dev *bridge = dev->bus->self;
>>>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>>>                 return 0;
>>>>>>
>>>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>>> +               pcie_bus_update_set(dev);
>>>>>>                 pcie_bus_detect_mps(dev);
>>>>>>                 return 0;
>>>>>>         }
>>>>>> --
>>>>>> 1.7.1
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks!
>>>> Yijing
>>>>
>>>
>>> .
>>>
>>
>>
>> --
>> Thanks!
>> Yijing
>>
> 
> .
>
Jordan_Hargrave@Dell.com July 30, 2014, 7:41 p.m. UTC | #17
One concern with using _HPX is there doesn't appear to be a failure mechanism...
What happens if inserting a device where its maximum possible MPS is < the MPS of the parent device.

The _HPX AML would have to read the parent device MPS which would be pretty ugly.

--jordan hargrave
Dell Enterprise Linux Engineering
Bjorn Helgaas Sept. 3, 2014, 7:20 p.m. UTC | #18
On Wed, Jul 30, 2014 at 1:41 PM,  <Jordan_Hargrave@dell.com> wrote:
> One concern with using _HPX is there doesn't appear to be a failure mechanism...
> What happens if inserting a device where its maximum possible MPS is < the MPS of the parent device.
>
> The _HPX AML would have to read the parent device MPS which would be pretty ugly.

_HPX should be applied when a device is added via any hotplug driver,
including pciehp, but I don't think it is a reasonable way to fix
this.  The spec (ACPI r5.0, sec 6.2.8.3) allows a Type 2 record to set
arbitrary values in the Device Control register (including MPS), and
that's what the current Linux implementation does.

However, the spec allows OSPM to override any bits deemed necessary.
MPS is a property that is controlled by OSPM and it depends on other
devices in the system, so I don't think it is safe to put an MPS value
from _HPX directly into Device Control.  OSPM should first ensure that
the value is compatible with how it has configured the rest of the
system.

> ________________________________________
> From: Yijing Wang [wangyijing@huawei.com]
> Sent: Wednesday, July 30, 2014 4:17 AM
> To: Ethan Zhao
> Cc: Bjorn Helgaas; linux-pci; Hargrave, Jordan; <keith.busch@intel.com>; Jon Mason
> Subject: Re: [PATCH] PCI: update device mps when doing pci hotplug
>
> On 2014/7/30 16:38, Ethan Zhao wrote:
>> On Wed, Jul 30, 2014 at 4:13 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>>> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
>>>>> But that issue is not same as this one, Keith and Jordan found the issue
>>>>> after hot-plug. And my patch only modify the hotplug slot connected device.
>>>>>
>>>>> In my idea, make the device work is important, because these platforms with windows
>>>>> can run happy, why linux leave this issue to BIOS.
>>>>
>>>>    That is a reason to make it works with Linux, but does your
>>>> platform have _HPX from ACPI for those hot-added back devices ?  if it
>>>> has, maybe windows could apply _HPX to configure those devices and
>>>> work well.
>>>>
>>>
>>> I checked DSDT table exported from my server, but no "_HPX" found.
>>
>> You might check the wrong place...
>>
>>> Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
>>> driver won't touch ACPI methods like "_HPX".
>>
>> That is not true, please read the code in pciehp_pci.c
>>
>>  pciehp_configure_device()
>>     pci_configure_slot()
>>        pci_get_hp_params()
>>            acpi_run_hpx()
>>                acpi_evaluate_object(handle, "_HPX",...)
>>  ...
>>
>
> Sorry, I make a mistake. You are right, Use the _HPX can fix the issue from BIOS.
> But what confused me is why linux fail to reconfigure mps after hotplug, but windows did it well.
>
> Maybe we can cache the device mps into pci_dev, every time we call pcie_bus_configure_settings(),
> if we find the device mps is not equal to the cached value, and the device is disabled, restore it.
>
>
> /* PCI Express Setting Record (Type 2) */
> struct hpp_type2 {
>         u32 revision;
>         u32 unc_err_mask_and;
>         u32 unc_err_mask_or;
>         u32 unc_err_sever_and;
>         u32 unc_err_sever_or;
>         u32 cor_err_mask_and;
>         u32 cor_err_mask_or;
>         u32 adv_err_cap_and;
>         u32 adv_err_cap_or;
>         u16 pci_exp_devctl_and;
>         u16 pci_exp_devctl_or;
>         u16 pci_exp_lnkctl_and;
>         u16 pci_exp_lnkctl_or;
>         u32 sec_unc_err_sever_and;
>         u32 sec_unc_err_sever_or;
>         u32 sec_unc_err_mask_and;
>         u32 sec_unc_err_mask_or;
> };
>
>> Native pciehp also does acpi parameter importing ... ...
>>
>> Thank,s
>> Ethan
>>
>>>
>>> Thanks!
>>> Yijing.
>>>
>>>>>
>>>>>>
>>>>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>>>>> ---
>>>>>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>>>> index e3cf8a2..583ca52 100644
>>>>>>> --- a/drivers/pci/probe.c
>>>>>>> +++ b/drivers/pci/probe.c
>>>>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>>>>>  }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>>>>> + * @dev: PCI device to set
>>>>>>> + *
>>>>>>> + * After device hot add, mps will be set to default(128B), But the
>>>>>>> + * upstream port device's mps may be larger than 128B which was set
>>>>>>> + * by firmware during system bootup. Then we should update the device
>>>>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>>>>> + */
>>>>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>>>>> +{
>>>>>>> +       int mps, p_mps, mpss;
>>>>>>> +       struct pci_dev *parent;
>>>>>>> +
>>>>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>>>>> +               return;
>>>>>>> +
>>>>>>> +       parent = dev->bus->self;
>>>>>>> +       mps = pcie_get_mps(dev);
>>>>>>> +       p_mps = pcie_get_mps(parent);
>>>>>>> +
>>>>>>> +       if (mps >= p_mps)
>>>>>>> +               return;
>>>>>>> +
>>>>>>> +       mpss = 128 << dev->pcie_mpss;
>>>>>>> +       if (mpss < p_mps) {
>>>>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>>>>> +                               mpss, p_mps);
>>>>>>> +               return;
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       pcie_write_mps(dev, p_mps);
>>>>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>>>>> +}
>>>>>>> +
>>>>>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>>>>>  {
>>>>>>>         struct pci_dev *bridge = dev->bus->self;
>>>>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>>>>                 return 0;
>>>>>>>
>>>>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>>>> +               pcie_bus_update_set(dev);
>>>>>>>                 pcie_bus_detect_mps(dev);
>>>>>>>                 return 0;
>>>>>>>         }
>>>>>>> --
>>>>>>> 1.7.1
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks!
>>>>> Yijing
>>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Thanks!
>>> Yijing
>>>
>>
>> .
>>
>
>
> --
> Thanks!
> Yijing
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Sept. 3, 2014, 10:42 p.m. UTC | #19
On Tue, Jul 29, 2014 at 04:17:57PM +0800, Yijing Wang wrote:
> Currently we don't update device's mps value when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally. This issue was found in huawei 5885 server
> and Dell R620 server. And if we run the platform with windows,
> this problem is gone. This patch try to update the hot added
> device mps equal to its parent mps, if device mpss < parent mps,
> print warning.
> 
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> Reported-by: Keith Busch <keith.busch@intel.com>
> Reported-by: Jordan_Hargrave@Dell.com
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> ---
>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e3cf8a2..583ca52 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>  		dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>  }
>  
> +/**
> + * pcie_bus_update_set - update device mps when device doing hot-add
> + * @dev: PCI device to set
> + * 
> + * After device hot add, mps will be set to default(128B), But the 
> + * upstream port device's mps may be larger than 128B which was set 
> + * by firmware during system bootup. Then we should update the device
> + * mps to equal to its parent mps, Or the device can not work normally.
> + */
> +static void pcie_bus_update_set(struct pci_dev *dev)
> +{
> +	int mps, p_mps, mpss;
> +	struct pci_dev *parent;
> +
> +	if (!pci_is_pcie(dev) || !dev->bus->self 
> +			|| !dev->bus->self->is_hotplug_bridge)

Part of this looks redundant because pcie_bus_configure_set() already
checks pci_is_pcie().  And I don't know why we need to test
is_hotplug_bridge here; MPS settings need to be consistent regardless of
whether the upstream bridge supports hotplug.

> +		return;
> +	
> +	parent = dev->bus->self;
> +	mps = pcie_get_mps(dev);
> +	p_mps = pcie_get_mps(parent);
> +
> +	if (mps >= p_mps)
> +		return;
> +
> +	mpss = 128 << dev->pcie_mpss;
> +	if (mpss < p_mps) {
> +		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
> +				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
> +				mpss, p_mps);
> +		return;

Since we can't configure the new device correctly, we really shouldn't
allow a driver to bind to it.  The current design doesn't have much
provision for doing that, so warning is probably all we can do.

> +	}
> +
> +	pcie_write_mps(dev, p_mps);
> +	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
> +			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
> +}
> +
>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>  {
>  	struct pci_dev *bridge = dev->bus->self;
> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>  		return 0;
>  
>  	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +		pcie_bus_update_set(dev);

You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
problem occur for other pcie_bus_config settings?

>  		pcie_bus_detect_mps(dev);
>  		return 0;
>  	}

I have some long-term ideas here (below), but to make progress in the short
term, I think we just need to make sure this handles all pcie_bus_config
settings.

Bjorn



Stepping back a long ways, I think the current design is hard to use.
It's set up with the idea that we (1) enumerate all the devices in the
system, and then (2) configure MPS for everything all at once.

That's not a very good fit when we start hotplugging devices, and it's
part of the reason MPS configuration is not well integrated into the PCI
core and doesn't get done at all for most architectures.

What I'd prefer is something that could be done in the core as each device
is enumerated, e.g., in or near pci_device_add().  I know there's tension
between the need to do this before drivers bind to the device and the
desire to enumerate the whole hierarchy before committing to MPS settings.
But we need to handle that tension anyway for hot-added devices, so we
might as well deal with it at boot-time and use the same code path for
both boot-time and hot-add time.

I have in mind something like this:

  pcie_configure_mps(struct pci_dev *dev)
  {
    int ret;

    if (!pci_is_pci(dev))
      return;

    if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
      /* set my MPS to dev->pcie_mpss (max supported size) */
      return;
    }

    if (dev->pcie_mpss >= upstream bridge MPS) {
      /* set my MPS to upstream bridge MPS */
      return;
    }

    ret = pcie_set_hierarchy_mps(pcie_root_port(dev), dev->mpss);
    if (ret == failure)
      /* emit warning, can't enable this device */
  }

  struct pci_dev *pcie_root_port(struct pci_dev *dev)
  {
    if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
      return dev;

    return pcie_root_port(dev->bus->self);
  }

  pcie_set_hierarchy_mps(struct pci_dev *root, int mpss)
  {
    struct pci_bus *secondary;
    struct pci_dev *dev;
    int ret;

    if (root->driver)
      return -EINVAL;

    secondary = root->subordinate;
    if (secondary) {
      list_for_each_entry(dev, &secondary->devices, bus_list) {
	ret = pcie_set_hierarchy(dev, mpss);
	if (ret)
	  return ret;
      }
    }

    /* set my MPS to mpss */
    return 0;
  }
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang Sept. 4, 2014, 6:12 a.m. UTC | #20
>> + * pcie_bus_update_set - update device mps when device doing hot-add
>> + * @dev: PCI device to set
>> + * 
>> + * After device hot add, mps will be set to default(128B), But the 
>> + * upstream port device's mps may be larger than 128B which was set 
>> + * by firmware during system bootup. Then we should update the device
>> + * mps to equal to its parent mps, Or the device can not work normally.
>> + */
>> +static void pcie_bus_update_set(struct pci_dev *dev)
>> +{
>> +	int mps, p_mps, mpss;
>> +	struct pci_dev *parent;
>> +
>> +	if (!pci_is_pcie(dev) || !dev->bus->self 
>> +			|| !dev->bus->self->is_hotplug_bridge)
> 
> Part of this looks redundant because pcie_bus_configure_set() already
> checks pci_is_pcie().  And I don't know why we need to test
> is_hotplug_bridge here; MPS settings need to be consistent regardless of
> whether the upstream bridge supports hotplug.

Hi Bjorn, I added is_hotplug_bridge() here is mainly to touch the hotplug case only.
It was more like a temporary solution and not perfect one.

> 
>> +		return;
>> +	
>> +	parent = dev->bus->self;
>> +	mps = pcie_get_mps(dev);
>> +	p_mps = pcie_get_mps(parent);
>> +
>> +	if (mps >= p_mps)
>> +		return;
>> +
>> +	mpss = 128 << dev->pcie_mpss;
>> +	if (mpss < p_mps) {
>> +		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>> +				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>> +				mpss, p_mps);
>> +		return;
> 
> Since we can't configure the new device correctly, we really shouldn't
> allow a driver to bind to it.  The current design doesn't have much
> provision for doing that, so warning is probably all we can do.

Yes, bind a driver to the device which mps is not correctly set will cause another problem.

> 
>> +	}
>> +
>> +	pcie_write_mps(dev, p_mps);
>> +	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
>> +			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>> +}
>> +
>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>  {
>>  	struct pci_dev *bridge = dev->bus->self;
>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>  		return 0;
>>  
>>  	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>> +		pcie_bus_update_set(dev);
> 
> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
> problem occur for other pcie_bus_config settings?

We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
This issue won't happen.

> 
>>  		pcie_bus_detect_mps(dev);
>>  		return 0;
>>  	}
> 
> I have some long-term ideas here (below), but to make progress in the short
> term, I think we just need to make sure this handles all pcie_bus_config
> settings.
> 
> Bjorn
> 
> 
> 
> Stepping back a long ways, I think the current design is hard to use.
> It's set up with the idea that we (1) enumerate all the devices in the
> system, and then (2) configure MPS for everything all at once.
> 
> That's not a very good fit when we start hotplugging devices, and it's
> part of the reason MPS configuration is not well integrated into the PCI
> core and doesn't get done at all for most architectures.

Agree, arch code should not be involved the MPS setting. It's arch independent.

> 
> What I'd prefer is something that could be done in the core as each device
> is enumerated, e.g., in or near pci_device_add().  I know there's tension
> between the need to do this before drivers bind to the device and the
> desire to enumerate the whole hierarchy before committing to MPS settings.
> But we need to handle that tension anyway for hot-added devices, so we
> might as well deal with it at boot-time and use the same code path for
> both boot-time and hot-add time.
> 
> I have in mind something like this:
> 
>   pcie_configure_mps(struct pci_dev *dev)
>   {
>     int ret;
> 
>     if (!pci_is_pci(dev))
>       return;
> 
>     if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
>       /* set my MPS to dev->pcie_mpss (max supported size) */
>       return;
>     }
> 
>     if (dev->pcie_mpss >= upstream bridge MPS) {
>       /* set my MPS to upstream bridge MPS */
>       return;
>     }
> 
>     ret = pcie_set_hierarchy_mps(pcie_root_port(dev), dev->mpss);
>     if (ret == failure)
>       /* emit warning, can't enable this device */

If got failure here, should roll back ? What about set hierarchy mps in reverse order(down to top).

>   }
> 
>   struct pci_dev *pcie_root_port(struct pci_dev *dev)
>   {
>     if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
>       return dev;
> 
>     return pcie_root_port(dev->bus->self);
>   }
> 
>   pcie_set_hierarchy_mps(struct pci_dev *root, int mpss)
>   {
>     struct pci_bus *secondary;
>     struct pci_dev *dev;
>     int ret;
> 
>     if (root->driver)
>       return -EINVAL;

Maybe it's not safe enough, change device's mps has risk unless all its children devices have no driver bound(disabled).
A root port may has no pcieport driver bound, if pcieport driver probe failed. But its children device can work normally.

> 
>     secondary = root->subordinate;
>     if (secondary) {
>       list_for_each_entry(dev, &secondary->devices, bus_list) {
> 	ret = pcie_set_hierarchy(dev, mpss);
> 	if (ret)
> 	  return ret;
>       }
>     }
> 
>     /* set my MPS to mpss */
>     return 0;
>   }





> 
> .
>
Bjorn Helgaas Sept. 4, 2014, 1:16 p.m. UTC | #21
On Thu, Sep 4, 2014 at 12:12 AM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>> + * @dev: PCI device to set
>>> + *
>>> + * After device hot add, mps will be set to default(128B), But the
>>> + * upstream port device's mps may be larger than 128B which was set
>>> + * by firmware during system bootup. Then we should update the device
>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>> + */
>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>> +{
>>> +    int mps, p_mps, mpss;
>>> +    struct pci_dev *parent;
>>> +
>>> +    if (!pci_is_pcie(dev) || !dev->bus->self
>>> +                    || !dev->bus->self->is_hotplug_bridge)
>>
>> Part of this looks redundant because pcie_bus_configure_set() already
>> checks pci_is_pcie().  And I don't know why we need to test
>> is_hotplug_bridge here; MPS settings need to be consistent regardless of
>> whether the upstream bridge supports hotplug.
>
> Hi Bjorn, I added is_hotplug_bridge() here is mainly to touch the hotplug case only.
> It was more like a temporary solution and not perfect one.
>
>>
>>> +            return;
>>> +
>>> +    parent = dev->bus->self;
>>> +    mps = pcie_get_mps(dev);
>>> +    p_mps = pcie_get_mps(parent);
>>> +
>>> +    if (mps >= p_mps)
>>> +            return;
>>> +
>>> +    mpss = 128 << dev->pcie_mpss;
>>> +    if (mpss < p_mps) {
>>> +            dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>> +                            "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>> +                            mpss, p_mps);
>>> +            return;
>>
>> Since we can't configure the new device correctly, we really shouldn't
>> allow a driver to bind to it.  The current design doesn't have much
>> provision for doing that, so warning is probably all we can do.
>
> Yes, bind a driver to the device which mps is not correctly set will cause another problem.
>
>>
>>> +    }
>>> +
>>> +    pcie_write_mps(dev, p_mps);
>>> +    dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>> +                    pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>> +}
>>> +
>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>  {
>>>      struct pci_dev *bridge = dev->bus->self;
>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>              return 0;
>>>
>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>> +            pcie_bus_update_set(dev);
>>
>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>> problem occur for other pcie_bus_config settings?
>
> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
> This issue won't happen.

Sorry, I can't parse this.  Are you saying the problem won't happen in
the other modes?  Why not?

>>>              pcie_bus_detect_mps(dev);
>>>              return 0;
>>>      }
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang Sept. 5, 2014, 1:27 a.m. UTC | #22
>>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>> +            pcie_bus_update_set(dev);
>>>
>>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>>> problem occur for other pcie_bus_config settings?
>>
>> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
>> This issue won't happen.
> 
> Sorry, I can't parse this.  Are you saying the problem won't happen in
> the other modes?  Why not?

Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the largest available mpss in a pcie path.
Then call pcie_bus_configure_set() to set all devices' mps to the largest available mps in this path, so
all devices in the path will have the same mps. When in PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in pcie_write_mps(),

			/* For "Performance", the assumption is made that
			 * downstream communication will never be larger than
			 * the MRRS.  So, the MPS only needs to be configured
			 * for the upstream communication.  This being the case, <------
 			 * walk from the top down and set the MPS of the child
			 * to that of the parent bus.

So I think the problem won't happen in other modes.

Thanks!
Yijing.


> 
>>>>              pcie_bus_detect_mps(dev);
>>>>              return 0;
>>>>      }
> 
> .
>
Keith Busch Sept. 5, 2014, 2:37 p.m. UTC | #23
On Thu, 4 Sep 2014, Yijing Wang wrote:
> So I think the problem won't happen in other modes.

Yes, you're right; we currently recommend end-users set one of the other
bus tuning options because the problem doesn't happen there.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keith Busch Sept. 24, 2014, 10:41 p.m. UTC | #24
Just poking this thread to make sure it's not dead. :)

I tested Yijing's proposal and it is successful on our Intel server
platforms; hoping either this or something that derives similar behavior
will be applied so we can remove bus tuning kernel parameters.

Tested-by: Keith Busch <keith.busch@intel.com>

On Thu, 4 Sep 2014, Yijing Wang wrote:
>>>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>> +            pcie_bus_update_set(dev);
>>>>
>>>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>>>> problem occur for other pcie_bus_config settings?
>>>
>>> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
>>> This issue won't happen.
>>
>> Sorry, I can't parse this.  Are you saying the problem won't happen in
>> the other modes?  Why not?
>
> Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the largest available mpss in a pcie path.
> Then call pcie_bus_configure_set() to set all devices' mps to the largest available mps in this path, so
> all devices in the path will have the same mps. When in PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
> for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in pcie_write_mps(),
>
> 			/* For "Performance", the assumption is made that
> 			 * downstream communication will never be larger than
> 			 * the MRRS.  So, the MPS only needs to be configured
> 			 * for the upstream communication.  This being the case, <------
> 			 * walk from the top down and set the MPS of the child
> 			 * to that of the parent bus.
>
> So I think the problem won't happen in other modes.
>
> Thanks!
> Yijing.
>
>
>>
>>>>>              pcie_bus_detect_mps(dev);
>>>>>              return 0;
>>>>>      }
>>
>> .
>>
>
>
> -- 
> Thanks!
> Yijing
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Sept. 24, 2014, 11:30 p.m. UTC | #25
On Wed, Sep 24, 2014 at 4:41 PM, Keith Busch <keith.busch@intel.com> wrote:
> Just poking this thread to make sure it's not dead. :)
>
> I tested Yijing's proposal and it is successful on our Intel server
> platforms; hoping either this or something that derives similar behavior
> will be applied so we can remove bus tuning kernel parameters.
>
> Tested-by: Keith Busch <keith.busch@intel.com>

Oops, thanks for poking me, because this was indeed dead.

My main objection was to testing "is_hotplug_bridge".  That doesn't
seem right, because this issue really isn't specific to hotplug.  I
didn't see a resolution of that, but let me know if I missed it.

Bjorn

> On Thu, 4 Sep 2014, Yijing Wang wrote:
>>>>>>
>>>>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>>> +            pcie_bus_update_set(dev);
>>>>>
>>>>>
>>>>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>>>>> problem occur for other pcie_bus_config settings?
>>>>
>>>>
>>>> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like
>>>> PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
>>>> This issue won't happen.
>>>
>>>
>>> Sorry, I can't parse this.  Are you saying the problem won't happen in
>>> the other modes?  Why not?
>>
>>
>> Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the
>> largest available mpss in a pcie path.
>> Then call pcie_bus_configure_set() to set all devices' mps to the largest
>> available mps in this path, so
>> all devices in the path will have the same mps. When in
>> PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
>> for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in
>> pcie_write_mps(),
>>
>>                         /* For "Performance", the assumption is made that
>>                          * downstream communication will never be larger
>> than
>>                          * the MRRS.  So, the MPS only needs to be
>> configured
>>                          * for the upstream communication.  This being the
>> case, <------
>>                          * walk from the top down and set the MPS of the
>> child
>>                          * to that of the parent bus.
>>
>> So I think the problem won't happen in other modes.
>>
>> Thanks!
>> Yijing.
>>
>>
>>>
>>>>>>              pcie_bus_detect_mps(dev);
>>>>>>              return 0;
>>>>>>      }
>>>
>>>
>>> .
>>>
>>
>>
>> --
>> Thanks!
>> Yijing
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang Sept. 25, 2014, 1:23 a.m. UTC | #26
On 2014/9/25 7:30, Bjorn Helgaas wrote:
> On Wed, Sep 24, 2014 at 4:41 PM, Keith Busch <keith.busch@intel.com> wrote:
>> Just poking this thread to make sure it's not dead. :)
>>
>> I tested Yijing's proposal and it is successful on our Intel server
>> platforms; hoping either this or something that derives similar behavior
>> will be applied so we can remove bus tuning kernel parameters.
>>
>> Tested-by: Keith Busch <keith.busch@intel.com>
> 
> Oops, thanks for poking me, because this was indeed dead.
> 
> My main objection was to testing "is_hotplug_bridge".  That doesn't
> seem right, because this issue really isn't specific to hotplug.  I
> didn't see a resolution of that, but let me know if I missed it.

Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.

It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.

I'd like to refactor current MPS framework, but now there are still some puzzles
to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
log from git, I found he turn off all this MPS config, because some issues were found
in some platforms, but no platforms detailed info and no bugzilla records.


> 
> Bjorn
> 
>> On Thu, 4 Sep 2014, Yijing Wang wrote:
>>>>>>>
>>>>>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>>>> +            pcie_bus_update_set(dev);
>>>>>>
>>>>>>
>>>>>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>>>>>> problem occur for other pcie_bus_config settings?
>>>>>
>>>>>
>>>>> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like
>>>>> PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
>>>>> This issue won't happen.
>>>>
>>>>
>>>> Sorry, I can't parse this.  Are you saying the problem won't happen in
>>>> the other modes?  Why not?
>>>
>>>
>>> Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the
>>> largest available mpss in a pcie path.
>>> Then call pcie_bus_configure_set() to set all devices' mps to the largest
>>> available mps in this path, so
>>> all devices in the path will have the same mps. When in
>>> PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
>>> for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in
>>> pcie_write_mps(),
>>>
>>>                         /* For "Performance", the assumption is made that
>>>                          * downstream communication will never be larger
>>> than
>>>                          * the MRRS.  So, the MPS only needs to be
>>> configured
>>>                          * for the upstream communication.  This being the
>>> case, <------
>>>                          * walk from the top down and set the MPS of the
>>> child
>>>                          * to that of the parent bus.
>>>
>>> So I think the problem won't happen in other modes.
>>>
>>> Thanks!
>>> Yijing.
>>>
>>>
>>>>
>>>>>>>              pcie_bus_detect_mps(dev);
>>>>>>>              return 0;
>>>>>>>      }
>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Thanks!
>>> Yijing
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
>
Keith Busch Sept. 25, 2014, 4:46 p.m. UTC | #27
On Wed, 24 Sep 2014, Yijing Wang wrote:
> On 2014/9/25 7:30, Bjorn Helgaas wrote:
>> On Wed, Sep 24, 2014 at 4:41 PM, Keith Busch <keith.busch@intel.com> wrote:
>>> Just poking this thread to make sure it's not dead. :)
>>>
>>> I tested Yijing's proposal and it is successful on our Intel server
>>> platforms; hoping either this or something that derives similar behavior
>>> will be applied so we can remove bus tuning kernel parameters.
>>>
>>> Tested-by: Keith Busch <keith.busch@intel.com>
>>
>> Oops, thanks for poking me, because this was indeed dead.
>>
>> My main objection was to testing "is_hotplug_bridge".  That doesn't
>> seem right, because this issue really isn't specific to hotplug.  I
>> didn't see a resolution of that, but let me know if I missed it.
>
> Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
> in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.
>
> It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.
>
> I'd like to refactor current MPS framework, but now there are still some puzzles
> to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
> log from git, I found he turn off all this MPS config, because some issues were found
> in some platforms, but no platforms detailed info and no bugzilla records.

Just my opinion, I thought the hotplug check was a good idea: it addresses
a known issue, and does not mess with current unknowns. Outside a hotplug
scenario, I think we expect platform f/w to handle MPS settings and the
kernel can stay out of the way because of the unknown platform issues. If
it is hotplug, having the kernel set device's MPS to match the parent
couldn't make things worse off than doing nothing, right?

On the side, I'll see if I can ping some comrades on PCI-SIG to propose
an ECN to clarify configuring MPS. They usually ignore me though, so no
promises. :)
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang Sept. 26, 2014, 3:22 a.m. UTC | #28
On 2014/9/26 0:46, Keith Busch wrote:
> On Wed, 24 Sep 2014, Yijing Wang wrote:
>> On 2014/9/25 7:30, Bjorn Helgaas wrote:
>>> On Wed, Sep 24, 2014 at 4:41 PM, Keith Busch <keith.busch@intel.com> wrote:
>>>> Just poking this thread to make sure it's not dead. :)
>>>>
>>>> I tested Yijing's proposal and it is successful on our Intel server
>>>> platforms; hoping either this or something that derives similar behavior
>>>> will be applied so we can remove bus tuning kernel parameters.
>>>>
>>>> Tested-by: Keith Busch <keith.busch@intel.com>
>>>
>>> Oops, thanks for poking me, because this was indeed dead.
>>>
>>> My main objection was to testing "is_hotplug_bridge".  That doesn't
>>> seem right, because this issue really isn't specific to hotplug.  I
>>> didn't see a resolution of that, but let me know if I missed it.
>>
>> Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
>> in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.
>>
>> It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.
>>
>> I'd like to refactor current MPS framework, but now there are still some puzzles
>> to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
>> log from git, I found he turn off all this MPS config, because some issues were found
>> in some platforms, but no platforms detailed info and no bugzilla records.
> 
> Just my opinion, I thought the hotplug check was a good idea: it addresses
> a known issue, and does not mess with current unknowns. Outside a hotplug
> scenario, I think we expect platform f/w to handle MPS settings and the
> kernel can stay out of the way because of the unknown platform issues. If
> it is hotplug, having the kernel set device's MPS to match the parent
> couldn't make things worse off than doing nothing, right?

Yes, I think so, but it all decided by Bjorn. I think he want a better solution,
and current patch just still is a temporary fix.

> 
> On the side, I'll see if I can ping some comrades on PCI-SIG to propose
> an ECN to clarify configuring MPS. They usually ignore me though, so no
> promises. :)

Thanks in advance for your help :)

> 
> .
>
Jordan_Hargrave@Dell.com Oct. 2, 2014, 3:31 p.m. UTC | #29
\My original proposed patch to handle MPS only lived in the hotplug code path, it wasn't part of the initial PCI configuration.
This is the code that we have had to insert into our drivers (MTIP32XX, NVME) as the current default setting doesn't work for hotplug.

http://www.spinics.net/lists/hotplug/msg04728.html

--jordan hargrave
Dell Enterprise Linux Engineering
diff mbox

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e3cf8a2..583ca52 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1613,6 +1613,44 @@  static void pcie_write_mrrs(struct pci_dev *dev)
 		dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
 }
 
+/**
+ * pcie_bus_update_set - update device mps when device doing hot-add
+ * @dev: PCI device to set
+ * 
+ * After device hot add, mps will be set to default(128B), But the 
+ * upstream port device's mps may be larger than 128B which was set 
+ * by firmware during system bootup. Then we should update the device
+ * mps to equal to its parent mps, Or the device can not work normally.
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+	int mps, p_mps, mpss;
+	struct pci_dev *parent;
+
+	if (!pci_is_pcie(dev) || !dev->bus->self 
+			|| !dev->bus->self->is_hotplug_bridge)
+		return;
+	
+	parent = dev->bus->self;
+	mps = pcie_get_mps(dev);
+	p_mps = pcie_get_mps(parent);
+
+	if (mps >= p_mps)
+		return;
+
+	mpss = 128 << dev->pcie_mpss;
+	if (mpss < p_mps) {
+		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
+				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
+				mpss, p_mps);
+		return;
+	}
+
+	pcie_write_mps(dev, p_mps);
+	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
+			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
+}
+
 static void pcie_bus_detect_mps(struct pci_dev *dev)
 {
 	struct pci_dev *bridge = dev->bus->self;
@@ -1637,6 +1675,7 @@  static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
 		return 0;
 
 	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+		pcie_bus_update_set(dev);
 		pcie_bus_detect_mps(dev);
 		return 0;
 	}