diff mbox

[v7,2/3] PCI: Enable PCIe Relaxed Ordering if supported

Message ID 1499955692-26556-3-git-send-email-dingtianhong@huawei.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Ding Tianhong July 13, 2017, 2:21 p.m. UTC
The PCIe Device Control Register use the bit 4 to indicate that
whether the device is permitted to enable relaxed ordering or not.
But relaxed ordering is not safe for some platform which could only
use strong write ordering, so devices are allowed (but not required)
to enable relaxed ordering bit by default.

If a PCIe device didn't enable the relaxed ordering attribute default,
we should not do anything in the PCIe configuration, otherwise we
should check if any of the devices above us do not support relaxed
ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
the result if we get a return that indicate that the relaxed ordering
is not supported we should update our device to disable relaxed ordering
in configuration space. If the device above us doesn't exist or isn't
the PCIe device, we shouldn't do anything and skip updating relaxed ordering
because we are probably running in a guest machine.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
 drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  2 ++
 3 files changed, 68 insertions(+)

Comments

Sinan Kaya July 13, 2017, 9:09 p.m. UTC | #1
On 7/13/2017 10:21 AM, Ding Tianhong wrote:
> static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> +	/* We should not alter the relaxed ordering bit for the VF */
> +	if (dev->is_virtfn)
> +		return;
> +
> +	/* If the releaxed ordering enable bit is not set, do nothing. */
> +	if (!pcie_relaxed_ordering_supported(dev))
> +		return;
> +
> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {
> +		pcie_clear_relaxed_ordering(dev);
> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
> +	}
> +}

I couldn't find anywhere where you actually enable the relaxed ordering
like the subject suggests.
Ding Tianhong July 14, 2017, 1:26 a.m. UTC | #2
On 2017/7/14 5:09, Sinan Kaya wrote:
> On 7/13/2017 10:21 AM, Ding Tianhong wrote:
>> static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +	/* We should not alter the relaxed ordering bit for the VF */
>> +	if (dev->is_virtfn)
>> +		return;
>> +
>> +	/* If the releaxed ordering enable bit is not set, do nothing. */
>> +	if (!pcie_relaxed_ordering_supported(dev))
>> +		return;
>> +
>> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {
>> +		pcie_clear_relaxed_ordering(dev);
>> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
>> +	}
>> +}
> 
> I couldn't find anywhere where you actually enable the relaxed ordering
> like the subject suggests.
> 
There is no code to enable the PCIe Relaxed Ordering bit in the configuration space,
it is only be enable by default according to the PCIe Standard Specification, what we
do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit
to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root
Complex.

Thanks
Ding
Sinan Kaya July 14, 2017, 1:54 p.m. UTC | #3
On 7/13/2017 9:26 PM, Ding Tianhong wrote:
> There is no code to enable the PCIe Relaxed Ordering bit in the configuration space,
> it is only be enable by default according to the PCIe Standard Specification, what we
> do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit
> to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root
> Complex.

Maybe, you should change the patch commit as 
"Disable PCIe Relaxed Ordering if not supported"...
Ding Tianhong July 22, 2017, 4:19 a.m. UTC | #4
Hi Sinan, Bjorn:

On 2017/7/14 21:54, Sinan Kaya wrote:
> On 7/13/2017 9:26 PM, Ding Tianhong wrote:
>> There is no code to enable the PCIe Relaxed Ordering bit in the configuration space,
>> it is only be enable by default according to the PCIe Standard Specification, what we
>> do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit
>> to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root
>> Complex.
> 
> Maybe, you should change the patch commit as 
> "Disable PCIe Relaxed Ordering if not supported"...

I agree that to use the new commit title as your suggested, thanks. :)

@Bjorn do you want me to spawn a new patchset with the new commit title
and the Reviewed-by from Casey on the patch 3, or maybe you could pick this
up and modify it own ? thanks.

Ding

>
Alex Williamson July 24, 2017, 3:05 p.m. UTC | #5
On Sat, 22 Jul 2017 12:19:38 +0800
Ding Tianhong <dingtianhong@huawei.com> wrote:

> Hi Sinan, Bjorn:
> 
> On 2017/7/14 21:54, Sinan Kaya wrote:
> > On 7/13/2017 9:26 PM, Ding Tianhong wrote:  
> >> There is no code to enable the PCIe Relaxed Ordering bit in the configuration space,
> >> it is only be enable by default according to the PCIe Standard Specification, what we
> >> do is to distinguish the RC problematic platform and clear the Relaxed Ordering bit
> >> to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to the Root
> >> Complex.  
> > 
> > Maybe, you should change the patch commit as 
> > "Disable PCIe Relaxed Ordering if not supported"...  
> 
> I agree that to use the new commit title as your suggested, thanks. :)
> 
> @Bjorn do you want me to spawn a new patchset with the new commit title
> and the Reviewed-by from Casey on the patch 3, or maybe you could pick this
> up and modify it own ? thanks.

Hi Ding,

Bjorn is currently on holiday so it might be a good idea to respin the
series with any updates so nothing is lost.  Thanks,

Alex
Casey Leedom July 26, 2017, 6:26 p.m. UTC | #6
By the way Ding, two issues:

 1. Did we ever get any acknowledgement from either Intel or AMD
    on this patch?  I know that we can't ensure that, but it sure would
    be nice since the PCI Quirks that we're putting in affect their
    products.

 2. I just realized that there's still a small hole in the patch with
    respect to PCIe SR-IOV Virtual Functions.  When the PCI Quirk
    notices a problematic PCIe Device and marks it to note that
    it's not "happy" receiving Transaction Layer Packets with the
    Relaxed Ordering Attribute set, it's my understanding that your
    current patch iterates down the PCIe Fabric and turns off the
    PCIe Capability Device Control[Relaxed Ordering Enable].
    But this scan may miss any SR-IOV VFs because they
    probably won't be instantiated at that time.  And, at a later
    time, when they are instantiated, they could well have their
    Relaxed Ordering Enable set.

    I think that the patch will need to be extended to modify
    drivers/pci.c/iov.c:sriov_enable() to explicitly turn off
    Relaxed Ordering Enable if the Root Complex is marked
    for no RO TLPs.

Casey
Casey Leedom July 26, 2017, 7:05 p.m. UTC | #7
| From: Alexander Duyck <alexander.duyck@gmail.com>
| Sent: Wednesday, July 26, 2017 11:44 AM
| 
| On Jul 26, 2017 11:26 AM, "Casey Leedom" <leedom@chelsio.com> wrote:
| |
| |     I think that the patch will need to be extended to modify
| |     drivers/pci.c/iov.c:sriov_enable() to explicitly turn off
| |     Relaxed Ordering Enable if the Root Complex is marked
|     for no RO TLPs.
| 
| I'm not sure that would be an issue. Wouldn't most VFs inherit the PF's settings?

Ah yes, you're right.  This is covered in section 3.5.4 of the Single Root I/O
Virtualization and Sharing Specification, Revision 1.0 (September 11, 2007),
governing the PCIe Capability Device Control register.  It states that the VF
version of that register shall follow the setting of the corresponding PF.

So we should enhance the cxgb4vf/sge.c:t4vf_sge_alloc_rxq() in the same
way we did for the cxgb4 driver, but that's not critical since the Relaxed
Ordering Enable supersedes the internal chip's desire to use the Relaxed
Ordering Attribute.

Ding, send me a note if you'd like me to work that up for you.

| Also I thought most of the VF configuration space is read only.

Yes, but not all of it.  And when a VF is exported to a Virtual Machine,
then the Hypervisor captures and interprets all accesses to the VF's
PCIe Configuration Space from the VM.

Thanks again for reminding me of the subtle aspect of the SR_IOV
specification that I forgot.

Casey
Ding Tianhong July 27, 2017, 1:01 a.m. UTC | #8
On 2017/7/27 3:05, Casey Leedom wrote:
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Sent: Wednesday, July 26, 2017 11:44 AM
> | 
> | On Jul 26, 2017 11:26 AM, "Casey Leedom" <leedom@chelsio.com> wrote:
> | |
> | |     I think that the patch will need to be extended to modify
> | |     drivers/pci.c/iov.c:sriov_enable() to explicitly turn off
> | |     Relaxed Ordering Enable if the Root Complex is marked
> |     for no RO TLPs.
> | 
> | I'm not sure that would be an issue. Wouldn't most VFs inherit the PF's settings?
> 
> Ah yes, you're right.  This is covered in section 3.5.4 of the Single Root I/O
> Virtualization and Sharing Specification, Revision 1.0 (September 11, 2007),
> governing the PCIe Capability Device Control register.  It states that the VF
> version of that register shall follow the setting of the corresponding PF.
> 
> So we should enhance the cxgb4vf/sge.c:t4vf_sge_alloc_rxq() in the same
> way we did for the cxgb4 driver, but that's not critical since the Relaxed
> Ordering Enable supersedes the internal chip's desire to use the Relaxed
> Ordering Attribute.
> 
> Ding, send me a note if you'd like me to work that up for you.
> 

Ok, you could send the change log and I could put it in the v8 version together,
will you base on the patch 3/3 or build a independence patch?

Ding

> | Also I thought most of the VF configuration space is read only.
> 
> Yes, but not all of it.  And when a VF is exported to a Virtual Machine,
> then the Hypervisor captures and interprets all accesses to the VF's
> PCIe Configuration Space from the VM.
> 
> Thanks again for reminding me of the subtle aspect of the SR_IOV
> specification that I forgot.
> 
> Casey
> .
>
Ding Tianhong July 27, 2017, 1:08 a.m. UTC | #9
On 2017/7/27 2:26, Casey Leedom wrote:
>   By the way Ding, two issues:
> 
>  1. Did we ever get any acknowledgement from either Intel or AMD
>     on this patch?  I know that we can't ensure that, but it sure would
>     be nice since the PCI Quirks that we're putting in affect their
>     products.
> 

Still no Intel and AMD guys has ack this, this is what I am worried about, should I
ping some man again ?

Thanks
Ding
> 
> Casey
> .
>
Casey Leedom July 27, 2017, 5:44 p.m. UTC | #10
| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, July 26, 2017 6:01 PM
|
| On 2017/7/27 3:05, Casey Leedom wrote:
| >
| > Ding, send me a note if you'd like me to work that [cxgb4vf patch] up
| > for you.
|
| Ok, you could send the change log and I could put it in the v8 version
| together, will you base on the patch 3/3 or build a independence patch?

Which ever you'd prefer.  It would basically mirror the same exact code that
you've got for cxgb4.  I.e. testing the setting of the VF's PCIe Capability
Device Control[Relaxed Ordering Enable], setting a new flag in
adpater->flags, testing that flag in cxgb4vf/sge.c:t4vf_sge_alloc_rxq().
But since the VF's PF will already have disabled the PF's Relaxed Ordering
Enable, the VF will also have it's Relaxed Ordering Enable disabled and any
effort by the internal chip to send TLPs with the Relaxed Ordering Attribute
will be gated by the PCIe logic.  So it's not critical that this be in the
first patch.  Your call.  Let me know if you'd like me to send that to you.


| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, July 26, 2017 6:08 PM
|
| On 2017/7/27 2:26, Casey Leedom wrote:
| >
| >  1. Did we ever get any acknowledgement from either Intel or AMD
| >     on this patch?  I know that we can't ensure that, but it sure would
| >     be nice since the PCI Quirks that we're putting in affect their
| >     products.
|
| Still no Intel and AMD guys has ack this, this is what I am worried about,
| should I ping some man again ?

By amusing coincidence, Patrik Cramer (now Cc'ed) from Intel sent me a note
yesterday with a link to the official Intel performance tuning documentation
which covers this issue:

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

In section 3.9.1 we have:

    3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
          and Toward MMIO Regions (P2P)

    In order to maximize performance for PCIe devices in the processors
    listed in Table 3-6 below, the soft- ware should determine whether the
    accesses are toward coherent memory (system memory) or toward MMIO
    regions (P2P access to other devices). If the access is toward MMIO
    region, then software can command HW to set the RO bit in the TLP
    header, as this would allow hardware to achieve maximum throughput for
    these types of accesses. For accesses toward coherent memory, software
    can command HW to clear the RO bit in the TLP header (no RO), as this
    would allow hardware to achieve maximum throughput for these types of
    accesses.

    Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
               PCIe Performance

    Processor                            CPU RP Device IDs

    Intel Xeon processors based on       6F01H-6F0EH
    Broadwell microarchitecture

    Intel Xeon processors based on       2F01H-2F0EH
    Haswell microarchitecture

Unfortunately that's a pretty thin section.  But it does expand the set of
Intel Root Complexes for which our Linux PCI Quirk will need to cover.  So
you should add those to the next (and hopefully final) spin of your patch.
And, it also verifies the need to handle the use of Relaxed Ordering more
subtlely than simply turning it off since the NVMe peer-to-peer example I
keep bringing up would fall into the "need to use Relaxed Ordering" case ...

It would have been nice to know why this is happening and if any future
processor would fix this.  After all, Relaxed Ordering, is just supposed to
be a hint.  At worst, a receiving device could just ignore the attribute
entirely.  Obviously someone made an effort to implement it but ... it
didn't go the way they wanted.

And, it also would have been nice to know if there was any hidden register
in these Intel Root Complexes which can completely turn off the effort to
pay attention to the Relaxed Ordering Attribute.  We've spend an enormous
amount of effort on this issue here on the Linux PCI email list struggling
mightily to come up with a way to determine when it's
safe/recommended/not-recommended/unsafe to use Relaxed Ordering when
directing TLPs towards the Root Complex.  And some architectures require RO
for decent performance so we can't just "turn it off" unilatterally.

Casey
Alexander H Duyck July 27, 2017, 5:49 p.m. UTC | #11
On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
>
> On 2017/7/27 2:26, Casey Leedom wrote:
>>   By the way Ding, two issues:
>>
>>  1. Did we ever get any acknowledgement from either Intel or AMD
>>     on this patch?  I know that we can't ensure that, but it sure would
>>     be nice since the PCI Quirks that we're putting in affect their
>>     products.
>>
>
> Still no Intel and AMD guys has ack this, this is what I am worried about, should I
> ping some man again ?
>
> Thanks
> Ding


I probably wouldn't worry about it too much. If anything all this
patch is doing is disabling relaxed ordering on the platforms we know
have issues based on what Casey originally had. If nothing else we can
follow up once the patches are in the kernel and if somebody has an
issue then.

You can include my acked-by, but it is mostly related to how this
interacts with NICs, and not so much about the PCI chipsets
themselves.

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Ashok Raj July 27, 2017, 6:42 p.m. UTC | #12
Hi Casey

> | Still no Intel and AMD guys has ack this, this is what I am worried about,
> | should I ping some man again ?


I can ack the patch set for Intel specific changes. Now that the doc is made
public :-).

Can you/Ding resend the patch series, i do have the most recent v7, some
of the commit message wasn't easy to ready. Seems like this patch has
gotten bigger than originally intended, but seems to be for the overall
good :-).

Sorry for staying silent up until now.

Cheers,
Ashok
Ding Tianhong July 28, 2017, 2:48 a.m. UTC | #13
On 2017/7/28 1:44, Casey Leedom wrote:
> | From: Ding Tianhong <dingtianhong@huawei.com>
> | Sent: Wednesday, July 26, 2017 6:01 PM
> |
> | On 2017/7/27 3:05, Casey Leedom wrote:
> | >
> | > Ding, send me a note if you'd like me to work that [cxgb4vf patch] up
> | > for you.
> |
> | Ok, you could send the change log and I could put it in the v8 version
> | together, will you base on the patch 3/3 or build a independence patch?
> 
> Which ever you'd prefer.  It would basically mirror the same exact code that
> you've got for cxgb4.  I.e. testing the setting of the VF's PCIe Capability
> Device Control[Relaxed Ordering Enable], setting a new flag in
> adpater->flags, testing that flag in cxgb4vf/sge.c:t4vf_sge_alloc_rxq().
> But since the VF's PF will already have disabled the PF's Relaxed Ordering
> Enable, the VF will also have it's Relaxed Ordering Enable disabled and any
> effort by the internal chip to send TLPs with the Relaxed Ordering Attribute
> will be gated by the PCIe logic.  So it's not critical that this be in the
> first patch.  Your call.  Let me know if you'd like me to send that to you.
> 

Good, please Send it to me, I will put it together and send the v8 this week,
I think Bjorn will be back next week .:)

> 
> | From: Ding Tianhong <dingtianhong@huawei.com>
> | Sent: Wednesday, July 26, 2017 6:08 PM
> |
> | On 2017/7/27 2:26, Casey Leedom wrote:
> | >
> | >  1. Did we ever get any acknowledgement from either Intel or AMD
> | >     on this patch?  I know that we can't ensure that, but it sure would
> | >     be nice since the PCI Quirks that we're putting in affect their
> | >     products.
> |
> | Still no Intel and AMD guys has ack this, this is what I am worried about,
> | should I ping some man again ?
> 
> By amusing coincidence, Patrik Cramer (now Cc'ed) from Intel sent me a note
> yesterday with a link to the official Intel performance tuning documentation
> which covers this issue:
> 
> https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
> 
> In section 3.9.1 we have:
> 
>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>           and Toward MMIO Regions (P2P)
> 
>     In order to maximize performance for PCIe devices in the processors
>     listed in Table 3-6 below, the soft- ware should determine whether the
>     accesses are toward coherent memory (system memory) or toward MMIO
>     regions (P2P access to other devices). If the access is toward MMIO
>     region, then software can command HW to set the RO bit in the TLP
>     header, as this would allow hardware to achieve maximum throughput for
>     these types of accesses. For accesses toward coherent memory, software
>     can command HW to clear the RO bit in the TLP header (no RO), as this
>     would allow hardware to achieve maximum throughput for these types of
>     accesses.
> 
>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>                PCIe Performance
> 
>     Processor                            CPU RP Device IDs
> 
>     Intel Xeon processors based on       6F01H-6F0EH
>     Broadwell microarchitecture
> 
>     Intel Xeon processors based on       2F01H-2F0EH
>     Haswell microarchitecture
> 
> Unfortunately that's a pretty thin section.  But it does expand the set of
> Intel Root Complexes for which our Linux PCI Quirk will need to cover.  So
> you should add those to the next (and hopefully final) spin of your patch.
> And, it also verifies the need to handle the use of Relaxed Ordering more
> subtlely than simply turning it off since the NVMe peer-to-peer example I
> keep bringing up would fall into the "need to use Relaxed Ordering" case ...
> 
> It would have been nice to know why this is happening and if any future
> processor would fix this.  After all, Relaxed Ordering, is just supposed to
> be a hint.  At worst, a receiving device could just ignore the attribute
> entirely.  Obviously someone made an effort to implement it but ... it
> didn't go the way they wanted.
> 
> And, it also would have been nice to know if there was any hidden register
> in these Intel Root Complexes which can completely turn off the effort to
> pay attention to the Relaxed Ordering Attribute.  We've spend an enormous
> amount of effort on this issue here on the Linux PCI email list struggling
> mightily to come up with a way to determine when it's
> safe/recommended/not-recommended/unsafe to use Relaxed Ordering when
> directing TLPs towards the Root Complex.  And some architectures require RO
> for decent performance so we can't just "turn it off" unilatterally.
> 

I am glad to hear that more person were focus on this problem, It would be great
if they could enter our discussion and give us more suggestion. :)

Thanks
Ding

> Casey
> 
> .
>
Ding Tianhong July 28, 2017, 2:57 a.m. UTC | #14
On 2017/7/28 2:42, Raj, Ashok wrote:
> Hi Casey
> 
>> | Still no Intel and AMD guys has ack this, this is what I am worried about,
>> | should I ping some man again ?
> 
> 
> I can ack the patch set for Intel specific changes. Now that the doc is made
> public :-).
> 

Good, Thanks. :)

> Can you/Ding resend the patch series, i do have the most recent v7, some
> of the commit message wasn't easy to ready. Seems like this patch has
> gotten bigger than originally intended, but seems to be for the overall
> good :-).
> 

OK, I will send v8 patch set and which will update the patch title and add
Casey's new modification for his vf driver, thanks.

Ding

> Sorry for staying silent up until now.
> 
> Cheers,
> Ashok
> 
> .
>
Ding Tianhong July 28, 2017, 3 a.m. UTC | #15
On 2017/7/28 1:49, Alexander Duyck wrote:
> On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>
>>
>> On 2017/7/27 2:26, Casey Leedom wrote:
>>>   By the way Ding, two issues:
>>>
>>>  1. Did we ever get any acknowledgement from either Intel or AMD
>>>     on this patch?  I know that we can't ensure that, but it sure would
>>>     be nice since the PCI Quirks that we're putting in affect their
>>>     products.
>>>
>>
>> Still no Intel and AMD guys has ack this, this is what I am worried about, should I
>> ping some man again ?
>>
>> Thanks
>> Ding
> 
> 
> I probably wouldn't worry about it too much. If anything all this
> patch is doing is disabling relaxed ordering on the platforms we know
> have issues based on what Casey originally had. If nothing else we can
> follow up once the patches are in the kernel and if somebody has an
> issue then.
> 
> You can include my acked-by, but it is mostly related to how this
> interacts with NICs, and not so much about the PCI chipsets
> themselves.
> 
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
> 

Thanks, Alex. :)

> .
>
Ashok Raj Aug. 3, 2017, 9:13 a.m. UTC | #16
Hi Ding

patch looks good, except would reword the patch description for clarity

here is my crack at it, feel free to use.

On Thu, Jul 13, 2017 at 10:21:31PM +0800, Ding Tianhong wrote:
> The PCIe Device Control Register use the bit 4 to indicate that
> whether the device is permitted to enable relaxed ordering or not.
> But relaxed ordering is not safe for some platform which could only
> use strong write ordering, so devices are allowed (but not required)
> to enable relaxed ordering bit by default.
> 
> If a PCIe device didn't enable the relaxed ordering attribute default,
> we should not do anything in the PCIe configuration, otherwise we
> should check if any of the devices above us do not support relaxed
> ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
> the result if we get a return that indicate that the relaxed ordering
> is not supported we should update our device to disable relaxed ordering
> in configuration space. If the device above us doesn't exist or isn't
> the PCIe device, we shouldn't do anything and skip updating relaxed ordering
> because we are probably running in a guest machine.

When bit4 is set in the PCIe Device Control register, it indicates
whether the device is permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or
due to erratum can cause data-corruption. In such cases devices must avoid
using relaxed ordering.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off the
relaxed ordering by clearing the eapability for this device.

> 
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
>  drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h |  2 ++
>  3 files changed, 68 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d88edf5..7a6b32f 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>  EXPORT_SYMBOL(pcie_set_mps);
>  
>  /**
> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible clear relaxed ordering
> + */
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
> +{
> +	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
> +					  PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
> +
> +/**
> + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support
> + * @dev: PCI device to query
> + *
> + * Returns true if the device support relaxed ordering attribute.
> + */
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
> +{
> +	u16 v;
> +
> +	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
> +
> +	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
> +
> +/**
>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>   * @dev: PCI device to query
>   * @speed: storage for minimum speed
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index c31310d..48df012 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
>  					 PCI_EXP_DEVCTL_EXT_TAG);
>  }
>  
> +/**
> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device
> + * should disable the relaxed ordering attribute.
> + * @dev: PCI device
> + *
> + * Return true if any of the PCI devices above us do not support
> + * relaxed ordering.
> + */
> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
> +{
> +	while (dev) {
> +		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
> +			return true;
> +
> +		dev = dev->bus->self;
> +	}
> +
> +	return false;
> +}
> +
> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> +	/* We should not alter the relaxed ordering bit for the VF */
> +	if (dev->is_virtfn)
> +		return;
> +
> +	/* If the releaxed ordering enable bit is not set, do nothing. */
> +	if (!pcie_relaxed_ordering_supported(dev))
> +		return;
> +
> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {
> +		pcie_clear_relaxed_ordering(dev);
> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
> +	}
> +}
> +
>  static void pci_configure_device(struct pci_dev *dev)
>  {
>  	struct hotplug_params hpp;
> @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
>  
>  	pci_configure_mps(dev);
>  	pci_configure_extended_tags(dev);
> +	pci_configure_relaxed_ordering(dev);
>  
>  	memset(&hpp, 0, sizeof(hpp));
>  	ret = pci_get_hp_params(dev, &hpp);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 412ec1c..3aa23a2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
>  void pci_pme_wakeup_bus(struct pci_bus *bus);
>  void pci_d3cold_enable(struct pci_dev *dev);
>  void pci_d3cold_disable(struct pci_dev *dev);
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
>  
>  /* PCI Virtual Channel */
>  int pci_save_vc_state(struct pci_dev *dev);
> -- 
> 1.8.3.1
> 
>
Ding Tianhong Aug. 3, 2017, 10:22 a.m. UTC | #17
On 2017/8/3 17:13, Raj, Ashok wrote:
> Hi Ding
> 
> patch looks good, except would reword the patch description for clarity
> 
> here is my crack at it, feel free to use.
> 
> On Thu, Jul 13, 2017 at 10:21:31PM +0800, Ding Tianhong wrote:
>> The PCIe Device Control Register use the bit 4 to indicate that
>> whether the device is permitted to enable relaxed ordering or not.
>> But relaxed ordering is not safe for some platform which could only
>> use strong write ordering, so devices are allowed (but not required)
>> to enable relaxed ordering bit by default.
>>
>> If a PCIe device didn't enable the relaxed ordering attribute default,
>> we should not do anything in the PCIe configuration, otherwise we
>> should check if any of the devices above us do not support relaxed
>> ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
>> the result if we get a return that indicate that the relaxed ordering
>> is not supported we should update our device to disable relaxed ordering
>> in configuration space. If the device above us doesn't exist or isn't
>> the PCIe device, we shouldn't do anything and skip updating relaxed ordering
>> because we are probably running in a guest machine.
> 
> When bit4 is set in the PCIe Device Control register, it indicates
> whether the device is permitted to use relaxed ordering.
> On some platforms using relaxed ordering can have performance issues or
> due to erratum can cause data-corruption. In such cases devices must avoid
> using relaxed ordering.
> 
> This patch checks if there is any node in the hierarchy that indicates that
> using relaxed ordering is not safe. In such cases the patch turns off the
> relaxed ordering by clearing the eapability for this device.
> 

Good, thanks for the commit, I will send v8 and update the patch description.

Ding

>>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>>  drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
>>  drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
>>  include/linux/pci.h |  2 ++
>>  3 files changed, 68 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index d88edf5..7a6b32f 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>>  EXPORT_SYMBOL(pcie_set_mps);
>>  
>>  /**
>> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * If possible clear relaxed ordering
>> + */
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>> +					  PCI_EXP_DEVCTL_RELAX_EN);
>> +}
>> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
>> +
>> +/**
>> + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support
>> + * @dev: PCI device to query
>> + *
>> + * Returns true if the device support relaxed ordering attribute.
>> + */
>> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
>> +{
>> +	u16 v;
>> +
>> +	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
>> +
>> +	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
>> +}
>> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
>> +
>> +/**
>>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>>   * @dev: PCI device to query
>>   * @speed: storage for minimum speed
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index c31310d..48df012 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
>>  					 PCI_EXP_DEVCTL_EXT_TAG);
>>  }
>>  
>> +/**
>> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device
>> + * should disable the relaxed ordering attribute.
>> + * @dev: PCI device
>> + *
>> + * Return true if any of the PCI devices above us do not support
>> + * relaxed ordering.
>> + */
>> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +	while (dev) {
>> +		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
>> +			return true;
>> +
>> +		dev = dev->bus->self;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +	/* We should not alter the relaxed ordering bit for the VF */
>> +	if (dev->is_virtfn)
>> +		return;
>> +
>> +	/* If the releaxed ordering enable bit is not set, do nothing. */
>> +	if (!pcie_relaxed_ordering_supported(dev))
>> +		return;
>> +
>> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {
>> +		pcie_clear_relaxed_ordering(dev);
>> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
>> +	}
>> +}
>> +
>>  static void pci_configure_device(struct pci_dev *dev)
>>  {
>>  	struct hotplug_params hpp;
>> @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
>>  
>>  	pci_configure_mps(dev);
>>  	pci_configure_extended_tags(dev);
>> +	pci_configure_relaxed_ordering(dev);
>>  
>>  	memset(&hpp, 0, sizeof(hpp));
>>  	ret = pci_get_hp_params(dev, &hpp);
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 412ec1c..3aa23a2 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
>>  void pci_pme_wakeup_bus(struct pci_bus *bus);
>>  void pci_d3cold_enable(struct pci_dev *dev);
>>  void pci_d3cold_disable(struct pci_dev *dev);
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);
>> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
>>  
>>  /* PCI Virtual Channel */
>>  int pci_save_vc_state(struct pci_dev *dev);
>> -- 
>> 1.8.3.1
>>
>>
> 
> .
>
diff mbox

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d88edf5..7a6b32f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4854,6 +4854,35 @@  int pcie_set_mps(struct pci_dev *dev, int mps)
 EXPORT_SYMBOL(pcie_set_mps);
 
 /**
+ * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+					  PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support
+ * @dev: PCI device to query
+ *
+ * Returns true if the device support relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
+{
+	u16 v;
+
+	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
+
+/**
  * pcie_get_minimum_link - determine minimum link settings of a PCI device
  * @dev: PCI device to query
  * @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..48df012 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,42 @@  static void pci_configure_extended_tags(struct pci_dev *dev)
 					 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pci_dev_should_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */
+static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
+{
+	while (dev) {
+		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
+			return true;
+
+		dev = dev->bus->self;
+	}
+
+	return false;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+	/* We should not alter the relaxed ordering bit for the VF */
+	if (dev->is_virtfn)
+		return;
+
+	/* If the releaxed ordering enable bit is not set, do nothing. */
+	if (!pcie_relaxed_ordering_supported(dev))
+		return;
+
+	if (pci_dev_should_disable_relaxed_ordering(dev)) {
+		pcie_clear_relaxed_ordering(dev);
+		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
+	}
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
 	struct hotplug_params hpp;
@@ -1769,6 +1805,7 @@  static void pci_configure_device(struct pci_dev *dev)
 
 	pci_configure_mps(dev);
 	pci_configure_extended_tags(dev);
+	pci_configure_relaxed_ordering(dev);
 
 	memset(&hpp, 0, sizeof(hpp));
 	ret = pci_get_hp_params(dev, &hpp);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 412ec1c..3aa23a2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1127,6 +1127,8 @@  int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);