diff mbox series

[V11,7/8] PCI: Enable 10-Bit Tag support for PCIe Endpoint device

Message ID 20211030135348.61364-8-liudongdong3@huawei.com
State New
Headers show
Series PCI: Enable 10-Bit tag support for PCIe devices | expand

Commit Message

Dongdong Liu Oct. 30, 2021, 1:53 p.m. UTC
10-Bit Tag capability, introduced in PCIe-4.0 increases the total Tag
field size from 8 bits to 10 bits.

PCIe spec 5.0 r1.0 section 2.2.6.2 "Considerations for Implementing
10-Bit Tag Capabilities" Implementation Note:

  For platforms where the RC supports 10-Bit Tag Completer capability,
  it is highly recommended for platform firmware or operating software
  that configures PCIe hierarchies to Set the 10-Bit Tag Requester Enable
  bit automatically in Endpoints with 10-Bit Tag Requester capability.
  This enables the important class of 10-Bit Tag capable adapters that
  send Memory Read Requests only to host memory.

It's safe to enable 10-bit tags for all devices below a Root Port that
supports them. Switches that lack 10-Bit Tag Completer capability are
still able to forward NPRs and Completions carrying 10-Bit Tags correctly,
since the two new Tag bits are in TLP Header bits that were formerly
Reserved.

PCIe spec 5.0 r1.0 section 2.2.6.2 says:

  If an Endpoint supports sending Requests to other Endpoints (as opposed
  to host memory), the Endpoint must not send 10-Bit Tag Requests to
  another given Endpoint unless an implementation-specific mechanism
  determines that the Endpoint supports 10-Bit Tag Completer capability.

It is not safe for P2P traffic if an Endpoint send 10-Bit Tag Requesters
to another Endpoint that does not support 10-Bit Tag Completer capability,
so we provide sysfs file to disable 10-Bit Tag Requester. Unbind the
device driver, set the sysfs file and then rebind the driver.

Add a kernel parameter pcie_tag_peer2peer that disables 10-Bit Tag
Requester for all PCIe devices. This configuration allows peer-to-peer
DMA between any pair of devices, possibly at the cost of reduced
performance.

Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 +++
 drivers/pci/iov.c                             |  3 ++
 drivers/pci/pci-sysfs.c                       |  3 ++
 drivers/pci/pci.c                             |  4 ++
 drivers/pci/pci.h                             |  7 ++++
 drivers/pci/probe.c                           | 42 ++++++++++++++++++-
 6 files changed, 63 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Nov. 1, 2021, 10:02 p.m. UTC | #1
On Sat, Oct 30, 2021 at 09:53:47PM +0800, Dongdong Liu wrote:
> 10-Bit Tag capability, introduced in PCIe-4.0 increases the total Tag
> field size from 8 bits to 10 bits.
> 
> PCIe spec 5.0 r1.0 section 2.2.6.2 "Considerations for Implementing
> 10-Bit Tag Capabilities" Implementation Note:
> 
>   For platforms where the RC supports 10-Bit Tag Completer capability,
>   it is highly recommended for platform firmware or operating software
>   that configures PCIe hierarchies to Set the 10-Bit Tag Requester Enable
>   bit automatically in Endpoints with 10-Bit Tag Requester capability.
>   This enables the important class of 10-Bit Tag capable adapters that
>   send Memory Read Requests only to host memory.
> 
> It's safe to enable 10-bit tags for all devices below a Root Port that
> supports them. Switches that lack 10-Bit Tag Completer capability are
> still able to forward NPRs and Completions carrying 10-Bit Tags correctly,
> since the two new Tag bits are in TLP Header bits that were formerly
> Reserved.

Side note: the reason we want to do this to increase performance by
allowing more outstanding requests.  Do you have any benchmarking that
we can mention here to show that this is actually a benefit?  I don't
doubt that it is, but I assume you've measured it and it would be nice
to advertise it.

Bjorn
Bjorn Helgaas Nov. 1, 2021, 10:33 p.m. UTC | #2
On Mon, Nov 01, 2021 at 05:02:41PM -0500, Bjorn Helgaas wrote:
> On Sat, Oct 30, 2021 at 09:53:47PM +0800, Dongdong Liu wrote:
> > 10-Bit Tag capability, introduced in PCIe-4.0 increases the total Tag
> > field size from 8 bits to 10 bits.
> > 
> > PCIe spec 5.0 r1.0 section 2.2.6.2 "Considerations for Implementing
> > 10-Bit Tag Capabilities" Implementation Note:
> > 
> >   For platforms where the RC supports 10-Bit Tag Completer capability,
> >   it is highly recommended for platform firmware or operating software
> >   that configures PCIe hierarchies to Set the 10-Bit Tag Requester Enable
> >   bit automatically in Endpoints with 10-Bit Tag Requester capability.
> >   This enables the important class of 10-Bit Tag capable adapters that
> >   send Memory Read Requests only to host memory.
> > 
> > It's safe to enable 10-bit tags for all devices below a Root Port that
> > supports them. Switches that lack 10-Bit Tag Completer capability are
> > still able to forward NPRs and Completions carrying 10-Bit Tags correctly,
> > since the two new Tag bits are in TLP Header bits that were formerly
> > Reserved.
> 
> Side note: the reason we want to do this to increase performance by
> allowing more outstanding requests.  Do you have any benchmarking that
> we can mention here to show that this is actually a benefit?  I don't
> doubt that it is, but I assume you've measured it and it would be nice
> to advertise it.

Hmmm.  I did a quick Google search looking for "nvme pcie 10-bit tags"
hoping to find some performance info, but what I *actually* found was
several reports of 10-bit tags causing breakage:

  https://www.reddit.com/r/MSI_Gaming/comments/exjvzg/x570_apro_7c37vh72beta_version_has_anyone_tryed_it/
  https://rog.asus.com/forum/showthread.php?115064-Beware-of-agesa-1-0-0-4B-bios-not-good!/page2
  https://forum-en.msi.com/index.php?threads/sound-blaster-z-has-weird-behaviour-after-updating-bios-x570-gaming-edge-wifi.325223/page-2
  https://gearspace.com/board/electronic-music-instruments-and-electronic-music-production/1317189-h8000fw-firewire-facts-2020-must-read.html
  https://www.soundonsound.com/forum/viewtopic.php?t=69651&start=12
  https://forum.rme-audio.de/viewtopic.php?id=30307

This is a big problem for me.

Some of these might be a broken BIOS that turns on 10-bit tags when
the completer doesn't support them.  I didn't try to debug them to
that level.  But the last thing I want is to enable 10-bit by default
and cause boot issues or sound card issues or whatever.

I'm pretty sure this is a show-stopper for wedging this into v5.16 at
this late date.  It's conceivable we could still do it if everything
defaulted to "off" and we had a knob whereby users could turn it on
via boot param or sysfs.

In any case, we (by which I'm afraid I mean "you" :)) need to
investigate the problem reports, figure out whether we will see
similar problems, and fix them before merging if we can.

Thanks to Krzysztof for pointing out the potential for issues like
this.

Bjorn
Dongdong Liu Nov. 3, 2021, 10:05 a.m. UTC | #3
On 2021/11/2 6:33, Bjorn Helgaas wrote:
> On Mon, Nov 01, 2021 at 05:02:41PM -0500, Bjorn Helgaas wrote:
>> On Sat, Oct 30, 2021 at 09:53:47PM +0800, Dongdong Liu wrote:
>>> 10-Bit Tag capability, introduced in PCIe-4.0 increases the total Tag
>>> field size from 8 bits to 10 bits.
>>>
>>> PCIe spec 5.0 r1.0 section 2.2.6.2 "Considerations for Implementing
>>> 10-Bit Tag Capabilities" Implementation Note:
>>>
>>>   For platforms where the RC supports 10-Bit Tag Completer capability,
>>>   it is highly recommended for platform firmware or operating software
>>>   that configures PCIe hierarchies to Set the 10-Bit Tag Requester Enable
>>>   bit automatically in Endpoints with 10-Bit Tag Requester capability.
>>>   This enables the important class of 10-Bit Tag capable adapters that
>>>   send Memory Read Requests only to host memory.
>>>
>>> It's safe to enable 10-bit tags for all devices below a Root Port that
>>> supports them. Switches that lack 10-Bit Tag Completer capability are
>>> still able to forward NPRs and Completions carrying 10-Bit Tags correctly,
>>> since the two new Tag bits are in TLP Header bits that were formerly
>>> Reserved.
>>
>> Side note: the reason we want to do this to increase performance by
>> allowing more outstanding requests.  Do you have any benchmarking that
>> we can mention here to show that this is actually a benefit?  I don't
>> doubt that it is, but I assume you've measured it and it would be nice
>> to advertise it.
>
> Hmmm.  I did a quick Google search looking for "nvme pcie 10-bit tags"
> hoping to find some performance info, but what I *actually* found was
> several reports of 10-bit tags causing breakage:
>
>   https://www.reddit.com/r/MSI_Gaming/comments/exjvzg/x570_apro_7c37vh72beta_version_has_anyone_tryed_it/
>   https://rog.asus.com/forum/showthread.php?115064-Beware-of-agesa-1-0-0-4B-bios-not-good!/page2
>   https://forum-en.msi.com/index.php?threads/sound-blaster-z-has-weird-behaviour-after-updating-bios-x570-gaming-edge-wifi.325223/page-2
>   https://gearspace.com/board/electronic-music-instruments-and-electronic-music-production/1317189-h8000fw-firewire-facts-2020-must-read.html
>   https://www.soundonsound.com/forum/viewtopic.php?t=69651&start=12
>   https://forum.rme-audio.de/viewtopic.php?id=30307
>
> This is a big problem for me.
>
> Some of these might be a broken BIOS that turns on 10-bit tags when
> the completer doesn't support them.  I didn't try to debug them to
> that level.  But the last thing I want is to enable 10-bit by default
> and cause boot issues or sound card issues or whatever.
It seems a BIOS software bug, as it turned on(as default) a 10-Bit Tag
Field for RP, but the card(non-Gen4 card) does not support 10-Bit Completer.

This patch we enable 10-Bit Tag Requester for EP when RC supports
10-Bit Tag Completer capability. So it shuld be worked ok.

The below is one of the link. other links seems the same issue.
https://forum.rme-audio.de/viewtopic.php?id=30307
"Re: AMD Ryzen X570 chipset problems
So for those running into similar problems I found out that in most
recent BIOS AMD turned on(as default) a 10-bit Tag Field, which is only
available on PCI-e Gen4 devices. So your BIOS get stuck on startup if
inserted a non-Gen4 card like my firewire card.

So you need to find that feature in your BIOS and turn it off and set
the PCI compatibility on Gen3 for the slot your card is in.

Hope this helps others running in to similar troubles."
>
> I'm pretty sure this is a show-stopper for wedging this into v5.16 at
> this late date.  It's conceivable we could still do it if everything
> defaulted to "off" and we had a knob whereby users could turn it on
> via boot param or sysfs.
Maybe we can merge this patchset later into v5.17.

But I still think default to "on" will be better,
Current we enable 10-Bit Tag, in the future PCIe 6.0 maybe need to use
14-Bit tags to get good performance.
>
> In any case, we (by which I'm afraid I mean "you" :)) need to
> investigate the problem reports, figure out whether we will see
> similar problems, and fix them before merging if we can.
We have tested a PCIe 5.0 network card on FPGA with 10-Bit tag worked
ok. I have not got the performance data as FPGA is slow.

Current we enable 10-Bit Tag Requester for EP when RC supports
10-Bit Tag Completer capability. It shoud be worked ok except hardware
bugs, we also provide boot param to diasble 10-Bit Tag if the hardware
really have a bug or can do some quirks as 8-bit tag has done if we
have known the hardware.

Thanks,
Dongdong.
>
> Thanks to Krzysztof for pointing out the potential for issues like
> this.
>
> Bjorn
> .
>
Bjorn Helgaas Nov. 3, 2021, 4:02 p.m. UTC | #4
On Wed, Nov 03, 2021 at 06:05:34PM +0800, Dongdong Liu wrote:
> On 2021/11/2 6:33, Bjorn Helgaas wrote:
> > On Mon, Nov 01, 2021 at 05:02:41PM -0500, Bjorn Helgaas wrote:
> > > On Sat, Oct 30, 2021 at 09:53:47PM +0800, Dongdong Liu wrote:
> > > > 10-Bit Tag capability, introduced in PCIe-4.0 increases the total Tag
> > > > field size from 8 bits to 10 bits.
> > > > 
> > > > PCIe spec 5.0 r1.0 section 2.2.6.2 "Considerations for Implementing
> > > > 10-Bit Tag Capabilities" Implementation Note:
> > > > 
> > > >   For platforms where the RC supports 10-Bit Tag Completer capability,
> > > >   it is highly recommended for platform firmware or operating software
> > > >   that configures PCIe hierarchies to Set the 10-Bit Tag Requester Enable
> > > >   bit automatically in Endpoints with 10-Bit Tag Requester capability.
> > > >   This enables the important class of 10-Bit Tag capable adapters that
> > > >   send Memory Read Requests only to host memory.
> > > > 
> > > > It's safe to enable 10-bit tags for all devices below a Root Port that
> > > > supports them. Switches that lack 10-Bit Tag Completer capability are
> > > > still able to forward NPRs and Completions carrying 10-Bit Tags correctly,
> > > > since the two new Tag bits are in TLP Header bits that were formerly
> > > > Reserved.
> > > 
> > > Side note: the reason we want to do this to increase performance by
> > > allowing more outstanding requests.  Do you have any benchmarking that
> > > we can mention here to show that this is actually a benefit?  I don't
> > > doubt that it is, but I assume you've measured it and it would be nice
> > > to advertise it.
> > 
> > Hmmm.  I did a quick Google search looking for "nvme pcie 10-bit tags"
> > hoping to find some performance info, but what I *actually* found was
> > several reports of 10-bit tags causing breakage:
> > 
> >   https://www.reddit.com/r/MSI_Gaming/comments/exjvzg/x570_apro_7c37vh72beta_version_has_anyone_tryed_it/
> >   https://rog.asus.com/forum/showthread.php?115064-Beware-of-agesa-1-0-0-4B-bios-not-good!/page2
> >   https://forum-en.msi.com/index.php?threads/sound-blaster-z-has-weird-behaviour-after-updating-bios-x570-gaming-edge-wifi.325223/page-2
> >   https://gearspace.com/board/electronic-music-instruments-and-electronic-music-production/1317189-h8000fw-firewire-facts-2020-must-read.html
> >   https://www.soundonsound.com/forum/viewtopic.php?t=69651&start=12
> >   https://forum.rme-audio.de/viewtopic.php?id=30307
> > 
> > This is a big problem for me.
> > 
> > Some of these might be a broken BIOS that turns on 10-bit tags
> > when the completer doesn't support them.  I didn't try to debug
> > them to that level.  But the last thing I want is to enable 10-bit
> > by default and cause boot issues or sound card issues or whatever.
>
> It seems a BIOS software bug, as it turned on (as default) a 10-Bit
> Tag Field for RP, but the card (non-Gen4 card) does not support
> 10-Bit Completer.

It doesn't matter *where* the problem is.  If we change Linux to
*expose* a BIOS bug, that's just as much of a problem as if the bug
were in Linux.  Users are not equipped to diagnose or fix problems
like that.

> This patch we enable 10-Bit Tag Requester for EP when RC supports
> 10-Bit Tag Completer capability. So it shuld be worked ok.

That's true as long as the RC supports 10-bit tags correctly when it
advertises support for them.  It "should" work :)

But it does remind me that if the RC doesn't support 10-bit tags, but
we use sysfs to enable 10-bit tags for a reqester that intends to use
P2PDMA to a peer that *does* support them, I don't think there's
any check in the DMA API that prevents the driver from setting up DMA
to the RC in addition to the peer.

> But I still think default to "on" will be better,
> Current we enable 10-Bit Tag, in the future PCIe 6.0 maybe need to use
> 14-Bit tags to get good performance.

Maybe we can default to "on" based on BIOS date or something.  Older
systems that want the benefit can use the param to enable it, and if
there's a problem, the cause will be obvious ("we booted with
'pci=tag-bits=10' and things broke").

If we enable 10-bit tags by default on systems from 2022 or newer, we
shouldn't break any existing systems, and we have a chance to discover
any problems and add quirk if necessary.

> > In any case, we (by which I'm afraid I mean "you" :)) need to
> > investigate the problem reports, figure out whether we will see
> > similar problems, and fix them before merging if we can.
>
> We have tested a PCIe 5.0 network card on FPGA with 10-Bit tag worked
> ok. I have not got the performance data as FPGA is slow.

10-bit tag support appeared in the spec four years ago (PCIe r4.0, in
September, 2017).  Surely there is production hardware that supports
this and could demonstrate a benefit from this.

We need a commit log that says "enabling 10-bit tags allows more
outstanding transactions, which improves performance of adapters like
X by Y% on these workloads," not a log that says "we think enabling
10-bit tags is safe, but users with non-compliant hardware may see new
PCIe errors or even non-bootable systems, and they should use boot
param X to work around this."

> Current we enable 10-Bit Tag Requester for EP when RC supports
> 10-Bit Tag Completer capability. It should be worked ok except
> hardware bugs, we also provide boot param to disable 10-Bit Tag if
> the hardware really have a bug or can do some quirks as 8-bit tag
> has done if we have known the hardware.

The problem is that turning it on by default means systems with
hardware defects *used* to work but now they mysteriously *stop*
working.  Yes, a boot param can work around that, but it's just
not an acceptable user experience.  Maybe there are no such defects.
I dunno.

Bjorn
Dongdong Liu Nov. 5, 2021, 8:24 a.m. UTC | #5
On 2021/11/4 0:02, Bjorn Helgaas wrote:
> On Wed, Nov 03, 2021 at 06:05:34PM +0800, Dongdong Liu wrote:
>> On 2021/11/2 6:33, Bjorn Helgaas wrote:
>>> On Mon, Nov 01, 2021 at 05:02:41PM -0500, Bjorn Helgaas wrote:
>>>> On Sat, Oct 30, 2021 at 09:53:47PM +0800, Dongdong Liu wrote:
>>>>> 10-Bit Tag capability, introduced in PCIe-4.0 increases the total Tag
>>>>> field size from 8 bits to 10 bits.
>>>>>
>>>>> PCIe spec 5.0 r1.0 section 2.2.6.2 "Considerations for Implementing
>>>>> 10-Bit Tag Capabilities" Implementation Note:
>>>>>
>>>>>   For platforms where the RC supports 10-Bit Tag Completer capability,
>>>>>   it is highly recommended for platform firmware or operating software
>>>>>   that configures PCIe hierarchies to Set the 10-Bit Tag Requester Enable
>>>>>   bit automatically in Endpoints with 10-Bit Tag Requester capability.
>>>>>   This enables the important class of 10-Bit Tag capable adapters that
>>>>>   send Memory Read Requests only to host memory.
>>>>>
>>>>> It's safe to enable 10-bit tags for all devices below a Root Port that
>>>>> supports them. Switches that lack 10-Bit Tag Completer capability are
>>>>> still able to forward NPRs and Completions carrying 10-Bit Tags correctly,
>>>>> since the two new Tag bits are in TLP Header bits that were formerly
>>>>> Reserved.
>>>>
>>>> Side note: the reason we want to do this to increase performance by
>>>> allowing more outstanding requests.  Do you have any benchmarking that
>>>> we can mention here to show that this is actually a benefit?  I don't
>>>> doubt that it is, but I assume you've measured it and it would be nice
>>>> to advertise it.
>>>
>>> Hmmm.  I did a quick Google search looking for "nvme pcie 10-bit tags"
>>> hoping to find some performance info, but what I *actually* found was
>>> several reports of 10-bit tags causing breakage:
>>>
>>>   https://www.reddit.com/r/MSI_Gaming/comments/exjvzg/x570_apro_7c37vh72beta_version_has_anyone_tryed_it/
>>>   https://rog.asus.com/forum/showthread.php?115064-Beware-of-agesa-1-0-0-4B-bios-not-good!/page2
>>>   https://forum-en.msi.com/index.php?threads/sound-blaster-z-has-weird-behaviour-after-updating-bios-x570-gaming-edge-wifi.325223/page-2
>>>   https://gearspace.com/board/electronic-music-instruments-and-electronic-music-production/1317189-h8000fw-firewire-facts-2020-must-read.html
>>>   https://www.soundonsound.com/forum/viewtopic.php?t=69651&start=12
>>>   https://forum.rme-audio.de/viewtopic.php?id=30307
>>>
>>> This is a big problem for me.
>>>
>>> Some of these might be a broken BIOS that turns on 10-bit tags
>>> when the completer doesn't support them.  I didn't try to debug
>>> them to that level.  But the last thing I want is to enable 10-bit
>>> by default and cause boot issues or sound card issues or whatever.
>>
>> It seems a BIOS software bug, as it turned on (as default) a 10-Bit
>> Tag Field for RP, but the card (non-Gen4 card) does not support
>> 10-Bit Completer.
>
> It doesn't matter *where* the problem is.  If we change Linux to
> *expose* a BIOS bug, that's just as much of a problem as if the bug
> were in Linux.  Users are not equipped to diagnose or fix problems
> like that.
>
>> This patch we enable 10-Bit Tag Requester for EP when RC supports
>> 10-Bit Tag Completer capability. So it shuld be worked ok.
>
> That's true as long as the RC supports 10-bit tags correctly when it
> advertises support for them.  It "should" work :)
>
> But it does remind me that if the RC doesn't support 10-bit tags, but
> we use sysfs to enable 10-bit tags for a reqester that intends to use
> P2PDMA to a peer that *does* support them, I don't think there's
> any check in the DMA API that prevents the driver from setting up DMA
> to the RC in addition to the peer.
Current we use sysfs to enable/disable 10-bit tags for a requester also
depend on the RP support 10-bit tag completer, so it will be ok.
>
>> But I still think default to "on" will be better,
>> Current we enable 10-Bit Tag, in the future PCIe 6.0 maybe need to use
>> 14-Bit tags to get good performance.
>
> Maybe we can default to "on" based on BIOS date or something.  Older
> systems that want the benefit can use the param to enable it, and if
> there's a problem, the cause will be obvious ("we booted with
> 'pci=tag-bits=10' and things broke").
>
> If we enable 10-bit tags by default on systems from 2022 or newer, we
> shouldn't break any existing systems, and we have a chance to discover
> any problems and add quirk if necessary.
>
>>> In any case, we (by which I'm afraid I mean "you" :)) need to
>>> investigate the problem reports, figure out whether we will see
>>> similar problems, and fix them before merging if we can.
>>
>> We have tested a PCIe 5.0 network card on FPGA with 10-Bit tag worked
>> ok. I have not got the performance data as FPGA is slow.
>
> 10-bit tag support appeared in the spec four years ago (PCIe r4.0, in
> September, 2017).  Surely there is production hardware that supports
> this and could demonstrate a benefit from this.
I found the below introduction about "Number of tags needed to achieve
maximum throughput for PCIe 4.0 and PCIe 5.0 links"
https://www.synopsys.com/designware-ip/technical-bulletin/accelerating-32gtps-pcie5-designs.html

It seems pretty clear.
>
> We need a commit log that says "enabling 10-bit tags allows more
> outstanding transactions, which improves performance of adapters like
> X by Y% on these workloads," not a log that says "we think enabling
> 10-bit tags is safe, but users with non-compliant hardware may see new
> PCIe errors or even non-bootable systems, and they should use boot
> param X to work around this."
Looks good, will fix the commit log.
I investigate some PCIe 4.0 cards such as mlx cx5(PCIe 4.0 16GT/s
x16), a NVME SSD(PCIe 4.0 16GT/s X4), but these cards only support
10-bit tag completer not support 10-bit tag requester. Maybe
these cards use 8-bit tag can achieve its performance specs.
>
>> Current we enable 10-Bit Tag Requester for EP when RC supports
>> 10-Bit Tag Completer capability. It should be worked ok except
>> hardware bugs, we also provide boot param to disable 10-Bit Tag if
>> the hardware really have a bug or can do some quirks as 8-bit tag
>> has done if we have known the hardware.
>
> The problem is that turning it on by default means systems with
> hardware defects *used* to work but now they mysteriously *stop*
> working.  Yes, a boot param can work around that, but it's just
> not an acceptable user experience.  Maybe there are no such defects.
> I dunno.
Ok, current defaulted to "off" and use boot param and sysfs to turn on
maybe a safe choice.

Thanks,
Dongdong
>
> Bjorn
> .
>
Bjorn Helgaas Nov. 5, 2021, 5:39 p.m. UTC | #6
On Fri, Nov 05, 2021 at 04:24:24PM +0800, Dongdong Liu wrote:
> On 2021/11/4 0:02, Bjorn Helgaas wrote:

> > But it does remind me that if the RC doesn't support 10-bit tags, but
> > we use sysfs to enable 10-bit tags for a reqester that intends to use
> > P2PDMA to a peer that *does* support them, I don't think there's
> > any check in the DMA API that prevents the driver from setting up DMA
> > to the RC in addition to the peer.
>
> Current we use sysfs to enable/disable 10-bit tags for a requester also
> depend on the RP support 10-bit tag completer, so it will be ok.

Ah, OK.  So we can never *enable* 10-bit tags unless the Root Port
supports them.

I misunderstood the purpose of this file.  When the Root Port doesn't
support 10-bit tags, we won't enable them during enumeration.  I
though the point was that if we want to do P2PDMA to a peer that
*does* support them, we could use this file to enable them.

But my understanding was wrong -- the real purpose of the file is to
*disable* 10-bit tags for the case when a P2PDMA peer doesn't support
them.

It does support enabling 10-bit tags as well, but that's only because
we need a way to get back to the default "enabled during enumeration"
state without having to reboot.

We might be able to highlight this a little more in the commit log.

> > 10-bit tag support appeared in the spec four years ago (PCIe r4.0, in
> > September, 2017).  Surely there is production hardware that supports
> > this and could demonstrate a benefit from this.
>
> I found the below introduction about "Number of tags needed to achieve
> maximum throughput for PCIe 4.0 and PCIe 5.0 links"
> https://www.synopsys.com/designware-ip/technical-bulletin/accelerating-32gtps-pcie5-designs.html
> 
> It seems pretty clear.

Yes, that's a start.  But we don't really need a white paper to tell
us that more outstanding transactions is better.  That's obvious.  But
this adds risk, and if we can't demonstrate a tangible, measurable
benefit, there's no point in doing it.

Bjorn
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..4ab2c46a1af7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3979,6 +3979,11 @@ 
 				any pair of devices, possibly at the cost of
 				reduced performance.  This also guarantees
 				that hot-added devices will work.
+		pcie_tag_peer2peer	Disable 10-Bit Tag Requester for all
+				PCIe devices. This configuration allows
+				peer-to-peer DMA between any pair of devices,
+				possibly at the cost of reduced performance.
+
 		cbiosize=nn[KMG]	The fixed amount of bus space which is
 				reserved for the CardBus bridge's IO window.
 				The default value is 256 bytes.
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 3627e495d7af..0f8203ccc701 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -241,6 +241,9 @@  static ssize_t sriov_vf_tags_ctl_store(struct device *dev,
 	if (vf_dev->driver)
 		return -EBUSY;
 
+	if (pcie_tag_config == PCIE_TAG_PEER2PEER)
+		return -EPERM;
+
 	if (!pcie_rp_10bit_tag_cmp_supported(pdev))
 		return -EPERM;
 
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 04fd9a8b9d4e..71647bb3ac06 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -326,6 +326,9 @@  static ssize_t tags_store(struct device *dev,
 	if (pdev->driver)
 		return -EBUSY;
 
+	if (pcie_tag_config == PCIE_TAG_PEER2PEER)
+		return -EPERM;
+
 	if (!pcie_rp_10bit_tag_cmp_supported(pdev))
 		return -EPERM;
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 64138a83b0f7..46faf2e8c8ab 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -118,6 +118,8 @@  enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_PEER2PEER;
 enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_DEFAULT;
 #endif
 
+enum pcie_tag_config_types pcie_tag_config = PCIE_TAG_DEFAULT;
+
 /*
  * The default CLS is used if arch didn't set CLS explicitly and not
  * all pci devices agree on the same value.  Arch can override either
@@ -6795,6 +6797,8 @@  static int __init pci_setup(char *str)
 				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
 			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
 				disable_acs_redir_param = str + 18;
+			} else if (!strncmp(str, "pcie_tag_peer2peer", 18)) {
+				pcie_tag_config = PCIE_TAG_PEER2PEER;
 			} else {
 				pr_err("PCI: Unknown option `%s'\n", str);
 			}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index f719a41dfc7f..7846aa7b85dc 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -59,6 +59,13 @@  struct pci_cap_saved_state *pci_find_saved_cap(struct pci_dev *dev, char cap);
 struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
 						   u16 cap);
 
+enum pcie_tag_config_types {
+	PCIE_TAG_DEFAULT,   /* Enable 10-Bit Tag Requester for devices below
+			       Root Port that support 10-Bit Tag Completer. */
+	PCIE_TAG_PEER2PEER  /* Disable 10-Bit Tag Requester for all devices. */
+};
+extern enum pcie_tag_config_types pcie_tag_config;
+
 #define PCI_PM_D2_DELAY         200	/* usec; see PCIe r4.0, sec 5.9.1 */
 #define PCI_PM_D3HOT_WAIT       10	/* msec */
 #define PCI_PM_D3COLD_WAIT      100	/* msec */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8f5372c7c737..18a74e257dd9 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2059,12 +2059,20 @@  bool pcie_rp_10bit_tag_cmp_supported(struct pci_dev *dev)
 int pci_configure_extended_tags(struct pci_dev *dev, void *ign)
 {
 	struct pci_host_bridge *host;
-	u16 ctl;
+	u16 ctl, ctl2;
 	int ret;
 
 	if (!pci_is_pcie(dev))
 		return 0;
 
+	/*
+	 * PCIe 5.0 section 9.3.5.4 Extended Tag Field Enable in Device Control
+	 * Register is RsvdP fo VF. section 9.3.5.10 10-Bit Tag Requester
+	 * Enable in Device Control 2 Register is RsvdP for VF.
+	 */
+	if (dev->is_virtfn)
+		return 0;
+
 	if (!(dev->devcap & PCI_EXP_DEVCAP_EXT_TAG))
 		return 0;
 
@@ -2072,6 +2080,10 @@  int pci_configure_extended_tags(struct pci_dev *dev, void *ign)
 	if (ret)
 		return 0;
 
+	ret = pcie_capability_read_word(dev, PCI_EXP_DEVCTL2, &ctl2);
+	if (ret)
+		return 0;
+
 	host = pci_find_host_bridge(dev->bus);
 	if (!host)
 		return 0;
@@ -2086,6 +2098,12 @@  int pci_configure_extended_tags(struct pci_dev *dev, void *ign)
 			pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
 						   PCI_EXP_DEVCTL_EXT_TAG);
 		}
+
+		if (ctl2 & PCI_EXP_DEVCTL2_10BIT_TAG_REQ_EN) {
+			pci_info(dev, "disabling 10-Bit Tags\n");
+			pcie_capability_clear_word(dev, PCI_EXP_DEVCTL2,
+					PCI_EXP_DEVCTL2_10BIT_TAG_REQ_EN);
+		}
 		return 0;
 	}
 
@@ -2100,6 +2118,28 @@  int pci_configure_extended_tags(struct pci_dev *dev, void *ign)
 		pcie_capability_set_word(dev, PCI_EXP_DEVCTL,
 					 PCI_EXP_DEVCTL_EXT_TAG);
 	}
+
+	if ((pcie_tag_config == PCIE_TAG_PEER2PEER) &&
+	    (ctl2 & PCI_EXP_DEVCTL2_10BIT_TAG_REQ_EN)) {
+		pci_info(dev, "disabling 10-Bit Tags\n");
+		pcie_capability_clear_word(dev, PCI_EXP_DEVCTL2,
+					   PCI_EXP_DEVCTL2_10BIT_TAG_REQ_EN);
+		return 0;
+	}
+
+	if (pcie_tag_config != PCIE_TAG_DEFAULT)
+		return 0;
+
+	if (!pcie_rp_10bit_tag_cmp_supported(dev))
+		return 0;
+
+	if (pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT)
+		return 0;
+
+	pci_dbg(dev, "enabling 10-Bit Tag Requester\n");
+	pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
+				 PCI_EXP_DEVCTL2_10BIT_TAG_REQ_EN);
+
 	return 0;
 }