diff mbox series

[V3,3/4] PCI: Improve the MRRS quirk for LS7A

Message ID 20210625093030.3698570-4-chenhuacai@loongson.cn
State New
Headers show
Series PCI: Loongson-related pci quirks | expand

Commit Message

陈华才 June 25, 2021, 9:30 a.m. UTC
In new revision of LS7A, some PCIe ports support larger value than 256,
but their maximum supported MRRS values are not detectable. Moreover,
the current loongson_mrrs_quirk() cannot avoid devices increasing its
MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
will actually set a big value in its driver. So the only possible way is
configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.

However, according to PCIe Spec, it is legal for an OS to program any
value for MRRS, and it is also legal for an endpoint to generate a Read
Request with any size up to its MRRS. As the hardware engineers says,
the root cause here is LS7A doesn't break up large read requests (Yes,
that is a problem in the LS7A design).

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/pci.c    | 5 +++++
 drivers/pci/quirks.c | 8 +++++++-
 include/linux/pci.h  | 2 ++
 3 files changed, 14 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas June 25, 2021, 10:22 p.m. UTC | #1
On Fri, Jun 25, 2021 at 05:30:29PM +0800, Huacai Chen wrote:
> In new revision of LS7A, some PCIe ports support larger value than 256,
> but their maximum supported MRRS values are not detectable. Moreover,
> the current loongson_mrrs_quirk() cannot avoid devices increasing its
> MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
> will actually set a big value in its driver. So the only possible way is
> configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
> PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.
> 
> However, according to PCIe Spec, it is legal for an OS to program any
> value for MRRS, and it is also legal for an endpoint to generate a Read
> Request with any size up to its MRRS. As the hardware engineers says,
> the root cause here is LS7A doesn't break up large read requests (Yes,
> that is a problem in the LS7A design).

"LS7A doesn't break up large read requests" claims to be a root cause,
but you haven't yet said what the actual *problem* is.

Is the problem that an endpoint reports a malformed TLP because it
received a completion bigger than it can handle?  Is it that the LS7A
root port reports some kind of error if it receives a Memory Read
request with a size that's "too big"?  Maybe the LS7A doesn't know
what to do when it receives a Memory Read request with MRRS > MPS?
What exactly happens when the problem occurs?

MRRS applies only to the read request.  It is not directly related to
the size of the completions that carry the data back to the device
(except that obviously you shouldn't get a completion larger than the
read you requested).

The setting that directly controls the size of completions is MPS
(Max_Payload_Size).  One reason to break up read requests is because
the endpoint's buffers can't accommodate big TLPs.  One way to deal
with that is to set MPS in the hierarchy to a smaller value.  Then the
root port must ensure that no TLP exceeds the MPS size, regardless of
what the MRRS in the read request was.

For example, if the endpoint's MRRS=4096 and the hierarchy's MPS=128,
it's up to the root port to break up completions into 128-byte chunks.

It's also possible to set the endpoint's MRRS=128, which means reads
to main memory will never receive completions larger than 128 bytes.
But it does NOT guarantee that a peer-to-peer DMA from another device
will be limited to 128 bytes.  The other device is allowed to generate
Memory Write TLPs with payloads up to its MPS size, and MRRS is not
involved at all.

It's not clear yet whether the LS7A problem is with MRRS, with MPS, or
with some combination.  It's important to understand exactly what is
broken here so the quirk doesn't get in the way of future changes to
the generic MRRS and MPS configuration.

Here's a good overview:

  https://www.xilinx.com/support/documentation/white_papers/wp350.pdf

> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>  drivers/pci/pci.c    | 5 +++++
>  drivers/pci/quirks.c | 8 +++++++-
>  include/linux/pci.h  | 2 ++
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b717680377a9..6f0d2f5b6f30 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5802,6 +5802,11 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
>  
>  	v = (ffs(rq) - 8) << 12;
>  
> +	if (dev->dev_flags & PCI_DEV_FLAGS_NO_INCREASE_MRRS) {
> +		if (rq > pcie_get_readrq(dev))
> +			return -EINVAL;
> +	}
> +
>  	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
>  						  PCI_EXP_DEVCTL_READRQ, v);
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index dee4798a49fc..8284480dc7e4 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -263,7 +263,13 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
>  		 * anything larger than this. So force this limit on
>  		 * any devices attached under these ports.
>  		 */
> -		if (pci_match_id(bridge_devids, bridge)) {
> +		if (bridge && pci_match_id(bridge_devids, bridge)) {
> +			dev->dev_flags |= PCI_DEV_FLAGS_NO_INCREASE_MRRS;
> +
> +			if (pcie_bus_config == PCIE_BUS_DEFAULT ||
> +			    pcie_bus_config == PCIE_BUS_TUNE_OFF)
> +				break;
> +
>  			if (pcie_get_readrq(dev) > 256) {
>  				pci_info(dev, "limiting MRRS to 256\n");
>  				pcie_set_readrq(dev, 256);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 24306504226a..5e0ec3e4318b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -227,6 +227,8 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
>  	/* Don't use Relaxed Ordering for TLPs directed at this device */
>  	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
> +	/* Don't increase BIOS's MRRS configuration */
> +	PCI_DEV_FLAGS_NO_INCREASE_MRRS = (__force pci_dev_flags_t) (1 << 12),
>  };
>  
>  enum pci_irq_reroute_variant {
> -- 
> 2.27.0
>
Bjorn Helgaas June 26, 2021, 3:48 p.m. UTC | #2
On Fri, Jun 25, 2021 at 05:22:04PM -0500, Bjorn Helgaas wrote:
> On Fri, Jun 25, 2021 at 05:30:29PM +0800, Huacai Chen wrote:
> > In new revision of LS7A, some PCIe ports support larger value than 256,
> > but their maximum supported MRRS values are not detectable. Moreover,
> > the current loongson_mrrs_quirk() cannot avoid devices increasing its
> > MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
> > will actually set a big value in its driver. So the only possible way is
> > configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
> > PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.
> > 
> > However, according to PCIe Spec, it is legal for an OS to program any
> > value for MRRS, and it is also legal for an endpoint to generate a Read
> > Request with any size up to its MRRS. As the hardware engineers says,
> > the root cause here is LS7A doesn't break up large read requests (Yes,
> > that is a problem in the LS7A design).
> 
> "LS7A doesn't break up large read requests" claims to be a root cause,
> but you haven't yet said what the actual *problem* is.
> 
> Is the problem that an endpoint reports a malformed TLP because it
> received a completion bigger than it can handle?  Is it that the LS7A
> root port reports some kind of error if it receives a Memory Read
> request with a size that's "too big"?  Maybe the LS7A doesn't know
> what to do when it receives a Memory Read request with MRRS > MPS?
> What exactly happens when the problem occurs?
> 
> MRRS applies only to the read request.  It is not directly related to
> the size of the completions that carry the data back to the device
> (except that obviously you shouldn't get a completion larger than the
> read you requested).
> 
> The setting that directly controls the size of completions is MPS
> (Max_Payload_Size).  One reason to break up read requests is because
> the endpoint's buffers can't accommodate big TLPs.  One way to deal
> with that is to set MPS in the hierarchy to a smaller value.  Then the
> root port must ensure that no TLP exceeds the MPS size, regardless of
> what the MRRS in the read request was.
> 
> For example, if the endpoint's MRRS=4096 and the hierarchy's MPS=128,
> it's up to the root port to break up completions into 128-byte chunks.
> 
> It's also possible to set the endpoint's MRRS=128, which means reads
> to main memory will never receive completions larger than 128 bytes.
> But it does NOT guarantee that a peer-to-peer DMA from another device
> will be limited to 128 bytes.  The other device is allowed to generate
> Memory Write TLPs with payloads up to its MPS size, and MRRS is not
> involved at all.
> 
> It's not clear yet whether the LS7A problem is with MRRS, with MPS, or
> with some combination.  It's important to understand exactly what is
> broken here so the quirk doesn't get in the way of future changes to
> the generic MRRS and MPS configuration.
> 
> Here's a good overview:
> 
>   https://www.xilinx.com/support/documentation/white_papers/wp350.pdf
> 
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >  drivers/pci/pci.c    | 5 +++++
> >  drivers/pci/quirks.c | 8 +++++++-
> >  include/linux/pci.h  | 2 ++
> >  3 files changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b717680377a9..6f0d2f5b6f30 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5802,6 +5802,11 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
> >  
> >  	v = (ffs(rq) - 8) << 12;
> >  
> > +	if (dev->dev_flags & PCI_DEV_FLAGS_NO_INCREASE_MRRS) {
> > +		if (rq > pcie_get_readrq(dev))
> > +			return -EINVAL;
> > +	}
> > +
> >  	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
> >  						  PCI_EXP_DEVCTL_READRQ, v);
> >  
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index dee4798a49fc..8284480dc7e4 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -263,7 +263,13 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
> >  		 * anything larger than this. So force this limit on
> >  		 * any devices attached under these ports.
> >  		 */
> > -		if (pci_match_id(bridge_devids, bridge)) {
> > +		if (bridge && pci_match_id(bridge_devids, bridge)) {
> > +			dev->dev_flags |= PCI_DEV_FLAGS_NO_INCREASE_MRRS;
> > +
> > +			if (pcie_bus_config == PCIE_BUS_DEFAULT ||
> > +			    pcie_bus_config == PCIE_BUS_TUNE_OFF)
> > +				break;

Another approach might be to make a quirk that prevents Linux from
touching MPS and MRRS at all under any circumstances.

You'd have to do this without reference to pcie_bus_config so future
MPS/MRRS algorithm changes wouldn't be affected.  And the quirk bit
would have to be in struct pci_host_bridge, similar to no_ext_tags.

> >  			if (pcie_get_readrq(dev) > 256) {
> >  				pci_info(dev, "limiting MRRS to 256\n");
> >  				pcie_set_readrq(dev, 256);
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 24306504226a..5e0ec3e4318b 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -227,6 +227,8 @@ enum pci_dev_flags {
> >  	PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
> >  	/* Don't use Relaxed Ordering for TLPs directed at this device */
> >  	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
> > +	/* Don't increase BIOS's MRRS configuration */
> > +	PCI_DEV_FLAGS_NO_INCREASE_MRRS = (__force pci_dev_flags_t) (1 << 12),
> >  };
> >  
> >  enum pci_irq_reroute_variant {
> > -- 
> > 2.27.0
> >
Huacai Chen June 27, 2021, 10:25 a.m. UTC | #3
Hi, Bjorn,

On Sat, Jun 26, 2021 at 6:22 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Jun 25, 2021 at 05:30:29PM +0800, Huacai Chen wrote:
> > In new revision of LS7A, some PCIe ports support larger value than 256,
> > but their maximum supported MRRS values are not detectable. Moreover,
> > the current loongson_mrrs_quirk() cannot avoid devices increasing its
> > MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
> > will actually set a big value in its driver. So the only possible way is
> > configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
> > PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.
> >
> > However, according to PCIe Spec, it is legal for an OS to program any
> > value for MRRS, and it is also legal for an endpoint to generate a Read
> > Request with any size up to its MRRS. As the hardware engineers says,
> > the root cause here is LS7A doesn't break up large read requests (Yes,
> > that is a problem in the LS7A design).
>
> "LS7A doesn't break up large read requests" claims to be a root cause,
> but you haven't yet said what the actual *problem* is.
>
> Is the problem that an endpoint reports a malformed TLP because it
> received a completion bigger than it can handle?  Is it that the LS7A
> root port reports some kind of error if it receives a Memory Read
> request with a size that's "too big"?  Maybe the LS7A doesn't know
> what to do when it receives a Memory Read request with MRRS > MPS?
> What exactly happens when the problem occurs?
The hardware engineer said that the problem is: LS7A PCIe port reports
CA (Completer Abort) if it receives a Memory Read
request with a size that's "too big".

Huacai
>
> MRRS applies only to the read request.  It is not directly related to
> the size of the completions that carry the data back to the device
> (except that obviously you shouldn't get a completion larger than the
> read you requested).
>
> The setting that directly controls the size of completions is MPS
> (Max_Payload_Size).  One reason to break up read requests is because
> the endpoint's buffers can't accommodate big TLPs.  One way to deal
> with that is to set MPS in the hierarchy to a smaller value.  Then the
> root port must ensure that no TLP exceeds the MPS size, regardless of
> what the MRRS in the read request was.
>
> For example, if the endpoint's MRRS=4096 and the hierarchy's MPS=128,
> it's up to the root port to break up completions into 128-byte chunks.
>
> It's also possible to set the endpoint's MRRS=128, which means reads
> to main memory will never receive completions larger than 128 bytes.
> But it does NOT guarantee that a peer-to-peer DMA from another device
> will be limited to 128 bytes.  The other device is allowed to generate
> Memory Write TLPs with payloads up to its MPS size, and MRRS is not
> involved at all.
>
> It's not clear yet whether the LS7A problem is with MRRS, with MPS, or
> with some combination.  It's important to understand exactly what is
> broken here so the quirk doesn't get in the way of future changes to
> the generic MRRS and MPS configuration.
>
> Here's a good overview:
>
>   https://www.xilinx.com/support/documentation/white_papers/wp350.pdf
>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >  drivers/pci/pci.c    | 5 +++++
> >  drivers/pci/quirks.c | 8 +++++++-
> >  include/linux/pci.h  | 2 ++
> >  3 files changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b717680377a9..6f0d2f5b6f30 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5802,6 +5802,11 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
> >
> >       v = (ffs(rq) - 8) << 12;
> >
> > +     if (dev->dev_flags & PCI_DEV_FLAGS_NO_INCREASE_MRRS) {
> > +             if (rq > pcie_get_readrq(dev))
> > +                     return -EINVAL;
> > +     }
> > +
> >       ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
> >                                                 PCI_EXP_DEVCTL_READRQ, v);
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index dee4798a49fc..8284480dc7e4 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -263,7 +263,13 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
> >                * anything larger than this. So force this limit on
> >                * any devices attached under these ports.
> >                */
> > -             if (pci_match_id(bridge_devids, bridge)) {
> > +             if (bridge && pci_match_id(bridge_devids, bridge)) {
> > +                     dev->dev_flags |= PCI_DEV_FLAGS_NO_INCREASE_MRRS;
> > +
> > +                     if (pcie_bus_config == PCIE_BUS_DEFAULT ||
> > +                         pcie_bus_config == PCIE_BUS_TUNE_OFF)
> > +                             break;
> > +
> >                       if (pcie_get_readrq(dev) > 256) {
> >                               pci_info(dev, "limiting MRRS to 256\n");
> >                               pcie_set_readrq(dev, 256);
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 24306504226a..5e0ec3e4318b 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -227,6 +227,8 @@ enum pci_dev_flags {
> >       PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
> >       /* Don't use Relaxed Ordering for TLPs directed at this device */
> >       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
> > +     /* Don't increase BIOS's MRRS configuration */
> > +     PCI_DEV_FLAGS_NO_INCREASE_MRRS = (__force pci_dev_flags_t) (1 << 12),
> >  };
> >
> >  enum pci_irq_reroute_variant {
> > --
> > 2.27.0
> >
Huacai Chen June 27, 2021, 10:27 a.m. UTC | #4
Hi, Bjorn,

On Sat, Jun 26, 2021 at 11:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Jun 25, 2021 at 05:22:04PM -0500, Bjorn Helgaas wrote:
> > On Fri, Jun 25, 2021 at 05:30:29PM +0800, Huacai Chen wrote:
> > > In new revision of LS7A, some PCIe ports support larger value than 256,
> > > but their maximum supported MRRS values are not detectable. Moreover,
> > > the current loongson_mrrs_quirk() cannot avoid devices increasing its
> > > MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
> > > will actually set a big value in its driver. So the only possible way is
> > > configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
> > > PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.
> > >
> > > However, according to PCIe Spec, it is legal for an OS to program any
> > > value for MRRS, and it is also legal for an endpoint to generate a Read
> > > Request with any size up to its MRRS. As the hardware engineers says,
> > > the root cause here is LS7A doesn't break up large read requests (Yes,
> > > that is a problem in the LS7A design).
> >
> > "LS7A doesn't break up large read requests" claims to be a root cause,
> > but you haven't yet said what the actual *problem* is.
> >
> > Is the problem that an endpoint reports a malformed TLP because it
> > received a completion bigger than it can handle?  Is it that the LS7A
> > root port reports some kind of error if it receives a Memory Read
> > request with a size that's "too big"?  Maybe the LS7A doesn't know
> > what to do when it receives a Memory Read request with MRRS > MPS?
> > What exactly happens when the problem occurs?
> >
> > MRRS applies only to the read request.  It is not directly related to
> > the size of the completions that carry the data back to the device
> > (except that obviously you shouldn't get a completion larger than the
> > read you requested).
> >
> > The setting that directly controls the size of completions is MPS
> > (Max_Payload_Size).  One reason to break up read requests is because
> > the endpoint's buffers can't accommodate big TLPs.  One way to deal
> > with that is to set MPS in the hierarchy to a smaller value.  Then the
> > root port must ensure that no TLP exceeds the MPS size, regardless of
> > what the MRRS in the read request was.
> >
> > For example, if the endpoint's MRRS=4096 and the hierarchy's MPS=128,
> > it's up to the root port to break up completions into 128-byte chunks.
> >
> > It's also possible to set the endpoint's MRRS=128, which means reads
> > to main memory will never receive completions larger than 128 bytes.
> > But it does NOT guarantee that a peer-to-peer DMA from another device
> > will be limited to 128 bytes.  The other device is allowed to generate
> > Memory Write TLPs with payloads up to its MPS size, and MRRS is not
> > involved at all.
> >
> > It's not clear yet whether the LS7A problem is with MRRS, with MPS, or
> > with some combination.  It's important to understand exactly what is
> > broken here so the quirk doesn't get in the way of future changes to
> > the generic MRRS and MPS configuration.
> >
> > Here's a good overview:
> >
> >   https://www.xilinx.com/support/documentation/white_papers/wp350.pdf
> >
> > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > > ---
> > >  drivers/pci/pci.c    | 5 +++++
> > >  drivers/pci/quirks.c | 8 +++++++-
> > >  include/linux/pci.h  | 2 ++
> > >  3 files changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index b717680377a9..6f0d2f5b6f30 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -5802,6 +5802,11 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
> > >
> > >     v = (ffs(rq) - 8) << 12;
> > >
> > > +   if (dev->dev_flags & PCI_DEV_FLAGS_NO_INCREASE_MRRS) {
> > > +           if (rq > pcie_get_readrq(dev))
> > > +                   return -EINVAL;
> > > +   }
> > > +
> > >     ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
> > >                                               PCI_EXP_DEVCTL_READRQ, v);
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index dee4798a49fc..8284480dc7e4 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -263,7 +263,13 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
> > >              * anything larger than this. So force this limit on
> > >              * any devices attached under these ports.
> > >              */
> > > -           if (pci_match_id(bridge_devids, bridge)) {
> > > +           if (bridge && pci_match_id(bridge_devids, bridge)) {
> > > +                   dev->dev_flags |= PCI_DEV_FLAGS_NO_INCREASE_MRRS;
> > > +
> > > +                   if (pcie_bus_config == PCIE_BUS_DEFAULT ||
> > > +                       pcie_bus_config == PCIE_BUS_TUNE_OFF)
> > > +                           break;
>
> Another approach might be to make a quirk that prevents Linux from
> touching MPS and MRRS at all under any circumstances.
>
> You'd have to do this without reference to pcie_bus_config so future
> MPS/MRRS algorithm changes wouldn't be affected.  And the quirk bit
> would have to be in struct pci_host_bridge, similar to no_ext_tags.
I will change the quirk bit to be a bus flag, but still allow to decrease MRRS.

Huacai
>
> > >                     if (pcie_get_readrq(dev) > 256) {
> > >                             pci_info(dev, "limiting MRRS to 256\n");
> > >                             pcie_set_readrq(dev, 256);
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 24306504226a..5e0ec3e4318b 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -227,6 +227,8 @@ enum pci_dev_flags {
> > >     PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
> > >     /* Don't use Relaxed Ordering for TLPs directed at this device */
> > >     PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
> > > +   /* Don't increase BIOS's MRRS configuration */
> > > +   PCI_DEV_FLAGS_NO_INCREASE_MRRS = (__force pci_dev_flags_t) (1 << 12),
> > >  };
> > >
> > >  enum pci_irq_reroute_variant {
> > > --
> > > 2.27.0
> > >
Bjorn Helgaas June 28, 2021, 8:51 p.m. UTC | #5
On Sun, Jun 27, 2021 at 06:25:04PM +0800, Huacai Chen wrote:
> On Sat, Jun 26, 2021 at 6:22 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Jun 25, 2021 at 05:30:29PM +0800, Huacai Chen wrote:
> > > In new revision of LS7A, some PCIe ports support larger value than 256,
> > > but their maximum supported MRRS values are not detectable. Moreover,
> > > the current loongson_mrrs_quirk() cannot avoid devices increasing its
> > > MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
> > > will actually set a big value in its driver. So the only possible way is
> > > configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
> > > PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.
> > >
> > > However, according to PCIe Spec, it is legal for an OS to program any
> > > value for MRRS, and it is also legal for an endpoint to generate a Read
> > > Request with any size up to its MRRS. As the hardware engineers says,
> > > the root cause here is LS7A doesn't break up large read requests (Yes,
> > > that is a problem in the LS7A design).
> >
> > "LS7A doesn't break up large read requests" claims to be a root cause,
> > but you haven't yet said what the actual *problem* is.
> >
> > Is the problem that an endpoint reports a malformed TLP because it
> > received a completion bigger than it can handle?  Is it that the LS7A
> > root port reports some kind of error if it receives a Memory Read
> > request with a size that's "too big"?  Maybe the LS7A doesn't know
> > what to do when it receives a Memory Read request with MRRS > MPS?
> > What exactly happens when the problem occurs?
>
> The hardware engineer said that the problem is: LS7A PCIe port reports
> CA (Completer Abort) if it receives a Memory Read
> request with a size that's "too big".

What is "too big"?

I'm trying to figure out how to make this work with hot-added devices.
Per spec (PCIe r5.0, sec 7.5.3.4), devices should power up with
MRRS=010b (512 bytes).

If Linux does not touch MRRS at all in hierarchices under LS7A, will a
hot-added device with MRRS=010b work?  Or does Linux need to actively
write MRRS to 000b (128 bytes) or 001b (256 bytes)?

Bjorn
Huacai Chen June 29, 2021, 2 a.m. UTC | #6
Hi, Bjorn,

On Tue, Jun 29, 2021 at 4:51 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Sun, Jun 27, 2021 at 06:25:04PM +0800, Huacai Chen wrote:
> > On Sat, Jun 26, 2021 at 6:22 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Fri, Jun 25, 2021 at 05:30:29PM +0800, Huacai Chen wrote:
> > > > In new revision of LS7A, some PCIe ports support larger value than 256,
> > > > but their maximum supported MRRS values are not detectable. Moreover,
> > > > the current loongson_mrrs_quirk() cannot avoid devices increasing its
> > > > MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
> > > > will actually set a big value in its driver. So the only possible way is
> > > > configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
> > > > PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.
> > > >
> > > > However, according to PCIe Spec, it is legal for an OS to program any
> > > > value for MRRS, and it is also legal for an endpoint to generate a Read
> > > > Request with any size up to its MRRS. As the hardware engineers says,
> > > > the root cause here is LS7A doesn't break up large read requests (Yes,
> > > > that is a problem in the LS7A design).
> > >
> > > "LS7A doesn't break up large read requests" claims to be a root cause,
> > > but you haven't yet said what the actual *problem* is.
> > >
> > > Is the problem that an endpoint reports a malformed TLP because it
> > > received a completion bigger than it can handle?  Is it that the LS7A
> > > root port reports some kind of error if it receives a Memory Read
> > > request with a size that's "too big"?  Maybe the LS7A doesn't know
> > > what to do when it receives a Memory Read request with MRRS > MPS?
> > > What exactly happens when the problem occurs?
> >
> > The hardware engineer said that the problem is: LS7A PCIe port reports
> > CA (Completer Abort) if it receives a Memory Read
> > request with a size that's "too big".
>
> What is "too big"?
>
"Too big" means bigger than the port can handle, PCIe SPEC allows any
MRRS value, but, but, LS7A surely violates the protocol here.

Huacai

> I'm trying to figure out how to make this work with hot-added devices.
> Per spec (PCIe r5.0, sec 7.5.3.4), devices should power up with
> MRRS=010b (512 bytes).
>
> If Linux does not touch MRRS at all in hierarchices under LS7A, will a
> hot-added device with MRRS=010b work?  Or does Linux need to actively
> write MRRS to 000b (128 bytes) or 001b (256 bytes)?
>
> Bjorn
Bjorn Helgaas June 29, 2021, 2:12 a.m. UTC | #7
On Tue, Jun 29, 2021 at 10:00:20AM +0800, Huacai Chen wrote:
> On Tue, Jun 29, 2021 at 4:51 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Sun, Jun 27, 2021 at 06:25:04PM +0800, Huacai Chen wrote:
> > > On Sat, Jun 26, 2021 at 6:22 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Fri, Jun 25, 2021 at 05:30:29PM +0800, Huacai Chen wrote:
> > > > > In new revision of LS7A, some PCIe ports support larger value than 256,
> > > > > but their maximum supported MRRS values are not detectable. Moreover,
> > > > > the current loongson_mrrs_quirk() cannot avoid devices increasing its
> > > > > MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
> > > > > will actually set a big value in its driver. So the only possible way is
> > > > > configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
> > > > > PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.
> > > > >
> > > > > However, according to PCIe Spec, it is legal for an OS to program any
> > > > > value for MRRS, and it is also legal for an endpoint to generate a Read
> > > > > Request with any size up to its MRRS. As the hardware engineers says,
> > > > > the root cause here is LS7A doesn't break up large read requests (Yes,
> > > > > that is a problem in the LS7A design).
> > > >
> > > > "LS7A doesn't break up large read requests" claims to be a root cause,
> > > > but you haven't yet said what the actual *problem* is.
> > > >
> > > > Is the problem that an endpoint reports a malformed TLP because it
> > > > received a completion bigger than it can handle?  Is it that the LS7A
> > > > root port reports some kind of error if it receives a Memory Read
> > > > request with a size that's "too big"?  Maybe the LS7A doesn't know
> > > > what to do when it receives a Memory Read request with MRRS > MPS?
> > > > What exactly happens when the problem occurs?
> > >
> > > The hardware engineer said that the problem is: LS7A PCIe port reports
> > > CA (Completer Abort) if it receives a Memory Read
> > > request with a size that's "too big".
> >
> > What is "too big"?
> >
> "Too big" means bigger than the port can handle, PCIe SPEC allows any
> MRRS value, but, but, LS7A surely violates the protocol here.

Right, I just wanted to know what the number is.  That is, what values
we can write to MRRS safely.

But ISTR you saying that it's not actually fixed, and that's why you
wanted to rely on what firmware put there.

This is important to know for the question about hot-added devices
below, because a hot-added device should power up with MRRS=512 bytes,
and if that's too big for LS7A, then we have a problem and the quirk
needs to be more extensive.

> > I'm trying to figure out how to make this work with hot-added devices.
> > Per spec (PCIe r5.0, sec 7.5.3.4), devices should power up with
> > MRRS=010b (512 bytes).
> >
> > If Linux does not touch MRRS at all in hierarchices under LS7A, will a
> > hot-added device with MRRS=010b work?  Or does Linux need to actively
> > write MRRS to 000b (128 bytes) or 001b (256 bytes)?
> >
> > Bjorn
Huacai Chen June 29, 2021, 3:32 a.m. UTC | #8
Hi, Bjorn,

On Tue, Jun 29, 2021 at 10:12 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Jun 29, 2021 at 10:00:20AM +0800, Huacai Chen wrote:
> > On Tue, Jun 29, 2021 at 4:51 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Sun, Jun 27, 2021 at 06:25:04PM +0800, Huacai Chen wrote:
> > > > On Sat, Jun 26, 2021 at 6:22 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Fri, Jun 25, 2021 at 05:30:29PM +0800, Huacai Chen wrote:
> > > > > > In new revision of LS7A, some PCIe ports support larger value than 256,
> > > > > > but their maximum supported MRRS values are not detectable. Moreover,
> > > > > > the current loongson_mrrs_quirk() cannot avoid devices increasing its
> > > > > > MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
> > > > > > will actually set a big value in its driver. So the only possible way is
> > > > > > configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
> > > > > > PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.
> > > > > >
> > > > > > However, according to PCIe Spec, it is legal for an OS to program any
> > > > > > value for MRRS, and it is also legal for an endpoint to generate a Read
> > > > > > Request with any size up to its MRRS. As the hardware engineers says,
> > > > > > the root cause here is LS7A doesn't break up large read requests (Yes,
> > > > > > that is a problem in the LS7A design).
> > > > >
> > > > > "LS7A doesn't break up large read requests" claims to be a root cause,
> > > > > but you haven't yet said what the actual *problem* is.
> > > > >
> > > > > Is the problem that an endpoint reports a malformed TLP because it
> > > > > received a completion bigger than it can handle?  Is it that the LS7A
> > > > > root port reports some kind of error if it receives a Memory Read
> > > > > request with a size that's "too big"?  Maybe the LS7A doesn't know
> > > > > what to do when it receives a Memory Read request with MRRS > MPS?
> > > > > What exactly happens when the problem occurs?
> > > >
> > > > The hardware engineer said that the problem is: LS7A PCIe port reports
> > > > CA (Completer Abort) if it receives a Memory Read
> > > > request with a size that's "too big".
> > >
> > > What is "too big"?
> > >
> > "Too big" means bigger than the port can handle, PCIe SPEC allows any
> > MRRS value, but, but, LS7A surely violates the protocol here.
>
> Right, I just wanted to know what the number is.  That is, what values
> we can write to MRRS safely.
>
> But ISTR you saying that it's not actually fixed, and that's why you
> wanted to rely on what firmware put there.
Yes, it's not fixed (256 on some ports and 4096 on other ports), so we
should heavily depend on firmware.

Huacai
>
> This is important to know for the question about hot-added devices
> below, because a hot-added device should power up with MRRS=512 bytes,
> and if that's too big for LS7A, then we have a problem and the quirk
> needs to be more extensive.
>
> > > I'm trying to figure out how to make this work with hot-added devices.
> > > Per spec (PCIe r5.0, sec 7.5.3.4), devices should power up with
> > > MRRS=010b (512 bytes).

> > >
> > > If Linux does not touch MRRS at all in hierarchices under LS7A, will a
> > > hot-added device with MRRS=010b work?  Or does Linux need to actively
> > > write MRRS to 000b (128 bytes) or 001b (256 bytes)?
Emm, hot-plug is a problem, maybe we can only disable hot-plug in
board design...

Huacai
> > >
> > > Bjorn
Jiaxun Yang June 29, 2021, 3:38 a.m. UTC | #9
在 2021/6/29 上午11:32, Huacai Chen 写道:
> Hi, Bjorn,
>
> On Tue, Jun 29, 2021 at 10:12 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>> On Tue, Jun 29, 2021 at 10:00:20AM +0800, Huacai Chen wrote:
>>> On Tue, Jun 29, 2021 at 4:51 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>> On Sun, Jun 27, 2021 at 06:25:04PM +0800, Huacai Chen wrote:
>>>>> On Sat, Jun 26, 2021 at 6:22 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>>>> On Fri, Jun 25, 2021 at 05:30:29PM +0800, Huacai Chen wrote:
>>>>>>> In new revision of LS7A, some PCIe ports support larger value than 256,
>>>>>>> but their maximum supported MRRS values are not detectable. Moreover,
>>>>>>> the current loongson_mrrs_quirk() cannot avoid devices increasing its
>>>>>>> MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
>>>>>>> will actually set a big value in its driver. So the only possible way is
>>>>>>> configure MRRS of all devices in BIOS, and add a PCI device flag (i.e.,
>>>>>>> PCI_DEV_FLAGS_NO_INCREASE_MRRS) to stop the increasing MRRS operations.
>>>>>>>
>>>>>>> However, according to PCIe Spec, it is legal for an OS to program any
>>>>>>> value for MRRS, and it is also legal for an endpoint to generate a Read
>>>>>>> Request with any size up to its MRRS. As the hardware engineers says,
>>>>>>> the root cause here is LS7A doesn't break up large read requests (Yes,
>>>>>>> that is a problem in the LS7A design).
>>>>>> "LS7A doesn't break up large read requests" claims to be a root cause,
>>>>>> but you haven't yet said what the actual *problem* is.
>>>>>>
>>>>>> Is the problem that an endpoint reports a malformed TLP because it
>>>>>> received a completion bigger than it can handle?  Is it that the LS7A
>>>>>> root port reports some kind of error if it receives a Memory Read
>>>>>> request with a size that's "too big"?  Maybe the LS7A doesn't know
>>>>>> what to do when it receives a Memory Read request with MRRS > MPS?
>>>>>> What exactly happens when the problem occurs?
>>>>> The hardware engineer said that the problem is: LS7A PCIe port reports
>>>>> CA (Completer Abort) if it receives a Memory Read
>>>>> request with a size that's "too big".
>>>> What is "too big"?
>>>>
>>> "Too big" means bigger than the port can handle, PCIe SPEC allows any
>>> MRRS value, but, but, LS7A surely violates the protocol here.
>> Right, I just wanted to know what the number is.  That is, what values
>> we can write to MRRS safely.
>>
>> But ISTR you saying that it's not actually fixed, and that's why you
>> wanted to rely on what firmware put there.
> Yes, it's not fixed (256 on some ports and 4096 on other ports), so we
> should heavily depend on firmware.
>
> Huacai
>> This is important to know for the question about hot-added devices
>> below, because a hot-added device should power up with MRRS=512 bytes,
>> and if that's too big for LS7A, then we have a problem and the quirk
>> needs to be more extensive.
>>
>>>> I'm trying to figure out how to make this work with hot-added devices.
>>>> Per spec (PCIe r5.0, sec 7.5.3.4), devices should power up with
>>>> MRRS=010b (512 bytes).
>>>> If Linux does not touch MRRS at all in hierarchices under LS7A, will a
>>>> hot-added device with MRRS=010b work?  Or does Linux need to actively
>>>> write MRRS to 000b (128 bytes) or 001b (256 bytes)?
> Emm, hot-plug is a problem, maybe we can only disable hot-plug in
> board design...
in ACPI?

Thanks.

- Jiaxun
>
> Huacai
>>>> Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b717680377a9..6f0d2f5b6f30 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5802,6 +5802,11 @@  int pcie_set_readrq(struct pci_dev *dev, int rq)
 
 	v = (ffs(rq) - 8) << 12;
 
+	if (dev->dev_flags & PCI_DEV_FLAGS_NO_INCREASE_MRRS) {
+		if (rq > pcie_get_readrq(dev))
+			return -EINVAL;
+	}
+
 	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
 						  PCI_EXP_DEVCTL_READRQ, v);
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index dee4798a49fc..8284480dc7e4 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -263,7 +263,13 @@  static void loongson_mrrs_quirk(struct pci_dev *dev)
 		 * anything larger than this. So force this limit on
 		 * any devices attached under these ports.
 		 */
-		if (pci_match_id(bridge_devids, bridge)) {
+		if (bridge && pci_match_id(bridge_devids, bridge)) {
+			dev->dev_flags |= PCI_DEV_FLAGS_NO_INCREASE_MRRS;
+
+			if (pcie_bus_config == PCIE_BUS_DEFAULT ||
+			    pcie_bus_config == PCIE_BUS_TUNE_OFF)
+				break;
+
 			if (pcie_get_readrq(dev) > 256) {
 				pci_info(dev, "limiting MRRS to 256\n");
 				pcie_set_readrq(dev, 256);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 24306504226a..5e0ec3e4318b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -227,6 +227,8 @@  enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
 	/* Don't use Relaxed Ordering for TLPs directed at this device */
 	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
+	/* Don't increase BIOS's MRRS configuration */
+	PCI_DEV_FLAGS_NO_INCREASE_MRRS = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {