diff mbox series

PCI/ASPM: Wait for data link active after retraining

Message ID 20220602065544.2552771-1-nathan@nathanrossi.com
State New
Headers show
Series PCI/ASPM: Wait for data link active after retraining | expand

Commit Message

Nathan Rossi June 2, 2022, 6:55 a.m. UTC
From: Nathan Rossi <nathan.rossi@digi.com>

When retraining the link either the child or the parent device may have
the data link layer state machine of the respective devices move out of
the active state despite the physical link training being completed.
Depending on how long is takes for the devices to return to the active
state, the device may not be ready and any further reads/writes to the
device can fail.

This issue is present with the pci-mvebu controller paired with a device
supporting ASPM but without advertising the Slot Clock, where during
boot the pcie_aspm_cap_init call would cause common clocks to be made
consistent and then retrain the link. However the data link layer would
not be active before any device initialization (e.g. ASPM capability
queries, BAR configuration) causing improper configuration of the device
without error.

To ensure the child device is accessible, after the link retraining use
pcie_wait_for_link to perform the associated state checks and any needed
delays.

Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
---
 drivers/pci/pcie/aspm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---
2.36.1

Comments

Nathan Rossi Sept. 26, 2022, 12:59 a.m. UTC | #1
On Thu, 2 Jun 2022 at 16:55, Nathan Rossi <nathan@nathanrossi.com> wrote:
>
> From: Nathan Rossi <nathan.rossi@digi.com>
>
> When retraining the link either the child or the parent device may have
> the data link layer state machine of the respective devices move out of
> the active state despite the physical link training being completed.
> Depending on how long is takes for the devices to return to the active
> state, the device may not be ready and any further reads/writes to the
> device can fail.
>
> This issue is present with the pci-mvebu controller paired with a device
> supporting ASPM but without advertising the Slot Clock, where during
> boot the pcie_aspm_cap_init call would cause common clocks to be made
> consistent and then retrain the link. However the data link layer would
> not be active before any device initialization (e.g. ASPM capability
> queries, BAR configuration) causing improper configuration of the device
> without error.
>
> To ensure the child device is accessible, after the link retraining use
> pcie_wait_for_link to perform the associated state checks and any needed
> delays.
>
> Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
> ---

Just pinging this patch, are there any comments or feedback for this change?

Thanks,
Nathan

>  drivers/pci/pcie/aspm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index a96b7424c9..4b8a1810be 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
>                 reg16 &= ~PCI_EXP_LNKCTL_CCC;
>         pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>
> -       if (pcie_retrain_link(link))
> +       /* Retrain link and then wait for the link to become active */
> +       if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))
>                 return;
>
>         /* Training failed. Restore common clock configurations */
> ---
> 2.36.1
Pali Rohár Oct. 2, 2022, 5:56 p.m. UTC | #2
Hello!

On Thursday 02 June 2022 06:55:44 Nathan Rossi wrote:
> From: Nathan Rossi <nathan.rossi@digi.com>
> 
> When retraining the link either the child or the parent device may have
> the data link layer state machine of the respective devices move out of
> the active state despite the physical link training being completed.
> Depending on how long is takes for the devices to return to the active
> state, the device may not be ready and any further reads/writes to the
> device can fail.
> 
> This issue is present with the pci-mvebu controller paired with a device
> supporting ASPM but without advertising the Slot Clock, where during
> boot the pcie_aspm_cap_init call would cause common clocks to be made
> consistent and then retrain the link. However the data link layer would
> not be active before any device initialization (e.g. ASPM capability
> queries, BAR configuration) causing improper configuration of the device
> without error.

There is the known issue in marvell pcie controllers. They completely
drop the link for PCIe GEN1 cards when Target Link Speed (Link Control2)
in Root Port is configured to 5.0 GT/s or higher value and OS issues
Retrain Link (Link Control).

I think the proper way should be to workaround root of this issue by
programming Target Link Speed in Link Control2 register to required
value, instead of hacking couple of other places which are just
implication of that issue...

I can reproduce it for example with Qualcomm Atheros ath9k/ath10k wifi
cards which have another issue that they go into "broken" state when
in-band reset (e.g. pcie hot reset or pcie link down) is issues multiple
times without longer delay.

These two bugs (first in marvell pcie controller and second in wifi
card) cause that setting kernel ASPM cause disappearing card from bus
until cpu/board reset (or pcie warm reset; if board supports it at
runtime without going to POR).

I guess you are just observing result of this issue here.

> To ensure the child device is accessible, after the link retraining use
> pcie_wait_for_link to perform the associated state checks and any needed
> delays.
> 
> Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
> ---
>  drivers/pci/pcie/aspm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index a96b7424c9..4b8a1810be 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
>  		reg16 &= ~PCI_EXP_LNKCTL_CCC;
>  	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>  
> -	if (pcie_retrain_link(link))
> +	/* Retrain link and then wait for the link to become active */
> +	if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))
>  		return;
>  
>  	/* Training failed. Restore common clock configurations */
> ---
> 2.36.1
Nathan Rossi Nov. 2, 2022, 8:31 a.m. UTC | #3
On Mon, 3 Oct 2022 at 03:56, Pali Rohár <pali@kernel.org> wrote:
>
> Hello!
>
> On Thursday 02 June 2022 06:55:44 Nathan Rossi wrote:
> > From: Nathan Rossi <nathan.rossi@digi.com>
> >
> > When retraining the link either the child or the parent device may have
> > the data link layer state machine of the respective devices move out of
> > the active state despite the physical link training being completed.
> > Depending on how long is takes for the devices to return to the active
> > state, the device may not be ready and any further reads/writes to the
> > device can fail.
> >
> > This issue is present with the pci-mvebu controller paired with a device
> > supporting ASPM but without advertising the Slot Clock, where during
> > boot the pcie_aspm_cap_init call would cause common clocks to be made
> > consistent and then retrain the link. However the data link layer would
> > not be active before any device initialization (e.g. ASPM capability
> > queries, BAR configuration) causing improper configuration of the device
> > without error.
>
> There is the known issue in marvell pcie controllers. They completely
> drop the link for PCIe GEN1 cards when Target Link Speed (Link Control2)
> in Root Port is configured to 5.0 GT/s or higher value and OS issues
> Retrain Link (Link Control).

In the configuration we are having issues with, the downstream device
is indeed a 2.5GT downstream, and the upstream is configured to
support 2.5GT and 5GT (Armada 385). So it does make sense that this
known issue would apply. I tested setting the target speed to 2.5GT
within mvebu_pcie_setup_hw before the retraining occurs, and it does
resolve the retraining delay/link drop. So this issue is indeed the
problem we are having. Is this behaviour mentioned in any errata?

>
> I think the proper way should be to workaround root of this issue by
> programming Target Link Speed in Link Control2 register to required
> value, instead of hacking couple of other places which are just
> implication of that issue...

By programming the Target Link Speed, are you referring to programming
the value like other controller drivers do with dtb configuration?
Relying on this would be problematic for our design (mixed downstream
link speed variants). Does it make more sense to set the Target Link
Speed when the retrain bit is being set in Link Control (e.g. in the
mvebu_pci_bridge_emul_pcie_conf_write function) essentially preventing
the retraining from causing the link to drop only when the issue is
expected to present itself (Link Status at 2.5GT)?

Thanks,
Nathan

>
> I can reproduce it for example with Qualcomm Atheros ath9k/ath10k wifi
> cards which have another issue that they go into "broken" state when
> in-band reset (e.g. pcie hot reset or pcie link down) is issues multiple
> times without longer delay.
>
> These two bugs (first in marvell pcie controller and second in wifi
> card) cause that setting kernel ASPM cause disappearing card from bus
> until cpu/board reset (or pcie warm reset; if board supports it at
> runtime without going to POR).
>
> I guess you are just observing result of this issue here.
>
> > To ensure the child device is accessible, after the link retraining use
> > pcie_wait_for_link to perform the associated state checks and any needed
> > delays.
> >
> > Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index a96b7424c9..4b8a1810be 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
> >               reg16 &= ~PCI_EXP_LNKCTL_CCC;
> >       pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> >
> > -     if (pcie_retrain_link(link))
> > +     /* Retrain link and then wait for the link to become active */
> > +     if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))
> >               return;
> >
> >       /* Training failed. Restore common clock configurations */
> > ---
> > 2.36.1
Pali Rohár Nov. 2, 2022, 8:37 a.m. UTC | #4
On Wednesday 02 November 2022 18:31:04 Nathan Rossi wrote:
> On Mon, 3 Oct 2022 at 03:56, Pali Rohár <pali@kernel.org> wrote:
> >
> > Hello!
> >
> > On Thursday 02 June 2022 06:55:44 Nathan Rossi wrote:
> > > From: Nathan Rossi <nathan.rossi@digi.com>
> > >
> > > When retraining the link either the child or the parent device may have
> > > the data link layer state machine of the respective devices move out of
> > > the active state despite the physical link training being completed.
> > > Depending on how long is takes for the devices to return to the active
> > > state, the device may not be ready and any further reads/writes to the
> > > device can fail.
> > >
> > > This issue is present with the pci-mvebu controller paired with a device
> > > supporting ASPM but without advertising the Slot Clock, where during
> > > boot the pcie_aspm_cap_init call would cause common clocks to be made
> > > consistent and then retrain the link. However the data link layer would
> > > not be active before any device initialization (e.g. ASPM capability
> > > queries, BAR configuration) causing improper configuration of the device
> > > without error.
> >
> > There is the known issue in marvell pcie controllers. They completely
> > drop the link for PCIe GEN1 cards when Target Link Speed (Link Control2)
> > in Root Port is configured to 5.0 GT/s or higher value and OS issues
> > Retrain Link (Link Control).
> 
> In the configuration we are having issues with, the downstream device
> is indeed a 2.5GT downstream, and the upstream is configured to
> support 2.5GT and 5GT (Armada 385). So it does make sense that this
> known issue would apply. I tested setting the target speed to 2.5GT
> within mvebu_pcie_setup_hw before the retraining occurs, and it does
> resolve the retraining delay/link drop. So this issue is indeed the
> problem we are having.

I observed exactly same behavior as you described, on Armada 385 and
similar also on Armada 3720.

> Is this behaviour mentioned in any errata?

I have not documented this behavior in any document.

> >
> > I think the proper way should be to workaround root of this issue by
> > programming Target Link Speed in Link Control2 register to required
> > value, instead of hacking couple of other places which are just
> > implication of that issue...
> 
> By programming the Target Link Speed, are you referring to programming
> the value like other controller drivers do with dtb configuration?

No, I mean programming Link Control 2 register in PCIe capabilities
section of PCIe Root Port device (the one which is implemented throw
emulated bridge kernel code).

It should be programmed dynamically after detecting type of upstream
device to the PCIe Root Port (other end of the serdes link).

> Relying on this would be problematic for our design (mixed downstream
> link speed variants).

Yes, due to this reason it should be rather implemented dynamically in
kernel code instead of "hacking" DTS files as it does not really fix it.

> Does it make more sense to set the Target Link
> Speed when the retrain bit is being set in Link Control (e.g. in the
> mvebu_pci_bridge_emul_pcie_conf_write function) essentially preventing
> the retraining from causing the link to drop only when the issue is
> expected to present itself (Link Status at 2.5GT)?
> 
> Thanks,
> Nathan
> 
> >
> > I can reproduce it for example with Qualcomm Atheros ath9k/ath10k wifi
> > cards which have another issue that they go into "broken" state when
> > in-band reset (e.g. pcie hot reset or pcie link down) is issues multiple
> > times without longer delay.
> >
> > These two bugs (first in marvell pcie controller and second in wifi
> > card) cause that setting kernel ASPM cause disappearing card from bus
> > until cpu/board reset (or pcie warm reset; if board supports it at
> > runtime without going to POR).
> >
> > I guess you are just observing result of this issue here.
> >
> > > To ensure the child device is accessible, after the link retraining use
> > > pcie_wait_for_link to perform the associated state checks and any needed
> > > delays.
> > >
> > > Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
> > > ---
> > >  drivers/pci/pcie/aspm.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > index a96b7424c9..4b8a1810be 100644
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
> > >               reg16 &= ~PCI_EXP_LNKCTL_CCC;
> > >       pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> > >
> > > -     if (pcie_retrain_link(link))
> > > +     /* Retrain link and then wait for the link to become active */
> > > +     if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))
> > >               return;
> > >
> > >       /* Training failed. Restore common clock configurations */
> > > ---
> > > 2.36.1
Bjorn Helgaas Nov. 8, 2022, 10:29 p.m. UTC | #5
On Thu, Jun 02, 2022 at 06:55:44AM +0000, Nathan Rossi wrote:
> From: Nathan Rossi <nathan.rossi@digi.com>
> 
> When retraining the link either the child or the parent device may have
> the data link layer state machine of the respective devices move out of
> the active state despite the physical link training being completed.
> Depending on how long is takes for the devices to return to the active
> state, the device may not be ready and any further reads/writes to the
> device can fail.
> 
> This issue is present with the pci-mvebu controller paired with a device
> supporting ASPM but without advertising the Slot Clock, where during
> boot the pcie_aspm_cap_init call would cause common clocks to be made
> consistent and then retrain the link. However the data link layer would
> not be active before any device initialization (e.g. ASPM capability
> queries, BAR configuration) causing improper configuration of the device
> without error.
> 
> To ensure the child device is accessible, after the link retraining use
> pcie_wait_for_link to perform the associated state checks and any needed
> delays.
> 
> Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
> ---
>  drivers/pci/pcie/aspm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index a96b7424c9..4b8a1810be 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
>  		reg16 &= ~PCI_EXP_LNKCTL_CCC;
>  	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>  
> -	if (pcie_retrain_link(link))
> +	/* Retrain link and then wait for the link to become active */
> +	if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))

pcie_retrain_link() waits for PCI_EXP_LNKSTA_LT (Link Training) to be
cleared, which means the LTSSM has exited the Configuration/Recovery
state.  pcie_wait_for_link() waits for PCI_EXP_LNKSTA_DLLLA (Data Link
Layer Link Active) to be set, which means the link is in DL_Active.

I don't see an explicit procedure in the spec for determining when
a link retrain is complete, but from PCIe r6.0, sec 6.2.11 (DPC):

  After software releases the Downstream Port from DPC, the Port’s
  LTSSM must transition to the Detect state, where the Link will
  attempt to retrain. Software can use Data Link Layer State Changed
  interrupts, DL_ACTIVE ERR_COR signaling, or both, to signal when the
  Link reaches the DL_Active state again.

and sec 6.6:

  On the completion of Link Training (entering the DL_Active state,
  see Section 3.2), a component must be able to receive and process
  TLPs and DLLPs.

The only use mentioned in the spec for the Link Training bit is the
implementation note in sec 7.5.3.7 about avoiding race conditions when
using the Retrain Link bit, where software should poll Link Training
until it returns to zero before setting the Retrain Link bit to change
link parameters.

And I think you're absolutely right that what we *want* here is the
data link layer DL_Active state, not just the link layer L0 state.

This all makes me think that checking the Link Training bit might be
the wrong thing to begin with.

Of course, the Data Link Layer Link Active bit wasn't added until PCIe
r1.1, and even now it's optional.  Without it, I don't know if there's
a way to make sure the link is in DL_Active.

Maybe pcie_retrain_link() should wait for Data Link Layer Link Active
if it is supported, and use the existing behavior of waiting for Link
Training to be cleared otherwise?

>  		return;
>  
>  	/* Training failed. Restore common clock configurations */
> ---
> 2.36.1
Bjorn Helgaas Nov. 9, 2022, 5:34 p.m. UTC | #6
[+cc Maciej for similar retrain issue]

On Tue, Nov 08, 2022 at 04:29:44PM -0600, Bjorn Helgaas wrote:
> On Thu, Jun 02, 2022 at 06:55:44AM +0000, Nathan Rossi wrote:
> > From: Nathan Rossi <nathan.rossi@digi.com>
> > 
> > When retraining the link either the child or the parent device may have
> > the data link layer state machine of the respective devices move out of
> > the active state despite the physical link training being completed.
> > Depending on how long is takes for the devices to return to the active
> > state, the device may not be ready and any further reads/writes to the
> > device can fail.
> > 
> > This issue is present with the pci-mvebu controller paired with a device
> > supporting ASPM but without advertising the Slot Clock, where during
> > boot the pcie_aspm_cap_init call would cause common clocks to be made
> > consistent and then retrain the link. However the data link layer would
> > not be active before any device initialization (e.g. ASPM capability
> > queries, BAR configuration) causing improper configuration of the device
> > without error.
> > 
> > To ensure the child device is accessible, after the link retraining use
> > pcie_wait_for_link to perform the associated state checks and any needed
> > delays.
> > 
> > Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index a96b7424c9..4b8a1810be 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
> >  		reg16 &= ~PCI_EXP_LNKCTL_CCC;
> >  	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> >  
> > -	if (pcie_retrain_link(link))
> > +	/* Retrain link and then wait for the link to become active */
> > +	if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))
> 
> pcie_retrain_link() waits for PCI_EXP_LNKSTA_LT (Link Training) to be
> cleared, which means the LTSSM has exited the Configuration/Recovery
> state.  pcie_wait_for_link() waits for PCI_EXP_LNKSTA_DLLLA (Data Link
> Layer Link Active) to be set, which means the link is in DL_Active.
> 
> I don't see an explicit procedure in the spec for determining when
> a link retrain is complete, but from PCIe r6.0, sec 6.2.11 (DPC):
> 
>   After software releases the Downstream Port from DPC, the Port’s
>   LTSSM must transition to the Detect state, where the Link will
>   attempt to retrain. Software can use Data Link Layer State Changed
>   interrupts, DL_ACTIVE ERR_COR signaling, or both, to signal when the
>   Link reaches the DL_Active state again.
> 
> and sec 6.6:
> 
>   On the completion of Link Training (entering the DL_Active state,
>   see Section 3.2), a component must be able to receive and process
>   TLPs and DLLPs.
> 
> The only use mentioned in the spec for the Link Training bit is the
> implementation note in sec 7.5.3.7 about avoiding race conditions when
> using the Retrain Link bit, where software should poll Link Training
> until it returns to zero before setting the Retrain Link bit to change
> link parameters.
> 
> And I think you're absolutely right that what we *want* here is the
> data link layer DL_Active state, not just the link layer L0 state.
> 
> This all makes me think that checking the Link Training bit might be
> the wrong thing to begin with.
> 
> Of course, the Data Link Layer Link Active bit wasn't added until PCIe
> r1.1, and even now it's optional.  Without it, I don't know if there's
> a way to make sure the link is in DL_Active.
> 
> Maybe pcie_retrain_link() should wait for Data Link Layer Link Active
> if it is supported, and use the existing behavior of waiting for Link
> Training to be cleared otherwise?

Nathan, I meant to cc you on this thread, which is doing something
very similar.  Just FYI:

https://lore.kernel.org/all/alpine.DEB.2.21.2209130050380.60554@angie.orcam.me.uk/
Nathan Rossi Nov. 14, 2022, 3:59 a.m. UTC | #7
On Thu, 10 Nov 2022 at 03:34, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Maciej for similar retrain issue]
>
> On Tue, Nov 08, 2022 at 04:29:44PM -0600, Bjorn Helgaas wrote:
> > On Thu, Jun 02, 2022 at 06:55:44AM +0000, Nathan Rossi wrote:
> > > From: Nathan Rossi <nathan.rossi@digi.com>
> > >
> > > When retraining the link either the child or the parent device may have
> > > the data link layer state machine of the respective devices move out of
> > > the active state despite the physical link training being completed.
> > > Depending on how long is takes for the devices to return to the active
> > > state, the device may not be ready and any further reads/writes to the
> > > device can fail.
> > >
> > > This issue is present with the pci-mvebu controller paired with a device
> > > supporting ASPM but without advertising the Slot Clock, where during
> > > boot the pcie_aspm_cap_init call would cause common clocks to be made
> > > consistent and then retrain the link. However the data link layer would
> > > not be active before any device initialization (e.g. ASPM capability
> > > queries, BAR configuration) causing improper configuration of the device
> > > without error.
> > >
> > > To ensure the child device is accessible, after the link retraining use
> > > pcie_wait_for_link to perform the associated state checks and any needed
> > > delays.
> > >
> > > Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
> > > ---
> > >  drivers/pci/pcie/aspm.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > index a96b7424c9..4b8a1810be 100644
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
> > >             reg16 &= ~PCI_EXP_LNKCTL_CCC;
> > >     pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
> > >
> > > -   if (pcie_retrain_link(link))
> > > +   /* Retrain link and then wait for the link to become active */
> > > +   if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))
> >
> > pcie_retrain_link() waits for PCI_EXP_LNKSTA_LT (Link Training) to be
> > cleared, which means the LTSSM has exited the Configuration/Recovery
> > state.  pcie_wait_for_link() waits for PCI_EXP_LNKSTA_DLLLA (Data Link
> > Layer Link Active) to be set, which means the link is in DL_Active.
> >
> > I don't see an explicit procedure in the spec for determining when
> > a link retrain is complete, but from PCIe r6.0, sec 6.2.11 (DPC):
> >
> >   After software releases the Downstream Port from DPC, the Port’s
> >   LTSSM must transition to the Detect state, where the Link will
> >   attempt to retrain. Software can use Data Link Layer State Changed
> >   interrupts, DL_ACTIVE ERR_COR signaling, or both, to signal when the
> >   Link reaches the DL_Active state again.
> >
> > and sec 6.6:
> >
> >   On the completion of Link Training (entering the DL_Active state,
> >   see Section 3.2), a component must be able to receive and process
> >   TLPs and DLLPs.
> >
> > The only use mentioned in the spec for the Link Training bit is the
> > implementation note in sec 7.5.3.7 about avoiding race conditions when
> > using the Retrain Link bit, where software should poll Link Training
> > until it returns to zero before setting the Retrain Link bit to change
> > link parameters.
> >
> > And I think you're absolutely right that what we *want* here is the
> > data link layer DL_Active state, not just the link layer L0 state.
> >
> > This all makes me think that checking the Link Training bit might be
> > the wrong thing to begin with.
> >
> > Of course, the Data Link Layer Link Active bit wasn't added until PCIe
> > r1.1, and even now it's optional.  Without it, I don't know if there's
> > a way to make sure the link is in DL_Active.

My understanding is there is no way to check for the DL_Active state
on these devices. Which is why pcie_wait_for_link_delay uses a fixed
delay in that case.

> >
> > Maybe pcie_retrain_link() should wait for Data Link Layer Link Active
> > if it is supported, and use the existing behavior of waiting for Link
> > Training to be cleared otherwise?

I think it still makes sense for pcie_retrain_link to wait for the
Link Training bit, since it is typical for the retraining to never go
to a LTSSM state where the LinkUp is 0 (e.g. Recovery ->
Configuration), thus the Data Link Layer will stay in DL_Active. But
it is still important to wait for the link training to complete in
that case, but also because the training may not cause the Data Link
state to change immediately.

However since DLLLA reporting is optional, it is probably ideal if the
pcie_retrain_link only calls pcie_wait_for_link if reporting is
available? This would avoid the fixed delay of 1s upon link retraining
for devices without reporting, which is unnecessary in the majority of
cases.

Thanks,
Nathan


>
> Nathan, I meant to cc you on this thread, which is doing something
> very similar.  Just FYI:
>
> https://lore.kernel.org/all/alpine.DEB.2.21.2209130050380.60554@angie.orcam.me.uk/
Siddharth Vadapalli Dec. 12, 2022, 9:11 a.m. UTC | #8
Hello,

On 14/11/22 09:29, Nathan Rossi wrote:
> On Thu, 10 Nov 2022 at 03:34, Bjorn Helgaas <helgaas@kernel.org> wrote:
>>
>> [+cc Maciej for similar retrain issue]
>>
>> On Tue, Nov 08, 2022 at 04:29:44PM -0600, Bjorn Helgaas wrote:
>>> On Thu, Jun 02, 2022 at 06:55:44AM +0000, Nathan Rossi wrote:
>>>> From: Nathan Rossi <nathan.rossi@digi.com>
>>>>
>>>> When retraining the link either the child or the parent device may have
>>>> the data link layer state machine of the respective devices move out of
>>>> the active state despite the physical link training being completed.
>>>> Depending on how long is takes for the devices to return to the active
>>>> state, the device may not be ready and any further reads/writes to the
>>>> device can fail.
>>>>
>>>> This issue is present with the pci-mvebu controller paired with a device
>>>> supporting ASPM but without advertising the Slot Clock, where during
>>>> boot the pcie_aspm_cap_init call would cause common clocks to be made
>>>> consistent and then retrain the link. However the data link layer would
>>>> not be active before any device initialization (e.g. ASPM capability
>>>> queries, BAR configuration) causing improper configuration of the device
>>>> without error.
>>>>
>>>> To ensure the child device is accessible, after the link retraining use
>>>> pcie_wait_for_link to perform the associated state checks and any needed
>>>> delays.
>>>>
>>>> Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
>>>> ---
>>>>  drivers/pci/pcie/aspm.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>> index a96b7424c9..4b8a1810be 100644
>>>> --- a/drivers/pci/pcie/aspm.c
>>>> +++ b/drivers/pci/pcie/aspm.c
>>>> @@ -288,7 +288,8 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
>>>>             reg16 &= ~PCI_EXP_LNKCTL_CCC;
>>>>     pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>>>>
>>>> -   if (pcie_retrain_link(link))
>>>> +   /* Retrain link and then wait for the link to become active */
>>>> +   if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))
>>>
>>> pcie_retrain_link() waits for PCI_EXP_LNKSTA_LT (Link Training) to be
>>> cleared, which means the LTSSM has exited the Configuration/Recovery
>>> state.  pcie_wait_for_link() waits for PCI_EXP_LNKSTA_DLLLA (Data Link
>>> Layer Link Active) to be set, which means the link is in DL_Active.
>>>
>>> I don't see an explicit procedure in the spec for determining when
>>> a link retrain is complete, but from PCIe r6.0, sec 6.2.11 (DPC):
>>>
>>>   After software releases the Downstream Port from DPC, the Port’s
>>>   LTSSM must transition to the Detect state, where the Link will
>>>   attempt to retrain. Software can use Data Link Layer State Changed
>>>   interrupts, DL_ACTIVE ERR_COR signaling, or both, to signal when the
>>>   Link reaches the DL_Active state again.
>>>
>>> and sec 6.6:
>>>
>>>   On the completion of Link Training (entering the DL_Active state,
>>>   see Section 3.2), a component must be able to receive and process
>>>   TLPs and DLLPs.
>>>
>>> The only use mentioned in the spec for the Link Training bit is the
>>> implementation note in sec 7.5.3.7 about avoiding race conditions when
>>> using the Retrain Link bit, where software should poll Link Training
>>> until it returns to zero before setting the Retrain Link bit to change
>>> link parameters.
>>>
>>> And I think you're absolutely right that what we *want* here is the
>>> data link layer DL_Active state, not just the link layer L0 state.
>>>
>>> This all makes me think that checking the Link Training bit might be
>>> the wrong thing to begin with.
>>>
>>> Of course, the Data Link Layer Link Active bit wasn't added until PCIe
>>> r1.1, and even now it's optional.  Without it, I don't know if there's
>>> a way to make sure the link is in DL_Active.
> 
> My understanding is there is no way to check for the DL_Active state
> on these devices. Which is why pcie_wait_for_link_delay uses a fixed
> delay in that case.

In TI's J721E SoC, the PCIe controller doesn't report the Data Link
State. The bit is hardwired to zero. However, the status of the Data
Link is reported via a set of user specific registers, which aren't a
part of the standard PCIe registers. With this, I was able to observe
that despite the PCI_EXP_LNKSTA_LT bit being cleared, the Data Link
initialization was still ongoing. This leads to a race condition if it
isn't ensured that the Data Link initialization is complete before the
PCI core attempts to obtain the configuration space addresses via the
map_bus call. Based on the scheduling, it is possible for the kernel to
crash, and I was able to observe this in my setup.

With this patch applied, the issue is fixed.

Tested-by: Siddharth Vadapalli <s-vadapalli@ti.com>

Regards,
Siddharth.
diff mbox series

Patch

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index a96b7424c9..4b8a1810be 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -288,7 +288,8 @@  static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
 		reg16 &= ~PCI_EXP_LNKCTL_CCC;
 	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
 
-	if (pcie_retrain_link(link))
+	/* Retrain link and then wait for the link to become active */
+	if (pcie_retrain_link(link) && pcie_wait_for_link(parent, true))
 		return;
 
 	/* Training failed. Restore common clock configurations */