diff mbox series

[1/1] PCI: Use the correct bit in Link Training not active check

Message ID 20240301150641.4037-1-ilpo.jarvinen@linux.intel.com
State New
Headers show
Series [1/1] PCI: Use the correct bit in Link Training not active check | expand

Commit Message

Ilpo Järvinen March 1, 2024, 3:06 p.m. UTC
Besides Link Training bit, pcie_retrain_link() can also be asked to
wait for Data Link Layer Link Active bit (PCIe r6.1 sec 7.5.3.8) using
'use_lt' parameter since the merge commit 1abb47390350 ("Merge branch
'pci/enumeration'").

pcie_retrain_link() first tries to ensure Link Training is not
currently active (see Implementation Note in PCIe r6.1 sec 7.5.3.7)
which must always check Link Training bit regardless of 'use_lt'.
Correct the pcie_wait_for_link_status() parameters to only wait for
the correct bit to ensure there is no ongoing Link Training.

Since waiting for Data Link Layer Link Active bit is only used for the
Target Speed quirk, this only impacts the case when the quirk attempts
to restore speed to higher than 2.5 GT/s (The link is Up at that point
so pcie_retrain_link() will fail).

Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Maciej W. Rozycki March 1, 2024, 5:22 p.m. UTC | #1
On Fri, 1 Mar 2024, Ilpo Järvinen wrote:

> Besides Link Training bit, pcie_retrain_link() can also be asked to
> wait for Data Link Layer Link Active bit (PCIe r6.1 sec 7.5.3.8) using
> 'use_lt' parameter since the merge commit 1abb47390350 ("Merge branch
> 'pci/enumeration'").

 Nope, it was added with commit 680e9c47a229 ("PCI: Add support for 
polling DLLLA to pcie_retrain_link()"), before said merge.

> pcie_retrain_link() first tries to ensure Link Training is not
> currently active (see Implementation Note in PCIe r6.1 sec 7.5.3.7)
> which must always check Link Training bit regardless of 'use_lt'.
> Correct the pcie_wait_for_link_status() parameters to only wait for
> the correct bit to ensure there is no ongoing Link Training.

 You're talking the PCIe spec here and code is talking a bad device case.

> Since waiting for Data Link Layer Link Active bit is only used for the
> Target Speed quirk, this only impacts the case when the quirk attempts
> to restore speed to higher than 2.5 GT/s (The link is Up at that point
> so pcie_retrain_link() will fail).

 NAK.  It's used for both clamping and unclamping and it will break the 
workaround, because the whole point there is to wait until DLLA has been 
set.  Using LT is not reliable because it will oscillate in the failure 
case and seeing the bit clear does not mean link has been established.  

 What are you trying to achieve here and what problem is it to fix?

  Maciej
Ilpo Järvinen March 4, 2024, 11:21 a.m. UTC | #2
On Fri, 1 Mar 2024, Maciej W. Rozycki wrote:

> On Fri, 1 Mar 2024, Ilpo Järvinen wrote:
> 
> > Besides Link Training bit, pcie_retrain_link() can also be asked to
> > wait for Data Link Layer Link Active bit (PCIe r6.1 sec 7.5.3.8) using
> > 'use_lt' parameter since the merge commit 1abb47390350 ("Merge branch
> > 'pci/enumeration'").
> 
>  Nope, it was added with commit 680e9c47a229 ("PCI: Add support for 
> polling DLLLA to pcie_retrain_link()"), before said merge.

Ah sorry, my wording was not good here, I meant on the line I was 
changing in the patch and that line didn't exist in 680e9c47a229 at all. 

So yes, DLLLA and use_lt waiting was added in 680e9c47a229 but the merge 
commit brought the implementation note related code into 
pcie_retrain_link() which I think was mismerged when it comes to use_lt.

> > pcie_retrain_link() first tries to ensure Link Training is not
> > currently active (see Implementation Note in PCIe r6.1 sec 7.5.3.7)
> > which must always check Link Training bit regardless of 'use_lt'.
> > Correct the pcie_wait_for_link_status() parameters to only wait for
> > the correct bit to ensure there is no ongoing Link Training.
> 
>  You're talking the PCIe spec here and code is talking a bad device case.
>
> > Since waiting for Data Link Layer Link Active bit is only used for the
> > Target Speed quirk, this only impacts the case when the quirk attempts
> > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > so pcie_retrain_link() will fail).
> 
>  NAK.  It's used for both clamping and unclamping and it will break the 
> workaround, because the whole point there is to wait until DLLA has been 
> set.  Using LT is not reliable because it will oscillate in the failure 
> case and seeing the bit clear does not mean link has been established.  

In pcie_retrain_link(), there are two calls into 
pcie_wait_for_link_status() and the second one of them is meant to 
implement the link-has-been-established check.

The first wait call comes from e7e39756363a ("PCI/ASPM: Avoid link 
retraining race") and is just to ensure the link is not ongoing retraining 
to make sure the latest configuration in captured as required by the 
implementation note. LT being cleared is exactly what is wanted for that 
check because it means that any earlier retraining has ended (a new one 
might be starting but that doesn't matter, we saw it cleared so the new 
configuration should be in effect for that instance of link retraining).

So my point is, the first check is not even meant to check that link has 
been established.

>  What are you trying to achieve here and what problem is it to fix?

Actually, I misthought what it breaks so the problem is not well described 
above but I still think it is broken:

In the case of quirk, before 2.5GT/s is attempted DLLLA is not set, 
right? Then quirk sets 2.5GT/s target speed and calls into 
pcie_retrain_link().

The first call into pcie_wait_for_link_status() is made with (..., false, 
true) which waits until DLLLA is set but this occurs before OS even 
triggered Link Retraining. Since there's no retraining commanded by the 
OS, DLLLA will not get set, the wait will fail and error is returned, and 
the quirk too returns failure.

It could of course occur that because of the HW retraining loop 
(independent of OS control), the link retrains itselfs to 2.5GT/s without 
OS asking for it just by OS setting the target speed alone, which is well 
possible given the HW behavior in your target speed quirk case is not 
fully understood. Even if that's the case, it seems not good to rely on 
the HW originating retraining loop triggering the link retraining that
is necessary here.

Maybe this is far fetched thought but perhaps it could explain why you 
didn't get the link up with your setup when you tried to test it earlier.

Alternative approach to fix this problem would be to not make the first 
call into pcie_wait_for_link_status() at all when use_lt = false.

Of course, I cannot test this with your configuration so I cannot 
confirm how the target speed quirk behaves, I just found it by reading the 
code. The current code does not make sense because the first wait is 
supposed to wait for LT bit, not for DLLLA.
Maciej W. Rozycki March 6, 2024, 12:23 p.m. UTC | #3
On Mon, 4 Mar 2024, Ilpo Järvinen wrote:

> > > Since waiting for Data Link Layer Link Active bit is only used for the
> > > Target Speed quirk, this only impacts the case when the quirk attempts
> > > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > > so pcie_retrain_link() will fail).
> > 
> >  NAK.  It's used for both clamping and unclamping and it will break the 
> > workaround, because the whole point there is to wait until DLLA has been 
> > set.  Using LT is not reliable because it will oscillate in the failure 
> > case and seeing the bit clear does not mean link has been established.  
> 
> In pcie_retrain_link(), there are two calls into 
> pcie_wait_for_link_status() and the second one of them is meant to 
> implement the link-has-been-established check.
> 
> The first wait call comes from e7e39756363a ("PCI/ASPM: Avoid link 
> retraining race") and is just to ensure the link is not ongoing retraining 
> to make sure the latest configuration in captured as required by the 
> implementation note. LT being cleared is exactly what is wanted for that 
> check because it means that any earlier retraining has ended (a new one 
> might be starting but that doesn't matter, we saw it cleared so the new 
> configuration should be in effect for that instance of link retraining).
> 
> So my point is, the first check is not even meant to check that link has 
> been established.

 I see what you mean, and I now remember the note in the spec.  I had 
concerns about it, but did not do anything about it at that point.

 I think we still have no guarantee that LT will be clear at the point we 
set RL, because LT could get reasserted by hardware between our read and 
the setting of RL.  IIUC that doesn't matter really, because the new link 
parameters will be taken into account regardless of whether retraining was
initiated by hardware in an attempt to do link recovery or triggered by 
software via RL.

> >  What are you trying to achieve here and what problem is it to fix?
> 
> Actually, I misthought what it breaks so the problem is not well described 
> above but I still think it is broken:
> 
> In the case of quirk, before 2.5GT/s is attempted DLLLA is not set, 
> right? Then quirk sets 2.5GT/s target speed and calls into 
> pcie_retrain_link().

 Correct.

> The first call into pcie_wait_for_link_status() is made with (..., false, 
> true) which waits until DLLLA is set but this occurs before OS even 
> triggered Link Retraining. Since there's no retraining commanded by the 
> OS, DLLLA will not get set, the wait will fail and error is returned, and 
> the quirk too returns failure.
> 
> It could of course occur that because of the HW retraining loop 
> (independent of OS control), the link retrains itselfs to 2.5GT/s without 
> OS asking for it just by OS setting the target speed alone, which is well 
> possible given the HW behavior in your target speed quirk case is not 
> fully understood. Even if that's the case, it seems not good to rely on 
> the HW originating retraining loop triggering the link retraining that
> is necessary here.

 I think it just confirms what I observed above (and which is surely noted 
somewhere in the spec) that modified link parameters are taken into 
account regardless of how retraining has been initiated and the link gets 
established (at 2.5GT/s) at the first call to `pcie_wait_for_link_status' 
already and returns successfully seeing DLLLA set.

> Maybe this is far fetched thought but perhaps it could explain why you 
> didn't get the link up with your setup when you tried to test it earlier.
> 
> Alternative approach to fix this problem would be to not make the first 
> call into pcie_wait_for_link_status() at all when use_lt = false.
> 
> Of course, I cannot test this with your configuration so I cannot 
> confirm how the target speed quirk behaves, I just found it by reading the 
> code. The current code does not make sense because the first wait is 
> supposed to wait for LT bit, not for DLLLA.

 I think I now understand the problem correctly, and indeed from master 
Linux repo's point of view it's been a defect with the merge referred.  
So I withdraw my objection.  Sorry about the confusion.

 However I think the last paragraph of your commit description:

> Since waiting for Data Link Layer Link Active bit is only used for the
> Target Speed quirk, this only impacts the case when the quirk attempts
> to restore speed to higher than 2.5 GT/s (The link is Up at that point
> so pcie_retrain_link() will fail).

is not accurate enough, which contributed to my confusion.  In particular 
`pcie_retrain_link' succeeds when the link is up, because it calls 
`pcie_wait_for_link_status' such as to succeed when either LT is clear or 
DLLLA is set.  How about:

This only impacts the Target Speed quirk, which is the only case where 
waiting for Data Link Layer Link Active bit is used.  It currently works 
in the problematic case by means of link training getting initiated by 
hardware repeatedly and respecting the new link parameters set by the 
caller, which then make training succeed and bring the link up, setting 
DLLLA and causing pcie_wait_for_link_status() to return success.  We are 
not supposed to rely on luck and need to make sure that LT transitioned
through the inactive state though before we initiate link training by 
hand via RL.

then?

 Also for `pcie_retrain_link' I think it would be clearer if we rephrased 
the note as follows:

	 * Ensure the updated LNKCTL parameters are used during link
	 * training by checking that there is no ongoing link training
	 * that may have started before link parameters were changed, so
	 * as to avoid LTSSM race as recommended in Implementation Note
	 * at the end of PCIe r6.0.1 sec 7.5.3.7.

 NB I am currently away on holiday until the end of next week.  However I 
do have access to my lab and I'll see if I can verify your change tonight
or another evening this week, and overall thank you for your patience.

  Maciej
Ilpo Järvinen March 6, 2024, 3:44 p.m. UTC | #4
On Wed, 6 Mar 2024, Maciej W. Rozycki wrote:
> On Mon, 4 Mar 2024, Ilpo Järvinen wrote:
> 
> > > > Since waiting for Data Link Layer Link Active bit is only used for the
> > > > Target Speed quirk, this only impacts the case when the quirk attempts
> > > > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > > > so pcie_retrain_link() will fail).
> > > 
> > >  NAK.  It's used for both clamping and unclamping and it will break the 
> > > workaround, because the whole point there is to wait until DLLA has been 
> > > set.  Using LT is not reliable because it will oscillate in the failure 
> > > case and seeing the bit clear does not mean link has been established.  
> > 
> > In pcie_retrain_link(), there are two calls into 
> > pcie_wait_for_link_status() and the second one of them is meant to 
> > implement the link-has-been-established check.
> > 
> > The first wait call comes from e7e39756363a ("PCI/ASPM: Avoid link 
> > retraining race") and is just to ensure the link is not ongoing retraining 
> > to make sure the latest configuration in captured as required by the 
> > implementation note. LT being cleared is exactly what is wanted for that 
> > check because it means that any earlier retraining has ended (a new one 
> > might be starting but that doesn't matter, we saw it cleared so the new 
> > configuration should be in effect for that instance of link retraining).
> > 
> > So my point is, the first check is not even meant to check that link has 
> > been established.
> 
>  I see what you mean, and I now remember the note in the spec.  I had 
> concerns about it, but did not do anything about it at that point.
> 
>  I think we still have no guarantee that LT will be clear at the point we 
> set RL, because LT could get reasserted by hardware between our read and 
> the setting of RL.
>
> IIUC that doesn't matter really, because the new link 
> parameters will be taken into account regardless of whether retraining was
> initiated by hardware in an attempt to do link recovery or triggered by 
> software via RL.

I, too, was somewhat worried about having LT never clear for long enough 
to successfully sample it during the wait but it's like you say, any new 
link training should take account the new Target Speed which should 
successfully bring the link up (assuming the quirk works in the first 
place) and that should clear LT.

> > Of course, I cannot test this with your configuration so I cannot 
> > confirm how the target speed quirk behaves, I just found it by reading the 
> > code. The current code does not make sense because the first wait is 
> > supposed to wait for LT bit, not for DLLLA.
> 
>  I think I now understand the problem correctly, and indeed from master 
> Linux repo's point of view it's been a defect with the merge referred.  
> So I withdraw my objection.  Sorry about the confusion.

Thanks, and the confusion was entirely my fault caused my confusing 
and partly wrong wording.

>  However I think the last paragraph of your commit description:
> 
> > Since waiting for Data Link Layer Link Active bit is only used for the
> > Target Speed quirk, this only impacts the case when the quirk attempts
> > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > so pcie_retrain_link() will fail).
> 
> is not accurate enough, which contributed to my confusion.  In particular 
> `pcie_retrain_link' succeeds when the link is up, because it calls 
> `pcie_wait_for_link_status' such as to succeed when either LT is clear or 
> DLLLA is set.

I know, it was simply wrong because I somehow misthought which way those 
use_lt negations worked (and thus thought I found a different problem 
than there really was).

I already did major rewriting here to explain out how the code ended up 
into its current shape but I'll also consider your explanation below 
which is likely better than what I've currently, thanks.

>  How about:
> 
> This only impacts the Target Speed quirk, which is the only case where 
> waiting for Data Link Layer Link Active bit is used.  It currently works 
> in the problematic case by means of link training getting initiated by 
> hardware repeatedly and respecting the new link parameters set by the 
> caller, which then make training succeed and bring the link up, setting 
> DLLLA and causing pcie_wait_for_link_status() to return success.  We are 
> not supposed to rely on luck and need to make sure that LT transitioned
> through the inactive state though before we initiate link training by 
> hand via RL.
>
> then?
> 
>  Also for `pcie_retrain_link' I think it would be clearer if we rephrased 
> the note as follows:
> 
> 	 * Ensure the updated LNKCTL parameters are used during link
> 	 * training by checking that there is no ongoing link training
> 	 * that may have started before link parameters were changed, so
> 	 * as to avoid LTSSM race as recommended in Implementation Note
> 	 * at the end of PCIe r6.0.1 sec 7.5.3.7.
> 
>  NB I am currently away on holiday until the end of next week.  However I 
> do have access to my lab and I'll see if I can verify your change tonight
> or another evening this week, and overall thank you for your patience.

Don't worry. I can easily wait even until you're back from the holiday
so no need to push it or anything. :-)
Ilpo Järvinen March 14, 2024, 11:39 a.m. UTC | #5
On Wed, 6 Mar 2024, Ilpo Järvinen wrote:

> On Wed, 6 Mar 2024, Maciej W. Rozycki wrote:
> > On Mon, 4 Mar 2024, Ilpo Järvinen wrote:
> > 
> > > > > Since waiting for Data Link Layer Link Active bit is only used for the
> > > > > Target Speed quirk, this only impacts the case when the quirk attempts
> > > > > to restore speed to higher than 2.5 GT/s (The link is Up at that point
> > > > > so pcie_retrain_link() will fail).
> > > > 
> > > >  NAK.  It's used for both clamping and unclamping and it will break the 
> > > > workaround, because the whole point there is to wait until DLLA has been 
> > > > set.  Using LT is not reliable because it will oscillate in the failure 
> > > > case and seeing the bit clear does not mean link has been established.  
> > > 
> > > In pcie_retrain_link(), there are two calls into 
> > > pcie_wait_for_link_status() and the second one of them is meant to 
> > > implement the link-has-been-established check.
> > > 
> > > The first wait call comes from e7e39756363a ("PCI/ASPM: Avoid link 
> > > retraining race") and is just to ensure the link is not ongoing retraining 
> > > to make sure the latest configuration in captured as required by the 
> > > implementation note. LT being cleared is exactly what is wanted for that 
> > > check because it means that any earlier retraining has ended (a new one 
> > > might be starting but that doesn't matter, we saw it cleared so the new 
> > > configuration should be in effect for that instance of link retraining).
> > > 
> > > So my point is, the first check is not even meant to check that link has 
> > > been established.
> > 
> >  I see what you mean, and I now remember the note in the spec.  I had 
> > concerns about it, but did not do anything about it at that point.
> > 
> >  I think we still have no guarantee that LT will be clear at the point we 
> > set RL, because LT could get reasserted by hardware between our read and 
> > the setting of RL.
> >
> > IIUC that doesn't matter really, because the new link 
> > parameters will be taken into account regardless of whether retraining was
> > initiated by hardware in an attempt to do link recovery or triggered by 
> > software via RL.
> 
> I, too, was somewhat worried about having LT never clear for long enough 
> to successfully sample it during the wait but it's like you say, any new 
> link training should take account the new Target Speed which should 
> successfully bring the link up (assuming the quirk works in the first 
> place) and that should clear LT.

Hi,

One more point to add here, I started to wonder today why that use_lt 
parameter is needed at all for pcie_retrain_link()?

Once the Target Speed has been changed to 2.5GT/s which is what the quirk 
does before calling retraining, LT too should work "normally" after that.
Maciej W. Rozycki March 14, 2024, 9:58 p.m. UTC | #6
On Thu, 14 Mar 2024, Ilpo Järvinen wrote:

> One more point to add here, I started to wonder today why that use_lt 
> parameter is needed at all for pcie_retrain_link()?
> 
> Once the Target Speed has been changed to 2.5GT/s which is what the quirk 
> does before calling retraining, LT too should work "normally" after that.

 We don't know if the link is going to become stable with the TLS update 
to 2.5GT/s and we want to ensure that the link has reached the active 
state before claiming victory; LT clear does not mean the link is active, 
it only means what it means, that is that the link isn't being trained at 
the moment.

 Also we don't want to reset the TLS to the maximum before the link has 
become active.

 Does this answer your question?

  Maciej
Ilpo Järvinen March 15, 2024, 9:58 a.m. UTC | #7
On Thu, 14 Mar 2024, Maciej W. Rozycki wrote:

> On Thu, 14 Mar 2024, Ilpo Järvinen wrote:
> 
> > One more point to add here, I started to wonder today why that use_lt 
> > parameter is needed at all for pcie_retrain_link()?
> > 
> > Once the Target Speed has been changed to 2.5GT/s which is what the quirk 
> > does before calling retraining, LT too should work "normally" after that.
> 
>  We don't know if the link is going to become stable with the TLS update 
> to 2.5GT/s and we want to ensure that the link has reached the active 
> state before claiming victory; LT clear does not mean the link is active, 
> it only means what it means, that is that the link isn't being trained at 
> the moment.

LT clear means retraining has ended which is the condition 
pcie_retrain_link() should terminate at. It tried and finished retraining 
as proven by LT clear.

>  Also we don't want to reset the TLS to the maximum before the link has 
> become active.

I'm not suggesting to change the if DLLLA check that is within the quirk 
so it will remain the same even if pcie_retrain_link() would no longer 
have the use_lt parameter.

If LT clear after retraining does not imply DLLLA set, then this again 
falls into the quirk territory and IMO the quirk itself should be what 
makes the additional call to wait for DLLLA, not pcie_retrain_link().
I suspect though that DL clear does imply DLLLA set (after the target 
speed was lowered) so my expectation is that the extra wait wouldn't be 
necessary at that point.

>  Does this answer your question?

What I'm trying to achieve is having pcie_retrain_link() focus on 
retraining and quirk on steps required by the quirk. Currently they're 
kind of mixed if we assume the assumption that LT clear doesn't imply 
active link is true. That tells quirk would need additional step, 
that is, wait for DLLLA after the retraining has completed which is 
currently hidden into pcie_retrain_link() rather than explicitly 
called by the quirk.
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d8f11a078924..251a0c66c8cb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5016,7 +5016,7 @@  int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
 	 * avoid LTSSM race as recommended in Implementation Note at the
 	 * end of PCIe r6.0.1 sec 7.5.3.7.
 	 */
-	rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+	rc = pcie_wait_for_link_status(pdev, true, false);
 	if (rc)
 		return rc;