[v3] PCI: aardvark: Don't touch PCIe registers if no card connected

Message ID 20200702083036.12230-1-pali@kernel.org
State New

Commit Message

Pali Rohár July 2, 2020, 8:30 a.m. UTC
When there is no PCIe card connected and advk_pcie_rd_conf() or
advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
root bridge, the aardvark driver throws the following error message:

  advk-pcie d0070000.pcie: config read/write timed out

Obviously, accessing the PCIe registers of a disconnected card is not
possible.

Extend the check in advk_pcie_valid_device() to validate the
availability of the PCIe bus. If the PCIe link is down, the device is
marked as Not Found and the driver does not try to access these
registers.

This is just an optimization to prevent accessing PCIe registers when
the card is disconnected. Trying to access the PCIe registers of a
disconnected card does not cause any crash; the kernel just needs to
wait for a timeout. So if the card disappears immediately after the
check for PCIe link (before the PCIe registers are accessed), it does
not cause any problems.

Signed-off-by: Pali Rohár <pali@kernel.org>

---
Changes in V3:
* Add comment to the code
Changes in V2:
* Update commit message, mention that this is optimization
---
 drivers/pci/controller/pci-aardvark.c | 7 +++++++
 1 file changed, 7 insertions(+)
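
The decision described in the commit message can be sketched as a small
standalone model. This is only an illustration: struct model_pcie, its
link_up field and MODEL_PCI_SLOT() are hypothetical stand-ins for the
kernel's struct advk_pcie, advk_pcie_link_up() and PCI_SLOT(), so that
the logic can be compiled and exercised outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state; not the real struct advk_pcie. */
struct model_pcie {
	uint8_t root_bus_nr;
	bool link_up;		/* stands in for advk_pcie_link_up() */
};

/* The slot (device) number is bits 7:3 of devfn, as in the kernel's PCI_SLOT(). */
#define MODEL_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)

static bool model_valid_device(const struct model_pcie *pcie,
			       uint8_t bus_nr, uint8_t devfn)
{
	/* Only slot 0 (the emulated root bridge) exists on the root bus. */
	if (bus_nr == pcie->root_bus_nr && MODEL_PCI_SLOT(devfn) != 0)
		return false;

	/* Buses behind the root bridge are reachable only while the link is up. */
	if (bus_nr != pcie->root_bus_nr && !pcie->link_up)
		return false;

	return true;
}
```

With the link down, the root bridge itself stays visible while every
access to a bus behind it is rejected before any hardware is touched.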

Comments

Lorenzo Pieralisi July 9, 2020, 11:35 a.m. UTC | #1
On Thu, Jul 02, 2020 at 10:30:36AM +0200, Pali Rohár wrote:
> [...]
> 
> diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> index 90ff291c24f0..d18f389b36a1 100644
> --- a/drivers/pci/controller/pci-aardvark.c
> +++ b/drivers/pci/controller/pci-aardvark.c
> @@ -644,6 +644,13 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
>  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
>  		return false;
>  
> +	/*
> +	 * If the link goes down after we check for link-up, nothing bad
> +	 * happens but the config access times out.
> +	 */
> +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> +		return false;
> +
>  	return true;
>  }

Question: this basically means that you can only effectively enumerate
bus number == root_bus_nr and AFAICS if at probe the link did not
come up it will never do, will it ?

Isn't this equivalent to limiting the bus numbers the bridge is capable
of handling ?

Reworded: if in advk_pcie_setup_hw() the link does not come up, what's
the point of trying to enumerate the bus hierarchy below the root bus ?

Thanks,
Lorenzo
Pali Rohár July 9, 2020, 12:22 p.m. UTC | #2
On Thursday 09 July 2020 12:35:09 Lorenzo Pieralisi wrote:
> On Thu, Jul 02, 2020 at 10:30:36AM +0200, Pali Rohár wrote:
> > [...]
> 
> Question: this basically means that you can only effectively enumerate
> bus number == root_bus_nr and AFAICS if at probe the link did not
> come up it will never do, will it ?
> 
> Isn't this equivalent to limiting the bus numbers the bridge is capable
> of handling ?
> 
> Reworded: if in advk_pcie_setup_hw() the link does not come up, what's
> the point of trying to enumerate the bus hierarchy below the root bus ?

Hello Lorenzo!

The PCIe link can theoretically come up even after boot, but the
aardvark driver currently does not support link detection at runtime,
so it checks for and enumerates devices only at probe time.

I do not know whether the hardware has some mechanism to inform the
kernel that the PCIe link came up (or went down) and that
re-enumeration is required, or whether the only option is polling via
advk_pcie_link_up().

So if a device is not visible at probe time, it does not appear in the
system and cannot be used. This is the current state.

Note that our hardware does not support physical hotplug of mPCIe
cards. You need to connect the card while the board is powered off.

So if the PCIe link is not up at aardvark probe time, there is no need
to enumerate devices under the (software) root bridge. But the software
root bridge device itself still needs to be registered/enumerated, and
currently both are done by one (recursive) call to pci_host_probe().
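
The polling option mentioned above could look roughly like the sketch
below. It is a standalone model, not driver code: the retry count, the
model_link_ops hooks and the fake_hw helper are all assumptions made
for illustration, with the link_up hook standing in for
advk_pcie_link_up().

```c
#include <stdbool.h>

/* Hypothetical retry budget; not a value taken from this thread. */
#define MODEL_LINK_WAIT_RETRIES	10

/* Caller-supplied hooks so the sketch runs without hardware. */
struct model_link_ops {
	bool (*link_up)(void *ctx);	/* stands in for advk_pcie_link_up() */
	void (*delay)(void *ctx);	/* stands in for a sleep between polls */
};

/* Poll until the link reports up; return 0 on success, -1 if it never does. */
static int model_wait_for_link(const struct model_link_ops *ops, void *ctx)
{
	for (int i = 0; i < MODEL_LINK_WAIT_RETRIES; i++) {
		if (ops->link_up(ctx))
			return 0;
		ops->delay(ctx);
	}
	return -1;
}

/* Fake hardware: the link reports up once polls_needed polls have failed. */
struct fake_hw {
	int polls_needed;
	int polls_done;
};

static bool fake_link_up(void *ctx)
{
	struct fake_hw *hw = ctx;

	return hw->polls_done++ >= hw->polls_needed;
}

static void fake_delay(void *ctx)
{
	(void)ctx;	/* no real sleeping in the model */
}
```

Polling like this only bounds how long probe waits; it does not give
the kernel a notification when the link state changes later.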
Lorenzo Pieralisi July 9, 2020, 2:47 p.m. UTC | #3
On Thu, Jul 09, 2020 at 02:22:08PM +0200, Pali Rohár wrote:
> On Thursday 09 July 2020 12:35:09 Lorenzo Pieralisi wrote:
> > On Thu, Jul 02, 2020 at 10:30:36AM +0200, Pali Rohár wrote:
> > > [...]
> > 
> > Question: this basically means that you can only effectively enumerate
> > bus number == root_bus_nr and AFAICS if at probe the link did not
> > come up it will never do, will it ?
> > 
> > Isn't this equivalent to limiting the bus numbers the bridge is capable
> > of handling ?
> > 
> > Reworded: if in advk_pcie_setup_hw() the link does not come up, what's
> > the point of trying to enumerate the bus hierarchy below the root bus ?
> 
> Hello Lorenzo!
> 
> PCIe link can theoretically come up even after boot, but aardvark driver
> currently does not support link detection at runtime. So it checks and
> enumerate device only at probe time.

If the link is not up at probe enumerating devices below the root
bus is basically useless and that's actually what is causing the
delays you are fixing. Is this correct ?

> I do not know if hardware has some mechanism to inform kernel that PCIe
> link come up (or down) and re-enumeration is required. Or the only
> option is polling via advk_pcie_link_up().
> 
> So if device is not visible at the probe time then it would not appear
> in system and cannot be used. This is current state.
> 
> Just to note that our hardware does not support physical hotplug of
> mPCIe cards. You need to connect card when board is powered off.
> 
> So if at the aardvark probe time PCIe link is not up then trying to
> enumerate devices under (software) root bridge is not needed. But it is
> needed to register/enumerate software root bridge device and currently
> both is done by one (recursive) call pci_host_probe().

I understand that but the bridge bus resource can be trimmed to just
contain the root bus because that's the only one where there is a
chance you can enumerate a device.

I would like to get Bjorn's opinion on this, I don't like these "link is
up" checks in config accessors (they are racy and honestly it is a
run-time check that does not make much sense, either it is always
true/false or it is inevitably racy) I was wondering if we can find an
alternative solution but I am not sure the one I suggested above is
better than this patch.

Lorenzo
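
The race discussed above can be illustrated with a standalone model.
Everything here is hypothetical (struct model_hw, the status codes and
the drop_link_in_window switch are made up for the sketch); it only
shows why a link-up check in a config accessor is a fast path rather
than a guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

enum model_status { MODEL_OK, MODEL_NOT_FOUND, MODEL_TIMED_OUT };

/* Hypothetical model of the hardware; none of this is aardvark code. */
struct model_hw {
	bool link_up;
	bool drop_link_in_window;	/* card vanishes between check and use */
	uint32_t reg;
};

static enum model_status model_read(struct model_hw *hw, uint32_t *val)
{
	*val = 0xffffffffu;

	/* Time of check: fail fast without touching the hardware. */
	if (!hw->link_up)
		return MODEL_NOT_FOUND;

	/* The race window: the link state can change right here. */
	if (hw->drop_link_in_window)
		hw->link_up = false;

	/* Time of use: with the link gone, the access can only time out. */
	if (!hw->link_up)
		return MODEL_TIMED_OUT;

	*val = hw->reg;
	return MODEL_OK;
}
```

In the model both failure paths leave the caller without data; the
check only decides whether the failure is reported immediately or after
the timeout.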
Pali Rohár July 9, 2020, 3:09 p.m. UTC | #4
On Thursday 09 July 2020 15:47:01 Lorenzo Pieralisi wrote:
> On Thu, Jul 09, 2020 at 02:22:08PM +0200, Pali Rohár wrote:
> > On Thursday 09 July 2020 12:35:09 Lorenzo Pieralisi wrote:
> > > On Thu, Jul 02, 2020 at 10:30:36AM +0200, Pali Rohár wrote:
> > > > [...]
> > > 
> > > Question: this basically means that you can only effectively enumerate
> > > bus number == root_bus_nr and AFAICS if at probe the link did not
> > > come up it will never do, will it ?
> > > 
> > > Isn't this equivalent to limiting the bus numbers the bridge is capable
> > > of handling ?
> > > 
> > > Reworded: if in advk_pcie_setup_hw() the link does not come up, what's
> > > the point of trying to enumerate the bus hierarchy below the root bus ?
> > 
> > Hello Lorenzo!
> > 
> > PCIe link can theoretically come up even after boot, but aardvark driver
> > currently does not support link detection at runtime. So it checks and
> > enumerate device only at probe time.
> 
> If the link is not up at probe enumerating devices below the root
> bus is basically useless and that's actually what is causing the
> delays you are fixing. Is this correct ?

Yes, this is one of the delays (but not the only one).

> > I do not know if hardware has some mechanism to inform kernel that PCIe
> > link come up (or down) and re-enumeration is required. Or the only
> > option is polling via advk_pcie_link_up().
> > 
> > So if device is not visible at the probe time then it would not appear
> > in system and cannot be used. This is current state.
> > 
> > Just to note that our hardware does not support physical hotplug of
> > mPCIe cards. You need to connect card when board is powered off.
> > 
> > So if at the aardvark probe time PCIe link is not up then trying to
> > enumerate devices under (software) root bridge is not needed. But it is
> > needed to register/enumerate software root bridge device and currently
> > both is done by one (recursive) call pci_host_probe().
> 
> I understand that but the bridge bus resource can be trimmed to just
> contain the root bus because that's the only one where there is a
> chance you can enumerate a device.

Is it possible to register only the root bridge, without an endpoint?

> I would like to get Bjorn's opinion on this, I don't like these "link is
> up" checks in config accessors (they are racy and honestly it is a
> run-time check that does not make much sense, either it is always
> true/false or it is inevitably racy)

It is a runtime check, but it does not have to be always true/false. I
have tested several Compex wifi cards, and under certain conditions
they "disappear" from the bus during use.

So I think it still makes sense to do this "fast" check, as it is only
an optimization.

> I was wondering if we can find an
> alternative solution but I am not sure the one I suggested above is
> better than this patch.

I do not know whether it helps in the situation when a card disappears
from the bus at runtime...
Lorenzo Pieralisi July 10, 2020, 9:18 a.m. UTC | #5
On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:

[...]

> > I understand that but the bridge bus resource can be trimmed to just
> > contain the root bus because that's the only one where there is a
> > chance you can enumerate a device.
> 
> It is possible to register only root bridge without endpoint?

It is possible to register the root bridge with a trimmed IORESOURCE_BUS
so that you don't enumerate anything other than the root port.
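
As a rough illustration of the trimming idea, the sketch below models a
bus number range that is collapsed to the root bus when the link is
down. The struct and function names are hypothetical and this is not
how the PCI core represents IORESOURCE_BUS; it only shows the intended
effect.

```c
#include <stdbool.h>

/* Hypothetical model of a bus number window such as [bus 00-ff]. */
struct model_bus_range {
	unsigned int start;
	unsigned int end;
};

/* With the link down, leave only the root bus in the window. */
static void model_trim_bus_range(struct model_bus_range *r, bool link_up)
{
	if (!link_up)
		r->end = r->start;
}

static bool model_bus_in_range(const struct model_bus_range *r,
			       unsigned int bus)
{
	return bus >= r->start && bus <= r->end;
}
```

After trimming, bus 0 is still inside the window while bus 1 is not, so
in this model nothing below the root port would be scanned.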

> > I would like to get Bjorn's opinion on this, I don't like these "link is
> > up" checks in config accessors (they are racy and honestly it is a
> > run-time check that does not make much sense, either it is always
> > true/false or it is inevitably racy)
> 
> It is runtime check, but does not have to be always true/false. I have
> tested more Compex wifi cards and under certain conditions they
> "disappear" from the bus during usage.

I would be very grateful if you could describe what happens in HW
when these conditions trigger - I would like to understand if this
issue is aardvark specific or it isn't.

> So I think it still make sense to do this "fast" check as it is only
> optimization.

I will merge this patch but I'd also like to understand the underlying
issue better.

Thanks,
Lorenzo
Pali Rohár July 10, 2020, 3:44 p.m. UTC | #6
On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> I would be very grateful if you could describe what happens in HW
> when these conditions trigger - I would like to understand if this
> issue is aardvark specific or it isn't.

Hello Lorenzo! We are not sure what the problem is or where it happens.

There are several issues which happen randomly or under some specific
conditions.

I can reproduce the following issue: connect a Compex WLE900VX card and
configure aardvark to gen2 mode. The card is then detected only after
the first link training. If the kernel retrains the link again (e.g.
via ASPM code), the card is not detected anymore. To detect it again,
the card needs to be reset via the PERST# signal (assert PERST#, wait,
de-assert PERST#). A PCI warm, hot or function reset does not help.
When aardvark is configured in gen1 mode, the card is detected fine
even after multiple link trainings.

The above problem does not happen with Compex WLE200VX (ath9k) or
Compex WLE1216V5-20 cards.

Sometimes the WLE900VX card disappears from the bus during use. It just
stops communicating with the ath10k driver and aardvark does not see
the link.

Another issue, which happens for WLE900VX, WLE600VX and WLE1216V5-20
(but not for WLE200VX): the Linux kernel can detect these cards only if
it issues a card reset via the PERST# signal and starts link training
(via the standard PCIe endpoint register
PCI_EXP_LNKCTL/PCI_EXP_LNKCTL_RL) immediately after enabling link
training in aardvark (via the aardvark-specific LINK_TRAINING_EN bit).
If there is e.g. a 100ms delay between enabling link training and
setting the PCI_EXP_LNKCTL_RL bit, these cards are not detected.

Also, a reset via the PERST# signal is required to detect these cards
if either the board was rebooted (not started from a cold power-off
state) or U-Boot touched/initialized PCIe aardvark.

WLE200VX works fine even after a second or third link training, and
also works without a reset via the PERST# signal.

And the WLE900VX card is not detected even after resetting it via the
PERST# signal if aardvark link training (LINK_TRAINING_EN bit) was
enabled prior to toggling PERST#. The PERST# signal is controlled via
GPIO.
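
The ordering constraint reported above for the WLE900VX can be written
down as a small model. It is purely a sketch of the observed behaviour:
struct model_board and its fields are hypothetical, no real GPIO or
register is touched, and the hold time of the reset pulse is left out
because the thread does not give one.

```c
#include <stdbool.h>

/* Hypothetical board model; field names are made up for this sketch. */
struct model_board {
	bool link_training_en;	/* models the LINK_TRAINING_EN bit */
	bool perst_asserted;	/* true while the card is held in reset */
	bool card_reset_ok;	/* did the last PERST# pulse take effect? */
};

static void model_perst_pulse(struct model_board *b)
{
	b->perst_asserted = true;
	/* A real driver would wait here; the duration is not given in the thread. */
	b->perst_asserted = false;
	/* Reported behaviour: the reset does not help if training was already on. */
	b->card_reset_ok = !b->link_training_en;
}

/* Returns true if the modelled card would be detected. */
static bool model_bring_up(struct model_board *b)
{
	b->link_training_en = false;	/* make sure training is off first */
	model_perst_pulse(b);
	b->link_training_en = true;	/* only then enable link training */
	return b->card_reset_ok;
}
```

In this model a PERST# pulse issued while link training is already
enabled leaves card_reset_ok false, matching the failure described
above.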

When I put the WLE900VX card into a board which uses the mvebu PCI
driver (not aardvark), the card works fine: there is no need to reset
the card via PERST#, no need to explicitly set the gen mode, and the
card also works after more link trainings.

So basically I have no idea why it happens or where the problem is: in
aardvark, in the cards, or in both. As you can see, each of the tested
cards has a different set of problems.

Today I tested a card from a different vendor but with the same
Qualcomm chip as in the WLE900VX, and I observe the same behavior as
with the Compex WLE900VX. So it looks like the card vendor does not
matter; what matters is the wifi chip inside.

I read in the kernel bugzilla that WLE600VX and WLE900VX cards are
buggy and that more people have problems with them. But I am not
observing the issues described in the kernel bugzilla (like a card
reporting an incorrect PCI device ID).

If you have any idea how to debug these problems or where the problem
could be, please let me know.
Bjorn Helgaas July 10, 2020, 4:08 p.m. UTC | #7
On Fri, Jul 10, 2020 at 05:44:58PM +0200, Pali Rohár wrote:
> I can reproduce following issue: Connect Compex WLE900VX card, configure
> aardvark to gen2 mode. And then card is detected only after the first
> link training. If kernel tries to retrain link again (e.g. via ASPM
> code) then card is not detected anymore. 

Somebody should go over the ASPM retrain link code and the PCIe spec
with a fine-toothed comb.  Maybe we're doing something wrong there.
Or maybe aardvark has some hardware issue and we need some sort of
quirk to work around it.

> Another issue which happens for WLE900VX, WLE600VX and WLE1216V5-20 (but
> not for WLE200VX): Linux kernel can detect these cards only if it issues
> card reset via PERST# signal and start link training (via standard pcie
> endpoint register PCI_EXP_LNKCTL/PCI_EXP_LNKCTL_RL)

I think you mean "downstream port" (not "endpoint") register?
PCI_EXP_LNKCTL_RL is only applicable to *downstream ports* (root ports
or switch downstream ports) and is reserved for endpoints.

> immediately after
> enable link training in aardvark (via aardvark specific LINK_TRAINING_EN
> bit). If there is e.g. 100ms delay between enabling link training and
> setting PCI_EXP_LNKCTL_RL bit then these cards are not detected.

This sounds problematic.  Hardware should not be dependent on the
software being "fast enough".  In general we should be able to insert
arbitrary delays at any point without breaking anything.

But I have the impression that aardvark requires more software
hand-holding than most hardware does.  If it imposes timing
requirements on the software, that *should* be documented in the
aardvark spec.

> I read in kernel bugzilla that WLE600VX and WLE900VX cards are buggy and
> more people have problems with them. But issues described in kernel
> bugzilla (like card is reporting incorrect PCI device id) I'm not
> observing.

Pointer?  Is the incorrect device ID 0xffff?  That could be a symptom
of a PCIe error.  If we read a device ID that's something other than
0, 0xffff, or the correct ID, that would be really weird.  Even 0
would be really strange.

I suspect these wifi cards are a little special because they probably
play unusual games with power for airplane mode and the like.

Bjorn
Pali Rohár July 10, 2020, 7:30 p.m. UTC | #8
On Friday 10 July 2020 11:08:28 Bjorn Helgaas wrote:
> On Fri, Jul 10, 2020 at 05:44:58PM +0200, Pali Rohár wrote:
> > I can reproduce following issue: Connect Compex WLE900VX card, configure
> > aardvark to gen2 mode. And then card is detected only after the first
> > link training. If kernel tries to retrain link again (e.g. via ASPM
> > code) then card is not detected anymore. 
> 
> Somebody should go over the ASPM retrain link code and the PCIe spec
> with a fine-toothed comb.  Maybe we're doing something wrong there.

I think this is not ASPM related, as the card simply disappears just
after flipping the PCI_EXP_LNKCTL_RL bit a second time without changing
the ASPM bits.

> Or maybe aardvark has some hardware issue and we need some sort of
> quirk to work around it.

It is possible that this is an aardvark issue. As I said, I really do
not know.

In the aardvark driver there is already a merged workaround for this
issue: the driver forces gen1 aardvark mode for gen1 cards.

> > Another issue which happens for WLE900VX, WLE600VX and WLE1216V5-20 (but
> > not for WLE200VX): Linux kernel can detect these cards only if it issues
> > card reset via PERST# signal and start link training (via standard pcie
> > endpoint register PCI_EXP_LNKCTL/PCI_EXP_LNKCTL_RL)
> 
> I think you mean "downstream port" (not "endpoint") register?

Yes.

> PCI_EXP_LNKCTL_RL is only applicable to *downstream ports* (root ports
> or switch downstream ports) and is reserved for endpoints.
> 
> > immediately after
> > enable link training in aardvark (via aardvark specific LINK_TRAINING_EN
> > bit). If there is e.g. 100ms delay between enabling link training and
> > setting PCI_EXP_LNKCTL_RL bit then these cards are not detected.
> 
> This sounds problematic.  Hardware should not be dependent on the
> software being "fast enough".  In general we should be able to insert
> arbitrary delays at any point without breaking anything.

Yes, it is problematic. For example, the following commit broke those
cards:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4c7d053d7f77cd5c1a1ba7c7ce085ddba13d1d7

And this commit fixed it (the msleep was just moved to a different
stage):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6964494582f56a3882c2c53b0edbfe99eb32b2e1

But we somehow need to deal with it until we find the root cause.

Basically, an additional sleep in the aardvark init phase can break
WLE900VX cards, but not WLE200VX.

And because WLE900VX works fine with pci-mvebu and WLE200VX works fine
with pci-aardvark, we cannot deduce whether the problem with the
combination of WLE900VX and aardvark is in WLE900VX or in aardvark.

> But I have the impression that aardvark requires more software
> hand-holding than most hardware does.  If it imposes timing
> requirements on the software, that *should* be documented in the
> aardvark spec.

There is absolutely nothing about timings in the documentation which I
saw. The documentation contains just the instructions/steps for
initializing the PCI subsystem, which is basically the
advk_pcie_setup_hw() function.

> > I read in kernel bugzilla that WLE600VX and WLE900VX cards are buggy and
> > more people have problems with them. But issues described in kernel
> > bugzilla (like card is reporting incorrect PCI device id) I'm not
> > observing.
> 
> Pointer?

Hm... I cannot find the pointer to bugzilla right now, but I have a
pointer to the ath9k-devel mailing list with that incorrect device ID:

https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html

> Is the incorrect device ID 0xffff?

No, the incorrect device ID in that case is 0xabcd and the vendor ID is
correct (Qualcomm).

> That could be a symptom
> of a PCIe error.  If we read a device ID that's something other than
> 0, 0xffff, or the correct ID, that would be really weird.  Even 0
> would be really strange.

It is strange, and that is also the reason why the discussion on that
list is long.

As I said, I'm not seeing that problem with the wrong device ID.

But I know people who are observing the same problem as described in
the above mailing list thread, with Compex ath10k cards on different
boards (which do not use aardvark).

> I suspect these wifi cards are a little special because they probably
> play unusual games with power for airplane mode and the like.

This is another/different problem and is already "documented" in kernel
bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=84821#c52
Bjorn Helgaas July 10, 2020, 8:08 p.m. UTC | #9
On Fri, Jul 10, 2020 at 09:30:03PM +0200, Pali Rohár wrote:
> On Friday 10 July 2020 11:08:28 Bjorn Helgaas wrote:
> > On Fri, Jul 10, 2020 at 05:44:58PM +0200, Pali Rohár wrote:
> > > I can reproduce following issue: Connect Compex WLE900VX card, configure
> > > aardvark to gen2 mode. And then card is detected only after the first
> > > link training. If kernel tries to retrain link again (e.g. via ASPM
> > > code) then card is not detected anymore. 
> > 
> > Somebody should go over the ASPM retrain link code and the PCIe spec
> > with a fine-toothed comb.  Maybe we're doing something wrong there.
> 
> I think this is not ASPM related as card simply disappear just after
> flipping PCI_EXP_LNKCTL_RL bit second time without changing ASPM bits.

Right.  The retrain code in aspm.c doesn't really have anything in
particular to do with ASPM and it should probably be moved elsewhere.
So I think the problem may be related to retrain and the delays after
it in general, not to ASPM.

> There is absolutely nothing regarding to timings in documentation which
> I saw. In documentation are just instructions/steps how to init PCI
> subsystem and it is basically advk_pcie_setup_hw() function.
> 
> > > I read in kernel bugzilla that WLE600VX and WLE900VX cards are buggy and
> > > more people have problems with them. But issues described in kernel
> > > bugzilla (like card is reporting incorrect PCI device id) I'm not
> > > observing.
> 
> Hm... I cannot find right now pointer to bugzilla, but I have pointer to
> ath9k-devel mailing list with that incorrect device id:
> 
> https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html
> 
> > Is the incorrect device ID 0xffff?
> 
> No, incorrect device ID in that case is 0xabcd and vendor ID is correct
> (Qualcomm).

From a quick look at that thread, it sounds like the device isn't
quite ready yet.  In that case, it's supposed to respond with Config
Request Retry Status, and Linux is supposed to wait longer and retry.
But I don't think Linux does that quite correctly, so it could be
either a hardware problem or Linux being broken.  But I guess that's
not the current problem so I don't want to go down that rathole right
now.
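
To make the distinction above concrete, here is a standalone sketch
that classifies the first config dword (vendor ID in the low 16 bits,
device ID in the high 16 bits). The names are hypothetical. The checks
assume the usual conventions: all-ones or all-zeroes halves mean that
nothing answered, and a vendor ID of 0x0001 is the value returned for
Config Request Retry Status when CRS Software Visibility is enabled.

```c
#include <stdint.h>

enum model_id_status {
	MODEL_ID_PRESENT,	/* well-formed ID, which may still be a wrong one */
	MODEL_ID_ABSENT,	/* nothing answered the config read */
	MODEL_ID_CRS,		/* device asks the software to retry later */
};

static enum model_id_status model_classify_id(uint32_t l)
{
	/* Typical "no device" patterns for the vendor/device ID dword. */
	if (l == 0xffffffffu || l == 0x00000000u ||
	    l == 0x0000ffffu || l == 0xffff0000u)
		return MODEL_ID_ABSENT;

	/* Vendor ID 0x0001 signals CRS (assumes CRS Software Visibility). */
	if ((l & 0xffffu) == 0x0001u)
		return MODEL_ID_CRS;

	return MODEL_ID_PRESENT;
}
```

A device ID of 0xabcd combined with a valid vendor ID falls into the
"present" case, so a check like this alone cannot tell a device that is
not ready yet from a fully initialized one.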
Pali Rohár July 13, 2020, 8:27 a.m. UTC | #10
On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > I understand that but the bridge bus resource can be trimmed to just
> > > contain the root bus because that's the only one where there is a
> > > chance you can enumerate a device.
> > 
> > It is possible to register only root bridge without endpoint?
> 
> It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> so that you don't enumerate anything other than the root port.

Hello Lorenzo! I really do not know how to achieve that. From the code
it looks like pci/probe.c scans child buses unconditionally.

pci-aardvark.c calls pci_host_probe(), which calls
pci_scan_root_bus_bridge(), which calls pci_scan_child_bus(), which
calls pci_scan_child_bus_extend(), which calls pci_scan_bridge_extend()
(the bridge needs to be reconfigured), which then tries to probe the
child bus via pci_scan_child_bus_extend() because the bridge is not a
CardBus bridge.

In pci_scan_bridge_extend() I do not see a way to skip probing for
child buses, which would avoid enumerating the aardvark root bridge
when no PCIe device is connected.

dmesg output contains:

  advk-pcie d0070000.pcie: link never came up
  advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
  pci_bus 0000:00: root bus resource [bus 00-ff]
  pci_bus 0000:00: root bus resource [mem 0xe8000000-0xe8ffffff]
  pci_bus 0000:00: root bus resource [io  0x0000-0xffff] (bus address [0xe9000000-0xe900ffff])
  pci_bus 0000:00: scanning bus
  pci 0000:00:00.0: [1b4b:0100] type 01 class 0x060400
  pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
  pci_bus 0000:00: fixups for bus
  pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
  pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
  pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
  pci_bus 0000:01: scanning bus
  advk-pcie d0070000.pcie: advk_pcie_valid_device
Lorenzo Pieralisi July 13, 2020, 11:23 a.m. UTC | #11
On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > I understand that but the bridge bus resource can be trimmed to just
> > > > contain the root bus because that's the only one where there is a
> > > > chance you can enumerate a device.
> > > 
> > > It is possible to register only root bridge without endpoint?
> > 
> > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > so that you don't enumerate anything other than the root port.
> 
> Hello Lorenzo! I really do not know how to achieve it. From code it
> looks like that pci/probe.c scans child buses unconditionally.
> 
> pci-aardvark.c calls pci_host_probe() which calls functions
> pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> needs to be reconfigured) which then try to probe child bus via
> pci_scan_child_bus_extend() because bridge is not card bus.
> 
> In function pci_scan_bridge_extend() I do not see a way how to skip
> probing for child buses which would avoid enumerating aardvark root
> bridge when PCIe device is not connected.
> 
> dmesg output contains:
> 
>   advk-pcie d0070000.pcie: link never came up
>   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
>   pci_bus 0000:00: root bus resource [bus 00-ff]

This resource can be limited to the root bus number only before calling
pci_host_probe() (i.e. see pci_parse_request_of_pci_ranges() and the code
in pci_scan_bridge_extend() that programs the primary/secondary/subordinate
buses), but I think that only papers over the issue; it does not fix it.

I will go over the thread again but I suspect I can merge the patch even
though I still believe there is work to be done to understand the issue
we are facing.

Lorenzo

>   pci_bus 0000:00: root bus resource [mem 0xe8000000-0xe8ffffff]
>   pci_bus 0000:00: root bus resource [io  0x0000-0xffff] (bus address [0xe9000000-0xe900ffff])
>   pci_bus 0000:00: scanning bus
>   pci 0000:00:00.0: [1b4b:0100] type 01 class 0x060400
>   pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
>   pci_bus 0000:00: fixups for bus
>   pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
>   pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>   pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
>   pci_bus 0000:01: scanning bus
>   advk-pcie d0070000.pcie: advk_pcie_valid_device
Pali Rohár July 13, 2020, 2:50 p.m. UTC | #12
On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> I will go over the thread again but I suspect I can merge the patch even
> though I still believe there is work to be done to understand the issue
> we are facing.

Just to note that pci-mvebu.c also checks whether the PCIe link is up
before trying to access the real PCIe interface registers, similar to my
patch.
Lorenzo Pieralisi July 13, 2020, 4:41 p.m. UTC | #13
On Mon, Jul 13, 2020 at 04:50:03PM +0200, Pali Rohár wrote:
> On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > I will go over the thread again but I suspect I can merge the patch even
> > though I still believe there is work to be done to understand the issue
> > we are facing.
> 
> Just to note that pci-mvebu.c also checks if pcie link is up before
> trying to access the real PCIe interface registers, similarly as in my
> patch.

I understand - that does not change my opinion though; the link check
is just a workaround. It'd be best if we pinpoint the real issue, which
is likely to be a HW one.

Lorenzo
Pali Rohár July 14, 2020, 7:38 a.m. UTC | #14
On Monday 13 July 2020 17:41:40 Lorenzo Pieralisi wrote:
> On Mon, Jul 13, 2020 at 04:50:03PM +0200, Pali Rohár wrote:
> > On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > > I will go over the thread again but I suspect I can merge the patch even
> > > though I still believe there is work to be done to understand the issue
> > > we are facing.
> > 
> > Just to note that pci-mvebu.c also checks if pcie link is up before
> > trying to access the real PCIe interface registers, similarly as in my
> > patch.
> 
> I understand - that does not change my opinion though, the link check
> is just a workaround, it'd be best if we pinpoint the real issue which
> is likely to a HW one.

Lorenzo, if you have an idea how to debug this issue or if you would
like to see some test results, let me know. I can do some tests, but I
currently really do not know more than what I wrote in previous emails.

In my opinion, the problem is in the HW, which Marvell has neither
documented nor confirmed to exist. The other option is that the problem
is in the Compex card and can be triggered only by Marvell aardvark HW.
Pali Rohár July 15, 2020, 12:17 p.m. UTC | #15
On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> > On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > > I understand that but the bridge bus resource can be trimmed to just
> > > > > contain the root bus because that's the only one where there is a
> > > > > chance you can enumerate a device.
> > > > 
> > > > It is possible to register only root bridge without endpoint?
> > > 
> > > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > > so that you don't enumerate anything other than the root port.
> > 
> > Hello Lorenzo! I really do not know how to achieve it. From code it
> > looks like that pci/probe.c scans child buses unconditionally.
> > 
> > pci-aardvark.c calls pci_host_probe() which calls functions
> > pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> > pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> > needs to be reconfigured) which then try to probe child bus via
> > pci_scan_child_bus_extend() because bridge is not card bus.
> > 
> > In function pci_scan_bridge_extend() I do not see a way how to skip
> > probing for child buses which would avoid enumerating aardvark root
> > bridge when PCIe device is not connected.
> > 
> > dmesg output contains:
> > 
> >   advk-pcie d0070000.pcie: link never came up
> >   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
> >   pci_bus 0000:00: root bus resource [bus 00-ff]
> 
> This resource can be limited to the root bus number only before calling
> pci_host_probe() (ie see pci_parse_request_of_pci_ranges() and code in
> pci_scan_bridge_extend() that programs primary/secondary/subordinate
> busses) but I think that only papers over the issue, it does not fix it.

I looked at the code in pci/probe.c again and I do not think it is
possible to avoid scanning devices. pci_scan_child_bus_extend() is
unconditionally calling pci_scan_slot() for devfn=0 as the first thing.
And this function unconditionally calls pci_scan_device() which is
directly trying to read vendor id from config register.

So to me it looks like the kernel expects that it can read the vendor ID
and device ID from the config registers of a device which is not connected.

And trying to read config register would cause those timeouts in
aardvark.
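
To illustrate the point about the unconditional vendor ID read, here is a
minimal user-space sketch (not the code from pci/probe.c). The names
probe_vendor_id(), read_absent() and read_root_port() are hypothetical and
exist only for this example. The set of "nothing there" values is what the
generic probe code checks as far as I recall, so treat it as an assumption.
The 0x01001b4b value is just the [1b4b:0100] ID from the dmesg output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_VENDOR_ID 0x00	/* offset of the vendor/device ID dword */

/* Hypothetical stand-in for a host controller's config read hook. */
typedef int (*cfg_read32_fn)(unsigned int bus, unsigned int devfn,
			     int where, uint32_t *val);

/* Models an empty slot: the read completes but returns all-ones. */
int read_absent(unsigned int bus, unsigned int devfn, int where,
		uint32_t *val)
{
	(void)bus; (void)devfn; (void)where;
	*val = 0xffffffff;
	return 0;
}

/* Models the emulated root port: vendor 0x1b4b, device 0x0100. */
int read_root_port(unsigned int bus, unsigned int devfn, int where,
		   uint32_t *val)
{
	(void)bus; (void)devfn; (void)where;
	*val = 0x01001b4b;
	return 0;
}

/*
 * Enumeration always issues the read first and only afterwards decides,
 * from the returned value, whether a function is present.
 */
bool probe_vendor_id(cfg_read32_fn read32, unsigned int bus,
		     unsigned int devfn, uint32_t *id)
{
	assert(read32 != NULL && id != NULL);

	if (read32(bus, devfn, CFG_VENDOR_ID, id) != 0)
		return false;

	/* Assumed set of values that mean "no device here". */
	if (*id == 0xffffffff || *id == 0x00000000 ||
	    *id == 0x0000ffff || *id == 0xffff0000)
		return false;

	return true;
}
```

The point of the sketch is only that the read is always attempted; on
aardvark that attempt is what runs into the timeout when the link is down.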
Lorenzo Pieralisi July 15, 2020, 4:21 p.m. UTC | #16
On Wed, Jul 15, 2020 at 02:17:26PM +0200, Pali Rohár wrote:
> On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> > > On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > > > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > > > I understand that but the bridge bus resource can be trimmed to just
> > > > > > contain the root bus because that's the only one where there is a
> > > > > > chance you can enumerate a device.
> > > > > 
> > > > > It is possible to register only root bridge without endpoint?
> > > > 
> > > > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > > > so that you don't enumerate anything other than the root port.
> > > 
> > > Hello Lorenzo! I really do not know how to achieve it. From code it
> > > looks like that pci/probe.c scans child buses unconditionally.
> > > 
> > > pci-aardvark.c calls pci_host_probe() which calls functions
> > > pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> > > pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> > > needs to be reconfigured) which then try to probe child bus via
> > > pci_scan_child_bus_extend() because bridge is not card bus.
> > > 
> > > In function pci_scan_bridge_extend() I do not see a way how to skip
> > > probing for child buses which would avoid enumerating aardvark root
> > > bridge when PCIe device is not connected.
> > > 
> > > dmesg output contains:
> > > 
> > >   advk-pcie d0070000.pcie: link never came up
> > >   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
> > >   pci_bus 0000:00: root bus resource [bus 00-ff]
> > 
> > This resource can be limited to the root bus number only before calling
> > pci_host_probe() (ie see pci_parse_request_of_pci_ranges() and code in
> > pci_scan_bridge_extend() that programs primary/secondary/subordinate
> > busses) but I think that only papers over the issue, it does not fix it.
> 
> I looked at the code in pci/probe.c again and I do not think it is
> possible to avoid scanning devices. pci_scan_child_bus_extend() is
> unconditionally calling pci_scan_slot() for devfn=0 as the first thing.
> And this function unconditionally calls pci_scan_device() which is
> directly trying to read vendor id from config register.
> 
> So for me it looks like that kernel expects that can read vendor id and
> device id from config register for device which is not connected.

Not if it is connected to a bus that the root port does not decode,
that's what I am saying.

> And trying to read config register would cause those timeouts in
> aardvark.

The root port (which effectively works as PCI bridge from this
standpoint) does not issue config cycles for busses that aren't within
its decoded bus range, which in turn is determined by the firmware
IORESOURCE_BUS resource.

This issue is caused by devices that are connected downstream to
the root port.

Anyway - patch merged but I would be happy to keep this discussion
going, somehow.

If the LPC20 VFIO/IOMMU/PCI microconference is approved it can be a
good venue for this to happen.

Lorenzo
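
As a rough illustration of the bus-range argument above, the following
user-space sketch models how a PCI-to-PCI bridge (or a root port acting
as one) decides what to do with a config request, based on its
secondary and subordinate bus number registers. The struct, enum and
function names are hypothetical and chosen for this example only; this is
not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bus number registers of a type 1 (bridge) config header. */
struct bridge_bus_regs {
	uint8_t primary;
	uint8_t secondary;
	uint8_t subordinate;
};

enum cfg_route {
	CFG_ROUTE_NONE,		/* outside the decoded range: not forwarded */
	CFG_ROUTE_TYPE0,	/* target is the secondary bus itself */
	CFG_ROUTE_TYPE1,	/* target is further downstream */
};

/*
 * A bridge forwards a config request only when the target bus lies in
 * [secondary, subordinate]. Requests for the secondary bus are converted
 * to type 0; requests for deeper buses are passed on as type 1.
 */
enum cfg_route bridge_route_config(const struct bridge_bus_regs *b,
				   uint8_t target_bus)
{
	assert(b != NULL);

	if (target_bus == b->secondary)
		return CFG_ROUTE_TYPE0;
	if (target_bus > b->secondary && target_bus <= b->subordinate)
		return CFG_ROUTE_TYPE1;
	return CFG_ROUTE_NONE;
}
```

With the bus resource trimmed so that nothing beyond the root bus is
decoded, a request for bus 01 would fall into the CFG_ROUTE_NONE case
in this model.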
Pali Rohár July 21, 2020, 8:57 a.m. UTC | #17
On Wednesday 15 July 2020 17:21:08 Lorenzo Pieralisi wrote:
> On Wed, Jul 15, 2020 at 02:17:26PM +0200, Pali Rohár wrote:
> > On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > > On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> > > > On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > > > > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > > > > I understand that but the bridge bus resource can be trimmed to just
> > > > > > > contain the root bus because that's the only one where there is a
> > > > > > > chance you can enumerate a device.
> > > > > > 
> > > > > > It is possible to register only root bridge without endpoint?
> > > > > 
> > > > > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > > > > so that you don't enumerate anything other than the root port.
> > > > 
> > > > Hello Lorenzo! I really do not know how to achieve it. From code it
> > > > looks like that pci/probe.c scans child buses unconditionally.
> > > > 
> > > > pci-aardvark.c calls pci_host_probe() which calls functions
> > > > pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> > > > pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> > > > needs to be reconfigured) which then try to probe child bus via
> > > > pci_scan_child_bus_extend() because bridge is not card bus.
> > > > 
> > > > In function pci_scan_bridge_extend() I do not see a way how to skip
> > > > probing for child buses which would avoid enumerating aardvark root
> > > > bridge when PCIe device is not connected.
> > > > 
> > > > dmesg output contains:
> > > > 
> > > >   advk-pcie d0070000.pcie: link never came up
> > > >   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
> > > >   pci_bus 0000:00: root bus resource [bus 00-ff]
> > > 
> > > This resource can be limited to the root bus number only before calling
> > > pci_host_probe() (ie see pci_parse_request_of_pci_ranges() and code in
> > > pci_scan_bridge_extend() that programs primary/secondary/subordinate
> > > busses) but I think that only papers over the issue, it does not fix it.
> > 
> > I looked at the code in pci/probe.c again and I do not think it is
> > possible to avoid scanning devices. pci_scan_child_bus_extend() is
> > unconditionally calling pci_scan_slot() for devfn=0 as the first thing.
> > And this function unconditionally calls pci_scan_device() which is
> > directly trying to read vendor id from config register.
> > 
> > So for me it looks like that kernel expects that can read vendor id and
> > device id from config register for device which is not connected.
> 
> Not if it is connected to a bus that the root port does not decode,
> that's what I am saying.
> 
> > And trying to read config register would cause those timeouts in
> > aardvark.
> 
> The root port (which effectively works as PCI bridge from this
> standpoint) does not issue config cycles for busses that aren't within
> its decoded bus range, which in turn is determined by the firmware
> IORESOURCE_BUS resource.
> 
> This issue is caused by devices that are connected downstream to
> the root port.
> 
> Anyway - patch merged

Could you send me a link to the git commit? I have looked into the
lpieralisi/pci.git repository, but I do not see it there.

> but I would be happy to keep this discussion going, somehow.

Ok, no problem. As I said, if anybody has any idea or would like to see
some tests from me, I can do them and provide the results.

> If the LPC20 VFIO/IOMMU/PCI microconference is approved it can be a
> good venue for this to happen.
> 
> Lorenzo
Lorenzo Pieralisi July 21, 2020, 10:48 a.m. UTC | #18
On Tue, Jul 21, 2020 at 10:57:13AM +0200, Pali Rohár wrote:
> On Wednesday 15 July 2020 17:21:08 Lorenzo Pieralisi wrote:
> > On Wed, Jul 15, 2020 at 02:17:26PM +0200, Pali Rohár wrote:
> > > On Monday 13 July 2020 12:23:25 Lorenzo Pieralisi wrote:
> > > > On Mon, Jul 13, 2020 at 10:27:47AM +0200, Pali Rohár wrote:
> > > > > On Friday 10 July 2020 10:18:00 Lorenzo Pieralisi wrote:
> > > > > > On Thu, Jul 09, 2020 at 05:09:59PM +0200, Pali Rohár wrote:
> > > > > > > > I understand that but the bridge bus resource can be trimmed to just
> > > > > > > > contain the root bus because that's the only one where there is a
> > > > > > > > chance you can enumerate a device.
> > > > > > > 
> > > > > > > It is possible to register only root bridge without endpoint?
> > > > > > 
> > > > > > It is possible to register the root bridge with a trimmed IORESOURCE_BUS
> > > > > > so that you don't enumerate anything other than the root port.
> > > > > 
> > > > > Hello Lorenzo! I really do not know how to achieve it. From code it
> > > > > looks like that pci/probe.c scans child buses unconditionally.
> > > > > 
> > > > > pci-aardvark.c calls pci_host_probe() which calls functions
> > > > > pci_scan_root_bus_bridge() which calls pci_scan_child_bus() which calls
> > > > > pci_scan_child_bus_extend() which calls pci_scan_bridge_extend() (bridge
> > > > > needs to be reconfigured) which then try to probe child bus via
> > > > > pci_scan_child_bus_extend() because bridge is not card bus.
> > > > > 
> > > > > In function pci_scan_bridge_extend() I do not see a way how to skip
> > > > > probing for child buses which would avoid enumerating aardvark root
> > > > > bridge when PCIe device is not connected.
> > > > > 
> > > > > dmesg output contains:
> > > > > 
> > > > >   advk-pcie d0070000.pcie: link never came up
> > > > >   advk-pcie d0070000.pcie: PCI host bridge to bus 0000:00
> > > > >   pci_bus 0000:00: root bus resource [bus 00-ff]
> > > > 
> > > > This resource can be limited to the root bus number only before calling
> > > > pci_host_probe() (ie see pci_parse_request_of_pci_ranges() and code in
> > > > pci_scan_bridge_extend() that programs primary/secondary/subordinate
> > > > busses) but I think that only papers over the issue, it does not fix it.
> > > 
> > > I looked at the code in pci/probe.c again and I do not think it is
> > > possible to avoid scanning devices. pci_scan_child_bus_extend() is
> > > unconditionally calling pci_scan_slot() for devfn=0 as the first thing.
> > > And this function unconditionally calls pci_scan_device() which is
> > > directly trying to read vendor id from config register.
> > > 
> > > So for me it looks like that kernel expects that can read vendor id and
> > > device id from config register for device which is not connected.
> > 
> > Not if it is connected to a bus that the root port does not decode,
> > that's what I am saying.
> > 
> > > And trying to read config register would cause those timeouts in
> > > aardvark.
> > 
> > The root port (which effectively works as PCI bridge from this
> > standpoint) does not issue config cycles for busses that aren't within
> > its decoded bus range, which in turn is determined by the firmware
> > IORESOURCE_BUS resource.
> > 
> > This issue is caused by devices that are connected downstream to
> > the root port.
> > 
> > Anyway - patch merged
> 
> Could you send me a link to git commit? I have looked into
> lpieralisi/pci.git repository, but I do not see it here.

Apologies - I did not push it out, I have pushed it out on
pci/aardvark now.

> > but I would be happy to keep this discussion going, somehow.
> 
> Ok, no problem. As I said if anybody has any idea or would like to see
> some tests from me, I can do it and provide results.

Sounds good, I will let you know, thanks.

Lorenzo

> > If the LPC20 VFIO/IOMMU/PCI microconference is approved it can be a
> > good venue for this to happen.
> > 
> > Lorenzo

Patch

diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 90ff291c24f0..d18f389b36a1 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -644,6 +644,13 @@  static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
 	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
 		return false;
 
+	/*
+	 * If the link goes down after we check for link-up, nothing bad
+	 * happens but the config access times out.
+	 */
+	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
+		return false;
+
 	return true;
 }
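
The check added by the hunk above can be exercised outside the kernel with
a small model. This is only a sketch: struct fake_pcie, SLOT() and
fake_valid_device() are hypothetical stand-ins for struct advk_pcie,
PCI_SLOT() and advk_pcie_valid_device(), and the link state is a plain
flag rather than a register read via advk_pcie_link_up().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_SLOT(): bits 7:3 of devfn. */
#define SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* Hypothetical stand-in for the driver state used by the check. */
struct fake_pcie {
	uint8_t root_bus_nr;
	bool link_up;
};

bool fake_valid_device(const struct fake_pcie *pcie, uint8_t bus_nr,
		       unsigned int devfn)
{
	assert(pcie != NULL);

	/* Only slot 0 exists on the root bus (the emulated root bridge). */
	if (bus_nr == pcie->root_bus_nr && SLOT(devfn) != 0)
		return false;

	/*
	 * Anything behind the root bridge is reachable only over the
	 * link, so report "not found" instead of waiting for a timeout.
	 */
	if (bus_nr != pcie->root_bus_nr && !pcie->link_up)
		return false;

	return true;
}
```

In this model the root bridge at 00:00.0 stays visible whether or not the
link is up, while every access to bus 01 and beyond is rejected up front
when the link is down.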