PCI: aardvark: Don't touch PCIe registers if no card connected

Message ID 20200528143141.29956-1-pali@kernel.org
State New
Series PCI: aardvark: Don't touch PCIe registers if no card connected

Commit Message

Pali Rohár May 28, 2020, 2:31 p.m. UTC
When there is no PCIe card connected and advk_pcie_rd_conf() or
advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
root bridge, the aardvark driver throws the following error message:

  advk-pcie d0070000.pcie: config read/write timed out

Obviously accessing PCIe registers of disconnected card is not possible.

Extend check in advk_pcie_valid_device() function for validating
availability of PCIe bus. If PCIe link is down, then the device is marked
as Not Found and the driver does not try to access these registers.

Signed-off-by: Pali Rohár <pali@kernel.org>
---
 drivers/pci/controller/pci-aardvark.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Bjorn Helgaas May 28, 2020, 4:26 p.m. UTC | #1
On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> When there is no PCIe card connected and advk_pcie_rd_conf() or
> advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> root bridge, the aardvark driver throws the following error message:
> 
>   advk-pcie d0070000.pcie: config read/write timed out
> 
> Obviously accessing PCIe registers of disconnected card is not possible.
> 
> Extend check in advk_pcie_valid_device() function for validating
> availability of PCIe bus. If PCIe link is down, then the device is marked
> as Not Found and the driver does not try to access these registers.
> 
> Signed-off-by: Pali Rohár <pali@kernel.org>
> ---
>  drivers/pci/controller/pci-aardvark.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> index 90ff291c24f0..53a4cfd7d377 100644
> --- a/drivers/pci/controller/pci-aardvark.c
> +++ b/drivers/pci/controller/pci-aardvark.c
> @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
>  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
>  		return false;
>  
> +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> +		return false;

I don't think this is the right fix.  This makes it racy because the
link may go down after we call advk_pcie_valid_device() but before we
perform the config read.

I have no objection to removing the "config read/write timed out"
message.  The "return PCIBIOS_SET_FAILED" in the read case probably
should be augmented by setting "*val = 0xffffffff".

>  	return true;
>  }
>  
> -- 
> 2.20.1
>
Pali Rohár May 28, 2020, 4:38 p.m. UTC | #2
On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > root bridge, the aardvark driver throws the following error message:
> > 
> >   advk-pcie d0070000.pcie: config read/write timed out
> > 
> > Obviously accessing PCIe registers of disconnected card is not possible.
> > 
> > Extend check in advk_pcie_valid_device() function for validating
> > availability of PCIe bus. If PCIe link is down, then the device is marked
> > as Not Found and the driver does not try to access these registers.
> > 
> > Signed-off-by: Pali Rohár <pali@kernel.org>
> > ---
> >  drivers/pci/controller/pci-aardvark.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > index 90ff291c24f0..53a4cfd7d377 100644
> > --- a/drivers/pci/controller/pci-aardvark.c
> > +++ b/drivers/pci/controller/pci-aardvark.c
> > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> >  		return false;
> >  
> > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > +		return false;
> 
> I don't think this is the right fix.  This makes it racy because the
> link may go down after we call advk_pcie_valid_device() but before we
> perform the config read.

Yes, it is racy, but I do not think it causes problems. Trying to read
PCIe registers when no device is connected only causes those timeouts:
an error message is printed and advk_pcie_wait_pio() takes longer because
of its polling loop. This patch avoids those unnecessary PCIe register
accesses in cases where advk_pcie_wait_pio() polling would just fail.

I think it is a good idea not to call the blocking advk_pcie_wait_pio()
when it is not needed. PCIe bus enumeration could then be faster when no
card is connected.

> I have no objection to removing the "config read/write timed out"
> message.  The "return PCIBIOS_SET_FAILED" in the read case probably
> should be augmented by setting "*val = 0xffffffff".
> 
> >  	return true;
> >  }
> >  
> > -- 
> > 2.20.1
> >
Bjorn Helgaas May 28, 2020, 4:49 p.m. UTC | #3
On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > root bridge, the aardvark driver throws the following error message:
> > > 
> > >   advk-pcie d0070000.pcie: config read/write timed out
> > > 
> > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > 
> > > Extend check in advk_pcie_valid_device() function for validating
> > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > as Not Found and the driver does not try to access these registers.
> > > 
> > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > ---
> > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > index 90ff291c24f0..53a4cfd7d377 100644
> > > --- a/drivers/pci/controller/pci-aardvark.c
> > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > >  		return false;
> > >  
> > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > +		return false;
> > 
> > I don't think this is the right fix.  This makes it racy because the
> > link may go down after we call advk_pcie_valid_device() but before we
> > perform the config read.
> 
> Yes, it is racy, but I do not think it cause problems. Trying to read
> PCIe registers when device is not connected cause just those timeouts,
> printing error message and increased delay in advk_pcie_wait_pio() due
> to polling loop. This patch reduce unnecessary access to PCIe registers
> when advk_pcie_wait_pio() polling just fail.
> 
> I think it is a good idea to not call blocking advk_pcie_wait_pio() when
> it is not needed. We could have faster enumeration of PCIe buses when
> card is not connected.

Maybe advk_pcie_check_pio_status() and advk_pcie_wait_pio() could be
combined so we could get the correct error status as soon as it's
available, without waiting for a timeout?

In any event, the "return PCIBIOS_SET_FAILED" needs to be fixed.  Most
callers of config read do not check for failure, but most of the ones
that do, check for "val == ~0".  Only a few check for a status of
other than PCIBIOS_SUCCESSFUL.
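
A minimal sketch of the convention described above, not the actual aardvark code: demo_rd_conf(), demo_device_present() and struct demo_hw are hypothetical names invented for illustration, and the two PCIBIOS_* constants are redefined with the same numeric values as in the kernel's <linux/pci.h> so the fragment builds on its own. The point it illustrates is that a failed read stores all ones in *val in addition to returning an error status, so a caller that only inspects the data still notices the failure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same numeric values as the kernel's <linux/pci.h>, redefined for a standalone build. */
#define PCIBIOS_SUCCESSFUL 0x00
#define PCIBIOS_SET_FAILED 0x88

/* Hypothetical stand-in for the hardware state seen by one config read. */
struct demo_hw {
	bool     transfer_ok;  /* false models a transfer that timed out */
	uint32_t reg_value;    /* data returned when the transfer succeeds */
};

/* Hypothetical read accessor (assumption: not the real advk_pcie_rd_conf()). */
static int demo_rd_conf(const struct demo_hw *hw, uint32_t *val)
{
	if (!hw->transfer_ok) {
		/* Report the failure through the data as well as the status. */
		*val = 0xffffffff;
		return PCIBIOS_SET_FAILED;
	}

	*val = hw->reg_value;
	return PCIBIOS_SUCCESSFUL;
}

/* A caller that ignores the status and only checks for "val == ~0". */
static bool demo_device_present(const struct demo_hw *hw)
{
	uint32_t id;

	demo_rd_conf(hw, &id);
	return id != 0xffffffff;
}
```

With this shape, demo_device_present() reports "absent" for a failed transfer even though it never looks at the return code. Without the "*val = 0xffffffff" assignment, id would be left uninitialized on the failure path.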

> > I have no objection to removing the "config read/write timed out"
> > message.  The "return PCIBIOS_SET_FAILED" in the read case probably
> > should be augmented by setting "*val = 0xffffffff".
> > 
> > >  	return true;
> > >  }
> > >  
> > > -- 
> > > 2.20.1
> > >
Pali Rohár May 29, 2020, 8:30 a.m. UTC | #4
On Thursday 28 May 2020 11:49:38 Bjorn Helgaas wrote:
> On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > root bridge, the aardvark driver throws the following error message:
> > > > 
> > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > 
> > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > 
> > > > Extend check in advk_pcie_valid_device() function for validating
> > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > as Not Found and the driver does not try to access these registers.
> > > > 
> > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > ---
> > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > >  		return false;
> > > >  
> > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > +		return false;
> > > 
> > > I don't think this is the right fix.  This makes it racy because the
> > > link may go down after we call advk_pcie_valid_device() but before we
> > > perform the config read.
> > 
> > Yes, it is racy, but I do not think it cause problems. Trying to read
> > PCIe registers when device is not connected cause just those timeouts,
> > printing error message and increased delay in advk_pcie_wait_pio() due
> > to polling loop. This patch reduce unnecessary access to PCIe registers
> > when advk_pcie_wait_pio() polling just fail.
> > 
> > I think it is a good idea to not call blocking advk_pcie_wait_pio() when
> > it is not needed. We could have faster enumeration of PCIe buses when
> > card is not connected.
> 
> Maybe advk_pcie_check_pio_status() and advk_pcie_wait_pio() could be
> combined so we could get the correct error status as soon as it's
> available, without waiting for a timeout?

Any idea how to achieve that?

The first call is the polling function advk_pcie_wait_pio(), and the
second call is advk_pcie_check_pio_status(), which just reads the status
register and prints an error message to dmesg.

So to me it looks like combining these two functions into one would not
change anything. We always need to run the polling code before checking
the status register, and therefore need to wait for the timeout, unless
something like this proposed patch is used (to skip the whole register
access when it would fail).
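
The two-step flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the driver's code: every demo_* name, the retry limit, and the "status 0 means OK" convention are hypothetical, and the hardware is replaced by a small struct. It shows why merging the two steps gains nothing: the status is only looked at after polling has finished, so a transfer that never completes always costs the full retry budget.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RETRY_COUNT 5  /* hypothetical limit; the real driver's value differs */

/* Hypothetical model of one PIO transfer. */
struct demo_pio {
	int      busy_polls;  /* polls that report "busy" first; negative = never completes */
	uint32_t status;      /* only meaningful once the transfer has completed */
	int      polls_made;  /* how many times the completion flag was polled */
};

/* Models reading the "transfer done" flag once. */
static bool demo_pio_done(struct demo_pio *pio)
{
	pio->polls_made++;
	return pio->busy_polls >= 0 && pio->polls_made > pio->busy_polls;
}

/* Step 1: poll until the transfer completes or the retries run out. */
static int demo_wait_pio(struct demo_pio *pio)
{
	for (int i = 0; i < DEMO_RETRY_COUNT; i++) {
		if (demo_pio_done(pio))
			return 0;
		/* a real driver would delay between polls here */
	}
	return -1;  /* timed out */
}

/* Step 2: interpret the status, which is valid only after completion. */
static int demo_check_pio_status(const struct demo_pio *pio)
{
	return pio->status == 0 ? 0 : -1;
}

/* The combined flow: the status check can never run before polling ends. */
static int demo_transfer(struct demo_pio *pio)
{
	if (demo_wait_pio(pio) < 0)
		return -1;
	return demo_check_pio_status(pio);
}
```

In this model a transfer with a negative busy_polls value always consumes all DEMO_RETRY_COUNT polls before failing, which corresponds to the delay seen when no card is connected.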

> In any event, the "return PCIBIOS_SET_FAILED" needs to be fixed.  Most
> callers of config read do not check for failure, but most of the ones
> that do, check for "val == ~0".  Only a few check for a status of
> other than PCIBIOS_SUCCESSFUL.
> 
> > > I have no objection to removing the "config read/write timed out"
> > > message.  The "return PCIBIOS_SET_FAILED" in the read case probably
> > > should be augmented by setting "*val = 0xffffffff".

Now I see: "*val = 0xffffffff" really should be set when
advk_pcie_rd_conf() fails.

> > > >  	return true;
> > > >  }
> > > >  
> > > > -- 
> > > > 2.20.1
> > > >
Pali Rohár June 30, 2020, 12:31 p.m. UTC | #5
Hello!

On Friday 29 May 2020 10:30:13 Pali Rohár wrote:
> On Thursday 28 May 2020 11:49:38 Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > > root bridge, the aardvark driver throws the following error message:
> > > > > 
> > > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > > 
> > > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > > 
> > > > > Extend check in advk_pcie_valid_device() function for validating
> > > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > > as Not Found and the driver does not try to access these registers.
> > > > > 
> > > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > > ---
> > > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > > >  		return false;
> > > > >  
> > > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > > +		return false;
> > > > 
> > > > I don't think this is the right fix.  This makes it racy because the
> > > > link may go down after we call advk_pcie_valid_device() but before we
> > > > perform the config read.
> > > 
> > > Yes, it is racy, but I do not think it cause problems. Trying to read
> > > PCIe registers when device is not connected cause just those timeouts,
> > > printing error message and increased delay in advk_pcie_wait_pio() due
> > > to polling loop. This patch reduce unnecessary access to PCIe registers
> > > when advk_pcie_wait_pio() polling just fail.
> > > 
> > > I think it is a good idea to not call blocking advk_pcie_wait_pio() when
> > > it is not needed. We could have faster enumeration of PCIe buses when
> > > card is not connected.
> > 
> > Maybe advk_pcie_check_pio_status() and advk_pcie_wait_pio() could be
> > combined so we could get the correct error status as soon as it's
> > available, without waiting for a timeout?
> 
> Any idea how to achieve it?
> 
> First call is polling function advk_pcie_wait_pio() and second call is
> advk_pcie_check_pio_status() which just reads status register and prints
> error message to dmesg.
> 
> So for me it looks like that combining these two functions into one does
> not change anything. We always need to call polling code prior to
> checking status register. And therefore need to wait for timeout. Unless
> something like in this proposed patch is not used (to skip whole
> register access if it would fail).

So to answer your question: the correct status can be retrieved only
after waiting for the timeout, because the status becomes available only
once the timeout expires.

Therefore my proposed patch, in this (or some other) form, is needed if
we want to avoid reading from registers and waiting for an answer when no
card is connected.

I would really like to see this issue fixed, so that booting the Linux
kernel on a board without a connected PCIe card is not delayed.

Thomas, Lorenzo, Bjorn: do you have any idea how to fix it differently?
If not, could my proposed patch be accepted in some form?

> > In any event, the "return PCIBIOS_SET_FAILED" needs to be fixed.  Most
> > callers of config read do not check for failure, but most of the ones
> > that do, check for "val == ~0".  Only a few check for a status of
> > other than PCIBIOS_SUCCESSFUL.
> > 
> > > > I have no objection to removing the "config read/write timed out"
> > > > message.  The "return PCIBIOS_SET_FAILED" in the read case probably
> > > > should be augmented by setting "*val = 0xffffffff".
> 
> Now I see, "*val = 0xffffffff" should be really set when function
> advk_pcie_rd_conf() fails.

I have already sent a separate patch which fixes this issue.

> > > > >  	return true;
> > > > >  }
> > > > >  
> > > > > -- 
> > > > > 2.20.1
> > > > >
Bjorn Helgaas June 30, 2020, 1:51 p.m. UTC | #6
On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > root bridge, the aardvark driver throws the following error message:
> > > 
> > >   advk-pcie d0070000.pcie: config read/write timed out
> > > 
> > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > 
> > > Extend check in advk_pcie_valid_device() function for validating
> > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > as Not Found and the driver does not try to access these registers.
> > > 
> > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > ---
> > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > index 90ff291c24f0..53a4cfd7d377 100644
> > > --- a/drivers/pci/controller/pci-aardvark.c
> > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > >  		return false;
> > >  
> > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > +		return false;
> > 
> > I don't think this is the right fix.  This makes it racy because the
> > link may go down after we call advk_pcie_valid_device() but before we
> > perform the config read.
> 
> Yes, it is racy, but I do not think it cause problems. Trying to read
> PCIe registers when device is not connected cause just those timeouts,
> printing error message and increased delay in advk_pcie_wait_pio() due
> to polling loop. This patch reduce unnecessary access to PCIe registers
> when advk_pcie_wait_pio() polling just fail.

What happens when the device is removed after advk_pcie_link_up()
returns true, but before we actually do the config access?
Pali Rohár June 30, 2020, 2:04 p.m. UTC | #7
On Tuesday 30 June 2020 08:51:27 Bjorn Helgaas wrote:
> On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > root bridge, the aardvark driver throws the following error message:
> > > > 
> > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > 
> > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > 
> > > > Extend check in advk_pcie_valid_device() function for validating
> > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > as Not Found and the driver does not try to access these registers.
> > > > 
> > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > ---
> > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > >  		return false;
> > > >  
> > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > +		return false;
> > > 
> > > I don't think this is the right fix.  This makes it racy because the
> > > link may go down after we call advk_pcie_valid_device() but before we
> > > perform the config read.
> > 
> > Yes, it is racy, but I do not think it cause problems. Trying to read
> > PCIe registers when device is not connected cause just those timeouts,
> > printing error message and increased delay in advk_pcie_wait_pio() due
> > to polling loop. This patch reduce unnecessary access to PCIe registers
> > when advk_pcie_wait_pio() polling just fail.
> 
> What happens when the device is removed after advk_pcie_link_up()
> returns true, but before we actually do the config access?

Do you mean physically removing the device at runtime? I was told that
our board would crash or reset. Removing a device from a mini PCIe slot
without powering off is not supported.

Anyway, currently we try to read from device registers even when no
device is connected. So if advk_pcie_link_up() returns true and the
device then goes away (with the board and kernel somehow still alive), I
guess it would behave the same as without this patch: the kernel starts
the register read and waits until the timeout expires. As the device is
not connected there is no answer, so the kernel prints the error message
to dmesg (same as in the commit message) and returns an error that the
read failed.
Bjorn Helgaas June 30, 2020, 2:58 p.m. UTC | #8
On Tue, Jun 30, 2020 at 04:04:20PM +0200, Pali Rohár wrote:
> On Tuesday 30 June 2020 08:51:27 Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > > root bridge, the aardvark driver throws the following error message:
> > > > > 
> > > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > > 
> > > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > > 
> > > > > Extend check in advk_pcie_valid_device() function for validating
> > > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > > as Not Found and the driver does not try to access these registers.
> > > > > 
> > > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > > ---
> > > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > > >  		return false;
> > > > >  
> > > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > > +		return false;
> > > > 
> > > > I don't think this is the right fix.  This makes it racy because the
> > > > link may go down after we call advk_pcie_valid_device() but before we
> > > > perform the config read.
> > > 
> > > Yes, it is racy, but I do not think it cause problems. Trying to read
> > > PCIe registers when device is not connected cause just those timeouts,
> > > printing error message and increased delay in advk_pcie_wait_pio() due
> > > to polling loop. This patch reduce unnecessary access to PCIe registers
> > > when advk_pcie_wait_pio() polling just fail.
> > 
> > What happens when the device is removed after advk_pcie_link_up()
> > returns true, but before we actually do the config access?
> 
> Do you mean to remove device physically at runtime? I was told that our
> board would crash or issue reset. Removing device from mini PCIe slot
> without power off is not supported.

Right, I don't think PCIe mini cards support hotplug.

> Anyway, currently we are trying to read from device registers even when
> no device is connected. So when advk_pcie_link_up() returns true and
> after that device is not connected (somehow board and kernel would be
> still alive) I guess that it would behave as without applying this
> patch. So kernel starts reading from register and would wait until
> timeout expires. As device is not connected there would be no answer,
> so kernel print error message to dmesg (same as in commit message) and
> returns error that read failed.

OK, so if I understand correctly, checking advk_pcie_link_up() is
strictly an optimization.  If we guess wrong (e.g., after calling
advk_pcie_link_up(), the link went down because the card was removed,
DPC triggered, etc), the only bad thing is that we wait for a timeout;
it never causes a crash.

If that's the case, I'm fine with this.  But please add a comment to
that effect.

I think several other drivers check for the link being up because we
actually crash if we try to read config space when the link is down.
That's what I was trying to avoid here.

Bjorn
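
A rough sketch of the behaviour agreed on above, including the kind of comment being requested. It is not the driver's code: struct demo_pcie, demo_valid_device() and demo_guarded_rd_conf() are hypothetical, the link state is modelled with two plain booleans so the race can be reproduced, and the PCIBIOS_* constants are redefined with the kernel's numeric values to keep the fragment standalone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same numeric values as the kernel's <linux/pci.h>, redefined for a standalone build. */
#define PCIBIOS_SUCCESSFUL       0x00
#define PCIBIOS_DEVICE_NOT_FOUND 0x86
#define PCIBIOS_SET_FAILED       0x88

/* Hypothetical controller model. */
struct demo_pcie {
	int  root_bus_nr;
	bool link_up_at_check;   /* what the link-up check observes */
	bool link_up_at_access;  /* link state when the transfer actually runs */
	int  timeouts;           /* number of reads that had to wait for a timeout */
};

static bool demo_valid_device(const struct demo_pcie *pcie, int bus_nr, int slot)
{
	if (bus_nr == pcie->root_bus_nr && slot != 0)
		return false;

	/*
	 * Optimization only: skip the access when the link is already known
	 * to be down. The check is racy, but if the link drops afterwards the
	 * read simply waits for the timeout and fails; it does not crash.
	 */
	if (bus_nr != pcie->root_bus_nr && !pcie->link_up_at_check)
		return false;

	return true;
}

static int demo_guarded_rd_conf(struct demo_pcie *pcie, int bus_nr, int slot,
				uint32_t *val)
{
	if (!demo_valid_device(pcie, bus_nr, slot)) {
		*val = 0xffffffff;
		return PCIBIOS_DEVICE_NOT_FOUND;  /* fast path, no polling */
	}

	if (bus_nr != pcie->root_bus_nr && !pcie->link_up_at_access) {
		pcie->timeouts++;                 /* slow path: full timeout spent */
		*val = 0xffffffff;
		return PCIBIOS_SET_FAILED;
	}

	*val = 0x1234abcd;                        /* arbitrary demo register value */
	return PCIBIOS_SUCCESSFUL;
}
```

Setting link_up_at_check to true and link_up_at_access to false reproduces the race: the read still fails cleanly, only more slowly, which is the sense in which the check is an optimization rather than a correctness requirement on this hardware.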
Pali Rohár July 1, 2020, 8:08 a.m. UTC | #9
On Tuesday 30 June 2020 09:58:48 Bjorn Helgaas wrote:
> On Tue, Jun 30, 2020 at 04:04:20PM +0200, Pali Rohár wrote:
> > On Tuesday 30 June 2020 08:51:27 Bjorn Helgaas wrote:
> > > On Thu, May 28, 2020 at 06:38:09PM +0200, Pali Rohár wrote:
> > > > On Thursday 28 May 2020 11:26:04 Bjorn Helgaas wrote:
> > > > > On Thu, May 28, 2020 at 04:31:41PM +0200, Pali Rohár wrote:
> > > > > > When there is no PCIe card connected and advk_pcie_rd_conf() or
> > > > > > advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated
> > > > > > root bridge, the aardvark driver throws the following error message:
> > > > > > 
> > > > > >   advk-pcie d0070000.pcie: config read/write timed out
> > > > > > 
> > > > > > Obviously accessing PCIe registers of disconnected card is not possible.
> > > > > > 
> > > > > > Extend check in advk_pcie_valid_device() function for validating
> > > > > > availability of PCIe bus. If PCIe link is down, then the device is marked
> > > > > > as Not Found and the driver does not try to access these registers.
> > > > > > 
> > > > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > > > ---
> > > > > >  drivers/pci/controller/pci-aardvark.c | 3 +++
> > > > > >  1 file changed, 3 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
> > > > > > index 90ff291c24f0..53a4cfd7d377 100644
> > > > > > --- a/drivers/pci/controller/pci-aardvark.c
> > > > > > +++ b/drivers/pci/controller/pci-aardvark.c
> > > > > > @@ -644,6 +644,9 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
> > > > > >  	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
> > > > > >  		return false;
> > > > > >  
> > > > > > +	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
> > > > > > +		return false;
> > > > > 
> > > > > I don't think this is the right fix.  This makes it racy because the
> > > > > link may go down after we call advk_pcie_valid_device() but before we
> > > > > perform the config read.
> > > > 
> > > > Yes, it is racy, but I do not think it cause problems. Trying to read
> > > > PCIe registers when device is not connected cause just those timeouts,
> > > > printing error message and increased delay in advk_pcie_wait_pio() due
> > > > to polling loop. This patch reduce unnecessary access to PCIe registers
> > > > when advk_pcie_wait_pio() polling just fail.
> > > 
> > > What happens when the device is removed after advk_pcie_link_up()
> > > returns true, but before we actually do the config access?
> > 
> > Do you mean to remove device physically at runtime? I was told that our
> > board would crash or issue reset. Removing device from mini PCIe slot
> > without power off is not supported.
> 
> Right, I don't think PCIe mini cards support hotplug.
> 
> > Anyway, currently we are trying to read from device registers even when
> > no device is connected. So when advk_pcie_link_up() returns true and
> > after that device is not connected (somehow board and kernel would be
> > still alive) I guess that it would behave as without applying this
> > patch. So kernel starts reading from register and would wait until
> > timeout expires. As device is not connected there would be no answer,
> > so kernel print error message to dmesg (same as in commit message) and
> > returns error that read failed.
> 
> OK, so if I understand correctly, checking advk_pcie_link_up() is
> strictly an optimization.  If we guess wrong (e.g., after calling
> advk_pcie_link_up(), the link went down because the card was removed,
> DPC triggered, etc), the only bad thing is that we wait for a timeout;
> it never causes a crash.

Yes.

> If that's the case, I'm fine with this.  But please add a comment to
> that effect.

Ok, I will send V2 with updated commit message.

> I think several other drivers check for the link being up because we
> actually crash if we try to read config space when the link is down.
> That's what I was trying to avoid here.
> 
> Bjorn
Patch

diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 90ff291c24f0..53a4cfd7d377 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -644,6 +644,9 @@  static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
 	if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0)
 		return false;
 
+	if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie))
+		return false;
+
 	return true;
 }