diff mbox series

[2/2] PCI: Do not wait for disconnected devices when resuming

Message ID 20240129112710.2852-3-ilpo.jarvinen@linux.intel.com
State New
Headers show
Series PCI: Fix disconnect related issues | expand

Commit Message

Ilpo Järvinen Jan. 29, 2024, 11:27 a.m. UTC
On runtime resume, pci_dev_wait() is called:
  pci_pm_runtime_resume()
    pci_pm_bridge_power_up_actions()
      pci_bridge_wait_for_secondary_bus()
        pci_dev_wait()

While a device is runtime suspended along with its PCIe hierarchy, the
device could get disconnected. In such case, the link will not come up
no matter how log pci_dev_wait() waits for it.

Besides the above mentioned case, there could be other ways to get the
device disconnected while pci_dev_wait() is waiting for the link to
come up.

Make pci_dev_wait() to exit if the device is already disconnected to
avoid unnecessary delay. As disconnected device is not really even a
failure in the same sense as link failing to come up for whatever
reason, return 0 instead of errno.

Also make sure compiler does not become too clever with
dev->error_state and use READ_ONCE() to force a fetch for the
up-to-date value.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pci.c | 5 +++++
 drivers/pci/pci.h | 4 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Jan. 29, 2024, 6:55 p.m. UTC | #1
On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote:
> On runtime resume, pci_dev_wait() is called:
>   pci_pm_runtime_resume()
>     pci_pm_bridge_power_up_actions()
>       pci_bridge_wait_for_secondary_bus()
>         pci_dev_wait()
> 
> While a device is runtime suspended along with its PCIe hierarchy, the
> device could get disconnected. In such case, the link will not come up
> no matter how log pci_dev_wait() waits for it.

s/PCIe/PCI/ (unless this is a PCIe-specific thing)
s/log/long/

> Besides the above mentioned case, there could be other ways to get the
> device disconnected while pci_dev_wait() is waiting for the link to
> come up.
> 
> Make pci_dev_wait() to exit if the device is already disconnected to
> avoid unnecessary delay. As disconnected device is not really even a
> failure in the same sense as link failing to come up for whatever
> reason, return 0 instead of errno.

The device being disconnected is not the same as a link failure.  Do
all the callers do the right thing if pci_dev_wait() returns success
when there's no device there?

> Also make sure compiler does not become too clever with
> dev->error_state and use READ_ONCE() to force a fetch for the
> up-to-date value.

I think we should have a comment there to say why READ_ONCE() is
needed.  Otherwise it's hard to know whether a future change might
make it unnecessary.

> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/pci.c | 5 +++++
>  drivers/pci/pci.h | 4 +++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d8f11a078924..ec9bf6c90312 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1250,6 +1250,11 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  	for (;;) {
>  		u32 id;
>  
> +		if (pci_dev_is_disconnected(dev)) {
> +			pci_dbg(dev, "disconnected; not waiting\n");
> +			return 0;
> +		}
> +
>  		pci_read_config_dword(dev, PCI_COMMAND, &id);
>  		if (!PCI_POSSIBLE_ERROR(id))
>  			break;
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 2336a8d1edab..563a275dff67 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -4,6 +4,8 @@
>  
>  #include <linux/pci.h>
>  
> +#include <asm/rwonce.h>
> +
>  /* Number of possible devfns: 0.0 to 1f.7 inclusive */
>  #define MAX_NR_DEVFNS 256
>  
> @@ -370,7 +372,7 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
>  
>  static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
>  {
> -	return dev->error_state == pci_channel_io_perm_failure;
> +	return READ_ONCE(dev->error_state) == pci_channel_io_perm_failure;
>  }
>  
>  /* pci_dev priv_flags */
> -- 
> 2.39.2
>
Ilpo Järvinen Jan. 30, 2024, 1:15 p.m. UTC | #2
On Mon, 29 Jan 2024, Bjorn Helgaas wrote:

> On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote:
> > On runtime resume, pci_dev_wait() is called:
> >   pci_pm_runtime_resume()
> >     pci_pm_bridge_power_up_actions()
> >       pci_bridge_wait_for_secondary_bus()
> >         pci_dev_wait()
> > 
> > While a device is runtime suspended along with its PCIe hierarchy, the
> > device could get disconnected. In such case, the link will not come up
> > no matter how log pci_dev_wait() waits for it.
> 
> s/PCIe/PCI/ (unless this is a PCIe-specific thing)
> s/log/long/
> 
> > Besides the above mentioned case, there could be other ways to get the
> > device disconnected while pci_dev_wait() is waiting for the link to
> > come up.
> > 
> > Make pci_dev_wait() to exit if the device is already disconnected to
> > avoid unnecessary delay. As disconnected device is not really even a
> > failure in the same sense as link failing to come up for whatever
> > reason, return 0 instead of errno.
> 
> The device being disconnected is not the same as a link failure.

This we agree and it's what I tried to write above.

> Do
> all the callers do the right thing if pci_dev_wait() returns success
> when there's no device there?

It's a bit complicated. I honestly don't know what is the best approach 
here so I'm very much open to your suggestion what would be preferrable.

There are two main use cases (more than two callers but they seem boil 
down to two use cases).

One use case is reset related functions and those would probably prefer to 
have error returned if the wait, and thus reset, failed.

Then the another is wait for buses, that is, 
pci_bridge_wait_for_secondary_bus() which return 0 if there's no device 
(wait successful). For it, it would make sense to return 0 also when 
device is disconnected because it seems analoguous to the case where 
there's no device in the first place.

Perhaps it would be better to return -ENOTTY from pci_dev_wait() and check 
for that disconnected condition inside 
pci_bridge_wait_for_secondary_bus()? To further complicate things, 
however, DPC also depends on the return value of 
pci_bridge_wait_for_secondary_bus() and from its perspective, returning 
error when there is a device that is disconnected might be the desirable 
alternative (but I'm not fully sure how everything in DPC works but I 
highly suspect I'm correct here).

Either way, the fix itself does care and seemed to work regardless of the 
actual value returned by pci_dev_wait().

> > Also make sure compiler does not become too clever with
> > dev->error_state and use READ_ONCE() to force a fetch for the
> > up-to-date value.
> 
> I think we should have a comment there to say why READ_ONCE() is
> needed.  Otherwise it's hard to know whether a future change might
> make it unnecessary.

Sure, I'll add a comment there.
Ilpo Järvinen Feb. 2, 2024, 5:03 p.m. UTC | #3
On Tue, 30 Jan 2024, Ilpo Järvinen wrote:

> On Mon, 29 Jan 2024, Bjorn Helgaas wrote:
> 
> > On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote:
> > > On runtime resume, pci_dev_wait() is called:
> > >   pci_pm_runtime_resume()
> > >     pci_pm_bridge_power_up_actions()
> > >       pci_bridge_wait_for_secondary_bus()
> > >         pci_dev_wait()
> > > 
> > > While a device is runtime suspended along with its PCIe hierarchy, the
> > > device could get disconnected. In such case, the link will not come up
> > > no matter how log pci_dev_wait() waits for it.
> > 
> > s/PCIe/PCI/ (unless this is a PCIe-specific thing)
> > s/log/long/
> > 
> > > Besides the above mentioned case, there could be other ways to get the
> > > device disconnected while pci_dev_wait() is waiting for the link to
> > > come up.
> > > 
> > > Make pci_dev_wait() to exit if the device is already disconnected to
> > > avoid unnecessary delay. As disconnected device is not really even a
> > > failure in the same sense as link failing to come up for whatever
> > > reason, return 0 instead of errno.
> > 
> > The device being disconnected is not the same as a link failure.
> 
> This we agree and it's what I tried to write above.
> 
> > Do
> > all the callers do the right thing if pci_dev_wait() returns success
> > when there's no device there?
> 
> It's a bit complicated. I honestly don't know what is the best approach 
> here so I'm very much open to your suggestion what would be preferrable.
> 
> There are two main use cases (more than two callers but they seem boil 
> down to two use cases).
> 
> One use case is reset related functions and those would probably prefer to 
> have error returned if the wait, and thus reset, failed.
> 
> Then the another is wait for buses, that is, 
> pci_bridge_wait_for_secondary_bus() which return 0 if there's no device 
> (wait successful). For it, it would make sense to return 0 also when 
> device is disconnected because it seems analoguous to the case where 
> there's no device in the first place.
> 
> Perhaps it would be better to return -ENOTTY from pci_dev_wait() and check 
> for that disconnected condition inside 
> pci_bridge_wait_for_secondary_bus()? To further complicate things, 
> however, DPC also depends on the return value of 
> pci_bridge_wait_for_secondary_bus() and from its perspective, returning 
> error when there is a device that is disconnected might be the desirable 
> alternative (but I'm not fully sure how everything in DPC works but I 
> highly suspect I'm correct here).

Just to note here I intend to reverse the return to -ENOTTY in v2. 
It is easier and doing anything more complicated than that felt 
over-engineering because it would have just avoided marking same 
disconnected devices disconnected for another time which is harmless.
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d8f11a078924..ec9bf6c90312 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1250,6 +1250,11 @@  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 	for (;;) {
 		u32 id;
 
+		if (pci_dev_is_disconnected(dev)) {
+			pci_dbg(dev, "disconnected; not waiting\n");
+			return 0;
+		}
+
 		pci_read_config_dword(dev, PCI_COMMAND, &id);
 		if (!PCI_POSSIBLE_ERROR(id))
 			break;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2336a8d1edab..563a275dff67 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -4,6 +4,8 @@ 
 
 #include <linux/pci.h>
 
+#include <asm/rwonce.h>
+
 /* Number of possible devfns: 0.0 to 1f.7 inclusive */
 #define MAX_NR_DEVFNS 256
 
@@ -370,7 +372,7 @@  static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
 
 static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 {
-	return dev->error_state == pci_channel_io_perm_failure;
+	return READ_ONCE(dev->error_state) == pci_channel_io_perm_failure;
 }
 
 /* pci_dev priv_flags */