[1/1,SRU,Bionic] PCI: Restore config space on runtime resume despite being unbound

Message ID 20180626075530.9200-2-acelan.kao@canonical.com
State New
Headers show
Series
  • Fix Nvidia fails after switching its mode
Related show

Commit Message

AceLan Kao June 26, 2018, 7:55 a.m.
From: "Rafael J. Wysocki" <rjw@rjwysocki.net>

BugLink: https://bugs.launchpad.net/bugs/1778658

We leave PCI devices not bound to a driver in D0 during runtime suspend.
But they may have a parent which is bound and can be transitioned to
D3cold at runtime.  Once the parent goes to D3cold, the unbound child
may go to D3cold as well.  When the child goes to D3cold, its internal
state, including configuration of BARs, MSI, ASPM, MPS, etc., is lost.

One example are recent hybrid graphics laptops which cut power to the
discrete GPU when the root port above it goes to ACPI power state D3.
Users may provoke this by unbinding the GPU driver and allowing runtime
PM on the GPU via sysfs:  The PM core will then treat the GPU as
"suspended", which in turn allows the root port to runtime suspend,
causing the power resources listed in its _PR3 object to be powered off.
The GPU's BARs will be uninitialized when a driver later probes it.

Another example are hybrid graphics laptops where the GPU itself (rather
than the root port) is capable of runtime suspending to D3cold.  If the
GPU's integrated HDA controller is not bound and the GPU's driver
decides to runtime suspend to D3cold, the HDA controller's BARs will be
uninitialized when a driver later probes it.

Fix by saving and restoring config space over a runtime suspend cycle
even if the device is not bound.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Peter Wu <peter@lekensteyn.nl>              # Nvidia Optimus
Tested-by: Lukas Wunner <lukas@wunner.de>              # MacBook Pro
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[lukas: add commit message, bikeshed code comments for clarity]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://patchwork.freedesktop.org/patch/msgid/92fb6e6ae2730915eb733c08e2f76c6a313e3860.1520068884.git.lukas@wunner.de
(cherry picked from commit 5775b843a619b3c93f946e2b55a208d9f0f48b59)
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
---
 drivers/pci/pci-driver.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

Comments

Kleber Souza July 27, 2018, 8:37 a.m. | #1
On 06/26/18 09:55, AceLan Kao wrote:
> From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> 
> BugLink: https://bugs.launchpad.net/bugs/1778658
> 
> We leave PCI devices not bound to a driver in D0 during runtime suspend.
> But they may have a parent which is bound and can be transitioned to
> D3cold at runtime.  Once the parent goes to D3cold, the unbound child
> may go to D3cold as well.  When the child goes to D3cold, its internal
> state, including configuration of BARs, MSI, ASPM, MPS, etc., is lost.
> 
> One example are recent hybrid graphics laptops which cut power to the
> discrete GPU when the root port above it goes to ACPI power state D3.
> Users may provoke this by unbinding the GPU driver and allowing runtime
> PM on the GPU via sysfs:  The PM core will then treat the GPU as
> "suspended", which in turn allows the root port to runtime suspend,
> causing the power resources listed in its _PR3 object to be powered off.
> The GPU's BARs will be uninitialized when a driver later probes it.
> 
> Another example are hybrid graphics laptops where the GPU itself (rather
> than the root port) is capable of runtime suspending to D3cold.  If the
> GPU's integrated HDA controller is not bound and the GPU's driver
> decides to runtime suspend to D3cold, the HDA controller's BARs will be
> uninitialized when a driver later probes it.
> 
> Fix by saving and restoring config space over a runtime suspend cycle
> even if the device is not bound.
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> Tested-by: Peter Wu <peter@lekensteyn.nl>              # Nvidia Optimus
> Tested-by: Lukas Wunner <lukas@wunner.de>              # MacBook Pro
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> [lukas: add commit message, bikeshed code comments for clarity]
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Link: https://patchwork.freedesktop.org/patch/msgid/92fb6e6ae2730915eb733c08e2f76c6a313e3860.1520068884.git.lukas@wunner.de
> (cherry picked from commit 5775b843a619b3c93f946e2b55a208d9f0f48b59)
> Signed-off-by: AceLan Kao <acelan.kao@canonical.com>

Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

> ---
>  drivers/pci/pci-driver.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d79dbc377b9c..8ed242a668c8 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1226,11 +1226,14 @@ static int pci_pm_runtime_suspend(struct device *dev)
>  	int error;
>  
>  	/*
> -	 * If pci_dev->driver is not set (unbound), the device should
> -	 * always remain in D0 regardless of the runtime PM status
> +	 * If pci_dev->driver is not set (unbound), we leave the device in D0,
> +	 * but it may go to D3cold when the bridge above it runtime suspends.
> +	 * Save its config space in case that happens.
>  	 */
> -	if (!pci_dev->driver)
> +	if (!pci_dev->driver) {
> +		pci_save_state(pci_dev);
>  		return 0;
> +	}
>  
>  	if (!pm || !pm->runtime_suspend)
>  		return -ENOSYS;
> @@ -1278,16 +1281,18 @@ static int pci_pm_runtime_resume(struct device *dev)
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>  
>  	/*
> -	 * If pci_dev->driver is not set (unbound), the device should
> -	 * always remain in D0 regardless of the runtime PM status
> +	 * Restoring config space is necessary even if the device is not bound
> +	 * to a driver because although we left it in D0, it may have gone to
> +	 * D3cold when the bridge above it runtime suspended.
>  	 */
> +	pci_restore_standard_config(pci_dev);
> +
>  	if (!pci_dev->driver)
>  		return 0;
>  
>  	if (!pm || !pm->runtime_resume)
>  		return -ENOSYS;
>  
> -	pci_restore_standard_config(pci_dev);
>  	pci_fixup_device(pci_fixup_resume_early, pci_dev);
>  	pci_enable_wake(pci_dev, PCI_D0, false);
>  	pci_fixup_device(pci_fixup_resume, pci_dev);
>

Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d79dbc377b9c..8ed242a668c8 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1226,11 +1226,14 @@  static int pci_pm_runtime_suspend(struct device *dev)
 	int error;
 
 	/*
-	 * If pci_dev->driver is not set (unbound), the device should
-	 * always remain in D0 regardless of the runtime PM status
+	 * If pci_dev->driver is not set (unbound), we leave the device in D0,
+	 * but it may go to D3cold when the bridge above it runtime suspends.
+	 * Save its config space in case that happens.
 	 */
-	if (!pci_dev->driver)
+	if (!pci_dev->driver) {
+		pci_save_state(pci_dev);
 		return 0;
+	}
 
 	if (!pm || !pm->runtime_suspend)
 		return -ENOSYS;
@@ -1278,16 +1281,18 @@  static int pci_pm_runtime_resume(struct device *dev)
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
 	/*
-	 * If pci_dev->driver is not set (unbound), the device should
-	 * always remain in D0 regardless of the runtime PM status
+	 * Restoring config space is necessary even if the device is not bound
+	 * to a driver because although we left it in D0, it may have gone to
+	 * D3cold when the bridge above it runtime suspended.
 	 */
+	pci_restore_standard_config(pci_dev);
+
 	if (!pci_dev->driver)
 		return 0;
 
 	if (!pm || !pm->runtime_resume)
 		return -ENOSYS;
 
-	pci_restore_standard_config(pci_dev);
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 	pci_enable_wake(pci_dev, PCI_D0, false);
 	pci_fixup_device(pci_fixup_resume, pci_dev);