i2c: designware: Do not allow i2c_dw_xfer() calls while suspended

Message ID 20180923140020.31074-1-hdegoede@redhat.com
State New
Headers show
Series
  • i2c: designware: Do not allow i2c_dw_xfer() calls while suspended
Related show

Commit Message

Hans de Goede Sept. 23, 2018, 2 p.m.
On most Intel Bay- and Cherry-Trail systems the PMIC is connected over I2C
and the PMIC is accessed through various means by the _PS0 and _PS3 ACPI
methods (power on / off methods) of various devices.

This leads to suspend/resume ordering problems where a device may be
resumed and get its _PS0 method executed before the I2C controller is
resumed. On Cherry Trail this leads to errors like these:

     i2c_designware 808622C1:06: controller timed out
     ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
     ACPI Error: Method parse/execution failed \_SB.P18W._ON, AE_ERROR
     video LNXVIDEO:00: Failed to change power state to D0

But on Bay Trail this caused I2C reads to seem to succeed, but they end
up returning wrong data, which ends up getting written back by the typical
read-modify-write cycle done to turn on various power-resources.

Debugging the problems caused by this silent data corruption is quite
nasty. This commit adds a check which disallows i2c_dw_xfer() calls to
happen until the controller's resume method has completed.

Which turns the silent data corruption into getting these errors in
dmesg instead:

    i2c_designware 80860F41:04: Error i2c_dw_xfer call while suspended
    ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
    ACPI Error: Method parse/execution failed \_SB.PCI0.GFX0._PS0, AE_ERROR

Which is much better.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/i2c/busses/i2c-designware-core.h    | 2 ++
 drivers/i2c/busses/i2c-designware-master.c  | 6 ++++++
 drivers/i2c/busses/i2c-designware-pcidrv.c  | 7 ++++++-
 drivers/i2c/busses/i2c-designware-platdrv.c | 3 +++
 4 files changed, 17 insertions(+), 1 deletion(-)

Comments

Jarkko Nikula Sept. 25, 2018, 2:03 p.m. | #1
On 09/23/2018 05:00 PM, Hans de Goede wrote:
> On most Intel Bay- and Cherry-Trail systems the PMIC is connected over I2C
> and the PMIC is accessed through various means by the _PS0 and _PS3 ACPI
> methods (power on / off methods) of various devices.
> 
> This leads to suspend/resume ordering problems where a device may be
> resumed and get its _PS0 method executed before the I2C controller is
> resumed. On Cherry Trail this leads to errors like these:
> 
>       i2c_designware 808622C1:06: controller timed out
>       ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
>       ACPI Error: Method parse/execution failed \_SB.P18W._ON, AE_ERROR
>       video LNXVIDEO:00: Failed to change power state to D0
> 
> But on Bay Trail this caused I2C reads to seem to succeed, but they end
> up returning wrong data, which ends up getting written back by the typical
> read-modify-write cycle done to turn on various power-resources.
> 
> Debugging the problems caused by this silent data corruption is quite
> nasty. This commit adds a check which disallows i2c_dw_xfer() calls to
> happen until the controller's resume method has completed.
> 
> Which turns the silent data corruption into getting these errors in
> dmesg instead:
> 
>      i2c_designware 80860F41:04: Error i2c_dw_xfer call while suspended
>      ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
>      ACPI Error: Method parse/execution failed \_SB.PCI0.GFX0._PS0, AE_ERROR
> 
> Which is much better.
> 
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>   drivers/i2c/busses/i2c-designware-core.h    | 2 ++
>   drivers/i2c/busses/i2c-designware-master.c  | 6 ++++++
>   drivers/i2c/busses/i2c-designware-pcidrv.c  | 7 ++++++-
>   drivers/i2c/busses/i2c-designware-platdrv.c | 3 +++
>   4 files changed, 17 insertions(+), 1 deletion(-)
> 
...
> diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
> index 2a630ac35ba2..77a1ab33a3d4 100644
> --- a/drivers/i2c/busses/i2c-designware-master.c
> +++ b/drivers/i2c/busses/i2c-designware-master.c
> @@ -424,6 +424,12 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>   
>   	pm_runtime_get_sync(dev->dev);
>   
> +	if (dev->suspended) {
> +		dev_err(dev->dev, "Error %s call while suspended\n", __func__);
> +		ret = -EIO;

I don't know for sure but would the -EBUSY be more correct here?
Hans de Goede Sept. 25, 2018, 3:26 p.m. | #2
Hi,

On 25-09-18 16:03, Jarkko Nikula wrote:
> On 09/23/2018 05:00 PM, Hans de Goede wrote:
>> On most Intel Bay- and Cherry-Trail systems the PMIC is connected over I2C
>> and the PMIC is accessed through various means by the _PS0 and _PS3 ACPI
>> methods (power on / off methods) of various devices.
>>
>> This leads to suspend/resume ordering problems where a device may be
>> resumed and get its _PS0 method executed before the I2C controller is
>> resumed. On Cherry Trail this leads to errors like these:
>>
>>       i2c_designware 808622C1:06: controller timed out
>>       ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
>>       ACPI Error: Method parse/execution failed \_SB.P18W._ON, AE_ERROR
>>       video LNXVIDEO:00: Failed to change power state to D0
>>
>> But on Bay Trail this caused I2C reads to seem to succeed, but they end
>> up returning wrong data, which ends up getting written back by the typical
>> read-modify-write cycle done to turn on various power-resources.
>>
>> Debugging the problems caused by this silent data corruption is quite
>> nasty. This commit adds a check which disallows i2c_dw_xfer() calls to
>> happen until the controller's resume method has completed.
>>
>> Which turns the silent data corruption into getting these errors in
>> dmesg instead:
>>
>>      i2c_designware 80860F41:04: Error i2c_dw_xfer call while suspended
>>      ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
>>      ACPI Error: Method parse/execution failed \_SB.PCI0.GFX0._PS0, AE_ERROR
>>
>> Which is much better.
>>
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>>   drivers/i2c/busses/i2c-designware-core.h    | 2 ++
>>   drivers/i2c/busses/i2c-designware-master.c  | 6 ++++++
>>   drivers/i2c/busses/i2c-designware-pcidrv.c  | 7 ++++++-
>>   drivers/i2c/busses/i2c-designware-platdrv.c | 3 +++
>>   4 files changed, 17 insertions(+), 1 deletion(-)
>>
> ...
>> diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
>> index 2a630ac35ba2..77a1ab33a3d4 100644
>> --- a/drivers/i2c/busses/i2c-designware-master.c
>> +++ b/drivers/i2c/busses/i2c-designware-master.c
>> @@ -424,6 +424,12 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>>       pm_runtime_get_sync(dev->dev);
>> +    if (dev->suspended) {
>> +        dev_err(dev->dev, "Error %s call while suspended\n", __func__);
>> +        ret = -EIO;
> 
> I don't know for sure but would the -EBUSY be more correct here?

I didn't use -EBUSY because I'm afraid some code may handle that in
a special manner and do or schedule a retry or some such, although
I guess that would be more expected of -EAGAIN.

Anyways with that said I'm fine with changing this to -EBUSY, that
was my initial thought on this to.

Please let me know how you want to proceed with this
(which error code you want to use).

Regards,

Hans
Wolfram Sang Sept. 25, 2018, 4:29 p.m. | #3
> Anyways with that said I'm fine with changing this to -EBUSY, that
> was my initial thought on this to.
> 
> Please let me know how you want to proceed with this
> (which error code you want to use).

FYI: I am just now in the process of designing this behaviour into the
core (like with a flag to enable it). It is open coded by various
drivers, and often with issues.
Hans de Goede Sept. 25, 2018, 5:25 p.m. | #4
Hi,

On 25-09-18 18:29, Wolfram Sang wrote:
> 
>> Anyways with that said I'm fine with changing this to -EBUSY, that
>> was my initial thought on this to.
>>
>> Please let me know how you want to proceed with this
>> (which error code you want to use).
> 
> FYI: I am just now in the process of designing this behaviour into the
> core (like with a flag to enable it). It is open coded by various
> drivers, and often with issues.

Sounds good, let me know when you've something ready for
review/testing.  I can reproduce the problem of accessing the
i2c bus while suspended at will by reverting one of the
ordering fixes I recently did, so I can happily test such
a generic fix.

Also I assume you want to not merge this patch then in favor
of going with the generic solution you are working on?

Regards,

Hans
Grygorii Strashko Sept. 25, 2018, 8:47 p.m. | #5
On 09/23/2018 09:00 AM, Hans de Goede wrote:
> On most Intel Bay- and Cherry-Trail systems the PMIC is connected over I2C
> and the PMIC is accessed through various means by the _PS0 and _PS3 ACPI
> methods (power on / off methods) of various devices.
> 
> This leads to suspend/resume ordering problems where a device may be
> resumed and get its _PS0 method executed before the I2C controller is
> resumed. On Cherry Trail this leads to errors like these:
> 
>       i2c_designware 808622C1:06: controller timed out
>       ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
>       ACPI Error: Method parse/execution failed \_SB.P18W._ON, AE_ERROR
>       video LNXVIDEO:00: Failed to change power state to D0
> 

Hm. There should some dependency from PM domain (or smth. like this) 
which, in turn should depends from i2c.

> But on Bay Trail this caused I2C reads to seem to succeed, but they end
> up returning wrong data, which ends up getting written back by the typical
> read-modify-write cycle done to turn on various power-resources.
> 
> Debugging the problems caused by this silent data corruption is quite
> nasty. This commit adds a check which disallows i2c_dw_xfer() calls to
> happen until the controller's resume method has completed.
> 
> Which turns the silent data corruption into getting these errors in
> dmesg instead:
> 
>      i2c_designware 80860F41:04: Error i2c_dw_xfer call while suspended
>      ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
>      ACPI Error: Method parse/execution failed \_SB.PCI0.GFX0._PS0, AE_ERROR
> 
> Which is much better.
> 
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>   drivers/i2c/busses/i2c-designware-core.h    | 2 ++
>   drivers/i2c/busses/i2c-designware-master.c  | 6 ++++++
>   drivers/i2c/busses/i2c-designware-pcidrv.c  | 7 ++++++-
>   drivers/i2c/busses/i2c-designware-platdrv.c | 3 +++
>   4 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/i2c/busses/i2c-designware-core.h b/drivers/i2c/busses/i2c-designware-core.h
> index 1b5a6d47f73d..31e0c84ab4e7 100644
> --- a/drivers/i2c/busses/i2c-designware-core.h
> +++ b/drivers/i2c/busses/i2c-designware-core.h
> @@ -215,6 +215,7 @@
>    * @disable_int: function to disable all interrupts
>    * @init: function to initialize the I2C hardware
>    * @mode: operation mode - DW_IC_MASTER or DW_IC_SLAVE
> + * @suspended: set to true if the controller is suspended
>    *
>    * HCNT and LCNT parameters can be used if the platform knows more accurate
>    * values than the one computed based only on the input clock frequency.
> @@ -268,6 +269,7 @@ struct dw_i2c_dev {
>   	int			(*init)(struct dw_i2c_dev *dev);
>   	int			mode;
>   	struct i2c_bus_recovery_info rinfo;
> +	bool			suspended;
>   };
>   
>   #define ACCESS_SWAP		0x00000001
> diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
> index 2a630ac35ba2..77a1ab33a3d4 100644
> --- a/drivers/i2c/busses/i2c-designware-master.c
> +++ b/drivers/i2c/busses/i2c-designware-master.c
> @@ -424,6 +424,12 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>   
>   	pm_runtime_get_sync(dev->dev);
>   
> +	if (dev->suspended) {
> +		dev_err(dev->dev, "Error %s call while suspended\n", __func__);
> +		ret = -EIO;
> +		goto done_nolock;
> +	}
> +
>   	reinit_completion(&dev->cmd_complete);
>   	dev->msgs = msgs;
>   	dev->msgs_num = num;
> diff --git a/drivers/i2c/busses/i2c-designware-pcidrv.c b/drivers/i2c/busses/i2c-designware-pcidrv.c
> index d50f80487214..76810deb2de6 100644
> --- a/drivers/i2c/busses/i2c-designware-pcidrv.c
> +++ b/drivers/i2c/busses/i2c-designware-pcidrv.c
> @@ -176,6 +176,7 @@ static int i2c_dw_pci_suspend(struct device *dev)
>   	struct pci_dev *pdev = to_pci_dev(dev);
>   	struct dw_i2c_dev *i_dev = pci_get_drvdata(pdev);
>   
> +	i_dev->suspended = true;

you should lock i2c adapter while setting suspended flag


>   	i_dev->disable(i_dev);
>   
>   	return 0;
> @@ -185,8 +186,12 @@ static int i2c_dw_pci_resume(struct device *dev)
>   {
>   	struct pci_dev *pdev = to_pci_dev(dev);
>   	struct dw_i2c_dev *i_dev = pci_get_drvdata(pdev);
> +	int ret;
>   
> -	return i_dev->init(i_dev);
> +	ret = i_dev->init(i_dev);
> +	i_dev->suspended = false;


> +
> +	return ret;
>   }
>   #endif
>   
> diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/drivers/i2c/busses/i2c-designware-platdrv.c
> index 19833f9aaefe..d4604bea78ad 100644
> --- a/drivers/i2c/busses/i2c-designware-platdrv.c
> +++ b/drivers/i2c/busses/i2c-designware-platdrv.c
> @@ -414,6 +414,8 @@ static int dw_i2c_plat_suspend(struct device *dev)
>   {
>   	struct dw_i2c_dev *i_dev = dev_get_drvdata(dev);
>   
> +	i_dev->suspended = true;
> +
>   	if (i_dev->shared_with_punit)
>   		return 0;
>   
> @@ -431,6 +433,7 @@ static int dw_i2c_plat_resume(struct device *dev)
>   		i2c_dw_prepare_clk(i_dev, true);
>   
>   	i_dev->init(i_dev);
> +	i_dev->suspended = false;
>   
>   	return 0;
>   }
>
Wolfram Sang Sept. 25, 2018, 10:36 p.m. | #6
> > FYI: I am just now in the process of designing this behaviour into the
> > core (like with a flag to enable it). It is open coded by various
> > drivers, and often with issues.
> 
> Sounds good, let me know when you've something ready for
> review/testing.  I can reproduce the problem of accessing the
> i2c bus while suspended at will by reverting one of the
> ordering fixes I recently did, so I can happily test such
> a generic fix.

Thanks. Will do! I'll try to get something done until Wednesday or
Thursday.

> Also I assume you want to not merge this patch then in favor
> of going with the generic solution you are working on?

Yes.
Hans de Goede Sept. 26, 2018, 7:40 a.m. | #7
Hi,

On 25-09-18 22:47, Grygorii Strashko wrote:
> 
> 
> On 09/23/2018 09:00 AM, Hans de Goede wrote:
>> On most Intel Bay- and Cherry-Trail systems the PMIC is connected over I2C
>> and the PMIC is accessed through various means by the _PS0 and _PS3 ACPI
>> methods (power on / off methods) of various devices.
>>
>> This leads to suspend/resume ordering problems where a device may be
>> resumed and get its _PS0 method executed before the I2C controller is
>> resumed. On Cherry Trail this leads to errors like these:
>>
>>       i2c_designware 808622C1:06: controller timed out
>>       ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
>>       ACPI Error: Method parse/execution failed \_SB.P18W._ON, AE_ERROR
>>       video LNXVIDEO:00: Failed to change power state to D0
>>
> 
> Hm. There should some dependency from PM domain (or smth. like this) which, in turn should depends from i2c.

Yes and I've a patchset pending to fix the dependency missing in this
specific case. But there might be other cases where we miss the dependency
the idea behind this patch / behind the generic solution Wolfram is
working on which will replace this patch / is to make it easier to
catch/debug cases where such a dependency is missing.

Regards,

Hans
Grygorii Strashko Sept. 26, 2018, 10:05 p.m. | #8
On 09/26/2018 02:40 AM, Hans de Goede wrote:
> Hi,
> 
> On 25-09-18 22:47, Grygorii Strashko wrote:
>>
>>
>> On 09/23/2018 09:00 AM, Hans de Goede wrote:
>>> On most Intel Bay- and Cherry-Trail systems the PMIC is connected 
>>> over I2C
>>> and the PMIC is accessed through various means by the _PS0 and _PS3 ACPI
>>> methods (power on / off methods) of various devices.
>>>
>>> This leads to suspend/resume ordering problems where a device may be
>>> resumed and get its _PS0 method executed before the I2C controller is
>>> resumed. On Cherry Trail this leads to errors like these:
>>>
>>>       i2c_designware 808622C1:06: controller timed out
>>>       ACPI Error: AE_ERROR, Returned by Handler for [UserDefinedRegion]
>>>       ACPI Error: Method parse/execution failed \_SB.P18W._ON, AE_ERROR
>>>       video LNXVIDEO:00: Failed to change power state to D0
>>>
>>
>> Hm. There should some dependency from PM domain (or smth. like this) 
>> which, in turn should depends from i2c.
> 
> Yes and I've a patchset pending to fix the dependency missing in this
> specific case. But there might be other cases where we miss the dependency
> the idea behind this patch / behind the generic solution Wolfram is
> working on which will replace this patch / is to make it easier to
> catch/debug cases where such a dependency is missing.

Thanks for explanation, It might be good to update commit message  
and pay attention that this is not just w/a, but debug instrument.

Note. it will help catch not only dependency issues, but also buggy
drivers who left workers/kthreads/irq threads enabled while suspending and
which might run on other CPUs in parallel with suspend process
until suspend_enter()->disable_nonboot_cpus() call.

Patch

diff --git a/drivers/i2c/busses/i2c-designware-core.h b/drivers/i2c/busses/i2c-designware-core.h
index 1b5a6d47f73d..31e0c84ab4e7 100644
--- a/drivers/i2c/busses/i2c-designware-core.h
+++ b/drivers/i2c/busses/i2c-designware-core.h
@@ -215,6 +215,7 @@ 
  * @disable_int: function to disable all interrupts
  * @init: function to initialize the I2C hardware
  * @mode: operation mode - DW_IC_MASTER or DW_IC_SLAVE
+ * @suspended: set to true if the controller is suspended
  *
  * HCNT and LCNT parameters can be used if the platform knows more accurate
  * values than the one computed based only on the input clock frequency.
@@ -268,6 +269,7 @@  struct dw_i2c_dev {
 	int			(*init)(struct dw_i2c_dev *dev);
 	int			mode;
 	struct i2c_bus_recovery_info rinfo;
+	bool			suspended;
 };
 
 #define ACCESS_SWAP		0x00000001
diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
index 2a630ac35ba2..77a1ab33a3d4 100644
--- a/drivers/i2c/busses/i2c-designware-master.c
+++ b/drivers/i2c/busses/i2c-designware-master.c
@@ -424,6 +424,12 @@  i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
 
 	pm_runtime_get_sync(dev->dev);
 
+	if (dev->suspended) {
+		dev_err(dev->dev, "Error %s call while suspended\n", __func__);
+		ret = -EIO;
+		goto done_nolock;
+	}
+
 	reinit_completion(&dev->cmd_complete);
 	dev->msgs = msgs;
 	dev->msgs_num = num;
diff --git a/drivers/i2c/busses/i2c-designware-pcidrv.c b/drivers/i2c/busses/i2c-designware-pcidrv.c
index d50f80487214..76810deb2de6 100644
--- a/drivers/i2c/busses/i2c-designware-pcidrv.c
+++ b/drivers/i2c/busses/i2c-designware-pcidrv.c
@@ -176,6 +176,7 @@  static int i2c_dw_pci_suspend(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct dw_i2c_dev *i_dev = pci_get_drvdata(pdev);
 
+	i_dev->suspended = true;
 	i_dev->disable(i_dev);
 
 	return 0;
@@ -185,8 +186,12 @@  static int i2c_dw_pci_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct dw_i2c_dev *i_dev = pci_get_drvdata(pdev);
+	int ret;
 
-	return i_dev->init(i_dev);
+	ret = i_dev->init(i_dev);
+	i_dev->suspended = false;
+
+	return ret;
 }
 #endif
 
diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/drivers/i2c/busses/i2c-designware-platdrv.c
index 19833f9aaefe..d4604bea78ad 100644
--- a/drivers/i2c/busses/i2c-designware-platdrv.c
+++ b/drivers/i2c/busses/i2c-designware-platdrv.c
@@ -414,6 +414,8 @@  static int dw_i2c_plat_suspend(struct device *dev)
 {
 	struct dw_i2c_dev *i_dev = dev_get_drvdata(dev);
 
+	i_dev->suspended = true;
+
 	if (i_dev->shared_with_punit)
 		return 0;
 
@@ -431,6 +433,7 @@  static int dw_i2c_plat_resume(struct device *dev)
 		i2c_dw_prepare_clk(i_dev, true);
 
 	i_dev->init(i_dev);
+	i_dev->suspended = false;
 
 	return 0;
 }