[v2] i2c: omap: Trigger bus recovery in lockup case

Message ID 1507110225-3050-1-git-send-email-claudio.foellmi@ergon.ch
State Accepted
Headers show
Series
  • [v2] i2c: omap: Trigger bus recovery in lockup case
Related show

Commit Message

Claudio Foellmi Oct. 4, 2017, 9:43 a.m.
A very conservative check for bus activity (to prevent interference
in multimaster setups) prevented the bus recovery methods from being
triggered in the case that SDA or SCL was stuck low.
This defeats the purpose of the recovery mechanism, which was introduced
for exactly this situation (a slave device keeping SDA pulled down).

Also added a check to make sure SDA is low before attempting recovery.
If SDA is not stuck low, recovery will not help, so we can skip it.

Note that bus lockups can persist across reboots. The only other options
are to reset or power cycle the offending slave device, and many i2c
slaves do not even have a reset pin.

If we see that one of the lines is low for the entire timeout duration,
we can actually be sure that there is no other master driving the bus.
It is therefore save for us to attempt a bus recovery.

Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Vignesh R <vigneshr@ti.com>
---
Changes since v1:
Added a check before all bus recoveries, to make sure SDA actually is
low. This should prevent most unnecessary attempts, which are not
without risk.

The full discussion of v1 can be found at
http://patchwork.ozlabs.org/patch/813889/

 drivers/i2c/busses/i2c-omap.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

Comments

Vignesh R Oct. 5, 2017, 6:01 a.m. | #1
Hi,

On Wednesday 04 October 2017 03:13 PM, Claudio Foellmi wrote:
> A very conservative check for bus activity (to prevent interference
> in multimaster setups) prevented the bus recovery methods from being
> triggered in the case that SDA or SCL was stuck low.
> This defeats the purpose of the recovery mechanism, which was introduced
> for exactly this situation (a slave device keeping SDA pulled down).
> 
> Also added a check to make sure SDA is low before attempting recovery.
> If SDA is not stuck low, recovery will not help, so we can skip it.
> 
> Note that bus lockups can persist across reboots. The only other options
> are to reset or power cycle the offending slave device, and many i2c
> slaves do not even have a reset pin.
> 
> If we see that one of the lines is low for the entire timeout duration,
> we can actually be sure that there is no other master driving the bus.
> It is therefore save for us to attempt a bus recovery.
> 
> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Vignesh R <vigneshr@ti.com>
> ---
> Changes since v1:
> Added a check before all bus recoveries, to make sure SDA actually is
> low. This should prevent most unnecessary attempts, which are not
> without risk.
> 

Thanks for the patch! This fixes my case. I no longer see recovery
attempt or IRQ flood:
Tested-by: Vignesh R <vigneshr@ti.com>

Regards
Vignesh

> The full discussion of v1 can be found at
> http://patchwork.ozlabs.org/patch/813889/
> 
>  drivers/i2c/busses/i2c-omap.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
> index 1ebb5e9..8cdf40a 100644
> --- a/drivers/i2c/busses/i2c-omap.c
> +++ b/drivers/i2c/busses/i2c-omap.c
> @@ -473,6 +473,22 @@ static int omap_i2c_init(struct omap_i2c_dev *omap)
>  }
>  
>  /*
> + * Try bus recovery, but only if SDA is actually low.
> + */
> +static int omap_i2c_recover_bus(struct omap_i2c_dev *omap)
> +{
> +	u16 systest;
> +
> +	systest = omap_i2c_read_reg(omap, OMAP_I2C_SYSTEST_REG);
> +	if ((systest & OMAP_I2C_SYSTEST_SCL_I_FUNC) &&
> +	    (systest & OMAP_I2C_SYSTEST_SDA_I_FUNC))
> +		return 0; /* bus seems to already be fine */
> +	if (!(systest & OMAP_I2C_SYSTEST_SCL_I_FUNC))
> +		return -ETIMEDOUT; /* recovery would not fix SCL */
> +	return i2c_recover_bus(&omap->adapter);
> +}
> +
> +/*
>   * Waiting on Bus Busy
>   */
>  static int omap_i2c_wait_for_bb(struct omap_i2c_dev *omap)
> @@ -482,7 +498,7 @@ static int omap_i2c_wait_for_bb(struct omap_i2c_dev *omap)
>  	timeout = jiffies + OMAP_I2C_TIMEOUT;
>  	while (omap_i2c_read_reg(omap, OMAP_I2C_STAT_REG) & OMAP_I2C_STAT_BB) {
>  		if (time_after(jiffies, timeout))
> -			return i2c_recover_bus(&omap->adapter);
> +			return omap_i2c_recover_bus(omap);
>  		msleep(1);
>  	}
>  
> @@ -563,8 +579,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
>  		}
>  
>  		if (time_after(jiffies, timeout)) {
> +			/*
> +			 * SDA or SCL were low for the entire timeout without
> +			 * any activity detected. Most likely, a slave is
> +			 * locking up the bus with no master driving the clock.
> +			 */
>  			dev_warn(omap->dev, "timeout waiting for bus ready\n");
> -			return -ETIMEDOUT;
> +			return omap_i2c_recover_bus(omap);
>  		}
>  
>  		msleep(1);
>
Grygorii Strashko Oct. 5, 2017, 12:30 p.m. | #2
On 10/05/2017 01:01 AM, Vignesh R wrote:
> Hi,
> 
> On Wednesday 04 October 2017 03:13 PM, Claudio Foellmi wrote:
>> A very conservative check for bus activity (to prevent interference
>> in multimaster setups) prevented the bus recovery methods from being
>> triggered in the case that SDA or SCL was stuck low.
>> This defeats the purpose of the recovery mechanism, which was introduced
>> for exactly this situation (a slave device keeping SDA pulled down).
>>
>> Also added a check to make sure SDA is low before attempting recovery.
>> If SDA is not stuck low, recovery will not help, so we can skip it.
>>
>> Note that bus lockups can persist across reboots. The only other options
>> are to reset or power cycle the offending slave device, and many i2c
>> slaves do not even have a reset pin.
>>
>> If we see that one of the lines is low for the entire timeout duration,
>> we can actually be sure that there is no other master driving the bus.
>> It is therefore save for us to attempt a bus recovery.
>>
>> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
>> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
>> Cc: Vignesh R <vigneshr@ti.com>
>> ---
>> Changes since v1:
>> Added a check before all bus recoveries, to make sure SDA actually is
>> low. This should prevent most unnecessary attempts, which are not
>> without risk.
>>
> 
> Thanks for the patch! This fixes my case. I no longer see recovery
> attempt or IRQ flood:
> Tested-by: Vignesh R <vigneshr@ti.com>

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Wolfram Sang Oct. 28, 2017, 8:52 p.m. | #3
> +	if (!(systest & OMAP_I2C_SYSTEST_SCL_I_FUNC))
> +		return -ETIMEDOUT; /* recovery would not fix SCL */

I'd rather return EBUSY here. It is not that the transaction timed out.
If you agree, I can fix it locally, no need to resend.

Rest looks good, thanks!
Claudio Foellmi Oct. 30, 2017, 9:11 a.m. | #4
On 10/28/2017 10:52 PM, Wolfram Sang wrote:
> 
>> +	if (!(systest & OMAP_I2C_SYSTEST_SCL_I_FUNC))
>> +		return -ETIMEDOUT; /* recovery would not fix SCL */
> 
> I'd rather return EBUSY here. It is not that the transaction timed out.
> If you agree, I can fix it locally, no need to resend.
> 
> Rest looks good, thanks!
>

Agreed, EBUSY is the better choice since that is also what the actual recovery would return.

Thanks,
Claudio
Wolfram Sang Oct. 30, 2017, 2:19 p.m. | #5
On Wed, Oct 04, 2017 at 11:43:45AM +0200, Claudio Foellmi wrote:
> A very conservative check for bus activity (to prevent interference
> in multimaster setups) prevented the bus recovery methods from being
> triggered in the case that SDA or SCL was stuck low.
> This defeats the purpose of the recovery mechanism, which was introduced
> for exactly this situation (a slave device keeping SDA pulled down).
> 
> Also added a check to make sure SDA is low before attempting recovery.
> If SDA is not stuck low, recovery will not help, so we can skip it.
> 
> Note that bus lockups can persist across reboots. The only other options
> are to reset or power cycle the offending slave device, and many i2c
> slaves do not even have a reset pin.
> 
> If we see that one of the lines is low for the entire timeout duration,
> we can actually be sure that there is no other master driving the bus.
> It is therefore save for us to attempt a bus recovery.
> 
> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Vignesh R <vigneshr@ti.com>

Applied with the discussed change to for-next, thanks!

Patch

diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
index 1ebb5e9..8cdf40a 100644
--- a/drivers/i2c/busses/i2c-omap.c
+++ b/drivers/i2c/busses/i2c-omap.c
@@ -473,6 +473,22 @@  static int omap_i2c_init(struct omap_i2c_dev *omap)
 }
 
 /*
+ * Try bus recovery, but only if SDA is actually low.
+ */
+static int omap_i2c_recover_bus(struct omap_i2c_dev *omap)
+{
+	u16 systest;
+
+	systest = omap_i2c_read_reg(omap, OMAP_I2C_SYSTEST_REG);
+	if ((systest & OMAP_I2C_SYSTEST_SCL_I_FUNC) &&
+	    (systest & OMAP_I2C_SYSTEST_SDA_I_FUNC))
+		return 0; /* bus seems to already be fine */
+	if (!(systest & OMAP_I2C_SYSTEST_SCL_I_FUNC))
+		return -ETIMEDOUT; /* recovery would not fix SCL */
+	return i2c_recover_bus(&omap->adapter);
+}
+
+/*
  * Waiting on Bus Busy
  */
 static int omap_i2c_wait_for_bb(struct omap_i2c_dev *omap)
@@ -482,7 +498,7 @@  static int omap_i2c_wait_for_bb(struct omap_i2c_dev *omap)
 	timeout = jiffies + OMAP_I2C_TIMEOUT;
 	while (omap_i2c_read_reg(omap, OMAP_I2C_STAT_REG) & OMAP_I2C_STAT_BB) {
 		if (time_after(jiffies, timeout))
-			return i2c_recover_bus(&omap->adapter);
+			return omap_i2c_recover_bus(omap);
 		msleep(1);
 	}
 
@@ -563,8 +579,13 @@  static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
 		}
 
 		if (time_after(jiffies, timeout)) {
+			/*
+			 * SDA or SCL were low for the entire timeout without
+			 * any activity detected. Most likely, a slave is
+			 * locking up the bus with no master driving the clock.
+			 */
 			dev_warn(omap->dev, "timeout waiting for bus ready\n");
-			return -ETIMEDOUT;
+			return omap_i2c_recover_bus(omap);
 		}
 
 		msleep(1);