diff mbox

i2c: imx: double check IIF in case interrupt lost

Message ID 1408004954-29418-1-git-send-email-b38611@freescale.com
State Deferred
Headers show

Commit Message

Nimrod Andy Aug. 14, 2014, 8:29 a.m. UTC
In i2c_imx_read():
...
result = i2c_imx_trx_complete(i2c_imx);
if (result)
		return result;
..

If the current byte read complete, "IIF" status is set, and pend
up one GIC interrupt. In irq handler, wake up the wait queue in
.i2c_imx_trx_complete().

But, for imx6q platform with high bus and cpu loading test cases,
after long time test, sometime i2c interrupt is lost, but "IIF" is
set, according to current logic code, i2c_imx_trx_complete() still
return "-ETIMEDOUT", and then i2c host don't read the rest of data,
i2c driver stop transmit, disable controller and clock. Thus, i2c
device cannot wait clock and always drive the SDA line.

So, SDA is pulled down by i2c device, which needs 9 clocks to recovery
the SDA line.

To avoid the issue, we can double check IIF bit after timeout for waiting
event in .i2c_imx_trx_complete(), if IIF bit is set, process it in
normal flow. The patch just to double check IIF in case interrupt lost.

Signed-off-by: Fugang Duan <B38611@freescale.com>
---
 drivers/i2c/busses/i2c-imx.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

Comments

Uwe Kleine-König Aug. 14, 2014, 9:42 a.m. UTC | #1
Hello,

On Thu, Aug 14, 2014 at 04:29:14PM +0800, Fugang Duan wrote:
> In i2c_imx_read():
> ...
> result = i2c_imx_trx_complete(i2c_imx);
> if (result)
> 		return result;
> ..
> 
> If the current byte read complete, "IIF" status is set, and pend
> up one GIC interrupt. In irq handler, wake up the wait queue in
> .i2c_imx_trx_complete().
> 
> But, for imx6q platform with high bus and cpu loading test cases,
> after long time test, sometime i2c interrupt is lost, but "IIF" is
> set, according to current logic code, i2c_imx_trx_complete() still
> return "-ETIMEDOUT", and then i2c host don't read the rest of data,
> i2c driver stop transmit, disable controller and clock. Thus, i2c
> device cannot wait clock and always drive the SDA line.
> 
> So, SDA is pulled down by i2c device, which needs 9 clocks to recovery
> the SDA line.
> 
> To avoid the issue, we can double check IIF bit after timeout for waiting
> event in .i2c_imx_trx_complete(), if IIF bit is set, process it in
> normal flow. The patch just to double check IIF in case interrupt lost.
> 
> Signed-off-by: Fugang Duan <B38611@freescale.com>
> ---
>  drivers/i2c/busses/i2c-imx.c |   10 ++++++++--
>  1 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
> index aa8bc14..4b63771 100644
> --- a/drivers/i2c/busses/i2c-imx.c
> +++ b/drivers/i2c/busses/i2c-imx.c
> @@ -285,11 +285,17 @@ static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy)
>  
>  static int i2c_imx_trx_complete(struct imx_i2c_struct *i2c_imx)
>  {
> +	unsigned int temp;
> +
>  	wait_event_timeout(i2c_imx->queue, i2c_imx->i2csr & I2SR_IIF, HZ / 10);
>  
>  	if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
> -		dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n", __func__);
> -		return -ETIMEDOUT;
> +		/* Double check IIF to avoid interrupt lost */
> +		temp = imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR);
> +		if (!(temp & I2SR_IIF)) {
> +			dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n", __func__);
> +			return -ETIMEDOUT;
> +		}
This smells fishy. If possible the fix should be not to loose an
interrupt. Can you 

If I2SR_IIF is unset in i2c_imx->i2csr this means that
i2c_imx_trx_complete was already run before since the last irq
triggered. But if then imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR) returns
the IIF flag set why doesn't the irq trigger? That would mean there is a
hardware bug, doesn't it? Is there a related Errata? Does it apply to
all SoCs using this driver? Did you check that the irq handling in the
driver isn't racy?

Did you test using irq threading and high-priority tasks? Did you prove
that the situation in question really occurs? (I.e. wait_event_timeout
returns with I2SR_IIF unset in i2c_imx->i2csr but set in IMX_I2C_I2SR.)
I didn't check the code and maybe this might be prevented by the i2c
framework, but maybe you have two waiters?

Best regards
Uwe
Fugang Duan Aug. 14, 2014, 10:09 a.m. UTC | #2
From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Sent: Thursday, August 14, 2014 5:42 PM
>To: Duan Fugang-B38611
>Cc: wsa@the-dreams.de; linux-i2c@vger.kernel.org
>Subject: Re: [PATCH] i2c: imx: double check IIF in case interrupt lost
>
>Hello,
>
>On Thu, Aug 14, 2014 at 04:29:14PM +0800, Fugang Duan wrote:
>> In i2c_imx_read():
>> ...
>> result = i2c_imx_trx_complete(i2c_imx); if (result)
>> 		return result;
>> ..
>>
>> If the current byte read complete, "IIF" status is set, and pend up
>> one GIC interrupt. In irq handler, wake up the wait queue in
>> .i2c_imx_trx_complete().
>>
>> But, for imx6q platform with high bus and cpu loading test cases,
>> after long time test, sometime i2c interrupt is lost, but "IIF" is
>> set, according to current logic code, i2c_imx_trx_complete() still
>> return "-ETIMEDOUT", and then i2c host don't read the rest of data,
>> i2c driver stop transmit, disable controller and clock. Thus, i2c
>> device cannot wait clock and always drive the SDA line.
>>
>> So, SDA is pulled down by i2c device, which needs 9 clocks to recovery
>> the SDA line.
>>
>> To avoid the issue, we can double check IIF bit after timeout for
>> waiting event in .i2c_imx_trx_complete(), if IIF bit is set, process
>> it in normal flow. The patch just to double check IIF in case interrupt
>lost.
>>
>> Signed-off-by: Fugang Duan <B38611@freescale.com>
>> ---
>>  drivers/i2c/busses/i2c-imx.c |   10 ++++++++--
>>  1 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/i2c/busses/i2c-imx.c
>> b/drivers/i2c/busses/i2c-imx.c index aa8bc14..4b63771 100644
>> --- a/drivers/i2c/busses/i2c-imx.c
>> +++ b/drivers/i2c/busses/i2c-imx.c
>> @@ -285,11 +285,17 @@ static int i2c_imx_bus_busy(struct
>> imx_i2c_struct *i2c_imx, int for_busy)
>>
>>  static int i2c_imx_trx_complete(struct imx_i2c_struct *i2c_imx)  {
>> +	unsigned int temp;
>> +
>>  	wait_event_timeout(i2c_imx->queue, i2c_imx->i2csr & I2SR_IIF, HZ /
>> 10);
>>
>>  	if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
>> -		dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n", __func__);
>> -		return -ETIMEDOUT;
>> +		/* Double check IIF to avoid interrupt lost */
>> +		temp = imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR);
>> +		if (!(temp & I2SR_IIF)) {
>> +			dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n",
>__func__);
>> +			return -ETIMEDOUT;
>> +		}
>This smells fishy. If possible the fix should be not to loose an interrupt.
>Can you
>
>If I2SR_IIF is unset in i2c_imx->i2csr this means that
>i2c_imx_trx_complete was already run before since the last irq triggered.
>But if then imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR) returns the IIF flag
>set why doesn't the irq trigger? That would mean there is a hardware bug,
>doesn't it? Is there a related Errata? Does it apply to all SoCs using
>this driver? Did you check that the irq handling in the driver isn't racy?
After we debug, it seems that irq lost in the stress case.
Because we increase the timeout value to one "HZ" in wait_event_timeout(), and it 
Return "0" means no interrupt. But when we read I2SR, IIF bit is set.
There have no racy in the driver code, so we judge there have interrupt lost. 

After apply the patch, it really solve the issue.
>
>Did you test using irq threading and high-priority tasks? Did you prove
>that the situation in question really occurs? (I.e. wait_event_timeout
>returns with I2SR_IIF unset in i2c_imx->i2csr but set in IMX_I2C_I2SR.) I
>didn't check the code and maybe this might be prevented by the i2c
>framework, but maybe you have two waiters?
>
As above.  I don't try irq thread and high priority task method.

>Best regards
>Uwe
>
Lin>--
>Pengutronix e.K.                           | Uwe Kleine-König            |
>Industrial ux Solutions                 | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Uwe Kleine-König Aug. 14, 2014, 7:27 p.m. UTC | #3
Hello,

On Thu, Aug 14, 2014 at 10:09:42AM +0000, fugang.duan@freescale.com wrote:
> From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Sent: Thursday, August 14, 2014 5:42 PM
> >On Thu, Aug 14, 2014 at 04:29:14PM +0800, Fugang Duan wrote:
> >> diff --git a/drivers/i2c/busses/i2c-imx.c
> >> b/drivers/i2c/busses/i2c-imx.c index aa8bc14..4b63771 100644
> >> --- a/drivers/i2c/busses/i2c-imx.c
> >> +++ b/drivers/i2c/busses/i2c-imx.c
> >> @@ -285,11 +285,17 @@ static int i2c_imx_bus_busy(struct
> >> imx_i2c_struct *i2c_imx, int for_busy)
> >>
> >>  static int i2c_imx_trx_complete(struct imx_i2c_struct *i2c_imx)  {
> >> +	unsigned int temp;
> >> +
> >>  	wait_event_timeout(i2c_imx->queue, i2c_imx->i2csr & I2SR_IIF, HZ /
> >> 10);
> >>
> >>  	if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
> >> -		dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n", __func__);
> >> -		return -ETIMEDOUT;
> >> +		/* Double check IIF to avoid interrupt lost */
> >> +		temp = imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR);
> >> +		if (!(temp & I2SR_IIF)) {
> >> +			dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n",
> >__func__);
> >> +			return -ETIMEDOUT;
> >> +		}
> >This smells fishy. If possible the fix should be not to loose an interrupt.
> >Can you
> >
> >If I2SR_IIF is unset in i2c_imx->i2csr this means that
> >i2c_imx_trx_complete was already run before since the last irq triggered.
> >But if then imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR) returns the IIF flag
> >set why doesn't the irq trigger? That would mean there is a hardware bug,
> >doesn't it? Is there a related Errata? Does it apply to all SoCs using
> >this driver? Did you check that the irq handling in the driver isn't racy?
> After we debug, it seems that irq lost in the stress case.
> Because we increase the timeout value to one "HZ" in wait_event_timeout(), and it 
> Return "0" means no interrupt. But when we read I2SR, IIF bit is set.
> There have no racy in the driver code, so we judge there have interrupt lost. 
> 
> After apply the patch, it really solve the issue.
I'm still not convinced. Either there is a hardware problem (then
Freescale should emit a proper errata for it when it's completely
understood) or there is a software problem and then your fix looks
wrong. Can you do the following please:

Make the code read:

	static int i2c_imx_trx_complete(struct imx_i2c_struct *i2c_imx)
	{
		wait_event_timeout(i2c_imx->queue, i2c_imx->i2csr & I2SR_IIF, HZ / 10);

		if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
			unsigned int temp = imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR);
			if (!(temp & I2SR_IFF)) {
				dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n", __func__);
				return -ETIMEDOUT;
			} else {
				dev_info(&i2c_imx->adapter.dev, "<%s> Disable tracing\n", __func__);
				tracing_off();
			}
		}

		dev_dbg(&i2c_imx->adapter.dev, "<%s> TRX complete\n", __func__);
		i2c_imx->i2csr = 0;
		return 0;
	}

Then compile a kernel with CONFIG_FUNCTION_TRACER=y and before repeating
your tests do:

	cd /sys/kernel/debug/tracing # assuming you have debugfs mounted accordingly
	echo function > current_tracer
	echo 1 > tracing_on

And once the "Disable tracing" message appears extract

	/sys/kernel/debug/tracing/trace

and send it to me.

Best regards
Uwe
Fugang Duan Aug. 15, 2014, 2:14 a.m. UTC | #4
From: Uwe Kleine-König <kleine-koenig@pengutronix.de> Sent: Friday, August 15, 2014 3:27 AM
>To: Duan Fugang-B38611
>Cc: wsa@the-dreams.de; linux-i2c@vger.kernel.org
>Subject: Re: [PATCH] i2c: imx: double check IIF in case interrupt lost
>
>Hello,
>
>On Thu, Aug 14, 2014 at 10:09:42AM +0000, fugang.duan@freescale.com wrote:
>> From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Sent:
>> Thursday, August 14, 2014 5:42 PM
>> >On Thu, Aug 14, 2014 at 04:29:14PM +0800, Fugang Duan wrote:
>> >> diff --git a/drivers/i2c/busses/i2c-imx.c
>> >> b/drivers/i2c/busses/i2c-imx.c index aa8bc14..4b63771 100644
>> >> --- a/drivers/i2c/busses/i2c-imx.c
>> >> +++ b/drivers/i2c/busses/i2c-imx.c
>> >> @@ -285,11 +285,17 @@ static int i2c_imx_bus_busy(struct
>> >> imx_i2c_struct *i2c_imx, int for_busy)
>> >>
>> >>  static int i2c_imx_trx_complete(struct imx_i2c_struct *i2c_imx)  {
>> >> +	unsigned int temp;
>> >> +
>> >>  	wait_event_timeout(i2c_imx->queue, i2c_imx->i2csr & I2SR_IIF,
>HZ
>> >> / 10);
>> >>
>> >>  	if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
>> >> -		dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n",
>__func__);
>> >> -		return -ETIMEDOUT;
>> >> +		/* Double check IIF to avoid interrupt lost */
>> >> +		temp = imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR);
>> >> +		if (!(temp & I2SR_IIF)) {
>> >> +			dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n",
>> >__func__);
>> >> +			return -ETIMEDOUT;
>> >> +		}
>> >This smells fishy. If possible the fix should be not to loose an
>interrupt.
>> >Can you
>> >
>> >If I2SR_IIF is unset in i2c_imx->i2csr this means that
>> >i2c_imx_trx_complete was already run before since the last irq
>triggered.
>> >But if then imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR) returns the IIF
>> >flag set why doesn't the irq trigger? That would mean there is a
>> >hardware bug, doesn't it? Is there a related Errata? Does it apply to
>> >all SoCs using this driver? Did you check that the irq handling in the
>driver isn't racy?
>> After we debug, it seems that irq lost in the stress case.
>> Because we increase the timeout value to one "HZ" in
>> wait_event_timeout(), and it Return "0" means no interrupt. But when we
>read I2SR, IIF bit is set.
>> There have no racy in the driver code, so we judge there have interrupt
>lost.
>>
>> After apply the patch, it really solve the issue.
>I'm still not convinced. Either there is a hardware problem (then
>Freescale should emit a proper errata for it when it's completely
>understood) or there is a software problem and then your fix looks wrong.
>Can you do the following please:
>
>Make the code read:
>
>	static int i2c_imx_trx_complete(struct imx_i2c_struct *i2c_imx)
>	{
>		wait_event_timeout(i2c_imx->queue, i2c_imx->i2csr & I2SR_IIF,
>HZ / 10);
>
>		if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
>			unsigned int temp = imx_i2c_read_reg(i2c_imx,
>IMX_I2C_I2SR);
>			if (!(temp & I2SR_IFF)) {
>				dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n",
>__func__);
>				return -ETIMEDOUT;
>			} else {
>				dev_info(&i2c_imx->adapter.dev, "<%s> Disable
>tracing\n", __func__);
>				tracing_off();
>			}
>		}
>
>		dev_dbg(&i2c_imx->adapter.dev, "<%s> TRX complete\n",
>__func__);
>		i2c_imx->i2csr = 0;
>		return 0;
>	}
>
>Then compile a kernel with CONFIG_FUNCTION_TRACER=y and before repeating
>your tests do:
>
>	cd /sys/kernel/debug/tracing # assuming you have debugfs mounted
>accordingly
>	echo function > current_tracer
>	echo 1 > tracing_on
>
>And once the "Disable tracing" message appears extract
>
>	/sys/kernel/debug/tracing/trace
>
>and send it to me.
>
>Best regards
>Uwe
Uwe, thanks for much.
I am sorry to tell you I don't have the reproduced platform for the issue since the
Issue was found at one platform of customer of freescale, and I debug the issue with
Customer, at last generate the patch for customer platform, test fine. So I want to
Submit the patch. In our platforms, it is not easy to reproduce the issue.

Thanks,
Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Uwe Kleine-König Aug. 15, 2014, 7:18 a.m. UTC | #5
Hello,

On Fri, Aug 15, 2014 at 02:14:15AM +0000, fugang.duan@freescale.com wrote:
> I am sorry to tell you I don't have the reproduced platform for the issue since the
> Issue was found at one platform of customer of freescale, and I debug the issue with
> Customer, at last generate the patch for customer platform, test fine. So I want to
Does the customer use the -rt patch?

> Submit the patch. In our platforms, it is not easy to reproduce the issue.
I tend to not want the patch in mainline. IMHO a wrong workaround that
fixes the issue only for 95% is worse than an unfixed driver.
Maybe when analysing more exact the outcome is that the patch is right,
but if there is a race condition somewhere in the code it might be that
there is still a bug. Not being able to reproduce the problem doesn't
necessarily imply that the introduced change is a real fix.

Best regards
Uwe
Wolfram Sang Aug. 19, 2014, 2:53 p.m. UTC | #6
On Fri, Aug 15, 2014 at 09:18:14AM +0200, Uwe Kleine-König wrote:
> Hello,
> 
> On Fri, Aug 15, 2014 at 02:14:15AM +0000, fugang.duan@freescale.com wrote:
> > I am sorry to tell you I don't have the reproduced platform for the issue since the
> > Issue was found at one platform of customer of freescale, and I debug the issue with
> > Customer, at last generate the patch for customer platform, test fine. So I want to
> Does the customer use the -rt patch?

Valid question.

> > Submit the patch. In our platforms, it is not easy to reproduce the issue.
> I tend to not want the patch in mainline. IMHO a wrong workaround that
> fixes the issue only for 95% is worse than an unfixed driver.

I agree. Yet, I'd like to keep the information we have so far. Anyone
cares to send a patch adding a comment to the driver with what is known
yet?

Thanks,

   Wolfram
Fugang Duan Aug. 20, 2014, 1:52 a.m. UTC | #7
From: Wolfram Sang <wsa@the-dreams.de> Sent: Tuesday, August 19, 2014 10:54 PM

>To: Uwe Kleine-König

>Cc: Duan Fugang-B38611; linux-i2c@vger.kernel.org

>Subject: Re: [PATCH] i2c: imx: double check IIF in case interrupt lost

>

>On Fri, Aug 15, 2014 at 09:18:14AM +0200, Uwe Kleine-König wrote:

>> Hello,

>>

>> On Fri, Aug 15, 2014 at 02:14:15AM +0000, fugang.duan@freescale.com

>wrote:

>> > I am sorry to tell you I don't have the reproduced platform for the

>> > issue since the Issue was found at one platform of customer of

>> > freescale, and I debug the issue with Customer, at last generate the

>> > patch for customer platform, test fine. So I want to

>> Does the customer use the -rt patch?

>

>Valid question.

>

No, customer don't use -rt patch.

>> > Submit the patch. In our platforms, it is not easy to reproduce the

>issue.

>> I tend to not want the patch in mainline. IMHO a wrong workaround that

>> fixes the issue only for 95% is worse than an unfixed driver.

>

>I agree. Yet, I'd like to keep the information we have so far. Anyone

>cares to send a patch adding a comment to the driver with what is known

>yet?

>

>Thanks,

>

>   Wolfram


I agree your decision.

Thanks,
Andy
diff mbox

Patch

diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index aa8bc14..4b63771 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -285,11 +285,17 @@  static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy)
 
 static int i2c_imx_trx_complete(struct imx_i2c_struct *i2c_imx)
 {
+	unsigned int temp;
+
 	wait_event_timeout(i2c_imx->queue, i2c_imx->i2csr & I2SR_IIF, HZ / 10);
 
 	if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
-		dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n", __func__);
-		return -ETIMEDOUT;
+		/* Double check IIF to avoid interrupt lost */
+		temp = imx_i2c_read_reg(i2c_imx, IMX_I2C_I2SR);
+		if (!(temp & I2SR_IIF)) {
+			dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n", __func__);
+			return -ETIMEDOUT;
+		}
 	}
 	dev_dbg(&i2c_imx->adapter.dev, "<%s> TRX complete\n", __func__);
 	i2c_imx->i2csr = 0;