[RFC] i2c: busses: i2c-omap: Increase timeout for i2c interrupt
diff mbox

Message ID 1436504994-31137-1-git-send-email-vigneshr@ti.com
State New
Headers show

Commit Message

Vignesh Raghavendra July 10, 2015, 5:09 a.m. UTC
When system is under load and there is an i2c transaction running
following warning appears on the console:

[  730.003617] omap_i2c 48070000.i2c: controller timed out
[  731.023643] omap_i2c 48070000.i2c: controller timed out

This is because, the completion() call, which is done in bottom half of
the interrupt handler, happens after the timeout period(1s) has elapsed
for the wait_for_completion_timeout() in omap_i2c_xfer_msg(). The
interrupt is raised within a second but due to system load (or other
interrupts), the bottom half does not get scheduled within a second.
Hence even though the interrupt has happened within required time frame,
due to delayed scheduling of bottom half, spurious timeout errors are
reported on the console and i2c controller is reset.

i2c timeout is a rare condition, hence increase timeout to 60s in order
to avoid reporting false timeout events under load.

Signed-off-by: Vignesh R <vigneshr@ti.com>
---

I reproduced this while running i2cdump in a loop and reading from flash
using dd command.

 drivers/i2c/busses/i2c-omap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Roger Quadros July 10, 2015, 8:09 a.m. UTC | #1
Vignesh,

On 10/07/15 08:09, Vignesh R wrote:
> When system is under load and there is an i2c transaction running
> following warning appears on the console:
> 
> [  730.003617] omap_i2c 48070000.i2c: controller timed out
> [  731.023643] omap_i2c 48070000.i2c: controller timed out
> 
> This is because, the completion() call, which is done in bottom half of
> the interrupt handler, happens after the timeout period(1s) has elapsed
> for the wait_for_completion_timeout() in omap_i2c_xfer_msg(). The
> interrupt is raised within a second but due to system load (or other
> interrupts), the bottom half does not get scheduled within a second.
> Hence even though the interrupt has happened within required time frame,
> due to delayed scheduling of bottom half, spurious timeout errors are
> reported on the console and i2c controller is reset.
> 
> i2c timeout is a rare condition, hence increase timeout to 60s in order
> to avoid reporting false timeout events under load.

why not 5s instead of 60s?

cheers,
-roger

> 
> Signed-off-by: Vignesh R <vigneshr@ti.com>
> ---
> 
> I reproduced this while running i2cdump in a loop and reading from flash
> using dd command.
> 
>  drivers/i2c/busses/i2c-omap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
> index d1c22e3fdd14..fa7758f0302c 100644
> --- a/drivers/i2c/busses/i2c-omap.c
> +++ b/drivers/i2c/busses/i2c-omap.c
> @@ -50,7 +50,7 @@
>  #define OMAP_I2C_REV_ON_4430_PLUS	0x50400002
>  
>  /* timeout waiting for the controller to respond */
> -#define OMAP_I2C_TIMEOUT (msecs_to_jiffies(1000))
> +#define OMAP_I2C_TIMEOUT (msecs_to_jiffies(60 * 1000))
>  
>  /* timeout for pm runtime autosuspend */
>  #define OMAP_I2C_PM_TIMEOUT		1000	/* ms */
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sverdlin, Alexander (Nokia - DE/Ulm) July 10, 2015, 8:46 a.m. UTC | #2
Hi Vignesh,

On 10/07/15 07:09, ext Vignesh R wrote:
> When system is under load and there is an i2c transaction running
> following warning appears on the console:
> 
> [  730.003617] omap_i2c 48070000.i2c: controller timed out
> [  731.023643] omap_i2c 48070000.i2c: controller timed out
> 
> This is because, the completion() call, which is done in bottom half of
> the interrupt handler, happens after the timeout period(1s) has elapsed
> for the wait_for_completion_timeout() in omap_i2c_xfer_msg(). The
> interrupt is raised within a second but due to system load (or other
> interrupts), the bottom half does not get scheduled within a second.

I would say, if your system cannot handle an interrupt within 1 second,
you have more severe problems than just too small I2C timeout value.

> Hence even though the interrupt has happened within required time frame,
> due to delayed scheduling of bottom half, spurious timeout errors are
> reported on the console and i2c controller is reset.
> 
> i2c timeout is a rare condition, hence increase timeout to 60s in order
> to avoid reporting false timeout events under load.

60 s sounds way too much and actually I simply don't believe this is
the root cause. If I take a look into the driver, then I see, that
the design is not really the best. The whole IRQ handling could be
actually performed in hard IRQ handler, without threading overhead.
Putting even 2 bytes in the controller FIFO should not be too heavy
for the hard IRQ handler. Then these ridiculous spin_lock()s. What
is the reason behind? The IRQ is flagged with ONESHOT, so thread and
hardirq handler are anyway mutually excluded. But if this thread
ever runs longer than it's allowed in IRQ context, then it anyway
produces this IRQ latency because it locks spin_lock_irqsave() for
the whole time! So the whole point of threaded interrupt is missing.

I would propose you to throw away spinlocks. Convert threaded IRQ to
just one hardirq handler. And continue debugging. You will reduce the
load of the system with the above measures, maybe it will not happen
any more, maybe you'll figure out that problem is somewhere else.

> Signed-off-by: Vignesh R <vigneshr@ti.com>
> ---
> 
> I reproduced this while running i2cdump in a loop and reading from flash
> using dd command.
> 
>  drivers/i2c/busses/i2c-omap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
> index d1c22e3fdd14..fa7758f0302c 100644
> --- a/drivers/i2c/busses/i2c-omap.c
> +++ b/drivers/i2c/busses/i2c-omap.c
> @@ -50,7 +50,7 @@
>  #define OMAP_I2C_REV_ON_4430_PLUS	0x50400002
>  
>  /* timeout waiting for the controller to respond */
> -#define OMAP_I2C_TIMEOUT (msecs_to_jiffies(1000))
> +#define OMAP_I2C_TIMEOUT (msecs_to_jiffies(60 * 1000))
>  
>  /* timeout for pm runtime autosuspend */
>  #define OMAP_I2C_PM_TIMEOUT		1000	/* ms */
Wolfram Sang July 10, 2015, 9:09 a.m. UTC | #3
> 60 s sounds way too much and actually I simply don't believe this is
> the root cause. If I take a look into the driver, then I see, that

I agree, this is just a workaround.

> the design is not really the best. The whole IRQ handling could be
> actually performed in hard IRQ handler, without threading overhead.
> Putting even 2 bytes in the controller FIFO should not be too heavy
> for the hard IRQ handler. Then these ridiculous spin_lock()s. What
> is the reason behind? The IRQ is flagged with ONESHOT, so thread and
> hardirq handler are anyway mutually excluded. But if this thread
> ever runs longer than it's allowed in IRQ context, then it anyway
> produces this IRQ latency because it locks spin_lock_irqsave() for
> the whole time! So the whole point of threaded interrupt is missing.

Furthermore, this combination of threaded_irq and struct completion seems
bogus to me. If you just want to ensure the irq happened before timeout,
you just complete when the irq happened and do the "bottom half" after the
completion returned?

> I would propose you to throw away spinlocks. Convert threaded IRQ to
> just one hardirq handler. And continue debugging. You will reduce the
> load of the system with the above measures, maybe it will not happen
> any more, maybe you'll figure out that problem is somewhere else.

Or this.

Felipe converted it to threaded_irq, maybe he has something to add?
Vignesh Raghavendra July 10, 2015, 1:17 p.m. UTC | #4
On 07/10/2015 02:39 PM, Wolfram Sang wrote:
> 
>> 60 s sounds way too much and actually I simply don't believe this is
>> the root cause. If I take a look into the driver, then I see, that
> 
> I agree, this is just a workaround.
> 

Yes, this is a workaround. I thought this is simpler change and can go
into -rc while I work on the better fix. As you can see, the other
suggestions need quite a significant change to the isr code.

>> the design is not really the best. The whole IRQ handling could be
>> actually performed in hard IRQ handler, without threading overhead.
>> Putting even 2 bytes in the controller FIFO should not be too heavy
>> for the hard IRQ handler. Then these ridiculous spin_lock()s. What
>> is the reason behind? The IRQ is flagged with ONESHOT, so thread and
>> hardirq handler are anyway mutually excluded. But if this thread
>> ever runs longer than it's allowed in IRQ context, then it anyway
>> produces this IRQ latency because it locks spin_lock_irqsave() for
>> the whole time! So the whole point of threaded interrupt is missing.
> 
> Furthermore, this combination of threaded_irq and struct completion seems
> bogus to me. If you just want to ensure the irq happened before timeout,
> you just complete when the irq happened and do the "bottom half" after the
> completion returned?

This sounds good to me. I will try to implement this option.
Thanks for the suggestion.

> 
>> I would propose you to throw away spinlocks. Convert threaded IRQ to
>> just one hardirq handler. And continue debugging. You will reduce the
>> load of the system with the above measures, maybe it will not happen
>> any more, maybe you'll figure out that problem is somewhere else.
> 
> Or this.

I am not convinced with moving entire code at hardirq context. I believe
its better to keep hardirq as small as possible.
Sverdlin, Alexander (Nokia - DE/Ulm) July 10, 2015, 1:26 p.m. UTC | #5
Hi!

On 10/07/15 15:17, ext Vignesh R wrote:
>>> I would propose you to throw away spinlocks. Convert threaded IRQ to
>>> >> just one hardirq handler. And continue debugging. You will reduce the
>>> >> load of the system with the above measures, maybe it will not happen
>>> >> any more, maybe you'll figure out that problem is somewhere else.
>> > 
>> > Or this.
> I am not convinced with moving entire code at hardirq context. I believe
> its better to keep hardirq as small as possible.

How deep is the controller's FIFO? 1 byte? 2 bytes? Other drivers can perfectly fill
next byte in hardirq handler. If you need to do 10 opcodes more in hardirq handler,
it's much better for the whole system than to trigger scheduler and thread and and and
just because of these 10 opcodes.
Vignesh Raghavendra July 10, 2015, 1:44 p.m. UTC | #6
Hi,

On 07/10/2015 06:56 PM, Alexander Sverdlin wrote:
> Hi!
> 
> On 10/07/15 15:17, ext Vignesh R wrote:
>>>> I would propose you to throw away spinlocks. Convert threaded IRQ to
>>>>>> just one hardirq handler. And continue debugging. You will reduce the
>>>>>> load of the system with the above measures, maybe it will not happen
>>>>>> any more, maybe you'll figure out that problem is somewhere else.
>>>>
>>>> Or this.
>> I am not convinced with moving entire code at hardirq context. I believe
>> its better to keep hardirq as small as possible.
> 
> How deep is the controller's FIFO? 1 byte? 2 bytes? 

As per AM57x TRM[1] section 24.1.4.8 max FIFO depth can be 64bytes.

[1] http://www.ti.com/lit/ug/spruhz6/spruhz6.pdf
grygorii July 10, 2015, 1:48 p.m. UTC | #7
On 07/10/2015 04:26 PM, Alexander Sverdlin wrote:
> Hi!
> 
> On 10/07/15 15:17, ext Vignesh R wrote:
>>>> I would propose you to throw away spinlocks. Convert threaded IRQ to

Agree. Looks like spinlock is not needed.

>>>>>> just one hardirq handler. And continue debugging. You will reduce the
>>>>>> load of the system with the above measures, maybe it will not happen
>>>>>> any more, maybe you'll figure out that problem is somewhere else.
>>>>
>>>> Or this.
>> I am not convinced with moving entire code at hardirq context. I believe
>> its better to keep hardirq as small as possible.
> 
> How deep is the controller's FIFO? 1 byte? 2 bytes? Other drivers can perfectly fill
> next byte in hardirq handler. If you need to do 10 opcodes more in hardirq handler,
> it's much better for the whole system than to trigger scheduler and thread and and and
> just because of these 10 opcodes.
> 

1) TRM: Built-in configurable FIFOs (8, 16, 32, 64 bytes) for buffered read or write.
2) We trying to be as much compatible with RT kernel as possible where all IRQ
are threaded.

And yes. This patch is u.. WA which tries to fix symptom and not an issue.
grygorii July 10, 2015, 2:02 p.m. UTC | #8
Hi Wolfram,

On 07/10/2015 12:09 PM, Wolfram Sang wrote:
> 
>> 60 s sounds way too much and actually I simply don't believe this is
>> the root cause. If I take a look into the driver, then I see, that
> 
> I agree, this is just a workaround.
> 
>> the design is not really the best. The whole IRQ handling could be
>> actually performed in hard IRQ handler, without threading overhead.
>> Putting even 2 bytes in the controller FIFO should not be too heavy
>> for the hard IRQ handler. Then these ridiculous spin_lock()s. What
>> is the reason behind? The IRQ is flagged with ONESHOT, so thread and
>> hardirq handler are anyway mutually excluded. But if this thread
>> ever runs longer than it's allowed in IRQ context, then it anyway
>> produces this IRQ latency because it locks spin_lock_irqsave() for
>> the whole time! So the whole point of threaded interrupt is missing.
> 
> Furthermore, this combination of threaded_irq and struct completion seems
> bogus to me. If you just want to ensure the irq happened before timeout,
> you just complete when the irq happened and do the "bottom half" after the
> completion returned?
> 

I'd very appreciated if You would be able to clarify your point a bit, pls?
completion is used to indicate end of one message transfer (+check for msg timeout),
so .master_xfer()->omap_i2c_xfer could switch to next msg.
And there could be more than on IRQ triggered depending on msg length
while one message is being transfered.

Patch
diff mbox

diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
index d1c22e3fdd14..fa7758f0302c 100644
--- a/drivers/i2c/busses/i2c-omap.c
+++ b/drivers/i2c/busses/i2c-omap.c
@@ -50,7 +50,7 @@ 
 #define OMAP_I2C_REV_ON_4430_PLUS	0x50400002
 
 /* timeout waiting for the controller to respond */
-#define OMAP_I2C_TIMEOUT (msecs_to_jiffies(1000))
+#define OMAP_I2C_TIMEOUT (msecs_to_jiffies(60 * 1000))
 
 /* timeout for pm runtime autosuspend */
 #define OMAP_I2C_PM_TIMEOUT		1000	/* ms */