[1/2] i2c: designware: fix race between subsequent xfers

Submitter Christian Ruppert
Date June 6, 2013, 1:43 p.m.
Message ID <1370526216-10060-1-git-send-email-christian.ruppert@abilis.com>
Permalink /patch/249423/
State Superseded

Comments

Christian Ruppert - June 6, 2013, 1:43 p.m.
The designware block is not always properly disabled in the case of
transfer errors. Interrupts from aborted transfers might be handled
after the data structures for the following transfer are initialised but
before the hardware is set up. This might corrupt the data structures to
the point that the system is stuck in an infinite interrupt loop (where
FIFOs are never emptied).
This patch cleanly disables the designware-i2c hardware at the end of
every transfer, successful or not.

Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
---
 drivers/i2c/busses/i2c-designware-core.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)
Mika Westerberg - June 7, 2013, 5:23 a.m.
Hi Christian,

On Thu, Jun 06, 2013 at 03:43:35PM +0200, Christian Ruppert wrote:
> The designware block is not always properly disabled in the case of
> transfer errors. Interrupts from aborted transfers might be handled
> after the data structures for the following transfer are initialised but
> before the hardware is set up. This might corrupt the data structures to
> the point that the system is stuck in an infinite interrupt loop (where
> FIFOs are never emptied).
> This patch cleanly disables the designware-i2c hardware at the end of
> every transfer, successful or not.

Have you tried with the latest mainline driver? There is a commit that
solves a similar problem:

2a2d95e9d6d29e7	i2c: designware: always clear interrupts before enabling them

Maybe it helps?
Christian Ruppert - June 7, 2013, 8:16 a.m.
On Fri, Jun 07, 2013 at 08:23:53AM +0300, Mika Westerberg wrote:
> Hi Christian,
> 
> On Thu, Jun 06, 2013 at 03:43:35PM +0200, Christian Ruppert wrote:
> > The designware block is not always properly disabled in the case of
> > transfer errors. Interrupts from aborted transfers might be handled
> > after the data structures for the following transfer are initialised but
> > before the hardware is set up. This might corrupt the data structures to
> > the point that the system is stuck in an infinite interrupt loop (where
> > FIFOs are never emptied).
> > This patch cleanly disables the designware-i2c hardware at the end of
> > every transfer, successful or not.
> 
> Have you tried with the latest mainline driver? There is a commit that
> solves a similar problem:
> 
> 2a2d95e9d6d29e7	i2c: designware: always clear interrupts before enabling them
> 
> Maybe it helps?

Hi Mika,

Thanks for the hint, but I have checked both mainline and Wolfram's
branch and I saw this patch. I actually hoped it would fix our problem
but it didn't.

Here are some more details: we experienced system lockups (complete lockup,
no reaction whatsoever) in long-term tests under heavy system load with
lots of scheduling and forking/killing. These lockups could be traced to
the I2C driver which after some time ended up in an incoherent state:
i2c_dw_isr was being called with DW_IC_INTR_RX_FULL but
dev->msg_read_idx == dev->msgs_num. This resulted in the FIFO never
being emptied by i2c_dw_read. Since the DW_IC_INTR_RX_FULL interrupt is
cleared by emptying the FIFO, this situation results in an IRQ loop
locking up the system.
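
To illustrate the loop described above, here is a minimal user-space sketch.
It is a simplified model and not driver code: the struct and the model_*
functions are hypothetical names, and max_irqs exists only so that the model
terminates where the real system would not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the real driver structures. */
struct model_dev {
	int msgs_num;      /* number of messages in the current transfer */
	int msg_read_idx;  /* next message to receive data into */
	int rx_fifo_level; /* bytes pending in the modelled RX FIFO */
};

/* Models the hardware: RX_FULL stays asserted while the FIFO holds data. */
static bool model_rx_full(const struct model_dev *dev)
{
	return dev->rx_fifo_level > 0;
}

/* Models the read path: drains the FIFO only while messages remain. */
static void model_read(struct model_dev *dev)
{
	if (dev->msg_read_idx >= dev->msgs_num)
		return; /* nothing left to read into: FIFO is left untouched */
	dev->rx_fifo_level = 0;
	dev->msg_read_idx++;
}

/*
 * Runs the modelled handler until the interrupt clears or max_irqs is
 * reached. Returns the number of handler invocations.
 */
int model_irq_count(struct model_dev *dev, int max_irqs)
{
	int n = 0;

	assert(max_irqs > 0);
	while (model_rx_full(dev) && n < max_irqs) {
		model_read(dev);
		n++;
	}
	return n;
}
```

With msg_read_idx < msgs_num the modelled interrupt clears after one handler
call. With msg_read_idx == msgs_num and data in the FIFO, the handler is
re-entered until the max_irqs cap is hit, which corresponds to the IRQ loop
seen on the real system.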

We found that the situation systematically occurs just after the
originating process is interrupted (premature return of
wait_for_completion_interruptible_timeout) and further analysis showed
the race condition: Interrupts from the previous transfer are sometimes
triggered after the initialisation of dev in the beginning of
i2c_dw_xfer, thus corrupting the state. If these interrupts occur before
dev is initialised everything works fine.

An alternative solution would probably be to make sure the hardware is
disabled before initialising the dev structure in i2c_dw_xfer.
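
As a rough sketch of that ordering (again a user-space model, not a patch):
all names below are hypothetical, the "corruption" is reduced to a flag, and
the model assumes that a disabled controller raises no further interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the ordering; not driver code. */
struct model_hw {
	bool enabled;     /* modelled controller enable bit */
	bool irq_pending; /* interrupt left over from the aborted transfer */
};

struct model_state {
	int msgs_num;
	int msg_read_idx;
	bool corrupted;   /* set if a stale interrupt touched fresh state */
};

/* Assumption: a disabled controller raises no further interrupts. */
static void model_disable(struct model_hw *hw)
{
	hw->enabled = false;
	hw->irq_pending = false;
}

/* Models the handler running on a leftover interrupt. */
static void model_stale_irq(struct model_hw *hw, struct model_state *st)
{
	if (!hw->enabled || !hw->irq_pending)
		return;
	hw->irq_pending = false;
	st->corrupted = true;
}

/*
 * Models the start of a transfer. If disable_first is true, the hardware
 * is disabled before the per-transfer state is initialised; otherwise the
 * state is initialised while interrupts may still arrive.
 * Returns true if the fresh state stayed intact.
 */
bool model_xfer_start(struct model_hw *hw, struct model_state *st,
		      int num, bool disable_first)
{
	assert(num > 0);

	if (disable_first)
		model_disable(hw);

	/* Initialise per-transfer state. */
	st->msgs_num = num;
	st->msg_read_idx = 0;
	st->corrupted = false;

	/* Worst case: the leftover interrupt is handled right here. */
	model_stale_irq(hw, st);

	/* Hardware setup for the new transfer. */
	model_disable(hw);
	hw->enabled = true;

	return !st->corrupted;
}
```

In this model, a pending interrupt with disable_first == false reaches the
freshly initialised state, while disable_first == true drops it before the
state is touched. Whether this or disabling at the end of the previous
transfer is preferable in the driver is a separate question.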

Greetings,
  Christian

Patch

diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
index 6c0e776..65c0c7a 100644
--- a/drivers/i2c/busses/i2c-designware-core.c
+++ b/drivers/i2c/busses/i2c-designware-core.c
@@ -588,10 +588,20 @@  i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
 	ret = wait_for_completion_interruptible_timeout(&dev->cmd_complete, HZ);
 	if (ret == 0) {
 		dev_err(dev->dev, "controller timed out\n");
+		/* i2c_dw_init implicitly disables the adapter */
 		i2c_dw_init(dev);
 		ret = -ETIMEDOUT;
 		goto done;
-	} else if (ret < 0)
+	}
+
+	/*
+	 * We must disable the adapter before unlocking the &dev->lock mutex
+	 * below. Otherwise the hardware might continue generating interrupts
+	 * which in turn causes a race condition with the following transfer.
+	 */
+	__i2c_dw_enable(dev, false);
+
+	if (ret < 0)
 		goto done;
 
 	if (dev->msg_err) {
@@ -601,8 +611,6 @@  i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
 
 	/* no error */
 	if (likely(!dev->cmd_err)) {
-		/* Disable the adapter */
-		__i2c_dw_enable(dev, false);
 		ret = num;
 		goto done;
 	}