diff mbox

[RFC] i2c-designware-core: disable adapter before fill dev structure

Message ID 1370597401-22501-1-git-send-email-andriy.shevchenko@linux.intel.com
State Rejected
Headers show

Commit Message

Andy Shevchenko June 7, 2013, 9:30 a.m. UTC
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/i2c/busses/i2c-designware-core.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

Comments

Christian Ruppert June 7, 2013, 1:01 p.m. UTC | #1
Hi Andy,

I like your patch and I just did some testing on it. Unluckily, it
doesn't solve the lock-up problem.

I haven't investigated any further but I suspect that on top of the
cases I observed when debugging this (interrupts after initialisation of
dev, easy to prove) there are more obscure cases in which interrupts are
generated in an unexpected manner after errors. The interrupt-driven
transfer state machine of the driver implicitly relies on the fact that
all updates of dev which are related to the same transfer are performed
between the mutex_lock and mutex_unlock calls in i2c_dw_xfer. Thus, I
decided it was safer to disable the block before releasing the mutex
when I wrote my patch.

That said, I think the sequencing at transfer initialisation is more
logical with your patch and I wonder if it is still worth applying. Any
other opinions on this?

Greetings,
  Christian

On Fri, Jun 07, 2013 at 12:30:01PM +0300, Andy Shevchenko wrote:
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  drivers/i2c/busses/i2c-designware-core.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
> index c41ca63..a1d3d95 100644
> --- a/drivers/i2c/busses/i2c-designware-core.c
> +++ b/drivers/i2c/busses/i2c-designware-core.c
> @@ -366,9 +366,6 @@ static void i2c_dw_xfer_init(struct dw_i2c_dev *dev)
>  	struct i2c_msg *msgs = dev->msgs;
>  	u32 ic_con;
>  
> -	/* Disable the adapter */
> -	__i2c_dw_enable(dev, false);
> -
>  	/* set the slave (target) address */
>  	dw_writel(dev, msgs[dev->msg_write_idx].addr, DW_IC_TAR);
>  
> @@ -561,6 +558,13 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>  	mutex_lock(&dev->lock);
>  	pm_runtime_get_sync(dev->dev);
>  
> +	ret = i2c_dw_wait_bus_not_busy(dev);
> +	if (ret < 0)
> +		goto done;
> +
> +	/* Disable the adapter */
> +	__i2c_dw_enable(dev, false);
> +
>  	INIT_COMPLETION(dev->cmd_complete);
>  	dev->msgs = msgs;
>  	dev->msgs_num = num;
> @@ -572,10 +576,6 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>  	dev->abort_source = 0;
>  	dev->rx_outstanding = 0;
>  
> -	ret = i2c_dw_wait_bus_not_busy(dev);
> -	if (ret < 0)
> -		goto done;
> -
>  	/* start the transfers */
>  	i2c_dw_xfer_init(dev);
>  
> -- 
> 1.8.2.rc0.22.gb3600c3
>
Wolfram Sang June 11, 2013, 6:40 p.m. UTC | #2
On Fri, Jun 07, 2013 at 03:01:34PM +0200, Christian Ruppert wrote:
> Hi Andy,
> 
> I like your patch and I just did some testing on it. Unluckily, it
> doesn't solve the lock-up problem.
> 
> I haven't investigated any further but I suspect that on top of the
> cases I observed when debugging this (interrupts after initialisation of
> dev, easy to prove) there are more obscure cases in which interrupts are
> generated in an unexpected manner after errors. The interrupt-driven
> transfer state machine of the driver implicitly relies on the fact that
> all updates of dev which are related to the same transfer are performed
> between the mutex_lock and mutex_unlock calls in i2c_dw_xfer. Thus, I
> decided it was safer to disable the block before releasing the mutex
> when I wrote my patch.
> 
> That said, I think the sequencing at transfer initialisation is more
> logical with your patch and I wonder if it is still worth applying. Any
> other opinions on this?

Ping. There are:

[V2] i2c: designware: fix race between subsequent xfers
[RFC] i2c-designware-core: disable adapter before fill dev structure

What is the consensus of those two patches?
Christian Ruppert June 12, 2013, 9:41 a.m. UTC | #3
On Tue, Jun 11, 2013 at 08:40:37PM +0200, Wolfram Sang wrote:
> On Fri, Jun 07, 2013 at 03:01:34PM +0200, Christian Ruppert wrote:
> > Hi Andy,
> > 
> > I like your patch and I just did some testing on it. Unluckily, it
> > doesn't solve the lock-up problem.
> > 
> > I haven't investigated any further but I suspect that on top of the
> > cases I observed when debugging this (interrupts after initialisation of
> > dev, easy to prove) there are more obscure cases in which interrupts are
> > generated in an unexpected manner after errors. The interrupt-driven
> > transfer state machine of the driver implicitly relies on the fact that
> > all updates of dev which are related to the same transfer are performed
> > between the mutex_lock and mutex_unlock calls in i2c_dw_xfer. Thus, I
> > decided it was safer to disable the block before releasing the mutex
> > when I wrote my patch.
> > 
> > That said, I think the sequencing at transfer initialisation is more
> > logical with your patch and I wonder if it is still worth applying. Any
> > other opinions on this?
> 
> Ping. There are:
> 
> [V2] i2c: designware: fix race between subsequent xfers
> [RFC] i2c-designware-core: disable adapter before fill dev structure
> 
> What is the consensus of those two patches?

Although I almost like Andy's code better than my own it doesn't seem to
fix the issue we're aiming at (system lock ups due to undesired
interrupts) in all cases. The "[V2] i2c: designware: fix race between
subsequent xfers" is currently the only patch preventing those lock ups
in our tests.

That said, IMHO Andy's patch seems to be a valuable code clean up
nevertheless and could be applied in addition to the other. I suggest I
give the combination of both patches some additional testing on the
occasion and tag Andy's RFC with tested-by me in case it's stable.

Greetings,
  Christian
Christian Ruppert June 13, 2013, 8:16 a.m. UTC | #4
Hello,

As promised, I gave this one some over-night stress testing and I can
confirm what I said previously:

- The patch does _not_ solve the interrupt loop lockups on its own.
- The patch works well in conjunction with
  http://patchwork.ozlabs.org/patch/249622/ (which in turn depends on
  Mika's patch). Under this condition you can assume
  Tested-By: Christian Ruppert <christian.ruppert@abilis.com>

I still think the code is more logical with this patch than without it
and I am in favour of applying both (if Andy agrees that is). We must
keep in mind, however, that http://patchwork.ozlabs.org/patch/249622
does fix a real problem we can observe on our chip and for our TB10x
product we do require some form of it for stability reasons.

Greetings,
  Christian

On Fri, Jun 07, 2013 at 12:30:01PM +0300, Andy Shevchenko wrote:
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  drivers/i2c/busses/i2c-designware-core.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
> index c41ca63..a1d3d95 100644
> --- a/drivers/i2c/busses/i2c-designware-core.c
> +++ b/drivers/i2c/busses/i2c-designware-core.c
> @@ -366,9 +366,6 @@ static void i2c_dw_xfer_init(struct dw_i2c_dev *dev)
>  	struct i2c_msg *msgs = dev->msgs;
>  	u32 ic_con;
>  
> -	/* Disable the adapter */
> -	__i2c_dw_enable(dev, false);
> -
>  	/* set the slave (target) address */
>  	dw_writel(dev, msgs[dev->msg_write_idx].addr, DW_IC_TAR);
>  
> @@ -561,6 +558,13 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>  	mutex_lock(&dev->lock);
>  	pm_runtime_get_sync(dev->dev);
>  
> +	ret = i2c_dw_wait_bus_not_busy(dev);
> +	if (ret < 0)
> +		goto done;
> +
> +	/* Disable the adapter */
> +	__i2c_dw_enable(dev, false);
> +
>  	INIT_COMPLETION(dev->cmd_complete);
>  	dev->msgs = msgs;
>  	dev->msgs_num = num;
> @@ -572,10 +576,6 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
>  	dev->abort_source = 0;
>  	dev->rx_outstanding = 0;
>  
> -	ret = i2c_dw_wait_bus_not_busy(dev);
> -	if (ret < 0)
> -		goto done;
> -
>  	/* start the transfers */
>  	i2c_dw_xfer_init(dev);
>  
> -- 
> 1.8.2.rc0.22.gb3600c3
>
Andy Shevchenko June 13, 2013, 8:33 a.m. UTC | #5
On Thu, 2013-06-13 at 10:16 +0200, Christian Ruppert wrote: 
> As promised, I gave this one some over-night stress testing and I can
> confirm what I said previously:
> 
> - The patch does _not_ solve the interrupt loop lockups on its own.

So, it just means my assumptions about what is happening there were
wrong.

> - The patch works well in conjunction with
>   http://patchwork.ozlabs.org/patch/249622/ (which in turn depends on
>   Mika's patch). Under this condition you can assume
>   Tested-By: Christian Ruppert <christian.ruppert@abilis.com>
> 
> I still think the code is more logical with this patch than without it
> and I am in favour of applying both (if Andy agrees that is).

Since my patch doesn't fix anything, I think we may drop it away.

>  We must keep in mind, however, that http://patchwork.ozlabs.org/patch/249622
> does fix a real problem we can observe on our chip and for our TB10x
> product we do require some form of it for stability reasons.

I feel like a real fix is to add a memory barier to make changes in the
structure consistent. However, I have no clue where.
Christian Ruppert June 13, 2013, 8:58 a.m. UTC | #6
On Thu, Jun 13, 2013 at 11:33:56AM +0300, Andy Shevchenko wrote:
> On Thu, 2013-06-13 at 10:16 +0200, Christian Ruppert wrote: 
> > As promised, I gave this one some over-night stress testing and I can
> > confirm what I said previously:
> > 
> > - The patch does _not_ solve the interrupt loop lockups on its own.
> 
> So, it just means my assumptions about what is happening there were
> wrong.

So were my initial ones... Or at least insufficient.

> > - The patch works well in conjunction with
> >   http://patchwork.ozlabs.org/patch/249622/ (which in turn depends on
> >   Mika's patch). Under this condition you can assume
> >   Tested-By: Christian Ruppert <christian.ruppert@abilis.com>
> > 
> > I still think the code is more logical with this patch than without it
> > and I am in favour of applying both (if Andy agrees that is).
> 
> Since my patch doesn't fix anything, I think we may drop it away.
> 
> >  We must keep in mind, however, that http://patchwork.ozlabs.org/patch/249622
> > does fix a real problem we can observe on our chip and for our TB10x
> > product we do require some form of it for stability reasons.
> 
> I feel like a real fix is to add a memory barier to make changes in the
> structure consistent. However, I have no clue where.

I'm still not sure about the interrupt behaviour of the dw-i2c block in
the case of error (and since our problem is fixed it's difficult to
justify spending time to investigate further). I suspect that the thing
in some situations sends spurious interrupts which confuse the state
machine - in which case memory barriers won't help us either.

Greetings,
  Christian
diff mbox

Patch

diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
index c41ca63..a1d3d95 100644
--- a/drivers/i2c/busses/i2c-designware-core.c
+++ b/drivers/i2c/busses/i2c-designware-core.c
@@ -366,9 +366,6 @@  static void i2c_dw_xfer_init(struct dw_i2c_dev *dev)
 	struct i2c_msg *msgs = dev->msgs;
 	u32 ic_con;
 
-	/* Disable the adapter */
-	__i2c_dw_enable(dev, false);
-
 	/* set the slave (target) address */
 	dw_writel(dev, msgs[dev->msg_write_idx].addr, DW_IC_TAR);
 
@@ -561,6 +558,13 @@  i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
 	mutex_lock(&dev->lock);
 	pm_runtime_get_sync(dev->dev);
 
+	ret = i2c_dw_wait_bus_not_busy(dev);
+	if (ret < 0)
+		goto done;
+
+	/* Disable the adapter */
+	__i2c_dw_enable(dev, false);
+
 	INIT_COMPLETION(dev->cmd_complete);
 	dev->msgs = msgs;
 	dev->msgs_num = num;
@@ -572,10 +576,6 @@  i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
 	dev->abort_source = 0;
 	dev->rx_outstanding = 0;
 
-	ret = i2c_dw_wait_bus_not_busy(dev);
-	if (ret < 0)
-		goto done;
-
 	/* start the transfers */
 	i2c_dw_xfer_init(dev);