[v10,4/7] i2c: fsi: Add abort and hardware reset procedures

Message ID 1528918579-27602-5-git-send-email-eajames@linux.vnet.ibm.com
State Superseded
Series
  • i2c: Add FSI-attached I2C master algorithm

Commit Message

Eddie James June 13, 2018, 7:36 p.m.
Add abort procedure for failed transfers. Add engine and bus reset
procedures to recover from as many faults as possible.

Signed-off-by: Eddie James <eajames@linux.vnet.ibm.com>
---
 drivers/i2c/busses/i2c-fsi.c | 179 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)

Comments

Wolfram Sang June 26, 2018, 2:38 a.m. | #1
On Wed, Jun 13, 2018 at 02:36:16PM -0500, Eddie James wrote:
> Add abort procedure for failed transfers. Add engine and bus reset
> procedures to recover from as many faults as possible.

I think this is a way too aggressive recovery. You are doing the 9
pulse toggles basically on any error while this is only when the device
keeps SDA low and you want to recover from that. If SDA is not stuck
low, sending a STOP should do. Or do you have a known case where this is
not going to work?

Also, you implement the pulse toggling manually. Can't you just populate
{get|set}_{scl|sda} and use the generic routine we have in the core?
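
For readers who have not seen that routine, the loop can be sketched roughly as follows. This is a userspace simulation, not the kernel's code: `struct sim_bus`, the `sim_*` helpers, and the return codes are all hypothetical names made up for illustration, and the only behaviour modelled is what this thread describes (up to 9 SCL pulses, bailing out early when SDA goes high or SCL fails to go high).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical simulated bus (not part of i2c-fsi or the I2C core):
 * a slave holds SDA low until it has seen `pulses_needed` rising SCL edges.
 */
struct sim_bus {
	bool scl;		/* current SCL level */
	bool scl_held_low;	/* true if something clamps SCL low */
	int pulses_needed;	/* rising edges until the slave releases SDA */
	int pulses_seen;	/* rising edges generated so far */
};

static void sim_set_scl(struct sim_bus *bus, bool val)
{
	bool next = val && !bus->scl_held_low;

	if (next && !bus->scl)
		bus->pulses_seen++;	/* count each rising edge */
	bus->scl = next;
}

static bool sim_get_scl(const struct sim_bus *bus)
{
	return bus->scl;
}

static bool sim_get_sda(const struct sim_bus *bus)
{
	/* SDA reads high once the slave has seen enough clock pulses */
	return bus->pulses_seen >= bus->pulses_needed;
}

/*
 * Returns the number of pulses sent (0..9) if SDA ends up high,
 * -1 if SCL does not go high, -2 if SDA is still low after 9 pulses.
 */
static int sim_recover(struct sim_bus *bus)
{
	int i;

	assert(bus != NULL);

	for (i = 0; i < 9; i++) {
		if (sim_get_sda(bus))
			break;		/* SDA released: stop pulsing */

		sim_set_scl(bus, false);
		sim_set_scl(bus, true);
		if (!sim_get_scl(bus))
			return -1;	/* SCL stuck low: pulses are impossible */
	}

	return sim_get_sda(bus) ? i : -2;
}
```

In this model a slave that needs three clock pulses is recovered with exactly three, and a bus whose SDA is already high gets no pulses at all, which is the behaviour being asked for above.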
Eddie James June 27, 2018, 1:48 p.m. | #2
On 06/25/2018 09:38 PM, Wolfram Sang wrote:
> On Wed, Jun 13, 2018 at 02:36:16PM -0500, Eddie James wrote:
>> Add abort procedure for failed transfers. Add engine and bus reset
>> procedures to recover from as many faults as possible.
> I think this is a way too aggressive recovery. You are doing the 9
> pulse toggles basically on any error while this is only when the device
> keeps SDA low and you want to recover from that. If SDA is not stuck
> low, sending a STOP should do. Or do you have a known case where this is
> not going to work?

It is aggressive, but I don't see the harm in doing this on every error. 
There are some other error conditions with this hardware which may 
require the clock toggling, such as "bus arbitration lost." I think this 
is the safest option for this hardware, and this routine has been tested 
for many years.

>
> Also, you implement the pulse toggling manually. Can't you just populate
> {get|set}_{scl|sda} and use the generic routine we have in the core?

I see that the generic implementation breaks the loop if it sees the 
clock isn't high after setting it, or if SDA goes high. I think it's 
safer to finish the reset for our hardware. Plus, we actually have 
different registers for setting 0 or 1 to the clock/data, so we save 
some cpu cycles by doing it directly instead of implementing set_scl/sda 
and having to check val every time :)
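
To make the trade-off concrete: with separate set and reset registers, a `set_scl`/`set_sda` style callback reduces to choosing one of two registers based on `val`. The sketch below simulates that in userspace. The register enum, `struct sim_engine`, and the `sim_*` functions are hypothetical stand-ins, not the driver's real register map or the core's callback signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register indices; the real offsets are defined in i2c-fsi.c. */
enum sim_reg {
	SIM_RESET_SCL,
	SIM_SET_SCL,
	SIM_RESET_SDA,
	SIM_SET_SDA,
	SIM_REG_COUNT
};

/* Simulated engine: records register writes and the resulting line levels. */
struct sim_engine {
	unsigned int writes[SIM_REG_COUNT];
	bool scl;
	bool sda;
};

static void sim_write_reg(struct sim_engine *eng, enum sim_reg reg)
{
	assert(reg < SIM_REG_COUNT);
	eng->writes[reg]++;

	switch (reg) {
	case SIM_RESET_SCL:
		eng->scl = false;
		break;
	case SIM_SET_SCL:
		eng->scl = true;
		break;
	case SIM_RESET_SDA:
		eng->sda = false;
		break;
	case SIM_SET_SDA:
		eng->sda = true;
		break;
	default:
		break;
	}
}

/* The "check val every time" step is a single branch per call. */
static void sim_set_scl(struct sim_engine *eng, int val)
{
	sim_write_reg(eng, val ? SIM_SET_SCL : SIM_RESET_SCL);
}

static void sim_set_sda(struct sim_engine *eng, int val)
{
	sim_write_reg(eng, val ? SIM_SET_SDA : SIM_RESET_SDA);
}
```

Each call still issues exactly one register write, so the extra cost over writing the register directly is the one conditional.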

If you feel very strongly that this recovery procedure needs to be 
reduced, then I will work on that and have to do some extensive testing.

Thanks!
Eddie

>
Wolfram Sang July 2, 2018, 6:15 p.m. | #3
Hi Eddie,

> > I think this is a way too aggressive recovery. You are doing the 9
> > pulse toggles basically on any error while this is only when the device
> > keeps SDA low and you want to recover from that. If SDA is not stuck
> > low, sending a STOP should do. Or do you have a known case where this is
> > not going to work?
> 
> It is aggressive, but I don't see the harm in doing this on every error.

Well, as it happens, I just fixed such a case. Please check these patch
series and elinux wiki pages:

===

(new fault injector)
[PATCH v2 0/2] i2c: gpio: fault-injector: add new injector

(actual recovery fix)
[PATCH 0/2] i2c: recovery: make sure pulses are not misinterpreted

===

And here is the new elinux wiki page to describe my findings:

https://elinux.org/Tests:I2C-bus-recovery-write-byte-fix

Also, the previous pages have been updated to reflect the latest status:

https://elinux.org/Tests:I2C-fault-injection
https://elinux.org/Tests:I2C-bus-recovery

To sum it up: This is a proven case where uncontrolled bus recovery can
result in a bogus write!

> There are some other error conditions with this hardware which may require
> the clock toggling, such as "bus arbitration lost." I think this is the

Why is that? In my understanding, recovery is *only* needed when SDA is
stuck low. If SDA is high, sending STOP should do. If not, it needs to
be researched why.

> safest option for this hardware, and this routine has been tested for many
> years.

I remember having a similar argument with Joakim Tjernlund a while ago.
I recently re-read our argument, yet I still keep my position: I don't
want to do $random things to recover, just a tested and well understood
procedure. And in that thread, I was never given a test case.

> > 
> > Also, you implement the pulse toggling manually. Can't you just populate
> > {get|set}_{scl|sda} and use the generic routine we have in the core?
> 
> I see that the generic implementation breaks the loop if it sees the clock
> isn't high after setting it, or if SDA goes high. I think it's safer to
> finish the reset for our hardware. Plus, we actually have different

Why do you think it is safer? What is the test case for that? I think
one really should check SDA! See above, you might trigger a write
otherwise. If this breaks something for you, I am looking forward to
discussing it.

> registers for setting 0 or 1 to the clock/data, so we save some cpu cycles
> by doing it directly instead of implementing set_scl/sda and having to check
> val every time :)

Correctness comes above all here. And I am afraid your implementation is
not correct.

> If you feel very strongly that this recovery procedure needs to be reduced,
> then I will work on that and have to do some extensive testing.

I am open to discussion, yet I also feel strongly about it. The reason
why the recovery procedure is moved into the core is to have one working
and understood bit-banging algorithm which all drivers can rely on. If
all drivers implement their custom version, they might miss gory details
like the above write_byte fix.

I do understand this might cause testing effort for you, I am sorry for
the delay it causes. However, my goal as a maintainer is to have a
reliable recovery mechanism, for your driver as well as for all drivers.

I hope this is understandable. BTW if you want this driver upstream
soon, then it may be an idea to resend it without any bus recovery and
then we can work on it incrementally.

Kind regards and thanks,

   Wolfram
Eddie James July 5, 2018, 6:50 p.m. | #4
On 07/02/2018 01:15 PM, Wolfram Sang wrote:
> Hi Eddie,
>
>>> I think this is a way too aggressive recovery. You are doing the 9
>>> pulse toggles basically on any error while this is only when the device
>>> keeps SDA low and you want to recover from that. If SDA is not stuck
>>> low, sending a STOP should do. Or do you have a known case where this is
>>> not going to work?
>> It is aggressive, but I don't see the harm in doing this on every error.
> Well, as it happens, I just fixed such a case. Please check these patch
> series and elinux wiki pages:
>
> ===
>
> (new fault injector)
> [PATCH v2 0/2] i2c: gpio: fault-injector: add new injector
>
> (actual recovery fix)
> [PATCH 0/2] i2c: recovery: make sure pulses are not misinterpreted
>
> ===
>
> And here is the new elinux wiki page to describe my findings:
>
> https://elinux.org/Tests:I2C-bus-recovery-write-byte-fix
>
> Also, the previous pages have been updated to reflect the latest status:
>
> https://elinux.org/Tests:I2C-fault-injection
> https://elinux.org/Tests:I2C-bus-recovery
>
> To sum it up: This is a proven case where uncontrolled bus recovery can
> result in a bogus write!
>
>> There are some other error conditions with this hardware which may require
>> the clock toggling, such as "bus arbitration lost." I think this is the
> Why is that? In my understanding, recovery is *only* needed when SDA is
> stuck low. If SDA is high, sending STOP should do. If not, it needs to
> be researched why.
>
>> safest option for this hardware, and this routine has been tested for many
>> years.
> I remember having a similar argument with Joakim Tjernlund a while ago.
> I recently re-read our argument, yet I still keep my position: I don't
> want to do $random things to recover, just a tested and well understood
> procedure. And in that thread, I was never given a test case.
>
>>> Also, you implement the pulse toggling manually. Can't you just populate
>>> {get|set}_{scl|sda} and use the generic routine we have in the core?
>> I see that the generic implementation breaks the loop if it sees the clock
>> isn't high after setting it, or if SDA goes high. I think it's safer to
>> finish the reset for our hardware. Plus, we actually have different
> Why do you think it is safer? What is the test case for that? I think
> one really should check SDA! See above, you might trigger a write
> otherwise. If this breaks something for you, I am looking forward to
> discussing it.
>
>> registers for setting 0 or 1 to the clock/data, so we save some cpu cycles
>> by doing it directly instead of implementing set_scl/sda and having to check
>> val every time :)
> Correctness comes above all here. And I am afraid your implementation is
> not correct.
>
>> If you feel very strongly that this recovery procedure needs to be reduced,
>> then I will work on that and have to do some extensive testing.
> I am open to discussion, yet I also feel strongly about it. The reason
> why the recovery procedure is moved into the core is to have one working
> and understood bit-banging algorithm which all drivers can rely on. If
> all drivers implement their custom version, they might miss gory details
> like the above write_byte fix.
>
> I do understand this might cause testing effort for you, I am sorry for
> the delay it causes. However, my goal as a maintainer is to have a
> reliable recovery mechanism, for your driver as well as for all drivers.
>
> I hope this is understandable. BTW if you want this driver upstream
> soon, then it may be an idea to resend it without any bus recovery and
> then we can work on it incrementally.

Thanks for the details. I have sent up a new series which will only do 
the bus reset if SDA is low. With our current hardware configuration, 
this *should* be sufficient to recover all the possible errors. However, 
there are configurations where it will not be enough, in which case 
getting the data line stuck high or clock line stuck either high or low 
can occur, necessitating the full reset. But since I can't demonstrate 
those at the moment, I can't argue to include that now :)
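
The rule agreed on in this thread (bus reset only when SDA is low, otherwise a STOP) can be expressed as a small decision function. This is a sketch under assumptions, not the code from the new series: the `SIM_STAT_*` bit positions and the `sim_action` enum are hypothetical, standing in for the driver's `I2C_STAT_SCL_IN`/`I2C_STAT_SDA_IN` status bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the driver defines its own status bits. */
#define SIM_STAT_SCL_IN	(1u << 0)	/* SCL line currently reads high */
#define SIM_STAT_SDA_IN	(1u << 1)	/* SDA line currently reads high */

enum sim_action {
	SIM_SEND_STOP,	/* SDA is high: a STOP condition should be enough */
	SIM_BUS_RESET	/* SDA is stuck low: clock the bus to free it */
};

/* Pick the recovery action from a sampled status word. */
static enum sim_action sim_pick_recovery(uint32_t stat)
{
	return (stat & SIM_STAT_SDA_IN) ? SIM_SEND_STOP : SIM_BUS_RESET;
}
```

Only the SDA level drives the choice here. The SCL-stuck and SDA-stuck-high cases mentioned above are deliberately left out, since no test case for them exists yet.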

Thanks again,
Eddie

>
> Kind regards and thanks,
>
>     Wolfram
Wolfram Sang July 5, 2018, 10:06 p.m. | #5
Eddie,

> Thanks for the details. I have sent up a new series which will only do the
> bus reset if SDA is low. With our current hardware configuration, this

Thanks.

> *should* be sufficient to recover all the possible errors. However, there
> are configurations where it will not be enough, in which case getting the
> data line stuck high or clock line stuck either high or low can occur,
> necessitating the full reset. But since I can't demonstrate those at the
> moment, I can't argue to include that now :)

For the record, I am *really* interested in these cases. Just from
reading the above I wonder how SDA can get stuck high when being open drain,
and how you will create 9 SCL pulses if SCL is stuck low. But if we have
a test case, we will figure out something together.

Thanks,

   Wolfram

Patch

diff --git a/drivers/i2c/busses/i2c-fsi.c b/drivers/i2c/busses/i2c-fsi.c
index 695818f..4611a0b 100644
--- a/drivers/i2c/busses/i2c-fsi.c
+++ b/drivers/i2c/busses/i2c-fsi.c
@@ -12,10 +12,12 @@ 
 
 #include <linux/bitfield.h>
 #include <linux/bitops.h>
+#include <linux/delay.h>
 #include <linux/device.h>
 #include <linux/errno.h>
 #include <linux/fsi.h>
 #include <linux/i2c.h>
+#include <linux/jiffies.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/module.h>
@@ -128,6 +130,20 @@ 
 #define I2C_ESTAT_SELF_BUSY	BIT(6)
 #define I2C_ESTAT_VERSION	GENMASK(4, 0)
 
+/* port busy register */
+#define I2C_PORT_BUSY_RESET	BIT(31)
+
+/* wait for command complete or data request */
+#define I2C_CMD_SLEEP_MAX_US	500
+#define I2C_CMD_SLEEP_MIN_US	50
+
+/* wait after reset; choose time from legacy driver */
+#define I2C_RESET_SLEEP_MAX_US	2000
+#define I2C_RESET_SLEEP_MIN_US	1000
+
+/* choose timeout length from legacy driver; it's well tested */
+#define I2C_ABORT_TIMEOUT	msecs_to_jiffies(100)
+
 struct fsi_i2c_master {
 	struct fsi_device	*fsi;
 	u8			fifo_size;
@@ -214,6 +230,169 @@  static int fsi_i2c_set_port(struct fsi_i2c_port *port)
 	return fsi_i2c_write_reg(fsi, I2C_FSI_RESET_ERR, &dummy);
 }
 
+static int fsi_i2c_reset_bus(struct fsi_i2c_master *i2c)
+{
+	int i, rc;
+	u32 mode, stat, ext, dummy = 0;
+
+	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_MODE, &mode);
+	if (rc)
+		return rc;
+
+	mode |= I2C_MODE_DIAG;
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_MODE, &mode);
+	if (rc)
+		return rc;
+
+	for (i = 0; i < 9; i++) {
+		rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_SCL, &dummy);
+		if (rc)
+			return rc;
+
+		rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_SET_SCL, &dummy);
+		if (rc)
+			return rc;
+	}
+
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_SCL, &dummy);
+	if (rc)
+		return rc;
+
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_SDA, &dummy);
+	if (rc)
+		return rc;
+
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_SET_SCL, &dummy);
+	if (rc)
+		return rc;
+
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_SET_SDA, &dummy);
+	if (rc)
+		return rc;
+
+	mode &= ~I2C_MODE_DIAG;
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_MODE, &mode);
+	if (rc)
+		return rc;
+
+	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_STAT, &stat);
+	if (rc)
+		return rc;
+
+	/* check for hardware fault */
+	if (!(stat & I2C_STAT_SCL_IN) || !(stat & I2C_STAT_SDA_IN)) {
+		rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_ESTAT, &ext);
+		if (rc)
+			return rc;
+
+		dev_err(&i2c->fsi->dev, "bus stuck status[%08X] ext[%08X]\n",
+			stat, ext);
+	}
+
+	return 0;
+}
+
+static int fsi_i2c_reset(struct fsi_i2c_master *i2c, u16 port)
+{
+	int rc;
+	u32 mode, stat, dummy = 0;
+
+	/* reset engine */
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
+	if (rc)
+		return rc;
+
+	/* re-init engine */
+	rc = fsi_i2c_dev_init(i2c);
+	if (rc)
+		return rc;
+
+	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_MODE, &mode);
+	if (rc)
+		return rc;
+
+	/* set port; default after reset is 0 */
+	if (port) {
+		mode &= ~I2C_MODE_PORT;
+		mode |= FIELD_PREP(I2C_MODE_PORT, port);
+		rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_MODE, &mode);
+		if (rc)
+			return rc;
+	}
+
+	/* reset busy register; hw workaround */
+	dummy = I2C_PORT_BUSY_RESET;
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_PORT_BUSY, &dummy);
+	if (rc)
+		return rc;
+
+	/* force bus reset */
+	rc = fsi_i2c_reset_bus(i2c);
+	if (rc)
+		return rc;
+
+	/* reset errors */
+	dummy = 0;
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_ERR, &dummy);
+	if (rc)
+		return rc;
+
+	/* wait for command complete */
+	usleep_range(I2C_RESET_SLEEP_MIN_US, I2C_RESET_SLEEP_MAX_US);
+
+	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_STAT, &stat);
+	if (rc)
+		return rc;
+
+	if (stat & I2C_STAT_CMD_COMP)
+		return rc;
+
+	/* failed to get command complete; reset engine again */
+	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
+	if (rc)
+		return rc;
+
+	/* re-init engine again */
+	return fsi_i2c_dev_init(i2c);
+}
+
+static int fsi_i2c_abort(struct fsi_i2c_port *port, u32 status)
+{
+	int rc;
+	unsigned long start;
+	u32 cmd = I2C_CMD_WITH_STOP;
+	struct fsi_device *fsi = port->master->fsi;
+
+	rc = fsi_i2c_reset(port->master, port->port);
+	if (rc)
+		return rc;
+
+	/* skip final stop command for these errors */
+	if (status & (I2C_STAT_PARITY | I2C_STAT_LOST_ARB | I2C_STAT_STOP_ERR))
+		return 0;
+
+	/* write stop command */
+	rc = fsi_i2c_write_reg(fsi, I2C_FSI_CMD, &cmd);
+	if (rc)
+		return rc;
+
+	/* wait until we see command complete in the master */
+	start = jiffies;
+
+	do {
+		rc = fsi_i2c_read_reg(fsi, I2C_FSI_STAT, &status);
+		if (rc)
+			return rc;
+
+		if (status & I2C_STAT_CMD_COMP)
+			return 0;
+
+		usleep_range(I2C_CMD_SLEEP_MIN_US, I2C_CMD_SLEEP_MAX_US);
+	} while (time_after(start + I2C_ABORT_TIMEOUT, jiffies));
+
+	return -ETIMEDOUT;
+}
+
 static int fsi_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
 			int num)
 {