diff mbox

[v3,2/4] i2c: omap: implement workaround for handling invalid BB-bit values

Message ID 1416685634-5864-3-git-send-email-al.kochet@gmail.com
State Accepted
Headers show

Commit Message

Alexander Kochetkov Nov. 22, 2014, 7:47 p.m. UTC
In a multimaster environment, after IP software reset, BB-bit value doesn't
correspond to the current bus state. It may happen what BB-bit will be 0,
while the bus is busy due to another I2C master activity.

Any transfer started when BB=0 and bus is busy wouldn't be completed by IP
and results in controller timeout. More over, in some cases IP could
interrupt another master's transfer and corrupt data on wire.

The commit implement method allowing to prevent IP from entering into
"controller timeout" state and from "data corruption" state.

The one drawback is the need to wait for 10ms before the first transfer.

Tested on Beagleboard XM C.
Tested on BBB and AM437x Starter Kit by Felipe Balbi.

Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Tested-by: Felipe Balbi <balbi@ti.com>
Reviewed-by: Felipe Balbi <balbi@ti.com>
---
 drivers/i2c/busses/i2c-omap.c |  103 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

Comments

Kevin Hilman Nov. 24, 2014, 7:08 p.m. UTC | #1
On Sat, Nov 22, 2014 at 11:47 AM, Alexander Kochetkov
<al.kochet@gmail.com> wrote:
> In a multimaster environment, after IP software reset, BB-bit value doesn't
> correspond to the current bus state. It may happen what BB-bit will be 0,
> while the bus is busy due to another I2C master activity.
>
> Any transfer started when BB=0 and bus is busy wouldn't be completed by IP
> and results in controller timeout. More over, in some cases IP could
> interrupt another master's transfer and corrupt data on wire.
>
> The commit implement method allowing to prevent IP from entering into
> "controller timeout" state and from "data corruption" state.
>
> The one drawback is the need to wait for 10ms before the first transfer.
>
> Tested on Beagleboard XM C.
> Tested on BBB and AM437x Starter Kit by Felipe Balbi.
>
> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
> Tested-by: Felipe Balbi <balbi@ti.com>
> Reviewed-by: Felipe Balbi <balbi@ti.com>

This patch recently hit linux-next (as commit 903c3859f77f) and boot
breakage[1] in next-20141124 on OMAP3530 Beagle and Overo/Tobi boards
was bisected down to this commit.

Kevin

[1] http://status.armcloud.us/boot/?next-20141124&omap
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felipe Balbi Nov. 24, 2014, 7:10 p.m. UTC | #2
On Mon, Nov 24, 2014 at 11:08:48AM -0800, Kevin Hilman wrote:
> On Sat, Nov 22, 2014 at 11:47 AM, Alexander Kochetkov
> <al.kochet@gmail.com> wrote:
> > In a multimaster environment, after IP software reset, BB-bit value doesn't
> > correspond to the current bus state. It may happen what BB-bit will be 0,
> > while the bus is busy due to another I2C master activity.
> >
> > Any transfer started when BB=0 and bus is busy wouldn't be completed by IP
> > and results in controller timeout. More over, in some cases IP could
> > interrupt another master's transfer and corrupt data on wire.
> >
> > The commit implement method allowing to prevent IP from entering into
> > "controller timeout" state and from "data corruption" state.
> >
> > The one drawback is the need to wait for 10ms before the first transfer.
> >
> > Tested on Beagleboard XM C.
> > Tested on BBB and AM437x Starter Kit by Felipe Balbi.
> >
> > Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
> > Tested-by: Felipe Balbi <balbi@ti.com>
> > Reviewed-by: Felipe Balbi <balbi@ti.com>
> 
> This patch recently hit linux-next (as commit 903c3859f77f) and boot
> breakage[1] in next-20141124 on OMAP3530 Beagle and Overo/Tobi boards
> was bisected down to this commit.
> 
> Kevin
> 
> [1] http://status.armcloud.us/boot/?next-20141124&omap

btw, based on Kevin's boot test, only OMAP3530 is failing.
Wolfram Sang Nov. 24, 2014, 7:13 p.m. UTC | #3
On Mon, Nov 24, 2014 at 01:10:23PM -0600, Felipe Balbi wrote:
> On Mon, Nov 24, 2014 at 11:08:48AM -0800, Kevin Hilman wrote:
> > On Sat, Nov 22, 2014 at 11:47 AM, Alexander Kochetkov
> > <al.kochet@gmail.com> wrote:
> > > In a multimaster environment, after IP software reset, BB-bit value doesn't
> > > correspond to the current bus state. It may happen what BB-bit will be 0,
> > > while the bus is busy due to another I2C master activity.
> > >
> > > Any transfer started when BB=0 and bus is busy wouldn't be completed by IP
> > > and results in controller timeout. More over, in some cases IP could
> > > interrupt another master's transfer and corrupt data on wire.
> > >
> > > The commit implement method allowing to prevent IP from entering into
> > > "controller timeout" state and from "data corruption" state.
> > >
> > > The one drawback is the need to wait for 10ms before the first transfer.
> > >
> > > Tested on Beagleboard XM C.
> > > Tested on BBB and AM437x Starter Kit by Felipe Balbi.
> > >
> > > Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
> > > Tested-by: Felipe Balbi <balbi@ti.com>
> > > Reviewed-by: Felipe Balbi <balbi@ti.com>
> > 
> > This patch recently hit linux-next (as commit 903c3859f77f) and boot
> > breakage[1] in next-20141124 on OMAP3530 Beagle and Overo/Tobi boards
> > was bisected down to this commit.
> > 
> > Kevin
> > 
> > [1] http://status.armcloud.us/boot/?next-20141124&omap
> 
> btw, based on Kevin's boot test, only OMAP3530 is failing.

OK, giving Alexander some time for a fix. If it can't be found, I'll
revert this patch. Thanks!
Tony Lindgren Nov. 24, 2014, 7:25 p.m. UTC | #4
* Wolfram Sang <wsa@the-dreams.de> [141124 11:14]:
> On Mon, Nov 24, 2014 at 01:10:23PM -0600, Felipe Balbi wrote:
> > On Mon, Nov 24, 2014 at 11:08:48AM -0800, Kevin Hilman wrote:
> > > On Sat, Nov 22, 2014 at 11:47 AM, Alexander Kochetkov
> > > <al.kochet@gmail.com> wrote:
> > > > In a multimaster environment, after IP software reset, BB-bit value doesn't
> > > > correspond to the current bus state. It may happen what BB-bit will be 0,
> > > > while the bus is busy due to another I2C master activity.
> > > >
> > > > Any transfer started when BB=0 and bus is busy wouldn't be completed by IP
> > > > and results in controller timeout. More over, in some cases IP could
> > > > interrupt another master's transfer and corrupt data on wire.
> > > >
> > > > The commit implement method allowing to prevent IP from entering into
> > > > "controller timeout" state and from "data corruption" state.
> > > >
> > > > The one drawback is the need to wait for 10ms before the first transfer.
> > > >
> > > > Tested on Beagleboard XM C.
> > > > Tested on BBB and AM437x Starter Kit by Felipe Balbi.
> > > >
> > > > Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
> > > > Tested-by: Felipe Balbi <balbi@ti.com>
> > > > Reviewed-by: Felipe Balbi <balbi@ti.com>
> > > 
> > > This patch recently hit linux-next (as commit 903c3859f77f) and boot
> > > breakage[1] in next-20141124 on OMAP3530 Beagle and Overo/Tobi boards
> > > was bisected down to this commit.
> > > 
> > > Kevin
> > > 
> > > [1] http://status.armcloud.us/boot/?next-20141124&omap
> > 
> > btw, based on Kevin's boot test, only OMAP3530 is failing.
> 
> OK, giving Alexander some time for a fix. If it can't be found, I'll
> revert this patch. Thanks!

I can confirm reverting it fixes things for me as well thanks.

Regards,

Tony

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Kochetkov Nov. 24, 2014, 7:39 p.m. UTC | #5
24 нояб. 2014 г., в 22:08, Kevin Hilman <khilman@kernel.org> написал(а):

> This patch recently hit linux-next (as commit 903c3859f77f) and boot
> breakage[1] in next-20141124 on OMAP3530 Beagle and Overo/Tobi boards
> was bisected down to this commit.
> 
> Kevin
> 
> [1] http://status.armcloud.us/boot/?next-20141124&omap


Could someone advice with debugging on OMAP3530?

I'd like to apply one one more commit to see events during boot and
see actual signals on wire (during of timeout)?

Otherwise, commit must dropped (or rewritten such way it is disabled by default).


24 нояб. 2014 г., в 22:13, Wolfram Sang <wsa@the-dreams.de> написал(а):

> OK, giving Alexander some time for a fix. If it can't be found, I'll
> revert this patch. Thanks!

What deadline?

Alexander.



--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tony Lindgren Nov. 24, 2014, 7:47 p.m. UTC | #6
* Alexander Kochetkov <al.kochet@gmail.com> [141124 11:41]:
> 
> 24 нояб. 2014 г., в 22:08, Kevin Hilman <khilman@kernel.org> написал(а):
> 
> > This patch recently hit linux-next (as commit 903c3859f77f) and boot
> > breakage[1] in next-20141124 on OMAP3530 Beagle and Overo/Tobi boards
> > was bisected down to this commit.
> > 
> > Kevin
> > 
> > [1] http://status.armcloud.us/boot/?next-20141124&omap
> 
> 
> Could someone advice with debugging on OMAP3530?

Just try to boot it with your patch? :)
 
> I'd like to apply one one more commit to see events during boot and
> see actual signals on wire (during of timeout)?

If you post a patch, I can test boot with it. Looks like the failing
ones have i2c rev3.3 on 34xx vs rev4.4 on 36xx.

But the test function should not loop forever in any case I take?

So I doubt we just want to change the test for for a higher rev number
for  OMAP_I2C_REV_ON_3430_3530.
 
> Otherwise, commit must dropped (or rewritten such way it is disabled by default).
> 
> 
> 24 нояб. 2014 г., в 22:13, Wolfram Sang <wsa@the-dreams.de> написал(а):
> 
> > OK, giving Alexander some time for a fix. If it can't be found, I'll
> > revert this patch. Thanks!
> 
> What deadline?

I'd say by Tuesday as we have multiple automated test systems failing
as soon as people update their trees. And when they are failing, we
may miss other breakage happening in linux next.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Kochetkov Nov. 24, 2014, 8:05 p.m. UTC | #7
Guys, I really sorry for that :(

> Just try to boot it with your patch? :)

Yes, may be two-three times (max).
Something (u-boot, may be) leave the bus in the wrong state.
Really strange.

> But the test function should not loop forever in any case I take?

It doesn't loop forever. It finish with message:
"[    3.046691] omap_i2c 48070000.i2c: timeout waiting for bus ready"

> So I doubt we just want to change the test for for a higher rev number
> for  OMAP_I2C_REV_ON_3430_3530.


Yes, I'll do it within 15 minutes.
And later, if it possible, I'd try to see what happens.

Alexander.

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Kochetkov Nov. 24, 2014, 9:08 p.m. UTC | #8
24 нояб. 2014 г., в 22:47, Tony Lindgren <tony@atomide.com> написал(а):

> I'd say by Tuesday as we have multiple automated test systems failing
> as soon as people update their trees. And when they are failing, we
> may miss other breakage happening in linux next.

Fix:
http://marc.info/?l=linux-i2c&m=141686120720069&w=2


--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Kochetkov Nov. 25, 2014, 12:09 a.m. UTC | #9
24 нояб. 2014 г., в 23:05, Alexander Kochetkov <al.kochet@gmail.com> написал(а):

> Something (u-boot, may be) leave the bus in the wrong state.
> Really strange.

Actually something wrong with i2c-pullups on i2c.1 bus on fault boards.
May be these are boards without pull-ups?
All beagles doesn't have internal pull-ups on i2c.1 since u-boot 2011.x.

Here is the bug in the u-boot related to beagle:
http://git.denx.de/?p=u-boot.git;a=commit;h=04e2a13336f0e507ef416bbede3be92b79c46594

Yes, I made fix, but keep that in mind.

For example one of the boards (omap3-beagle):
http://status.armcloud.us/boot/omap3-beagle/job/next/kernel/next-20141124/defconfig/arm-omap2plus_defconfig/
http://status.armcloud.us/boot/omap3-beagle,legacy/job/next/kernel/next-20141124/defconfig/arm-omap2plus_defconfig/
http://status.armcloud.us/boot/omap3-beagle/job/next/kernel/next-20141124/defconfig/arm-multi_v7_defconfig/

has following warning message in the u-boot log:
> U-Boot 2014.07 (Aug 21 2014 - 11:03:05)
> 
> OMAP3530-GP ES3.0, CPU-OPP2, L3-165MHz, Max CPU Clock 600 MHz
> OMAP3 Beagle board + LPDDR/NAND
> ....
> Beagle Rev C1/C2/C3
> Timed out in wait_for_event: status=1000
> Check if pads/pull-ups of bus 1 are properly configured

Also beagle schematic has following log entry for A3:
4. Added optional pullup resistors on I2C2_SCL and I2C_SDA into the layout.

Is the fault beagle is pre A3 revision?

I can't tell anything about second one board (omap3-overo-tobi), because I could not get it schematic.

And how you have i2c.1 working without pull-ups I don't know.

Alexander.

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tony Lindgren Nov. 25, 2014, 3:41 p.m. UTC | #10
* Alexander Kochetkov <al.kochet@gmail.com> [141124 16:11]:
> 
> 24 нояб. 2014 г., в 23:05, Alexander Kochetkov <al.kochet@gmail.com> написал(а):
> 
> > Something (u-boot, may be) leave the bus in the wrong state.
> > Really strange.
> 
> Actually something wrong with i2c-pullups on i2c.1 bus on fault boards.
> May be these are boards without pull-ups?

It could be. Boards that use external i2c pulls should have all the
internal pulls disabled as the pulls get connected in parallel and
the total resistance decreases. And then the i2c signals may not be
up to the spec.

Note that there are internal pulls in the i2c controller, and in the
i2c padconf registers.

> All beagles doesn't have internal pull-ups on i2c.1 since u-boot 2011.x.

The pulls should be enabled based on the board and possibly based
on the board revision.
 
> Here is the bug in the u-boot related to beagle:
> http://git.denx.de/?p=u-boot.git;a=commit;h=04e2a13336f0e507ef416bbede3be92b79c46594
> 
> Yes, I made fix, but keep that in mind.
> 
> For example one of the boards (omap3-beagle):
> http://status.armcloud.us/boot/omap3-beagle/job/next/kernel/next-20141124/defconfig/arm-omap2plus_defconfig/
> http://status.armcloud.us/boot/omap3-beagle,legacy/job/next/kernel/next-20141124/defconfig/arm-omap2plus_defconfig/
> http://status.armcloud.us/boot/omap3-beagle/job/next/kernel/next-20141124/defconfig/arm-multi_v7_defconfig/
> 
> has following warning message in the u-boot log:
> > U-Boot 2014.07 (Aug 21 2014 - 11:03:05)
> > 
> > OMAP3530-GP ES3.0, CPU-OPP2, L3-165MHz, Max CPU Clock 600 MHz
> > OMAP3 Beagle board + LPDDR/NAND
> > ....
> > Beagle Rev C1/C2/C3
> > Timed out in wait_for_event: status=1000
> > Check if pads/pull-ups of bus 1 are properly configured
> 
> Also beagle schematic has following log entry for A3:
> 4. Added optional pullup resistors on I2C2_SCL and I2C_SDA into the layout.
> 
> Is the fault beagle is pre A3 revision?

Seems to be C1/C2/C3 based on the above logs?
 
> I can't tell anything about second one board (omap3-overo-tobi), because I could not get it schematic.
> 
> And how you have i2c.1 working without pull-ups I don't know.

I checked on the LDP and it seems to have external pulls for i2c.
I also tested n900 without your fix, and it's failing too. On n900
there are external pulls and all the internal pulls should be
disabled.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Kochetkov Nov. 26, 2014, 10:30 p.m. UTC | #11
Hello, Tony!

24.11.2014 22:47, Tony Lindgren <tony@atomide.com> *:

> If you post a patch, I can test boot with it. Looks like the failing
> ones have i2c rev3.3 on 34xx vs rev4.4 on 36xx.

> So I doubt we just want to change the test for for a higher rev number
> for  OMAP_I2C_REV_ON_3430_3530.

You 100% right here.

Functional mode bits are unimplemented in the SYSTEST register on omap3530.
"10:5 Reserved Write 0s for future compatibility. Read returns 0."
That was the reason.

My fault.
Thank you for giving right directions.
Thanks Kevin for running test, so I could debug the reason.

Regards,
Alexander.

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
index adfac9b..a238a1d 100644
--- a/drivers/i2c/busses/i2c-omap.c
+++ b/drivers/i2c/busses/i2c-omap.c
@@ -54,6 +54,9 @@ 
 /* timeout for pm runtime autosuspend */
 #define OMAP_I2C_PM_TIMEOUT		1000	/* ms */
 
+/* timeout for making decision on bus free status */
+#define OMAP_I2C_BUS_FREE_TIMEOUT (msecs_to_jiffies(10))
+
 /* For OMAP3 I2C_IV has changed to I2C_WE (wakeup enable) */
 enum {
 	OMAP_I2C_REV_REG = 0,
@@ -206,6 +209,9 @@  struct omap_i2c_dev {
 						 */
 	u32			rev;
 	unsigned		b_hw:1;		/* bad h/w fixes */
+	unsigned		bb_valid:1;	/* true when BB-bit reflects
+						 * the I2C bus state
+						 */
 	unsigned		receiver:1;	/* true when we're in receiver mode */
 	u16			iestate;	/* Saved interrupt register */
 	u16			pscstate;
@@ -332,7 +338,10 @@  static int omap_i2c_reset(struct omap_i2c_dev *dev)
 		/* SYSC register is cleared by the reset; rewrite it */
 		omap_i2c_write_reg(dev, OMAP_I2C_SYSC_REG, sysc);
 
+		/* Schedule I2C-bus monitoring on the next transfer */
+		dev->bb_valid = 0;
 	}
+
 	return 0;
 }
 
@@ -445,6 +454,11 @@  static int omap_i2c_init(struct omap_i2c_dev *dev)
 	dev->scllstate = scll;
 	dev->sclhstate = sclh;
 
+	if (dev->rev < OMAP_I2C_OMAP1_REV_2) {
+		/* Not implemented */
+		dev->bb_valid = 1;
+	}
+
 	__omap_i2c_init(dev);
 
 	return 0;
@@ -469,6 +483,91 @@  static int omap_i2c_wait_for_bb(struct omap_i2c_dev *dev)
 	return 0;
 }
 
+/*
+ * Wait while BB-bit doesn't reflect the I2C bus state
+ *
+ * In a multimaster environment, after IP software reset, BB-bit value doesn't
+ * correspond to the current bus state. It may happen what BB-bit will be 0,
+ * while the bus is busy due to another I2C master activity.
+ * Here are BB-bit values after reset:
+ *     SDA   SCL   BB   NOTES
+ *       0     0    0   1, 2
+ *       1     0    0   1, 2
+ *       0     1    1
+ *       1     1    0   3
+ * Later, if IP detect SDA=0 and SCL=1 (ACK) or SDA 1->0 while SCL=1 (START)
+ * combinations on the bus, it set BB-bit to 1.
+ * If IP detect SDA 0->1 while SCL=1 (STOP) combination on the bus,
+ * it set BB-bit to 0 and BF to 1.
+ * BB and BF bits correctly tracks the bus state while IP is suspended
+ * BB bit became valid on the next FCLK clock after CON_EN bit set
+ *
+ * NOTES:
+ * 1. Any transfer started when BB=0 and bus is busy wouldn't be
+ *    completed by IP and results in controller timeout.
+ * 2. Any transfer started when BB=0 and SCL=0 results in IP
+ *    starting to drive SDA low. In that case IP corrupt data
+ *    on the bus.
+ * 3. Any transfer started in the middle of another master's transfer
+ *    results in unpredictable results and data corruption
+ */
+static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *dev)
+{
+	unsigned long bus_free_timeout = 0;
+	unsigned long timeout;
+	int bus_free = 0;
+	u16 stat, systest;
+
+	if (dev->bb_valid)
+		return 0;
+
+	timeout = jiffies + OMAP_I2C_TIMEOUT;
+	while (1) {
+		stat = omap_i2c_read_reg(dev, OMAP_I2C_STAT_REG);
+		/*
+		 * We will see BB or BF event in a case IP had detected any
+		 * activity on the I2C bus. Now IP correctly tracks the bus
+		 * state. BB-bit value is valid.
+		 */
+		if (stat & (OMAP_I2C_STAT_BB | OMAP_I2C_STAT_BF))
+			break;
+
+		/*
+		 * Otherwise, we must look signals on the bus to make
+		 * the right decision.
+		 */
+		systest = omap_i2c_read_reg(dev, OMAP_I2C_SYSTEST_REG);
+		if ((systest & OMAP_I2C_SYSTEST_SCL_I_FUNC) &&
+		    (systest & OMAP_I2C_SYSTEST_SDA_I_FUNC)) {
+			if (!bus_free) {
+				bus_free_timeout = jiffies +
+					OMAP_I2C_BUS_FREE_TIMEOUT;
+				bus_free = 1;
+			}
+
+			/*
+			 * SDA and SCL lines was high for 10 ms without bus
+			 * activity detected. The bus is free. Consider
+			 * BB-bit value is valid.
+			 */
+			if (time_after(jiffies, bus_free_timeout))
+				break;
+		} else {
+			bus_free = 0;
+		}
+
+		if (time_after(jiffies, timeout)) {
+			dev_warn(dev->dev, "timeout waiting for bus ready\n");
+			return -ETIMEDOUT;
+		}
+
+		msleep(1);
+	}
+
+	dev->bb_valid = 1;
+	return 0;
+}
+
 static void omap_i2c_resize_fifo(struct omap_i2c_dev *dev, u8 size, bool is_rx)
 {
 	u16		buf;
@@ -639,6 +738,10 @@  omap_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
 	if (r < 0)
 		goto out;
 
+	r = omap_i2c_wait_for_bb_valid(dev);
+	if (r < 0)
+		goto out;
+
 	r = omap_i2c_wait_for_bb(dev);
 	if (r < 0)
 		goto out;