[v2,i2c/for-next] i1c: i801: recover from hardware PEC errors

Message ID 1430772076-1151-1-git-send-email-ellen@cumulusnetworks.com
State Superseded

Commit Message

Ellen Wang May 4, 2015, 8:41 p.m. UTC
On a CRC error while using hardware-supported PEC, an additional
error bit is set in the auxiliary status register.  If this bit
isn't cleared, all subsequent operations will fail, essentially
hanging the controller.

The fix is simple: check, report, and clear the bit in
i802_check_post().  Also, in case the driver starts with the
hardware in that state, clear it in i801_check_pre() as well.

Signed-off-by: Ellen Wang <ellen@cumulusnetworks.com>
---
This is essentially the patch from Jean Delvare, which handles
the polling case while my original version didn't.  (Thank you!
Please add appropriate attribution if you wish.)

I tested all the additional code paths by selectively commenting
out code: with interrupts, without interrupts, relying on check_pre()
to clear CRCE, no clearing of CRCE at all (baseline).
---
 drivers/i2c/busses/i2c-i801.c |   53 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 2 deletions(-)
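
The check/report/clear sequence described in the commit message can be sketched outside the kernel as follows. This is only an illustration under stated assumptions, not the driver code: the register is simulated by a plain variable, the `sim_*` names are hypothetical, the value of `SMBHSTSTS_DEV_ERR` is assumed, and write-1-to-clear behaviour is inferred from the way the patch writes the set bit back to clear it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* SMBAUXSTS_CRCE matches the patch; the DEV_ERR value is an assumption. */
#define SMBAUXSTS_CRCE    0x01
#define SMBHSTSTS_DEV_ERR 0x04

/* Hypothetical stand-in for the hardware auxiliary status register. */
static uint8_t sim_auxsts;

static uint8_t sim_read_auxsts(void)
{
	return sim_auxsts;
}

/* Assumed write-1-to-clear: each bit set in val is cleared. */
static void sim_write_auxsts(uint8_t val)
{
	sim_auxsts &= (uint8_t)~val;
}

/* Before a transfer: clear a CRCE bit left over from an earlier state. */
static int sim_check_pre(void)
{
	uint8_t status = sim_read_auxsts() & SMBAUXSTS_CRCE;

	if (status) {
		sim_write_auxsts(status);
		if (sim_read_auxsts() & SMBAUXSTS_CRCE)
			return -EBUSY;	/* bit would not clear */
	}
	return 0;
}

/* After a transfer: tell a PEC error apart from a missing device. */
static int sim_check_post(int status)
{
	int result = 0;

	if (status & SMBHSTSTS_DEV_ERR) {
		if (sim_read_auxsts() & SMBAUXSTS_CRCE) {
			sim_write_auxsts(SMBAUXSTS_CRCE);
			result = -EBADMSG;	/* PEC (CRC) error */
		} else {
			result = -ENXIO;	/* no response */
		}
	}
	return result;
}

/* Walks through the cases; call it from a main() to try the sketch. */
static void sim_self_check(void)
{
	sim_auxsts = SMBAUXSTS_CRCE;
	assert(sim_check_post(SMBHSTSTS_DEV_ERR) == -EBADMSG);
	assert(sim_auxsts == 0);	/* cleared, next transfer can run */
	assert(sim_check_post(SMBHSTSTS_DEV_ERR) == -ENXIO);
	assert(sim_check_post(0) == 0);

	sim_auxsts = SMBAUXSTS_CRCE;	/* stale bit at driver start */
	assert(sim_check_pre() == 0);
	assert(sim_auxsts == 0);
}
```

In this sketch a device error with CRCE set is reported as -EBADMSG and the bit is cleared at once, so the following transfer does not inherit the error; the same device error without CRCE remains -ENXIO.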

Comments

Jean Delvare May 6, 2015, 9:33 a.m. UTC | #1
Hi Ellen,

On Mon,  4 May 2015 13:41:16 -0700, Ellen Wang wrote:
> On a CRC error while using hardware-supported PEC, an additional
> error bit is set in the auxiliary status register.  If this bit
> isn't cleared, all subsequent operations will fail, essentially
> hanging the controller.
> 
> The fix is simple: check, report, and clear the bit in
> i802_check_post().  Also, in case the driver starts with the
> hardware in that state, clear it in i801_check_pre() as well.

You seem to be angry against 1s and 2s ;-) The subsystem is i2c and the
driver is i801, not the other way around.

> 
> Signed-off-by: Ellen Wang <ellen@cumulusnetworks.com>
> ---
> This is essentially the patch from Jean Delvare, which handles
> the polling case while my original version didn't.  (Thank you!
> Please add appropriate attribution if you wish.)

Well, thanks for adding the comments, it's definitely helpful. This is
collaborative work :-)

> 
> I tested all the additional code paths by selectively commenting
> out code: with interrupts, without interrupts, relying on check_pre()
> to clear CRCE, no clearing of CRCE at all (baseline).

Thanks a lot for testing. I'll perform some tests on my ICH5 system as
well and if everything passes I'll resend the patch with fixed subject,
description and credits.

Ellen Wang May 6, 2015, 9:40 a.m. UTC | #2
On May 6, 2015 2:33:16 AM PDT, Jean Delvare <jdelvare@suse.de> wrote:
>Hi Ellen,
>
>On Mon,  4 May 2015 13:41:16 -0700, Ellen Wang wrote:
>> On a CRC error while using hardware-supported PEC, an additional
>> error bit is set in the auxiliary status register.  If this bit
>> isn't cleared, all subsequent operations will fail, essentially
>> hanging the controller.
>> 
>> The fix is simple: check, report, and clear the bit in
>> i802_check_post().  Also, in case the driver starts with the
>> hardware in that state, clear it in i801_check_pre() as well.
>
>You seem to be angry against 1s and 2s ;-) The subsystem is i2c and the
>driver is i801, not the other way around.

Heh.  I guess I typed i2c-i801 too many times.  Sorry.  Please fix.

>> Signed-off-by: Ellen Wang <ellen@cumulusnetworks.com>
>> ---
>> This is essentially the patch from Jean Delvare, which handles
>> the polling case while my original version didn't.  (Thank you!
>> Please add appropriate attribution if you wish.)
>
>Well, thanks for adding the comments, it's definitely helpful. This is
>collaborative work :-)
>
>> 
>> I tested all the additional code paths by selectively commenting
>> out code: with interrupts, without interrupts, relying on check_pre()
>> to clear CRCE, no clearing of CRCE at all (baseline).
>
>Thanks a lot for testing. I'll perform some tests on my ICH5 system as
>well and if everything passes I'll resend the patch with fixed subject,
>description and credits.
>
>-- 
>Jean Delvare
>SUSE L3 Support

Patch

diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index 5ecbb3f..fa50df0 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -125,6 +125,10 @@ 
 #define SMBHSTCFG_SMB_SMI_EN	2
 #define SMBHSTCFG_I2C_EN	4
 
+/* Auxiliary status register bits, ICH4+ only */
+#define SMBAUXSTS_CRCE		1
+#define SMBAUXSTS_STCO		2
+
 /* Auxiliary control register bits, ICH4+ only */
 #define SMBAUXCTL_CRC		1
 #define SMBAUXCTL_E32B		2
@@ -273,6 +277,29 @@  static int i801_check_pre(struct i801_priv *priv)
 		}
 	}
 
+	/*
+	 * Clear CRC status if needed.
+	 * During normal operation, i801_check_post() takes care
+	 * of it after every operation.  We do it here only in case
+	 * the hardware was already in this state when the driver
+	 * started.
+	 */
+	if (priv->features & FEATURE_SMBUS_PEC) {
+		status = inb_p(SMBAUXSTS(priv)) & SMBAUXSTS_CRCE;
+		if (status) {
+			dev_dbg(&priv->pci_dev->dev,
+				"Clearing aux status flags (%02x)\n", status);
+			outb_p(status, SMBAUXSTS(priv));
+			status = inb_p(SMBAUXSTS(priv)) & SMBAUXSTS_CRCE;
+			if (status) {
+				dev_err(&priv->pci_dev->dev,
+					"Failed clearing aux status flags (%02x)\n",
+					status);
+				return -EBUSY;
+			}
+		}
+	}
+
 	return 0;
 }
 
@@ -316,8 +343,30 @@  static int i801_check_post(struct i801_priv *priv, int status)
 		dev_err(&priv->pci_dev->dev, "Transaction failed\n");
 	}
 	if (status & SMBHSTSTS_DEV_ERR) {
-		result = -ENXIO;
-		dev_dbg(&priv->pci_dev->dev, "No response\n");
+		/*
+		 * This may be a PEC error, check and clear it.
+		 *
+		 * AUXSTS is handled differently from HSTSTS.
+		 * For HSTSTS, i801_isr() or i801_wait_intr()
+		 * has already cleared the error bits in hardware,
+		 * and we are passed a copy of the original value
+		 * in "status".
+		 * For AUXSTS, the hardware register is left
+		 * for us to handle here.
+		 * This is asymmetric, slightly iffy, but safe,
+		 * since all this code is serialized and the CRCE
+		 * bit is harmless as long as it's cleared before
+		 * the next operation.
+		 */
+		if ((priv->features & FEATURE_SMBUS_PEC) &&
+		    (inb_p(SMBAUXSTS(priv)) & SMBAUXSTS_CRCE)) {
+			outb_p(SMBAUXSTS_CRCE, SMBAUXSTS(priv));
+			result = -EBADMSG;
+			dev_dbg(&priv->pci_dev->dev, "PEC error\n");
+		} else {
+			result = -ENXIO;
+			dev_dbg(&priv->pci_dev->dev, "No response\n");
+		}
 	}
 	if (status & SMBHSTSTS_BUS_ERR) {
 		result = -EAGAIN;