[linux,dev-4.10] fsi: core: Allow more BREAKs to recover a failing link

Message ID 20171020173237.76535-1-cbostic@linux.vnet.ibm.com
State Rejected, archived

Commit Message

Christopher Bostic Oct. 20, 2017, 5:32 p.m. UTC
Recovering a failing FSI link can require more than a single
BREAK command to reset the FSI slave. Test results indicate
that communications can be restored when a second or third
BREAK is sent when the previous attempts fail. Additionally,
even if a BREAK succeeds during the error recovery process, the
FSI slave may flag errors on subsequent accesses to SMODE, SISC,
or SSTAT. Repeated BREAKs will get the slave out of this failure
mode.

Signed-off-by: Christopher Bostic <cbostic@linux.vnet.ibm.com>
---
 drivers/fsi/fsi-core.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

Comments

Jeremy Kerr Oct. 23, 2017, 2:57 a.m. UTC | #1
Hi Chris,

> Recovering a failing FSI link can require more than a single
> BREAK command to reset the FSI slave. Test results indicate
> that communications can be restored when a second or third
> BREAK is sent when the previous attempts fail.

This seems somewhat suspicious to me. Do we have any indication from the
CFAM documentation, or the folks responsible for the CFAM implementation,
that multiple breaks will actually cause any difference in behaviour
than a single one? Or is this based only on experimentation?

I'm worried that we'd be papering-over the underlying issue here.

Regards,


Jeremy
Andrew Jeffery Oct. 30, 2017, 4:04 a.m. UTC | #2
On Mon, 2017-10-23 at 10:57 +0800, Jeremy Kerr wrote:
> Hi Chris,
> 
> > Recovering a failing FSI link can require more than a single
> > BREAK command to reset the FSI slave. Test results indicate
> > that communications can be restored when a second or third
> > BREAK is sent when the previous attempts fail.
> 
> This seems somewhat suspicious to me. Do we have any indication from the
> CFAM documentation, or the folks responsible for the CFAM implementation,
> that multiple breaks will actually cause any difference in behaviour
> than a single one? Or is this based only on experimentation?
> 
> I'm worried that we'd be papering-over the underlying issue here.
> 

Chris, my understanding is we're abandoning the approach in this patch.
Is that correct?

Andrew
Christopher Bostic Oct. 30, 2017, 6:20 p.m. UTC | #3
On 10/29/17 11:04 PM, Andrew Jeffery wrote:
> On Mon, 2017-10-23 at 10:57 +0800, Jeremy Kerr wrote:
>> Hi Chris,
>>
>>> Recovering a failing FSI link can require more than a single
>>> BREAK command to reset the FSI slave. Test results indicate
>>> that communications can be restored when a second or third
>>> BREAK is sent when the previous attempts fail.
>> This seems somewhat suspicious to me. Do we have any indication from the
>> CFAM documentation, or the folks responsible for the CFAM implementation,
>> that multiple breaks will actually cause any difference in behaviour
>> than a single one? Or is this based only on experimentation?
>>
>> I'm worried that we'd be papering-over the underlying issue here.
>>
> Chris, my understanding is we're abandoning the approach in this patch.
> Is that correct?

Yes, that's correct. The focus is now on finding the cause of the bus
contention instead of recovering from it.

Chris
>
> Andrew
Patch

diff --git a/drivers/fsi/fsi-core.c b/drivers/fsi/fsi-core.c
index 8a17176..f3dd7f6 100644
--- a/drivers/fsi/fsi-core.c
+++ b/drivers/fsi/fsi-core.c
@@ -85,6 +85,7 @@  struct fsi_slave {
 #define to_fsi_slave(d) container_of(d, struct fsi_slave, dev)
 
 static const int slave_retries = 2;
+static const int break_retries = 5;
 static int discard_errors;
 
 static int fsi_master_read(struct fsi_master *master, int link,
@@ -228,7 +229,7 @@  int fsi_slave_handle_error(struct fsi_slave *slave, bool write, uint32_t addr,
 		size_t size)
 {
 	struct fsi_master *master = slave->master;
-	int rc, link;
+	int rc, link, i;
 	uint32_t reg;
 	uint8_t id;
 
@@ -262,15 +263,25 @@  int fsi_slave_handle_error(struct fsi_slave *slave, bool write, uint32_t addr,
 	}
 
 	/* getting serious, reset the slave via BREAK */
-	rc = fsi_master_break(master, link);
-	if (rc)
-		return rc;
+	for (i = 0; i < break_retries; i++) {
 
-	rc = fsi_slave_set_smode(master, link, id);
-	if (rc)
-		return rc;
+		dev_dbg(&slave->dev, "recovery break attempt %d of %d max", i+1,
+			break_retries);
+
+		rc = fsi_master_break(master, link);
+		if (rc)
+			continue;
 
-	return fsi_slave_report_and_clear_errors(slave);
+		rc = fsi_slave_set_smode(master, link, id);
+		if (rc)
+			continue;
+
+		rc = fsi_slave_report_and_clear_errors(slave);
+		if (!rc)
+			break;
+	}
+
+	return rc;
 }
 
 int fsi_slave_read(struct fsi_slave *slave, uint32_t addr,
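
The control flow of the retry loop in this (rejected) patch can be exercised outside the kernel with stubs. The following is only a minimal user-space sketch: `struct mock_link`, `mock_break()`, `mock_reinit()` and `recover_with_retries()` are hypothetical names invented for illustration and are not part of the FSI core. The mock assumes a slave that starts responding only after a fixed number of BREAKs, which is the behaviour the commit message reports from testing, not something confirmed by CFAM documentation.

```c
/* Hypothetical stand-in for an FSI link; not a kernel structure. */
struct mock_link {
	int breaks_needed;	/* BREAKs required before the slave responds */
	int breaks_sent;	/* BREAKs issued so far */
};

/* Stub in place of fsi_master_break(): always sends, counts the attempt. */
static int mock_break(struct mock_link *l)
{
	l->breaks_sent++;
	return 0;
}

/*
 * Stub in place of the post-BREAK steps (SMODE write, error clear).
 * Fails with -1 until enough BREAKs have been sent (assumed behaviour).
 */
static int mock_reinit(struct mock_link *l)
{
	return l->breaks_sent >= l->breaks_needed ? 0 : -1;
}

/*
 * Same shape as the loop in the patch: retry the whole BREAK sequence
 * up to max_breaks times. Returns 0 on recovery, otherwise the last
 * error seen (or -1 if max_breaks is not positive).
 */
static int recover_with_retries(struct mock_link *l, int max_breaks)
{
	int rc = -1;
	int i;

	for (i = 0; i < max_breaks; i++) {
		rc = mock_break(l);
		if (rc)
			continue;

		rc = mock_reinit(l);
		if (!rc)
			break;
	}

	return rc;
}
```

With a limit of 5, a mock link that needs 3 BREAKs recovers on the third attempt, while one that needs 7 exhausts the retries and returns the last error. The sketch says nothing about why the hardware would need more than one BREAK, which is the concern raised in review.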