Message ID | 20171020173237.76535-1-cbostic@linux.vnet.ibm.com |
---|---|
State | Rejected, archived |
Headers | show |
Series | [linux,dev-4.10] fsi: core: Allow more BREAKs to recover a failing link | expand |
Hi Chris, > Recovering a failing FSI link can require more than a single > BREAK command to reset the FSI slave. Test results indicate > that communications can be restored when a second or third > BREAK is sent when the previous attempts fail. This seems somewhat suspicious to me. Do we have any indication from the CFAM documentation, or the folks responsible for the CFAM implentation, that multiple breaks will actually cause any difference in behaviour that a single one? Or is this based only on experimentation? I'm worried that we'd be papering-over the underlying issue here. Regards, Jeremy
On Mon, 2017-10-23 at 10:57 +0800, Jeremy Kerr wrote: > Hi Chris, > > > Recovering a failing FSI link can require more than a single > > BREAK command to reset the FSI slave. Test results indicate > > that communications can be restored when a second or third > > BREAK is sent when the previous attempts fail. > > This seems somewhat suspicious to me. Do we have any indication from the > CFAM documentation, or the folks responsible for the CFAM implentation, > that multiple breaks will actually cause any difference in behaviour > that a single one? Or is this based only on experimentation? > > I'm worried that we'd be papering-over the underlying issue here. > Chris, my understanding is we're abandoning the approach in this patch. Is that correct? Andrew
On 10/29/17 11:04 PM, Andrew Jeffery wrote: > On Mon, 2017-10-23 at 10:57 +0800, Jeremy Kerr wrote: >> Hi Chris, >> >>> Recovering a failing FSI link can require more than a single >>> BREAK command to reset the FSI slave. Test results indicate >>> that communications can be restored when a second or third >>> BREAK is sent when the previous attempts fail. >> This seems somewhat suspicious to me. Do we have any indication from the >> CFAM documentation, or the folks responsible for the CFAM implentation, >> that multiple breaks will actually cause any difference in behaviour >> that a single one? Or is this based only on experimentation? >> >> I'm worried that we'd be papering-over the underlying issue here. >> > Chris, my understanding is we're abandoning the approach in this patch. > Is that correct? Yes that's correct. Focus is now on finding cause for the bus contention instead of recovering from it. Chris > > Andrew
diff --git a/drivers/fsi/fsi-core.c b/drivers/fsi/fsi-core.c index 8a17176..f3dd7f6 100644 --- a/drivers/fsi/fsi-core.c +++ b/drivers/fsi/fsi-core.c @@ -85,6 +85,7 @@ struct fsi_slave { #define to_fsi_slave(d) container_of(d, struct fsi_slave, dev) static const int slave_retries = 2; +static const int break_retries = 5; static int discard_errors; static int fsi_master_read(struct fsi_master *master, int link, @@ -228,7 +229,7 @@ int fsi_slave_handle_error(struct fsi_slave *slave, bool write, uint32_t addr, size_t size) { struct fsi_master *master = slave->master; - int rc, link; + int rc, link, i; uint32_t reg; uint8_t id; @@ -262,15 +263,25 @@ int fsi_slave_handle_error(struct fsi_slave *slave, bool write, uint32_t addr, } /* getting serious, reset the slave via BREAK */ - rc = fsi_master_break(master, link); - if (rc) - return rc; + for (i = 0; i < break_retries; i++) { - rc = fsi_slave_set_smode(master, link, id); - if (rc) - return rc; + dev_dbg(&slave->dev, "recovery break attempt %d of %d max", i+1, + break_retries); + + rc = fsi_master_break(master, link); + if (rc) + continue; - return fsi_slave_report_and_clear_errors(slave); + rc = fsi_slave_set_smode(master, link, id); + if (rc) + continue; + + rc = fsi_slave_report_and_clear_errors(slave); + if (!rc) + break; + } + + return rc; } int fsi_slave_read(struct fsi_slave *slave, uint32_t addr,
Recovering a failing FSI link can require more than a single BREAK command to reset the FSI slave. Test results indicate that communications can be restored when a second or third BREAK is sent when the previous attempts fail. Additionally, even if a BREAK succeeds during error recovery process the FSI slave may flag following access errors to SMODE, SISC, or SSTAT. Repeated BREAKs will get the slave out of this fail mode. Signed-off-by: Christopher Bostic <cbostic@linux.vnet.ibm.com> --- drivers/fsi/fsi-core.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)