Message ID | 151266318054.20428.18312394104853955957.stgit@jupiter.in.ibm.com |
---|---|
State | Superseded |
Headers | show |
Series | [1/2] opal/xscom: Move the delay inside xscom_reset() function. | expand |
On Thu, 07 Dec 2017 21:43:00 +0530 Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote: > From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> > > So caller of xscom_reset() does not have to bother about adding a delay > separately. Hmm, this does change behaviour a bit. I suppose it is harmless because it's just adding small delays in error paths. But it seems that the delay was only needed for busy xscom, and only to delay the retry of the same xscom. Whereas xscom_reset gets called in more situations. Does the xscom_clear_error in your next patch also require some delays? I wouldn't mind having those delays explicitly commented in there. Thanks, Nick > > Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> > --- > hw/xscom.c | 27 ++++++++++++++------------- > 1 file changed, 14 insertions(+), 13 deletions(-) > > diff --git a/hw/xscom.c b/hw/xscom.c > index 5b3bd88..2621465 100644 > --- a/hw/xscom.c > +++ b/hw/xscom.c > @@ -96,6 +96,7 @@ static void xscom_reset(uint32_t gcid) > { > u64 hmer; > uint32_t recv_status_reg, log_reg, err_reg; > + struct timespec ts; > > /* Clear errors in HMER */ > mtspr(SPR_HMER, HMER_CLR_MASK); > @@ -126,6 +127,19 @@ static void xscom_reset(uint32_t gcid) > hmer = xscom_wait_done(); > if (hmer & SPR_HMER_XSCOM_FAIL) > goto fail; > + > + /* > + * Its observed that sometimes immediate retry of > + * XSCOM operation returns wrong data. Adding a > + * delay for XSCOM reset to be effective. Delay of > + * 10 ms is found to be working fine experimentally. > + * FIXME: Replace 10ms delay by exact delay needed > + * or other alternate method to confirm XSCOM reset > + * completion, after checking from HW folks. > + */ > + ts.tv_sec = 0; > + ts.tv_nsec = 10 * 1000; > + nanosleep_nopoll(&ts, NULL); > return; > fail: > /* Fatal error resetting XSCOM */ > @@ -140,7 +154,6 @@ static void xscom_reset(uint32_t gcid) > static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr, > bool is_write, int64_t retries) > { > - struct timespec ts; > unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer); > int64_t rc = OPAL_HARDWARE; > > @@ -160,18 +173,6 @@ static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_add > XSCOM_BUSY_RESET_THRESHOLD, retries); > xscom_reset(gcid); > > - /* > - * Its observed that sometimes immediate retry of > - * XSCOM operation returns wrong data. Adding a > - * delay for XSCOM reset to be effective. Delay of > - * 10 ms is found to be working fine experimentally. > - * FIXME: Replace 10ms delay by exact delay needed > - * or other alternate method to confirm XSCOM reset > - * completion, after checking from HW folks. > - */ > - ts.tv_sec = 0; > - ts.tv_nsec = 10 * 1000; > - nanosleep_nopoll(&ts, NULL); > } > > /* Log error if we have retried enough and its still busy */ > > _______________________________________________ > Skiboot mailing list > Skiboot@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/skiboot
On 12/08/2017 09:12 AM, Nicholas Piggin wrote: > On Thu, 07 Dec 2017 21:43:00 +0530 > Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote: > >> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> >> >> So caller of xscom_reset() does not have to bother about adding a delay >> separately. > > Hmm, this does change behaviour a bit. I suppose it is harmless because > it's just adding small delays in error paths. But it seems that the delay > was only needed for busy xscom, and only to delay the retry of the same > xscom. Whereas xscom_reset gets called in more situations. > > Does the xscom_clear_error in your next patch also require some delays? Yes, it does. I struggled for couple of days to get invalid address for clearing write to 20010800 immediately after xscom_reset(). Without this delay the write was responding with scom done with no error, which was completely confusing as there is no register at 0x20010800. With the delay things started working as expected. It looks like delay is very much required if we are doing next scom operation immediately after the reset. > I wouldn't mind having those delays explicitly commented in there. I see your point. If there is no scom operation immediately after the reset we can do away with the delay. In the normal scom failure other than busy scom and clear scom we don;t need this delay. How about adding 'need_delay' as a second argument to the xscom_reset(gcid, need_delay=<true|false>) function to control the delay instead of adding delays multiple places explicitly ? Do you think that should should work ? I will respin v2 with the change if it's ok. Thanks for your review. -Mahesh. > > Thanks, > Nick > >> >> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> >> --- >> hw/xscom.c | 27 ++++++++++++++------------- >> 1 file changed, 14 insertions(+), 13 deletions(-) >> >> diff --git a/hw/xscom.c b/hw/xscom.c >> index 5b3bd88..2621465 100644 >> --- a/hw/xscom.c >> +++ b/hw/xscom.c >> @@ -96,6 +96,7 @@ static void xscom_reset(uint32_t gcid) >> { >> u64 hmer; >> uint32_t recv_status_reg, log_reg, err_reg; >> + struct timespec ts; >> >> /* Clear errors in HMER */ >> mtspr(SPR_HMER, HMER_CLR_MASK); >> @@ -126,6 +127,19 @@ static void xscom_reset(uint32_t gcid) >> hmer = xscom_wait_done(); >> if (hmer & SPR_HMER_XSCOM_FAIL) >> goto fail; >> + >> + /* >> + * Its observed that sometimes immediate retry of >> + * XSCOM operation returns wrong data. Adding a >> + * delay for XSCOM reset to be effective. Delay of >> + * 10 ms is found to be working fine experimentally. >> + * FIXME: Replace 10ms delay by exact delay needed >> + * or other alternate method to confirm XSCOM reset >> + * completion, after checking from HW folks. >> + */ >> + ts.tv_sec = 0; >> + ts.tv_nsec = 10 * 1000; >> + nanosleep_nopoll(&ts, NULL); >> return; >> fail: >> /* Fatal error resetting XSCOM */ >> @@ -140,7 +154,6 @@ static void xscom_reset(uint32_t gcid) >> static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr, >> bool is_write, int64_t retries) >> { >> - struct timespec ts; >> unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer); >> int64_t rc = OPAL_HARDWARE; >> >> @@ -160,18 +173,6 @@ static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_add >> XSCOM_BUSY_RESET_THRESHOLD, retries); >> xscom_reset(gcid); >> >> - /* >> - * Its observed that sometimes immediate retry of >> - * XSCOM operation returns wrong data. Adding a >> - * delay for XSCOM reset to be effective. Delay of >> - * 10 ms is found to be working fine experimentally. >> - * FIXME: Replace 10ms delay by exact delay needed >> - * or other alternate method to confirm XSCOM reset >> - * completion, after checking from HW folks. >> - */ >> - ts.tv_sec = 0; >> - ts.tv_nsec = 10 * 1000; >> - nanosleep_nopoll(&ts, NULL); >> } >> >> /* Log error if we have retried enough and its still busy */ >> >> _______________________________________________ >> Skiboot mailing list >> Skiboot@lists.ozlabs.org >> https://lists.ozlabs.org/listinfo/skiboot >
On Fri, 8 Dec 2017 11:13:52 +0530 Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com> wrote: > On 12/08/2017 09:12 AM, Nicholas Piggin wrote: > > On Thu, 07 Dec 2017 21:43:00 +0530 > > Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote: > > > >> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> > >> > >> So caller of xscom_reset() does not have to bother about adding a delay > >> separately. > > > > Hmm, this does change behaviour a bit. I suppose it is harmless because > > it's just adding small delays in error paths. But it seems that the delay > > was only needed for busy xscom, and only to delay the retry of the same > > xscom. Whereas xscom_reset gets called in more situations. > > > > Does the xscom_clear_error in your next patch also require some delays? > > Yes, it does. I struggled for couple of days to get invalid address for > clearing write to 20010800 immediately after xscom_reset(). Without this > delay the write was responding with scom done with no error, which was > completely confusing as there is no register at 0x20010800. With the > delay things started working as expected. It looks like delay is very > much required if we are doing next scom operation immediately after the > reset. Okay that makes sense, I was just curious. Might be good to comment that as well in the clear_error path. > > I wouldn't mind having those delays explicitly commented in there. > > I see your point. If there is no scom operation immediately after the > reset we can do away with the delay. In the normal scom failure other > than busy scom and clear scom we don;t need this delay. How about adding > 'need_delay' as a second argument to the xscom_reset(gcid, > need_delay=<true|false>) function to control the delay instead of adding > delays multiple places explicitly ? Do you think that should should work > ? I will respin v2 with the change if it's ok. Yeah that could be better. Although be careful in case the caller does another xscom_read/write immediately after a failure. If that is a concern I would err on the side of caution. I don't have a better alternative, so long as you are happy with it, that's fine by me. Thanks, Nick
diff --git a/hw/xscom.c b/hw/xscom.c index 5b3bd88..2621465 100644 --- a/hw/xscom.c +++ b/hw/xscom.c @@ -96,6 +96,7 @@ static void xscom_reset(uint32_t gcid) { u64 hmer; uint32_t recv_status_reg, log_reg, err_reg; + struct timespec ts; /* Clear errors in HMER */ mtspr(SPR_HMER, HMER_CLR_MASK); @@ -126,6 +127,19 @@ static void xscom_reset(uint32_t gcid) hmer = xscom_wait_done(); if (hmer & SPR_HMER_XSCOM_FAIL) goto fail; + + /* + * Its observed that sometimes immediate retry of + * XSCOM operation returns wrong data. Adding a + * delay for XSCOM reset to be effective. Delay of + * 10 ms is found to be working fine experimentally. + * FIXME: Replace 10ms delay by exact delay needed + * or other alternate method to confirm XSCOM reset + * completion, after checking from HW folks. + */ + ts.tv_sec = 0; + ts.tv_nsec = 10 * 1000; + nanosleep_nopoll(&ts, NULL); return; fail: /* Fatal error resetting XSCOM */ @@ -140,7 +154,6 @@ static void xscom_reset(uint32_t gcid) static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr, bool is_write, int64_t retries) { - struct timespec ts; unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer); int64_t rc = OPAL_HARDWARE; @@ -160,18 +173,6 @@ static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_add XSCOM_BUSY_RESET_THRESHOLD, retries); xscom_reset(gcid); - /* - * Its observed that sometimes immediate retry of - * XSCOM operation returns wrong data. Adding a - * delay for XSCOM reset to be effective. Delay of - * 10 ms is found to be working fine experimentally. - * FIXME: Replace 10ms delay by exact delay needed - * or other alternate method to confirm XSCOM reset - * completion, after checking from HW folks. - */ - ts.tv_sec = 0; - ts.tv_nsec = 10 * 1000; - nanosleep_nopoll(&ts, NULL); } /* Log error if we have retried enough and its still busy */