[1/2] opal/xscom: Move the delay inside xscom_reset() function.

Message ID 151266318054.20428.18312394104853955957.stgit@jupiter.in.ibm.com
State Superseded
Headers show
Series
  • [1/2] opal/xscom: Move the delay inside xscom_reset() function.
Related show

Commit Message

Mahesh Jagannath Salgaonkar Dec. 7, 2017, 4:13 p.m.
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

So caller of xscom_reset() does not have to bother about adding a delay
separately.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 hw/xscom.c |   27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

Comments

Nicholas Piggin Dec. 8, 2017, 3:42 a.m. | #1
On Thu, 07 Dec 2017 21:43:00 +0530
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> So caller of xscom_reset() does not have to bother about adding a delay
> separately.

Hmm, this does change behaviour a bit. I suppose it is harmless because
it's just adding small delays in error paths. But it seems that the delay
was only needed for busy xscom, and only to delay the retry of the same
xscom. Whereas xscom_reset gets called in more situations.

Does the xscom_clear_error in your next patch also require some delays?
I wouldn't mind having those delays explicitly commented in there.

Thanks,
Nick

> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  hw/xscom.c |   27 ++++++++++++++-------------
>  1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/xscom.c b/hw/xscom.c
> index 5b3bd88..2621465 100644
> --- a/hw/xscom.c
> +++ b/hw/xscom.c
> @@ -96,6 +96,7 @@ static void xscom_reset(uint32_t gcid)
>  {
>  	u64 hmer;
>  	uint32_t recv_status_reg, log_reg, err_reg;
> +	struct timespec ts;
>  
>  	/* Clear errors in HMER */
>  	mtspr(SPR_HMER, HMER_CLR_MASK);
> @@ -126,6 +127,19 @@ static void xscom_reset(uint32_t gcid)
>  	hmer = xscom_wait_done();
>  	if (hmer & SPR_HMER_XSCOM_FAIL)
>  		goto fail;
> +
> +	/*
> +	 * Its observed that sometimes immediate retry of
> +	 * XSCOM operation returns wrong data. Adding a
> +	 * delay for XSCOM reset to be effective. Delay of
> +	 * 10 ms is found to be working fine experimentally.
> +	 * FIXME: Replace 10ms delay by exact delay needed
> +	 * or other alternate method to confirm XSCOM reset
> +	 * completion, after checking from HW folks.
> +	 */
> +	ts.tv_sec = 0;
> +	ts.tv_nsec = 10 * 1000;
> +	nanosleep_nopoll(&ts, NULL);
>  	return;
>   fail:
>  	/* Fatal error resetting XSCOM */
> @@ -140,7 +154,6 @@ static void xscom_reset(uint32_t gcid)
>  static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr,
>  			      bool is_write, int64_t retries)
>  {
> -	struct timespec ts;
>  	unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer);
>  	int64_t rc = OPAL_HARDWARE;
>  
> @@ -160,18 +173,6 @@ static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_add
>  				XSCOM_BUSY_RESET_THRESHOLD, retries);
>  			xscom_reset(gcid);
>  
> -			/*
> -			 * Its observed that sometimes immediate retry of
> -			 * XSCOM operation returns wrong data. Adding a
> -			 * delay for XSCOM reset to be effective. Delay of
> -			 * 10 ms is found to be working fine experimentally.
> -			 * FIXME: Replace 10ms delay by exact delay needed
> -			 * or other alternate method to confirm XSCOM reset
> -			 * completion, after checking from HW folks.
> -			 */
> -			ts.tv_sec = 0;
> -			ts.tv_nsec = 10 * 1000;
> -			nanosleep_nopoll(&ts, NULL);
>  		}
>  
>  		/* Log error if we have retried enough and its still busy */
> 
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
Mahesh Jagannath Salgaonkar Dec. 8, 2017, 5:43 a.m. | #2
On 12/08/2017 09:12 AM, Nicholas Piggin wrote:
> On Thu, 07 Dec 2017 21:43:00 +0530
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> So caller of xscom_reset() does not have to bother about adding a delay
>> separately.
> 
> Hmm, this does change behaviour a bit. I suppose it is harmless because
> it's just adding small delays in error paths. But it seems that the delay
> was only needed for busy xscom, and only to delay the retry of the same
> xscom. Whereas xscom_reset gets called in more situations.
> 
> Does the xscom_clear_error in your next patch also require some delays?

Yes, it does. I struggled for couple of days to get invalid address for
clearing write to 20010800 immediately after xscom_reset(). Without this
delay the write was responding with scom done with no error, which was
completely confusing as there is no register at 0x20010800. With the
delay things started working as expected. It looks like delay is very
much required if we are doing next scom operation immediately after the
reset.

> I wouldn't mind having those delays explicitly commented in there.

I see your point. If there is no scom operation immediately after the
reset we can do away with the delay. In the normal scom failure other
than busy scom and clear scom we don;t need this delay. How about adding
'need_delay' as a second argument to the xscom_reset(gcid,
need_delay=<true|false>) function to control the delay instead of adding
delays multiple places explicitly ? Do you think that should should work
? I will respin v2 with the change if it's ok.

Thanks for your review.

-Mahesh.

> 
> Thanks,
> Nick
> 
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>  hw/xscom.c |   27 ++++++++++++++-------------
>>  1 file changed, 14 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/xscom.c b/hw/xscom.c
>> index 5b3bd88..2621465 100644
>> --- a/hw/xscom.c
>> +++ b/hw/xscom.c
>> @@ -96,6 +96,7 @@ static void xscom_reset(uint32_t gcid)
>>  {
>>  	u64 hmer;
>>  	uint32_t recv_status_reg, log_reg, err_reg;
>> +	struct timespec ts;
>>  
>>  	/* Clear errors in HMER */
>>  	mtspr(SPR_HMER, HMER_CLR_MASK);
>> @@ -126,6 +127,19 @@ static void xscom_reset(uint32_t gcid)
>>  	hmer = xscom_wait_done();
>>  	if (hmer & SPR_HMER_XSCOM_FAIL)
>>  		goto fail;
>> +
>> +	/*
>> +	 * Its observed that sometimes immediate retry of
>> +	 * XSCOM operation returns wrong data. Adding a
>> +	 * delay for XSCOM reset to be effective. Delay of
>> +	 * 10 ms is found to be working fine experimentally.
>> +	 * FIXME: Replace 10ms delay by exact delay needed
>> +	 * or other alternate method to confirm XSCOM reset
>> +	 * completion, after checking from HW folks.
>> +	 */
>> +	ts.tv_sec = 0;
>> +	ts.tv_nsec = 10 * 1000;
>> +	nanosleep_nopoll(&ts, NULL);
>>  	return;
>>   fail:
>>  	/* Fatal error resetting XSCOM */
>> @@ -140,7 +154,6 @@ static void xscom_reset(uint32_t gcid)
>>  static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr,
>>  			      bool is_write, int64_t retries)
>>  {
>> -	struct timespec ts;
>>  	unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer);
>>  	int64_t rc = OPAL_HARDWARE;
>>  
>> @@ -160,18 +173,6 @@ static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_add
>>  				XSCOM_BUSY_RESET_THRESHOLD, retries);
>>  			xscom_reset(gcid);
>>  
>> -			/*
>> -			 * Its observed that sometimes immediate retry of
>> -			 * XSCOM operation returns wrong data. Adding a
>> -			 * delay for XSCOM reset to be effective. Delay of
>> -			 * 10 ms is found to be working fine experimentally.
>> -			 * FIXME: Replace 10ms delay by exact delay needed
>> -			 * or other alternate method to confirm XSCOM reset
>> -			 * completion, after checking from HW folks.
>> -			 */
>> -			ts.tv_sec = 0;
>> -			ts.tv_nsec = 10 * 1000;
>> -			nanosleep_nopoll(&ts, NULL);
>>  		}
>>  
>>  		/* Log error if we have retried enough and its still busy */
>>
>> _______________________________________________
>> Skiboot mailing list
>> Skiboot@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/skiboot
>
Nicholas Piggin Dec. 8, 2017, 6:20 a.m. | #3
On Fri, 8 Dec 2017 11:13:52 +0530
Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:

> On 12/08/2017 09:12 AM, Nicholas Piggin wrote:
> > On Thu, 07 Dec 2017 21:43:00 +0530
> > Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:
> >   
> >> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> >>
> >> So caller of xscom_reset() does not have to bother about adding a delay
> >> separately.  
> > 
> > Hmm, this does change behaviour a bit. I suppose it is harmless because
> > it's just adding small delays in error paths. But it seems that the delay
> > was only needed for busy xscom, and only to delay the retry of the same
> > xscom. Whereas xscom_reset gets called in more situations.
> > 
> > Does the xscom_clear_error in your next patch also require some delays?  
> 
> Yes, it does. I struggled for couple of days to get invalid address for
> clearing write to 20010800 immediately after xscom_reset(). Without this
> delay the write was responding with scom done with no error, which was
> completely confusing as there is no register at 0x20010800. With the
> delay things started working as expected. It looks like delay is very
> much required if we are doing next scom operation immediately after the
> reset.

Okay that makes sense, I was just curious. Might be good to comment that
as well in the clear_error path.

> > I wouldn't mind having those delays explicitly commented in there.  
> 
> I see your point. If there is no scom operation immediately after the
> reset we can do away with the delay. In the normal scom failure other
> than busy scom and clear scom we don;t need this delay. How about adding
> 'need_delay' as a second argument to the xscom_reset(gcid,
> need_delay=<true|false>) function to control the delay instead of adding
> delays multiple places explicitly ? Do you think that should should work
> ? I will respin v2 with the change if it's ok.

Yeah that could be better. Although be careful in case the caller does
another xscom_read/write immediately after a failure. If that is a
concern I would err on the side of caution.

I don't have a better alternative, so long as you are happy with it,
that's fine by me.

Thanks,
Nick

Patch

diff --git a/hw/xscom.c b/hw/xscom.c
index 5b3bd88..2621465 100644
--- a/hw/xscom.c
+++ b/hw/xscom.c
@@ -96,6 +96,7 @@  static void xscom_reset(uint32_t gcid)
 {
 	u64 hmer;
 	uint32_t recv_status_reg, log_reg, err_reg;
+	struct timespec ts;
 
 	/* Clear errors in HMER */
 	mtspr(SPR_HMER, HMER_CLR_MASK);
@@ -126,6 +127,19 @@  static void xscom_reset(uint32_t gcid)
 	hmer = xscom_wait_done();
 	if (hmer & SPR_HMER_XSCOM_FAIL)
 		goto fail;
+
+	/*
+	 * Its observed that sometimes immediate retry of
+	 * XSCOM operation returns wrong data. Adding a
+	 * delay for XSCOM reset to be effective. Delay of
+	 * 10 ms is found to be working fine experimentally.
+	 * FIXME: Replace 10ms delay by exact delay needed
+	 * or other alternate method to confirm XSCOM reset
+	 * completion, after checking from HW folks.
+	 */
+	ts.tv_sec = 0;
+	ts.tv_nsec = 10 * 1000;
+	nanosleep_nopoll(&ts, NULL);
 	return;
  fail:
 	/* Fatal error resetting XSCOM */
@@ -140,7 +154,6 @@  static void xscom_reset(uint32_t gcid)
 static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr,
 			      bool is_write, int64_t retries)
 {
-	struct timespec ts;
 	unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer);
 	int64_t rc = OPAL_HARDWARE;
 
@@ -160,18 +173,6 @@  static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_add
 				XSCOM_BUSY_RESET_THRESHOLD, retries);
 			xscom_reset(gcid);
 
-			/*
-			 * Its observed that sometimes immediate retry of
-			 * XSCOM operation returns wrong data. Adding a
-			 * delay for XSCOM reset to be effective. Delay of
-			 * 10 ms is found to be working fine experimentally.
-			 * FIXME: Replace 10ms delay by exact delay needed
-			 * or other alternate method to confirm XSCOM reset
-			 * completion, after checking from HW folks.
-			 */
-			ts.tv_sec = 0;
-			ts.tv_nsec = 10 * 1000;
-			nanosleep_nopoll(&ts, NULL);
 		}
 
 		/* Log error if we have retried enough and its still busy */