diff mbox

[V2,2/2] pseries/eeh: Handle RTAS delay requests in configure_bridge

Message ID 1459219911-14110-2-git-send-email-ruscur@russell.cc (mailing list archive)
State Changes Requested
Headers show

Commit Message

Russell Currey March 29, 2016, 2:51 a.m. UTC
In the configure_pe and configure_bridge RTAS calls, the spec states
that values of 9900-9905 can be returned, indicating that software
should delay for 10^x (where x is the last digit, i.e. 990x)
milliseconds and attempt the call again. Currently, the kernel doesn't
know about this, and respecting it fixes some PCI failures when the
hypervisor is busy.

The delay is capped at 0.2 seconds.

Cc: <stable@vger.kernel.org> # 3.10-
Signed-off-by: Russell Currey <ruscur@russell.cc>
---
V2: Use rtas_busy_delay and the new ibm_configure_pe token, refactoring
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

Comments

Gavin Shan March 29, 2016, 9:49 a.m. UTC | #1
On Tue, Mar 29, 2016 at 12:51:51PM +1000, Russell Currey wrote:
>In the configure_pe and configure_bridge RTAS calls, the spec states
>that values of 9900-9905 can be returned, indicating that software
>should delay for 10^x (where x is the last digit, i.e. 990x)
>milliseconds and attempt the call again. Currently, the kernel doesn't
>know about this, and respecting it fixes some PCI failures when the
>hypervisor is busy.
>
>The delay is capped at 0.2 seconds.
>

When talking about RTAS calls, it might be better to have their full
names defined in PAPR spec. So I guess it'd better to replace configure_pe
and configure_bridge with their corresponding full RTAS call names:
"ibm,configure-pe" and "ibm,configure-bridge".

>Cc: <stable@vger.kernel.org> # 3.10-

I think it would be #3.10+.

>Signed-off-by: Russell Currey <ruscur@russell.cc>
>---
>V2: Use rtas_busy_delay and the new ibm_configure_pe token, refactoring
>---
> arch/powerpc/platforms/pseries/eeh_pseries.c | 35 +++++++++++++++++++++++-----
> 1 file changed, 29 insertions(+), 6 deletions(-)
>
>diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>index 231b1df..c442b11 100644
>--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>@@ -619,20 +619,43 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
> {
> 	int config_addr;
> 	int ret;
>+	/* Waiting 0.2s maximum before skipping configuration */
>+	int max_wait = 200;
>+	int mwait;
> 
> 	/* Figure out the PE address */
> 	config_addr = pe->config_addr;
> 	if (pe->addr)
> 		config_addr = pe->addr;
> 
>-	ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>-			config_addr, BUID_HI(pe->phb->buid),
>-			BUID_LO(pe->phb->buid));
>+	while (max_wait > 0) {
>+		ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>+				config_addr, BUID_HI(pe->phb->buid),
>+				BUID_LO(pe->phb->buid));
> 
>-	if (ret)
>-		pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
>-			__func__, pe->phb->global_number, pe->addr, ret);
>+		/*
>+		 * RTAS can return a delay value of up to 10^5 milliseconds
>+		 * (RTAS_EXTENDED_DELAY_MAX), which is too long.  Only respect
>+		 * the delay if it's 100ms or less.
>+		 */
> 

The PAPR spec states that the delay can be up to 10^5ms. The spec also suggests
the maximal delay is 10^2. I guess it's worthy to mention it in the above comments
if you like.

>+		switch (ret) {
>+		case 0:
>+			return ret;
>+		case RTAS_EXTENDED_DELAY_MIN:
>+		case RTAS_EXTENDED_DELAY_MIN+1:
>+		case RTAS_EXTENDED_DELAY_MIN+2:
>+			mwait = rtas_busy_delay(ret);
>+			break;
>+		default:
>+			goto err;
>+		}
>+
>+		max_wait -= mwait;

If you like, the block can be simplified to as below. In that case,
tag #err isn't needed.

                if (!ret)
                        return ret;

                max_wait -= rtas_busy_delay(ret);

>+	}
>+err:
>+	pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
>+		__func__, pe->phb->global_number, pe->addr, ret);
> 	return ret;
> }
> 

Thanks,
Gavin
Tyrel Datwyler March 29, 2016, 3:51 p.m. UTC | #2
On 03/28/2016 07:51 PM, Russell Currey wrote:
> In the configure_pe and configure_bridge RTAS calls, the spec states
> that values of 9900-9905 can be returned, indicating that software
> should delay for 10^x (where x is the last digit, i.e. 990x)
> milliseconds and attempt the call again. Currently, the kernel doesn't
> know about this, and respecting it fixes some PCI failures when the
> hypervisor is busy.
> 
> The delay is capped at 0.2 seconds.
> 
> Cc: <stable@vger.kernel.org> # 3.10-
> Signed-off-by: Russell Currey <ruscur@russell.cc>
> ---
> V2: Use rtas_busy_delay and the new ibm_configure_pe token, refactoring
> ---
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 35 +++++++++++++++++++++++-----
>  1 file changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index 231b1df..c442b11 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -619,20 +619,43 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
>  {
>  	int config_addr;
>  	int ret;
> +	/* Waiting 0.2s maximum before skipping configuration */
> +	int max_wait = 200;
> +	int mwait;
>  
>  	/* Figure out the PE address */
>  	config_addr = pe->config_addr;
>  	if (pe->addr)
>  		config_addr = pe->addr;
>  
> -	ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
> -			config_addr, BUID_HI(pe->phb->buid),
> -			BUID_LO(pe->phb->buid));
> +	while (max_wait > 0) {
> +		ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
> +				config_addr, BUID_HI(pe->phb->buid),
> +				BUID_LO(pe->phb->buid));
>  
> -	if (ret)
> -		pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
> -			__func__, pe->phb->global_number, pe->addr, ret);
> +		/*
> +		 * RTAS can return a delay value of up to 10^5 milliseconds
> +		 * (RTAS_EXTENDED_DELAY_MAX), which is too long.  Only respect
> +		 * the delay if it's 100ms or less.
> +		 */

Your changelog says the delay is capped at 0.2s. However, your comment
in the code mentions the full 10^5ms based on the 9900-9905 codes. It
would probably be best to expand you comment to mention in the code that
you are only handling 9900-9902 to eliminate the confusion of looking at
the above comment vs below implementation.

Further, despite PAPRs software note that the long busy should be
limited to 9900-9902 you might want to add a catch to your switch to log
any unexpected 9903-9905 or just treat them as max 0.2s delay. Firmware
has been know to do things on occasion that the spec says it shouldn't,
and it might not be obvious at first should you receive one of the
longer delay codes why we are going down the error path.

-Tyrel

>  
> +		switch (ret) {
> +		case 0:
> +			return ret;
> +		case RTAS_EXTENDED_DELAY_MIN:
> +		case RTAS_EXTENDED_DELAY_MIN+1:
> +		case RTAS_EXTENDED_DELAY_MIN+2:
> +			mwait = rtas_busy_delay(ret);
> +			break;
> +		default:
> +			goto err;
> +		}
> +
> +		max_wait -= mwait;
> +	}
> +err:
> +	pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
> +		__func__, pe->phb->global_number, pe->addr, ret);
>  	return ret;
>  }
>  
>
Russell Currey March 29, 2016, 9:58 p.m. UTC | #3
On Tue, 2016-03-29 at 08:51 -0700, Tyrel Datwyler wrote:
> On 03/28/2016 07:51 PM, Russell Currey wrote:
> > +		/*
> > +		 * RTAS can return a delay value of up to 10^5
> > milliseconds
> > +		 * (RTAS_EXTENDED_DELAY_MAX), which is too long.  Only
> > respect
> > +		 * the delay if it's 100ms or less.
> > +		 */
> Your changelog says the delay is capped at 0.2s. However, your comment
> in the code mentions the full 10^5ms based on the 9900-9905 codes. It
> would probably be best to expand you comment to mention in the code that
> you are only handling 9900-9902 to eliminate the confusion of looking at
> the above comment vs below implementation.
> 
> Further, despite PAPRs software note that the long busy should be
> limited to 9900-9902 you might want to add a catch to your switch to log
> any unexpected 9903-9905 or just treat them as max 0.2s delay. Firmware
> has been know to do things on occasion that the spec says it shouldn't,
> and it might not be obvious at first should you receive one of the
> longer delay codes why we are going down the error path.
> 
> -Tyrel

Good to know, thanks.  I'll respin.

- Russell
Russell Currey March 29, 2016, 10:01 p.m. UTC | #4
On Tue, 2016-03-29 at 20:49 +1100, Gavin Shan wrote:
> On Tue, Mar 29, 2016 at 12:51:51PM +1000, Russell Currey wrote:
> > 
> > In the configure_pe and configure_bridge RTAS calls, the spec states
> > that values of 9900-9905 can be returned, indicating that software
> > should delay for 10^x (where x is the last digit, i.e. 990x)
> > milliseconds and attempt the call again. Currently, the kernel doesn't
> > know about this, and respecting it fixes some PCI failures when the
> > hypervisor is busy.
> > 
> > The delay is capped at 0.2 seconds.
> > 
> When talking about RTAS calls, it might be better to have their full
> names defined in PAPR spec. So I guess it'd better to replace
> configure_pe
> and configure_bridge with their corresponding full RTAS call names:
> "ibm,configure-pe" and "ibm,configure-bridge".
Makes sense, will do.
> 
> > 
> > Cc: <stable@vger.kernel.org> # 3.10-
> I think it would be #3.10+.
> 
<snip>
> > +		/*
> > +		 * RTAS can return a delay value of up to 10^5
> > milliseconds
> > +		 * (RTAS_EXTENDED_DELAY_MAX), which is too long.  Only
> > respect
> > +		 * the delay if it's 100ms or less.
> > +		 */
> > 
> The PAPR spec states that the delay can be up to 10^5ms. The spec also
> suggests
> the maximal delay is 10^2. I guess it's worthy to mention it in the above
> comments
> if you like.

Yeah, I'll expand on that in the comment.
> 
> > 
> > +		switch (ret) {
> > +		case 0:
> > +			return ret;
> > +		case RTAS_EXTENDED_DELAY_MIN:
> > +		case RTAS_EXTENDED_DELAY_MIN+1:
> > +		case RTAS_EXTENDED_DELAY_MIN+2:
> > +			mwait = rtas_busy_delay(ret);
> > +			break;
> > +		default:
> > +			goto err;
> > +		}
> > +
> > +		max_wait -= mwait;
> If you like, the block can be simplified to as below. In that case,
> tag #err isn't needed.
> 
>                 if (!ret)
>                         return ret;
> 
>                 max_wait -= rtas_busy_delay(ret);
> 
That doesn't catch the case where the return value is greater than
RTAS_EXTENDED_DELAY_MIN+2, in which case we would be sleeping for at least
1 second.  However, I am going to change it so that any delay above 100ms
is just treated as if it was 100ms, so I can simplify the code there and
probably remove the switch and goto.
Gavin Shan March 29, 2016, 11:07 p.m. UTC | #5
On Wed, Mar 30, 2016 at 08:01:17AM +1000, Russell Currey wrote:
>On Tue, 2016-03-29 at 20:49 +1100, Gavin Shan wrote:
>> On Tue, Mar 29, 2016 at 12:51:51PM +1000, Russell Currey wrote:
.../...

>> > 
>> > +		switch (ret) {
>> > +		case 0:
>> > +			return ret;
>> > +		case RTAS_EXTENDED_DELAY_MIN:
>> > +		case RTAS_EXTENDED_DELAY_MIN+1:
>> > +		case RTAS_EXTENDED_DELAY_MIN+2:
>> > +			mwait = rtas_busy_delay(ret);
>> > +			break;
>> > +		default:
>> > +			goto err;
>> > +		}
>> > +
>> > +		max_wait -= mwait;
>> If you like, the block can be simplified to as below. In that case,
>> tag #err isn't needed.
>> 
>>                 if (!ret)
>>                         return ret;
>> 
>>                 max_wait -= rtas_busy_delay(ret);
>> 
>That doesn't catch the case where the return value is greater than
>RTAS_EXTENDED_DELAY_MIN+2, in which case we would be sleeping for at least
>1 second.  However, I am going to change it so that any delay above 100ms
>is just treated as if it was 100ms, so I can simplify the code there and
>probably remove the switch and goto.
>

Yeah, we need update @max_wait and check it in advance. The point
is to remove the tag and simplify the switch block if you want, but
it's not a big deal:

		max_wait -= rtas_busy_delay_time(ret)
		if (ret <= 0 || max_wait < 0)
			return ret;

		rtas_busy_delay(ret);

Thanks,
Gavin
diff mbox

Patch

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 231b1df..c442b11 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -619,20 +619,43 @@  static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
 {
 	int config_addr;
 	int ret;
+	/* Waiting 0.2s maximum before skipping configuration */
+	int max_wait = 200;
+	int mwait;
 
 	/* Figure out the PE address */
 	config_addr = pe->config_addr;
 	if (pe->addr)
 		config_addr = pe->addr;
 
-	ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
-			config_addr, BUID_HI(pe->phb->buid),
-			BUID_LO(pe->phb->buid));
+	while (max_wait > 0) {
+		ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
+				config_addr, BUID_HI(pe->phb->buid),
+				BUID_LO(pe->phb->buid));
 
-	if (ret)
-		pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
-			__func__, pe->phb->global_number, pe->addr, ret);
+		/*
+		 * RTAS can return a delay value of up to 10^5 milliseconds
+		 * (RTAS_EXTENDED_DELAY_MAX), which is too long.  Only respect
+		 * the delay if it's 100ms or less.
+		 */
 
+		switch (ret) {
+		case 0:
+			return ret;
+		case RTAS_EXTENDED_DELAY_MIN:
+		case RTAS_EXTENDED_DELAY_MIN+1:
+		case RTAS_EXTENDED_DELAY_MIN+2:
+			mwait = rtas_busy_delay(ret);
+			break;
+		default:
+			goto err;
+		}
+
+		max_wait -= mwait;
+	}
+err:
+	pr_warn("%s: Unable to configure bridge PHB#%d-PE#%x (%d)\n",
+		__func__, pe->phb->global_number, pe->addr, ret);
 	return ret;
 }