diff mbox series

powerpc/powernv/nvram: opal_nvram_write handle unknown OPAL errors

Message ID 20180326150233.23089-1-npiggin@gmail.com (mailing list archive)
State Accepted
Commit 741de617661794246f84a21a02fc5e327bffc9ad
Headers show
Series powerpc/powernv/nvram: opal_nvram_write handle unknown OPAL errors | expand

Commit Message

Nicholas Piggin March 26, 2018, 3:02 p.m. UTC
opal_nvram_write currently just assumes success if it encounters an
error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
on other errors instead.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/platforms/powernv/opal-nvram.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Vasant Hegde March 27, 2018, 7:17 a.m. UTC | #1
On 03/26/2018 08:32 PM, Nicholas Piggin wrote:
> opal_nvram_write currently just assumes success if it encounters an
> error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
> on other errors instead.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/platforms/powernv/opal-nvram.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c b/arch/powerpc/platforms/powernv/opal-nvram.c
> index 9db4398ded5d..13bf625dc3e8 100644
> --- a/arch/powerpc/platforms/powernv/opal-nvram.c
> +++ b/arch/powerpc/platforms/powernv/opal-nvram.c
> @@ -59,6 +59,8 @@ static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
>   		if (rc == OPAL_BUSY_EVENT)
>   			opal_poll_events(NULL);

Current code does continuous poller here. May be we have small breathing time 
here. What you say?


>   	}
> +	if (rc)
> +		return -EIO;

Good catch. Thanks!

Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

-Vasant
Nicholas Piggin March 27, 2018, 7:38 a.m. UTC | #2
On Tue, 27 Mar 2018 12:47:31 +0530
Vasant Hegde <hegdevasant@linux.vnet.ibm.com> wrote:

> On 03/26/2018 08:32 PM, Nicholas Piggin wrote:
> > opal_nvram_write currently just assumes success if it encounters an
> > error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
> > on other errors instead.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   arch/powerpc/platforms/powernv/opal-nvram.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c b/arch/powerpc/platforms/powernv/opal-nvram.c
> > index 9db4398ded5d..13bf625dc3e8 100644
> > --- a/arch/powerpc/platforms/powernv/opal-nvram.c
> > +++ b/arch/powerpc/platforms/powernv/opal-nvram.c
> > @@ -59,6 +59,8 @@ static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
> >   		if (rc == OPAL_BUSY_EVENT)
> >   			opal_poll_events(NULL);  
> 
> Current code does continuous poller here. May be we have small breathing time 
> here. What you say?

Yeah that's probably not a bad idea. I cc'ed skiboot list -- what's a
reasonable delay between retries? Linux has a bunch of similar kind of
loops if you grep for opal_poll_events and OPAL_BUSY. It would be good
to convert them all to a standard form with a standard delay as
recommended by OPAL, and where specific calls have different delay
for a good reason, that would be documented in the OPAL API docs.

Thanks,
Nick
Michael Ellerman March 27, 2018, 12:13 p.m. UTC | #3
Nicholas Piggin <npiggin@gmail.com> writes:

> opal_nvram_write currently just assumes success if it encounters an
> error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
> on other errors instead.

Does that ever happen with current skiboot?

Even if it doesn't I think I'm inclined to tag this for stable.

cheers

> diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c b/arch/powerpc/platforms/powernv/opal-nvram.c
> index 9db4398ded5d..13bf625dc3e8 100644
> --- a/arch/powerpc/platforms/powernv/opal-nvram.c
> +++ b/arch/powerpc/platforms/powernv/opal-nvram.c
> @@ -59,6 +59,8 @@ static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
>  		if (rc == OPAL_BUSY_EVENT)
>  			opal_poll_events(NULL);
>  	}
> +	if (rc)
> +		return -EIO;
>  	*index += count;
>  	return count;
>  }
> -- 
> 2.16.1
Nicholas Piggin March 27, 2018, 12:27 p.m. UTC | #4
On Tue, 27 Mar 2018 23:13:00 +1100
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > opal_nvram_write currently just assumes success if it encounters an
> > error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
> > on other errors instead.  
> 
> Does that ever happen with current skiboot?

I can now even using the mambo fake flash driver that never returns
failure, because skiboot will return OPAL_INTERNAL_ERROR if we try
to re-enter it. So I hit it when testing sreset-in-opal cases (the
crash path wants to write something to nvram).

Not sure about the skiboot flash layer. Aside from programming
errors, it looks like perhaps ECC and BMC failure or unresponsive
could cause errors to come back here.

> Even if it doesn't I think I'm inclined to tag this for stable.

It's turning some relatively minor types of errors into a system
hang, so it seems like it could go in stable. I've hit the -EIO case
in these basic tests and it hasn't had obvious bugs.

Thanks,
Nick
T T March 27, 2018, 12:35 p.m. UTC | #5
unscribed me 

    On ‎Tuesday‎, ‎March‎ ‎27‎, ‎2018‎ ‎05‎:‎34‎:‎30‎ ‎AM‎ ‎PDT, Michael Ellerman <mpe@ellerman.id.au> wrote:  
 
 Nicholas Piggin <npiggin@gmail.com> writes:

> opal_nvram_write currently just assumes success if it encounters an
> error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
> on other errors instead.

Does that ever happen with current skiboot?

Even if it doesn't I think I'm inclined to tag this for stable.

cheers

> diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c b/arch/powerpc/platforms/powernv/opal-nvram.c
> index 9db4398ded5d..13bf625dc3e8 100644
> --- a/arch/powerpc/platforms/powernv/opal-nvram.c
> +++ b/arch/powerpc/platforms/powernv/opal-nvram.c
> @@ -59,6 +59,8 @@ static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
>          if (rc == OPAL_BUSY_EVENT)
>              opal_poll_events(NULL);
>      }
> +    if (rc)
> +        return -EIO;
>      *index += count;
>      return count;
>  }
> -- 
> 2.16.1
<html><head></head><body><div style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:13px;"><div></div>
            <div>unscribed me&nbsp;<br></div><div><br></div>
            
            <div class="yahoo_quoted" id="yahoo_quoted_2267153160">
                <div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
                    
                    <div>
                        On ‎Tuesday‎, ‎March‎ ‎27‎, ‎2018‎ ‎05‎:‎34‎:‎30‎ ‎AM‎ ‎PDT, Michael Ellerman &lt;mpe@ellerman.id.au&gt; wrote:
                    </div>
                    <div><br></div>
                    <div><br></div>
                    <div><div dir="ltr">Nicholas Piggin &lt;<a href="mailto:npiggin@gmail.com" ymailto="mailto:npiggin@gmail.com">npiggin@gmail.com</a>&gt; writes:<br></div><div dir="ltr"><br></div><div dir="ltr">&gt; opal_nvram_write currently just assumes success if it encounters an<br></div><div dir="ltr">&gt; error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO<br></div><div dir="ltr">&gt; on other errors instead.<br></div><div dir="ltr"><br></div><div dir="ltr">Does that ever happen with current skiboot?<br></div><div dir="ltr"><br></div><div dir="ltr">Even if it doesn't I think I'm inclined to tag this for stable.<br></div><div dir="ltr"><br></div><div dir="ltr">cheers<br></div><div dir="ltr"><br></div><div dir="ltr">&gt; diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c b/arch/powerpc/platforms/powernv/opal-nvram.c<br></div><div dir="ltr">&gt; index 9db4398ded5d..13bf625dc3e8 100644<br></div><div dir="ltr">&gt; --- a/arch/powerpc/platforms/powernv/opal-nvram.c<br></div><div dir="ltr">&gt; +++ b/arch/powerpc/platforms/powernv/opal-nvram.c<br></div><div dir="ltr">&gt; @@ -59,6 +59,8 @@ static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)<br></div><div dir="ltr">&gt;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; if (rc == OPAL_BUSY_EVENT)<br></div><div dir="ltr">&gt;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; opal_poll_events(NULL);<br></div><div dir="ltr">&gt;&nbsp; &nbsp;&nbsp;&nbsp; }<br></div><div dir="ltr">&gt; +&nbsp;&nbsp;&nbsp; if (rc)<br></div><div dir="ltr">&gt; +&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; return -EIO;<br></div><div dir="ltr">&gt;&nbsp; &nbsp;&nbsp;&nbsp; *index += count;<br></div><div dir="ltr">&gt;&nbsp; &nbsp;&nbsp;&nbsp; return count;<br></div><div dir="ltr">&gt;&nbsp; }<br></div><div dir="ltr">&gt; -- <br></div><div dir="ltr">&gt; 2.16.1<br></div></div>
                </div>
            </div></div></body></html>
Vasant Hegde March 28, 2018, 5:17 p.m. UTC | #6
On 03/27/2018 01:08 PM, Nicholas Piggin wrote:
> On Tue, 27 Mar 2018 12:47:31 +0530
> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> wrote:
> 
>> On 03/26/2018 08:32 PM, Nicholas Piggin wrote:
>>> opal_nvram_write currently just assumes success if it encounters an
>>> error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
>>> on other errors instead.
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>>    arch/powerpc/platforms/powernv/opal-nvram.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c b/arch/powerpc/platforms/powernv/opal-nvram.c
>>> index 9db4398ded5d..13bf625dc3e8 100644
>>> --- a/arch/powerpc/platforms/powernv/opal-nvram.c
>>> +++ b/arch/powerpc/platforms/powernv/opal-nvram.c
>>> @@ -59,6 +59,8 @@ static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
>>>    		if (rc == OPAL_BUSY_EVENT)
>>>    			opal_poll_events(NULL);
>>
>> Current code does continuous poller here. May be we have small breathing time
>> here. What you say?
> 
> Yeah that's probably not a bad idea. I cc'ed skiboot list -- what's a
> reasonable delay between retries?

I think it depends on individual API. Like in case of dump retrival I've 20ms 
delay... as FSP takes time to copy data to host memory. But may be here we can 
have smaller delay.

> Linux has a bunch of similar kind of
> loops if you grep for opal_poll_events and OPAL_BUSY. It would be good
> to convert them all to a standard form with a standard delay as
> recommended by OPAL, and where specific calls have different delay
> for a good reason, that would be documented in the OPAL API docs.

Yes. We should update API documentation.

-Vasant
Stewart Smith March 29, 2018, 4:27 a.m. UTC | #7
Nicholas Piggin <npiggin@gmail.com> writes:
> opal_nvram_write currently just assumes success if it encounters an
> error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
> on other errors instead.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/platforms/powernv/opal-nvram.c | 2 ++
>  1 file changed, 2 insertions(+)

Acked-by: Stewart Smith <stewart@linux.ibm.com>
Michael Ellerman March 31, 2018, 2:04 p.m. UTC | #8
On Mon, 2018-03-26 at 15:02:33 UTC, Nicholas Piggin wrote:
> opal_nvram_write currently just assumes success if it encounters an
> error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
> on other errors instead.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> Acked-by: Stewart Smith <stewart@linux.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/741de617661794246f84a21a02fc5e

cheers
diff mbox series

Patch

diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c b/arch/powerpc/platforms/powernv/opal-nvram.c
index 9db4398ded5d..13bf625dc3e8 100644
--- a/arch/powerpc/platforms/powernv/opal-nvram.c
+++ b/arch/powerpc/platforms/powernv/opal-nvram.c
@@ -59,6 +59,8 @@  static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
 		if (rc == OPAL_BUSY_EVENT)
 			opal_poll_events(NULL);
 	}
+	if (rc)
+		return -EIO;
 	*index += count;
 	return count;
 }