
fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM

Message ID 20171002010825.27358-1-stewart@linux.vnet.ibm.com
State Accepted
Series fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM

Commit Message

Stewart Smith Oct. 2, 2017, 1:08 a.m. UTC
We had a race condition between FSP Reset/Reload and powering down
the system from the host:

Roughly:

  FSP                Host
  ---                ----
  Power on
                     Power on
  (inject EPOW)
  (trigger FSP R/R)
                     Processes EPOW event, starts shutting down
                     calls OPAL_CEC_POWER_DOWN
  (is still in R/R)
                     gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
  (FSP comes back)
                     spinning in opal_poll_events
  (thinks host is running)

The call to OPAL_CEC_POWER_DOWN is only made once, as the reset/reload
error path for fsp_sync_msg() is to return -1, which means we give
the OS OPAL_INTERNAL_ERROR. That is fine, except that our own API
docs give us the opportunity to return OPAL_BUSY when trying again
later may be successful, and we're ambiguous as to whether the OS
should retry on OPAL_INTERNAL_ERROR.

For reference, the Linux code looks like this:
>static void __noreturn pnv_power_off(void)
>{
>        long rc = OPAL_BUSY;
>
>        pnv_prepare_going_down();
>
>        while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
>                rc = opal_cec_power_down(0);
>                if (rc == OPAL_BUSY_EVENT)
>                        opal_poll_events(NULL);
>                else
>                        mdelay(10);
>        }
>        for (;;)
>                opal_poll_events(NULL);
>}

This means that *practically* our only option is to return OPAL_BUSY
or OPAL_BUSY_EVENT.

We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're
running pollers to communicate with the FSP and do the final bits of
Reset/Reload handling before we power off the system.

Additionally, we really should update our documentation to point out all
of these return codes and what action an OS should take.

CC: stable
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
---
 doc/opal-api/opal-cec-power-down-5.rst | 18 +++++++++++++++---
 doc/opal-api/return-codes.rst          |  6 +++++-
 platforms/ibm-fsp/common.c             |  2 +-
 3 files changed, 21 insertions(+), 5 deletions(-)

Comments

Vasant Hegde Oct. 6, 2017, 5:34 a.m. UTC | #1
On 10/02/2017 06:38 AM, Stewart Smith wrote:
> We had a race condition between FSP Reset/Reload and powering down
> the system from the host:
>

Good catch. But I think we should fix ibm_fsp_cec_reboot() as well.

Otherwise patch looks good to me.

Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

-Vasant
Ananth N Mavinakayanahalli Oct. 6, 2017, 9:06 a.m. UTC | #2
On Mon, Oct 02, 2017 at 01:08:25AM +0000, Stewart Smith wrote:
> We had a race condition between FSP Reset/Reload and powering down
> the system from the host:
> 
> Roughly:
> 
>   FSP                Host
>   ---                ----
>   Power on
>                      Power on
>   (inject EPOW)
>   (trigger FSP R/R)
>                      Processes EPOW event, starts shutting down
>                      calls OPAL_CEC_POWER_DOWN
>   (is still in R/R)
>                      gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
>   (FSP comes back)
>                      spinning in opal_poll_events
>   (thinks host is running)
> 
> The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload
> error path for fsp_sync_msg() is to return -1, which means we give
> the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API
> docs give us the opportunity to return OPAL_BUSY when trying again
> later may be successful, and we're ambiguous as to if you should retry
> on OPAL_INTERNAL_ERROR.
> 
> For reference, the linux code looks like this:
> >static void __noreturn pnv_power_off(void)
> >{
> >        long rc = OPAL_BUSY;
> >
> >        pnv_prepare_going_down();
> >
> >        while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
> >                rc = opal_cec_power_down(0);
> >                if (rc == OPAL_BUSY_EVENT)
> >                        opal_poll_events(NULL);
> >                else
> >                        mdelay(10);
> >        }
> >        for (;;)
> >                opal_poll_events(NULL);
> >}
> 
> Which means that *practically* our only option is to return OPAL_BUSY
> or OPAL_BUSY_EVENT.
> 
> We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're
> running pollers to communicate with the FSP and do the final bits of
> Reset/Reload handling before we power off the system.
> 
> Additionally, we really should update our documentation to point all
> of these return codes and what action an OS should take.

Superb analysis...

> CC: stable
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>

Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Vasant Hegde Oct. 7, 2017, 2:53 p.m. UTC | #3
On 10/02/2017 06:38 AM, Stewart Smith wrote:
> We had a race condition between FSP Reset/Reload and powering down
> the system from the host:
>

.../...

>  ::
> diff --git a/platforms/ibm-fsp/common.c b/platforms/ibm-fsp/common.c
> index 237b63fb4f05..0a9b06f77e33 100644
> --- a/platforms/ibm-fsp/common.c
> +++ b/platforms/ibm-fsp/common.c
> @@ -223,7 +223,7 @@ int64_t ibm_fsp_cec_power_down(uint64_t request)
>  	printf("FSP: Sending shutdown command to FSP...\n");

We may end up filling the OPAL console with the above message, as the kernel
will repeatedly make the shutdown call.



>
>  	if (fsp_sync_msg(fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request), true))

How about changing this to fsp_queue_msg()?  Even in the shutdown path we will
be able to queue the message and send it to the FSP after R/R completes.

-Vasant
Stewart Smith Oct. 9, 2017, 6:18 a.m. UTC | #4
Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>> --- a/platforms/ibm-fsp/common.c
>> +++ b/platforms/ibm-fsp/common.c
>> @@ -223,7 +223,7 @@ int64_t ibm_fsp_cec_power_down(uint64_t request)
>>  	printf("FSP: Sending shutdown command to FSP...\n");
>
> We may endup filling OPAL console with above message as kernel will repeatedly 
> makes shutdown call.

Yeah, at least that's not user visible though?

>>  	if (fsp_sync_msg(fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request), true))
>
> How about changing this to fsp_queue_msg?  Even in shutdown path we will be able 
> to queue message and send it to FSP after R/R completes.

If we have a pending firmware update though, we'll have all CPUs in OPAL
with interrupts off, so we do have to go and run pollers to have a
chance of making forward progress. Even though we'd never apply the
update due to R/R (not that I think we've ever tested this exact
scenario), we would still go down that code path in the kernel.

In that case, we'd still have to return OPAL_BUSY_EVENT so we could
reconnect and process the queued message, so it really wouldn't make
much of a difference (in fact, we'd have to keep track of whether we've
queued it or not).
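
For illustration only, here is a rough sketch of what that bookkeeping might
look like (assuming the existing fsp_mkmsg()/fsp_queue_msg()/fsp_freemsg()
helpers; the function name and static flag are made up for the example, and
this is not what the patch below does):

    #include <skiboot.h>
    #include <fsp.h>
    #include <opal.h>

    /* Hypothetical fsp_queue_msg()-based variant of the shutdown path */
    static bool powerdown_msg_queued;

    static int64_t queued_cec_power_down(uint64_t request)
    {
            if (!powerdown_msg_queued) {
                    struct fsp_msg *msg;

                    msg = fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request);
                    if (!msg)
                            return OPAL_BUSY_EVENT;
                    if (fsp_queue_msg(msg, fsp_freemsg)) {
                            fsp_freemsg(msg);
                            return OPAL_BUSY_EVENT;
                    }
                    powerdown_msg_queued = true;
            }

            /*
             * Whether we queued it just now or earlier, the OS still has
             * to keep calling opal_poll_events() so the message can
             * actually go out once the FSP Reset/Reload completes.
             */
            return OPAL_BUSY_EVENT;
    }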
Vasant Hegde Oct. 10, 2017, 4:26 a.m. UTC | #5
On 10/09/2017 11:48 AM, Stewart Smith wrote:
> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>>> --- a/platforms/ibm-fsp/common.c
>>> +++ b/platforms/ibm-fsp/common.c
>>> @@ -223,7 +223,7 @@ int64_t ibm_fsp_cec_power_down(uint64_t request)
>>>  	printf("FSP: Sending shutdown command to FSP...\n");
>>
>> We may endup filling OPAL console with above message as kernel will repeatedly
>> makes shutdown call.
>
> Yeah, at least that's not user visible though?

Yes. It's not user visible. But the OPAL msglog will be full of these messages
and may not be useful for any debugging. Maybe that's fine.

>
>>>  	if (fsp_sync_msg(fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request), true))
>>
>> How about changing this to fsp_queue_msg?  Even in shutdown path we will be able
>> to queue message and send it to FSP after R/R completes.
>
> If we have a pending firmware update though, we'll have all CPUs in OPAL
> with interrupts off, so we do have to go and run pollers to have a
> chance of making forward progress. Even though we'd never apply the
> update due to R/R, (not that I think we've ever tested this exact
> scenario), we would still go down that code path in the kernel.

Yeah. I don't think we ever tested codeupdate with FSP R/R. But with this change
we will keep on retrying codeupdate as well (along with sending the shutdown
message). Once the FSP is back it will try to start codeupdate. That should be
fine I guess.

-Vasant

>
> In that case, we'd still have to return OPAL_BUSY_EVENT so we could
> reconnect and process the queued message, so it really wouldn't make
> much of a difference (in fact, we'd have to keep track on if we've
> queued it or not).
>
Stewart Smith Oct. 11, 2017, 8:57 a.m. UTC | #6
Stewart Smith <stewart@linux.vnet.ibm.com> writes:
> We had a race condition between FSP Reset/Reload and powering down
> the system from the host:
>
> Roughly:
>
>   FSP                Host
>   ---                ----
>   Power on
>                      Power on
>   (inject EPOW)
>   (trigger FSP R/R)
>                      Processes EPOW event, starts shutting down
>                      calls OPAL_CEC_POWER_DOWN
>   (is still in R/R)
>                      gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
>   (FSP comes back)
>                      spinning in opal_poll_events
>   (thinks host is running)
>
> The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload
> error path for fsp_sync_msg() is to return -1, which means we give
> the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API
> docs give us the opportunity to return OPAL_BUSY when trying again
> later may be successful, and we're ambiguous as to if you should retry
> on OPAL_INTERNAL_ERROR.
>
> For reference, the linux code looks like this:
>>static void __noreturn pnv_power_off(void)
>>{
>>        long rc = OPAL_BUSY;
>>
>>        pnv_prepare_going_down();
>>
>>        while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
>>                rc = opal_cec_power_down(0);
>>                if (rc == OPAL_BUSY_EVENT)
>>                        opal_poll_events(NULL);
>>                else
>>                        mdelay(10);
>>        }
>>        for (;;)
>>                opal_poll_events(NULL);
>>}
>
> Which means that *practically* our only option is to return OPAL_BUSY
> or OPAL_BUSY_EVENT.
>
> We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're
> running pollers to communicate with the FSP and do the final bits of
> Reset/Reload handling before we power off the system.
>
> Additionally, we really should update our documentation to point all
> of these return codes and what action an OS should take.
>
> CC: stable
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
> ---
>  doc/opal-api/opal-cec-power-down-5.rst | 18 +++++++++++++++---
>  doc/opal-api/return-codes.rst          |  6 +++++-
>  platforms/ibm-fsp/common.c             |  2 +-
>  3 files changed, 21 insertions(+), 5 deletions(-)

Merged to master as of 696d378d7b7295366e115e89a785640bf72a5043
and 5.4.x as of 70af010ad33300eb150187c3076c525617565d33
(and made it into 5.4.8)

Patch

diff --git a/doc/opal-api/opal-cec-power-down-5.rst b/doc/opal-api/opal-cec-power-down-5.rst
index 6daea3de65ff..bdcb84e3b958 100644
--- a/doc/opal-api/opal-cec-power-down-5.rst
+++ b/doc/opal-api/opal-cec-power-down-5.rst
@@ -24,16 +24,28 @@  Return Values
 -------------
 
 ``OPAL_SUCCESS``
-  the power down was updated successful
+  the power down request was successful.
+  This may/may not result in immediate power down. An OS should
+  spin in a loop after getting `OPAL_SUCCESS` as it is likely that there
+  will be a delay before instructions stop being executed.
 
 ``OPAL_BUSY``
-  unable to power down, try again later
+  unable to power down, try again later.
+
+``OPAL_BUSY_EVENT``
+  Unable to power down, call `opal_run_pollers` and try again.
 
 ``OPAL_PARAMETER``
   a parameter was incorrect
 
 ``OPAL_INTERNAL_ERROR``
-  hal code sent incorrect data to hardware device
+  Something went wrong, and waiting and trying again is unlikely to be
+  successful. Although, considering that in a shutdown code path, there's
+  unlikely to be any other valid option to take, retrying is perfectly valid.
+
+  In older OPAL versions (prior to skiboot v5.9), on IBM FSP systems, this
+  return code was returned erroneously instead of OPAL_BUSY_EVENT during an
+  FSP Reset/Reload.
 
 ``OPAL_UNSUPPORTED``
   this platform does not support being powered off.
diff --git a/doc/opal-api/return-codes.rst b/doc/opal-api/return-codes.rst
index 03ea5c19d728..3ea4a3d85169 100644
--- a/doc/opal-api/return-codes.rst
+++ b/doc/opal-api/return-codes.rst
@@ -40,7 +40,8 @@  OPAL_BUSY
 
    #define OPAL_BUSY		-2
 
-Try again later.
+Try again later. Related to `OPAL_BUSY_EVENT`, but `OPAL_BUSY` indicates that the
+caller need not call `OPAL_POLL_EVENTS` itself. **TODO** Clarify current situation.
 
 OPAL_PARTIAL
 ------------
@@ -126,6 +127,9 @@  OPAL_BUSY_EVENT
 
    #define OPAL_BUSY_EVENT		-12
 
+The same as `OPAL_BUSY` but signals that the OS should call `OPAL_POLL_EVENTS` as
+that may be required to get into a state where the call will succeed.
+
 OPAL_HARDWARE_FROZEN
 --------------------
 ::
diff --git a/platforms/ibm-fsp/common.c b/platforms/ibm-fsp/common.c
index 237b63fb4f05..0a9b06f77e33 100644
--- a/platforms/ibm-fsp/common.c
+++ b/platforms/ibm-fsp/common.c
@@ -223,7 +223,7 @@  int64_t ibm_fsp_cec_power_down(uint64_t request)
 	printf("FSP: Sending shutdown command to FSP...\n");
 
 	if (fsp_sync_msg(fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request), true))
-		return OPAL_INTERNAL_ERROR;
+		return OPAL_BUSY_EVENT;
 
 	fsp_reset_links();
 	return OPAL_SUCCESS;