diff mbox series

[v2] ipmi/power: Fix system reboot issue

Message ID 20190215152223.18095-1-hegdevasant@linux.vnet.ibm.com
State Superseded
Headers show
Series [v2] ipmi/power: Fix system reboot issue | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master

Commit Message

Vasant Hegde Feb. 15, 2019, 3:22 p.m. UTC
Kernel makes reboot/shudown OPAL call for reboot/shutdown. Once kernel
gets response from OPAL it runs opal_poll_events() until firmware
handles the request.

On BMC based system, OPAL makes IPMI call (IPMI_CHASSIS_CONTROL) to
initiate system reboot/shutdown. At present OPAL queues IPMI messages
and return SUCESS to Host. If BMC is not ready to accept command (like
BMC reboot), then these message will fail. We have to manually
reboot/shutdown the system using BMC interface.

This patch adds logic to validate message return value. If message failed,
then it will resend the message. At some stage BMC will be ready to accept
message and handles IPMI message.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
Changes in v2:
  - Fixed use after free issue

-Vasant

 hw/ipmi/ipmi-power.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

Comments

Stewart Smith Feb. 18, 2019, 5:32 a.m. UTC | #1
Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
> Kernel makes reboot/shudown OPAL call for reboot/shutdown. Once kernel
> gets response from OPAL it runs opal_poll_events() until firmware
> handles the request.
>
> On BMC based system, OPAL makes IPMI call (IPMI_CHASSIS_CONTROL) to
> initiate system reboot/shutdown. At present OPAL queues IPMI messages
> and return SUCESS to Host. If BMC is not ready to accept command (like
> BMC reboot), then these message will fail. We have to manually
> reboot/shutdown the system using BMC interface.
>
> This patch adds logic to validate message return value. If message failed,
> then it will resend the message. At some stage BMC will be ready to accept
> message and handles IPMI message.
>
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> ---
> Changes in v2:
>   - Fixed use after free issue
>
> -Vasant
>
>  hw/ipmi/ipmi-power.c | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ipmi/ipmi-power.c b/hw/ipmi/ipmi-power.c
> index f14a0c980..1d175dfb5 100644
> --- a/hw/ipmi/ipmi-power.c
> +++ b/hw/ipmi/ipmi-power.c
> @@ -18,6 +18,28 @@
>  #include <stdlib.h>
>  #include <ipmi.h>
>  #include <opal.h>
> +#include <timebase.h>
> +
> +static void ipmi_chassis_control_complete(struct ipmi_msg *msg)
> +{
> +	uint8_t request = msg->data[0];
> +	uint8_t cc = msg->cc;
> +
> +	ipmi_free_msg(msg);
> +	if (cc == IPMI_CC_NO_ERROR)
> +		return;
> +
> +	prlog(PR_INFO, "IPMI: Chassis control request failed. "
> +	      "request=0x%02x, rc=0x%02x\n", request, cc);
> +
> +	/* Relax a bit before resending command */
> +	time_wait_nopoll(100);

Any reason time_wait rather than just scheduling a job in the future?
Vasant Hegde Feb. 18, 2019, 6:23 a.m. UTC | #2
On 02/18/2019 11:02 AM, Stewart Smith wrote:
> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>> Kernel makes reboot/shudown OPAL call for reboot/shutdown. Once kernel
>> gets response from OPAL it runs opal_poll_events() until firmware
>> handles the request.
>>
>> On BMC based system, OPAL makes IPMI call (IPMI_CHASSIS_CONTROL) to
>> initiate system reboot/shutdown. At present OPAL queues IPMI messages
>> and return SUCESS to Host. If BMC is not ready to accept command (like
>> BMC reboot), then these message will fail. We have to manually
>> reboot/shutdown the system using BMC interface.
>>
>> This patch adds logic to validate message return value. If message failed,
>> then it will resend the message. At some stage BMC will be ready to accept
>> message and handles IPMI message.
>>
>> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
>> ---
>> Changes in v2:
>>    - Fixed use after free issue
>>
>> -Vasant
>>
>>   hw/ipmi/ipmi-power.c | 29 +++++++++++++++++++++++++++--
>>   1 file changed, 27 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ipmi/ipmi-power.c b/hw/ipmi/ipmi-power.c
>> index f14a0c980..1d175dfb5 100644
>> --- a/hw/ipmi/ipmi-power.c
>> +++ b/hw/ipmi/ipmi-power.c
>> @@ -18,6 +18,28 @@
>>   #include <stdlib.h>
>>   #include <ipmi.h>
>>   #include <opal.h>
>> +#include <timebase.h>
>> +
>> +static void ipmi_chassis_control_complete(struct ipmi_msg *msg)
>> +{
>> +	uint8_t request = msg->data[0];
>> +	uint8_t cc = msg->cc;
>> +
>> +	ipmi_free_msg(msg);
>> +	if (cc == IPMI_CC_NO_ERROR)
>> +		return;
>> +
>> +	prlog(PR_INFO, "IPMI: Chassis control request failed. "
>> +	      "request=0x%02x, rc=0x%02x\n", request, cc);
>> +
>> +	/* Relax a bit before resending command */
>> +	time_wait_nopoll(100);
> 
> Any reason time_wait rather than just scheduling a job in the future?

I thought we will relax bit before scheduling again. But no strong reason. We 
can get rid of
time_wait() here.

-Vasant
diff mbox series

Patch

diff --git a/hw/ipmi/ipmi-power.c b/hw/ipmi/ipmi-power.c
index f14a0c980..1d175dfb5 100644
--- a/hw/ipmi/ipmi-power.c
+++ b/hw/ipmi/ipmi-power.c
@@ -18,6 +18,28 @@ 
 #include <stdlib.h>
 #include <ipmi.h>
 #include <opal.h>
+#include <timebase.h>
+
+static void ipmi_chassis_control_complete(struct ipmi_msg *msg)
+{
+	uint8_t request = msg->data[0];
+	uint8_t cc = msg->cc;
+
+	ipmi_free_msg(msg);
+	if (cc == IPMI_CC_NO_ERROR)
+		return;
+
+	prlog(PR_INFO, "IPMI: Chassis control request failed. "
+	      "request=0x%02x, rc=0x%02x\n", request, cc);
+
+	/* Relax a bit before resending command */
+	time_wait_nopoll(100);
+
+	if (ipmi_chassis_control(request)) {
+		prlog(PR_INFO, "IPMI: Failed to resend chassis control "
+		      "request [0x%02x]\n", request);
+	}
+}
 
 int ipmi_chassis_control(uint8_t request)
 {
@@ -29,10 +51,13 @@  int ipmi_chassis_control(uint8_t request)
 	if (request > IPMI_CHASSIS_SOFT_SHUTDOWN)
 		return OPAL_PARAMETER;
 
-	msg = ipmi_mkmsg_simple(IPMI_CHASSIS_CONTROL, &request,
-				sizeof(request));
+	msg = ipmi_mkmsg(IPMI_DEFAULT_INTERFACE, IPMI_CHASSIS_CONTROL,
+			 ipmi_chassis_control_complete, NULL,
+			 &request, sizeof(request), 0);
 	if (!msg)
 		return OPAL_HARDWARE;
+	/* Set msg->error callback function */
+	msg->error = ipmi_chassis_control_complete;
 
 	prlog(PR_INFO, "IPMI: sending chassis control request 0x%02x\n",
 			request);