[v4,14/18] fadump: Introduce new reboot type

Message ID 20180703061727.8789-15-hegdevasant@linux.vnet.ibm.com
State Superseded
Headers show
Series
  • MPIPL support
Related show

Commit Message

Vasant Hegde July 3, 2018, 6:17 a.m.
Enhance reboot2 call to support FADUMP. Payload will call this interface
to initiate fadump.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 core/platform.c                        | 4 ++++
 doc/opal-api/opal-cec-reboot-6-116.rst | 7 +++++++
 include/opal-api.h                     | 1 +
 3 files changed, 12 insertions(+)

Comments

Stewart Smith Aug. 7, 2018, 7:59 a.m. | #1
Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
> Enhance reboot2 call to support FADUMP. Payload will call this interface
> to initiate fadump.
>
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> ---
>  core/platform.c                        | 4 ++++
>  doc/opal-api/opal-cec-reboot-6-116.rst | 7 +++++++
>  include/opal-api.h                     | 1 +
>  3 files changed, 12 insertions(+)
>
> diff --git a/core/platform.c b/core/platform.c
> index 3282227c1..08f05dc9b 100644
> --- a/core/platform.c
> +++ b/core/platform.c
> @@ -101,6 +101,10 @@ static int64_t opal_cec_reboot2(uint32_t reboot_type, char *diag)
>  	case OPAL_REBOOT_FULL_IPL:
>  		disable_fast_reboot("full IPL reboot requested");
>  		return opal_cec_reboot();
> +	case OPAL_REBOOT_MPIPL:
> +		prlog(PR_ERR, "Kernel requested for fadump\n");
> +		assert(false);
> +		break;
>  	default:
>  		prlog(PR_NOTICE, "OPAL: Unsupported reboot request %d\n", reboot_type);
>  		return OPAL_UNSUPPORTED;
> diff --git a/doc/opal-api/opal-cec-reboot-6-116.rst b/doc/opal-api/opal-cec-reboot-6-116.rst
> index 516d4fc01..11cf54aa1 100644
> --- a/doc/opal-api/opal-cec-reboot-6-116.rst
> +++ b/doc/opal-api/opal-cec-reboot-6-116.rst
> @@ -63,6 +63,13 @@ OPAL_REBOOT_FULL_IPL = 2
>  	On platforms that don't support fast reboot, this is equivalent to a
>  	normal reboot.
>
> +OPAL_REBOOT_MPIPL = 3
> +	Request for fadump reboot. Firmware will reboot the system and collect
> +	dump.
> +
> +	On platforms that don't support fadump, this is equivalent to a
> +	normal assert.

I think a reboot type is the right way to go here, although I was
thinking maybe something not tied to MPIPL itself but something a bit
more generic....

I'm thinking along the lines of "should a dump be taken"... We have
OPAL_REBOOT_PLATFORM_ERROR for when there's something gone wrong on a
platform level (theoretically not the Operating System's fault), maybe
we want OPAL_REBOOT_OS_ERROR ?

With a dump being attempted for OPAL_REBOOT_PLATFORM_ERROR and
OPAL_REBOOT_OS_ERROR reboot types?

A bit of an unanswered question I have is if we should have a fast
reboot path for gathering a dump, as I'm not sure there's much that
would prevent us doing that?
Vasant Hegde Aug. 16, 2018, 5:31 a.m. | #2
On 08/07/2018 01:29 PM, Stewart Smith wrote:
> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>> Enhance reboot2 call to support FADUMP. Payload will call this interface
>> to initiate fadump.
>>
>> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
>> ---
>>   core/platform.c                        | 4 ++++
>>   doc/opal-api/opal-cec-reboot-6-116.rst | 7 +++++++
>>   include/opal-api.h                     | 1 +
>>   3 files changed, 12 insertions(+)
>>
>> diff --git a/core/platform.c b/core/platform.c
>> index 3282227c1..08f05dc9b 100644
>> --- a/core/platform.c
>> +++ b/core/platform.c
>> @@ -101,6 +101,10 @@ static int64_t opal_cec_reboot2(uint32_t reboot_type, char *diag)
>>   	case OPAL_REBOOT_FULL_IPL:
>>   		disable_fast_reboot("full IPL reboot requested");
>>   		return opal_cec_reboot();
>> +	case OPAL_REBOOT_MPIPL:
>> +		prlog(PR_ERR, "Kernel requested for fadump\n");
>> +		assert(false);
>> +		break;
>>   	default:
>>   		prlog(PR_NOTICE, "OPAL: Unsupported reboot request %d\n", reboot_type);
>>   		return OPAL_UNSUPPORTED;
>> diff --git a/doc/opal-api/opal-cec-reboot-6-116.rst b/doc/opal-api/opal-cec-reboot-6-116.rst
>> index 516d4fc01..11cf54aa1 100644
>> --- a/doc/opal-api/opal-cec-reboot-6-116.rst
>> +++ b/doc/opal-api/opal-cec-reboot-6-116.rst
>> @@ -63,6 +63,13 @@ OPAL_REBOOT_FULL_IPL = 2
>>   	On platforms that don't support fast reboot, this is equivalent to a
>>   	normal reboot.
>>
>> +OPAL_REBOOT_MPIPL = 3
>> +	Request for fadump reboot. Firmware will reboot the system and collect
>> +	dump.
>> +
>> +	On platforms that don't support fadump, this is equivalent to a
>> +	normal assert.
> 
> I think a reboot type is the right way to go here, although I was
> thinking maybe something not tied to MPIPL itself but something a bit
> more generic....
> 
> I'm thinking along the lines of "should a dump be taken"... We have
> OPAL_REBOOT_PLATFORM_ERROR for when there's something gone wrong on a
> platform level (theoretically not the Operating System's fault), maybe
> we want OPAL_REBOOT_OS_ERROR ?

Yes. OS_ERROR makes sense. Will fix it.

> 
> With a dump being attempted for OPAL_REBOOT_PLATFORM_ERROR and
> OPAL_REBOOT_OS_ERROR reboot types?

By default we have disabled xscom_trigger_xstop() in P9. So 
OPAL_REBOOT_PLATFORM_ERROR
will return  to kernel and kernel will call panic path.

I think we should plumb panic path to call  OPAL_REBOOT_OS_ERROR.

> 
> A bit of an unanswered question I have is if we should have a fast
> reboot path for gathering a dump, as I'm not sure there's much that
> would prevent us doing that?

Nothing prevents us from doing it . But do we really need? My understanding is
fast reboot is called in normal path (not when we hit some kernel/platform error).

-Vasant
Stewart Smith Aug. 17, 2018, 4:35 a.m. | #3
Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
> On 08/07/2018 01:29 PM, Stewart Smith wrote:
>> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>>> Enhance reboot2 call to support FADUMP. Payload will call this interface
>>> to initiate fadump.
>>>
>>> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
>>> ---
>>>   core/platform.c                        | 4 ++++
>>>   doc/opal-api/opal-cec-reboot-6-116.rst | 7 +++++++
>>>   include/opal-api.h                     | 1 +
>>>   3 files changed, 12 insertions(+)
>>>
>>> diff --git a/core/platform.c b/core/platform.c
>>> index 3282227c1..08f05dc9b 100644
>>> --- a/core/platform.c
>>> +++ b/core/platform.c
>>> @@ -101,6 +101,10 @@ static int64_t opal_cec_reboot2(uint32_t reboot_type, char *diag)
>>>   	case OPAL_REBOOT_FULL_IPL:
>>>   		disable_fast_reboot("full IPL reboot requested");
>>>   		return opal_cec_reboot();
>>> +	case OPAL_REBOOT_MPIPL:
>>> +		prlog(PR_ERR, "Kernel requested for fadump\n");
>>> +		assert(false);
>>> +		break;
>>>   	default:
>>>   		prlog(PR_NOTICE, "OPAL: Unsupported reboot request %d\n", reboot_type);
>>>   		return OPAL_UNSUPPORTED;
>>> diff --git a/doc/opal-api/opal-cec-reboot-6-116.rst b/doc/opal-api/opal-cec-reboot-6-116.rst
>>> index 516d4fc01..11cf54aa1 100644
>>> --- a/doc/opal-api/opal-cec-reboot-6-116.rst
>>> +++ b/doc/opal-api/opal-cec-reboot-6-116.rst
>>> @@ -63,6 +63,13 @@ OPAL_REBOOT_FULL_IPL = 2
>>>   	On platforms that don't support fast reboot, this is equivalent to a
>>>   	normal reboot.
>>>
>>> +OPAL_REBOOT_MPIPL = 3
>>> +	Request for fadump reboot. Firmware will reboot the system and collect
>>> +	dump.
>>> +
>>> +	On platforms that don't support fadump, this is equivalent to a
>>> +	normal assert.
>> 
>> I think a reboot type is the right way to go here, although I was
>> thinking maybe something not tied to MPIPL itself but something a bit
>> more generic....
>> 
>> I'm thinking along the lines of "should a dump be taken"... We have
>> OPAL_REBOOT_PLATFORM_ERROR for when there's something gone wrong on a
>> platform level (theoretically not the Operating System's fault), maybe
>> we want OPAL_REBOOT_OS_ERROR ?
>
> Yes. OS_ERROR makes sense. Will fix it.
>
>> 
>> With a dump being attempted for OPAL_REBOOT_PLATFORM_ERROR and
>> OPAL_REBOOT_OS_ERROR reboot types?
>
> By default we have disabled xscom_trigger_xstop() in P9. So 
> OPAL_REBOOT_PLATFORM_ERROR
> will return  to kernel and kernel will call panic path.

Yeah, I think we could at the least re-enable it and map it to MPIPL (at
least for the time being)?

At least then there'd be some kind of data capture of what went wrong
rather than our current gathering of absolutely nothing.

> I think we should plumb panic path to call  OPAL_REBOOT_OS_ERROR.

Yes, agree.

>> A bit of an unanswered question I have is if we should have a fast
>> reboot path for gathering a dump, as I'm not sure there's much that
>> would prevent us doing that?
>
> Nothing prevents us from doing it . But do we really need? My understanding is
> fast reboot is called in normal path (not when we hit some
> kernel/platform error).

We'll do a fast reboot even in the event of panic() as it's just hooked
up to the normal reboot path. We do disable fast reboot in some paths
though, namely: unrecoverable HMI, platform error, a bunch of internal
OPAL problems etc.
Vasant Hegde Aug. 20, 2018, 5:38 a.m. | #4
On 08/17/2018 10:05 AM, Stewart Smith wrote:
> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>> On 08/07/2018 01:29 PM, Stewart Smith wrote:
>>> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>>>> Enhance reboot2 call to support FADUMP. Payload will call this interface
>>>> to initiate fadump.
>>>>
>>>> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
>>>> ---
>>>>    core/platform.c                        | 4 ++++
>>>>    doc/opal-api/opal-cec-reboot-6-116.rst | 7 +++++++
>>>>    include/opal-api.h                     | 1 +
>>>>    3 files changed, 12 insertions(+)
>>>>
>>>> diff --git a/core/platform.c b/core/platform.c
>>>> index 3282227c1..08f05dc9b 100644
>>>> --- a/core/platform.c
>>>> +++ b/core/platform.c
>>>> @@ -101,6 +101,10 @@ static int64_t opal_cec_reboot2(uint32_t reboot_type, char *diag)
>>>>    	case OPAL_REBOOT_FULL_IPL:
>>>>    		disable_fast_reboot("full IPL reboot requested");
>>>>    		return opal_cec_reboot();
>>>> +	case OPAL_REBOOT_MPIPL:
>>>> +		prlog(PR_ERR, "Kernel requested for fadump\n");
>>>> +		assert(false);
>>>> +		break;
>>>>    	default:
>>>>    		prlog(PR_NOTICE, "OPAL: Unsupported reboot request %d\n", reboot_type);
>>>>    		return OPAL_UNSUPPORTED;
>>>> diff --git a/doc/opal-api/opal-cec-reboot-6-116.rst b/doc/opal-api/opal-cec-reboot-6-116.rst
>>>> index 516d4fc01..11cf54aa1 100644
>>>> --- a/doc/opal-api/opal-cec-reboot-6-116.rst
>>>> +++ b/doc/opal-api/opal-cec-reboot-6-116.rst
>>>> @@ -63,6 +63,13 @@ OPAL_REBOOT_FULL_IPL = 2
>>>>    	On platforms that don't support fast reboot, this is equivalent to a
>>>>    	normal reboot.
>>>>
>>>> +OPAL_REBOOT_MPIPL = 3
>>>> +	Request for fadump reboot. Firmware will reboot the system and collect
>>>> +	dump.
>>>> +
>>>> +	On platforms that don't support fadump, this is equivalent to a
>>>> +	normal assert.
>>>
>>> I think a reboot type is the right way to go here, although I was
>>> thinking maybe something not tied to MPIPL itself but something a bit
>>> more generic....
>>>
>>> I'm thinking along the lines of "should a dump be taken"... We have
>>> OPAL_REBOOT_PLATFORM_ERROR for when there's something gone wrong on a
>>> platform level (theoretically not the Operating System's fault), maybe
>>> we want OPAL_REBOOT_OS_ERROR ?
>>
>> Yes. OS_ERROR makes sense. Will fix it.
>>
>>>
>>> With a dump being attempted for OPAL_REBOOT_PLATFORM_ERROR and
>>> OPAL_REBOOT_OS_ERROR reboot types?
>>
>> By default we have disabled xscom_trigger_xstop() in P9. So
>> OPAL_REBOOT_PLATFORM_ERROR
>> will return  to kernel and kernel will call panic path.
> 
> Yeah, I think we could at the least re-enable it and map it to MPIPL (at
> least for the time being)?

Yes.. We can do that.. I will enable this patch later (once base patches merged).

> 
> At least then there'd be some kind of data capture of what went wrong
> rather than our current gathering of absolutely nothing.

Agree.


> 
>> I think we should plumb panic path to call  OPAL_REBOOT_OS_ERROR.
> 
> Yes, agree.
> 
>>> A bit of an unanswered question I have is if we should have a fast
>>> reboot path for gathering a dump, as I'm not sure there's much that
>>> would prevent us doing that?
>>
>> Nothing prevents us from doing it . But do we really need? My understanding is
>> fast reboot is called in normal path (not when we hit some
>> kernel/platform error).
> 
> We'll do a fast reboot even in the event of panic() as it's just hooked
> up to the normal reboot path. We do disable fast reboot in some paths
> though, namely: unrecoverable HMI, platform error, a bunch of internal
> OPAL problems etc.

Hmmm. may be we should plumb panic() path to call MPIPL. Will look into it later.

-Vasant

Patch

diff --git a/core/platform.c b/core/platform.c
index 3282227c1..08f05dc9b 100644
--- a/core/platform.c
+++ b/core/platform.c
@@ -101,6 +101,10 @@  static int64_t opal_cec_reboot2(uint32_t reboot_type, char *diag)
 	case OPAL_REBOOT_FULL_IPL:
 		disable_fast_reboot("full IPL reboot requested");
 		return opal_cec_reboot();
+	case OPAL_REBOOT_MPIPL:
+		prlog(PR_ERR, "Kernel requested for fadump\n");
+		assert(false);
+		break;
 	default:
 		prlog(PR_NOTICE, "OPAL: Unsupported reboot request %d\n", reboot_type);
 		return OPAL_UNSUPPORTED;
diff --git a/doc/opal-api/opal-cec-reboot-6-116.rst b/doc/opal-api/opal-cec-reboot-6-116.rst
index 516d4fc01..11cf54aa1 100644
--- a/doc/opal-api/opal-cec-reboot-6-116.rst
+++ b/doc/opal-api/opal-cec-reboot-6-116.rst
@@ -63,6 +63,13 @@  OPAL_REBOOT_FULL_IPL = 2
 	On platforms that don't support fast reboot, this is equivalent to a
 	normal reboot.
 
+OPAL_REBOOT_MPIPL = 3
+	Request for fadump reboot. Firmware will reboot the system and collect
+	dump.
+
+	On platforms that don't support fadump, this is equivalent to a
+	normal assert.
+
 Unsupported Reboot type
 	For unsupported reboot type, this function will return with
 	OPAL_UNSUPPORTED and no reboot will be triggered.
diff --git a/include/opal-api.h b/include/opal-api.h
index a099b0250..b7d4fac49 100644
--- a/include/opal-api.h
+++ b/include/opal-api.h
@@ -1233,6 +1233,7 @@  enum {
 	OPAL_REBOOT_NORMAL = 0,
 	OPAL_REBOOT_PLATFORM_ERROR,
 	OPAL_REBOOT_FULL_IPL,
+	OPAL_REBOOT_MPIPL,
 };
 
 /* Argument to OPAL_PCI_TCE_KILL */