| Field | Value |
|---|---|
| Message ID | 20190516175816.10558-4-hegdevasant@linux.vnet.ibm.com |
| State | Superseded |
| Series | [v4,01/10] core/opal: Increase opal-msg-size size |

| Context | Check | Description |
|---|---|---|
| snowpatch_ozlabs/apply_patch | success | Successfully applied on branch master (c8b5e8a95caf029ffe73ea18769fdd7f2da48ab4) |
| snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot | success | Test snowpatch/job/snowpatch-skiboot on branch master |
| snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco | success | Signed-off-by present |
Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:

> Linux uses the opal_get_msg (OPAL_GET_MSG) API to get OPAL messages. This
> interface supports up to 8 params (64 bytes). We have a requirement to send
> bigger data to Linux. This patch enhances OPAL to send bigger data to Linux.
>
> - Linux will use the "opal-msg-size" device tree property to allocate memory
>   for OPAL messages (the previous patch increased "opal-msg-size" to 64K).
>
> - Replaced the `reserved` field in "struct opal_msg" with `size`, so that the
>   Linux-side opal_get_msg user can detect the actual data size.
>
> - If buffer size < actual message size, then opal_get_msg will copy partial
>   data and return OPAL_PARTIAL to Linux.

Looking through the Linux code, we should probably very carefully document
the expected behaviour for larger messages and the impact, specifically
around what happens with OPAL_PARTIAL, and ensure we don't just print error
messages out from the kernel forever.

Specifically, we seem to have two places where we consume messages in Linux.

In opal.c:

	static void opal_handle_message(void)
	{
		....
		ret = opal_get_msg(__pa(&msg), sizeof(msg));
		/* No opal message pending. */
		if (ret == OPAL_RESOURCE)
			return;

		/* check for errors. */
		if (ret) {
			pr_warn("%s: Failed to retrieve opal message, err=%lld\n",
				__func__, ret);
			return;
		}

So this gives me pause: if we have a large message needing to be retrieved,
existing kernels will never clear it, and will spew pr_warns from here to
eternity.

And in opal-hmi.c:

	if (unrecoverable) {
		/* Pull all HMI events from OPAL before we panic. */
		while (opal_get_msg(__pa(&msg), sizeof(msg)) == OPAL_SUCCESS) {
			.....
			print_hmi_event_info(hmi_evt);

So if we have a large event in the middle of a bunch of fatal HMI info,
we'll lose the HMI event info.

So I think we need to ensure two things:
1) we don't lose unrecoverable HMIs
2) existing kernels degrade gracefully.

We can probably solve 1 by something only casually ugly (remove every event
that isn't an unrecoverable HMI, or don't return OPAL_PARTIAL when there's an
unrecoverable HMI event present).

For 2, I'm trying to think of what's good to do. My ideas go down two paths:

- if a message has gotten OPAL_PARTIAL twice, drop it.
- drop the message after the first OPAL_PARTIAL if the opal_get_msg size
  parameter is sizeof(msg) rather than what we put in the device tree.

Either way, we should very clearly document this (and the reasoning behind
it) in the OPAL API docs.

I haven't looked at what FreeBSD does, and we should probably do that, if
only to get into the habit of doing so.

> - Add a new variable "extended" to the "opal_msg_entry" structure to keep
>   track of messages that have more than 64 bytes of data. We will allocate
>   separate memory for these messages, and once the kernel consumes a message
>   we will release that memory.
>
> Cc: Jeremy Kerr <jk@ozlabs.org>
> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Cc: Oliver O'Halloran <oohall@gmail.com>
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> Acked-by: Jeremy Kerr <jk@ozlabs.org>
> ---
>  core/opal-msg.c                | 66 ++++++++++++++++++++++++++++++------------
>  core/test/run-msg.c            |  6 ++--
>  doc/opal-api/opal-messages.rst |  2 +-
>  include/opal-api.h             |  2 +-
>  4 files changed, 53 insertions(+), 23 deletions(-)
>
> diff --git a/core/opal-msg.c b/core/opal-msg.c
> index 907a9e0af..af1ec7d00 100644
> --- a/core/opal-msg.c
> +++ b/core/opal-msg.c
> @@ -25,6 +25,7 @@
>  struct opal_msg_entry {
>  	struct list_node link;
>  	void (*consumed)(void *data, int status);
> +	bool extended;
>  	void *data;
>  	struct opal_msg msg;
>  };
> @@ -39,37 +40,47 @@ int _opal_queue_msg(enum opal_msg_type msg_type, void *data,
>  		    size_t params_size, const void *params)
>  {
>  	struct opal_msg_entry *entry;
> +	uint64_t entry_size;
> +
> +	if ((params_size + OPAL_MSG_HDR_SIZE) > OPAL_MSG_SIZE) {
> +		prlog(PR_DEBUG, "param_size (0x%x) > opal_msg param size (0x%x)\n",
> +		      (u32)params_size, (u32)(OPAL_MSG_SIZE - OPAL_MSG_HDR_SIZE));
> +		return OPAL_PARAMETER;
> +	}
>
>  	lock(&opal_msg_lock);
>
> -	entry = list_pop(&msg_free_list, struct opal_msg_entry, link);
> -	if (!entry) {
> -		prerror("No available node in the free list, allocating\n");
> -		entry = zalloc(sizeof(struct opal_msg_entry));
> +	if (params_size > OPAL_MSG_FIXED_PARAMS_SIZE) {
> +		entry_size = sizeof(struct opal_msg_entry) + params_size;
> +		entry_size -= OPAL_MSG_FIXED_PARAMS_SIZE;
> +		entry = zalloc(entry_size);
> +		if (entry)
> +			entry->extended = true;
> +	} else {
> +		entry = list_pop(&msg_free_list, struct opal_msg_entry, link);
>  		if (!entry) {
> -			prerror("Allocation failed\n");
> -			unlock(&opal_msg_lock);
> -			return OPAL_RESOURCE;
> +			prerror("No available node in the free list, allocating\n");
> +			entry = zalloc(sizeof(struct opal_msg_entry));

I'm tempted to say we should switch to using the pool allocator for these
and hard failing when we run out. This could come in a separate patch
though. We've (more than once) gotten everything fairly wrong and eaten up
all of the skiboot heap with messages that aren't being consumed, causing
everything to explode even more horribly than it was already exploding.
On 05/21/2019 12:50 PM, Stewart Smith wrote:
> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>
>> Linux uses the opal_get_msg (OPAL_GET_MSG) API to get OPAL messages. This
>> interface supports up to 8 params (64 bytes). We have a requirement to
>> send bigger data to Linux. This patch enhances OPAL to send bigger data
>> to Linux.
.../...
>
> Looking through the linux code, we should probably very carefully
> document the expected behaviour for larger messages and the impact.

Correct. It's already documented in OPAL:

    OPAL_PARTIAL
        If the pending opal message is greater than the supplied buffer.
        In this case the message is *DISCARDED* by OPAL.

...and this patch retains the above behaviour, i.e. we remove the entry
from the list and return OPAL_PARTIAL to the kernel.

> Specifically around what happens with OPAL_PARTIAL and ensuring we don't
> just print error messages out from the kernel forever.

It will print the error message once and move on, as OPAL removes the
entry from the list (same as the existing behaviour).

> Specifically, we seem to have two places where we consume messages in
> Linux:
>
> in opal.c:
>
> 	static void opal_handle_message(void)
> 	{
> 		....
> 		ret = opal_get_msg(__pa(&msg), sizeof(msg));
> 		/* No opal message pending. */
> 		if (ret == OPAL_RESOURCE)
> 			return;
>
> 		/* check for errors. */
> 		if (ret) {
> 			pr_warn("%s: Failed to retrieve opal message, err=%lld\n",
> 				__func__, ret);
> 			return;
> 		}
>
> So this gives me pause, as if we have a large message needing to be
> retrieved, existing kernels never will clear it, and spew pr_warns from
> here to eternity.
>
> and in opal-hmi.c:
>
> 	if (unrecoverable) {
> 		/* Pull all HMI events from OPAL before we panic. */
> 		while (opal_get_msg(__pa(&msg), sizeof(msg)) == OPAL_SUCCESS) {
> 			.....
> 			print_hmi_event_info(hmi_evt);
>
> So if we have a large event in the middle of a bunch of fatal HMI info,
> we'll lose the HMI event info.
>
> So I think we need to ensure two things:
> 1) we don't lose unrecoverable HMIs
> 2) existing kernels degrade gracefully.

I think both are taken care of.

> We can probably solve 1 by something only casually ugly (remove every
> event that isn't an unrecoverable HMI, or don't return OPAL_PARTIAL when
> there's an unrecoverable HMI event present).

This is not an issue because the existing interface handles HMI events
properly. And on old kernel / new OPAL, for bigger messages we will
return OPAL_PARTIAL *and* remove the message from the list. We will not
have an infinite loop, so we are good.

> For 2, I'm trying to think of what's good to do. My ideas go down two
> paths:
> - if a message has gotten OPAL_PARTIAL twice, drop it.
> - drop the message after the first OPAL_PARTIAL if the opal_get_msg size
>   parameter is sizeof(msg) rather than what we put in the device tree.

The only way for the kernel to know the maximum message size is through
the device tree. So it makes sense to drop the message if the kernel is
not passing a sufficient buffer (i.e. the old kernel / new OPAL
combination).

> Either way, we should very clearly document this (and the reasoning
> behind it) in the OPAL API docs.
>
> I haven't looked at what FreeBSD does, and we should probably do that,
> if only to get into the habit of doing so.
>
>> - Add a new variable "extended" to the "opal_msg_entry" structure to
>>   keep track of messages that have more than 64 bytes of data. We will
>>   allocate separate memory for these messages, and once the kernel
>>   consumes a message we will release that memory.
.../...
>
> I'm tempted to say we should switch to using the pool allocator for
> these and hard failing when we run out. This could come in a separate
> patch though.

Agreed. Makes sense to control the number of allocations. I will send a
separate patch for that.

-Vasant
Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:

> On 05/21/2019 12:50 PM, Stewart Smith wrote:
>> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
.../...
>> Looking through the linux code, we should probably very carefully
>> document the expected behaviour for larger messages and the impact.
>
> Correct. It's already documented in OPAL:
>
>     OPAL_PARTIAL
>         If the pending opal message is greater than the supplied buffer.
>         In this case the message is *DISCARDED* by OPAL.
>
> ...and this patch retains the above behaviour, i.e. we remove the entry
> from the list and return OPAL_PARTIAL to the kernel.

Ahh, excellent.

Oh look, some guy named Stewart wrote that nearly 3 years ago.

.../...
>> So I think we need to ensure two things:
>> 1) we don't lose unrecoverable HMIs
>> 2) existing kernels degrade gracefully.
>
> I think both are taken care of.

Has it been tested?

>> I'm tempted to say we should switch to using the pool allocator for
>> these and hard failing when we run out. This could come in a separate
>> patch though.
>
> Agreed. Makes sense to control the number of allocations. I will send a
> separate patch for that.

ok.
On 05/30/2019 09:10 AM, Stewart Smith wrote:
> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>> On 05/21/2019 12:50 PM, Stewart Smith wrote:
>>> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
.../...
>>
>> OPAL_PARTIAL
>>     If the pending opal message is greater than the supplied buffer.
>>     In this case the message is *DISCARDED* by OPAL.
>>
>> ...and this patch retains the above behaviour, i.e. we remove the entry
>> from the list and return OPAL_PARTIAL to the kernel.
>
> Ahh, excellent.
>
> Oh look, some guy named Stewart wrote that nearly 3 years ago.

Yes :-)

.../...
>>> So I think we need to ensure two things:
>>> 1) we don't lose unrecoverable HMIs
>>> 2) existing kernels degrade gracefully.
>>
>> I think both are taken care of.
>
> Has it been tested?

Yes. I had a custom patch to inject a bigger message size, and tested with
both old and new kernels.

-Vasant
diff --git a/core/opal-msg.c b/core/opal-msg.c
index 907a9e0af..af1ec7d00 100644
--- a/core/opal-msg.c
+++ b/core/opal-msg.c
@@ -25,6 +25,7 @@
 struct opal_msg_entry {
 	struct list_node link;
 	void (*consumed)(void *data, int status);
+	bool extended;
 	void *data;
 	struct opal_msg msg;
 };
@@ -39,37 +40,47 @@ int _opal_queue_msg(enum opal_msg_type msg_type, void *data,
 		    size_t params_size, const void *params)
 {
 	struct opal_msg_entry *entry;
+	uint64_t entry_size;
+
+	if ((params_size + OPAL_MSG_HDR_SIZE) > OPAL_MSG_SIZE) {
+		prlog(PR_DEBUG, "param_size (0x%x) > opal_msg param size (0x%x)\n",
+		      (u32)params_size, (u32)(OPAL_MSG_SIZE - OPAL_MSG_HDR_SIZE));
+		return OPAL_PARAMETER;
+	}
 
 	lock(&opal_msg_lock);
 
-	entry = list_pop(&msg_free_list, struct opal_msg_entry, link);
-	if (!entry) {
-		prerror("No available node in the free list, allocating\n");
-		entry = zalloc(sizeof(struct opal_msg_entry));
+	if (params_size > OPAL_MSG_FIXED_PARAMS_SIZE) {
+		entry_size = sizeof(struct opal_msg_entry) + params_size;
+		entry_size -= OPAL_MSG_FIXED_PARAMS_SIZE;
+		entry = zalloc(entry_size);
+		if (entry)
+			entry->extended = true;
+	} else {
+		entry = list_pop(&msg_free_list, struct opal_msg_entry, link);
 		if (!entry) {
-			prerror("Allocation failed\n");
-			unlock(&opal_msg_lock);
-			return OPAL_RESOURCE;
+			prerror("No available node in the free list, allocating\n");
+			entry = zalloc(sizeof(struct opal_msg_entry));
 		}
 	}
 
+	if (!entry) {
+		prerror("Allocation failed\n");
+		unlock(&opal_msg_lock);
+		return OPAL_RESOURCE;
+	}
+
 	entry->consumed = consumed;
 	entry->data = data;
 	entry->msg.msg_type = cpu_to_be32(msg_type);
-
-	if (params_size > OPAL_MSG_FIXED_PARAMS_SIZE) {
-		prerror("Discarding extra parameters\n");
-		params_size = OPAL_MSG_FIXED_PARAMS_SIZE;
-	}
+	entry->msg.size = cpu_to_be32(params_size);
 	memcpy(entry->msg.params, params, params_size);
 
 	list_add_tail(&msg_pending_list, &entry->link);
 	opal_update_pending_evt(OPAL_EVENT_MSG_PENDING, OPAL_EVENT_MSG_PENDING);
-
 	unlock(&opal_msg_lock);
 
-	return 0;
+	return OPAL_SUCCESS;
 }
 
 static int64_t opal_get_msg(uint64_t *buffer, uint64_t size)
@@ -77,6 +88,8 @@ static int64_t opal_get_msg(uint64_t *buffer, uint64_t size)
 	struct opal_msg_entry *entry;
 	void (*callback)(void *data, int status);
 	void *data;
+	uint64_t msg_size;
+	int rc = OPAL_SUCCESS;
 
 	if (size < sizeof(struct opal_msg) || !buffer)
 		return OPAL_PARAMETER;
@@ -92,20 +105,37 @@ static int64_t opal_get_msg(uint64_t *buffer, uint64_t size)
 		return OPAL_RESOURCE;
 	}
 
-	memcpy(buffer, &entry->msg, sizeof(entry->msg));
+	msg_size = OPAL_MSG_HDR_SIZE + be32_to_cpu(entry->msg.size);
+	if (size < msg_size) {
+		/* Send partial data to Linux */
+		prlog(PR_NOTICE, "Sending partial data [msg_type : 0x%x, "
+		      "msg_size : 0x%x, buf_size : 0x%x]\n",
+		      be32_to_cpu(entry->msg.msg_type),
+		      (u32)msg_size, (u32)size);
+
+		entry->msg.size = cpu_to_be32(size - OPAL_MSG_HDR_SIZE);
+		msg_size = size;
+		rc = OPAL_PARTIAL;
+	}
+
+	memcpy((void *)buffer, (void *)&entry->msg, msg_size);
 	callback = entry->consumed;
 	data = entry->data;
 
-	list_add(&msg_free_list, &entry->link);
+	if (entry->extended)
+		free(entry);
+	else
+		list_add(&msg_free_list, &entry->link);
+
 	if (list_empty(&msg_pending_list))
 		opal_update_pending_evt(OPAL_EVENT_MSG_PENDING, 0);
 
 	unlock(&opal_msg_lock);
 
 	if (callback)
-		callback(data, OPAL_SUCCESS);
+		callback(data, rc);
 
-	return OPAL_SUCCESS;
+	return rc;
 }
 opal_call(OPAL_GET_MSG, opal_get_msg, 2);

diff --git a/core/test/run-msg.c b/core/test/run-msg.c
index 08e1a019b..f5f948ace 100644
--- a/core/test/run-msg.c
+++ b/core/test/run-msg.c
@@ -132,13 +132,13 @@ int main(void)
 	assert(r == 0);
 
 	assert(list_count(&msg_pending_list) == ++npending);
-	assert(list_count(&msg_free_list) == --nfree);
+	assert(list_count(&msg_free_list) == nfree);
 
 	r = opal_get_msg(m_ptr, sizeof(m));
-	assert(r == 0);
+	assert(r == OPAL_PARTIAL);
 
 	assert(list_count(&msg_pending_list) == --npending);
-	assert(list_count(&msg_free_list) == ++nfree);
+	assert(list_count(&msg_free_list) == nfree);
 
 	assert(m.params[0] == 0);
 	assert(m.params[1] == 1);

diff --git a/doc/opal-api/opal-messages.rst b/doc/opal-api/opal-messages.rst
index e4e813aad..25acaec2c 100644
--- a/doc/opal-api/opal-messages.rst
+++ b/doc/opal-api/opal-messages.rst
@@ -11,7 +11,7 @@ An opal_msg is: ::
 
    struct opal_msg {
 	__be32 msg_type;
-	__be32 reserved;
+	__be32 size;
 	__be64 params[8];
    };

diff --git a/include/opal-api.h b/include/opal-api.h
index e461c9d27..e15c5b89e 100644
--- a/include/opal-api.h
+++ b/include/opal-api.h
@@ -555,7 +555,7 @@ enum opal_msg_type {
 struct opal_msg {
 	__be32 msg_type;
-	__be32 reserved;
+	__be32 size;
 	__be64 params[8];
 };