
[v8,20/24] MPIPL: Add OPAL API to query saved tags

Message ID 20190616171024.22799-21-hegdevasant@linux.vnet.ibm.com
State Superseded
Series MPIPL support

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (dbf27b6c4af84addb36bd3be34f96580aba9c873)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot fail Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present

Commit Message

Vasant Hegde June 16, 2019, 5:10 p.m. UTC
The pre-MPIPL kernel saves various information required to create the vmcore in a
metadata area and passes the metadata area pointer to OPAL. OPAL will preserve
this pointer across MPIPL. The post-MPIPL kernel will request the saved tags
via this API. The kernel also needs the tags below:
  - Saved CPU register data, to access CPU registers
  - OPAL metadata area, to create the opalcore

Format:
  opal_mpipl_query_tag(uint32_t idx, uint64_t *tag)

  idx : tag index (0..n)
  tag : OPAL will pass saved tag

  The kernel will make this call with increasing `idx` until OPAL returns
  OPAL_EMPTY.

Return values:
  OPAL_SUCCESS   : Operation success
  OPAL_PARAMETER : Invalid parameter
  OPAL_EMPTY	 : OPAL completed sending all tags to kernel
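
For illustration, a minimal sketch of the expected caller loop on the kernel
side (the process_tag() consumer is hypothetical, not part of this series):

	uint64_t tag;
	uint32_t idx = 0;
	int64_t rc;

	do {
		rc = opal_mpipl_query_tag(idx, &tag);
		if (rc == OPAL_SUCCESS)
			process_tag(idx, tag);	/* hypothetical consumer */
		idx++;
	} while (rc == OPAL_SUCCESS);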

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 core/opal-dump.c   | 32 ++++++++++++++++++++++++++++++++
 include/opal-api.h |  3 ++-
 2 files changed, 34 insertions(+), 1 deletion(-)

Comments

Nicholas Piggin June 28, 2019, 1:46 a.m. UTC | #1
Vasant Hegde's on June 17, 2019 3:10 am:
> Pre-MPIPL kernel saves various information required to create vmcore in
> metadata area and passes metadata area pointer to OPAL. OPAL will preserve
> this pointer across MPIPL. Post MPIPL kernel will request for saved tags
> via this API. Kernel also needs below tags:
>   - Saved CPU registers data to access CPU registers
>   - OPAL metadata area to create opalcore
> 
> Format:
>   opal_mpipl_query_tag(uint32_t idx, uint64_t *tag)
> 
>   idx : tag index (0..n)
>   tag : OPAL will pass saved tag
> 
>   Kernel will make this call with increased `index` until OPAL returns
>   OPAL_EMPTY.
> 
> Return values:
>   OPAL_SUCCESS   : Operation success
>   OPAL_PARAMETER : Invalid parameter
>   OPAL_EMPTY	 : OPAL completed sending all tags to kernel

Thanks for doing this API as well. I like giving the client the
ability to just define their own metadata in preserved memory.

Thanks,
Nick
Vasant Hegde June 28, 2019, 10:07 a.m. UTC | #2
On 06/28/2019 07:16 AM, Nicholas Piggin wrote:
> Vasant Hegde's on June 17, 2019 3:10 am:
>> Pre-MPIPL kernel saves various information required to create vmcore in
>> metadata area and passes metadata area pointer to OPAL. OPAL will preserve
>> this pointer across MPIPL. Post MPIPL kernel will request for saved tags
>> via this API. Kernel also needs below tags:
>>    - Saved CPU registers data to access CPU registers
>>    - OPAL metadata area to create opalcore
>>
>> Format:
>>    opal_mpipl_query_tag(uint32_t idx, uint64_t *tag)
>>
>>    idx : tag index (0..n)
>>    tag : OPAL will pass saved tag
>>
>>    Kernel will make this call with increased `index` until OPAL returns
>>    OPAL_EMPTY.
>>
>> Return values:
>>    OPAL_SUCCESS   : Operation success
>>    OPAL_PARAMETER : Invalid parameter
>>    OPAL_EMPTY	 : OPAL completed sending all tags to kernel
> 
> Thanks for doing this API as well. I like giving the client the
> ability to just define their own metadata in preserved memory.

Thanks!

-Vasant
Oliver O'Halloran July 9, 2019, 10:03 a.m. UTC | #3
On Sun, 2019-06-16 at 22:40 +0530, Vasant Hegde wrote:
> Pre-MPIPL kernel saves various information required to create vmcore in
> metadata area and passes metadata area pointer to OPAL. OPAL will preserve
> this pointer across MPIPL. Post MPIPL kernel will request for saved tags
> via this API. Kernel also needs below tags:
>   - Saved CPU registers data to access CPU registers
>   - OPAL metadata area to create opalcore
> 
> Format:
>   opal_mpipl_query_tag(uint32_t idx, uint64_t *tag)
> 
>   idx : tag index (0..n)
>   tag : OPAL will pass saved tag
> 
>   Kernel will make this call with increased `index` until OPAL returns
>   OPAL_EMPTY.
> 
> Return values:
>   OPAL_SUCCESS   : Operation success
>   OPAL_PARAMETER : Invalid parameter
>   OPAL_EMPTY	 : OPAL completed sending all tags to kernel

After spending a while picking through the linux patches I'll say that the
revised API is mostly fine, BUT:

a) The documentation is *woefully* inadequate, and
b) We need to think a bit more about how tags are used.

My interpretation is that the kernel tag is there to allow the crashing kernel
to define its own dump metadata for the receiving kernel. As a result firmware,
which includes the petitboot kernel, can't make assumptions about the contents
of the tagged data since it could be populated by a version of Linux with a
different metadata ABI or by another OS entirely.

So as far as I can tell the overall process is something like:

1. We boot an OS. This may or may not be based on a Linux kernel.
2. OS configures MPIPL and sets the kernel tag to some structure it defines.
3. OS triggers a MPIPL. We re-run through the SBE and hostboot. Most of
   memory is left as-is. Some memory used by hostboot, the ranges in the MDDT,
   and the skiboot region at 0x3000_0000 are overwritten.
4. Skiboot parses the MDRT and sets a few DT flags to indicate an MPIPL
   happened and populates the tags which are defined by OPAL.
5. Petitkernel boots and detects that an MPIPL happened and tries to preserve
   the crashed kernel's memory.
6. We kexec into a capture kernel that knows how to parse the kernel tag's
   metadata and generates a dump from that.


This raises a few questions:

1) How much memory does firmware have to work with? This doesn't seem to be
actually documented anywhere. The closest thing I can find to an answer is that
the Linux patches will preserve at least 0x0...0x3000_0000, since that's what
the fadump.ops->get_bootmem_min() returns for powernv. However, I think the
minimum memory that firmware needs is something that it should tell the OS via
the DT rather than having the OS guess. We already communicate some of that
information via the fw-load-area DT property, but that only contains the
ranges where skiboot loads the kernel and initrd (0x2000_000...0x2fff_ffff).

Odds are hostboot doesn't use *that* much memory, but the petitboot kernel can
use quite a bit. I checked on a random lab system and it was using ~400MB
after a reboot, just sitting at the petitboot shell prompt.


2) What happens if the tag points into a range that is moved by firmware? If
this is considered a kernel bug then that needs to be documented in the API.


3) How does a kernel know if it can parse the data pointed to by the kernel tag?
Right now we assume that each tag (kernel tag included) points to a structure
with the following header:

struct tag_header {
	u8 type;
	u8 version;
	u8 reserved[6];
	/* stuff */
}

The fact this is a common header is not documented anywhere or described in any
of the relevant structures (mpipl_fadump in OPAL, and opal_fadump_mem_struct
in Linux), but the code in Linux which scans for the kernel tag assumes it's
there:

From arch/powerpc/platforms/powernv/opal-fadump.c, line 597:

	do {
		ret = opal_mpipl_query_tag(idx, &addr);
		if (ret != OPAL_SUCCESS)
			break;

		addr = be64_to_cpu(addr);
		type = *((u8 *)addr);
		switch (type) {
		case MPIPL_FADUMP_TYPE_CPU:
			opal_cpu_metadata = __va(addr);
			break;
		case MPIPL_FADUMP_TYPE_KERNEL:
			opal_fdm_active = __va(addr);
			break;
		}

		pr_debug("idx: %d, addr: %llx, type: %x\n",
			 idx, addr, type);
		idx++;
	} while (ret == OPAL_SUCCESS);

So if the crashing OS doesn't use that header and the first byte happens to be
the same as MPIPL_FADUMP_TYPE_KERNEL, we'll probably blow up horribly later on
while parsing the kernel-defined fadump structure. While we're here, why is
MPIPL_FADUMP_TYPE_KERNEL = 0x80? It seems like a relic of the old API.


That all said, I think most of these problems can be solved by making the
"type" field an output parameter of opal_mpipl_query_tag() so the common
header can be dropped.
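
For illustration, a sketch of what that variant might look like (the signature
and loop below are assumptions, not something defined by this series):

	int64_t opal_mpipl_query_tag(uint32_t idx, uint8_t *type, uint64_t *tag);

	/* The caller no longer peeks at the first byte of the tagged data */
	uint8_t type;
	uint64_t addr;
	uint32_t idx = 0;

	while (opal_mpipl_query_tag(idx, &type, &addr) == OPAL_SUCCESS) {
		addr = be64_to_cpu(addr);
		if (type == MPIPL_FADUMP_TYPE_KERNEL)
			opal_fdm_active = __va(addr);
		idx++;
	}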

Without the common header you could then add an address-only tag that's defined
by the crashing kernel that indicates the upper limit of preserved memory. That way
the crashing kernel can configure the dump table to make a few GB available for
petitboot and all petitboot needs to do is query that tag to determine how much
memory is available to it.

Adding a uint64_t structure magic field to the structures defined by OPAL and
Linux is also a good idea since it gives you a reliable way to determine if
it's a structure you know how to deal with or not.
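
For illustration, one possible shape (the magic value and its placement below
are assumptions, purely to show the idea):

	#define MPIPL_FADUMP_MAGIC	0x4f50414c46444d50ULL	/* "OPALFDMP" - hypothetical value */

	struct mpipl_fadump {
		u64	magic;		/* reliably identifies a structure we know how to parse */
		u8	type;		/* MPIPL_FADUMP_TYPE_* */
		u8	version;
		u8	reserved[6];
		u32	crashing_pir;
		u32	cpu_data_version;
		u32	cpu_data_size;
		u32	fadump_region_cnt;
		struct	fadump_region region[];
	} __packed;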

Thoughts?



p.s.

Can we please settle on some kind of sane naming convention, like:

	opal_mpipl_  - only used for stuff defined by OPAL and
	fadump_      - generic fadump in linux
	pnv_fadump_  - powernv specific fadump stuff.

One of the things that has made reviewing the mpipl patches painful is that
there's no clear separation between what's defined in OPAL vs the kernel. It's
not a showstopper, but it makes an already hard job even harder. A particularly
egregious example is this bastard structure:

From skiboot's opal-api.h:

struct mpipl_fadump {
        u8      type;           /* MPIPL_FADUMP_TYPE_* */
        u8      version;
        u8      reserved[6];
        u32     crashing_pir;   /* OPAL crashing CPU PIR */
        u32     cpu_data_version;
        u32     cpu_data_size;
        u32     fadump_region_cnt;
        struct  fadump_region region[];
} __packed;

From the kernel's opal-api.h:

struct opal_mpipl_fadump {
        u8      type;
        u8      version;
        u8      reserved[6];
        __be32  crashing_pir;
        __be32  cpu_data_version;
        __be32  cpu_data_size;
        __be32  region_cnt;
        struct opal_fadump_region       region[OPAL_FADUMP_NR_SECTIONS];
} __attribute__((packed));

NOT EVEN THE SAME NAME! OR DEFINITION!
Vasant Hegde July 9, 2019, 3:23 p.m. UTC | #4
On 07/09/2019 03:33 PM, Oliver O'Halloran wrote:
> On Sun, 2019-06-16 at 22:40 +0530, Vasant Hegde wrote:
>> Pre-MPIPL kernel saves various information required to create vmcore in
>> metadata area and passes metadata area pointer to OPAL. OPAL will preserve
>> this pointer across MPIPL. Post MPIPL kernel will request for saved tags
>> via this API. Kernel also needs below tags:
>>    - Saved CPU registers data to access CPU registers
>>    - OPAL metadata area to create opalcore
>>
>> Format:
>>    opal_mpipl_query_tag(uint32_t idx, uint64_t *tag)
>>
>>    idx : tag index (0..n)
>>    tag : OPAL will pass saved tag
>>
>>    Kernel will make this call with increased `index` until OPAL returns
>>    OPAL_EMPTY.
>>
>> Return values:
>>    OPAL_SUCCESS   : Operation success
>>    OPAL_PARAMETER : Invalid parameter
>>    OPAL_EMPTY	 : OPAL completed sending all tags to kernel
> 
> After spending a while picking through the linux patches I'll say that the
> revised API is mostly fine, BUT:
> 
> a) The documentation is *woefully* inadequate, and

I have documented the API under `doc/opal-api/opal-mpipl-173-174.rst` with the
expected behavior under various scenarios (pretty much similar to how we
documented other APIs).


> b) We need to think a bit more about how tags are used.

We (myself, Ananth, Nick, Hari, Mahesh) had a detailed *offline* discussion on
the tag. We did think about passing the `tag` as part of the
opal_mpipl_query_tag() API, but Nick didn't want that and wanted to keep the
API simple. So now our API doesn't care about the `tag` type; it's just an
interface to retrieve various tags. So we had to push the tag field into the
structure.


> 
> My interpretation is that the kernel tag is there to allow the crashing kernel
> to define it's own dump metadata for the recieving kernel. As a result firmware,

Yes. The kernel will have its metadata area and the tag points to the metadata
area (well, actually the tag can be anything using which the post-MPIPL kernel
can retrieve its metadata area).

> which includes the petitboot kernel, can't make assumptions about the contents
> of the tagged data since it could be populated by a version of Linux with a
> different metadata ABI or by another OS entirely.

OPAL will not care about tag content. Its job is to preserve tag across MPIPL and
pass it back to kernel.


> 
> So as far as I can tell the overall process is something like:
> 
> 1. We boot an OS. This may or may not be based on a Linux kernel.
> 2. OS configures MPIPL and sets the kernel tag to some structure it defines.
> 3. OS triggers a MPIPL. We re-run through the SBE and hostboot. Most of
>     memory is left as-is. Some memory used by hostboot, the ranges in the MDDT,
>     and the skiboot region at 0x3000_0000 are overwritten.
> 4. Skiboot parses the MDRT and sets a few DT flags to indicate an MPIPL
>     happened and populates the tags which are defined by OPAL.
> 5. Petitkernel boots and detects that an MPIPL happened and tries to preserve
>     the crashed kernel's memory.
> 6. We kexec into a capture kernel that knows how to parse the kernel tag's
>     metadata and generates a dump from that.

Correct.

> 
> 
> This raises a few questions:
> 
> 1) How much memory does firmware have to work with? This doesn't seem to be

So post MPIPL:
  - Hostboot needs to be loaded and executed. It's done at a predefined address
    and that memory is part of the reserved-memory range. So we don't need to
    worry about this memory.
  - OPAL memory:
    By the time OPAL loads, hostboot has already preserved memory based on the
    MDST/MDDT tables. These tables contain OPAL runtime memory as well. So by
    the time we load OPAL it's already preserved. We are good to reload OPAL on
    the same memory.
  - Memory used by OPAL to load the petitboot kernel:
    These ranges are advertised to the first kernel via `fw-load-area` so that
    the kernel takes care of preserving this memory.



> actually documented anywhere. The closest thing I can find to an answer is that
> in the Linux patches will preserve at least 0x0...0x3000_0000 since that's what
> the fadump.ops->get_bootmem_min() returns for powernv. 

Kernel will preserve kernel memory ranges. It won't care about firmware memory area.

> However, I think the
> minimum memory that firmware needs is something that it should tell the OS via
> the DT rather than having the OS guess.

OS won't guess anything about firmware memory usage. In fact it won't care about
firmware memory except `fw-load-area`.

> We already communicate some of that
> information via the fw-load-area DT property, but that only contains the
> ranges where skiboot loads the kernel and initrd (0x2000_000...0x2fff_ffff).

The OS needs to know about these memory ranges *only*, as the first kernel may
use this memory. If it does, it should take care of reserving destination
memory to preserve these ranges and pass them during MPIPL registration.

> 
> Odds are hostboot doesn't use *that* much memory, but the petitboot kernel can

As mentioned above, the kernel need not worry about hostboot memory usage, as
it's part of the reserved-memory ranges.

> use quite a bit. I checked on a random lab system and it was using ~400MB
> after a reboot, just sitting at the petitboot shell prompt.

We are introducing a new kernel config (FADUMP_PRESERVE) that will take care of
the petitboot kernel part. Hari can elaborate on this.


> 
> 
> 2) What happens if the tag points into a range that is moved by firmware? If
> this is considered a kernel bug then that needs to be documented in the API.

Kernel tag : During registration the kernel passes a tag (which points to its
metadata area) and post MPIPL OPAL has to send back this tag. It won't care
whether it's part of moved memory or not. From the kernel's point of view it
should get back what it passed.

OPAL tag : Points to the OPAL metadata area. Using this pointer the kernel can
retrieve the OPAL dump area and generate the opalcore. The kernel won't care
whether it's part of the original memory or moved memory.


> 
> 
> 3) How does a kernel know if it can parse the data pointed to be the kernel tag?

OPAL guarantees to return the tag sent by the kernel during registration
without modification. Additionally it can validate the header.

> Right now we assume that each tag (kernel tag included) points to a structure
> with the following header:
> 
> struct tag_header {
> 	u8 type;
> 	u8 version;
> 	u8 reserved[6]
> 	/* stuff */
> }
> 
> The fact this is a common header is not documented anywhere or described in any
> of the relevant structures (mpipl_fadump in OPAL, and opal_fadump_mem_struct
> in Linux), but the code in Linux which scans for the kernel tag assumes it's
> there:

I think we missed syncing opal-api.h before sending the patches. It will be
fixed in the next version.

> 
>  From arch/powerpc/platforms/powernv/opal-fadump.c, line 597:
> 
> 	do {
> 		ret = opal_mpipl_query_tag(idx, &addr);
> 		if (ret != OPAL_SUCCESS)
> 			break;
> 
> 		addr = be64_to_cpu(addr);
> 		type = *((u8 *)addr);
> 		switch (type) {
> 		case MPIPL_FADUMP_TYPE_CPU:
> 			opal_cpu_metadata = __va(addr);
> 			break;
> 		case MPIPL_FADUMP_TYPE_KERNEL:
> 			opal_fdm_active = __va(addr);
> 			break;
> 		}
> 
> 		pr_debug("idx: %d, addr: %llx, type: %x\n",
> 			 idx, addr, type);
> 		idx++;
> 	} while (ret == OPAL_SUCCESS);
> 
> So if the crashing OS doesn't use that header and the first byte happens to be
> the same as MPIPL_FADUMP_TYPE_KERNEL, we'll probably blow up horrible later on

First, the OS registers for MPIPL only after setting up the metadata area. If
it crashes before that, it is not registered for MPIPL. And OPAL promises to
return the same tag after MPIPL. So I don't see why things would go wrong here.

> while parsing the kernel-defined fadump structure. While we're here, why is
> MPIPL_FADUMP_TYPE_KERNEL = 0x80? It seems like a relic of the old API.

Well, we need a way to differentiate tag types: kernel, OPAL, etc.
(in future we may have new tag types as well). And we have a byte, so I just
used 0x01 - 0x7F for the OPAL range and 0x80 - 0xFF for the kernel.
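
Roughly like this (only the 0x80 kernel value appears in the thread; the
OPAL-side values below are assumptions for illustration):

	/* 0x01 - 0x7f : OPAL-defined tag types */
	#define MPIPL_FADUMP_TYPE_CPU		0x01	/* value assumed for illustration */
	#define MPIPL_FADUMP_TYPE_OPAL		0x02	/* value assumed for illustration */
	/* 0x80 - 0xff : kernel-defined tag types */
	#define MPIPL_FADUMP_TYPE_KERNEL	0x80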


> 
> 
> That all said, I think most of these problems can be solved by making the
> "type" field an output parameter of opal_mpipl_query_tag() so the common
> header can be dropped.

We don't really gain much by moving the "type" field from the structure to the
API. Whether it's in the structure -OR- the API, OPAL will just return whatever
tag the kernel passed (and of course the OPAL tag for parsing the OPAL core).

We cannot drop the common structure anyway. The kernel needs to know how to
parse the OPAL metadata. So we need that.



> 
> Without the common header you could then add an address-only tag that's defined
> by the crashing kernel that indicates the upper limit of preserved memory. That way
> the crashing kernel can configure the dump table to make a few GB available for
> petitboot and all petitboot needs to do is query that tag to determine how much
> memory is availble to it.
> 
> Adding a uint64_t structure magic field to the structures defined by OPAL and
> Linux is also a good idea since it gives you a reliable way to determine if
> it's a structure you know how deal with or not.
> 
> Thoughts?

I don't think we gain anything with that.

> 
> 
> 
> p.s.
> 
> Can we please settle on some kind of sane naming convention, like:
> 
> 	opal_mpipl_  - only used for stuff defined by OPAL and
> 	fadump_      - generic fadump in linux
> 	pnv_fadump_  - powernv specific fadump stuff.

I think I had mixed up fadump and mpipl in an earlier version. Now I have
settled on opal_mpipl for the OPAL side. On the kernel side I think we are
using the above convention. If we missed it we will fix it in the next version.


> 
> One of the things that has made reviewing the mpipl patches painful is that
> there's no clear seperation between what's defined in OPAL vs the kernel. It's
> not a showstopper, but it makes an already hard job even harder. A particularly
> eregious example is this bastard structure:

As mentioned above, we missed syncing the structures before sending out the
patches. Sorry for that. It will be fixed in the next version.

-Vasant

 >  From skiboot's opal-api.h:
 >
 > struct mpipl_fadump {
 >          u8      type;           /* MPIPL_FADUMP_TYPE_* */
 >          u8      version;
 >          u8      reserved[6];
 >          u32     crashing_pir;   /* OPAL crashing CPU PIR */
 >          u32     cpu_data_version;
 >          u32     cpu_data_size;
 >          u32     fadump_region_cnt;
 >          struct  fadump_region region[];
 > } __packed;
 >
 >  From the kernel's opal-api.h:
 >
 > struct opal_mpipl_fadump {
 >          u8      type;
 >          u8      version;
 >          u8      reserved[6];
 >          __be32  crashing_pir;
 >          __be32  cpu_data_version;
 >          __be32  cpu_data_size;
 >          __be32  region_cnt;
 >          struct opal_fadump_region       region[OPAL_FADUMP_NR_SECTIONS];
 > } __attribute__((packed));
 >
 > NOT EVEN THE SAME NAME! OR DEFINITION!
 >
Hari Bathini July 9, 2019, 5:15 p.m. UTC | #5
On 09/07/19 3:33 PM, Oliver O'Halloran wrote:

[...]
> Can we please settle on some kind of sane naming convention, like:
>
> 	opal_mpipl_  - only used for stuff defined by OPAL and

My bad! Will make sure the definitions in arch/powerpc/include/asm/opal-api.h
are consistent with the definitions in `include/opal-api.h`.

> 	fadump_      - generic fadump in linux

Yes. This is how I have it currently. Will fix aberrations, if any, in the next revision.

> 	pnv_fadump_  - powernv specific fadump stuff.

I am using opal_fadump_ for powernv-specific stuff, and I should probably use
rtas_fadump_ instead of pseries_fadump_ for pseries stuff.

Thanks
Hari
Michael Neuling July 10, 2019, 3:52 a.m. UTC | #6
On Tue, 2019-07-09 at 20:53 +0530, Vasant Hegde wrote:
> On 07/09/2019 03:33 PM, Oliver O'Halloran wrote:
> > On Sun, 2019-06-16 at 22:40 +0530, Vasant Hegde wrote:
> > > Pre-MPIPL kernel saves various information required to create vmcore in
> > > metadata area and passes metadata area pointer to OPAL. OPAL will preserve
> > > this pointer across MPIPL. Post MPIPL kernel will request for saved tags
> > > via this API. Kernel also needs below tags:
> > >    - Saved CPU registers data to access CPU registers
> > >    - OPAL metadata area to create opalcore
> > > 
> > > Format:
> > >    opal_mpipl_query_tag(uint32_t idx, uint64_t *tag)
> > > 
> > >    idx : tag index (0..n)
> > >    tag : OPAL will pass saved tag
> > > 
> > >    Kernel will make this call with increased `index` until OPAL returns
> > >    OPAL_EMPTY.
> > > 
> > > Return values:
> > >    OPAL_SUCCESS   : Operation success
> > >    OPAL_PARAMETER : Invalid parameter
> > >    OPAL_EMPTY	 : OPAL completed sending all tags to kernel
> > 
> > After spending a while picking through the linux patches I'll say that the
> > revised API is mostly fine, BUT:
> > 
> > a) The documentation is *woefully* inadequate, and
> 
> I have documented API under `doc/opal-api/opal-mpipl-173-174.rst` with
> expected 
> behavior
> under various scenarios (pretty much similar to how we documented other APIs).
> 
> 
> > b) We need to think a bit more about how tags are used.
> 
> We (Myself, Ananth, Nick, Hari, Mahesh) had a detailed *offline* discussion
> on 
> tag. We did
> thought about passing `tag` as part of opal_mpipl_query_tag() API. But Nick 
> don't wanted
> that and wanted to keep API simple. So now our API doesn't care about `tag` 
> type. Its just
> a interface to retrieve various tags. So we had to push the tag field to
> structure.
> 
> 
> > My interpretation is that the kernel tag is there to allow the crashing
> > kernel
> > to define it's own dump metadata for the recieving kernel. As a result
> > firmware,
> 
> Yes. Kernel will have its metadata area and tag points to the metadata area. 
> (well, actually tag
> can be anything using which post kernel can retrieve its metadata area).
> 
> > which includes the petitboot kernel, can't make assumptions about the
> > contents
> > of the tagged data since it could be populated by a version of Linux with a
> > different metadata ABI or by another OS entirely.
> 
> OPAL will not care about tag content. Its job is to preserve tag across MPIPL
> and
> pass it back to kernel.
> 
> 
> > So as far as I can tell the overall process is something like:
> > 
> > 1. We boot an OS. This may or may not be based on a Linux kernel.
> > 2. OS configures MPIPL and sets the kernel tag to some structure it defines.
> > 3. OS triggers a MPIPL. We re-run through the SBE and hostboot. Most of
> >     memory is left as-is. Some memory used by hostboot, the ranges in the
> > MDDT,
> >     and the skiboot region at 0x3000_0000 are overwritten.
> > 4. Skiboot parses the MDRT and sets a few DT flags to indicate an MPIPL
> >     happened and populates the tags which are defined by OPAL.
> > 5. Petitkernel boots and detects that an MPIPL happened and tries to
> > preserve
> >     the crashed kernel's memory.
> > 6. We kexec into a capture kernel that knows how to parse the kernel tag's
> >     metadata and generates a dump from that.
> 
> Correct.
> 
> > 
> > This raises a few questions:
> > 
> > 1) How much memory does firmware have to work with? This doesn't seem to be
> 
> So Post MPIPL,
>   -  hostboot needs to be loaded and executed. Its done at predefined address
> and
>      that memory is part of reserved-memory range. So we don't need to worry
>      about this memory.
>   - OPAL memory
>     By the time OPAL loads, hostboot already preserved memory based on MDST, 
> MDDT table.
>     These table contains OPAL runtime memory as well. So by the time we load 
> OPAL its already
>      preserved. We are good to reload OPAL on same memory.
> - Memory used by OPAL to load petitboot kernel
>    First kernel advertises these range to kernel via `fw-load-area` so that 
> kernel care of
>    preserving these memory.
> 
> 
> 
> > actually documented anywhere. The closest thing I can find to an answer is
> > that
> > in the Linux patches will preserve at least 0x0...0x3000_0000 since that's
> > what
> > the fadump.ops->get_bootmem_min() returns for powernv. 
> 
> Kernel will preserve kernel memory ranges. It won't care about firmware memory
> area.
> 
> > However, I think the
> > minimum memory that firmware needs is something that it should tell the OS
> > via
> > the DT rather than having the OS guess.
> 
> OS won't guess anything about firmware memory usage. In fact it won't care
> about
> firmware memory except `fw-load-area`.
> 
> > We already communicate some of that
> > information via the fw-load-area DT property, but that only contains the
> > ranges where skiboot loads the kernel and initrd (0x2000_000...0x2fff_ffff).
> 
> OS needs to know about these memory ranges *only*...as first kernel may use
> these
> memory. If it uses it should take care of reserving destination memory to 
> preserve these
> memory and pass it during MPIPL registration.
> 
> > Odds are hostboot doesn't use *that* much memory, but the petitboot kernel
> > can
> 
> As mentioned above kernel need not worry about hostboot memory usage...as its
> part of reserved-memory ranges.

Vasant,

I think you may have missed Oliver's point in the above.

The Petitboot kernel needs some amount of space to operate. In your current patch
series, the amount of space "firmware" is allowed to use is hard-wired by the
crashing kernel. Oliver's point is that this size should be defined dynamically
by firmware, not hard-wired into the crashing kernel.

If we ever change petitboot or some other component of the firmware so that it
needs to use more memory, then the crashing kernel won't have any way of
discovering that and it won't allocate enough space for this process to start.

Mikey
Hari Bathini July 10, 2019, 4:50 a.m. UTC | #7
On 10/07/19 9:22 AM, Michael Neuling wrote:
> On Tue, 2019-07-09 at 20:53 +0530, Vasant Hegde wrote:
>> On 07/09/2019 03:33 PM, Oliver O'Halloran wrote:
>>> On Sun, 2019-06-16 at 22:40 +0530, Vasant Hegde wrote:

[...]

>> OS won't guess anything about firmware memory usage. In fact it won't care
>> about
>> firmware memory except `fw-load-area`.
>>
>>> We already communicate some of that
>>> information via the fw-load-area DT property, but that only contains the
>>> ranges where skiboot loads the kernel and initrd (0x2000_000...0x2fff_ffff).
>> OS needs to know about these memory ranges *only*...as first kernel may use
>> these
>> memory. If it uses it should take care of reserving destination memory to 
>> preserve these
>> memory and pass it during MPIPL registration.
>>
>>> Odds are hostboot doesn't use *that* much memory, but the petitboot kernel
>>> can
>> As mentioned above kernel need not worry about hostboot memory usage...as its
>> part of reserved-memory ranges.
> Vasant,
>
> I think you may have missed Oliver's point in the above.
>
> The Petitboot kernel needs some amount space to operate. In your current patch
> series, the amount of space "firmware" is allowed to use is hard-wired by the
> crashing kernel. Olivers point is that this size should be defined dynamically
> by firmware, not hard-wired into the crashing kernel.
>
> If we every change petitboot or some other component of the firmware so that it
> needs to use more memory, then the crashing kernel won't have any way of
> discovering that and it won't allocate enough space for this process to start.

Mikey, the memory to be used by the petitboot kernel is restricted by the crashkernel parameter.
This is not hard-wired. It has to be configured by the user (a system admin, generally).
The value is typically something that has to be sufficient to boot a petitboot kernel and
also the production kernel to capture the dump...
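
For example, something like this on the production kernel's command line (the
size here is purely illustrative):

	crashkernel=4G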

Thanks
Hari
Oliver O'Halloran July 10, 2019, 5:52 a.m. UTC | #8
On Wed, Jul 10, 2019 at 2:50 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
>
> On 10/07/19 9:22 AM, Michael Neuling wrote:
>
> On Tue, 2019-07-09 at 20:53 +0530, Vasant Hegde wrote:
>
> On 07/09/2019 03:33 PM, Oliver O'Halloran wrote:
>
> On Sun, 2019-06-16 at 22:40 +0530, Vasant Hegde wrote:
>
> [...]
>
> OS won't guess anything about firmware memory usage. In fact it won't care
> about
> firmware memory except `fw-load-area`.
>
> We already communicate some of that
> information via the fw-load-area DT property, but that only contains the
> ranges where skiboot loads the kernel and initrd (0x2000_000...0x2fff_ffff).
>
> OS needs to know about these memory ranges *only*...as first kernel may use
> these
> memory. If it uses it should take care of reserving destination memory to
> preserve these
> memory and pass it during MPIPL registration.
>
> Odds are hostboot doesn't use *that* much memory, but the petitboot kernel
> can
>
> As mentioned above kernel need not worry about hostboot memory usage...as its
> part of reserved-memory ranges.
>
> Vasant,
>
> I think you may have missed Oliver's point in the above.
>
> The Petitboot kernel needs some amount space to operate. In your current patch
> series, the amount of space "firmware" is allowed to use is hard-wired by the
> crashing kernel. Olivers point is that this size should be defined dynamically
> by firmware, not hard-wired into the crashing kernel.
>
> If we every change petitboot or some other component of the firmware so that it
> needs to use more memory, then the crashing kernel won't have any way of
> discovering that and it won't allocate enough space for this process to start.
>
> Mikey, the memory to be used by petitboot kernel is restricted by crashkernel parameter.
> This is not hard-wired. This has to be configured by the user (a systemadmin, generally).
> The value is typically something that has to be sufficient to boot a petitboot kernel and
> also the production kernel to capture the dump...

Right, the problem is that the metadata set by the production kernel
*must* be in the format that the petitboot kernel expects.
The point of making OPAL oblivious to the contents of the kernel tag
was to allow the production kernel to define its own metadata format
rather than having one imposed on it by the platform.

Previously the platform imposed structure was defined by the OPAL API
and we "fixed" that by moving to the tags API where OPAL is oblivious
of the kernel metadata. This is an improvement, but the way the tag
API is structured means that we now have a platform imposed structure
coming from the petitboot kernel which defeats the point of changing
the API in the first place.

As far as I can tell the only reason we've done this is to allow the
petitboot kernel to parse the metadata to work out how much memory
it's allowed to trash. Why not just solve that very specific problem
and let the production kernel do whatever it likes?

Oliver
Hari Bathini July 10, 2019, 7:11 a.m. UTC | #9
On 10/07/19 11:22 AM, Oliver O'Halloran wrote:
> On Wed, Jul 10, 2019 at 2:50 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>
>> On 10/07/19 9:22 AM, Michael Neuling wrote:
>>
>> On Tue, 2019-07-09 at 20:53 +0530, Vasant Hegde wrote:
>>
>> On 07/09/2019 03:33 PM, Oliver O'Halloran wrote:
>>
>> On Sun, 2019-06-16 at 22:40 +0530, Vasant Hegde wrote:
>>
>> [...]
>>
>> OS won't guess anything about firmware memory usage. In fact it won't care
>> about
>> firmware memory except `fw-load-area`.
>>
>> We already communicate some of that
>> information via the fw-load-area DT property, but that only contains the
>> ranges where skiboot loads the kernel and initrd (0x2000_000...0x2fff_ffff).
>>
>> OS needs to know about these memory ranges *only*...as first kernel may use
>> these
>> memory. If it uses it should take care of reserving destination memory to
>> preserve these
>> memory and pass it during MPIPL registration.
>>
>> Odds are hostboot doesn't use *that* much memory, but the petitboot kernel
>> can
>>
>> As mentioned above kernel need not worry about hostboot memory usage...as its
>> part of reserved-memory ranges.
>>
>> Vasant,
>>
>> I think you may have missed Oliver's point in the above.
>>
>> The Petitboot kernel needs some amount space to operate. In your current patch
>> series, the amount of space "firmware" is allowed to use is hard-wired by the
>> crashing kernel. Olivers point is that this size should be defined dynamically
>> by firmware, not hard-wired into the crashing kernel.
>>
>> If we every change petitboot or some other component of the firmware so that it
>> needs to use more memory, then the crashing kernel won't have any way of
>> discovering that and it won't allocate enough space for this process to start.
>>
>> Mikey, the memory to be used by petitboot kernel is restricted by crashkernel parameter.
>> This is not hard-wired. This has to be configured by the user (a systemadmin, generally).
>> The value is typically something that has to be sufficient to boot a petitboot kernel and
>> also the production kernel to capture the dump...
> Right, the problem is that the metadata set by the production kernel
> metadata *must* be in the format that the petitboot kernel expects.
> The point of making OPAL oblivious to the contents of the kernel tag
> was to allow the production kernel to define it's own metadata format
> rather than having one imposed on it by the platform.
>
> Previously the platform imposed structure was defined by the OPAL API
> and we "fixed" that by moving to the tags API where OPAL is oblivious
> of the kernel metadata. This is an improvement, but the way the tag
> API is structured means that we now have a platform imposed structure
> coming from the petitboot kernel which defeats the point of changing
> the API in the first place.
>
> As far as I can tell the only reason we've done this is to allow the
> petitboot kernel to parse the metadata to work out how much memory
> it's allowed to trash. Why not just solve that very specific problem
> and let the production kernel do whatever it likes?
>
Sorry if I was not clear, but that is how it is. Metadata is set up by the
production kernel based on user configuration, and after a crash the petitboot
kernel looks for this metadata to work out how much memory it is allowed to trash
(https://patchwork.ozlabs.org/patch/1122300/).

As for the type field in the metadata, that requirement comes from the need to
keep metadata retrieval simple. Without a type field, we need to have a predefined
ordering or the kernel has to remember which tag number corresponds to which data
type. While the former seems naive, the latter brings with it a need to handle
kernel metadata separately, as we need to get that data before processing tags.

I would prefer the current way as it is simple and we don't need any special
handling for kernel metadata.

We should get that requirement documented though.

Thanks
Hari
Oliver O'Halloran July 10, 2019, 9:35 a.m. UTC | #10
On Wed, Jul 10, 2019 at 5:11 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> Sorry, if I was not clear but that is how it is. Metadata is setup by
> production kernel based on user configuration and after crash, petitboot kernel
> looks for this metadata to work out how much memory it is allowed to trash
> (https://patchwork.ozlabs.org/patch/1122300/).
>
> As for the type field in the metadata, that requirement comes from the need to
> keep metadata retrieval simple. Without a type field, we need to have a predefined
> ordering or kernel has to remember which tag no. corresponds to which data type.
> While the former seems naive, the later, brings with it a need to handle kernel
> metadata separately as we need to get that data before processing tags.

I don't see a problem with either approach. Say we do the "naive"
thing and change the API to this:

enum opal_mpipl_tag {
	OPAL_MPIPL_TAG_PROC_DATA,
	OPAL_MPIPL_TAG_CORE,
	OPAL_MPIPL_TAG_KERNEL,
};

And fetch the tags with:

int64_t opal_mpipl_query_tag(enum opal_mpipl_tag tag, uint64_t *tag_value);

Is this any more complex, or less flexible, than the current
query-and-iterate approach? The kernel is going to ignore  tags that
it doesn't recognise in either case.

Anyway, take that, add an extra tag that's set by the production
kernel to indicate where preserved memory starts and
CONFIG_FADUMP_PRESERVE boils down to:

rc = opal_mpipl_query_tag(OPAL_MPIPL_TAG_PRESERVE_LIMIT, &limit);
if (!rc)
    memblock_reserve(limit, memblock_end_of_DRAM() - limit);

> I would prefer it the current way as it is simple and we do any special
> handling for kernel metadata..

No one is suggesting adding any metadata handling to OPAL. So I don't
understand what you're complaining about here.

> We should get that requirement documented though..
>
> Thanks
> Hari
>
Vasant Hegde July 10, 2019, 9:50 a.m. UTC | #11
On 07/10/2019 09:22 AM, Michael Neuling wrote:
> On Tue, 2019-07-09 at 20:53 +0530, Vasant Hegde wrote:
>> On 07/09/2019 03:33 PM, Oliver O'Halloran wrote:
>>> On Sun, 2019-06-16 at 22:40 +0530, Vasant Hegde wrote:
>>>> Pre-MPIPL kernel saves various information required to create vmcore in
>>>> metadata area and passes metadata area pointer to OPAL. OPAL will preserve
>>>> this pointer across MPIPL. Post MPIPL kernel will request for saved tags
>>>> via this API. Kernel also needs below tags:
>>>>     - Saved CPU registers data to access CPU registers
>>>>     - OPAL metadata area to create opalcore
>>>>

.../...

>>> We already communicate some of that
>>> information via the fw-load-area DT property, but that only contains the
>>> ranges where skiboot loads the kernel and initrd (0x2000_000...0x2fff_ffff).
>>
>> OS needs to know about these memory ranges *only*...as first kernel may use
>> these
>> memory. If it uses it should take care of reserving destination memory to
>> preserve these
>> memory and pass it during MPIPL registration.
>>
>>> Odds are hostboot doesn't use *that* much memory, but the petitboot kernel
>>> can
>>
>> As mentioned above kernel need not worry about hostboot memory usage...as its
>> part of reserved-memory ranges.

Mikey,

> 
> Vasant,
> 
> I think you may have missed Oliver's point in the above.

Yes. I did miss the point about the ABI -OR- endianness difference between the
petitboot and host kernels.

> 
> The Petitboot kernel needs some amount space to operate. In your current patch
> series, the amount of space "firmware" is allowed to use is hard-wired by the
> crashing kernel. Olivers point is that this size should be defined dynamically
> by firmware, not hard-wired into the crashing kernel.

Nope. It's not hard-wired. But the problem with the current patch series is that
the petitboot kernel goes and reads the host Linux metadata area to find the
memory it can use to boot... which is not correct.

> 
> If we every change petitboot or some other component of the firmware so that it
> needs to use more memory, then the crashing kernel won't have any way of
> discovering that and it won't allocate enough space for this process to start.

We do not have a problem with the rest of the firmware stack, as the other
components have their own load areas and those are reserved. The only issue is
with the petitboot kernel.

How about something like below:
  - First boot : The host kernel will tell OPAL the amount of memory firmware
    (including petitboot) can use during the MPIPL boot:
      opal_mpipl_update(OPAL_MPIPL_BOOT_MEM_SIZE, <size of memory petitboot can use>)
  - OPAL will preserve this across MPIPL.
  - Post MPIPL : The petitboot kernel will make a query to get the usable size:
      opal_mpipl_query_tag(type, *tag)
      type = GET_BOOT_SIZE, *tag = size (the assumption is that the kernel will
      load at address 0)

This means we have to again change the opal_mpipl_query_tag() API. We will
replace the `index` field with a `type` field:
  opal_mpipl_query_tag(type, *tag)
  type = OPAL tag, kernel tag, CPU tag, boot mem size tag, etc.

Post-MPIPL boot, the kernel will make opal_mpipl_query_tag() calls with a
specific type field to retrieve the tag data.

I'm fine with making the above changes.

@Nick, any thoughts on above changes?


-Vasant
Hari Bathini July 10, 2019, 10:50 a.m. UTC | #12
On 10/07/19 3:05 PM, Oliver O'Halloran wrote:
> On Wed, Jul 10, 2019 at 5:11 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>> Sorry, if I was not clear but that is how it is. Metadata is setup by
>> production kernel based on user configuration and after crash, petitboot kernel
>> looks for this metadata to work out how much memory it is allowed to trash
>> (https://patchwork.ozlabs.org/patch/1122300/).
>>
>> As for the type field in the metadata, that requirement comes from the need to
>> keep metadata retrieval simple. Without a type field, we need to have a predefined
>> ordering or kernel has to remember which tag no. corresponds to which data type.
>> While the former seems naive, the later, brings with it a need to handle kernel
>> metadata separately as we need to get that data before processing tags.
> I don't see a problem with either approach. Say we do the "naive"
> thing and change the API to this:
>
> enum opal_mpipl_tag {
> OPAL_MPIPL_TAG_PROC_DATA,
> OPAL_MPIPL_TAG_CORE,
> OPAL_MPIPL_TAG_KERNEL,
> }
>
> And fetch the tags with:
>
> int64_t opal_mpipl_query_tag(enum opal_mpipl_tag tag, uint64_t *tag_value);

Actually, I would love to have it this way, but I remember Nick was not for
having a type field in the OPAL call.

Thanks
Hari
Vasant Hegde July 10, 2019, 2:43 p.m. UTC | #13
On 07/10/2019 03:20 PM, Vasant Hegde wrote:
> On 07/10/2019 09:22 AM, Michael Neuling wrote:
>> On Tue, 2019-07-09 at 20:53 +0530, Vasant Hegde wrote:
>>> On 07/09/2019 03:33 PM, Oliver O'Halloran wrote:
>>>> On Sun, 2019-06-16 at 22:40 +0530, Vasant Hegde wrote:
>>>>> Pre-MPIPL kernel saves various information required to create vmcore in
>>>>> metadata area and passes metadata area pointer to OPAL. OPAL will preserve
>>>>> this pointer across MPIPL. Post MPIPL kernel will request for saved tags
>>>>> via this API. Kernel also needs below tags:
>>>>>     - Saved CPU registers data to access CPU registers
>>>>>     - OPAL metadata area to create opalcore
>>>>>
> 
> .../...
> 
>>>> We already communicate some of that
>>>> information via the fw-load-area DT property, but that only contains the
>>>> ranges where skiboot loads the kernel and initrd (0x2000_000...0x2fff_ffff).
>>>
>>> OS needs to know about these memory ranges *only*...as first kernel may use
>>> these
>>> memory. If it uses it should take care of reserving destination memory to
>>> preserve these
>>> memory and pass it during MPIPL registration.
>>>
>>>> Odds are hostboot doesn't use *that* much memory, but the petitboot kernel
>>>> can
>>>
>>> As mentioned above kernel need not worry about hostboot memory usage...as its
>>> part of reserved-memory ranges.
> 
> Mikey,
> 
>>
>> Vasant,
>>
>> I think you may have missed Oliver's point in the above.
> 
> Yes. I did miss the point of ABI -OR- endianess difference in petitboot and host 
> kernel.
> 
>>
>> The Petitboot kernel needs some amount space to operate. In your current patch
>> series, the amount of space "firmware" is allowed to use is hard-wired by the
>> crashing kernel. Olivers point is that this size should be defined dynamically
>> by firmware, not hard-wired into the crashing kernel.
> 
> Nope. Its not hard wired. But the problem with current patch series is petitboot 
> kernel
> goes and reads host Linux metadata area to find the memory it can use to boot... 
> which
> is not correct.
> 
>>
>> If we every change petitboot or some other component of the firmware so that it
>> needs to use more memory, then the crashing kernel won't have any way of
>> discovering that and it won't allocate enough space for this process to start.
> 
> We do not have problem with rest of the firmware stack.. as they have their own
> load area and its reserved. Only issue is with petitboot kernel.
> 
> How about something like below :
>    - First boot : Host kernel will tell OPAL about amount of memory firmware 
> (including
>                         petitboot) can use during MPIPL boot
>        opal_mpipl_update(OPAL_MPIPL_BOOT_MEM_SIZE,  <size of memory petitboot 
> can use>)
>     - OPAL will preserve this across MPIPL
>    - Post MPIPL : Petitboot kernel will make a query to get usable size
>       opal_mpipl_query_tag(type, *tag)
>       type = GET_BOOT_SIZE , *tag = size (assumption is that kernel will load at 
> 0th address)
> 
> This means we have to again change the opal_mpipl_query_tag() API. We will replace
> `index` field with `type` field.
>    opal_mpipl_query_tag(type, *tag)
>     type = OPAL, Kernel tag, CPU tag, boot mem size tag etg
> 
> Post MPIPL boot, kernel will make opal_mpipl_query_tag with specific type field 
> to tag the
> data.

I think it makes sense to have a separate API for tag registration instead of
overloading the opal_mpipl_update API. So we will have 3 APIs:

1 - opal_mpipl_update(ops, src, dest, size)
     enum opal_mpipl_ops {
         OPAL_MPIPL_ADD_RANGE            = 0,
         OPAL_MPIPL_REMOVE_RANGE         = 1,
         OPAL_MPIPL_REMOVE_ALL           = 2,
         OPAL_MPIPL_FREE_PRESERVED_MEMORY= 3,
};

enum opal_mpipl_tags {
         OPAL_MPIPL_TAG_CPU      = 0,
         OPAL_MPIPL_TAG_OPAL     = 1,
         OPAL_MPIPL_TAG_KERNEL   = 2,
         OPAL_MPIPL_TAG_BOOT_MEM = 3,
};

2 - opal_mpipl_register_tag(tag, tag_val)
     kernel will pass tag = OPAL_MPIPL_TAG_KERNEL and OPAL_MPIPL_TAG_BOOT_MEM

3 - opal_mpipl_query_tag(tag, *tag_val)
    Kernel will make this call for all the tags listed in opal_mpipl_tags.
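
For illustration, a rough kernel-side sketch of that flow (the call sites and
variable names below are hypothetical):

	/* First boot: register the kernel metadata tag and the boot memory size */
	opal_mpipl_register_tag(OPAL_MPIPL_TAG_KERNEL, __pa(opal_fdm));
	opal_mpipl_register_tag(OPAL_MPIPL_TAG_BOOT_MEM, boot_mem_size);

	/* Post-MPIPL boot: retrieve the saved tags by type */
	rc = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, &cpu_data_addr);
	rc = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &kernel_tag);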


-Vasant

> 
> I'm fine with making above changes.
> 
> @Nick, any thoughts on above changes?
> 
> 
> -Vasant
> 

Patch

diff --git a/core/opal-dump.c b/core/opal-dump.c
index 7b8823d84..e53991784 100644
--- a/core/opal-dump.c
+++ b/core/opal-dump.c
@@ -44,6 +44,17 @@  static struct mpipl_metadata    *mpipl_metadata;
 /* Dump metadata area */
 static struct mpipl_fadump *mpipl_opal_data;
 
+/*
+ * Number of tags passed by OPAL to kernel after MPIPL boot.
+ * Currently it supports below tags:
+ *   - Kernel passed tag during MPIPL registration
+ *   - OPAL metadata area address
+ *   - CPU register data area
+ */
+#define MAX_MPIPL_TAGS		0x03
+static u64 mpipl_tags[MAX_MPIPL_TAGS];
+static int max_tags;
+
 
 static int opal_mpipl_add_entry(u8 region, u64 src, u64 dest, u64 size)
 {
@@ -308,6 +319,20 @@  static int64_t opal_mpipl_update(enum mpipl_ops ops,
 	return rc;
 }
 
+static uint64_t opal_mpipl_query_tag(uint32_t idx, uint64_t *tag)
+{
+	if (!opal_addr_valid(tag)) {
+		prlog(PR_DEBUG, "Invalid tag address\n");
+		return OPAL_PARAMETER;
+	}
+
+	if (idx >= max_tags)
+		return OPAL_EMPTY;
+
+	*tag = mpipl_tags[idx];
+	return OPAL_SUCCESS;
+}
+
 static void post_mpipl_get_opal_data(void)
 {
 	struct mdrt_table *mdrt = (void *)(MDRT_TABLE_BASE);
@@ -356,6 +381,12 @@  static void post_mpipl_get_opal_data(void)
 		if (j == count)
 			break;
 	}
+
+	/* Update tags */
+	if (mpipl_metadata->kernel_tag)
+		mpipl_tags[max_tags++] = mpipl_metadata->kernel_tag;
+
+	mpipl_tags[max_tags++] = (u64)mpipl_opal_data;
 }
 
 void opal_mpipl_save_crashing_pir(void)
@@ -406,4 +437,5 @@  void opal_mpipl_init(void)
 
 	/* OPAL API for MPIPL update */
 	opal_register(OPAL_MPIPL_UPDATE, opal_mpipl_update, 4);
+	opal_register(OPAL_MPIPL_QUERY_TAG, opal_mpipl_query_tag, 2);
 }
diff --git a/include/opal-api.h b/include/opal-api.h
index 5af1a6d0f..6f6656557 100644
--- a/include/opal-api.h
+++ b/include/opal-api.h
@@ -233,7 +233,8 @@ 
 #define OPAL_NPU_MEM_ALLOC			171
 #define OPAL_NPU_MEM_RELEASE			172
 #define OPAL_MPIPL_UPDATE			173
-#define OPAL_LAST				173
+#define OPAL_MPIPL_QUERY_TAG			174
+#define OPAL_LAST				174
 
 #define QUIESCE_HOLD			1 /* Spin all calls at entry */
 #define QUIESCE_REJECT			2 /* Fail all calls with OPAL_BUSY */