
[DOC] powerpc: Provide initial documentation for PAPR hcalls

Message ID 20190827152326.2784-1-vaibhav@linux.ibm.com
State Changes Requested
Series [DOC] powerpc: Provide initial documentation for PAPR hcalls

Checks

Context                          Check    Description
snowpatch_ozlabs/apply_patch     success  Successfully applied on branch next (0e4523c0b4f64eaf7abe59e143e6bdf8f972acff)
snowpatch_ozlabs/build-ppc64le   success  Build succeeded
snowpatch_ozlabs/build-ppc64be   success  Build succeeded
snowpatch_ozlabs/build-ppc64e    success  Build succeeded
snowpatch_ozlabs/build-pmac32    success  Build succeeded
snowpatch_ozlabs/checkpatch      warning  total: 0 errors, 1 warnings, 0 checks, 200 lines checked

Commit Message

Vaibhav Jain Aug. 27, 2019, 3:23 p.m. UTC
This doc patch provides an initial description of the hcall op-codes
that are used by the Linux kernel running as a guest (LPAR) on top of
PowerVM or any other sPAPR-compliant hypervisor (e.g. QEMU).

Apart from documenting the hcalls, the doc patch also provides a
rudimentary overview of the hcall ABI, how hcalls are issued by the
Linux kernel, and how information/control flows between the guest and
the hypervisor.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Change-log:

An initial version of this doc patch was posted and reviewed as part of
the patch series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
failure of drc bind after kexec"
https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
of the original patch:

* Replaced the term PHYP with Hypervisor to indicate both
PowerVM/Qemu [Laurent]
* Emphasized that In/Out arguments to hcalls are in Big-endian format
[Laurent]
* Fixed minor word repetition, spelling issues and grammatical errors
[Michal, Mpe]
* Replaced various variants of the term 'hcall' with a single
variant. [Mpe]
* Changed the documentation format from txt to ReST. [Mpe]
* Changed the name of the documentation file to papr_hcalls.rst. [Mpe]
* Updated the section describing privileged operations by the hypervisor
to be more accurate. [Mpe]
* Fixed up the mention of register notation used for describing
hcalls. [Mpe]
* s/NVDimm/NVDIMM [Mpe]
* Added a section on return values from hcalls. [Mpe]
* Described the H_CONTINUE return value for long-running hcalls.
---
 Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)
 create mode 100644 Documentation/powerpc/papr_hcalls.rst

Comments

Laurent Dufour Aug. 27, 2019, 3:52 p.m. UTC | #1
On 27/08/2019 at 17:23, Vaibhav Jain wrote:
> This doc patch provides an initial description of the hcall op-codes
> that are used by Linux kernel running as a guest (LPAR) on top of
> PowerVM or any other sPAPR compliant hyper-visor (e.g qemu).
> 
> Apart from documenting the hcalls the doc-patch also provides a
> rudimentary overview of how hcall ABI, how they are issued with the
> Linux kernel and how information/control flows between the guest and
> hypervisor.
> 
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>

Hi Vaibhav,

Thanks for documenting this.

Besides my few remarks below, please consider:

Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>

> ---
> Change-log:
> 
> Initial version of this doc-patch was posted and reviewed as part of
> the patch-series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
> failure of drc bind after kexec"
> https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
> the original patch:
> 
> * Replaced the of term PHYP with Hypervisor to indicate both
> PowerVM/Qemu [Laurent]
> * Emphasized that In/Out arguments to hcalls are in Big-endian format
> [Laurent]
> * Fixed minor word repetition, spell issues and grammatical error
> [Michal, Mpe]
> * Replaced various variant of term 'hcall' with a single
> variant. [Mpe]
> * Changed the documentation format from txt to ReST. [Mpe]
> * Changed the name of documentation file to papr_hcalls.rst. [Mpe]
> * Updated the section describing privileged operation by hypervisor
> to be more accurate [Mpe].
> * Fixed up mention of register notation used for describing
> hcalls. [Mpe]
> * s/NVDimm/NVDIMM [Mpe]
> * Added section on return values from hcall [Mpe]
> * Described H_CONTINUE return-value for long running hcalls.
> ---
>   Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
>   1 file changed, 200 insertions(+)
>   create mode 100644 Documentation/powerpc/papr_hcalls.rst
> 
> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
> new file mode 100644
> index 000000000000..7afc0310de29
> --- /dev/null
> +++ b/Documentation/powerpc/papr_hcalls.rst
> @@ -0,0 +1,200 @@
> +===========================
> +Hypercall Op-codes (hcalls)
> +===========================
> +
> +Overview
> +=========
> +
> +Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
> +specification [1]_ which describes the run-time environment for a guest
> +operating system and how it should interact with the hypervisor for
> +privileged operations. Currently there are two PAPR compliant hypervisors:
> +
> +- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
> +  IBM-i and  Linux as supported guests (termed as Logical Partitions
> +  or LPARS). It supports the full PAPR specification.
> +
> +- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
> +  Though it only implements a subset of PAPR specification called LoPAPR [2]_.
> +
> +On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
> +a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
> +issue hypercalls to the hypervisor whenever it needs to perform an action
> +that is hypervisor priviledged [3]_ or for other services managed by the
> +hypervisor.
> +
> +Hence a Hypercall (hcall) is essentially a request by the pSeries guest
> +asking hypervisor to perform a privileged operation on behalf of the guest. The
> +guest issues a with necessary input operands. The hypervisor after performing
                  ^ hcall ?

> +the privilege operation returns a status code and output operands back to the
> +guest.
> +
> +HCALL ABI
> +=========
> +The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
> +is covered in section 14.5.3 of ref [2]_. Switch to the  Hypervisor context is
> +done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
> +and any in-arguments for the hcall are provided in registers *r4-r12* in
> +Big-endian byte order.
Indeed, register values are not byte ordered; only values passed through a
buffer in memory are byte ordered.

Should it be explicitly said that Big-endian order only concerns data
stored in memory?
What about something like this:
"...any in-arguments for the hcall are provided in registers *r4-r12*. If
values have to be passed through a memory buffer, the data stored in that
buffer are in Big-endian order."

> +
> +Once control is returns back to the guest after hypervisor has serviced the
> +'HVCS' instruction the return value of the hcall is available in *r3* and any
> +out values are returned in registers *r4-r12*. Again like in-arguments, all the
> +out value are in Big-endian byte order.
Same would apply here.

> +
> +Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx** defined
> +in a arch specific header [4]_ to issue hcalls from the linux kernel
> +running as pseries guest.
> +
> +DRC & DRC Indexes
> +=================
> +::
> +
> +     DR1                                  Guest
> +     +--+        +------------+         +---------+
> +     |  | <----> |            |         |  User   |
> +     +--+  DRC1  |            |   DRC   |  Space  |
> +                 |    PAPR    |  Index  +---------+
> +     DR2         | Hypervisor |         |         |
> +     +--+        |            | <-----> |  Kernel |
> +     |  | <----> |            |  Hcall  |         |
> +     +--+  DRC2  +------------+         +---------+
> +
> +PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs etc
> +available for use by LPARs as Dynamic Resource (DR). When a DR is allocated to
> +an LPAR, PHYP creates a data-structure called Dynamic Resource Connector (DRC)
> +to manage LPAR access. An LPAR refers to a DRC via an opaque 32-bit number
> +called DRC-Index. The DRC-index value is provided to the LPAR via device-tree
> +where its present as an attribute in the device tree node associated with the
> +DR.
> +
> +HCALL Return-values
> +===================
> +
> +After servicing the hcall, hypervisor sets the return-value in *r3* indicating
> +success or failure of the hcall. In case of a failure an error code indicates
> +the cause for error. These codes are defined and documented in arch specific
> +header [4]_.
> +
> +In some cases a hcall can potentially take a long time and need to be issued
> +multiple times in order to be completely serviced. These hcalls will usually
> +accept an opaque value *continue-token* within there argument list and a
> +return value of *H_CONTINUE* indicates that hypervisor hasn't still finished
> +servicing the hcall yet.
> +
> +To make such hcalls the guest need to set *continue-token == 0* for the
> +initial call and use the hypervisor returned value of *continue-token*
> +for each subsequent hcall until hypervisor returns a non *H_CONTINUE*
> +return value.
> +
> +HCALL Op-codes
> +==============
> +
> +Below is a partial list of HCALLs that are supported by PHYP. For the
> +corresponding opcode values please look into the arch specific header [4]_:
> +
> +**H_SCM_READ_METADATA**
> +
> +| Input: *drcIndex, offset, buffer-address, numBytesToRead*
> +| Out: *numBytesRead*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*
> +
> +Given a DRC Index of an NVDIMM, read N-bytes from the the metadata area
> +associated with it, at a specified offset and copy it to provided buffer.
> +The metadata area stores configuration information such as label information,
> +bad-blocks etc. The metadata area is located out-of-band of NVDIMM storage
> +area hence a separate access semantics is provided.
> +
> +**H_SCM_WRITE_METADATA**
> +
> +| Input: *drcIndex, offset, data, numBytesToWrite*
> +| Out: *None*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*
> +
> +Given a DRC Index of an NVDIMM, write N-bytes to the metadata area
> +associated with it, at the specified offset and from the provided buffer.
> +
> +**H_SCM_BIND_MEM**
> +
> +| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
> +| *targetLogicalMemoryAddress, continue-token*
> +| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
> +| *H_Too_Big, H_P5, H_Busy*
> +
> +Given a DRC-Index of an NVDIMM, map a continuous SCM blocks range
> +*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
> +at *targetLogicalMemoryAddress* within guest physical address space. In
> +case *targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF* then hypervisor
> +assigns a target address to the guest. The HCALL can fail if the Guest has
> +an active PTE entry to the SCM block being bound.
> +
> +**H_SCM_UNBIND_MEM**
> +| Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
> +| Out: numScmBlocksUnbound
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
> +| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
> +
> +Given a DRC-Index of an NVDimm, unmap *numScmBlocksToUnbind* SCM blocks starting
> +at *startingScmLogicalMemoryAddress* from guest physical address space. The
> +HCALL can fail if the Guest has an active PTE entry to the SCM block being
> +unbound.
> +
> +**H_SCM_QUERY_BLOCK_MEM_BINDING**
> +
> +| Input: *drcIndex, scmBlockIndex*
> +| Out: *Guest-Physical-Address*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
> +
> +Given a DRC-Index and an SCM Block index return the guest physical address to
> +which the SCM block is mapped to.
> +
> +**H_SCM_QUERY_LOGICAL_MEM_BINDING**
> +
> +| Input: *Guest-Physical-Address*
> +| Out: *drcIndex, scmBlockIndex*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
> +
> +Given a guest physical address return which DRC Index and SCM block is mapped
> +to that address.
> +
> +**H_SCM_UNBIND_ALL**
> +
> +| Input: *scmTargetScope, drcIndex*
> +| Out: *None*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
> +| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
> +
> +Depending on the Target scope unmap all SCM blocks belonging to all NVDIMMs
> +or all SCM blocks belonging to a single NVDIMM identified by its drcIndex
> +from the LPAR memory.
> +
> +**H_SCM_HEALTH**
> +
> +| Input: drcIndex
> +| Out: *health-bitmap, health-bit-valid-bitmap*
> +| Return Value: *H_Success, H_Parameter, H_Hardware*
> +
> +Given a DRC Index return the info on predictive failure and overall health of
> +the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
> +failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
> +valid.
> +
> +**H_SCM_PERFORMANCE_STATS**
> +
> +| Input: drcIndex, resultBuffer Addr
> +| Out: None
> +| Return Value:  *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*
> +
> +Given a DRC Index collect the performance statistics for NVDIMM and copy them
> +to the resultBuffer.
> +
> +References
> +==========
> +.. [1] "Power Architecture Platform Reference"
> +       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
> +.. [2] "Linux on Power Architecture Platform Reference"
> +       https://members.openpowerfoundation.org/document/dl/469
> +.. [3] "Definitions and Notation" Book III-Section 14.5.3
> +       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
> +.. [4] arch/powerpc/include/asm/hvcall.h
>
Nicholas Piggin Aug. 28, 2019, 1:09 a.m. UTC | #2
Vaibhav Jain's on August 28, 2019 1:23 am:
> This doc patch provides an initial description of the hcall op-codes
> that are used by Linux kernel running as a guest (LPAR) on top of
> PowerVM or any other sPAPR compliant hyper-visor (e.g qemu).
> 
> Apart from documenting the hcalls the doc-patch also provides a
> rudimentary overview of how hcall ABI, how they are issued with the
> Linux kernel and how information/control flows between the guest and
> hypervisor.
> 
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Change-log:
> 
> Initial version of this doc-patch was posted and reviewed as part of
> the patch-series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
> failure of drc bind after kexec"
> https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
> the original patch:
> 
> * Replaced the of term PHYP with Hypervisor to indicate both
> PowerVM/Qemu [Laurent]
> * Emphasized that In/Out arguments to hcalls are in Big-endian format
> [Laurent]
> * Fixed minor word repetition, spell issues and grammatical error
> [Michal, Mpe]
> * Replaced various variant of term 'hcall' with a single
> variant. [Mpe]
> * Changed the documentation format from txt to ReST. [Mpe]
> * Changed the name of documentation file to papr_hcalls.rst. [Mpe]
> * Updated the section describing privileged operation by hypervisor
> to be more accurate [Mpe].
> * Fixed up mention of register notation used for describing
> hcalls. [Mpe]
> * s/NVDimm/NVDIMM [Mpe]
> * Added section on return values from hcall [Mpe]
> * Described H_CONTINUE return-value for long running hcalls.
> ---
>  Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
>  1 file changed, 200 insertions(+)
>  create mode 100644 Documentation/powerpc/papr_hcalls.rst
> 
> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
> new file mode 100644
> index 000000000000..7afc0310de29
> --- /dev/null
> +++ b/Documentation/powerpc/papr_hcalls.rst
> @@ -0,0 +1,200 @@
> +===========================
> +Hypercall Op-codes (hcalls)
> +===========================
> +
> +Overview
> +=========
> +
> +Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
> +specification [1]_ which describes the run-time environment for a guest
> +operating system and how it should interact with the hypervisor for
> +privileged operations. Currently there are two PAPR compliant hypervisors:
> +
> +- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
> +  IBM-i and  Linux as supported guests (termed as Logical Partitions
> +  or LPARS). It supports the full PAPR specification.
> +
> +- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
> +  Though it only implements a subset of PAPR specification called LoPAPR [2]_.
> +
> +On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
> +a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
> +issue hypercalls to the hypervisor whenever it needs to perform an action
> +that is hypervisor priviledged [3]_ or for other services managed by the
> +hypervisor.
> +
> +Hence a Hypercall (hcall) is essentially a request by the pSeries guest
> +asking hypervisor to perform a privileged operation on behalf of the guest. The
> +guest issues a with necessary input operands. The hypervisor after performing
> +the privilege operation returns a status code and output operands back to the
> +guest.
> +
> +HCALL ABI
> +=========
> +The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
> +is covered in section 14.5.3 of ref [2]_. Switch to the  Hypervisor context is
> +done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
> +and any in-arguments for the hcall are provided in registers *r4-r12* in
> +Big-endian byte order.
> +
> +Once control is returns back to the guest after hypervisor has serviced the
> +'HVCS' instruction the return value of the hcall is available in *r3* and any
> +out values are returned in registers *r4-r12*. Again like in-arguments, all the
> +out value are in Big-endian byte order.
> +
> +Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx** defined
> +in a arch specific header [4]_ to issue hcalls from the linux kernel
> +running as pseries guest.

Thanks for this. Any chance you could replace the hcall convention in
exception-64s.S with a link to this document, and add it in here? It
needs a small fix or two as well, I think I put an ePAPR convention of
r11 for number in there.

Thanks,
Nick
Michael Ellerman Aug. 28, 2019, 1:24 p.m. UTC | #3
Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:
> On 27/08/2019 at 17:23, Vaibhav Jain wrote:
>> This doc patch provides an initial description of the hcall op-codes
>> that are used by Linux kernel running as a guest (LPAR) on top of
>> PowerVM or any other sPAPR compliant hyper-visor (e.g qemu).
>> 
>> Apart from documenting the hcalls the doc-patch also provides a
>> rudimentary overview of how hcall ABI, how they are issued with the
>> Linux kernel and how information/control flows between the guest and
>> hypervisor.
>> 
>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>
> Hi Vaibhav,
>
> Thanks for documenting this.
>
> Besides my few remarks below, please consider:
>
> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
>
>> ---
>> Change-log:
>> 
>> Initial version of this doc-patch was posted and reviewed as part of
>> the patch-series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
>> failure of drc bind after kexec"
>> https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
>> the original patch:
>> 
>> * Replaced the of term PHYP with Hypervisor to indicate both
>> PowerVM/Qemu [Laurent]
>> * Emphasized that In/Out arguments to hcalls are in Big-endian format
>> [Laurent]
>> * Fixed minor word repetition, spell issues and grammatical error
>> [Michal, Mpe]
>> * Replaced various variant of term 'hcall' with a single
>> variant. [Mpe]
>> * Changed the documentation format from txt to ReST. [Mpe]
>> * Changed the name of documentation file to papr_hcalls.rst. [Mpe]
>> * Updated the section describing privileged operation by hypervisor
>> to be more accurate [Mpe].
>> * Fixed up mention of register notation used for describing
>> hcalls. [Mpe]
>> * s/NVDimm/NVDIMM [Mpe]
>> * Added section on return values from hcall [Mpe]
>> * Described H_CONTINUE return-value for long running hcalls.
>> ---
>>   Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
>>   1 file changed, 200 insertions(+)
>>   create mode 100644 Documentation/powerpc/papr_hcalls.rst
>> 
>> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
>> new file mode 100644
>> index 000000000000..7afc0310de29
>> --- /dev/null
>> +++ b/Documentation/powerpc/papr_hcalls.rst
>> @@ -0,0 +1,200 @@
>> +===========================
>> +Hypercall Op-codes (hcalls)
>> +===========================
>> +
>> +Overview
>> +=========
>> +
>> +Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
>> +specification [1]_ which describes the run-time environment for a guest
>> +operating system and how it should interact with the hypervisor for
>> +privileged operations. Currently there are two PAPR compliant hypervisors:
>> +
>> +- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
>> +  IBM-i and  Linux as supported guests (termed as Logical Partitions
>> +  or LPARS). It supports the full PAPR specification.
>> +
>> +- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
>> +  Though it only implements a subset of PAPR specification called LoPAPR [2]_.
>> +
>> +On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
>> +a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
>> +issue hypercalls to the hypervisor whenever it needs to perform an action
>> +that is hypervisor priviledged [3]_ or for other services managed by the
>> +hypervisor.
>> +
>> +Hence a Hypercall (hcall) is essentially a request by the pSeries guest
>> +asking hypervisor to perform a privileged operation on behalf of the guest. The
>> +guest issues a with necessary input operands. The hypervisor after performing
>                   ^ hcall ?
>
>> +the privilege operation returns a status code and output operands back to the
>> +guest.
>> +
>> +HCALL ABI
>> +=========
>> +The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
>> +is covered in section 14.5.3 of ref [2]_. Switch to the  Hypervisor context is
>> +done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
>> +and any in-arguments for the hcall are provided in registers *r4-r12* in
>> +Big-endian byte order.
> Indeed, register valuer are not byte ordered, only values passed through 
> buffer in memory are byte ordered.
>
> Should it be explicitly said that Big-endian order is only concerning data 
> stored in memory?
> What about something like that:
> "...any in-arguments for the hcall are provided in registers *r4-r12*. If 
> values have to be passed through a memory buffer, the data stored in that 
> buffer are in Big-endian order."

Yes that would be better.

I guess to be pedantic every structure passed in memory needs to be
defined in PAPR and could have some arbitrary ordering, but in practice
everything is big endian.

cheers

Patch

diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
new file mode 100644
index 000000000000..7afc0310de29
--- /dev/null
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -0,0 +1,200 @@ 
+===========================
+Hypercall Op-codes (hcalls)
+===========================
+
+Overview
+=========
+
+Virtualization on 64-bit Power Book3S platforms is based on the PAPR
+specification [1]_, which describes the run-time environment for a guest
+operating system and how it should interact with the hypervisor for
+privileged operations. Currently there are two PAPR-compliant hypervisors:
+
+- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
+  IBM i and Linux as guests (termed Logical Partitions or LPARs). It
+  supports the full PAPR specification.
+
+- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host,
+  but implements only a subset of the PAPR specification called LoPAPR [2]_.
+
+On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is
+called a *pSeries guest*. A pSeries guest runs in supervisor mode (HV=0) and
+must issue hypercalls to the hypervisor whenever it needs to perform an
+action that is hypervisor privileged [3]_, or to request other services
+managed by the hypervisor.
+
+Hence a hypercall (hcall) is essentially a request by the pSeries guest
+asking the hypervisor to perform a privileged operation on behalf of the
+guest. The guest issues an hcall with the necessary input operands. After
+performing the privileged operation, the hypervisor returns a status code
+and output operands back to the guest.
+
+HCALL ABI
+=========
+The ABI specification for an hcall between a pSeries guest and a PAPR
+hypervisor is covered in section 14.5.3 of ref [2]_. The switch to the
+hypervisor context is done via the **HVSC** instruction (``sc 1``), with the
+hcall opcode in *r3* and any in-arguments in registers *r4-r12*. Data passed
+through a memory buffer should be stored in Big-endian byte order.
+
+Once control returns to the guest after the hypervisor has serviced the
+hcall, the return value is available in *r3* and any out values are
+returned in registers *r4-r12*. As with in-arguments, any out values stored
+in a memory buffer are in Big-endian byte order.
+
+The powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
+defined in an arch-specific header [4]_, to issue hcalls from the Linux
+kernel running as a pSeries guest.
+
+DRC & DRC Indexes
+=================
+::
+
+     DR1                                  Guest
+     +--+        +------------+         +---------+
+     |  | <----> |            |         |  User   |
+     +--+  DRC1  |            |   DRC   |  Space  |
+                 |    PAPR    |  Index  +---------+
+     DR2         | Hypervisor |         |         |
+     +--+        |            | <-----> |  Kernel |
+     |  | <----> |            |  Hcall  |         |
+     +--+  DRC2  +------------+         +---------+
+
+The PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs
+etc. available for use by LPARs as Dynamic Resources (DRs). When a DR is
+allocated to an LPAR, the hypervisor creates a data structure called a
+Dynamic Resource Connector (DRC) to manage LPAR access. An LPAR refers to a
+DRC via an opaque 32-bit number called the DRC-Index. The DRC-Index value is
+provided to the LPAR via the device tree, where it is present as an attribute
+of the device tree node associated with the DR.
+
+HCALL Return-values
+===================
+
+After servicing the hcall, the hypervisor sets the return value in *r3*,
+indicating success or failure of the hcall. In case of a failure, the error
+code indicates the cause of the error. These codes are defined and documented
+in the arch-specific header [4]_.
+
+In some cases an hcall can take a long time and may need to be issued
+multiple times in order to be completely serviced. Such hcalls usually
+accept an opaque *continue-token* value in their argument list, and a
+return value of *H_CONTINUE* indicates that the hypervisor hasn't yet
+finished servicing the hcall.
+
+To make such hcalls, the guest needs to set *continue-token == 0* for the
+initial call and pass the *continue-token* value returned by the hypervisor
+to each subsequent hcall, until the hypervisor returns a value other than
+*H_CONTINUE*.
+
+HCALL Op-codes
+==============
+
+Below is a partial list of hcalls that are supported by PHYP. For the
+corresponding opcode values, please look into the arch-specific header [4]_:
+
+**H_SCM_READ_METADATA**
+
+| Input: *drcIndex, offset, buffer-address, numBytesToRead*
+| Out: *numBytesRead*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*
+
+Given a DRC Index of an NVDIMM, read N bytes from the metadata area
+associated with it, at a specified offset, and copy them to the provided
+buffer. The metadata area stores configuration information such as label
+information, bad blocks etc. It is located out-of-band of the NVDIMM storage
+area, hence separate access semantics are provided.
+
+**H_SCM_WRITE_METADATA**
+
+| Input: *drcIndex, offset, data, numBytesToWrite*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*
+
+Given a DRC Index of an NVDIMM, write N bytes of the provided data to the
+metadata area associated with it, at the specified offset.
+
+**H_SCM_BIND_MEM**
+
+| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
+| *targetLogicalMemoryAddress, continue-token*
+| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
+| *H_Too_Big, H_P5, H_Busy*
+
+Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
+*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
+at *targetLogicalMemoryAddress* within the guest physical address space. If
+*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor assigns
+a target address to the guest instead. The hcall can fail if the guest has
+an active PTE entry to the SCM block being bound.
+
+**H_SCM_UNBIND_MEM**
+
+| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
+| Out: *numScmBlocksUnbound*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap, H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
+
+Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
+at *startingScmLogicalMemoryAddress* from the guest physical address space. The
+hcall can fail if the guest has an active PTE entry to the SCM block being
+unbound.
+
+**H_SCM_QUERY_BLOCK_MEM_BINDING**
+
+| Input: *drcIndex, scmBlockIndex*
+| Out: *Guest-Physical-Address*
+| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
+
+Given a DRC-Index and an SCM block index, return the guest physical address
+to which the SCM block is mapped.
+
+**H_SCM_QUERY_LOGICAL_MEM_BINDING**
+
+| Input: *Guest-Physical-Address*
+| Out: *drcIndex, scmBlockIndex*
+| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
+
+Given a guest physical address, return the DRC Index and SCM block that are
+mapped to that address.
+
+**H_SCM_UNBIND_ALL**
+
+| Input: *scmTargetScope, drcIndex*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
+| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
+
+Depending on the target scope, unmap from the LPAR memory either all SCM
+blocks belonging to all NVDIMMs, or all SCM blocks belonging to a single
+NVDIMM identified by its drcIndex.
+
+**H_SCM_HEALTH**
+
+| Input: *drcIndex*
+| Out: *health-bitmap, health-bit-valid-bitmap*
+| Return Value: *H_Success, H_Parameter, H_Hardware*
+
+Given a DRC Index, return information on predictive failures and the overall
+health of the NVDIMM. Each asserted bit in *health-bitmap* indicates a single
+predictive failure, and *health-bit-valid-bitmap* indicates which bits in
+*health-bitmap* are valid.
+
+**H_SCM_PERFORMANCE_STATS**
+
+| Input: *drcIndex, resultBuffer Addr*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*
+
+Given a DRC Index, collect the performance statistics for the NVDIMM and copy
+them to the resultBuffer.
+
+References
+==========
+.. [1] "Power Architecture Platform Reference"
+       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
+.. [2] "Linux on Power Architecture Platform Reference"
+       https://members.openpowerfoundation.org/document/dl/469
+.. [3] "Definitions and Notation" Book III-Section 14.5.3
+       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
+.. [4] arch/powerpc/include/asm/hvcall.h
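The register convention described in the *HCALL ABI* section of the patch above, combined with Laurent's point that registers have no byte order and only memory buffers do, can be modeled in portable C as follows. This is a minimal illustrative sketch, not kernel code: `struct hcall_regs`, `hcall_load_regs()`, `put_be64()` and `get_be64()` are hypothetical names. In the kernel the `plpar_hcall*()` wrappers load the registers in assembly, and a buffer would normally be filled with helpers such as `cpu_to_be64()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the register state when the hcall is issued:
 * opcode in r3, up to nine in-arguments in r4-r12.
 */
struct hcall_regs {
	uint64_t gpr[13];	/* only gpr[3]..gpr[12] are used here */
};

static void hcall_load_regs(struct hcall_regs *regs, uint64_t opcode,
			    const uint64_t *args, size_t nargs)
{
	assert(nargs <= 9);		/* r4-r12 hold at most nine arguments */
	regs->gpr[3] = opcode;
	for (size_t i = 0; i < nargs; i++)
		regs->gpr[4 + i] = args[i];	/* plain register value: no byte swap */
}

/* Store a 64-bit value into a memory buffer in big-endian byte order. */
static void put_be64(uint8_t *buf, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load a big-endian 64-bit value from a memory buffer. */
static uint64_t get_be64(const uint8_t *buf)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | buf[i];
	return v;
}
```

Because `put_be64()` builds the buffer byte by byte, it produces the same bytes on big- and little-endian guests, while the values loaded into `gpr[]` are never swapped.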
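The *DRC & DRC Indexes* section notes that the DRC-Index reaches the guest through the device tree. Below is a minimal user-space sketch of reading it, assuming the property is exposed under `/proc/device-tree` as a raw 4-byte big-endian cell. To our knowledge `ibm,my-drc-index` is the property name the `papr_scm` driver reads, but the node path is a placeholder and `dt_be32_to_cpu()` / `read_drc_index()` are hypothetical helpers; in-kernel code would use `of_property_read_u32()`, which already performs the byte swap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Device-tree cells are 32-bit big-endian values. */
static uint32_t dt_be32_to_cpu(const uint8_t cell[4])
{
	return ((uint32_t)cell[0] << 24) | ((uint32_t)cell[1] << 16) |
	       ((uint32_t)cell[2] << 8) | (uint32_t)cell[3];
}

/*
 * Read a DRC-Index from a raw device-tree property file, for example
 * /proc/device-tree/<dr-node>/ibm,my-drc-index (node path is a placeholder).
 * Returns 0 on success, -1 if the file is missing or too short.
 */
static int read_drc_index(const char *path, uint32_t *drc_index)
{
	uint8_t cell[4];
	size_t n;
	FILE *f;

	assert(path != NULL && drc_index != NULL);
	f = fopen(path, "rb");
	if (!f)
		return -1;
	n = fread(cell, 1, sizeof(cell), f);
	fclose(f);
	if (n != sizeof(cell))
		return -1;
	*drc_index = dt_be32_to_cpu(cell);
	return 0;
}
```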
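The continue-token protocol from the *HCALL Return-values* section boils down to a loop: start with a zero token and feed every returned token back in until the return code is no longer *H_CONTINUE*. The sketch below runs that loop against a hypothetical mock hypervisor, `mock_scm_bind_hcall()`, which binds at most four blocks per call. The return-code values are assumed to match `arch/powerpc/include/asm/hvcall.h`; a real guest would issue the hcall through a `plpar_hcall*()` wrapper rather than a C function call.

```c
#include <assert.h>
#include <stdint.h>

/* Return codes: values assumed to mirror arch/powerpc/include/asm/hvcall.h. */
#define H_SUCCESS	0
#define H_CONTINUE	18

/*
 * Hypothetical mock of a long-running bind hcall: it binds at most four
 * blocks per invocation and returns a continue-token recording progress.
 * The token is opaque to the caller; here it happens to be a block count.
 */
static long mock_scm_bind_hcall(uint64_t nblocks, uint64_t token_in,
				uint64_t *token_out, uint64_t *bound)
{
	const uint64_t chunk = 4;
	uint64_t done = token_in;
	uint64_t step;

	assert(done <= nblocks);
	step = (nblocks - done < chunk) ? nblocks - done : chunk;
	done += step;
	*bound = done;
	if (done < nblocks) {
		*token_out = done;
		return H_CONTINUE;	/* not finished: call again with token */
	}
	*token_out = 0;
	return H_SUCCESS;
}

/* Guest side: start with token 0 and feed back each returned token. */
static long bind_all(uint64_t nblocks, uint64_t *bound, int *calls)
{
	uint64_t token = 0;	/* continue-token == 0 on the initial call */
	long rc;

	*calls = 0;
	do {
		rc = mock_scm_bind_hcall(nblocks, token, &token, bound);
		(*calls)++;
	} while (rc == H_CONTINUE);
	return rc;
}
```

With this mock, `bind_all(10, &bound, &calls)` completes in three calls (4 + 4 + 2 blocks) and returns `H_SUCCESS`.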
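*H_SCM_UNBIND_MEM* and *H_SCM_UNBIND_ALL* can also return *H_Busy* or one of the *H_LongBusyOrder...* codes. One way to handle them, sketched below, is to retry at once on a plain busy and to wait for the hinted delay on a long-busy code. The sketch assumes the long-busy codes run from 9900 (1 ms) to 9905 (100 s), each step ten times longer, which is our reading of `hvcall.h`; `hcall_retry()`, `scripted_issue()` and `record_sleep()` are hypothetical, and the `sleep_ms` hook stands in for something like `msleep()` in kernel code.

```c
#include <assert.h>

/* Values assumed to mirror arch/powerpc/include/asm/hvcall.h. */
#define H_SUCCESS			0
#define H_BUSY				1
#define H_LONG_BUSY_ORDER_1_MSEC	9900
#define H_LONG_BUSY_ORDER_100_SEC	9905

static int is_long_busy(long rc)
{
	return rc >= H_LONG_BUSY_ORDER_1_MSEC && rc <= H_LONG_BUSY_ORDER_100_SEC;
}

/* Suggested wait before retrying: 10^(rc - 9900) milliseconds. */
static unsigned int long_busy_delay_ms(long rc)
{
	unsigned int ms = 1;

	assert(is_long_busy(rc));
	for (long i = H_LONG_BUSY_ORDER_1_MSEC; i < rc; i++)
		ms *= 10;
	return ms;
}

/*
 * Retry an hcall while it reports busy. 'issue' performs one attempt and
 * 'sleep_ms' waits; both are hypothetical hooks supplied by the caller.
 */
static long hcall_retry(long (*issue)(void *ctx), void *ctx,
			void (*sleep_ms)(unsigned int ms))
{
	for (;;) {
		long rc = issue(ctx);

		if (rc == H_BUSY)
			continue;		/* short busy: retry right away */
		if (!is_long_busy(rc))
			return rc;		/* success or a real error */
		sleep_ms(long_busy_delay_ms(rc));
	}
}

/* Hypothetical test doubles: replay scripted return codes, record sleeps. */
struct scripted_hcall {
	const long *rcs;
	int next;
};

static long scripted_issue(void *ctx)
{
	struct scripted_hcall *s = ctx;

	return s->rcs[s->next++];
}

static unsigned int total_slept_ms;

static void record_sleep(unsigned int ms)
{
	total_slept_ms += ms;
}
```

Injecting `issue` and `sleep_ms` keeps the retry policy separate from the hcall itself, so the policy can be exercised with the scripted doubles at the end of the block.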
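For *H_SCM_HEALTH*, only the bits that are also set in *health-bit-valid-bitmap* carry meaning. The sketch below masks one bitmap with the other and counts the remaining predictive-failure bits. It assumes both bitmaps use MSB-0 (IBM) bit numbering, as the kernel's `PPC_BIT()` macro does; `IBM_BIT()` and the `scm_health_*()` helpers are hypothetical, and the meaning of individual bits is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* MSB-0 bit numbering (bit 0 is the most significant bit), assumed here. */
#define IBM_BIT(n)	(1ULL << (63 - (n)))

/* Keep only the health bits that the hypervisor reports as valid. */
static uint64_t scm_health_effective(uint64_t health, uint64_t valid)
{
	return health & valid;
}

/* Each asserted, valid bit indicates one predictive failure. */
static int scm_health_failure_count(uint64_t health, uint64_t valid)
{
	uint64_t bits = scm_health_effective(health, valid);
	int count = 0;

	while (bits) {
		bits &= bits - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}

/* Test a single bit, numbered MSB-0, against the valid mask. */
static int scm_health_bit_set(uint64_t health, uint64_t valid, unsigned int n)
{
	assert(n < 64);
	return (scm_health_effective(health, valid) & IBM_BIT(n)) != 0;
}
```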