Message ID | 20190827152326.2784-1-vaibhav@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | [DOC] powerpc: Provide initial documentation for PAPR hcalls |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (0e4523c0b4f64eaf7abe59e143e6bdf8f972acff) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | warning | total: 0 errors, 1 warnings, 0 checks, 200 lines checked |
On 27/08/2019 at 17:23, Vaibhav Jain wrote:
> This doc patch provides an initial description of the hcall op-codes
> that are used by Linux kernel running as a guest (LPAR) on top of
> PowerVM or any other sPAPR compliant hyper-visor (e.g qemu).
>
> Apart from documenting the hcalls the doc-patch also provides a
> rudimentary overview of how hcall ABI, how they are issued with the
> Linux kernel and how information/control flows between the guest and
> hypervisor.
>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>

Hi Vaibhav,

Thanks for documenting this.

Besides my few remarks below, please consider:

Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>

> ---
> Change-log:
>
> Initial version of this doc-patch was posted and reviewed as part of
> the patch-series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
> failure of drc bind after kexec"
> https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
> the original patch:
>
> * Replaced the of term PHYP with Hypervisor to indicate both
>   PowerVM/Qemu [Laurent]
> * Emphasized that In/Out arguments to hcalls are in Big-endian format
>   [Laurent]
> * Fixed minor word repetition, spell issues and grammatical error
>   [Michal, Mpe]
> * Replaced various variant of term 'hcall' with a single
>   variant. [Mpe]
> * Changed the documentation format from txt to ReST. [Mpe]
> * Changed the name of documentation file to papr_hcalls.rst. [Mpe]
> * Updated the section describing privileged operation by hypervisor
>   to be more accurate [Mpe].
> * Fixed up mention of register notation used for describing
>   hcalls. [Mpe]
> * s/NVDimm/NVDIMM [Mpe]
> * Added section on return values from hcall [Mpe]
> * Described H_CONTINUE return-value for long running hcalls.
> ---
>  Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
>  1 file changed, 200 insertions(+)
>  create mode 100644 Documentation/powerpc/papr_hcalls.rst
>
> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
> new file mode 100644
> index 000000000000..7afc0310de29
> --- /dev/null
> +++ b/Documentation/powerpc/papr_hcalls.rst
> @@ -0,0 +1,200 @@
> +===========================
> +Hypercall Op-codes (hcalls)
> +===========================
> +
> +Overview
> +=========
> +
> +Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
> +specification [1]_ which describes the run-time environment for a guest
> +operating system and how it should interact with the hypervisor for
> +privileged operations. Currently there are two PAPR compliant hypervisors:
> +
> +- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
> +  IBM-i and Linux as supported guests (termed as Logical Partitions
> +  or LPARS). It supports the full PAPR specification.
> +
> +- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
> +  Though it only implements a subset of PAPR specification called LoPAPR [2]_.
> +
> +On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
> +a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
> +issue hypercalls to the hypervisor whenever it needs to perform an action
> +that is hypervisor priviledged [3]_ or for other services managed by the
> +hypervisor.
> +
> +Hence a Hypercall (hcall) is essentially a request by the pSeries guest
> +asking hypervisor to perform a privileged operation on behalf of the guest. The
> +guest issues a with necessary input operands. The hypervisor after performing
                 ^ hcall ?

> +the privilege operation returns a status code and output operands back to the
> +guest.
> +
> +HCALL ABI
> +=========
> +The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
> +is covered in section 14.5.3 of ref [2]_. Switch to the Hypervisor context is
> +done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
> +and any in-arguments for the hcall are provided in registers *r4-r12* in
> +Big-endian byte order.

Indeed, register values are not byte ordered, only values passed through
a buffer in memory are byte ordered.

Should it be stated explicitly that Big-endian order only concerns data
stored in memory?
What about something like this:
"...any in-arguments for the hcall are provided in registers *r4-r12*. If
values have to be passed through a memory buffer, the data stored in that
buffer are in Big-endian order."

> +
> +Once control is returns back to the guest after hypervisor has serviced the
> +'HVCS' instruction the return value of the hcall is available in *r3* and any
> +out values are returned in registers *r4-r12*. Again like in-arguments, all the
> +out value are in Big-endian byte order.

Same would apply here.
Vaibhav Jain wrote on August 28, 2019 1:23 am:
> +HCALL ABI
> +=========
> +The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
> +is covered in section 14.5.3 of ref [2]_. Switch to the Hypervisor context is
> +done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
> +and any in-arguments for the hcall are provided in registers *r4-r12* in
> +Big-endian byte order.
> +
> +Once control is returns back to the guest after hypervisor has serviced the
> +'HVCS' instruction the return value of the hcall is available in *r3* and any
> +out values are returned in registers *r4-r12*. Again like in-arguments, all the
> +out value are in Big-endian byte order.
> +
> +Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx** defined
> +in a arch specific header [4]_ to issue hcalls from the linux kernel
> +running as pseries guest.

Thanks for this. Any chance you could replace the hcall convention
in exception-64s.S with a link to this document, and add it in here?

It needs a small fix or two as well, I think I put an ePAPR convention
of r11 for number in there.

Thanks,
Nick
Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:
> On 27/08/2019 at 17:23, Vaibhav Jain wrote:
>> +HCALL ABI
>> +=========
>> +The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
>> +is covered in section 14.5.3 of ref [2]_. Switch to the Hypervisor context is
>> +done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
>> +and any in-arguments for the hcall are provided in registers *r4-r12* in
>> +Big-endian byte order.
> Indeed, register values are not byte ordered, only values passed through
> a buffer in memory are byte ordered.
>
> Should it be stated explicitly that Big-endian order only concerns data
> stored in memory?
> What about something like this:
> "...any in-arguments for the hcall are provided in registers *r4-r12*. If
> values have to be passed through a memory buffer, the data stored in that
> buffer are in Big-endian order."

Yes that would be better.

I guess to be pedantic every structure passed in memory needs to be
defined in PAPR and could have some arbitrary ordering, but in practice
everything is big endian.

cheers
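The distinction drawn in this exchange (register arguments carry plain values, while data placed in a memory buffer shared with the hypervisor is stored big-endian) can be illustrated with a small userspace sketch. The helper names `put_be64` and `get_be64` are hypothetical and exist only for this example; kernel code would normally use the existing `cpu_to_be64()` and `be64_to_cpu()` helpers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only.
 * Store a 64-bit value into a byte buffer in big-endian order,
 * regardless of the endianness of the CPU running this code.
 */
static void put_be64(uint8_t *buf, uint64_t value)
{
	assert(buf != NULL);
	for (size_t i = 0; i < 8; i++)
		buf[i] = (uint8_t)(value >> (56 - 8 * i)); /* most significant byte first */
}

/* Read a 64-bit big-endian value back out of a byte buffer. */
static uint64_t get_be64(const uint8_t *buf)
{
	uint64_t value = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < 8; i++)
		value = (value << 8) | buf[i];
	return value;
}
```

A value passed in a register needs no such conversion; only the bytes written to or read from the shared buffer do.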
diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
new file mode 100644
index 000000000000..7afc0310de29
--- /dev/null
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -0,0 +1,200 @@
+===========================
+Hypercall Op-codes (hcalls)
+===========================
+
+Overview
+=========
+
+Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
+specification [1]_ which describes the run-time environment for a guest
+operating system and how it should interact with the hypervisor for
+privileged operations. Currently there are two PAPR compliant hypervisors:
+
+- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
+  IBM-i and Linux as supported guests (termed as Logical Partitions
+  or LPARS). It supports the full PAPR specification.
+
+- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
+  Though it only implements a subset of PAPR specification called LoPAPR [2]_.
+
+On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
+a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
+issue hypercalls to the hypervisor whenever it needs to perform an action
+that is hypervisor priviledged [3]_ or for other services managed by the
+hypervisor.
+
+Hence a Hypercall (hcall) is essentially a request by the pSeries guest
+asking hypervisor to perform a privileged operation on behalf of the guest. The
+guest issues a with necessary input operands. The hypervisor after performing
+the privilege operation returns a status code and output operands back to the
+guest.
+
+HCALL ABI
+=========
+The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
+is covered in section 14.5.3 of ref [2]_. Switch to the Hypervisor context is
+done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
+and any in-arguments for the hcall are provided in registers *r4-r12* in
+Big-endian byte order.
+
+Once control is returns back to the guest after hypervisor has serviced the
+'HVCS' instruction the return value of the hcall is available in *r3* and any
+out values are returned in registers *r4-r12*. Again like in-arguments, all the
+out value are in Big-endian byte order.
+
+Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx** defined
+in a arch specific header [4]_ to issue hcalls from the linux kernel
+running as pseries guest.
+
+DRC & DRC Indexes
+=================
+::
+
+        DR1                                  Guest
+        +--+        +------------+         +---------+
+        |  | <----> |            |         |  User   |
+        +--+  DRC1  |            |   DRC   |  Space  |
+                    |    PAPR    |  Index  +---------+
+        DR2         | Hypervisor |         |         |
+        +--+        |            | <-----> |  Kernel |
+        |  | <----> |            |  Hcall  |         |
+        +--+  DRC2  +------------+         +---------+
+
+PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs etc
+available for use by LPARs as Dynamic Resource (DR). When a DR is allocated to
+an LPAR, PHYP creates a data-structure called Dynamic Resource Connector (DRC)
+to manage LPAR access. An LPAR refers to a DRC via an opaque 32-bit number
+called DRC-Index. The DRC-index value is provided to the LPAR via device-tree
+where its present as an attribute in the device tree node associated with the
+DR.
+
+HCALL Return-values
+===================
+
+After servicing the hcall, hypervisor sets the return-value in *r3* indicating
+success or failure of the hcall. In case of a failure an error code indicates
+the cause for error. These codes are defined and documented in arch specific
+header [4]_.
+
+In some cases a hcall can potentially take a long time and need to be issued
+multiple times in order to be completely serviced. These hcalls will usually
+accept an opaque value *continue-token* within there argument list and a
+return value of *H_CONTINUE* indicates that hypervisor hasn't still finished
+servicing the hcall yet.
+
+To make such hcalls the guest need to set *continue-token == 0* for the
+initial call and use the hypervisor returned value of *continue-token*
+for each subsequent hcall until hypervisor returns a non *H_CONTINUE*
+return value.
+
+HCALL Op-codes
+==============
+
+Below is a partial list of HCALLs that are supported by PHYP. For the
+corresponding opcode values please look into the arch specific header [4]_:
+
+**H_SCM_READ_METADATA**
+
+| Input: *drcIndex, offset, buffer-address, numBytesToRead*
+| Out: *numBytesRead*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*
+
+Given a DRC Index of an NVDIMM, read N-bytes from the the metadata area
+associated with it, at a specified offset and copy it to provided buffer.
+The metadata area stores configuration information such as label information,
+bad-blocks etc. The metadata area is located out-of-band of NVDIMM storage
+area hence a separate access semantics is provided.
+
+**H_SCM_WRITE_METADATA**
+
+| Input: *drcIndex, offset, data, numBytesToWrite*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*
+
+Given a DRC Index of an NVDIMM, write N-bytes to the metadata area
+associated with it, at the specified offset and from the provided buffer.
+
+**H_SCM_BIND_MEM**
+
+| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
+| *targetLogicalMemoryAddress, continue-token*
+| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
+| *H_Too_Big, H_P5, H_Busy*
+
+Given a DRC-Index of an NVDIMM, map a continuous SCM blocks range
+*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
+at *targetLogicalMemoryAddress* within guest physical address space. In
+case *targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF* then hypervisor
+assigns a target address to the guest. The HCALL can fail if the Guest has
+an active PTE entry to the SCM block being bound.
+
+**H_SCM_UNBIND_MEM**
+| Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
+| Out: numScmBlocksUnbound
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
+| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
+
+Given a DRC-Index of an NVDimm, unmap *numScmBlocksToUnbind* SCM blocks starting
+at *startingScmLogicalMemoryAddress* from guest physical address space. The
+HCALL can fail if the Guest has an active PTE entry to the SCM block being
+unbound.
+
+**H_SCM_QUERY_BLOCK_MEM_BINDING**
+
+| Input: *drcIndex, scmBlockIndex*
+| Out: *Guest-Physical-Address*
+| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
+
+Given a DRC-Index and an SCM Block index return the guest physical address to
+which the SCM block is mapped to.
+
+**H_SCM_QUERY_LOGICAL_MEM_BINDING**
+
+| Input: *Guest-Physical-Address*
+| Out: *drcIndex, scmBlockIndex*
+| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
+
+Given a guest physical address return which DRC Index and SCM block is mapped
+to that address.
+
+**H_SCM_UNBIND_ALL**
+
+| Input: *scmTargetScope, drcIndex*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
+| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
+
+Depending on the Target scope unmap all SCM blocks belonging to all NVDIMMs
+or all SCM blocks belonging to a single NVDIMM identified by its drcIndex
+from the LPAR memory.
+
+**H_SCM_HEALTH**
+
+| Input: drcIndex
+| Out: *health-bitmap, health-bit-valid-bitmap*
+| Return Value: *H_Success, H_Parameter, H_Hardware*
+
+Given a DRC Index return the info on predictive failure and overall health of
+the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
+failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
+valid.
+
+**H_SCM_PERFORMANCE_STATS**
+
+| Input: drcIndex, resultBuffer Addr
+| Out: None
+| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*
+
+Given a DRC Index collect the performance statistics for NVDIMM and copy them
+to the resultBuffer.
+
+References
+==========
+.. [1] "Power Architecture Platform Reference"
+       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
+.. [2] "Linux on Power Architecture Platform Reference"
+       https://members.openpowerfoundation.org/document/dl/469
+.. [3] "Definitions and Notation" Book III-Section 14.5.3
+       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
+.. [4] arch/powerpc/include/asm/hvcall.h
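The continue-token protocol described under "HCALL Return-values" in the patch can be sketched in userspace C as follows. Everything here is an assumption made for illustration: `fake_long_hcall` is a hypothetical stand-in for a real long-running hcall, `struct fake_hv` simulates hypervisor state, and the `SK_*` status codes are placeholders rather than the real values from hvcall.h. In this simulation the token happens to encode progress, whereas a real guest has to treat it as opaque and simply hand it back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder status codes for this sketch; real codes live in hvcall.h. */
enum { SK_SUCCESS = 0, SK_CONTINUE = 1 };

/* Simulated hypervisor state: total work units and number of calls seen. */
struct fake_hv {
	uint64_t total;
	unsigned int calls;
};

/*
 * Hypothetical long-running hcall: services at most 4 work units per
 * invocation and returns SK_CONTINUE with a new token until finished.
 */
static long fake_long_hcall(struct fake_hv *hv, uint64_t token_in,
			    uint64_t *token_out)
{
	uint64_t done = token_in + 4;

	hv->calls++;
	if (done >= hv->total) {
		*token_out = 0;
		return SK_SUCCESS;
	}
	*token_out = done;
	return SK_CONTINUE;
}

/* Guest side: start with token 0 and reissue while told to continue. */
static long issue_until_done(struct fake_hv *hv)
{
	uint64_t token = 0; /* the initial call uses continue-token == 0 */
	long rc;

	assert(hv != NULL);
	do {
		/* pass back whatever token the previous call returned */
		rc = fake_long_hcall(hv, token, &token);
	} while (rc == SK_CONTINUE);

	return rc; /* any non-continue status ends the loop */
}
```

The loop ends on any status other than the continue code, so error returns are handed to the caller unchanged instead of being retried.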
This doc patch provides an initial description of the hcall op-codes
that are used by Linux kernel running as a guest (LPAR) on top of
PowerVM or any other sPAPR compliant hyper-visor (e.g qemu).

Apart from documenting the hcalls the doc-patch also provides a
rudimentary overview of how hcall ABI, how they are issued with the
Linux kernel and how information/control flows between the guest and
hypervisor.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Change-log:

Initial version of this doc-patch was posted and reviewed as part of
the patch-series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
failure of drc bind after kexec"
https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
the original patch:

* Replaced the of term PHYP with Hypervisor to indicate both
  PowerVM/Qemu [Laurent]
* Emphasized that In/Out arguments to hcalls are in Big-endian format
  [Laurent]
* Fixed minor word repetition, spell issues and grammatical error
  [Michal, Mpe]
* Replaced various variant of term 'hcall' with a single
  variant. [Mpe]
* Changed the documentation format from txt to ReST. [Mpe]
* Changed the name of documentation file to papr_hcalls.rst. [Mpe]
* Updated the section describing privileged operation by hypervisor
  to be more accurate [Mpe].
* Fixed up mention of register notation used for describing
  hcalls. [Mpe]
* s/NVDimm/NVDIMM [Mpe]
* Added section on return values from hcall [Mpe]
* Described H_CONTINUE return-value for long running hcalls.
---
 Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)
 create mode 100644 Documentation/powerpc/papr_hcalls.rst