
[v5,0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec

Message ID: 20190723161357.26718-1-vaibhav@linux.ibm.com
Series: powerpc/papr_scm: Workaround for failure of drc bind after kexec

Message

Vaibhav Jain July 23, 2019, 4:13 p.m. UTC
Presently, the hcall H_SCM_BIND_MEM returns an error when a new kernel boots
on an lpar via kexec. This prevents papr_scm from registering the drc memory
regions with nvdimm. The reported error looks like this:

"papr_scm ibm,persistent-memory:ibm,pmemory@44100002: bind err: -68"

Investigation revealed that phyp returns this error because the previous
kernel did not completely release its bindings for the drc scm-memory blocks,
so phyp rejects the request to re-bind these blocks to the lpar with
H_OVERLAP (-68). In addition, support was recently added for a new hcall,
H_SCM_UNBIND_ALL, which is better suited to releasing all bound scm-memory
blocks from an lpar.

By leveraging the new H_SCM_UNBIND_ALL hcall, we can work around the
H_OVERLAP issue during kexec: force an unbind of all drc scm-memory blocks,
then issue H_SCM_BIND_MEM to re-bind them to the lpar. The same sequence is
also needed when a new kernel boots on an lpar after the previous kernel
panicked and never got the opportunity to call H_SCM_UNBIND_MEM/ALL.

Hence this patch-set makes the following changes:

* Document the SCM hcalls in Documentation/powerpc/hcalls.txt.

* Update the SCM hcall opcodes in hvcall.h, including the opcode for the new
  hcall H_SCM_UNBIND_ALL.

* Update drc_pmem_unbind() in papr_scm to use H_SCM_UNBIND_ALL instead of
  H_SCM_UNBIND_MEM.

* If the hcall H_SCM_BIND_MEM fails with H_OVERLAP, force an
  H_SCM_UNBIND_ALL and retry the bind operation.

With the patch-set applied, re-binding the drc scm-memory to the lpar succeeds
after a kexec to a new kernel, as illustrated below:

# Old kernel
$ sudo ndctl list -R
[
  {
    "dev":"region0",
    <snip>
    ....
  }
]
# kexec to new kernel
$ sudo kexec --initrd=... vmlinux
...
...
I'm in purgatory
...
papr_scm ibm,persistent-memory:ibm,pmemory@44100002: Un-binding and retrying
...
# New kernel
$ sudo ndctl list -R
[
  {
    "dev":"region0",
    <snip>
    ....
  }
]

---
Change-log:
v5:
* Added a new doc-patch describing the HCALL interface between a guest kernel
  and a PAPR-compliant hypervisor such as PowerVM or KVM.

v4:
* Updated the patch description of the first patch in the series as
  suggested by Mpe.

v3:
* Fixed a build warning reported by the kbuild test robot.
* Updated the hcall opcode from the latest papr-scm specification.
* Fixed a minor code comment and the patch description as pointed out by
  Oliver.

v2:
* Addressed review comments from Oliver on v1 patchset.

Vaibhav Jain (4):
  powerpc: Document some HCalls for Storage Class Memory
  powerpc/pseries: Update SCM hcall op-codes in hvcall.h
  powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
  powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails

 Documentation/powerpc/hcalls.txt          | 140 ++++++++++++++++++++++
 arch/powerpc/include/asm/hvcall.h         |  11 +-
 arch/powerpc/platforms/pseries/papr_scm.c |  44 +++++--
 3 files changed, 184 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/powerpc/hcalls.txt

Comments

Aneesh Kumar K V July 24, 2019, 10:04 a.m. UTC

You can add this for the whole series:

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>