Message ID | 20190624145913.20122-3-vaibhav@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | powerpc/papr_scm: Workaround for failure of drc bind after kexec | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (e610a466d16a086e321f0bd421e2fc75cff28605) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | warning | total: 0 errors, 0 warnings, 1 checks, 47 lines checked |
On Tue, Jun 25, 2019 at 12:59 AM Vaibhav Jain <vaibhav@linux.ibm.com> wrote: > > In some cases initial bind of scm memory for an lpar can fail if > previously it wasn't released using a scm-unbind hcall. This situation > can arise due to panic of the previous kernel or forced lpar reset. In > such cases the H_SCM_BIND_MEM return a H_OVERLAP error. What is a forced lpar reset? fadump? > To mitigate such cases the patch updates drc_pmem_bind() to force a > call to drc_pmem_unbind() in case the initial bind of scm memory fails > with H_OVERLAP error. In case scm-bind operation again fails after the > forced scm-unbind then we follow the existing error path. > > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > --- > arch/powerpc/platforms/pseries/papr_scm.c | 23 ++++++++++++++++++++--- > 1 file changed, 20 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c > index d790e4e4ffb3..049d7927c0a4 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -44,19 +44,26 @@ struct papr_scm_priv { > struct nd_interleave_set nd_set; > }; > +/* Forward declaration */ pointless comment. > +static int drc_pmem_unbind(struct papr_scm_priv *); > > static int drc_pmem_bind(struct papr_scm_priv *p) > { > unsigned long ret[PLPAR_HCALL_BUFSIZE]; > uint64_t rc, token; > - uint64_t saved = 0; > + uint64_t saved; > + bool tried_unbind = false; nit: kernel style uses reverse christmas tree declarations, so this should be: unsigned long ret[PLPAR_HCALL_BUFSIZE]; bool tried_unbind = false; uint64_t rc, token; uint64_t saved; Come to think of it rc should probably be signed since the hcall return codes are negative. I'm surprised that's not causing a warning since we use %lld to print rc rather than %llu. > + dev_dbg(&p->pdev->dev, "bind drc %x\n", p->drc_index); > /* > * When the hypervisor cannot map all the requested memory in a single > * hcall it returns H_BUSY and we call again with the token until > * we get H_SUCCESS. Aborting the retry loop before getting H_SUCCESS > * leave the system in an undefined state, so we wait. > */ > +retry: > token = 0; > + saved = 0; > > do { > rc = plpar_hcall(H_SCM_BIND_MEM, ret, p->drc_index, 0, > @@ -68,8 +75,18 @@ static int drc_pmem_bind(struct papr_scm_priv *p) > } while (rc == H_BUSY); > > if (rc) { > - dev_err(&p->pdev->dev, "bind err: %lld\n", rc); > - return -ENXIO; > + /* retry after unbinding */ > + if (rc == H_OVERLAP && !tried_unbind) { > + dev_warn(&p->pdev->dev, "Un-binding and retrying\n"); > + /* Try unbind and ignore any errors */ > + tried_unbind = true; > + drc_pmem_unbind(p); > + goto retry; I think It'd be cleaner if we put the unbind-and-retry logic into the probe function, e.g: diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 96c53b23e58f..d113779fc27c 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -316,6 +316,14 @@ static int papr_scm_probe(struct platform_device *pdev) /* request the hypervisor to bind this region to somewhere in memory */ rc = drc_pmem_bind(p); + if (rc == -EBUSY) { + /* + * If we kexec()ed the previous kernel might have left the DRC + * bound in memory. Unbind it and try again. + */ + drc_pmem_unbind(p); + rc = drc_pmem_bind(p); + } if (rc) goto err; Up to you though.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index d790e4e4ffb3..049d7927c0a4 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -44,19 +44,26 @@ struct papr_scm_priv { struct nd_interleave_set nd_set; }; +/* Forward declaration */ +static int drc_pmem_unbind(struct papr_scm_priv *); + static int drc_pmem_bind(struct papr_scm_priv *p) { unsigned long ret[PLPAR_HCALL_BUFSIZE]; uint64_t rc, token; - uint64_t saved = 0; + uint64_t saved; + bool tried_unbind = false; + dev_dbg(&p->pdev->dev, "bind drc %x\n", p->drc_index); /* * When the hypervisor cannot map all the requested memory in a single * hcall it returns H_BUSY and we call again with the token until * we get H_SUCCESS. Aborting the retry loop before getting H_SUCCESS * leave the system in an undefined state, so we wait. */ +retry: token = 0; + saved = 0; do { rc = plpar_hcall(H_SCM_BIND_MEM, ret, p->drc_index, 0, @@ -68,8 +75,18 @@ static int drc_pmem_bind(struct papr_scm_priv *p) } while (rc == H_BUSY); if (rc) { - dev_err(&p->pdev->dev, "bind err: %lld\n", rc); - return -ENXIO; + /* retry after unbinding */ + if (rc == H_OVERLAP && !tried_unbind) { + dev_warn(&p->pdev->dev, "Un-binding and retrying\n"); + /* Try unbind and ignore any errors */ + tried_unbind = true; + drc_pmem_unbind(p); + goto retry; + + } else { + dev_err(&p->pdev->dev, "bind err: %lld\n", rc); + return -ENXIO; + } } p->bound_addr = saved;
In some cases initial bind of scm memory for an lpar can fail if previously it wasn't released using a scm-unbind hcall. This situation can arise due to panic of the previous kernel or forced lpar reset. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error. To mitigate such cases the patch updates drc_pmem_bind() to force a call to drc_pmem_unbind() in case the initial bind of scm memory fails with H_OVERLAP error. In case scm-bind operation again fails after the forced scm-unbind then we follow the existing error path. Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> --- arch/powerpc/platforms/pseries/papr_scm.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)