Message ID | 20190723161357.26718-5-vaibhav@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | powerpc/papr_scm: Workaround for failure of drc bind after kexec | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (f3365d1a959d5c6527efe3d38276acc9b58e3f3f) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | warning | total: 0 errors, 0 warnings, 2 checks, 34 lines checked |
Le 23/07/2019 à 18:13, Vaibhav Jain a écrit : > In some cases initial bind of scm memory for an lpar can fail if > previously it wasn't released using a scm-unbind hcall. This situation > can arise due to panic of the previous kernel or forced lpar > fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error. > > To mitigate such cases the patch updates papr_scm_probe() to force a > call to drc_pmem_unbind() in case the initial bind of scm memory fails > with EBUSY error. In case scm-bind operation again fails after the > forced scm-unbind then we follow the existing error path. We also > update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp > and indicate it as a EBUSY error back to the caller. > > Suggested-by: "Oliver O'Halloran" <oohall@gmail.com> > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > Reviewed-by: Oliver O'Halloran <oohall@gmail.com> > --- > Change-log: > > v5: > * None. Re-spinning the patchset. > > v4: > * None. Re-spinning the patchset. > > v3: > * Minor update to a code comment. [Oliver] > > v2: > * Moved the retry code from drc_pmem_bind() to papr_scm_probe() > [Oliver] > * Changed the type of variable 'rc' in drc_pmem_bind() to > int64_t. [Oliver] > --- > arch/powerpc/platforms/pseries/papr_scm.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c > index 82568a7e0a7c..2c07908359b2 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -44,8 +44,9 @@ struct papr_scm_priv { > static int drc_pmem_bind(struct papr_scm_priv *p) > { > unsigned long ret[PLPAR_HCALL_BUFSIZE]; > - uint64_t rc, token; > uint64_t saved = 0; > + uint64_t token; > + int64_t rc; > > /* > * When the hypervisor cannot map all the requested memory in a single > @@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p) > } while (rc == H_BUSY); > > if (rc) { > + /* H_OVERLAP needs a separate error path */ > + if (rc == H_OVERLAP) > + return -EBUSY; > + > dev_err(&p->pdev->dev, "bind err: %lld\n", rc); > return -ENXIO; > } > @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev) > > /* request the hypervisor to bind this region to somewhere in memory */ > rc = drc_pmem_bind(p); > + > + /* If phyp says drc memory still bound then force unbound and retry */ > + if (rc == -EBUSY) { > + dev_warn(&pdev->dev, "Retrying bind after unbinding\n"); > + drc_pmem_unbind(p); > + rc = drc_pmem_bind(p); In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the unbinding has been done, the error would be silently processed. That sounds really unlikely, but should an error message be displayed in this particular case ? > + } > + > if (rc) > goto err; >
On Wed, Jul 24, 2019 at 7:17 PM Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote: > > Le 23/07/2019 à 18:13, Vaibhav Jain a écrit : > > *snip* > > @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev) > > > > /* request the hypervisor to bind this region to somewhere in memory */ > > rc = drc_pmem_bind(p); > > + > > + /* If phyp says drc memory still bound then force unbound and retry */ > > + if (rc == -EBUSY) { > > + dev_warn(&pdev->dev, "Retrying bind after unbinding\n"); > > + drc_pmem_unbind(p); > > + rc = drc_pmem_bind(p); > > In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the > unbinding has been done, the error would be silently processed. That sounds > really unlikely, but should an error message be displayed in this > particular case ? drc_pmem_bind() prints the h-call error code if we get one, so it's not silent
Le 24/07/2019 à 11:24, Oliver O'Halloran a écrit : > On Wed, Jul 24, 2019 at 7:17 PM Laurent Dufour > <ldufour@linux.vnet.ibm.com> wrote: >> >> Le 23/07/2019 à 18:13, Vaibhav Jain a écrit : >>> *snip* >>> @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev) >>> >>> /* request the hypervisor to bind this region to somewhere in memory */ >>> rc = drc_pmem_bind(p); >>> + >>> + /* If phyp says drc memory still bound then force unbound and retry */ >>> + if (rc == -EBUSY) { >>> + dev_warn(&pdev->dev, "Retrying bind after unbinding\n"); >>> + drc_pmem_unbind(p); >>> + rc = drc_pmem_bind(p); >> >> In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the >> unbinding has been done, the error would be silently processed. That sounds >> really unlikely, but should an error message be displayed in this >> particular case ? > > drc_pmem_bind() prints the h-call error code if we get one, so it's not silent That's no more the case whith this patch, H_OVERLAP is handled before writing the error message, which would make sense for the first try. For the record, the patch introduces: @@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p) } while (rc == H_BUSY); if (rc) { + /* H_OVERLAP needs a separate error path */ + if (rc == H_OVERLAP) + return -EBUSY; + dev_err(&p->pdev->dev, "bind err: %lld\n", rc); return -ENXIO; }
On Wed, Jul 24, 2019 at 7:27 PM Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote: > > Le 24/07/2019 à 11:24, Oliver O'Halloran a écrit : > > On Wed, Jul 24, 2019 at 7:17 PM Laurent Dufour > > <ldufour@linux.vnet.ibm.com> wrote: > >> > >> Le 23/07/2019 à 18:13, Vaibhav Jain a écrit : > >>> *snip* > >>> @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev) > >>> > >>> /* request the hypervisor to bind this region to somewhere in memory */ > >>> rc = drc_pmem_bind(p); > >>> + > >>> + /* If phyp says drc memory still bound then force unbound and retry */ > >>> + if (rc == -EBUSY) { > >>> + dev_warn(&pdev->dev, "Retrying bind after unbinding\n"); > >>> + drc_pmem_unbind(p); > >>> + rc = drc_pmem_bind(p); > >> > >> In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the > >> unbinding has been done, the error would be silently processed. That sounds > >> really unlikely, but should an error message be displayed in this > >> particular case ? > > > > drc_pmem_bind() prints the h-call error code if we get one, so it's not silent > > That's no more the case whith this patch, H_OVERLAP is handled before > writing the error message, which would make sense for the first try. > > For the record, the patch introduces: > > @@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p) > } while (rc == H_BUSY); > > if (rc) { > + /* H_OVERLAP needs a separate error path */ > + if (rc == H_OVERLAP) > + return -EBUSY; > + > dev_err(&p->pdev->dev, "bind err: %lld\n", rc); > return -ENXIO; > } Ah, good point. Getting H_OVERLAP is still an error case so I think it's reasonable to still print the message in that case.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 82568a7e0a7c..2c07908359b2 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -44,8 +44,9 @@ struct papr_scm_priv { static int drc_pmem_bind(struct papr_scm_priv *p) { unsigned long ret[PLPAR_HCALL_BUFSIZE]; - uint64_t rc, token; uint64_t saved = 0; + uint64_t token; + int64_t rc; /* * When the hypervisor cannot map all the requested memory in a single @@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p) } while (rc == H_BUSY); if (rc) { + /* H_OVERLAP needs a separate error path */ + if (rc == H_OVERLAP) + return -EBUSY; + dev_err(&p->pdev->dev, "bind err: %lld\n", rc); return -ENXIO; } @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev) /* request the hypervisor to bind this region to somewhere in memory */ rc = drc_pmem_bind(p); + + /* If phyp says drc memory still bound then force unbound and retry */ + if (rc == -EBUSY) { + dev_warn(&pdev->dev, "Retrying bind after unbinding\n"); + drc_pmem_unbind(p); + rc = drc_pmem_bind(p); + } + if (rc) goto err;