diff mbox series

[v5,4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails

Message ID 20190723161357.26718-5-vaibhav@linux.ibm.com (mailing list archive)
State Superseded
Headers show
Series powerpc/papr_scm: Workaround for failure of drc bind after kexec | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch next (f3365d1a959d5c6527efe3d38276acc9b58e3f3f)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch warning total: 0 errors, 0 warnings, 2 checks, 34 lines checked

Commit Message

Vaibhav Jain July 23, 2019, 4:13 p.m. UTC
In some cases initial bind of scm memory for an lpar can fail if
previously it wasn't released using a scm-unbind hcall. This situation
can arise due to panic of the previous kernel or forced lpar
fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error.

To mitigate such cases the patch updates papr_scm_probe() to force a
call to drc_pmem_unbind() in case the initial bind of scm memory fails
with EBUSY error. In case scm-bind operation again fails after the
forced scm-unbind then we follow the existing error path. We also
update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
and indicate it as a EBUSY error back to the caller.

Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
---
Change-log:

v5:
* None. Re-spinning the patchset.

v4:
* None. Re-spinning the patchset.

v3:
* Minor update to a code comment. [Oliver]

v2:
* Moved the retry code from drc_pmem_bind() to papr_scm_probe()
  [Oliver]
* Changed the type of variable 'rc' in drc_pmem_bind() to
  int64_t. [Oliver]
---
 arch/powerpc/platforms/pseries/papr_scm.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Laurent Dufour July 24, 2019, 9:17 a.m. UTC | #1
Le 23/07/2019 à 18:13, Vaibhav Jain a écrit :
> In some cases initial bind of scm memory for an lpar can fail if
> previously it wasn't released using a scm-unbind hcall. This situation
> can arise due to panic of the previous kernel or forced lpar
> fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error.
> 
> To mitigate such cases the patch updates papr_scm_probe() to force a
> call to drc_pmem_unbind() in case the initial bind of scm memory fails
> with EBUSY error. In case scm-bind operation again fails after the
> forced scm-unbind then we follow the existing error path. We also
> update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
> and indicate it as a EBUSY error back to the caller.
> 
> Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
> ---
> Change-log:
> 
> v5:
> * None. Re-spinning the patchset.
> 
> v4:
> * None. Re-spinning the patchset.
> 
> v3:
> * Minor update to a code comment. [Oliver]
> 
> v2:
> * Moved the retry code from drc_pmem_bind() to papr_scm_probe()
>    [Oliver]
> * Changed the type of variable 'rc' in drc_pmem_bind() to
>    int64_t. [Oliver]
> ---
>   arch/powerpc/platforms/pseries/papr_scm.c | 15 ++++++++++++++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 82568a7e0a7c..2c07908359b2 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -44,8 +44,9 @@ struct papr_scm_priv {
>   static int drc_pmem_bind(struct papr_scm_priv *p)
>   {
>   	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> -	uint64_t rc, token;
>   	uint64_t saved = 0;
> +	uint64_t token;
> +	int64_t rc;
> 
>   	/*
>   	 * When the hypervisor cannot map all the requested memory in a single
> @@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
>   	} while (rc == H_BUSY);
> 
>   	if (rc) {
> +		/* H_OVERLAP needs a separate error path */
> +		if (rc == H_OVERLAP)
> +			return -EBUSY;
> +
>   		dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
>   		return -ENXIO;
>   	}
> @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev)
> 
>   	/* request the hypervisor to bind this region to somewhere in memory */
>   	rc = drc_pmem_bind(p);
> +
> +	/* If phyp says drc memory still bound then force unbound and retry */
> +	if (rc == -EBUSY) {
> +		dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> +		drc_pmem_unbind(p);
> +		rc = drc_pmem_bind(p);

In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the 
unbinding has been done, the error would be silently processed. That sounds 
really unlikely, but should an error message be displayed in this 
particular case ?

> +	}
> +
>   	if (rc)
>   		goto err;
>
Oliver O'Halloran July 24, 2019, 9:24 a.m. UTC | #2
On Wed, Jul 24, 2019 at 7:17 PM Laurent Dufour
<ldufour@linux.vnet.ibm.com> wrote:
>
> Le 23/07/2019 à 18:13, Vaibhav Jain a écrit :
> > *snip*
> > @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev)
> >
> >       /* request the hypervisor to bind this region to somewhere in memory */
> >       rc = drc_pmem_bind(p);
> > +
> > +     /* If phyp says drc memory still bound then force unbound and retry */
> > +     if (rc == -EBUSY) {
> > +             dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> > +             drc_pmem_unbind(p);
> > +             rc = drc_pmem_bind(p);
>
> In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the
> unbinding has been done, the error would be silently processed. That sounds
> really unlikely, but should an error message be displayed in this
> particular case ?

drc_pmem_bind() prints the h-call error code if we get one, so it's not silent
Laurent Dufour July 24, 2019, 9:27 a.m. UTC | #3
Le 24/07/2019 à 11:24, Oliver O'Halloran a écrit :
> On Wed, Jul 24, 2019 at 7:17 PM Laurent Dufour
> <ldufour@linux.vnet.ibm.com> wrote:
>>
>> Le 23/07/2019 à 18:13, Vaibhav Jain a écrit :
>>> *snip*
>>> @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev)
>>>
>>>        /* request the hypervisor to bind this region to somewhere in memory */
>>>        rc = drc_pmem_bind(p);
>>> +
>>> +     /* If phyp says drc memory still bound then force unbound and retry */
>>> +     if (rc == -EBUSY) {
>>> +             dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
>>> +             drc_pmem_unbind(p);
>>> +             rc = drc_pmem_bind(p);
>>
>> In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the
>> unbinding has been done, the error would be silently processed. That sounds
>> really unlikely, but should an error message be displayed in this
>> particular case ?
> 
> drc_pmem_bind() prints the h-call error code if we get one, so it's not silent

That's no more the case whith this patch, H_OVERLAP is handled before 
writing the error message, which would make sense for the first try.

For the record, the patch introduces:

@@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
  	} while (rc == H_BUSY);

  	if (rc) {
+		/* H_OVERLAP needs a separate error path */
+		if (rc == H_OVERLAP)
+			return -EBUSY;
+
  		dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
  		return -ENXIO;
  	}
Oliver O'Halloran July 24, 2019, 9:45 a.m. UTC | #4
On Wed, Jul 24, 2019 at 7:27 PM Laurent Dufour
<ldufour@linux.vnet.ibm.com> wrote:
>
> Le 24/07/2019 à 11:24, Oliver O'Halloran a écrit :
> > On Wed, Jul 24, 2019 at 7:17 PM Laurent Dufour
> > <ldufour@linux.vnet.ibm.com> wrote:
> >>
> >> Le 23/07/2019 à 18:13, Vaibhav Jain a écrit :
> >>> *snip*
> >>> @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev)
> >>>
> >>>        /* request the hypervisor to bind this region to somewhere in memory */
> >>>        rc = drc_pmem_bind(p);
> >>> +
> >>> +     /* If phyp says drc memory still bound then force unbound and retry */
> >>> +     if (rc == -EBUSY) {
> >>> +             dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> >>> +             drc_pmem_unbind(p);
> >>> +             rc = drc_pmem_bind(p);
> >>
> >> In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the
> >> unbinding has been done, the error would be silently processed. That sounds
> >> really unlikely, but should an error message be displayed in this
> >> particular case ?
> >
> > drc_pmem_bind() prints the h-call error code if we get one, so it's not silent
>
> That's no more the case whith this patch, H_OVERLAP is handled before
> writing the error message, which would make sense for the first try.
>
> For the record, the patch introduces:
>
> @@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
>         } while (rc == H_BUSY);
>
>         if (rc) {
> +               /* H_OVERLAP needs a separate error path */
> +               if (rc == H_OVERLAP)
> +                       return -EBUSY;
> +
>                 dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
>                 return -ENXIO;
>         }

Ah, good point. Getting H_OVERLAP is still an error case so I think
it's reasonable to still print the message in that case.
diff mbox series

Patch

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 82568a7e0a7c..2c07908359b2 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -44,8 +44,9 @@  struct papr_scm_priv {
 static int drc_pmem_bind(struct papr_scm_priv *p)
 {
 	unsigned long ret[PLPAR_HCALL_BUFSIZE];
-	uint64_t rc, token;
 	uint64_t saved = 0;
+	uint64_t token;
+	int64_t rc;
 
 	/*
 	 * When the hypervisor cannot map all the requested memory in a single
@@ -65,6 +66,10 @@  static int drc_pmem_bind(struct papr_scm_priv *p)
 	} while (rc == H_BUSY);
 
 	if (rc) {
+		/* H_OVERLAP needs a separate error path */
+		if (rc == H_OVERLAP)
+			return -EBUSY;
+
 		dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
 		return -ENXIO;
 	}
@@ -404,6 +409,14 @@  static int papr_scm_probe(struct platform_device *pdev)
 
 	/* request the hypervisor to bind this region to somewhere in memory */
 	rc = drc_pmem_bind(p);
+
+	/* If phyp says drc memory still bound then force unbound and retry */
+	if (rc == -EBUSY) {
+		dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
+		drc_pmem_unbind(p);
+		rc = drc_pmem_bind(p);
+	}
+
 	if (rc)
 		goto err;