diff mbox series

[2/2] hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure

Message ID 20210416210216.380291-3-danielhb413@gmail.com (mailing list archive)
State Accepted
Headers show
Series pseries: UNISOLATE DRCs to signal device removal error | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (0702e74703f57173e70cfab2a79a3e682e9e96ec)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
snowpatch_ozlabs/needsstable success Patch has no Fixes tags

Commit Message

Daniel Henrique Barboza April 16, 2021, 9:02 p.m. UTC
The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
for both QEMU and phyp. This gives us an opportunity to use this
behavior to signal the hypervisor layer when an error during device
removal happens, allowing it to do a proper error handling, while not
breaking QEMU/phyp implementations that don't have this support.

This patch introduces this idea by unisolating all CPU DRCs that failed
to be removed by dlpar_cpu_remove_by_index(), when handling the
PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
only because its the only CPU removal event QEMU uses, and there's no
need at this moment to add this mechanism for phyp only code.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

David Gibson April 19, 2021, 4:02 a.m. UTC | #1
On Fri, Apr 16, 2021 at 06:02:16PM -0300, Daniel Henrique Barboza wrote:
> The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
> already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
> for both QEMU and phyp. This gives us an opportunity to use this
> behavior to signal the hypervisor layer when an error during device
> removal happens, allowing it to do a proper error handling, while not
> breaking QEMU/phyp implementations that don't have this support.
> 
> This patch introduces this idea by unisolating all CPU DRCs that failed
> to be removed by dlpar_cpu_remove_by_index(), when handling the
> PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
> only because its the only CPU removal event QEMU uses, and there's no
> need at this moment to add this mechanism for phyp only code.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

except...

> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 12cbffd3c2e3..ed66895c2f51 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>  	case PSERIES_HP_ELOG_ACTION_REMOVE:
>  		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
>  			rc = dlpar_cpu_remove_by_count(count);
> -		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
> +		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
>  			rc = dlpar_cpu_remove_by_index(drc_index);
> +			/* Setting the isolation state of an UNISOLATED/CONFIGURED
> +			 * device to UNISOLATE is a no-op, but the hypervison can

typo here s/hypervison/hypervisor/

> +			 * use it as a hint that the cpu removal failed.
> +			 */
> +			if (rc)
> +				dlpar_unisolate_drc(drc_index);
> +		}
>  		else
>  			rc = -EINVAL;
>  		break;
Michael Ellerman April 19, 2021, 12:48 p.m. UTC | #2
Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
> already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
> for both QEMU and phyp. This gives us an opportunity to use this
> behavior to signal the hypervisor layer when an error during device
> removal happens, allowing it to do a proper error handling, while not
> breaking QEMU/phyp implementations that don't have this support.
>
> This patch introduces this idea by unisolating all CPU DRCs that failed
> to be removed by dlpar_cpu_remove_by_index(), when handling the
> PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
> only because its the only CPU removal event QEMU uses, and there's no
> need at this moment to add this mechanism for phyp only code.

Have you also confirmed that phyp is not bothered by it? ie. everything
seems to continue working when you trigger this path on phyp.

cheers

> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 12cbffd3c2e3..ed66895c2f51 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>  	case PSERIES_HP_ELOG_ACTION_REMOVE:
>  		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
>  			rc = dlpar_cpu_remove_by_count(count);
> -		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
> +		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
>  			rc = dlpar_cpu_remove_by_index(drc_index);
> +			/* Setting the isolation state of an UNISOLATED/CONFIGURED
> +			 * device to UNISOLATE is a no-op, but the hypervison can
> +			 * use it as a hint that the cpu removal failed.
> +			 */
> +			if (rc)
> +				dlpar_unisolate_drc(drc_index);
> +		}
>  		else
>  			rc = -EINVAL;
>  		break;
> -- 
> 2.30.2
Daniel Henrique Barboza April 19, 2021, 1:13 p.m. UTC | #3
On 4/19/21 9:48 AM, Michael Ellerman wrote:
> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
>> The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
>> already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
>> for both QEMU and phyp. This gives us an opportunity to use this
>> behavior to signal the hypervisor layer when an error during device
>> removal happens, allowing it to do a proper error handling, while not
>> breaking QEMU/phyp implementations that don't have this support.
>>
>> This patch introduces this idea by unisolating all CPU DRCs that failed
>> to be removed by dlpar_cpu_remove_by_index(), when handling the
>> PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
>> only because its the only CPU removal event QEMU uses, and there's no
>> need at this moment to add this mechanism for phyp only code.
> 
> Have you also confirmed that phyp is not bothered by it? ie. everything
> seems to continue working when you trigger this path on phyp.

Yes. Daniel Bueso (dbuesom@us.ibm.com) from the partition firmware team
helped me with that. We confirmed that phyp returns RTAS_OK under these
conditions (Unisolating an unisolated/configured DRC).


Thanks,


DHB

> 
> cheers
> 
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 12cbffd3c2e3..ed66895c2f51 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -802,8 +802,15 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>>   	case PSERIES_HP_ELOG_ACTION_REMOVE:
>>   		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
>>   			rc = dlpar_cpu_remove_by_count(count);
>> -		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
>> +		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
>>   			rc = dlpar_cpu_remove_by_index(drc_index);
>> +			/* Setting the isolation state of an UNISOLATED/CONFIGURED
>> +			 * device to UNISOLATE is a no-op, but the hypervison can
>> +			 * use it as a hint that the cpu removal failed.
>> +			 */
>> +			if (rc)
>> +				dlpar_unisolate_drc(drc_index);
>> +		}
>>   		else
>>   			rc = -EINVAL;
>>   		break;
>> -- 
>> 2.30.2
Michael Ellerman April 20, 2021, 3:57 a.m. UTC | #4
Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> On 4/19/21 9:48 AM, Michael Ellerman wrote:
>> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
>>> The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is
>>> already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else
>>> for both QEMU and phyp. This gives us an opportunity to use this
>>> behavior to signal the hypervisor layer when an error during device
>>> removal happens, allowing it to do a proper error handling, while not
>>> breaking QEMU/phyp implementations that don't have this support.
>>>
>>> This patch introduces this idea by unisolating all CPU DRCs that failed
>>> to be removed by dlpar_cpu_remove_by_index(), when handling the
>>> PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event
>>> only because its the only CPU removal event QEMU uses, and there's no
>>> need at this moment to add this mechanism for phyp only code.
>> 
>> Have you also confirmed that phyp is not bothered by it? ie. everything
>> seems to continue working when you trigger this path on phyp.
>
> Yes. Daniel Bueso (dbuesom@us.ibm.com) from the partition firmware team
> helped me with that. We confirmed that phyp returns RTAS_OK under these
> conditions (Unisolating an unisolated/configured DRC).

Thanks.

cheers
diff mbox series

Patch

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..ed66895c2f51 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -802,8 +802,15 @@  int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
 	case PSERIES_HP_ELOG_ACTION_REMOVE:
 		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
 			rc = dlpar_cpu_remove_by_count(count);
-		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
+		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
 			rc = dlpar_cpu_remove_by_index(drc_index);
+			/* Setting the isolation state of an UNISOLATED/CONFIGURED
+			 * device to UNISOLATE is a no-op, but the hypervison can
+			 * use it as a hint that the cpu removal failed.
+			 */
+			if (rc)
+				dlpar_unisolate_drc(drc_index);
+		}
 		else
 			rc = -EINVAL;
 		break;