
[v3,1/1] hotplug-cpu.c: show 'last online CPU' error in dlpar_cpu_offline()

Message ID 20210326141954.236323-2-danielhb413@gmail.com (mailing list archive)
State Superseded
Series show 'last online CPU' error in dlpar_cpu_offline()

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (87d76f542a24ecfa797e9bd3bb56c0f19aabff57)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
snowpatch_ozlabs/needsstable success Patch has no Fixes tags

Commit Message

Daniel Henrique Barboza March 26, 2021, 2:19 p.m. UTC
One of the reasons that dlpar_cpu_offline can fail is when attempting to
offline the last online CPU of the kernel. This can be observed in a
pseries QEMU guest that has hotplugged CPUs. If the user offlines all
other CPUs of the guest, and a hotplugged CPU is now the last online
CPU, trying to reclaim it will fail. See [1] for an example.

The current error message in this situation returns rc with -EBUSY and a
generic explanation, e.g.:

pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9, rc: -16

EBUSY can be caused by other conditions, such as cpu_hotplug_disable
being true. Emitting a more specific error message for this case,
instead of just "Failed to offline CPU", makes it clear that this is
a known error condition rather than some other generic or unknown
cause.

This patch adds a 'last online' check in dlpar_cpu_offline() to catch
the 'last online CPU' offline error, returning a more informative error
message:

pseries-hotplug-cpu: Unable to remove last online CPU PowerPC,POWER9

[1] https://bugzilla.redhat.com/1911414

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Andrew Donnellan March 26, 2021, 3:30 p.m. UTC | #1
On 27/3/21 1:19 am, Daniel Henrique Barboza wrote:
> One of the reasons that dlpar_cpu_offline can fail is when attempting to
> offline the last online CPU of the kernel. This can be observed in a
> pseries QEMU guest that has hotplugged CPUs. If the user offlines all
> other CPUs of the guest, and a hotplugged CPU is now the last online
> CPU, trying to reclaim it will fail. See [1] for an example.
> 
> The current error message in this situation returns rc with -EBUSY and a
> generic explanation, e.g.:
> 
> pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9, rc: -16
> 
> EBUSY can be caused by other conditions, such as cpu_hotplug_disable
> being true. Throwing a more specific error message for this case,
> instead of just "Failed to offline CPU", makes it clearer that the error
> is in fact a known error situation instead of other generic/unknown
> cause.
> 
> This patch adds a 'last online' check in dlpar_cpu_offline() to catch
> the 'last online CPU' offline error, returning a more informative error
> message:
> 
> pseries-hotplug-cpu: Unable to remove last online CPU PowerPC,POWER9
> 
> [1] https://bugzilla.redhat.com/1911414
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Thanks for addressing the issues in Daniel's review.

I haven't tested it, but this patch looks sensible enough to me.

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

Patch

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..4b9df4d645b4 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -271,6 +271,19 @@  static int dlpar_offline_cpu(struct device_node *dn)
 			if (!cpu_online(cpu))
 				break;
 
+			/* device_offline() will return -EBUSY (via cpu_down())
+			 * if there is only one CPU left. Check it here to fail
+			 * earlier and with a more informative error message,
+			 * while also retaining the cpu_add_remove_lock to be sure
+			 * that no CPUs are being onlined/offlined during this
+			 * check.
+			 */
+			if (num_online_cpus() == 1) {
+				pr_warn("Unable to remove last online CPU %pOFn\n", dn);
+				rc = -EBUSY;
+				goto out_unlock;
+			}
+
 			cpu_maps_update_done();
 			rc = device_offline(get_cpu_device(cpu));
 			if (rc)
@@ -283,6 +296,7 @@  static int dlpar_offline_cpu(struct device_node *dn)
 				thread);
 		}
 	}
+out_unlock:
 	cpu_maps_update_done();
 
 out:
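
For readers who want to see the early-exit behaviour in isolation, here is a
minimal user-space sketch of the check the hunk above adds. It is not kernel
code. The names cpu_is_online, num_online() and offline_cpu() are hypothetical
stand-ins for the kernel's online mask, num_online_cpus() and the
dlpar_offline_cpu() loop body, and locking is left out entirely. It only
illustrates the ordering: an already-offline CPU is skipped, and the
last-online case is refused with -EBUSY and a specific message before any
state changes.

```c
#include <errno.h>
#include <stdio.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the kernel's online CPU mask. */
static int cpu_is_online[NR_CPUS] = {1, 1, 1, 1};

/* Hypothetical stand-in for num_online_cpus(). */
static int num_online(void)
{
	int n = 0;

	for (int i = 0; i < NR_CPUS; i++)
		n += cpu_is_online[i];
	return n;
}

/*
 * Sketch of the patched control flow for a single CPU.
 * Returns 0 on success (or if the CPU is already offline),
 * -EINVAL for an out-of-range CPU, and -EBUSY if the CPU is
 * the last one online.
 */
static int offline_cpu(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS)
		return -EINVAL;

	/* Mirrors the existing !cpu_online(cpu) check: nothing to do. */
	if (!cpu_is_online[cpu])
		return 0;

	/* The new check: fail early with a specific message. */
	if (num_online() == 1) {
		fprintf(stderr, "Unable to remove last online CPU %d\n", cpu);
		return -EBUSY;
	}

	cpu_is_online[cpu] = 0;
	return 0;
}
```

Calling offline_cpu() on CPUs 0, 1 and 2 in turn succeeds in this sketch. A
subsequent call on CPU 3 returns -EBUSY and leaves it online, which is the
situation described in the commit message.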