Message ID | 20210114180628.1675603-8-danielhb413@gmail.com |
---|---|
State | New |
Series | pseries: avoid unplug the last online CPU core + assorted fixes |
On Thu, Jan 14, 2021 at 03:06:28PM -0300, Daniel Henrique Barboza wrote:
> The only restriction we have when unplugging CPUs is to forbid unplug of
> the boot cpu core. spapr_core_unplug_possible() does not contemplate the
> possibility of some cores being offlined by the guest, meaning that we're
> rolling the dice regarding on whether we're unplugging the last online
> CPU core the guest has.
>
> If we hit the jackpot, we're going to detach the core DRC and pulse the
> hotplug IRQ, but the guest OS will refuse to release the CPU. Our
> spapr_core_unplug() DRC release callback will never be called and the CPU
> core object will keep existing in QEMU. No error message will be sent
> to the user, but the CPU core wasn't unplugged from the guest.
>
> If the guest OS onlines the CPU core again we won't be able to hotunplug it
> either. 'dmesg' inside the guest will report a failed attempt to offline an
> unknown CPU:
>
> [ 923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16
>
> This is the result of stopping the DRC state transition in the middle in the
> first failed attempt.
>
> We can avoid this, and potentially other bad things from happening, if we
> avoid to attempt the unplug altogether in this scenario. Let's check for
> the online/offline state of the CPU cores in the guest before allowing
> the hotunplug, and forbid removing a CPU core if it's the last one online
> in the guest.

Good explanation overall, but I think it would be a bit clearer and
more direct if you removed the "roll the dice" / "hit the jackpot"
metaphor.

> Reported-by: Xujun Ma <xuma@redhat.com>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a2f01c21aa..d269dcd102 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3709,9 +3709,16 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev)
>  static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc,
>                                        Error **errp)

This will need a small rework w.r.t. my suggestions for the previous
patch, obviously.

>  {
> +    CPUArchId *core_slot;
> +    SpaprCpuCore *core;
> +    PowerPCCPU *cpu;
> +    CPUState *cs;
> +    bool last_cpu_online = true;
>      int index;
>
> -    if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) {
> +    core_slot = spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id,
> +                                    &index);
> +    if (!core_slot) {
>          error_setg(errp, "Unable to find CPU core with core-id: %d",
>                     cc->core_id);
>          return -1;
> @@ -3722,6 +3729,36 @@ static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc,
>          return -1;
>      }
>
> +    /* Allow for any non-boot CPU core to be unplugged if already offline */
> +    core = SPAPR_CPU_CORE(core_slot->cpu);
> +    cs = CPU(core->threads[0]);
> +    if (cs->halted) {
> +        return 0;
> +    }

I think you need to check that *all* the cpu's threads are offline,
not just thread 0, for this to be correct.

> +
> +    /*
> +     * Do not allow core unplug if it's the last core online.
> +     */
> +    cpu = POWERPC_CPU(cs);
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *c = POWERPC_CPU(cs);
> +
> +        if (c == cpu) {
> +            continue;
> +        }
> +
> +        if (!cs->halted) {
> +            last_cpu_online = false;
> +            break;
> +        }
> +    }

Likewise here I think your logic needs more careful handling of
threads - you need to disallow the unplug if all of the currently
online threads are on the core slated for removal.

I'm also a little bit worried about whether just checking cs->halted
is sufficient. That's a qemu/tcg core concept that I think may be set
in some circumstances when the CPU is *not* offline. The logic of the
suspend-me RTAS call is specifically to both set halted *and* to block
interrupts so there's nothing that can pull the vcpu out of halted
state.

It's possible that handling this correctly will require adding some
qemu internal state to explicitly track the "online" state of a vcpu
as managed by "suspend-me" and "start-cpu" RTAS calls.

> +
> +    if (last_cpu_online) {
> +        error_setg(errp, "Unable to unplug CPU core with core-id %d: it is "
> +                   "the only CPU core online in the guest", cc->core_id);
> +        return -1;
> +    }
> +
>      return 0;
>  }
>
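The two thread-handling remarks in the review above can be sketched in isolation. This is only an illustration under stated assumptions: `SketchThread`, `SketchCore`, and both functions are hypothetical stand-ins, not QEMU types or APIs, and the `online` flag is assumed to be an explicit per-thread state rather than `cs->halted`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one hardware thread (not a QEMU type). */
typedef struct {
    bool online;   /* assumed explicit flag, not QEMU's cs->halted */
} SketchThread;

/* Hypothetical stand-in for a CPU core made of several threads. */
typedef struct {
    SketchThread *threads;
    size_t nr_threads;
} SketchCore;

/* A core counts as offline only if every one of its threads is offline. */
static bool sketch_core_is_offline(const SketchCore *core)
{
    assert(core != NULL);
    for (size_t i = 0; i < core->nr_threads; i++) {
        if (core->threads[i].online) {
            return false;
        }
    }
    return true;
}

/*
 * Returns true if unplugging cores[target] looks acceptable: either the
 * target is already fully offline, or at least one thread on some other
 * core is still online.
 */
static bool sketch_unplug_possible(const SketchCore *cores, size_t nr_cores,
                                   size_t target)
{
    assert(cores != NULL && target < nr_cores);
    if (sketch_core_is_offline(&cores[target])) {
        return true;
    }
    for (size_t i = 0; i < nr_cores; i++) {
        if (i != target && !sketch_core_is_offline(&cores[i])) {
            return true;
        }
    }
    return false;
}
```

With a core whose thread 0 is offline but whose thread 1 is online, a check that looks only at thread 0 would wrongly treat the core as offline; the per-thread loop above avoids that.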
On Thu, 14 Jan 2021 15:06:28 -0300 Daniel Henrique Barboza <danielhb413@gmail.com> wrote: > The only restriction we have when unplugging CPUs is to forbid unplug of > the boot cpu core. spapr_core_unplug_possible() does not contemplate the I can't remember why this restriction was introduced in the first place... This should be investigated and documented if the limitation still stands. > possibility of some cores being offlined by the guest, meaning that we're > rolling the dice regarding on whether we're unplugging the last online > CPU core the guest has. > Trying to unplug the last CPU is obviously something that deserves special care. LoPAPR is quite explicit on the outcome : this should terminate the partition. 13.7.4.1.1. Isolation of CPUs The isolation of a CPU, in all cases, is preceded by the stop-self RTAS function for all processor threads, and the OS insures that all the CPU’s threads are in the RTAS stopped state prior to isolating the CPU. Isolation of a processor that is not stopped produces unpredictable results. The stopping of the last processor thread of a LPAR partition effectively kills the partition, and at that point, ownership of all partition resources reverts to the platform firmware. R1-13.7.4.1.1-1. For the LRDR option: Prior to issuing the RTAS set-indicator specifying isolate isolation-state of a CPU DR connector type, all the CPU threads must be in the RTAS stopped state. R1-13.7.4.1.1-2. For the LRDR option: Stopping of the last processor thread of a LPAR partition with the stop-self RTAS function, must kill the partition, with ownership of all partition resources reverting to the platform firmware. This is clearly not how things work today : linux doesn't call "stop-self" on the last vCPU and even if it did, QEMU doesn't terminate the VM. If there's a valid reason to not implement this PAPR behavior, I'd like it to be documented. 
> If we hit the jackpot, we're going to detach the core DRC and pulse the > hotplug IRQ, but the guest OS will refuse to release the CPU. Our > spapr_core_unplug() DRC release callback will never be called and the CPU > core object will keep existing in QEMU. No error message will be sent > to the user, but the CPU core wasn't unplugged from the guest. > > If the guest OS onlines the CPU core again we won't be able to hotunplug it > either. 'dmesg' inside the guest will report a failed attempt to offline an > unknown CPU: > > [ 923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16 > > This is the result of stopping the DRC state transition in the middle in the > first failed attempt. > Yes, at this point only a machine reset can fix things up. Given this is linux's choice not to call "stop-self" as it should do, I'm not super fan of hardcoding this logic in QEMU, unless there are really good reasons to do so. > We can avoid this, and potentially other bad things from happening, if we > avoid to attempt the unplug altogether in this scenario. Let's check for > the online/offline state of the CPU cores in the guest before allowing > the hotunplug, and forbid removing a CPU core if it's the last one online > in the guest. 
> > Reported-by: Xujun Ma <xuma@redhat.com> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414 > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> > --- > hw/ppc/spapr.c | 39 ++++++++++++++++++++++++++++++++++++++- > 1 file changed, 38 insertions(+), 1 deletion(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index a2f01c21aa..d269dcd102 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -3709,9 +3709,16 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev) > static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc, > Error **errp) > { > + CPUArchId *core_slot; > + SpaprCpuCore *core; > + PowerPCCPU *cpu; > + CPUState *cs; > + bool last_cpu_online = true; > int index; > > - if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) { > + core_slot = spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, > + &index); > + if (!core_slot) { > error_setg(errp, "Unable to find CPU core with core-id: %d", > cc->core_id); > return -1; > @@ -3722,6 +3729,36 @@ static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc, > return -1; > } > > + /* Allow for any non-boot CPU core to be unplugged if already offline */ > + core = SPAPR_CPU_CORE(core_slot->cpu); > + cs = CPU(core->threads[0]); > + if (cs->halted) { > + return 0; > + } > + > + /* > + * Do not allow core unplug if it's the last core online. > + */ > + cpu = POWERPC_CPU(cs); > + CPU_FOREACH(cs) { > + PowerPCCPU *c = POWERPC_CPU(cs); > + > + if (c == cpu) { > + continue; > + } > + > + if (!cs->halted) { > + last_cpu_online = false; > + break; > + } > + } > + > + if (last_cpu_online) { > + error_setg(errp, "Unable to unplug CPU core with core-id %d: it is " > + "the only CPU core online in the guest", cc->core_id); > + return -1; > + } > + > return 0; > } >
On 1/15/21 2:22 PM, Greg Kurz wrote: > On Thu, 14 Jan 2021 15:06:28 -0300 > Daniel Henrique Barboza <danielhb413@gmail.com> wrote: > >> The only restriction we have when unplugging CPUs is to forbid unplug of >> the boot cpu core. spapr_core_unplug_possible() does not contemplate the I'll look into it. > > I can't remember why this restriction was introduced in the first place... > This should be investigated and documented if the limitation still stands. > >> possibility of some cores being offlined by the guest, meaning that we're >> rolling the dice regarding on whether we're unplugging the last online >> CPU core the guest has. >> > > Trying to unplug the last CPU is obviously something that deserves > special care. LoPAPR is quite explicit on the outcome : this should > terminate the partition. > > 13.7.4.1.1. Isolation of CPUs > > The isolation of a CPU, in all cases, is preceded by the stop-self > RTAS function for all processor threads, and the OS insures that all > the CPU’s threads are in the RTAS stopped state prior to isolating the > CPU. Isolation of a processor that is not stopped produces unpredictable > results. The stopping of the last processor thread of a LPAR partition > effectively kills the partition, and at that point, ownership of all > partition resources reverts to the platform firmware. I was just investigating a reason why we should check for all thread states before unplugging the core, like David suggested in his reply. rtas_stop_self() was setting 'cs->halted = 1' without a thread activity check like ibm,suspend-me() does and I was wondering why. This text you sent explains it, quoting: "> The isolation of a CPU, in all cases, is preceded by the stop-self RTAS function for all processor threads, and the OS insures that all the CPU’s threads are in the RTAS stopped state prior to isolating the CPU." 
This seems to be corroborated by arch/powerpc/platforms/pseries/hotplug-cpu.c:

static void pseries_cpu_offline_self(void)
{
    unsigned int hwcpu = hard_smp_processor_id();

    local_irq_disable();
    idle_task_exit();
    if (xive_enabled())
        xive_teardown_cpu();
    else
        xics_teardown_cpu();

    unregister_slb_shadow(hwcpu);
    rtas_stop_self();

    /* Should never get here... */
    BUG();
    for(;;);
}

IIUC this means that we can rely on cs->halted = 1 since it's coming right
after a rtas_stop_self() call. This is still a bit confusing though and I
wouldn't mind standardizing the 'CPU core is offline' condition with what
ibm,suspend-me is already doing.

David, what do you think?

>
> R1-13.7.4.1.1-1. For the LRDR option: Prior to issuing the RTAS
> set-indicator specifying isolate isolation-state of a CPU DR
> connector type, all the CPU threads must be in the RTAS stopped
> state.
>
> R1-13.7.4.1.1-2. For the LRDR option: Stopping of the last processor
> thread of a LPAR partition with the stop-self RTAS function, must kill
> the partition, with ownership of all partition resources reverting to
> the platform firmware.
>
> This is clearly not how things work today : linux doesn't call
> "stop-self" on the last vCPU and even if it did, QEMU doesn't
> terminate the VM.
>
> If there's a valid reason to not implement this PAPR behavior, I'd like
> it to be documented.

I can only speculate. This would create an unorthodox way of shutting down
the guest, when the user can just shut down the whole thing gracefully.

Unless we're considering borderline cases, like the security risk mentioned
in the kernel docs (Documentation/core-api/cpu_hotplug.rst):

"Such advances require CPUs available to a kernel to be removed either for
provisioning reasons, or for RAS purposes to keep an offending CPU off
system execution path. Hence the need for CPU hotplug support in the
Linux kernel."

In this extreme scenario I can see a reason to kill the partition/guest by
offlining the last online CPU - if it's compromising the host we'd rather
terminate immediately instead of waiting for graceful shutdown.

>
>> If we hit the jackpot, we're going to detach the core DRC and pulse the
>> hotplug IRQ, but the guest OS will refuse to release the CPU. Our
>> spapr_core_unplug() DRC release callback will never be called and the CPU
>> core object will keep existing in QEMU. No error message will be sent
>> to the user, but the CPU core wasn't unplugged from the guest.
>>
>> If the guest OS onlines the CPU core again we won't be able to hotunplug it
>> either. 'dmesg' inside the guest will report a failed attempt to offline an
>> unknown CPU:
>>
>> [ 923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16
>>
>> This is the result of stopping the DRC state transition in the middle in the
>> first failed attempt.
>>
>
> Yes, at this point only a machine reset can fix things up.
>
> Given this is linux's choice not to call "stop-self" as it should do, I'm not
> super fan of hardcoding this logic in QEMU, unless there are really good
> reasons to do so.

I understand where you are coming from and I sympathize. Not sure about how
users would feel about that though. They expect somewhat compatible behavior
from multi-arch features like hotplug/hotunplug, and x86 will neither
hotunplug nor offline the last CPU either.

There is a very high chance that, even if we pull this design off, I'll need
to go to Libvirt and disable it because we broke compatibility with how vcpu
unplug operated earlier.

Thanks,

DHB

>
>> We can avoid this, and potentially other bad things from happening, if we
>> avoid to attempt the unplug altogether in this scenario. Let's check for
>> the online/offline state of the CPU cores in the guest before allowing
>> the hotunplug, and forbid removing a CPU core if it's the last one online
>> in the guest.
>> >> Reported-by: Xujun Ma <xuma@redhat.com> >> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414 >> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> >> --- >> hw/ppc/spapr.c | 39 ++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 38 insertions(+), 1 deletion(-) >> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >> index a2f01c21aa..d269dcd102 100644 >> --- a/hw/ppc/spapr.c >> +++ b/hw/ppc/spapr.c >> @@ -3709,9 +3709,16 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev) >> static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc, >> Error **errp) >> { >> + CPUArchId *core_slot; >> + SpaprCpuCore *core; >> + PowerPCCPU *cpu; >> + CPUState *cs; >> + bool last_cpu_online = true; >> int index; >> >> - if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) { >> + core_slot = spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, >> + &index); >> + if (!core_slot) { >> error_setg(errp, "Unable to find CPU core with core-id: %d", >> cc->core_id); >> return -1; >> @@ -3722,6 +3729,36 @@ static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc, >> return -1; >> } >> >> + /* Allow for any non-boot CPU core to be unplugged if already offline */ >> + core = SPAPR_CPU_CORE(core_slot->cpu); >> + cs = CPU(core->threads[0]); >> + if (cs->halted) { >> + return 0; >> + } >> + >> + /* >> + * Do not allow core unplug if it's the last core online. >> + */ >> + cpu = POWERPC_CPU(cs); >> + CPU_FOREACH(cs) { >> + PowerPCCPU *c = POWERPC_CPU(cs); >> + >> + if (c == cpu) { >> + continue; >> + } >> + >> + if (!cs->halted) { >> + last_cpu_online = false; >> + break; >> + } >> + } >> + >> + if (last_cpu_online) { >> + error_setg(errp, "Unable to unplug CPU core with core-id %d: it is " >> + "the only CPU core online in the guest", cc->core_id); >> + return -1; >> + } >> + >> return 0; >> } >> >
On 1/15/21 2:22 PM, Greg Kurz wrote:
> On Thu, 14 Jan 2021 15:06:28 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
>
>> The only restriction we have when unplugging CPUs is to forbid unplug of
>> the boot cpu core. spapr_core_unplug_possible() does not contemplate the
>
> I can't remember why this restriction was introduced in the first place...
> This should be investigated and documented if the limitation still stands.

I looked it up and found out that the restriction was added by this commit:

commit 62be8b044adf47327ebefdefb25f28a40316ebd0
Author: Bharata B Rao <bharata@linux.vnet.ibm.com>
Date:   Wed Jul 27 10:44:42 2016 +0530

    spapr: Prevent boot CPU core removal

    Boot CPU is assumed to be always present in QEMU code. So
    until that assumptions are gone, deny removal request.
    In another words, QEMU won't support boot CPU core hot-unplug.

I don't think it necessarily has to do with pSeries code though. I was unable
to offline CPU0 of my x86 notebook:

# lscpu | grep -i 'on-line'
On-line CPU(s) list: 0-7
# echo 0 > /sys/devices/system/cpu/cpu0/online
bash: /sys/devices/system/cpu/cpu0/online: Permission denied
#
# echo 0 > /sys/devices/system/cpu/cpu1/online
#
# lscpu | grep -i 'on-line'
On-line CPU(s) list: 0,2-7
# echo 0 > /sys/devices/system/cpu/cpu0/online
bash: /sys/devices/system/cpu/cpu0/online: Permission denied
#

The pseries kernel does not have this restriction on offlining CPU0. Maybe
we're limiting CPU0 unplug in pseries because it would break common QEMU code
that has this restriction due to x86/ACPI mechanics since, apparently, x86
can't hotunplug CPU0.

If a good x86 soul reads this and can confirm or deny my assumption, I'd
appreciate it :)

Thanks,

DHB

>
>> possibility of some cores being offlined by the guest, meaning that we're
>> rolling the dice regarding on whether we're unplugging the last online
>> CPU core the guest has.
>>
>
> Trying to unplug the last CPU is obviously something that deserves
> special care.
LoPAPR is quite explicit on the outcome : this should > terminate the partition. > > 13.7.4.1.1. Isolation of CPUs > > The isolation of a CPU, in all cases, is preceded by the stop-self > RTAS function for all processor threads, and the OS insures that all > the CPU’s threads are in the RTAS stopped state prior to isolating the > CPU. Isolation of a processor that is not stopped produces unpredictable > results. The stopping of the last processor thread of a LPAR partition > effectively kills the partition, and at that point, ownership of all > partition resources reverts to the platform firmware. > > R1-13.7.4.1.1-1. For the LRDR option: Prior to issuing the RTAS > set-indicator specifying isolate isolation-state of a CPU DR > connector type, all the CPU threads must be in the RTAS stopped > state. > > R1-13.7.4.1.1-2. For the LRDR option: Stopping of the last processor > thread of a LPAR partition with the stop-self RTAS function, must kill > the partition, with ownership of all partition resources reverting to > the platform firmware. > > This is clearly not how things work today : linux doesn't call > "stop-self" on the last vCPU and even if it did, QEMU doesn't > terminate the VM. > > If there's a valid reason to not implement this PAPR behavior, I'd like > it to be documented. > >> If we hit the jackpot, we're going to detach the core DRC and pulse the >> hotplug IRQ, but the guest OS will refuse to release the CPU. Our >> spapr_core_unplug() DRC release callback will never be called and the CPU >> core object will keep existing in QEMU. No error message will be sent >> to the user, but the CPU core wasn't unplugged from the guest. >> >> If the guest OS onlines the CPU core again we won't be able to hotunplug it >> either. 
'dmesg' inside the guest will report a failed attempt to offline an >> unknown CPU: >> >> [ 923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16 >> >> This is the result of stopping the DRC state transition in the middle in the >> first failed attempt. >> > > Yes, at this point only a machine reset can fix things up. > > Given this is linux's choice not to call "stop-self" as it should do, I'm not > super fan of hardcoding this logic in QEMU, unless there are really good > reasons to do so. > >> We can avoid this, and potentially other bad things from happening, if we >> avoid to attempt the unplug altogether in this scenario. Let's check for >> the online/offline state of the CPU cores in the guest before allowing >> the hotunplug, and forbid removing a CPU core if it's the last one online >> in the guest. >> >> Reported-by: Xujun Ma <xuma@redhat.com> >> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414 >> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> >> --- >> hw/ppc/spapr.c | 39 ++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 38 insertions(+), 1 deletion(-) >> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >> index a2f01c21aa..d269dcd102 100644 >> --- a/hw/ppc/spapr.c >> +++ b/hw/ppc/spapr.c >> @@ -3709,9 +3709,16 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev) >> static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc, >> Error **errp) >> { >> + CPUArchId *core_slot; >> + SpaprCpuCore *core; >> + PowerPCCPU *cpu; >> + CPUState *cs; >> + bool last_cpu_online = true; >> int index; >> >> - if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) { >> + core_slot = spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, >> + &index); >> + if (!core_slot) { >> error_setg(errp, "Unable to find CPU core with core-id: %d", >> cc->core_id); >> return -1; >> @@ -3722,6 +3729,36 @@ static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, 
CPUCore *cc, >> return -1; >> } >> >> + /* Allow for any non-boot CPU core to be unplugged if already offline */ >> + core = SPAPR_CPU_CORE(core_slot->cpu); >> + cs = CPU(core->threads[0]); >> + if (cs->halted) { >> + return 0; >> + } >> + >> + /* >> + * Do not allow core unplug if it's the last core online. >> + */ >> + cpu = POWERPC_CPU(cs); >> + CPU_FOREACH(cs) { >> + PowerPCCPU *c = POWERPC_CPU(cs); >> + >> + if (c == cpu) { >> + continue; >> + } >> + >> + if (!cs->halted) { >> + last_cpu_online = false; >> + break; >> + } >> + } >> + >> + if (last_cpu_online) { >> + error_setg(errp, "Unable to unplug CPU core with core-id %d: it is " >> + "the only CPU core online in the guest", cc->core_id); >> + return -1; >> + } >> + >> return 0; >> } >> >
On Fri, Jan 15, 2021 at 06:22:16PM +0100, Greg Kurz wrote: > On Thu, 14 Jan 2021 15:06:28 -0300 > Daniel Henrique Barboza <danielhb413@gmail.com> wrote: > > > The only restriction we have when unplugging CPUs is to forbid unplug of > > the boot cpu core. spapr_core_unplug_possible() does not contemplate the > > I can't remember why this restriction was introduced in the first place... > This should be investigated and documented if the limitation still stands. > > > possibility of some cores being offlined by the guest, meaning that we're > > rolling the dice regarding on whether we're unplugging the last online > > CPU core the guest has. > > > > Trying to unplug the last CPU is obviously something that deserves > special care. LoPAPR is quite explicit on the outcome : this should > terminate the partition. > > 13.7.4.1.1. Isolation of CPUs > > The isolation of a CPU, in all cases, is preceded by the stop-self > RTAS function for all processor threads, and the OS insures that all > the CPU’s threads are in the RTAS stopped state prior to isolating the > CPU. Isolation of a processor that is not stopped produces unpredictable > results. The stopping of the last processor thread of a LPAR partition > effectively kills the partition, and at that point, ownership of all > partition resources reverts to the platform firmware. > > R1-13.7.4.1.1-1. For the LRDR option: Prior to issuing the RTAS > set-indicator specifying isolate isolation-state of a CPU DR > connector type, all the CPU threads must be in the RTAS stopped > state. > > R1-13.7.4.1.1-2. For the LRDR option: Stopping of the last processor > thread of a LPAR partition with the stop-self RTAS function, must kill > the partition, with ownership of all partition resources reverting to > the platform firmware. > > This is clearly not how things work today : linux doesn't call > "stop-self" on the last vCPU and even if it did, QEMU doesn't > terminate the VM. 
> If there's a valid reason to not implement this PAPR behavior, I'd like
> it to be documented.

So, we should implement this in QEMU - if you stop-self the last
thread, it should be the equivalent of a power off. Linux not ever
doing that probably makes sense - it wants to encourage you to
shut down properly for data safety.

> > If we hit the jackpot, we're going to detach the core DRC and pulse the
> > hotplug IRQ, but the guest OS will refuse to release the CPU. Our
> > spapr_core_unplug() DRC release callback will never be called and the CPU
> > core object will keep existing in QEMU. No error message will be sent
> > to the user, but the CPU core wasn't unplugged from the guest.
> >
> > If the guest OS onlines the CPU core again we won't be able to hotunplug it
> > either. 'dmesg' inside the guest will report a failed attempt to offline an
> > unknown CPU:
> >
> > [ 923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16
> >
> > This is the result of stopping the DRC state transition in the middle in the
> > first failed attempt.
> >
>
> Yes, at this point only a machine reset can fix things up.
>
> Given this is linux's choice not to call "stop-self" as it should do, I'm not
> super fan of hardcoding this logic in QEMU, unless there are really good
> reasons to do so.

Uh.. sorry I don't follow how linux is doing something wrong here.

> > We can avoid this, and potentially other bad things from happening, if we
> > avoid to attempt the unplug altogether in this scenario. Let's check for
> > the online/offline state of the CPU cores in the guest before allowing
> > the hotunplug, and forbid removing a CPU core if it's the last one online
> > in the guest.
> > > > Reported-by: Xujun Ma <xuma@redhat.com> > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414 > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> > > --- > > hw/ppc/spapr.c | 39 ++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 38 insertions(+), 1 deletion(-) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index a2f01c21aa..d269dcd102 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -3709,9 +3709,16 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev) > > static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc, > > Error **errp) > > { > > + CPUArchId *core_slot; > > + SpaprCpuCore *core; > > + PowerPCCPU *cpu; > > + CPUState *cs; > > + bool last_cpu_online = true; > > int index; > > > > - if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) { > > + core_slot = spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, > > + &index); > > + if (!core_slot) { > > error_setg(errp, "Unable to find CPU core with core-id: %d", > > cc->core_id); > > return -1; > > @@ -3722,6 +3729,36 @@ static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc, > > return -1; > > } > > > > + /* Allow for any non-boot CPU core to be unplugged if already offline */ > > + core = SPAPR_CPU_CORE(core_slot->cpu); > > + cs = CPU(core->threads[0]); > > + if (cs->halted) { > > + return 0; > > + } > > + > > + /* > > + * Do not allow core unplug if it's the last core online. > > + */ > > + cpu = POWERPC_CPU(cs); > > + CPU_FOREACH(cs) { > > + PowerPCCPU *c = POWERPC_CPU(cs); > > + > > + if (c == cpu) { > > + continue; > > + } > > + > > + if (!cs->halted) { > > + last_cpu_online = false; > > + break; > > + } > > + } > > + > > + if (last_cpu_online) { > > + error_setg(errp, "Unable to unplug CPU core with core-id %d: it is " > > + "the only CPU core online in the guest", cc->core_id); > > + return -1; > > + } > > + > > return 0; > > } > > >
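The behavior described in this reply (stop-self on the last running thread should act like a power off, as LoPAPR R1-13.7.4.1.1-2 requires) can be sketched as a small standalone model. This is not QEMU code: `Thread`, `StopResult` and `stop_self()` are hypothetical names invented for illustration, and the real RTAS handler would need to trigger an actual shutdown rather than return a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU thread; not a QEMU type. */
typedef struct {
    bool stopped;   /* true once the thread is in the RTAS stopped state */
} Thread;

typedef enum {
    STOP_OK,        /* thread stopped, at least one other thread still runs */
    STOP_POWER_OFF  /* last running thread stopped: terminate the partition */
} StopResult;

/*
 * Model of a stop-self handler: mark thread 'self' as stopped, then
 * report whether any thread in the partition is still running.
 */
static StopResult stop_self(Thread *threads, size_t nr_threads, size_t self)
{
    assert(threads != NULL && self < nr_threads);

    threads[self].stopped = true;

    for (size_t i = 0; i < nr_threads; i++) {
        if (!threads[i].stopped) {
            return STOP_OK;
        }
    }

    /* No running thread is left, so the caller should power off. */
    return STOP_POWER_OFF;
}
```

The sketch only captures the decision; how the power off is carried out is left to the caller.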
On Fri, Jan 15, 2021 at 03:52:56PM -0300, Daniel Henrique Barboza wrote: > > > On 1/15/21 2:22 PM, Greg Kurz wrote: > > On Thu, 14 Jan 2021 15:06:28 -0300 > > Daniel Henrique Barboza <danielhb413@gmail.com> wrote: > > > > > The only restriction we have when unplugging CPUs is to forbid unplug of > > > the boot cpu core. spapr_core_unplug_possible() does not contemplate the > > I'll look into it. > > > > > I can't remember why this restriction was introduced in the first place... > > This should be investigated and documented if the limitation still stands. > > > > > possibility of some cores being offlined by the guest, meaning that we're > > > rolling the dice regarding on whether we're unplugging the last online > > > CPU core the guest has. > > > > > > > Trying to unplug the last CPU is obviously something that deserves > > special care. LoPAPR is quite explicit on the outcome : this should > > terminate the partition. > > > > 13.7.4.1.1. Isolation of CPUs > > > > The isolation of a CPU, in all cases, is preceded by the stop-self > > RTAS function for all processor threads, and the OS insures that all > > the CPU’s threads are in the RTAS stopped state prior to isolating the > > CPU. Isolation of a processor that is not stopped produces unpredictable > > results. The stopping of the last processor thread of a LPAR partition > > effectively kills the partition, and at that point, ownership of all > > partition resources reverts to the platform firmware. > > > I was just investigating a reason why we should check for all thread > states before unplugging the core, like David suggested in his reply. > rtas_stop_self() was setting 'cs->halted = 1' without a thread activity > check like ibm,suspend-me() does and I was wondering why. 
This text you sent
> explains it, quoting:
>
> "> The isolation of a CPU, in all cases, is preceded by the stop-self
> RTAS function for all processor threads, and the OS insures that all
> the CPU’s threads are in the RTAS stopped state prior to isolating the
> CPU."
>
>
> This seems to be corroborated by arch/powerpc/platform/pseries/hotplug-cpu.c:

Um... this seems like you're overcomplicating this. The crucial point
here is that 'start-cpu' and 'stop-self' operate on individual
threads, whereas dynamic reconfiguration hotplug and unplug work on
whole cores.

> static void pseries_cpu_offline_self(void)
> {
>     unsigned int hwcpu = hard_smp_processor_id();
>
>     local_irq_disable();
>     idle_task_exit();
>     if (xive_enabled())
>         xive_teardown_cpu();
>     else
>         xics_teardown_cpu();
>
>     unregister_slb_shadow(hwcpu);
>     rtas_stop_self();
>
>     /* Should never get here... */
>     BUG();
>     for(;;);
> }
>
>
> IIUC this means that we can rely on cs->halted = 1 since it's coming right
> after a rtas_stop_self() call. This is still a bit confusing though and I
> wouldn't mind standardizing the 'CPU core is offline' condition with what
> ibm,suspend-me is already doing.

At the moment we have no tracking of whether a core is online. We
kinda-sorta track whether a *thread* is online through stop-self /
start-cpu.

> David, what do you think?

I think we can rely on cs->halted = 1 when the thread is offline.
What I'm much less certain about is whether we can count on the thread
being offline when cs->halted = 1.

> > R1-13.7.4.1.1-1. For the LRDR option: Prior to issuing the RTAS
> > set-indicator specifying isolate isolation-state of a CPU DR
> > connector type, all the CPU threads must be in the RTAS stopped
> > state.
> >
> > R1-13.7.4.1.1-2. For the LRDR option: Stopping of the last processor
> > thread of a LPAR partition with the stop-self RTAS function, must kill
> > the partition, with ownership of all partition resources reverting to
> > the platform firmware.
> > > > This is clearly not how things work today : linux doesn't call > > "stop-self" on the last vCPU and even if it did, QEMU doesn't > > terminate the VM. > > > > If there's a valid reason to not implement this PAPR behavior, I'd like > > it to be documented. > > > I can only speculate. This would create a unorthodox way of shutting down > the guest, when the user can just shutdown the whole thing gracefully. > > Unless we're considering borderline cases, like the security risk mentioned > in the kernel docs (Documentation/core-api/cpu_hotplug.rst): > > "Such advances require CPUs available to a kernel to be removed either for > provisioning reasons, or for RAS purposes to keep an offending CPU off > system execution path. Hence the need for CPU hotplug support in the > Linux kernel." > > In this extreme scenario I can see a reason to kill the partition/guest > by offlining the last online CPU - if it's compromising the host we'd > rather terminate immediately instead of waiting for graceful > shutdown. I'm a bit confused by this comment. You seem to be conflating online/offline operations (start-cpu/stop-self) with hot plug/unplug operations - they're obviously related, but they're not the same thing. > > > If we hit the jackpot, we're going to detach the core DRC and pulse the > > > hotplug IRQ, but the guest OS will refuse to release the CPU. Our > > > spapr_core_unplug() DRC release callback will never be called and the CPU > > > core object will keep existing in QEMU. No error message will be sent > > > to the user, but the CPU core wasn't unplugged from the guest. > > > > > > If the guest OS onlines the CPU core again we won't be able to hotunplug it > > > either. 'dmesg' inside the guest will report a failed attempt to offline an > > > unknown CPU: > > > > > > [ 923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16 > > > > > > This is the result of stopping the DRC state transition in the middle in the > > > first failed attempt. 
> > > > > > > Yes, at this point only a machine reset can fix things up. > > > > Given this is linux's choice not to call "stop-self" as it should do, I'm not > > super fan of hardcoding this logic in QEMU, unless there are really good > > reasons to do so. > > I understand where are you coming from and I sympathize. Not sure about how users > would feel about that though. They expect a somewhat compatible behavior of > multi-arch features like hotplug/hotunplug, and x86 will neither hotunplug nor offline > the last CPU as well. > > There is a very high chance that, even if we pull this design off, I'll need to go to > Libvirt and disable it because we broke compatibility with how vcpu unplug operated > earlier.
On 1/17/21 10:18 PM, David Gibson wrote: > On Fri, Jan 15, 2021 at 03:52:56PM -0300, Daniel Henrique Barboza wrote: >> >> >> On 1/15/21 2:22 PM, Greg Kurz wrote: >>> On Thu, 14 Jan 2021 15:06:28 -0300 >>> Daniel Henrique Barboza <danielhb413@gmail.com> wrote: >>> >>>> The only restriction we have when unplugging CPUs is to forbid unplug of >>>> the boot cpu core. spapr_core_unplug_possible() does not contemplate the >> >> I'll look into it. >> >>> >>> I can't remember why this restriction was introduced in the first place... >>> This should be investigated and documented if the limitation still stands. >>> >>>> possibility of some cores being offlined by the guest, meaning that we're >>>> rolling the dice regarding on whether we're unplugging the last online >>>> CPU core the guest has. >>>> >>> >>> Trying to unplug the last CPU is obviously something that deserves >>> special care. LoPAPR is quite explicit on the outcome : this should >>> terminate the partition. >>> >>> 13.7.4.1.1. Isolation of CPUs >>> >>> The isolation of a CPU, in all cases, is preceded by the stop-self >>> RTAS function for all processor threads, and the OS insures that all >>> the CPU’s threads are in the RTAS stopped state prior to isolating the >>> CPU. Isolation of a processor that is not stopped produces unpredictable >>> results. The stopping of the last processor thread of a LPAR partition >>> effectively kills the partition, and at that point, ownership of all >>> partition resources reverts to the platform firmware. >> >> >> I was just investigating a reason why we should check for all thread >> states before unplugging the core, like David suggested in his reply. >> rtas_stop_self() was setting 'cs->halted = 1' without a thread activity >> check like ibm,suspend-me() does and I was wondering why. 
This text you sent >> explains it, quoting: >> >> "> The isolation of a CPU, in all cases, is preceded by the stop-self >> RTAS function for all processor threads, and the OS insures that all >> the CPU’s threads are in the RTAS stopped state prior to isolating the >> CPU." >> >> >> This seems to be corroborated by arch/powerpc/platform/pseries/hotplug-cpu.c: > > Um... this seems like you're overcomplicating this. The crucial point > here is that 'start-cpu' and 'stop-self' operate on individual > threads, where as dynamic reconfiguration hotplug and unplug works on > whole cores. > >> static void pseries_cpu_offline_self(void) >> { >> unsigned int hwcpu = hard_smp_processor_id(); >> >> local_irq_disable(); >> idle_task_exit(); >> if (xive_enabled()) >> xive_teardown_cpu(); >> else >> xics_teardown_cpu(); >> >> unregister_slb_shadow(hwcpu); >> rtas_stop_self(); >> >> /* Should never get here... */ >> BUG(); >> for(;;); >> } >> >> >> IIUC this means that we can rely on cs->halted = 1 since it's coming right >> after a rtas_stop_self() call. This is still a bit confusing though and I >> wouldn't mind standardizing the 'CPU core is offline' condition with what >> ibm,suspend-me is already doing. > > At the moment we have no tracking of whether a core is online. We > kinda-sorta track whether a *thread* is online through stop-self / > start-cpu. > >> David, what do you think? > > I think we can rely on cs->halted = 1 when the thread is offline. > What I'm much less certain about is whether we can count on the thread > being offline when cs->halted = 1. I guess we should just stick with your first suggestion then. 
I'll check for both cs->halted and msr to assert whether a whole core is
offline, like ibm,suspend-me is doing:

    if (!cs->halted || (e->msr & (1ULL << MSR_EE))) {
        rtas_st(rets, 0, H_MULTI_THREADS_ACTIVE);
        return;
    }

Another question not necessarily related to this fix: we do a similar check
in the start of do_client_architecture_support(), returning the same
H_MULTI_THREADS_ACTIVE error, but we're not checking e->msr:

    /* CAS is supposed to be called early when only the boot vCPU is active. */
    CPU_FOREACH(cs) {
        if (cs == CPU(cpu)) {
            continue;
        }
        if (!cs->halted) {
            warn_report("guest has multiple active vCPUs at CAS, which is not allowed");
            return H_MULTI_THREADS_ACTIVE;
        }
    }

Should we bother changing this logic to check for e->msr as well? If there is
no possible harm done I'd rather have the same check returning
H_MULTI_THREADS_ACTIVE in both places. If there is a special condition in early
boot where this check in do_client_architecture_support() is enough, I would
like to put a comment in there to make it clear why.

Thanks,

DHB

>
>>> R1-13.7.4.1.1-1. For the LRDR option: Prior to issuing the RTAS
>>> set-indicator specifying isolate isolation-state of a CPU DR
>>> connector type, all the CPU threads must be in the RTAS stopped
>>> state.
>>>
>>> R1-13.7.4.1.1-2. For the LRDR option: Stopping of the last processor
>>> thread of a LPAR partition with the stop-self RTAS function, must kill
>>> the partition, with ownership of all partition resources reverting to
>>> the platform firmware.
>>>
>>> This is clearly not how things work today : linux doesn't call
>>> "stop-self" on the last vCPU and even if it did, QEMU doesn't
>>> terminate the VM.
>>>
>>> If there's a valid reason to not implement this PAPR behavior, I'd like
>>> it to be documented.
>>
>>
>> I can only speculate. This would create a unorthodox way of shutting down
>> the guest, when the user can just shutdown the whole thing gracefully.
>> >> Unless we're considering borderline cases, like the security risk mentioned >> in the kernel docs (Documentation/core-api/cpu_hotplug.rst): >> >> "Such advances require CPUs available to a kernel to be removed either for >> provisioning reasons, or for RAS purposes to keep an offending CPU off >> system execution path. Hence the need for CPU hotplug support in the >> Linux kernel." >> >> In this extreme scenario I can see a reason to kill the partition/guest >> by offlining the last online CPU - if it's compromising the host we'd >> rather terminate immediately instead of waiting for graceful >> shutdown. > > I'm a bit confused by this comment. You seem to be conflating > online/offline operations (start-cpu/stop-self) with hot plug/unplug > operations - they're obviously related, but they're not the same > thing. > >>>> If we hit the jackpot, we're going to detach the core DRC and pulse the >>>> hotplug IRQ, but the guest OS will refuse to release the CPU. Our >>>> spapr_core_unplug() DRC release callback will never be called and the CPU >>>> core object will keep existing in QEMU. No error message will be sent >>>> to the user, but the CPU core wasn't unplugged from the guest. >>>> >>>> If the guest OS onlines the CPU core again we won't be able to hotunplug it >>>> either. 'dmesg' inside the guest will report a failed attempt to offline an >>>> unknown CPU: >>>> >>>> [ 923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16 >>>> >>>> This is the result of stopping the DRC state transition in the middle in the >>>> first failed attempt. >>>> >>> >>> Yes, at this point only a machine reset can fix things up. >>> >>> Given this is linux's choice not to call "stop-self" as it should do, I'm not >>> super fan of hardcoding this logic in QEMU, unless there are really good >>> reasons to do so. >> >> I understand where are you coming from and I sympathize. Not sure about how users >> would feel about that though. 
They expect a somewhat compatible behavior of >> multi-arch features like hotplug/hotunplug, and x86 will neither hotunplug nor offline >> the last CPU as well. >> >> There is a very high chance that, even if we pull this design off, I'll need to go to >> Libvirt and disable it because we broke compatibility with how vcpu unplug operated >> earlier. >
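The whole-core check proposed in this reply (every thread halted and with MSR[EE] clear, mirroring the ibm,suspend-me condition) might look roughly like the standalone model below. The `Thread` and `Core` structs and both helper functions are hypothetical simplifications, not QEMU types, and the value 15 for `MSR_EE` is an assumption about the bit position used by QEMU's PowerPC target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position of MSR[EE] (external interrupt enable). */
#define MSR_EE 15

/* Hypothetical, simplified views of a vCPU thread and a CPU core. */
typedef struct {
    bool halted;    /* set when the thread has gone through stop-self */
    uint64_t msr;   /* machine state register of the thread */
} Thread;

typedef struct {
    Thread *threads;
    size_t nr_threads;
} Core;

/* A thread counts as offline only if it is halted with interrupts disabled. */
static bool thread_is_offline(const Thread *t)
{
    return t->halted && !(t->msr & (1ULL << MSR_EE));
}

/* A core is offline only when every one of its threads is offline. */
static bool core_is_offline(const Core *core)
{
    assert(core != NULL);

    for (size_t i = 0; i < core->nr_threads; i++) {
        if (!thread_is_offline(&core->threads[i])) {
            return false;
        }
    }
    return true;
}
```

Checking every thread rather than only `threads[0]` follows the review comment that a core is not offline while any of its threads is still running.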
On Mon, 18 Jan 2021 12:12:03 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > On Fri, Jan 15, 2021 at 06:22:16PM +0100, Greg Kurz wrote: > > On Thu, 14 Jan 2021 15:06:28 -0300 > > Daniel Henrique Barboza <danielhb413@gmail.com> wrote: > > > > > The only restriction we have when unplugging CPUs is to forbid unplug of > > > the boot cpu core. spapr_core_unplug_possible() does not contemplate the > > > > I can't remember why this restriction was introduced in the first place... > > This should be investigated and documented if the limitation still stands. > > > > > possibility of some cores being offlined by the guest, meaning that we're > > > rolling the dice regarding on whether we're unplugging the last online > > > CPU core the guest has. > > > > > > > Trying to unplug the last CPU is obviously something that deserves > > special care. LoPAPR is quite explicit on the outcome : this should > > terminate the partition. > > > > 13.7.4.1.1. Isolation of CPUs > > > > The isolation of a CPU, in all cases, is preceded by the stop-self > > RTAS function for all processor threads, and the OS insures that all > > the CPU’s threads are in the RTAS stopped state prior to isolating the > > CPU. Isolation of a processor that is not stopped produces unpredictable > > results. The stopping of the last processor thread of a LPAR partition > > effectively kills the partition, and at that point, ownership of all > > partition resources reverts to the platform firmware. > > > > R1-13.7.4.1.1-1. For the LRDR option: Prior to issuing the RTAS > > set-indicator specifying isolate isolation-state of a CPU DR > > connector type, all the CPU threads must be in the RTAS stopped > > state. > > > > R1-13.7.4.1.1-2. For the LRDR option: Stopping of the last processor > > thread of a LPAR partition with the stop-self RTAS function, must kill > > the partition, with ownership of all partition resources reverting to > > the platform firmware. 
> > > > This is clearly not how things work today : linux doesn't call > > "stop-self" on the last vCPU and even if it did, QEMU doesn't > > terminate the VM. > > > If there's a valid reason to not implement this PAPR behavior, I'd like > > it to be documented. > > So, we should implement this in QEMU - if you stop-self the last > thread, it should be the equivalent of a power off. Linux not ever > doing that probably makes sense - it wants you to encourage you to > shut down properly for data safety. > Yes I agree it's fine if linux enforces some safeguard to prevent a brutal shutdown when writing 0 to /sys/devices/system/cpu/cpu?/online within the guest. But in this case, off-lining is part of the usual CPU unplug sequence, which was requested by the host : I don't think the safeguard is relevant in this case. This PAPR _feature_ is just another way of uncleanly shutting down the guest. > > > If we hit the jackpot, we're going to detach the core DRC and pulse the > > > hotplug IRQ, but the guest OS will refuse to release the CPU. Our > > > spapr_core_unplug() DRC release callback will never be called and the CPU > > > core object will keep existing in QEMU. No error message will be sent > > > to the user, but the CPU core wasn't unplugged from the guest. > > > > > > If the guest OS onlines the CPU core again we won't be able to hotunplug it > > > either. 'dmesg' inside the guest will report a failed attempt to offline an > > > unknown CPU: > > > > > > [ 923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16 > > > > > > This is the result of stopping the DRC state transition in the middle in the > > > first failed attempt. > > > > > > > Yes, at this point only a machine reset can fix things up. > > > > Given this is linux's choice not to call "stop-self" as it should do, I'm not > > super fan of hardcoding this logic in QEMU, unless there are really good > > reasons to do so. > > Uh.. sorry I don't follow how linux is doing something wrong here. 
>

Well... it doesn't finalize the hot-unplug sequence, and we have no way
to cope with that except a machine reset.

So I would nearly say this is working as expected : CPU hot unplug was
requested and we wait for the guest to release the CPU. Linux not wanting
to release the CPU until next reboot for some reason isn't really our
concern.

> > > We can avoid this, and potentially other bad things from happening, if we
> > > avoid to attempt the unplug altogether in this scenario. Let's check for
> > > the online/offline state of the CPU cores in the guest before allowing
> > > the hotunplug, and forbid removing a CPU core if it's the last one online
> > > in the guest.
> > >

An unplug request can be accepted but its handling can still race with
some manual off-lining in the guest, which would leave us in the very
same situation. So I don't think this patch fixes anything actually
(TOCTOU).

I tend to think that mixing manual CPU off-lining and CPU hot-unplug is
probably not the best thing to do in the first place, unless one really
knows what they're doing. Maybe we should rather document the caveats
instead of adding workarounds for what remains a corner case ?
> > > Reported-by: Xujun Ma <xuma@redhat.com> > > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414 > > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> > > > --- > > > hw/ppc/spapr.c | 39 ++++++++++++++++++++++++++++++++++++++- > > > 1 file changed, 38 insertions(+), 1 deletion(-) > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > index a2f01c21aa..d269dcd102 100644 > > > --- a/hw/ppc/spapr.c > > > +++ b/hw/ppc/spapr.c > > > @@ -3709,9 +3709,16 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev) > > > static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc, > > > Error **errp) > > > { > > > + CPUArchId *core_slot; > > > + SpaprCpuCore *core; > > > + PowerPCCPU *cpu; > > > + CPUState *cs; > > > + bool last_cpu_online = true; > > > int index; > > > > > > - if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) { > > > + core_slot = spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, > > > + &index); > > > + if (!core_slot) { > > > error_setg(errp, "Unable to find CPU core with core-id: %d", > > > cc->core_id); > > > return -1; > > > @@ -3722,6 +3729,36 @@ static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc, > > > return -1; > > > } > > > > > > + /* Allow for any non-boot CPU core to be unplugged if already offline */ > > > + core = SPAPR_CPU_CORE(core_slot->cpu); > > > + cs = CPU(core->threads[0]); > > > + if (cs->halted) { > > > + return 0; > > > + } > > > + > > > + /* > > > + * Do not allow core unplug if it's the last core online. 
> > > + */ > > > + cpu = POWERPC_CPU(cs); > > > + CPU_FOREACH(cs) { > > > + PowerPCCPU *c = POWERPC_CPU(cs); > > > + > > > + if (c == cpu) { > > > + continue; > > > + } > > > + > > > + if (!cs->halted) { > > > + last_cpu_online = false; > > > + break; > > > + } > > > + } > > > + > > > + if (last_cpu_online) { > > > + error_setg(errp, "Unable to unplug CPU core with core-id %d: it is " > > > + "the only CPU core online in the guest", cc->core_id); > > > + return -1; > > > + } > > > + > > > return 0; > > > } > > > > > >
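The TOCTOU concern raised in this reply can be illustrated with a deliberately simplified, single-threaded model. Everything here is hypothetical: `Core`, `unplug_possible()` and `race_defeats_check()` are names made up for the sketch, each core is reduced to one `online` flag, and the boot-core restriction is ignored. The sketch replays the interleaving by hand instead of using real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified core: a single guest-controlled online flag. */
typedef struct {
    bool online;
} Core;

/*
 * Request-time check in the spirit of the patch: allow the unplug if the
 * target is already offline, or if some other core is still online.
 */
static bool unplug_possible(const Core *cores, size_t nr_cores, size_t target)
{
    assert(cores != NULL && target < nr_cores);

    if (!cores[target].online) {
        return true;
    }
    for (size_t i = 0; i < nr_cores; i++) {
        if (i != target && cores[i].online) {
            return true;
        }
    }
    return false;
}

/*
 * Replays the race: the check passes, the guest then offlines the other
 * core, and by the time the unplug is handled the target is the last
 * core online. Returns true when the earlier check was defeated.
 */
static bool race_defeats_check(void)
{
    Core cores[2] = { { true }, { true } };

    bool accepted = unplug_possible(cores, 2, 1);  /* time of check */
    cores[0].online = false;                       /* guest offlines core 0 */
    bool still_ok = unplug_possible(cores, 2, 1);  /* time of use */

    return accepted && !still_ok;
}
```

In this model the request is accepted and yet the guest ends up asked to release its last online core, which is the situation the check was meant to rule out.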
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a2f01c21aa..d269dcd102 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3709,9 +3709,16 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev)
 static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc,
                                       Error **errp)
 {
+    CPUArchId *core_slot;
+    SpaprCpuCore *core;
+    PowerPCCPU *cpu;
+    CPUState *cs;
+    bool last_cpu_online = true;
     int index;
 
-    if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) {
+    core_slot = spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id,
+                                    &index);
+    if (!core_slot) {
         error_setg(errp, "Unable to find CPU core with core-id: %d",
                    cc->core_id);
         return -1;
@@ -3722,6 +3729,36 @@ static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc,
         return -1;
     }
 
+    /* Allow for any non-boot CPU core to be unplugged if already offline */
+    core = SPAPR_CPU_CORE(core_slot->cpu);
+    cs = CPU(core->threads[0]);
+    if (cs->halted) {
+        return 0;
+    }
+
+    /*
+     * Do not allow core unplug if it's the last core online.
+     */
+    cpu = POWERPC_CPU(cs);
+    CPU_FOREACH(cs) {
+        PowerPCCPU *c = POWERPC_CPU(cs);
+
+        if (c == cpu) {
+            continue;
+        }
+
+        if (!cs->halted) {
+            last_cpu_online = false;
+            break;
+        }
+    }
+
+    if (last_cpu_online) {
+        error_setg(errp, "Unable to unplug CPU core with core-id %d: it is "
+                   "the only CPU core online in the guest", cc->core_id);
+        return -1;
+    }
+
     return 0;
 }
The only restriction we have when unplugging CPUs is to forbid unplug of
the boot cpu core. spapr_core_unplug_possible() does not contemplate the
possibility of some cores being offlined by the guest, meaning that we're
rolling the dice regarding on whether we're unplugging the last online
CPU core the guest has.

If we hit the jackpot, we're going to detach the core DRC and pulse the
hotplug IRQ, but the guest OS will refuse to release the CPU. Our
spapr_core_unplug() DRC release callback will never be called and the CPU
core object will keep existing in QEMU. No error message will be sent
to the user, but the CPU core wasn't unplugged from the guest.

If the guest OS onlines the CPU core again we won't be able to hotunplug it
either. 'dmesg' inside the guest will report a failed attempt to offline an
unknown CPU:

[ 923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16

This is the result of stopping the DRC state transition in the middle in the
first failed attempt.

We can avoid this, and potentially other bad things from happening, if we
avoid to attempt the unplug altogether in this scenario. Let's check for
the online/offline state of the CPU cores in the guest before allowing
the hotunplug, and forbid removing a CPU core if it's the last one online
in the guest.

Reported-by: Xujun Ma <xuma@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)