mbox series

[0/2] pseries: UNISOLATE DRCs to signal device removal error

Message ID 20210416210216.380291-1-danielhb413@gmail.com (mailing list archive)
Headers show
Series pseries: UNISOLATE DRCs to signal device removal error | expand

Message

Daniel Henrique Barboza April 16, 2021, 9:02 p.m. UTC
At this moment, PAPR [1] does not have a way to report errors during a device
removal operation. This puts a strain in the hypervisor, which needs extra
mechanisms to try to fallback and recover from an error that might have
happened during the removal. The QEMU community has dealt with it during these
years by either trying to preempt the error before sending the HP event or, in
case of a guest side failure, reboot the guest to complete the removal process.

This started to change with QEMU commit fe1831eff8a4 ("spapr_drc.c: use DRC
reconfiguration to cleanup DIMM unplug state"), where a way to fallback from a
memory removal error was introduced. In this case, when QEMU detects that the
kernel is reconfiguring LMBs DRCs that were marked as pending removal, the
entire process is reverted from the QEMU side as well. Around the same time,
other discussions in the QEMU mailing discussed an alternative for other device
as well.

In [2] the idea of using RTAS set-indicator for this role was first introduced.
The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is already
UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else for both QEMU
and phyp. This gives us an opportunity to use this behavior to signal the
hypervisor layer when a device removal happens, allowing it to do a proper
error handling knowing for sure that the removal failed in the kernel. Using
set-indicator to report HP errors isn't strange to PAPR, as per R1-13.5.3.4-4.
of table 13.7 of [1]:

"For all DR options: If this is a DR operation that involves the user insert-
ing a DR entity, then if the firmware can determine that the inserted entity
would cause a system disturbance, then the set-indicator RTAS call must not
unisolate the entity and must return an error status which is unique to the
particular error."

PAPR does not make any restrictions or considerations about setting an already
Unisolated/Configured DRC to 'unisolate', meaning we have a chance to use it
for this purpose - signal an OS side error when attempting to remove a DR
entity.  To validate the design, this is being implemented only for CPUs.

QEMU will use this mechanism to rollback the device removal (hotunplug) state,
allowing for a better error handling mechanism. A implementation of how QEMU
can do it is in [3]. When using a kernel with this series applied, together
with this QEMU build, this is what happens in a common CPU removal/hotunplug
error scenario (trying to remove the last online CPU):

( QEMU command line: qemu-system-ppc64 -machine pseries,accel=kvm,usb=off
-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 ... )

[root@localhost ~]# QEMU 5.2.92 monitor - type 'help' for more information
(qemu) device_add host-spapr-cpu-core,core-id=1,id=core1
(qemu) 

[root@localhost ~]# echo 0 > /sys/devices/system/cpu/cpu0/online
[   77.548442][   T13] IRQ 19: no longer affine to CPU0
[   77.548452][   T13] IRQ 20: no longer affine to CPU0
[   77.548458][   T13] IRQ 256: no longer affine to CPU0
[   77.548465][   T13] IRQ 258: no longer affine to CPU0
[   77.548472][   T13] IRQ 259: no longer affine to CPU0
[   77.548479][   T13] IRQ 260: no longer affine to CPU0
[   77.548485][   T13] IRQ 261: no longer affine to CPU0
[   77.548590][    T0] cpu 0 (hwid 0) Ready to die...
[root@localhost ~]# (qemu) 
(qemu) device_del core1
(qemu) [   83.214073][  T100] pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9, rc: -16
qemu-system-ppc64: Device hotunplug rejected by the guest for device core1

(qemu) 

As soon as the CPU removal fails in dlpar_cpu(), QEMU becames aware of
it and is able to do error recovery.

If this solution is well received, I'll push for an architecture change
request internally at IBM to make this mechanism PAPR official.


[1] https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06395.html
[3] https://github.com/danielhb/qemu/tree/unisolate_drc_callback_v1

Daniel Henrique Barboza (2):
  dlpar.c: introduce dlpar_unisolate_drc()
  hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure

 arch/powerpc/platforms/pseries/dlpar.c       | 14 ++++++++++++++
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  9 ++++++++-
 arch/powerpc/platforms/pseries/pseries.h     |  1 +
 3 files changed, 23 insertions(+), 1 deletion(-)

Comments

David Gibson April 19, 2021, 3:59 a.m. UTC | #1
On Fri, Apr 16, 2021 at 06:02:14PM -0300, Daniel Henrique Barboza wrote:
> At this moment, PAPR [1] does not have a way to report errors during a device
> removal operation. This puts a strain in the hypervisor, which needs extra
> mechanisms to try to fallback and recover from an error that might have
> happened during the removal. The QEMU community has dealt with it during these
> years by either trying to preempt the error before sending the HP event or, in
> case of a guest side failure, reboot the guest to complete the removal process.
> 
> This started to change with QEMU commit fe1831eff8a4 ("spapr_drc.c: use DRC
> reconfiguration to cleanup DIMM unplug state"), where a way to fallback from a
> memory removal error was introduced. In this case, when QEMU detects that the
> kernel is reconfiguring LMBs DRCs that were marked as pending removal, the
> entire process is reverted from the QEMU side as well. Around the same time,
> other discussions in the QEMU mailing discussed an alternative for other device
> as well.
> 
> In [2] the idea of using RTAS set-indicator for this role was first introduced.
> The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is already
> UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else for both QEMU
> and phyp. This gives us an opportunity to use this behavior to signal the
> hypervisor layer when a device removal happens, allowing it to do a
> proper

Nit: it's not when a device removal happens, but when it *fails* to happen.

> error handling knowing for sure that the removal failed in the kernel. Using
> set-indicator to report HP errors isn't strange to PAPR, as per R1-13.5.3.4-4.
> of table 13.7 of [1]:

> "For all DR options: If this is a DR operation that involves the user insert-
> ing a DR entity, then if the firmware can determine that the inserted entity
> would cause a system disturbance, then the set-indicator RTAS call must not
> unisolate the entity and must return an error status which is unique to the
> particular error."
> 
> PAPR does not make any restrictions or considerations about setting an already
> Unisolated/Configured DRC to 'unisolate', meaning we have a chance to use it
> for this purpose - signal an OS side error when attempting to remove a DR
> entity.  To validate the design, this is being implemented only for CPUs.
> 
> QEMU will use this mechanism to rollback the device removal (hotunplug) state,
> allowing for a better error handling mechanism. A implementation of how QEMU
> can do it is in [3]. When using a kernel with this series applied, together
> with this QEMU build, this is what happens in a common CPU removal/hotunplug
> error scenario (trying to remove the last online CPU):
> 
> ( QEMU command line: qemu-system-ppc64 -machine pseries,accel=kvm,usb=off
> -smp 1,maxcpus=2,threads=1,cores=2,sockets=1 ... )
> 
> [root@localhost ~]# QEMU 5.2.92 monitor - type 'help' for more information
> (qemu) device_add host-spapr-cpu-core,core-id=1,id=core1
> (qemu) 
> 
> [root@localhost ~]# echo 0 > /sys/devices/system/cpu/cpu0/online
> [   77.548442][   T13] IRQ 19: no longer affine to CPU0
> [   77.548452][   T13] IRQ 20: no longer affine to CPU0
> [   77.548458][   T13] IRQ 256: no longer affine to CPU0
> [   77.548465][   T13] IRQ 258: no longer affine to CPU0
> [   77.548472][   T13] IRQ 259: no longer affine to CPU0
> [   77.548479][   T13] IRQ 260: no longer affine to CPU0
> [   77.548485][   T13] IRQ 261: no longer affine to CPU0
> [   77.548590][    T0] cpu 0 (hwid 0) Ready to die...
> [root@localhost ~]# (qemu) 
> (qemu) device_del core1
> (qemu) [   83.214073][  T100] pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9, rc: -16
> qemu-system-ppc64: Device hotunplug rejected by the guest for device core1
> 
> (qemu) 
> 
> As soon as the CPU removal fails in dlpar_cpu(), QEMU becames aware of
> it and is able to do error recovery.
> 
> If this solution is well received, I'll push for an architecture change
> request internally at IBM to make this mechanism PAPR official.
> 
> 
> [1] https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf
> [2] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06395.html
> [3] https://github.com/danielhb/qemu/tree/unisolate_drc_callback_v1
> 
> Daniel Henrique Barboza (2):
>   dlpar.c: introduce dlpar_unisolate_drc()
>   hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure
> 
>  arch/powerpc/platforms/pseries/dlpar.c       | 14 ++++++++++++++
>  arch/powerpc/platforms/pseries/hotplug-cpu.c |  9 ++++++++-
>  arch/powerpc/platforms/pseries/pseries.h     |  1 +
>  3 files changed, 23 insertions(+), 1 deletion(-)
>
Michael Ellerman April 21, 2021, 1:08 p.m. UTC | #2
On Fri, 16 Apr 2021 18:02:14 -0300, Daniel Henrique Barboza wrote:
> At this moment, PAPR [1] does not have a way to report errors during a device
> removal operation. This puts a strain in the hypervisor, which needs extra
> mechanisms to try to fallback and recover from an error that might have
> happened during the removal. The QEMU community has dealt with it during these
> years by either trying to preempt the error before sending the HP event or, in
> case of a guest side failure, reboot the guest to complete the removal process.
> 
> [...]

Applied to powerpc/next.

[1/2] dlpar.c: introduce dlpar_unisolate_drc()
      https://git.kernel.org/powerpc/c/0e3b3ff83ce24a7a01e467ca42e3e33e87195c0d
[2/2] hotplug-cpu.c: set UNISOLATE on dlpar_cpu_remove() failure
      https://git.kernel.org/powerpc/c/29c9a2699e71a7866a98ebdf6ea38135d31b4e1f

cheers