[0/3] spapr: Fix device unplug vs CAS or migration

Message ID 158076936422.2118610.5626450767672103134.stgit@bahia.lan

Greg Kurz Feb. 3, 2020, 10:36 p.m. UTC
While working on getting rid of CAS reboot, I realized that we currently
don't handle device hot unplug properly in the following situations:

1) if the device is unplugged between boot and CAS, SLOF doesn't handle
   the event, which is a known limitation. The device hence stays around
   until some other event is emitted and the guest eventually completes
   the unplug, or until a reboot. Until we can teach SLOF to correctly
   process the full FDT at CAS, we should trigger a CAS reboot, like we
   already do for hotplug.

2) if the guest is migrated after the event was emitted but before the
   guest could process it, the destination is unaware of the pending
   unplug operation and doesn't remove the device when the guest
   releases it. The 'unplug_requested' field of the DRC is actually state
   that should be migrated.
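
To make the two cases concrete, here is a minimal standalone sketch. It is
a toy model, not the QEMU code: ToyDrc and the toy_* helpers are
hypothetical names invented for illustration, and the one-byte "wire"
format only stands in for the migration stream. Only the idea of an
'unplug_requested' flag comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a DRC (hypothetical; NOT QEMU's SpaprDrc). */
typedef struct ToyDrc {
    bool has_device;        /* a device is attached to this connector */
    bool unplug_requested;  /* unplug event emitted, guest has not released */
} ToyDrc;

/*
 * Case 1: at CAS, check whether any connector still has an unplug
 * pending. If so, the caller would fall back to a CAS reboot.
 */
static bool toy_cas_needs_reboot(const ToyDrc *drcs, size_t n)
{
    assert(drcs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (drcs[i].has_device && drcs[i].unplug_requested) {
            return true;
        }
    }
    return false;
}

/*
 * Case 2: save the connector state into one byte. When
 * migrate_unplug_requested is false, the pending unplug is dropped
 * from the stream, which models the bug described above.
 */
static uint8_t toy_drc_save(const ToyDrc *drc, bool migrate_unplug_requested)
{
    uint8_t wire = drc->has_device ? 0x1 : 0x0;

    if (migrate_unplug_requested && drc->unplug_requested) {
        wire |= 0x2;
    }
    return wire;
}

/* Rebuild the connector state on the destination. */
static ToyDrc toy_drc_load(uint8_t wire)
{
    ToyDrc drc = {
        .has_device = (wire & 0x1) != 0,
        .unplug_requested = (wire & 0x2) != 0,
    };
    return drc;
}

/*
 * The guest releases the connector. The device is only removed if the
 * host side still knows that an unplug was requested.
 */
static bool toy_drc_release(ToyDrc *drc)
{
    if (!drc->unplug_requested) {
        return false;   /* nothing pending: the device stays around */
    }
    drc->has_device = false;
    drc->unplug_requested = false;
    return true;
}
```

In this model, loading the result of toy_drc_save(&drc, false) for a
connector with a pending unplug gives a destination where
toy_drc_release() returns false and the device stays attached. With
toy_drc_save(&drc, true), the release removes the device.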

--
Greg

---

Greg Kurz (3):
      spapr: Don't use spapr_drc_needed() in CAS code
      spapr: Detect hot unplugged devices during CAS
      spapr: Migrate SpaprDrc::unplug_requested


 hw/ppc/spapr_drc.c         |   30 ++++++++++++++++++++++++++----
 hw/ppc/spapr_hcall.c       |   12 +++++++++---
 include/hw/ppc/spapr_drc.h |    8 +++++++-
 3 files changed, 42 insertions(+), 8 deletions(-)

Comments

Greg Kurz Feb. 13, 2020, 3:10 p.m. UTC | #1
Ping ?

This series fixes actual bugs. Also, I have another patch on top of
that to cold plug (or remove) devices pending hot plug (or unplug)
before CAS, hence removing the need for CAS reboot in these cases.
This requires SLOF to correctly parse the FDT it gets at CAS. Patches
have been sent for that too:

https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=689ff6f6554d94fdab854bf4fc4ec85e2675e43d
https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=a093be1ebe7a48321646601d94be6cf735c81e12
https://patchwork.ozlabs.org/patch/1235817/

David Gibson Feb. 14, 2020, 2:29 a.m. UTC | #2
On Thu, Feb 13, 2020 at 04:10:55PM +0100, Greg Kurz wrote:
> Ping ?
> 
> This series fixes actual bugs. Also, I have another patch on top of
> that to cold plug (or remove) devices pending hot plug (or unplug)
> before CAS, hence removing the need for CAS reboot in these cases.
> This requires SLOF to correctly parse the FDT it gets at CAS. Patches
> have been sent for that too:
> 
> https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=689ff6f6554d94fdab854bf4fc4ec85e2675e43d
> https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=a093be1ebe7a48321646601d94be6cf735c81e12
> https://patchwork.ozlabs.org/patch/1235817/

Yeah, sorry, I've been having a bit of trouble getting my head around
the cases here.  I've sent a comment now.
