diff mbox series

spapr: Don't request to unplug the same core twice

Message ID 157185826035.3073024.1664101000438499392.stgit@bahia.lan
State New
Headers show
Series spapr: Don't request to unplug the same core twice | expand

Commit Message

Greg Kurz Oct. 23, 2019, 7:17 p.m. UTC
We must not call spapr_drc_detach() on a detached DRC otherwise bad things
can happen, ie. QEMU hangs or crashes. This is easily demonstrated with
a CPU hotplug/unplug loop using QMP.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

David Gibson Oct. 23, 2019, 10:38 p.m. UTC | #1
On Wed, Oct 23, 2019 at 09:17:40PM +0200, Greg Kurz wrote:
> We must not call spapr_drc_detach() on a detached DRC otherwise bad things
> can happen, ie. QEMU hangs or crashes. This is easily demonstrated with
> a CPU hotplug/unplug loop using QMP.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Ouch, good catch.  Applied.

I wonder if we have the same problem with other DRC types.

> ---
>  hw/ppc/spapr.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f9410d390a07..94f9d27096af 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
>                            spapr_vcpu_id(spapr, cc->core_id));
>      g_assert(drc);
>  
> -    spapr_drc_detach(drc);
> -
> -    spapr_hotplug_req_remove_by_index(drc);
> +    if (!spapr_drc_unplug_requested(drc)) {
> +        spapr_drc_detach(drc);
> +        spapr_hotplug_req_remove_by_index(drc);
> +    }
>  }
>  
>  int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
>
Greg Kurz Oct. 24, 2019, 6:28 a.m. UTC | #2
On Thu, 24 Oct 2019 09:38:17 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Oct 23, 2019 at 09:17:40PM +0200, Greg Kurz wrote:
> > We must not call spapr_drc_detach() on a detached DRC otherwise bad things
> > can happen, ie. QEMU hangs or crashes. This is easily demonstrated with
> > a CPU hotplug/unplug loop using QMP.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> 
> Ouch, good catch.  Applied.
> 
> I wonder if we have the same problem with other DRC types.
> 

We don't have it with PHB and PCI types, through the same use of
spapr_drc_unplug_requested().

LMBs see to avoid it by failing device_del early if an unplug
request is already in progress:

    /*
     * An existing pending dimm state for this DIMM means that there is an
     * unplug operation in progress, waiting for the spapr_lmb_release
     * callback to complete the job (BQL can't cover that far). In this case,
     * bail out to avoid detaching DRCs that were already released.
     */
    if (spapr_pending_dimm_unplugs_find(spapr, dimm)) {
        error_setg(&local_err,
                   "Memory unplug already in progress for device %s",
                   dev->id);
        goto out;
    }

Not sure why we error out in this case instead of ignoring the unplug 
request.

> > ---
> >  hw/ppc/spapr.c |    7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index f9410d390a07..94f9d27096af 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                            spapr_vcpu_id(spapr, cc->core_id));
> >      g_assert(drc);
> >  
> > -    spapr_drc_detach(drc);
> > -
> > -    spapr_hotplug_req_remove_by_index(drc);
> > +    if (!spapr_drc_unplug_requested(drc)) {
> > +        spapr_drc_detach(drc);
> > +        spapr_hotplug_req_remove_by_index(drc);
> > +    }
> >  }
> >  
> >  int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > 
>
David Gibson Oct. 28, 2019, 10:20 p.m. UTC | #3
On Thu, Oct 24, 2019 at 08:28:54AM +0200, Greg Kurz wrote:
> On Thu, 24 Oct 2019 09:38:17 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Oct 23, 2019 at 09:17:40PM +0200, Greg Kurz wrote:
> > > We must not call spapr_drc_detach() on a detached DRC otherwise bad things
> > > can happen, ie. QEMU hangs or crashes. This is easily demonstrated with
> > > a CPU hotplug/unplug loop using QMP.
> > > 
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > 
> > Ouch, good catch.  Applied.
> > 
> > I wonder if we have the same problem with other DRC types.
> > 
> 
> We don't have it with PHB and PCI types, through the same use of
> spapr_drc_unplug_requested().
> 
> LMBs see to avoid it by failing device_del early if an unplug
> request is already in progress:
> 
>     /*
>      * An existing pending dimm state for this DIMM means that there is an
>      * unplug operation in progress, waiting for the spapr_lmb_release
>      * callback to complete the job (BQL can't cover that far). In this case,
>      * bail out to avoid detaching DRCs that were already released.
>      */
>     if (spapr_pending_dimm_unplugs_find(spapr, dimm)) {
>         error_setg(&local_err,
>                    "Memory unplug already in progress for device %s",
>                    dev->id);
>         goto out;
>     }
> 
> Not sure why we error out in this case instead of ignoring the unplug 
> request.

I suspect no particularly good reason, just history.  Everything's a
bit  different with LMBs because a single device_del will usually
remove a whole bunch of LMB DRCs.  In general the interfacing between
the qemu user side DIMM handling and the PAPR side LMB/DRC handling
is... pretty clunky.

> 
> > > ---
> > >  hw/ppc/spapr.c |    7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index f9410d390a07..94f9d27096af 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                            spapr_vcpu_id(spapr, cc->core_id));
> > >      g_assert(drc);
> > >  
> > > -    spapr_drc_detach(drc);
> > > -
> > > -    spapr_hotplug_req_remove_by_index(drc);
> > > +    if (!spapr_drc_unplug_requested(drc)) {
> > > +        spapr_drc_detach(drc);
> > > +        spapr_hotplug_req_remove_by_index(drc);
> > > +    }
> > >  }
> > >  
> > >  int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > > 
> > 
>
diff mbox series

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f9410d390a07..94f9d27096af 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3741,9 +3741,10 @@  void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
                           spapr_vcpu_id(spapr, cc->core_id));
     g_assert(drc);
 
-    spapr_drc_detach(drc);
-
-    spapr_hotplug_req_remove_by_index(drc);
+    if (!spapr_drc_unplug_requested(drc)) {
+        spapr_drc_detach(drc);
+        spapr_hotplug_req_remove_by_index(drc);
+    }
 }
 
 int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,