Message ID | 157185826035.3073024.1664101000438499392.stgit@bahia.lan |
---|---|
State | New |
Headers | show |
Series | spapr: Don't request to unplug the same core twice | expand |
On Wed, Oct 23, 2019 at 09:17:40PM +0200, Greg Kurz wrote: > We must not call spapr_drc_detach() on a detached DRC otherwise bad things > can happen, ie. QEMU hangs or crashes. This is easily demonstrated with > a CPU hotplug/unplug loop using QMP. > > Signed-off-by: Greg Kurz <groug@kaod.org> Ouch, good catch. Applied. I wonder if we have the same problem with other DRC types. > --- > hw/ppc/spapr.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index f9410d390a07..94f9d27096af 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev, > spapr_vcpu_id(spapr, cc->core_id)); > g_assert(drc); > > - spapr_drc_detach(drc); > - > - spapr_hotplug_req_remove_by_index(drc); > + if (!spapr_drc_unplug_requested(drc)) { > + spapr_drc_detach(drc); > + spapr_hotplug_req_remove_by_index(drc); > + } > } > > int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr, >
On Thu, 24 Oct 2019 09:38:17 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > On Wed, Oct 23, 2019 at 09:17:40PM +0200, Greg Kurz wrote: > > We must not call spapr_drc_detach() on a detached DRC otherwise bad things > > can happen, ie. QEMU hangs or crashes. This is easily demonstrated with > > a CPU hotplug/unplug loop using QMP. > > > > Signed-off-by: Greg Kurz <groug@kaod.org> > > Ouch, good catch. Applied. > > I wonder if we have the same problem with other DRC types. > We don't have it with PHB and PCI types, through the same use of spapr_drc_unplug_requested(). LMBs see to avoid it by failing device_del early if an unplug request is already in progress: /* * An existing pending dimm state for this DIMM means that there is an * unplug operation in progress, waiting for the spapr_lmb_release * callback to complete the job (BQL can't cover that far). In this case, * bail out to avoid detaching DRCs that were already released. */ if (spapr_pending_dimm_unplugs_find(spapr, dimm)) { error_setg(&local_err, "Memory unplug already in progress for device %s", dev->id); goto out; } Not sure why we error out in this case instead of ignoring the unplug request. > > --- > > hw/ppc/spapr.c | 7 ++++--- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index f9410d390a07..94f9d27096af 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev, > > spapr_vcpu_id(spapr, cc->core_id)); > > g_assert(drc); > > > > - spapr_drc_detach(drc); > > - > > - spapr_hotplug_req_remove_by_index(drc); > > + if (!spapr_drc_unplug_requested(drc)) { > > + spapr_drc_detach(drc); > > + spapr_hotplug_req_remove_by_index(drc); > > + } > > } > > > > int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr, > > >
On Thu, Oct 24, 2019 at 08:28:54AM +0200, Greg Kurz wrote: > On Thu, 24 Oct 2019 09:38:17 +1100 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > On Wed, Oct 23, 2019 at 09:17:40PM +0200, Greg Kurz wrote: > > > We must not call spapr_drc_detach() on a detached DRC otherwise bad things > > > can happen, ie. QEMU hangs or crashes. This is easily demonstrated with > > > a CPU hotplug/unplug loop using QMP. > > > > > > Signed-off-by: Greg Kurz <groug@kaod.org> > > > > Ouch, good catch. Applied. > > > > I wonder if we have the same problem with other DRC types. > > > > We don't have it with PHB and PCI types, through the same use of > spapr_drc_unplug_requested(). > > LMBs see to avoid it by failing device_del early if an unplug > request is already in progress: > > /* > * An existing pending dimm state for this DIMM means that there is an > * unplug operation in progress, waiting for the spapr_lmb_release > * callback to complete the job (BQL can't cover that far). In this case, > * bail out to avoid detaching DRCs that were already released. > */ > if (spapr_pending_dimm_unplugs_find(spapr, dimm)) { > error_setg(&local_err, > "Memory unplug already in progress for device %s", > dev->id); > goto out; > } > > Not sure why we error out in this case instead of ignoring the unplug > request. I suspect no particularly good reason, just history. Everything's a bit different with LMBs because a single device_del will usually remove a whole bunch of LMB DRCs. In general the interfacing between the qemu user side DIMM handling and the PAPR side LMB/DRC handling is... pretty clunky. > > > > --- > > > hw/ppc/spapr.c | 7 ++++--- > > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > index f9410d390a07..94f9d27096af 100644 > > > --- a/hw/ppc/spapr.c > > > +++ b/hw/ppc/spapr.c > > > @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev, > > > spapr_vcpu_id(spapr, cc->core_id)); > > > g_assert(drc); > > > > > > - spapr_drc_detach(drc); > > > - > > > - spapr_hotplug_req_remove_by_index(drc); > > > + if (!spapr_drc_unplug_requested(drc)) { > > > + spapr_drc_detach(drc); > > > + spapr_hotplug_req_remove_by_index(drc); > > > + } > > > } > > > > > > int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr, > > > > > >
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index f9410d390a07..94f9d27096af 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev, spapr_vcpu_id(spapr, cc->core_id)); g_assert(drc); - spapr_drc_detach(drc); - - spapr_hotplug_req_remove_by_index(drc); + if (!spapr_drc_unplug_requested(drc)) { + spapr_drc_detach(drc); + spapr_hotplug_req_remove_by_index(drc); + } } int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
We must not call spapr_drc_detach() on a detached DRC otherwise bad things can happen, ie. QEMU hangs or crashes. This is easily demonstrated with a CPU hotplug/unplug loop using QMP. Signed-off-by: Greg Kurz <groug@kaod.org> --- hw/ppc/spapr.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)