diff mbox

[1/1] spapr: Ensure CPU cores are added contiguously and removed in LIFO order

Message ID 1468392620-25599-1-git-send-email-bharata@linux.vnet.ibm.com
State New
Headers show

Commit Message

Bharata B Rao July 13, 2016, 6:50 a.m. UTC
If CPU core addition or removal is allowed in random order leading to
holes in the core id range (and hence in the cpu_index range), migration
can fail as migration with holes in cpu_index range isn't yet handled
correctly.

Prevent this situation by enforcing the addition in contiguous order
and removal in LIFO order so that we never end up with holes in
cpu_index range.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
While there is work in progress to support migration when there are holes
in cpu_index range resulting from out-of-order plug or unplug, this patch
is intended as a last resort if no easy, risk-free and elegant solution
emerges before 2.7 dev cycle ends.

 hw/ppc/spapr_cpu_core.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Comments

Igor Mammedov July 13, 2016, 7:42 a.m. UTC | #1
On Wed, 13 Jul 2016 12:20:20 +0530
Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:

> If CPU core addition or removal is allowed in random order leading to
> holes in the core id range (and hence in the cpu_index range), migration
> can fail as migration with holes in cpu_index range isn't yet handled
> correctly.
> 
> Prevent this situation by enforcing the addition in contiguous order
> and removal in LIFO order so that we never end up with holes in
> cpu_index range.
Adding this limitation looks better than adding migration_id as
it will allow libvirt to use the current -numa cpus=... while
doing the same amount of guess work as it does now.

Similar patch for x86 won't be so simple as cpu-add can add cpus
with gaps (breaking migration at that), so I'd need to keep it
that way with some compat code, but that shouldn't be issue.


> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
> While there is work in progress to support migration when there are holes
> in cpu_index range resulting from out-of-order plug or unplug, this patch
> is intended as a last resort if no easy, risk-free and elegant solution
> emerges before 2.7 dev cycle ends.
> 
>  hw/ppc/spapr_cpu_core.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index bc52b3c..4bfc96b 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -126,12 +126,23 @@ static void spapr_core_release(DeviceState *dev, void *opaque)
>  void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                         Error **errp)
>  {
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
>      CPUCore *cc = CPU_CORE(dev);
>      sPAPRDRConnector *drc =
>          spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, cc->core_id);
>      sPAPRDRConnectorClass *drck;
>      Error *local_err = NULL;
> +    int smt = kvmppc_smt_threads();
> +    int index = cc->core_id / smt;
> +    int spapr_max_cores = max_cpus / smp_threads;
> +    int i;
>  
> +    for (i = spapr_max_cores - 1; i > index; i--) {
> +        if (spapr->cores[i]) {
> +            error_setg(errp, "core-id %d should be removed first", i * smt);
> +            return;
> +        }
> +    }
>      g_assert(drc);
>  
>      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> @@ -214,7 +225,7 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(OBJECT(hotplug_dev));
>      sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
>      int spapr_max_cores = max_cpus / smp_threads;
> -    int index;
> +    int index, i;
>      int smt = kvmppc_smt_threads();
>      Error *local_err = NULL;
>      CPUCore *cc = CPU_CORE(dev);
> @@ -252,6 +263,14 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          goto out;
>      }
>  
> +    for (i = 0; i < index; i++) {
> +        if (!spapr->cores[i]) {
> +            error_setg(&local_err, "core-id %d should be added first",
> +                       i * smt);
> +            goto out;
> +        }
> +    }
> +
>  out:
>      g_free(base_core_type);
>      error_propagate(errp, local_err);
Igor Mammedov July 13, 2016, 8:19 a.m. UTC | #2
On Wed, 13 Jul 2016 09:42:54 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 13 Jul 2016 12:20:20 +0530
> Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> 
> > If CPU core addition or removal is allowed in random order leading to
> > holes in the core id range (and hence in the cpu_index range), migration
> > can fail as migration with holes in cpu_index range isn't yet handled
> > correctly.
> > 
> > Prevent this situation by enforcing the addition in contiguous order
> > and removal in LIFO order so that we never end up with holes in
> > cpu_index range.  
> Adding this limitation looks better than adding migration_id as
> it will allow libvirt to use the current -numa cpus=... while
> doing the same amount of guess work as it does now.
> 
> Similar patch for x86 won't be so simple as cpu-add can add cpus
> with gaps (breaking migration at that), so I'd need to keep it
> that way with some compat code, but that shouldn't be issue.
CCing Peter for libvirt's opinion on this turn of events

> 
> 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> > While there is work in progress to support migration when there are holes
> > in cpu_index range resulting from out-of-order plug or unplug, this patch
> > is intended as a last resort if no easy, risk-free and elegant solution
> > emerges before 2.7 dev cycle ends.
> > 
> >  hw/ppc/spapr_cpu_core.c | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > index bc52b3c..4bfc96b 100644
> > --- a/hw/ppc/spapr_cpu_core.c
> > +++ b/hw/ppc/spapr_cpu_core.c
> > @@ -126,12 +126,23 @@ static void spapr_core_release(DeviceState *dev, void *opaque)
> >  void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                         Error **errp)
> >  {
> > +    sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> >      CPUCore *cc = CPU_CORE(dev);
> >      sPAPRDRConnector *drc =
> >          spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, cc->core_id);
> >      sPAPRDRConnectorClass *drck;
> >      Error *local_err = NULL;
> > +    int smt = kvmppc_smt_threads();
> > +    int index = cc->core_id / smt;
> > +    int spapr_max_cores = max_cpus / smp_threads;
> > +    int i;
> >  
> > +    for (i = spapr_max_cores - 1; i > index; i--) {
> > +        if (spapr->cores[i]) {
> > +            error_setg(errp, "core-id %d should be removed first", i * smt);
> > +            return;
> > +        }
> > +    }
> >      g_assert(drc);
> >  
> >      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > @@ -214,7 +225,7 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(OBJECT(hotplug_dev));
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> >      int spapr_max_cores = max_cpus / smp_threads;
> > -    int index;
> > +    int index, i;
> >      int smt = kvmppc_smt_threads();
> >      Error *local_err = NULL;
> >      CPUCore *cc = CPU_CORE(dev);
> > @@ -252,6 +263,14 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >          goto out;
> >      }
> >  
> > +    for (i = 0; i < index; i++) {
> > +        if (!spapr->cores[i]) {
> > +            error_setg(&local_err, "core-id %d should be added first",
> > +                       i * smt);
> > +            goto out;
> > +        }
> > +    }
> > +
> >  out:
> >      g_free(base_core_type);
> >      error_propagate(errp, local_err);  
> 
>
Igor Mammedov July 13, 2016, 8:23 a.m. UTC | #3
On Wed, 13 Jul 2016 10:19:00 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 13 Jul 2016 09:42:54 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Wed, 13 Jul 2016 12:20:20 +0530
> > Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> >   
> > > If CPU core addition or removal is allowed in random order leading to
> > > holes in the core id range (and hence in the cpu_index range), migration
> > > can fail as migration with holes in cpu_index range isn't yet handled
> > > correctly.
> > > 
> > > Prevent this situation by enforcing the addition in contiguous order
> > > and removal in LIFO order so that we never end up with holes in
> > > cpu_index range.    
> > Adding this limitation looks better than adding migration_id as
> > it will allow libvirt to use the current -numa cpus=... while
> > doing the same amount of guess work as it does now.
> > 
> > Similar patch for x86 won't be so simple as cpu-add can add cpus
> > with gaps (breaking migration at that), so I'd need to keep it
> > that way with some compat code, but that shouldn't be issue.  
> CCing Peter for libvirt's opinion on this turn of events
didn't actually CCed him :/

> 
> > 
> >   
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > ---
> > > While there is work in progress to support migration when there are holes
> > > in cpu_index range resulting from out-of-order plug or unplug, this patch
> > > is intended as a last resort if no easy, risk-free and elegant solution
> > > emerges before 2.7 dev cycle ends.
> > > 
> > >  hw/ppc/spapr_cpu_core.c | 21 ++++++++++++++++++++-
> > >  1 file changed, 20 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > > index bc52b3c..4bfc96b 100644
> > > --- a/hw/ppc/spapr_cpu_core.c
> > > +++ b/hw/ppc/spapr_cpu_core.c
> > > @@ -126,12 +126,23 @@ static void spapr_core_release(DeviceState *dev, void *opaque)
> > >  void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                         Error **errp)
> > >  {
> > > +    sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> > >      CPUCore *cc = CPU_CORE(dev);
> > >      sPAPRDRConnector *drc =
> > >          spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, cc->core_id);
> > >      sPAPRDRConnectorClass *drck;
> > >      Error *local_err = NULL;
> > > +    int smt = kvmppc_smt_threads();
> > > +    int index = cc->core_id / smt;
> > > +    int spapr_max_cores = max_cpus / smp_threads;
> > > +    int i;
> > >  
> > > +    for (i = spapr_max_cores - 1; i > index; i--) {
> > > +        if (spapr->cores[i]) {
> > > +            error_setg(errp, "core-id %d should be removed first", i * smt);
> > > +            return;
> > > +        }
> > > +    }
> > >      g_assert(drc);
> > >  
> > >      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > @@ -214,7 +225,7 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(OBJECT(hotplug_dev));
> > >      sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> > >      int spapr_max_cores = max_cpus / smp_threads;
> > > -    int index;
> > > +    int index, i;
> > >      int smt = kvmppc_smt_threads();
> > >      Error *local_err = NULL;
> > >      CPUCore *cc = CPU_CORE(dev);
> > > @@ -252,6 +263,14 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >          goto out;
> > >      }
> > >  
> > > +    for (i = 0; i < index; i++) {
> > > +        if (!spapr->cores[i]) {
> > > +            error_setg(&local_err, "core-id %d should be added first",
> > > +                       i * smt);
> > > +            goto out;
> > > +        }
> > > +    }
> > > +
> > >  out:
> > >      g_free(base_core_type);
> > >      error_propagate(errp, local_err);    
> > 
> >   
> 
>
Greg Kurz July 13, 2016, 9:20 a.m. UTC | #4
On Wed, 13 Jul 2016 09:42:54 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 13 Jul 2016 12:20:20 +0530
> Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> 
> > If CPU core addition or removal is allowed in random order leading to
> > holes in the core id range (and hence in the cpu_index range), migration
> > can fail as migration with holes in cpu_index range isn't yet handled
> > correctly.
> > 
> > Prevent this situation by enforcing the addition in contiguous order
> > and removal in LIFO order so that we never end up with holes in
> > cpu_index range.  
> Adding this limitation looks better than adding migration_id as
> it will allow libvirt to use the current -numa cpus=... while
> doing the same amount of guess work as it does now.
> 
> Similar patch for x86 won't be so simple as cpu-add can add cpus
> with gaps (breaking migration at that), so I'd need to keep it
> that way with some compat code, but that shouldn't be issue.
> 

And does libvirt take care not to add cpus with gaps ? If so, maybe it
can do the same with PPC, and we'd simply warn the user that migration
is broken if we introduce gaps. Makes sense ?

> 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> > While there is work in progress to support migration when there are holes
> > in cpu_index range resulting from out-of-order plug or unplug, this patch
> > is intended as a last resort if no easy, risk-free and elegant solution
> > emerges before 2.7 dev cycle ends.
> > 
> >  hw/ppc/spapr_cpu_core.c | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > index bc52b3c..4bfc96b 100644
> > --- a/hw/ppc/spapr_cpu_core.c
> > +++ b/hw/ppc/spapr_cpu_core.c
> > @@ -126,12 +126,23 @@ static void spapr_core_release(DeviceState *dev, void *opaque)
> >  void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                         Error **errp)
> >  {
> > +    sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> >      CPUCore *cc = CPU_CORE(dev);
> >      sPAPRDRConnector *drc =
> >          spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, cc->core_id);
> >      sPAPRDRConnectorClass *drck;
> >      Error *local_err = NULL;
> > +    int smt = kvmppc_smt_threads();
> > +    int index = cc->core_id / smt;
> > +    int spapr_max_cores = max_cpus / smp_threads;
> > +    int i;
> >  
> > +    for (i = spapr_max_cores - 1; i > index; i--) {
> > +        if (spapr->cores[i]) {
> > +            error_setg(errp, "core-id %d should be removed first", i * smt);
> > +            return;
> > +        }
> > +    }
> >      g_assert(drc);
> >  
> >      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > @@ -214,7 +225,7 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(OBJECT(hotplug_dev));
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> >      int spapr_max_cores = max_cpus / smp_threads;
> > -    int index;
> > +    int index, i;
> >      int smt = kvmppc_smt_threads();
> >      Error *local_err = NULL;
> >      CPUCore *cc = CPU_CORE(dev);
> > @@ -252,6 +263,14 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >          goto out;
> >      }
> >  
> > +    for (i = 0; i < index; i++) {
> > +        if (!spapr->cores[i]) {
> > +            error_setg(&local_err, "core-id %d should be added first",
> > +                       i * smt);
> > +            goto out;
> > +        }
> > +    }
> > +
> >  out:
> >      g_free(base_core_type);
> >      error_propagate(errp, local_err);  
> 
>
David Gibson July 14, 2016, 12:51 a.m. UTC | #5
On Wed, Jul 13, 2016 at 12:20:20PM +0530, Bharata B Rao wrote:
> If CPU core addition or removal is allowed in random order leading to
> holes in the core id range (and hence in the cpu_index range), migration
> can fail as migration with holes in cpu_index range isn't yet handled
> correctly.
> 
> Prevent this situation by enforcing the addition in contiguous order
> and removal in LIFO order so that we never end up with holes in
> cpu_index range.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
> While there is work in progress to support migration when there are holes
> in cpu_index range resulting from out-of-order plug or unplug, this patch
> is intended as a last resort if no easy, risk-free and elegant solution
> emerges before 2.7 dev cycle ends.

Applied to ppc-for-2.7.  We can revert it once the problems with
cpu_index are sorted out.

> 
>  hw/ppc/spapr_cpu_core.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index bc52b3c..4bfc96b 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -126,12 +126,23 @@ static void spapr_core_release(DeviceState *dev, void *opaque)
>  void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                         Error **errp)
>  {
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
>      CPUCore *cc = CPU_CORE(dev);
>      sPAPRDRConnector *drc =
>          spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, cc->core_id);
>      sPAPRDRConnectorClass *drck;
>      Error *local_err = NULL;
> +    int smt = kvmppc_smt_threads();
> +    int index = cc->core_id / smt;
> +    int spapr_max_cores = max_cpus / smp_threads;
> +    int i;
>  
> +    for (i = spapr_max_cores - 1; i > index; i--) {
> +        if (spapr->cores[i]) {
> +            error_setg(errp, "core-id %d should be removed first", i * smt);
> +            return;
> +        }
> +    }
>      g_assert(drc);
>  
>      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> @@ -214,7 +225,7 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(OBJECT(hotplug_dev));
>      sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
>      int spapr_max_cores = max_cpus / smp_threads;
> -    int index;
> +    int index, i;
>      int smt = kvmppc_smt_threads();
>      Error *local_err = NULL;
>      CPUCore *cc = CPU_CORE(dev);
> @@ -252,6 +263,14 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          goto out;
>      }
>  
> +    for (i = 0; i < index; i++) {
> +        if (!spapr->cores[i]) {
> +            error_setg(&local_err, "core-id %d should be added first",
> +                       i * smt);
> +            goto out;
> +        }
> +    }
> +
>  out:
>      g_free(base_core_type);
>      error_propagate(errp, local_err);
Igor Mammedov July 14, 2016, 8:27 a.m. UTC | #6
On Thu, 14 Jul 2016 10:51:27 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Jul 13, 2016 at 12:20:20PM +0530, Bharata B Rao wrote:
> > If CPU core addition or removal is allowed in random order leading to
> > holes in the core id range (and hence in the cpu_index range), migration
> > can fail as migration with holes in cpu_index range isn't yet handled
> > correctly.
> > 
> > Prevent this situation by enforcing the addition in contiguous order
> > and removal in LIFO order so that we never end up with holes in
> > cpu_index range.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> > While there is work in progress to support migration when there are holes
> > in cpu_index range resulting from out-of-order plug or unplug, this patch
> > is intended as a last resort if no easy, risk-free and elegant solution
> > emerges before 2.7 dev cycle ends.  
> 
> Applied to ppc-for-2.7.  We can revert it once the problems with
> cpu_index are sorted out.
You'd need to add machine type specific compat option here,
so that new-qemu -M 2.7 wouldn't allow out of order too and
could be migrated to old-qemu -M 2.7

> 
> > 
> >  hw/ppc/spapr_cpu_core.c | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > index bc52b3c..4bfc96b 100644
> > --- a/hw/ppc/spapr_cpu_core.c
> > +++ b/hw/ppc/spapr_cpu_core.c
> > @@ -126,12 +126,23 @@ static void spapr_core_release(DeviceState *dev, void *opaque)
> >  void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                         Error **errp)
> >  {
> > +    sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> >      CPUCore *cc = CPU_CORE(dev);
> >      sPAPRDRConnector *drc =
> >          spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, cc->core_id);
> >      sPAPRDRConnectorClass *drck;
> >      Error *local_err = NULL;
> > +    int smt = kvmppc_smt_threads();
> > +    int index = cc->core_id / smt;
> > +    int spapr_max_cores = max_cpus / smp_threads;
> > +    int i;
> >  
> > +    for (i = spapr_max_cores - 1; i > index; i--) {
> > +        if (spapr->cores[i]) {
> > +            error_setg(errp, "core-id %d should be removed first", i * smt);
> > +            return;
> > +        }
> > +    }
> >      g_assert(drc);
> >  
> >      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > @@ -214,7 +225,7 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(OBJECT(hotplug_dev));
> >      sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> >      int spapr_max_cores = max_cpus / smp_threads;
> > -    int index;
> > +    int index, i;
> >      int smt = kvmppc_smt_threads();
> >      Error *local_err = NULL;
> >      CPUCore *cc = CPU_CORE(dev);
> > @@ -252,6 +263,14 @@ void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >          goto out;
> >      }
> >  
> > +    for (i = 0; i < index; i++) {
> > +        if (!spapr->cores[i]) {
> > +            error_setg(&local_err, "core-id %d should be added first",
> > +                       i * smt);
> > +            goto out;
> > +        }
> > +    }
> > +
> >  out:
> >      g_free(base_core_type);
> >      error_propagate(errp, local_err);  
>
David Gibson July 15, 2016, 5:29 a.m. UTC | #7
On Thu, Jul 14, 2016 at 10:27:15AM +0200, Igor Mammedov wrote:
> On Thu, 14 Jul 2016 10:51:27 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Jul 13, 2016 at 12:20:20PM +0530, Bharata B Rao wrote:
> > > If CPU core addition or removal is allowed in random order leading to
> > > holes in the core id range (and hence in the cpu_index range), migration
> > > can fail as migration with holes in cpu_index range isn't yet handled
> > > correctly.
> > > 
> > > Prevent this situation by enforcing the addition in contiguous order
> > > and removal in LIFO order so that we never end up with holes in
> > > cpu_index range.
> > > 
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > ---
> > > While there is work in progress to support migration when there are holes
> > > in cpu_index range resulting from out-of-order plug or unplug, this patch
> > > is intended as a last resort if no easy, risk-free and elegant solution
> > > emerges before 2.7 dev cycle ends.  
> > 
> > Applied to ppc-for-2.7.  We can revert it once the problems with
> > cpu_index are sorted out.
> You'd need to add machine type specific compat option here,
> so that new-qemu -M 2.7 wouldn't allow out of order too and
> could be migrated to old-qemu -M 2.7

Hmm, do we care about migration from newer back to older versions of
qemu upstream?  If so, then I guess we do need this option.  Though
strictly we don't need it until we actually do allow any-order
hotplug.
Bharata B Rao July 15, 2016, 5:35 a.m. UTC | #8
On Fri, Jul 15, 2016 at 03:29:01PM +1000, David Gibson wrote:
> On Thu, Jul 14, 2016 at 10:27:15AM +0200, Igor Mammedov wrote:
> > On Thu, 14 Jul 2016 10:51:27 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> > > On Wed, Jul 13, 2016 at 12:20:20PM +0530, Bharata B Rao wrote:
> > > > If CPU core addition or removal is allowed in random order leading to
> > > > holes in the core id range (and hence in the cpu_index range), migration
> > > > can fail as migration with holes in cpu_index range isn't yet handled
> > > > correctly.
> > > > 
> > > > Prevent this situation by enforcing the addition in contiguous order
> > > > and removal in LIFO order so that we never end up with holes in
> > > > cpu_index range.
> > > > 
> > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > ---
> > > > While there is work in progress to support migration when there are holes
> > > > in cpu_index range resulting from out-of-order plug or unplug, this patch
> > > > is intended as a last resort if no easy, risk-free and elegant solution
> > > > emerges before 2.7 dev cycle ends.  
> > > 
> > > Applied to ppc-for-2.7.  We can revert it once the problems with
> > > cpu_index are sorted out.
> > You'd need to add machine type specific compat option here,
> > so that new-qemu -M 2.7 wouldn't allow out of order too and
> > could be migrated to old-qemu -M 2.7
> 
> Hmm, do we care about migration from newer back to older versions of
> qemu upstream?  If so, then I guess we do need this option.  Though
> strictly we don't need it until we actually do allow any-order
> hotplug.

Right. I too thought that when we relax this restriction say in 2.8, we could
have a sPAPRMachineClass option to allow out-of-order hotplug for 2.8 upwards
and disable it for 2.7 downwards. With this, 2.7 guest started with new-qemu
can be migrated to old-qemu.

Regards,
Bharata.
Igor Mammedov July 15, 2016, 9:15 a.m. UTC | #9
On Fri, 15 Jul 2016 15:29:01 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Jul 14, 2016 at 10:27:15AM +0200, Igor Mammedov wrote:
> > On Thu, 14 Jul 2016 10:51:27 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Jul 13, 2016 at 12:20:20PM +0530, Bharata B Rao wrote:  
> > > > If CPU core addition or removal is allowed in random order leading to
> > > > holes in the core id range (and hence in the cpu_index range), migration
> > > > can fail as migration with holes in cpu_index range isn't yet handled
> > > > correctly.
> > > > 
> > > > Prevent this situation by enforcing the addition in contiguous order
> > > > and removal in LIFO order so that we never end up with holes in
> > > > cpu_index range.
> > > > 
> > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > ---
> > > > While there is work in progress to support migration when there are holes
> > > > in cpu_index range resulting from out-of-order plug or unplug, this patch
> > > > is intended as a last resort if no easy, risk-free and elegant solution
> > > > emerges before 2.7 dev cycle ends.    
> > > 
> > > Applied to ppc-for-2.7.  We can revert it once the problems with
> > > cpu_index are sorted out.  
> > You'd need to add machine type specific compat option here,
> > so that new-qemu -M 2.7 wouldn't allow out of order too and
> > could be migrated to old-qemu -M 2.7  
> 
> Hmm, do we care about migration from newer back to older versions of
> qemu upstream?
upstream we don't, though it would reduce maintenance head-ache.
(if isn't hard then why not do it upstream either)
diff mbox

Patch

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index bc52b3c..4bfc96b 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -126,12 +126,23 @@  static void spapr_core_release(DeviceState *dev, void *opaque)
 void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
                        Error **errp)
 {
+    sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
     CPUCore *cc = CPU_CORE(dev);
     sPAPRDRConnector *drc =
         spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, cc->core_id);
     sPAPRDRConnectorClass *drck;
     Error *local_err = NULL;
+    int smt = kvmppc_smt_threads();
+    int index = cc->core_id / smt;
+    int spapr_max_cores = max_cpus / smp_threads;
+    int i;
 
+    for (i = spapr_max_cores - 1; i > index; i--) {
+        if (spapr->cores[i]) {
+            error_setg(errp, "core-id %d should be removed first", i * smt);
+            return;
+        }
+    }
     g_assert(drc);
 
     drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
@@ -214,7 +225,7 @@  void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(OBJECT(hotplug_dev));
     sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
     int spapr_max_cores = max_cpus / smp_threads;
-    int index;
+    int index, i;
     int smt = kvmppc_smt_threads();
     Error *local_err = NULL;
     CPUCore *cc = CPU_CORE(dev);
@@ -252,6 +263,14 @@  void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out;
     }
 
+    for (i = 0; i < index; i++) {
+        if (!spapr->cores[i]) {
+            error_setg(&local_err, "core-id %d should be added first",
+                       i * smt);
+            goto out;
+        }
+    }
+
 out:
     g_free(base_core_type);
     error_propagate(errp, local_err);