diff mbox

[RFCv2,12/12] ppc: Rework CPU compatibility testing across migration

Message ID 1479248275-18889-13-git-send-email-david@gibson.dropbear.id.au
State New
Headers show

Commit Message

David Gibson Nov. 15, 2016, 10:17 p.m. UTC
Migrating between different CPU versions is quite complicated for ppc.
A long time ago, we ensure identical CPU versions at either end by checking
the PVR had the same value.  However, this breaks under KVM HV, because
we always have to use the host's PVR - it's not virtualized.  That would
mean we couldn't migrate between hosts with different PVRs, even if the
CPUs are close enough to compatible in practice (sometimes identical cores
with different surrounding logic have different PVRs, so this happens in
practice quite often).

So, we removed the PVR check, but instead checked that several flags
indicating supported instructions matched.  This turns out to be a bad
idea, because those instruction masks are not architected information, but
essentially a TCG implementation detail.  So changes to qemu internal CPU
modelling can break migration - this happened between qemu-2.6 and
qemu-2.7.

Modern server-class CPUs can be placed into compatibility modes.  Now that
we're handling those properly, we finally have the information to sanely
deal with CPU compatibility across migration.

This patch bumps the migration version number for the ppc CPU removing the
instruction mask field (and some other unwise VMSTATE_EQUAL checks), and
adding the compatibility PVR to the migration stream.

We consider the CPUs compatible for migration if:
    * The source was running in a compatibility mode which the destination
      supports
OR  * The source has a PVR matching the same qemu CPU class as the
      destination, either an exact match or an approximate match determined
      by the cpu class's pvr_match hook.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 target-ppc/machine.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 8 deletions(-)

Comments

Greg Kurz Dec. 2, 2016, 2:48 p.m. UTC | #1
On Wed, 16 Nov 2016 09:17:55 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> Migrating between different CPU versions is quite complicated for ppc.
> A long time ago, we ensure identical CPU versions at either end by checking
> the PVR had the same value.  However, this breaks under KVM HV, because
> we always have to use the host's PVR - it's not virtualized.  That would
> mean we couldn't migrate between hosts with different PVRs, even if the
> CPUs are close enough to compatible in practice (sometimes identical cores
> with different surrounding logic have different PVRs, so this happens in
> practice quite often).
> 
> So, we removed the PVR check, but instead checked that several flags
> indicating supported instructions matched.  This turns out to be a bad
> idea, because those instruction masks are not architected information, but
> essentially a TCG implementation detail.  So changes to qemu internal CPU
> modelling can break migration - this happened between qemu-2.6 and
> qemu-2.7.
> 
> Modern server-class CPUs can be placed into compatibility modes.  Now that
> we're handling those properly, we finally have the information to sanely
> deal with CPU compatibility across migration.
> 
> This patch bumps the migration version number for the ppc CPU removing the
> instruction mask field (and some other unwise VMSTATE_EQUAL checks), and
> adding the compatibility PVR to the migration stream.
> 

Things have changed since you posted this RFC:

commit 16a2497bd44cac1856e259654fd304079bd1dcdc
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Mon Nov 21 16:28:12 2016 +1100

    target-ppc: Fix CPU migration from qemu-2.6 <-> later versions

and

commit 146c11f16f12dbfea62cbd7f865614bb6fcbc6b5
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Mon Nov 21 16:29:30 2016 +1100

    target-ppc: Allow eventual removal of old migration mistakes

I guess that the version bumping isn't necessary anymore if we keep these.

I'll assume yes and rebase this patch against current master, simply dropping
the version bumping and related lines.

> We consider the CPUs compatible for migration if:
>     * The source was running in a compatibility mode which the destination
>       supports
> OR  * The source has a PVR matching the same qemu CPU class as the
>       destination, either an exact match or an approximate match determined
>       by the cpu class's pvr_match hook.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  target-ppc/machine.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 79 insertions(+), 8 deletions(-)
> 
> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> index e43cb6c..25a30d5 100644
> --- a/target-ppc/machine.c
> +++ b/target-ppc/machine.c
> @@ -8,6 +8,7 @@
>  #include "helper_regs.h"
>  #include "mmu-hash64.h"
>  #include "migration/cpu.h"
> +#include "qapi/error.h"
>  
>  static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
>  {
> @@ -163,6 +164,30 @@ static void cpu_pre_save(void *opaque)
>      }
>  }
>  
> +/*
> + * Determine if a given PVR is a "close enough" match to the CPU
> + * object.  For TCG and KVM PR it would probably be sufficient to
> + * require an exact PVR match.  However for KVM HV the user is
> + * restricted to a PVR exactly matching the host CPU.  The correct way
> + * to handle this is to put the guest into an architected
> + * compatibility mode.  However, to allow a more forgiving transition
> + * and migration from before this was widely done, we allow migration
> + * between sufficiently similar PVRs, as determined by the CPU class's
> + * pvr_match() hook.
> + */
> +static bool pvr_match(PowerPCCPU *cpu, uint32_t pvr)
> +{
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +
> +    if (pvr == pcc->pvr) {
> +        return true;
> +    }
> +    if (pcc->pvr_match) {
> +        return pcc->pvr_match(pcc, pvr);
> +    }
> +    return false;
> +}
> +
>  static int cpu_post_load(void *opaque, int version_id)
>  {
>      PowerPCCPU *cpu = opaque;
> @@ -171,10 +196,31 @@ static int cpu_post_load(void *opaque, int version_id)
>      target_ulong msr;
>  
>      /*
> -     * We always ignore the source PVR. The user or management
> -     * software has to take care of running QEMU in a compatible mode.
> +     * If we're operating in compat mode, we should be ok as long as
> +     * the destination supports the same compatiblity mode.
> +     *
> +     * Otherwise, however, we require that the destination has exactly
> +     * the same CPU model as the source.
>       */
> -    env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
> +
> +#if defined(TARGET_PPC64)
> +    if (cpu->compat_pvr) {
> +        Error *local_err = NULL;
> +
> +        ppc_set_compat(cpu, cpu->compat_pvr, &local_err);

This calls cpu_synchronize_state(CPU(cpu)) and trashes the registers. This
is the root cause behind the program interrupts I mentioned in another mail.

Adding a sync_needed boolean argument to ppc_set_compat() seems to be enough
to get this working. So I'll just do that and rerun the tests.

Cheers.

--
Greg

> +        if (local_err) {
> +            error_report_err(local_err);
> +            error_free(local_err);
> +            return -1;
> +        }
> +    } else
> +#endif
> +    {
> +        if (!pvr_match(cpu, env->spr[SPR_PVR])) {
> +            return -1;
> +        }
> +    }
> +
>      env->lr = env->spr[SPR_LR];
>      env->ctr = env->spr[SPR_CTR];
>      cpu_write_xer(env, env->spr[SPR_XER]);
> @@ -527,9 +573,32 @@ static const VMStateDescription vmstate_tlbmas = {
>      }
>  };
>  
> +static bool compat_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +
> +    return cpu->vhyp != NULL;
> +}
> +
> +static const VMStateDescription vmstate_compat = {
> +    .name = "cpu/compat",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = compat_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(compat_pvr, PowerPCCPU),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static bool version_before_6(void *opaque, int version_id)
> +{
> +    return version_id < 6;
> +}
> +
>  const VMStateDescription vmstate_ppc_cpu = {
>      .name = "cpu",
> -    .version_id = 5,
> +    .version_id = 6,
>      .minimum_version_id = 5,
>      .minimum_version_id_old = 4,
>      .load_state_old = cpu_load_old,
> @@ -561,10 +630,11 @@ const VMStateDescription vmstate_ppc_cpu = {
>          /* FIXME: access_type? */
>  
>          /* Sanity checking */
> -        VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
> -        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
> -        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
> -        VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
> +        VMSTATE_UNUSED_TEST(version_before_6,
> +                            sizeof(target_ulong) /* msr_mask */
> +                            + sizeof(uint64_t) /* insns_flags */
> +                            + sizeof(uint64_t) /* insns_flags2 */
> +                            + sizeof(uint32_t)), /* nb_BATs */
>          VMSTATE_END_OF_LIST()
>      },
>      .subsections = (const VMStateDescription*[]) {
> @@ -579,6 +649,7 @@ const VMStateDescription vmstate_ppc_cpu = {
>          &vmstate_tlb6xx,
>          &vmstate_tlbemb,
>          &vmstate_tlbmas,
> +        &vmstate_compat,
>          NULL
>      }
>  };
David Gibson Dec. 5, 2016, 4:09 a.m. UTC | #2
On Fri, Dec 02, 2016 at 03:48:25PM +0100, Greg Kurz wrote:
> On Wed, 16 Nov 2016 09:17:55 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Migrating between different CPU versions is quite complicated for ppc.
> > A long time ago, we ensure identical CPU versions at either end by checking
> > the PVR had the same value.  However, this breaks under KVM HV, because
> > we always have to use the host's PVR - it's not virtualized.  That would
> > mean we couldn't migrate between hosts with different PVRs, even if the
> > CPUs are close enough to compatible in practice (sometimes identical cores
> > with different surrounding logic have different PVRs, so this happens in
> > practice quite often).
> > 
> > So, we removed the PVR check, but instead checked that several flags
> > indicating supported instructions matched.  This turns out to be a bad
> > idea, because those instruction masks are not architected information, but
> > essentially a TCG implementation detail.  So changes to qemu internal CPU
> > modelling can break migration - this happened between qemu-2.6 and
> > qemu-2.7.
> > 
> > Modern server-class CPUs can be placed into compatibility modes.  Now that
> > we're handling those properly, we finally have the information to sanely
> > deal with CPU compatibility across migration.
> > 
> > This patch bumps the migration version number for the ppc CPU removing the
> > instruction mask field (and some other unwise VMSTATE_EQUAL checks), and
> > adding the compatibility PVR to the migration stream.
> > 
> 
> Things have changed since you posted this RFC:
> 
> commit 16a2497bd44cac1856e259654fd304079bd1dcdc
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Mon Nov 21 16:28:12 2016 +1100
> 
>     target-ppc: Fix CPU migration from qemu-2.6 <-> later versions
> 
> and
> 
> commit 146c11f16f12dbfea62cbd7f865614bb6fcbc6b5
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Mon Nov 21 16:29:30 2016 +1100
> 
>     target-ppc: Allow eventual removal of old migration mistakes
> 
> I guess that the version bumping isn't necessary anymore if we keep these.
> 
> I'll assume yes and rebase this patch against current master, simply dropping
> the version bumping and related lines.

Yeah, I realised that breaking backwards migration was a bad idea, and
with some help from Dave Gilbert worked out how to make it possible.

I realize I'm going to have to rework my compat series in light of
these changes.

> > We consider the CPUs compatible for migration if:
> >     * The source was running in a compatibility mode which the destination
> >       supports
> > OR  * The source has a PVR matching the same qemu CPU class as the
> >       destination, either an exact match or an approximate match determined
> >       by the cpu class's pvr_match hook.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  target-ppc/machine.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 79 insertions(+), 8 deletions(-)
> > 
> > diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> > index e43cb6c..25a30d5 100644
> > --- a/target-ppc/machine.c
> > +++ b/target-ppc/machine.c
> > @@ -8,6 +8,7 @@
> >  #include "helper_regs.h"
> >  #include "mmu-hash64.h"
> >  #include "migration/cpu.h"
> > +#include "qapi/error.h"
> >  
> >  static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
> >  {
> > @@ -163,6 +164,30 @@ static void cpu_pre_save(void *opaque)
> >      }
> >  }
> >  
> > +/*
> > + * Determine if a given PVR is a "close enough" match to the CPU
> > + * object.  For TCG and KVM PR it would probably be sufficient to
> > + * require an exact PVR match.  However for KVM HV the user is
> > + * restricted to a PVR exactly matching the host CPU.  The correct way
> > + * to handle this is to put the guest into an architected
> > + * compatibility mode.  However, to allow a more forgiving transition
> > + * and migration from before this was widely done, we allow migration
> > + * between sufficiently similar PVRs, as determined by the CPU class's
> > + * pvr_match() hook.
> > + */
> > +static bool pvr_match(PowerPCCPU *cpu, uint32_t pvr)
> > +{
> > +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> > +
> > +    if (pvr == pcc->pvr) {
> > +        return true;
> > +    }
> > +    if (pcc->pvr_match) {
> > +        return pcc->pvr_match(pcc, pvr);
> > +    }
> > +    return false;
> > +}
> > +
> >  static int cpu_post_load(void *opaque, int version_id)
> >  {
> >      PowerPCCPU *cpu = opaque;
> > @@ -171,10 +196,31 @@ static int cpu_post_load(void *opaque, int version_id)
> >      target_ulong msr;
> >  
> >      /*
> > -     * We always ignore the source PVR. The user or management
> > -     * software has to take care of running QEMU in a compatible mode.
> > +     * If we're operating in compat mode, we should be ok as long as
> > +     * the destination supports the same compatiblity mode.
> > +     *
> > +     * Otherwise, however, we require that the destination has exactly
> > +     * the same CPU model as the source.
> >       */
> > -    env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
> > +
> > +#if defined(TARGET_PPC64)
> > +    if (cpu->compat_pvr) {
> > +        Error *local_err = NULL;
> > +
> > +        ppc_set_compat(cpu, cpu->compat_pvr, &local_err);
> 
> This calls cpu_synchronize_state(CPU(cpu)) and trashes the registers. This
> is the root cause behind the program interrupts I mentioned in another mail.
> 
> Adding a sync_needed boolean argument to ppc_set_compat() seems to be enough
> to get this working. So I'll just do that and rerun the tests.
> 
> Cheers.
>
diff mbox

Patch

diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index e43cb6c..25a30d5 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -8,6 +8,7 @@ 
 #include "helper_regs.h"
 #include "mmu-hash64.h"
 #include "migration/cpu.h"
+#include "qapi/error.h"
 
 static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
 {
@@ -163,6 +164,30 @@  static void cpu_pre_save(void *opaque)
     }
 }
 
+/*
+ * Determine if a given PVR is a "close enough" match to the CPU
+ * object.  For TCG and KVM PR it would probably be sufficient to
+ * require an exact PVR match.  However for KVM HV the user is
+ * restricted to a PVR exactly matching the host CPU.  The correct way
+ * to handle this is to put the guest into an architected
+ * compatibility mode.  However, to allow a more forgiving transition
+ * and migration from before this was widely done, we allow migration
+ * between sufficiently similar PVRs, as determined by the CPU class's
+ * pvr_match() hook.
+ */
+static bool pvr_match(PowerPCCPU *cpu, uint32_t pvr)
+{
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+
+    if (pvr == pcc->pvr) {
+        return true;
+    }
+    if (pcc->pvr_match) {
+        return pcc->pvr_match(pcc, pvr);
+    }
+    return false;
+}
+
 static int cpu_post_load(void *opaque, int version_id)
 {
     PowerPCCPU *cpu = opaque;
@@ -171,10 +196,31 @@  static int cpu_post_load(void *opaque, int version_id)
     target_ulong msr;
 
     /*
-     * We always ignore the source PVR. The user or management
-     * software has to take care of running QEMU in a compatible mode.
+     * If we're operating in compat mode, we should be ok as long as
+     * the destination supports the same compatiblity mode.
+     *
+     * Otherwise, however, we require that the destination has exactly
+     * the same CPU model as the source.
      */
-    env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
+
+#if defined(TARGET_PPC64)
+    if (cpu->compat_pvr) {
+        Error *local_err = NULL;
+
+        ppc_set_compat(cpu, cpu->compat_pvr, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            error_free(local_err);
+            return -1;
+        }
+    } else
+#endif
+    {
+        if (!pvr_match(cpu, env->spr[SPR_PVR])) {
+            return -1;
+        }
+    }
+
     env->lr = env->spr[SPR_LR];
     env->ctr = env->spr[SPR_CTR];
     cpu_write_xer(env, env->spr[SPR_XER]);
@@ -527,9 +573,32 @@  static const VMStateDescription vmstate_tlbmas = {
     }
 };
 
+static bool compat_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+
+    return cpu->vhyp != NULL;
+}
+
+static const VMStateDescription vmstate_compat = {
+    .name = "cpu/compat",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = compat_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(compat_pvr, PowerPCCPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static bool version_before_6(void *opaque, int version_id)
+{
+    return version_id < 6;
+}
+
 const VMStateDescription vmstate_ppc_cpu = {
     .name = "cpu",
-    .version_id = 5,
+    .version_id = 6,
     .minimum_version_id = 5,
     .minimum_version_id_old = 4,
     .load_state_old = cpu_load_old,
@@ -561,10 +630,11 @@  const VMStateDescription vmstate_ppc_cpu = {
         /* FIXME: access_type? */
 
         /* Sanity checking */
-        VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
-        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
-        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
-        VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
+        VMSTATE_UNUSED_TEST(version_before_6,
+                            sizeof(target_ulong) /* msr_mask */
+                            + sizeof(uint64_t) /* insns_flags */
+                            + sizeof(uint64_t) /* insns_flags2 */
+                            + sizeof(uint32_t)), /* nb_BATs */
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription*[]) {
@@ -579,6 +649,7 @@  const VMStateDescription vmstate_ppc_cpu = {
         &vmstate_tlb6xx,
         &vmstate_tlbemb,
         &vmstate_tlbmas,
+        &vmstate_compat,
         NULL
     }
 };