Message ID | 1479248275-18889-13-git-send-email-david@gibson.dropbear.id.au |
---|---|
State | New |
Headers | show |
On Wed, 16 Nov 2016 09:17:55 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > Migrating between different CPU versions is quite complicated for ppc. > A long time ago, we ensure identical CPU versions at either end by checking > the PVR had the same value. However, this breaks under KVM HV, because > we always have to use the host's PVR - it's not virtualized. That would > mean we couldn't migrate between hosts with different PVRs, even if the > CPUs are close enough to compatible in practice (sometimes identical cores > with different surrounding logic have different PVRs, so this happens in > practice quite often). > > So, we removed the PVR check, but instead checked that several flags > indicating supported instructions matched. This turns out to be a bad > idea, because those instruction masks are not architected information, but > essentially a TCG implementation detail. So changes to qemu internal CPU > modelling can break migration - this happened between qemu-2.6 and > qemu-2.7. > > Modern server-class CPUs can be placed into compatibility modes. Now that > we're handling those properly, we finally have the information to sanely > deal with CPU compatibility across migration. > > This patch bumps the migration version number for the ppc CPU removing the > instruction mask field (and some other unwise VMSTATE_EQUAL checks), and > adding the compatibility PVR to the migration stream. > Things have changed since you posted this RFC: commit 16a2497bd44cac1856e259654fd304079bd1dcdc Author: David Gibson <david@gibson.dropbear.id.au> Date: Mon Nov 21 16:28:12 2016 +1100 target-ppc: Fix CPU migration from qemu-2.6 <-> later versions and commit 146c11f16f12dbfea62cbd7f865614bb6fcbc6b5 Author: David Gibson <david@gibson.dropbear.id.au> Date: Mon Nov 21 16:29:30 2016 +1100 target-ppc: Allow eventual removal of old migration mistakes I guess that the version bumping isn't necessary anymore if we keep these. I'll assume yes and rebase this patch against current master, simply dropping the version bumping and related lines. > We consider the CPUs compatible for migration if: > * The source was running in a compatibility mode which the destination > supports > OR * The source has a PVR matching the same qemu CPU class as the > destination, either an exact match or an approximate match determined > by the cpu class's pvr_match hook. > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > --- > target-ppc/machine.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 79 insertions(+), 8 deletions(-) > > diff --git a/target-ppc/machine.c b/target-ppc/machine.c > index e43cb6c..25a30d5 100644 > --- a/target-ppc/machine.c > +++ b/target-ppc/machine.c > @@ -8,6 +8,7 @@ > #include "helper_regs.h" > #include "mmu-hash64.h" > #include "migration/cpu.h" > +#include "qapi/error.h" > > static int cpu_load_old(QEMUFile *f, void *opaque, int version_id) > { > @@ -163,6 +164,30 @@ static void cpu_pre_save(void *opaque) > } > } > > +/* > + * Determine if a given PVR is a "close enough" match to the CPU > + * object. For TCG and KVM PR it would probably be sufficient to > + * require an exact PVR match. However for KVM HV the user is > + * restricted to a PVR exactly matching the host CPU. The correct way > + * to handle this is to put the guest into an architected > + * compatibility mode. However, to allow a more forgiving transition > + * and migration from before this was widely done, we allow migration > + * between sufficiently similar PVRs, as determined by the CPU class's > + * pvr_match() hook. > + */ > +static bool pvr_match(PowerPCCPU *cpu, uint32_t pvr) > +{ > + PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu); > + > + if (pvr == pcc->pvr) { > + return true; > + } > + if (pcc->pvr_match) { > + return pcc->pvr_match(pcc, pvr); > + } > + return false; > +} > + > static int cpu_post_load(void *opaque, int version_id) > { > PowerPCCPU *cpu = opaque; > @@ -171,10 +196,31 @@ static int cpu_post_load(void *opaque, int version_id) > target_ulong msr; > > /* > - * We always ignore the source PVR. The user or management > - * software has to take care of running QEMU in a compatible mode. > + * If we're operating in compat mode, we should be ok as long as > + * the destination supports the same compatiblity mode. > + * > + * Otherwise, however, we require that the destination has exactly > + * the same CPU model as the source. > */ > - env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value; > + > +#if defined(TARGET_PPC64) > + if (cpu->compat_pvr) { > + Error *local_err = NULL; > + > + ppc_set_compat(cpu, cpu->compat_pvr, &local_err); This calls cpu_synchronize_state(CPU(cpu)) and trashes the registers. This is the root cause behind the program interrupts I mentioned in another mail. Adding a sync_needed boolean argument to ppc_set_compat() seems to be enough to get this working. So I'll just do that and rerun the tests. Cheers. -- Greg > + if (local_err) { > + error_report_err(local_err); > + error_free(local_err); > + return -1; > + } > + } else > +#endif > + { > + if (!pvr_match(cpu, env->spr[SPR_PVR])) { > + return -1; > + } > + } > + > env->lr = env->spr[SPR_LR]; > env->ctr = env->spr[SPR_CTR]; > cpu_write_xer(env, env->spr[SPR_XER]); > @@ -527,9 +573,32 @@ static const VMStateDescription vmstate_tlbmas = { > } > }; > > +static bool compat_needed(void *opaque) > +{ > + PowerPCCPU *cpu = opaque; > + > + return cpu->vhyp != NULL; > +} > + > +static const VMStateDescription vmstate_compat = { > + .name = "cpu/compat", > + .version_id = 1, > + .minimum_version_id = 1, > + .needed = compat_needed, > + .fields = (VMStateField[]) { > + VMSTATE_UINT32(compat_pvr, PowerPCCPU), > + VMSTATE_END_OF_LIST() > + } > +}; > + > +static bool version_before_6(void *opaque, int version_id) > +{ > + return version_id < 6; > +} > + > const VMStateDescription vmstate_ppc_cpu = { > .name = "cpu", > - .version_id = 5, > + .version_id = 6, > .minimum_version_id = 5, > .minimum_version_id_old = 4, > .load_state_old = cpu_load_old, > @@ -561,10 +630,11 @@ const VMStateDescription vmstate_ppc_cpu = { > /* FIXME: access_type? */ > > /* Sanity checking */ > - VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU), > - VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU), > - VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU), > - VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU), > + VMSTATE_UNUSED_TEST(version_before_6, > + sizeof(target_ulong) /* msr_mask */ > + + sizeof(uint64_t) /* insns_flags */ > + + sizeof(uint64_t) /* insns_flags2 */ > + + sizeof(uint32_t)), /* nb_BATs */ > VMSTATE_END_OF_LIST() > }, > .subsections = (const VMStateDescription*[]) { > @@ -579,6 +649,7 @@ const VMStateDescription vmstate_ppc_cpu = { > &vmstate_tlb6xx, > &vmstate_tlbemb, > &vmstate_tlbmas, > + &vmstate_compat, > NULL > } > };
On Fri, Dec 02, 2016 at 03:48:25PM +0100, Greg Kurz wrote: > On Wed, 16 Nov 2016 09:17:55 +1100 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > Migrating between different CPU versions is quite complicated for ppc. > > A long time ago, we ensure identical CPU versions at either end by checking > > the PVR had the same value. However, this breaks under KVM HV, because > > we always have to use the host's PVR - it's not virtualized. That would > > mean we couldn't migrate between hosts with different PVRs, even if the > > CPUs are close enough to compatible in practice (sometimes identical cores > > with different surrounding logic have different PVRs, so this happens in > > practice quite often). > > > > So, we removed the PVR check, but instead checked that several flags > > indicating supported instructions matched. This turns out to be a bad > > idea, because those instruction masks are not architected information, but > > essentially a TCG implementation detail. So changes to qemu internal CPU > > modelling can break migration - this happened between qemu-2.6 and > > qemu-2.7. > > > > Modern server-class CPUs can be placed into compatibility modes. Now that > > we're handling those properly, we finally have the information to sanely > > deal with CPU compatibility across migration. > > > > This patch bumps the migration version number for the ppc CPU removing the > > instruction mask field (and some other unwise VMSTATE_EQUAL checks), and > > adding the compatibility PVR to the migration stream. > > > > Things have changed since you posted this RFC: > > commit 16a2497bd44cac1856e259654fd304079bd1dcdc > Author: David Gibson <david@gibson.dropbear.id.au> > Date: Mon Nov 21 16:28:12 2016 +1100 > > target-ppc: Fix CPU migration from qemu-2.6 <-> later versions > > and > > commit 146c11f16f12dbfea62cbd7f865614bb6fcbc6b5 > Author: David Gibson <david@gibson.dropbear.id.au> > Date: Mon Nov 21 16:29:30 2016 +1100 > > target-ppc: Allow eventual removal of old migration mistakes > > I guess that the version bumping isn't necessary anymore if we keep these. > > I'll assume yes and rebase this patch against current master, simply dropping > the version bumping and related lines. Yeah, I realised that breaking backwards migration was a bad idea, and with some help from Dave Gilbert worked out how to make it possible. I realize I'm going to have to rework my compat series in light of these changes. > > We consider the CPUs compatible for migration if: > > * The source was running in a compatibility mode which the destination > > supports > > OR * The source has a PVR matching the same qemu CPU class as the > > destination, either an exact match or an approximate match determined > > by the cpu class's pvr_match hook. > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au> > > --- > > target-ppc/machine.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++----- > > 1 file changed, 79 insertions(+), 8 deletions(-) > > > > diff --git a/target-ppc/machine.c b/target-ppc/machine.c > > index e43cb6c..25a30d5 100644 > > --- a/target-ppc/machine.c > > +++ b/target-ppc/machine.c > > @@ -8,6 +8,7 @@ > > #include "helper_regs.h" > > #include "mmu-hash64.h" > > #include "migration/cpu.h" > > +#include "qapi/error.h" > > > > static int cpu_load_old(QEMUFile *f, void *opaque, int version_id) > > { > > @@ -163,6 +164,30 @@ static void cpu_pre_save(void *opaque) > > } > > } > > > > +/* > > + * Determine if a given PVR is a "close enough" match to the CPU > > + * object. For TCG and KVM PR it would probably be sufficient to > > + * require an exact PVR match. However for KVM HV the user is > > + * restricted to a PVR exactly matching the host CPU. The correct way > > + * to handle this is to put the guest into an architected > > + * compatibility mode. However, to allow a more forgiving transition > > + * and migration from before this was widely done, we allow migration > > + * between sufficiently similar PVRs, as determined by the CPU class's > > + * pvr_match() hook. > > + */ > > +static bool pvr_match(PowerPCCPU *cpu, uint32_t pvr) > > +{ > > + PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu); > > + > > + if (pvr == pcc->pvr) { > > + return true; > > + } > > + if (pcc->pvr_match) { > > + return pcc->pvr_match(pcc, pvr); > > + } > > + return false; > > +} > > + > > static int cpu_post_load(void *opaque, int version_id) > > { > > PowerPCCPU *cpu = opaque; > > @@ -171,10 +196,31 @@ static int cpu_post_load(void *opaque, int version_id) > > target_ulong msr; > > > > /* > > - * We always ignore the source PVR. The user or management > > - * software has to take care of running QEMU in a compatible mode. > > + * If we're operating in compat mode, we should be ok as long as > > + * the destination supports the same compatiblity mode. > > + * > > + * Otherwise, however, we require that the destination has exactly > > + * the same CPU model as the source. > > */ > > - env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value; > > + > > +#if defined(TARGET_PPC64) > > + if (cpu->compat_pvr) { > > + Error *local_err = NULL; > > + > > + ppc_set_compat(cpu, cpu->compat_pvr, &local_err); > > This calls cpu_synchronize_state(CPU(cpu)) and trashes the registers. This > is the root cause behind the program interrupts I mentioned in another mail. > > Adding a sync_needed boolean argument to ppc_set_compat() seems to be enough > to get this working. So I'll just do that and rerun the tests. > > Cheers. >
diff --git a/target-ppc/machine.c b/target-ppc/machine.c index e43cb6c..25a30d5 100644 --- a/target-ppc/machine.c +++ b/target-ppc/machine.c @@ -8,6 +8,7 @@ #include "helper_regs.h" #include "mmu-hash64.h" #include "migration/cpu.h" +#include "qapi/error.h" static int cpu_load_old(QEMUFile *f, void *opaque, int version_id) { @@ -163,6 +164,30 @@ static void cpu_pre_save(void *opaque) } } +/* + * Determine if a given PVR is a "close enough" match to the CPU + * object. For TCG and KVM PR it would probably be sufficient to + * require an exact PVR match. However for KVM HV the user is + * restricted to a PVR exactly matching the host CPU. The correct way + * to handle this is to put the guest into an architected + * compatibility mode. However, to allow a more forgiving transition + * and migration from before this was widely done, we allow migration + * between sufficiently similar PVRs, as determined by the CPU class's + * pvr_match() hook. + */ +static bool pvr_match(PowerPCCPU *cpu, uint32_t pvr) +{ + PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu); + + if (pvr == pcc->pvr) { + return true; + } + if (pcc->pvr_match) { + return pcc->pvr_match(pcc, pvr); + } + return false; +} + static int cpu_post_load(void *opaque, int version_id) { PowerPCCPU *cpu = opaque; @@ -171,10 +196,31 @@ static int cpu_post_load(void *opaque, int version_id) target_ulong msr; /* - * We always ignore the source PVR. The user or management - * software has to take care of running QEMU in a compatible mode. + * If we're operating in compat mode, we should be ok as long as + * the destination supports the same compatiblity mode. + * + * Otherwise, however, we require that the destination has exactly + * the same CPU model as the source. */ - env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value; + +#if defined(TARGET_PPC64) + if (cpu->compat_pvr) { + Error *local_err = NULL; + + ppc_set_compat(cpu, cpu->compat_pvr, &local_err); + if (local_err) { + error_report_err(local_err); + error_free(local_err); + return -1; + } + } else +#endif + { + if (!pvr_match(cpu, env->spr[SPR_PVR])) { + return -1; + } + } + env->lr = env->spr[SPR_LR]; env->ctr = env->spr[SPR_CTR]; cpu_write_xer(env, env->spr[SPR_XER]); @@ -527,9 +573,32 @@ static const VMStateDescription vmstate_tlbmas = { } }; +static bool compat_needed(void *opaque) +{ + PowerPCCPU *cpu = opaque; + + return cpu->vhyp != NULL; +} + +static const VMStateDescription vmstate_compat = { + .name = "cpu/compat", + .version_id = 1, + .minimum_version_id = 1, + .needed = compat_needed, + .fields = (VMStateField[]) { + VMSTATE_UINT32(compat_pvr, PowerPCCPU), + VMSTATE_END_OF_LIST() + } +}; + +static bool version_before_6(void *opaque, int version_id) +{ + return version_id < 6; +} + const VMStateDescription vmstate_ppc_cpu = { .name = "cpu", - .version_id = 5, + .version_id = 6, .minimum_version_id = 5, .minimum_version_id_old = 4, .load_state_old = cpu_load_old, @@ -561,10 +630,11 @@ const VMStateDescription vmstate_ppc_cpu = { /* FIXME: access_type? */ /* Sanity checking */ - VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU), - VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU), - VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU), - VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU), + VMSTATE_UNUSED_TEST(version_before_6, + sizeof(target_ulong) /* msr_mask */ + + sizeof(uint64_t) /* insns_flags */ + + sizeof(uint64_t) /* insns_flags2 */ + + sizeof(uint32_t)), /* nb_BATs */ VMSTATE_END_OF_LIST() }, .subsections = (const VMStateDescription*[]) { @@ -579,6 +649,7 @@ const VMStateDescription vmstate_ppc_cpu = { &vmstate_tlb6xx, &vmstate_tlbemb, &vmstate_tlbmas, + &vmstate_compat, NULL } };
Migrating between different CPU versions is quite complicated for ppc. A long time ago, we ensure identical CPU versions at either end by checking the PVR had the same value. However, this breaks under KVM HV, because we always have to use the host's PVR - it's not virtualized. That would mean we couldn't migrate between hosts with different PVRs, even if the CPUs are close enough to compatible in practice (sometimes identical cores with different surrounding logic have different PVRs, so this happens in practice quite often). So, we removed the PVR check, but instead checked that several flags indicating supported instructions matched. This turns out to be a bad idea, because those instruction masks are not architected information, but essentially a TCG implementation detail. So changes to qemu internal CPU modelling can break migration - this happened between qemu-2.6 and qemu-2.7. Modern server-class CPUs can be placed into compatibility modes. Now that we're handling those properly, we finally have the information to sanely deal with CPU compatibility across migration. This patch bumps the migration version number for the ppc CPU removing the instruction mask field (and some other unwise VMSTATE_EQUAL checks), and adding the compatibility PVR to the migration stream. We consider the CPUs compatible for migration if: * The source was running in a compatibility mode which the destination supports OR * The source has a PVR matching the same qemu CPU class as the destination, either an exact match or an approximate match determined by the cpu class's pvr_match hook. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> --- target-ppc/machine.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 79 insertions(+), 8 deletions(-)