diff mbox

pseries-2.6 migration from QEMU-2.6 to QEMU-2.7 broken

Message ID 87d1jw5mr0.fsf@abhimanyu.i-did-not-set--mail-host-address--so-tickle-me
State New
Headers show

Commit Message

Nikunj A Dadhania Sept. 22, 2016, 9:04 a.m. UTC
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Thu, 2016-09-22 at 11:45 +0530, Bharata B Rao wrote:
>> On Thu, Sep 22, 2016 at 04:07:21PM +1000, Benjamin Herrenschmidt wrote:
>> > 
>> > On Thu, 2016-09-22 at 10:51 +0530, Bharata B Rao wrote:
>> > > 
>> > > The flag values are expected to remain same for a machine version for
>> > > the migration to succeed, but this expectation is broken now. Should
>> > > we make the addition of these flags conditional on machine type
>> > > version ?
>> > > But these flags are part of POWER8 CPU definition which is common for
>> > > both pseries and upcoming powernv.
>> > 
>> > Does this affect KVM ? (And if yes why on earth would KVM give a flying
>> > f*** about the TCG instruction flags ?) ... If not, then I think we can
>> > safely not care.
>> 
>> Yes, KVM migration is broken.
>
> Argh then ... stupid design in QEMU. We can't fix anything without
> breaking migration, yay !

Looking back in the history of the code:

commit: a90db1584a00dc1d1439dc7729d99674b666b85e (target-ppc: Convert
ppc cpu savevm to VMStateDescription) added this:

+        /* Sanity checking */
+        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
+        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),

These flags weren't part of vmstate, I am not sure what was the reason
behind adding it though. Its a bit old, Alexey do you remember?

> I don't know what to do to fix that to be honest. Do we have a way to filter
> what flags actually matter and filter things out when KVM is enabled ?

Something like this works for KVM:


TCG migration still remains broken with this.

Regards,
Nikunj

Comments

Benjamin Herrenschmidt Sept. 22, 2016, 10:04 a.m. UTC | #1
On Thu, 2016-09-22 at 14:34 +0530, Nikunj A Dadhania wrote:
> Something like this works for KVM:

> 

> diff --git a/target-ppc/machine.c b/target-ppc/machine.c

> index 4820f22..1cf3779 100644

> --- a/target-ppc/machine.c

> +++ b/target-ppc/machine.c

> @@ -563,8 +563,8 @@ const VMStateDescription vmstate_ppc_cpu = {

>  

>          /* Sanity checking */

>          VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),

> -        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),

> -        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),

> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags) */

> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags2) */

>          VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),

>          VMSTATE_END_OF_LIST()

>      },

> 

> TCG migration still remains broken with this.


Can we have conditionally present flags and a post-load that does some
matching ?

Cheers,
Ben.
Paolo Bonzini Sept. 22, 2016, 10:32 a.m. UTC | #2
On 22/09/2016 12:04, Benjamin Herrenschmidt wrote:
> On Thu, 2016-09-22 at 14:34 +0530, Nikunj A Dadhania wrote:
>> Something like this works for KVM:
>>
>> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
>> index 4820f22..1cf3779 100644
>> --- a/target-ppc/machine.c
>> +++ b/target-ppc/machine.c
>> @@ -563,8 +563,8 @@ const VMStateDescription vmstate_ppc_cpu = {
>>  
>>          /* Sanity checking */
>>          VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
>> -        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
>> -        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
>> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags) */
>> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags2) */
>>          VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
>>          VMSTATE_END_OF_LIST()
>>      },
>>
>> TCG migration still remains broken with this.
> 
> Can we have conditionally present flags and a post-load that does some
> matching ?

Yes, you can use something like

	VMSTATE_UINT64(env.src_insns_flags, PowerPCCPU),
	VMSTATE_UINT64(env.src_insns_flags2, PowerPCCPU),

and a post_load that compares them if kvm_enabled() only.

*However* a better fix would be to preserve the old flags for
pseries-2.6, and only set the newer flags for pseries-2.7.  I'm not
saying you have to do this, but if it's not hard (no idea) why not learn
how to do it right.

The design is not stupid, it's just that compatibility is harder than
you think and you are going through the same learning experiences that
x86 went though.

Paolo
Alexey Kardashevskiy Sept. 22, 2016, 10:34 a.m. UTC | #3
On 22/09/16 19:04, Nikunj A Dadhania wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
>> On Thu, 2016-09-22 at 11:45 +0530, Bharata B Rao wrote:
>>> On Thu, Sep 22, 2016 at 04:07:21PM +1000, Benjamin Herrenschmidt wrote:
>>>>
>>>> On Thu, 2016-09-22 at 10:51 +0530, Bharata B Rao wrote:
>>>>>
>>>>> The flag values are expected to remain same for a machine version for
>>>>> the migration to succeed, but this expectation is broken now. Should
>>>>> we make the addition of these flags conditional on machine type
>>>>> version ?
>>>>> But these flags are part of POWER8 CPU definition which is common for
>>>>> both pseries and upcoming powernv.
>>>>
>>>> Does this affect KVM ? (And if yes why on earth would KVM give a flying
>>>> f*** about the TCG instruction flags ?) ... If not, then I think we can
>>>> safely not care.
>>>
>>> Yes, KVM migration is broken.
>>
>> Argh then ... stupid design in QEMU. We can't fix anything without
>> breaking migration, yay !
> 
> Looking back in the history of the code:
> 
> commit: a90db1584a00dc1d1439dc7729d99674b666b85e (target-ppc: Convert
> ppc cpu savevm to VMStateDescription) added this:
> 
> +        /* Sanity checking */
> +        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
> +        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
> 
> These flags weren't part of vmstate, I am not sure what was the reason
> behind adding it though. Its a bit old, Alexey do you remember?


These flags say what QEMU can and cannot emulate, when we migrate, we want
to make sure that the QEMU machine remains the same.

_Today_ I would not do that and rather have added CPU flags to ensure
compatibility but those days VMSTATE_xxx_EQUAL() were not considered bad
idea yet :)



>> I don't know what to do to fix that to be honest. Do we have a way to filter
>> what flags actually matter and filter things out when KVM is enabled ?


imho we should migrate them (i.e. without _EQUAL) and let cpu_post_load()
fail if something incompatible arrived.


> 
> Something like this works for KVM:
> 
> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> index 4820f22..1cf3779 100644
> --- a/target-ppc/machine.c
> +++ b/target-ppc/machine.c
> @@ -563,8 +563,8 @@ const VMStateDescription vmstate_ppc_cpu = {
>  
>          /* Sanity checking */
>          VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
> -        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
> -        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags) */
> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags2) */
>          VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
>          VMSTATE_END_OF_LIST()
>      },
> 
> TCG migration still remains broken with this.
Benjamin Herrenschmidt Sept. 22, 2016, 11:07 a.m. UTC | #4
On Thu, 2016-09-22 at 12:32 +0200, Paolo Bonzini wrote:
> *However* a better fix would be to preserve the old flags for
> pseries-2.6, and only set the newer flags for pseries-2.7.  I'm not
> saying you have to do this, but if it's not hard (no idea) why not learn
> how to do it right.
> 
> The design is not stupid, it's just that compatibility is harder than
> you think and you are going through the same learning experiences that
> x86 went though.

Yeah well, the design is stupid inside target-ppc is what I meant in
the sense that it should have been clearer that those flags should not
have affected KVM, especially knowing that TCG still needed a lot of
work to add all the proper HV stuff.

Also most/all those flags concern instructions that are not relevant to
the "PAPR" mode which is running the guest with HV disabled, so
additionally, we might want to consider being smarter in the compare as
well to make sure that only the flags relevant to guest mode are
compared when the vCPU is in PAPR mode.

Cheers,
Ben.
Benjamin Herrenschmidt Sept. 22, 2016, 11:09 a.m. UTC | #5
On Thu, 2016-09-22 at 20:34 +1000, Alexey Kardashevskiy wrote:
> > diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> > index 4820f22..1cf3779 100644
> > --- a/target-ppc/machine.c
> > +++ b/target-ppc/machine.c
> > @@ -563,8 +563,8 @@ const VMStateDescription vmstate_ppc_cpu = {
> >  
> >          /* Sanity checking */
> >          VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
> > -        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
> > -        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
> > +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was
> _EQUAL(env.insns_flags) */
> > +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was
> _EQUAL(env.insns_flags2) */
> >          VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
> >          VMSTATE_END_OF_LIST()
> >      },
> > 
> > TCG migration still remains broken with this.

TCG migration doesn't matter much ... yet, I think.

KVM is what actual customers use, we can probably live with some TCG
migration breakage. Hopefully we'll be done with P8 soon and it will be
stable enough, and we'll be more careful with P9.

Cheers,
Ben.
David Gibson Sept. 23, 2016, 12:52 a.m. UTC | #6
On Thu, Sep 22, 2016 at 02:34:19PM +0530, Nikunj A Dadhania wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> > On Thu, 2016-09-22 at 11:45 +0530, Bharata B Rao wrote:
> >> On Thu, Sep 22, 2016 at 04:07:21PM +1000, Benjamin Herrenschmidt wrote:
> >> > 
> >> > On Thu, 2016-09-22 at 10:51 +0530, Bharata B Rao wrote:
> >> > > 
> >> > > The flag values are expected to remain same for a machine version for
> >> > > the migration to succeed, but this expectation is broken now. Should
> >> > > we make the addition of these flags conditional on machine type
> >> > > version ?
> >> > > But these flags are part of POWER8 CPU definition which is common for
> >> > > both pseries and upcoming powernv.
> >> > 
> >> > Does this affect KVM ? (And if yes why on earth would KVM give a flying
> >> > f*** about the TCG instruction flags ?) ... If not, then I think we can
> >> > safely not care.
> >> 
> >> Yes, KVM migration is broken.
> >
> > Argh then ... stupid design in QEMU. We can't fix anything without
> > breaking migration, yay !
> 
> Looking back in the history of the code:
> 
> commit: a90db1584a00dc1d1439dc7729d99674b666b85e (target-ppc: Convert
> ppc cpu savevm to VMStateDescription) added this:
> 
> +        /* Sanity checking */
> +        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
> +        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
> 
> These flags weren't part of vmstate, I am not sure what was the reason
> behind adding it though. Its a bit old, Alexey do you remember?
> 
> > I don't know what to do to fix that to be honest. Do we have a way to filter
> > what flags actually matter and filter things out when KVM is enabled ?
> 
> Something like this works for KVM:
> 
> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> index 4820f22..1cf3779 100644
> --- a/target-ppc/machine.c
> +++ b/target-ppc/machine.c
> @@ -563,8 +563,8 @@ const VMStateDescription vmstate_ppc_cpu = {
>  
>          /* Sanity checking */
>          VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
> -        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
> -        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags) */
> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags2) */
>          VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
>          VMSTATE_END_OF_LIST()
>      },

This looks like the right solution to me.  AFAICT this was just a
sanity check that wasn't thought through well enough.

> TCG migration still remains broken with this.

Uh.. why?
David Gibson Sept. 23, 2016, 1:01 a.m. UTC | #7
On Thu, Sep 22, 2016 at 12:32:24PM +0200, Paolo Bonzini wrote:
> 
> 
> On 22/09/2016 12:04, Benjamin Herrenschmidt wrote:
> > On Thu, 2016-09-22 at 14:34 +0530, Nikunj A Dadhania wrote:
> >> Something like this works for KVM:
> >>
> >> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> >> index 4820f22..1cf3779 100644
> >> --- a/target-ppc/machine.c
> >> +++ b/target-ppc/machine.c
> >> @@ -563,8 +563,8 @@ const VMStateDescription vmstate_ppc_cpu = {
> >>  
> >>          /* Sanity checking */
> >>          VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
> >> -        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
> >> -        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
> >> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags) */
> >> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags2) */
> >>          VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
> >>          VMSTATE_END_OF_LIST()
> >>      },
> >>
> >> TCG migration still remains broken with this.
> > 
> > Can we have conditionally present flags and a post-load that does some
> > matching ?
> 
> Yes, you can use something like
> 
> 	VMSTATE_UINT64(env.src_insns_flags, PowerPCCPU),
> 	VMSTATE_UINT64(env.src_insns_flags2, PowerPCCPU),
> 
> and a post_load that compares them if kvm_enabled() only.

We could, but I'm not convinced there's any point.  I don't see that
migrating these flags actually has any value beyond a sanity check,
the consequences of which we obviously didn't think through fully.
They should just be a TCG internal matter.

> *However* a better fix would be to preserve the old flags for
> pseries-2.6, and only set the newer flags for pseries-2.7.  I'm not
> saying you have to do this, but if it's not hard (no idea) why not learn
> how to do it right.
> 
> The design is not stupid, it's just that compatibility is harder than
> you think and you are going through the same learning experiences that
> x86 went though.
> 
> Paolo
>
Nikunj A Dadhania Sept. 23, 2016, 3:18 a.m. UTC | #8
David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Thu, Sep 22, 2016 at 02:34:19PM +0530, Nikunj A Dadhania wrote:
>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>> 
>> > On Thu, 2016-09-22 at 11:45 +0530, Bharata B Rao wrote:
>> >> On Thu, Sep 22, 2016 at 04:07:21PM +1000, Benjamin Herrenschmidt wrote:
>> >> > 
>> >> > On Thu, 2016-09-22 at 10:51 +0530, Bharata B Rao wrote:
>> >> > > 
>> >> > > The flag values are expected to remain same for a machine version for
>> >> > > the migration to succeed, but this expectation is broken now. Should
>> >> > > we make the addition of these flags conditional on machine type
>> >> > > version ?
>> >> > > But these flags are part of POWER8 CPU definition which is common for
>> >> > > both pseries and upcoming powernv.
>> >> > 
>> >> > Does this affect KVM ? (And if yes why on earth would KVM give a flying
>> >> > f*** about the TCG instruction flags ?) ... If not, then I think we can
>> >> > safely not care.
>> >> 
>> >> Yes, KVM migration is broken.
>> >
>> > Argh then ... stupid design in QEMU. We can't fix anything without
>> > breaking migration, yay !
>> 
>> Looking back in the history of the code:
>> 
>> commit: a90db1584a00dc1d1439dc7729d99674b666b85e (target-ppc: Convert
>> ppc cpu savevm to VMStateDescription) added this:
>> 
>> +        /* Sanity checking */
>> +        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
>> +        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
>> 
>> These flags weren't part of vmstate, I am not sure what was the reason
>> behind adding it though. Its a bit old, Alexey do you remember?
>> 
>> > I don't know what to do to fix that to be honest. Do we have a way to filter
>> > what flags actually matter and filter things out when KVM is enabled ?
>> 
>> Something like this works for KVM:
>> 
>> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
>> index 4820f22..1cf3779 100644
>> --- a/target-ppc/machine.c
>> +++ b/target-ppc/machine.c
>> @@ -563,8 +563,8 @@ const VMStateDescription vmstate_ppc_cpu = {
>>  
>>          /* Sanity checking */
>>          VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
>> -        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
>> -        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
>> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags) */
>> +        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags2) */
>>          VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
>>          VMSTATE_END_OF_LIST()
>>      },
>
> This looks like the right solution to me.  AFAICT this was just a
> sanity check that wasn't thought through well enough.
>
>> TCG migration still remains broken with this.
>
> Uh.. why?

Didn't debug it yet, reported on the other thread

      qemu: fatal: Trying to deliver HV exception 4 with no HV support

      NIP c0000000000795c8   LR d00000000074407c CTR c000000000079544 XER 0000000000000000 CPU#0
      MSR 8000000000009032 HID0 0000000000000000  HF 8000000000000030 iidx 1 didx 1
      TB 00000007 32202510341 DECR 00596259

Once it just hung, without any messages.

Regards
Nikunj
diff mbox

Patch

diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 4820f22..1cf3779 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -563,8 +563,8 @@  const VMStateDescription vmstate_ppc_cpu = {
 
         /* Sanity checking */
         VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
-        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
-        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
+        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags) */
+        VMSTATE_UNUSED(sizeof(target_ulong)), /* was _EQUAL(env.insns_flags2) */
         VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
         VMSTATE_END_OF_LIST()
     },