Patchwork Supsend/resume regression in c995b4 WAS: Re: [Qemu-devel] [PATCH] Fix migration uint8 arrys handled

login
register
mail settings
Submitter Stefan Berger
Date March 22, 2011, 11:50 a.m.
Message ID <4D888CFF.5050204@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/87900/
State New
Headers show

Comments

Stefan Berger - March 22, 2011, 11:50 a.m.
On 03/22/2011 06:40 AM, Avi Kivity wrote:
> On 03/22/2011 12:23 PM, Stefan Berger wrote:
>> On 03/22/2011 05:28 AM, Avi Kivity wrote:
>>> On 03/22/2011 03:46 AM, Anthony Liguori wrote:
>>>> On 03/21/2011 07:25 PM, Stefan Berger wrote:
>>>>> On 03/15/2011 10:53 AM, Juan Quintela wrote:
>>>>>> commit 82fa39b75181b730d6d4d09f443bd26bcfcd045c
>>>>>>
>>>>>> only contains half of the fix.  It forgots the save state fix for
>>>>>> UINT8 indexes.
>>>>>>
>>>>>> Anthony, please apply, without this migration using hpet is broken.
>>>>>> (only current user).
>>>>> I have just been bisecting the code (from the tip) due to 
>>>>> suspend/resume problems and it looks like commit 82fa39b7 is 
>>>>> introducing the suspend/resume problem I am seeing (frozen screen).
>>>>
>>>> It's in tip now.
>>>
>>> Great, spent some lovely time bisecting and fixing it as well.
>>>
>> It doesn't work better now than it did before... Trying a 
>> suspend/resume while in grub leaves me with a black screen upon 
>> resume...
>
> Well, it fixed it for me (autotest migration tests).
>
> Do you mean that 82fa39b7^ works but b784421ce4c doesn't?  What about 
> b784421ce4c^ with 82fa39b7 reverted?  That will rule out some other bug.
>
b784421 works for me. It's the tip that is again broken for 
suspend/resume, this time pointing to

c995b495b9d6e60ab1e390bd398a22425d0b3c8c is the first bad commit
commit c995b495b9d6e60ab1e390bd398a22425d0b3c8c
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Tue Mar 15 12:26:22 2011 +0100

     x86: Save/restore PAT MSR

     Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

      },

Doesn't look bad, but I get a black screen when resuming while in grub.

    Stefan
Stefan Berger - March 22, 2011, 11:56 a.m.
On 03/22/2011 07:50 AM, Stefan Berger wrote:
> On 03/22/2011 06:40 AM, Avi Kivity wrote:
>> On 03/22/2011 12:23 PM, Stefan Berger wrote:
>>> On 03/22/2011 05:28 AM, Avi Kivity wrote:
>>>> On 03/22/2011 03:46 AM, Anthony Liguori wrote:
>>>>> On 03/21/2011 07:25 PM, Stefan Berger wrote:
>>>>>> On 03/15/2011 10:53 AM, Juan Quintela wrote:
>>>>>>> commit 82fa39b75181b730d6d4d09f443bd26bcfcd045c
>>>>>>>
>>>>>>> only contains half of the fix.  It forgots the save state fix for
>>>>>>> UINT8 indexes.
>>>>>>>
>>>>>>> Anthony, please apply, without this migration using hpet is broken.
>>>>>>> (only current user).
>>>>>> I have just been bisecting the code (from the tip) due to 
>>>>>> suspend/resume problems and it looks like commit 82fa39b7 is 
>>>>>> introducing the suspend/resume problem I am seeing (frozen screen).
>>>>>
>>>>> It's in tip now.
>>>>
>>>> Great, spent some lovely time bisecting and fixing it as well.
>>>>
>>> It doesn't work better now than it did before... Trying a 
>>> suspend/resume while in grub leaves me with a black screen upon 
>>> resume...
>>
>> Well, it fixed it for me (autotest migration tests).
>>
>> Do you mean that 82fa39b7^ works but b784421ce4c doesn't?  What about 
>> b784421ce4c^ with 82fa39b7 reverted?  That will rule out some other bug.
>>
> b784421 works for me. It's the tip that is again broken for 
> suspend/resume, this time pointing to
>
> c995b495b9d6e60ab1e390bd398a22425d0b3c8c is the first bad commit
> commit c995b495b9d6e60ab1e390bd398a22425d0b3c8c
> Author: Jan Kiszka <jan.kiszka@siemens.com>
> Date:   Tue Mar 15 12:26:22 2011 +0100
>
>     x86: Save/restore PAT MSR
>
>     Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> diff --git a/target-i386/machine.c b/target-i386/machine.c
> index d78eceb..6384f54 100644
> --- a/target-i386/machine.c
> +++ b/target-i386/machine.c
> @@ -491,6 +491,8 @@ static const VMStateDescription vmstate_cpu = {
>          VMSTATE_UINT64_V(xcr0, CPUState, 12),
>          VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
>          VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
> +
> +        VMSTATE_UINT64_V(pat, CPUState, 13),
>          VMSTATE_END_OF_LIST()
>          /* The above list is not sorted /wrt version numbers, watch 
> out! */
>      },
>
> Doesn't look bad, but I get a black screen when resuming while in grub.
>
I think that patch was probably not necessary:

target-i386/machine.c line 452 has this:

         VMSTATE_UINT64_V(pat, CPUState, 5),

then again in line 495:

         VMSTATE_UINT64_V(pat, CPUState, 13),

    Stefan
Jan Kiszka - March 22, 2011, noon
On 2011-03-22 12:56, Stefan Berger wrote:
> On 03/22/2011 07:50 AM, Stefan Berger wrote:
>> On 03/22/2011 06:40 AM, Avi Kivity wrote:
>>> On 03/22/2011 12:23 PM, Stefan Berger wrote:
>>>> On 03/22/2011 05:28 AM, Avi Kivity wrote:
>>>>> On 03/22/2011 03:46 AM, Anthony Liguori wrote:
>>>>>> On 03/21/2011 07:25 PM, Stefan Berger wrote:
>>>>>>> On 03/15/2011 10:53 AM, Juan Quintela wrote:
>>>>>>>> commit 82fa39b75181b730d6d4d09f443bd26bcfcd045c
>>>>>>>>
>>>>>>>> only contains half of the fix.  It forgots the save state fix for
>>>>>>>> UINT8 indexes.
>>>>>>>>
>>>>>>>> Anthony, please apply, without this migration using hpet is broken.
>>>>>>>> (only current user).
>>>>>>> I have just been bisecting the code (from the tip) due to 
>>>>>>> suspend/resume problems and it looks like commit 82fa39b7 is 
>>>>>>> introducing the suspend/resume problem I am seeing (frozen screen).
>>>>>>
>>>>>> It's in tip now.
>>>>>
>>>>> Great, spent some lovely time bisecting and fixing it as well.
>>>>>
>>>> It doesn't work better now than it did before... Trying a 
>>>> suspend/resume while in grub leaves me with a black screen upon 
>>>> resume...
>>>
>>> Well, it fixed it for me (autotest migration tests).
>>>
>>> Do you mean that 82fa39b7^ works but b784421ce4c doesn't?  What about 
>>> b784421ce4c^ with 82fa39b7 reverted?  That will rule out some other bug.
>>>
>> b784421 works for me. It's the tip that is again broken for 
>> suspend/resume, this time pointing to
>>
>> c995b495b9d6e60ab1e390bd398a22425d0b3c8c is the first bad commit
>> commit c995b495b9d6e60ab1e390bd398a22425d0b3c8c
>> Author: Jan Kiszka <jan.kiszka@siemens.com>
>> Date:   Tue Mar 15 12:26:22 2011 +0100
>>
>>     x86: Save/restore PAT MSR
>>
>>     Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> diff --git a/target-i386/machine.c b/target-i386/machine.c
>> index d78eceb..6384f54 100644
>> --- a/target-i386/machine.c
>> +++ b/target-i386/machine.c
>> @@ -491,6 +491,8 @@ static const VMStateDescription vmstate_cpu = {
>>          VMSTATE_UINT64_V(xcr0, CPUState, 12),
>>          VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
>>          VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
>> +
>> +        VMSTATE_UINT64_V(pat, CPUState, 13),
>>          VMSTATE_END_OF_LIST()
>>          /* The above list is not sorted /wrt version numbers, watch 
>> out! */
>>      },
>>
>> Doesn't look bad, but I get a black screen when resuming while in grub.
>>
> I think that patch was probably not necessary:
> 
> target-i386/machine.c line 452 has this:
> 
>          VMSTATE_UINT64_V(pat, CPUState, 5),
> 
> then again in line 495:
> 
>          VMSTATE_UINT64_V(pat, CPUState, 13),
> 

Ouch, indeed. Moreover, CPU_SAVE_VERSION was not updated (likely the
reason for the breakage). Thanks for debugging this!

Anthony (or whoever), please revert this unneeded commit in qemu.git.

We had a few migration related regressions recently. Do we have
sufficient test cases in autotest for them? Also for migrating from
older to the latest version?

Jan
Avi Kivity - March 22, 2011, 12:21 p.m.
On 03/22/2011 02:00 PM, Jan Kiszka wrote:
> We had a few migration related regressions recently. Do we have
> sufficient test cases in autotest for them? Also for migrating from
> older to the latest version?
>

Autotest did catch the uint8 varray thing.  c995b4 isn't yet in 
qemu-kvm.git, so it wasn't autotested by me (Lucas does autotest 
upstream qemu.git, not sure if it was tested yet).
Jan Kiszka - March 22, 2011, 12:30 p.m.
On 2011-03-22 13:21, Avi Kivity wrote:
> On 03/22/2011 02:00 PM, Jan Kiszka wrote:
>> We had a few migration related regressions recently. Do we have
>> sufficient test cases in autotest for them? Also for migrating from
>> older to the latest version?
>>
> 
> Autotest did catch the uint8 varray thing.  c995b4 isn't yet in 
> qemu-kvm.git, so it wasn't autotested by me (Lucas does autotest 
> upstream qemu.git, not sure if it was tested yet).

Upstream was and current is qemu-kvm.git is broken /wrt migrating
i8254[-kvm] from older versions. We also had some issues around vmmouse.
Therefore my question regarding a forward migration tests.

Jan
Juan Quintela - March 22, 2011, 12:33 p.m.
Jan Kiszka <jan.kiszka@siemens.com> wrote:

>> 
>>          VMSTATE_UINT64_V(pat, CPUState, 13),
>> 
>
> Ouch, indeed. Moreover, CPU_SAVE_VERSION was not updated (likely the
> reason for the breakage). Thanks for debugging this!
>
> Anthony (or whoever), please revert this unneeded commit in qemu.git.
>
> We had a few migration related regressions recently. Do we have
> sufficient test cases in autotest for them? Also for migrating from
> older to the latest version?

The UINT8 problem was my "brown paper bug" fault.  About autotest, it
"learnt" to use two machines to test migration a couple of months ago.
Haven't had the time to write test for it thought.

Later, Juan.
Avi Kivity - March 22, 2011, 12:35 p.m.
On 03/22/2011 02:30 PM, Jan Kiszka wrote:
> On 2011-03-22 13:21, Avi Kivity wrote:
> >  On 03/22/2011 02:00 PM, Jan Kiszka wrote:
> >>  We had a few migration related regressions recently. Do we have
> >>  sufficient test cases in autotest for them? Also for migrating from
> >>  older to the latest version?
> >>
> >
> >  Autotest did catch the uint8 varray thing.  c995b4 isn't yet in
> >  qemu-kvm.git, so it wasn't autotested by me (Lucas does autotest
> >  upstream qemu.git, not sure if it was tested yet).
>
> Upstream was and current is qemu-kvm.git is broken /wrt migrating
> i8254[-kvm] from older versions. We also had some issues around vmmouse.
> Therefore my question regarding a forward migration tests.

Yeah, forgot to reply to that.  I don't test forward migrations and I 
don't think autotest has the infrastructure for it.
Jan Kiszka - March 22, 2011, 12:39 p.m.
On 2011-03-22 13:33, Juan Quintela wrote:
> Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
>>>
>>>          VMSTATE_UINT64_V(pat, CPUState, 13),
>>>
>>
>> Ouch, indeed. Moreover, CPU_SAVE_VERSION was not updated (likely the
>> reason for the breakage). Thanks for debugging this!
>>
>> Anthony (or whoever), please revert this unneeded commit in qemu.git.
>>
>> We had a few migration related regressions recently. Do we have
>> sufficient test cases in autotest for them? Also for migrating from
>> older to the latest version?
> 
> The UINT8 problem was my "brown paper bug" fault.  About autotest, it
> "learnt" to use two machines to test migration a couple of months ago.
> Haven't had the time to write test for it thought.

A basic migrate to file / resume from file on the same host would
already suffice in fact.

Jan

Patch

diff --git a/target-i386/machine.c b/target-i386/machine.c
index d78eceb..6384f54 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -491,6 +491,8 @@  static const VMStateDescription vmstate_cpu = {
          VMSTATE_UINT64_V(xcr0, CPUState, 12),
          VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
          VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
+
+        VMSTATE_UINT64_V(pat, CPUState, 13),
          VMSTATE_END_OF_LIST()
          /* The above list is not sorted /wrt version numbers, watch 
out! */