Supsend/resume regression in c995b4 WAS: Re: [Qemu-devel] [PATCH] Fix migration uint8 arrys handled

Submitted by Stefan Berger on March 22, 2011, 11:50 a.m.

Details

Message ID 4D888CFF.5050204@linux.vnet.ibm.com
State New
Headers show

Commit Message

Stefan Berger March 22, 2011, 11:50 a.m.
On 03/22/2011 06:40 AM, Avi Kivity wrote:
> On 03/22/2011 12:23 PM, Stefan Berger wrote:
>> On 03/22/2011 05:28 AM, Avi Kivity wrote:
>>> On 03/22/2011 03:46 AM, Anthony Liguori wrote:
>>>> On 03/21/2011 07:25 PM, Stefan Berger wrote:
>>>>> On 03/15/2011 10:53 AM, Juan Quintela wrote:
>>>>>> commit 82fa39b75181b730d6d4d09f443bd26bcfcd045c
>>>>>>
>>>>>> only contains half of the fix.  It forgots the save state fix for
>>>>>> UINT8 indexes.
>>>>>>
>>>>>> Anthony, please apply, without this migration using hpet is broken.
>>>>>> (only current user).
>>>>> I have just been bisecting the code (from the tip) due to 
>>>>> suspend/resume problems and it looks like commit 82fa39b7 is 
>>>>> introducing the suspend/resume problem I am seeing (frozen screen).
>>>>
>>>> It's in tip now.
>>>
>>> Great, spent some lovely time bisecting and fixing it as well.
>>>
>> It doesn't work better now than it did before... Trying a 
>> suspend/resume while in grub leaves me with a black screen upon 
>> resume...
>
> Well, it fixed it for me (autotest migration tests).
>
> Do you mean that 82fa39b7^ works but b784421ce4c doesn't?  What about 
> b784421ce4c^ with 82fa39b7 reverted?  That will rule out some other bug.
>
b784421 works for me. It's the tip that is again broken for 
suspend/resume, this time pointing to

c995b495b9d6e60ab1e390bd398a22425d0b3c8c is the first bad commit
commit c995b495b9d6e60ab1e390bd398a22425d0b3c8c
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Tue Mar 15 12:26:22 2011 +0100

     x86: Save/restore PAT MSR

     Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

      },

Doesn't look bad, but I get a black screen when resuming while in grub.

    Stefan

Comments

Stefan Berger March 22, 2011, 11:56 a.m.
On 03/22/2011 07:50 AM, Stefan Berger wrote:
> On 03/22/2011 06:40 AM, Avi Kivity wrote:
>> On 03/22/2011 12:23 PM, Stefan Berger wrote:
>>> On 03/22/2011 05:28 AM, Avi Kivity wrote:
>>>> On 03/22/2011 03:46 AM, Anthony Liguori wrote:
>>>>> On 03/21/2011 07:25 PM, Stefan Berger wrote:
>>>>>> On 03/15/2011 10:53 AM, Juan Quintela wrote:
>>>>>>> commit 82fa39b75181b730d6d4d09f443bd26bcfcd045c
>>>>>>>
>>>>>>> only contains half of the fix.  It forgots the save state fix for
>>>>>>> UINT8 indexes.
>>>>>>>
>>>>>>> Anthony, please apply, without this migration using hpet is broken.
>>>>>>> (only current user).
>>>>>> I have just been bisecting the code (from the tip) due to 
>>>>>> suspend/resume problems and it looks like commit 82fa39b7 is 
>>>>>> introducing the suspend/resume problem I am seeing (frozen screen).
>>>>>
>>>>> It's in tip now.
>>>>
>>>> Great, spent some lovely time bisecting and fixing it as well.
>>>>
>>> It doesn't work better now than it did before... Trying a 
>>> suspend/resume while in grub leaves me with a black screen upon 
>>> resume...
>>
>> Well, it fixed it for me (autotest migration tests).
>>
>> Do you mean that 82fa39b7^ works but b784421ce4c doesn't?  What about 
>> b784421ce4c^ with 82fa39b7 reverted?  That will rule out some other bug.
>>
> b784421 works for me. It's the tip that is again broken for 
> suspend/resume, this time pointing to
>
> c995b495b9d6e60ab1e390bd398a22425d0b3c8c is the first bad commit
> commit c995b495b9d6e60ab1e390bd398a22425d0b3c8c
> Author: Jan Kiszka <jan.kiszka@siemens.com>
> Date:   Tue Mar 15 12:26:22 2011 +0100
>
>     x86: Save/restore PAT MSR
>
>     Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> diff --git a/target-i386/machine.c b/target-i386/machine.c
> index d78eceb..6384f54 100644
> --- a/target-i386/machine.c
> +++ b/target-i386/machine.c
> @@ -491,6 +491,8 @@ static const VMStateDescription vmstate_cpu = {
>          VMSTATE_UINT64_V(xcr0, CPUState, 12),
>          VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
>          VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
> +
> +        VMSTATE_UINT64_V(pat, CPUState, 13),
>          VMSTATE_END_OF_LIST()
>          /* The above list is not sorted /wrt version numbers, watch 
> out! */
>      },
>
> Doesn't look bad, but I get a black screen when resuming while in grub.
>
I think that patch was probably not necessary:

target-i386/machine.c line 452 has this:

         VMSTATE_UINT64_V(pat, CPUState, 5),

then again in line 495:

         VMSTATE_UINT64_V(pat, CPUState, 13),

    Stefan
Jan Kiszka March 22, 2011, noon
On 2011-03-22 12:56, Stefan Berger wrote:
> On 03/22/2011 07:50 AM, Stefan Berger wrote:
>> On 03/22/2011 06:40 AM, Avi Kivity wrote:
>>> On 03/22/2011 12:23 PM, Stefan Berger wrote:
>>>> On 03/22/2011 05:28 AM, Avi Kivity wrote:
>>>>> On 03/22/2011 03:46 AM, Anthony Liguori wrote:
>>>>>> On 03/21/2011 07:25 PM, Stefan Berger wrote:
>>>>>>> On 03/15/2011 10:53 AM, Juan Quintela wrote:
>>>>>>>> commit 82fa39b75181b730d6d4d09f443bd26bcfcd045c
>>>>>>>>
>>>>>>>> only contains half of the fix.  It forgots the save state fix for
>>>>>>>> UINT8 indexes.
>>>>>>>>
>>>>>>>> Anthony, please apply, without this migration using hpet is broken.
>>>>>>>> (only current user).
>>>>>>> I have just been bisecting the code (from the tip) due to 
>>>>>>> suspend/resume problems and it looks like commit 82fa39b7 is 
>>>>>>> introducing the suspend/resume problem I am seeing (frozen screen).
>>>>>>
>>>>>> It's in tip now.
>>>>>
>>>>> Great, spent some lovely time bisecting and fixing it as well.
>>>>>
>>>> It doesn't work better now than it did before... Trying a 
>>>> suspend/resume while in grub leaves me with a black screen upon 
>>>> resume...
>>>
>>> Well, it fixed it for me (autotest migration tests).
>>>
>>> Do you mean that 82fa39b7^ works but b784421ce4c doesn't?  What about 
>>> b784421ce4c^ with 82fa39b7 reverted?  That will rule out some other bug.
>>>
>> b784421 works for me. It's the tip that is again broken for 
>> suspend/resume, this time pointing to
>>
>> c995b495b9d6e60ab1e390bd398a22425d0b3c8c is the first bad commit
>> commit c995b495b9d6e60ab1e390bd398a22425d0b3c8c
>> Author: Jan Kiszka <jan.kiszka@siemens.com>
>> Date:   Tue Mar 15 12:26:22 2011 +0100
>>
>>     x86: Save/restore PAT MSR
>>
>>     Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> diff --git a/target-i386/machine.c b/target-i386/machine.c
>> index d78eceb..6384f54 100644
>> --- a/target-i386/machine.c
>> +++ b/target-i386/machine.c
>> @@ -491,6 +491,8 @@ static const VMStateDescription vmstate_cpu = {
>>          VMSTATE_UINT64_V(xcr0, CPUState, 12),
>>          VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
>>          VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
>> +
>> +        VMSTATE_UINT64_V(pat, CPUState, 13),
>>          VMSTATE_END_OF_LIST()
>>          /* The above list is not sorted /wrt version numbers, watch 
>> out! */
>>      },
>>
>> Doesn't look bad, but I get a black screen when resuming while in grub.
>>
> I think that patch was probably not necessary:
> 
> target-i386/machine.c line 452 has this:
> 
>          VMSTATE_UINT64_V(pat, CPUState, 5),
> 
> then again in line 495:
> 
>          VMSTATE_UINT64_V(pat, CPUState, 13),
> 

Ouch, indeed. Moreover, CPU_SAVE_VERSION was not updated (likely the
reason for the breakage). Thanks for debugging this!

Anthony (or whoever), please revert this unneeded commit in qemu.git.

We had a few migration related regressions recently. Do we have
sufficient test cases in autotest for them? Also for migrating from
older to the latest version?

Jan
Avi Kivity March 22, 2011, 12:21 p.m.
On 03/22/2011 02:00 PM, Jan Kiszka wrote:
> We had a few migration related regressions recently. Do we have
> sufficient test cases in autotest for them? Also for migrating from
> older to the latest version?
>

Autotest did catch the uint8 varray thing.  c995b4 isn't yet in 
qemu-kvm.git, so it wasn't autotested by me (Lucas does autotest 
upstream qemu.git, not sure if it was tested yet).
Jan Kiszka March 22, 2011, 12:30 p.m.
On 2011-03-22 13:21, Avi Kivity wrote:
> On 03/22/2011 02:00 PM, Jan Kiszka wrote:
>> We had a few migration related regressions recently. Do we have
>> sufficient test cases in autotest for them? Also for migrating from
>> older to the latest version?
>>
> 
> Autotest did catch the uint8 varray thing.  c995b4 isn't yet in 
> qemu-kvm.git, so it wasn't autotested by me (Lucas does autotest 
> upstream qemu.git, not sure if it was tested yet).

Upstream was and current is qemu-kvm.git is broken /wrt migrating
i8254[-kvm] from older versions. We also had some issues around vmmouse.
Therefore my question regarding a forward migration tests.

Jan
Juan Quintela March 22, 2011, 12:33 p.m.
Jan Kiszka <jan.kiszka@siemens.com> wrote:

>> 
>>          VMSTATE_UINT64_V(pat, CPUState, 13),
>> 
>
> Ouch, indeed. Moreover, CPU_SAVE_VERSION was not updated (likely the
> reason for the breakage). Thanks for debugging this!
>
> Anthony (or whoever), please revert this unneeded commit in qemu.git.
>
> We had a few migration related regressions recently. Do we have
> sufficient test cases in autotest for them? Also for migrating from
> older to the latest version?

The UINT8 problem was my "brown paper bug" fault.  About autotest, it
"learnt" to use two machines to test migration a couple of months ago.
Haven't had the time to write test for it thought.

Later, Juan.
Avi Kivity March 22, 2011, 12:35 p.m.
On 03/22/2011 02:30 PM, Jan Kiszka wrote:
> On 2011-03-22 13:21, Avi Kivity wrote:
> >  On 03/22/2011 02:00 PM, Jan Kiszka wrote:
> >>  We had a few migration related regressions recently. Do we have
> >>  sufficient test cases in autotest for them? Also for migrating from
> >>  older to the latest version?
> >>
> >
> >  Autotest did catch the uint8 varray thing.  c995b4 isn't yet in
> >  qemu-kvm.git, so it wasn't autotested by me (Lucas does autotest
> >  upstream qemu.git, not sure if it was tested yet).
>
> Upstream was and current is qemu-kvm.git is broken /wrt migrating
> i8254[-kvm] from older versions. We also had some issues around vmmouse.
> Therefore my question regarding a forward migration tests.

Yeah, forgot to reply to that.  I don't test forward migrations and I 
don't think autotest has the infrastructure for it.
Jan Kiszka March 22, 2011, 12:39 p.m.
On 2011-03-22 13:33, Juan Quintela wrote:
> Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
>>>
>>>          VMSTATE_UINT64_V(pat, CPUState, 13),
>>>
>>
>> Ouch, indeed. Moreover, CPU_SAVE_VERSION was not updated (likely the
>> reason for the breakage). Thanks for debugging this!
>>
>> Anthony (or whoever), please revert this unneeded commit in qemu.git.
>>
>> We had a few migration related regressions recently. Do we have
>> sufficient test cases in autotest for them? Also for migrating from
>> older to the latest version?
> 
> The UINT8 problem was my "brown paper bug" fault.  About autotest, it
> "learnt" to use two machines to test migration a couple of months ago.
> Haven't had the time to write test for it thought.

A basic migrate to file / resume from file on the same host would
already suffice in fact.

Jan

Patch hide | download patch | download mbox

diff --git a/target-i386/machine.c b/target-i386/machine.c
index d78eceb..6384f54 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -491,6 +491,8 @@  static const VMStateDescription vmstate_cpu = {
          VMSTATE_UINT64_V(xcr0, CPUState, 12),
          VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
          VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
+
+        VMSTATE_UINT64_V(pat, CPUState, 13),
          VMSTATE_END_OF_LIST()
          /* The above list is not sorted /wrt version numbers, watch 
out! */