Patchwork reenable vm_clock when resuming all vcpus

login
register
mail settings
Submitter Wen Congyang
Date Nov. 4, 2011, 2:45 a.m.
Message ID <4EB351E6.3090807@cn.fujitsu.com>
Download mbox | patch
Permalink /patch/123559/
State New
Headers show

Comments

Wen Congyang - Nov. 4, 2011, 2:45 a.m.
We disable vm_clock when pausing all vcpus, but we forget to
reenable it when resuming all vcpus. It will cause that the
guest can not be rebooted.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

---
 cpus.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
Zhiyong Wu - Nov. 4, 2011, 2:48 a.m.
On Fri, Nov 4, 2011 at 10:45 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
>
> We disable vm_clock when pausing all vcpus, but we forget to
> reenable it when resuming all vcpus. It will cause that the
> guest can not be rebooted.
Actually i also met this issue when rebooting the guest. When the
guest is rebooted, it will crash.

>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>
> ---
>  cpus.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 6aff425..82530c4 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -891,6 +891,7 @@ void resume_all_vcpus(void)
>  {
>     CPUState *penv = first_cpu;
>
> +    qemu_clock_enable(vm_clock, true);
>     while (penv) {
>         penv->stop = 0;
>         penv->stopped = 0;
> --
> 1.7.1
>



--
Regards,

Zhi Yong Wu
Wen Congyang - Nov. 4, 2011, 6:26 a.m.
At 11/04/2011 10:48 AM, Zhi Yong Wu Write:
> On Fri, Nov 4, 2011 at 10:45 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
>>
>> We disable vm_clock when pausing all vcpus, but we forget to
>> reenable it when resuming all vcpus. It will cause that the
>> guest can not be rebooted.
> Actually i also met this issue when rebooting the guest. When the
> guest is rebooted, it will crash.

When I reboot the guest, it will hang.

Does this patch solve your problem?

> 
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>
>> ---
>>  cpus.c |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index 6aff425..82530c4 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -891,6 +891,7 @@ void resume_all_vcpus(void)
>>  {
>>     CPUState *penv = first_cpu;
>>
>> +    qemu_clock_enable(vm_clock, true);
>>     while (penv) {
>>         penv->stop = 0;
>>         penv->stopped = 0;
>> --
>> 1.7.1
>>
> 
> 
> 
> --
> Regards,
> 
> Zhi Yong Wu
>
Paolo Bonzini - Nov. 4, 2011, 7:43 a.m.
On 11/04/2011 03:45 AM, Wen Congyang wrote:
> We disable vm_clock when pausing all vcpus, but we forget to
> reenable it when resuming all vcpus. It will cause that the
> guest can not be rebooted.
>
> Signed-off-by: Wen Congyang<wency@cn.fujitsu.com>
>
> ---
>   cpus.c |    1 +
>   1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 6aff425..82530c4 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -891,6 +891,7 @@ void resume_all_vcpus(void)
>   {
>       CPUState *penv = first_cpu;
>
> +    qemu_clock_enable(vm_clock, true);
>       while (penv) {
>           penv->stop = 0;
>           penv->stopped = 0;

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

The code I have now works for TCG, but of course not for KVM.  Thanks.

Paolo
Zhiyong Wu - Nov. 4, 2011, 11:31 a.m.
On Fri, Nov 4, 2011 at 2:26 PM, Wen Congyang <wency@cn.fujitsu.com> wrote:
> At 11/04/2011 10:48 AM, Zhi Yong Wu Write:
>> On Fri, Nov 4, 2011 at 10:45 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
>>>
>>> We disable vm_clock when pausing all vcpus, but we forget to
>>> reenable it when resuming all vcpus. It will cause that the
>>> guest can not be rebooted.
>> Actually i also met this issue when rebooting the guest. When the
>> guest is rebooted, it will crash.
>
> When I reboot the guest, it will hang.
>
> Does this patch solve your problem?
Next Monday, i will do one simple testing for this.

>
>>
>>>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>
>>> ---
>>>  cpus.c |    1 +
>>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index 6aff425..82530c4 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -891,6 +891,7 @@ void resume_all_vcpus(void)
>>>  {
>>>     CPUState *penv = first_cpu;
>>>
>>> +    qemu_clock_enable(vm_clock, true);
>>>     while (penv) {
>>>         penv->stop = 0;
>>>         penv->stopped = 0;
>>> --
>>> 1.7.1
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Zhi Yong Wu
>>
>
>
Zhiyong Wu - Nov. 7, 2011, 2:24 a.m.
On Fri, Nov 4, 2011 at 2:26 PM, Wen Congyang <wency@cn.fujitsu.com> wrote:
> At 11/04/2011 10:48 AM, Zhi Yong Wu Write:
>> On Fri, Nov 4, 2011 at 10:45 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
>>>
>>> We disable vm_clock when pausing all vcpus, but we forget to
>>> reenable it when resuming all vcpus. It will cause that the
>>> guest can not be rebooted.
>> Actually i also met this issue when rebooting the guest. When the
>> guest is rebooted, it will crash.
>
> When I reboot the guest, it will hang.
>
> Does this patch solve your problem?
Yeah, it can work for my guest now.

BTW: If the patch is not applied, when guest is rebooted, fedora guest
will crash, while rh6.1 guest will hang up.

Tested-by: Zhi Yong Wu <zwu.kernel@gmai.com>

>
>>
>>>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>
>>> ---
>>>  cpus.c |    1 +
>>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index 6aff425..82530c4 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -891,6 +891,7 @@ void resume_all_vcpus(void)
>>>  {
>>>     CPUState *penv = first_cpu;
>>>
>>> +    qemu_clock_enable(vm_clock, true);
>>>     while (penv) {
>>>         penv->stop = 0;
>>>         penv->stopped = 0;
>>> --
>>> 1.7.1
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Zhi Yong Wu
>>
>
>
Anthony Liguori - Nov. 8, 2011, 5:24 p.m.
On 11/03/2011 09:45 PM, Wen Congyang wrote:
> We disable vm_clock when pausing all vcpus, but we forget to
> reenable it when resuming all vcpus. It will cause that the
> guest can not be rebooted.
>
> Signed-off-by: Wen Congyang<wency@cn.fujitsu.com>

Applied.  Thanks.

Regards,

Anthony Liguori

>
> ---
>   cpus.c |    1 +
>   1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 6aff425..82530c4 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -891,6 +891,7 @@ void resume_all_vcpus(void)
>   {
>       CPUState *penv = first_cpu;
>
> +    qemu_clock_enable(vm_clock, true);
>       while (penv) {
>           penv->stop = 0;
>           penv->stopped = 0;
Eduardo Habkost - Nov. 9, 2011, 4:22 p.m.
On Tue, Nov 08, 2011 at 11:24:30AM -0600, Anthony Liguori wrote:
> On 11/03/2011 09:45 PM, Wen Congyang wrote:
> >We disable vm_clock when pausing all vcpus, but we forget to
> >reenable it when resuming all vcpus. It will cause that the
> >guest can not be rebooted.
> >
> >Signed-off-by: Wen Congyang<wency@cn.fujitsu.com>
> 
> Applied.  Thanks.

FYI, I found out that this patch solved an issue we were having in our
Autotest runs: https://bugs.launchpad.net/qemu/+bug/886255

But the reason it fixed the issue are a mystery to me. The symptoms
were:

- A RHEL 6.1 guest was installed
- After booting, sshd created the ssh keys normally
- Guest was rebooted (I was not seeing the "guest can't be rebooted"
  issue mentioned on the description of this patch)
- After rebooting, the ssh key files on /etc/ssh had size 0 for some
  unknown reason
- sshd failed to re-create the keys because of the empty key files, and
  failed to start
  - The immediate reason was because the (empty) public key file has
    selinux context that can't be written by ssh-keygen, but that only
    happened because the files become empty.

For some reason, the problem stopped happening after this commit. I have
no idea why it fixed the issue. I don't know if it fixed the actual bug
that caused the problem, or the bug is timing-dependent and for some
reason this committed just changed the timing conditions and the bug is
still hidden in qemu.

The bug was 100% reproducible on the machine I was testing, previously,
and after this commit it didn't happen anymore. I double-checked by
re-testing this commit (47113ab6), and the previous one (f67ab77a)
twice, after bisecting.

(If anybody has some possible explanation on how this patch could have
any impact on the above symptoms, or want to investigate further, please
let me know. I think I will stop debugging this, move on, and finally
stress-test my qemu_fclose() series, now.)
Paolo Bonzini - Nov. 10, 2011, 8:28 a.m.
On 11/09/2011 05:22 PM, Eduardo Habkost wrote:
> The bug was 100% reproducible on the machine I was testing, previously,
> and after this commit it didn't happen anymore. I double-checked by
> re-testing this commit (47113ab6), and the previous one (f67ab77a)
> twice, after bisecting.
>
> (If anybody has some possible explanation on how this patch could have
> any impact on the above symptoms, or want to investigate further, please
> let me know. I think I will stop debugging this, move on, and finally
> stress-test my qemu_fclose() series, now.)

I'm not sure about how the patch could fix this, but I'm not surprised 
that Linux shows the bug in more mysterious ways than Windows.  That's 
because Windows uses the (in-QEMU) HPET for timing and Linux by default 
uses the (in-kernel) kvmclock.

Paolo

Patch

diff --git a/cpus.c b/cpus.c
index 6aff425..82530c4 100644
--- a/cpus.c
+++ b/cpus.c
@@ -891,6 +891,7 @@  void resume_all_vcpus(void)
 {
     CPUState *penv = first_cpu;
 
+    qemu_clock_enable(vm_clock, true);
     while (penv) {
         penv->stop = 0;
         penv->stopped = 0;