diff mbox

latest rc: virtio-blk hangs forever after migration

Message ID CABYiri9MigkURgV273kVjhnYxLz_WnQtsZzToy531vY0cjiGkQ@mail.gmail.com
State New
Headers show

Commit Message

Andrey Korolyov July 15, 2014, 11:40 p.m. UTC
On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>
>> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
>> wrote:
>>>
>>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>>>>
>>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
>>>> wrote:
>>>>>
>>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> the issue is not specific to the iothread code because generic
>>>>>>> virtio-blk also hangs up:
>>>>>>
>>>>>>
>>>>>> Do you know which version works well?  If you could bisect, that'll
>>>>>> help a lot.
>>>>>>
>>>>>> Thanks,
>>>>>>                 Amit
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
>>>>> every step takes about 10 minutes to complete.
>>>>
>>>>
>>>> Yay! It is even outside of virtio-blk.
>>>>
>>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
>>>> Date:   Tue Jun 3 13:34:48 2014 -0300
>>>>
>>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
>>>> calculation
>>>>
>>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
>>>>
>>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
>>>>     Cc: qemu-stable@nongnu.org
>>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>>
>>> Andrey,
>>>
>>> Can you please provide instructions on how to create reproducible
>>> environment?
>>>
>>> The following patch is equivalent to the original patch, for
>>> the purposes of fixing the kvmclock problem.
>>>
>>> Perhaps it becomes easier to spot the reason for the hang you are
>>> experiencing.
>>>
>>>
>>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>>> index 272a88a..feb5fc5 100644
>>> --- a/hw/i386/kvm/clock.c
>>> +++ b/hw/i386/kvm/clock.c
>>> @@ -17,7 +17,6 @@
>>>  #include "qemu/host-utils.h"
>>>  #include "sysemu/sysemu.h"
>>>  #include "sysemu/kvm.h"
>>> -#include "sysemu/cpus.h"
>>>  #include "hw/sysbus.h"
>>>  #include "hw/kvm/clock.h"
>>>
>>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>>>
>>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>>>
>>> -    assert(time.tsc_timestamp <= migration_tsc);
>>>      delta = migration_tsc - time.tsc_timestamp;
>>>      if (time.tsc_shift < 0) {
>>>          delta >>= -time.tsc_shift;
>>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
>>> int running,
>>>          if (s->clock_valid) {
>>>              return;
>>>          }
>>> -
>>> -        cpu_synchronize_all_states();
>>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>>>          if (ret < 0) {
>>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
>>> strerror(ret));
>>> diff --git a/migration.c b/migration.c
>>> index 8d675b3..34f2325 100644
>>> --- a/migration.c
>>> +++ b/migration.c
>>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>>>                  old_vm_running = runstate_is_running();
>>>
>>> +                cpu_synchronize_all_states();
>>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>>                  if (ret >= 0) {
>>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
>
>
> It could also be useful to apply the above patch _and_ revert
> a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>
> Paolo

Yes, it solved the issue for me! (it took much time to check because
most of country` debian mirrors went inconsistent by some reason)

Also trivial addition:

Comments

Marcelo Tosatti July 15, 2014, 11:47 p.m. UTC | #1
On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
> >
> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
> >> wrote:
> >>>
> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
> >>>>
> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
> >>>> wrote:
> >>>>>
> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> the issue is not specific to the iothread code because generic
> >>>>>>> virtio-blk also hangs up:
> >>>>>>
> >>>>>>
> >>>>>> Do you know which version works well?  If you could bisect, that'll
> >>>>>> help a lot.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>                 Amit
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
> >>>>> every step takes about 10 minutes to complete.
> >>>>
> >>>>
> >>>> Yay! It is even outside of virtio-blk.
> >>>>
> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
> >>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
> >>>>
> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
> >>>> calculation
> >>>>
> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
> >>>>
> >>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
> >>>>     Cc: qemu-stable@nongnu.org
> >>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> >>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>
> >>>
> >>> Andrey,
> >>>
> >>> Can you please provide instructions on how to create reproducible
> >>> environment?
> >>>
> >>> The following patch is equivalent to the original patch, for
> >>> the purposes of fixing the kvmclock problem.
> >>>
> >>> Perhaps it becomes easier to spot the reason for the hang you are
> >>> experiencing.
> >>>
> >>>
> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> >>> index 272a88a..feb5fc5 100644
> >>> --- a/hw/i386/kvm/clock.c
> >>> +++ b/hw/i386/kvm/clock.c
> >>> @@ -17,7 +17,6 @@
> >>>  #include "qemu/host-utils.h"
> >>>  #include "sysemu/sysemu.h"
> >>>  #include "sysemu/kvm.h"
> >>> -#include "sysemu/cpus.h"
> >>>  #include "hw/sysbus.h"
> >>>  #include "hw/kvm/clock.h"
> >>>
> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
> >>>
> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> >>>
> >>> -    assert(time.tsc_timestamp <= migration_tsc);
> >>>      delta = migration_tsc - time.tsc_timestamp;
> >>>      if (time.tsc_shift < 0) {
> >>>          delta >>= -time.tsc_shift;
> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
> >>> int running,
> >>>          if (s->clock_valid) {
> >>>              return;
> >>>          }
> >>> -
> >>> -        cpu_synchronize_all_states();
> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> >>>          if (ret < 0) {
> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
> >>> strerror(ret));
> >>> diff --git a/migration.c b/migration.c
> >>> index 8d675b3..34f2325 100644
> >>> --- a/migration.c
> >>> +++ b/migration.c
> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
> >>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> >>>                  old_vm_running = runstate_is_running();
> >>>
> >>> +                cpu_synchronize_all_states();
> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> >>>                  if (ret >= 0) {
> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
> >
> >
> > It could also be useful to apply the above patch _and_ revert
> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
> >
> > Paolo
> 
> Yes, it solved the issue for me! (it took much time to check because
> most of country` debian mirrors went inconsistent by some reason)
> 
> Also trivial addition:
> 
> diff --git a/migration.c b/migration.c
> index 34f2325..65d1c88 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -25,6 +25,7 @@
>  #include "qemu/thread.h"
>  #include "qmp-commands.h"
>  #include "trace.h"
> +#include "sysemu/cpus.h"

Can you attach 'git diff' of the qemu tree where the test
was successful please?
Marcelo Tosatti July 16, 2014, 1:16 a.m. UTC | #2
On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
> >
> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosatti@redhat.com>
> >> wrote:
> >>>
> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
> >>>>
> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <andrey@xdel.ru>
> >>>> wrote:
> >>>>>
> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.shah@redhat.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> the issue is not specific to the iothread code because generic
> >>>>>>> virtio-blk also hangs up:
> >>>>>>
> >>>>>>
> >>>>>> Do you know which version works well?  If you could bisect, that'll
> >>>>>> help a lot.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>                 Amit
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, though
> >>>>> every step takes about 10 minutes to complete.
> >>>>
> >>>>
> >>>> Yay! It is even outside of virtio-blk.
> >>>>
> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
> >>>> Author: Marcelo Tosatti <mtosatti@redhat.com>
> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
> >>>>
> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
> >>>> calculation
> >>>>
> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
> >>>>
> >>>>     Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
> >>>>     Cc: qemu-stable@nongnu.org
> >>>>     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> >>>>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>>
> >>>
> >>> Andrey,
> >>>
> >>> Can you please provide instructions on how to create reproducible
> >>> environment?
> >>>
> >>> The following patch is equivalent to the original patch, for
> >>> the purposes of fixing the kvmclock problem.
> >>>
> >>> Perhaps it becomes easier to spot the reason for the hang you are
> >>> experiencing.
> >>>
> >>>
> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> >>> index 272a88a..feb5fc5 100644
> >>> --- a/hw/i386/kvm/clock.c
> >>> +++ b/hw/i386/kvm/clock.c
> >>> @@ -17,7 +17,6 @@
> >>>  #include "qemu/host-utils.h"
> >>>  #include "sysemu/sysemu.h"
> >>>  #include "sysemu/kvm.h"
> >>> -#include "sysemu/cpus.h"
> >>>  #include "hw/sysbus.h"
> >>>  #include "hw/kvm/clock.h"
> >>>
> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
> >>>
> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> >>>
> >>> -    assert(time.tsc_timestamp <= migration_tsc);
> >>>      delta = migration_tsc - time.tsc_timestamp;
> >>>      if (time.tsc_shift < 0) {
> >>>          delta >>= -time.tsc_shift;
> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
> >>> int running,
> >>>          if (s->clock_valid) {
> >>>              return;
> >>>          }
> >>> -
> >>> -        cpu_synchronize_all_states();
> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
> >>>          if (ret < 0) {
> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
> >>> strerror(ret));
> >>> diff --git a/migration.c b/migration.c
> >>> index 8d675b3..34f2325 100644
> >>> --- a/migration.c
> >>> +++ b/migration.c
> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
> >>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> >>>                  old_vm_running = runstate_is_running();
> >>>
> >>> +                cpu_synchronize_all_states();
> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> >>>                  if (ret >= 0) {
> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
> >
> >
> > It could also be useful to apply the above patch _and_ revert
> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
> >
> > Paolo
> 
> Yes, it solved the issue for me! (it took much time to check because
> most of country` debian mirrors went inconsistent by some reason)
> 
> Also trivial addition:
> 
> diff --git a/migration.c b/migration.c
> index 34f2325..65d1c88 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -25,6 +25,7 @@
>  #include "qemu/thread.h"
>  #include "qmp-commands.h"
>  #include "trace.h"
> +#include "sysemu/cpus.h"

And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ?

That is, test with a stock qemu.git tree and the patch sent today, 
on this thread, to move cpu_synchronize_all_states ?
diff mbox

Patch

diff --git a/migration.c b/migration.c
index 34f2325..65d1c88 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@ 
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "sysemu/cpus.h"