From patchwork Wed Jul 16 11:52:29 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 370729 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 7AC9814011D for ; Wed, 16 Jul 2014 21:53:22 +1000 (EST) Received: from localhost ([::1]:38795 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X7Nlw-0004bC-CF for incoming@patchwork.ozlabs.org; Wed, 16 Jul 2014 07:53:20 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44250) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X7NlU-0003z6-AO for qemu-devel@nongnu.org; Wed, 16 Jul 2014 07:52:56 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1X7NlP-0004PC-QX for qemu-devel@nongnu.org; Wed, 16 Jul 2014 07:52:52 -0400 Received: from mx1.redhat.com ([209.132.183.28]:21837) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X7NlP-0004P0-IW for qemu-devel@nongnu.org; Wed, 16 Jul 2014 07:52:47 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s6GBqksh001490 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 16 Jul 2014 07:52:46 -0400 Received: from amt.cnet (vpn1-7-198.gru2.redhat.com [10.97.7.198]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id s6GBqiO4029503; Wed, 16 Jul 2014 07:52:44 -0400 Received: from amt.cnet (localhost [127.0.0.1]) by amt.cnet (Postfix) with ESMTP id 0B54A10042F; Wed, 16 Jul 2014 08:52:32 -0300 (BRT) Received: (from marcelo@localhost) by amt.cnet (8.14.7/8.14.7/Submit) id s6GBqTER008224; Wed, 16 Jul 2014 08:52:29 -0300 Date: Wed, 16 Jul 2014 08:52:29 -0300 From: Marcelo Tosatti To: Andrey Korolyov Message-ID: <20140716115229.GA7741@amt.cnet> References: <20140715050318.GD26186@grmbl.mre> <20140715210948.GA20036@amt.cnet> <53C5A4C9.80609@redhat.com> <20140716011634.GA30717@amt.cnet> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 209.132.183.28 Cc: Amit Shah , Paolo Bonzini , Fam Zheng , "qemu-devel@nongnu.org" Subject: Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote: > On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti wrote: > > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote: > >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini wrote: > >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto: > >> > > >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti > >> >> wrote: > >> >>> > >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote: > >> >>>> > >> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov > >> >>>> wrote: > >> >>>>> > >> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah > >> >>>>> wrote: > >> >>>>>> > >> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote: > >> >>>>>>> > >> >>>>>>> Hello, > >> >>>>>>> > >> >>>>>>> the issue is not specific to the iothread code because generic > >> >>>>>>> virtio-blk also hangs up: > >> >>>>>> > >> >>>>>> > >> >>>>>> Do you know which version works well? If you could bisect, that'll > >> >>>>>> help a lot. > >> >>>>>> > >> >>>>>> Thanks, > >> >>>>>> Amit > >> >>>>> > >> >>>>> > >> >>>>> Hi, > >> >>>>> > >> >>>>> 2.0 works definitely well. I`ll try to finish bisection today, though > >> >>>>> every step takes about 10 minutes to complete. > >> >>>> > >> >>>> > >> >>>> Yay! It is even outside of virtio-blk. > >> >>>> > >> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 > >> >>>> Author: Marcelo Tosatti > >> >>>> Date: Tue Jun 3 13:34:48 2014 -0300 > >> >>>> > >> >>>> kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec > >> >>>> calculation > >> >>>> > >> >>>> Ensure proper env->tsc value for kvmclock_current_nsec calculation. > >> >>>> > >> >>>> Reported-by: Marcin GibuĊ‚a > >> >>>> Cc: qemu-stable@nongnu.org > >> >>>> Signed-off-by: Marcelo Tosatti > >> >>>> Signed-off-by: Paolo Bonzini > >> >>> > >> >>> > >> >>> Andrey, > >> >>> > >> >>> Can you please provide instructions on how to create reproducible > >> >>> environment? > >> >>> > >> >>> The following patch is equivalent to the original patch, for > >> >>> the purposes of fixing the kvmclock problem. > >> >>> > >> >>> Perhaps it becomes easier to spot the reason for the hang you are > >> >>> experiencing. > >> >>> > >> >>> > >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c > >> >>> index 272a88a..feb5fc5 100644 > >> >>> --- a/hw/i386/kvm/clock.c > >> >>> +++ b/hw/i386/kvm/clock.c > >> >>> @@ -17,7 +17,6 @@ > >> >>> #include "qemu/host-utils.h" > >> >>> #include "sysemu/sysemu.h" > >> >>> #include "sysemu/kvm.h" > >> >>> -#include "sysemu/cpus.h" > >> >>> #include "hw/sysbus.h" > >> >>> #include "hw/kvm/clock.h" > >> >>> > >> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s) > >> >>> > >> >>> cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time)); > >> >>> > >> >>> - assert(time.tsc_timestamp <= migration_tsc); > >> >>> delta = migration_tsc - time.tsc_timestamp; > >> >>> if (time.tsc_shift < 0) { > >> >>> delta >>= -time.tsc_shift; > >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, > >> >>> int running, > >> >>> if (s->clock_valid) { > >> >>> return; > >> >>> } > >> >>> - > >> >>> - cpu_synchronize_all_states(); > >> >>> ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data); > >> >>> if (ret < 0) { > >> >>> fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", > >> >>> strerror(ret)); > >> >>> diff --git a/migration.c b/migration.c > >> >>> index 8d675b3..34f2325 100644 > >> >>> --- a/migration.c > >> >>> +++ b/migration.c > >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque) > >> >>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER); > >> >>> old_vm_running = runstate_is_running(); > >> >>> > >> >>> + cpu_synchronize_all_states(); > >> >>> ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); > >> >>> if (ret >= 0) { > >> >>> qemu_file_set_rate_limit(s->file, INT64_MAX); > >> > > >> > > >> > It could also be useful to apply the above patch _and_ revert > >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce. > >> > > >> > Paolo > >> > >> Yes, it solved the issue for me! (it took much time to check because > >> most of country` debian mirrors went inconsistent by some reason) > >> > >> Also trivial addition: > >> > >> diff --git a/migration.c b/migration.c > >> index 34f2325..65d1c88 100644 > >> --- a/migration.c > >> +++ b/migration.c > >> @@ -25,6 +25,7 @@ > >> #include "qemu/thread.h" > >> #include "qmp-commands.h" > >> #include "trace.h" > >> +#include "sysemu/cpus.h" > > > > And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ? > > > > That is, test with a stock qemu.git tree and the patch sent today, > > on this thread, to move cpu_synchronize_all_states ? > > > > > > The main reason for things to work for me is a revert of > 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other > patches. I had tested two cases, with Alexander`s patch completely > reverted plus suggestion from Marcelo and only with revert 9b178682 > plug same suggestion. The difference is that the until Alexander` > patch is not reverted, live migration is always failing by the timeout > value, and when reverted migration always succeeds in 8-10 seconds. > Appropriate diffs are attached for the reference. Andrey, Can you please apply only the following attached patch to an upstream QEMU git tree (move_synchronize_all_states.patch), plus the necessary header file corrections, and attempt to reproduce? When you reproduce, please provide a backtrace and version of the QEMU git tree, and instructions on how to reproduce: 1) full QEMU command line 2) steps to reproduce Move cpu_synchronize_all_states to migration.c. diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c index 272a88a..feb5fc5 100644 --- a/hw/i386/kvm/clock.c +++ b/hw/i386/kvm/clock.c @@ -17,7 +17,6 @@ #include "qemu/host-utils.h" #include "sysemu/sysemu.h" #include "sysemu/kvm.h" -#include "sysemu/cpus.h" #include "hw/sysbus.h" #include "hw/kvm/clock.h" @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s) cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time)); - assert(time.tsc_timestamp <= migration_tsc); delta = migration_tsc - time.tsc_timestamp; if (time.tsc_shift < 0) { delta >>= -time.tsc_shift; @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running, if (s->clock_valid) { return; } - - cpu_synchronize_all_states(); ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data); if (ret < 0) { fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret)); diff --git a/migration.c b/migration.c index 8d675b3..34f2325 100644 --- a/migration.c +++ b/migration.c @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque) qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER); old_vm_running = runstate_is_running(); + cpu_synchronize_all_states(); ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); if (ret >= 0) { qemu_file_set_rate_limit(s->file, INT64_MAX);