From patchwork Thu Apr 11 12:38:27 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Liuji (Jeremy)"
X-Patchwork-Id: 235725
From: "Liuji (Jeremy)"
To: "quintela@redhat.com"
Thread-Topic: [PATCH 0/4] QEMUFile improvements and simplifications
Date: Thu, 11 Apr 2013 12:38:27 +0000
Message-ID: <3353B972624C8B49AB43F74428B4701D40ADE844@SZXEML552-MBS.china.huawei.com>
References: <1365420597-5506-1-git-send-email-pbonzini@redhat.com>
 <3353B972624C8B49AB43F74428B4701D40ADE2F3@SZXEML552-MBS.china.huawei.com>
 <877gkaa6kk.fsf@elfo.elfo>
In-Reply-To: <877gkaa6kk.fsf@elfo.elfo>
Cc: "kwolf@redhat.com", "Luohao \(brian\)", "qemu-devel@nongnu.org",
 "owasserm@redhat.com", Haofeng, Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 0/4] QEMUFile improvements and simplifications

Hi Juan,

Thanks for your reply. Yesterday my disk ran out of space, so the
core dump file was not saved completely.
Here is the information from the core dump file:

#0  0x00007f7a0dbff341 in migration_thread (opaque=0x7f7a0e16cbc0) at migration.c:545
545             double bandwidth = transferred_bytes / time_spent;
(gdb) bt
#0  0x00007f7a0dbff341 in migration_thread (opaque=0x7f7a0e16cbc0) at migration.c:545
#1  0x00007f7a0becad14 in start_thread (arg=0x7f7957fff700) at pthread_create.c:309
#2  0x00007f7a07cf267d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) l
540         }
541         current_time = qemu_get_clock_ms(rt_clock);
542         if (current_time >= initial_time + BUFFER_DELAY) {
543             uint64_t transferred_bytes = qemu_ftell(s->file) - initial_bytes;
544             uint64_t time_spent = current_time - initial_time - sleep_time;
545             double bandwidth = transferred_bytes / time_spent;
546             max_size = bandwidth * migrate_max_downtime() / 1000000;
547
548             DPRINTF("transferred %" PRIu64 " time_spent %" PRIu64
549                     " bandwidth %g max_size %" PRId64 "\n",
(gdb) p time_spent
$1 = 0
(gdb) p current_time
$2 = 23945934
(gdb) p initial_time
$3 = 23945833
(gdb) p sleep_time
$4 = 101
(gdb) p s->file->last_error
$5 = 0

I ran the test three times, and the values of sleep_time were 101, 100
and 101.

I think the transfer may be very fast (it takes very little time before
bytes_xfer > xfer_limit), and the g_usleep function may not be very
accurate. So sleep_time ends up at 100 (BUFFER_DELAY) or just a bit
more. In the dump above, current_time - initial_time is 101 and
sleep_time is 101, so time_spent is 0 and line 545 performs an integer
division by zero. Is my understanding correct?

Below is my simple patch to avoid this problem. Is it correct?
I still don't know why using your patch may trigger the problem.

> Re: [PATCH 0/4] QEMUFile improvements and simplifications
>
> "Liuji (Jeremy)" wrote:
> > Hi, Paolo
> >
> > I tested your 4 patches in the latest version of qemu.git/master (commit:
> > 93b48c201eb6c0404d15550a0eaa3c0f7937e35e, 2013-04-09).
> > These patches resolve the "savevm hanging" problem, which is described
> > in detail in my preceding mail: "After executing "savevm", the QEMU
> > process is hanging".
> >
> > But I found two other problems:
> > 1. My VM's OS is winxp. After the execution of "savevm" is completed,
> > I exec "loadvm".
> > But winxp changes to a "blue screen" and then restarts. I tested 3
> > times, and the results are the same.
> >
> > 2. The block migration is not OK. The qemu-system-x86_64 process on
> > the source host dumps core.
> > In the latest version of
> > qemu.git/master (commit: 93b48c201eb6c0404d15550a0eaa3c0f7937e35e, 2013-04-09),
> > the block migration is OK.
> >
> >
> > The info of the core-dump file:
> > #0  0x00007f8a44cec341 in migration_thread (opaque=0x7f8a45259bc0) at
> > migration.c:545
> > 545             double bandwidth = transferred_bytes / time_spent;
> > (gdb) bt
> > #0  0x00007f8a44cec341 in migration_thread (opaque=0x7f8a45259bc0) at
> > migration.c:545
> > #1  0x00007f8a42fb7d14 in ?? ()
> > #2  0x0000000000000000 in ?? ()
> >
>
> Could you recompile with -g to see what is going on?
> This really makes no sense :p It looks like the source file and the
> compiled version don't agree.
>
> Paolo, any clue?
>
> /me re-reads: block-migration, ok, testing goes.
>
> Later, Juan.

diff --git a/migration.c b/migration.c
index 3b4b467..58d69fb 100644
--- a/migration.c
+++ b/migration.c
@@ -503,6 +503,7 @@ static void *migration_thread(void *opaque)
     int64_t max_size = 0;
     int64_t start_time = initial_time;
     bool old_vm_running = false;
+    double bandwidth = 0;
 
     DPRINTF("beginning savevm\n");
     qemu_savevm_state_begin(s->file, &s->params);
@@ -542,7 +543,13 @@ static void *migration_thread(void *opaque)
         if (current_time >= initial_time + BUFFER_DELAY) {
             uint64_t transferred_bytes = qemu_ftell(s->file) - initial_bytes;
             uint64_t time_spent = current_time - initial_time - sleep_time;
-            double bandwidth = transferred_bytes / time_spent;
+            if (time_spent > 0) {
+                bandwidth = transferred_bytes / time_spent;
+            }
+            else {
+                //when time_spent <= 0, don't change the value of bandwidth.
+                DPRINTF("time_spent=%" PRIu64 " is too small.\n",time_spent);
+            }
             max_size = bandwidth * migrate_max_downtime() / 1000000;
 
             DPRINTF("transferred %" PRIu64 " time_spent %" PRIu64
@@ -550,7 +557,7 @@ static void *migration_thread(void *opaque)
                     transferred_bytes, time_spent, bandwidth, max_size);
             /* if we haven't sent anything, we don't want to recalculate
                10000 is a small enough number for our purposes */
-            if (s->dirty_bytes_rate && transferred_bytes > 10000) {
+            if (s->dirty_bytes_rate && transferred_bytes > 10000 && bandwidth > 0) {
                 s->expected_downtime = s->dirty_bytes_rate / bandwidth;
             }
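
For reference, the arithmetic from the core dump can be reproduced outside
QEMU. The standalone sketch below is only an illustration, not part of the
patch: time_spent_ms() and update_bandwidth() are hypothetical helper names
that do not exist in migration.c. It makes two assumptions beyond the patch:
it clamps the interval to 0 when sleep_time exceeds the elapsed time (so the
unsigned value cannot wrap around), and it divides in double rather than in
uint64_t, so the result is not truncated before the conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in migration.c): milliseconds spent actually
 * transferring. Clamped to 0 so that a sleep_ms larger than the elapsed
 * interval cannot wrap around to a huge unsigned value. */
static uint64_t time_spent_ms(int64_t current_ms, int64_t initial_ms,
                              int64_t sleep_ms)
{
    int64_t spent = current_ms - initial_ms - sleep_ms;
    return spent > 0 ? (uint64_t)spent : 0;
}

/* Hypothetical helper: store bytes/ms in *bandwidth only when the divisor
 * is non-zero. Returns false and leaves the previous estimate untouched
 * otherwise, which mirrors the intent of the patch above. */
static bool update_bandwidth(double *bandwidth, uint64_t bytes,
                             uint64_t spent_ms)
{
    assert(bandwidth != NULL);
    if (spent_ms == 0) {
        return false;               /* would be a division by zero */
    }
    /* Divide in double; this is a deliberate difference from the patch,
     * which divides two uint64_t values. */
    *bandwidth = (double)bytes / (double)spent_ms;
    return true;
}
```

With the values printed by gdb, time_spent_ms(23945934, 23945833, 101)
evaluates to 0, so update_bandwidth() refuses to divide and the previous
bandwidth estimate is kept.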