From: Peter Lieven
Date: Wed, 04 Jan 2012 15:21:56 +0100
To: Paolo Bonzini
Cc: Shu Ming, qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] Stalls on Live Migration of VMs with a lot of memory
Message-ID: <4F046084.3080104@dlh.net>
In-Reply-To: <4F045ED0.1030309@redhat.com>

On 04.01.2012 15:14, Paolo Bonzini wrote:
> On 01/04/2012 02:08 PM, Peter Lieven wrote:
>>
>> thus my only option at the moment is to limit the runtime of the while
>> loop in stage 2, or are there any post-1.0 patches in git that might
>> already help?
>
> No; even though (as I said) people are aware of the problems and do
> plan to fix them, don't hold your breath. :(

OK, just for the record: if anyone wants the time-limit patch for the
while loop in stage 2, it is attached below. It solves the problem for
me and, after some tweaking, achieves a throughput of approx. 450 MB/s
in my case.
It also handles the case where, because of a large number of duplicate
pages, the rate limit never kicks in and therefore never ends the while
loop.

--- qemu-kvm-1.0/arch_init.c.orig	2012-01-04 14:21:02.000000000 +0100
+++ qemu-kvm-1.0/arch_init.c	2012-01-04 14:27:34.000000000 +0100
@@ -301,6 +301,8 @@
     bytes_transferred_last = bytes_transferred;
     bwidth = qemu_get_clock_ns(rt_clock);
 
+    int pages_read = 0;
+
     while ((ret = qemu_file_rate_limit(f)) == 0) {
         int bytes_sent;
 
@@ -309,6 +311,11 @@
         if (bytes_sent == 0) { /* no more blocks */
             break;
         }
+
+        if (!(++pages_read & 0xff)) {
+            if ((qemu_get_clock_ns(rt_clock) - bwidth) > migrate_max_downtime())
+                break; /* we have spent more than allowed downtime in this iteration */
+        }
     }
 
     if (ret < 0) {
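For anyone who wants to reuse the same trick outside of QEMU, below is a
minimal standalone sketch of the pattern the patch relies on: bound the
wall-clock time spent in a hot loop, but consult the clock only on every
256th iteration so the check stays cheap. Note that do_work(),
get_time_ns(), and max_ns are hypothetical stand-ins (for
ram_save_block(), qemu_get_clock_ns(rt_clock), and
migrate_max_downtime() respectively), not QEMU APIs.

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* hypothetical: one unit of work, returns false when there is none left */
extern bool do_work(void);

/* hypothetical stand-in for qemu_get_clock_ns(rt_clock) */
static int64_t get_time_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* run do_work() until it finishes or max_ns of wall-clock time has
 * passed; (++iterations & 0xff) is zero on every 256th pass, so the
 * comparatively expensive clock read runs at 1/256 of the loop
 * frequency */
void bounded_loop(int64_t max_ns)
{
    int64_t start = get_time_ns();
    int iterations = 0;

    while (do_work()) {
        if (!(++iterations & 0xff) &&
            (get_time_ns() - start) > max_ns) {
            break;  /* time budget exhausted */
        }
    }
}

The bitmask test is the same one the patch uses: 0xff simply means
"every 256 pages", which in the migration loop keeps the clock-read
overhead negligible compared to the cost of ram_save_block().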