From: Peter Lieven
To: qemu-devel@nongnu.org
Date: Thu, 31 Jan 2013 12:53:12 +0100
Subject: Re: [Qemu-devel] RFC migration of zero pages
Message-Id: <5DEF77B3-5ECB-430D-9C87-EAE149BB0280@dlhnet.de>
In-Reply-To: <510A3B3E.6090909@redhat.com>

RFC patch is attached. Comments appreciated.

I have two concerns left:

a) What happens if a page turns from zero to non-zero during the first
stage? Is this page transferred in the same round or in the next?

b) What happens if live migration fails or is aborted and a migration to
the same target is then started again (if that is possible)? Is the
memory on the target reinitialized?
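For background, the whole idea relies on the destination's guest RAM being
demand-zero anonymous memory: an untouched page reads as zeros without any
host memory being committed, and only a write faults a private page in. A
minimal stand-alone sketch of that behavior (plain C on Linux, illustration
only, not QEMU code):

/* Minimal stand-alone illustration (not QEMU code): anonymous mappings
 * on Linux are demand-zero; reads return zeros without anything having
 * been written, and only written pages get private memory allocated. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024; /* stands in for guest RAM */
    unsigned char *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Reads as zero although nothing was ever written to it. */
    printf("first byte: %d\n", ram[0]);

    /* Only a write like this one commits a real private page; compare
     * VmRSS in /proc/self/status before and after the memset. */
    memset(ram, 0xff, 4096);

    munmap(ram, len);
    return 0;
}

With -mem-prealloc the pages are instead touched up front, which is why the
patch comment below treats that case separately.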
On 31.01.2013 at 10:37, Orit Wasserman wrote:

> On 01/31/2013 11:25 AM, Peter Lieven wrote:
>>
>> On 31.01.2013 at 10:19, Orit Wasserman wrote:
>>
>>> On 01/31/2013 11:00 AM, Peter Lieven wrote:
>>>>
>>>> On 31.01.2013 at 09:59, Orit Wasserman wrote:
>>>>
>>>>> On 01/31/2013 10:37 AM, Peter Lieven wrote:
>>>>>>
>>>>>> On 31.01.2013 at 09:33, Orit Wasserman wrote:
>>>>>>
>>>>>>> On 01/31/2013 10:10 AM, Peter Lieven wrote:
>>>>>>>>
>>>>>>>> On 31.01.2013 at 08:47, Orit Wasserman wrote:
>>>>>>>>
>>>>>>>>> On 01/31/2013 08:57 AM, Peter Lieven wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I just came across an idea and would like to have feedback on whether it makes sense or not.
>>>>>>>>>>
>>>>>>>>>> If a VM is started without preallocated memory, all memory that has not been written to
>>>>>>>>>> reads as zeros, right?
>>>>>>>>> Hi,
>>>>>>>>> No, the memory will be unmapped (we allocate on demand).
>>>>>>>>
>>>>>>>> Yes, but those unmapped pages read as zeroes if the guest accesses them?
>>>>>>> Yes.
>>>>>>>>
>>>>>>>>>> If a VM with a lot of unwritten memory is migrated, or if the memory contains a lot
>>>>>>>>>> of zeroed-out memory (e.g. a Windows or Linux guest with page sanitization), all this
>>>>>>>>>> memory is allocated on the target during live migration. Especially with KSM this leads
>>>>>>>>>> to the problem that this memory is allocated and might not be completely available, as
>>>>>>>>>> merging of the pages happens asynchronously.
>>>>>>>>>>
>>>>>>>>>> Wouldn't it make sense not to send zero pages in the first round, where the complete
>>>>>>>>>> RAM is sent (if it is detectable that we are in this stage)?
>>>>>>>>> We send one byte per zero page at the moment (see is_dup_page); we can optimize further
>>>>>>>>> by not sending it.
>>>>>>>>> I have to point out that this is a very idle guest, and we need to work on a loaded
>>>>>>>>> guest, which is the harder problem in migration.
>>>>>>>>
>>>>>>>> I was not talking about saving one byte (+ 8 bytes for the header); my concern was that
>>>>>>>> we memset all (dup) pages, including the special case of a zero dup page, on the
>>>>>>>> migration target. This allocates the memory, does it not?
>>>>>>>>
>>>>>>>> If my above assumption that the guest reads unmapped memory as zeroes is right, this
>>>>>>>> mapping is not necessary in the case of a zero dup page.
>>>>>>>>
>>>>>>>> We just have to make sure that we are still in the very first round when deciding not
>>>>>>>> to send a zero page, because otherwise it could be a page that has become zero during
>>>>>>>> migration, and that of course has to be transferred.
>>>>>>>
>>>>>>> OK, so if we don't send the pages then they won't be allocated on the dst, which can
>>>>>>> improve memory usage and reduce CPU consumption on it.
>>>>>>> That can be good for an overcommit scenario.
>>>>>>
>>>>>> Yes. On the source host those zero pages have likely all been merged by KSM already, but
>>>>>> on the destination they are allocated and initially consume real memory. This can be a
>>>>>> problem if a lot of incoming migrations happen at the same time.
>>>>>
>>>>> That can be very effective.
>>>>>
>>>>>>>>> Also I notice that the bottleneck in migrating unmapped pages is the detection of those
>>>>>>>>> pages, because we map the pages in order to check them; for a large guest this is very
>>>>>>>>> expensive, as mapping a page results in a page fault on the host.
>>>>>>>>> So what would be very helpful is actually locating those pages without mapping them,
>>>>>>>>> which looks very complicated.
>>>>>>>>
>>>>>>>> This would be a nice improvement, but as you said, a guest will sooner or later allocate
>>>>>>>> all memory if it is not totally idle. However, bigger parts of this memory might have
>>>>>>>> been reset to zeroes. This happens on page deallocation in a Windows guest by default
>>>>>>>> and can also be enforced in Linux with page sanitization.
>>>>>>>
>>>>>>> True, but in those cases we will want to zero the page on the dst, as this is done for
>>>>>>> security reasons.
>>>>>>
>>>>>> If I migrate it to a destination where initially all memory is unmapped, not migrating
>>>>>> the zero page turns it into an unmapped page (which reads as zero?). Where is the
>>>>>> security problem? It's like re-thinning on a storage volume. Or do I understand something
>>>>>> wrong here? Is the actual mapping information migrated?
>>>>>
>>>>> I was referring to pages that had some data and were migrated, so when the guest OS zeros
>>>>> them we also need to zero them on the destination, because the data is there as well.
>>>>
>>>> Ok, so can we, with the current implementation, effectively decide whether a page is being
>>>> transferred for the first time?
>>>
>>> In the old code (before 1.3 or 1.2) we had a separate function for the first full transfer,
>>> but now we don't. So I guess you will need to implement it; it shouldn't be too complicated.
>>> I would add a flag to the existing code.
>>>>
>>>> Do we always migrate the complete memory once and then iterate over dirty pages? I have to
>>>> check the code that searches for dirty pages to confirm that.
>>>
>>> We set the whole bitmap as dirty at the beginning of migration, so in the first iteration
>>> all pages will be sent. The code is in arch_init.c; look at ram_save_setup and
>>> ram_save_iterate.
>>
>> I will have a look and send an RFC patch once I have tested it.
> Great!

diff --git a/arch_init.c b/arch_init.c
index dada6de..33f3b12 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -426,6 +426,8 @@ static void migration_bitmap_sync(void)
  * 0 means no dirty pages
  */
 
+static uint64_t complete_rounds;
+
 static int ram_save_block(QEMUFile *f, bool last_stage)
 {
     RAMBlock *block = last_seen_block;
@@ -451,6 +453,10 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             if (!block) {
                 block = QTAILQ_FIRST(&ram_list.blocks);
                 complete_round = true;
+                if (!complete_rounds) {
+                    error_report("ram_save_block: finished bulk ram migration");
+                }
+                complete_rounds++;
             }
         } else {
             uint8_t *p;
@@ -463,10 +469,17 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             bytes_sent = -1;
             if (is_dup_page(p)) {
                 acct_info.dup_pages++;
-                bytes_sent = save_block_hdr(f, block, offset, cont,
+                /* We can skip transferring zero pages in the first round because
+                 * memory is unmapped (reads as zero) at the target anyway, or
+                 * initialized to zero in the case of mem-prealloc. */
+                if (complete_rounds || *p) {
+                    bytes_sent = save_block_hdr(f, block, offset, cont,
                                             RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, *p);
-                bytes_sent += 1;
+                    qemu_put_byte(f, *p);
+                    bytes_sent += 1;
+                } else {
+                    bytes_sent = 1;
+                }
             } else if (migrate_use_xbzrle()) {
                 current_addr = block->offset + offset;
                 bytes_sent = save_xbzrle_page(f, p, current_addr, block,
@@ -569,6 +582,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     qemu_mutex_lock_ramlist();
 
     bytes_transferred = 0;
+    complete_rounds = 0;
     reset_ram_globals();
 
     if (migrate_use_xbzrle()) {
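The is_dup_page test referenced above is what the patch builds on. QEMU's
actual implementation compares vector-sized chunks for speed; byte for byte,
the idea is roughly the following (a simplified sketch assuming 4 KiB target
pages, not the QEMU code itself):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TARGET_PAGE_SIZE 4096   /* assumed page size for this sketch */

/* A "dup" page is one where every byte equals the first byte. */
static bool is_dup_page_bytewise(const uint8_t *page)
{
    for (size_t i = 1; i < TARGET_PAGE_SIZE; i++) {
        if (page[i] != page[0]) {
            return false;
        }
    }
    return true;
}

/* The case the patch skips in the bulk round: a dup page whose fill
 * byte is zero. */
static bool is_zero_page(const uint8_t *page)
{
    return page[0] == 0 && is_dup_page_bytewise(page);
}

int main(void)
{
    uint8_t page[TARGET_PAGE_SIZE] = { 0 };
    printf("zero page? %d\n", is_zero_page(page)); /* 1 */
    page[123] = 7;
    printf("zero page? %d\n", is_zero_page(page)); /* 0 */
    return 0;
}

The patch's condition "complete_rounds || *p" is exactly this split: dup
pages with a non-zero fill byte are sent as before, and zero pages are only
skipped while the bulk round is still in progress.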
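The setup/iterate split Orit points to is what makes the bulk round well
defined: every page starts out dirty, so the first full pass over the bitmap
visits all of RAM. Schematically (a toy sketch, not the actual
ram_save_setup/ram_save_iterate logic):

#include <stdbool.h>

#define NUM_PAGES 1024            /* toy guest with 1024 pages */

static bool dirty_bitmap[NUM_PAGES];
static bool bulk_round;           /* plays the role of !complete_rounds */

/* All pages start dirty, so the first pass visits all of RAM. */
static void save_setup_sketch(void)
{
    for (int i = 0; i < NUM_PAGES; i++) {
        dirty_bitmap[i] = true;
    }
    bulk_round = true;
}

/* One pass over the bitmap. During the bulk round zero pages may be
 * skipped; in later rounds they must be sent, because a page that
 * became zero mid-migration has to overwrite stale data on the dst. */
static void save_iterate_sketch(void)
{
    for (int i = 0; i < NUM_PAGES; i++) {
        if (dirty_bitmap[i]) {
            dirty_bitmap[i] = false;
            /* send_page(i) would go here (hypothetical helper) */
        }
    }
    bulk_round = false;
}

int main(void)
{
    save_setup_sketch();
    save_iterate_sketch();   /* bulk round: all pages visited */
    save_iterate_sketch();   /* later rounds: only re-dirtied pages */
    return 0;
}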