From: Peter Lieven
To: qemu-devel@nongnu.org
Date: Thu, 31 Jan 2013 12:53:12 +0100
Subject: Re: [Qemu-devel] RFC migration of zero pages
Message-Id: <5DEF77B3-5ECB-430D-9C87-EAE149BB0280@dlhnet.de>
In-Reply-To: <510A3B3E.6090909@redhat.com>

RFC patch is attached. Comments appreciated.

I have two concerns left:

a) What happens if a page turns from zero to non-zero during the first
stage? Is this page transferred in the same round or in the next?

b) What happens if live migration fails or is aborted and a migration to
the same target is then started again (if that is possible)? Is the
memory on the target reinitialized?
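For background, the whole idea relies on the destination's guest RAM being
demand-zero anonymous memory: an untouched page reads as zeros without any
host memory being committed, and only a write faults a private page in. A
minimal stand-alone sketch of that behavior (plain C on Linux, illustration
only, not QEMU code):

/* Minimal stand-alone illustration (not QEMU code): anonymous mappings
 * on Linux are demand-zero; reads return zeros without anything having
 * been written, and only written pages get private memory allocated. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024; /* stands in for guest RAM */
    unsigned char *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Reads as zero although nothing was ever written to it. */
    printf("first byte: %d\n", ram[0]);

    /* Only a write like this one commits a real private page; compare
     * VmRSS in /proc/self/status before and after the memset. */
    memset(ram, 0xff, 4096);

    munmap(ram, len);
    return 0;
}

With -mem-prealloc the pages are instead touched up front, which is why the
patch comment below treats that case separately.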
On 31.01.2013 at 10:37, Orit Wasserman wrote:

> On 01/31/2013 11:25 AM, Peter Lieven wrote:
>>
>> On 31.01.2013 at 10:19, Orit Wasserman wrote:
>>
>>> On 01/31/2013 11:00 AM, Peter Lieven wrote:
>>>>
>>>> On 31.01.2013 at 09:59, Orit Wasserman wrote:
>>>>
>>>>> On 01/31/2013 10:37 AM, Peter Lieven wrote:
>>>>>>
>>>>>> On 31.01.2013 at 09:33, Orit Wasserman wrote:
>>>>>>
>>>>>>> On 01/31/2013 10:10 AM, Peter Lieven wrote:
>>>>>>>>
>>>>>>>> On 31.01.2013 at 08:47, Orit Wasserman wrote:
>>>>>>>>
>>>>>>>>> On 01/31/2013 08:57 AM, Peter Lieven wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I just came across an idea and would like to have feedback on whether it makes sense or not.
>>>>>>>>>>
>>>>>>>>>> If a VM is started without preallocated memory, all memory that has not been written to
>>>>>>>>>> reads as zeros, right?
>>>>>>>>> Hi,
>>>>>>>>> No, the memory will be unmapped (we allocate on demand).
>>>>>>>>
>>>>>>>> Yes, but those unmapped pages read as zeroes if the guest accesses them?
>>>>>>> Yes.
>>>>>>>>
>>>>>>>>>> If a VM with a lot of unwritten memory is migrated, or if the memory contains a lot
>>>>>>>>>> of zeroed-out memory (e.g. a Windows or Linux guest with page sanitization), all this
>>>>>>>>>> memory is allocated on the target during live migration. Especially with KSM this leads
>>>>>>>>>> to the problem that this memory is allocated and might not be completely available, as
>>>>>>>>>> merging of the pages happens asynchronously.
>>>>>>>>>>
>>>>>>>>>> Wouldn't it make sense not to send zero pages in the first round, where the complete
>>>>>>>>>> RAM is sent (if it is detectable that we are in this stage)?
>>>>>>>>> We send one byte per zero page at the moment (see is_dup_page); we can optimize further
>>>>>>>>> by not sending it.
>>>>>>>>> I have to point out that this is a very idle guest, and we need to work on a loaded
>>>>>>>>> guest, which is the harder problem in migration.
>>>>>>>>
>>>>>>>> I was not talking about saving one byte (+ 8 bytes for the header); my concern was that
>>>>>>>> we memset all (dup) pages, including the special case of a zero dup page, on the
>>>>>>>> migration target. This allocates the memory, does it not?
>>>>>>>>
>>>>>>>> If my above assumption that the guest reads unmapped memory as zeroes is right, this
>>>>>>>> mapping is not necessary in the case of a zero dup page.
>>>>>>>>
>>>>>>>> We just have to make sure that we are still in the very first round when deciding not
>>>>>>>> to send a zero page, because otherwise it could be a page that has become zero during
>>>>>>>> migration, and that of course has to be transferred.
>>>>>>>
>>>>>>> OK, so if we don't send the pages then they won't be allocated on the dst, which can
>>>>>>> improve memory usage and reduce CPU consumption on it.
>>>>>>> That can be good for an overcommit scenario.
>>>>>>
>>>>>> Yes. On the source host those zero pages have likely all been merged by KSM already, but
>>>>>> on the destination they are allocated and initially consume real memory. This can be a
>>>>>> problem if a lot of incoming migrations happen at the same time.
>>>>>
>>>>> That can be very effective.
>>>>>
>>>>>>>>> Also I notice that the bottleneck in migrating unmapped pages is the detection of those
>>>>>>>>> pages, because we map the pages in order to check them; for a large guest this is very
>>>>>>>>> expensive, as mapping a page results in a page fault on the host.
>>>>>>>>> So what would be very helpful is actually locating those pages without mapping them,
>>>>>>>>> which looks very complicated.
>>>>>>>>
>>>>>>>> This would be a nice improvement, but as you said, a guest will sooner or later allocate
>>>>>>>> all memory if it is not totally idle. However, bigger parts of this memory might have
>>>>>>>> been reset to zeroes. This happens on page deallocation in a Windows guest by default
>>>>>>>> and can also be enforced in Linux with page sanitization.
>>>>>>>
>>>>>>> True, but in those cases we will want to zero the page on the dst, as this is done for
>>>>>>> security reasons.
>>>>>>
>>>>>> If I migrate it to a destination where initially all memory is unmapped, not migrating
>>>>>> the zero page turns it into an unmapped page (which reads as zero?). Where is the
>>>>>> security problem? It's like re-thinning on a storage volume. Or do I understand something
>>>>>> wrong here? Is the actual mapping information migrated?
>>>>>
>>>>> I was referring to pages that had some data and were migrated, so when the guest OS zeros
>>>>> them we also need to zero them on the destination, because the data is there as well.
>>>>
>>>> Ok, so can we, with the current implementation, effectively decide whether a page is being
>>>> transferred for the first time?
>>>
>>> In the old code (before 1.3 or 1.2) we had a separate function for the first full transfer,
>>> but now we don't. So I guess you will need to implement it; it shouldn't be too complicated.
>>> I would add a flag to the existing code.
>>>>
>>>> Do we always migrate the complete memory once and then iterate over dirty pages? I have to
>>>> check the code that searches for dirty pages to confirm that.
>>>
>>> We set the whole bitmap as dirty at the beginning of migration, so in the first iteration
>>> all pages will be sent. The code is in arch_init.c; look at ram_save_setup and
>>> ram_save_iterate.
>>
>> I will have a look and send an RFC patch once I have tested it.
> Great!

diff --git a/arch_init.c b/arch_init.c
index dada6de..33f3b12 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -426,6 +426,8 @@ static void migration_bitmap_sync(void)
  * 0 means no dirty pages
  */
 
+static uint64_t complete_rounds;
+
 static int ram_save_block(QEMUFile *f, bool last_stage)
 {
     RAMBlock *block = last_seen_block;
@@ -451,6 +453,10 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             if (!block) {
                 block = QTAILQ_FIRST(&ram_list.blocks);
                 complete_round = true;
+                if (!complete_rounds) {
+                    error_report("ram_save_block: finished bulk ram migration");
+                }
+                complete_rounds++;
             }
         } else {
             uint8_t *p;
@@ -463,10 +469,17 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             bytes_sent = -1;
             if (is_dup_page(p)) {
                 acct_info.dup_pages++;
-                bytes_sent = save_block_hdr(f, block, offset, cont,
+                /* We can skip transferring zero pages in the first round because
+                 * memory is unmapped (reads as zero) at the target anyway, or
+                 * initialized to zero in the case of mem-prealloc. */
+                if (complete_rounds || *p) {
+                    bytes_sent = save_block_hdr(f, block, offset, cont,
                                             RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, *p);
-                bytes_sent += 1;
+                    qemu_put_byte(f, *p);
+                    bytes_sent += 1;
+                } else {
+                    bytes_sent = 1;
+                }
             } else if (migrate_use_xbzrle()) {
                 current_addr = block->offset + offset;
                 bytes_sent = save_xbzrle_page(f, p, current_addr, block,
@@ -569,6 +582,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     qemu_mutex_lock_ramlist();
 
     bytes_transferred = 0;
+    complete_rounds = 0;
     reset_ram_globals();
 
     if (migrate_use_xbzrle()) {
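The is_dup_page test referenced above is what the patch builds on. QEMU's
actual implementation compares vector-sized chunks for speed; byte for byte,
the idea is roughly the following (a simplified sketch assuming 4 KiB target
pages, not the QEMU code itself):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TARGET_PAGE_SIZE 4096   /* assumed page size for this sketch */

/* A "dup" page is one where every byte equals the first byte. */
static bool is_dup_page_bytewise(const uint8_t *page)
{
    for (size_t i = 1; i < TARGET_PAGE_SIZE; i++) {
        if (page[i] != page[0]) {
            return false;
        }
    }
    return true;
}

/* The case the patch skips in the bulk round: a dup page whose fill
 * byte is zero. */
static bool is_zero_page(const uint8_t *page)
{
    return page[0] == 0 && is_dup_page_bytewise(page);
}

int main(void)
{
    uint8_t page[TARGET_PAGE_SIZE] = { 0 };
    printf("zero page? %d\n", is_zero_page(page)); /* 1 */
    page[123] = 7;
    printf("zero page? %d\n", is_zero_page(page)); /* 0 */
    return 0;
}

The patch's condition "complete_rounds || *p" is exactly this split: dup
pages with a non-zero fill byte are sent as before, and zero pages are only
skipped while the bulk round is still in progress.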
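The setup/iterate split Orit points to is what makes the bulk round well
defined: every page starts out dirty, so the first full pass over the bitmap
visits all of RAM. Schematically (a toy sketch, not the actual
ram_save_setup/ram_save_iterate logic):

#include <stdbool.h>

#define NUM_PAGES 1024            /* toy guest with 1024 pages */

static bool dirty_bitmap[NUM_PAGES];
static bool bulk_round;           /* plays the role of !complete_rounds */

/* All pages start dirty, so the first pass visits all of RAM. */
static void save_setup_sketch(void)
{
    for (int i = 0; i < NUM_PAGES; i++) {
        dirty_bitmap[i] = true;
    }
    bulk_round = true;
}

/* One pass over the bitmap. During the bulk round zero pages may be
 * skipped; in later rounds they must be sent, because a page that
 * became zero mid-migration has to overwrite stale data on the dst. */
static void save_iterate_sketch(void)
{
    for (int i = 0; i < NUM_PAGES; i++) {
        if (dirty_bitmap[i]) {
            dirty_bitmap[i] = false;
            /* send_page(i) would go here (hypothetical helper) */
        }
    }
    bulk_round = false;
}

int main(void)
{
    save_setup_sketch();
    save_iterate_sketch();   /* bulk round: all pages visited */
    save_iterate_sketch();   /* later rounds: only re-dirtied pages */
    return 0;
}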