RFC migration of zero pages

Message ID 5DEF77B3-5ECB-430D-9C87-EAE149BB0280@dlhnet.de
State New

Commit Message

Peter Lieven Jan. 31, 2013, 11:53 a.m.
The RFC patch is attached. Comments are appreciated.
I have two concerns left:
a) What happens if a page turns from zero to non-zero during the first stage? Is
this page transferred in the same round or in the next?
b) What happens if a live migration fails or is aborted and a new migration
is then started to the same target (if this is possible)? Is the
memory at the target reinitialized?

On 31.01.2013 at 10:37, Orit Wasserman <owasserm@redhat.com> wrote:

> On 01/31/2013 11:25 AM, Peter Lieven wrote:
>> On 31.01.2013 at 10:19, Orit Wasserman <owasserm@redhat.com> wrote:
>>> On 01/31/2013 11:00 AM, Peter Lieven wrote:
>>>> On 31.01.2013 at 09:59, Orit Wasserman <owasserm@redhat.com> wrote:
>>>>> On 01/31/2013 10:37 AM, Peter Lieven wrote:
>>>>>> On 31.01.2013 at 09:33, Orit Wasserman <owasserm@redhat.com> wrote:
>>>>>>> On 01/31/2013 10:10 AM, Peter Lieven wrote:
>>>>>>>> On 31.01.2013 at 08:47, Orit Wasserman <owasserm@redhat.com> wrote:
>>>>>>>>> On 01/31/2013 08:57 AM, Peter Lieven wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> I just came across an idea and would like feedback on whether it makes sense or not.
>>>>>>>>>> If a VM is started without preallocated memory, all memory that has not been written to
>>>>>>>>>> reads as zeros, right?
>>>>>>>>> Hi,
>>>>>>>>> No, the memory will be unmapped (we allocate on demand).
>>>>>>>> Yes, but those unmapped pages will read as zeroes if the guest accesses them?
>>>>>>> yes.
>>>>>>>>>> If a VM with a lot of unwritten memory is migrated or if the memory contains a lot
>>>>>>>>>> of zeroed out memory (e.g. Windows or Linux guest with page sanitization) all this memory
>>>>>>>>>> is allocated on the target during live migration. Especially with KSM this leads
>>>>>>>>>> to the problem that this memory is allocated and might not be completely available, as
>>>>>>>>>> merging of the pages happens asynchronously.
>>>>>>>>>> Wouldn't it make sense to not send zero pages in the first round where the complete
>>>>>>>>>> ram is sent (if it is detectable that we are in this stage)?
>>>>>>>>> We send one byte per zero page at the moment (see is_dup_page); we can optimize further
>>>>>>>>> by not sending it.
>>>>>>>>> I have to point out that this is a very idle guest and we need to work on a loaded guest,
>>>>>>>>> which is the harder problem in migration.
>>>>>>>> I was not talking about saving one byte (+ 8 bytes for header), my concern was that we memset all (dup) pages
>>>>>>>> including the special case of a zero dup page on the migration target. This allocates the memory or does it not?
>>>>>>>> If my above assumption that the guest reads unmapped memory as zeroes is right, this mapping
>>>>>>>> is not necessary in the case of a zero dup page.
>>>>>>>> We just have to make sure that we are still in the very first round when deciding not to send
>>>>>>>> a zero page, because otherwise it could be a page that has become zero during migration and
>>>>>>>> this of course has to be transferred.
>>>>>>> OK, so if we don't send the pages then they won't be allocated in the dst, which can both
>>>>>>> improve memory usage and reduce CPU consumption there.
>>>>>>> That can be good for overcommit scenarios.
>>>>>> Yes. On the Source host those zero pages have likely all been merged by KSM already, but on the destination
>>>>>> they are allocated and initially consume real memory. This can be a problem if a lot of incoming migrations happen
>>>>>> at the same time.
>>>>> That can be very effective.
>>>>>>>>> Also, I notice that the bottleneck in migrating unmapped pages is the detection of those pages
>>>>>>>>> because we map the pages in order to check them; for a large guest this is very expensive, as mapping a page
>>>>>>>>> results in a page fault in the host.
>>>>>>>>> So what will be very helpful is actually locating those pages without mapping them
>>>>>>>>> which looks very complicated.
>>>>>>>> This would be a nice improvement, but as you said a guest will sooner or later allocate
>>>>>>>> all memory if it is not totally idle. However, bigger parts of this memory might have been reset to zeroes.
>>>>>>>> This happens on page deallocation in a Windows guest by default and can also be enforced in Linux
>>>>>>>> with page sanitization.
>>>>>>> True, but in those cases we will want to zero the page in the dst, as this is done for security reasons.
>>>>>> If I migrate it to a destination where initially all memory is unmapped, not migrating the zero page turns it
>>>>>> into an unmapped page (which reads as zero?). Where is the security problem? It's like re-thinning on storage.
>>>>>> Or do I misunderstand something here? Is the actual mapping information migrated?
>>>>> I was referring to pages that had some data and were migrated, so when the guest OS zeros them we need to zero them
>>>>> also in destination because the data is also there.
>>>> Ok, so can we with the current implementation effectively decide if a page is transferred for the first time?
>>> In the old code (before 1.3 or 1.2) we had a separate function for the first full transfer, but now we don't.
>>> So I guess you will need to implement it, it shouldn't be too complicated.
>>> I would add a flag to the existing code.
>>>> Do we always migrate the complete memory once and then iterate over dirty pages? I have to check the code
>>>> that searches for dirty pages to confirm that.
>>> We set all the bitmap as dirty in the beginning of migration so in the first iteration all pages will be sent.
>>> The code is in arch_init.c, look at ram_save_setup and ram_save_iterate.
>> I will have a look and send an RFC patch once I have tested it.
> Great!
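
For illustration only, the sender-side decision discussed in the thread could be sketched as the standalone C snippet below. It is not QEMU code: `SKETCH_PAGE_SIZE`, `page_is_dup`, `enum page_action`, and `classify_page` are hypothetical names introduced for this sketch, and `page_is_dup` is a simple reimplementation of the idea behind `is_dup_page` (every byte equals the first byte). The `complete_rounds` counter plays the same role as in the patch: it is zero while the initial bulk pass over RAM is still running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096  /* assumed page size for this sketch */

/* True if every byte of the page equals its first byte
 * (same idea as is_dup_page, reimplemented here for illustration). */
static bool page_is_dup(const uint8_t *p)
{
    for (size_t i = 1; i < SKETCH_PAGE_SIZE; i++) {
        if (p[i] != p[0]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical classification of what would go on the wire for a page. */
enum page_action { SKIP_PAGE, SEND_DUP_BYTE, SEND_FULL_PAGE };

/* complete_rounds == 0 means the bulk pass over all RAM has not finished. */
static enum page_action classify_page(const uint8_t *p, uint64_t complete_rounds)
{
    assert(p != NULL);
    if (!page_is_dup(p)) {
        return SEND_FULL_PAGE;      /* ordinary page: send its contents */
    }
    if (complete_rounds == 0 && p[0] == 0) {
        return SKIP_PAGE;           /* zero page in the bulk round: target reads zero anyway */
    }
    return SEND_DUP_BYTE;           /* header plus one fill byte */
}
```

A zero page is skipped only while `complete_rounds` is zero. Once the bulk pass has finished, a zero page may be one that was zeroed during migration, so it is sent as a dup page again.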


diff --git a/arch_init.c b/arch_init.c
index dada6de..33f3b12 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -426,6 +426,8 @@  static void migration_bitmap_sync(void)
  *           0 means no dirty pages
+static uint64_t complete_rounds;
 static int ram_save_block(QEMUFile *f, bool last_stage)
     RAMBlock *block = last_seen_block;
@@ -451,6 +453,10 @@  static int ram_save_block(QEMUFile *f, bool last_stage)
             if (!block) {
                 block = QTAILQ_FIRST(&ram_list.blocks);
                 complete_round = true;
+                if (!complete_rounds) {
+                    error_report("ram_save_block: finished bulk ram migration");
+                }
+                complete_rounds++;
         } else {
             uint8_t *p;
@@ -463,10 +469,17 @@  static int ram_save_block(QEMUFile *f, bool last_stage)
             bytes_sent = -1;
             if (is_dup_page(p)) {
-                bytes_sent = save_block_hdr(f, block, offset, cont,
+                /* we can skip transferring zero pages in the first round because
+                   memory is unmapped (reads as zero) at the target anyway or initialized
+                   to zero in case of mem-prealloc. */
+                if (complete_rounds || *p) {
+                    bytes_sent = save_block_hdr(f, block, offset, cont,
-                qemu_put_byte(f, *p);
-                bytes_sent += 1;
+                    qemu_put_byte(f, *p);
+                    bytes_sent += 1;
+                } else {
+                    bytes_sent = 1;
+                }
             } else if (migrate_use_xbzrle()) {
                 current_addr = block->offset + offset;
                 bytes_sent = save_xbzrle_page(f, p, current_addr, block,
@@ -569,6 +582,7 @@  static int ram_save_setup(QEMUFile *f, void *opaque)
     bytes_transferred = 0;
+    complete_rounds = 0;
     if (migrate_use_xbzrle()) {
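

As a complement to the patch above, and only as a sketch, the allocation concern from the thread (memset of a zero dup page touching memory on the target) could also be approached on the receiving side. The snippet below is standalone C, not part of this patch and not QEMU's load path; `range_is_zero` and `handle_dup_page` are hypothetical helpers. The assumption, which would need checking on the actual host, is that reading untouched anonymous memory is cheaper than writing it because no private page has to be allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if all len bytes starting at p are zero. */
static bool range_is_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Apply a received dup page (fill byte ch) to host memory.
 * Returns true if memory was written, false if the write was skipped
 * because the page already reads as zero and ch is zero. */
static bool handle_dup_page(uint8_t *host, uint8_t ch, size_t len)
{
    assert(host != NULL);
    if (ch == 0 && range_is_zero(host, len)) {
        return false;   /* nothing to do: avoid dirtying the page */
    }
    memset(host, ch, len);
    return true;
}
```

Unlike the sender-side skip, this check does not depend on knowing whether the bulk round has finished: a page that held data and was later zeroed by the guest no longer reads as zero on the target, so it is still overwritten.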