
[Migration,Bug?] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration

Message ID 55128084.2040304@huawei.com
State New

Commit Message

Zhanghailiang March 25, 2015, 9:31 a.m. UTC
Hi all,

We found that, sometimes, the content of the VM's memory is inconsistent between the source side and the destination side
when we check it just after migration finishes but before the VM continues to run.

We used the patch below to find this issue (it is also attached).
Steps to reproduce:

(1) Compile QEMU:
  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make

(2) Command and output:
SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
(qemu) migrate tcp:192.168.3.8:3004
before saving ram complete
ff703f6889ab8701e4e040872d079a28
md_host : after saving ram complete
ff703f6889ab8701e4e040872d079a28

DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
(qemu) QEMU_VM_SECTION_END, after loading ram
230e1e68ece9cd4e769630e1bcb5ddfb
md_host : after loading all vmstate
230e1e68ece9cd4e769630e1bcb5ddfb
md_host : after cpu_synchronize_all_post_init
230e1e68ece9cd4e769630e1bcb5ddfb

This happens occasionally, and it is easier to reproduce when the migration command is issued during the VM's startup time.

We have done further testing and found that some pages have been dirtied but their corresponding migration_bitmap bits are not set.
We can't figure out which module of QEMU misses setting the bitmap when a VM page is dirtied;
it is very difficult for us to trace all the actions that dirty the VM's pages.

Actually, the first time we found this problem was during COLO FT development, where it triggered some strange issues in
the VM which all pointed to inconsistent VM memory. (If we save all of the VM's memory to the slave side every time
we do a checkpoint in COLO FT, everything is OK.)

Is it OK for some pages not to be transferred to the destination during migration? Or is it a bug?

This issue has blocked our COLO development... :(

Any help will be greatly appreciated!

Thanks,
zhanghailiang

--

Comments

Dr. David Alan Gilbert March 25, 2015, 9:46 a.m. UTC | #1
* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Hi all,
> 
> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
> when we check it just after finishing migration but before VM continue to Run.
> 
> We use a patch like bellow to find this issue, you can find it from affix,
> and Steps to reproduce:
> 
> (1) Compile QEMU:
>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
> 
> (2) Command and output:
> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
> (qemu) migrate tcp:192.168.3.8:3004
> before saving ram complete
> ff703f6889ab8701e4e040872d079a28
> md_host : after saving ram complete
> ff703f6889ab8701e4e040872d079a28
> 
> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
> (qemu) QEMU_VM_SECTION_END, after loading ram
> 230e1e68ece9cd4e769630e1bcb5ddfb
> md_host : after loading all vmstate
> 230e1e68ece9cd4e769630e1bcb5ddfb
> md_host : after cpu_synchronize_all_post_init
> 230e1e68ece9cd4e769630e1bcb5ddfb
> 
> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
> 
> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
> it is very difficult for us to trace all the actions of dirtying VM's pages.
> 
> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
> when do checkpoint in COLO FT, and everything will be OK.)
> 
> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?

That does sound like a bug.
The only other explanation I have is that memory is being changed by a device emulation
that happens after the end of saving the VM, or after loading the memory.  That's
certainly possible - especially if a device (say networking) hasn't been properly
stopped.

> This issue has blocked our COLO development... :(
> 
> Any help will be greatly appreciated!

I suggest:
   1) Does it happen with devices other than virtio?
   2) Strip the devices down - e.g. just run with serial and no video/usb
   3) Try doing the md5 comparison at the end of ram_save_complete
   4) mprotect RAM after the ram_save_complete and see if anything faults (see the sketch after this list).
   5) Can you trigger this with normal migration or just COLO?
      I'm wondering if something is doing something on a running/paused/etc state
      change and isn't expecting the new COLO states.
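
For (4), here's a minimal sketch of the kind of thing I mean (untested; it reuses the
RAMBlock access from your md5 patch, and the handler/helper names are just placeholders).
It would be called once on the source, right after ram_save_complete:

    #include <signal.h>
    #include <sys/mman.h>

    static void ram_write_fault(int sig, siginfo_t *si, void *unused)
    {
        /* any write to guest RAM after ram_save_complete ends up here */
        fprintf(stderr, "unexpected write to guest RAM at %p\n", si->si_addr);
        abort(); /* dump core so the backtrace shows who did the write */
    }

    static void protect_ram_after_save(void)
    {
        RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks); /* 'pc.ram' only */
        struct sigaction act = {
            .sa_sigaction = ram_write_fault,
            .sa_flags = SA_SIGINFO,
        };

        sigaction(SIGSEGV, &act, NULL);
        /* keep read/exec so device emulation can still read guest RAM */
        if (mprotect(block->host, block->used_length, PROT_READ | PROT_EXEC) < 0) {
            perror("mprotect");
        }
    }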

Dave


> 
> Thanks,
> zhanghailiang
> 
> --- a/savevm.c
> +++ b/savevm.c
> @@ -51,6 +51,26 @@
>  #define ARP_PTYPE_IP 0x0800
>  #define ARP_OP_REQUEST_REV 0x3
> 
> +#include "qemu/rcu_queue.h"
> +#include <openssl/md5.h>
> +
> +static void check_host_md5(void)
> +{
> +    int i;
> +    unsigned char md[MD5_DIGEST_LENGTH];
> +    MD5_CTX ctx;
> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
> +
> +    MD5_Init(&ctx);
> +    MD5_Update(&ctx, (void *)block->host, block->used_length);
> +    MD5_Final(md, &ctx);
> +    printf("md_host : ");
> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
> +        fprintf(stderr, "%02x", md[i]);
> +    }
> +    fprintf(stderr, "\n");
> +}
> +
>  static int announce_self_create(uint8_t *buf,
>                                  uint8_t *mac_addr)
>  {
> @@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
>          qemu_put_byte(f, QEMU_VM_SECTION_END);
>          qemu_put_be32(f, se->section_id);
> 
> +        printf("before saving %s complete\n", se->idstr);
> +        check_host_md5();
> +
>          ret = se->ops->save_live_complete(f, se->opaque);
> +        printf("after saving %s complete\n", se->idstr);
> +        check_host_md5();
> +
>          trace_savevm_section_end(se->idstr, se->section_id, ret);
>          if (ret < 0) {
>              qemu_file_set_error(f, ret);
> @@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
>              }
> 
>              ret = vmstate_load(f, le->se, le->version_id);
> +            if (section_type == QEMU_VM_SECTION_END) {
> +                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
> +                check_host_md5();
> +            }
> +
>              if (ret < 0) {
>                  error_report("error while loading state section id %d(%s)",
>                               section_id, le->se->idstr);
> @@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
>          g_free(buf);
>      }
> 
> +    printf("after loading all vmstate\n");
> +    check_host_md5();
>      cpu_synchronize_all_post_init();
> +    printf("after cpu_synchronize_all_post_init\n");
> +    check_host_md5();
> 
>      ret = 0;
> 
> -- 

> From ecb789cf7f383b112da3cce33eb9822a94b9497a Mon Sep 17 00:00:00 2001
> From: Li Zhijian <lizhijian@cn.fujitsu.com>
> Date: Tue, 24 Mar 2015 21:53:26 -0400
> Subject: [PATCH] check pc.ram block md5sum between migration Source and
>  Destination
> 
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  savevm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
>  mode change 100644 => 100755 savevm.c
> 
> diff --git a/savevm.c b/savevm.c
> old mode 100644
> new mode 100755
> index 3b0e222..3d431dc
> --- a/savevm.c
> +++ b/savevm.c
> @@ -51,6 +51,26 @@
>  #define ARP_PTYPE_IP 0x0800
>  #define ARP_OP_REQUEST_REV 0x3
>  
> +#include "qemu/rcu_queue.h"
> +#include <openssl/md5.h>
> +
> +static void check_host_md5(void)
> +{
> +    int i;
> +    unsigned char md[MD5_DIGEST_LENGTH];
> +    MD5_CTX ctx;
> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
> +
> +    MD5_Init(&ctx);
> +    MD5_Update(&ctx, (void *)block->host, block->used_length);
> +    MD5_Final(md, &ctx);
> +    printf("md_host : ");
> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
> +        fprintf(stderr, "%02x", md[i]);
> +    }
> +    fprintf(stderr, "\n");
> +}
> +
>  static int announce_self_create(uint8_t *buf,
>                                  uint8_t *mac_addr)
>  {
> @@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
>          qemu_put_byte(f, QEMU_VM_SECTION_END);
>          qemu_put_be32(f, se->section_id);
>  
> +        printf("before saving %s complete\n", se->idstr);
> +        check_host_md5();
> +
>          ret = se->ops->save_live_complete(f, se->opaque);
> +        printf("after saving %s complete\n", se->idstr);
> +        check_host_md5();
> +
>          trace_savevm_section_end(se->idstr, se->section_id, ret);
>          if (ret < 0) {
>              qemu_file_set_error(f, ret);
> @@ -1007,6 +1033,13 @@ int qemu_loadvm_state(QEMUFile *f)
>              QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
>  
>              ret = vmstate_load(f, le->se, le->version_id);
> +#if 0
> +            if (section_type == QEMU_VM_SECTION_FULL) {
> +                printf("QEMU_VM_SECTION_FULL, after loading %s\n", le->se->idstr);
> +                check_host_md5();
> +            }
> +#endif
> +
>              if (ret < 0) {
>                  error_report("error while loading state for instance 0x%x of"
>                               " device '%s'", instance_id, idstr);
> @@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
>              }
>  
>              ret = vmstate_load(f, le->se, le->version_id);
> +            if (section_type == QEMU_VM_SECTION_END) {
> +                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
> +                check_host_md5();
> +            }
> +
>              if (ret < 0) {
>                  error_report("error while loading state section id %d(%s)",
>                               section_id, le->se->idstr);
> @@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
>          g_free(buf);
>      }
>  
> +    printf("after loading all vmstate\n");
> +    check_host_md5();
>      cpu_synchronize_all_post_init();
> +    printf("after cpu_synchronize_all_post_init\n");
> +    check_host_md5();
>  
>      ret = 0;
>  
> -- 
> 1.7.12.4
> 

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Juan Quintela March 25, 2015, 9:50 a.m. UTC | #2
zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
> Hi all,
>
> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
> when we check it just after finishing migration but before VM continue to Run.
>
> We use a patch like bellow to find this issue, you can find it from affix,
> and Steps to reprduce:
>
> (1) Compile QEMU:
>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>
> (2) Command and output:
> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio

Could you try to reproduce:
- without vhost
- without virtio-net
- cache=unsafe is going to give you trouble, but trouble should only
  happen after migration of the pages has finished.
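
For example, something like this (only a sketch of what I mean, not a command line I have
tested; adjust the image path and the tap setup to your environment):

    SRC: x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -m 2048 -smp 2 \
             -netdev tap,id=hn0 -device e1000,netdev=hn0 \
             -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=ide,cache=writeback \
             -display none -monitor stdio
    DST: the same command line, plus "-incoming tcp:0:3004"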

What kind of load were you having when reproducing this issue?
Just to confirm, you have been able to reproduce this without COLO
patches, right?

> (qemu) migrate tcp:192.168.3.8:3004
> before saving ram complete
> ff703f6889ab8701e4e040872d079a28
> md_host : after saving ram complete
> ff703f6889ab8701e4e040872d079a28
>
> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
> (qemu) QEMU_VM_SECTION_END, after loading ram
> 230e1e68ece9cd4e769630e1bcb5ddfb
> md_host : after loading all vmstate
> 230e1e68ece9cd4e769630e1bcb5ddfb
> md_host : after cpu_synchronize_all_post_init
> 230e1e68ece9cd4e769630e1bcb5ddfb
>
> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.

OK, a couple of things.  Memory doesn't have to be exactly identical.
Virtio devices in particular do funny things on "post-load".  There
are no guarantees for that as far as I know; we should end up with an
equivalent device state in memory.

> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
> it is very difficult for us to trace all the actions of dirtying VM's pages.

This seems to point to a bug in one of the devices.

> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
> when do checkpoint in COLO FT, and everything will be OK.)
>
> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?

Pages transferred should be the same; it is after device state transmission that
things could change.

> This issue has blocked our COLO development... :(
>
> Any help will be greatly appreciated!

Later, Juan.

>
> Thanks,
> zhanghailiang
>
> --- a/savevm.c
> +++ b/savevm.c
> @@ -51,6 +51,26 @@
>  #define ARP_PTYPE_IP 0x0800
>  #define ARP_OP_REQUEST_REV 0x3
>
> +#include "qemu/rcu_queue.h"
> +#include <openssl/md5.h>
> +
> +static void check_host_md5(void)
> +{
> +    int i;
> +    unsigned char md[MD5_DIGEST_LENGTH];
> +    MD5_CTX ctx;
> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
> +
> +    MD5_Init(&ctx);
> +    MD5_Update(&ctx, (void *)block->host, block->used_length);
> +    MD5_Final(md, &ctx);
> +    printf("md_host : ");
> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
> +        fprintf(stderr, "%02x", md[i]);
> +    }
> +    fprintf(stderr, "\n");
> +}
> +
>  static int announce_self_create(uint8_t *buf,
>                                  uint8_t *mac_addr)
>  {
> @@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
>          qemu_put_byte(f, QEMU_VM_SECTION_END);
>          qemu_put_be32(f, se->section_id);
>
> +        printf("before saving %s complete\n", se->idstr);
> +        check_host_md5();
> +
>          ret = se->ops->save_live_complete(f, se->opaque);
> +        printf("after saving %s complete\n", se->idstr);
> +        check_host_md5();
> +
>          trace_savevm_section_end(se->idstr, se->section_id, ret);
>          if (ret < 0) {
>              qemu_file_set_error(f, ret);
> @@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
>              }
>
>              ret = vmstate_load(f, le->se, le->version_id);
> +            if (section_type == QEMU_VM_SECTION_END) {
> +                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
> +                check_host_md5();
> +            }
> +
>              if (ret < 0) {
>                  error_report("error while loading state section id %d(%s)",
>                               section_id, le->se->idstr);
> @@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
>          g_free(buf);
>      }
>
> +    printf("after loading all vmstate\n");
> +    check_host_md5();
>      cpu_synchronize_all_post_init();
> +    printf("after cpu_synchronize_all_post_init\n");
> +    check_host_md5();
>
>      ret = 0;
Wen Congyang March 25, 2015, 10:21 a.m. UTC | #3
On 03/25/2015 05:50 PM, Juan Quintela wrote:
> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>> Hi all,
>>
>> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
>> when we check it just after finishing migration but before VM continue to Run.
>>
>> We use a patch like bellow to find this issue, you can find it from affix,
>> and Steps to reprduce:
>>
>> (1) Compile QEMU:
>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>
>> (2) Command and output:
>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
> 
> Could you try to reproduce:
> - without vhost
> - without virtio-net
> - cache=unsafe is going to give you trouble, but trouble should only
>   happen after migration of pages have finished.

I can use e1000 to reproduce this problem.

> 
> What kind of load were you having when reproducing this issue?
> Just to confirm, you have been able to reproduce this without COLO
> patches, right?

I can reproduce it without COLO patches. The newest commit is:
commit 054903a832b865eb5432d79b5c9d1e1ff31b58d7
Author: Peter Maydell <peter.maydell@linaro.org>
Date:   Tue Mar 24 16:34:16 2015 +0000

    Update version for v2.3.0-rc1 release
    
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

> 
>> (qemu) migrate tcp:192.168.3.8:3004
>> before saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>> md_host : after saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>>
>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>> (qemu) QEMU_VM_SECTION_END, after loading ram
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after loading all vmstate
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after cpu_synchronize_all_post_init
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>
>> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
> 
> OK, a couple of things.  Memory don't have to be exactly identical.
> Virtio devices in particular do funny things on "post-load".  There
> aren't warantees for that as far as I know, we should end with an
> equivalent device state in memory.
> 
>> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
>> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
>> it is very difficult for us to trace all the actions of dirtying VM's pages.
> 
> This seems to point to a bug in one of the devices.
> 
>> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
>> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
>> when do checkpoint in COLO FT, and everything will be OK.)
>>
>> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
> 
> Pages transferred should be the same, after device state transmission is
> when things could change.
> 
>> This issue has blocked our COLO development... :(
>>
>> Any help will be greatly appreciated!
> 
> Later, Juan.
>
Zhanghailiang March 25, 2015, 11:28 a.m. UTC | #4
On 2015/3/25 17:46, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Hi all,
>>
>> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
>> when we check it just after finishing migration but before VM continue to Run.
>>
>> We use a patch like bellow to find this issue, you can find it from affix,
>> and Steps to reproduce:
>>
>> (1) Compile QEMU:
>>   ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>
>> (2) Command and output:
>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>> (qemu) migrate tcp:192.168.3.8:3004
>> before saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>> md_host : after saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>>
>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>> (qemu) QEMU_VM_SECTION_END, after loading ram
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after loading all vmstate
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after cpu_synchronize_all_post_init
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>
>> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
>>
>> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
>> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>
>> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
>> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
>> when do checkpoint in COLO FT, and everything will be OK.)
>>
>> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
>
> That does sound like a bug.
> The only other explanation I have is that memory is being changed by a device emulation
> that happens after the end of a saving the vm, or after loading the memory.  That's
> certainly possible - especially if a device (say networking) hasn't been properly
> stopped.
>
>> This issue has blocked our COLO development... :(
>>
>> Any help will be greatly appreciated!
>

Hi Dave,

> I suggest:
>     1) Does it happen with devices other than virtio?
>     2) Strip the devices down - e.g. just run with serial and no video/usb

I will try this ...

>     3) Try doing the md5 comparison at the end of ram_save_complete

Yes, we have done this test at the end of ram_save_complete.

>     4) mprotect RAM after the ram_save_complete and see if anything faults.

I have tried, and there are KVM error reports, like below:

KVM: entry failed, hardware error 0x7
EAX=00000000 EBX=0000e000 ECX=00009578 EDX=0000434f
ESI=0000fc10 EDI=0000434f EBP=00000000 ESP=00001fca
EIP=00009594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0040 00000400 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =434f 000434f0 0000ffff 00009300
DS =434f 000434f0 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     0002dcc8 00000047
IDT=     00000000 0000ffff
CR0=00000010 CR2=ffffffff CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90 <f3> 90 fa fc 66 c3 66 53 66 89 c3 66 e8 9d e8 ff ff 66 01 c3 66 89 d8 66 e8 40 e9 ff ff 66
ERROR: invalid runstate transition: 'internal-error' -> 'colo'

>     5) Can you trigger this with normal migration or just COLO?

Yes, we have reproduced it without COLO, just in normal migration

>        I'm wondering if something is doing something on a running/paused/etc state
>        change and isn't expecting the new COLO states.
>

Actually, we can also reproduce this with COLO. On the master side only, we store all RAM into a ram-cache and also duplicate migration_bitmap;
after a checkpoint, before the VM continues to run, we compare the cached memory with the current memory. If a page differs but its
dirty bitmap bit is not set, we consider that page to have been missed by the migration bitmap. The result looks like the following
(certainly, this also happens only occasionally; sometimes no page is un-traced):
pc.ram[0xc7000](mis_dirty)
pc.ram[0xec000](mis_dirty)
pc.ram[0xf4000](mis_dirty)
pc.ram[0xfa000](mis_dirty)
pc.ram[0xbf93000](mis_dirty)
pc.ram[0xbf94000](mis_dirty)
pc.ram[0xbf95000](mis_dirty)
pc.ram[0xbf96000](mis_dirty)
pc.ram[0xbf97000](mis_dirty)
pc.ram[0xbf98000](mis_dirty)
pc.ram[0xbf99000](mis_dirty)
pc.ram[0xbf9a000](mis_dirty)
pc.ram[0xbf9b000](mis_dirty)
pc.ram[0xbf9c000](mis_dirty)
...
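
The check itself is roughly the following sketch (simplified from our COLO code; ram_cache
and migration_bitmap_test are our own helper names, so please read them as placeholders):

    static void find_missed_dirty_pages(RAMBlock *block, uint8_t *ram_cache)
    {
        ram_addr_t addr;

        for (addr = 0; addr < block->used_length; addr += TARGET_PAGE_SIZE) {
            /* page differs from the checkpoint copy but was never marked dirty */
            if (!migration_bitmap_test(block, addr) &&
                memcmp(block->host + addr, ram_cache + addr, TARGET_PAGE_SIZE)) {
                printf("%s[0x%lx](mis_dirty)\n", block->idstr, (unsigned long)addr);
            }
        }
    }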

Thanks,
zhang

>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -51,6 +51,26 @@
>>   #define ARP_PTYPE_IP 0x0800
>>   #define ARP_OP_REQUEST_REV 0x3
>>
>> +#include "qemu/rcu_queue.h"
>> +#include <openssl/md5.h>
>> +
>> +static void check_host_md5(void)
>> +{
>> +    int i;
>> +    unsigned char md[MD5_DIGEST_LENGTH];
>> +    MD5_CTX ctx;
>> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
>> +
>> +    MD5_Init(&ctx);
>> +    MD5_Update(&ctx, (void *)block->host, block->used_length);
>> +    MD5_Final(md, &ctx);
>> +    printf("md_host : ");
>> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
>> +        fprintf(stderr, "%02x", md[i]);
>> +    }
>> +    fprintf(stderr, "\n");
>> +}
>> +
>>   static int announce_self_create(uint8_t *buf,
>>                                   uint8_t *mac_addr)
>>   {
>> @@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
>>           qemu_put_byte(f, QEMU_VM_SECTION_END);
>>           qemu_put_be32(f, se->section_id);
>>
>> +        printf("before saving %s complete\n", se->idstr);
>> +        check_host_md5();
>> +
>>           ret = se->ops->save_live_complete(f, se->opaque);
>> +        printf("after saving %s complete\n", se->idstr);
>> +        check_host_md5();
>> +
>>           trace_savevm_section_end(se->idstr, se->section_id, ret);
>>           if (ret < 0) {
>>               qemu_file_set_error(f, ret);
>> @@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>               }
>>
>>               ret = vmstate_load(f, le->se, le->version_id);
>> +            if (section_type == QEMU_VM_SECTION_END) {
>> +                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
>> +                check_host_md5();
>> +            }
>> +
>>               if (ret < 0) {
>>                   error_report("error while loading state section id %d(%s)",
>>                                section_id, le->se->idstr);
>> @@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>           g_free(buf);
>>       }
>>
>> +    printf("after loading all vmstate\n");
>> +    check_host_md5();
>>       cpu_synchronize_all_post_init();
>> +    printf("after cpu_synchronize_all_post_init\n");
>> +    check_host_md5();
>>
>>       ret = 0;
>>
>> --
>
>>  From ecb789cf7f383b112da3cce33eb9822a94b9497a Mon Sep 17 00:00:00 2001
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Date: Tue, 24 Mar 2015 21:53:26 -0400
>> Subject: [PATCH] check pc.ram block md5sum between migration Source and
>>   Destination
>>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   savevm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 42 insertions(+)
>>   mode change 100644 => 100755 savevm.c
>>
>> diff --git a/savevm.c b/savevm.c
>> old mode 100644
>> new mode 100755
>> index 3b0e222..3d431dc
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -51,6 +51,26 @@
>>   #define ARP_PTYPE_IP 0x0800
>>   #define ARP_OP_REQUEST_REV 0x3
>>
>> +#include "qemu/rcu_queue.h"
>> +#include <openssl/md5.h>
>> +
>> +static void check_host_md5(void)
>> +{
>> +    int i;
>> +    unsigned char md[MD5_DIGEST_LENGTH];
>> +    MD5_CTX ctx;
>> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
>> +
>> +    MD5_Init(&ctx);
>> +    MD5_Update(&ctx, (void *)block->host, block->used_length);
>> +    MD5_Final(md, &ctx);
>> +    printf("md_host : ");
>> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
>> +        fprintf(stderr, "%02x", md[i]);
>> +    }
>> +    fprintf(stderr, "\n");
>> +}
>> +
>>   static int announce_self_create(uint8_t *buf,
>>                                   uint8_t *mac_addr)
>>   {
>> @@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
>>           qemu_put_byte(f, QEMU_VM_SECTION_END);
>>           qemu_put_be32(f, se->section_id);
>>
>> +        printf("before saving %s complete\n", se->idstr);
>> +        check_host_md5();
>> +
>>           ret = se->ops->save_live_complete(f, se->opaque);
>> +        printf("after saving %s complete\n", se->idstr);
>> +        check_host_md5();
>> +
>>           trace_savevm_section_end(se->idstr, se->section_id, ret);
>>           if (ret < 0) {
>>               qemu_file_set_error(f, ret);
>> @@ -1007,6 +1033,13 @@ int qemu_loadvm_state(QEMUFile *f)
>>               QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
>>
>>               ret = vmstate_load(f, le->se, le->version_id);
>> +#if 0
>> +            if (section_type == QEMU_VM_SECTION_FULL) {
>> +                printf("QEMU_VM_SECTION_FULL, after loading %s\n", le->se->idstr);
>> +                check_host_md5();
>> +            }
>> +#endif
>> +
>>               if (ret < 0) {
>>                   error_report("error while loading state for instance 0x%x of"
>>                                " device '%s'", instance_id, idstr);
>> @@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>               }
>>
>>               ret = vmstate_load(f, le->se, le->version_id);
>> +            if (section_type == QEMU_VM_SECTION_END) {
>> +                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
>> +                check_host_md5();
>> +            }
>> +
>>               if (ret < 0) {
>>                   error_report("error while loading state section id %d(%s)",
>>                                section_id, le->se->idstr);
>> @@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>           g_free(buf);
>>       }
>>
>> +    printf("after loading all vmstate\n");
>> +    check_host_md5();
>>       cpu_synchronize_all_post_init();
>> +    printf("after cpu_synchronize_all_post_init\n");
>> +    check_host_md5();
>>
>>       ret = 0;
>>
>> --
>> 1.7.12.4
>>
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
Zhanghailiang March 25, 2015, 11:32 a.m. UTC | #5
On 2015/3/25 17:50, Juan Quintela wrote:
> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>> Hi all,
>>
>> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
>> when we check it just after finishing migration but before VM continue to Run.
>>
>> We use a patch like bellow to find this issue, you can find it from affix,
>> and Steps to reprduce:
>>
>> (1) Compile QEMU:
>>   ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>
>> (2) Command and output:
>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>
> Could you try to reproduce:
> - without vhost
> - without virtio-net

Yes, with e1000 the problem still exists. I will test it after dropping all of the unnecessary device configuration,
just as Dave suggested.

> - cache=unsafe is going to give you trouble, but trouble should only
>    happen after migration of pages have finished.
>
> What kind of load were you having when reproducing this issue?
> Just to confirm, you have been able to reproduce this without COLO
> patches, right?
>

Yes, we have reproduced this in normal migration, without the COLO patches.

>> (qemu) migrate tcp:192.168.3.8:3004
>> before saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>> md_host : after saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>>
>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>> (qemu) QEMU_VM_SECTION_END, after loading ram
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after loading all vmstate
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after cpu_synchronize_all_post_init
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>
>> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
>
> OK, a couple of things.  Memory don't have to be exactly identical.
> Virtio devices in particular do funny things on "post-load".  There
> aren't warantees for that as far as I know, we should end with an
> equivalent device state in memory.
>
>> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
>> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>
> This seems to point to a bug in one of the devices.
>
>> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
>> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
>> when do checkpoint in COLO FT, and everything will be OK.)
>>
>> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
>
> Pages transferred should be the same, after device state transmission is
> when things could change.
>
>> This issue has blocked our COLO development... :(
>>
>> Any help will be greatly appreciated!
>
> Later, Juan.
>
>>
>> Thanks,
>> zhanghailiang
>>
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -51,6 +51,26 @@
>>   #define ARP_PTYPE_IP 0x0800
>>   #define ARP_OP_REQUEST_REV 0x3
>>
>> +#include "qemu/rcu_queue.h"
>> +#include <openssl/md5.h>
>> +
>> +static void check_host_md5(void)
>> +{
>> +    int i;
>> +    unsigned char md[MD5_DIGEST_LENGTH];
>> +    MD5_CTX ctx;
>> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
>> +
>> +    MD5_Init(&ctx);
>> +    MD5_Update(&ctx, (void *)block->host, block->used_length);
>> +    MD5_Final(md, &ctx);
>> +    printf("md_host : ");
>> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
>> +        fprintf(stderr, "%02x", md[i]);
>> +    }
>> +    fprintf(stderr, "\n");
>> +}
>> +
>>   static int announce_self_create(uint8_t *buf,
>>                                   uint8_t *mac_addr)
>>   {
>> @@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
>>           qemu_put_byte(f, QEMU_VM_SECTION_END);
>>           qemu_put_be32(f, se->section_id);
>>
>> +        printf("before saving %s complete\n", se->idstr);
>> +        check_host_md5();
>> +
>>           ret = se->ops->save_live_complete(f, se->opaque);
>> +        printf("after saving %s complete\n", se->idstr);
>> +        check_host_md5();
>> +
>>           trace_savevm_section_end(se->idstr, se->section_id, ret);
>>           if (ret < 0) {
>>               qemu_file_set_error(f, ret);
>> @@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>               }
>>
>>               ret = vmstate_load(f, le->se, le->version_id);
>> +            if (section_type == QEMU_VM_SECTION_END) {
>> +                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
>> +                check_host_md5();
>> +            }
>> +
>>               if (ret < 0) {
>>                   error_report("error while loading state section id %d(%s)",
>>                                section_id, le->se->idstr);
>> @@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>           g_free(buf);
>>       }
>>
>> +    printf("after loading all vmstate\n");
>> +    check_host_md5();
>>       cpu_synchronize_all_post_init();
>> +    printf("after cpu_synchronize_all_post_init\n");
>> +    check_host_md5();
>>
>>       ret = 0;
>
> .
>
Dr. David Alan Gilbert March 25, 2015, 11:36 a.m. UTC | #6
* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/3/25 17:46, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>Hi all,
> >>
> >>We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
> >>when we check it just after finishing migration but before VM continue to Run.
> >>
> >>We use a patch like bellow to find this issue, you can find it from affix,
> >>and Steps to reproduce:
> >>
> >>(1) Compile QEMU:
> >>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
> >>
> >>(2) Command and output:
> >>SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
> >>(qemu) migrate tcp:192.168.3.8:3004
> >>before saving ram complete
> >>ff703f6889ab8701e4e040872d079a28
> >>md_host : after saving ram complete
> >>ff703f6889ab8701e4e040872d079a28
> >>
> >>DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
> >>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>230e1e68ece9cd4e769630e1bcb5ddfb
> >>md_host : after loading all vmstate
> >>230e1e68ece9cd4e769630e1bcb5ddfb
> >>md_host : after cpu_synchronize_all_post_init
> >>230e1e68ece9cd4e769630e1bcb5ddfb
> >>
> >>This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
> >>
> >>We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
> >>We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
> >>it is very difficult for us to trace all the actions of dirtying VM's pages.
> >>
> >>Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
> >>VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
> >>when do checkpoint in COLO FT, and everything will be OK.)
> >>
> >>Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
> >
> >That does sound like a bug.
> >The only other explanation I have is that memory is being changed by a device emulation
> >that happens after the end of a saving the vm, or after loading the memory.  That's
> >certainly possible - especially if a device (say networking) hasn't been properly
> >stopped.
> >
> >>This issue has blocked our COLO development... :(
> >>
> >>Any help will be greatly appreciated!
> >
> 
> Hi Dave,
> 
> >I suggest:
> >    1) Does it happen with devices other than virtio?
> >    2) Strip the devices down - e.g. just run with serial and no video/usb
> 
> I will try this ...
> 
> >    3) Try doing the md5 comparison at the end of ram_save_complete
> 
> Yes, we have done this test at the end of ram_save_complete.
> 
> >    4) mprotect RAM after the ram_save_complete and see if anything faults.
> 
> I have tried, and there will be kvm error reports. like bellow:
> 
> KVM: entry failed, hardware error 0x7

If the RAM is mprotect'd with PROT_READ|PROT_EXEC I'm not sure why this
would happen; but then the question is why is it trying to do a KVM entry
after the end of ram_save_complete - that suggests the CPUs are trying to
run again, and that shouldn't happen in a normal migration.
If you can trigger this KVM entry on a non-colo migration, then I suggest
getting the backtrace from where the KVM entry failed, because then you
should be able to see why the CPU was being restarted.
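
One way to get at that (only a sketch; resume_all_vcpus and vm_start are the usual paths
that restart the vCPUs, so breakpoints there plus 'bt' should show the caller):

    gdb -p $(pidof qemu-system-x86_64)
    (gdb) break resume_all_vcpus
    (gdb) break vm_start
    (gdb) continue
    # when one of the breakpoints fires, the backtrace shows who restarted the guest
    (gdb) bt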

Dave

> EAX=00000000 EBX=0000e000 ECX=00009578 EDX=0000434f
> ESI=0000fc10 EDI=0000434f EBP=00000000 ESP=00001fca
> EIP=00009594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0040 00000400 0000ffff 00009300
> CS =f000 000f0000 0000ffff 00009b00
> SS =434f 000434f0 0000ffff 00009300
> DS =434f 000434f0 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT=     0002dcc8 00000047
> IDT=     00000000 0000ffff
> CR0=00000010 CR2=ffffffff CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90 <f3> 90 fa fc 66 c3 66 53 66 89 c3 66 e8 9d e8 ff ff 66 01 c3 66 89 d8 66 e8 40 e9 ff ff 66
> ERROR: invalid runstate transition: 'internal-error' -> 'colo'
> 
> >    5) Can you trigger this with normal migration or just COLO?
> 
> Yes, we have reproduced it without COLO, just in normal migration
> 
> >       I'm wondering if something is doing something on a running/paused/etc state
> >       change and isn't expecting the new COLO states.
> >
> 
> Actually, we can also reproduce this with COLO, only in master side, we store all RAM to ram-cache, and also duplicate migration_bitmap,
> and after a checkpoint, before VM continue to run, we compare cache memory and current memory, if they are different but whose
> dirty bitmap is not set, we thought this page is missing tracing by migration bitmap. The result is  result will be like:
> (Certainly, this happens also occasionally, sometimes, there will be none page un-traced.)
> pc.ram[0xc7000](mis_dirty)
> pc.ram[0xec000](mis_dirty)
> pc.ram[0xf4000](mis_dirty)
> pc.ram[0xfa000](mis_dirty)
> pc.ram[0xbf93000](mis_dirty)
> pc.ram[0xbf94000](mis_dirty)
> pc.ram[0xbf95000](mis_dirty)
> pc.ram[0xbf96000](mis_dirty)
> pc.ram[0xbf97000](mis_dirty)
> pc.ram[0xbf98000](mis_dirty)
> pc.ram[0xbf99000](mis_dirty)
> pc.ram[0xbf9a000](mis_dirty)
> pc.ram[0xbf9b000](mis_dirty)
> pc.ram[0xbf9c000](mis_dirty)
> ...
> 
> Thanks,
> zhang
> 
> >>--- a/savevm.c
> >>+++ b/savevm.c
> >>@@ -51,6 +51,26 @@
> >>  #define ARP_PTYPE_IP 0x0800
> >>  #define ARP_OP_REQUEST_REV 0x3
> >>
> >>+#include "qemu/rcu_queue.h"
> >>+#include <openssl/md5.h>
> >>+
> >>+static void check_host_md5(void)
> >>+{
> >>+    int i;
> >>+    unsigned char md[MD5_DIGEST_LENGTH];
> >>+    MD5_CTX ctx;
> >>+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
> >>+
> >>+    MD5_Init(&ctx);
> >>+    MD5_Update(&ctx, (void *)block->host, block->used_length);
> >>+    MD5_Final(md, &ctx);
> >>+    printf("md_host : ");
> >>+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
> >>+        fprintf(stderr, "%02x", md[i]);
> >>+    }
> >>+    fprintf(stderr, "\n");
> >>+}
> >>+
> >>  static int announce_self_create(uint8_t *buf,
> >>                                  uint8_t *mac_addr)
> >>  {
> >>@@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
> >>          qemu_put_byte(f, QEMU_VM_SECTION_END);
> >>          qemu_put_be32(f, se->section_id);
> >>
> >>+        printf("before saving %s complete\n", se->idstr);
> >>+        check_host_md5();
> >>+
> >>          ret = se->ops->save_live_complete(f, se->opaque);
> >>+        printf("after saving %s complete\n", se->idstr);
> >>+        check_host_md5();
> >>+
> >>          trace_savevm_section_end(se->idstr, se->section_id, ret);
> >>          if (ret < 0) {
> >>              qemu_file_set_error(f, ret);
> >>@@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
> >>              }
> >>
> >>              ret = vmstate_load(f, le->se, le->version_id);
> >>+            if (section_type == QEMU_VM_SECTION_END) {
> >>+                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
> >>+                check_host_md5();
> >>+            }
> >>+
> >>              if (ret < 0) {
> >>                  error_report("error while loading state section id %d(%s)",
> >>                               section_id, le->se->idstr);
> >>@@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
> >>          g_free(buf);
> >>      }
> >>
> >>+    printf("after loading all vmstate\n");
> >>+    check_host_md5();
> >>      cpu_synchronize_all_post_init();
> >>+    printf("after cpu_synchronize_all_post_init\n");
> >>+    check_host_md5();
> >>
> >>      ret = 0;
> >>
> >>--
> >
> >> From ecb789cf7f383b112da3cce33eb9822a94b9497a Mon Sep 17 00:00:00 2001
> >>From: Li Zhijian <lizhijian@cn.fujitsu.com>
> >>Date: Tue, 24 Mar 2015 21:53:26 -0400
> >>Subject: [PATCH] check pc.ram block md5sum between migration Source and
> >>  Destination
> >>
> >>Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >>---
> >>  savevm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 42 insertions(+)
> >>  mode change 100644 => 100755 savevm.c
> >>
> >>diff --git a/savevm.c b/savevm.c
> >>old mode 100644
> >>new mode 100755
> >>index 3b0e222..3d431dc
> >>--- a/savevm.c
> >>+++ b/savevm.c
> >>@@ -51,6 +51,26 @@
> >>  #define ARP_PTYPE_IP 0x0800
> >>  #define ARP_OP_REQUEST_REV 0x3
> >>
> >>+#include "qemu/rcu_queue.h"
> >>+#include <openssl/md5.h>
> >>+
> >>+static void check_host_md5(void)
> >>+{
> >>+    int i;
> >>+    unsigned char md[MD5_DIGEST_LENGTH];
> >>+    MD5_CTX ctx;
> >>+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
> >>+
> >>+    MD5_Init(&ctx);
> >>+    MD5_Update(&ctx, (void *)block->host, block->used_length);
> >>+    MD5_Final(md, &ctx);
> >>+    printf("md_host : ");
> >>+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
> >>+        fprintf(stderr, "%02x", md[i]);
> >>+    }
> >>+    fprintf(stderr, "\n");
> >>+}
> >>+
> >>  static int announce_self_create(uint8_t *buf,
> >>                                  uint8_t *mac_addr)
> >>  {
> >>@@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
> >>          qemu_put_byte(f, QEMU_VM_SECTION_END);
> >>          qemu_put_be32(f, se->section_id);
> >>
> >>+        printf("before saving %s complete\n", se->idstr);
> >>+        check_host_md5();
> >>+
> >>          ret = se->ops->save_live_complete(f, se->opaque);
> >>+        printf("after saving %s complete\n", se->idstr);
> >>+        check_host_md5();
> >>+
> >>          trace_savevm_section_end(se->idstr, se->section_id, ret);
> >>          if (ret < 0) {
> >>              qemu_file_set_error(f, ret);
> >>@@ -1007,6 +1033,13 @@ int qemu_loadvm_state(QEMUFile *f)
> >>              QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
> >>
> >>              ret = vmstate_load(f, le->se, le->version_id);
> >>+#if 0
> >>+            if (section_type == QEMU_VM_SECTION_FULL) {
> >>+                printf("QEMU_VM_SECTION_FULL, after loading %s\n", le->se->idstr);
> >>+                check_host_md5();
> >>+            }
> >>+#endif
> >>+
> >>              if (ret < 0) {
> >>                  error_report("error while loading state for instance 0x%x of"
> >>                               " device '%s'", instance_id, idstr);
> >>@@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
> >>              }
> >>
> >>              ret = vmstate_load(f, le->se, le->version_id);
> >>+            if (section_type == QEMU_VM_SECTION_END) {
> >>+                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
> >>+                check_host_md5();
> >>+            }
> >>+
> >>              if (ret < 0) {
> >>                  error_report("error while loading state section id %d(%s)",
> >>                               section_id, le->se->idstr);
> >>@@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
> >>          g_free(buf);
> >>      }
> >>
> >>+    printf("after loading all vmstate\n");
> >>+    check_host_md5();
> >>      cpu_synchronize_all_post_init();
> >>+    printf("after cpu_synchronize_all_post_init\n");
> >>+    check_host_md5();
> >>
> >>      ret = 0;
> >>
> >>--
> >>1.7.12.4
> >>
> >
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Zhanghailiang March 25, 2015, 11:48 a.m. UTC | #7
On 2015/3/25 19:36, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/3/25 17:46, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> Hi all,
>>>>
>>>> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>
>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>> and Steps to reproduce:
>>>>
>>>> (1) Compile QEMU:
>>>>   ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>
>>>> (2) Command and output:
>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>>>> (qemu) migrate tcp:192.168.3.8:3004
>>>> before saving ram complete
>>>> ff703f6889ab8701e4e040872d079a28
>>>> md_host : after saving ram complete
>>>> ff703f6889ab8701e4e040872d079a28
>>>>
>>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>> md_host : after loading all vmstate
>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>> md_host : after cpu_synchronize_all_post_init
>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>
>>>> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
>>>>
>>>> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
>>>> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>>>
>>>> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
>>>> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
>>>> when do checkpoint in COLO FT, and everything will be OK.)
>>>>
>>>> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
>>>
>>> That does sound like a bug.
>>> The only other explanation I have is that memory is being changed by a device emulation
>>> that happens after the end of a saving the vm, or after loading the memory.  That's
>>> certainly possible - especially if a device (say networking) hasn't been properly
>>> stopped.
>>>
>>>> This issue has blocked our COLO development... :(
>>>>
>>>> Any help will be greatly appreciated!
>>>
>>
>> Hi Dave,
>>
>>> I suggest:
>>>     1) Does it happen with devices other than virtio?
>>>     2) Strip the devices down - e.g. just run with serial and no video/usb
>>
>> I will try this ...
>>
>>>     3) Try doing the md5 comparison at the end of ram_save_complete
>>
>> Yes, we have done this test at the end of ram_save_complete.
>>
>>>     4) mprotect RAM after the ram_save_complete and see if anything faults.
>>
>> I have tried, and there will be kvm error reports. like bellow:
>>
>> KVM: entry failed, hardware error 0x7
>
> If the RAM is mprotect'd with PROT_READ|PROT_EXEC I'm not sure why this
> would happen; but then the question is why is it trying to do a KVM entry
> after the end of ram_save_complete - that suggests the CPUs are trying to
> run again, and that shouldn't happen in a normal migration.
> If you can trigger this KVM entry on a non-colo migration, then I suggest
> get the backtrace from where the KVM entry failed, because then you
> should be able to see why the CPU was being restarted.
>

Er, sorry, the following report was reproduced with COLO; I will test this with mprotect
in normal migration.

> Dave
>
>> EAX=00000000 EBX=0000e000 ECX=00009578 EDX=0000434f
>> ESI=0000fc10 EDI=0000434f EBP=00000000 ESP=00001fca
>> EIP=00009594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0040 00000400 0000ffff 00009300
>> CS =f000 000f0000 0000ffff 00009b00
>> SS =434f 000434f0 0000ffff 00009300
>> DS =434f 000434f0 0000ffff 00009300
>> FS =0000 00000000 0000ffff 00009300
>> GS =0000 00000000 0000ffff 00009300
>> LDT=0000 00000000 0000ffff 00008200
>> TR =0000 00000000 0000ffff 00008b00
>> GDT=     0002dcc8 00000047
>> IDT=     00000000 0000ffff
>> CR0=00000010 CR2=ffffffff CR3=00000000 CR4=00000000
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000000
>> Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90 <f3> 90 fa fc 66 c3 66 53 66 89 c3 66 e8 9d e8 ff ff 66 01 c3 66 89 d8 66 e8 40 e9 ff ff 66
>> ERROR: invalid runstate transition: 'internal-error' -> 'colo'
>>
>>>     5) Can you trigger this with normal migration or just COLO?
>>
>> Yes, we have reproduced it without COLO, just in a normal migration
>>
>>>        I'm wondering if something is doing something on a running/paused/etc state
>>>        change and isn't expecting the new COLO states.
>>>
>>
>> Actually, we can also reproduce this with COLO: on the master side, we store all RAM into a ram-cache and also duplicate migration_bitmap,
>> and after a checkpoint, before the VM continues to run, we compare the cached memory with the current memory; if a page differs but its
>> dirty bitmap bit is not set, we consider that page to have been missed by the migration bitmap. The result looks like the following
>> (certainly, this also happens only occasionally; sometimes no pages are un-traced.)
>> pc.ram[0xc7000](mis_dirty)
>> pc.ram[0xec000](mis_dirty)
>> pc.ram[0xf4000](mis_dirty)
>> pc.ram[0xfa000](mis_dirty)
>> pc.ram[0xbf93000](mis_dirty)
>> pc.ram[0xbf94000](mis_dirty)
>> pc.ram[0xbf95000](mis_dirty)
>> pc.ram[0xbf96000](mis_dirty)
>> pc.ram[0xbf97000](mis_dirty)
>> pc.ram[0xbf98000](mis_dirty)
>> pc.ram[0xbf99000](mis_dirty)
>> pc.ram[0xbf9a000](mis_dirty)
>> pc.ram[0xbf9b000](mis_dirty)
>> pc.ram[0xbf9c000](mis_dirty)
>> ...
>>
>> Thanks,
>> zhang
>>
>>>> --- a/savevm.c
>>>> +++ b/savevm.c
>>>> @@ -51,6 +51,26 @@
>>>>   #define ARP_PTYPE_IP 0x0800
>>>>   #define ARP_OP_REQUEST_REV 0x3
>>>>
>>>> +#include "qemu/rcu_queue.h"
>>>> +#include <openssl/md5.h>
>>>> +
>>>> +static void check_host_md5(void)
>>>> +{
>>>> +    int i;
>>>> +    unsigned char md[MD5_DIGEST_LENGTH];
>>>> +    MD5_CTX ctx;
>>>> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
>>>> +
>>>> +    MD5_Init(&ctx);
>>>> +    MD5_Update(&ctx, (void *)block->host, block->used_length);
>>>> +    MD5_Final(md, &ctx);
>>>> +    printf("md_host : ");
>>>> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
>>>> +        fprintf(stderr, "%02x", md[i]);
>>>> +    }
>>>> +    fprintf(stderr, "\n");
>>>> +}
>>>> +
>>>>   static int announce_self_create(uint8_t *buf,
>>>>                                   uint8_t *mac_addr)
>>>>   {
>>>> @@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
>>>>           qemu_put_byte(f, QEMU_VM_SECTION_END);
>>>>           qemu_put_be32(f, se->section_id);
>>>>
>>>> +        printf("before saving %s complete\n", se->idstr);
>>>> +        check_host_md5();
>>>> +
>>>>           ret = se->ops->save_live_complete(f, se->opaque);
>>>> +        printf("after saving %s complete\n", se->idstr);
>>>> +        check_host_md5();
>>>> +
>>>>           trace_savevm_section_end(se->idstr, se->section_id, ret);
>>>>           if (ret < 0) {
>>>>               qemu_file_set_error(f, ret);
>>>> @@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>>>               }
>>>>
>>>>               ret = vmstate_load(f, le->se, le->version_id);
>>>> +            if (section_type == QEMU_VM_SECTION_END) {
>>>> +                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
>>>> +                check_host_md5();
>>>> +            }
>>>> +
>>>>               if (ret < 0) {
>>>>                   error_report("error while loading state section id %d(%s)",
>>>>                                section_id, le->se->idstr);
>>>> @@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>>>           g_free(buf);
>>>>       }
>>>>
>>>> +    printf("after loading all vmstate\n");
>>>> +    check_host_md5();
>>>>       cpu_synchronize_all_post_init();
>>>> +    printf("after cpu_synchronize_all_post_init\n");
>>>> +    check_host_md5();
>>>>
>>>>       ret = 0;
>>>>
>>>> --
>>>
>>>>  From ecb789cf7f383b112da3cce33eb9822a94b9497a Mon Sep 17 00:00:00 2001
>>>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>> Date: Tue, 24 Mar 2015 21:53:26 -0400
>>>> Subject: [PATCH] check pc.ram block md5sum between migration Source and
>>>>   Destination
>>>>
>>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>> ---
>>>>   savevm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 42 insertions(+)
>>>>   mode change 100644 => 100755 savevm.c
>>>>
>>>> diff --git a/savevm.c b/savevm.c
>>>> old mode 100644
>>>> new mode 100755
>>>> index 3b0e222..3d431dc
>>>> --- a/savevm.c
>>>> +++ b/savevm.c
>>>> @@ -51,6 +51,26 @@
>>>>   #define ARP_PTYPE_IP 0x0800
>>>>   #define ARP_OP_REQUEST_REV 0x3
>>>>
>>>> +#include "qemu/rcu_queue.h"
>>>> +#include <openssl/md5.h>
>>>> +
>>>> +static void check_host_md5(void)
>>>> +{
>>>> +    int i;
>>>> +    unsigned char md[MD5_DIGEST_LENGTH];
>>>> +    MD5_CTX ctx;
>>>> +    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
>>>> +
>>>> +    MD5_Init(&ctx);
>>>> +    MD5_Update(&ctx, (void *)block->host, block->used_length);
>>>> +    MD5_Final(md, &ctx);
>>>> +    printf("md_host : ");
>>>> +    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
>>>> +        fprintf(stderr, "%02x", md[i]);
>>>> +    }
>>>> +    fprintf(stderr, "\n");
>>>> +}
>>>> +
>>>>   static int announce_self_create(uint8_t *buf,
>>>>                                   uint8_t *mac_addr)
>>>>   {
>>>> @@ -741,7 +761,13 @@ void qemu_savevm_state_complete(QEMUFile *f)
>>>>           qemu_put_byte(f, QEMU_VM_SECTION_END);
>>>>           qemu_put_be32(f, se->section_id);
>>>>
>>>> +        printf("before saving %s complete\n", se->idstr);
>>>> +        check_host_md5();
>>>> +
>>>>           ret = se->ops->save_live_complete(f, se->opaque);
>>>> +        printf("after saving %s complete\n", se->idstr);
>>>> +        check_host_md5();
>>>> +
>>>>           trace_savevm_section_end(se->idstr, se->section_id, ret);
>>>>           if (ret < 0) {
>>>>               qemu_file_set_error(f, ret);
>>>> @@ -1007,6 +1033,13 @@ int qemu_loadvm_state(QEMUFile *f)
>>>>               QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
>>>>
>>>>               ret = vmstate_load(f, le->se, le->version_id);
>>>> +#if 0
>>>> +            if (section_type == QEMU_VM_SECTION_FULL) {
>>>> +                printf("QEMU_VM_SECTION_FULL, after loading %s\n", le->se->idstr);
>>>> +                check_host_md5();
>>>> +            }
>>>> +#endif
>>>> +
>>>>               if (ret < 0) {
>>>>                   error_report("error while loading state for instance 0x%x of"
>>>>                                " device '%s'", instance_id, idstr);
>>>> @@ -1030,6 +1063,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>>>               }
>>>>
>>>>               ret = vmstate_load(f, le->se, le->version_id);
>>>> +            if (section_type == QEMU_VM_SECTION_END) {
>>>> +                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
>>>> +                check_host_md5();
>>>> +            }
>>>> +
>>>>               if (ret < 0) {
>>>>                   error_report("error while loading state section id %d(%s)",
>>>>                                section_id, le->se->idstr);
>>>> @@ -1061,7 +1099,11 @@ int qemu_loadvm_state(QEMUFile *f)
>>>>           g_free(buf);
>>>>       }
>>>>
>>>> +    printf("after loading all vmstate\n");
>>>> +    check_host_md5();
>>>>       cpu_synchronize_all_post_init();
>>>> +    printf("after cpu_synchronize_all_post_init\n");
>>>> +    check_host_md5();
>>>>
>>>>       ret = 0;
>>>>
>>>> --
>>>> 1.7.12.4
>>>>
>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
Paolo Bonzini March 25, 2015, 1:12 p.m. UTC | #8
On 25/03/2015 11:21, Wen Congyang wrote:
> > What kind of load were you having when reproducing this issue?
> > Just to confirm, you have been able to reproduce this without COLO
> > patches, right?
>
> I can reproduce it without COLO patches. 

Thanks.  Please try doing mprotect (using qemu_ram_foreach_block to
mprotect all the areas) after ram_save_complete.

Paolo
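
For illustration, a minimal sketch of what Paolo suggests could look like the
following (untested, and purely an assumption of how one might wire it up: it
uses the qemu_ram_foreach_block() callback signature from the QEMU 2.3-era
tree, which may differ in other versions):

#include <sys/mman.h>   /* in addition to the includes already in savevm.c */

static int mprotect_one_block(void *host_addr, ram_addr_t offset,
                              ram_addr_t length, void *opaque)
{
    /* Write-protect the block but keep it readable/executable, so any
     * late writer takes a SIGSEGV that can be caught under gdb. */
    if (mprotect(host_addr, length, PROT_READ | PROT_EXEC) < 0) {
        perror("mprotect");
    }
    return 0;
}

/* Call this on the source right after ram_save_complete() returns. */
static void protect_all_ram_after_save(void)
{
    qemu_ram_foreach_block(mprotect_one_block, NULL);
}
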
Wen Congyang March 26, 2015, 1:43 a.m. UTC | #9
On 03/25/2015 09:12 PM, Paolo Bonzini wrote:
> 
> 
> On 25/03/2015 11:21, Wen Congyang wrote:
>>> What kind of load were you having when reproducing this issue?
>>> Just to confirm, you have been able to reproduce this without COLO
>>> patches, right?
>>
>> I can reproduce it without COLO patches. 
> 
> Thanks.  Please try doing mprotect (using qemu_ram_foreach_block to
> mprotect all the areas) after ram_save_complete.

Nothing happens, so no memory is touched after ram_save_complete().

Thanks
Wen Congyang

> 
> Paolo
> .
>
Wen Congyang March 26, 2015, 3:12 a.m. UTC | #10
On 03/25/2015 05:50 PM, Juan Quintela wrote:
> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>> Hi all,
>>
>> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
>> when we check it just after finishing migration but before VM continue to Run.
>>
>> We use a patch like bellow to find this issue, you can find it from affix,
>> and Steps to reprduce:
>>
>> (1) Compile QEMU:
>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>
>> (2) Command and output:
>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
> 
> Could you try to reproduce:
> - without vhost
> - without virtio-net
> - cache=unsafe is going to give you trouble, but trouble should only
>   happen after migration of pages have finished.

If I use an IDE disk, it doesn't happen.
Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
that is because I migrate the guest while it is booting; the virtio net
device is not used in this case.

Thanks
Wen Congyang

> 
> What kind of load were you having when reproducing this issue?
> Just to confirm, you have been able to reproduce this without COLO
> patches, right?
> 
>> (qemu) migrate tcp:192.168.3.8:3004
>> before saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>> md_host : after saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>>
>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>> (qemu) QEMU_VM_SECTION_END, after loading ram
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after loading all vmstate
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after cpu_synchronize_all_post_init
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>
>> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
> 
> OK, a couple of things.  Memory don't have to be exactly identical.
> Virtio devices in particular do funny things on "post-load".  There
> aren't warantees for that as far as I know, we should end with an
> equivalent device state in memory.
> 
>> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
>> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
>> it is very difficult for us to trace all the actions of dirtying VM's pages.
> 
> This seems to point to a bug in one of the devices.
> 
>> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
>> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
>> when do checkpoint in COLO FT, and everything will be OK.)
>>
>> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
> 
> Pages transferred should be the same, after device state transmission is
> when things could change.
> 
>> This issue has blocked our COLO development... :(
>>
>> Any help will be greatly appreciated!
> 
> Later, Juan.
>
Juan Quintela March 26, 2015, 10:29 a.m. UTC | #11
Wen Congyang <wency@cn.fujitsu.com> wrote:
> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>>> Hi all,
>>>
>>> We found that, sometimes, the content of VM's memory is
>>> inconsistent between Source side and Destination side
>>> when we check it just after finishing migration but before VM continue to Run.
>>>
>>> We use a patch like bellow to find this issue, you can find it from affix,
>>> and Steps to reprduce:
>>>
>>> (1) Compile QEMU:
>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>
>>> (2) Command and output:
>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>> qemu64,-kvmclock -netdev tap,id=hn0-device
>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>> -device
>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>> -monitor stdio
>> 
>> Could you try to reproduce:
>> - without vhost
>> - without virtio-net
>> - cache=unsafe is going to give you trouble, but trouble should only
>>   happen after migration of pages have finished.
>
> If I use ide disk, it doesn't happen.
> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> it is because I migrate the guest when it is booting. The virtio net
> device is not used in this case.

Kevin, Stefan, Michael, any great idea?

Thanks, Juan.

>
> Thanks
> Wen Congyang
>
>> 
>> What kind of load were you having when reproducing this issue?
>> Just to confirm, you have been able to reproduce this without COLO
>> patches, right?
>> 
>>> (qemu) migrate tcp:192.168.3.8:3004
>>> before saving ram complete
>>> ff703f6889ab8701e4e040872d079a28
>>> md_host : after saving ram complete
>>> ff703f6889ab8701e4e040872d079a28
>>>
>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>> qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device
>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>> -device
>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>> -monitor stdio -incoming tcp:0:3004
>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>> md_host : after loading all vmstate
>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>> md_host : after cpu_synchronize_all_post_init
>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>
>>> This happens occasionally, and it is more easy to reproduce when
>>> issue migration command during VM's startup time.
>> 
>> OK, a couple of things.  Memory don't have to be exactly identical.
>> Virtio devices in particular do funny things on "post-load".  There
>> aren't warantees for that as far as I know, we should end with an
>> equivalent device state in memory.
>> 
>>> We have done further test and found that some pages has been
>>> dirtied but its corresponding migration_bitmap is not set.
>>> We can't figure out which modules of QEMU has missed setting bitmap
>>> when dirty page of VM,
>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>> 
>> This seems to point to a bug in one of the devices.
>> 
>>> Actually, the first time we found this problem was in the COLO FT
>>> development, and it triggered some strange issues in
>>> VM which all pointed to the issue of inconsistent of VM's
>>> memory. (We have try to save all memory of VM to slave side every
>>> time
>>> when do checkpoint in COLO FT, and everything will be OK.)
>>>
>>> Is it OK for some pages that not transferred to destination when do
>>> migration ? Or is it a bug?
>> 
>> Pages transferred should be the same, after device state transmission is
>> when things could change.
>> 
>>> This issue has blocked our COLO development... :(
>>>
>>> Any help will be greatly appreciated!
>> 
>> Later, Juan.
>>
Michael S. Tsirkin March 26, 2015, 11:57 a.m. UTC | #12
On Thu, Mar 26, 2015 at 11:29:43AM +0100, Juan Quintela wrote:
> Wen Congyang <wency@cn.fujitsu.com> wrote:
> > On 03/25/2015 05:50 PM, Juan Quintela wrote:
> >> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
> >>> Hi all,
> >>>
> >>> We found that, sometimes, the content of VM's memory is
> >>> inconsistent between Source side and Destination side
> >>> when we check it just after finishing migration but before VM continue to Run.
> >>>
> >>> We use a patch like bellow to find this issue, you can find it from affix,
> >>> and Steps to reprduce:
> >>>
> >>> (1) Compile QEMU:
> >>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
> >>>
> >>> (2) Command and output:
> >>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
> >>> qemu64,-kvmclock -netdev tap,id=hn0-device
> >>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
> >>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
> >>> -device
> >>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
> >>> -monitor stdio
> >> 
> >> Could you try to reproduce:
> >> - without vhost
> >> - without virtio-net
> >> - cache=unsafe is going to give you trouble, but trouble should only
> >>   happen after migration of pages have finished.
> >
> > If I use ide disk, it doesn't happen.
> > Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> > it is because I migrate the guest when it is booting. The virtio net
> > device is not used in this case.
> 
> Kevin, Stefan, Michael, any great idea?
> 
> Thanks, Juan.

If this is during boot from disk, we can more or less
rule out virtio-net/vhost-net.

> >
> > Thanks
> > Wen Congyang
> >
> >> 
> >> What kind of load were you having when reproducing this issue?
> >> Just to confirm, you have been able to reproduce this without COLO
> >> patches, right?
> >> 
> >>> (qemu) migrate tcp:192.168.3.8:3004
> >>> before saving ram complete
> >>> ff703f6889ab8701e4e040872d079a28
> >>> md_host : after saving ram complete
> >>> ff703f6889ab8701e4e040872d079a28
> >>>
> >>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
> >>> qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device
> >>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
> >>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
> >>> -device
> >>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
> >>> -monitor stdio -incoming tcp:0:3004
> >>> (qemu) QEMU_VM_SECTION_END, after loading ram
> >>> 230e1e68ece9cd4e769630e1bcb5ddfb
> >>> md_host : after loading all vmstate
> >>> 230e1e68ece9cd4e769630e1bcb5ddfb
> >>> md_host : after cpu_synchronize_all_post_init
> >>> 230e1e68ece9cd4e769630e1bcb5ddfb
> >>>
> >>> This happens occasionally, and it is more easy to reproduce when
> >>> issue migration command during VM's startup time.
> >> 
> >> OK, a couple of things.  Memory don't have to be exactly identical.
> >> Virtio devices in particular do funny things on "post-load".  There
> >> aren't warantees for that as far as I know, we should end with an
> >> equivalent device state in memory.
> >> 
> >>> We have done further test and found that some pages has been
> >>> dirtied but its corresponding migration_bitmap is not set.
> >>> We can't figure out which modules of QEMU has missed setting bitmap
> >>> when dirty page of VM,
> >>> it is very difficult for us to trace all the actions of dirtying VM's pages.
> >> 
> >> This seems to point to a bug in one of the devices.
> >> 
> >>> Actually, the first time we found this problem was in the COLO FT
> >>> development, and it triggered some strange issues in
> >>> VM which all pointed to the issue of inconsistent of VM's
> >>> memory. (We have try to save all memory of VM to slave side every
> >>> time
> >>> when do checkpoint in COLO FT, and everything will be OK.)
> >>>
> >>> Is it OK for some pages that not transferred to destination when do
> >>> migration ? Or is it a bug?
> >> 
> >> Pages transferred should be the same, after device state transmission is
> >> when things could change.
> >> 
> >>> This issue has blocked our COLO development... :(
> >>>
> >>> Any help will be greatly appreciated!
> >> 
> >> Later, Juan.
> >>
Stefan Hajnoczi March 27, 2015, 8:56 a.m. UTC | #13
On Thu, Mar 26, 2015 at 11:29:43AM +0100, Juan Quintela wrote:
> Wen Congyang <wency@cn.fujitsu.com> wrote:
> > On 03/25/2015 05:50 PM, Juan Quintela wrote:
> >> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
> >>> Hi all,
> >>>
> >>> We found that, sometimes, the content of VM's memory is
> >>> inconsistent between Source side and Destination side
> >>> when we check it just after finishing migration but before VM continue to Run.
> >>>
> >>> We use a patch like bellow to find this issue, you can find it from affix,
> >>> and Steps to reprduce:
> >>>
> >>> (1) Compile QEMU:
> >>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
> >>>
> >>> (2) Command and output:
> >>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
> >>> qemu64,-kvmclock -netdev tap,id=hn0-device
> >>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
> >>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
> >>> -device
> >>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
> >>> -monitor stdio
> >> 
> >> Could you try to reproduce:
> >> - without vhost
> >> - without virtio-net
> >> - cache=unsafe is going to give you trouble, but trouble should only
> >>   happen after migration of pages have finished.
> >
> > If I use ide disk, it doesn't happen.
> > Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> > it is because I migrate the guest when it is booting. The virtio net
> > device is not used in this case.
> 
> Kevin, Stefan, Michael, any great idea?

You must use -drive cache=none if you want to use live migration.  It
should not directly affect memory during migration though.

> >>> We have done further test and found that some pages has been
> >>> dirtied but its corresponding migration_bitmap is not set.
> >>> We can't figure out which modules of QEMU has missed setting bitmap
> >>> when dirty page of VM,
> >>> it is very difficult for us to trace all the actions of dirtying VM's pages.
> >> 
> >> This seems to point to a bug in one of the devices.

I think you'll need to track down which pages are different.  If you are
lucky, their contents will reveal what the page is used for.

Stefan
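
As a way of acting on that suggestion, a small standalone helper (hypothetical,
not part of QEMU) could compare two raw RAM dumps page by page and print the
offsets that differ, so the differing pages can then be inspected by hand:

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

int main(int argc, char **argv)
{
    unsigned char pa[PAGE_SIZE], pb[PAGE_SIZE];
    unsigned long off = 0;
    FILE *a, *b;

    if (argc != 3) {
        fprintf(stderr, "usage: %s src.ram dst.ram\n", argv[0]);
        return 1;
    }
    a = fopen(argv[1], "rb");
    b = fopen(argv[2], "rb");
    if (!a || !b) {
        perror("fopen");
        return 1;
    }
    /* Walk both dumps in page-sized steps and report mismatching pages. */
    while (fread(pa, 1, PAGE_SIZE, a) == PAGE_SIZE &&
           fread(pb, 1, PAGE_SIZE, b) == PAGE_SIZE) {
        if (memcmp(pa, pb, PAGE_SIZE) != 0) {
            printf("page at offset 0x%lx differs\n", off);
        }
        off += PAGE_SIZE;
    }
    fclose(a);
    fclose(b);
    return 0;
}
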
Wen Congyang March 27, 2015, 9:14 a.m. UTC | #14
On 03/27/2015 04:56 PM, Stefan Hajnoczi wrote:
> On Thu, Mar 26, 2015 at 11:29:43AM +0100, Juan Quintela wrote:
>> Wen Congyang <wency@cn.fujitsu.com> wrote:
>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> We found that, sometimes, the content of VM's memory is
>>>>> inconsistent between Source side and Destination side
>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>
>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>> and Steps to reprduce:
>>>>>
>>>>> (1) Compile QEMU:
>>>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>
>>>>> (2) Command and output:
>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>> qemu64,-kvmclock -netdev tap,id=hn0-device
>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>> -device
>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>> -monitor stdio
>>>>
>>>> Could you try to reproduce:
>>>> - without vhost
>>>> - without virtio-net
>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>   happen after migration of pages have finished.
>>>
>>> If I use ide disk, it doesn't happen.
>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>> it is because I migrate the guest when it is booting. The virtio net
>>> device is not used in this case.
>>
>> Kevin, Stefan, Michael, any great idea?
> 
> You must use -drive cache=none if you want to use live migration.  It
> should not directly affect memory during migration though.

Otherwise, what will happen? If the user doesn't use cache=none, and
tries to use live migration, qemu doesn't output any message or trigger
an event to notify the user.

Thanks
Wen Congyang

> 
>>>>> We have done further test and found that some pages has been
>>>>> dirtied but its corresponding migration_bitmap is not set.
>>>>> We can't figure out which modules of QEMU has missed setting bitmap
>>>>> when dirty page of VM,
>>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>>>
>>>> This seems to point to a bug in one of the devices.
> 
> I think you'll need to track down which pages are different.  If you are
> lucky, their contents will reveal what the page is used for.
> 
> Stefan
>
Stefan Hajnoczi March 27, 2015, 9:57 a.m. UTC | #15
On Fri, Mar 27, 2015 at 9:14 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
> On 03/27/2015 04:56 PM, Stefan Hajnoczi wrote:
>> On Thu, Mar 26, 2015 at 11:29:43AM +0100, Juan Quintela wrote:
>>> Wen Congyang <wency@cn.fujitsu.com> wrote:
>>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>>> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> We found that, sometimes, the content of VM's memory is
>>>>>> inconsistent between Source side and Destination side
>>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>>
>>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>>> and Steps to reprduce:
>>>>>>
>>>>>> (1) Compile QEMU:
>>>>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>>
>>>>>> (2) Command and output:
>>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>>> qemu64,-kvmclock -netdev tap,id=hn0-device
>>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>>> -device
>>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>>> -monitor stdio
>>>>>
>>>>> Could you try to reproduce:
>>>>> - without vhost
>>>>> - without virtio-net
>>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>>   happen after migration of pages have finished.
>>>>
>>>> If I use ide disk, it doesn't happen.
>>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>>> it is because I migrate the guest when it is booting. The virtio net
>>>> device is not used in this case.
>>>
>>> Kevin, Stefan, Michael, any great idea?
>>
>> You must use -drive cache=none if you want to use live migration.  It
>> should not directly affect memory during migration though.
>
> Otherwise, what will happen? If the user doesn't use cache=none, and
> tries to use live migration, qemu doesn't output any message or trigger
> an event to notify the user.

There is a risk that the destination host sees an inconsistent view of
the image file because the source was still accessing it towards the
end of migration.

Stefan
Wen Congyang March 27, 2015, 10:05 a.m. UTC | #16
On 03/27/2015 05:57 PM, Stefan Hajnoczi wrote:
> On Fri, Mar 27, 2015 at 9:14 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
>> On 03/27/2015 04:56 PM, Stefan Hajnoczi wrote:
>>> On Thu, Mar 26, 2015 at 11:29:43AM +0100, Juan Quintela wrote:
>>>> Wen Congyang <wency@cn.fujitsu.com> wrote:
>>>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>>>> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> We found that, sometimes, the content of VM's memory is
>>>>>>> inconsistent between Source side and Destination side
>>>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>>>
>>>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>>>> and Steps to reprduce:
>>>>>>>
>>>>>>> (1) Compile QEMU:
>>>>>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>>>
>>>>>>> (2) Command and output:
>>>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>>>> qemu64,-kvmclock -netdev tap,id=hn0-device
>>>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>>>> -device
>>>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>>>> -monitor stdio
>>>>>>
>>>>>> Could you try to reproduce:
>>>>>> - without vhost
>>>>>> - without virtio-net
>>>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>>>   happen after migration of pages have finished.
>>>>>
>>>>> If I use ide disk, it doesn't happen.
>>>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>>>> it is because I migrate the guest when it is booting. The virtio net
>>>>> device is not used in this case.
>>>>
>>>> Kevin, Stefan, Michael, any great idea?
>>>
>>> You must use -drive cache=none if you want to use live migration.  It
>>> should not directly affect memory during migration though.
>>
>> Otherwise, what will happen? If the user doesn't use cache=none, and
>> tries to use live migration, qemu doesn't output any message or trigger
>> an event to notify the user.
> 
> There is a risk that the destination host sees an inconsistent view of
> the image file because the source was still accessing it towards the
> end of migration.

Does the flag BDRV_O_NOFLUSH cause it?

Thanks
Wen Congyang

> 
> Stefan
> .
>
Stefan Hajnoczi March 27, 2015, 10:11 a.m. UTC | #17
On Fri, Mar 27, 2015 at 10:05 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
> On 03/27/2015 05:57 PM, Stefan Hajnoczi wrote:
>> On Fri, Mar 27, 2015 at 9:14 AM, Wen Congyang <wency@cn.fujitsu.com> wrote:
>>> On 03/27/2015 04:56 PM, Stefan Hajnoczi wrote:
>>>> On Thu, Mar 26, 2015 at 11:29:43AM +0100, Juan Quintela wrote:
>>>>> Wen Congyang <wency@cn.fujitsu.com> wrote:
>>>>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>>>>> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We found that, sometimes, the content of VM's memory is
>>>>>>>> inconsistent between Source side and Destination side
>>>>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>>>>
>>>>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>>>>> and Steps to reprduce:
>>>>>>>>
>>>>>>>> (1) Compile QEMU:
>>>>>>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>>>>
>>>>>>>> (2) Command and output:
>>>>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>>>>> qemu64,-kvmclock -netdev tap,id=hn0-device
>>>>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>>>>> -device
>>>>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>>>>> -monitor stdio
>>>>>>>
>>>>>>> Could you try to reproduce:
>>>>>>> - without vhost
>>>>>>> - without virtio-net
>>>>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>>>>   happen after migration of pages have finished.
>>>>>>
>>>>>> If I use ide disk, it doesn't happen.
>>>>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>>>>> it is because I migrate the guest when it is booting. The virtio net
>>>>>> device is not used in this case.
>>>>>
>>>>> Kevin, Stefan, Michael, any great idea?
>>>>
>>>> You must use -drive cache=none if you want to use live migration.  It
>>>> should not directly affect memory during migration though.
>>>
>>> Otherwise, what will happen? If the user doesn't use cache=none, and
>>> tries to use live migration, qemu doesn't output any message or trigger
>>> an event to notify the user.
>>
>> There is a risk that the destination host sees an inconsistent view of
>> the image file because the source was still accessing it towards the
>> end of migration.
>
> Does the flag BDRV_O_NOFLUSH cause it?

Partly, but the problem is worse than that.  BDRV_O_NOFLUSH means that
the host never issues fdatasync(2) and you have no guarantee that the
data reaches the physical disk safely.

The migration problem is related to the host kernel page cache.
cache=none bypasses the page cache with O_DIRECT so the destination
host doesn't see outdated cached data.

Stefan
Zhanghailiang March 27, 2015, 10:13 a.m. UTC | #18
On 2015/3/26 11:52, Li Zhijian wrote:
> On 03/26/2015 11:12 AM, Wen Congyang wrote:
>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>> zhanghailiang<zhang.zhanghailiang@huawei.com>  wrote:
>>>> Hi all,
>>>>
>>>> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>
>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>> and Steps to reprduce:
>>>>
>>>> (1) Compile QEMU:
>>>>   ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>
>>>> (2) Command and output:
>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>>> Could you try to reproduce:
>>> - without vhost
>>> - without virtio-net
>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>    happen after migration of pages have finished.
>> If I use ide disk, it doesn't happen.
>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>> it is because I migrate the guest when it is booting. The virtio net
>> device is not used in this case.
> Er~~
> it reproduces in my ide disk
> there is no any virtio device, my command line like below
>
> x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
> -boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
> usb=off -no-user-config -nodefaults -monitor stdio -vga std
>
> it seems easily to reproduce this issue by following steps in _ubuntu_ guest
> 1.  in source side, choose memtest in grub
> 2. do live migration
> 3. exit memtest(type Esc in when memory testing)
> 4. wait migration complete
>

Yes, it is a thorny problem. It is indeed easy to reproduce, just by following
your steps above.

This is my test result (I also tested with accel=tcg; it can be reproduced there as well):
Source side:
# x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
(qemu) ACPI_BUILD: init ACPI tables
ACPI_BUILD: init ACPI tables
migrate tcp:9.61.1.8:3004
ACPI_BUILD: init ACPI tables
before cpu_synchronize_all_states
5a8f72d66732cac80d6a0d5713654c0e
md_host : before saving ram complete
5a8f72d66732cac80d6a0d5713654c0e
md_host : after saving ram complete
5a8f72d66732cac80d6a0d5713654c0e
(qemu)

Destination side:
# x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio -incoming tcp:0:3004
(qemu) QEMU_VM_SECTION_END, after loading ram
d7cb0d8a4bdd1557fb0e78baee50c986
md_host : after loading all vmstate
d7cb0d8a4bdd1557fb0e78baee50c986
md_host : after cpu_synchronize_all_post_init
d7cb0d8a4bdd1557fb0e78baee50c986


Thanks,
zhang

>>
>>> What kind of load were you having when reproducing this issue?
>>> Just to confirm, you have been able to reproduce this without COLO
>>> patches, right?
>>>
>>>> (qemu) migrate tcp:192.168.3.8:3004
>>>> before saving ram complete
>>>> ff703f6889ab8701e4e040872d079a28
>>>> md_host : after saving ram complete
>>>> ff703f6889ab8701e4e040872d079a28
>>>>
>>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>> md_host : after loading all vmstate
>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>> md_host : after cpu_synchronize_all_post_init
>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>
>>>> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
>>> OK, a couple of things.  Memory don't have to be exactly identical.
>>> Virtio devices in particular do funny things on "post-load".  There
>>> aren't warantees for that as far as I know, we should end with an
>>> equivalent device state in memory.
>>>
>>>> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
>>>> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>> This seems to point to a bug in one of the devices.
>>>
>>>> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
>>>> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
>>>> when do checkpoint in COLO FT, and everything will be OK.)
>>>>
>>>> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
>>> Pages transferred should be the same, after device state transmission is
>>> when things could change.
>>>
>>>> This issue has blocked our COLO development... :(
>>>>
>>>> Any help will be greatly appreciated!
>>> Later, Juan.
>>>
>> .
>>
>
>
Dr. David Alan Gilbert March 27, 2015, 10:18 a.m. UTC | #19
* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/3/26 11:52, Li Zhijian wrote:
> >On 03/26/2015 11:12 AM, Wen Congyang wrote:
> >>On 03/25/2015 05:50 PM, Juan Quintela wrote:
> >>>zhanghailiang<zhang.zhanghailiang@huawei.com>  wrote:
> >>>>Hi all,
> >>>>
> >>>>We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
> >>>>when we check it just after finishing migration but before VM continue to Run.
> >>>>
> >>>>We use a patch like bellow to find this issue, you can find it from affix,
> >>>>and Steps to reprduce:
> >>>>
> >>>>(1) Compile QEMU:
> >>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
> >>>>
> >>>>(2) Command and output:
> >>>>SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
> >>>Could you try to reproduce:
> >>>- without vhost
> >>>- without virtio-net
> >>>- cache=unsafe is going to give you trouble, but trouble should only
> >>>   happen after migration of pages have finished.
> >>If I use ide disk, it doesn't happen.
> >>Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> >>it is because I migrate the guest when it is booting. The virtio net
> >>device is not used in this case.
> >Er~~
> >it reproduces in my ide disk
> >there is no any virtio device, my command line like below
> >
> >x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
> >-boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
> >usb=off -no-user-config -nodefaults -monitor stdio -vga std
> >
> >it seems easily to reproduce this issue by following steps in _ubuntu_ guest
> >1.  in source side, choose memtest in grub
> >2. do live migration
> >3. exit memtest(type Esc in when memory testing)
> >4. wait migration complete
> >
> 
> Yes, it is a thorny problem. It is indeed easy to reproduce, just as
> your steps in the above.
> 
> This is my test result: (I also test accel=tcg, it can be reproduced also.)
> Source side:
> # x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
> (qemu) ACPI_BUILD: init ACPI tables
> ACPI_BUILD: init ACPI tables
> migrate tcp:9.61.1.8:3004
> ACPI_BUILD: init ACPI tables
> before cpu_synchronize_all_states
> 5a8f72d66732cac80d6a0d5713654c0e
> md_host : before saving ram complete
> 5a8f72d66732cac80d6a0d5713654c0e
> md_host : after saving ram complete
> 5a8f72d66732cac80d6a0d5713654c0e
> (qemu)
>
> Destination side:
> # x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio -incoming tcp:0:3004
> (qemu) QEMU_VM_SECTION_END, after loading ram
> d7cb0d8a4bdd1557fb0e78baee50c986
> md_host : after loading all vmstate
> d7cb0d8a4bdd1557fb0e78baee50c986
> md_host : after cpu_synchronize_all_post_init
> d7cb0d8a4bdd1557fb0e78baee50c986

Hmm, that's not good.  I suggest you md5 each of the RAMBlocks individually,
to see if it's main RAM that's different or something more subtle like
video RAM.

But then maybe it's easier just to dump the whole of RAM to a file
and byte-compare it (hexdump the two dumps and diff?)

Dave

> 
> 
> Thanks,
> zhang
> 
> >>
> >>>What kind of load were you having when reproducing this issue?
> >>>Just to confirm, you have been able to reproduce this without COLO
> >>>patches, right?
> >>>
> >>>>(qemu) migrate tcp:192.168.3.8:3004
> >>>>before saving ram complete
> >>>>ff703f6889ab8701e4e040872d079a28
> >>>>md_host : after saving ram complete
> >>>>ff703f6889ab8701e4e040872d079a28
> >>>>
> >>>>DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
> >>>>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>md_host : after loading all vmstate
> >>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>md_host : after cpu_synchronize_all_post_init
> >>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>
> >>>>This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
> >>>OK, a couple of things.  Memory don't have to be exactly identical.
> >>>Virtio devices in particular do funny things on "post-load".  There
> >>>aren't warantees for that as far as I know, we should end with an
> >>>equivalent device state in memory.
> >>>
> >>>>We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
> >>>>We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
> >>>>it is very difficult for us to trace all the actions of dirtying VM's pages.
> >>>This seems to point to a bug in one of the devices.
> >>>
> >>>>Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
> >>>>VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
> >>>>when do checkpoint in COLO FT, and everything will be OK.)
> >>>>
> >>>>Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
> >>>Pages transferred should be the same, after device state transmission is
> >>>when things could change.
> >>>
> >>>>This issue has blocked our COLO development... :(
> >>>>
> >>>>Any help will be greatly appreciated!
> >>>Later, Juan.
> >>>
> >>.
> >>
> >
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
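
A per-RAMBlock variant of the check_host_md5() debug function posted earlier in
this thread might look roughly like the sketch below (same includes as that
patch, i.e. "qemu/rcu_queue.h" and <openssl/md5.h>; the RAMBlock fields idstr,
host and used_length are assumed from the QEMU 2.3-era layout):

static void check_host_md5_per_block(void)
{
    RAMBlock *block;

    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
        unsigned char md[MD5_DIGEST_LENGTH];
        MD5_CTX ctx;
        int i;

        MD5_Init(&ctx);
        MD5_Update(&ctx, (void *)block->host, block->used_length);
        MD5_Final(md, &ctx);
        /* Print the prefix and the digest to the same stream so they
         * stay together in the log. */
        fprintf(stderr, "md_host %s : ", block->idstr);
        for (i = 0; i < MD5_DIGEST_LENGTH; i++) {
            fprintf(stderr, "%02x", md[i]);
        }
        fprintf(stderr, "\n");
    }
}
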
Juan Quintela March 27, 2015, 10:34 a.m. UTC | #20
Wen Congyang <wency@cn.fujitsu.com> wrote:
> On 03/27/2015 04:56 PM, Stefan Hajnoczi wrote:
>> On Thu, Mar 26, 2015 at 11:29:43AM +0100, Juan Quintela wrote:
>>> Wen Congyang <wency@cn.fujitsu.com> wrote:
>>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>>> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> We found that, sometimes, the content of VM's memory is
>>>>>> inconsistent between Source side and Destination side
>>>>>> when we check it just after finishing migration but before VM
>>>>>> continue to Run.
>>>>>>
>>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>>> and Steps to reprduce:
>>>>>>
>>>>>> (1) Compile QEMU:
>>>>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>>
>>>>>> (2) Command and output:
>>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>>> qemu64,-kvmclock -netdev tap,id=hn0-device
>>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>>> -device
>>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>>> -monitor stdio
>>>>>
>>>>> Could you try to reproduce:
>>>>> - without vhost
>>>>> - without virtio-net
>>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>>   happen after migration of pages have finished.
>>>>
>>>> If I use ide disk, it doesn't happen.
>>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>>> it is because I migrate the guest when it is booting. The virtio net
>>>> device is not used in this case.
>>>
>>> Kevin, Stefan, Michael, any great idea?
>> 
>> You must use -drive cache=none if you want to use live migration.  It
>> should not directly affect memory during migration though.
>
> Otherwise, what will happen? If the user doesn't use cache=none, and
> tries to use live migration, qemu doesn't output any message or trigger
> an event to notify the user.

The problem here is what your shared storage is.  Some clustered filesystems
get this right and can run without cache=none.  But neither NFS, iSCSI
nor FC (FC to my understanding, not sure) can, and those are the more
commonly used ones.  So QEMU doesn't know what FS/storage type the user has, and it
cannot raise a real error.

Later, Juan.


>
> Thanks
> Wen Congyang
>
>> 
>>>>>> We have done further test and found that some pages has been
>>>>>> dirtied but its corresponding migration_bitmap is not set.
>>>>>> We can't figure out which modules of QEMU has missed setting bitmap
>>>>>> when dirty page of VM,
>>>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>>>>
>>>>> This seems to point to a bug in one of the devices.
>> 
>> I think you'll need to track down which pages are different.  If you are
>> lucky, their contents will reveal what the page is used for.
>> 
>> Stefan
>>
Juan Quintela March 27, 2015, 10:36 a.m. UTC | #21
Wen Congyang <wency@cn.fujitsu.com> wrote:
> On 03/27/2015 05:57 PM, Stefan Hajnoczi wrote:
>>>> You must use -drive cache=none if you want to use live migration.  It
>>>> should not directly affect memory during migration though.
>>>
>>> Otherwise, what will happen? If the user doesn't use cache=none, and
>>> tries to use live migration, qemu doesn't output any message or trigger
>>> an event to notify the user.
>> 
>> There is a risk that the destination host sees an inconsistent view of
>> the image file because the source was still accessing it towards the
>> end of migration.
>
> Does the flag BDRV_O_NOFLUSH cause it?

The biggest issue is reads.

Target: reads 1st block of disk
        it enters page cache of target
Source: reads and writes lots of blocks

Migration happens.

Target: rereads 1st block of disk, and gets stale contents.

No amount of fdatasync(), sync(), or whatever on the source will fix this problem.


Later, Juan.
Juan Quintela March 27, 2015, 10:51 a.m. UTC | #22
zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
> On 2015/3/26 11:52, Li Zhijian wrote:
>> On 03/26/2015 11:12 AM, Wen Congyang wrote:
>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>> zhanghailiang<zhang.zhanghailiang@huawei.com>  wrote:
>>>>> Hi all,
>>>>>
>>>>> We found that, sometimes, the content of VM's memory is
>>>>> inconsistent between Source side and Destination side
>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>
>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>> and Steps to reprduce:
>>>>>
>>>>> (1) Compile QEMU:
>>>>>   ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>
>>>>> (2) Command and output:
>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>> qemu64,-kvmclock -netdev tap,id=hn0-device
>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>> -device
>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>> -monitor stdio
>>>> Could you try to reproduce:
>>>> - without vhost
>>>> - without virtio-net
>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>    happen after migration of pages have finished.
>>> If I use ide disk, it doesn't happen.
>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>> it is because I migrate the guest when it is booting. The virtio net
>>> device is not used in this case.
>> Er~~
>> it reproduces in my ide disk
>> there is no any virtio device, my command line like below
>>
>> x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
>> -boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
>> usb=off -no-user-config -nodefaults -monitor stdio -vga std
>>
>> it seems easily to reproduce this issue by following steps in _ubuntu_ guest
>> 1.  in source side, choose memtest in grub
>> 2. do live migration
>> 3. exit memtest(type Esc in when memory testing)
>> 4. wait migration complete
>>
>
> Yes,it is a thorny problem. It is indeed easy to reproduce, just as
> your steps in the above.

Thanks for the test case.  I will give it a try on Monday.  Now that
we have a test case, it should be possible to instrument things.  As the
problem shows up in memtest, it can't be the disk, clearly :p

Later, Juan.


>
> This is my test result: (I also test accel=tcg, it can be reproduced also.)
> Source side:
> # x86_64-softmmu/qemu-system-x86_64 -machine
> pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults -cpu
> qemu64,-kvmclock -boot c -drive
> file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device
> cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
> (qemu) ACPI_BUILD: init ACPI tables
> ACPI_BUILD: init ACPI tables
> migrate tcp:9.61.1.8:3004
> ACPI_BUILD: init ACPI tables
> before cpu_synchronize_all_states
> 5a8f72d66732cac80d6a0d5713654c0e
> md_host : before saving ram complete
> 5a8f72d66732cac80d6a0d5713654c0e
> md_host : after saving ram complete
> 5a8f72d66732cac80d6a0d5713654c0e
> (qemu)
>
> Destination side:
> # x86_64-softmmu/qemu-system-x86_64 -machine
> pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults -cpu
> qemu64,-kvmclock -boot c -drive
> file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device
> cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
> -incoming tcp:0:3004
> (qemu) QEMU_VM_SECTION_END, after loading ram
> d7cb0d8a4bdd1557fb0e78baee50c986
> md_host : after loading all vmstate
> d7cb0d8a4bdd1557fb0e78baee50c986
> md_host : after cpu_synchronize_all_post_init
> d7cb0d8a4bdd1557fb0e78baee50c986
>
>
> Thanks,
> zhang
>
>>>
>>>> What kind of load were you having when reproducing this issue?
>>>> Just to confirm, you have been able to reproduce this without COLO
>>>> patches, right?
>>>>
>>>>> (qemu) migrate tcp:192.168.3.8:3004
>>>>> before saving ram complete
>>>>> ff703f6889ab8701e4e040872d079a28
>>>>> md_host : after saving ram complete
>>>>> ff703f6889ab8701e4e040872d079a28
>>>>>
>>>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>> qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device
>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>> -device
>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>> -monitor stdio -incoming tcp:0:3004
>>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>> md_host : after loading all vmstate
>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>> md_host : after cpu_synchronize_all_post_init
>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>
>>>>> This happens occasionally, and it is more easy to reproduce when
>>>>> issue migration command during VM's startup time.
>>>> OK, a couple of things.  Memory don't have to be exactly identical.
>>>> Virtio devices in particular do funny things on "post-load".  There
>>>> aren't warantees for that as far as I know, we should end with an
>>>> equivalent device state in memory.
>>>>
>>>>> We have done further test and found that some pages has been
>>>>> dirtied but its corresponding migration_bitmap is not set.
>>>>> We can't figure out which modules of QEMU has missed setting
>>>>> bitmap when dirty page of VM,
>>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>>> This seems to point to a bug in one of the devices.
>>>>
>>>>> Actually, the first time we found this problem was in the COLO FT
>>>>> development, and it triggered some strange issues in
>>>>> VM which all pointed to the issue of inconsistent of VM's
>>>>> memory. (We have try to save all memory of VM to slave side every
>>>>> time
>>>>> when do checkpoint in COLO FT, and everything will be OK.)
>>>>>
>>>>> Is it OK for some pages that not transferred to destination when
>>>>> do migration ? Or is it a bug?
>>>> Pages transferred should be the same, after device state transmission is
>>>> when things could change.
>>>>
>>>>> This issue has blocked our COLO development... :(
>>>>>
>>>>> Any help will be greatly appreciated!
>>>> Later, Juan.
>>>>
>>> .
>>>
>>
>>
Zhanghailiang March 28, 2015, 1:08 a.m. UTC | #23
On 2015/3/27 18:51, Juan Quintela wrote:
> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>> On 2015/3/26 11:52, Li Zhijian wrote:
>>> On 03/26/2015 11:12 AM, Wen Congyang wrote:
>>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>>> zhanghailiang<zhang.zhanghailiang@huawei.com>  wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> We found that, sometimes, the content of VM's memory is
>>>>>> inconsistent between Source side and Destination side
>>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>>
>>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>>> and Steps to reprduce:
>>>>>>
>>>>>> (1) Compile QEMU:
>>>>>>    ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>>
>>>>>> (2) Command and output:
>>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>>> qemu64,-kvmclock -netdev tap,id=hn0-device
>>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>>> -device
>>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>>> -monitor stdio
>>>>> Could you try to reproduce:
>>>>> - without vhost
>>>>> - without virtio-net
>>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>>     happen after migration of pages have finished.
>>>> If I use ide disk, it doesn't happen.
>>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>>> it is because I migrate the guest when it is booting. The virtio net
>>>> device is not used in this case.
>>> Er~~
>>> it reproduces in my ide disk
>>> there is no any virtio device, my command line like below
>>>
>>> x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
>>> -boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
>>> usb=off -no-user-config -nodefaults -monitor stdio -vga std
>>>
>>> it seems easily to reproduce this issue by following steps in _ubuntu_ guest
>>> 1.  in source side, choose memtest in grub
>>> 2. do live migration
>>> 3. exit memtest(type Esc in when memory testing)
>>> 4. wait migration complete
>>>
>>
>> Yes, it is a thorny problem. It is indeed easy to reproduce, just as
>> your steps in the above.
>
> Thanks for the test case.  I will try to give a try on Monday.  Now that
> we have a test case, it should be able to instrument things.  As the
> problem is on memtest, it can't be the disk, clearly :p

OK, thanks.

>
>>
>> This is my test result: (I also test accel=tcg, it can be reproduced also.)
>> Source side:
>> # x86_64-softmmu/qemu-system-x86_64 -machine
>> pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults -cpu
>> qemu64,-kvmclock -boot c -drive
>> file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device
>> cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
>> (qemu) ACPI_BUILD: init ACPI tables
>> ACPI_BUILD: init ACPI tables
>> migrate tcp:9.61.1.8:3004
>> ACPI_BUILD: init ACPI tables
>> before cpu_synchronize_all_states
>> 5a8f72d66732cac80d6a0d5713654c0e
>> md_host : before saving ram complete
>> 5a8f72d66732cac80d6a0d5713654c0e
>> md_host : after saving ram complete
>> 5a8f72d66732cac80d6a0d5713654c0e
>> (qemu)
>>
>> Destination side:
>> # x86_64-softmmu/qemu-system-x86_64 -machine
>> pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults -cpu
>> qemu64,-kvmclock -boot c -drive
>> file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device
>> cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
>> -incoming tcp:0:3004
>> (qemu) QEMU_VM_SECTION_END, after loading ram
>> d7cb0d8a4bdd1557fb0e78baee50c986
>> md_host : after loading all vmstate
>> d7cb0d8a4bdd1557fb0e78baee50c986
>> md_host : after cpu_synchronize_all_post_init
>> d7cb0d8a4bdd1557fb0e78baee50c986
>>
>>
>> Thanks,
>> zhang
>>
>>>>
>>>>> What kind of load were you having when reproducing this issue?
>>>>> Just to confirm, you have been able to reproduce this without COLO
>>>>> patches, right?
>>>>>
>>>>>> (qemu) migrate tcp:192.168.3.8:3004
>>>>>> before saving ram complete
>>>>>> ff703f6889ab8701e4e040872d079a28
>>>>>> md_host : after saving ram complete
>>>>>> ff703f6889ab8701e4e040872d079a28
>>>>>>
>>>>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>>> qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device
>>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>>> -device
>>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>>> -monitor stdio -incoming tcp:0:3004
>>>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>> md_host : after loading all vmstate
>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>> md_host : after cpu_synchronize_all_post_init
>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>>
>>>>>> This happens occasionally, and it is more easy to reproduce when
>>>>>> issue migration command during VM's startup time.
>>>>> OK, a couple of things.  Memory don't have to be exactly identical.
>>>>> Virtio devices in particular do funny things on "post-load".  There
>>>>> aren't warantees for that as far as I know, we should end with an
>>>>> equivalent device state in memory.
>>>>>
>>>>>> We have done further test and found that some pages has been
>>>>>> dirtied but its corresponding migration_bitmap is not set.
>>>>>> We can't figure out which modules of QEMU has missed setting
>>>>>> bitmap when dirty page of VM,
>>>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>>>> This seems to point to a bug in one of the devices.
>>>>>
>>>>>> Actually, the first time we found this problem was in the COLO FT
>>>>>> development, and it triggered some strange issues in
>>>>>> VM which all pointed to the issue of inconsistent of VM's
>>>>>> memory. (We have try to save all memory of VM to slave side every
>>>>>> time
>>>>>> when do checkpoint in COLO FT, and everything will be OK.)
>>>>>>
>>>>>> Is it OK for some pages that not transferred to destination when
>>>>>> do migration ? Or is it a bug?
>>>>> Pages transferred should be the same, after device state transmission is
>>>>> when things could change.
>>>>>
>>>>>> This issue has blocked our COLO development... :(
>>>>>>
>>>>>> Any help will be greatly appreciated!
>>>>> Later, Juan.
>>>>>
>>>> .
>>>>
>>>
>>>
>
> .
>
Zhanghailiang March 28, 2015, 9:54 a.m. UTC | #24
On 2015/3/27 18:18, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/3/26 11:52, Li Zhijian wrote:
>>> On 03/26/2015 11:12 AM, Wen Congyang wrote:
>>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>>> zhanghailiang<zhang.zhanghailiang@huawei.com>  wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
>>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>>
>>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>>> and Steps to reprduce:
>>>>>>
>>>>>> (1) Compile QEMU:
>>>>>>   ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>>
>>>>>> (2) Command and output:
>>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>>>>> Could you try to reproduce:
>>>>> - without vhost
>>>>> - without virtio-net
>>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>>    happen after migration of pages have finished.
>>>> If I use ide disk, it doesn't happen.
>>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>>> it is because I migrate the guest when it is booting. The virtio net
>>>> device is not used in this case.
>>> Er~~
>>> it reproduces in my ide disk
>>> there is no any virtio device, my command line like below
>>>
>>> x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
>>> -boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
>>> usb=off -no-user-config -nodefaults -monitor stdio -vga std
>>>
>>> it seems easily to reproduce this issue by following steps in _ubuntu_ guest
>>> 1.  in source side, choose memtest in grub
>>> 2. do live migration
>>> 3. exit memtest(type Esc in when memory testing)
>>> 4. wait migration complete
>>>
>>
>> Yes, it is a thorny problem. It is indeed easy to reproduce, just as
>> your steps in the above.
>>
>> This is my test result: (I also test accel=tcg, it can be reproduced also.)
>> Source side:
>> # x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
>> (qemu) ACPI_BUILD: init ACPI tables
>> ACPI_BUILD: init ACPI tables
>> migrate tcp:9.61.1.8:3004
>> ACPI_BUILD: init ACPI tables
>> before cpu_synchronize_all_states
>> 5a8f72d66732cac80d6a0d5713654c0e
>> md_host : before saving ram complete
>> 5a8f72d66732cac80d6a0d5713654c0e
>> md_host : after saving ram complete
>> 5a8f72d66732cac80d6a0d5713654c0e
>> (qemu)
>>
>> Destination side:
>> # x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio -incoming tcp:0:3004
>> (qemu) QEMU_VM_SECTION_END, after loading ram
>> d7cb0d8a4bdd1557fb0e78baee50c986
>> md_host : after loading all vmstate
>> d7cb0d8a4bdd1557fb0e78baee50c986
>> md_host : after cpu_synchronize_all_post_init
>> d7cb0d8a4bdd1557fb0e78baee50c986
>
> Hmm, that's not good.  I suggest you md5 each of the RAMBlock's individually;
> to see if it's main RAM that's different or something more subtle like
> video RAM.
>

Er, all my previous tests computed the md5 of the 'pc.ram' block only.

> But then maybe it's easier just to dump the whole of RAM to file
> and byte compare it (hexdump the two dumps and diff ?)

Hmm, we also used the memcmp function to compare every page, but the addresses
of the differing pages seem to be random.
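
For reference, here is a minimal standalone sketch of that page-by-page
comparison (illustrative only, not our actual debug code; the dump file names
and the 4096-byte page size are assumptions). It just reports the offsets of
pages that differ between two raw RAM dumps:

/* compare_dumps.c - illustrative sketch, not from our debug patch.   */
/* Build: gcc -O2 -o compare_dumps compare_dumps.c                    */
/* Usage: ./compare_dumps src_ram.dump dst_ram.dump                   */
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

int main(int argc, char **argv)
{
    unsigned char pa[PAGE_SIZE], pb[PAGE_SIZE];
    unsigned long long offset = 0, differing = 0;
    FILE *a, *b;

    if (argc != 3) {
        fprintf(stderr, "usage: %s src.dump dst.dump\n", argv[0]);
        return 1;
    }
    a = fopen(argv[1], "rb");
    b = fopen(argv[2], "rb");
    if (!a || !b) {
        perror("fopen");
        return 1;
    }
    for (;;) {
        size_t ra = fread(pa, 1, PAGE_SIZE, a);
        size_t rb = fread(pb, 1, PAGE_SIZE, b);

        if (ra == 0 && rb == 0) {
            break;                      /* both dumps fully consumed */
        }
        /* report pages whose length or content differs */
        if (ra != rb || memcmp(pa, pb, ra) != 0) {
            printf("page at offset 0x%llx differs\n", offset);
            differing++;
        }
        offset += PAGE_SIZE;
    }
    printf("%llu differing page(s)\n", differing);
    fclose(a);
    fclose(b);
    return differing ? 2 : 0;
}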

Besides, in our previous tests, we found it seems to be easier to reproduce
when migration occurs during the VM's start-up or reboot process.

Is it possible that some devices are handled specially during VM start-up
in a way that misses setting the dirty bitmap?


Thanks,
zhanghailiang


>>>>
>>>>> What kind of load were you having when reproducing this issue?
>>>>> Just to confirm, you have been able to reproduce this without COLO
>>>>> patches, right?
>>>>>
>>>>>> (qemu) migrate tcp:192.168.3.8:3004
>>>>>> before saving ram complete
>>>>>> ff703f6889ab8701e4e040872d079a28
>>>>>> md_host : after saving ram complete
>>>>>> ff703f6889ab8701e4e040872d079a28
>>>>>>
>>>>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>>>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>> md_host : after loading all vmstate
>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>> md_host : after cpu_synchronize_all_post_init
>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>>
>>>>>> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
>>>>> OK, a couple of things.  Memory don't have to be exactly identical.
>>>>> Virtio devices in particular do funny things on "post-load".  There
>>>>> aren't warantees for that as far as I know, we should end with an
>>>>> equivalent device state in memory.
>>>>>
>>>>>> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
>>>>>> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
>>>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>>>> This seems to point to a bug in one of the devices.
>>>>>
>>>>>> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
>>>>>> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
>>>>>> when do checkpoint in COLO FT, and everything will be OK.)
>>>>>>
>>>>>> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
>>>>> Pages transferred should be the same, after device state transmission is
>>>>> when things could change.
>>>>>
>>>>>> This issue has blocked our COLO development... :(
>>>>>>
>>>>>> Any help will be greatly appreciated!
>>>>> Later, Juan.
>>>>>
>>>> .
>>>>
>>>
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
Dr. David Alan Gilbert March 30, 2015, 7:59 a.m. UTC | #25
* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/3/27 18:18, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2015/3/26 11:52, Li Zhijian wrote:
> >>>On 03/26/2015 11:12 AM, Wen Congyang wrote:
> >>>>On 03/25/2015 05:50 PM, Juan Quintela wrote:
> >>>>>zhanghailiang<zhang.zhanghailiang@huawei.com>  wrote:
> >>>>>>Hi all,
> >>>>>>
> >>>>>>We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
> >>>>>>when we check it just after finishing migration but before VM continue to Run.
> >>>>>>
> >>>>>>We use a patch like bellow to find this issue, you can find it from affix,
> >>>>>>and Steps to reprduce:
> >>>>>>
> >>>>>>(1) Compile QEMU:
> >>>>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
> >>>>>>
> >>>>>>(2) Command and output:
> >>>>>>SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
> >>>>>Could you try to reproduce:
> >>>>>- without vhost
> >>>>>- without virtio-net
> >>>>>- cache=unsafe is going to give you trouble, but trouble should only
> >>>>>   happen after migration of pages have finished.
> >>>>If I use ide disk, it doesn't happen.
> >>>>Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> >>>>it is because I migrate the guest when it is booting. The virtio net
> >>>>device is not used in this case.
> >>>Er~~
> >>>it reproduces in my ide disk
> >>>there is no any virtio device, my command line like below
> >>>
> >>>x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
> >>>-boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
> >>>usb=off -no-user-config -nodefaults -monitor stdio -vga std
> >>>
> >>>it seems easily to reproduce this issue by following steps in _ubuntu_ guest
> >>>1.  in source side, choose memtest in grub
> >>>2. do live migration
> >>>3. exit memtest(type Esc in when memory testing)
> >>>4. wait migration complete
> >>>
> >>
> >>Yes, it is a thorny problem. It is indeed easy to reproduce, just as
> >>your steps in the above.
> >>
> >>This is my test result: (I also test accel=tcg, it can be reproduced also.)
> >>Source side:
> >># x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
> >>(qemu) ACPI_BUILD: init ACPI tables
> >>ACPI_BUILD: init ACPI tables
> >>migrate tcp:9.61.1.8:3004
> >>ACPI_BUILD: init ACPI tables
> >>before cpu_synchronize_all_states
> >>5a8f72d66732cac80d6a0d5713654c0e
> >>md_host : before saving ram complete
> >>5a8f72d66732cac80d6a0d5713654c0e
> >>md_host : after saving ram complete
> >>5a8f72d66732cac80d6a0d5713654c0e
> >>(qemu)
> >>
> >>Destination side:
> >># x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio -incoming tcp:0:3004
> >>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>d7cb0d8a4bdd1557fb0e78baee50c986
> >>md_host : after loading all vmstate
> >>d7cb0d8a4bdd1557fb0e78baee50c986
> >>md_host : after cpu_synchronize_all_post_init
> >>d7cb0d8a4bdd1557fb0e78baee50c986
> >
> >Hmm, that's not good.  I suggest you md5 each of the RAMBlock's individually;
> >to see if it's main RAM that's different or something more subtle like
> >video RAM.
> >
> 
> Er, all my previous tests are md5 'pc.ram' block only.
> 
> >But then maybe it's easier just to dump the whole of RAM to file
> >and byte compare it (hexdump the two dumps and diff ?)
> 
> Hmm, we also used memcmp function to compare every page, but the addresses
> seem to be random.
> 
> Besides, in our previous test, we found it seems to be more easy to reproduce
> when migration occurs during VM's start-up or reboot process.
> 
> Is there any possible that some devices have special treatment when VM start-up
> which may miss setting dirty-bitmap ?

I don't think there should be, but the code paths used during startup are
probably much less tested with migration.  I'm sure the startup code
uses different parts of device emulation.  I do know we have some bugs
filed against migration during Windows boot; I'd not considered that it might
be devices not updating the bitmap.

Dave

> 
> 
> Thanks,
> zhanghailiang
> 
> 
> >>>>
> >>>>>What kind of load were you having when reproducing this issue?
> >>>>>Just to confirm, you have been able to reproduce this without COLO
> >>>>>patches, right?
> >>>>>
> >>>>>>(qemu) migrate tcp:192.168.3.8:3004
> >>>>>>before saving ram complete
> >>>>>>ff703f6889ab8701e4e040872d079a28
> >>>>>>md_host : after saving ram complete
> >>>>>>ff703f6889ab8701e4e040872d079a28
> >>>>>>
> >>>>>>DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
> >>>>>>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>md_host : after loading all vmstate
> >>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>md_host : after cpu_synchronize_all_post_init
> >>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>
> >>>>>>This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
> >>>>>OK, a couple of things.  Memory don't have to be exactly identical.
> >>>>>Virtio devices in particular do funny things on "post-load".  There
> >>>>>aren't warantees for that as far as I know, we should end with an
> >>>>>equivalent device state in memory.
> >>>>>
> >>>>>>We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
> >>>>>>We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
> >>>>>>it is very difficult for us to trace all the actions of dirtying VM's pages.
> >>>>>This seems to point to a bug in one of the devices.
> >>>>>
> >>>>>>Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
> >>>>>>VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
> >>>>>>when do checkpoint in COLO FT, and everything will be OK.)
> >>>>>>
> >>>>>>Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
> >>>>>Pages transferred should be the same, after device state transmission is
> >>>>>when things could change.
> >>>>>
> >>>>>>This issue has blocked our COLO development... :(
> >>>>>>
> >>>>>>Any help will be greatly appreciated!
> >>>>>Later, Juan.
> >>>>>
> >>>>.
> >>>>
> >>>
> >>>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Wen Congyang March 31, 2015, 7:54 a.m. UTC | #26
On 03/27/2015 04:56 PM, Stefan Hajnoczi wrote:
> On Thu, Mar 26, 2015 at 11:29:43AM +0100, Juan Quintela wrote:
>> Wen Congyang <wency@cn.fujitsu.com> wrote:
>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> We found that, sometimes, the content of VM's memory is
>>>>> inconsistent between Source side and Destination side
>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>
>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>> and Steps to reprduce:
>>>>>
>>>>> (1) Compile QEMU:
>>>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>
>>>>> (2) Command and output:
>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
>>>>> qemu64,-kvmclock -netdev tap,id=hn0-device
>>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
>>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>>>>> -device
>>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
>>>>> -monitor stdio
>>>>
>>>> Could you try to reproduce:
>>>> - without vhost
>>>> - without virtio-net
>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>   happen after migration of pages have finished.
>>>
>>> If I use ide disk, it doesn't happen.
>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>> it is because I migrate the guest when it is booting. The virtio net
>>> device is not used in this case.
>>
>> Kevin, Stefan, Michael, any great idea?
> 
> You must use -drive cache=none if you want to use live migration.  It
> should not directly affect memory during migration though.
> 
>>>>> We have done further test and found that some pages has been
>>>>> dirtied but its corresponding migration_bitmap is not set.
>>>>> We can't figure out which modules of QEMU has missed setting bitmap
>>>>> when dirty page of VM,
>>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>>>
>>>> This seems to point to a bug in one of the devices.
> 
> I think you'll need to track down which pages are different.  If you are
> lucky, their contents will reveal what the page is used for.

I have a question about the memory dirty log:
If QEMU itself modifies some guest memory, is that page marked dirty?

Thanks
Wen Congyang

> 
> Stefan
>
Zhanghailiang March 31, 2015, 11:48 a.m. UTC | #27
On 2015/3/30 15:59, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/3/27 18:18, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> On 2015/3/26 11:52, Li Zhijian wrote:
>>>>> On 03/26/2015 11:12 AM, Wen Congyang wrote:
>>>>>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>>>>>>> zhanghailiang<zhang.zhanghailiang@huawei.com>  wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
>>>>>>>> when we check it just after finishing migration but before VM continue to Run.
>>>>>>>>
>>>>>>>> We use a patch like bellow to find this issue, you can find it from affix,
>>>>>>>> and Steps to reprduce:
>>>>>>>>
>>>>>>>> (1) Compile QEMU:
>>>>>>>>   ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
>>>>>>>>
>>>>>>>> (2) Command and output:
>>>>>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>>>>>>> Could you try to reproduce:
>>>>>>> - without vhost
>>>>>>> - without virtio-net
>>>>>>> - cache=unsafe is going to give you trouble, but trouble should only
>>>>>>>    happen after migration of pages have finished.
>>>>>> If I use ide disk, it doesn't happen.
>>>>>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
>>>>>> it is because I migrate the guest when it is booting. The virtio net
>>>>>> device is not used in this case.
> >>>>>Er~~
>>>>> it reproduces in my ide disk
>>>>> there is no any virtio device, my command line like below
>>>>>
>>>>> x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
>>>>> -boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
>>>>> usb=off -no-user-config -nodefaults -monitor stdio -vga std
>>>>>
>>>>> it seems easily to reproduce this issue by following steps in _ubuntu_ guest
>>>>> 1.  in source side, choose memtest in grub
>>>>> 2. do live migration
>>>>> 3. exit memtest(type Esc in when memory testing)
>>>>> 4. wait migration complete
>>>>>
>>>>
> >>>>Yes, it is a thorny problem. It is indeed easy to reproduce, just as
>>>> your steps in the above.
>>>>
>>>> This is my test result: (I also test accel=tcg, it can be reproduced also.)
>>>> Source side:
>>>> # x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
>>>> (qemu) ACPI_BUILD: init ACPI tables
>>>> ACPI_BUILD: init ACPI tables
>>>> migrate tcp:9.61.1.8:3004
>>>> ACPI_BUILD: init ACPI tables
>>>> before cpu_synchronize_all_states
>>>> 5a8f72d66732cac80d6a0d5713654c0e
>>>> md_host : before saving ram complete
>>>> 5a8f72d66732cac80d6a0d5713654c0e
>>>> md_host : after saving ram complete
>>>> 5a8f72d66732cac80d6a0d5713654c0e
>>>> (qemu)
>>>>
>>>> Destination side:
>>>> # x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio -incoming tcp:0:3004
>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>> d7cb0d8a4bdd1557fb0e78baee50c986
>>>> md_host : after loading all vmstate
>>>> d7cb0d8a4bdd1557fb0e78baee50c986
>>>> md_host : after cpu_synchronize_all_post_init
>>>> d7cb0d8a4bdd1557fb0e78baee50c986
>>>
>>> Hmm, that's not good.  I suggest you md5 each of the RAMBlock's individually;
>>> to see if it's main RAM that's different or something more subtle like
>>> video RAM.
>>>
>>
>> Er, all my previous tests are md5 'pc.ram' block only.
>>
>>> But then maybe it's easier just to dump the whole of RAM to file
>>> and byte compare it (hexdump the two dumps and diff ?)
>>
>> Hmm, we also used memcmp function to compare every page, but the addresses
>> seem to be random.
>>
>> Besides, in our previous test, we found it seems to be more easy to reproduce
>> when migration occurs during VM's start-up or reboot process.
>>
>> Is there any possible that some devices have special treatment when VM start-up
>> which may miss setting dirty-bitmap ?
>
> I don't think there should be, but the code paths used during startup are
> probably much less tested with migration.  I'm sure the startup code
> uses different part of device emulation.   I do know we have some bugs

Er, maybe there is a special case:

During the VM's start-up, I found that the KVMSlots changed many times; it was a process of
smashing the total memory space into smaller slots.

If some pages were dirtied and their bits had been set in the KVM module's dirty bitmap,
but we didn't sync that bitmap to QEMU user-space before the slot was smashed,
the previous bitmap is destroyed along with the slot,
and the dirty information for those pages may be lost.

What's your opinion? Can the situation I described above happen?
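
To make the ordering I suspect concrete, here is a rough sketch (illustrative
only; the helper names are hypothetical placeholders, not the real QEMU/KVM
API):

/*
 * Suspected lossy ordering while the memory layout is reworked during
 * start-up (names below are placeholders for illustration):
 *
 *   1. the guest dirties a page; KVM sets the bit in
 *      old_slot->dirty_bitmap only
 *   2. the slot layout changes, old_slot is deleted and its
 *      dirty_bitmap is freed -> the bit set in step 1 is lost
 *   3. a later migration_bitmap_sync() finds nothing for that page
 */
static void replace_memslot(KVMSlot *old_slot, KVMSlot *new_slot)
{
    /* The ordering that would NOT lose information: harvest the KVM
     * bitmap into QEMU's migration bitmap before freeing it.         */
    sync_kvm_dirty_log_to_migration_bitmap(old_slot);  /* hypothetical */
    free_kvm_dirty_bitmap(old_slot);                   /* hypothetical */
    install_memslot(new_slot);                         /* hypothetical */
}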

The log below was grabbed when I tried to figure out a very similar problem (some pages missing their dirty-bitmap setting) that we found in COLO:
Occasionally, there will be an error report on the SLAVE side:

     qemu: warning: error while loading state for instance 0x0 of device
     'kvm-tpr-opt'
     qemu-system-x86_64: loadvm failed

We found that it is related to three addresses (gpa: 0xca000, 0xcb000, 0xcc000, which are the addresses of 'kvmvapic.rom'?), and sometimes
their corresponding dirty bits are missing on the Master side, because their KVMSlot is destroyed before we sync its dirty bitmap to QEMU.

(I'm still not quite sure whether this can also happen in normal migration; I will try to test it.)

Thanks,
zhanghailiang

/var/log/message:
Mar 31 18:32:53 master kernel: [15908.524721] memslot dirty bitmap of '4' was destroyed, base_gfn 0x100, npages 524032
Mar 31 18:32:53 master kernel: [15908.657853] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.665105] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.672360] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.682058] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfebc0, npages 16
Mar 31 18:32:53 master kernel: [15908.849527] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.856845] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.864161] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.872676] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfeb80, npages 64
Mar 31 18:32:53 master kernel: [15908.882694] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.889948] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.897202] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.911652] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.919543] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.927419] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.935317] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.943209] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.951083] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.958971] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.966837] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.974707] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.988470] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 524096
Mar 31 18:32:53 master kernel: [15909.002403] kvm: zapping shadow pages for mmio generation wraparound
Mar 31 18:32:53 master kernel: [15909.002523] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 8
Mar 31 18:32:53 master kernel: [15909.010110] memslot dirty bitmap of '4' was destroyed, base_gfn 0xc8, npages 524088
Mar 31 18:32:53 master kernel: [15909.023988] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 3
Mar 31 18:32:53 master kernel: [15909.031594] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd0, npages 524080
Mar 31 18:32:53 master kernel: [15909.044708] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 11
Mar 31 18:32:53 master kernel: [15909.052392] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd8, npages 524072
Mar 31 18:32:53 master kernel: [15909.065651] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 19
Mar 31 18:32:53 master kernel: [15909.073329] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe0, npages 524064
Mar 31 18:32:53 master kernel: [15909.086379] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 27
Mar 31 18:32:53 master kernel: [15909.094084] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe8, npages 524056
Mar 31 18:32:53 master kernel: [15909.107354] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 524052
Mar 31 18:32:54 master dhcpcd[5408]: eth0: timed out
Mar 31 18:32:54 master dhcpcd[5408]: eth0: trying to use old lease in `/var/lib/dhcpcd/dhcpcd-eth0.info'
Mar 31 18:32:54 master dhcpcd[5408]: eth0: lease expired 30265 seconds ago
Mar 31 18:32:54 master dhcpcd[5408]: eth0: broadcasting for a lease
Mar 31 18:33:02 master qemu-system-x86_64: ====qemu: The 0 times do checkpoing===
Mar 31 18:33:02 master qemu-system-x86_64: == migration_bitmap_sync ==
Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xca000 (hva=0x7f4fa32ca000), who's bitmap not set ---
Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xcb000 (hva=0x7f4fa32cb000), who's bitmap not set ---
Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xcc000 (hva=0x7f4fa32cc000), who's bitmap not set ---
Mar 31 18:33:05 master kernel: [15921.057246] device eth2 left promiscuous mode
Mar 31 18:33:07 master avahi-daemon[5773]: Withdrawing address record for fe80::90be:cfff:fe9e:f03e on tap0.
Mar 31 18:33:07 master kernel: [15922.513313] br0: port 2(tap0) entered disabled state
Mar 31 18:33:07 master kernel: [15922.513480] device tap0 left promiscuous mode
Mar 31 18:33:07 master kernel: [15922.513508] br0: port 2(tap0) entered disabled state
Mar 31 18:33:07 master kernel: [15922.562591] memslot dirty bitmap of '8' was destroyed, base_gfn 0x100, npages 524032
Mar 31 18:33:07 master kernel: [15922.570948] memslot dirty bitmap of '3' was destroyed, base_gfn 0xfd000, npages 4096
Mar 31 18:33:07 master kernel: [15922.578967] memslot dirty bitmap of '0' was destroyed, base_gfn 0x0, npages 160
Mar 31 18:33:07 master kernel: [15922.586556] memslot dirty bitmap of '1' was destroyed, base_gfn 0xfffc0, npages 64
Mar 31 18:33:07 master kernel: [15922.586574] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 31
Mar 31 18:33:07 master kernel: [15922.586575] memslot dirty bitmap of '7' was destroyed, base_gfn 0xf0, npages 16
Mar 31 18:33:07 master kernel: [15922.586577] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 10
Mar 31 18:33:07 master kernel: [15922.586578] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 4
Mar 31 18:33:07 master kernel: [15922.586579] memslot dirty bitmap of '4' was destroyed, base_gfn 0xca, npages 3

PS:
QEMU:
static void migration_bitmap_sync(void)
{
     trace_migration_bitmap_sync_end(migration_dirty_pages
                                     - num_dirty_pages_init);
     num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    syslog(LOG_INFO, "== migration_bitmap_sync ==");
+    check_bitmap();

+void check_bitmap(void)
+{
+    int i;
+    char *host;
+    ram_addr_t addr[3] = { 0xca000, 0xcb000, 0xcc000 };
+    RAMBlock *block = NULL;
+    int ret;
+
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    for (i = 0; i < 3; i++) {
+        unsigned long nr = block->mr->ram_addr + (addr[i] >> TARGET_PAGE_BITS);
+        host =  memory_region_get_ram_ptr(block->mr) + addr[i];
+        ret = test_bit(nr, migration_bitmap);
+        if (ret == 0) {
+            syslog(LOG_INFO, "qemu:  --- addr=0x%llx (hva=%p), who's bitmap not set ---\n",
+                       addr[i], host);
+        } else {
+            syslog(LOG_INFO, "qemu: +++ OK, addr=0x%llx (hva=%p) , who's bitap is set +++\n",
+                       addr[i], host);;
+        }
+    }
+}

KVM:
static void mark_page_dirty_in_slot(struct kvm *kvm,
                     struct kvm_memory_slot *memslot,
                     gfn_t gfn)
{
+    if ((gfn == 0xca || gfn == 0xcb || gfn == 0xcc) && !memslot->dirty_bitmap) {
+        printk("oops, dirty_bitmap is null, gfn=0x%llx\n",gfn);
+    }
     if (memslot && memslot->dirty_bitmap) {
         unsigned long rel_gfn = gfn - memslot->base_gfn;
+        if (gfn == 0xca || gfn == 0xcb || gfn == 0xcc) {
+            printk("+++ gfn=0x%llx is marked dirty, id=%d, base_gfn=0x%llx, npages=%d\n",
+                   gfn, memslot->id, memslot->base_gfn, memslot->npages);
+        }
         set_bit_le(rel_gfn, memslot->dirty_bitmap);
     }
}

static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
{
     if (!memslot->dirty_bitmap)
         return;
+    if (memslot->id != 9) {
+        printk("memslot dirty bitmap of '%d' was destroyed, base_gfn 0x%llx, npages %lld\n",
+           memslot->id, memslot->base_gfn, memslot->npages);
+    }
     kvm_kvfree(memslot->dirty_bitmap);
     memslot->dirty_bitmap = NULL;
}


> filed against migration during windows boot, I'd not considered that it might
> be devices not updating the bitmap.
>
> Dave
>
>>
>>
>> Thanks,
>> zhanghailiang
>>
>>
>>>>>>
>>>>>>> What kind of load were you having when reproducing this issue?
>>>>>>> Just to confirm, you have been able to reproduce this without COLO
>>>>>>> patches, right?
>>>>>>>
>>>>>>>> (qemu) migrate tcp:192.168.3.8:3004
>>>>>>>> before saving ram complete
>>>>>>>> ff703f6889ab8701e4e040872d079a28
>>>>>>>> md_host : after saving ram complete
>>>>>>>> ff703f6889ab8701e4e040872d079a28
>>>>>>>>
>>>>>>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>>>>>>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>>>> md_host : after loading all vmstate
>>>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>>>> md_host : after cpu_synchronize_all_post_init
>>>>>>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>>>>>>
>>>>>>>> This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
>>>>>>> OK, a couple of things.  Memory don't have to be exactly identical.
>>>>>>> Virtio devices in particular do funny things on "post-load".  There
>>>>>>> aren't warantees for that as far as I know, we should end with an
>>>>>>> equivalent device state in memory.
>>>>>>>
>>>>>>>> We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
>>>>>>>> We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
>>>>>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
>>>>>>> This seems to point to a bug in one of the devices.
>>>>>>>
>>>>>>>> Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
>>>>>>>> VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
>>>>>>>> when do checkpoint in COLO FT, and everything will be OK.)
>>>>>>>>
>>>>>>>> Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
>>>>>>> Pages transferred should be the same, after device state transmission is
>>>>>>> when things could change.
>>>>>>>
>>>>>>>> This issue has blocked our COLO development... :(
>>>>>>>>
>>>>>>>> Any help will be greatly appreciated!
>>>>>>> Later, Juan.
>>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
Stefan Hajnoczi March 31, 2015, 2:16 p.m. UTC | #28
On Tue, Mar 31, 2015 at 03:54:51PM +0800, Wen Congyang wrote:
> On 03/27/2015 04:56 PM, Stefan Hajnoczi wrote:
> > On Thu, Mar 26, 2015 at 11:29:43AM +0100, Juan Quintela wrote:
> >> Wen Congyang <wency@cn.fujitsu.com> wrote:
> >>> On 03/25/2015 05:50 PM, Juan Quintela wrote:
> >>>> zhanghailiang <zhang.zhanghailiang@huawei.com> wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> We found that, sometimes, the content of VM's memory is
> >>>>> inconsistent between Source side and Destination side
> >>>>> when we check it just after finishing migration but before VM continue to Run.
> >>>>>
> >>>>> We use a patch like bellow to find this issue, you can find it from affix,
> >>>>> and Steps to reprduce:
> >>>>>
> >>>>> (1) Compile QEMU:
> >>>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
> >>>>>
> >>>>> (2) Command and output:
> >>>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
> >>>>> qemu64,-kvmclock -netdev tap,id=hn0-device
> >>>>> virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
> >>>>> file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
> >>>>> -device
> >>>>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
> >>>>> -monitor stdio
> >>>>
> >>>> Could you try to reproduce:
> >>>> - without vhost
> >>>> - without virtio-net
> >>>> - cache=unsafe is going to give you trouble, but trouble should only
> >>>>   happen after migration of pages have finished.
> >>>
> >>> If I use ide disk, it doesn't happen.
> >>> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> >>> it is because I migrate the guest when it is booting. The virtio net
> >>> device is not used in this case.
> >>
> >> Kevin, Stefan, Michael, any great idea?
> > 
> > You must use -drive cache=none if you want to use live migration.  It
> > should not directly affect memory during migration though.
> > 
> >>>>> We have done further test and found that some pages has been
> >>>>> dirtied but its corresponding migration_bitmap is not set.
> >>>>> We can't figure out which modules of QEMU has missed setting bitmap
> >>>>> when dirty page of VM,
> >>>>> it is very difficult for us to trace all the actions of dirtying VM's pages.
> >>>>
> >>>> This seems to point to a bug in one of the devices.
> > 
> > I think you'll need to track down which pages are different.  If you are
> > lucky, their contents will reveal what the page is used for.
> 
> I have a question about memory dirty log:
> If qemu modifies some memory, is this page marked dirty?

I think the answer is "no": the dirty page tracking only covers writes
made in guest mode.

Even if dirty page tracking covered QEMU userspace code, remember that
QEMU can be run without KVM in TCG mode.  In order to support that, it's
necessary to always mark memory dirty explicitly when it is written from
QEMU code.
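
As a minimal sketch of what "mark memory dirty explicitly" looks like from
QEMU code (the device and its region/offsets are made up for illustration;
memory_region_get_ram_ptr() and memory_region_set_dirty() are the real APIs
declared in "exec/memory.h", but check the exact pattern against your tree):

/* Sketch only: a hypothetical device writing into guest RAM from QEMU. */
static void mydev_write_guest_buf(MemoryRegion *ram, hwaddr offset,
                                  const void *data, hwaddr len)
{
    uint8_t *host = (uint8_t *)memory_region_get_ram_ptr(ram) + offset;

    memcpy(host, data, len);                   /* QEMU-side write, no exit  */
    memory_region_set_dirty(ram, offset, len); /* so migration sees the page */
}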

Stefan
Dr. David Alan Gilbert March 31, 2015, 7:06 p.m. UTC | #29
* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/3/30 15:59, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2015/3/27 18:18, Dr. David Alan Gilbert wrote:
> >>>* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>>On 2015/3/26 11:52, Li Zhijian wrote:
> >>>>>On 03/26/2015 11:12 AM, Wen Congyang wrote:
> >>>>>>On 03/25/2015 05:50 PM, Juan Quintela wrote:
> >>>>>>>zhanghailiang<zhang.zhanghailiang@huawei.com>  wrote:
> >>>>>>>>Hi all,
> >>>>>>>>
> >>>>>>>>We found that, sometimes, the content of VM's memory is inconsistent between Source side and Destination side
> >>>>>>>>when we check it just after finishing migration but before VM continue to Run.
> >>>>>>>>
> >>>>>>>>We use a patch like bellow to find this issue, you can find it from affix,
> >>>>>>>>and Steps to reprduce:
> >>>>>>>>
> >>>>>>>>(1) Compile QEMU:
> >>>>>>>>  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make
> >>>>>>>>
> >>>>>>>>(2) Command and output:
> >>>>>>>>SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0-device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
> >>>>>>>Could you try to reproduce:
> >>>>>>>- without vhost
> >>>>>>>- without virtio-net
> >>>>>>>- cache=unsafe is going to give you trouble, but trouble should only
> >>>>>>>   happen after migration of pages have finished.
> >>>>>>If I use ide disk, it doesn't happen.
> >>>>>>Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> >>>>>>it is because I migrate the guest when it is booting. The virtio net
> >>>>>>device is not used in this case.
> >>>>>Er~~
> >>>>>it reproduces in my ide disk
> >>>>>there is no any virtio device, my command line like below
> >>>>>
> >>>>>x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
> >>>>>-boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
> >>>>>usb=off -no-user-config -nodefaults -monitor stdio -vga std
> >>>>>
> >>>>>it seems easily to reproduce this issue by following steps in _ubuntu_ guest
> >>>>>1.  in source side, choose memtest in grub
> >>>>>2. do live migration
> >>>>>3. exit memtest(type Esc in when memory testing)
> >>>>>4. wait migration complete
> >>>>>
> >>>>
> >>>>Yes, it is a thorny problem. It is indeed easy to reproduce, just as
> >>>>your steps in the above.
> >>>>
> >>>>This is my test result: (I also test accel=tcg, it can be reproduced also.)
> >>>>Source side:
> >>>># x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
> >>>>(qemu) ACPI_BUILD: init ACPI tables
> >>>>ACPI_BUILD: init ACPI tables
> >>>>migrate tcp:9.61.1.8:3004
> >>>>ACPI_BUILD: init ACPI tables
> >>>>before cpu_synchronize_all_states
> >>>>5a8f72d66732cac80d6a0d5713654c0e
> >>>>md_host : before saving ram complete
> >>>>5a8f72d66732cac80d6a0d5713654c0e
> >>>>md_host : after saving ram complete
> >>>>5a8f72d66732cac80d6a0d5713654c0e
> >>>>(qemu)
> >>>>
> >>>>Destination side:
> >>>># x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio -incoming tcp:0:3004
> >>>>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>>>d7cb0d8a4bdd1557fb0e78baee50c986
> >>>>md_host : after loading all vmstate
> >>>>d7cb0d8a4bdd1557fb0e78baee50c986
> >>>>md_host : after cpu_synchronize_all_post_init
> >>>>d7cb0d8a4bdd1557fb0e78baee50c986
> >>>
> >>>Hmm, that's not good.  I suggest you md5 each of the RAMBlock's individually;
> >>>to see if it's main RAM that's different or something more subtle like
> >>>video RAM.
> >>>
> >>
> >>Er, all my previous tests are md5 'pc.ram' block only.
> >>
> >>>But then maybe it's easier just to dump the whole of RAM to file
> >>>and byte compare it (hexdump the two dumps and diff ?)
> >>
> >>Hmm, we also used memcmp function to compare every page, but the addresses
> >>seem to be random.
> >>
> >>Besides, in our previous test, we found it seems to be more easy to reproduce
> >>when migration occurs during VM's start-up or reboot process.
> >>
> >>Is there any possible that some devices have special treatment when VM start-up
> >>which may miss setting dirty-bitmap ?
> >
> >I don't think there should be, but the code paths used during startup are
> >probably much less tested with migration.  I'm sure the startup code
> >uses different part of device emulation.   I do know we have some bugs
> 
> Er, Maybe there is a special case:
> 
> During VM's start-up, i found that the KVMslot changed many times, it was a process of
> smashing total memory space into small slot.
> 
> If some pages was dirtied and its dirty-bitmap has been set in KVM module,
> but we didn't sync the bitmaps to QEMU user-space before this slot was smashed,
> with its previous bitmap been destroyed.
> The bitmap of dirty pages in the previous KVMslot maybe be missed.
> 
> What's your opinion? Can this situation i described in the above happen?
> 
> The bellow log was grabbed, when i tried to figure out a quite same question (some pages miss dirty-bitmap setting) we found in COLO:
> Occasionally, there will be an error report in SLAVE side:
> 
>     qemu: warning: error while loading state for instance 0x0 of device
>     'kvm-tpr-opt'                                                 '
>     qemu-system-x86_64: loadvm failed
> 
> We found that it related to three address (gpa: 0xca000,0xcb000,0xcc000, which are the address of 'kvmvapic.rom ?'), and sometimes
> their corresponding dirty-map will be missed in Master side, because their KVMSlot is destroyed before we sync its dirty-bitmap to qemu.
> 
> (I'm still not quite sure if this can also happen in common migration, i will try to test it in normal migration)

Ah, kvm-tpr-opt.
I also hit this today with colo; it didn't happen on a new machine, but it happens on an older machine, or in
this case a nested setup.

-global kvm-apic.vapic=false

Turns the vapic off, and seems to have got me further.

I don't understand the details of kvm-tpr-opt, but it's a hack for speeding up access
to the TPR (Task Priority Register), which Windows apparently accesses a lot and which used
to slow things down; the hack does involve modifying some stuff on the fly, and it
doesn't happen if the host has some newer features (TPR shadow, I think; also look for FlexPriority).
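
For example (illustrative only), appended to the source-side command line
quoted earlier in this thread, and likewise on the destination side before
-incoming:

# x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults -cpu qemu64,-kvmclock -global kvm-apic.vapic=false -boot c -drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio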

Dave
> 
> Thanks,
> zhanghailiang
> 
> /var/log/message:
> Mar 31 18:32:53 master kernel: [15908.524721] memslot dirty bitmap of '4' was destroyed, base_gfn 0x100, npages 524032
> Mar 31 18:32:53 master kernel: [15908.657853] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.665105] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.672360] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.682058] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfebc0, npages 16
> Mar 31 18:32:53 master kernel: [15908.849527] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.856845] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.864161] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.872676] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfeb80, npages 64
> Mar 31 18:32:53 master kernel: [15908.882694] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.889948] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.897202] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.911652] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.919543] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.927419] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.935317] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.943209] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.951083] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.958971] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.966837] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.974707] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.988470] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 524096
> Mar 31 18:32:53 master kernel: [15909.002403] kvm: zapping shadow pages for mmio generation wraparound
> Mar 31 18:32:53 master kernel: [15909.002523] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 8
> Mar 31 18:32:53 master kernel: [15909.010110] memslot dirty bitmap of '4' was destroyed, base_gfn 0xc8, npages 524088
> Mar 31 18:32:53 master kernel: [15909.023988] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 3
> Mar 31 18:32:53 master kernel: [15909.031594] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd0, npages 524080
> Mar 31 18:32:53 master kernel: [15909.044708] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 11
> Mar 31 18:32:53 master kernel: [15909.052392] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd8, npages 524072
> Mar 31 18:32:53 master kernel: [15909.065651] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 19
> Mar 31 18:32:53 master kernel: [15909.073329] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe0, npages 524064
> Mar 31 18:32:53 master kernel: [15909.086379] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 27
> Mar 31 18:32:53 master kernel: [15909.094084] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe8, npages 524056
> Mar 31 18:32:53 master kernel: [15909.107354] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 524052
> Mar 31 18:32:54 master dhcpcd[5408]: eth0: timed out
> Mar 31 18:32:54 master dhcpcd[5408]: eth0: trying to use old lease in `/var/lib/dhcpcd/dhcpcd-eth0.info'
> Mar 31 18:32:54 master dhcpcd[5408]: eth0: lease expired 30265 seconds ago
> Mar 31 18:32:54 master dhcpcd[5408]: eth0: broadcasting for a lease
> Mar 31 18:33:02 master qemu-system-x86_64: ====qemu: The 0 times do checkpoing===
> Mar 31 18:33:02 master qemu-system-x86_64: == migration_bitmap_sync ==
> Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xca000 (hva=0x7f4fa32ca000), who's bitmap not set ---
> Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xcb000 (hva=0x7f4fa32cb000), who's bitmap not set ---
> Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xcc000 (hva=0x7f4fa32cc000), who's bitmap not set ---
> Mar 31 18:33:05 master kernel: [15921.057246] device eth2 left promiscuous mode
> Mar 31 18:33:07 master avahi-daemon[5773]: Withdrawing address record for fe80::90be:cfff:fe9e:f03e on tap0.
> Mar 31 18:33:07 master kernel: [15922.513313] br0: port 2(tap0) entered disabled state
> Mar 31 18:33:07 master kernel: [15922.513480] device tap0 left promiscuous mode
> Mar 31 18:33:07 master kernel: [15922.513508] br0: port 2(tap0) entered disabled state
> Mar 31 18:33:07 master kernel: [15922.562591] memslot dirty bitmap of '8' was destroyed, base_gfn 0x100, npages 524032
> Mar 31 18:33:07 master kernel: [15922.570948] memslot dirty bitmap of '3' was destroyed, base_gfn 0xfd000, npages 4096
> Mar 31 18:33:07 master kernel: [15922.578967] memslot dirty bitmap of '0' was destroyed, base_gfn 0x0, npages 160
> Mar 31 18:33:07 master kernel: [15922.586556] memslot dirty bitmap of '1' was destroyed, base_gfn 0xfffc0, npages 64
> Mar 31 18:33:07 master kernel: [15922.586574] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 31
> Mar 31 18:33:07 master kernel: [15922.586575] memslot dirty bitmap of '7' was destroyed, base_gfn 0xf0, npages 16
> Mar 31 18:33:07 master kernel: [15922.586577] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 10
> Mar 31 18:33:07 master kernel: [15922.586578] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 4
> Mar 31 18:33:07 master kernel: [15922.586579] memslot dirty bitmap of '4' was destroyed, base_gfn 0xca, npages 3
> 
> PS:
> QEMU:
> static void migration_bitmap_sync(void)
> {
>     trace_migration_bitmap_sync_end(migration_dirty_pages
>                                     - num_dirty_pages_init);
>     num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
>     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    syslog(LOG_INFO, "== migration_bitmap_sync ==");
> +    check_bitmap();
> 
> +void check_bitmap(void)
> +{
> +    int i;
> +    char *host;
> +    ram_addr_t addr[3] = { 0xca000, 0xcb000, 0xcc000 };
> +    RAMBlock *block = NULL;
> +    int ret;
> +
> +    block = QLIST_FIRST_RCU(&ram_list.blocks);
> +    for (i = 0; i < 3; i++) {
> +        unsigned long nr = (block->mr->ram_addr + addr[i]) >> TARGET_PAGE_BITS;
> +        host =  memory_region_get_ram_ptr(block->mr) + addr[i];
> +        ret = test_bit(nr, migration_bitmap);
> +        if (ret == 0) {
> +            syslog(LOG_INFO, "qemu:  --- addr=0x%llx (hva=%p), who's bitmap not set ---\n",
> +                       addr[i], host);
> +        } else {
> +            syslog(LOG_INFO, "qemu: +++ OK, addr=0x%llx (hva=%p) , who's bitap is set +++\n",
> +                       addr[i], host);;
> +        }
> +    }
> +}
> 
> KVM:
> static void mark_page_dirty_in_slot(struct kvm *kvm,
>                     struct kvm_memory_slot *memslot,
>                     gfn_t gfn)
> {
> +    if ((gfn == 0xca || gfn == 0xcb || gfn == 0xcc) && !memslot->dirty_bitmap) {
> +        printk("oops, dirty_bitmap is null, gfn=0x%llx\n",gfn);
> +    }
>     if (memslot && memslot->dirty_bitmap) {
>         unsigned long rel_gfn = gfn - memslot->base_gfn;
> +        if (gfn == 0xca || gfn == 0xcb || gfn == 0xcc) {
> +            printk("+++ gfn=0x%llx is marked dirty, id=%d, base_gfn=0x%llx, npages=%d\n",
> +                   gfn, memslot->id, memslot->base_gfn, memslot->npages);
> +        }
>         set_bit_le(rel_gfn, memslot->dirty_bitmap);
>     }
> }
> 
> static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
> {
>     if (!memslot->dirty_bitmap)
>         return;
> +    if (memslot->id != 9) {
> +        printk("memslot dirty bitmap of '%d' was destroyed, base_gfn 0x%llx, npages %lld\n",
> +           memslot->id, memslot->base_gfn, memslot->npages);
> +    }
>     kvm_kvfree(memslot->dirty_bitmap);
>     memslot->dirty_bitmap = NULL;
> }
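
For context, here is a minimal, self-contained sketch of the failure mode this instrumentation is probing: a write that lands in a per-slot dirty bitmap is lost when the slot's bitmap is destroyed and recreated (as the "was destroyed" lines above show happening) before the log is harvested into the migration bitmap. This is an illustration only, not QEMU or KVM code; every name in it is hypothetical.

/* dirty_bitmap_race.c -- illustrative only, not QEMU or KVM code.
 * Models the suspected failure mode: a dirty bit recorded in a
 * per-slot bitmap vanishes if the slot's bitmap is destroyed and
 * recreated before userspace harvests the dirty log.
 */
#include <stdio.h>
#include <stdlib.h>

#define SLOT_PAGES 16

struct memslot {
    unsigned long base_gfn;
    unsigned long npages;
    unsigned char *dirty_bitmap;    /* one byte per page, for simplicity */
};

/* Rough stand-in for mark_page_dirty_in_slot(). */
static void mark_dirty(struct memslot *slot, unsigned long gfn)
{
    if (slot->dirty_bitmap) {
        slot->dirty_bitmap[gfn - slot->base_gfn] = 1;
    }
}

/* Rough stand-in for kvm_destroy_dirty_bitmap(): the log vanishes. */
static void destroy_bitmap(struct memslot *slot)
{
    free(slot->dirty_bitmap);
    slot->dirty_bitmap = NULL;
}

/* Rough stand-in for a later get-dirty-log harvest at sync time. */
static int harvest(struct memslot *slot, unsigned long gfn)
{
    return slot->dirty_bitmap && slot->dirty_bitmap[gfn - slot->base_gfn];
}

int main(void)
{
    struct memslot slot = { .base_gfn = 0xc0, .npages = SLOT_PAGES };

    slot.dirty_bitmap = calloc(SLOT_PAGES, 1);
    mark_dirty(&slot, 0xca);                   /* guest writes gfn 0xca      */
    destroy_bitmap(&slot);                     /* memslot layout changes     */
    slot.dirty_bitmap = calloc(SLOT_PAGES, 1); /* slot recreated, log empty  */

    printf("gfn 0xca seen dirty at sync time: %s\n",
           harvest(&slot, 0xca) ? "yes" : "no (the bit was lost)");
    free(slot.dirty_bitmap);
    return 0;
}

If something like this is what happens here, the pages reported as "bitmap not set" at the next migration_bitmap_sync would be exactly the ones written between the last harvest and a memslot change, which would be consistent with gfns 0xca-0xcc above.
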
> 
> 
> >filed against migration during windows boot, I'd not considered that it might
> >be devices not updating the bitmap.
> >
> >Dave
> >
> >>
> >>
> >>Thanks,
> >>zhanghailiang
> >>
> >>
> >>>>>>
> >>>>>>>What kind of load were you having when reproducing this issue?
> >>>>>>>Just to confirm, you have been able to reproduce this without COLO
> >>>>>>>patches, right?
> >>>>>>>
> >>>>>>>>(qemu) migrate tcp:192.168.3.8:3004
> >>>>>>>>before saving ram complete
> >>>>>>>>ff703f6889ab8701e4e040872d079a28
> >>>>>>>>md_host : after saving ram complete
> >>>>>>>>ff703f6889ab8701e4e040872d079a28
> >>>>>>>>
> >>>>>>>>DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
> >>>>>>>>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>>>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>>>md_host : after loading all vmstate
> >>>>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>>>md_host : after cpu_synchronize_all_post_init
> >>>>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>>>
> >>>>>>>>This happens occasionally, and it is more easy to reproduce when issue migration command during VM's startup time.
> >>>>>>>OK, a couple of things.  Memory doesn't have to be exactly identical.
> >>>>>>>Virtio devices in particular do funny things on "post-load".  There
> >>>>>>>aren't guarantees for that as far as I know; we should end up with an
> >>>>>>>equivalent device state in memory.
> >>>>>>>
> >>>>>>>>We have done further test and found that some pages has been dirtied but its corresponding migration_bitmap is not set.
> >>>>>>>>We can't figure out which modules of QEMU has missed setting bitmap when dirty page of VM,
> >>>>>>>>it is very difficult for us to trace all the actions of dirtying VM's pages.
> >>>>>>>This seems to point to a bug in one of the devices.
> >>>>>>>
> >>>>>>>>Actually, the first time we found this problem was in the COLO FT development, and it triggered some strange issues in
> >>>>>>>>VM which all pointed to the issue of inconsistent of VM's memory. (We have try to save all memory of VM to slave side every time
> >>>>>>>>when do checkpoint in COLO FT, and everything will be OK.)
> >>>>>>>>
> >>>>>>>>Is it OK for some pages that not transferred to destination when do migration ? Or is it a bug?
> >>>>>>>Pages transferred should be the same; it is after device state transmission
> >>>>>>>that things could change.
> >>>>>>>
> >>>>>>>>This issue has blocked our COLO development... :(
> >>>>>>>>
> >>>>>>>>Any help will be greatly appreciated!
> >>>>>>>Later, Juan.
> >>>>>>>
> >>>>>>.
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>--
> >>>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>
> >>>.
> >>>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff mbox

Patch

From ecb789cf7f383b112da3cce33eb9822a94b9497a Mon Sep 17 00:00:00 2001
From: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Tue, 24 Mar 2015 21:53:26 -0400
Subject: [PATCH] check pc.ram block md5sum between migration Source and
 Destination

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 savevm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 mode change 100644 => 100755 savevm.c

diff --git a/savevm.c b/savevm.c
old mode 100644
new mode 100755
index 3b0e222..3d431dc
--- a/savevm.c
+++ b/savevm.c
@@ -51,6 +51,26 @@ 
 #define ARP_PTYPE_IP 0x0800
 #define ARP_OP_REQUEST_REV 0x3
 
+#include "qemu/rcu_queue.h"
+#include <openssl/md5.h>
+
+static void check_host_md5(void)
+{
+    int i;
+    unsigned char md[MD5_DIGEST_LENGTH];
+    MD5_CTX ctx;
+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check 'pc.ram' block */
+
+    MD5_Init(&ctx);
+    MD5_Update(&ctx, (void *)block->host, block->used_length);
+    MD5_Final(md, &ctx);
+    printf("md_host : ");
+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
+        fprintf(stderr, "%02x", md[i]);
+    }
+    fprintf(stderr, "\n");
+}
+
 static int announce_self_create(uint8_t *buf,
                                 uint8_t *mac_addr)
 {
@@ -741,7 +761,13 @@  void qemu_savevm_state_complete(QEMUFile *f)
         qemu_put_byte(f, QEMU_VM_SECTION_END);
         qemu_put_be32(f, se->section_id);
 
+        printf("before saving %s complete\n", se->idstr);
+        check_host_md5();
+
         ret = se->ops->save_live_complete(f, se->opaque);
+        printf("after saving %s complete\n", se->idstr);
+        check_host_md5();
+
         trace_savevm_section_end(se->idstr, se->section_id, ret);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
@@ -1007,6 +1033,13 @@  int qemu_loadvm_state(QEMUFile *f)
             QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
 
             ret = vmstate_load(f, le->se, le->version_id);
+#if 0
+            if (section_type == QEMU_VM_SECTION_FULL) {
+                printf("QEMU_VM_SECTION_FULL, after loading %s\n", le->se->idstr);
+                check_host_md5();
+            }
+#endif
+
             if (ret < 0) {
                 error_report("error while loading state for instance 0x%x of"
                              " device '%s'", instance_id, idstr);
@@ -1030,6 +1063,11 @@  int qemu_loadvm_state(QEMUFile *f)
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
+            if (section_type == QEMU_VM_SECTION_END) {
+                printf("QEMU_VM_SECTION_END, after loading %s\n", le->se->idstr);
+                check_host_md5();
+            }
+
             if (ret < 0) {
                 error_report("error while loading state section id %d(%s)",
                              section_id, le->se->idstr);
@@ -1061,7 +1099,11 @@  int qemu_loadvm_state(QEMUFile *f)
         g_free(buf);
     }
 
+    printf("after loading all vmstate\n");
+    check_host_md5();
     cpu_synchronize_all_post_init();
+    printf("after cpu_synchronize_all_post_init\n");
+    check_host_md5();
 
     ret = 0;
 
-- 
1.7.12.4
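
As the comment in check_host_md5() notes, the posted patch hashes only the first RAMBlock ('pc.ram'). If one wanted to widen the check, a hedged sketch along the same lines (assuming the same 2015-era ram_list/RAMBlock fields the patch already touches, plus the QLIST_FOREACH_RCU macro from the already-included qemu/rcu_queue.h) could hash every block:

/* Sketch only: like check_host_md5(), but hashes every RAMBlock.
 * Assumes block->idstr/host/used_length and QLIST_FOREACH_RCU exist
 * in this tree; a real version should also run under rcu_read_lock().
 */
static void check_host_md5_all(void)
{
    unsigned char md[MD5_DIGEST_LENGTH];
    RAMBlock *block;
    int i;

    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
        MD5_CTX ctx;

        MD5_Init(&ctx);
        MD5_Update(&ctx, (void *)block->host, block->used_length);
        MD5_Final(md, &ctx);

        fprintf(stderr, "md_host %s : ", block->idstr);
        for (i = 0; i < MD5_DIGEST_LENGTH; i++) {
            fprintf(stderr, "%02x", md[i]);
        }
        fprintf(stderr, "\n");
    }
}

Hashing per block would also show whether a mismatch is confined to pc.ram or also appears in ROM/VGA blocks, which can help narrow down which device is dirtying memory without marking it.
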