Patchwork [RFC,09/15] introduce a new monitor command 'dump' to dump guest's memory

Submitter Wen Congyang
Date Jan. 19, 2012, 3:07 a.m.
Message ID <4F1788DD.2040301@cn.fujitsu.com>
Permalink /patch/136748/
State New

Comments

Wen Congyang - Jan. 19, 2012, 3:07 a.m.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 Makefile.target  |    8 +-
 dump.c           |  590 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 dump.h           |    3 +
 hmp-commands.hx  |   16 ++
 hmp.c            |    9 +
 hmp.h            |    1 +
 monitor.c        |    3 +
 qapi-schema.json |   13 ++
 qmp-commands.hx  |   26 +++
 9 files changed, 665 insertions(+), 4 deletions(-)
 create mode 100644 dump.c
Eric Blake - Jan. 19, 2012, 4:32 p.m.
On 01/18/2012 08:07 PM, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  Makefile.target  |    8 +-
>  dump.c           |  590 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  dump.h           |    3 +
>  hmp-commands.hx  |   16 ++
>  hmp.c            |    9 +
>  hmp.h            |    1 +
>  monitor.c        |    3 +
>  qapi-schema.json |   13 ++
>  qmp-commands.hx  |   26 +++
>  9 files changed, 665 insertions(+), 4 deletions(-)
>  create mode 100644 dump.c
> 

> +void qmp_dump(const char *file, Error **errp)
> +{
> +    const char *p;
> +    int fd = -1;
> +    DumpState *s;
> +
> +#if !defined(WIN32)
> +    if (strstart(file, "fd:", &p)) {
> +        fd = qemu_get_fd(p);
> +        if (fd == -1) {
> +            error_set(errp, QERR_FD_NOT_FOUND, p);
> +            return;
> +        }
> +    }
> +#endif

Thanks for implementing fd support off the bat.

> +
> +    if  (strstart(file, "file:", &p)) {
> +        fd = open(p, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY);

Use of O_CREAT requires that you pass a third argument to open()
specifying the mode_t to use.
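
To illustrate the point about O_CREAT, here is a minimal sketch of a three-argument open() call. The helper name open_dump_file and the 0600 mode are assumptions chosen for illustration and are not part of the patch. O_BINARY is left out because it is not a POSIX flag.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: open a dump target for
 * writing. The owner-only mode (0600) is an assumption.
 */
static int open_dump_file(const char *path)
{
    /*
     * When O_CREAT is set, open() reads a third mode_t argument.
     * Leaving it out gives a newly created file indeterminate
     * permission bits.
     */
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
}
```

The process umask can only clear bits from the requested mode, so a file created this way is never readable by group or others.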

> +++ b/hmp-commands.hx
> @@ -828,6 +828,22 @@ new parameters (if specified) once the vm migration finished successfully.
>  ETEXI
>  
>      {
> +        .name       = "dump",
> +        .args_type  = "file:s",
> +        .params     = "file",
> +        .help       = "dump to file",
> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd = hmp_dump,
> +    },

What if I want to dump only a fraction of the memory?  I think you need
optional start and length parameters, to limit how much memory to be
dumped, rather than forcing me to dump all memory at once.
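
One way optional start and length parameters could limit the dump is to intersect each memory block with the requested window before writing it. The sketch below shows only that arithmetic. The names clip_to_range and range_end are hypothetical and nothing here comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: end of [start, start + len), saturating on overflow. */
static uint64_t range_end(uint64_t start, uint64_t len)
{
    return len > UINT64_MAX - start ? UINT64_MAX : start + len;
}

/*
 * Hypothetical helper, not part of the patch: intersect one memory block
 * [block_start, block_start + block_len) with the requested window
 * [req_start, req_start + req_len). Returns false when the two ranges
 * do not overlap; otherwise stores the overlapping part in the outputs.
 */
static bool clip_to_range(uint64_t block_start, uint64_t block_len,
                          uint64_t req_start, uint64_t req_len,
                          uint64_t *out_start, uint64_t *out_len)
{
    uint64_t block_end = range_end(block_start, block_len);
    uint64_t req_end = range_end(req_start, req_len);
    uint64_t lo = block_start > req_start ? block_start : req_start;
    uint64_t hi = block_end < req_end ? block_end : req_end;

    if (lo >= hi) {
        return false;
    }
    *out_start = lo;
    *out_len = hi - lo;
    return true;
}
```

A caller would run this once per block and write only the returned sub-range, skipping blocks for which it returns false.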
Wen Congyang - Jan. 30, 2012, 5:36 a.m.
At 01/20/2012 12:32 AM, Eric Blake Wrote:
> On 01/18/2012 08:07 PM, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>>  Makefile.target  |    8 +-
>>  dump.c           |  590 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  dump.h           |    3 +
>>  hmp-commands.hx  |   16 ++
>>  hmp.c            |    9 +
>>  hmp.h            |    1 +
>>  monitor.c        |    3 +
>>  qapi-schema.json |   13 ++
>>  qmp-commands.hx  |   26 +++
>>  9 files changed, 665 insertions(+), 4 deletions(-)
>>  create mode 100644 dump.c
>>
> 
>> +void qmp_dump(const char *file, Error **errp)
>> +{
>> +    const char *p;
>> +    int fd = -1;
>> +    DumpState *s;
>> +
>> +#if !defined(WIN32)
>> +    if (strstart(file, "fd:", &p)) {
>> +        fd = qemu_get_fd(p);
>> +        if (fd == -1) {
>> +            error_set(errp, QERR_FD_NOT_FOUND, p);
>> +            return;
>> +        }
>> +    }
>> +#endif
> 
> Thanks for implementing fd support off the bat.
> 
>> +
>> +    if  (strstart(file, "file:", &p)) {
>> +        fd = open(p, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY);
> 
> Use of O_CREAT requires that you pass a third argument to open()
> specifying the mode_t to use.

Yes, I forgot it, and will fix it.

> 
>> +++ b/hmp-commands.hx
>> @@ -828,6 +828,22 @@ new parameters (if specified) once the vm migration finished successfully.
>>  ETEXI
>>  
>>      {
>> +        .name       = "dump",
>> +        .args_type  = "file:s",
>> +        .params     = "file",
>> +        .help       = "dump to file",
>> +        .user_print = monitor_user_noop,
>> +        .mhandler.cmd = hmp_dump,
>> +    },
> 
> What if I want to dump only a fraction of the memory?  I think you need
> optional start and length parameters, to limit how much memory to be
> dumped, rather than forcing me to dump all memory at once.
> 

It is OK to support it, but I do not understand why you want it.

The purpose of this command is to dump the memory when the guest has panicked.
We can then use crash/gdb (or another application) to investigate why the guest
panicked, so we should dump the whole memory.

Thanks
Wen Congyang
Eric Blake - Jan. 30, 2012, 5:19 p.m.
On 01/29/2012 10:36 PM, Wen Congyang wrote:
>>> +++ b/hmp-commands.hx
>>> @@ -828,6 +828,22 @@ new parameters (if specified) once the vm migration finished successfully.
>>>  ETEXI
>>>  
>>>      {
>>> +        .name       = "dump",
>>> +        .args_type  = "file:s",
>>> +        .params     = "file",
>>> +        .help       = "dump to file",
>>> +        .user_print = monitor_user_noop,
>>> +        .mhandler.cmd = hmp_dump,
>>> +    },
>>
>> What if I want to dump only a fraction of the memory?  I think you need
>> optional start and length parameters, to limit how much memory to be
>> dumped, rather than forcing me to dump all memory at once.
>>
> 
> It is OK to support it, but I do not understand why you want it.
> 
> The purpose of this command is to dump the memory when the guest has panicked.
> We can then use crash/gdb (or another application) to investigate why the guest
> panicked, so we should dump the whole memory.

That's one purpose, but not the only purpose.  We shouldn't be
artificially constraining things into requiring the entire memory region
in order to use this command.

Libvirt provides virDomainMemoryPeek which currently wraps the 'memsave'
and 'pmemsave' monitor commands, but these commands output raw memory.
Your command is introducing a new memory format into ELF images, and if
'memsave' can already do a subset of memory, it also makes sense for
'dump' to do a subset when creating the ELF image.  That is, if a
management app ever has a reason to access a subset of memory, then
this reason exists whether the subset is raw or ELF formatted when
presented to the management app.

Meanwhile, on the libvirt side, the virDomainMemoryPeek API to
management apps is constrained - it sends the data inline with the
command, rather than on a side channel.  Someday, I'd like to enhance
libvirt to have a dump-to-stream command, and reuse the existing libvirt
ability to stream large amounts of data on side channels, in order to
let management apps directly and atomically query a subset of memory
into a file with the desired formatting, rather than the current
approach of constraining the management app to only query 64k at a time
and to have to manually pause the guest if they need to atomically
inspect more memory.
Wen Congyang - Jan. 31, 2012, 1:39 a.m.
At 01/31/2012 01:19 AM, Eric Blake Wrote:
> On 01/29/2012 10:36 PM, Wen Congyang wrote:
>>>> +++ b/hmp-commands.hx
>>>> @@ -828,6 +828,22 @@ new parameters (if specified) once the vm migration finished successfully.
>>>>  ETEXI
>>>>  
>>>>      {
>>>> +        .name       = "dump",
>>>> +        .args_type  = "file:s",
>>>> +        .params     = "file",
>>>> +        .help       = "dump to file",
>>>> +        .user_print = monitor_user_noop,
>>>> +        .mhandler.cmd = hmp_dump,
>>>> +    },
>>>
>>> What if I want to dump only a fraction of the memory?  I think you need
>>> optional start and length parameters, to limit how much memory to be
>>> dumped, rather than forcing me to dump all memory at once.
>>>
>>
>> It is OK to support it, but I do not understand why you want it.
>>
>> The purpose of this command is to dump the memory when the guest has panicked.
>> We can then use crash/gdb (or another application) to investigate why the guest
>> panicked, so we should dump the whole memory.
> 
> That's one purpose, but not the only purpose.  We shouldn't be
> artificially constraining things into requiring the entire memory region
> in order to use this command.
> 
> Libvirt provides virDomainMemoryPeek which currently wraps the 'memsave'
> and 'pmemsave' monitor commands, but these commands output raw memory.
> Your command is introducing a new memory format into ELF images, and if
> 'memsave' can already do a subset of memory, it also makes sense for
> 'dump' to do a subset when creating the ELF image.  That is, if a
> management app ever has a reason to access a subset of memory, then
> this reason exists whether the subset is raw or ELF formatted when
> presented to the management app.

OK, I understand why you want it and will support it. Please give me a few
days.

Thanks
Wen Congyang

> 
> Meanwhile, on the libvirt side, the virDomainMemoryPeek API to
> management apps is constrained - it sends the data inline with the
> command, rather than on a side channel.  Someday, I'd like to enhance
> libvirt to have a dump-to-stream command, and reuse the existing libvirt
> ability to stream large amounts of data on side channels, in order to
> let management apps directly and atomically query a subset of memory
> into a file with the desired formatting, rather than the current
> approach of constraining the management app to only query 64k at a time
> and to have to manually pause the guest if they need to atomically
> inspect more memory.
>

Patch

diff --git a/Makefile.target b/Makefile.target
index d869550..abfb057 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -110,7 +110,7 @@  $(call set-vpath, $(SRC_PATH)/linux-user:$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR
 QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) -I$(SRC_PATH)/linux-user
 obj-y = main.o syscall.o strace.o mmap.o signal.o thunk.o \
       elfload.o linuxload.o uaccess.o gdbstub.o cpu-uname.o \
-      user-exec.o $(oslib-obj-y)
+      user-exec.o $(oslib-obj-y) dump.o
 
 obj-$(TARGET_HAS_BFLT) += flatload.o
 
@@ -148,7 +148,7 @@  LDFLAGS+=-Wl,-segaddr,__STD_PROG_ZONE,0x1000 -image_base 0x0e000000
 LIBS+=-lmx
 
 obj-y = main.o commpage.o machload.o mmap.o signal.o syscall.o thunk.o \
-        gdbstub.o user-exec.o
+        gdbstub.o user-exec.o dump.o
 
 obj-i386-y += ioport-user.o
 
@@ -170,7 +170,7 @@  $(call set-vpath, $(SRC_PATH)/bsd-user)
 QEMU_CFLAGS+=-I$(SRC_PATH)/bsd-user -I$(SRC_PATH)/bsd-user/$(TARGET_ARCH)
 
 obj-y = main.o bsdload.o elfload.o mmap.o signal.o strace.o syscall.o \
-        gdbstub.o uaccess.o user-exec.o
+        gdbstub.o uaccess.o user-exec.o dump.o
 
 obj-i386-y += ioport-user.o
 
@@ -186,7 +186,7 @@  endif #CONFIG_BSD_USER
 # System emulator target
 ifdef CONFIG_SOFTMMU
 
-obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o
+obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o dump.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
 obj-$(CONFIG_NO_PCI) += pci-stub.o
diff --git a/dump.c b/dump.c
new file mode 100644
index 0000000..2951b8b
--- /dev/null
+++ b/dump.c
@@ -0,0 +1,590 @@ 
+/*
+ * QEMU dump
+ *
+ * Copyright Fujitsu, Corp. 2011
+ *
+ * Authors:
+ *     Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include <unistd.h>
+#include <elf.h>
+#include <sys/procfs.h>
+#include <glib.h>
+#include "cpu.h"
+#include "cpu-all.h"
+#include "targphys.h"
+#include "monitor.h"
+#include "kvm.h"
+#include "dump.h"
+#include "sysemu.h"
+#include "bswap.h"
+#include "memory_mapping.h"
+#include "error.h"
+#include "qmp-commands.h"
+
+#define CPU_CONVERT_TO_TARGET16(val) \
+({ \
+    uint16_t _val = (val); \
+    if (endian == ELFDATA2LSB) { \
+        _val = cpu_to_le16(_val); \
+    } else {\
+        _val = cpu_to_be16(_val); \
+    } \
+    _val; \
+})
+
+#define CPU_CONVERT_TO_TARGET32(val) \
+({ \
+    uint32_t _val = (val); \
+    if (endian == ELFDATA2LSB) { \
+        _val = cpu_to_le32(_val); \
+    } else {\
+        _val = cpu_to_be32(_val); \
+    } \
+    _val; \
+})
+
+#define CPU_CONVERT_TO_TARGET64(val) \
+({ \
+    uint64_t _val = (val); \
+    if (endian == ELFDATA2LSB) { \
+        _val = cpu_to_le64(_val); \
+    } else {\
+        _val = cpu_to_be64(_val); \
+    } \
+    _val; \
+})
+
+enum {
+    DUMP_STATE_ERROR,
+    DUMP_STATE_SETUP,
+    DUMP_STATE_CANCELLED,
+    DUMP_STATE_ACTIVE,
+    DUMP_STATE_COMPLETED,
+};
+
+typedef struct DumpState {
+    ArchDumpInfo dump_info;
+    MemoryMappingList list;
+    int phdr_num;
+    int state;
+    char *error;
+    int fd;
+    target_phys_addr_t memory_offset;
+} DumpState;
+
+static DumpState *dump_get_current(void)
+{
+    static DumpState current_dump = {
+        .state = DUMP_STATE_SETUP,
+    };
+
+    return &current_dump;
+}
+
+static int dump_cleanup(DumpState *s)
+{
+    int ret = 0;
+
+    free_memory_mapping_list(&s->list);
+    if (s->fd != -1) {
+        close(s->fd);
+        s->fd = -1;
+    }
+
+    return ret;
+}
+
+static void dump_error(DumpState *s, const char *reason)
+{
+    s->state = DUMP_STATE_ERROR;
+    s->error = g_strdup(reason);
+    dump_cleanup(s);
+}
+
+static inline int cpuid(CPUState *env)
+{
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_USE_NPTL)
+    return env->host_tid;
+#else
+    return env->cpu_index + 1;
+#endif
+}
+
+static int write_elf64_header(DumpState *s)
+{
+    Elf64_Ehdr elf_header;
+    int ret;
+    int endian = s->dump_info.d_endian;
+
+    memset(&elf_header, 0, sizeof(Elf64_Ehdr));
+    memcpy(&elf_header, ELFMAG, 4);
+    elf_header.e_ident[EI_CLASS] = ELFCLASS64;
+    elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
+    elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+    elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
+    elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
+    elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
+    elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
+    elf_header.e_phoff = CPU_CONVERT_TO_TARGET64(sizeof(Elf64_Ehdr));
+    elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf64_Phdr));
+    elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
+
+    lseek(s->fd, 0, SEEK_SET);
+    ret = write(s->fd, &elf_header, sizeof(elf_header));
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write elf header.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf32_header(DumpState *s)
+{
+    Elf32_Ehdr elf_header;
+    int ret;
+    int endian = s->dump_info.d_endian;
+
+    memset(&elf_header, 0, sizeof(Elf32_Ehdr));
+    memcpy(&elf_header, ELFMAG, 4);
+    elf_header.e_ident[EI_CLASS] = ELFCLASS32;
+    elf_header.e_ident[EI_DATA] = endian;
+    elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+    elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
+    elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
+    elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
+    elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
+    elf_header.e_phoff = CPU_CONVERT_TO_TARGET32(sizeof(Elf32_Ehdr));
+    elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf32_Phdr));
+    elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
+
+    lseek(s->fd, 0, SEEK_SET);
+    ret = write(s->fd, &elf_header, sizeof(elf_header));
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write elf header.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
+                            int phdr_index, target_phys_addr_t offset)
+{
+    Elf64_Phdr phdr;
+    off_t phdr_offset;
+    int ret;
+    int endian = s->dump_info.d_endian;
+
+    memset(&phdr, 0, sizeof(Elf64_Phdr));
+    phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
+    phdr.p_offset = CPU_CONVERT_TO_TARGET64(offset);
+    phdr.p_paddr = CPU_CONVERT_TO_TARGET64(memory_mapping->phys_addr);
+    if (offset == -1) {
+        phdr.p_filesz = 0;
+    } else {
+        phdr.p_filesz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
+    }
+    phdr.p_memsz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
+    phdr.p_vaddr = CPU_CONVERT_TO_TARGET64(memory_mapping->virt_addr);
+
+    phdr_offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*phdr_index;
+    lseek(s->fd, phdr_offset, SEEK_SET);
+    ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write program header table.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
+                            int phdr_index, target_phys_addr_t offset)
+{
+    Elf32_Phdr phdr;
+    off_t phdr_offset;
+    int ret;
+    int endian = s->dump_info.d_endian;
+
+    memset(&phdr, 0, sizeof(Elf32_Phdr));
+    phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
+    phdr.p_offset = CPU_CONVERT_TO_TARGET32(offset);
+    phdr.p_paddr = CPU_CONVERT_TO_TARGET32(memory_mapping->phys_addr);
+    if (offset == -1) {
+        phdr.p_filesz = 0;
+    } else {
+        phdr.p_filesz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
+    }
+    phdr.p_memsz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
+    phdr.p_vaddr = CPU_CONVERT_TO_TARGET32(memory_mapping->virt_addr);
+
+    phdr_offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*phdr_index;
+    lseek(s->fd, phdr_offset, SEEK_SET);
+    ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write program header table.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf64_notes(DumpState *s, int phdr_index,
+                             target_phys_addr_t *offset)
+{
+    CPUState *env;
+    int ret;
+    target_phys_addr_t begin = *offset;
+    Elf64_Phdr phdr;
+    off_t phdr_offset;
+    int id;
+    int endian = s->dump_info.d_endian;
+
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        id = cpuid(env);
+        ret = cpu_write_elf64_note(s->fd, env, id, offset);
+        if (ret < 0) {
+            dump_error(s, "dump: failed to write elf notes.\n");
+            return -1;
+        }
+    }
+
+    memset(&phdr, 0, sizeof(Elf64_Phdr));
+    phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
+    phdr.p_offset = CPU_CONVERT_TO_TARGET64(begin);
+    phdr.p_paddr = 0;
+    phdr.p_filesz = CPU_CONVERT_TO_TARGET64(*offset - begin);
+    phdr.p_memsz = CPU_CONVERT_TO_TARGET64(*offset - begin);
+    phdr.p_vaddr = 0;
+
+    phdr_offset = sizeof(Elf64_Ehdr);
+    lseek(s->fd, phdr_offset, SEEK_SET);
+    ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write program header table.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf32_notes(DumpState *s, int phdr_index,
+                             target_phys_addr_t *offset)
+{
+    CPUState *env;
+    int ret;
+    target_phys_addr_t begin = *offset;
+    Elf32_Phdr phdr;
+    off_t phdr_offset;
+    int id;
+    int endian = s->dump_info.d_endian;
+
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        id = cpuid(env);
+        ret = cpu_write_elf32_note(s->fd, env, id, offset);
+        if (ret < 0) {
+            dump_error(s, "dump: failed to write elf notes.\n");
+            return -1;
+        }
+    }
+
+    memset(&phdr, 0, sizeof(Elf32_Phdr));
+    phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
+    phdr.p_offset = CPU_CONVERT_TO_TARGET32(begin);
+    phdr.p_paddr = 0;
+    phdr.p_filesz = CPU_CONVERT_TO_TARGET32(*offset - begin);
+    phdr.p_memsz = CPU_CONVERT_TO_TARGET32(*offset - begin);
+    phdr.p_vaddr = 0;
+
+    phdr_offset = sizeof(Elf32_Ehdr);
+    lseek(s->fd, phdr_offset, SEEK_SET);
+    ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write program header table.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_data(DumpState *s, void *buf, int length,
+                      target_phys_addr_t *offset)
+{
+    int ret;
+
+    lseek(s->fd, *offset, SEEK_SET);
+    ret = write(s->fd, buf, length);
+    if (ret < 0) {
+        dump_error(s, "dump: failed to save memory.\n");
+        return -1;
+    }
+
+    *offset += length;
+    return 0;
+}
+
+/* write the memory to vmcore. 1 page per I/O. */
+static int write_memory(DumpState *s, RAMBlock *block,
+                        target_phys_addr_t *offset)
+{
+    int i, ret;
+
+    for (i = 0; i < block->length / TARGET_PAGE_SIZE; i++) {
+        ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
+                         TARGET_PAGE_SIZE, offset);
+        if (ret < 0) {
+            return -1;
+        }
+    }
+
+    if ((block->length % TARGET_PAGE_SIZE) != 0) {
+        ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
+                         block->length % TARGET_PAGE_SIZE, offset);
+        if (ret < 0) {
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+/* get the memory's offset in the vmcore */
+static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
+                                     target_phys_addr_t memory_offset)
+{
+    RAMBlock *block;
+    target_phys_addr_t offset = memory_offset;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (phys_addr >= block->offset &&
+            phys_addr < block->offset + block->length) {
+            return phys_addr - block->offset + offset;
+        }
+        offset += block->length;
+    }
+
+    return -1;
+}
+
+static DumpState *dump_init(int fd, Error **errp)
+{
+    CPUState *env;
+    DumpState *s = dump_get_current();
+    int ret;
+
+    vm_stop(RUN_STATE_PAUSED);
+    s->state = DUMP_STATE_SETUP;
+    if (s->error) {
+        g_free(s->error);
+        s->error = NULL;
+    }
+    s->fd = fd;
+
+    /*
+     * get dump info: endian, class and architecture.
+     * If the target architecture is not supported, cpu_get_dump_info() will
+     * return -1.
+     *
+     * if we use kvm, we should synchronize the register before we get dump
+     * info.
+     */
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        cpu_synchronize_state(env);
+    }
+    ret = cpu_get_dump_info(&s->dump_info);
+    if (ret < 0) {
+        error_set(errp, QERR_UNSUPPORTED);
+        return NULL;
+    }
+
+    /* get memory mapping */
+    s->list.num = 0;
+    QTAILQ_INIT(&s->list.head);
+    get_memory_mapping(&s->list);
+
+    /* crash needs extra memory mapping to determine phys_base. */
+    ret = cpu_add_extra_memory_mapping(&s->list);
+    if (ret < 0) {
+        error_set(errp, QERR_UNDEFINED_ERROR);
+        return NULL;
+    }
+
+    /*
+     * calculate phdr_num
+     *
+     * the type of phdr->num is uint16_t, so we should avoid overflow
+     */
+    s->phdr_num = 1; /* PT_NOTE */
+    if (s->list.num > (1 << 16) - 2) {
+        s->phdr_num = (1 << 16) - 1;
+    } else {
+        s->phdr_num += s->list.num;
+    }
+
+    return s;
+}
+
+/* write elf header, PT_NOTE and elf note to vmcore. */
+static int dump_begin(DumpState *s)
+{
+    target_phys_addr_t offset;
+    int ret;
+
+    s->state = DUMP_STATE_ACTIVE;
+
+    /*
+     * the vmcore's format is:
+     *   --------------
+     *   |  elf header |
+     *   --------------
+     *   |  PT_NOTE    |
+     *   --------------
+     *   |  PT_LOAD    |
+     *   --------------
+     *   |  ......     |
+     *   --------------
+     *   |  PT_LOAD    |
+     *   --------------
+     *   |  elf note   |
+     *   --------------
+     *   |  memory     |
+     *   --------------
+     *
+     * we only know where the memory is saved after we write elf note into
+     * vmcore.
+     */
+
+    /* write elf header to vmcore */
+    if (s->dump_info.d_class == ELFCLASS64) {
+        ret = write_elf64_header(s);
+    } else {
+        ret = write_elf32_header(s);
+    }
+    if (ret < 0) {
+        return -1;
+    }
+
+    /* write elf notes to vmcore */
+    if (s->dump_info.d_class == ELFCLASS64) {
+        offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*s->phdr_num;
+        ret = write_elf64_notes(s, 0, &offset);
+    } else {
+        offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*s->phdr_num;
+        ret = write_elf32_notes(s, 0, &offset);
+    }
+
+    if (ret < 0) {
+        return -1;
+    }
+
+    s->memory_offset = offset;
+    return 0;
+}
+
+/* write PT_LOAD to vmcore */
+static int dump_completed(DumpState *s)
+{
+    target_phys_addr_t offset;
+    MemoryMapping *memory_mapping;
+    int phdr_index = 1, ret;
+
+    QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
+        offset = get_offset(memory_mapping->phys_addr, s->memory_offset);
+        if (s->dump_info.d_class == ELFCLASS64) {
+            ret = write_elf64_load(s, memory_mapping, phdr_index++, offset);
+        } else {
+            ret = write_elf32_load(s, memory_mapping, phdr_index++, offset);
+        }
+        if (ret < 0) {
+            return -1;
+        }
+    }
+
+    s->state = DUMP_STATE_COMPLETED;
+    dump_cleanup(s);
+    return 0;
+}
+
+/* write all memory to vmcore */
+static int dump_iterate(DumpState *s)
+{
+    RAMBlock *block;
+    target_phys_addr_t offset = s->memory_offset;
+    int ret;
+
+    /* write all memory to vmcore */
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        ret = write_memory(s, block, &offset);
+        if (ret < 0) {
+            return -1;
+        }
+    }
+
+    return dump_completed(s);
+}
+
+static int create_vmcore(DumpState *s)
+{
+    int ret;
+
+    ret = dump_begin(s);
+    if (ret < 0) {
+        return -1;
+    }
+
+    ret = dump_iterate(s);
+    if (ret < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
+void qmp_dump(const char *file, Error **errp)
+{
+    const char *p;
+    int fd = -1;
+    DumpState *s;
+
+#if !defined(WIN32)
+    if (strstart(file, "fd:", &p)) {
+        fd = qemu_get_fd(p);
+        if (fd == -1) {
+            error_set(errp, QERR_FD_NOT_FOUND, p);
+            return;
+        }
+    }
+#endif
+
+    if  (strstart(file, "file:", &p)) {
+        fd = open(p, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY);
+        if (fd < 0) {
+            error_set(errp, QERR_OPEN_FILE_FAILED, p);
+            return;
+        }
+    }
+
+    if (fd == -1) {
+        error_set(errp, QERR_INVALID_PARAMETER, "file");
+        return;
+    }
+
+    s = dump_init(fd, errp);
+    if (!s) {
+        return;
+    }
+
+    if (create_vmcore(s) < 0) {
+        error_set(errp, QERR_IO_ERROR);
+    }
+
+    return;
+}
diff --git a/dump.h b/dump.h
index a36468b..b413d18 100644
--- a/dump.h
+++ b/dump.h
@@ -1,6 +1,9 @@ 
 #ifndef DUMP_H
 #define DUMP_H
 
+#include "qdict.h"
+#include "error.h"
+
 typedef struct ArchDumpInfo {
     int d_machine;  /* Architecture */
     int d_endian;   /* ELFDATA2LSB or ELFDATA2MSB */
diff --git a/hmp-commands.hx b/hmp-commands.hx
index a586498..c3615e3 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -828,6 +828,22 @@  new parameters (if specified) once the vm migration finished successfully.
 ETEXI
 
     {
+        .name       = "dump",
+        .args_type  = "file:s",
+        .params     = "file",
+        .help       = "dump to file",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd = hmp_dump,
+    },
+
+
+STEXI
+@item dump @var{file}
+@findex dump
+Dump to @var{file}.
+ETEXI
+
+    {
         .name       = "snapshot_blkdev",
         .args_type  = "device:B,snapshot-file:s?,format:s?",
         .params     = "device [new-image-file] [format]",
diff --git a/hmp.c b/hmp.c
index 8a77780..11b4ce6 100644
--- a/hmp.c
+++ b/hmp.c
@@ -681,3 +681,12 @@  void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict)
     int64_t value = qdict_get_int(qdict, "value");
     qmp_migrate_set_speed(value, NULL);
 }
+
+void hmp_dump(Monitor *mon, const QDict *qdict)
+{
+    Error *errp = NULL;
+    const char *file = qdict_get_str(qdict, "file");
+
+    qmp_dump(file, &errp);
+    hmp_handle_error(mon, &errp);
+}
diff --git a/hmp.h b/hmp.h
index 093242d..8d6a5d2 100644
--- a/hmp.h
+++ b/hmp.h
@@ -49,5 +49,6 @@  void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
+void hmp_dump(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/monitor.c b/monitor.c
index ce2bc13..24d4371 100644
--- a/monitor.c
+++ b/monitor.c
@@ -73,6 +73,9 @@ 
 #endif
 #include "hw/lm32_pic.h"
 
+/* for dump */
+#include "dump.h"
+
 //#define DEBUG
 //#define DEBUG_COMPLETION
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 44cf764..84c2c9a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1275,3 +1275,16 @@ 
 { 'command': 'qom-set',
   'data': { 'path': 'str', 'property': 'str', 'value': 'visitor' },
   'gen': 'no' }
+
+##
+# @dump
+#
+# Dump guest's memory to vmcore.
+#
+# @file: the filename or file descriptor of the vmcore.
+#
+# Returns: nothing on success
+#
+# Since: 1.1
+##
+{ 'command': 'dump', 'data': {'file': 'str'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7e3f4b9..fefdae2 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -572,6 +572,32 @@  Example:
 EQMP
 
     {
+        .name       = "dump",
+        .args_type  = "file:s",
+        .params     = "file",
+        .help       = "dump to file",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = qmp_marshal_input_dump,
+    },
+
+SQMP
+dump
+
+
+Dump to file.
+
+Arguments:
+
+- "file": Destination file (json-string)
+
+Example:
+
+-> { "execute": "dump", "arguments": { "file": "fd:dump" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "netdev_add",
         .args_type  = "netdev:O",
         .params     = "[user|tap|socket],id=str[,prop=value][,...]",