
[12/12,v12] introduce a new monitor command 'dump-guest-memory' to dump guest's memory

Message ID 4F8E6D7C.3020107@cn.fujitsu.com
State New

Commit Message

Wen Congyang April 18, 2012, 7:30 a.m. UTC
The command's usage:
   dump-guest-memory [-p] protocol [begin] [length]
The supported protocol can be file or fd:
1. file: the protocol starts with "file:", and the following string is
   the file's path.
2. fd: the protocol starts with "fd:", and the following string is the
   fd's name.
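
For illustration, a hypothetical HMP session (the path and numbers are
examples only, not fixed values):

    (qemu) dump-guest-memory file:/tmp/vmcore
    (qemu) dump-guest-memory -p file:/tmp/vmcore 0x100000 4096

The second command dumps 4096 bytes starting at guest physical address
0x100000, and walks the guest's page tables because of -p.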

Note:
  1. If you want to use gdb to process the core, please specify the -p
     option. The reason the -p option is not the default is:
       a. a guest machine in a catastrophic state can have corrupted
          memory, which we cannot trust.
       b. The guest machine can be in real-mode even if paging is enabled.
          For example: the guest machine uses ACPI to sleep, and ACPI sleep
          state goes in real-mode.
  2. This command doesn't support an fd that is associated with a pipe,
     socket, or FIFO (lseek will fail with such an fd).
  3. If you don't want to dump all of the guest's memory, please specify
     the starting physical address and the length.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 Makefile.target  |    2 +-
 dump.c           |  828 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 elf.h            |    5 +
 hmp-commands.hx  |   28 ++
 hmp.c            |   22 ++
 hmp.h            |    1 +
 memory_mapping.c |   27 ++
 memory_mapping.h |    3 +
 qapi-schema.json |   34 +++
 qmp-commands.hx  |   38 +++
 10 files changed, 987 insertions(+), 1 deletions(-)
 create mode 100644 dump.c

Comments

Daniel P. Berrangé April 18, 2012, 9:23 a.m. UTC | #1
On Wed, Apr 18, 2012 at 03:30:04PM +0800, Wen Congyang wrote:
> The command's usage:
>    dump-guest-memory [-p] protocol [begin] [length]
> The supported protocol can be file or fd:
> 1. file: the protocol starts with "file:", and the following string is
>    the file's path.
> 2. fd: the protocol starts with "fd:", and the following string is the
>    fd's name.
> 
> Note:
>   1. If you want to use gdb to process the core, please specify the -p
>      option. The reason the -p option is not the default is:
>        a. a guest machine in a catastrophic state can have corrupted
>           memory, which we cannot trust.

What is the behaviour of this command if we set '-p', and the guest
is corrupt?

If '-p' is not safe when the guest is in a corrupted state, then I'd
argue that '-p' is not safe *ever*, since a malicious guest could
setup bad page maps at any time it likes and we've no way of knowing
this from the host.

>        b. The guest machine can be in real-mode even if paging is enabled.
>           For example: the guest machine uses ACPI to sleep, and ACPI sleep
>           state goes in real-mode.
>   2. This command doesn't support an fd that is associated with a pipe,
>      socket, or FIFO (lseek will fail with such an fd).

How hard would it be to lift that restriction? When libvirt does save to
file, or core dump these days, we tend to pass a pipe FD to QEMU, which
is connected to libvirt's I/O helper process. The reason for this is that
it lets us turn on O_DIRECT for the dumped data, which has proved to be
quite an important feature for oVirt. So I'd rather like to keep that
ability with the new dump command.

Regards,
Daniel
Wen Congyang April 18, 2012, 9:37 a.m. UTC | #2
At 04/18/2012 05:23 PM, Daniel P. Berrange Wrote:
> On Wed, Apr 18, 2012 at 03:30:04PM +0800, Wen Congyang wrote:
>> The command's usage:
>>    dump-guest-memory [-p] protocol [begin] [length]
>> The supported protocol can be file or fd:
>> 1. file: the protocol starts with "file:", and the following string is
>>    the file's path.
>> 2. fd: the protocol starts with "fd:", and the following string is the
>>    fd's name.
>>
>> Note:
>>   1. If you want to use gdb to process the core, please specify the -p
>>      option. The reason the -p option is not the default is:
>>        a. a guest machine in a catastrophic state can have corrupted
>>           memory, which we cannot trust.
> 
> What is the behaviour of this command if we set '-p', and the guest
> is corrupt?
> 
> If '-p' is not safe when the guest is in a corrupted state, then I'd
> argue that '-p' is not safe *ever*, since a malicious guest could
> setup bad page maps at any time it likes and we've no way of knowing
> this from the host.

Yes, '-p' is not safe, which is why the default value is false.
Some users may want to use gdb to process the vmcore, though, so '-p' is
still useful for them.
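
For instance, a vmcore produced with -p can then be opened like any
kernel core (paths are illustrative):

    $ gdb /path/to/guest/vmlinux /tmp/vmcore
    (gdb) bt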

> 
>>        b. The guest machine can be in real-mode even if paging is enabled.
>>           For example: the guest machine uses ACPI to sleep, and ACPI sleep
>>           state goes in real-mode.
>>   2. This command doesn't support an fd that is associated with a pipe,
>>      socket, or FIFO (lseek will fail with such an fd).
> 
> How hard would it be to lift that restriction? When libvirt does save to
> file, or core dump these days, we tend to pass a pipe FD to QEMU, which
> is connected to libvirt's I/O helper process. The reason for this is that
> it lets us turn on O_DIRECT for the dumped data, which has proved to be
> quite an important feature for oVirt. So I'd rather like to keep that
> ability with the new dump command.

The reason is that we will use lseek(fd, ...). If you pass a pipe FD
to QEMU, lseek() will fail. I do not know the note size before we
write it to the core, so I use lseek()...
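
A tiny standalone illustration of the restriction (my own example, not
code from the patch):

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];

        if (pipe(fds) < 0) {
            perror("pipe");
            return 1;
        }

        /* seeking on a pipe always fails with ESPIPE */
        if (lseek(fds[1], 0, SEEK_SET) < 0 && errno == ESPIPE) {
            printf("lseek on a pipe fails with ESPIPE\n");
        }
        return 0;
    }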

Thanks
Wen Congyang

> 
> Regards,
> Daniel
Wen Congyang April 18, 2012, 9:44 a.m. UTC | #3
At 04/18/2012 05:23 PM, Daniel P. Berrange Wrote:
> On Wed, Apr 18, 2012 at 03:30:04PM +0800, Wen Congyang wrote:
>> The command's usage:
>>    dump-guest-memory [-p] protocol [begin] [length]
>> The supported protocol can be file or fd:
>> 1. file: the protocol starts with "file:", and the following string is
>>    the file's path.
>> 2. fd: the protocol starts with "fd:", and the following string is the
>>    fd's name.
>>
>> Note:
>>   1. If you want to use gdb to process the core, please specify the -p
>>      option. The reason the -p option is not the default is:
>>        a. a guest machine in a catastrophic state can have corrupted
>>           memory, which we cannot trust.
> 
> What is the behaviour of this command if we set '-p', and the guest
> is corrupt?
> 
> If '-p' is not safe when the guest is in a corrupted state, then I'd
> argue that '-p' is not safe *ever*, since a malicious guest could
> setup bad page maps at any time it likes and we've no way of knowing
> this from the host.
> 
>>        b. The guest machine can be in real-mode even if paging is enabled.
>>           For example: the guest machine uses ACPI to sleep, and ACPI sleep
>>           state goes in real-mode.
>>   2. This command doesn't support an fd that is associated with a pipe,
>>      socket, or FIFO (lseek will fail with such an fd).
> 
> How hard would it be to lift that restriction? When libvirt does save to
> file, or core dump these days, we tend to pass a pipe FD to QEMU, which
> is connected to libvirt's I/O helper process. The reason for this is that
> it lets us turn on O_DIRECT for the dumped data, which has proved to be
> quite an important feature for oVirt. So I'd rather like to keep that
> ability with the new dump command.

Here is libvirt's code in doCoreDump():

    /* Core dumps usually imply last-ditch analysis efforts are
     * desired, so we intentionally do not unlink even if a file was
     * created.  */
    if ((fd = qemuOpenFile(driver, path,
                           O_CREAT | O_TRUNC | O_WRONLY | directFlag,
                           NULL, NULL)) < 0)


In my test, the fd returned from qemuOpenFile is not a pipe FD, so
O_DIRECT is still in effect.
Am I missing something?

Thanks
Wen Congyang
> 
> Regards,
> Daniel
Daniel P. Berrangé April 18, 2012, 9:48 a.m. UTC | #4
On Wed, Apr 18, 2012 at 05:44:36PM +0800, Wen Congyang wrote:
> At 04/18/2012 05:23 PM, Daniel P. Berrange Wrote:
> > On Wed, Apr 18, 2012 at 03:30:04PM +0800, Wen Congyang wrote:
> >> The command's usage:
> >>    dump-guest-memory [-p] protocol [begin] [length]
> >> The supported protocol can be file or fd:
> >> 1. file: the protocol starts with "file:", and the following string is
> >>    the file's path.
> >> 2. fd: the protocol starts with "fd:", and the following string is the
> >>    fd's name.
> >>
> >> Note:
> >>   1. If you want to use gdb to process the core, please specify the -p
> >>      option. The reason the -p option is not the default is:
> >>        a. a guest machine in a catastrophic state can have corrupted
> >>           memory, which we cannot trust.
> > 
> > What is the behaviour of this command if we set '-p', and the guest
> > is corrupt?
> > 
> > If '-p' is not safe when the guest is in a corrupted state, then I'd
> > argue that '-p' is not safe *ever*, since a malicious guest could
> > setup bad page maps at any time it likes and we've no way of knowing
> > this from the host.
> > 
> >>        b. The guest machine can be in real-mode even if paging is enabled.
> >>           For example: the guest machine uses ACPI to sleep, and ACPI sleep
> >>           state goes in real-mode.
> >>   2. This command doesn't support an fd that is associated with a pipe,
> >>      socket, or FIFO (lseek will fail with such an fd).
> > 
> > How hard would it be to lift that restriction? When libvirt does save to
> > file, or core dump these days, we tend to pass a pipe FD to QEMU, which
> > is connected to libvirt's I/O helper process. The reason for this is that
> > it lets us turn on O_DIRECT for the dumped data, which has proved to be
> > quite an important feature for oVirt. So I'd rather like to keep that
> > ability with the new dump command.
> 
> Here is the libvirt's code in doCoreDump():
> 
>     /* Core dumps usually imply last-ditch analysis efforts are
>      * desired, so we intentionally do not unlink even if a file was
>      * created.  */
>     if ((fd = qemuOpenFile(driver, path,
>                            O_CREAT | O_TRUNC | O_WRONLY | directFlag,
>                            NULL, NULL)) < 0)
> 
> 
> In my test, the fd returns from qemuOpenFile is not a pipe FD. So
> O_DIRECT still exists.
> Do I miss something?

If we turned on O_DIRECT on the 'fd' that we pass to QEMU, then QEMU
would need to be carefully written to use aligned memory buffers and to
issue writes of the correct size. It almost certainly does not do this
correctly.

Thus libvirt uses an I/O helper process: the helper holds the FD for the
actual file while QEMU just gets a pipe FD, so only the I/O helper needs
to be careful about O_DIRECT alignment issues.
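
To make the alignment rules concrete, here is a minimal sketch of a
direct write (an illustration only, not libvirt's actual iohelper code;
it assumes page alignment satisfies the filesystem):

    #define _GNU_SOURCE /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define ALIGNMENT 4096

    static int write_direct(const char *path, const void *data, size_t len)
    {
        /* O_DIRECT requires the buffer address and the write size to be
         * suitably aligned, so pad the length up to a full block */
        size_t padded = (len + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
        void *buf = NULL;
        int ret = -1;
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);

        if (fd < 0) {
            return -1;
        }
        if (posix_memalign(&buf, ALIGNMENT, padded) != 0) {
            close(fd);
            return -1;
        }
        memset(buf, 0, padded);
        memcpy(buf, data, len);
        if (write(fd, buf, padded) == (ssize_t)padded) {
            ret = 0;
        }
        free(buf);
        close(fd);
        return ret;
    }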


Daniel
Eric Blake April 18, 2012, 3:35 p.m. UTC | #5
On 04/18/2012 03:37 AM, Wen Congyang wrote:
>>>   2. This command doesn't support an fd that is associated with a pipe,
>>>      socket, or FIFO (lseek will fail with such an fd).
>>
>> How hard would it be to lift that restriction? When libvirt does save to
>> file, or core dump these days, we tend to pass a pipe FD to QEMU, which
>> is connected to libvirt's I/O helper process. The reason for this is that
>> it lets us turn on O_DIRECT for the dumped data, which has proved to be
>> quite an important feature for oVirt. So I'd rather like to keep that
>> ability with the new dump command.
> 
> The reason is that we will use lseek(fd, ...). If you pass a pipe FD
> to QEMU, lseek() will fail. I do not know the note size before we
> write it to the core, so I use lseek()...

The question, then, is why we don't know the note size in advance, and
what it would take to be able to compute it in advance so that the
output file can be written in a single pass without needing lseek().
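
For reference, an ELF note's size is computable up front: a fixed
Elf64_Nhdr followed by the name and the descriptor, each padded to
4-byte alignment. A sketch (my illustration, not the follow-up patch):

    #include <elf.h>
    #include <stddef.h>
    #include <string.h>

    /* bytes one note occupies inside the PT_NOTE segment */
    static size_t elf64_note_size(const char *name, size_t desc_size)
    {
        size_t name_size = strlen(name) + 1;     /* include trailing NUL */

        return sizeof(Elf64_Nhdr)
               + ((name_size + 3) & ~(size_t)3)  /* name, 4-byte aligned */
               + ((desc_size + 3) & ~(size_t)3); /* desc, 4-byte aligned */
    }

Summing this over the per-CPU notes before writing anything would let the
headers be emitted first and the memory streamed sequentially.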
Wen Congyang April 18, 2012, 3:55 p.m. UTC | #6
On 04/18/2012 11:35 PM, Eric Blake wrote:
> On 04/18/2012 03:37 AM, Wen Congyang wrote:
>>>>    2. This command doesn't support an fd that is associated with a pipe,
>>>>       socket, or FIFO (lseek will fail with such an fd).
>>>
>>> How hard would it be to lift that restriction? When libvirt does save to
>>> file, or core dump these days, we tend to pass a pipe FD to QEMU, which
>>> is connected to libvirt's I/O helper process. The reason for this is that
>>> it lets us turn on O_DIRECT for the dumped data, which has proved to be
>>> quite an important feature for oVirt. So I'd rather like to keep that
>>> ability with the new dump command.
>>
>> The reason is that we will use lseek(fd, ...). If you pass a pipe FD
>> to QEMU, lseek() will fail. I do not know the note size before we
>> write it to the core, so I use lseek()...
>
> The question, then, is why we don't know the note size in advance, and
> what it would take to be able to compute it in advance so that the
> output file can be written in a single pass without needing lseek().
>

I have fixed this problem in the latest patchset. Please refer to:
http://lists.nongnu.org/archive/html/qemu-devel/2012-04/msg02493.html

Thanks
Wen Congyang

Patch

diff --git a/Makefile.target b/Makefile.target
index dc35266..e810b52 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -228,7 +228,7 @@  obj-$(CONFIG_NO_KVM) += kvm-stub.o
 obj-$(CONFIG_VGA) += vga.o
 obj-y += memory.o savevm.o
 obj-y += memory_mapping.o
-obj-$(CONFIG_HAVE_CORE_DUMP) += arch_dump.o
+obj-$(CONFIG_HAVE_CORE_DUMP) += arch_dump.o dump.o
 LIBS+=-lz
 
 obj-i386-$(CONFIG_KVM) += hyperv.o
diff --git a/dump.c b/dump.c
new file mode 100644
index 0000000..87fb0dd
--- /dev/null
+++ b/dump.c
@@ -0,0 +1,828 @@ 
+/*
+ * QEMU dump
+ *
+ * Copyright Fujitsu, Corp. 2011, 2012
+ *
+ * Authors:
+ *     Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include <unistd.h>
+#include "elf.h"
+#include <sys/procfs.h>
+#include <glib.h>
+#include "cpu.h"
+#include "cpu-all.h"
+#include "targphys.h"
+#include "monitor.h"
+#include "kvm.h"
+#include "dump.h"
+#include "sysemu.h"
+#include "bswap.h"
+#include "memory_mapping.h"
+#include "error.h"
+#include "qmp-commands.h"
+#include "gdbstub.h"
+
+static uint16_t cpu_convert_to_target16(uint16_t val, int endian)
+{
+    if (endian == ELFDATA2LSB) {
+        val = cpu_to_le16(val);
+    } else {
+        val = cpu_to_be16(val);
+    }
+
+    return val;
+}
+
+static uint32_t cpu_convert_to_target32(uint32_t val, int endian)
+{
+    if (endian == ELFDATA2LSB) {
+        val = cpu_to_le32(val);
+    } else {
+        val = cpu_to_be32(val);
+    }
+
+    return val;
+}
+
+static uint64_t cpu_convert_to_target64(uint64_t val, int endian)
+{
+    if (endian == ELFDATA2LSB) {
+        val = cpu_to_le64(val);
+    } else {
+        val = cpu_to_be64(val);
+    }
+
+    return val;
+}
+
+typedef struct DumpState {
+    ArchDumpInfo dump_info;
+    MemoryMappingList list;
+    uint16_t phdr_num;
+    uint32_t sh_info;
+    bool have_section;
+    bool resume;
+    target_phys_addr_t memory_offset;
+    int fd;
+
+    RAMBlock *block;
+    ram_addr_t start;
+    bool has_filter;
+    int64_t begin;
+    int64_t length;
+    Error **errp;
+} DumpState;
+
+static int dump_cleanup(DumpState *s)
+{
+    int ret = 0;
+
+    memory_mapping_list_free(&s->list);
+    if (s->fd != -1) {
+        close(s->fd);
+    }
+    if (s->resume) {
+        vm_start();
+    }
+
+    return ret;
+}
+
+static void dump_error(DumpState *s, const char *reason)
+{
+    dump_cleanup(s);
+}
+
+static int fd_write_vmcore(target_phys_addr_t offset, void *buf, size_t size,
+                           void *opaque)
+{
+    DumpState *s = opaque;
+    int fd = s->fd;
+    off_t ret;
+    size_t written_size;
+
+    while (1) {
+        ret = lseek(fd, offset, SEEK_SET);
+        if (ret < 0) {
+            if (errno == ESPIPE) {
+                error_set(s->errp, QERR_PIPE_OR_SOCKET_FD);
+                return -1;
+            }
+
+            if (errno != EINTR && errno != EAGAIN) {
+                return -1;
+            }
+            continue;
+        }
+        break;
+    }
+
+    /* The fd may be passed in by the user, and it can be non-blocking */
+    while (size) {
+        written_size = qemu_write_full(fd, buf, size);
+        if (written_size != size && errno != EAGAIN) {
+            return -1;
+        }
+
+        buf += written_size;
+        size -= written_size;
+    }
+
+    return 0;
+}
+
+static int write_elf64_header(DumpState *s)
+{
+    Elf64_Ehdr elf_header;
+    int ret;
+    int endian = s->dump_info.d_endian;
+
+    memset(&elf_header, 0, sizeof(Elf64_Ehdr));
+    memcpy(&elf_header, ELFMAG, SELFMAG);
+    elf_header.e_ident[EI_CLASS] = ELFCLASS64;
+    elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
+    elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+    elf_header.e_type = cpu_convert_to_target16(ET_CORE, endian);
+    elf_header.e_machine = cpu_convert_to_target16(s->dump_info.d_machine,
+                                                   endian);
+    elf_header.e_version = cpu_convert_to_target32(EV_CURRENT, endian);
+    elf_header.e_ehsize = cpu_convert_to_target16(sizeof(elf_header), endian);
+    elf_header.e_phoff = cpu_convert_to_target64(sizeof(Elf64_Ehdr), endian);
+    elf_header.e_phentsize = cpu_convert_to_target16(sizeof(Elf64_Phdr),
+                                                     endian);
+    elf_header.e_phnum = cpu_convert_to_target16(s->phdr_num, endian);
+    if (s->have_section) {
+        uint64_t shoff = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr) * s->sh_info;
+
+        elf_header.e_shoff = cpu_convert_to_target64(shoff, endian);
+        elf_header.e_shentsize = cpu_convert_to_target16(sizeof(Elf64_Shdr),
+                                                         endian);
+        elf_header.e_shnum = cpu_convert_to_target16(1, endian);
+    }
+
+    ret = fd_write_vmcore(0, &elf_header, sizeof(elf_header), s);
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write elf header.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf32_header(DumpState *s)
+{
+    Elf32_Ehdr elf_header;
+    int ret;
+    int endian = s->dump_info.d_endian;
+
+    memset(&elf_header, 0, sizeof(Elf32_Ehdr));
+    memcpy(&elf_header, ELFMAG, SELFMAG);
+    elf_header.e_ident[EI_CLASS] = ELFCLASS32;
+    elf_header.e_ident[EI_DATA] = endian;
+    elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+    elf_header.e_type = cpu_convert_to_target16(ET_CORE, endian);
+    elf_header.e_machine = cpu_convert_to_target16(s->dump_info.d_machine,
+                                                   endian);
+    elf_header.e_version = cpu_convert_to_target32(EV_CURRENT, endian);
+    elf_header.e_ehsize = cpu_convert_to_target16(sizeof(elf_header), endian);
+    elf_header.e_phoff = cpu_convert_to_target32(sizeof(Elf32_Ehdr), endian);
+    elf_header.e_phentsize = cpu_convert_to_target16(sizeof(Elf32_Phdr),
+                                                     endian);
+    elf_header.e_phnum = cpu_convert_to_target16(s->phdr_num, endian);
+    if (s->have_section) {
+        uint32_t shoff = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr) * s->sh_info;
+
+        elf_header.e_shoff = cpu_convert_to_target32(shoff, endian);
+        elf_header.e_shentsize = cpu_convert_to_target16(sizeof(Elf32_Shdr),
+                                                         endian);
+        elf_header.e_shnum = cpu_convert_to_target16(1, endian);
+    }
+
+    ret = fd_write_vmcore(0, &elf_header, sizeof(elf_header), s);
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write elf header.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
+                            int phdr_index, target_phys_addr_t offset)
+{
+    Elf64_Phdr phdr;
+    off_t phdr_offset;
+    int ret;
+    int endian = s->dump_info.d_endian;
+
+    memset(&phdr, 0, sizeof(Elf64_Phdr));
+    phdr.p_type = cpu_convert_to_target32(PT_LOAD, endian);
+    phdr.p_offset = cpu_convert_to_target64(offset, endian);
+    phdr.p_paddr = cpu_convert_to_target64(memory_mapping->phys_addr, endian);
+    if (offset == -1) {
+        /* When the memory is not stored into vmcore, offset will be -1 */
+        phdr.p_filesz = 0;
+    } else {
+        phdr.p_filesz = cpu_convert_to_target64(memory_mapping->length, endian);
+    }
+    phdr.p_memsz = cpu_convert_to_target64(memory_mapping->length, endian);
+    phdr.p_vaddr = cpu_convert_to_target64(memory_mapping->virt_addr, endian);
+
+    phdr_offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*phdr_index;
+    ret = fd_write_vmcore(phdr_offset, &phdr, sizeof(Elf64_Phdr), s);
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write program header table.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
+                            int phdr_index, target_phys_addr_t offset)
+{
+    Elf32_Phdr phdr;
+    off_t phdr_offset;
+    int ret;
+    int endian = s->dump_info.d_endian;
+
+    memset(&phdr, 0, sizeof(Elf32_Phdr));
+    phdr.p_type = cpu_convert_to_target32(PT_LOAD, endian);
+    phdr.p_offset = cpu_convert_to_target32(offset, endian);
+    phdr.p_paddr = cpu_convert_to_target32(memory_mapping->phys_addr, endian);
+    if (offset == -1) {
+        /* When the memory is not stored into vmcore, offset will be -1 */
+        phdr.p_filesz = 0;
+    } else {
+        phdr.p_filesz = cpu_convert_to_target32(memory_mapping->length, endian);
+    }
+    phdr.p_memsz = cpu_convert_to_target32(memory_mapping->length, endian);
+    phdr.p_vaddr = cpu_convert_to_target32(memory_mapping->virt_addr, endian);
+
+    phdr_offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*phdr_index;
+    ret = fd_write_vmcore(phdr_offset, &phdr, sizeof(Elf32_Phdr), s);
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write program header table.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf64_notes(DumpState *s, int phdr_index,
+                             target_phys_addr_t *offset)
+{
+    CPUArchState *env;
+    int ret;
+    target_phys_addr_t begin = *offset;
+    Elf64_Phdr phdr;
+    off_t phdr_offset;
+    int id;
+    int endian = s->dump_info.d_endian;
+
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        id = cpu_index(env);
+        ret = cpu_write_elf64_note(fd_write_vmcore, env, id, offset, s);
+        if (ret < 0) {
+            dump_error(s, "dump: failed to write elf notes.\n");
+            return -1;
+        }
+    }
+
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        ret = cpu_write_elf64_qemunote(fd_write_vmcore, env, offset, s);
+        if (ret < 0) {
+            dump_error(s, "dump: failed to write CPU status.\n");
+            return -1;
+        }
+    }
+
+    memset(&phdr, 0, sizeof(Elf64_Phdr));
+    phdr.p_type = cpu_convert_to_target32(PT_NOTE, endian);
+    phdr.p_offset = cpu_convert_to_target64(begin, endian);
+    phdr.p_paddr = 0;
+    phdr.p_filesz = cpu_convert_to_target64(*offset - begin, endian);
+    phdr.p_memsz = cpu_convert_to_target64(*offset - begin, endian);
+    phdr.p_vaddr = 0;
+
+    phdr_offset = sizeof(Elf64_Ehdr);
+    ret = fd_write_vmcore(phdr_offset, &phdr, sizeof(Elf64_Phdr), s);
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write program header table.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf32_notes(DumpState *s, int phdr_index,
+                             target_phys_addr_t *offset)
+{
+    CPUArchState *env;
+    int ret;
+    target_phys_addr_t begin = *offset;
+    Elf32_Phdr phdr;
+    off_t phdr_offset;
+    int id;
+    int endian = s->dump_info.d_endian;
+
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        id = cpu_index(env);
+        ret = cpu_write_elf32_note(fd_write_vmcore, env, id, offset, s);
+        if (ret < 0) {
+            dump_error(s, "dump: failed to write elf notes.\n");
+            return -1;
+        }
+    }
+
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        ret = cpu_write_elf32_qemunote(fd_write_vmcore, env, offset, s);
+        if (ret < 0) {
+            dump_error(s, "dump: failed to write CPU status.\n");
+            return -1;
+        }
+    }
+
+    memset(&phdr, 0, sizeof(Elf32_Phdr));
+    phdr.p_type = cpu_convert_to_target32(PT_NOTE, endian);
+    phdr.p_offset = cpu_convert_to_target32(begin, endian);
+    phdr.p_paddr = 0;
+    phdr.p_filesz = cpu_convert_to_target32(*offset - begin, endian);
+    phdr.p_memsz = cpu_convert_to_target32(*offset - begin, endian);
+    phdr.p_vaddr = 0;
+
+    phdr_offset = sizeof(Elf32_Ehdr);
+    ret = fd_write_vmcore(phdr_offset, &phdr, sizeof(Elf32_Phdr), s);
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write program header table.\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int write_elf_section(DumpState *s, target_phys_addr_t *offset, int type)
+{
+    Elf32_Shdr shdr32;
+    Elf64_Shdr shdr64;
+    int endian = s->dump_info.d_endian;
+    int shdr_size;
+    void *shdr;
+    int ret;
+
+    if (type == 0) {
+        shdr_size = sizeof(Elf32_Shdr);
+        memset(&shdr32, 0, shdr_size);
+        shdr32.sh_info = cpu_convert_to_target32(s->sh_info, endian);
+        shdr = &shdr32;
+    } else {
+        shdr_size = sizeof(Elf64_Shdr);
+        memset(&shdr64, 0, shdr_size);
+        shdr64.sh_info = cpu_convert_to_target32(s->sh_info, endian);
+        shdr = &shdr64;
+    }
+
+    ret = fd_write_vmcore(*offset, shdr, shdr_size, s);
+    if (ret < 0) {
+        dump_error(s, "dump: failed to write section header table.\n");
+        return -1;
+    }
+
+    *offset += shdr_size;
+    return 0;
+}
+
+static int write_data(DumpState *s, void *buf, int length,
+                      target_phys_addr_t *offset)
+{
+    int ret;
+
+    ret = fd_write_vmcore(*offset, buf, length, s);
+    if (ret < 0) {
+        dump_error(s, "dump: failed to save memory.\n");
+        return -1;
+    }
+
+    *offset += length;
+    return 0;
+}
+
+/* write the memory to vmcore. 1 page per I/O. */
+static int write_memory(DumpState *s, RAMBlock *block, ram_addr_t start,
+                        target_phys_addr_t *offset, int64_t size)
+{
+    int64_t i;
+    int ret;
+
+    for (i = 0; i < size / TARGET_PAGE_SIZE; i++) {
+        ret = write_data(s, block->host + start + i * TARGET_PAGE_SIZE,
+                         TARGET_PAGE_SIZE, offset);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    if ((size % TARGET_PAGE_SIZE) != 0) {
+        ret = write_data(s, block->host + start + i * TARGET_PAGE_SIZE,
+                         size % TARGET_PAGE_SIZE, offset);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/* get the memory's offset in the vmcore */
+static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
+                                     DumpState *s)
+{
+    RAMBlock *block;
+    target_phys_addr_t offset = s->memory_offset;
+    int64_t size_in_block, start;
+
+    if (s->has_filter) {
+        if (phys_addr < s->begin || phys_addr >= s->begin + s->length) {
+            return -1;
+        }
+    }
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (s->has_filter) {
+            if (block->offset >= s->begin + s->length ||
+                block->offset + block->length <= s->begin) {
+                /* This block is out of the range */
+                continue;
+            }
+
+            if (s->begin <= block->offset) {
+                start = block->offset;
+            } else {
+                start = s->begin;
+            }
+
+            size_in_block = block->length - (start - block->offset);
+            if (s->begin + s->length < block->offset + block->length) {
+                size_in_block -= block->offset + block->length -
+                                 (s->begin + s->length);
+            }
+        } else {
+            start = block->offset;
+            size_in_block = block->length;
+        }
+
+        if (phys_addr >= start && phys_addr < start + size_in_block) {
+            return phys_addr - start + offset;
+        }
+
+        offset += size_in_block;
+    }
+
+    return -1;
+}
+
+/* write elf header, PT_NOTE and elf note to vmcore. */
+static int dump_begin(DumpState *s)
+{
+    target_phys_addr_t offset;
+    int ret;
+
+    /*
+     * the vmcore's format is:
+     *   --------------
+     *   |  elf header |
+     *   --------------
+     *   |  PT_NOTE    |
+     *   --------------
+     *   |  PT_LOAD    |
+     *   --------------
+     *   |  ......     |
+     *   --------------
+     *   |  PT_LOAD    |
+     *   --------------
+     *   |  sec_hdr    |
+     *   --------------
+     *   |  elf note   |
+     *   --------------
+     *   |  memory     |
+     *   --------------
+     *
+     * we only know where the memory is saved after we write elf note into
+     * vmcore.
+     */
+
+    /* write elf header to vmcore */
+    if (s->dump_info.d_class == ELFCLASS64) {
+        ret = write_elf64_header(s);
+    } else {
+        ret = write_elf32_header(s);
+    }
+    if (ret < 0) {
+        return -1;
+    }
+
+    /* write elf section and notes to vmcore */
+    if (s->dump_info.d_class == ELFCLASS64) {
+        if (s->have_section) {
+            offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*s->sh_info;
+            if (write_elf_section(s, &offset, 1) < 0) {
+                return -1;
+            }
+        } else {
+            offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*s->phdr_num;
+        }
+        ret = write_elf64_notes(s, 0, &offset);
+    } else {
+        if (s->have_section) {
+            offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*s->sh_info;
+            if (write_elf_section(s, &offset, 0) < 0) {
+                return -1;
+            }
+        } else {
+            offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*s->phdr_num;
+        }
+        ret = write_elf32_notes(s, 0, &offset);
+    }
+
+    if (ret < 0) {
+        return -1;
+    }
+
+    s->memory_offset = offset;
+    return 0;
+}
+
+/* write PT_LOAD to vmcore */
+static int dump_completed(DumpState *s)
+{
+    target_phys_addr_t offset;
+    MemoryMapping *memory_mapping;
+    int phdr_index = 1, ret;
+
+    QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
+        offset = get_offset(memory_mapping->phys_addr, s);
+        if (s->dump_info.d_class == ELFCLASS64) {
+            ret = write_elf64_load(s, memory_mapping, phdr_index++, offset);
+        } else {
+            ret = write_elf32_load(s, memory_mapping, phdr_index++, offset);
+        }
+        if (ret < 0) {
+            return -1;
+        }
+    }
+
+    dump_cleanup(s);
+    return 0;
+}
+
+static int get_next_block(DumpState *s, RAMBlock *block)
+{
+    while (1) {
+        block = QLIST_NEXT(block, next);
+        if (!block) {
+            /* no more blocks */
+            return 1;
+        }
+
+        s->start = 0;
+        s->block = block;
+        if (s->has_filter) {
+            if (block->offset >= s->begin + s->length ||
+                block->offset + block->length <= s->begin) {
+                /* This block is out of the range */
+                continue;
+            }
+
+            if (s->begin > block->offset) {
+                s->start = s->begin - block->offset;
+            }
+        }
+
+        return 0;
+    }
+}
+
+/* write all memory to vmcore */
+static int dump_iterate(DumpState *s)
+{
+    RAMBlock *block;
+    target_phys_addr_t offset = s->memory_offset;
+    int64_t size;
+    int ret;
+
+    while (1) {
+        block = s->block;
+
+        size = block->length;
+        if (s->has_filter) {
+            size -= s->start;
+            if (s->begin + s->length < block->offset + block->length) {
+                size -= block->offset + block->length - (s->begin + s->length);
+            }
+        }
+        ret = write_memory(s, block, s->start, &offset, size);
+        if (ret == -1) {
+            return ret;
+        }
+
+        ret = get_next_block(s, block);
+        if (ret == 1) {
+            dump_completed(s);
+            return 0;
+        }
+    }
+}
+
+static int create_vmcore(DumpState *s)
+{
+    int ret;
+
+    ret = dump_begin(s);
+    if (ret < 0) {
+        return -1;
+    }
+
+    ret = dump_iterate(s);
+    if (ret < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static ram_addr_t get_start_block(DumpState *s)
+{
+    RAMBlock *block;
+
+    if (!s->has_filter) {
+        s->block = QLIST_FIRST(&ram_list.blocks);
+        return 0;
+    }
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (block->offset >= s->begin + s->length ||
+            block->offset + block->length <= s->begin) {
+            /* This block is out of the range */
+            continue;
+        }
+
+        s->block = block;
+        if (s->begin > block->offset) {
+            s->start = s->begin - block->offset;
+        } else {
+            s->start = 0;
+        }
+        return s->start;
+    }
+
+    return -1;
+}
+
+static int dump_init(DumpState *s, int fd, bool paging, bool has_filter,
+                     int64_t begin, int64_t length, Error **errp)
+{
+    CPUArchState *env;
+    int ret;
+
+    if (runstate_is_running()) {
+        vm_stop(RUN_STATE_SAVE_VM);
+        s->resume = true;
+    } else {
+        s->resume = false;
+    }
+
+    s->errp = errp;
+    s->fd = fd;
+    s->has_filter = has_filter;
+    s->begin = begin;
+    s->length = length;
+    s->start = get_start_block(s);
+    if (s->start == -1) {
+        error_set(errp, QERR_INVALID_PARAMETER, "begin");
+        goto cleanup;
+    }
+
+    /*
+     * get dump info: endian, class and architecture.
+     * If the target architecture is not supported, cpu_get_dump_info() will
+     * return -1.
+     *
+     * if we use kvm, we should synchronize the registers before we get dump
+     * info.
+     */
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        cpu_synchronize_state(env);
+    }
+
+    ret = cpu_get_dump_info(&s->dump_info);
+    if (ret < 0) {
+        error_set(errp, QERR_UNSUPPORTED);
+        goto cleanup;
+    }
+
+    /* get memory mapping */
+    memory_mapping_list_init(&s->list);
+    if (paging) {
+        qemu_get_guest_memory_mapping(&s->list);
+    } else {
+        qemu_get_guest_simple_memory_mapping(&s->list);
+    }
+
+    if (s->has_filter) {
+        memory_mapping_filter(&s->list, s->begin, s->length);
+    }
+
+    /*
+     * calculate phdr_num
+     *
+     * the type of ehdr->e_phnum is uint16_t, so we should avoid overflow
+     */
+    s->phdr_num = 1; /* PT_NOTE */
+    if (s->list.num < UINT16_MAX - 2) {
+        s->phdr_num += s->list.num;
+        s->have_section = false;
+    } else {
+        s->have_section = true;
+        s->phdr_num = PN_XNUM;
+        s->sh_info = 1; /* PT_NOTE */
+
+        /* the type of shdr->sh_info is uint32_t, so we should avoid overflow */
+        if (s->list.num <= UINT32_MAX - 1) {
+            s->sh_info += s->list.num;
+        } else {
+            s->sh_info = UINT32_MAX;
+        }
+    }
+
+    return 0;
+
+cleanup:
+    if (s->resume) {
+        vm_start();
+    }
+
+    return -1;
+}
+
+void qmp_dump_guest_memory(bool paging, const char *file, bool has_begin,
+                           int64_t begin, bool has_length, int64_t length,
+                           Error **errp)
+{
+    const char *p;
+    int fd = -1;
+    DumpState *s;
+    int ret;
+
+    if (has_begin && !has_length) {
+        error_set(errp, QERR_MISSING_PARAMETER, "length");
+        return;
+    }
+    if (!has_begin && has_length) {
+        error_set(errp, QERR_MISSING_PARAMETER, "begin");
+        return;
+    }
+
+#if !defined(WIN32)
+    if (strstart(file, "fd:", &p)) {
+        fd = monitor_get_fd(cur_mon, p);
+        if (fd == -1) {
+            error_set(errp, QERR_FD_NOT_FOUND, p);
+            return;
+        }
+    }
+#endif
+
+    if  (strstart(file, "file:", &p)) {
+        fd = qemu_open(p, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IRUSR);
+        if (fd < 0) {
+            error_set(errp, QERR_OPEN_FILE_FAILED, p);
+            return;
+        }
+    }
+
+    if (fd == -1) {
+        error_set(errp, QERR_INVALID_PARAMETER, "protocol");
+        return;
+    }
+
+    s = g_malloc(sizeof(DumpState));
+
+    ret = dump_init(s, fd, paging, has_begin, begin, length, errp);
+    if (ret < 0) {
+        g_free(s);
+        return;
+    }
+
+    if (create_vmcore(s) < 0 && !error_is_set(s->errp)) {
+        error_set(errp, QERR_IO_ERROR);
+    }
+
+    g_free(s);
+}
diff --git a/elf.h b/elf.h
index e1422b8..9c9acfa 100644
--- a/elf.h
+++ b/elf.h
@@ -1037,6 +1037,11 @@  typedef struct elf64_sym {
 
 #define EI_NIDENT	16
 
+/* Special value for e_phnum.  This indicates that the real number of
+   program headers is too large to fit into e_phnum.  Instead the real
+   value is in the field sh_info of section 0.  */
+#define PN_XNUM         0xffff
+
 typedef struct elf32_hdr{
   unsigned char	e_ident[EI_NIDENT];
   Elf32_Half	e_type;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index a6f5a84..7f82b41 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -878,6 +878,34 @@  server will ask the spice/vnc client to automatically reconnect using the
 new parameters (if specified) once the vm migration finished successfully.
 ETEXI
 
+#if defined(CONFIG_HAVE_CORE_DUMP)
+    {
+        .name       = "dump-guest-memory",
+        .args_type  = "paging:-p,protocol:s,begin:i?,length:i?",
+        .params     = "[-p] protocol [begin] [length]",
+        .help       = "dump guest memory to file"
+                      "\n\t\t\t begin(optional): the starting physical address"
+                      "\n\t\t\t length(optional): the memory size, in bytes",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd = hmp_dump_guest_memory,
+    },
+
+
+STEXI
+@item dump-guest-memory [-p] @var{protocol} @var{begin} @var{length}
+@findex dump-guest-memory
+Dump guest memory to @var{protocol}. The file can be processed with crash or
+gdb.
+  protocol: destination file (starts with "file:") or destination file
+            descriptor (starts with "fd:")
+    paging: do paging to get the guest's memory mapping
+     begin: the starting physical address. It's optional, and should be
+            specified together with length.
+    length: the memory size, in bytes. It's optional, and should be
+            specified together with begin.
+ETEXI
+#endif
+
     {
         .name       = "snapshot_blkdev",
         .args_type  = "reuse:-n,device:B,snapshot-file:s?,format:s?",
diff --git a/hmp.c b/hmp.c
index f3e5163..eba1a2f 100644
--- a/hmp.c
+++ b/hmp.c
@@ -943,3 +943,25 @@  void hmp_device_del(Monitor *mon, const QDict *qdict)
     qmp_device_del(id, &err);
     hmp_handle_error(mon, &err);
 }
+
+void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict)
+{
+    Error *errp = NULL;
+    int paging = qdict_get_try_bool(qdict, "paging", 0);
+    const char *file = qdict_get_str(qdict, "protocol");
+    bool has_begin = qdict_haskey(qdict, "begin");
+    bool has_length = qdict_haskey(qdict, "length");
+    int64_t begin = 0;
+    int64_t length = 0;
+
+    if (has_begin) {
+        begin = qdict_get_int(qdict, "begin");
+    }
+    if (has_length) {
+        length = qdict_get_int(qdict, "length");
+    }
+
+    qmp_dump_guest_memory(paging, file, has_begin, begin, has_length, length,
+                          &errp);
+    hmp_handle_error(mon, &errp);
+}
diff --git a/hmp.h b/hmp.h
index 443b812..5cf3241 100644
--- a/hmp.h
+++ b/hmp.h
@@ -61,5 +61,6 @@  void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate(Monitor *mon, const QDict *qdict);
 void hmp_device_del(Monitor *mon, const QDict *qdict);
+void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/memory_mapping.c b/memory_mapping.c
index adb1595..8810bb0 100644
--- a/memory_mapping.c
+++ b/memory_mapping.c
@@ -220,3 +220,30 @@  void qemu_get_guest_simple_memory_mapping(MemoryMappingList *list)
         create_new_memory_mapping(list, block->offset, 0, block->length);
     }
 }
+
+void memory_mapping_filter(MemoryMappingList *list, int64_t begin,
+                           int64_t length)
+{
+    MemoryMapping *cur, *next;
+
+    QTAILQ_FOREACH_SAFE(cur, &list->head, next, next) {
+        if (cur->phys_addr >= begin + length ||
+            cur->phys_addr + cur->length <= begin) {
+            QTAILQ_REMOVE(&list->head, cur, next);
+            list->num--;
+            continue;
+        }
+
+        if (cur->phys_addr < begin) {
+            cur->length -= begin - cur->phys_addr;
+            if (cur->virt_addr) {
+                cur->virt_addr += begin - cur->phys_addr;
+            }
+            cur->phys_addr = begin;
+        }
+
+        if (cur->phys_addr + cur->length > begin + length) {
+            cur->length -= cur->phys_addr + cur->length - begin - length;
+        }
+    }
+}
diff --git a/memory_mapping.h b/memory_mapping.h
index a583e44..b795678 100644
--- a/memory_mapping.h
+++ b/memory_mapping.h
@@ -62,4 +62,7 @@  static inline int qemu_get_guest_memory_mapping(MemoryMappingList *list)
 /* get guest's memory mapping without do paging(virtual address is 0). */
 void qemu_get_guest_simple_memory_mapping(MemoryMappingList *list);
 
+void memory_mapping_filter(MemoryMappingList *list, int64_t begin,
+                           int64_t length);
+
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index ace55f3..6429304 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1721,3 +1721,37 @@ 
 # Since: 0.14.0
 ##
 { 'command': 'device_del', 'data': {'id': 'str'} }
+
+##
+# @dump-guest-memory
+#
+# Dump guest's memory to vmcore.
+#
+# @paging: if true, do paging to get the guest's memory mapping. The
+# default value of @paging is false. If you want to use gdb to process the
+# core, please set @paging to true. The reasons why the default value is
+# false:
+#   1. a guest machine in a catastrophic state can have corrupted memory,
+#      which we cannot trust.
+#   2. the guest machine can be in real-mode even if paging is enabled.
+#      For example: the guest machine uses ACPI to sleep, and ACPI sleep
+#      state goes in real-mode.
+# @protocol: the filename or file descriptor of the vmcore. The supported
+# protocols are file and fd:
+#   1. file: the protocol starts with "file:", and the following string is
+#      the file's path.
+#   2. fd: the protocol starts with "fd:", and the following string is the
+#      fd's name. This command doesn't support an fd that is associated
+#      with a pipe, socket, or FIFO (lseek will fail with such an fd).
+# @begin: #optional if specified, the starting physical address.
+# @length: #optional if specified, the memory size, in bytes. If you don't
+# want to dump all of the guest's memory, please specify @begin and
+# @length together.
+#
+# Returns: nothing on success
+#
+# Since: 1.1
+##
+{ 'command': 'dump-guest-memory',
+  'data': { 'paging': 'bool', 'protocol': 'str', '*begin': 'int',
+            '*length': 'int' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index c09ee85..eec70e9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -603,6 +603,44 @@  Example:
 
 EQMP
 
+#if defined(CONFIG_HAVE_CORE_DUMP)
+    {
+        .name       = "dump-guest-memory",
+        .args_type  = "paging:b,protocol:s,begin:i?,length:i?",
+        .params     = "[-p] protocol [begin] [length]",
+        .help       = "dump guest memory to file",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = qmp_marshal_input_dump_guest_memory,
+    },
+
+SQMP
+dump-guest-memory
+-----------------
+
+Dump guest memory to file. The file can be processed with crash or gdb.
+
+Arguments:
+
+- "paging": do paging to get the guest's memory mapping (json-bool)
+- "protocol": destination file (starts with "file:") or destination file
+              descriptor (starts with "fd:") (json-string)
+- "begin": the starting physical address. It's optional, and should be
+           specified together with length (json-int)
+- "length": the memory size, in bytes. It's optional, and should be
+            specified together with begin (json-int)
+
+Example:
+
+-> { "execute": "dump-guest-memory", "arguments": { "paging": false, "protocol": "fd:dump" } }
+<- { "return": {} }
+
+Notes:
+
+(1) All boolean arguments default to false
+
+EQMP
+#endif
+
     {
         .name       = "netdev_add",
         .args_type  = "netdev:O",