Patchwork pc: map pc ram from user-specified file

Submitter Peter Feiner
Date Nov. 25, 2011, 7:02 p.m.
Message ID <CADiFPYJSjeXf_zO756vUcn6UMELQHYeW9kmCZ+ynDJstSuFHyQ@mail.gmail.com>
Permalink /patch/127753/
State New

Comments

Peter Feiner - Nov. 25, 2011, 7:02 p.m.
Enables providing a backing file for the PC's ram. The file is specified by the
new -pcram-file option. The file is mmap'd shared, so the RAMBlock that it backs
doesn't need to be saved by vm_save / migration.

Signed-off-by: Peter Feiner <peter@gridcentric.com>

---

We have found this small feature very useful for experimenting with memory
migration techniques. By exposing PC memory through a simple interface (i.e.,
the filesystem), we can implement various memory migration techniques
independently of QEMU. For example, one can map a VM's ram to a file being
served over a network, thus implementing on-demand fetching.

In the future, RAMBlocks could be mmap'd privately to implement memory sharing.

Note that unlike the existing -mem-path option, which specifies a (hugetlbfs)
directory in which files for all RAMBlocks are to be created, -pcram-file
specifies a single file to be mapped for the "pc.ram" RAMBlock only.
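
The mapping step described above can be sketched outside QEMU roughly as
follows. This is a minimal sketch, not the patch's code: map_ram_file is a
hypothetical helper name, it returns NULL where the patch aborts, and it
closes the descriptor where the patch keeps it in the RAMBlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): map `path` as shared,
 * read-write memory of exactly `size` bytes. Returns NULL on any failure.
 */
static void *map_ram_file(const char *path, size_t size)
{
    struct stat st;
    void *host;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd == -1) {
        return NULL;
    }
    /* The file must already be exactly as large as the RAM it backs. */
    if (fstat(fd, &st) != 0 || st.st_size < 0 ||
        (size_t)st.st_size != size) {
        close(fd);
        return NULL;
    }
    /* MAP_SHARED: stores to the mapping are visible through the file. */
    host = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return host == MAP_FAILED ? NULL : host;
}
```

The backing file has to be sized in advance (for example with ftruncate) to
match the guest RAM size, since a size mismatch is treated as an error.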

 arch_init.c     |   11 +++++++
 cpu-all.h       |   12 ++++++++
 exec.c          |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-common.h   |    2 +
 qemu-options.hx |   13 ++++++++
 vl.c            |    3 ++
 6 files changed, 124 insertions(+), 0 deletions(-)
Juan Quintela - Nov. 27, 2011, 10:43 p.m.
Peter Feiner <peter@gridcentric.ca> wrote:
> Enables providing a backing file for the PC's ram. The file is specified by the
> new -pcram-file option. The file is mmap'd shared, so the RAMBlock that it backs
> doesn't need to be saved by vm_save / migration.
>
> Signed-off-by: Peter Feiner <peter@gridcentric.com>

Hi

Do you have any performance numbers for this?  And examples of how you
are using it?

> +#ifdef __linux__
> +    new_block->host = mem_file_ram_alloc(new_block, size);
> +    if (new_block->host) {
> +        assert(!host);
> +    } else
> +#endif
>      if (host) {

This test is (at least) suspicious.  Shouldn't we check first whether host
is not NULL? (Not that I fully understand that part.)

Thanks, Juan.
Peter Feiner - Dec. 1, 2011, 4:46 p.m.
> Hi

Hi Juan,

Sorry for taking so long to reply -- my email filters apparently
aren't set up correctly!

> Do you have any performance numbers for this?  And examples of how you
> are using it?

The performance should depend only on what backs the mapping, plus
any indirect overhead caused by MMU synchronization. If the file is a
disk file that gets flushed from the buffer cache frequently, then
performance will be abysmal. However, if the file is guaranteed to be
in-core (e.g., it lives on a ramfs mount), then KVM will hit the same
kernel code paths as it does for memory backed by an anonymous VMA
that isn't swapped out.

Our principal use case is implementing VM migration techniques. We're
particularly interested in memory migration. Right now, QEMU
implements VM migration, but QEMU's migration mechanism is inflexible
with respect to memory. That is, the entire contents of the VM's RAM
are copied from the migration source to the migration destination
before the destination VM can run. With the current VM migration
implementation, it's impossible, for instance, to allow the
destination VM to start immediately and lazily fetch its RAM. With the
-pcram-file option, we could specify a file for the RAM that's backed
by a filesystem that fetches pages on demand over the network.

>
>> +#ifdef __linux__
>> +    new_block->host = mem_file_ram_alloc(new_block, size);
>> +    if (new_block->host) {
>> +        assert(!host);
>> +    } else
>> +#endif
>>      if (host) {
>
> This test is (at least) suspicious.  Shouldn't we check first whether host
> is not NULL? (Not that I fully understand that part.)

Here's what I'm really testing:

(host == NULL) or (there's no ram file for this RAMBlock)

Here's my rationale:

If there's a ram file for this RAMBlock, then the user of QEMU expects
the RAMBlock to be backed by some file. Presumably the VM wouldn't run
correctly otherwise (e.g., in the case of migration). However, if qemu
passed host!=NULL into qemu_ram_alloc_from_ptr, then it expects the
RAMBlock to be backed by something else; if the RAMBlock were backed
by something other than the passed in host memory, then the VM
presumably wouldn't work properly in this case either. Hence it's an
error for host to be non-NULL and there to be a ram file for this
RAMBlock, which is indicated by mem_file_ram_alloc returning non-NULL.
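
As a rough standalone model of that rule (not the patch's code: the enum and
function names are hypothetical, and the conflict is returned as a value so
it can be inspected, whereas the patch asserts):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, used only to model the rule described above. */
enum backing {
    BACKING_FILE,     /* a RAM file was configured for this RAMBlock */
    BACKING_CALLER,   /* the caller passed in host memory */
    BACKING_DEFAULT,  /* neither: fall through to normal allocation */
    BACKING_CONFLICT  /* both: the case the patch treats as an error */
};

static enum backing choose_backing(const void *file_host,
                                   const void *caller_host)
{
    if (file_host != NULL) {
        return caller_host == NULL ? BACKING_FILE : BACKING_CONFLICT;
    }
    return caller_host != NULL ? BACKING_CALLER : BACKING_DEFAULT;
}

/* Mirrors the patch's behavior: a conflict is a programming error. */
static enum backing choose_backing_or_die(const void *file_host,
                                          const void *caller_host)
{
    enum backing b = choose_backing(file_host, caller_host);

    assert(b != BACKING_CONFLICT);
    return b;
}
```

Here file_host stands for the result of mem_file_ram_alloc and caller_host
for the host argument of qemu_ram_alloc_from_ptr.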

It's up to the caller of add_memory_file to know whether the RAMBlock
named by idstr is normally allocated by qemu_ram_alloc_from_ptr with a
non-NULL host. That is why the exposed command-line option is
"-pcram-file file" instead of something like
"-memory-for-arbitrary-ram-block idstr=x,file".

I hope this clears some things up!
Peter
Peter Feiner - Dec. 8, 2011, 10:24 p.m.
>> Do you have any performance numbers for this?  And examples of how you
>> are using it?

> Our principal use case is implementing VM migration techniques.

There are other uses of a RAM file interface that I can imagine:

- debugging, e.g., inspecting the memory of a VM after it has crashed
- security research, e.g., extracting passwords from a running VM
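
For the debugging case, an external tool could read the RAM file along these
lines. This is a minimal sketch under assumptions: peek_ram_file is a
hypothetical helper, and offset is an offset into the file (i.e., into the
"pc.ram" RAMBlock), which need not equal a guest-physical address.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `len` bytes at `offset` of the RAM file into
 * `out`. Returns 0 on success, -1 on failure or an out-of-range request.
 */
static int peek_ram_file(const char *path, size_t offset,
                         void *out, size_t len)
{
    struct stat st;
    void *base;
    size_t size;
    int fd;

    assert(path != NULL && out != NULL);

    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size = (size_t)st.st_size;
    if (offset > size || len > size - offset) {
        close(fd);
        return -1;
    }
    /* A read-only shared mapping sees the same pages the VM writes to. */
    base = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (base == MAP_FAILED) {
        return -1;
    }
    memcpy(out, (const char *)base + offset, len);
    munmap(base, size);
    return 0;
}
```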
Peter Feiner - Dec. 19, 2011, 6:26 p.m.
Hi,

Is there any interest in this feature?

BTW, as far as I can tell, on qemu-devel I'm not supposed to re-post
the patch or post a v2 if there haven't been any specific requests for
changes to v1. Please let me know if you'd like me to submit a new
patch!

Thanks,
Peter Feiner
Anthony Liguori - Dec. 19, 2011, 7:15 p.m.
On 12/19/2011 12:26 PM, Peter Feiner wrote:
> Hi,
>
> Is there any interest in this feature?
>
> BTW, as far as I can tell, on qemu-devel I'm not supposed to re-post
> the patch or post a v2 if there haven't been any specific requests for
> changes to v1. Please let me know if you'd like me to submit a new
> patch!

I still don't understand what the use-case is other than "we use this to 
implement RAM migration outside of QEMU" which is not something I'm terribly 
interested in.  I'd prefer you submit patches to improve RAM migration within QEMU.

Regards,

Anthony Liguori

>
> Thanks,
> Peter Feiner
>

Patch

diff --git a/arch_init.c b/arch_init.c
index a411fdf..96e8a28 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -122,6 +122,14 @@  static int ram_save_block(QEMUFile *f)
     if (!block)
         block = QLIST_FIRST(&ram_list.blocks);

+    while (block->do_not_save) {
+        last_block = block;
+        block = QLIST_NEXT(block, next);
+        if (!block) {
+            return 0;
+        }
+    }
+
     current_addr = block->offset + offset;

     do {
@@ -185,6 +193,9 @@  static ram_addr_t ram_save_remaining(void)

     QLIST_FOREACH(block, &ram_list.blocks, next) {
         ram_addr_t addr;
+        if (block->do_not_save) {
+            continue;
+        }
         for (addr = block->offset; addr < block->offset + block->length;
              addr += TARGET_PAGE_SIZE) {
             if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG)) {
diff --git a/cpu-all.h b/cpu-all.h
index 5f47ab8..a78f38c 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -482,6 +482,7 @@  typedef struct RAMBlock {
     uint32_t flags;
     char idstr[256];
     QLIST_ENTRY(RAMBlock) next;
+    int do_not_save;
 #if defined(__linux__) && !defined(TARGET_S390X)
     int fd;
 #endif
@@ -493,6 +494,17 @@  typedef struct RAMList {
 } RAMList;
 extern RAMList ram_list;

+typedef struct MemFile {
+    const char *idstr;
+    const char *path;
+    QLIST_ENTRY(MemFile) next;
+} MemFile;
+
+typedef struct MemFileList {
+    QLIST_HEAD(files, MemFile) files;
+} MemFileList;
+extern MemFileList mem_file_list;
+
 extern const char *mem_path;
 extern int mem_prealloc;

diff --git a/exec.c b/exec.c
index 6b92198..9a1cbca 100644
--- a/exec.c
+++ b/exec.c
@@ -117,6 +117,8 @@  static MemoryRegion *system_io;

 #endif

+MemFileList mem_file_list = { .files = QLIST_HEAD_INITIALIZER(mem_file_list) };
+
 CPUState *first_cpu;
 /* current CPU in the current thread. It is only valid inside
    cpu_exec() */
@@ -2774,6 +2776,59 @@  void qemu_flush_coalesced_mmio_buffer(void)
         kvm_flush_coalesced_mmio_buffer();
 }

+#ifdef __linux__
+static void *mem_file_ram_alloc(RAMBlock *block,
+                                ram_addr_t memory)
+{
+    void *host;
+    MemFile *mf;
+    struct stat buf;
+    int ret;
+
+    QLIST_FOREACH(mf, &mem_file_list.files, next) {
+        if (strcmp(mf->idstr, block->idstr)) {
+            continue;
+        }
+
+        if (kvm_enabled() && !kvm_has_sync_mmu()) {
+            fprintf(stderr, "host lacks kvm mmu notifiers, "
+                            "MemFile unsupported, abort!\n");
+            abort();
+        }
+
+        block->fd = open(mf->path, O_RDWR);
+        if (block->fd == -1) {
+            fprintf(stderr, "Could not open %s for RAMBlock %s, abort!\n",
+                    mf->path, mf->idstr);
+            abort();
+        }
+        ret = fstat(block->fd, &buf);
+        if (ret != 0) {
+            fprintf(stderr, "Could not stat %s for RAMBlock %s, abort!\n",
+                    mf->path, mf->idstr);
+            abort();
+        }
+        if (buf.st_size != memory) {
+            fprintf(stderr,
+                    "File %s has size %luB. RAMBlock %s expects %luB. Abort!\n",
+                    mf->path, buf.st_size, block->idstr, memory);
+            abort();
+        }
+
+        host = mmap(NULL, memory, PROT_READ | PROT_WRITE, MAP_SHARED,
+                    block->fd, 0);
+        if (host == MAP_FAILED) {
+            fprintf(stderr, "Failed to mmap %s for RAMBlock %s, abort!\n",
+                    mf->path, mf->idstr);
+            abort();
+        }
+        block->do_not_save = 1;
+        return host;
+    }
+    return NULL;
+}
+#endif
+
 #if defined(__linux__) && !defined(TARGET_S390X)

 #include <sys/vfs.h>
@@ -2914,6 +2969,28 @@  static ram_addr_t last_ram_offset(void)
     return last;
 }

+void add_memory_file(const char *idstr, const char *path)
+{
+#ifndef __linux__
+    fprintf(stderr, "MemFile only supported on Linux, abort!\n");
+    abort();
+#else
+    MemFile *mf;
+
+    QLIST_FOREACH(mf, &mem_file_list.files, next) {
+        if (!strcmp(mf->idstr, idstr)) {
+            fprintf(stderr, "MemFile for \"%s\" already specified, abort!\n",
+                    idstr);
+            abort();
+        }
+    }
+    mf = g_malloc0(sizeof(*mf));
+    mf->idstr = idstr;
+    mf->path = path;
+    QLIST_INSERT_HEAD(&mem_file_list.files, mf, next);
+#endif
+}
+
 ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                                    ram_addr_t size, void *host)
 {
@@ -2940,6 +3017,12 @@  ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
     }

     new_block->offset = find_ram_offset(size);
+#ifdef __linux__
+    new_block->host = mem_file_ram_alloc(new_block, size);
+    if (new_block->host) {
+        assert(!host);
+    } else
+#endif
     if (host) {
         new_block->host = host;
         new_block->flags |= RAM_PREALLOC_MASK;
diff --git a/qemu-common.h b/qemu-common.h
index 2ce47aa..41adbac 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -306,6 +306,8 @@  char *os_find_datadir(const char *argv0);
 void os_parse_cmd_args(int index, const char *optarg);
 void os_pidfile_error(void);

+void add_memory_file(const char *idstr, const char *path);
+
 /* Convert a byte between binary and BCD.  */
 static inline uint8_t to_bcd(uint8_t val)
 {
diff --git a/qemu-options.hx b/qemu-options.hx
index 681eaf1..25b7c38 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -387,6 +387,19 @@  Preallocate memory when using -mem-path.
 ETEXI
 #endif

+#ifdef __linux__
+DEF("pcram-file", HAS_ARG, QEMU_OPTION_pcram_file,
+    "-pcram-file FILE  provide backing storage for PC RAM\n", QEMU_ARCH_I386)
+STEXI
+@item -pcram-file @var{path}
+Populate guest PC RAM with memory mapped file @var{path}. All changes to guest
+ram are reflected in the file (i.e., it is a @code{MAP_SHARED} mapping).
+
+PC RAM is neither migrated nor saved.
+ETEXI
+#endif
+
+
 DEF("k", HAS_ARG, QEMU_OPTION_k,
     "-k language     use keyboard layout (for example 'fr' for French)\n",
     QEMU_ARCH_ALL)
diff --git a/vl.c b/vl.c
index f5afed4..2d28797 100644
--- a/vl.c
+++ b/vl.c
@@ -2549,6 +2549,9 @@  int main(int argc, char **argv, char **envp)
                 ram_size = value;
                 break;
             }
+            case QEMU_OPTION_pcram_file:
+                add_memory_file("pc.ram", optarg);
+                break;
             case QEMU_OPTION_mempath:
                 mem_path = optarg;
                 break;