[RFC] File-backed memory maps

Message ID 1253275896.11717.239.camel@localhost.localdomain
State Superseded

Commit Message

Nathan Baum Sept. 18, 2009, 12:11 p.m. UTC
Hello,

This patch makes QEMU optionally use file-backed mmaps for the
ram_alloc'ed memory from hw/pc.c and hw/vga.c. I've only tested on
Linux, but I haven't knowingly used Linux-specific features.

It means that

  qemu ... -sharedmaps prefix:vga,bios

makes two files named "prefix.vga" and "prefix.bios" which contain the
respective memory regions. The maps I've named are 640k, below_4g,
above_4g, vga, bios and option_rom.
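A minimal sketch of how a separate process might attach to one of these
files. It is not part of the patch: map_region_readonly() is a
hypothetical helper, and the path is whatever QEMU actually created.
Note that the patch builds the file name with "%s%s" from the prefix
and the map name, so a separator such as "." presumably has to be part
of the prefix itself.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not in the patch): map a region file created via
 * -sharedmaps read-only and report its size through size_out.
 * Returns NULL on any failure. */
static const void *map_region_readonly(const char *path, size_t *size_out)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    /* MAP_SHARED so that writes made by the other process are visible. */
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;
    *size_out = (size_t)st.st_size;
    return p;
}
```

The caller is expected to munmap() the region when it is done with it.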

I'm mainly interested in the idea of moving the VNC server into its own
process. It would listen for connections as usual and then send
framebuffer updates from the file. Doing that also requires a
side-channel for communicating graphics mode updates and peripheral
input between QEMU and the VNC server. (Something like "-mouse
<char-dev-spec> -keyboard <char-dev-spec>", perhaps?)

This would make it (relatively) easy for people to create other
interfaces to a VM, like RDP, or an MPEG stream.

To be honest, I'm not sure how useful the mappings other than vga could
be.

I also considered mmapping a framebuffer in a known format (e.g. RGBA)
and swapping/reformatting the VGA buffer into that periodically. It
would probably be another display type, and could make it even easier
for other processes. OTOH, it would introduce additional latency which
could be an issue for some?

But mainly I didn't do that because I don't know QEMU's innards well
enough to do that yet. :-)

Signed-off-by: Nathan Baum <nathan@parenthephobia.org.uk>

Comments

Jamie Lokier Sept. 18, 2009, 1:59 p.m. UTC | #1
Nathan Baum wrote:
> makes two files named "prefix.vga" and "prefix.bios" which contain the
> respective memory regions. The maps I've named are 640k, below_4g,
> above_4g, vga, bios and option_rom.
> 
> I'm mainly interested in the idea of moving the VNC server into its own
> process. It would listen for connections as usual and then send
> framebuffer updates from the file. Doing that also requires a
> side-channel for communicating graphics mode updates and peripheral
> input between QEMU and the VNC server. (Something like "-mouse
> <char-dev-spec> -keyboard <char-dev-spec>", perhaps?)

Are there any cache coherency issues?

On architectures other than x86, sometimes data written by one process
is not visible to another process mapping the same file, until the
writing process flushes its cache for those pages.  Whether that's
necessary depends on the address that pages are mapped to.  Afaik,
normally Linux chooses an address where this issue is avoided, but if
you specify it with MAP_FIXED (or whatever KVM does to map pages),
then there's cache coherency to deal with.

-- Jamie
Nathan Baum Sept. 18, 2009, 3:05 p.m. UTC | #2
On Fri, 2009-09-18 at 14:59 +0100, Jamie Lokier wrote:
> Nathan Baum wrote:
> > makes two files named "prefix.vga" and "prefix.bios" which contain the
> > respective memory regions. The maps I've named are 640k, below_4g,
> > above_4g, vga, bios and option_rom.
> > 
> > I'm mainly interested in the idea of moving the VNC server into its own
> > process. It would listen for connections as usual and then send
> > framebuffer updates from the file. Doing that also requires a
> > side-channel for communicating graphics mode updates and peripheral
> > input between QEMU and the VNC server. (Something like "-mouse
> > <char-dev-spec> -keyboard <char-dev-spec>", perhaps?)
> 
> Are there any cache coherency issues?

Hmm. I expect so.

I'm not sure what the consequences of this are, or whether cache
decoherence itself is the major issue. Even if cache coherence is
guaranteed, the contents of VRAM can change whilst the "display server"
is processing it, likely with undesirable results, unless it were double
buffered.

If, upon a dpy_update, VRAM was blitted to "prefix.vga" which we then
msync(), a display server could be sure of always seeing its copy of the
VRAM in a consistent state. It seems to me that might as well be
implemented as the new display type I mentioned later in my post, rather
than a double-buffered version of this more general memory sharing
feature (who wants to be blitting 4GB of RAM around anyway?). I assume
that with that arrangement, cache decoherence wouldn't be an issue on
any platform.

I'm not sure that would leave this patch with much to do. :-)

> On architectures other than x86, sometimes data written by one process
> is not visible to another process mapping the same file, until the
> writing process flushes its cache for those pages.  Whether that's
> necessary depends on the address that pages are mapped to.  Afaik,
> normally Linux chooses an address where this issue is avoided, but if
> you specify it with MAP_FIXED (or whatever KVM does to map pages),
> then there's cache coherency to deal with.

> -- Jamie
Anthony Liguori Sept. 18, 2009, 3:15 p.m. UTC | #3
Nathan Baum wrote:
> I'm mainly interested in the idea of moving the VNC server into its own
> process. It would listen for connections as usual and then send
> framebuffer updates from the file. Doing that also requires a
> side-channel for communicating graphics mode updates and peripheral
> input between QEMU and the VNC server. (Something like "-mouse
> <char-dev-spec> -keyboard <char-dev-spec>", perhaps?)
>   

I think the preferred way to do this would be to introduce a shared 
memory encoding to VNC.  You could then implement whatever you like as a VNC 
client that used this shared memory transport.

You can't always share VGA remote because sometimes it's in planar 
mode.  However, in the cases where memory is proper ram, you could 
potentially share that memory directly resulting in zero copies.

I think the ideal thing to do would be to share a file descriptor that 
was mmap()'able.  Sharing System V IPC keys is also a possibility and would 
better integrate with XShmImage.
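
For reference, one common way to hand an mmap()'able file descriptor to
another process on Unix is an SCM_RIGHTS control message over an
AF_UNIX socket. The sketch below is only an illustration of that
mechanism; send_fd() and recv_fd() are hypothetical names, and nothing
here is a proposed VNC wire format.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Properly aligned buffer for one control message carrying one int. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor `fd` over the connected AF_UNIX socket `sock`. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';            /* at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`; returns it, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver can then mmap() the descriptor it got, without either side
needing to agree on a file name.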

Regards,

Anthony Liguori
Jamie Lokier Sept. 18, 2009, 4:10 p.m. UTC | #4
Nathan Baum wrote:
> > On architectures other than x86, sometimes data written by one process
> > is not visible to another process mapping the same file, until the
> > writing process flushes its cache for those pages.  Whether that's
> > necessary depends on the address that pages are mapped to.  Afaik,
> > normally Linux chooses an address where this issue is avoided, but if
> > you specify it with MAP_FIXED (or whatever KVM does to map pages),
> > then there's cache coherency to deal with.

I missed out something important: Not only does the writing process
have to clean its cache (write dirty cachelines); the reading process
has to invalidate its cache too.

At the system call level, that means the reading process must call
msync().  Until it does that, it may still see some of the data
change, but it's not guaranteed to see all the changes.
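
A rough sketch of the call pattern described above, assuming both
processes have the same file mapped with MAP_SHARED. publish_frame()
and refresh_view() are hypothetical names. Whether msync() really
performs the CPU cache maintenance in question depends on the
architecture and kernel, so treat this as an illustration of the
calls rather than a guarantee.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Writer side: copy a frame into the shared mapping, then ask the
 * kernel to write the dirty pages back. `shared` must be the
 * page-aligned address returned by mmap(). */
static int publish_frame(void *shared, const void *frame, size_t len)
{
    memcpy(shared, frame, len);
    return msync(shared, len, MS_SYNC);
}

/* Reader side: ask the kernel to invalidate other cached copies of the
 * mapped range before the data is looked at. */
static int refresh_view(void *shared, size_t len)
{
    return msync(shared, len, MS_SYNC | MS_INVALIDATE);
}
```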

If the writer's copying from QEMU's VRAM to a copy, during that copy,
the reading process will see something uncertain; probably not very
useful for a VNC server.  They would need to coordinate this.  Not
sure if that buys anything over whatever KVM has to do already :-)
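
One way the two sides could coordinate, which nobody in this thread has
proposed, is a sequence counter in the shared region (the "seqlock"
idea): the writer makes the counter odd while it copies, and the reader
discards any snapshot taken across a change. The sketch below assumes a
single writer, a lock-free atomic_uint that works across processes, and
a made-up layout (struct shared_frame, FRAME_BYTES). Strictly speaking
the pixel copy races under the C11 memory model, so real code would
need more care.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define FRAME_BYTES 64          /* tiny, for illustration only */

/* Hypothetical layout of the shared file. */
struct shared_frame {
    atomic_uint seq;            /* odd while a blit is in progress */
    unsigned char pixels[FRAME_BYTES];
};

/* Writer: mark the frame busy, copy, then mark it stable again. */
static void writer_blit(struct shared_frame *sf, const unsigned char *src)
{
    unsigned s = atomic_load_explicit(&sf->seq, memory_order_relaxed);

    atomic_store_explicit(&sf->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    memcpy(sf->pixels, src, FRAME_BYTES);
    atomic_store_explicit(&sf->seq, s + 2, memory_order_release);
}

/* Reader: returns 1 and fills dst if a consistent frame was copied,
 * 0 if the writer was active and the caller should retry. */
static int reader_snapshot(struct shared_frame *sf, unsigned char *dst)
{
    unsigned before, after;

    before = atomic_load_explicit(&sf->seq, memory_order_acquire);
    if (before & 1)
        return 0;
    memcpy(dst, sf->pixels, FRAME_BYTES);
    atomic_thread_fence(memory_order_acquire);
    after = atomic_load_explicit(&sf->seq, memory_order_relaxed);
    return before == after;
}
```

A reader would normally call reader_snapshot() in a loop until it
returns 1, perhaps with a short sleep between attempts.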

RDP server add-ons etc. are a good idea, imho, but perhaps can be
implemented as a shared library loaded into qemu?

-- Jamie
Patch

diff --git a/cpu-common.h b/cpu-common.h
index 6302372..6780506 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -30,6 +30,7 @@  static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
 
 ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
 ram_addr_t qemu_ram_alloc(ram_addr_t);
+ram_addr_t qemu_ram_alloc_named(ram_addr_t, const char *);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
diff --git a/exec.c b/exec.c
index c82e767..58c2a8e 100644
--- a/exec.c
+++ b/exec.c
@@ -2403,14 +2403,29 @@  void qemu_unregister_coalesced_mmio(target_phys_addr_t addr, ram_addr_t size)
         kvm_uncoalesce_mmio_region(addr, size);
 }
 
-ram_addr_t qemu_ram_alloc(ram_addr_t size)
+extern char *named_map_prefix;
+extern char *named_maps;
+
+ram_addr_t qemu_ram_alloc_named(ram_addr_t size, const char *name)
 {
     RAMBlock *new_block;
+    const char *p;
 
     size = TARGET_PAGE_ALIGN(size);
     new_block = qemu_malloc(sizeof(*new_block));
-
-    new_block->host = qemu_vmalloc(size);
+    
+    if (name && named_maps &&
+	(p = strstr(named_maps, name)) &&
+	(p == named_maps || p[-1] == ',')) {
+      char buffer[512];
+      int fd;
+      snprintf(buffer, 512, "%s%s", named_map_prefix, name);
+      fd = open(buffer, O_CREAT | O_RDWR, 0640);
+      ftruncate(fd, 0);
+      ftruncate(fd, size);
+      new_block->host = mmap(NULL, size, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+    } else
+      new_block->host = qemu_vmalloc(size);
     new_block->offset = last_ram_offset;
     new_block->length = size;
 
@@ -2430,6 +2445,11 @@  ram_addr_t qemu_ram_alloc(ram_addr_t size)
     return new_block->offset;
 }
 
+ram_addr_t qemu_ram_alloc(ram_addr_t size)
+{
+    return qemu_ram_alloc_named(size, NULL);
+}
+
 void qemu_ram_free(ram_addr_t addr)
 {
     /* TODO: implement this.  */
diff --git a/hw/pc.c b/hw/pc.c
index 58de372..1dac0ca 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1161,7 +1161,7 @@  static void pc_init1(ram_addr_t ram_size,
     vmport_init();
 
     /* allocate RAM */
-    ram_addr = qemu_ram_alloc(0xa0000);
+    ram_addr = qemu_ram_alloc_named(0xa0000, "640k");
     cpu_register_physical_memory(0, 0xa0000, ram_addr);
 
     /* Allocate, even though we won't register, so we don't break the
@@ -1169,7 +1169,7 @@  static void pc_init1(ram_addr_t ram_size,
      * and some bios areas, which will be registered later
      */
     ram_addr = qemu_ram_alloc(0x100000 - 0xa0000);
-    ram_addr = qemu_ram_alloc(below_4g_mem_size - 0x100000);
+    ram_addr = qemu_ram_alloc_named(below_4g_mem_size - 0x100000, "below_4g");
     cpu_register_physical_memory(0x100000,
                  below_4g_mem_size - 0x100000,
                  ram_addr);
@@ -1179,7 +1179,7 @@  static void pc_init1(ram_addr_t ram_size,
 #if TARGET_PHYS_ADDR_BITS == 32
         hw_error("To much RAM for 32-bit physical address");
 #else
-        ram_addr = qemu_ram_alloc(above_4g_mem_size);
+        ram_addr = qemu_ram_alloc_named(above_4g_mem_size, "above_4g");
         cpu_register_physical_memory(0x100000000ULL,
                                      above_4g_mem_size,
                                      ram_addr);
@@ -1200,7 +1200,7 @@  static void pc_init1(ram_addr_t ram_size,
         (bios_size % 65536) != 0) {
         goto bios_error;
     }
-    bios_offset = qemu_ram_alloc(bios_size);
+    bios_offset = qemu_ram_alloc_named(bios_size, "bios");
     ret = load_image(filename, qemu_get_ram_ptr(bios_offset));
     if (ret != bios_size) {
     bios_error:
@@ -1220,7 +1220,7 @@  static void pc_init1(ram_addr_t ram_size,
 
 
 
-    option_rom_offset = qemu_ram_alloc(0x20000);
+    option_rom_offset = qemu_ram_alloc_named(0x20000, "option_rom");
     oprom_area_size = 0;
     cpu_register_physical_memory(0xc0000, 0x20000, option_rom_offset);
 
diff --git a/hw/vga.c b/hw/vga.c
index 514371c..feb0706 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -2238,7 +2238,7 @@  void vga_common_init(VGACommonState *s, int vga_ram_size)
         expand4to8[i] = v;
     }
 
-    s->vram_offset = qemu_ram_alloc(vga_ram_size);
+    s->vram_offset = qemu_ram_alloc_named(vga_ram_size, "vga");
     s->vram_ptr = qemu_get_ram_ptr(s->vram_offset);
     s->vram_size = vga_ram_size;
     s->get_bpp = vga_get_bpp;
diff --git a/qemu-options.hx b/qemu-options.hx
index d3aa55b..6db8b9e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -280,6 +280,10 @@  a suffix of ``M'' or ``G'' can be used to signify a value in megabytes or
 gigabytes respectively.
 ETEXI
 
+DEF("sharedmaps", HAS_ARG, QEMU_OPTION_sharedmaps,
+    "-sharedmaps prefix:maps \
+                     Use files beginning with 'prefix' for memory allocations 'maps'\n")
+
 DEF("k", HAS_ARG, QEMU_OPTION_k,
     "-k language     use keyboard layout (for example 'fr' for French)\n")
 STEXI
diff --git a/vl.c b/vl.c
index eb01da7..8b9b9e0 100644
--- a/vl.c
+++ b/vl.c
@@ -270,6 +270,9 @@  uint8_t qemu_uuid[16];
 static QEMUBootSetHandler *boot_set_handler;
 static void *boot_set_opaque;
 
+char *named_map_prefix = NULL;
+char *named_maps = NULL;
+
 /***********************************************************/
 /* x86 ISA bus support */
 
@@ -5010,6 +5013,18 @@  int main(int argc, char **argv, char **envp)
                 version();
                 exit(0);
                 break;
+	    case QEMU_OPTION_sharedmaps:
+	      if (!strcmp(optarg, "?")) {
+		fprintf(stderr, "supported named maps: ");
+		fprintf(stderr, "640k, below_4g, above_4g, vga, bios, option_rom\n");
+		exit(0);
+	      } else {
+		named_map_prefix = strdup(optarg);
+		named_maps = strchr(named_map_prefix, ':');
+		if (named_maps)
+		  *named_maps++ = '\0';
+	      }
+	      break;
             case QEMU_OPTION_m: {
                 uint64_t value;
                 char *ptr;
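
A note on the name lookup in qemu_ram_alloc_named() above: the strstr()
test checks that a match starts at the beginning of the list or after a
comma, but not that it ends at a comma or at the end of the string, and
it only examines the first occurrence. A list such as "bios_extra"
would therefore appear to select "bios". A possible whole-entry match
is sketched below; map_is_selected() is a hypothetical helper, not part
of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears as a complete comma-separated entry in
 * `list`, 0 otherwise. NULL or empty arguments never match. */
static int map_is_selected(const char *list, const char *name)
{
    size_t n;
    const char *p;

    if (!list || !name || !*name)
        return 0;
    n = strlen(name);
    for (p = list; *p; ) {
        size_t len = strcspn(p, ",");   /* length of the current entry */

        if (len == n && memcmp(p, name, n) == 0)
            return 1;
        p += len;
        if (*p == ',')
            p++;                        /* step over the separator */
    }
    return 0;
}
```

The patch also leaves the results of open(), ftruncate() and mmap()
unchecked, so a failure there would go unnoticed until the memory is
first touched.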