From patchwork Wed Apr 21 17:53:28 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cam Macdonell X-Patchwork-Id: 50665 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [199.232.76.165]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 6F670B7D0B for ; Thu, 22 Apr 2010 04:06:49 +1000 (EST) Received: from localhost ([127.0.0.1]:48499 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1O4eGN-0004XK-Kh for incoming@patchwork.ozlabs.org; Wed, 21 Apr 2010 14:03:03 -0400 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1O4e7g-0006Yn-3K for qemu-devel@nongnu.org; Wed, 21 Apr 2010 13:54:04 -0400 Received: from [140.186.70.92] (port=34449 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1O4e7X-0006S2-0n for qemu-devel@nongnu.org; Wed, 21 Apr 2010 13:54:02 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1O4e7L-00023E-U1 for qemu-devel@nongnu.org; Wed, 21 Apr 2010 13:53:54 -0400 Received: from fleet.cs.ualberta.ca ([129.128.22.22]:58905) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O4e7L-00021B-Dp for qemu-devel@nongnu.org; Wed, 21 Apr 2010 13:53:43 -0400 Received: from localhost.localdomain (st-brides.cs.ualberta.ca [129.128.23.21]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by smtp-auth.cs.ualberta.ca (Postfix) with ESMTP id 413A928028; Wed, 21 Apr 2010 11:53:32 -0600 (MDT) From: Cam Macdonell To: kvm@vger.kernel.org Date: Wed, 21 Apr 2010 11:53:28 -0600 Message-Id: <1271872408-22842-5-git-send-email-cam@cs.ualberta.ca> X-Mailer: git-send-email 1.6.2.5 In-Reply-To: <1271872408-22842-4-git-send-email-cam@cs.ualberta.ca> References: <1271872408-22842-1-git-send-email-cam@cs.ualberta.ca> <1271872408-22842-2-git-send-email-cam@cs.ualberta.ca> <1271872408-22842-3-git-send-email-cam@cs.ualberta.ca> <1271872408-22842-4-git-send-email-cam@cs.ualberta.ca> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6, seldom 2.4 (older, 4) Cc: Cam Macdonell , qemu-devel@nongnu.org Subject: [Qemu-devel] [PATCH v5 4/5] Inter-VM shared memory PCI device X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. -device ivshmem,size=[,shm=] Interrupts are supported between multiple VMs by using a shared memory server by using a chardev socket. -device ivshmem,size=[,shm=] [,chardev=][,msi=on][,irqfd=on][,vectors=n] -chardev socket,path=,id= (shared memory server is qemu.git/contrib/ivshmem-server) Sample programs and init scripts are in a git repo here: www.gitorious.org/nahanni --- Makefile.target | 3 + hw/ivshmem.c | 727 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ qemu-char.c | 6 + qemu-char.h | 3 + qemu-doc.texi | 25 ++ 5 files changed, 764 insertions(+), 0 deletions(-) create mode 100644 hw/ivshmem.c diff --git a/Makefile.target b/Makefile.target index 1ffd802..bc9a681 100644 --- a/Makefile.target +++ b/Makefile.target @@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o obj-y += rtl8139.o obj-y += e1000.o +# Inter-VM PCI shared memory +obj-y += ivshmem.o + # Hardware support obj-i386-y = pckbd.o dma.o obj-i386-y += vga.o diff --git a/hw/ivshmem.c b/hw/ivshmem.c new file mode 100644 index 0000000..f8d8fdb --- /dev/null +++ b/hw/ivshmem.c @@ -0,0 +1,727 @@ +/* + * Inter-VM Shared Memory PCI device. + * + * Author: + * Cam Macdonell + * + * Based On: cirrus_vga.c and rtl8139.c + * + * This code is licensed under the GNU GPL v2. + */ +#include +#include +#include +#include +#include +#include +#include "hw.h" +#include "console.h" +#include "pc.h" +#include "pci.h" +#include "sysemu.h" + +#include "msix.h" +#include "qemu-kvm.h" +#include "libkvm.h" + +#include +#include +#include +#include + +#define IVSHMEM_IRQFD 0 +#define IVSHMEM_MSI 1 + +#define DEBUG_IVSHMEM +#ifdef DEBUG_IVSHMEM +#define IVSHMEM_DPRINTF(fmt, args...) \ + do {printf("IVSHMEM: " fmt, ##args); } while (0) +#else +#define IVSHMEM_DPRINTF(fmt, args...) +#endif + +typedef struct EventfdEntry { + PCIDevice *pdev; + int vector; +} EventfdEntry; + +typedef struct IVShmemState { + PCIDevice dev; + uint32_t intrmask; + uint32_t intrstatus; + uint32_t doorbell; + + CharDriverState * chr; + CharDriverState ** eventfd_chr; + int ivshmem_mmio_io_addr; + + pcibus_t mmio_addr; + unsigned long ivshmem_offset; + uint64_t ivshmem_size; /* size of shared memory region */ + int shm_fd; /* shared memory file descriptor */ + + int nr_allocated_vms; + /* array of eventfds for each guest */ + int ** eventfds; + /* keep track of # of eventfds for each guest*/ + int * eventfds_posn_count; + + int nr_alloc_guests; + int vm_id; + int num_eventfds; + uint32_t vectors; + uint32_t features; + EventfdEntry *eventfd_table; + + char * shmobj; + char * sizearg; +} IVShmemState; + +/* registers for the Inter-VM shared memory device */ +enum ivshmem_registers { + IntrMask = 0, + IntrStatus = 4, + IVPosition = 8, + Doorbell = 12, +}; + +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, int feature) { + return (ivs->features & (1 << feature)); +} + +static inline int is_power_of_two(int x) { + return (x & (x-1)) == 0; +} + +static void ivshmem_map(PCIDevice *pci_dev, int region_num, + pcibus_t addr, pcibus_t size, int type) +{ + IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev); + + IVSHMEM_DPRINTF("addr = %u size = %u\n", (uint32_t)addr, (uint32_t)size); + cpu_register_physical_memory(addr, s->ivshmem_size, s->ivshmem_offset); + +} + +/* accessing registers - based on rtl8139 */ +static void ivshmem_update_irq(IVShmemState *s, int val) +{ + int isr; + isr = (s->intrstatus & s->intrmask) & 0xffffffff; + + /* don't print ISR resets */ + if (isr) { + IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n", + isr ? 1 : 0, s->intrstatus, s->intrmask); + } + + qemu_set_irq(s->dev.irq[0], (isr != 0)); +} + +static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val) +{ + IVSHMEM_DPRINTF("IntrMask write(w) val = 0x%04x\n", val); + + s->intrmask = val; + + ivshmem_update_irq(s, val); +} + +static uint32_t ivshmem_IntrMask_read(IVShmemState *s) +{ + uint32_t ret = s->intrmask; + + IVSHMEM_DPRINTF("intrmask read(w) val = 0x%04x\n", ret); + + return ret; +} + +static void ivshmem_IntrStatus_write(IVShmemState *s, uint32_t val) +{ + IVSHMEM_DPRINTF("IntrStatus write(w) val = 0x%04x\n", val); + + s->intrstatus = val; + + ivshmem_update_irq(s, val); + return; +} + +static uint32_t ivshmem_IntrStatus_read(IVShmemState *s) +{ + uint32_t ret = s->intrstatus; + + /* reading ISR clears all interrupts */ + s->intrstatus = 0; + + ivshmem_update_irq(s, 0); + + return ret; +} + +static void ivshmem_io_writew(void *opaque, uint8_t addr, uint32_t val) +{ + + IVSHMEM_DPRINTF("We shouldn't be writing words\n"); +} + +static void ivshmem_io_writel(void *opaque, uint8_t addr, uint32_t val) +{ + IVShmemState *s = opaque; + + u_int64_t write_one = 1; + u_int16_t dest = val >> 16; + u_int16_t vector = val & 0xff; + + addr &= 0xfe; + + switch (addr) + { + case IntrMask: + ivshmem_IntrMask_write(s, val); + break; + + case IntrStatus: + ivshmem_IntrStatus_write(s, val); + break; + + case Doorbell: + /* check doorbell range */ + if ((vector >= 0) && (vector < s->eventfds_posn_count[dest])) { + IVSHMEM_DPRINTF("Writing %ld to VM %d on vector %d\n", write_one, dest, vector); + if (write(s->eventfds[dest][vector], &(write_one), 8) != 8) { + IVSHMEM_DPRINTF("error writing to eventfd\n"); + } + } + break; + default: + IVSHMEM_DPRINTF("Invalid VM Doorbell VM %d\n", dest); + } +} + +static void ivshmem_io_writeb(void *opaque, uint8_t addr, uint32_t val) +{ + IVSHMEM_DPRINTF("We shouldn't be writing bytes\n"); +} + +static uint32_t ivshmem_io_readw(void *opaque, uint8_t addr) +{ + + IVSHMEM_DPRINTF("We shouldn't be reading words\n"); + return 0; +} + +static uint32_t ivshmem_io_readl(void *opaque, uint8_t addr) +{ + + IVShmemState *s = opaque; + uint32_t ret; + + switch (addr) + { + case IntrMask: + ret = ivshmem_IntrMask_read(s); + break; + + case IntrStatus: + ret = ivshmem_IntrStatus_read(s); + break; + + case IVPosition: + /* return my id in the ivshmem list */ + ret = s->vm_id; + break; + + default: + IVSHMEM_DPRINTF("why are we reading 0x%x\n", addr); + ret = 0; + } + + return ret; + +} + +static uint32_t ivshmem_io_readb(void *opaque, uint8_t addr) +{ + IVSHMEM_DPRINTF("We shouldn't be reading bytes\n"); + + return 0; +} + +static void ivshmem_mmio_writeb(void *opaque, + target_phys_addr_t addr, uint32_t val) +{ + ivshmem_io_writeb(opaque, addr & 0xFF, val); +} + +static void ivshmem_mmio_writew(void *opaque, + target_phys_addr_t addr, uint32_t val) +{ + ivshmem_io_writew(opaque, addr & 0xFF, val); +} + +static void ivshmem_mmio_writel(void *opaque, + target_phys_addr_t addr, uint32_t val) +{ + ivshmem_io_writel(opaque, addr & 0xFF, val); +} + +static uint32_t ivshmem_mmio_readb(void *opaque, target_phys_addr_t addr) +{ + return ivshmem_io_readb(opaque, addr & 0xFF); +} + +static uint32_t ivshmem_mmio_readw(void *opaque, target_phys_addr_t addr) +{ + uint32_t val = ivshmem_io_readw(opaque, addr & 0xFF); + return val; +} + +static uint32_t ivshmem_mmio_readl(void *opaque, target_phys_addr_t addr) +{ + uint32_t val = ivshmem_io_readl(opaque, addr & 0xFF); + return val; +} + +static CPUReadMemoryFunc *ivshmem_mmio_read[3] = { + ivshmem_mmio_readb, + ivshmem_mmio_readw, + ivshmem_mmio_readl, +}; + +static CPUWriteMemoryFunc *ivshmem_mmio_write[3] = { + ivshmem_mmio_writeb, + ivshmem_mmio_writew, + ivshmem_mmio_writel, +}; + +static void ivshmem_receive(void *opaque, const uint8_t *buf, int size) +{ + IVShmemState *s = opaque; + + ivshmem_IntrStatus_write(s, *buf); + + IVSHMEM_DPRINTF("ivshmem_receive 0x%02x\n", *buf); +} + +static int ivshmem_can_receive(void * opaque) +{ + return 8; +} + +static void ivshmem_event(void *opaque, int event) +{ + IVSHMEM_DPRINTF("ivshmem_event %d\n", event); +} + +static void fake_irqfd(void *opaque, const uint8_t *buf, int size) { + + EventfdEntry *entry = opaque; + PCIDevice *pdev = entry->pdev; + + IVSHMEM_DPRINTF("fake irqfd on vector %d\n", entry->vector); + msix_notify(pdev, entry->vector); +} + +static CharDriverState* create_eventfd_chr_device(void * opaque, int eventfd, + int vector) +{ + /* create a event character device based on the passed eventfd */ + IVShmemState *s = opaque; + CharDriverState * chr; + + chr = qemu_chr_open_eventfd(eventfd); + + if (chr == NULL) { + IVSHMEM_DPRINTF("creating eventfd for eventfd %d failed\n", eventfd); + exit(-1); + } + + if (ivshmem_has_feature(s, IVSHMEM_MSI)) { + s->eventfd_table[vector].pdev = &s->dev; + s->eventfd_table[vector].vector = vector; + + qemu_chr_add_handlers(chr, ivshmem_can_receive, fake_irqfd, + ivshmem_event, &s->eventfd_table[vector]); + } else { + qemu_chr_add_handlers(chr, ivshmem_can_receive, ivshmem_receive, + ivshmem_event, s); + } + + return chr; + +} + +static int check_shm_size(IVShmemState *s, int shmemfd) { + /* check that the guest isn't going to try and map more memory than the + * card server allocated return -1 to indicate error */ + + struct stat buf; + + fstat(shmemfd, &buf); + + if (s->ivshmem_size > buf.st_size) { + fprintf(stderr, "IVSHMEM ERROR: Requested memory size greater"); + fprintf(stderr, " than shared object size (%ld > %ld)\n", + s->ivshmem_size, buf.st_size); + return -1; + } else { + return 0; + } +} + +static void create_shared_memory_BAR(IVShmemState *s, int fd) { + + s->shm_fd = fd; + + s->ivshmem_offset = qemu_ram_mmap(s->shm_fd, s->ivshmem_size, + MAP_SHARED, 0); + + /* region for shared memory */ + pci_register_bar(&s->dev, 2, s->ivshmem_size, + PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_map); +} + +static void close_guest_eventfds(IVShmemState *s, int posn) +{ + int i, guest_curr_max; + + guest_curr_max = s->eventfds_posn_count[posn]; + + for (i = 0; i < guest_curr_max; i++) + close(s->eventfds[posn][i]); + + free(s->eventfds[posn]); + s->eventfds_posn_count[posn] = 0; +} + +/* this function increase the dynamic storage need to store data about other + * guests */ +static void increase_dynamic_storage(IVShmemState *s, int new_min_size) { + + int j, old_nr_alloc; + + old_nr_alloc = s->nr_alloc_guests; + + while (s->nr_alloc_guests < new_min_size) + s->nr_alloc_guests = s->nr_alloc_guests * 2; + + IVSHMEM_DPRINTF("bumping storage to %d guests\n", s->nr_alloc_guests); + s->eventfds = qemu_realloc(s->eventfds, s->nr_alloc_guests * + sizeof(int *)); + s->eventfds_posn_count = qemu_realloc(s->eventfds_posn_count, + s->nr_alloc_guests * + sizeof(int)); + s->eventfd_table = qemu_realloc(s->eventfd_table, s->nr_alloc_guests * + sizeof(EventfdEntry)); + + if ((s->eventfds == NULL) || (s->eventfds_posn_count == NULL) || + (s->eventfd_table == NULL)) { + fprintf(stderr, "Allocation error - exiting\n"); + exit(1); + } + + if (!ivshmem_has_feature(s, IVSHMEM_IRQFD)) { + s->eventfd_chr = (CharDriverState **)qemu_realloc(s->eventfd_chr, + s->nr_alloc_guests * sizeof(void *)); + if (s->eventfd_chr == NULL) { + fprintf(stderr, "Allocation error - exiting\n"); + exit(1); + } + } + + /* zero out new pointers */ + for (j = old_nr_alloc; j < s->nr_alloc_guests; j++) { + s->eventfds[j] = NULL; + } +} + +static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) +{ + IVShmemState *s = opaque; + int incoming_fd, tmp_fd; + int guest_curr_max; + long incoming_posn; + + memcpy(&incoming_posn, buf, sizeof(long)); + /* pick off s->chr->msgfd and store it, posn should accompany msg */ + tmp_fd = qemu_chr_get_msgfd(s->chr); + IVSHMEM_DPRINTF("posn is %ld, fd is %d\n", incoming_posn, tmp_fd); + + /* make sure we have enough space for this guest */ + if (incoming_posn >= s->nr_alloc_guests) { + increase_dynamic_storage(s, incoming_posn); + } + + if (tmp_fd == -1) { + /* if posn is positive and unseen before then this is our posn*/ + if ((incoming_posn >= 0) && (s->eventfds[incoming_posn] == NULL)) { + /* receive our posn */ + s->vm_id = incoming_posn; + return; + } else { + /* otherwise an fd == -1 means an existing guest has gone away */ + IVSHMEM_DPRINTF("posn %ld has gone away\n", incoming_posn); + close_guest_eventfds(s, incoming_posn); + return; + } + } + + /* because of the implementation of get_msgfd, we need a dup */ + incoming_fd = dup(tmp_fd); + + /* if the position is -1, then it's shared memory region fd */ + if (incoming_posn == -1) { + + s->num_eventfds = 0; + + if (check_shm_size(s, incoming_fd) == -1) { + exit(-1); + } + + /* creating a BAR in qemu_chr callback may be crazy */ + create_shared_memory_BAR(s, incoming_fd); + + return; + } + + /* each guest has an array of eventfds, and we keep track of how many + * guests for each VM */ + guest_curr_max = s->eventfds_posn_count[incoming_posn]; + if (guest_curr_max == 0) { + /* one eventfd per MSI vector */ + s->eventfds[incoming_posn] = (int *) qemu_malloc(s->vectors * + sizeof(int)); + } + + /* this is an eventfd for a particular guest VM */ + IVSHMEM_DPRINTF("eventfds[%ld][%d] = %d\n", incoming_posn, guest_curr_max, + incoming_fd); + s->eventfds[incoming_posn][guest_curr_max] = incoming_fd; + + /* increment count for particular guest */ + s->eventfds_posn_count[incoming_posn]++; + + /* ioeventfd and irqfd are enabled together, + * so the flag IRQFD refers to both */ + if (ivshmem_has_feature(s, IVSHMEM_IRQFD) && guest_curr_max >= 0) { + /* allocate ioeventfd for the new fd + * received for guest @ incoming_posn */ + kvm_set_ioeventfd_mmio_long(incoming_fd, s->mmio_addr + Doorbell, + (incoming_posn << 16) | guest_curr_max, 1); + } + + /* keep track of the maximum VM ID */ + if (incoming_posn > s->num_eventfds) { + s->num_eventfds = incoming_posn; + } + + if (incoming_posn == s->vm_id) { + if (ivshmem_has_feature(s, IVSHMEM_IRQFD)) { + /* setup irqfd for this VM's eventfd */ + int vector = guest_curr_max; + kvm_set_irqfd(s->eventfds[s->vm_id][guest_curr_max], vector, + s->dev.msix_irq_entries[vector].gsi); + } else { + /* initialize char device for callback + * if this is one of my eventfd */ + s->eventfd_chr[guest_curr_max] = create_eventfd_chr_device(s, + s->eventfds[s->vm_id][guest_curr_max], guest_curr_max); + } + } + + return; +} + +static void ivshmem_reset(DeviceState *d) +{ + return; +} + +static void ivshmem_mmio_map(PCIDevice *pci_dev, int region_num, + pcibus_t addr, pcibus_t size, int type) +{ + IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev); + + s->mmio_addr = addr; + cpu_register_physical_memory(addr + 0, 0x400, s->ivshmem_mmio_io_addr); + + /* now that our mmio region has been allocated, we can receive + * the file descriptors */ + if (s->chr != NULL) { + qemu_chr_add_handlers(s->chr, ivshmem_can_receive, ivshmem_read, + ivshmem_event, s); + } + +} + +static uint64_t ivshmem_get_size(IVShmemState * s) { + + uint64_t value; + char *ptr; + + value = strtoul(s->sizearg, &ptr, 10); + switch (*ptr) { + case 0: case 'M': case 'm': + value <<= 20; + break; + case 'G': case 'g': + value <<= 30; + break; + default: + fprintf(stderr, "qemu: invalid ram size: %s\n", s->sizearg); + exit(1); + } + + /* BARs must be a power of 2 */ + if (!is_power_of_two(value)) { + fprintf(stderr, "ivshmem: size must be power of 2\n"); + exit(1); + } + + return value; + +} + +static int pci_ivshmem_init(PCIDevice *dev) +{ + IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); + uint8_t *pci_conf; + int i; + + if (s->sizearg == NULL) + s->ivshmem_size = 4 << 20; /* 4 MB default */ + else { + s->ivshmem_size = ivshmem_get_size(s); + } + + /* IRQFD requires MSI */ + if (ivshmem_has_feature(s, IVSHMEM_IRQFD) && + !ivshmem_has_feature(s, IVSHMEM_MSI)) { + fprintf(stderr, "ivshmem: ioeventfd/irqfd requires MSI\n"); + exit(1); + } + + pci_conf = s->dev.config; + pci_conf[0x00] = 0xf4; /* Qumranet vendor ID 0x5002 */ + pci_conf[0x01] = 0x1a; + pci_conf[0x02] = 0x10; + pci_conf[0x03] = 0x11; + pci_conf[0x04] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY; + pci_conf[0x0a] = 0x00; /* RAM controller */ + pci_conf[0x0b] = 0x05; + pci_conf[0x0e] = 0x00; /* header_type */ + + s->ivshmem_mmio_io_addr = cpu_register_io_memory(ivshmem_mmio_read, + ivshmem_mmio_write, s); + /* region for registers*/ + pci_register_bar(&s->dev, 0, 0x400, + PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_mmio_map); + + /* allocate the MSI-X vectors */ + if (ivshmem_has_feature(s, IVSHMEM_MSI)) { + + if (!msix_init(&s->dev, s->vectors, 1, 0)) { + pci_register_bar(&s->dev, 1, + msix_bar_size(&s->dev), + PCI_BASE_ADDRESS_SPACE_MEMORY, + msix_mmio_map); + IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors); + } else { + IVSHMEM_DPRINTF("msix initialization failed\n"); + } + + /* 'activate' the vectors */ + for (i = 0; i < s->vectors; i++) { + msix_vector_use(&s->dev, i); + } + } + + if ((s->chr != NULL) && (strncmp(s->chr->filename, "unix:", 5) == 0)) { + /* if we get a UNIX socket as the parameter we will talk + * to the ivshmem server later once the MMIO BAR is actually + * allocated (see ivshmem_mmio_map) */ + + IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n", + s->chr->filename); + + /* we allocate enough space for 16 guests and grow as needed */ + s->nr_alloc_guests = 16; + s->vm_id = -1; + + /* allocate/initialize space for interrupt handling */ + s->eventfds = qemu_mallocz(s->nr_alloc_guests * sizeof(int *)); + s->eventfd_table = qemu_mallocz(s->vectors * sizeof(EventfdEntry)); + s->eventfds_posn_count = qemu_mallocz(s->nr_alloc_guests * sizeof(int)); + + pci_conf[PCI_INTERRUPT_PIN] = 1; /* we are going to support interrupts */ + + if (!ivshmem_has_feature(s, IVSHMEM_IRQFD)) { + s->eventfd_chr = (CharDriverState **)qemu_malloc(s->nr_alloc_guests * + sizeof(void *)); + } + + } else { + /* just map the file immediately, we're not using a server */ + int fd; + + if (s->shmobj == NULL) { + fprintf(stderr, "Must specify 'chardev' or 'shm' to ivshmem\n"); + } + + IVSHMEM_DPRINTF("using shm_open (shm object = %s)\n", s->shmobj); + + /* try opening with O_EXCL and if it succeeds zero the memory + * by truncating to 0 */ + if ((fd = shm_open(s->shmobj, O_CREAT|O_RDWR|O_EXCL, + S_IRWXU|S_IRWXG|S_IRWXO)) > 0) { + /* truncate file to length PCI device's memory */ + if (ftruncate(fd, s->ivshmem_size) != 0) { + fprintf(stderr, "kvm_ivshmem: could not truncate shared file\n"); + } + + } else if ((fd = shm_open(s->shmobj, O_CREAT|O_RDWR, + S_IRWXU|S_IRWXG|S_IRWXO)) < 0) { + fprintf(stderr, "kvm_ivshmem: could not open shared file\n"); + exit(-1); + } + + create_shared_memory_BAR(s, fd); + + } + + + return 0; +} + +static int pci_ivshmem_uninit(PCIDevice *dev) +{ + IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); + + cpu_unregister_io_memory(s->ivshmem_mmio_io_addr); + + return 0; +} + +static PCIDeviceInfo ivshmem_info = { + .qdev.name = "ivshmem", + .qdev.size = sizeof(IVShmemState), + .qdev.reset = ivshmem_reset, + .init = pci_ivshmem_init, + .exit = pci_ivshmem_uninit, + .qdev.props = (Property[]) { + DEFINE_PROP_CHR("chardev", IVShmemState, chr), + DEFINE_PROP_STRING("size", IVShmemState, sizearg), + DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1), + DEFINE_PROP_BIT("irqfd", IVShmemState, features, IVSHMEM_IRQFD, false), + DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true), + DEFINE_PROP_STRING("shm", IVShmemState, shmobj), + DEFINE_PROP_END_OF_LIST(), + } +}; + +static void ivshmem_register_devices(void) +{ + pci_qdev_register(&ivshmem_info); +} + +device_init(ivshmem_register_devices) diff --git a/qemu-char.c b/qemu-char.c index 048da3f..41cb8c7 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2076,6 +2076,12 @@ static void tcp_chr_read(void *opaque) } } +CharDriverState *qemu_chr_open_eventfd(int eventfd){ + + return qemu_chr_open_fd(eventfd, eventfd); + +} + static void tcp_chr_connect(void *opaque) { CharDriverState *chr = opaque; diff --git a/qemu-char.h b/qemu-char.h index 3a9427b..1571091 100644 --- a/qemu-char.h +++ b/qemu-char.h @@ -93,6 +93,9 @@ void qemu_chr_info_print(Monitor *mon, const QObject *ret_data); void qemu_chr_info(Monitor *mon, QObject **ret_data); CharDriverState *qemu_chr_find(const char *name); +/* add an eventfd to the qemu devices that are polled */ +CharDriverState *qemu_chr_open_eventfd(int eventfd); + extern int term_escape_char; /* async I/O support */ diff --git a/qemu-doc.texi b/qemu-doc.texi index 6647b7b..2df4687 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -706,6 +706,31 @@ Using the @option{-net socket} option, it is possible to make VLANs that span several QEMU instances. See @ref{sec_invocation} to have a basic example. +@section Other Devices + +@subsection Inter-VM Shared Memory device + +With KVM enabled on a Linux host, a shared memory device is available. Guests +map a POSIX shared memory region into the guest as a PCI device that enables +zero-copy communication to the application level of the guests. The basic +syntax is: + +@example +qemu -device ivshmem,size=[,shm=] +@end example + +If desired, interrupts can be sent between guest VMs accessing the same shared +memory region. Interrupt support requires using a shared memory server and +using a chardev socket to connect to it. The code for the shared memory server +is qemu.git/contrib/ivshmem-server. An example syntax when using the shared +memory server is: + +@example +qemu -device ivshmem,size=[,shm=] + [,chardev=][,msi=on][,irqfd=on][,vectors=n] +qemu -chardev socket,path=,id= +@end example + @node direct_linux_boot @section Direct Linux Boot