From patchwork Thu Apr 21 07:03:47 2011
X-Patchwork-Submitter: David Gibson
X-Patchwork-Id: 92382
Date: Thu, 21 Apr 2011 17:03:47 +1000
From: David Gibson
To: qemu-devel@nongnu.org
Cc: Anthony Liguori, aik@ozlabs.ru, Joerg.Rodel@amd.com,
 eduard.munteanu@linux360.ro, agraf@suse.de
Message-ID: <20110421070347.GG11968@yookeroo>
Subject: [Qemu-devel] Supporting emulation of IOMMUs

A few months ago, Eduard - Gabriel Munteanu posted a series of patches
implementing support for emulating the AMD PCI IOMMU
(http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03196.html).

In fact, this series implemented a general DMA/IOMMU layer which can
be used by any device model, plus one translation backend for it,
implementing the AMD-specific PCI IOMMU.

These patches don't seem to have gone anywhere for the last few
months, however, and so far I've been unable to contact the author
(trying again with this mail).

I have an interest in this code, because the pSeries machine will also
need IOMMU emulation support. At present we only support virtual
devices, through the PAPR interface, and we have support for the
hypervisor-controlled IOMMU translation in the PAPR VIO code. However,
we want to add PCI device support and this will also need IOMMU
translation.

The series seems to have the right basic approach, so if the author is
indeed MIA, I was planning to pick up the patches and resubmit them
(with support for the pSeries IOMMU added).

Before I do that, I was hoping to get some consensus that this is the
right way to go. For reference, I have an updated version of the
first patch (which adds the core IOMMU layer) below.

From: Eduard - Gabriel Munteanu
Date: Fri, 4 Feb 2011 01:32:55 +0200
Subject: [PATCH] Generic DMA memory access interface

This introduces replacements for memory access functions like
cpu_physical_memory_read(). The new interface can handle address
translation and access checking through an IOMMU.

David Gibson: I have made several bugfixes and cleanups to Eduard's
original patch.

 * dma_memory_rw() was incorrectly using (uninitialized) plen instead
   of len in the fallback to no-IOMMU case.

 * the dma_memory_map() tracking was storing the guest physical
   address of each mapping, but not the qemu user virtual address.
   However in unmap() it was then attempting to look up by virtual
   address using a completely bogus cast.

 * The dma_memory_rw() function is moved from dma_rw.h to dma_rw.c; it
   was a bit too much code for an inline.

 * IOMMU support is now available on all target platforms, not just
   i386, but is configurable (--enable-iommu/--disable-iommu). Stubs
   are used so that individual drivers can use the new dma interface
   and it will turn into old-style cpu physical accesses at no cost on
   IOMMU-less builds.

Signed-off-by: Eduard - Gabriel Munteanu
Signed-off-by: David Gibson
---
 Makefile.target |    1 +
 configure       |   12 ++++
 hw/dma_rw.c     |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/dma_rw.h     |  156 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 316 insertions(+), 0 deletions(-)
 create mode 100644 hw/dma_rw.c
 create mode 100644 hw/dma_rw.h

diff --git a/Makefile.target b/Makefile.target
index 95f5eda..c3d36c6 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -198,6 +198,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += virtio-9p.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
+obj-$(CONFIG_IOMMU) += dma_rw.o
 LIBS+=-lz
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
diff --git a/configure b/configure
index da2da04..fa6f4d5 100755
--- a/configure
+++ b/configure
@@ -130,6 +130,7 @@ xen=""
 linux_aio=""
 attr=""
 vhost_net=""
+iommu="no"
 xfs=""
 
 gprof="no"
@@ -723,6 +724,10 @@ for opt do
   ;;
   --enable-vhost-net) vhost_net="yes"
   ;;
+  --enable-iommu) iommu="yes"
+  ;;
+  --disable-iommu) iommu="no"
+  ;;
   --disable-opengl) opengl="no"
   ;;
   --enable-opengl) opengl="yes"
@@ -934,6 +939,8 @@ echo "  --enable-docs            enable documentation build"
 echo "  --disable-docs           disable documentation build"
 echo "  --disable-vhost-net      disable vhost-net acceleration support"
 echo "  --enable-vhost-net       enable vhost-net acceleration support"
+echo "  --disable-iommu          disable IOMMU emulation support"
+echo "  --enable-iommu           enable IOMMU emulation support"
 echo "  --enable-trace-backend=B Set trace backend"
 echo "                           Available backends:" $("$source_path"/scripts/tracetool --list-backends)
 echo "  --with-trace-file=NAME   Full PATH,NAME of file to store traces"
@@ -2608,6 +2615,7 @@ echo "madvise           $madvise"
 echo "posix_madvise     $posix_madvise"
 echo "uuid support      $uuid"
 echo "vhost-net support $vhost_net"
+echo "IOMMU support     $iommu"
 echo "Trace backend     $trace_backend"
 echo "Trace output file $trace_file-"
 echo "spice support     $spice"
@@ -3412,6 +3420,10 @@ if test "$target_softmmu" = "yes" -a \( \
   echo "CONFIG_NEED_MMU=y" >> $config_target_mak
 fi
 
+if test "$iommu" = "yes" ; then
+  echo "CONFIG_IOMMU=y" >> $config_target_mak
+fi
+
 if test "$gprof" = "yes" ; then
   echo "TARGET_GPROF=yes" >> $config_target_mak
   if test "$target_linux_user" = "yes" ; then
diff --git a/hw/dma_rw.c b/hw/dma_rw.c
new file mode 100644
index 0000000..627835c
--- /dev/null
+++ b/hw/dma_rw.c
@@ -0,0 +1,147 @@
+/*
+ * Generic DMA memory access interface.
+ *
+ * Copyright (c) 2011 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "dma_rw.h"
+#include "range.h"
+
+void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
+                   void *buf, dma_addr_t len, int is_write)
+{
+    dma_addr_t paddr, plen;
+    int err;
+
+    /*
+     * Fast-path non-iommu.
+     * More importantly, makes it obvious what this function does.
+     */
+    if (NO_IOMMU(dev)) {
+        cpu_physical_memory_rw(addr, buf, len, is_write);
+        return;
+    }
+
+    while (len) {
+        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
+        if (err) {
+            return;
+        }
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+}
+
+typedef struct DMAMemoryMap DMAMemoryMap;
+struct DMAMemoryMap {
+    dma_addr_t addr;
+    size_t len;
+    void * buf;
+    DMAInvalidateMapFunc *invalidate;
+    void *invalidate_opaque;
+
+    QLIST_ENTRY(DMAMemoryMap) list;
+};
+
+void dma_invalidate_memory_range(DMADevice *dev,
+                                 dma_addr_t addr, dma_addr_t len)
+{
+    DMAMemoryMap *map;
+
+    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
+        if (ranges_overlap(addr, len, map->addr, map->len)) {
+            map->invalidate(map->invalidate_opaque);
+            QLIST_REMOVE(map, list);
+            free(map);
+        }
+    }
+}
+
+void *dma_memory_map(DMADevice *dev, DMAInvalidateMapFunc *cb, void *opaque,
+                     dma_addr_t addr, dma_addr_t *len, int is_write)
+{
+    int err;
+    target_phys_addr_t paddr, plen;
+    void *buf;
+
+    if (NO_IOMMU(dev)) {
+        return cpu_physical_memory_map(addr, len, is_write);
+    }
+
+    plen = *len;
+    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
+    if (err) {
+        return NULL;
+    }
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len) {
+        *len = plen;
+    }
+
+    buf = cpu_physical_memory_map(paddr, len, is_write);
+
+    /* We treat maps as remote TLBs to cope with stuff like AIO. */
+    if (cb) {
+        DMAMemoryMap *map;
+
+        map = qemu_malloc(sizeof(DMAMemoryMap));
+        map->addr = addr;
+        map->len = *len;
+        map->buf = buf;
+        map->invalidate = cb;
+        map->invalidate_opaque = opaque;
+
+        QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
+    }
+
+    return buf;
+}
+
+void dma_memory_unmap(DMADevice *dev, void *buffer, dma_addr_t len,
+                      int is_write, dma_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+    if (!NO_IOMMU(dev)) {
+        DMAMemoryMap *map;
+
+        QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
+            if ((map->buf == buffer) && (map->len == len)) {
+                QLIST_REMOVE(map, list);
+                free(map);
+            }
+        }
+    }
+}
+
diff --git a/hw/dma_rw.h b/hw/dma_rw.h
new file mode 100644
index 0000000..58a3d0f
--- /dev/null
+++ b/hw/dma_rw.h
@@ -0,0 +1,156 @@
+#ifndef DMA_RW_H
+#define DMA_RW_H
+
+#include "qemu-common.h"
+
+typedef uint64_t dma_addr_t;
+
+typedef struct DMAMmu DMAMmu;
+typedef struct DMADevice DMADevice;
+
+typedef int DMATranslateFunc(DMADevice *dev,
+                             dma_addr_t addr,
+                             dma_addr_t *paddr,
+                             dma_addr_t *len,
+                             int is_write);
+
+typedef void DMAInvalidateMapFunc(void *);
+
+#ifndef CONFIG_IOMMU
+struct DMAMmu {
+};
+
+struct DMADevice {
+};
+
+#define NO_IOMMU(_dev) (1)
+
+static inline void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
+                                 void *buf, dma_addr_t len, int is_write)
+{
+    cpu_physical_memory_rw(addr, buf, len, is_write);
+}
+
+static inline void *dma_memory_map(DMADevice *dev,
+                                   DMAInvalidateMapFunc *cb, void *opaque,
+                                   dma_addr_t addr, dma_addr_t *len,
+                                   int is_write)
+{
+    return cpu_physical_memory_map(addr, len, is_write);
+}
+
+static inline void dma_memory_unmap(DMADevice *dev,
+                                    void *buffer, dma_addr_t len,
+                                    int is_write, dma_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+}
+
+#define DEFINE_DMA_LD(suffix, size) \
+static inline uint##size##_t \
+dma_ld##suffix(DMADevice *dev, dma_addr_t addr) \
+{ \
+    return ld##suffix##_phys(addr); \
+}
+
+#define DEFINE_DMA_ST(suffix, size) \
+static inline void \
+dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val) \
+{ \
+    st##suffix##_phys(addr, val); \
+}
+
+#else
+struct DMAMmu {
+    DeviceState *iommu;
+    DMATranslateFunc *translate;
+    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
+};
+
+struct DMADevice {
+    DMAMmu *mmu;
+};
+
+#define NO_IOMMU(_dev) (!(_dev) || !(_dev)->mmu)
+
+void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
+                   void *buf, dma_addr_t len, int is_write);
+
+void *dma_memory_map(DMADevice *dev,
+                     DMAInvalidateMapFunc *cb, void *opaque,
+                     dma_addr_t addr, dma_addr_t *len,
+                     int is_write);
+void dma_memory_unmap(DMADevice *dev,
+                      void *buffer, dma_addr_t len,
+                      int is_write, dma_addr_t access_len);
+
+void dma_invalidate_memory_range(DMADevice *dev,
+                                 dma_addr_t addr, dma_addr_t len);
+
+/* warning: like the corresponding ldX_phys / stX_phys functions, these
+ * DMA accessors can only handle aligned accesses */
+
+#define DEFINE_DMA_LD(suffix, size) \
+static inline uint##size##_t \
+dma_ld##suffix(DMADevice *dev, dma_addr_t addr) \
+{ \
+    int err; \
+    dma_addr_t paddr, plen; \
+    \
+    if (NO_IOMMU(dev)) { \
+        return ld##suffix##_phys(addr); \
+    } \
+    \
+    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0); \
+    if (err || (plen < size / 8)) \
+        return 0; \
+    \
+    return ld##suffix##_phys(paddr); \
+}
+
+#define DEFINE_DMA_ST(suffix, size) \
+static inline void \
+dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val) \
+{ \
+    int err; \
+    target_phys_addr_t paddr, plen; \
+    \
+    if (NO_IOMMU(dev)) { \
+        st##suffix##_phys(addr, val); \
+        return; \
+    } \
+    err = dev->mmu->translate(dev, addr, &paddr, &plen, 1); \
+    if (err || (plen < size / 8)) \
+        return; \
+    \
+    st##suffix##_phys(paddr, val); \
+}
+#endif /* CONFIG_IOMMU */
+
+DEFINE_DMA_LD(ub, 8)
+DEFINE_DMA_LD(uw, 16)
+DEFINE_DMA_LD(l, 32)
+DEFINE_DMA_LD(q, 64)
+
+DEFINE_DMA_ST(b, 8)
+DEFINE_DMA_ST(w, 16)
+DEFINE_DMA_ST(l, 32)
+DEFINE_DMA_ST(q, 64)
+
+static inline void dma_memory_read(DMADevice *dev,
+                                   dma_addr_t addr,
+                                   void *buf,
+                                   dma_addr_t len)
+{
+    dma_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void dma_memory_write(DMADevice *dev,
+                                    dma_addr_t addr,
+                                    const void *buf,
+                                    dma_addr_t len)
+{
+    dma_memory_rw(dev, addr, (void *) buf, len, 1);
+}
+
+#endif /* DMA_RW_H */
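
For anyone who wants to see the translate-and-clamp loop of
dma_memory_rw() in isolation, here is a rough standalone sketch. It is
not part of the patch and not QEMU code: toy_ram, toy_page_table,
toy_translate() and toy_dma_rw() are all hypothetical names invented
for illustration, the 16-byte "page" size is arbitrary, and plain
memcpy() on an array stands in for cpu_physical_memory_rw(). Unlike
dma_memory_rw(), which returns void and silently stops on a
translation error, the toy version returns an error code so the
failure is visible to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

/* Hypothetical toy parameters: tiny pages so a short buffer spans several. */
#define TOY_PAGE_SIZE 16u
#define TOY_NUM_PAGES 4u

/* Stand-in for guest physical memory. */
static uint8_t toy_ram[TOY_PAGE_SIZE * TOY_NUM_PAGES];

/* I/O page n maps to physical page toy_page_table[n]; deliberately scattered. */
static const unsigned toy_page_table[TOY_NUM_PAGES] = { 2, 0, 3, 1 };

/*
 * Same shape as DMATranslateFunc, minus the DMADevice argument.
 * On success, *paddr is the translated address and *len is how many
 * bytes remain valid from addr to the end of the translated page.
 */
static int toy_translate(dma_addr_t addr, dma_addr_t *paddr,
                         dma_addr_t *len, int is_write)
{
    dma_addr_t page = addr / TOY_PAGE_SIZE;
    dma_addr_t off = addr % TOY_PAGE_SIZE;

    (void)is_write; /* the toy table has no permission bits */
    if (page >= TOY_NUM_PAGES) {
        return -1;
    }
    *paddr = (dma_addr_t)toy_page_table[page] * TOY_PAGE_SIZE + off;
    *len = TOY_PAGE_SIZE - off;
    return 0;
}

/* Mirrors the loop in dma_memory_rw(): translate, clamp, copy, advance. */
static int toy_dma_rw(dma_addr_t addr, void *buf, dma_addr_t len, int is_write)
{
    uint8_t *p = buf;

    assert(buf != NULL);
    while (len) {
        dma_addr_t paddr, plen;

        if (toy_translate(addr, &paddr, &plen, is_write)) {
            return -1;
        }
        /* The translation might be valid for more than we still need. */
        if (plen > len) {
            plen = len;
        }
        if (is_write) {
            memcpy(&toy_ram[paddr], p, (size_t)plen);
        } else {
            memcpy(p, &toy_ram[paddr], (size_t)plen);
        }
        len -= plen;
        addr += plen;
        p += plen;
    }
    return 0;
}
```

Writing 31 bytes at I/O address 10 with this sketch touches three I/O
pages, so the data ends up in three non-adjacent chunks of toy_ram,
and reading the same range back returns the original bytes. An access
that runs past the last toy page fails part-way through, which is the
case the void-returning dma_memory_rw() cannot report to its caller.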