Patchwork [13/13] pseries: Implement IOMMU and DMA for PAPR PCI devices

Submitter David Gibson
Date March 9, 2012, 5:01 a.m.
Message ID <1331269308-22372-14-git-send-email-david@gibson.dropbear.id.au>
Permalink /patch/145666/
State New

Comments

David Gibson - March 9, 2012, 5:01 a.m.
Currently the pseries machine emulation does not support DMA for emulated
PCI devices, because the PAPR spec always requires a (guest visible,
paravirtualized) IOMMU which was not implemented.  Now that we have
infrastructure for IOMMU emulation, we can correct this and allow PCI DMA
for pseries.

With the existing PAPR IOMMU code used for VIO devices, this is almost
trivial. We use a single DMAContext for each (virtual) PCI host bridge,
which is the usual configuration on real PAPR machines (which often have
_many_ PCI host bridges).
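The sketch below is a small standalone model of the one-context-per-bridge layout. It is not QEMU code. The two LIOBN base constants, the `base | (domain << 16)` numbering and the 1 GiB window come from the diff. `toy_phb`, `toy_phb_liobn()`, `toy_phb_init()` and `toy_dma_context()` are hypothetical names that stand in for sPAPRPHBState, spapr_phb_init() and spapr_pci_dma_context_fn().

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from the patch; the toy_* helpers are hypothetical. */
#define SPAPR_VIO_BASE_LIOBN 0x00000000u
#define SPAPR_PCI_BASE_LIOBN 0x80000000u
#define SPAPR_PCI_DMA_WINDOW 0x40000000u /* 1 GiB, as in spapr_phb_init() */

/* Toy stand-in for sPAPRPHBState: each PHB owns exactly one DMA context. */
typedef struct {
    int domain;            /* PCI domain number of the host bridge */
    uint32_t liobn;        /* identifies this PHB's TCE table to the guest */
    uint32_t window_size;  /* size of the guest-visible DMA window */
} toy_phb;

/* Same composition as the patch: base | (domain << 16). */
static uint32_t toy_phb_liobn(int domain)
{
    /* Keep the domain below bit 31 so it cannot clash with the base bit. */
    assert(domain >= 0 && domain <= 0x7fff);
    return SPAPR_PCI_BASE_LIOBN | ((uint32_t)domain << 16);
}

static void toy_phb_init(toy_phb *phb, int domain)
{
    phb->domain = domain;
    phb->liobn = toy_phb_liobn(domain);
    phb->window_size = SPAPR_PCI_DMA_WINDOW;
}

/* Like spapr_pci_dma_context_fn(), devfn is ignored: every device behind
 * the bridge shares the bridge's single context. */
static const toy_phb *toy_dma_context(const toy_phb *phb, int devfn)
{
    (void)devfn;
    return phb;
}
```

With this numbering, domain 0 gets LIOBN 0x80000000 and domain 2 gets 0x80020000. Every PCI LIOBN has bit 31 set, so PCI LIOBNs cannot collide with VIO LIOBNs counted up from 0x00000000, assuming the VIO numbers stay below 0x80000000.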

Cc: Alex Graf <agraf@suse.de>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/spapr.h     |    1 +
 hw/spapr_pci.c |   15 +++++++++++++++
 hw/spapr_pci.h |    1 +
 3 files changed, 17 insertions(+), 0 deletions(-)
Paolo Bonzini - March 9, 2012, 10:23 a.m.
On 09/03/2012 06:01, David Gibson wrote:
> Currently the pseries machine emulation does not support DMA for emulated
> PCI devices, because the PAPR spec always requires a (guest visible,
> paravirtualized) IOMMU which was not implemented.  Now that we have
> infrastructure for IOMMU emulation, we can correct this and allow PCI DMA
> for pseries.
> 
> With the existing PAPR IOMMU code used for VIO devices, this is almost
> trivial. We use a single DMAContext for each (virtual) PCI host bridge,
> which is the usual configuration on real PAPR machines (which often have
> _many_ PCI host bridges).

What about virtio?

Paolo
David Gibson - March 9, 2012, 10:58 a.m.
On Fri, Mar 09, 2012 at 11:23:58AM +0100, Paolo Bonzini wrote:
> Il 09/03/2012 06:01, David Gibson ha scritto:
> > Currently the pseries machine emulation does not support DMA for emulated
> > PCI devices, because the PAPR spec always requires a (guest visible,
> > paravirtualized) IOMMU which was not implemented.  Now that we have
> > infrastructure for IOMMU emulation, we can correct this and allow PCI DMA
> > for pseries.
> > 
> > With the existing PAPR IOMMU code used for VIO devices, this is almost
> > trivial. We use a single DMAContext for each (virtual) PCI host bridge,
> > which is the usual configuration on real PAPR machines (which often have
> > _many_ PCI host bridges).
> 
> What about virtio?

virtio doesn't use virtualized PCI DMA, it uses direct hypervisor
access to guest memory, by guest physical address.  It *shouldn't*,
but it does - that's the way it's specced and that's the way the guest
kernel expects it to be.  It could be fixed with a new feature bit,
but that's a project for another day.
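The following toy model contrasts the two paths. It is not QEMU's implementation. It assumes 4 KiB IOMMU pages and read/write permission bits in the low bits of each TCE, which is roughly how QEMU's sPAPR TCE code works. Every `toy_*` name is hypothetical. An emulated PCI device's bus address is looked up in its PHB's guest-programmed TCE table. A virtio ring address is already a guest physical address, and the hypervisor uses it as-is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 4 KiB pages, permission bits in the low bits of a TCE. */
#define TOY_TCE_PAGE_SHIFT 12
#define TOY_TCE_PAGE_SIZE  (1ull << TOY_TCE_PAGE_SHIFT)
#define TOY_TCE_PAGE_MASK  (~(TOY_TCE_PAGE_SIZE - 1))
#define TOY_TCE_READ       0x1ull
#define TOY_TCE_WRITE      0x2ull

typedef struct {
    uint64_t *entries;   /* one TCE per IOMMU page, filled in by the guest */
    size_t nb_entries;   /* window_size >> TOY_TCE_PAGE_SHIFT */
} toy_tce_table;

/* IOMMU path (emulated PCI after this patch): translate a bus address. */
static bool toy_tce_translate(const toy_tce_table *t, uint64_t ioba,
                              bool is_write, uint64_t *gpa)
{
    uint64_t index = ioba >> TOY_TCE_PAGE_SHIFT;
    uint64_t tce;

    assert(t != NULL && gpa != NULL);
    if (index >= t->nb_entries) {
        return false;               /* outside the DMA window */
    }
    tce = t->entries[index];
    if (!(tce & (is_write ? TOY_TCE_WRITE : TOY_TCE_READ))) {
        return false;               /* unmapped, or wrong permission */
    }
    /* Page address from the TCE, offset within the page from the bus address. */
    *gpa = (tce & TOY_TCE_PAGE_MASK) | (ioba & ~TOY_TCE_PAGE_MASK);
    return true;
}

/* virtio path: the ring address already is a guest physical address. */
static uint64_t toy_virtio_gpa(uint64_t ring_addr)
{
    return ring_addr;
}
```

With the 0x40000000 window used in the patch, such a table would need 0x40000000 >> 12 = 262144 entries per PHB.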
Benjamin Herrenschmidt - March 11, 2012, 2:02 a.m.
On Fri, 2012-03-09 at 21:58 +1100, David Gibson wrote:
> > What about virtio?
> 
> virtio doesn't use virtualized PCI DMA, it uses direct hypervisor
> access to guest memory, by guest physical address.  It *shouldn't*,
> but it does - that's the way it's specced and that's the way the guest
> kernel expects it to be.  It could be fixed with a new feature bit,
> but that's a project for another day.

More precisely, the patch doesn't break virtio, as virtio just bypasses
all of this.

Also, having virtio go through the IOMMU might not be such a great idea;
it should definitely remain optional. The ability of virtio to go
straight to guest memory has some significant performance advantages.

Cheers,
Ben.

Patch

diff --git a/hw/spapr.h b/hw/spapr.h
index 210b868..00b15c8 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -328,6 +328,7 @@  typedef struct sPAPRTCE {
 } sPAPRTCE;
 
 #define SPAPR_VIO_BASE_LIOBN    0x00000000
+#define SPAPR_PCI_BASE_LIOBN    0x80000000
 
 void spapr_iommu_init(void);
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c
index 523227b..b53fd40 100644
--- a/hw/spapr_pci.c
+++ b/hw/spapr_pci.c
@@ -201,6 +201,14 @@  static MemoryRegionOps spapr_io_ops = {
 /*
  * PHB PCI device
  */
+static DMAContext *spapr_pci_dma_context_fn(PCIBus *bus, void *opaque,
+                                            int devfn)
+{
+    sPAPRPHBState *phb = opaque;
+
+    return phb->dma;
+}
+
 static int spapr_phb_init(SysBusDevice *s)
 {
     sPAPRPHBState *phb = FROM_SYSBUS(sPAPRPHBState, s);
@@ -208,6 +216,7 @@  static int spapr_phb_init(SysBusDevice *s)
     char namebuf[64];
     int i;
     PCIBus *bus;
+    uint32_t liobn;
 
     sprintf(busname, "pci@%" PRIx64, phb->buid);
 
@@ -248,6 +257,10 @@  static int spapr_phb_init(SysBusDevice *s)
                                                  PCI_DEVFN(0, 0),
                                                  SPAPR_PCI_NUM_LSI);
 
+    liobn = SPAPR_PCI_BASE_LIOBN | (pci_find_domain(bus) << 16);
+    phb->dma = spapr_tce_new_dma_context(liobn, 0x40000000);
+    pci_setup_iommu(bus, spapr_pci_dma_context_fn, phb);
+
     QLIST_INSERT_HEAD(&spapr->phbs, phb, list);
 
     /* Initialize the LSI table */
@@ -400,6 +413,8 @@  int spapr_populate_pci_devices(sPAPRPHBState *phb,
     _FDT(fdt_setprop(fdt, bus_off, "interrupt-map", &interrupt_map,
                      7 * sizeof(interrupt_map[0])));
 
+    spapr_dma_dt(fdt, bus_off, "ibm,dma-window", phb->dma);
+
     return 0;
 }
 
diff --git a/hw/spapr_pci.h b/hw/spapr_pci.h
index b4b8a73..365c75e 100644
--- a/hw/spapr_pci.h
+++ b/hw/spapr_pci.h
@@ -37,6 +37,7 @@  typedef struct sPAPRPHBState {
     MemoryRegion memspace, iospace;
     target_phys_addr_t mem_win_addr, mem_win_size, io_win_addr, io_win_size;
     MemoryRegion memwindow, iowindow;
+    DMAContext *dma;
 
     struct {
         uint32_t dt_irq;