Patchwork [25/26] Add a PAPR TCE-bypass mechanism for the pSeries machine

Submitter David Gibson
Date March 16, 2011, 4:57 a.m.
Message ID <1300251423-6715-26-git-send-email-david@gibson.dropbear.id.au>
Permalink /patch/87171/
State New

Comments

David Gibson - March 16, 2011, 4:57 a.m.
From: Ben Herrenschmidt <benh@kernel.crashing.org>

Usually, PAPR virtual IO devices use a virtual IOMMU mechanism, TCEs,
to mediate all DMA transfers.  While this is necessary for some sorts of
operation, it can be complex to program and slow for others.

This patch implements a mechanism for bypassing TCE translation, treating
"IO" addresses as plain (guest) physical memory addresses.  This has two
main uses:
 * Simple but 64-bit-aware programs, such as firmware, can use the VIO
   devices without the complexity of TCE setup.
 * The guest OS can optionally use the TCE bypass to improve performance in
   suitable situations.

The mechanism used is a per-device flag which disables TCE translation.
The flag is toggled with some (hypervisor-implemented) RTAS methods.

Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
---
 hw/spapr_vio.c |   82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_vio.h |    5 +++
 2 files changed, 87 insertions(+), 0 deletions(-)
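
Illustration only, not part of the patch: a minimal, self-contained sketch
of the per-device bypass flag described above. Every toy_/Toy/TOY_ name,
the 4 KiB page size, the two-entry table and the "valid" bit in bit 0 are
assumptions made for this example; the real TCE lookup in hw/spapr_vio.c
is more involved. With the flag clear, the IO address is looked up in the
table; with it set, the IO address is used directly as a guest physical
address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SHIFT  12
#define TOY_PAGE_SIZE   (1u << TOY_PAGE_SHIFT)
#define TOY_RAM_SIZE    (4u * TOY_PAGE_SIZE)
#define TOY_TCE_ENTRIES 2
#define TOY_TCE_VALID   0x1u  /* assumed: bit 0 marks a usable entry */
#define TOY_DMA_BYPASS  0x1u  /* plays the role of VIO_PAPR_FLAG_DMA_BYPASS */

/* Stand-in for guest physical memory. */
static uint8_t toy_guest_ram[TOY_RAM_SIZE];

typedef struct ToyDmaDevice {
    uint32_t flags;
    uint64_t tce[TOY_TCE_ENTRIES]; /* IO page index -> guest page | valid */
} ToyDmaDevice;

/* Bounds-checked copy into the toy guest RAM. */
static int toy_ram_write(uint64_t paddr, const uint8_t *src, uint32_t size)
{
    if (paddr > TOY_RAM_SIZE || size > TOY_RAM_SIZE - paddr) {
        return -1;
    }
    memcpy(&toy_guest_ram[paddr], src, size);
    return 0;
}

static int toy_dma_write(ToyDmaDevice *dev, uint64_t taddr,
                         const void *buf, uint32_t size)
{
    const uint8_t *src = buf;

    assert(dev != NULL);

    /* Bypass: the IO address is already a guest physical address. */
    if (dev->flags & TOY_DMA_BYPASS) {
        return toy_ram_write(taddr, src, size);
    }

    /* Otherwise translate one page at a time through the table. */
    while (size) {
        uint64_t idx = taddr >> TOY_PAGE_SHIFT;
        uint32_t off = (uint32_t)(taddr & (TOY_PAGE_SIZE - 1));
        uint32_t chunk = TOY_PAGE_SIZE - off;
        uint64_t tce;

        if (chunk > size) {
            chunk = size;
        }
        if (idx >= TOY_TCE_ENTRIES) {
            return -1;
        }
        tce = dev->tce[idx];
        if (!(tce & TOY_TCE_VALID)) {
            return -1;
        }
        if (toy_ram_write((tce & ~(uint64_t)(TOY_PAGE_SIZE - 1)) + off,
                          src, chunk) != 0) {
            return -1;
        }
        src += chunk;
        taddr += chunk;
        size -= chunk;
    }
    return 0;
}

/* Returns 0 if both the translated and the bypassed write behave as expected. */
static int toy_dma_demo(void)
{
    ToyDmaDevice dev;

    memset(&dev, 0, sizeof(dev));
    memset(toy_guest_ram, 0, sizeof(toy_guest_ram));
    dev.tce[0] = (3u * TOY_PAGE_SIZE) | TOY_TCE_VALID;

    /* Translated: IO address 0x10 lands in guest page 3. */
    if (toy_dma_write(&dev, 0x10, "abcd", 4) != 0 ||
        memcmp(&toy_guest_ram[3 * TOY_PAGE_SIZE + 0x10], "abcd", 4) != 0 ||
        toy_guest_ram[0x10] != 0) {
        return -1;
    }

    /* Bypassed: the same IO address now lands at guest physical 0x10. */
    dev.flags |= TOY_DMA_BYPASS;
    if (toy_dma_write(&dev, 0x10, "wxyz", 4) != 0 ||
        memcmp(&toy_guest_ram[0x10], "wxyz", 4) != 0) {
        return -1;
    }
    return 0;
}
```

Calling toy_dma_demo() from a test harness exercises both paths. The point
of the sketch is only that a caller with no table set up, such as
firmware, can still do DMA once the flag is set.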
Alexander Graf - March 16, 2011, 4:43 p.m.
On 03/16/2011 05:57 AM, David Gibson wrote:
> From: Ben Herrenschmidt <benh@kernel.crashing.org>
>
> Usually, PAPR virtual IO devices use a virtual IOMMU mechanism, TCEs,
> to mediate all DMA transfers.  While this is necessary for some sorts of
> operation, it can be complex to program and slow for others.
>
> This patch implements a mechanism for bypassing TCE translation, treating
> "IO" addresses as plain (guest) physical memory addresses.  This has two
> main uses:
>   * Simple, but 64-bit aware programs like firmwares can use the VIO devices
> without the complexity of TCE setup.
>   * The guest OS can optionally use the TCE bypass to improve performance in
> suitable situations.
>
> The mechanism used is a per-device flag which disables TCE translation.
> The flag is toggled with some (hypervisor-implemented) RTAS methods.

Is this an official extension used by anyone or is it your own invention 
that's not implemented in pHyp?


Alex
David Gibson - March 17, 2011, 2:21 a.m.
On Wed, Mar 16, 2011 at 05:43:55PM +0100, Alexander Graf wrote:
> On 03/16/2011 05:57 AM, David Gibson wrote:
> >From: Ben Herrenschmidt <benh@kernel.crashing.org>
> >
> >Usually, PAPR virtual IO devices use a virtual IOMMU mechanism, TCEs,
> >to mediate all DMA transfers.  While this is necessary for some sorts of
> >operation, it can be complex to program and slow for others.
> >
> >This patch implements a mechanism for bypassing TCE translation, treating
> >"IO" addresses as plain (guest) physical memory addresses.  This has two
> >main uses:
> >  * Simple, but 64-bit aware programs like firmwares can use the VIO devices
> >without the complexity of TCE setup.
> >  * The guest OS can optionally use the TCE bypass to improve performance in
> >suitable situations.
> >
> >The mechanism used is a per-device flag which disables TCE translation.
> >The flag is toggled with some (hypervisor-implemented) RTAS methods.
> 
> Is this an official extension used by anyone or is it your own
> invention that's not implemented in pHyp?

The latter.
Benjamin Herrenschmidt - March 17, 2011, 3:25 a.m.
On Thu, 2011-03-17 at 13:21 +1100, David Gibson wrote:
> > Is this an official extension used by anyone or is it your own
> > invention that's not implemented in pHyp?
> 
> The latter.

The main reason is to avoid having to deal with TCEs in SLOF :-)

Cheers,
Ben.
Alexander Graf - March 17, 2011, 7:44 a.m.
On 17.03.2011, at 04:25, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Thu, 2011-03-17 at 13:21 +1100, David Gibson wrote:
>>> Is this an official extension used by anyone or is it your own
>>> invention that's not implemented in pHyp?
>> 
>> The latter.
> 
> The main reason is to avoid having to deal with TCEs in SLOF :-)

That makes sense :). Let's move this patch to later when you introduce SLOF support then? As it is, it would be unused code.


Alex

Benjamin Herrenschmidt - March 17, 2011, 8:44 a.m.
On Thu, 2011-03-17 at 08:44 +0100, Alexander Graf wrote:
> On 17.03.2011, at 04:25, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Thu, 2011-03-17 at 13:21 +1100, David Gibson wrote:
> >>> Is this an official extension used by anyone or is it your own
> >>> invention that's not implemented in pHyp?
> >> 
> >> The latter.
> > 
> > The main reason is to avoid having to deal with TCEs in SLOF :-)
> 
> That makes sense :). Let's move this patch to later when you introduce SLOF
> support then? As it is, it would be unused code.

Well, SLOF is around the corner, I just need to find out where to put
the git repo :-)

Cheers,
Ben.

Alexander Graf - March 17, 2011, 9:37 a.m.
On 03/17/2011 09:44 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2011-03-17 at 08:44 +0100, Alexander Graf wrote:
>> On 17.03.2011, at 04:25, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>
>>> On Thu, 2011-03-17 at 13:21 +1100, David Gibson wrote:
>>>>> Is this an official extension used by anyone or is it your own
>>>>> invention that's not implemented in pHyp?
>>>> The latter.
>>> The main reason is to avoid having to deal with TCEs in SLOF :-)
>> That makes sense :). Let's move this patch to later when you introduce SLOF
>> support then? As it is, it would be unused code.
> Well, SLOF is around the corner, I just need to find out where to put
> the git repo :-)

Include it in v4 then :)


Alex

Patch

diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index 96668f3..280f34a 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -224,6 +224,12 @@  int spapr_tce_dma_write(VIOsPAPRDevice *dev, uint64_t taddr, const void *buf,
             (unsigned long long)taddr, size);
 #endif
 
+    /* Check for bypass */
+    if (dev->flags & VIO_PAPR_FLAG_DMA_BYPASS) {
+        cpu_physical_memory_write(taddr, buf, size);
+        return 0;
+    }
+
     while(size) {
         uint64_t tce;
         uint32_t lsize;
@@ -308,6 +314,12 @@  int spapr_tce_dma_read(VIOsPAPRDevice *dev, uint64_t taddr, void *buf,
             (unsigned long long)taddr, size);
 #endif
 
+    /* Check for bypass */
+    if (dev->flags & VIO_PAPR_FLAG_DMA_BYPASS) {
+        cpu_physical_memory_read(taddr, buf, size);
+        return 0;
+    }
+
     while(size) {
         uint64_t tce;
         uint32_t lsize;
@@ -505,6 +517,72 @@  int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq)
     return 0;
 }
 
+/* "quiesce" handling */
+
+static void spapr_vio_quiesce_one(VIOsPAPRDevice *dev)
+{
+    dev->flags &= ~VIO_PAPR_FLAG_DMA_BYPASS;
+
+    if (dev->rtce_table) {
+        size_t size = (dev->rtce_window_size >> SPAPR_VIO_TCE_PAGE_SHIFT)
+            * sizeof(VIOsPAPR_RTCE);
+        memset(dev->rtce_table, 0, size);
+    }
+
+    dev->crq.qladdr = 0;
+    dev->crq.qsize = 0;
+    dev->crq.qnext = 0;
+}
+
+static void rtas_set_tce_bypass(sPAPREnvironment *spapr, uint32_t token,
+                                uint32_t nargs, target_ulong args,
+                                uint32_t nret, target_ulong rets)
+{
+    VIOsPAPRBus *bus = spapr->vio_bus;
+    VIOsPAPRDevice *dev;
+    uint32_t unit, enable;
+
+    if (nargs != 2) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+    unit = rtas_ld(args, 0);
+    enable = rtas_ld(args, 1);
+    dev = spapr_vio_find_by_reg(bus, unit);
+    if (!dev) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+    if (enable) {
+        dev->flags |= VIO_PAPR_FLAG_DMA_BYPASS;
+    } else {
+        dev->flags &= ~VIO_PAPR_FLAG_DMA_BYPASS;
+    }
+
+    rtas_st(rets, 0, 0);
+}
+
+static void rtas_quiesce(sPAPREnvironment *spapr, uint32_t token,
+                         uint32_t nargs, target_ulong args,
+                         uint32_t nret, target_ulong rets)
+{
+    VIOsPAPRBus *bus = spapr->vio_bus;
+    DeviceState *qdev;
+    VIOsPAPRDevice *dev = NULL;
+
+    if (nargs != 0) {
+        rtas_st(rets, 0, -3);
+        return;
+    }
+
+    QLIST_FOREACH(qdev, &bus->bus.children, sibling) {
+        dev = (VIOsPAPRDevice *)qdev;
+        spapr_vio_quiesce_one(dev);
+    }
+
+    rtas_st(rets, 0, 0);
+}
+
 static int spapr_vio_busdev_init(DeviceState *dev, DeviceInfo *info)
 {
     VIOsPAPRDeviceInfo *_info = (VIOsPAPRDeviceInfo *)info;
@@ -581,6 +659,10 @@  VIOsPAPRBus *spapr_vio_bus_init(void)
     spapr_register_hypercall(H_SEND_CRQ, h_send_crq);
     spapr_register_hypercall(H_ENABLE_CRQ, h_enable_crq);
 
+    /* RTAS calls */
+    spapr_rtas_register("ibm,set-tce-bypass", rtas_set_tce_bypass);
+    spapr_rtas_register("quiesce", rtas_quiesce);
+
     for (_info = device_info_list; _info; _info = _info->next) {
         VIOsPAPRDeviceInfo *info = (VIOsPAPRDeviceInfo *)_info;
 
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index b7d0daa..841b043 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -48,6 +48,8 @@  typedef struct VIOsPAPR_CRQ {
 typedef struct VIOsPAPRDevice {
     DeviceState qdev;
     uint32_t reg;
+    uint32_t flags;
+#define VIO_PAPR_FLAG_DMA_BYPASS        0x1
     qemu_irq qirq;
     uint32_t vio_irq_num;
     target_ulong signal_state;
@@ -104,4 +106,7 @@  void spapr_vlan_create(VIOsPAPRBus *bus, uint32_t reg, NICInfo *nd,
 void spapr_vscsi_create(VIOsPAPRBus *bus, uint32_t reg,
                         qemu_irq qirq, uint32_t vio_irq_num);
 
+int spapr_tce_set_bypass(uint32_t unit, uint32_t enable);
+void spapr_vio_quiesce(void);
+
 #endif /* _HW_SPAPR_VIO_H */
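
Illustration only, not part of the patch: the two RTAS handlers added
above can be modelled outside QEMU roughly as follows. All toy_/Toy/TOY_
names, the array-backed bus and the plain argument array are assumptions
made for this sketch; the real handlers read their arguments from guest
memory with rtas_ld() and store the status with rtas_st(). The value -3
mirrors what the patch stores for bad arguments; reading it as the RTAS
"parameter error" status is my understanding of PAPR, not something the
patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_VIO_FLAG_DMA_BYPASS 0x1u
#define TOY_RTAS_OK             0
#define TOY_RTAS_PARAM_ERROR    (-3)
#define TOY_RTCE_ENTRIES        4

typedef struct ToyVioDevice {
    uint32_t reg;                          /* unit address used for lookup */
    uint32_t flags;
    uint64_t rtce_table[TOY_RTCE_ENTRIES]; /* stand-in for dev->rtce_table */
    uint64_t crq_addr;                     /* stand-ins for the dev->crq fields */
    uint32_t crq_size;
    uint32_t crq_next;
} ToyVioDevice;

typedef struct ToyVioBus {
    ToyVioDevice *devs;
    size_t ndevs;
} ToyVioBus;

static ToyVioDevice *toy_find_by_reg(ToyVioBus *bus, uint32_t reg)
{
    for (size_t i = 0; i < bus->ndevs; i++) {
        if (bus->devs[i].reg == reg) {
            return &bus->devs[i];
        }
    }
    return NULL;
}

/* Model of "ibm,set-tce-bypass": args[0] = unit, args[1] = enable. */
static int32_t toy_rtas_set_tce_bypass(ToyVioBus *bus, uint32_t nargs,
                                       const uint32_t *args)
{
    ToyVioDevice *dev;

    if (nargs != 2) {
        return TOY_RTAS_PARAM_ERROR;
    }
    dev = toy_find_by_reg(bus, args[0]);
    if (!dev) {
        return TOY_RTAS_PARAM_ERROR;
    }
    if (args[1]) {
        dev->flags |= TOY_VIO_FLAG_DMA_BYPASS;
    } else {
        dev->flags &= ~TOY_VIO_FLAG_DMA_BYPASS;
    }
    return TOY_RTAS_OK;
}

/* Model of "quiesce": drop bypass, clear the table, reset the queue. */
static int32_t toy_rtas_quiesce(ToyVioBus *bus, uint32_t nargs)
{
    if (nargs != 0) {
        return TOY_RTAS_PARAM_ERROR;
    }
    for (size_t i = 0; i < bus->ndevs; i++) {
        ToyVioDevice *dev = &bus->devs[i];

        dev->flags &= ~TOY_VIO_FLAG_DMA_BYPASS;
        memset(dev->rtce_table, 0, sizeof(dev->rtce_table));
        dev->crq_addr = 0;
        dev->crq_size = 0;
        dev->crq_next = 0;
    }
    return TOY_RTAS_OK;
}

/* Returns 0 if enabling bypass and then quiescing behaves as expected. */
static int toy_rtas_demo(void)
{
    ToyVioDevice devs[2];
    ToyVioBus bus = { devs, 2 };
    uint32_t args[2] = { 0x2000, 1 };

    memset(devs, 0, sizeof(devs));
    devs[0].reg = 0x1000;
    devs[1].reg = 0x2000;
    devs[1].rtce_table[0] = 0x3001;

    if (toy_rtas_set_tce_bypass(&bus, 2, args) != TOY_RTAS_OK ||
        !(devs[1].flags & TOY_VIO_FLAG_DMA_BYPASS)) {
        return -1;
    }
    if (toy_rtas_quiesce(&bus, 0) != TOY_RTAS_OK ||
        devs[1].flags != 0 || devs[1].rtce_table[0] != 0) {
        return -1;
    }
    return 0;
}
```

The model keeps the property that quiesce always leaves every device with
translation enabled and an empty table, so a guest OS taking over from
firmware starts from a known state.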