pci: implement bridge filtering

Submitter Michael S. Tsirkin
Date Sept. 4, 2011, 6:13 p.m.
Message ID <20110904181313.GA14020@redhat.com>
Permalink /patch/113293/
State New

Comments

Michael S. Tsirkin - Sept. 4, 2011, 6:13 p.m.
Support bridge filtering on top of the memory
API as suggested by Avi Kivity:

Create a memory region for the bridge's address space.  This region is
not directly added to system_memory or its descendants.  Devices behind
the bridge see this region as their pci_address_space().  The region is
as large as the entire address space; it does not take the bridge
windows into account.

For each of the three windows (pref, non-pref, vga), create an alias
with the appropriate start and size.  Map each alias into the parent
bus's pci_address_space() as a subregion.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

The patch below seems to work fine for me, so I applied it.
Still need to test bridge filtering, any help with this
appreciated.

 hw/pci.c           |   70 +---------------------------------------
 hw/pci.h           |    2 -
 hw/pci_bridge.c    |   89 +++++++++++++++++++++++++++++++++++++++++++++++++---
 hw/pci_internals.h |    3 ++
 4 files changed, 89 insertions(+), 75 deletions(-)
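The windowing scheme in the patch description can be modelled in a few lines. The sketch below is a simplified, hypothetical model, not QEMU code: each bridge window is an inclusive base/limit pair, the alias size is computed the same way as in pci_bridge_init_alias(), and an address reaches a device behind nested bridges only if every bridge on the path forwards it. That intersection is what the removed pci_bridge_filter() computed explicitly (leaving aside its command-register checks); with one alias per bridge mapped into the parent bus's space, it falls out of the region hierarchy. The BridgeWindow type and the helper names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one bridge window: inclusive base/limit addresses
 * as programmed in the bridge's config space. */
typedef struct {
    uint64_t base;
    uint64_t limit;
} BridgeWindow;

/* Alias size, mirroring pci_bridge_init_alias(): an empty window
 * (limit < base) gives a zero-sized alias. As in the patch's TODO,
 * base = 0 with limit = 2^64 - 1 wraps around to 0. */
static uint64_t window_alias_size(BridgeWindow w)
{
    return w.limit >= w.base ? w.limit + 1 - w.base : 0;
}

/* The alias covers [base, base + size) in the parent and maps it 1:1
 * into the secondary bus's address space. */
static bool window_forwards(BridgeWindow w, uint64_t addr)
{
    uint64_t size = window_alias_size(w);
    return size != 0 && addr >= w.base && addr - w.base < size;
}

/* chain[0] is the bridge closest to the root; an access reaches the
 * innermost bus only if every window on the path forwards it. */
static bool chain_forwards(const BridgeWindow *chain, int depth, uint64_t addr)
{
    for (int i = 0; i < depth; i++) {
        if (!window_forwards(chain[i], addr)) {
            return false;
        }
    }
    return true;
}
```

For example, with an outer window of 0xe0000000-0xefffffff and an inner one of 0xe8000000-0xe80fffff, only addresses inside the inner window reach devices behind both bridges.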
Wen Congyang - Sept. 14, 2011, 1:48 a.m.
At 09/05/2011 02:13 AM, Michael S. Tsirkin wrote:
> Still need to test bridge filtering, any help with this
> appreciated.


I tested bridge filtering, and the BAR is still visible in the guest even
after I change the bridge's memory region.

Thanks
Wen Congyang
Wen Congyang - Sept. 20, 2011, 8:09 a.m.
Hi Michael S. Tsirkin:
I tested PCI bridge filtering on real hardware, and found that I can still
mmap the resource after changing the bridge's memory base and memory limit
(the BAR should not be visible to the OS after the memory region is changed).

So I tried writing to and reading from the BAR. Here are my test results:
1. Before changing the PCI bridge's memory region, I can read and write the memory,
   and reads return the value I wrote.

2. After changing the PCI bridge's memory region, I can still read and write the memory,
   but it is very slow, and reads do not return the value I wrote (the value is always 0).

Does this result mean that PCI bridge filtering works correctly?

Thanks
Wen Congyang
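A user-space version of the write/read-back test described above might look like the sketch below. It rests on assumptions: the thread does not say how the resource was mapped, so mapping a Linux sysfs resourceN file with mmap() is an assumption, and map_bar() and write_readback() are hypothetical helper names. It needs root and real hardware to say anything about filtering.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first len bytes of a PCI resource file read/write, e.g. an
 * assumed path like /sys/bus/pci/devices/<domain:bus:dev.fn>/resource0.
 * Returns NULL on failure. */
static volatile uint32_t *map_bar(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Write value to 32-bit word idx and read it back. With the bridge window
 * covering the BAR, the device should return value; once the BAR is
 * filtered out, the access no longer reaches the device. */
static uint32_t write_readback(volatile uint32_t *bar, size_t idx, uint32_t value)
{
    bar[idx] = value;
    return bar[idx];
}
```

Running write_readback() once before and once after reprogramming the bridge's memory base/limit reproduces the two cases in the list above.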
Avi Kivity - Sept. 20, 2011, 11:44 a.m.
On 09/20/2011 11:09 AM, Wen Congyang wrote:
> Does this result mean that PCI bridge filtering works correctly?
>

Yes.  Instead of hitting the BAR, you hit the default mmio handler.
Michael S. Tsirkin - Sept. 20, 2011, 11:55 a.m.
On Tue, Sep 20, 2011 at 04:09:23PM +0800, Wen Congyang wrote:
> So I tried writing to and reading from the BAR. Here are my test results:
> 1. Before changing the PCI bridge's memory region, I can read and write the memory,
>    and reads return the value I wrote.
> 
> 2. After changing the PCI bridge's memory region, I can still read and write the memory,
>    but it is very slow, and reads do not return the value I wrote (the value is always 0).
> 
> Does this result mean that PCI bridge filtering works correctly?

Sounds more or less right, except that I would expect to read ffffffff,
not 0. Avi, any idea?
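For reference: on PCI hardware, a read that no device claims typically ends in a master abort and returns all ones, which for a 4-byte access is ffffffff. A minimal sketch of that value as a function of the access size (the helper name is hypothetical):

```c
#include <stdint.h>

/* Data returned by a read that ends in a master abort: all ones,
 * truncated to the access size in bytes (1, 2, 4 or 8). */
static uint64_t master_abort_read_value(unsigned size)
{
    return size >= 8 ? UINT64_MAX : (UINT64_C(1) << (size * 8)) - 1;
}
```

A 4-byte read of a filtered-out BAR would then yield 0xffffffff rather than the 0 that Wen observed.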
Avi Kivity - Sept. 20, 2011, 12:12 p.m.
On 09/20/2011 02:55 PM, Michael S. Tsirkin wrote:
> Sounds more or less right, except that I would expect to read ffffffff,
> not 0. Avi, any idea?
>

No, what does the default handler do?
Michael S. Tsirkin - Sept. 20, 2011, 12:22 p.m.
On Tue, Sep 20, 2011 at 02:44:26PM +0300, Avi Kivity wrote:
> Yes.  Instead of hitting the BAR, you hit the default mmio handler.

Hmm, I'm not sure what the right behavior is in that case.
But isn't it the same when the BAR is disabled? It would be nice
to have some handler in the bridge get called, so that it can set
the master abort flag etc.
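One way to picture such a handler: a hypothetical toy bridge whose unclaimed-access path sets the Received Master Abort bit (bit 13, 0x2000) in the type 1 header's Secondary Status register at offset 0x1e, a write-1-to-clear status bit. Which status register QEMU should update, and when, is an assumption of this sketch rather than something settled in the thread; ToyBridge and the toy_* helpers are invented names.

```c
#include <stdint.h>

/* Type 1 header: Secondary Status register and its Received Master Abort bit. */
#define SEC_STATUS_OFFSET       0x1e
#define STATUS_REC_MASTER_ABORT 0x2000

/* Hypothetical toy bridge: only its 256-byte configuration space. */
typedef struct {
    uint8_t config[256];
} ToyBridge;

/* Little-endian 16-bit config-space accessors. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static void set_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Called for an access that nothing behind the bridge claims:
 * latch the master abort in Secondary Status. */
static void toy_record_master_abort(ToyBridge *br)
{
    uint8_t *st = br->config + SEC_STATUS_OFFSET;
    set_le16(st, (uint16_t)(get_le16(st) | STATUS_REC_MASTER_ABORT));
}

/* Guest write to Secondary Status: error bits are write-1-to-clear,
 * so writing 0 leaves the flag set and writing 1 clears it. */
static void toy_sec_status_write(ToyBridge *br, uint16_t val)
{
    uint8_t *st = br->config + SEC_STATUS_OFFSET;
    set_le16(st, (uint16_t)(get_le16(st) & ~(val & STATUS_REC_MASTER_ABORT)));
}
```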

Avi Kivity - Sept. 20, 2011, 12:26 p.m.
On 09/20/2011 03:22 PM, Michael S. Tsirkin wrote:
> >
> >  Yes.  Instead of hitting the BAR, you hit the default mmio handler.
>
> Hmm, I'm not sure what the right behavior is in that case.
> But isn't it the same when the BAR is disabled? It would be nice
> to have some handler in the bridge get called, so that it can set
> the master abort flag etc.

Put an mmio region spanning the entire pci address range as a child of 
the pci address space.  Make sure it has lower priority than any of the 
BARs (or vga areas).
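This suggestion relies on overlapping regions with priorities: the catch-all region spans the whole PCI address range, and any BAR or alias with a higher priority wins where they overlap. The toy dispatcher below illustrates only that priority rule. It is a hypothetical model, not QEMU's MemoryRegion implementation, and the region names and priority values (-1 for the catch-all, 1 for a BAR) are illustrative. Because a region covering all 2^64 addresses has a size that does not fit in 64 bits (the same limitation noted in the patch's TODO), the model uses size 0 to mean "to the end of the address space".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat model of one PCI address space. */
typedef struct {
    const char *name;
    uint64_t base;
    uint64_t size;   /* 0 means "spans to the end of the 64-bit space" */
    int priority;
} ToyRegion;

static int toy_contains(const ToyRegion *r, uint64_t addr)
{
    return addr >= r->base && (r->size == 0 || addr - r->base < r->size);
}

/* Return the highest-priority region containing addr, or NULL if none.
 * On equal priorities this model keeps the first match. */
static const ToyRegion *toy_dispatch(const ToyRegion *regions, size_t n,
                                     uint64_t addr)
{
    const ToyRegion *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (toy_contains(&regions[i], addr) &&
            (best == NULL || regions[i].priority > best->priority)) {
            best = &regions[i];
        }
    }
    return best;
}
```

With the catch-all in place, an access that misses every BAR resolves to the catch-all, whose handler could then return all ones and record the master abort.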

Patch

diff --git a/hw/pci.c b/hw/pci.c
index 57ff7b1..56dfa18 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -889,7 +889,6 @@  void pci_register_bar(PCIDevice *pci_dev, int region_num,
     r = &pci_dev->io_regions[region_num];
     r->addr = PCI_BAR_UNMAPPED;
     r->size = size;
-    r->filtered_size = size;
     r->type = type;
     r->memory = NULL;
 
@@ -920,41 +919,6 @@  pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
     return pci_dev->io_regions[region_num].addr;
 }
 
-static void pci_bridge_filter(PCIDevice *d, pcibus_t *addr, pcibus_t *size,
-                              uint8_t type)
-{
-    pcibus_t base = *addr;
-    pcibus_t limit = *addr + *size - 1;
-    PCIDevice *br;
-
-    for (br = d->bus->parent_dev; br; br = br->bus->parent_dev) {
-        uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
-
-        if (type & PCI_BASE_ADDRESS_SPACE_IO) {
-            if (!(cmd & PCI_COMMAND_IO)) {
-                goto no_map;
-            }
-        } else {
-            if (!(cmd & PCI_COMMAND_MEMORY)) {
-                goto no_map;
-            }
-        }
-
-        base = MAX(base, pci_bridge_get_base(br, type));
-        limit = MIN(limit, pci_bridge_get_limit(br, type));
-    }
-
-    if (base > limit) {
-        goto no_map;
-    }
-    *addr = base;
-    *size = limit - base + 1;
-    return;
-no_map:
-    *addr = PCI_BAR_UNMAPPED;
-    *size = 0;
-}
-
 static pcibus_t pci_bar_address(PCIDevice *d,
 				int reg, uint8_t type, pcibus_t size)
 {
@@ -1024,7 +988,7 @@  static void pci_update_mappings(PCIDevice *d)
 {
     PCIIORegion *r;
     int i;
-    pcibus_t new_addr, filtered_size;
+    pcibus_t new_addr;
 
     for(i = 0; i < PCI_NUM_REGIONS; i++) {
         r = &d->io_regions[i];
@@ -1035,14 +999,8 @@  static void pci_update_mappings(PCIDevice *d)
 
         new_addr = pci_bar_address(d, i, r->type, r->size);
 
-        /* bridge filtering */
-        filtered_size = r->size;
-        if (new_addr != PCI_BAR_UNMAPPED) {
-            pci_bridge_filter(d, &new_addr, &filtered_size, r->type);
-        }
-
         /* This bar isn't changed */
-        if (new_addr == r->addr && filtered_size == r->filtered_size)
+        if (new_addr == r->addr)
             continue;
 
         /* now do the real mapping */
@@ -1050,15 +1008,7 @@  static void pci_update_mappings(PCIDevice *d)
             memory_region_del_subregion(r->address_space, r->memory);
         }
         r->addr = new_addr;
-        r->filtered_size = filtered_size;
         if (r->addr != PCI_BAR_UNMAPPED) {
-            /*
-             * TODO: currently almost all the map funcions assumes
-             * filtered_size == size and addr & ~(size - 1) == addr.
-             * However with bridge filtering, they aren't always true.
-             * Teach them such cases, such that filtered_size < size and
-             * addr & (size - 1) != 0.
-             */
             if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
                 memory_region_add_subregion_overlap(r->address_space,
                                                     r->addr,
@@ -1576,22 +1526,6 @@  PCIDevice *pci_nic_init_nofail(NICInfo *nd, const char *default_model,
     return res;
 }
 
-static void pci_bridge_update_mappings_fn(PCIBus *b, PCIDevice *d)
-{
-    pci_update_mappings(d);
-}
-
-void pci_bridge_update_mappings(PCIBus *b)
-{
-    PCIBus *child;
-
-    pci_for_each_device_under_bus(b, pci_bridge_update_mappings_fn);
-
-    QLIST_FOREACH(child, &b->child, sibling) {
-        pci_bridge_update_mappings(child);
-    }
-}
-
 /* Whether a given bus number is in range of the secondary
  * bus of the given bridge device. */
 static bool pci_secondary_bus_in_range(PCIDevice *dev, int bus_num)
diff --git a/hw/pci.h b/hw/pci.h
index 391217e..65e1568 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -90,7 +90,6 @@  typedef struct PCIIORegion {
     pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
 #define PCI_BAR_UNMAPPED (~(pcibus_t)0)
     pcibus_t size;
-    pcibus_t filtered_size;
     uint8_t type;
     MemoryRegion *memory;
     MemoryRegion *address_space;
@@ -277,7 +276,6 @@  int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, int *busp,
 
 void do_pci_info_print(Monitor *mon, const QObject *data);
 void do_pci_info(Monitor *mon, QObject **ret_data);
-void pci_bridge_update_mappings(PCIBus *b);
 
 void pci_device_deassert_intx(PCIDevice *dev);
 
diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index e0b339e..b488f06 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -135,6 +135,72 @@  pcibus_t pci_bridge_get_limit(const PCIDevice *bridge, uint8_t type)
     return limit;
 }
 
+static void pci_bridge_init_alias(PCIBridge *bridge, MemoryRegion *alias,
+                                  uint8_t type, const char *name,
+                                  MemoryRegion *space,
+                                  MemoryRegion *parent_space)
+{
+    pcibus_t base = pci_bridge_get_base(&bridge->dev, type);
+    pcibus_t limit = pci_bridge_get_limit(&bridge->dev, type);
+    /* TODO: this doesn't handle base = 0 limit = 2^64 - 1 correctly.
+     * Apparently no way to do this with existing memory APIs. */
+    pcibus_t size = limit >= base ? limit + 1 - base : 0;
+
+    memory_region_init_alias(alias, name, space, base, size);
+    memory_region_add_subregion_overlap(parent_space, base, alias, 1);
+}
+
+static void pci_bridge_cleanup_alias(MemoryRegion *alias,
+                                     MemoryRegion *parent_space)
+{
+    memory_region_del_subregion(parent_space, alias);
+    memory_region_destroy(alias);
+}
+
+static void pci_bridge_region_init(PCIBridge *br)
+{
+    PCIBus *sec_bus = &br->sec_bus;
+    PCIBus *parent = br->dev.bus;
+    pci_bridge_init_alias(br, sec_bus->alias_pref_mem,
+                          PCI_BASE_ADDRESS_MEM_PREFETCH,
+                          "pci_bridge_pref_mem",
+                          sec_bus->address_space_mem,
+                          parent->address_space_mem);
+    pci_bridge_init_alias(br, sec_bus->alias_mem,
+                          PCI_BASE_ADDRESS_SPACE_MEMORY,
+                          "pci_bridge_mem",
+                          sec_bus->address_space_mem,
+                          parent->address_space_mem);
+    pci_bridge_init_alias(br, sec_bus->alias_io,
+                          PCI_BASE_ADDRESS_SPACE_IO,
+                          "pci_bridge_io",
+                          sec_bus->address_space_io,
+                          parent->address_space_io);
+    /* TODO: VGA and VGA palette snooping support. */
+}
+
+static void pci_bridge_region_cleanup(PCIBridge *br)
+{
+    PCIBus *sec_bus = &br->sec_bus;
+    PCIBus *parent = br->dev.bus;
+    pci_bridge_cleanup_alias(sec_bus->alias_io,
+                             parent->address_space_io);
+    pci_bridge_cleanup_alias(sec_bus->alias_mem,
+                             parent->address_space_mem);
+    pci_bridge_cleanup_alias(sec_bus->alias_pref_mem,
+                             parent->address_space_mem);
+}
+
+static void pci_bridge_update_mappings(PCIBridge *br)
+{
+    /* Make updates atomic to handle the case of one VCPU updating the bridge
+     * while another accesses an unaffected region. */
+    memory_region_transaction_begin();
+    pci_bridge_region_cleanup(br);
+    pci_bridge_region_init(br);
+    memory_region_transaction_commit();
+}
+
 /* default write_config function for PCI-to-PCI bridge */
 void pci_bridge_write_config(PCIDevice *d,
                              uint32_t address, uint32_t val, int len)
@@ -151,7 +217,7 @@  void pci_bridge_write_config(PCIDevice *d,
         /* memory base/limit, prefetchable base/limit and
            io base/limit upper 16 */
         ranges_overlap(address, len, PCI_MEMORY_BASE, 20)) {
-        pci_bridge_update_mappings(&s->sec_bus);
+        pci_bridge_update_mappings(s);
     }
 
     newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
@@ -246,10 +312,14 @@  int pci_bridge_initfn(PCIDevice *dev)
                         br->bus_name);
     sec_bus->parent_dev = dev;
     sec_bus->map_irq = br->map_irq;
-    /* TODO: use memory API to perform memory filtering. */
-    sec_bus->address_space_mem = parent->address_space_mem;
-    sec_bus->address_space_io = parent->address_space_io;
-
+    sec_bus->address_space_mem = g_new(MemoryRegion, 1);
+    memory_region_init(sec_bus->address_space_mem, "pci_bridge_pci", INT64_MAX);
+    sec_bus->address_space_io = g_new(MemoryRegion, 1);
+    memory_region_init(sec_bus->address_space_io, "pci_bridge_io", 65536);
+    sec_bus->alias_pref_mem = g_new(MemoryRegion, 1);
+    sec_bus->alias_mem = g_new(MemoryRegion, 1);
+    sec_bus->alias_io = g_new(MemoryRegion, 1);
+    pci_bridge_region_init(br);
     QLIST_INIT(&sec_bus->child);
     QLIST_INSERT_HEAD(&parent->child, sec_bus, sibling);
     return 0;
@@ -259,8 +329,17 @@  int pci_bridge_initfn(PCIDevice *dev)
 int pci_bridge_exitfn(PCIDevice *pci_dev)
 {
     PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev);
+    PCIBus *sec_bus = &s->sec_bus;
     assert(QLIST_EMPTY(&s->sec_bus.child));
     QLIST_REMOVE(&s->sec_bus, sibling);
+    pci_bridge_region_cleanup(s);
+    g_free(sec_bus->alias_pref_mem);
+    g_free(sec_bus->alias_mem);
+    g_free(sec_bus->alias_io);
+    memory_region_destroy(sec_bus->address_space_mem);
+    g_free(sec_bus->address_space_mem);
+    memory_region_destroy(sec_bus->address_space_io);
+    g_free(sec_bus->address_space_io);
     /* qbus_free() is called automatically by qdev_free() */
     return 0;
 }
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index c7fd23d..578c8d2 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -27,6 +27,9 @@  struct PCIBus {
     target_phys_addr_t mem_base;
     MemoryRegion *address_space_mem;
     MemoryRegion *address_space_io;
+    MemoryRegion *alias_pref_mem;
+    MemoryRegion *alias_mem;
+    MemoryRegion *alias_io;
 
     QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
     QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
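The diff relies on pci_bridge_get_base() and pci_bridge_get_limit(), whose bodies are not shown. For reference, here is a standalone sketch of how a type 1 header's base/limit registers decode into windows per the PCI-to-PCI bridge register layout (the same 0x20-0x33 range that pci_bridge_write_config() watches). It is not QEMU's implementation; the macro and helper names are hypothetical.

```c
#include <stdint.h>

/* Type 1 (PCI-to-PCI bridge) configuration header offsets. */
#define IO_BASE            0x1c
#define IO_LIMIT           0x1d
#define MEMORY_BASE        0x20
#define MEMORY_LIMIT       0x22
#define PREF_MEMORY_BASE   0x24
#define PREF_MEMORY_LIMIT  0x26
#define PREF_BASE_UPPER32  0x28
#define PREF_LIMIT_UPPER32 0x2c
#define IO_BASE_UPPER16    0x30
#define IO_LIMIT_UPPER16   0x32

/* Little-endian config-space accessors. */
static uint32_t rd16(const uint8_t *c, int off)
{
    return (uint32_t)c[off] | ((uint32_t)c[off + 1] << 8);
}

static uint32_t rd32(const uint8_t *c, int off)
{
    return rd16(c, off) | (rd16(c, off + 2) << 16);
}

/* Memory window: bits 15:4 hold address bits 31:20 (1 MiB granularity). */
static uint64_t mem_base(const uint8_t *c)
{
    return (uint64_t)(rd16(c, MEMORY_BASE) & 0xfff0) << 16;
}

static uint64_t mem_limit(const uint8_t *c)
{
    return ((uint64_t)(rd16(c, MEMORY_LIMIT) & 0xfff0) << 16) | 0xfffff;
}

/* Prefetchable window: same layout; low nibble 0x1 means 64-bit, with the
 * upper 32 address bits in separate registers. */
static uint64_t pref_base(const uint8_t *c)
{
    uint32_t lo = rd16(c, PREF_MEMORY_BASE);
    uint64_t base = (uint64_t)(lo & 0xfff0) << 16;
    if ((lo & 0xf) == 0x1) {
        base |= (uint64_t)rd32(c, PREF_BASE_UPPER32) << 32;
    }
    return base;
}

static uint64_t pref_limit(const uint8_t *c)
{
    uint32_t lo = rd16(c, PREF_MEMORY_LIMIT);
    uint64_t limit = ((uint64_t)(lo & 0xfff0) << 16) | 0xfffff;
    if ((lo & 0xf) == 0x1) {
        limit |= (uint64_t)rd32(c, PREF_LIMIT_UPPER32) << 32;
    }
    return limit;
}

/* I/O window: bits 7:4 hold address bits 15:12 (4 KiB granularity); low
 * nibble 0x1 means 32-bit I/O with the upper 16 bits in separate registers. */
static uint32_t io_base(const uint8_t *c)
{
    uint32_t base = (uint32_t)(c[IO_BASE] & 0xf0) << 8;
    if ((c[IO_BASE] & 0xf) == 0x1) {
        base |= rd16(c, IO_BASE_UPPER16) << 16;
    }
    return base;
}

static uint32_t io_limit(const uint8_t *c)
{
    uint32_t limit = ((uint32_t)(c[IO_LIMIT] & 0xf0) << 8) | 0xfff;
    if ((c[IO_LIMIT] & 0xf) == 0x1) {
        limit |= rd16(c, IO_LIMIT_UPPER16) << 16;
    }
    return limit;
}
```

A guest closes a window by programming its limit below its base; pci_bridge_init_alias() in the patch turns that into a zero-sized alias.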