Patchwork [RFC] msix: Support specifying offsets, BARs, and capability location

login
register
mail settings
Submitter Alex Williamson
Date June 10, 2012, 5:02 p.m.
Message ID <20120610165957.3942.89495.stgit@bling.home>
Download mbox | patch
Permalink /patch/164007/
State New
Headers show

Comments

Alex Williamson - June 10, 2012, 5:02 p.m.
msix_init has very little configurability as to how it lays out
MSI/X for a device.  It claims to resize BARs, but doesn't
actually do this anymore.  This patch allows MSI/X to be fully
specified, which is necessary both for emulated devices trying
to match the physical layout of a hardware device as well as for
any kind of device assignment.

The original intent of msix_init seems to have been to allow
completely virtual devices so enable MSI/X without knowing
anything about it.  A sort of "here's a BAR and a device, add
MSI/X to it".  We've already dropped the resize support with
the memory API, and I think it makes sense to tune the interface
to something that still makes it easy for virtual devices to use,
but allows the full specification for realistically creating
emulated devices using MSI/X.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

This needs more testing, but I wanted to send it out in case someone
else is working on something similar.  In order to not modify the
generic capability adding code VFIO needs to tell qemu about ever
capability for a device.  The main roadblock to doing that is that
msix_init does not provide sufficient specification of the MSI/X area
to match a physical device.  This solves that.

 hw/ivshmem.c    |    5 +
 hw/msix.c       |  204 ++++++++++++++++++++++++++++++++++++-------------------
 hw/msix.h       |   12 ++-
 hw/pci.h        |   14 +++-
 hw/virtio-pci.c |    7 +-
 5 files changed, 161 insertions(+), 81 deletions(-)
Jan Kiszka - June 10, 2012, 5:15 p.m.
On 2012-06-10 19:02, Alex Williamson wrote:
> msix_init has very little configurability as to how it lays out
> MSI/X for a device.  It claims to resize BARs, but doesn't
> actually do this anymore.  This patch allows MSI/X to be fully
> specified, which is necessary both for emulated devices trying
> to match the physical layout of a hardware device as well as for
> any kind of device assignment.
> 
> The original intent of msix_init seems to have been to allow
> completely virtual devices so enable MSI/X without knowing
> anything about it.  A sort of "here's a BAR and a device, add
> MSI/X to it".  We've already dropped the resize support with
> the memory API, and I think it makes sense to tune the interface
> to something that still makes it easy for virtual devices to use,
> but allows the full specification for realistically creating
> emulated devices using MSI/X.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> This needs more testing, but I wanted to send it out in case someone
> else is working on something similar.  In order to not modify the
> generic capability adding code VFIO needs to tell qemu about ever
> capability for a device.  The main roadblock to doing that is that
> msix_init does not provide sufficient specification of the MSI/X area
> to match a physical device.  This solves that.

Reminds me that you asked for my version:

http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/msi

specifically the top two commits. WIP, just picked the old commits,
result untested.

Still need to look into your version.

Jan
Michael S. Tsirkin - June 10, 2012, 5:48 p.m.
On Sun, Jun 10, 2012 at 11:02:48AM -0600, Alex Williamson wrote:
> msix_init has very little configurability as to how it lays out
> MSI/X for a device.  It claims to resize BARs, but doesn't
> actually do this anymore.  This patch allows MSI/X to be fully
> specified, which is necessary both for emulated devices trying
> to match the physical layout of a hardware device as well as for
> any kind of device assignment.
> 
> The original intent of msix_init seems to have been to allow
> completely virtual devices so enable MSI/X without knowing
> anything about it.  A sort of "here's a BAR and a device, add
> MSI/X to it".  We've already dropped the resize support with
> the memory API, and I think it makes sense to tune the interface
> to something that still makes it easy for virtual devices to use,
> but allows the full specification for realistically creating
> emulated devices using MSI/X.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


Overall I am for it. Some comments below.

> ---
> 
> This needs more testing, but I wanted to send it out in case someone
> else is working on something similar.  In order to not modify the
> generic capability adding code VFIO needs to tell qemu about ever
> capability for a device.  The main roadblock to doing that is that
> msix_init does not provide sufficient specification of the MSI/X area
> to match a physical device.  This solves that.
> 
>  hw/ivshmem.c    |    5 +
>  hw/msix.c       |  204 ++++++++++++++++++++++++++++++++++++-------------------
>  hw/msix.h       |   12 ++-
>  hw/pci.h        |   14 +++-
>  hw/virtio-pci.c |    7 +-
>  5 files changed, 161 insertions(+), 81 deletions(-)
> 
> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> index 05559b6..c7cd504 100644
> --- a/hw/ivshmem.c
> +++ b/hw/ivshmem.c
> @@ -563,8 +563,9 @@ static uint64_t ivshmem_get_size(IVShmemState * s) {
>  
>  static void ivshmem_setup_msi(IVShmemState * s)
>  {
> -    memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
> -    if (!msix_init(&s->dev, s->vectors, &s->msix_bar, 1, 0)) {
> +    memory_region_init(&s->msix_bar, "ivshmem-msix", MSIX_PAGE_SIZE);
> +    if (!msix_init(&s->dev, s->vectors, &s->msix_bar, 1, 0,
> +                   &s->msix_bar, 1, 0, 0)) {
>          pci_register_bar(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
>                           &s->msix_bar);
>          IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
> diff --git a/hw/msix.c b/hw/msix.c
> index 7a2b448..90412b2 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -27,17 +27,10 @@
>  #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
>  #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
>  
> -/* How much space does an MSIX table need. */
> -/* The spec requires giving the table structure
> - * a 4K aligned region all by itself. */
> -#define MSIX_PAGE_SIZE 0x1000
> -/* Reserve second half of the page for pending bits */
> -#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
> -#define MSIX_MAX_ENTRIES 32
> -
>  static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
>  {
> -    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
> +    uint8_t *table_entry = dev->msix_table_page + dev->msix_table_offset +
> +                           vector * PCI_MSIX_ENTRY_SIZE;
>      MSIMessage msg;
>  
>      msg.address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
> @@ -46,44 +39,28 @@ static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
>  }
>  
>  /* Add MSI-X capability to the config space for the device. */
> -/* Given a bar and its size, add MSI-X table on top of it
> - * and fill MSI-X capability in the config space.
> - * Original bar size must be a power of 2 or 0.
> - * New bar size is returned. */
>  static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
> -                           unsigned bar_nr, unsigned bar_size)
> +                           uint8_t table_bar, unsigned table_offset,
> +                           uint8_t pba_bar, unsigned pba_offset, uint8_t pos)
>  {
>      int config_offset;
>      uint8_t *config;
> -    uint32_t new_size;
>  
>      if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1)
>          return -EINVAL;
> -    if (bar_size > 0x80000000)
> -        return -ENOSPC;
> -
> -    /* Add space for MSI-X structures */
> -    if (!bar_size) {
> -        new_size = MSIX_PAGE_SIZE;
> -    } else if (bar_size < MSIX_PAGE_SIZE) {
> -        bar_size = MSIX_PAGE_SIZE;
> -        new_size = MSIX_PAGE_SIZE * 2;
> -    } else {
> -        new_size = bar_size * 2;
> -    }
>  
>      config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
> -                                       0, MSIX_CAP_LENGTH);
> +                                       pos, MSIX_CAP_LENGTH);
>      if (config_offset < 0)
>          return config_offset;
> +
>      config = pdev->config + config_offset;
>  
>      pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
>      /* Table on top of BAR */
> -    pci_set_long(config + PCI_MSIX_TABLE, bar_size | bar_nr);
> +    pci_set_long(config + PCI_MSIX_TABLE, table_offset | table_bar);
>      /* Pending bits on top of that */
> -    pci_set_long(config + PCI_MSIX_PBA, (bar_size + MSIX_PAGE_PENDING) |
> -                 bar_nr);
> +    pci_set_long(config + PCI_MSIX_PBA, pba_offset | pba_bar);
>      pdev->msix_cap = config_offset;
>      /* Make flags bit writable. */
>      pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
> @@ -92,8 +69,8 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
>      return 0;
>  }
>  
> -static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
> -                               unsigned size)
> +static uint64_t msix_table_mmio_read(void *opaque, target_phys_addr_t addr,
> +                                     unsigned size)
>  {
>      PCIDevice *dev = opaque;
>      unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> @@ -102,6 +79,16 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
>      return pci_get_long(page + offset);
>  }
>  
> +static uint64_t msix_pba_mmio_read(void *opaque, target_phys_addr_t addr,
> +                                   unsigned size)
> +{
> +    PCIDevice *dev = opaque;
> +    unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> +    void *page = dev->msix_pba_page;
> +
> +    return pci_get_long(page + offset);
> +}
> +
>  static uint8_t msix_pending_mask(int vector)
>  {
>      return 1 << (vector % 8);
> @@ -109,7 +96,7 @@ static uint8_t msix_pending_mask(int vector)
>  
>  static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
>  {
> -    return dev->msix_table_page + MSIX_PAGE_PENDING + vector / 8;
> +    return dev->msix_pba_page + dev->msix_pba_offset + vector / 8;
>  }
>  
>  static int msix_is_pending(PCIDevice *dev, int vector)
> @@ -129,7 +116,8 @@ static void msix_clr_pending(PCIDevice *dev, int vector)
>  
>  static bool msix_vector_masked(PCIDevice *dev, int vector, bool fmask)
>  {
> -    unsigned offset = vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
> +    unsigned offset = dev->msix_table_offset +
> +                      vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
>      return fmask || dev->msix_table_page[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
>  }
>  
> @@ -209,16 +197,16 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
>      }
>  }
>  
> -static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> -                            uint64_t val, unsigned size)
> +static void msix_table_mmio_write(void *opaque, target_phys_addr_t addr,
> +                                  uint64_t val, unsigned size)
>  {
>      PCIDevice *dev = opaque;
>      unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
> -    int vector = offset / PCI_MSIX_ENTRY_SIZE;
> +    int vector = (offset - dev->msix_table_offset) / PCI_MSIX_ENTRY_SIZE;
>      bool was_masked;
>  
> -    /* MSI-X page includes a read-only PBA and a writeable Vector Control. */
> -    if (vector >= dev->msix_entries_nr) {
> +    /* MSI-X page may include a read-only PBA and a writeable Vector Control. */
> +    if (offset < dev->msix_table_offset || vector >= dev->msix_entries_nr) {
>          return;
>      }
>  
> @@ -227,9 +215,10 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>      msix_handle_mask_update(dev, vector, was_masked);
>  }
>  
> -static const MemoryRegionOps msix_mmio_ops = {
> -    .read = msix_mmio_read,
> -    .write = msix_mmio_write,
> +/* For vector table only or table + pba */
> +static const MemoryRegionOps msix_table_mmio_ops = {
> +    .read = msix_table_mmio_read,
> +    .write = msix_table_mmio_write,
>      .endianness = DEVICE_NATIVE_ENDIAN,

BTW this is a bug we need to fix, of course it is little endian.


>      .valid = {
>          .min_access_size = 4,
> @@ -237,15 +226,29 @@ static const MemoryRegionOps msix_mmio_ops = {
>      },
>  };
>  
> -static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
> +/* For PBA only */
> +static const MemoryRegionOps msix_pba_mmio_ops = {
> +    .read = msix_pba_mmio_read,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .valid = {
> +        .min_access_size = 4,
> +        .max_access_size = 4,
> +    },
> +};
> +
> +static void msix_mmio_setup(PCIDevice *dev, MemoryRegion *table_bar,
> +                            MemoryRegion *pba_bar)
>  {
> -    uint8_t *config = d->config + d->msix_cap;
> -    uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
> -    uint32_t offset = table & ~(MSIX_PAGE_SIZE - 1);
> -    /* TODO: for assigned devices, we'll want to make it possible to map
> -     * pending bits separately in case they are in a separate bar. */
> +    uint8_t *config = dev->config + dev->msix_cap;
> +    uint32_t table, pba;
> +
> +    table = pci_get_long(config + PCI_MSIX_TABLE) & ~(MSIX_PAGE_SIZE - 1);
> +    pba = pci_get_long(config + PCI_MSIX_PBA) & ~(MSIX_PAGE_SIZE - 1);
>  
> -    memory_region_add_subregion(bar, offset, &d->msix_mmio);
> +    memory_region_add_subregion(table_bar, table, &dev->msix_table_mmio);
> +    if (pba_bar && dev->msix_table_page != dev->msix_pba_page) {
> +        memory_region_add_subregion(pba_bar, pba, &dev->msix_pba_mmio);
> +    }
>  }
>  
>  static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
> @@ -253,7 +256,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>      int vector;
>  
>      for (vector = 0; vector < nentries; ++vector) {
> -        unsigned offset =
> +        unsigned offset = dev->msix_table_offset +
>              vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
>          bool was_masked = msix_is_masked(dev, vector);
>  
> @@ -262,42 +265,92 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>      }
>  }
>  
> -/* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
> - * modified, it should be retrieved with msix_bar_size. */
> +/* Initialize the MSI-X structures */
>  int msix_init(struct PCIDevice *dev, unsigned short nentries,
> -              MemoryRegion *bar,
> -              unsigned bar_nr, unsigned bar_size)
> +              MemoryRegion *table_bar, uint8_t table_bar_nr,
> +              unsigned table_offset, MemoryRegion *pba_bar,
> +              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos)
>  {
>      int ret;
> +    unsigned table_size, pba_size, table_page, pba_page;
> +    bool shared = false;
>  
>      /* Nothing to do if MSI is not supported by interrupt controller */
>      if (!msi_supported) {
>          return -ENOTSUP;
>      }
> -    if (nentries > MSIX_MAX_ENTRIES)
> -        return -EINVAL;
>  
> -    dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
> -                                        sizeof *dev->msix_entry_used);
> +    table_size = nentries * PCI_MSIX_ENTRY_SIZE;
> +    table_page = table_offset & ~(MSIX_PAGE_SIZE - 1);
> +    pba_size = QEMU_ALIGN_UP(nentries, 64) / 8;
> +    pba_page = pba_offset & ~(MSIX_PAGE_SIZE - 1);
> +
> +    /*
> +     * If unspecified, we attempt to put the PBA immediately following
> +     * the vector table, on the same page if there's room.
> +     */

Why even bother with defaults? For virtio? Let's add an API so it
can get a good default for pba offset instead.

> +    if (table_bar_nr == pba_bar_nr && !table_offset && !pba_offset) {
> +        if (table_size + pba_size < MSIX_PAGE_SIZE) {
> +            pba_offset = table_size;
> +        } else {
> +            pba_offset = MSIX_PAGE_SIZE;
> +            pba_page += MSIX_PAGE_SIZE;
> +        }
> +    }
>  
> +    if (table_bar_nr == pba_bar_nr && table_page == pba_page) {
> +        if (memory_region_size(table_bar) < table_page + MSIX_PAGE_SIZE ||
> +            ranges_overlap(table_offset, table_size, pba_offset, pba_size) ||
> +            ranges_overlap(table_offset, table_size,
> +                           table_page + MSIX_PAGE_SIZE, MSIX_PAGE_SIZE) ||
> +            ranges_overlap(pba_offset, pba_size,
> +                           pba_page + MSIX_PAGE_SIZE, MSIX_PAGE_SIZE)) {
> +            return -EINVAL;
> +        }
> +        shared = true;
> +    } else {
> +        if (memory_region_size(table_bar) < table_page + MSIX_PAGE_SIZE ||
> +            memory_region_size(pba_bar) < pba_page + MSIX_PAGE_SIZE) {
> +            return -EINVAL;
> +        }
> +    }
> +            
>      dev->msix_table_page = g_malloc0(MSIX_PAGE_SIZE);
> +    dev->msix_pba_page = shared ? dev->msix_table_page :
> +                                  g_malloc0(MSIX_PAGE_SIZE);

This seems to just make code complex. There's a bit per vector
so you are unlikely to need a lot of memory. Just allocate
what's needed and be done with it.

> +    dev->msix_table_offset = table_offset & (MSIX_PAGE_SIZE - 1);
> +    dev->msix_pba_offset = pba_offset & (MSIX_PAGE_SIZE - 1);

I used a single table to keep it simple.
now that you have separate tables why not make them as
large as they need to be exactly?
Make pointers point at start of each table.
Then you won't need to rework most of the code.

> +
> +    dev->msix_entry_used = g_malloc0(nentries * sizeof *dev->msix_entry_used);
> +
>      msix_mask_all(dev, nentries);
>  
> -    memory_region_init_io(&dev->msix_mmio, &msix_mmio_ops, dev,
> +    memory_region_init_io(&dev->msix_table_mmio, &msix_table_mmio_ops, dev,
>                            "msix", MSIX_PAGE_SIZE);
>  
> +    if (dev->msix_table_page != dev->msix_pba_page) {
> +        memory_region_init_io(&dev->msix_pba_mmio, &msix_pba_mmio_ops, dev,
> +                              "msix-pba", MSIX_PAGE_SIZE);
> +    }
> +
>      dev->msix_entries_nr = nentries;
> -    ret = msix_add_config(dev, nentries, bar_nr, bar_size);
> +    ret = msix_add_config(dev, nentries, table_bar_nr, table_offset,
> +                          pba_bar_nr, pba_offset, cap_pos);
>      if (ret)
>          goto err_config;
>  
>      dev->cap_present |= QEMU_PCI_CAP_MSIX;
> -    msix_mmio_setup(dev, bar);
> +    msix_mmio_setup(dev, table_bar, pba_bar);
>      return 0;
>  
>  err_config:
>      dev->msix_entries_nr = 0;
> -    memory_region_destroy(&dev->msix_mmio);
> +    if (dev->msix_table_page != dev->msix_pba_page) {
> +        memory_region_destroy(&dev->msix_pba_mmio);
> +        g_free(dev->msix_pba_page);
> +        dev->msix_pba_page = NULL;
> +    }
> +    memory_region_destroy(&dev->msix_table_mmio);
>      g_free(dev->msix_table_page);
>      dev->msix_table_page = NULL;
>      g_free(dev->msix_entry_used);
> @@ -316,7 +369,7 @@ static void msix_free_irq_entries(PCIDevice *dev)
>  }
>  
>  /* Clean up resources for the device. */
> -int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
> +int msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar)
>  {
>      if (!msix_present(dev)) {
>          return 0;
> @@ -325,8 +378,14 @@ int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
>      dev->msix_cap = 0;
>      msix_free_irq_entries(dev);
>      dev->msix_entries_nr = 0;
> -    memory_region_del_subregion(bar, &dev->msix_mmio);
> -    memory_region_destroy(&dev->msix_mmio);
> +    if (pba_bar && dev->msix_table_page != dev->msix_pba_page) {
> +        memory_region_del_subregion(pba_bar, &dev->msix_pba_mmio);
> +        memory_region_destroy(&dev->msix_pba_mmio);
> +        g_free(dev->msix_pba_page);
> +        dev->msix_pba_page = NULL;
> +    }
> +    memory_region_del_subregion(table_bar, &dev->msix_table_mmio);
> +    memory_region_destroy(&dev->msix_table_mmio);
>      g_free(dev->msix_table_page);
>      dev->msix_table_page = NULL;
>      g_free(dev->msix_entry_used);
> @@ -343,8 +402,9 @@ void msix_save(PCIDevice *dev, QEMUFile *f)
>          return;
>      }
>  
> -    qemu_put_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
> -    qemu_put_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
> +    qemu_put_buffer(f, dev->msix_table_page + dev->msix_table_offset,
> +                    n * PCI_MSIX_ENTRY_SIZE);
> +    qemu_put_buffer(f, dev->msix_pba_page + dev->msix_pba_offset, (n + 7) / 8);
>  }
>  
>  /* Should be called after restoring the config space. */
> @@ -358,8 +418,9 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
>      }
>  
>      msix_free_irq_entries(dev);
> -    qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
> -    qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
> +    qemu_get_buffer(f, dev->msix_table_page + dev->msix_table_offset,
> +                    n * PCI_MSIX_ENTRY_SIZE);
> +    qemu_get_buffer(f, dev->msix_pba_page + dev->msix_pba_offset, (n + 7) / 8);
>      msix_update_function_masked(dev);
>  
>      for (vector = 0; vector < n; vector++) {
> @@ -416,7 +477,7 @@ bool msix_recall_all(PCIDevice *dev)
>  {
>      uint8_t ret = 0;
>      uint8_t *b;
> -    for (b = dev->msix_table_page + MSIX_PAGE_PENDING;
> +    for (b = dev->msix_pba_page + dev->msix_pba_offset;
>  	 b <= msix_pending_byte(dev, dev->msix_entries_nr - 1); ++b) {
>          ret |= *b;
>          *b = 0;
> @@ -433,6 +494,7 @@ void msix_reset(PCIDevice *dev)
>      dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
>  	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
>      memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE);
> +    memset(dev->msix_pba_page, 0, MSIX_PAGE_SIZE);
>      msix_mask_all(dev, dev->msix_entries_nr);
>  }
>  
> diff --git a/hw/msix.h b/hw/msix.h
> index 2b15559..b0730be 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -4,14 +4,20 @@
>  #include "qemu-common.h"
>  #include "pci.h"
>  
> +/* How much space does an MSIX table need. */
> +/* The spec requires giving the table structure
> + * a 4K aligned region all by itself. */
> +#define MSIX_PAGE_SIZE 0x1000
> +

This is a bug in current device assignment.
Table can be bigger than 4K.
So pls don't export this macro needs to be
a function for pba and one for vectors.

>  int msix_init(PCIDevice *pdev, unsigned short nentries,
> -              MemoryRegion *bar,
> -              unsigned bar_nr, unsigned bar_size);
> +              MemoryRegion *table_bar, uint8_t table_bar_nr,
> +              unsigned table_offset, MemoryRegion *pba_bar,
> +              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos);
>  
>  void msix_write_config(PCIDevice *pci_dev, uint32_t address,
>                         uint32_t val, int len);
>  
> -int msix_uninit(PCIDevice *d, MemoryRegion *bar);
> +int msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar);
>  
>  unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
>  
> diff --git a/hw/pci.h b/hw/pci.h
> index 3d534e7..ee2dbd6 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -220,14 +220,24 @@ struct PCIDevice {
>      /* MSI-X entries */
>      int msix_entries_nr;
>  
> -    /* Space to store MSIX table */
> +    /* Space to store MSIX table & pending bits array */
>      uint8_t *msix_table_page;
> +    uint8_t *msix_pba_page;
> +
> +    /* Offset of each within the above page */
> +    unsigned msix_table_offset;
> +    unsigned msix_pba_offset;
> +
>      /* MMIO index used to map MSIX table and pending bit entries. */
> -    MemoryRegion msix_mmio;
> +    MemoryRegion msix_table_mmio;
> +    MemoryRegion msix_pba_mmio;
> +
>      /* Reference-count for entries actually in use by driver. */
>      unsigned *msix_entry_used;
> +
>      /* MSIX function mask set or MSIX disabled */
>      bool msix_function_masked;
> +
>      /* Version id needed for VMState */
>      int32_t version_id;
>  
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index 9cfdd11..e7788b9 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -842,9 +842,10 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
>      pci_set_word(config + PCI_SUBSYSTEM_ID, vdev->device_id);
>      config[PCI_INTERRUPT_PIN] = 1;
>  
> -    memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
> +    memory_region_init(&proxy->msix_bar, "virtio-msix", MSIX_PAGE_SIZE);
>      if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
> -                                     &proxy->msix_bar, 1, 0)) {
> +                                     &proxy->msix_bar, 1, 0,
> +                                     &proxy->msix_bar, 1, 0, 0)) {
>          pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
>                           &proxy->msix_bar);
>      } else
> @@ -905,7 +906,7 @@ static int virtio_exit_pci(PCIDevice *pci_dev)
>      int r;
>  
>      memory_region_destroy(&proxy->bar);
> -    r = msix_uninit(pci_dev, &proxy->msix_bar);
> +    r = msix_uninit(pci_dev, &proxy->msix_bar, &proxy->msix_bar);
>      memory_region_destroy(&proxy->msix_bar);
>      return r;
>  }

Patch

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 05559b6..c7cd504 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -563,8 +563,9 @@  static uint64_t ivshmem_get_size(IVShmemState * s) {
 
 static void ivshmem_setup_msi(IVShmemState * s)
 {
-    memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
-    if (!msix_init(&s->dev, s->vectors, &s->msix_bar, 1, 0)) {
+    memory_region_init(&s->msix_bar, "ivshmem-msix", MSIX_PAGE_SIZE);
+    if (!msix_init(&s->dev, s->vectors, &s->msix_bar, 1, 0,
+                   &s->msix_bar, 1, 0, 0)) {
         pci_register_bar(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
                          &s->msix_bar);
         IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
diff --git a/hw/msix.c b/hw/msix.c
index 7a2b448..90412b2 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -27,17 +27,10 @@ 
 #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
 #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
 
-/* How much space does an MSIX table need. */
-/* The spec requires giving the table structure
- * a 4K aligned region all by itself. */
-#define MSIX_PAGE_SIZE 0x1000
-/* Reserve second half of the page for pending bits */
-#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
-#define MSIX_MAX_ENTRIES 32
-
 static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
 {
-    uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
+    uint8_t *table_entry = dev->msix_table_page + dev->msix_table_offset +
+                           vector * PCI_MSIX_ENTRY_SIZE;
     MSIMessage msg;
 
     msg.address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
@@ -46,44 +39,28 @@  static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
 }
 
 /* Add MSI-X capability to the config space for the device. */
-/* Given a bar and its size, add MSI-X table on top of it
- * and fill MSI-X capability in the config space.
- * Original bar size must be a power of 2 or 0.
- * New bar size is returned. */
 static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
-                           unsigned bar_nr, unsigned bar_size)
+                           uint8_t table_bar, unsigned table_offset,
+                           uint8_t pba_bar, unsigned pba_offset, uint8_t pos)
 {
     int config_offset;
     uint8_t *config;
-    uint32_t new_size;
 
     if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1)
         return -EINVAL;
-    if (bar_size > 0x80000000)
-        return -ENOSPC;
-
-    /* Add space for MSI-X structures */
-    if (!bar_size) {
-        new_size = MSIX_PAGE_SIZE;
-    } else if (bar_size < MSIX_PAGE_SIZE) {
-        bar_size = MSIX_PAGE_SIZE;
-        new_size = MSIX_PAGE_SIZE * 2;
-    } else {
-        new_size = bar_size * 2;
-    }
 
     config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
-                                       0, MSIX_CAP_LENGTH);
+                                       pos, MSIX_CAP_LENGTH);
     if (config_offset < 0)
         return config_offset;
+
     config = pdev->config + config_offset;
 
     pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
     /* Table on top of BAR */
-    pci_set_long(config + PCI_MSIX_TABLE, bar_size | bar_nr);
+    pci_set_long(config + PCI_MSIX_TABLE, table_offset | table_bar);
     /* Pending bits on top of that */
-    pci_set_long(config + PCI_MSIX_PBA, (bar_size + MSIX_PAGE_PENDING) |
-                 bar_nr);
+    pci_set_long(config + PCI_MSIX_PBA, pba_offset | pba_bar);
     pdev->msix_cap = config_offset;
     /* Make flags bit writable. */
     pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK |
@@ -92,8 +69,8 @@  static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
     return 0;
 }
 
-static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
-                               unsigned size)
+static uint64_t msix_table_mmio_read(void *opaque, target_phys_addr_t addr,
+                                     unsigned size)
 {
     PCIDevice *dev = opaque;
     unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
@@ -102,6 +79,16 @@  static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
     return pci_get_long(page + offset);
 }
 
+static uint64_t msix_pba_mmio_read(void *opaque, target_phys_addr_t addr,
+                                   unsigned size)
+{
+    PCIDevice *dev = opaque;
+    unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
+    void *page = dev->msix_pba_page;
+
+    return pci_get_long(page + offset);
+}
+
 static uint8_t msix_pending_mask(int vector)
 {
     return 1 << (vector % 8);
@@ -109,7 +96,7 @@  static uint8_t msix_pending_mask(int vector)
 
 static uint8_t *msix_pending_byte(PCIDevice *dev, int vector)
 {
-    return dev->msix_table_page + MSIX_PAGE_PENDING + vector / 8;
+    return dev->msix_pba_page + dev->msix_pba_offset + vector / 8;
 }
 
 static int msix_is_pending(PCIDevice *dev, int vector)
@@ -129,7 +116,8 @@  static void msix_clr_pending(PCIDevice *dev, int vector)
 
 static bool msix_vector_masked(PCIDevice *dev, int vector, bool fmask)
 {
-    unsigned offset = vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
+    unsigned offset = dev->msix_table_offset +
+                      vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
     return fmask || dev->msix_table_page[offset] & PCI_MSIX_ENTRY_CTRL_MASKBIT;
 }
 
@@ -209,16 +197,16 @@  void msix_write_config(PCIDevice *dev, uint32_t addr,
     }
 }
 
-static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
-                            uint64_t val, unsigned size)
+static void msix_table_mmio_write(void *opaque, target_phys_addr_t addr,
+                                  uint64_t val, unsigned size)
 {
     PCIDevice *dev = opaque;
     unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
-    int vector = offset / PCI_MSIX_ENTRY_SIZE;
+    int vector = (offset - dev->msix_table_offset) / PCI_MSIX_ENTRY_SIZE;
     bool was_masked;
 
-    /* MSI-X page includes a read-only PBA and a writeable Vector Control. */
-    if (vector >= dev->msix_entries_nr) {
+    /* MSI-X page may include a read-only PBA and a writeable Vector Control. */
+    if (offset < dev->msix_table_offset || vector >= dev->msix_entries_nr) {
         return;
     }
 
@@ -227,9 +215,10 @@  static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
     msix_handle_mask_update(dev, vector, was_masked);
 }
 
-static const MemoryRegionOps msix_mmio_ops = {
-    .read = msix_mmio_read,
-    .write = msix_mmio_write,
+/* For vector table only or table + pba */
+static const MemoryRegionOps msix_table_mmio_ops = {
+    .read = msix_table_mmio_read,
+    .write = msix_table_mmio_write,
     .endianness = DEVICE_NATIVE_ENDIAN,
     .valid = {
         .min_access_size = 4,
@@ -237,15 +226,29 @@  static const MemoryRegionOps msix_mmio_ops = {
     },
 };
 
-static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
+/* For PBA only */
+static const MemoryRegionOps msix_pba_mmio_ops = {
+    .read = msix_pba_mmio_read,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static void msix_mmio_setup(PCIDevice *dev, MemoryRegion *table_bar,
+                            MemoryRegion *pba_bar)
 {
-    uint8_t *config = d->config + d->msix_cap;
-    uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
-    uint32_t offset = table & ~(MSIX_PAGE_SIZE - 1);
-    /* TODO: for assigned devices, we'll want to make it possible to map
-     * pending bits separately in case they are in a separate bar. */
+    uint8_t *config = dev->config + dev->msix_cap;
+    uint32_t table, pba;
+
+    table = pci_get_long(config + PCI_MSIX_TABLE) & ~(MSIX_PAGE_SIZE - 1);
+    pba = pci_get_long(config + PCI_MSIX_PBA) & ~(MSIX_PAGE_SIZE - 1);
 
-    memory_region_add_subregion(bar, offset, &d->msix_mmio);
+    memory_region_add_subregion(table_bar, table, &dev->msix_table_mmio);
+    if (pba_bar && dev->msix_table_page != dev->msix_pba_page) {
+        memory_region_add_subregion(pba_bar, pba, &dev->msix_pba_mmio);
+    }
 }
 
 static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
@@ -253,7 +256,7 @@  static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
     int vector;
 
     for (vector = 0; vector < nentries; ++vector) {
-        unsigned offset =
+        unsigned offset = dev->msix_table_offset +
             vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL;
         bool was_masked = msix_is_masked(dev, vector);
 
@@ -262,42 +265,92 @@  static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
     }
 }
 
-/* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
- * modified, it should be retrieved with msix_bar_size. */
+/* Initialize the MSI-X structures */
 int msix_init(struct PCIDevice *dev, unsigned short nentries,
-              MemoryRegion *bar,
-              unsigned bar_nr, unsigned bar_size)
+              MemoryRegion *table_bar, uint8_t table_bar_nr,
+              unsigned table_offset, MemoryRegion *pba_bar,
+              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos)
 {
     int ret;
+    unsigned table_size, pba_size, table_page, pba_page;
+    bool shared = false;
 
     /* Nothing to do if MSI is not supported by interrupt controller */
     if (!msi_supported) {
         return -ENOTSUP;
     }
-    if (nentries > MSIX_MAX_ENTRIES)
-        return -EINVAL;
 
-    dev->msix_entry_used = g_malloc0(MSIX_MAX_ENTRIES *
-                                        sizeof *dev->msix_entry_used);
+    table_size = nentries * PCI_MSIX_ENTRY_SIZE;
+    table_page = table_offset & ~(MSIX_PAGE_SIZE - 1);
+    pba_size = QEMU_ALIGN_UP(nentries, 64) / 8;
+    pba_page = pba_offset & ~(MSIX_PAGE_SIZE - 1);
+
+    /*
+     * If unspecified, we attempt to put the PBA immediately following
+     * the vector table, on the same page if there's room.
+     */
+    if (table_bar_nr == pba_bar_nr && !table_offset && !pba_offset) {
+        if (table_size + pba_size < MSIX_PAGE_SIZE) {
+            pba_offset = table_size;
+        } else {
+            pba_offset = MSIX_PAGE_SIZE;
+            pba_page += MSIX_PAGE_SIZE;
+        }
+    }
 
+    if (table_bar_nr == pba_bar_nr && table_page == pba_page) {
+        if (memory_region_size(table_bar) < table_page + MSIX_PAGE_SIZE ||
+            ranges_overlap(table_offset, table_size, pba_offset, pba_size) ||
+            ranges_overlap(table_offset, table_size,
+                           table_page + MSIX_PAGE_SIZE, MSIX_PAGE_SIZE) ||
+            ranges_overlap(pba_offset, pba_size,
+                           pba_page + MSIX_PAGE_SIZE, MSIX_PAGE_SIZE)) {
+            return -EINVAL;
+        }
+        shared = true;
+    } else {
+        if (memory_region_size(table_bar) < table_page + MSIX_PAGE_SIZE ||
+            memory_region_size(pba_bar) < pba_page + MSIX_PAGE_SIZE) {
+            return -EINVAL;
+        }
+    }
+            
     dev->msix_table_page = g_malloc0(MSIX_PAGE_SIZE);
+    dev->msix_pba_page = shared ? dev->msix_table_page :
+                                  g_malloc0(MSIX_PAGE_SIZE);
+    dev->msix_table_offset = table_offset & (MSIX_PAGE_SIZE - 1);
+    dev->msix_pba_offset = pba_offset & (MSIX_PAGE_SIZE - 1);
+
+    dev->msix_entry_used = g_malloc0(nentries * sizeof *dev->msix_entry_used);
+
     msix_mask_all(dev, nentries);
 
-    memory_region_init_io(&dev->msix_mmio, &msix_mmio_ops, dev,
+    memory_region_init_io(&dev->msix_table_mmio, &msix_table_mmio_ops, dev,
                           "msix", MSIX_PAGE_SIZE);
 
+    if (dev->msix_table_page != dev->msix_pba_page) {
+        memory_region_init_io(&dev->msix_pba_mmio, &msix_pba_mmio_ops, dev,
+                              "msix-pba", MSIX_PAGE_SIZE);
+    }
+
     dev->msix_entries_nr = nentries;
-    ret = msix_add_config(dev, nentries, bar_nr, bar_size);
+    ret = msix_add_config(dev, nentries, table_bar_nr, table_offset,
+                          pba_bar_nr, pba_offset, cap_pos);
     if (ret)
         goto err_config;
 
     dev->cap_present |= QEMU_PCI_CAP_MSIX;
-    msix_mmio_setup(dev, bar);
+    msix_mmio_setup(dev, table_bar, pba_bar);
     return 0;
 
 err_config:
     dev->msix_entries_nr = 0;
-    memory_region_destroy(&dev->msix_mmio);
+    if (dev->msix_table_page != dev->msix_pba_page) {
+        memory_region_destroy(&dev->msix_pba_mmio);
+        g_free(dev->msix_pba_page);
+        dev->msix_pba_page = NULL;
+    }
+    memory_region_destroy(&dev->msix_table_mmio);
     g_free(dev->msix_table_page);
     dev->msix_table_page = NULL;
     g_free(dev->msix_entry_used);
@@ -316,7 +369,7 @@  static void msix_free_irq_entries(PCIDevice *dev)
 }
 
 /* Clean up resources for the device. */
-int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
+int msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar)
 {
     if (!msix_present(dev)) {
         return 0;
@@ -325,8 +378,14 @@  int msix_uninit(PCIDevice *dev, MemoryRegion *bar)
     dev->msix_cap = 0;
     msix_free_irq_entries(dev);
     dev->msix_entries_nr = 0;
-    memory_region_del_subregion(bar, &dev->msix_mmio);
-    memory_region_destroy(&dev->msix_mmio);
+    if (pba_bar && dev->msix_table_page != dev->msix_pba_page) {
+        memory_region_del_subregion(pba_bar, &dev->msix_pba_mmio);
+        memory_region_destroy(&dev->msix_pba_mmio);
+        g_free(dev->msix_pba_page);
+        dev->msix_pba_page = NULL;
+    }
+    memory_region_del_subregion(table_bar, &dev->msix_table_mmio);
+    memory_region_destroy(&dev->msix_table_mmio);
     g_free(dev->msix_table_page);
     dev->msix_table_page = NULL;
     g_free(dev->msix_entry_used);
@@ -343,8 +402,9 @@  void msix_save(PCIDevice *dev, QEMUFile *f)
         return;
     }
 
-    qemu_put_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
-    qemu_put_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
+    qemu_put_buffer(f, dev->msix_table_page + dev->msix_table_offset,
+                    n * PCI_MSIX_ENTRY_SIZE);
+    qemu_put_buffer(f, dev->msix_pba_page + dev->msix_pba_offset, (n + 7) / 8);
 }
 
 /* Should be called after restoring the config space. */
@@ -358,8 +418,9 @@  void msix_load(PCIDevice *dev, QEMUFile *f)
     }
 
     msix_free_irq_entries(dev);
-    qemu_get_buffer(f, dev->msix_table_page, n * PCI_MSIX_ENTRY_SIZE);
-    qemu_get_buffer(f, dev->msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8);
+    qemu_get_buffer(f, dev->msix_table_page + dev->msix_table_offset,
+                    n * PCI_MSIX_ENTRY_SIZE);
+    qemu_get_buffer(f, dev->msix_pba_page + dev->msix_pba_offset, (n + 7) / 8);
     msix_update_function_masked(dev);
 
     for (vector = 0; vector < n; vector++) {
@@ -416,7 +477,7 @@  bool msix_recall_all(PCIDevice *dev)
 {
     uint8_t ret = 0;
     uint8_t *b;
-    for (b = dev->msix_table_page + MSIX_PAGE_PENDING;
+    for (b = dev->msix_pba_page + dev->msix_pba_offset;
 	 b <= msix_pending_byte(dev, dev->msix_entries_nr - 1); ++b) {
         ret |= *b;
         *b = 0;
@@ -433,6 +494,7 @@  void msix_reset(PCIDevice *dev)
     dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=
 	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
     memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE);
+    memset(dev->msix_pba_page, 0, MSIX_PAGE_SIZE);
     msix_mask_all(dev, dev->msix_entries_nr);
 }
 
diff --git a/hw/msix.h b/hw/msix.h
index 2b15559..b0730be 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -4,14 +4,20 @@ 
 #include "qemu-common.h"
 #include "pci.h"
 
+/* How much space does an MSIX table need. */
+/* The spec requires giving the table structure
+ * a 4K aligned region all by itself. */
+#define MSIX_PAGE_SIZE 0x1000
+
 int msix_init(PCIDevice *pdev, unsigned short nentries,
-              MemoryRegion *bar,
-              unsigned bar_nr, unsigned bar_size);
+              MemoryRegion *table_bar, uint8_t table_bar_nr,
+              unsigned table_offset, MemoryRegion *pba_bar,
+              uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos);
 
 void msix_write_config(PCIDevice *pci_dev, uint32_t address,
                        uint32_t val, int len);
 
-int msix_uninit(PCIDevice *d, MemoryRegion *bar);
+int msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar);
 
 unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
 
diff --git a/hw/pci.h b/hw/pci.h
index 3d534e7..ee2dbd6 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -220,14 +220,24 @@  struct PCIDevice {
     /* MSI-X entries */
     int msix_entries_nr;
 
-    /* Space to store MSIX table */
+    /* Space to store MSIX table & pending bits array */
     uint8_t *msix_table_page;
+    uint8_t *msix_pba_page;
+
+    /* Offset of each within the above page */
+    unsigned msix_table_offset;
+    unsigned msix_pba_offset;
+
     /* MMIO index used to map MSIX table and pending bit entries. */
-    MemoryRegion msix_mmio;
+    MemoryRegion msix_table_mmio;
+    MemoryRegion msix_pba_mmio;
+
     /* Reference-count for entries actually in use by driver. */
     unsigned *msix_entry_used;
+
     /* MSIX function mask set or MSIX disabled */
     bool msix_function_masked;
+
     /* Version id needed for VMState */
     int32_t version_id;
 
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 9cfdd11..e7788b9 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -842,9 +842,10 @@  void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
     pci_set_word(config + PCI_SUBSYSTEM_ID, vdev->device_id);
     config[PCI_INTERRUPT_PIN] = 1;
 
-    memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
+    memory_region_init(&proxy->msix_bar, "virtio-msix", MSIX_PAGE_SIZE);
     if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
-                                     &proxy->msix_bar, 1, 0)) {
+                                     &proxy->msix_bar, 1, 0,
+                                     &proxy->msix_bar, 1, 0, 0)) {
         pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
                          &proxy->msix_bar);
     } else
@@ -905,7 +906,7 @@  static int virtio_exit_pci(PCIDevice *pci_dev)
     int r;
 
     memory_region_destroy(&proxy->bar);
-    r = msix_uninit(pci_dev, &proxy->msix_bar);
+    r = msix_uninit(pci_dev, &proxy->msix_bar, &proxy->msix_bar);
     memory_region_destroy(&proxy->msix_bar);
     return r;
 }