diff mbox

[RFC,v2,1/2] Add param Error** to msi_init() & modify the callers

Message ID 1450176195-9066-2-git-send-email-caoj.fnst@cn.fujitsu.com
State New
Headers show

Commit Message

Cao jin Dec. 15, 2015, 10:43 a.m. UTC
msi_init() is a supporting function in PCI device initialization,
in order to convert .init() to .realize(), it should be modified first.
Also modify the callers

Bonus: add more comment for msi_init().
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/audio/intel-hda.c               | 10 ++++-
 hw/ide/ich.c                       |  2 +-
 hw/net/vmxnet3.c                   | 13 +++---
 hw/pci-bridge/ioh3420.c            |  7 +++-
 hw/pci-bridge/pci_bridge_dev.c     |  8 +++-
 hw/pci-bridge/xio3130_downstream.c |  8 +++-
 hw/pci-bridge/xio3130_upstream.c   |  8 +++-
 hw/pci/msi.c                       | 18 +++++++--
 hw/scsi/megasas.c                  | 15 +++++--
 hw/scsi/vmw_pvscsi.c               | 13 ++++--
 hw/usb/hcd-xhci.c                  | 81 +++++++++++++++++++++-----------------
 hw/vfio/pci.c                      | 20 +++++-----
 include/hw/pci/msi.h               |  4 +-
 13 files changed, 135 insertions(+), 72 deletions(-)

Comments

Markus Armbruster March 2, 2016, 9:13 a.m. UTC | #1
This got lost over the Christmas break, sorry.

Cc'ing Marcel for additional PCI expertise.

Cao jin <caoj.fnst@cn.fujitsu.com> writes:

> msi_init() is a supporting function in PCI device initialization,
> in order to convert .init() to .realize(), it should be modified first.

"Supporting function" doesn't imply "should use Error to report
errors".  HACKING explains:

    Use the simplest suitable method to communicate success / failure to
    callers.  Stick to common methods: non-negative on success / -1 on
    error, non-negative / -errno, non-null / null, or Error objects.

    Example: when a function returns a non-null pointer on success, and it
    can fail only in one way (as far as the caller is concerned), returning
    null on failure is just fine, and certainly simpler and a lot easier on
    the eyes than propagating an Error object through an Error ** parameter.

    Example: when a function's callers need to report details on failure
    only the function really knows, use Error **, and set suitable errors.

    Do not report an error to the user when you're also returning an error
    for somebody else to handle.  Leave the reporting to the place that
    consumes the error returned.

As we'll see in your patch of msi.c, msi_init() can fail in multiple
ways, and uses -errno to comunicate them.  That can be okay even in
realize().  It also reports to the user.  That's what makes it
unsuitable for realize().

> Also modify the callers

Actually, you're *fixing* callers!  But the bugs aren't 100% clear, yet,
see below for details.  Once we know what the bugs are, we'll want to
reword the commit message to describe the bugs and their impact.

I recommend to skip ahead to msi.c, then come back to the device models.

> Bonus: add more comment for msi_init().
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  hw/audio/intel-hda.c               | 10 ++++-
>  hw/ide/ich.c                       |  2 +-
>  hw/net/vmxnet3.c                   | 13 +++---
>  hw/pci-bridge/ioh3420.c            |  7 +++-
>  hw/pci-bridge/pci_bridge_dev.c     |  8 +++-
>  hw/pci-bridge/xio3130_downstream.c |  8 +++-
>  hw/pci-bridge/xio3130_upstream.c   |  8 +++-
>  hw/pci/msi.c                       | 18 +++++++--
>  hw/scsi/megasas.c                  | 15 +++++--
>  hw/scsi/vmw_pvscsi.c               | 13 ++++--
>  hw/usb/hcd-xhci.c                  | 81 +++++++++++++++++++++-----------------
>  hw/vfio/pci.c                      | 20 +++++-----
>  include/hw/pci/msi.h               |  4 +-
>  13 files changed, 135 insertions(+), 72 deletions(-)
>
> diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
> index 433463e..0d770131 100644
> --- a/hw/audio/intel-hda.c
> +++ b/hw/audio/intel-hda.c
> @@ -1130,6 +1130,7 @@ static void intel_hda_realize(PCIDevice *pci, Error **errp)
>  {
>      IntelHDAState *d = INTEL_HDA(pci);
>      uint8_t *conf = d->pci.config;
> +    int ret;
>  
>      d->name = object_get_typename(OBJECT(d));
>  
> @@ -1142,11 +1143,18 @@ static void intel_hda_realize(PCIDevice *pci, Error **errp)
>                            "intel-hda", 0x4000);
>      pci_register_bar(&d->pci, 0, 0, &d->mmio);
>      if (d->msi) {
> -        msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true, false);
> +        ret = msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true,
> +                false, errp);
> +        if (ret < 0) {
> +            goto cleanup_on_msi_fail;
> +        }
>      }
>  
>      hda_codec_bus_init(DEVICE(pci), &d->codecs, sizeof(d->codecs),
>                         intel_hda_response, intel_hda_xfer);
> +
> +cleanup_on_msi_fail:
> +    object_unref(OBJECT(&d->mmio));
>  }
>  

This is a bug fix.  Two bugs, actually.

Bug#1: we ignore msi_init() failure.  If it fails because the board
doesn't support MSI, the failure is ignored silently.  If it fails
because the PCI config space offset is already occupied, we report the
error to the user, but still ignore it.  The latter feels like a
programming error to me.

Your patch fixes realize() to fail when msi_init() fails.

Bug#2: we report errors with error_report_err() instead through errp.
This is because we use pci_add_capability() instead of
pci_add_capability2().

I don't have the time to review the other devices you patch.  Both bugs
should be easy to spot in the patch: if the value of msi_init() is
ignored before the patch, the device got bug#1, and if the patched
device uses realize(), it got bug#2.

The patch could be split up into parts that fix just one thing.  Not
sure that's worth the bother.

>  static void intel_hda_exit(PCIDevice *pci)
> diff --git a/hw/ide/ich.c b/hw/ide/ich.c
> index 16925fa..94b1809 100644
> --- a/hw/ide/ich.c
> +++ b/hw/ide/ich.c
> @@ -145,7 +145,7 @@ static void pci_ich9_ahci_realize(PCIDevice *dev, Error **errp)
>      /* Although the AHCI 1.3 specification states that the first capability
>       * should be PMCAP, the Intel ICH9 data sheet specifies that the ICH9
>       * AHCI device puts the MSI capability first, pointing to 0x80. */
> -    msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false);
> +    msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false, errp);
>  }
>  
>  static void pci_ich9_uninit(PCIDevice *dev)
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index 37373e5..c373e77 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -2135,13 +2135,13 @@ vmxnet3_cleanup_msix(VMXNET3State *s)
>  #define VMXNET3_PER_VECTOR_MASK   (false)
>  
>  static bool
> -vmxnet3_init_msi(VMXNET3State *s)
> +vmxnet3_init_msi(VMXNET3State *s, Error **errp)
>  {
>      PCIDevice *d = PCI_DEVICE(s);
>      int res;
>  
>      res = msi_init(d, VMXNET3_MSI_OFFSET, VMXNET3_MAX_NMSIX_INTRS,
> -                   VMXNET3_USE_64BIT, VMXNET3_PER_VECTOR_MASK);
> +                   VMXNET3_USE_64BIT, VMXNET3_PER_VECTOR_MASK, errp);
>      if (0 > res) {
>          VMW_WRPRN("Failed to initialize MSI, error %d", res);
>          s->msi_used = false;
> @@ -2204,6 +2204,11 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
>  
>      VMW_CBPRN("Starting init...");
>  
> +    if (!vmxnet3_init_msi(s, errp)) {
> +        VMW_WRPRN("Failed to initialize MSI, configuration is inconsistent.");
> +        return;
> +    }
> +
>      memory_region_init_io(&s->bar0, OBJECT(s), &b0_ops, s,
>                            "vmxnet3-b0", VMXNET3_PT_REG_SIZE);
>      pci_register_bar(pci_dev, VMXNET3_BAR0_IDX,
> @@ -2228,10 +2233,6 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
>          VMW_WRPRN("Failed to initialize MSI-X, configuration is inconsistent.");
>      }
>  
> -    if (!vmxnet3_init_msi(s)) {
> -        VMW_WRPRN("Failed to initialize MSI, configuration is inconsistent.");
> -    }
> -
>      vmxnet3_net_init(s);
>  
>      register_savevm(dev, "vmxnet3-msix", -1, 1,
> diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
> index eead195..5df9e83 100644
> --- a/hw/pci-bridge/ioh3420.c
> +++ b/hw/pci-bridge/ioh3420.c
> @@ -96,6 +96,7 @@ static int ioh3420_initfn(PCIDevice *d)
>      PCIEPort *p = PCIE_PORT(d);
>      PCIESlot *s = PCIE_SLOT(d);
>      int rc;
> +    Error *local_err = NULL;
>  
>      pci_bridge_initfn(d, TYPE_PCIE_BUS);
>      pcie_port_init_reg(d);
> @@ -105,12 +106,16 @@ static int ioh3420_initfn(PCIDevice *d)
>      if (rc < 0) {
>          goto err_bridge;
>      }
> +
>      rc = msi_init(d, IOH_EP_MSI_OFFSET, IOH_EP_MSI_NR_VECTOR,
>                    IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
> -                  IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
> +                  IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
> +                  &local_err);
>      if (rc < 0) {
> +        error_report_err(local_err);
>          goto err_bridge;
>      }
> +
>      rc = pcie_cap_init(d, IOH_EP_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT, p->port);
>      if (rc < 0) {
>          goto err_msi;
> diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
> index bc3e1b7..bafaeeb 100644
> --- a/hw/pci-bridge/pci_bridge_dev.c
> +++ b/hw/pci-bridge/pci_bridge_dev.c
> @@ -51,6 +51,7 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
>      PCIBridge *br = PCI_BRIDGE(dev);
>      PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev);
>      int err;
> +    Error *local_err = NULL;
>  
>      pci_bridge_initfn(dev, TYPE_PCI_BUS);
>  
> @@ -66,17 +67,21 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
>          /* MSI is not applicable without SHPC */
>          bridge_dev->flags &= ~(1 << PCI_BRIDGE_DEV_F_MSI_REQ);
>      }
> +
>      err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0);
>      if (err) {
>          goto slotid_error;
>      }
> +
>      if ((bridge_dev->flags & (1 << PCI_BRIDGE_DEV_F_MSI_REQ)) &&
>          msi_supported) {
> -        err = msi_init(dev, 0, 1, true, true);
> +        err = msi_init(dev, 0, 1, true, true, &local_err);
>          if (err < 0) {
> +            error_report_err(local_err);
>              goto msi_error;
>          }
>      }
> +
>      if (shpc_present(dev)) {
>          /* TODO: spec recommends using 64 bit prefetcheable BAR.
>           * Check whether that works well. */
> @@ -84,6 +89,7 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
>                           PCI_BASE_ADDRESS_MEM_TYPE_64, &bridge_dev->bar);
>      }
>      return 0;
> +
>  msi_error:
>      slotid_cap_cleanup(dev);
>  slotid_error:
> diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
> index b4dd25f..3fc7455 100644
> --- a/hw/pci-bridge/xio3130_downstream.c
> +++ b/hw/pci-bridge/xio3130_downstream.c
> @@ -59,21 +59,26 @@ static int xio3130_downstream_initfn(PCIDevice *d)
>      PCIEPort *p = PCIE_PORT(d);
>      PCIESlot *s = PCIE_SLOT(d);
>      int rc;
> +    Error *local_err = NULL;
>  
>      pci_bridge_initfn(d, TYPE_PCIE_BUS);
>      pcie_port_init_reg(d);
>  
>      rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR,
>                    XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
> -                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
> +                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
> +                  &local_err);
>      if (rc < 0) {
> +        error_report_err(local_err);
>          goto err_bridge;
>      }
> +
>      rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
>                                 XIO3130_SSVID_SVID, XIO3130_SSVID_SSID);
>      if (rc < 0) {
>          goto err_bridge;
>      }
> +
>      rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_DOWNSTREAM,
>                         p->port);
>      if (rc < 0) {
> @@ -103,6 +108,7 @@ err_msi:
>      msi_uninit(d);
>  err_bridge:
>      pci_bridge_exitfn(d);
> +
>      return rc;
>  }
>  
> diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
> index 434c8fd..882271c 100644
> --- a/hw/pci-bridge/xio3130_upstream.c
> +++ b/hw/pci-bridge/xio3130_upstream.c
> @@ -55,26 +55,32 @@ static int xio3130_upstream_initfn(PCIDevice *d)
>  {
>      PCIEPort *p = PCIE_PORT(d);
>      int rc;
> +    Error *local_err = NULL;
>  
>      pci_bridge_initfn(d, TYPE_PCIE_BUS);
>      pcie_port_init_reg(d);
>  
>      rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR,
>                    XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
> -                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
> +                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
> +                  &local_err);
>      if (rc < 0) {
> +        error_report_err(local_err);
>          goto err_bridge;
>      }
> +
>      rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
>                                 XIO3130_SSVID_SVID, XIO3130_SSVID_SSID);
>      if (rc < 0) {
>          goto err_bridge;
>      }
> +
>      rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_UPSTREAM,
>                         p->port);
>      if (rc < 0) {
>          goto err_msi;
>      }
> +
>      pcie_cap_flr_init(d);
>      pcie_cap_deverr_init(d);
>      rc = pcie_aer_init(d, XIO3130_AER_OFFSET);
> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
> index c1dd531..ad6ed09 100644
> --- a/hw/pci/msi.c
> +++ b/hw/pci/msi.c
> @@ -150,15 +150,23 @@ bool msi_enabled(const PCIDevice *dev)
>           PCI_MSI_FLAGS_ENABLE);
>  }
>  
> -int msi_init(struct PCIDevice *dev, uint8_t offset,
> -             unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask)
> +/*
> + * @nr_vectors: Multiple Message Capable field of Message Control register
> + * @msi64bit: support 64-bit message address or not
> + * @msi_per_vector_mask: support per-vector masking or not

Missing: @offset, @errp.

git-grep @errp shows the various ways we document this parameter
elsewhere.  We suck at consistency.  Pick one you like.

> + *
> + * return: MSI capability offset in config space

"... on success, -errno on error" plus an explanation of what the error
codes mean.  If nobody uses the error codes, we could simply return -1
and call it a day.

> + */

The comment follows GTK-Doc conventions, but not quite.

GTK-Doc comments are designed for easy extraction of reference
documentation.  We don't do that.  We use them in a few places anyway,
such as object.h.  The majority of the code doesn't use them.  Their use
is up to the maintainer.  I detest them, and keep them out of the
subsystems I maintain.  PCI maintainer's call.

Prior discussion:
http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg03368.html

Here's how I'd write this one:

/*
 * Make PCI device @dev MSI-capable.
 * Non-zero @offset puts capability MSI at that offset in PCI config
 * space.
 * @nr_vectors is the number of MSI vectors (1, 2, 4, 8, 16 or 32).
 * If @msi64bit, make the device capable of sending a 64-bit message
 * address.
 * If @msi_per_vector_mask, make the device support per-vector masking.
 * @errp is for returning errors.
 * Return the offset of capability MSI in config space on success,
 * set @errp and return -errno on error.
 * FIXME explain the error codes, or dumb down return value to -1
 */

Except I'm not sure the function should fail in the first place!  See
below.

> +int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
> +             bool msi64bit, bool msi_per_vector_mask, Error **errp)
>  {
>      unsigned int vectors_order;
> -    uint16_t flags;
> +    uint16_t flags; /* Message Control register value */
>      uint8_t cap_size;
>      int config_offset;
>  
>      if (!msi_supported) {
> +        error_setg(errp, "MSI is not supported by interrupt controller");
>          return -ENOTSUP;

First failure mode: board does not support MSI (-ENOTSUP).

Question to the PCI guys: why is this even an error?  A device with
capability MSI should work just fine in such a board.

>      }
>  
> @@ -182,7 +190,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>      }
>  
>      cap_size = msi_cap_sizeof(flags);
> -    config_offset = pci_add_capability(dev, PCI_CAP_ID_MSI, offset, cap_size);
> +    config_offset = pci_add_capability2(dev, PCI_CAP_ID_MSI, offset,
> +                                        cap_size, errp);

pci_add_capability() is a wrapper around pci_add_capability2() that
additionally reports errors with error_report_err().  This is what makes
it unsuitable for realize().

>      if (config_offset < 0) {
>          return config_offset;

Inherits failing modes from pci_add_capability2().  Two: out of PCI
config space (-ENOSPC), and capability overlap (-EINVAL).

Question for the PCI guys: how can these happen?  When they happen, is
it a programming error?

>      }
> @@ -205,6 +214,7 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>          pci_set_long(dev->wmask + msi_mask_off(dev, msi64bit),
>                       0xffffffff >> (PCI_MSI_VECTORS_MAX - nr_vectors));
>      }
> +
>      return config_offset;
>  }
>  
> diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
> index d7dc667..ba25b3a 100644
> --- a/hw/scsi/megasas.c
> +++ b/hw/scsi/megasas.c
> @@ -2339,6 +2339,17 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
>      /* Interrupt pin 1 */
>      pci_conf[PCI_INTERRUPT_PIN] = 0x01;
>  
> +    if (megasas_use_msi(s)) {
> +        int ret;
> +
> +        ret = msi_init(dev, 0x50, 1, true, false, errp);
> +        if (ret > 0) {
> +            s->flags &= ~MEGASAS_MASK_USE_MSI;
> +        } else {
> +            return;
> +        }
> +    }
> +
>      memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
>                            "megasas-mmio", 0x4000);
>      memory_region_init_io(&s->port_io, OBJECT(s), &megasas_port_ops, s,
> @@ -2346,10 +2357,6 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
>      memory_region_init_io(&s->queue_io, OBJECT(s), &megasas_queue_ops, s,
>                            "megasas-queue", 0x40000);
>  
> -    if (megasas_use_msi(s) &&
> -        msi_init(dev, 0x50, 1, true, false)) {
> -        s->flags &= ~MEGASAS_MASK_USE_MSI;
> -    }
>      if (megasas_use_msix(s) &&
>          msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
>                    &s->mmio_io, b->mmio_bar, 0x3800, 0x68)) {
> diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
> index 9c71f31..073b956 100644
> --- a/hw/scsi/vmw_pvscsi.c
> +++ b/hw/scsi/vmw_pvscsi.c
> @@ -1014,13 +1014,13 @@ pvscsi_io_read(void *opaque, hwaddr addr, unsigned size)
>  
>  
>  static bool
> -pvscsi_init_msi(PVSCSIState *s)
> +pvscsi_init_msi(PVSCSIState *s, Error **errp)
>  {
>      int res;
>      PCIDevice *d = PCI_DEVICE(s);
>  
>      res = msi_init(d, PVSCSI_MSI_OFFSET, PVSCSI_MSIX_NUM_VECTORS,
> -                   PVSCSI_USE_64BIT, PVSCSI_PER_VECTOR_MASK);
> +                   PVSCSI_USE_64BIT, PVSCSI_PER_VECTOR_MASK, errp);
>      if (res < 0) {
>          trace_pvscsi_init_msi_fail(res);
>          s->msi_used = false;
> @@ -1066,6 +1066,7 @@ static int
>  pvscsi_init(PCIDevice *pci_dev)
>  {
>      PVSCSIState *s = PVSCSI(pci_dev);
> +    Error *local_err = NULL;
>  
>      trace_pvscsi_state("init");
>  
> @@ -1079,12 +1080,16 @@ pvscsi_init(PCIDevice *pci_dev)
>      /* Interrupt pin A */
>      pci_config_set_interrupt_pin(pci_dev->config, 1);
>  
> +    pvscsi_init_msi(s, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return -1;
> +    }
> +
>      memory_region_init_io(&s->io_space, OBJECT(s), &pvscsi_ops, s,
>                            "pvscsi-io", PVSCSI_MEM_SPACE_SIZE);
>      pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->io_space);
>  
> -    pvscsi_init_msi(s);
> -
>      s->completion_worker = qemu_bh_new(pvscsi_process_completion_queue, s);
>      if (!s->completion_worker) {
>          pvscsi_cleanup_msi(s);
> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
> index 268ab36..7cd5f6c 100644
> --- a/hw/usb/hcd-xhci.c
> +++ b/hw/usb/hcd-xhci.c
> @@ -3572,6 +3572,43 @@ static void usb_xhci_init(XHCIState *xhci)
>      }
>  }
>  
> +static void usb_xhci_exit(PCIDevice *dev)
> +{
> +    int i;
> +    XHCIState *xhci = XHCI(dev);
> +
> +    trace_usb_xhci_exit();
> +
> +    for (i = 0; i < xhci->numslots; i++) {
> +        xhci_disable_slot(xhci, i + 1);
> +    }
> +
> +    if (xhci->mfwrap_timer) {
> +        timer_del(xhci->mfwrap_timer);
> +        timer_free(xhci->mfwrap_timer);
> +        xhci->mfwrap_timer = NULL;
> +    }
> +
> +    memory_region_del_subregion(&xhci->mem, &xhci->mem_cap);
> +    memory_region_del_subregion(&xhci->mem, &xhci->mem_oper);
> +    memory_region_del_subregion(&xhci->mem, &xhci->mem_runtime);
> +    memory_region_del_subregion(&xhci->mem, &xhci->mem_doorbell);
> +
> +    for (i = 0; i < xhci->numports; i++) {
> +        XHCIPort *port = &xhci->ports[i];
> +        memory_region_del_subregion(&xhci->mem, &port->mem);
> +    }
> +
> +    /* destroy msix memory region */
> +    if (dev->msix_table && dev->msix_pba
> +        && dev->msix_entry_used) {
> +        memory_region_del_subregion(&xhci->mem, &dev->msix_table_mmio);
> +        memory_region_del_subregion(&xhci->mem, &dev->msix_pba_mmio);
> +    }
> +
> +    usb_bus_release(&xhci->bus);
> +}
> +
>  static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>  {
>      int i, ret;
> @@ -3643,7 +3680,10 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>      }
>  
>      if (xhci_get_flag(xhci, XHCI_FLAG_USE_MSI)) {
> -        msi_init(dev, 0x70, xhci->numintrs, true, false);
> +        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, errp);
> +        if (ret < 0) {
> +            goto cleanup_on_msi_fail;
> +        }
>      }
>      if (xhci_get_flag(xhci, XHCI_FLAG_USE_MSI_X)) {
>          msix_init(dev, xhci->numintrs,
> @@ -3651,43 +3691,10 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>                    &xhci->mem, 0, OFF_MSIX_PBA,
>                    0x90);
>      }
> -}
> -
> -static void usb_xhci_exit(PCIDevice *dev)
> -{
> -    int i;
> -    XHCIState *xhci = XHCI(dev);
> -
> -    trace_usb_xhci_exit();
> -
> -    for (i = 0; i < xhci->numslots; i++) {
> -        xhci_disable_slot(xhci, i + 1);
> -    }
> -
> -    if (xhci->mfwrap_timer) {
> -        timer_del(xhci->mfwrap_timer);
> -        timer_free(xhci->mfwrap_timer);
> -        xhci->mfwrap_timer = NULL;
> -    }
>  
> -    memory_region_del_subregion(&xhci->mem, &xhci->mem_cap);
> -    memory_region_del_subregion(&xhci->mem, &xhci->mem_oper);
> -    memory_region_del_subregion(&xhci->mem, &xhci->mem_runtime);
> -    memory_region_del_subregion(&xhci->mem, &xhci->mem_doorbell);
> -
> -    for (i = 0; i < xhci->numports; i++) {
> -        XHCIPort *port = &xhci->ports[i];
> -        memory_region_del_subregion(&xhci->mem, &port->mem);
> -    }
> -
> -    /* destroy msix memory region */
> -    if (dev->msix_table && dev->msix_pba
> -        && dev->msix_entry_used) {
> -        memory_region_del_subregion(&xhci->mem, &dev->msix_table_mmio);
> -        memory_region_del_subregion(&xhci->mem, &dev->msix_pba_mmio);
> -    }
> -
> -    usb_bus_release(&xhci->bus);
> +cleanup_on_msi_fail:
> +    usb_xhci_exit(dev);
> +    object_unref(OBJECT(&xhci->mem));
>  }
>  
>  static int usb_xhci_post_load(void *opaque, int version_id)
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 1fb868c..633642e 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1139,7 +1139,7 @@ static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
>      }
>  }
>  
> -static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
> +static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
>  {
>      uint16_t ctrl;
>      bool msi_64bit, msi_maskbit;
> @@ -1147,6 +1147,7 @@ static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
>  
>      if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
>                vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
> +        error_setg(errp, "Read error!");
>          return -errno;
>      }
>      ctrl = le16_to_cpu(ctrl);
> @@ -1157,12 +1158,11 @@ static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
>  
>      trace_vfio_msi_setup(vdev->vbasedev.name, pos);
>  
> -    ret = msi_init(&vdev->pdev, pos, entries, msi_64bit, msi_maskbit);
> +    ret = msi_init(&vdev->pdev, pos, entries, msi_64bit, msi_maskbit, errp);
>      if (ret < 0) {
>          if (ret == -ENOTSUP) {
>              return 0;
>          }
> -        error_report("vfio: msi_init failed");
>          return ret;
>      }
>      vdev->msi_cap_size = 0xa + (msi_maskbit ? 0xa : 0) + (msi_64bit ? 0x4 : 0);
> @@ -1654,7 +1654,7 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
>      }
>  }
>  
> -static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
> +static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp)
>  {
>      PCIDevice *pdev = &vdev->pdev;
>      uint8_t cap_id, next, size;
> @@ -1679,7 +1679,7 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
>       * will be changed as we unwind the stack.
>       */
>      if (next) {
> -        ret = vfio_add_std_cap(vdev, next);
> +        ret = vfio_add_std_cap(vdev, next, errp);
>          if (ret) {
>              return ret;
>          }
> @@ -1695,7 +1695,7 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
>  
>      switch (cap_id) {
>      case PCI_CAP_ID_MSI:
> -        ret = vfio_msi_setup(vdev, pos);
> +        ret = vfio_msi_setup(vdev, pos, errp);
>          break;
>      case PCI_CAP_ID_EXP:
>          vfio_check_pcie_flr(vdev, pos);
> @@ -1729,7 +1729,7 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
>      return 0;
>  }
>  
> -static int vfio_add_capabilities(VFIOPCIDevice *vdev)
> +static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
>  {
>      PCIDevice *pdev = &vdev->pdev;
>  
> @@ -1738,7 +1738,7 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev)
>          return 0; /* Nothing to add */
>      }
>  
> -    return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
> +    return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST], errp);
>  }
>  
>  static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
> @@ -2349,6 +2349,7 @@ static int vfio_initfn(PCIDevice *pdev)
>      struct stat st;
>      int groupid;
>      int ret;
> +    Error *local_err = NULL;
>  
>      /* Check that the host device exists */
>      snprintf(path, sizeof(path),
> @@ -2507,8 +2508,9 @@ static int vfio_initfn(PCIDevice *pdev)
>  
>      vfio_map_bars(vdev);
>  
> -    ret = vfio_add_capabilities(vdev);
> +    ret = vfio_add_capabilities(vdev, &local_err);
>      if (ret) {
> +        error_report_err(local_err);
>          goto out_teardown;
>      }
>  
> diff --git a/include/hw/pci/msi.h b/include/hw/pci/msi.h
> index 50e452b..da1dc1a 100644
> --- a/include/hw/pci/msi.h
> +++ b/include/hw/pci/msi.h
> @@ -34,8 +34,8 @@ extern bool msi_supported;
>  void msi_set_message(PCIDevice *dev, MSIMessage msg);
>  MSIMessage msi_get_message(PCIDevice *dev, unsigned int vector);
>  bool msi_enabled(const PCIDevice *dev);
> -int msi_init(struct PCIDevice *dev, uint8_t offset,
> -             unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
> +int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
> +             bool msi64bit, bool msi_per_vector_mask, Error **errp);
>  void msi_uninit(struct PCIDevice *dev);
>  void msi_reset(PCIDevice *dev);
>  void msi_notify(PCIDevice *dev, unsigned int vector);
Cao jin March 3, 2016, 3:56 a.m. UTC | #2
hi, Markus
Thanks for still remembering this patch, and quite a lot response:)
I will give a appropriate response after I read & understand them 
all.(so, not cc other guys here)

On 03/02/2016 05:13 PM, Markus Armbruster wrote:
> This got lost over the Christmas break, sorry.
>
> Cc'ing Marcel for additional PCI expertise.
>
> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>
>> msi_init() is a supporting function in PCI device initialization,
>> in order to convert .init() to .realize(), it should be modified first.
>
> "Supporting function" doesn't imply "should use Error to report
> errors".  HACKING explains:
>
>      Use the simplest suitable method to communicate success / failure to
>      callers.  Stick to common methods: non-negative on success / -1 on
>      error, non-negative / -errno, non-null / null, or Error objects.
>
>      Example: when a function returns a non-null pointer on success, and it
>      can fail only in one way (as far as the caller is concerned), returning
>      null on failure is just fine, and certainly simpler and a lot easier on
>      the eyes than propagating an Error object through an Error ** parameter.
>
>      Example: when a function's callers need to report details on failure
>      only the function really knows, use Error **, and set suitable errors.
>
>      Do not report an error to the user when you're also returning an error
>      for somebody else to handle.  Leave the reporting to the place that
>      consumes the error returned.
>
> As we'll see in your patch of msi.c, msi_init() can fail in multiple
> ways, and uses -errno to comunicate them.  That can be okay even in
> realize().  It also reports to the user.  That's what makes it
> unsuitable for realize().
>
>> Also modify the callers
>
> Actually, you're *fixing* callers!  But the bugs aren't 100% clear, yet,
> see below for details.  Once we know what the bugs are, we'll want to
> reword the commit message to describe the bugs and their impact.
>
> I recommend to skip ahead to msi.c, then come back to the device models.
>
>> Bonus: add more comment for msi_init().
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
>>   hw/audio/intel-hda.c               | 10 ++++-
>>   hw/ide/ich.c                       |  2 +-
>>   hw/net/vmxnet3.c                   | 13 +++---
>>   hw/pci-bridge/ioh3420.c            |  7 +++-
>>   hw/pci-bridge/pci_bridge_dev.c     |  8 +++-
>>   hw/pci-bridge/xio3130_downstream.c |  8 +++-
>>   hw/pci-bridge/xio3130_upstream.c   |  8 +++-
>>   hw/pci/msi.c                       | 18 +++++++--
>>   hw/scsi/megasas.c                  | 15 +++++--
>>   hw/scsi/vmw_pvscsi.c               | 13 ++++--
>>   hw/usb/hcd-xhci.c                  | 81 +++++++++++++++++++++-----------------
>>   hw/vfio/pci.c                      | 20 +++++-----
>>   include/hw/pci/msi.h               |  4 +-
>>   13 files changed, 135 insertions(+), 72 deletions(-)
>>
>> diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
>> index 433463e..0d770131 100644
>> --- a/hw/audio/intel-hda.c
>> +++ b/hw/audio/intel-hda.c
>> @@ -1130,6 +1130,7 @@ static void intel_hda_realize(PCIDevice *pci, Error **errp)
>>   {
>>       IntelHDAState *d = INTEL_HDA(pci);
>>       uint8_t *conf = d->pci.config;
>> +    int ret;
>>
>>       d->name = object_get_typename(OBJECT(d));
>>
>> @@ -1142,11 +1143,18 @@ static void intel_hda_realize(PCIDevice *pci, Error **errp)
>>                             "intel-hda", 0x4000);
>>       pci_register_bar(&d->pci, 0, 0, &d->mmio);
>>       if (d->msi) {
>> -        msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true, false);
>> +        ret = msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true,
>> +                false, errp);
>> +        if (ret < 0) {
>> +            goto cleanup_on_msi_fail;
>> +        }
>>       }
>>
>>       hda_codec_bus_init(DEVICE(pci), &d->codecs, sizeof(d->codecs),
>>                          intel_hda_response, intel_hda_xfer);
>> +
>> +cleanup_on_msi_fail:
>> +    object_unref(OBJECT(&d->mmio));
>>   }
>>
>
> This is a bug fix.  Two bugs, actually.
>
> Bug#1: we ignore msi_init() failure.  If it fails because the board
> doesn't support MSI, the failure is ignored silently.  If it fails
> because the PCI config space offset is already occupied, we report the
> error to the user, but still ignore it.  The latter feels like a
> programming error to me.
>
> Your patch fixes realize() to fail when msi_init() fails.
>
> Bug#2: we report errors with error_report_err() instead through errp.
> This is because we use pci_add_capability() instead of
> pci_add_capability2().
>
> I don't have the time to review the other devices you patch.  Both bugs
> should be easy to spot in the patch: if the value of msi_init() is
> ignored before the patch, the device got bug#1, and if the patched
> device uses realize(), it got bug#2.
>
> The patch could be split up into parts that fix just one thing.  Not
> sure that's worth the bother.
>
>>   static void intel_hda_exit(PCIDevice *pci)
>> diff --git a/hw/ide/ich.c b/hw/ide/ich.c
>> index 16925fa..94b1809 100644
>> --- a/hw/ide/ich.c
>> +++ b/hw/ide/ich.c
>> @@ -145,7 +145,7 @@ static void pci_ich9_ahci_realize(PCIDevice *dev, Error **errp)
>>       /* Although the AHCI 1.3 specification states that the first capability
>>        * should be PMCAP, the Intel ICH9 data sheet specifies that the ICH9
>>        * AHCI device puts the MSI capability first, pointing to 0x80. */
>> -    msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false);
>> +    msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false, errp);
>>   }
>>
>>   static void pci_ich9_uninit(PCIDevice *dev)
>> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
>> index 37373e5..c373e77 100644
>> --- a/hw/net/vmxnet3.c
>> +++ b/hw/net/vmxnet3.c
>> @@ -2135,13 +2135,13 @@ vmxnet3_cleanup_msix(VMXNET3State *s)
>>   #define VMXNET3_PER_VECTOR_MASK   (false)
>>
>>   static bool
>> -vmxnet3_init_msi(VMXNET3State *s)
>> +vmxnet3_init_msi(VMXNET3State *s, Error **errp)
>>   {
>>       PCIDevice *d = PCI_DEVICE(s);
>>       int res;
>>
>>       res = msi_init(d, VMXNET3_MSI_OFFSET, VMXNET3_MAX_NMSIX_INTRS,
>> -                   VMXNET3_USE_64BIT, VMXNET3_PER_VECTOR_MASK);
>> +                   VMXNET3_USE_64BIT, VMXNET3_PER_VECTOR_MASK, errp);
>>       if (0 > res) {
>>           VMW_WRPRN("Failed to initialize MSI, error %d", res);
>>           s->msi_used = false;
>> @@ -2204,6 +2204,11 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
>>
>>       VMW_CBPRN("Starting init...");
>>
>> +    if (!vmxnet3_init_msi(s, errp)) {
>> +        VMW_WRPRN("Failed to initialize MSI, configuration is inconsistent.");
>> +        return;
>> +    }
>> +
>>       memory_region_init_io(&s->bar0, OBJECT(s), &b0_ops, s,
>>                             "vmxnet3-b0", VMXNET3_PT_REG_SIZE);
>>       pci_register_bar(pci_dev, VMXNET3_BAR0_IDX,
>> @@ -2228,10 +2233,6 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
>>           VMW_WRPRN("Failed to initialize MSI-X, configuration is inconsistent.");
>>       }
>>
>> -    if (!vmxnet3_init_msi(s)) {
>> -        VMW_WRPRN("Failed to initialize MSI, configuration is inconsistent.");
>> -    }
>> -
>>       vmxnet3_net_init(s);
>>
>>       register_savevm(dev, "vmxnet3-msix", -1, 1,
>> diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
>> index eead195..5df9e83 100644
>> --- a/hw/pci-bridge/ioh3420.c
>> +++ b/hw/pci-bridge/ioh3420.c
>> @@ -96,6 +96,7 @@ static int ioh3420_initfn(PCIDevice *d)
>>       PCIEPort *p = PCIE_PORT(d);
>>       PCIESlot *s = PCIE_SLOT(d);
>>       int rc;
>> +    Error *local_err = NULL;
>>
>>       pci_bridge_initfn(d, TYPE_PCIE_BUS);
>>       pcie_port_init_reg(d);
>> @@ -105,12 +106,16 @@ static int ioh3420_initfn(PCIDevice *d)
>>       if (rc < 0) {
>>           goto err_bridge;
>>       }
>> +
>>       rc = msi_init(d, IOH_EP_MSI_OFFSET, IOH_EP_MSI_NR_VECTOR,
>>                     IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
>> -                  IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
>> +                  IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
>> +                  &local_err);
>>       if (rc < 0) {
>> +        error_report_err(local_err);
>>           goto err_bridge;
>>       }
>> +
>>       rc = pcie_cap_init(d, IOH_EP_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT, p->port);
>>       if (rc < 0) {
>>           goto err_msi;
>> diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
>> index bc3e1b7..bafaeeb 100644
>> --- a/hw/pci-bridge/pci_bridge_dev.c
>> +++ b/hw/pci-bridge/pci_bridge_dev.c
>> @@ -51,6 +51,7 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
>>       PCIBridge *br = PCI_BRIDGE(dev);
>>       PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev);
>>       int err;
>> +    Error *local_err = NULL;
>>
>>       pci_bridge_initfn(dev, TYPE_PCI_BUS);
>>
>> @@ -66,17 +67,21 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
>>           /* MSI is not applicable without SHPC */
>>           bridge_dev->flags &= ~(1 << PCI_BRIDGE_DEV_F_MSI_REQ);
>>       }
>> +
>>       err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0);
>>       if (err) {
>>           goto slotid_error;
>>       }
>> +
>>       if ((bridge_dev->flags & (1 << PCI_BRIDGE_DEV_F_MSI_REQ)) &&
>>           msi_supported) {
>> -        err = msi_init(dev, 0, 1, true, true);
>> +        err = msi_init(dev, 0, 1, true, true, &local_err);
>>           if (err < 0) {
>> +            error_report_err(local_err);
>>               goto msi_error;
>>           }
>>       }
>> +
>>       if (shpc_present(dev)) {
>>           /* TODO: spec recommends using 64 bit prefetcheable BAR.
>>            * Check whether that works well. */
>> @@ -84,6 +89,7 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
>>                            PCI_BASE_ADDRESS_MEM_TYPE_64, &bridge_dev->bar);
>>       }
>>       return 0;
>> +
>>   msi_error:
>>       slotid_cap_cleanup(dev);
>>   slotid_error:
>> diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
>> index b4dd25f..3fc7455 100644
>> --- a/hw/pci-bridge/xio3130_downstream.c
>> +++ b/hw/pci-bridge/xio3130_downstream.c
>> @@ -59,21 +59,26 @@ static int xio3130_downstream_initfn(PCIDevice *d)
>>       PCIEPort *p = PCIE_PORT(d);
>>       PCIESlot *s = PCIE_SLOT(d);
>>       int rc;
>> +    Error *local_err = NULL;
>>
>>       pci_bridge_initfn(d, TYPE_PCIE_BUS);
>>       pcie_port_init_reg(d);
>>
>>       rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR,
>>                     XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
>> -                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
>> +                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
>> +                  &local_err);
>>       if (rc < 0) {
>> +        error_report_err(local_err);
>>           goto err_bridge;
>>       }
>> +
>>       rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
>>                                  XIO3130_SSVID_SVID, XIO3130_SSVID_SSID);
>>       if (rc < 0) {
>>           goto err_bridge;
>>       }
>> +
>>       rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_DOWNSTREAM,
>>                          p->port);
>>       if (rc < 0) {
>> @@ -103,6 +108,7 @@ err_msi:
>>       msi_uninit(d);
>>   err_bridge:
>>       pci_bridge_exitfn(d);
>> +
>>       return rc;
>>   }
>>
>> diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
>> index 434c8fd..882271c 100644
>> --- a/hw/pci-bridge/xio3130_upstream.c
>> +++ b/hw/pci-bridge/xio3130_upstream.c
>> @@ -55,26 +55,32 @@ static int xio3130_upstream_initfn(PCIDevice *d)
>>   {
>>       PCIEPort *p = PCIE_PORT(d);
>>       int rc;
>> +    Error *local_err = NULL;
>>
>>       pci_bridge_initfn(d, TYPE_PCIE_BUS);
>>       pcie_port_init_reg(d);
>>
>>       rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR,
>>                     XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
>> -                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
>> +                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
>> +                  &local_err);
>>       if (rc < 0) {
>> +        error_report_err(local_err);
>>           goto err_bridge;
>>       }
>> +
>>       rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
>>                                  XIO3130_SSVID_SVID, XIO3130_SSVID_SSID);
>>       if (rc < 0) {
>>           goto err_bridge;
>>       }
>> +
>>       rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_UPSTREAM,
>>                          p->port);
>>       if (rc < 0) {
>>           goto err_msi;
>>       }
>> +
>>       pcie_cap_flr_init(d);
>>       pcie_cap_deverr_init(d);
>>       rc = pcie_aer_init(d, XIO3130_AER_OFFSET);
>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
>> index c1dd531..ad6ed09 100644
>> --- a/hw/pci/msi.c
>> +++ b/hw/pci/msi.c
>> @@ -150,15 +150,23 @@ bool msi_enabled(const PCIDevice *dev)
>>            PCI_MSI_FLAGS_ENABLE);
>>   }
>>
>> -int msi_init(struct PCIDevice *dev, uint8_t offset,
>> -             unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask)
>> +/*
>> + * @nr_vectors: Multiple Message Capable field of Message Control register
>> + * @msi64bit: support 64-bit message address or not
>> + * @msi_per_vector_mask: support per-vector masking or not
>
> Missing: @offset, @errp.
>
> git-grep @errp shows the various ways we document this parameter
> elsewhere.  We suck at consistency.  Pick one you like.
>
>> + *
>> + * return: MSI capability offset in config space
>
> "... on success, -errno on error" plus an explanation of what the error
> codes mean.  If nobody uses the error codes, we could simply return -1
> and call it a day.
>
>> + */
>
> The comment follows GTK-Doc conventions, but not quite.
>
> GTK-Doc comments are designed for easy extraction of reference
> documentation.  We don't do that.  We use them in a few places anyway,
> such as object.h.  The majority of the code doesn't use them.  Their use
> is up to the maintainer.  I detest them, and keep them out of the
> subsystems I maintain.  PCI maintainer's call.
>
> Prior discussion:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg03368.html
>
> Here's how I'd write this one:
>
> /*
>   * Make PCI device @dev MSI-capable.
>   * Non-zero @offset puts capability MSI at that offset in PCI config
>   * space.
>   * @nr_vectors is the number of MSI vectors (1, 2, 4, 8, 16 or 32).
>   * If @msi64bit, make the device capable of sending a 64-bit message
>   * address.
>   * If @msi_per_vector_mask, make the device support per-vector masking.
>   * @errp is for returning errors.
>   * Return the offset of capability MSI in config space on success,
>   * set @errp and return -errno on error.
>   * FIXME explain the error codes, or dumb down return value to -1
>   */
>
> Except I'm not sure the function should fail in the first place!  See
> below.
>
>> +int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
>> +             bool msi64bit, bool msi_per_vector_mask, Error **errp)
>>   {
>>       unsigned int vectors_order;
>> -    uint16_t flags;
>> +    uint16_t flags; /* Message Control register value */
>>       uint8_t cap_size;
>>       int config_offset;
>>
>>       if (!msi_supported) {
>> +        error_setg(errp, "MSI is not supported by interrupt controller");
>>           return -ENOTSUP;
>
> First failure mode: board does not support MSI (-ENOTSUP).
>
> Question to the PCI guys: why is this even an error?  A device with
> capability MSI should work just fine in such a board.
>
>>       }
>>
>> @@ -182,7 +190,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>>       }
>>
>>       cap_size = msi_cap_sizeof(flags);
>> -    config_offset = pci_add_capability(dev, PCI_CAP_ID_MSI, offset, cap_size);
>> +    config_offset = pci_add_capability2(dev, PCI_CAP_ID_MSI, offset,
>> +                                        cap_size, errp);
>
> pci_add_capability() is a wrapper around pci_add_capability2() that
> additionally reports errors with error_report_err().  This is what makes
> it unsuitable for realize().
>
>>       if (config_offset < 0) {
>>           return config_offset;
>
> Inherits failing modes from pci_add_capability2().  Two: out of PCI
> config space (-ENOSPC), and capability overlap (-EINVAL).
>
> Question for the PCI guys: how can these happen?  When they happen, is
> it a programming error?
>
>>       }
>> @@ -205,6 +214,7 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>>           pci_set_long(dev->wmask + msi_mask_off(dev, msi64bit),
>>                        0xffffffff >> (PCI_MSI_VECTORS_MAX - nr_vectors));
>>       }
>> +
>>       return config_offset;
>>   }
>>
>> diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
>> index d7dc667..ba25b3a 100644
>> --- a/hw/scsi/megasas.c
>> +++ b/hw/scsi/megasas.c
>> @@ -2339,6 +2339,17 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
>>       /* Interrupt pin 1 */
>>       pci_conf[PCI_INTERRUPT_PIN] = 0x01;
>>
>> +    if (megasas_use_msi(s)) {
>> +        int ret;
>> +
>> +        ret = msi_init(dev, 0x50, 1, true, false, errp);
>> +        if (ret > 0) {
>> +            s->flags &= ~MEGASAS_MASK_USE_MSI;
>> +        } else {
>> +            return;
>> +        }
>> +    }
>> +
>>       memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
>>                             "megasas-mmio", 0x4000);
>>       memory_region_init_io(&s->port_io, OBJECT(s), &megasas_port_ops, s,
>> @@ -2346,10 +2357,6 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
>>       memory_region_init_io(&s->queue_io, OBJECT(s), &megasas_queue_ops, s,
>>                             "megasas-queue", 0x40000);
>>
>> -    if (megasas_use_msi(s) &&
>> -        msi_init(dev, 0x50, 1, true, false)) {
>> -        s->flags &= ~MEGASAS_MASK_USE_MSI;
>> -    }
>>       if (megasas_use_msix(s) &&
>>           msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
>>                     &s->mmio_io, b->mmio_bar, 0x3800, 0x68)) {
>> diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
>> index 9c71f31..073b956 100644
>> --- a/hw/scsi/vmw_pvscsi.c
>> +++ b/hw/scsi/vmw_pvscsi.c
>> @@ -1014,13 +1014,13 @@ pvscsi_io_read(void *opaque, hwaddr addr, unsigned size)
>>
>>
>>   static bool
>> -pvscsi_init_msi(PVSCSIState *s)
>> +pvscsi_init_msi(PVSCSIState *s, Error **errp)
>>   {
>>       int res;
>>       PCIDevice *d = PCI_DEVICE(s);
>>
>>       res = msi_init(d, PVSCSI_MSI_OFFSET, PVSCSI_MSIX_NUM_VECTORS,
>> -                   PVSCSI_USE_64BIT, PVSCSI_PER_VECTOR_MASK);
>> +                   PVSCSI_USE_64BIT, PVSCSI_PER_VECTOR_MASK, errp);
>>       if (res < 0) {
>>           trace_pvscsi_init_msi_fail(res);
>>           s->msi_used = false;
>> @@ -1066,6 +1066,7 @@ static int
>>   pvscsi_init(PCIDevice *pci_dev)
>>   {
>>       PVSCSIState *s = PVSCSI(pci_dev);
>> +    Error *local_err = NULL;
>>
>>       trace_pvscsi_state("init");
>>
>> @@ -1079,12 +1080,16 @@ pvscsi_init(PCIDevice *pci_dev)
>>       /* Interrupt pin A */
>>       pci_config_set_interrupt_pin(pci_dev->config, 1);
>>
>> +    pvscsi_init_msi(s, &local_err);
>> +    if (local_err) {
>> +        error_report_err(local_err);
>> +        return -1;
>> +    }
>> +
>>       memory_region_init_io(&s->io_space, OBJECT(s), &pvscsi_ops, s,
>>                             "pvscsi-io", PVSCSI_MEM_SPACE_SIZE);
>>       pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->io_space);
>>
>> -    pvscsi_init_msi(s);
>> -
>>       s->completion_worker = qemu_bh_new(pvscsi_process_completion_queue, s);
>>       if (!s->completion_worker) {
>>           pvscsi_cleanup_msi(s);
>> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
>> index 268ab36..7cd5f6c 100644
>> --- a/hw/usb/hcd-xhci.c
>> +++ b/hw/usb/hcd-xhci.c
>> @@ -3572,6 +3572,43 @@ static void usb_xhci_init(XHCIState *xhci)
>>       }
>>   }
>>
>> +static void usb_xhci_exit(PCIDevice *dev)
>> +{
>> +    int i;
>> +    XHCIState *xhci = XHCI(dev);
>> +
>> +    trace_usb_xhci_exit();
>> +
>> +    for (i = 0; i < xhci->numslots; i++) {
>> +        xhci_disable_slot(xhci, i + 1);
>> +    }
>> +
>> +    if (xhci->mfwrap_timer) {
>> +        timer_del(xhci->mfwrap_timer);
>> +        timer_free(xhci->mfwrap_timer);
>> +        xhci->mfwrap_timer = NULL;
>> +    }
>> +
>> +    memory_region_del_subregion(&xhci->mem, &xhci->mem_cap);
>> +    memory_region_del_subregion(&xhci->mem, &xhci->mem_oper);
>> +    memory_region_del_subregion(&xhci->mem, &xhci->mem_runtime);
>> +    memory_region_del_subregion(&xhci->mem, &xhci->mem_doorbell);
>> +
>> +    for (i = 0; i < xhci->numports; i++) {
>> +        XHCIPort *port = &xhci->ports[i];
>> +        memory_region_del_subregion(&xhci->mem, &port->mem);
>> +    }
>> +
>> +    /* destroy msix memory region */
>> +    if (dev->msix_table && dev->msix_pba
>> +        && dev->msix_entry_used) {
>> +        memory_region_del_subregion(&xhci->mem, &dev->msix_table_mmio);
>> +        memory_region_del_subregion(&xhci->mem, &dev->msix_pba_mmio);
>> +    }
>> +
>> +    usb_bus_release(&xhci->bus);
>> +}
>> +
>>   static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>>   {
>>       int i, ret;
>> @@ -3643,7 +3680,10 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>>       }
>>
>>       if (xhci_get_flag(xhci, XHCI_FLAG_USE_MSI)) {
>> -        msi_init(dev, 0x70, xhci->numintrs, true, false);
>> +        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, errp);
>> +        if (ret < 0) {
>> +            goto cleanup_on_msi_fail;
>> +        }
>>       }
>>       if (xhci_get_flag(xhci, XHCI_FLAG_USE_MSI_X)) {
>>           msix_init(dev, xhci->numintrs,
>> @@ -3651,43 +3691,10 @@ static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
>>                     &xhci->mem, 0, OFF_MSIX_PBA,
>>                     0x90);
>>       }
>> -}
>> -
>> -static void usb_xhci_exit(PCIDevice *dev)
>> -{
>> -    int i;
>> -    XHCIState *xhci = XHCI(dev);
>> -
>> -    trace_usb_xhci_exit();
>> -
>> -    for (i = 0; i < xhci->numslots; i++) {
>> -        xhci_disable_slot(xhci, i + 1);
>> -    }
>> -
>> -    if (xhci->mfwrap_timer) {
>> -        timer_del(xhci->mfwrap_timer);
>> -        timer_free(xhci->mfwrap_timer);
>> -        xhci->mfwrap_timer = NULL;
>> -    }
>>
>> -    memory_region_del_subregion(&xhci->mem, &xhci->mem_cap);
>> -    memory_region_del_subregion(&xhci->mem, &xhci->mem_oper);
>> -    memory_region_del_subregion(&xhci->mem, &xhci->mem_runtime);
>> -    memory_region_del_subregion(&xhci->mem, &xhci->mem_doorbell);
>> -
>> -    for (i = 0; i < xhci->numports; i++) {
>> -        XHCIPort *port = &xhci->ports[i];
>> -        memory_region_del_subregion(&xhci->mem, &port->mem);
>> -    }
>> -
>> -    /* destroy msix memory region */
>> -    if (dev->msix_table && dev->msix_pba
>> -        && dev->msix_entry_used) {
>> -        memory_region_del_subregion(&xhci->mem, &dev->msix_table_mmio);
>> -        memory_region_del_subregion(&xhci->mem, &dev->msix_pba_mmio);
>> -    }
>> -
>> -    usb_bus_release(&xhci->bus);
>> +cleanup_on_msi_fail:
>> +    usb_xhci_exit(dev);
>> +    object_unref(OBJECT(&xhci->mem));
>>   }
>>
>>   static int usb_xhci_post_load(void *opaque, int version_id)
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 1fb868c..633642e 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -1139,7 +1139,7 @@ static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
>>       }
>>   }
>>
>> -static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
>> +static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
>>   {
>>       uint16_t ctrl;
>>       bool msi_64bit, msi_maskbit;
>> @@ -1147,6 +1147,7 @@ static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
>>
>>       if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
>>                 vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
>> +        error_setg(errp, "Read error!");
>>           return -errno;
>>       }
>>       ctrl = le16_to_cpu(ctrl);
>> @@ -1157,12 +1158,11 @@ static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
>>
>>       trace_vfio_msi_setup(vdev->vbasedev.name, pos);
>>
>> -    ret = msi_init(&vdev->pdev, pos, entries, msi_64bit, msi_maskbit);
>> +    ret = msi_init(&vdev->pdev, pos, entries, msi_64bit, msi_maskbit, errp);
>>       if (ret < 0) {
>>           if (ret == -ENOTSUP) {
>>               return 0;
>>           }
>> -        error_report("vfio: msi_init failed");
>>           return ret;
>>       }
>>       vdev->msi_cap_size = 0xa + (msi_maskbit ? 0xa : 0) + (msi_64bit ? 0x4 : 0);
>> @@ -1654,7 +1654,7 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
>>       }
>>   }
>>
>> -static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
>> +static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp)
>>   {
>>       PCIDevice *pdev = &vdev->pdev;
>>       uint8_t cap_id, next, size;
>> @@ -1679,7 +1679,7 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
>>        * will be changed as we unwind the stack.
>>        */
>>       if (next) {
>> -        ret = vfio_add_std_cap(vdev, next);
>> +        ret = vfio_add_std_cap(vdev, next, errp);
>>           if (ret) {
>>               return ret;
>>           }
>> @@ -1695,7 +1695,7 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
>>
>>       switch (cap_id) {
>>       case PCI_CAP_ID_MSI:
>> -        ret = vfio_msi_setup(vdev, pos);
>> +        ret = vfio_msi_setup(vdev, pos, errp);
>>           break;
>>       case PCI_CAP_ID_EXP:
>>           vfio_check_pcie_flr(vdev, pos);
>> @@ -1729,7 +1729,7 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
>>       return 0;
>>   }
>>
>> -static int vfio_add_capabilities(VFIOPCIDevice *vdev)
>> +static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
>>   {
>>       PCIDevice *pdev = &vdev->pdev;
>>
>> @@ -1738,7 +1738,7 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev)
>>           return 0; /* Nothing to add */
>>       }
>>
>> -    return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
>> +    return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST], errp);
>>   }
>>
>>   static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
>> @@ -2349,6 +2349,7 @@ static int vfio_initfn(PCIDevice *pdev)
>>       struct stat st;
>>       int groupid;
>>       int ret;
>> +    Error *local_err = NULL;
>>
>>       /* Check that the host device exists */
>>       snprintf(path, sizeof(path),
>> @@ -2507,8 +2508,9 @@ static int vfio_initfn(PCIDevice *pdev)
>>
>>       vfio_map_bars(vdev);
>>
>> -    ret = vfio_add_capabilities(vdev);
>> +    ret = vfio_add_capabilities(vdev, &local_err);
>>       if (ret) {
>> +        error_report_err(local_err);
>>           goto out_teardown;
>>       }
>>
>> diff --git a/include/hw/pci/msi.h b/include/hw/pci/msi.h
>> index 50e452b..da1dc1a 100644
>> --- a/include/hw/pci/msi.h
>> +++ b/include/hw/pci/msi.h
>> @@ -34,8 +34,8 @@ extern bool msi_supported;
>>   void msi_set_message(PCIDevice *dev, MSIMessage msg);
>>   MSIMessage msi_get_message(PCIDevice *dev, unsigned int vector);
>>   bool msi_enabled(const PCIDevice *dev);
>> -int msi_init(struct PCIDevice *dev, uint8_t offset,
>> -             unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
>> +int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
>> +             bool msi64bit, bool msi_per_vector_mask, Error **errp);
>>   void msi_uninit(struct PCIDevice *dev);
>>   void msi_reset(PCIDevice *dev);
>>   void msi_notify(PCIDevice *dev, unsigned int vector);
>
>
> .
>
Marcel Apfelbaum March 3, 2016, 10:12 a.m. UTC | #3
On 03/02/2016 11:13 AM, Markus Armbruster wrote:
> This got lost over the Christmas break, sorry.
>
> Cc'ing Marcel for additional PCI expertise.
>
> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>
>> msi_init() is a supporting function in PCI device initialization,
>> in order to convert .init() to .realize(), it should be modified first.
>
> "Supporting function" doesn't imply "should use Error to report
> errors".  HACKING explains:
>
>      Use the simplest suitable method to communicate success / failure to
>      callers.  Stick to common methods: non-negative on success / -1 on
>      error, non-negative / -errno, non-null / null, or Error objects.
>
>      Example: when a function returns a non-null pointer on success, and it
>      can fail only in one way (as far as the caller is concerned), returning
>      null on failure is just fine, and certainly simpler and a lot easier on
>      the eyes than propagating an Error object through an Error ** parameter.
>
>      Example: when a function's callers need to report details on failure
>      only the function really knows, use Error **, and set suitable errors.
>
>      Do not report an error to the user when you're also returning an error
>      for somebody else to handle.  Leave the reporting to the place that
>      consumes the error returned.
>
> As we'll see in your patch of msi.c, msi_init() can fail in multiple
> ways, and uses -errno to comunicate them.  That can be okay even in
> realize().  It also reports to the user.  That's what makes it
> unsuitable for realize().
>
>> Also modify the callers
>
> Actually, you're *fixing* callers!  But the bugs aren't 100% clear, yet,
> see below for details.  Once we know what the bugs are, we'll want to
> reword the commit message to describe the bugs and their impact.
>
> I recommend to skip ahead to msi.c, then come back to the device models.
>
>> Bonus: add more comment for msi_init().
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
>>   hw/audio/intel-hda.c               | 10 ++++-
>>   hw/ide/ich.c                       |  2 +-
>>   hw/net/vmxnet3.c                   | 13 +++---
>>   hw/pci-bridge/ioh3420.c            |  7 +++-
>>   hw/pci-bridge/pci_bridge_dev.c     |  8 +++-
>>   hw/pci-bridge/xio3130_downstream.c |  8 +++-
>>   hw/pci-bridge/xio3130_upstream.c   |  8 +++-
>>   hw/pci/msi.c                       | 18 +++++++--
>>   hw/scsi/megasas.c                  | 15 +++++--
>>   hw/scsi/vmw_pvscsi.c               | 13 ++++--
>>   hw/usb/hcd-xhci.c                  | 81 +++++++++++++++++++++-----------------
>>   hw/vfio/pci.c                      | 20 +++++-----
>>   include/hw/pci/msi.h               |  4 +-
>>   13 files changed, 135 insertions(+), 72 deletions(-)
>>

[...]

> Except I'm not sure the function should fail in the first place!  See
> below.
>
>> +int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
>> +             bool msi64bit, bool msi_per_vector_mask, Error **errp)
>>   {
>>       unsigned int vectors_order;
>> -    uint16_t flags;
>> +    uint16_t flags; /* Message Control register value */
>>       uint8_t cap_size;
>>       int config_offset;
>>
>>       if (!msi_supported) {
>> +        error_setg(errp, "MSI is not supported by interrupt controller");
>>           return -ENOTSUP;
>
> First failure mode: board does not support MSI (-ENOTSUP).
>
> Question to the PCI guys: why is this even an error?  A device with
> capability MSI should work just fine in such a board.

Hi Markus,

Adding Jan Kiszka, maybe he can help.

That's a fair question. Is there any history for this decision?
The board not supporting MSI has nothing to do with the capability being there.
The HW should not change because the board doe not support it.

The capability should be present but not active.

>
>>       }
>>
>> @@ -182,7 +190,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>>       }
>>
>>       cap_size = msi_cap_sizeof(flags);
>> -    config_offset = pci_add_capability(dev, PCI_CAP_ID_MSI, offset, cap_size);
>> +    config_offset = pci_add_capability2(dev, PCI_CAP_ID_MSI, offset,
>> +                                        cap_size, errp);
>
> pci_add_capability() is a wrapper around pci_add_capability2() that
> additionally reports errors with error_report_err().  This is what makes
> it unsuitable for realize().
>
>>       if (config_offset < 0) {
>>           return config_offset;
>
> Inherits failing modes from pci_add_capability2().  Two: out of PCI
> config space (-ENOSPC), and capability overlap (-EINVAL).
>
> Question for the PCI guys: how can these happen?  When they happen, is
> it a programming error?

out of PCI config space: a device emulation error, not enough room
for all its capabilities - it seems to be a programming error.

capability overlap: is for device assignment. This checks for a real HW
that is broke. - not a programming error.

>
>>       }
>> @@ -205,6 +214,7 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>>           pci_set_long(dev->wmask + msi_mask_off(dev, msi64bit),
>>                        0xffffffff >> (PCI_MSI_VECTORS_MAX - nr_vectors));
>>       }
>> +
>>       return config_offset;
>>   }
>>


Thanks,
Marcel


[...]
Michael S. Tsirkin March 3, 2016, 10:45 a.m. UTC | #4
On Thu, Mar 03, 2016 at 12:12:27PM +0200, Marcel Apfelbaum wrote:
> >>+int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
> >>+             bool msi64bit, bool msi_per_vector_mask, Error **errp)
> >>  {
> >>      unsigned int vectors_order;
> >>-    uint16_t flags;
> >>+    uint16_t flags; /* Message Control register value */
> >>      uint8_t cap_size;
> >>      int config_offset;
> >>
> >>      if (!msi_supported) {
> >>+        error_setg(errp, "MSI is not supported by interrupt controller");
> >>          return -ENOTSUP;
> >
> >First failure mode: board does not support MSI (-ENOTSUP).
> >
> >Question to the PCI guys: why is this even an error?  A device with
> >capability MSI should work just fine in such a board.
> 
> Hi Markus,
> 
> Adding Jan Kiszka, maybe he can help.
> 
> That's a fair question. Is there any history for this decision?
> The board not supporting MSI has nothing to do with the capability being there.
> The HW should not change because the board doe not support it.
> 
> The capability should be present but not active.

Digging in git log will tell you why we have the msi_supported flag:

commit 02eb84d0ec97f183ac23ee939403a139e8849b1d ("qemu/pci: MSI-X support functions")

	This is a safety measure to avoid breaking platforms which should support
	MSI-X but currently lack this in the interrupt controller emulation.

in other words, on some platforms we must hide msi support from devices
because otherwise guests will try to use it, and our emulation is
incomplete.

And the conclusion from that is that for msi_init to fail silently is
at the moment the right thing to do.

The only other reason for it to fail is pci config space corruption,
this probably never happens in practice.
Marcel Apfelbaum March 3, 2016, 11:19 a.m. UTC | #5
On 03/03/2016 12:45 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2016 at 12:12:27PM +0200, Marcel Apfelbaum wrote:
>>>> +int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
>>>> +             bool msi64bit, bool msi_per_vector_mask, Error **errp)
>>>>   {
>>>>       unsigned int vectors_order;
>>>> -    uint16_t flags;
>>>> +    uint16_t flags; /* Message Control register value */
>>>>       uint8_t cap_size;
>>>>       int config_offset;
>>>>
>>>>       if (!msi_supported) {
>>>> +        error_setg(errp, "MSI is not supported by interrupt controller");
>>>>           return -ENOTSUP;
>>>
>>> First failure mode: board does not support MSI (-ENOTSUP).
>>>
>>> Question to the PCI guys: why is this even an error?  A device with
>>> capability MSI should work just fine in such a board.
>>
>> Hi Markus,
>>
>> Adding Jan Kiszka, maybe he can help.
>>
>> That's a fair question. Is there any history for this decision?
>> The board not supporting MSI has nothing to do with the capability being there.
>> The HW should not change because the board doe not support it.
>>
>> The capability should be present but not active.
>
> Digging in git log will tell you why we have the msi_supported flag:
>
> commit 02eb84d0ec97f183ac23ee939403a139e8849b1d ("qemu/pci: MSI-X support functions")
>
> 	This is a safety measure to avoid breaking platforms which should support
> 	MSI-X but currently lack this in the interrupt controller emulation.
>
> in other words, on some platforms we must hide msi support from devices
> because otherwise guests will try to use it, and our emulation is
> incomplete.


OK, thanks. So the flag should be "msi_broken" or "msi_present_but_not_implemented" and not
"msi_supported" that leads (at least me) to the assumption that some platform *does not support msi*
rather than it supports it, but we don't emulate it.


>
> And the conclusion from that is that for msi_init to fail silently is
> at the moment the right thing to do.

But this is not the only thing we do, we are modifying the PCI devices. We could fail starting the VM
if a device supporting MSI is added on a platform with broken msi, but this will prevent us to use
assigned devices. Emulated devices should be created with a specific "msi=off" flag.

Thanks,
Marcel

>
> The only other reason for it to fail is pci config space corruption,
> this probably never happens in practice.
>
Michael S. Tsirkin March 3, 2016, 11:33 a.m. UTC | #6
On Thu, Mar 03, 2016 at 01:19:09PM +0200, Marcel Apfelbaum wrote:
> On 03/03/2016 12:45 PM, Michael S. Tsirkin wrote:
> >On Thu, Mar 03, 2016 at 12:12:27PM +0200, Marcel Apfelbaum wrote:
> >>>>+int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
> >>>>+             bool msi64bit, bool msi_per_vector_mask, Error **errp)
> >>>>  {
> >>>>      unsigned int vectors_order;
> >>>>-    uint16_t flags;
> >>>>+    uint16_t flags; /* Message Control register value */
> >>>>      uint8_t cap_size;
> >>>>      int config_offset;
> >>>>
> >>>>      if (!msi_supported) {
> >>>>+        error_setg(errp, "MSI is not supported by interrupt controller");
> >>>>          return -ENOTSUP;
> >>>
> >>>First failure mode: board does not support MSI (-ENOTSUP).
> >>>
> >>>Question to the PCI guys: why is this even an error?  A device with
> >>>capability MSI should work just fine in such a board.
> >>
> >>Hi Markus,
> >>
> >>Adding Jan Kiszka, maybe he can help.
> >>
> >>That's a fair question. Is there any history for this decision?
> >>The board not supporting MSI has nothing to do with the capability being there.
> >>The HW should not change because the board doe not support it.
> >>
> >>The capability should be present but not active.
> >
> >Digging in git log will tell you why we have the msi_supported flag:
> >
> >commit 02eb84d0ec97f183ac23ee939403a139e8849b1d ("qemu/pci: MSI-X support functions")
> >
> >	This is a safety measure to avoid breaking platforms which should support
> >	MSI-X but currently lack this in the interrupt controller emulation.
> >
> >in other words, on some platforms we must hide msi support from devices
> >because otherwise guests will try to use it, and our emulation is
> >incomplete.
> 
> 
> OK, thanks. So the flag should be "msi_broken" or "msi_present_but_not_implemented" and not
> "msi_supported" that leads (at least me) to the assumption that some platform *does not support msi*
> rather than it supports it, but we don't emulate it.
> 
> 
> >
> >And the conclusion from that is that for msi_init to fail silently is
> >at the moment the right thing to do.
> 
> But this is not the only thing we do, we are modifying the PCI devices. We could fail starting the VM
> if a device supporting MSI is added on a platform with broken msi, but this will prevent us to use
> assigned devices. Emulated devices should be created with a specific "msi=off" flag.
> 
> Thanks,
> Marcel

That will just break a bunch of valid configurations, for no real
benefit to users.

> >
> >The only other reason for it to fail is pci config space corruption,
> >this probably never happens in practice.
> >
Markus Armbruster March 3, 2016, 2:24 p.m. UTC | #7
Marcel Apfelbaum <marcel@redhat.com> writes:

> On 03/02/2016 11:13 AM, Markus Armbruster wrote:
>> This got lost over the Christmas break, sorry.
>>
>> Cc'ing Marcel for additional PCI expertise.
>>
>> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>>
>>> msi_init() is a supporting function in PCI device initialization,
>>> in order to convert .init() to .realize(), it should be modified first.
>>
>> "Supporting function" doesn't imply "should use Error to report
>> errors".  HACKING explains:
>>
>>      Use the simplest suitable method to communicate success / failure to
>>      callers.  Stick to common methods: non-negative on success / -1 on
>>      error, non-negative / -errno, non-null / null, or Error objects.
>>
>>      Example: when a function returns a non-null pointer on success, and it
>>      can fail only in one way (as far as the caller is concerned), returning
>>      null on failure is just fine, and certainly simpler and a lot easier on
>>      the eyes than propagating an Error object through an Error ** parameter.
>>
>>      Example: when a function's callers need to report details on failure
>>      only the function really knows, use Error **, and set suitable errors.
>>
>>      Do not report an error to the user when you're also returning an error
>>      for somebody else to handle.  Leave the reporting to the place that
>>      consumes the error returned.
>>
>> As we'll see in your patch of msi.c, msi_init() can fail in multiple
>> ways, and uses -errno to comunicate them.  That can be okay even in
>> realize().  It also reports to the user.  That's what makes it
>> unsuitable for realize().
>>
>>> Also modify the callers
>>
>> Actually, you're *fixing* callers!  But the bugs aren't 100% clear, yet,
>> see below for details.  Once we know what the bugs are, we'll want to
>> reword the commit message to describe the bugs and their impact.
>>
>> I recommend to skip ahead to msi.c, then come back to the device models.
>>
>>> Bonus: add more comment for msi_init().
>>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>>> ---
>>>   hw/audio/intel-hda.c               | 10 ++++-
>>>   hw/ide/ich.c                       |  2 +-
>>>   hw/net/vmxnet3.c                   | 13 +++---
>>>   hw/pci-bridge/ioh3420.c            |  7 +++-
>>>   hw/pci-bridge/pci_bridge_dev.c     |  8 +++-
>>>   hw/pci-bridge/xio3130_downstream.c |  8 +++-
>>>   hw/pci-bridge/xio3130_upstream.c   |  8 +++-
>>>   hw/pci/msi.c                       | 18 +++++++--
>>>   hw/scsi/megasas.c                  | 15 +++++--
>>>   hw/scsi/vmw_pvscsi.c               | 13 ++++--
>>>   hw/usb/hcd-xhci.c                  | 81 +++++++++++++++++++++-----------------
>>>   hw/vfio/pci.c                      | 20 +++++-----
>>>   include/hw/pci/msi.h               |  4 +-
>>>   13 files changed, 135 insertions(+), 72 deletions(-)
>>>
>
> [...]
>
>> Except I'm not sure the function should fail in the first place!  See
>> below.
>>
>>> +int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
>>> +             bool msi64bit, bool msi_per_vector_mask, Error **errp)
>>>   {
>>>       unsigned int vectors_order;
>>> -    uint16_t flags;
>>> +    uint16_t flags; /* Message Control register value */
>>>       uint8_t cap_size;
>>>       int config_offset;
>>>
>>>       if (!msi_supported) {
>>> +        error_setg(errp, "MSI is not supported by interrupt controller");
>>>           return -ENOTSUP;
>>
>> First failure mode: board does not support MSI (-ENOTSUP).
>>
>> Question to the PCI guys: why is this even an error?  A device with
>> capability MSI should work just fine in such a board.
>
> Hi Markus,
>
> Adding Jan Kiszka, maybe he can help.
>
> That's a fair question. Is there any history for this decision?
> The board not supporting MSI has nothing to do with the capability being there.
> The HW should not change because the board doe not support it.
>
> The capability should be present but not active.
>
>>
>>>       }
>>>
>>> @@ -182,7 +190,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>>>       }
>>>
>>>       cap_size = msi_cap_sizeof(flags);
>>> -    config_offset = pci_add_capability(dev, PCI_CAP_ID_MSI, offset, cap_size);
>>> +    config_offset = pci_add_capability2(dev, PCI_CAP_ID_MSI, offset,
>>> +                                        cap_size, errp);
>>
>> pci_add_capability() is a wrapper around pci_add_capability2() that
>> additionally reports errors with error_report_err().  This is what makes
>> it unsuitable for realize().
>>
>>>       if (config_offset < 0) {
>>>           return config_offset;
>>
>> Inherits failing modes from pci_add_capability2().  Two: out of PCI
>> config space (-ENOSPC), and capability overlap (-EINVAL).
>>
>> Question for the PCI guys: how can these happen?  When they happen, is
>> it a programming error?
>
> out of PCI config space: a device emulation error, not enough room
> for all its capabilities - it seems to be a programming error.

Programming error should be an assertion failure, not falling to a
variant of the device the user didn't order and that might not even
exist in the real world.

> capability overlap: is for device assignment. This checks for a real HW
> that is broke. - not a programming error.

Okay.

Thanks!

[...]
Markus Armbruster March 3, 2016, 3:03 p.m. UTC | #8
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Mar 03, 2016 at 01:19:09PM +0200, Marcel Apfelbaum wrote:
>> On 03/03/2016 12:45 PM, Michael S. Tsirkin wrote:
>> >On Thu, Mar 03, 2016 at 12:12:27PM +0200, Marcel Apfelbaum wrote:
>> >>>>+int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
>> >>>>+             bool msi64bit, bool msi_per_vector_mask, Error **errp)
>> >>>>  {
>> >>>>      unsigned int vectors_order;
>> >>>>-    uint16_t flags;
>> >>>>+    uint16_t flags; /* Message Control register value */
>> >>>>      uint8_t cap_size;
>> >>>>      int config_offset;
>> >>>>
>> >>>>      if (!msi_supported) {
>> >>>>+        error_setg(errp, "MSI is not supported by interrupt controller");
>> >>>>          return -ENOTSUP;
>> >>>
>> >>>First failure mode: board does not support MSI (-ENOTSUP).
>> >>>
>> >>>Question to the PCI guys: why is this even an error?  A device with
>> >>>capability MSI should work just fine in such a board.
>> >>
>> >>Hi Markus,
>> >>
>> >>Adding Jan Kiszka, maybe he can help.
>> >>
>> >>That's a fair question. Is there any history for this decision?
>> >>The board not supporting MSI has nothing to do with the capability being there.
>> >>The HW should not change because the board doe not support it.
>> >>
>> >>The capability should be present but not active.
>> >
>> >Digging in git log will tell you why we have the msi_supported flag:
>> >
>> >commit 02eb84d0ec97f183ac23ee939403a139e8849b1d ("qemu/pci: MSI-X support functions")
>> >
>> >	This is a safety measure to avoid breaking platforms which should support
>> >	MSI-X but currently lack this in the interrupt controller emulation.
>> >
>> >in other words, on some platforms we must hide msi support from devices
>> >because otherwise guests will try to use it, and our emulation is
>> >incomplete.
>> 
>> 
>> OK, thanks. So the flag should be "msi_broken" or "msi_present_but_not_implemented" and not
>> "msi_supported" that leads (at least me) to the assumption that some platform *does not support msi*
>> rather than it supports it, but we don't emulate it.

I agree the name is badly misleading for this role.

Now let me see how this contraption actually works.  msi_supported is
global, initialized to false, and becomes globally true when

1. certain MSI-capable interrupt controllers realize: "apic",
  "kvm-apic" if kvm_has_gsi_routing(), "xen-apic", "arm-gicv2m",
  "openpic" models 1 and 2, "kvm-openpic" models 1 and 2

2. "s390-pcihost" class-initializes

3. machine "spapr-machine" initializes

Issues:

* "Global is problematic.  What if a board has more than one interrupt
  controller?  What if one of them sets msi_supported, but the other one
  is of the kind Michael described, i.e. guests know it has MSI, but our
  emulation doesn't actually work?

* "Initialize to false" is problematic.  We don't clear msi_supported
  when we have a broken interrupt controler, we set it when we have a
  working one.  The consequence is that boards with non-MSI interrupt
  controllers are treated just like boards with broken interrupt
  controllers.

  Here's  how msi_supported is documented:

    /* Flag for interrupt controller to declare MSI/MSI-X support */
    bool msi_supported;

  This is matches how the code works.  However, it contradicts the
  commit message Michael quoted.  The most plausible explanation is that
  the commit is flawed.

* Class-initialize (2.) looks wrong to me.  msi_supported becomes true
  when QOM type "s390-pcihost" is created, regardless of whether
  instances get created, let alone used.

* I'm not sure about 3., but the spapr guys can worry about that.

>> >And the conclusion from that is that for msi_init to fail silently is
>> >at the moment the right thing to do.
>> 
>> But this is not the only thing we do, we are modifying the PCI devices. We could fail starting the VM
>> if a device supporting MSI is added on a platform with broken msi, but this will prevent us to use
>> assigned devices. Emulated devices should be created with a specific "msi=off" flag.
>> 
>> Thanks,
>> Marcel
>
> That will just break a bunch of valid configurations, for no real
> benefit to users.

I disagree, strongly.

If I ask for msi=on, then QEMU should either give it to me or fail, not
silently "correct" my configuration.

Of course, if we have msi_supported = false for everybody and its dog
whether it's really needed or not, "or fail" will indeed reject "a bunch
of valid configurations".  We "compensate" by silently messing with the
user's configuration.  That's shoddy workmanship, sorry.

We should instead have msi_supported = false only when it's actually
needed, and thus reject exactly the configurations that won't work.

>> >
>> >The only other reason for it to fail is pci config space corruption,
>> >this probably never happens in practice.
>> >
Michael S. Tsirkin March 3, 2016, 6:33 p.m. UTC | #9
On Thu, Mar 03, 2016 at 04:03:16PM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Thu, Mar 03, 2016 at 01:19:09PM +0200, Marcel Apfelbaum wrote:
> >> On 03/03/2016 12:45 PM, Michael S. Tsirkin wrote:
> >> >On Thu, Mar 03, 2016 at 12:12:27PM +0200, Marcel Apfelbaum wrote:
> >> >>>>+int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
> >> >>>>+             bool msi64bit, bool msi_per_vector_mask, Error **errp)
> >> >>>>  {
> >> >>>>      unsigned int vectors_order;
> >> >>>>-    uint16_t flags;
> >> >>>>+    uint16_t flags; /* Message Control register value */
> >> >>>>      uint8_t cap_size;
> >> >>>>      int config_offset;
> >> >>>>
> >> >>>>      if (!msi_supported) {
> >> >>>>+        error_setg(errp, "MSI is not supported by interrupt controller");
> >> >>>>          return -ENOTSUP;
> >> >>>
> >> >>>First failure mode: board does not support MSI (-ENOTSUP).
> >> >>>
> >> >>>Question to the PCI guys: why is this even an error?  A device with
> >> >>>capability MSI should work just fine in such a board.
> >> >>
> >> >>Hi Markus,
> >> >>
> >> >>Adding Jan Kiszka, maybe he can help.
> >> >>
> >> >>That's a fair question. Is there any history for this decision?
> >> >>The board not supporting MSI has nothing to do with the capability being there.
> >> >>The HW should not change because the board doe not support it.
> >> >>
> >> >>The capability should be present but not active.
> >> >
> >> >Digging in git log will tell you why we have the msi_supported flag:
> >> >
> >> >commit 02eb84d0ec97f183ac23ee939403a139e8849b1d ("qemu/pci: MSI-X support functions")
> >> >
> >> >	This is a safety measure to avoid breaking platforms which should support
> >> >	MSI-X but currently lack this in the interrupt controller emulation.
> >> >
> >> >in other words, on some platforms we must hide msi support from devices
> >> >because otherwise guests will try to use it, and our emulation is
> >> >incomplete.
> >> 
> >> 
> >> OK, thanks. So the flag should be "msi_broken" or "msi_present_but_not_implemented" and not
> >> "msi_supported" that leads (at least me) to the assumption that some platform *does not support msi*
> >> rather than it supports it, but we don't emulate it.
> 
> I agree the name is badly misleading for this role.
> 
> Now let me see how this contraption actually works.  msi_supported is
> global, initialized to false, and becomes globally true when
> 
> 1. certain MSI-capable interrupt controllers realize: "apic",
>   "kvm-apic" if kvm_has_gsi_routing(), "xen-apic", "arm-gicv2m",
>   "openpic" models 1 and 2, "kvm-openpic" models 1 and 2
> 
> 2. "s390-pcihost" class-initializes
> 
> 3. machine "spapr-machine" initializes
> 
> Issues:
> 
> * "Global is problematic.  What if a board has more than one interrupt
>   controller?  What if one of them sets msi_supported, but the other one
>   is of the kind Michael described, i.e. guests know it has MSI, but our
>   emulation doesn't actually work?
> 
> * "Initialize to false" is problematic.  We don't clear msi_supported
>   when we have a broken interrupt controler, we set it when we have a
>   working one.  The consequence is that boards with non-MSI interrupt
>   controllers are treated just like boards with broken interrupt
>   controllers.
> 
>   Here's  how msi_supported is documented:
> 
>     /* Flag for interrupt controller to declare MSI/MSI-X support */
>     bool msi_supported;
> 
>   This is matches how the code works.  However, it contradicts the
>   commit message Michael quoted.  The most plausible explanation is that
>   the commit is flawed.
> 
> * Class-initialize (2.) looks wrong to me.  msi_supported becomes true
>   when QOM type "s390-pcihost" is created, regardless of whether
>   instances get created, let alone used.
> 
> * I'm not sure about 3., but the spapr guys can worry about that.
> 
> >> >And the conclusion from that is that for msi_init to fail silently is
> >> >at the moment the right thing to do.
> >> 
> >> But this is not the only thing we do, we are modifying the PCI devices. We could fail starting the VM
> >> if a device supporting MSI is added on a platform with broken msi, but this will prevent us to use
> >> assigned devices. Emulated devices should be created with a specific "msi=off" flag.
> >> 
> >> Thanks,
> >> Marcel
> >
> > That will just break a bunch of valid configurations, for no real
> > benefit to users.
> 
> I disagree, strongly.
> 
> If I ask for msi=on, then QEMU should either give it to me or fail, not
> silently "correct" my configuration.
> 
> Of course, if we have msi_supported = false for everybody and its dog
> whether it's really needed or not, "or fail" will indeed reject "a bunch
> of valid configurations".  We "compensate" by silently messing with the
> user's configuration.  That's shoddy workmanship, sorry.
> 
> We should instead have msi_supported = false only when it's actually
> needed, and thus reject exactly the configurations that won't work.

Most people don't care one way or the other.  We started without msi.
We later added msi for some boards only.  Asking everyone to add msi=off
for all other boards at that point would have been silly.

> >> >
> >> >The only other reason for it to fail is pci config space corruption,
> >> >this probably never happens in practice.
> >> >
Markus Armbruster March 4, 2016, 8:42 a.m. UTC | #10
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Mar 03, 2016 at 04:03:16PM +0100, Markus Armbruster wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Thu, Mar 03, 2016 at 01:19:09PM +0200, Marcel Apfelbaum wrote:
>> >> On 03/03/2016 12:45 PM, Michael S. Tsirkin wrote:
>> >> >On Thu, Mar 03, 2016 at 12:12:27PM +0200, Marcel Apfelbaum wrote:
>> >> >>>>+int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
>> >> >>>>+             bool msi64bit, bool msi_per_vector_mask, Error **errp)
>> >> >>>>  {
>> >> >>>>      unsigned int vectors_order;
>> >> >>>>-    uint16_t flags;
>> >> >>>>+    uint16_t flags; /* Message Control register value */
>> >> >>>>      uint8_t cap_size;
>> >> >>>>      int config_offset;
>> >> >>>>
>> >> >>>>      if (!msi_supported) {
>> >> >>>>+        error_setg(errp, "MSI is not supported by interrupt controller");
>> >> >>>>          return -ENOTSUP;
>> >> >>>
>> >> >>>First failure mode: board does not support MSI (-ENOTSUP).
>> >> >>>
>> >> >>>Question to the PCI guys: why is this even an error?  A device with
>> >> >>>capability MSI should work just fine in such a board.
>> >> >>
>> >> >>Hi Markus,
>> >> >>
>> >> >>Adding Jan Kiszka, maybe he can help.
>> >> >>
>> >> >>That's a fair question. Is there any history for this decision?
>> >> >>The board not supporting MSI has nothing to do with the capability being there.
>> >> >>The HW should not change because the board doe not support it.
>> >> >>
>> >> >>The capability should be present but not active.
>> >> >
>> >> >Digging in git log will tell you why we have the msi_supported flag:
>> >> >
>> >> >commit 02eb84d0ec97f183ac23ee939403a139e8849b1d ("qemu/pci: MSI-X support functions")
>> >> >
>> >> >	This is a safety measure to avoid breaking platforms which should support
>> >> >	MSI-X but currently lack this in the interrupt controller emulation.
>> >> >
>> >> >in other words, on some platforms we must hide msi support from devices
>> >> >because otherwise guests will try to use it, and our emulation is
>> >> >incomplete.
>> >> 
>> >> 
>> >> OK, thanks. So the flag should be "msi_broken" or "msi_present_but_not_implemented" and not
>> >> "msi_supported" that leads (at least me) to the assumption that some platform *does not support msi*
>> >> rather than it supports it, but we don't emulate it.
>> 
>> I agree the name is badly misleading for this role.
>> 
>> Now let me see how this contraption actually works.  msi_supported is
>> global, initialized to false, and becomes globally true when
>> 
>> 1. certain MSI-capable interrupt controllers realize: "apic",
>>   "kvm-apic" if kvm_has_gsi_routing(), "xen-apic", "arm-gicv2m",
>>   "openpic" models 1 and 2, "kvm-openpic" models 1 and 2
>> 
>> 2. "s390-pcihost" class-initializes
>> 
>> 3. machine "spapr-machine" initializes
>> 
>> Issues:
>> 
>> * "Global is problematic.  What if a board has more than one interrupt
>>   controller?  What if one of them sets msi_supported, but the other one
>>   is of the kind Michael described, i.e. guests know it has MSI, but our
>>   emulation doesn't actually work?
>> 
>> * "Initialize to false" is problematic.  We don't clear msi_supported
>>   when we have a broken interrupt controler, we set it when we have a
>>   working one.  The consequence is that boards with non-MSI interrupt
>>   controllers are treated just like boards with broken interrupt
>>   controllers.
>> 
>>   Here's  how msi_supported is documented:
>> 
>>     /* Flag for interrupt controller to declare MSI/MSI-X support */
>>     bool msi_supported;
>> 
>>   This is matches how the code works.  However, it contradicts the
>>   commit message Michael quoted.  The most plausible explanation is that
>>   the commit is flawed.
>> 
>> * Class-initialize (2.) looks wrong to me.  msi_supported becomes true
>>   when QOM type "s390-pcihost" is created, regardless of whether
>>   instances get created, let alone used.
>> 
>> * I'm not sure about 3., but the spapr guys can worry about that.
>> 
>> >> >And the conclusion from that is that for msi_init to fail silently is
>> >> >at the moment the right thing to do.
>> >> 
>> >> But this is not the only thing we do, we are modifying the PCI devices. We could fail starting the VM
>> >> if a device supporting MSI is added on a platform with broken msi, but this will prevent us to use
>> >> assigned devices. Emulated devices should be created with a specific "msi=off" flag.
>> >> 
>> >> Thanks,
>> >> Marcel
>> >
>> > That will just break a bunch of valid configurations, for no real
>> > benefit to users.
>> 
>> I disagree, strongly.
>> 
>> If I ask for msi=on, then QEMU should either give it to me or fail, not
>> silently "correct" my configuration.
>> 
>> Of course, if we have msi_supported = false for everybody and its dog
>> whether it's really needed or not, "or fail" will indeed reject "a bunch
>> of valid configurations".  We "compensate" by silently messing with the
>> user's configuration.  That's shoddy workmanship, sorry.
>> 
>> We should instead have msi_supported = false only when it's actually
>> needed, and thus reject exactly the configurations that won't work.
>
> Most people don't care one way or the other.  We started without msi.
> We later added msi for some boards only.  Asking everyone to add msi=off
> for all other boards at that point would have been silly.

I'm afraid you're missing my point entirely.

Yes, we started without MSI.  We then added MSI support to some devices
and to some boards.  Exactly the same history as for physical hardware.

Plugging an MSI-capable device into an MSI-incapable board works just
fine, both for physical and for virtual hardware.  What doesn't work is
plugging an MSI-capable device into an MSI-capable board with *broken*
MSI support.

As a convenience feature, we summarily and silently remove a device's
MSI capability when we detect such a broken board.  At least that's what
the commit message you quoted claims.

In reality, we remove it not just for broken boards, but even for
MSI-incapable boards.

I take issue with "summarily and silently", and "even for MSI-incapable
boards".

When multiple variants of a device exist, and the user didn't ask for a
specific one, then picking the one that works best with the board is
just fine.

It's absolutely not fine when the user did ask for a specific one.  When
I ask for msi=on, I mean it.  If it can't work with this board, tell me.
But silently ignoring my orders is a bug.

It's fine to have emulations of MSI-capable boards where MSI doesn't yet
work.  Even if that means we have to reject MSI-capable devices.

It's absolutely not fine to reject them for MSI-incapable boards, where
they'd work just fine.

Furthermore, I doubt the wisdom of creating virtual devices that don't
exist physically just to have something that works in our broken boards.
If the physical device exists in MSI-capable and MSI-incapable variants,
emulating both is fine.  But if it only ever existed MSI-capable, having
the PCI core(!) drop the MSI capability is a questionable idea.  I
suspect that the need for this dubious hack would be much smaller if we
didn't foolishly treat every MSI-incapable board as broken MSI-capable
board.

If you conclude that cleaning up this carelessly made mess is not worth
the bother now, then at least explain the mess in the code, please.
It's not obvious, and figuring out what's going on and why it is the way
it is has taken me several hours, and Marcel's help.

[...]
Cao jin March 23, 2016, 4:04 a.m. UTC | #11
On 03/02/2016 05:13 PM, Markus Armbruster wrote:
> This got lost over the Christmas break, sorry.
>
> Cc'ing Marcel for additional PCI expertise.
>
> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>
>> msi_init() is a supporting function in PCI device initialization,
>> in order to convert .init() to .realize(), it should be modified first.
>
> "Supporting function" doesn't imply "should use Error to report
> errors".  HACKING explains:
>
>      Use the simplest suitable method to communicate success / failure to
>      callers.  Stick to common methods: non-negative on success / -1 on
>      error, non-negative / -errno, non-null / null, or Error objects.
>
>      Example: when a function returns a non-null pointer on success, and it
>      can fail only in one way (as far as the caller is concerned), returning
>      null on failure is just fine, and certainly simpler and a lot easier on
>      the eyes than propagating an Error object through an Error ** parameter.
>
>      Example: when a function's callers need to report details on failure
>      only the function really knows, use Error **, and set suitable errors.
>
>      Do not report an error to the user when you're also returning an error
>      for somebody else to handle.  Leave the reporting to the place that
>      consumes the error returned.
>

Really appreciate your review, I just finished reading all the comments 
and discussion.

Seems pci_add_capability2()(commit cd9aa33e introduced) doesn`t follow 
the new error reporting rule(report error while also return error).

So I am thinking, could we revert commit cd9aa33e, let 
pci_add_capability() return error code and assert when out of pci space, 
and let caller(only assigned device, others could ignore the error) 
handle the error code(new a error object, propagate it)

Hope to hear PCI Maintainer`s advice(So I don`t cc other in this round)
Markus Armbruster March 23, 2016, 8:12 a.m. UTC | #12
Cao jin <caoj.fnst@cn.fujitsu.com> writes:

> On 03/02/2016 05:13 PM, Markus Armbruster wrote:
>> This got lost over the Christmas break, sorry.
>>
>> Cc'ing Marcel for additional PCI expertise.
>>
>> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>>
>>> msi_init() is a supporting function in PCI device initialization,
>>> in order to convert .init() to .realize(), it should be modified first.
>>
>> "Supporting function" doesn't imply "should use Error to report
>> errors".  HACKING explains:
>>
>>      Use the simplest suitable method to communicate success / failure to
>>      callers.  Stick to common methods: non-negative on success / -1 on
>>      error, non-negative / -errno, non-null / null, or Error objects.
>>
>>      Example: when a function returns a non-null pointer on success, and it
>>      can fail only in one way (as far as the caller is concerned), returning
>>      null on failure is just fine, and certainly simpler and a lot easier on
>>      the eyes than propagating an Error object through an Error ** parameter.
>>
>>      Example: when a function's callers need to report details on failure
>>      only the function really knows, use Error **, and set suitable errors.
>>
>>      Do not report an error to the user when you're also returning an error
>>      for somebody else to handle.  Leave the reporting to the place that
>>      consumes the error returned.
>>
>
> Really appreciate your review, I just finished reading all the
> comments and discussion.
>
> Seems pci_add_capability2()(commit cd9aa33e introduced) doesn`t follow
> the new error reporting rule(report error while also return error).

Misunderstanding?

"Report an error to the user" means error_report() and such.
error_setg() doesn't report to the user, it returns an error object to
the caller.

> So I am thinking, could we revert commit cd9aa33e, let
> pci_add_capability() return error code and assert when out of pci
> space, and let caller(only assigned device, others could ignore the
> error) handle the error code(new a error object, propagate it)
>
> Hope to hear PCI Maintainer`s advice(So I don`t cc other in this round)
Cao jin March 23, 2016, 9:23 a.m. UTC | #13
On 03/23/2016 04:12 PM, Markus Armbruster wrote:
> Cao jin <caoj.fnst@cn.fujitsu.com> writes:
>

>>
>> Really appreciate your review, I just finished reading all the
>> comments and discussion.
>>
>> Seems pci_add_capability2()(commit cd9aa33e introduced) doesn`t follow
>> the new error reporting rule(report error while also return error).
>
> Misunderstanding?
>
> "Report an error to the user" means error_report() and such.
> error_setg() doesn't report to the user, it returns an error object to
> the caller.
>

ah...thanks for correcting me.

>> So I am thinking, could we revert commit cd9aa33e, let
>> pci_add_capability() return error code and assert when out of pci
>> space, and let caller(only assigned device, others could ignore the
>> error) handle the error code(new a error object, propagate it)
>>
>> Hope to hear PCI Maintainer`s advice(So I don`t cc other in this round)
>
diff mbox

Patch

diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index 433463e..0d770131 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -1130,6 +1130,7 @@  static void intel_hda_realize(PCIDevice *pci, Error **errp)
 {
     IntelHDAState *d = INTEL_HDA(pci);
     uint8_t *conf = d->pci.config;
+    int ret;
 
     d->name = object_get_typename(OBJECT(d));
 
@@ -1142,11 +1143,18 @@  static void intel_hda_realize(PCIDevice *pci, Error **errp)
                           "intel-hda", 0x4000);
     pci_register_bar(&d->pci, 0, 0, &d->mmio);
     if (d->msi) {
-        msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true, false);
+        ret = msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true,
+                false, errp);
+        if (ret < 0) {
+            goto cleanup_on_msi_fail;
+        }
     }
 
     hda_codec_bus_init(DEVICE(pci), &d->codecs, sizeof(d->codecs),
                        intel_hda_response, intel_hda_xfer);
+
+cleanup_on_msi_fail:
+    object_unref(OBJECT(&d->mmio));
 }
 
 static void intel_hda_exit(PCIDevice *pci)
diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 16925fa..94b1809 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -145,7 +145,7 @@  static void pci_ich9_ahci_realize(PCIDevice *dev, Error **errp)
     /* Although the AHCI 1.3 specification states that the first capability
      * should be PMCAP, the Intel ICH9 data sheet specifies that the ICH9
      * AHCI device puts the MSI capability first, pointing to 0x80. */
-    msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false);
+    msi_init(dev, ICH9_MSI_CAP_OFFSET, 1, true, false, errp);
 }
 
 static void pci_ich9_uninit(PCIDevice *dev)
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 37373e5..c373e77 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2135,13 +2135,13 @@  vmxnet3_cleanup_msix(VMXNET3State *s)
 #define VMXNET3_PER_VECTOR_MASK   (false)
 
 static bool
-vmxnet3_init_msi(VMXNET3State *s)
+vmxnet3_init_msi(VMXNET3State *s, Error **errp)
 {
     PCIDevice *d = PCI_DEVICE(s);
     int res;
 
     res = msi_init(d, VMXNET3_MSI_OFFSET, VMXNET3_MAX_NMSIX_INTRS,
-                   VMXNET3_USE_64BIT, VMXNET3_PER_VECTOR_MASK);
+                   VMXNET3_USE_64BIT, VMXNET3_PER_VECTOR_MASK, errp);
     if (0 > res) {
         VMW_WRPRN("Failed to initialize MSI, error %d", res);
         s->msi_used = false;
@@ -2204,6 +2204,11 @@  static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
 
     VMW_CBPRN("Starting init...");
 
+    if (!vmxnet3_init_msi(s, errp)) {
+        VMW_WRPRN("Failed to initialize MSI, configuration is inconsistent.");
+        return;
+    }
+
     memory_region_init_io(&s->bar0, OBJECT(s), &b0_ops, s,
                           "vmxnet3-b0", VMXNET3_PT_REG_SIZE);
     pci_register_bar(pci_dev, VMXNET3_BAR0_IDX,
@@ -2228,10 +2233,6 @@  static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
         VMW_WRPRN("Failed to initialize MSI-X, configuration is inconsistent.");
     }
 
-    if (!vmxnet3_init_msi(s)) {
-        VMW_WRPRN("Failed to initialize MSI, configuration is inconsistent.");
-    }
-
     vmxnet3_net_init(s);
 
     register_savevm(dev, "vmxnet3-msix", -1, 1,
diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index eead195..5df9e83 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -96,6 +96,7 @@  static int ioh3420_initfn(PCIDevice *d)
     PCIEPort *p = PCIE_PORT(d);
     PCIESlot *s = PCIE_SLOT(d);
     int rc;
+    Error *local_err = NULL;
 
     pci_bridge_initfn(d, TYPE_PCIE_BUS);
     pcie_port_init_reg(d);
@@ -105,12 +106,16 @@  static int ioh3420_initfn(PCIDevice *d)
     if (rc < 0) {
         goto err_bridge;
     }
+
     rc = msi_init(d, IOH_EP_MSI_OFFSET, IOH_EP_MSI_NR_VECTOR,
                   IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
-                  IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
+                  IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
+                  &local_err);
     if (rc < 0) {
+        error_report_err(local_err);
         goto err_bridge;
     }
+
     rc = pcie_cap_init(d, IOH_EP_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT, p->port);
     if (rc < 0) {
         goto err_msi;
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index bc3e1b7..bafaeeb 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -51,6 +51,7 @@  static int pci_bridge_dev_initfn(PCIDevice *dev)
     PCIBridge *br = PCI_BRIDGE(dev);
     PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev);
     int err;
+    Error *local_err = NULL;
 
     pci_bridge_initfn(dev, TYPE_PCI_BUS);
 
@@ -66,17 +67,21 @@  static int pci_bridge_dev_initfn(PCIDevice *dev)
         /* MSI is not applicable without SHPC */
         bridge_dev->flags &= ~(1 << PCI_BRIDGE_DEV_F_MSI_REQ);
     }
+
     err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0);
     if (err) {
         goto slotid_error;
     }
+
     if ((bridge_dev->flags & (1 << PCI_BRIDGE_DEV_F_MSI_REQ)) &&
         msi_supported) {
-        err = msi_init(dev, 0, 1, true, true);
+        err = msi_init(dev, 0, 1, true, true, &local_err);
         if (err < 0) {
+            error_report_err(local_err);
             goto msi_error;
         }
     }
+
     if (shpc_present(dev)) {
         /* TODO: spec recommends using 64 bit prefetcheable BAR.
          * Check whether that works well. */
@@ -84,6 +89,7 @@  static int pci_bridge_dev_initfn(PCIDevice *dev)
                          PCI_BASE_ADDRESS_MEM_TYPE_64, &bridge_dev->bar);
     }
     return 0;
+
 msi_error:
     slotid_cap_cleanup(dev);
 slotid_error:
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index b4dd25f..3fc7455 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -59,21 +59,26 @@  static int xio3130_downstream_initfn(PCIDevice *d)
     PCIEPort *p = PCIE_PORT(d);
     PCIESlot *s = PCIE_SLOT(d);
     int rc;
+    Error *local_err = NULL;
 
     pci_bridge_initfn(d, TYPE_PCIE_BUS);
     pcie_port_init_reg(d);
 
     rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR,
                   XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
-                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
+                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
+                  &local_err);
     if (rc < 0) {
+        error_report_err(local_err);
         goto err_bridge;
     }
+
     rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
                                XIO3130_SSVID_SVID, XIO3130_SSVID_SSID);
     if (rc < 0) {
         goto err_bridge;
     }
+
     rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_DOWNSTREAM,
                        p->port);
     if (rc < 0) {
@@ -103,6 +108,7 @@  err_msi:
     msi_uninit(d);
 err_bridge:
     pci_bridge_exitfn(d);
+
     return rc;
 }
 
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 434c8fd..882271c 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -55,26 +55,32 @@  static int xio3130_upstream_initfn(PCIDevice *d)
 {
     PCIEPort *p = PCIE_PORT(d);
     int rc;
+    Error *local_err = NULL;
 
     pci_bridge_initfn(d, TYPE_PCIE_BUS);
     pcie_port_init_reg(d);
 
     rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR,
                   XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
-                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
+                  XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
+                  &local_err);
     if (rc < 0) {
+        error_report_err(local_err);
         goto err_bridge;
     }
+
     rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
                                XIO3130_SSVID_SVID, XIO3130_SSVID_SSID);
     if (rc < 0) {
         goto err_bridge;
     }
+
     rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_UPSTREAM,
                        p->port);
     if (rc < 0) {
         goto err_msi;
     }
+
     pcie_cap_flr_init(d);
     pcie_cap_deverr_init(d);
     rc = pcie_aer_init(d, XIO3130_AER_OFFSET);
diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index c1dd531..ad6ed09 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -150,15 +150,23 @@  bool msi_enabled(const PCIDevice *dev)
          PCI_MSI_FLAGS_ENABLE);
 }
 
-int msi_init(struct PCIDevice *dev, uint8_t offset,
-             unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask)
+/*
+ * @nr_vectors: Multiple Message Capable field of Message Control register
+ * @msi64bit: support 64-bit message address or not
+ * @msi_per_vector_mask: support per-vector masking or not
+ *
+ * return: MSI capability offset in config space
+ */
+int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
+             bool msi64bit, bool msi_per_vector_mask, Error **errp)
 {
     unsigned int vectors_order;
-    uint16_t flags;
+    uint16_t flags; /* Message Control register value */
     uint8_t cap_size;
     int config_offset;
 
     if (!msi_supported) {
+        error_setg(errp, "MSI is not supported by interrupt controller");
         return -ENOTSUP;
     }
 
@@ -182,7 +190,8 @@  int msi_init(struct PCIDevice *dev, uint8_t offset,
     }
 
     cap_size = msi_cap_sizeof(flags);
-    config_offset = pci_add_capability(dev, PCI_CAP_ID_MSI, offset, cap_size);
+    config_offset = pci_add_capability2(dev, PCI_CAP_ID_MSI, offset,
+                                        cap_size, errp);
     if (config_offset < 0) {
         return config_offset;
     }
@@ -205,6 +214,7 @@  int msi_init(struct PCIDevice *dev, uint8_t offset,
         pci_set_long(dev->wmask + msi_mask_off(dev, msi64bit),
                      0xffffffff >> (PCI_MSI_VECTORS_MAX - nr_vectors));
     }
+
     return config_offset;
 }
 
diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index d7dc667..ba25b3a 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2339,6 +2339,17 @@  static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
     /* Interrupt pin 1 */
     pci_conf[PCI_INTERRUPT_PIN] = 0x01;
 
+    if (megasas_use_msi(s)) {
+        int ret;
+
+        ret = msi_init(dev, 0x50, 1, true, false, errp);
+        if (ret > 0) {
+            s->flags &= ~MEGASAS_MASK_USE_MSI;
+        } else {
+            return;
+        }
+    }
+
     memory_region_init_io(&s->mmio_io, OBJECT(s), &megasas_mmio_ops, s,
                           "megasas-mmio", 0x4000);
     memory_region_init_io(&s->port_io, OBJECT(s), &megasas_port_ops, s,
@@ -2346,10 +2357,6 @@  static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
     memory_region_init_io(&s->queue_io, OBJECT(s), &megasas_queue_ops, s,
                           "megasas-queue", 0x40000);
 
-    if (megasas_use_msi(s) &&
-        msi_init(dev, 0x50, 1, true, false)) {
-        s->flags &= ~MEGASAS_MASK_USE_MSI;
-    }
     if (megasas_use_msix(s) &&
         msix_init(dev, 15, &s->mmio_io, b->mmio_bar, 0x2000,
                   &s->mmio_io, b->mmio_bar, 0x3800, 0x68)) {
diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
index 9c71f31..073b956 100644
--- a/hw/scsi/vmw_pvscsi.c
+++ b/hw/scsi/vmw_pvscsi.c
@@ -1014,13 +1014,13 @@  pvscsi_io_read(void *opaque, hwaddr addr, unsigned size)
 
 
 static bool
-pvscsi_init_msi(PVSCSIState *s)
+pvscsi_init_msi(PVSCSIState *s, Error **errp)
 {
     int res;
     PCIDevice *d = PCI_DEVICE(s);
 
     res = msi_init(d, PVSCSI_MSI_OFFSET, PVSCSI_MSIX_NUM_VECTORS,
-                   PVSCSI_USE_64BIT, PVSCSI_PER_VECTOR_MASK);
+                   PVSCSI_USE_64BIT, PVSCSI_PER_VECTOR_MASK, errp);
     if (res < 0) {
         trace_pvscsi_init_msi_fail(res);
         s->msi_used = false;
@@ -1066,6 +1066,7 @@  static int
 pvscsi_init(PCIDevice *pci_dev)
 {
     PVSCSIState *s = PVSCSI(pci_dev);
+    Error *local_err = NULL;
 
     trace_pvscsi_state("init");
 
@@ -1079,12 +1080,16 @@  pvscsi_init(PCIDevice *pci_dev)
     /* Interrupt pin A */
     pci_config_set_interrupt_pin(pci_dev->config, 1);
 
+    pvscsi_init_msi(s, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -1;
+    }
+
     memory_region_init_io(&s->io_space, OBJECT(s), &pvscsi_ops, s,
                           "pvscsi-io", PVSCSI_MEM_SPACE_SIZE);
     pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->io_space);
 
-    pvscsi_init_msi(s);
-
     s->completion_worker = qemu_bh_new(pvscsi_process_completion_queue, s);
     if (!s->completion_worker) {
         pvscsi_cleanup_msi(s);
diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 268ab36..7cd5f6c 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3572,6 +3572,43 @@  static void usb_xhci_init(XHCIState *xhci)
     }
 }
 
+static void usb_xhci_exit(PCIDevice *dev)
+{
+    int i;
+    XHCIState *xhci = XHCI(dev);
+
+    trace_usb_xhci_exit();
+
+    for (i = 0; i < xhci->numslots; i++) {
+        xhci_disable_slot(xhci, i + 1);
+    }
+
+    if (xhci->mfwrap_timer) {
+        timer_del(xhci->mfwrap_timer);
+        timer_free(xhci->mfwrap_timer);
+        xhci->mfwrap_timer = NULL;
+    }
+
+    memory_region_del_subregion(&xhci->mem, &xhci->mem_cap);
+    memory_region_del_subregion(&xhci->mem, &xhci->mem_oper);
+    memory_region_del_subregion(&xhci->mem, &xhci->mem_runtime);
+    memory_region_del_subregion(&xhci->mem, &xhci->mem_doorbell);
+
+    for (i = 0; i < xhci->numports; i++) {
+        XHCIPort *port = &xhci->ports[i];
+        memory_region_del_subregion(&xhci->mem, &port->mem);
+    }
+
+    /* destroy msix memory region */
+    if (dev->msix_table && dev->msix_pba
+        && dev->msix_entry_used) {
+        memory_region_del_subregion(&xhci->mem, &dev->msix_table_mmio);
+        memory_region_del_subregion(&xhci->mem, &dev->msix_pba_mmio);
+    }
+
+    usb_bus_release(&xhci->bus);
+}
+
 static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
 {
     int i, ret;
@@ -3643,7 +3680,10 @@  static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
     }
 
     if (xhci_get_flag(xhci, XHCI_FLAG_USE_MSI)) {
-        msi_init(dev, 0x70, xhci->numintrs, true, false);
+        ret = msi_init(dev, 0x70, xhci->numintrs, true, false, errp);
+        if (ret < 0) {
+            goto cleanup_on_msi_fail;
+        }
     }
     if (xhci_get_flag(xhci, XHCI_FLAG_USE_MSI_X)) {
         msix_init(dev, xhci->numintrs,
@@ -3651,43 +3691,10 @@  static void usb_xhci_realize(struct PCIDevice *dev, Error **errp)
                   &xhci->mem, 0, OFF_MSIX_PBA,
                   0x90);
     }
-}
-
-static void usb_xhci_exit(PCIDevice *dev)
-{
-    int i;
-    XHCIState *xhci = XHCI(dev);
-
-    trace_usb_xhci_exit();
-
-    for (i = 0; i < xhci->numslots; i++) {
-        xhci_disable_slot(xhci, i + 1);
-    }
-
-    if (xhci->mfwrap_timer) {
-        timer_del(xhci->mfwrap_timer);
-        timer_free(xhci->mfwrap_timer);
-        xhci->mfwrap_timer = NULL;
-    }
 
-    memory_region_del_subregion(&xhci->mem, &xhci->mem_cap);
-    memory_region_del_subregion(&xhci->mem, &xhci->mem_oper);
-    memory_region_del_subregion(&xhci->mem, &xhci->mem_runtime);
-    memory_region_del_subregion(&xhci->mem, &xhci->mem_doorbell);
-
-    for (i = 0; i < xhci->numports; i++) {
-        XHCIPort *port = &xhci->ports[i];
-        memory_region_del_subregion(&xhci->mem, &port->mem);
-    }
-
-    /* destroy msix memory region */
-    if (dev->msix_table && dev->msix_pba
-        && dev->msix_entry_used) {
-        memory_region_del_subregion(&xhci->mem, &dev->msix_table_mmio);
-        memory_region_del_subregion(&xhci->mem, &dev->msix_pba_mmio);
-    }
-
-    usb_bus_release(&xhci->bus);
+cleanup_on_msi_fail:
+    usb_xhci_exit(dev);
+    object_unref(OBJECT(&xhci->mem));
 }
 
 static int usb_xhci_post_load(void *opaque, int version_id)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1fb868c..633642e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1139,7 +1139,7 @@  static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
     }
 }
 
-static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
+static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
 {
     uint16_t ctrl;
     bool msi_64bit, msi_maskbit;
@@ -1147,6 +1147,7 @@  static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
 
     if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
               vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
+        error_setg(errp, "Read error!");
         return -errno;
     }
     ctrl = le16_to_cpu(ctrl);
@@ -1157,12 +1158,11 @@  static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
 
     trace_vfio_msi_setup(vdev->vbasedev.name, pos);
 
-    ret = msi_init(&vdev->pdev, pos, entries, msi_64bit, msi_maskbit);
+    ret = msi_init(&vdev->pdev, pos, entries, msi_64bit, msi_maskbit, errp);
     if (ret < 0) {
         if (ret == -ENOTSUP) {
             return 0;
         }
-        error_report("vfio: msi_init failed");
         return ret;
     }
     vdev->msi_cap_size = 0xa + (msi_maskbit ? 0xa : 0) + (msi_64bit ? 0x4 : 0);
@@ -1654,7 +1654,7 @@  static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
     }
 }
 
-static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
+static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint8_t cap_id, next, size;
@@ -1679,7 +1679,7 @@  static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
      * will be changed as we unwind the stack.
      */
     if (next) {
-        ret = vfio_add_std_cap(vdev, next);
+        ret = vfio_add_std_cap(vdev, next, errp);
         if (ret) {
             return ret;
         }
@@ -1695,7 +1695,7 @@  static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 
     switch (cap_id) {
     case PCI_CAP_ID_MSI:
-        ret = vfio_msi_setup(vdev, pos);
+        ret = vfio_msi_setup(vdev, pos, errp);
         break;
     case PCI_CAP_ID_EXP:
         vfio_check_pcie_flr(vdev, pos);
@@ -1729,7 +1729,7 @@  static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
     return 0;
 }
 
-static int vfio_add_capabilities(VFIOPCIDevice *vdev)
+static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
 {
     PCIDevice *pdev = &vdev->pdev;
 
@@ -1738,7 +1738,7 @@  static int vfio_add_capabilities(VFIOPCIDevice *vdev)
         return 0; /* Nothing to add */
     }
 
-    return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
+    return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST], errp);
 }
 
 static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
@@ -2349,6 +2349,7 @@  static int vfio_initfn(PCIDevice *pdev)
     struct stat st;
     int groupid;
     int ret;
+    Error *local_err = NULL;
 
     /* Check that the host device exists */
     snprintf(path, sizeof(path),
@@ -2507,8 +2508,9 @@  static int vfio_initfn(PCIDevice *pdev)
 
     vfio_map_bars(vdev);
 
-    ret = vfio_add_capabilities(vdev);
+    ret = vfio_add_capabilities(vdev, &local_err);
     if (ret) {
+        error_report_err(local_err);
         goto out_teardown;
     }
 
diff --git a/include/hw/pci/msi.h b/include/hw/pci/msi.h
index 50e452b..da1dc1a 100644
--- a/include/hw/pci/msi.h
+++ b/include/hw/pci/msi.h
@@ -34,8 +34,8 @@  extern bool msi_supported;
 void msi_set_message(PCIDevice *dev, MSIMessage msg);
 MSIMessage msi_get_message(PCIDevice *dev, unsigned int vector);
 bool msi_enabled(const PCIDevice *dev);
-int msi_init(struct PCIDevice *dev, uint8_t offset,
-             unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
+int msi_init(struct PCIDevice *dev, uint8_t offset, unsigned int nr_vectors,
+             bool msi64bit, bool msi_per_vector_mask, Error **errp);
 void msi_uninit(struct PCIDevice *dev);
 void msi_reset(PCIDevice *dev);
 void msi_notify(PCIDevice *dev, unsigned int vector);