Patchwork [3/3] msi: Store the capability size in PCIDevice

Submitter: Alex Williamson
Date: Nov. 2, 2010, 5:37 a.m.
Message ID: <20101102053751.10424.7525.stgit@s20.home>
Permalink: /patch/69861/
State: New

Comments

Alex Williamson - Nov. 2, 2010, 5:37 a.m.
Avoid needing to get the MSI capability flags every time we need to
check the capability length.  This also makes it accessible outside
of msi.c, making it easier for users to filter config space writes
using msi_cap and msi_cap_size.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/msi.c |    9 ++++-----
 hw/pci.h |    3 ++-
 2 files changed, 6 insertions(+), 6 deletions(-)
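
For context, the length of an MSI capability is not fixed. It depends on two bits of the Message Control word (64-bit addressing and per-vector masking), which is why the size had to be derived from the flags on every check. The following is a minimal standalone sketch of that calculation and of caching the result at init time, as this patch does. The names `cap_dev`, `cap_sizeof` and `cap_msi_init` are illustrative stand-ins and not QEMU's definitions; the bit values and lengths are taken from the MSI capability layout in the PCI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Message Control bits, as laid out by the PCI specification. */
#define MSI_FLAGS_64BIT   0x0080  /* 64-bit message address supported */
#define MSI_FLAGS_MASKBIT 0x0100  /* per-vector masking supported */

/* Illustrative stand-in for the two PCIDevice fields the patch touches. */
struct cap_dev {
    uint8_t msi_cap;       /* offset of the capability in config space */
    uint8_t msi_cap_size;  /* cached length of the capability */
};

/* Length in bytes of an MSI capability with the given Message Control. */
static uint8_t cap_sizeof(uint16_t flags)
{
    uint8_t size = 10;     /* ID, next pointer, control, 32-bit address, data */

    if (flags & MSI_FLAGS_64BIT) {
        size += 4;         /* upper 32 bits of the message address */
    }
    if (flags & MSI_FLAGS_MASKBIT) {
        size += 10;        /* 2 reserved bytes, mask bits, pending bits */
    }
    return size;
}

/* Compute the size once at init time and keep it (hypothetical helper). */
static void cap_msi_init(struct cap_dev *dev, uint8_t offset, uint16_t flags)
{
    assert(dev != NULL);
    dev->msi_cap = offset;
    dev->msi_cap_size = cap_sizeof(flags);
}
```

With the size cached, later range checks only need `msi_cap` and `msi_cap_size` and no longer have to read the flags back from config space.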
Michael S. Tsirkin - Nov. 2, 2010, 9:25 a.m.
On Mon, Nov 01, 2010 at 11:37:53PM -0600, Alex Williamson wrote:
> Avoid needing to get the MSI capability flags every time we need to
> check the capability length.  This also makes it accessible outside
> of msi.c, making it easier for users to filter config space writes
> using msi_cap and msi_cap_size.

I think for this last use-case, we are better off with returning a
boolean from msi_write_config which tells us whether the write is in
range. This has the advantage in that it will also work well for other
capabilities. Or second best, if that is insufficient for some reason,
export an msi_cap_size function.
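
The first suggestion above could look roughly like the sketch below. It is a simplified standalone model, not the actual QEMU code: `cfg_dev`, `cfg_overlap` and `cfg_msi_write_config` are hypothetical names, and the real msi_write_config returns void and performs side effects that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative device: only the fields this sketch needs. */
struct cfg_dev {
    uint8_t msi_cap;       /* offset of the MSI capability */
    uint8_t msi_cap_size;  /* length of the MSI capability */
};

/* True if [a, a + alen) and [b, b + blen) share at least one byte. */
static bool cfg_overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
    return a < b + blen && b < a + alen;
}

/*
 * Hypothetical write hook that reports whether the write touched the
 * MSI capability, so the caller does not need to know its size.
 */
static bool cfg_msi_write_config(struct cfg_dev *dev, uint32_t addr,
                                 uint32_t val, int len)
{
    assert(dev != NULL && len > 0);
    (void)val;  /* the side effects of the written value are omitted */

    if (!cfg_overlap(addr, (uint32_t)len, dev->msi_cap, dev->msi_cap_size)) {
        return false;  /* write is outside the capability */
    }
    /* ... capability-specific handling would go here ... */
    return true;
}
```

A caller could then branch on the return value instead of comparing the address against `msi_cap` and `msi_cap_size` itself, and the same pattern would carry over to other capabilities.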

> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
>  hw/msi.c |    9 ++++-----
>  hw/pci.h |    3 ++-
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/msi.c b/hw/msi.c
> index 110859b..12e125f 100644
> --- a/hw/msi.c
> +++ b/hw/msi.c
> @@ -148,6 +148,7 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>      }
>  
>      dev->msi_cap = config_offset;
> +    dev->msi_cap_size = cap_size;
>      dev->cap_present |= QEMU_PCI_CAP_MSI;
>  
>      pci_set_word(dev->config + msi_flags_off(dev), flags);
> @@ -170,14 +171,12 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>  
>  void msi_uninit(struct PCIDevice *dev)
>  {
> -    uint8_t cap_size;
> -
>      if (!(dev->cap_present & QEMU_PCI_CAP_MSI))
>          return;
>  
> -    cap_size = msi_cap_sizeof(pci_get_word(dev->config + msi_flags_off(dev)));
> -    pci_del_capability(dev, PCI_CAP_ID_MSIX, cap_size);
> +    pci_del_capability(dev, PCI_CAP_ID_MSIX, dev->msi_cap_size);
>      dev->msi_cap = 0;
> +    dev->msi_cap_size = 0;
>      dev->cap_present &= ~QEMU_PCI_CAP_MSI;
>      MSI_DEV_PRINTF(dev, "uninit\n");
>  }
> @@ -269,7 +268,7 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
>      uint32_t pending;
>      int i;
>  
> -    if (!ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
> +    if (!ranges_overlap(addr, len, dev->msi_cap, dev->msi_cap_size)) {
>          return;
>      }
>  
> diff --git a/hw/pci.h b/hw/pci.h
> index a558803..d268806 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -176,8 +176,9 @@ struct PCIDevice {
>      /* Version id needed for VMState */
>      int32_t version_id;
>  
> -    /* Offset of MSI capability in config space */
> +    /* Offset & size of MSI capability in config space */
>      uint8_t msi_cap;
> +    uint8_t msi_cap_size;
>  
>      /* PCI Express */
>      PCIExpressDevice exp;
Alex Williamson - Nov. 2, 2010, 2 p.m.
On Tue, 2010-11-02 at 11:25 +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 01, 2010 at 11:37:53PM -0600, Alex Williamson wrote:
> > Avoid needing to get the MSI capability flags every time we need to
> > check the capability length.  This also makes it accessible outside
> > of msi.c, making it easier for users to filter config space writes
> > using msi_cap and msi_cap_size.
> 
> I think for this last use-case, we are better off with returning a
> boolean from msi_write_config which tells us whether the write is in
> range. This has the advantage in that it will also work well for other
> capabilities. Or second best, if that is insufficient for some reason,
> export an msi_cap_size function.

Returning whether the write was in range isn't enough.  For device
assignment, I need to know whether the capability was enabled or
disabled.  This currently means checking the enable state before and
after calling msi_write_config and doing the appropriate backend setup.
I think the only way I could blindly call the msi/x write config
routines is if we init the capability with enable/disable callbacks.
I'd be ok with an msi_cap_size function if we don't want to go that far
too.  What do you prefer?  Thanks,

Alex
Michael S. Tsirkin - Nov. 2, 2010, 2:07 p.m.
On Tue, Nov 02, 2010 at 08:00:38AM -0600, Alex Williamson wrote:
> On Tue, 2010-11-02 at 11:25 +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 01, 2010 at 11:37:53PM -0600, Alex Williamson wrote:
> > > Avoid needing to get the MSI capability flags every time we need to
> > > check the capability length.  This also makes it accessible outside
> > > of msi.c, making it easier for users to filter config space writes
> > > using msi_cap and msi_cap_size.
> > 
> > I think for this last use-case, we are better off with returning a
> > boolean from msi_write_config which tells us whether the write is in
> > range. This has the advantage in that it will also work well for other
> > capabilities. Or second best, if that is insufficient for some reason,
> > export an msi_cap_size function.
> 
> Returning whether the write was in range isn't enough.  For device
> assignment, I need to know whether the capability was enabled or
> disabled.  This currently means checking the enable state before and
> after calling msi_write_config and doing the appropriate backend setup.

Sounds good. Why does this mean you need the capability size?
	bool was_enabled = msi_enabled(dev);
	msi_write_config(..)
	if (was_enabled != msi_enabled(dev)) {
	}

> I think the only way I could blindly call the msi/x write config
> routines is if we init the capability with enable/disable callbacks.
> I'd be ok with an msi_cap_size function if we don't want to go that far
> too.  What do you prefer?  Thanks,
> 
> Alex
Alex Williamson - Nov. 2, 2010, 2:23 p.m.
On Tue, 2010-11-02 at 16:07 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2010 at 08:00:38AM -0600, Alex Williamson wrote:
> > On Tue, 2010-11-02 at 11:25 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 01, 2010 at 11:37:53PM -0600, Alex Williamson wrote:
> > > > Avoid needing to get the MSI capability flags every time we need to
> > > > check the capability length.  This also makes it accessible outside
> > > > of msi.c, making it easier for users to filter config space writes
> > > > using msi_cap and msi_cap_size.
> > > 
> > > I think for this last use-case, we are better off with returning a
> > > boolean from msi_write_config which tells us whether the write is in
> > > range. This has the advantage in that it will also work well for other
> > > capabilities. Or second best, if that is insufficient for some reason,
> > > export an msi_cap_size function.
> > 
> > Returning whether the write was in range isn't enough.  For device
> > assignment, I need to know whether the capability was enabled or
> > disabled.  This currently means checking the enable state before and
> > after calling msi_write_config and doing the appropriate backend setup.
> 
> Sounds good. Why does this mean you need the capability size?
> 	bool was_enabled = msi_enabled(dev);
> 	msi_write_config(..)
> 	if (was_enabled != msi_enabled(dev)) {
> 	}

Because this code makes me sad...

   bool msi_was_enabled, msix_was_enabled, msi_is_enabled, msix_is_enabled;

   msi_was_enabled = msi_enabled(dev);
   msix_was_enabled = msix_enabled(dev);

   pci_default_write_config(...
   msi_write_config(...
   msix_write_config(...

   msi_is_enabled = msi_enabled(dev);
   msix_is_enabled = msix_enabled(dev);

   if (msi_was_enabled && !msi_is_enabled)
       disable_msi(...
   if (!msi_was_enabled && msi_is_enabled)
       enable_msi(...
   if (msix_was_enabled && !msix_is_enabled)
       disable_msix(...
   if (!msix_was_enabled && msix_is_enabled)
       enable_msix(...

Confining msi tests to an msi related write and msix tests to an msix
related write makes me slightly happier.  I really think we need
callbacks though so common msi/msix code can figure out if we've made a
transition.
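
The callback idea can be sketched as follows. This is a standalone model under stated assumptions: `cb_dev`, `cb_msi_init`, `cb_msi_write_flags` and the two hook fields are hypothetical names, a single 16-bit field stands in for the Message Control word, and nothing here is an existing QEMU interface. The point is only that the common code compares the enable bit before and after the write and notifies the device on a transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSI_FLAGS_ENABLE 0x0001  /* MSI Enable bit in Message Control */

struct cb_dev;
typedef void (*cb_notify)(struct cb_dev *dev);

struct cb_dev {
    uint16_t msi_flags;     /* stands in for the Message Control word */
    cb_notify on_enable;    /* hypothetical hooks registered at init */
    cb_notify on_disable;
    int enables, disables;  /* counters so the sketch is observable */
};

static bool cb_msi_enabled(const struct cb_dev *dev)
{
    return (dev->msi_flags & MSI_FLAGS_ENABLE) != 0;
}

/* Register the hooks once, when the capability is set up. */
static void cb_msi_init(struct cb_dev *dev, cb_notify on_enable,
                        cb_notify on_disable)
{
    assert(dev != NULL);
    dev->msi_flags = 0;
    dev->on_enable = on_enable;
    dev->on_disable = on_disable;
    dev->enables = dev->disables = 0;
}

/* Common code: apply the write, then report an enable-bit transition. */
static void cb_msi_write_flags(struct cb_dev *dev, uint16_t new_flags)
{
    bool was_enabled = cb_msi_enabled(dev);
    dev->msi_flags = new_flags;
    bool is_enabled = cb_msi_enabled(dev);

    if (!was_enabled && is_enabled && dev->on_enable) {
        dev->on_enable(dev);
    } else if (was_enabled && !is_enabled && dev->on_disable) {
        dev->on_disable(dev);
    }
}

/* Example backend hooks: a real backend would set up or tear down IRQs. */
static void count_enable(struct cb_dev *dev)  { dev->enables++; }
static void count_disable(struct cb_dev *dev) { dev->disables++; }
```

A device would call `cb_msi_init(&dev, count_enable, count_disable)` once and then route Message Control writes through `cb_msi_write_flags`; writes that leave the enable bit unchanged trigger neither hook.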

Alex

> > I think the only way I could blindly call the msi/x write config
> > routines is if we init the capability with enable/disable callbacks.
> > I'd be ok with an msi_cap_size function if we don't want to go that far
> > too.  What do you prefer?  Thanks,
Michael S. Tsirkin - Nov. 2, 2010, 3:39 p.m.
On Tue, Nov 02, 2010 at 08:23:10AM -0600, Alex Williamson wrote:
> On Tue, 2010-11-02 at 16:07 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 02, 2010 at 08:00:38AM -0600, Alex Williamson wrote:
> > > On Tue, 2010-11-02 at 11:25 +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 01, 2010 at 11:37:53PM -0600, Alex Williamson wrote:
> > > > > Avoid needing to get the MSI capability flags every time we need to
> > > > > check the capability length.  This also makes it accessible outside
> > > > > of msi.c, making it easier for users to filter config space writes
> > > > > using msi_cap and msi_cap_size.
> > > > 
> > > > I think for this last use-case, we are better off with returning a
> > > > boolean from msi_write_config which tells us whether the write is in
> > > > range. This has the advantage in that it will also work well for other
> > > > capabilities. Or second best, if that is insufficient for some reason,
> > > > export an msi_cap_size function.
> > > 
> > > Returning whether the write was in range isn't enough.  For device
> > > assignment, I need to know whether the capability was enabled or
> > > disabled.  This currently means checking the enable state before and
> > > after calling msi_write_config and doing the appropriate backend setup.
> > 
> > Sounds good. Why does this mean you need the capability size?
> > 	bool was_enabled = msi_enabled(dev);
> > 	msi_write_config(..)
> > 	if (was_enabled != msi_enabled(dev)) {
> > 	}
> 
> Because this code makes me sad...
> 
>    bool msi_was_enabled, msix_was_enabled, msi_is_enabled, msix_is_enabled;
> 
>    msi_was_enabled = msi_enabled(dev);
>    msix_was_enabled = msix_enabled(dev);
> 
>    pci_default_write_config(...
>    msi_write_config(...
>    msix_write_config(...
> 
>    msi_is_enabled = msi_enabled(dev);
>    msix_is_enabled = msix_enabled(dev);
> 
>    if (msi_was_enabled && !msi_is_enabled)
>        disable_msi(...
>    if (!msi_was_enabled && msi_is_enabled)
>        enable_msi(...
>    if (msix_was_enabled && !msix_is_enabled)
>        disable_msix(...
>    if (!msix_was_enabled && msix_is_enabled)
>        enable_msix(...
> 
> Confining msi tests to an msi related write and msix tests to an msix
> related write makes me slightly happier.  I really think we need
> callbacks though so common msi/msix code can figure out if we've made a
> transition.
> 
> Alex

This is what we have in qemu-kvm for vhost now, and the code turned out
to be terribly hard to get right.  I would rather not repeat that,
and I would love to rip out the callbacks we have now, too.
One approach would be to simply fold the handling of irqfds
into msix.c.

Having said all that, I really don't understand why VFIO
forces you to figure out that e.g. msix was enabled/disabled.
Can we not get the config write and simply call write() on VFIO?
That is an interface that makes sense to me.


> > > I think the only way I could blindly call the msi/x write config
> > > routines is if we init the capability with enable/disable callbacks.
> > > I'd be ok with an msi_cap_size function if we don't want to go that far
> > > too.  What do you prefer?  Thanks,
> 
>
Alex Williamson - Nov. 2, 2010, 4:08 p.m.
On Tue, 2010-11-02 at 17:39 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2010 at 08:23:10AM -0600, Alex Williamson wrote:
> > On Tue, 2010-11-02 at 16:07 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Nov 02, 2010 at 08:00:38AM -0600, Alex Williamson wrote:
> > > > On Tue, 2010-11-02 at 11:25 +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 01, 2010 at 11:37:53PM -0600, Alex Williamson wrote:
> > > > > > Avoid needing to get the MSI capability flags every time we need to
> > > > > > check the capability length.  This also makes it accessible outside
> > > > > > of msi.c, making it easier for users to filter config space writes
> > > > > > using msi_cap and msi_cap_size.
> > > > > 
> > > > > I think for this last use-case, we are better off with returning a
> > > > > boolean from msi_write_config which tells us whether the write is in
> > > > > range. This has the advantage in that it will also work well for other
> > > > > capabilities. Or second best, if that is insufficient for some reason,
> > > > > export an msi_cap_size function.
> > > > 
> > > > Returning whether the write was in range isn't enough.  For device
> > > > assignment, I need to know whether the capability was enabled or
> > > > disabled.  This currently means checking the enable state before and
> > > > after calling msi_write_config and doing the appropriate backend setup.
> > > 
> > > Sounds good. Why does this mean you need the capability size?
> > > 	bool was_enabled = msi_enabled(dev);
> > > 	msi_write_config(..)
> > > 	if (was_enabled != msi_enabled(dev)) {
> > > 	}
> > 
> > Because this code makes me sad...
> > 
> >    bool msi_was_enabled, msix_was_enabled, msi_is_enabled, msix_is_enabled;
> > 
> >    msi_was_enabled = msi_enabled(dev);
> >    msix_was_enabled = msix_enabled(dev);
> > 
> >    pci_default_write_config(...
> >    msi_write_config(...
> >    msix_write_config(...
> > 
> >    msi_is_enabled = msi_enabled(dev);
> >    msix_is_enabled = msix_enabled(dev);
> > 
> >    if (msi_was_enabled && !msi_is_enabled)
> >        disable_msi(...
> >    if (!msi_was_enabled && msi_is_enabled)
> >        enable_msi(...
> >    if (msix_was_enabled && !msix_is_enabled)
> >        disable_msix(...
> >    if (!msix_was_enabled && msix_is_enabled)
> >        enable_msix(...
> > 
> > Confining msi tests to an msi related write and msix tests to an msix
> > related write makes me slightly happier.  I really think we need
> > callbacks though so common msi/msix code can figure out if we've made a
> > transition.
> > 
> > Alex
> 
> This is what we have in qemu-kvm for vhost now, and the code turned out
> to be terribly hard to get right.  I would rather not repeat that,
> and I would love to rip out the callbacks we have now, too.
> One approach would be to simply fold the handling of irqfds
> into msix.c.

What makes it hard to get right?  On one hand, if it is hard to get
right, that's all the more reason it should be done in common code so we
don't have to repeat mistakes.

> Having said all that, I really don't understand why VFIO
> forces you to figure out that e.g. msix was enabled/disabled.
> Can we not get the config write and simply call write() on VFIO?
> That is an interface that makes sense to me.

VFIO interrupts are configured via ioctls.  Config space writes to
msi/msix capabilities are emulated.  IMHO, this works out pretty well,
but we could easily make use of the QEMU config emulation if VFIO just
wanted to drop accesses there.  Config space could be used for some
setup, but we have to setup INTx via ioctl and we'd have to pre-register
eventfds per vector.  It's just easy and consistent to set them all up
the same way.

> > > > I think the only way I could blindly call the msi/x write config
> > > > routines is if we init the capability with enable/disable callbacks.
> > > > I'd be ok with an msi_cap_size function if we don't want to go that far
> > > > too.  What do you prefer?  Thanks,
> > 
> >
Michael S. Tsirkin - Nov. 2, 2010, 7:26 p.m.
On Tue, Nov 02, 2010 at 10:08:30AM -0600, Alex Williamson wrote:
> On Tue, 2010-11-02 at 17:39 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 02, 2010 at 08:23:10AM -0600, Alex Williamson wrote:
> > > On Tue, 2010-11-02 at 16:07 +0200, Michael S. Tsirkin wrote:
> > > > On Tue, Nov 02, 2010 at 08:00:38AM -0600, Alex Williamson wrote:
> > > > > On Tue, 2010-11-02 at 11:25 +0200, Michael S. Tsirkin wrote:
> > > > > > On Mon, Nov 01, 2010 at 11:37:53PM -0600, Alex Williamson wrote:
> > > > > > > Avoid needing to get the MSI capability flags every time we need to
> > > > > > > check the capability length.  This also makes it accessible outside
> > > > > > > of msi.c, making it easier for users to filter config space writes
> > > > > > > using msi_cap and msi_cap_size.
> > > > > > 
> > > > > > I think for this last use-case, we are better off with returning a
> > > > > > boolean from msi_write_config which tells us whether the write is in
> > > > > > range. This has the advantage in that it will also work well for other
> > > > > > capabilities. Or second best, if that is insufficient for some reason,
> > > > > > export an msi_cap_size function.
> > > > > 
> > > > > Returning whether the write was in range isn't enough.  For device
> > > > > assignment, I need to know whether the capability was enabled or
> > > > > disabled.  This currently means checking the enable state before and
> > > > > after calling msi_write_config and doing the appropriate backend setup.
> > > > 
> > > > Sounds good. Why does this mean you need the capability size?
> > > > 	bool was_enabled = msi_enabled(dev);
> > > > 	msi_write_config(..)
> > > > 	if (was_enabled != msi_enabled(dev)) {
> > > > 	}
> > > 
> > > Because this code makes me sad...
> > > 
> > >    bool msi_was_enabled, msix_was_enabled, msi_is_enabled, msix_is_enabled;
> > > 
> > >    msi_was_enabled = msi_enabled(dev);
> > >    msix_was_enabled = msix_enabled(dev);
> > > 
> > >    pci_default_write_config(...
> > >    msi_write_config(...
> > >    msix_write_config(...
> > > 
> > >    msi_is_enabled = msi_enabled(dev);
> > >    msix_is_enabled = msix_enabled(dev);
> > > 
> > >    if (msi_was_enabled && !msi_is_enabled)
> > >        disable_msi(...
> > >    if (!msi_was_enabled && msi_is_enabled)
> > >        enable_msi(...
> > >    if (msix_was_enabled && !msix_is_enabled)
> > >        disable_msix(...
> > >    if (!msix_was_enabled && msix_is_enabled)
> > >        enable_msix(...
> > > 
> > > Confining msi tests to an msi related write and msix tests to an msix
> > > related write makes me slightly happier.  I really think we need
> > > callbacks though so common msi/msix code can figure out if we've made a
> > > transition.
> > > 
> > > Alex
> > 
> > This is what we have in qemu-kvm for vhost now, and the code turned out
> > to be terribly hard to get right.  I would rather not repeat that,
> > and I would love to rip out the callbacks we have now, too.
> > One approach would be to simply fold the handling of irqfds
> > into msix.c.
> 
> What makes it hard to get right?  On one hand, if it is hard to get
> right, that's all the more reason it should be done in common code so we
> don't have to repeat mistakes.

Callbacks are hard to use right.

> > Having said all that, I really don't understand why VFIO
> > forces you to figure out that e.g. msix was enabled/disabled.
> > Can we not get the config write and simply call write() on VFIO?
> > That is an interface that makes sense to me.
> 
> VFIO interrupts are configured via ioctls.  Config space writes to
> msi/msix capabilities are emulated.  IMHO, this works out pretty well,
> but we could easily make use of the QEMU config emulation if VFIO just
> wanted to drop accesses there.  Config space could be used for some
> setup, but we have to setup INTx via ioctl and we'd have to pre-register
> eventfds per vector.  It's just easy and consistent to set them all up
> the same way.

Yea. So I think we should just do whatever is needed on startup: create
eventfds etc. And then during operation, qemu should
simply get called on config/memory/io writes and pass these
on to VFIO. If memory write masks a vector, or switches
from INTx to MSI or whatever, VFIO should be able to
figure it out.
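
The "create eventfds on startup" step could be sketched as below. This only shows allocating one Linux eventfd per interrupt vector up front and cleaning up on failure. How the descriptors would be handed to VFIO is not shown, since that interface is not spelled out in this thread. `vector_fds`, `vector_fds_open`, `vector_fds_close` and `MAX_VECTORS` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_VECTORS 32  /* arbitrary bound for this sketch */

struct vector_fds {
    int fd[MAX_VECTORS];  /* one eventfd per interrupt vector */
    int count;            /* number of valid entries in fd[] */
};

/* Close every eventfd that was opened so far. */
static void vector_fds_close(struct vector_fds *v)
{
    for (int i = 0; i < v->count; i++) {
        close(v->fd[i]);
    }
    v->count = 0;
}

/* Create one eventfd per vector; returns 0 on success, -1 on failure. */
static int vector_fds_open(struct vector_fds *v, int nvectors)
{
    assert(v != NULL && nvectors > 0 && nvectors <= MAX_VECTORS);
    v->count = 0;

    for (int i = 0; i < nvectors; i++) {
        int fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
        if (fd < 0) {
            vector_fds_close(v);  /* undo the partial setup */
            return -1;
        }
        v->fd[v->count++] = fd;
    }
    return 0;
}
```

Doing this once at startup means later config, memory or I/O writes never need to allocate anything; they can simply be forwarded.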

BAR config register writes might be the only exception we might
need in qemu until we have explicit hooks into pci.c.

> > > > > I think the only way I could blindly call the msi/x write config
> > > > > routines is if we init the capability with enable/disable callbacks.
> > > > > I'd be ok with an msi_cap_size function if we don't want to go that far
> > > > > too.  What do you prefer?  Thanks,
> > > 
> > > 
> 
>

Patch

diff --git a/hw/msi.c b/hw/msi.c
index 110859b..12e125f 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -148,6 +148,7 @@  int msi_init(struct PCIDevice *dev, uint8_t offset,
     }
 
     dev->msi_cap = config_offset;
+    dev->msi_cap_size = cap_size;
     dev->cap_present |= QEMU_PCI_CAP_MSI;
 
     pci_set_word(dev->config + msi_flags_off(dev), flags);
@@ -170,14 +171,12 @@  int msi_init(struct PCIDevice *dev, uint8_t offset,
 
 void msi_uninit(struct PCIDevice *dev)
 {
-    uint8_t cap_size;
-
     if (!(dev->cap_present & QEMU_PCI_CAP_MSI))
         return;
 
-    cap_size = msi_cap_sizeof(pci_get_word(dev->config + msi_flags_off(dev)));
-    pci_del_capability(dev, PCI_CAP_ID_MSIX, cap_size);
+    pci_del_capability(dev, PCI_CAP_ID_MSIX, dev->msi_cap_size);
     dev->msi_cap = 0;
+    dev->msi_cap_size = 0;
     dev->cap_present &= ~QEMU_PCI_CAP_MSI;
     MSI_DEV_PRINTF(dev, "uninit\n");
 }
@@ -269,7 +268,7 @@  void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
     uint32_t pending;
     int i;
 
-    if (!ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
+    if (!ranges_overlap(addr, len, dev->msi_cap, dev->msi_cap_size)) {
         return;
     }
 
diff --git a/hw/pci.h b/hw/pci.h
index a558803..d268806 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -176,8 +176,9 @@  struct PCIDevice {
     /* Version id needed for VMState */
     int32_t version_id;
 
-    /* Offset of MSI capability in config space */
+    /* Offset & size of MSI capability in config space */
     uint8_t msi_cap;
+    uint8_t msi_cap_size;
 
     /* PCI Express */
     PCIExpressDevice exp;