diff mbox

[v2] pci: relax pci_msi_get_message()

Message ID 1479802130-22640-1-git-send-email-peterx@redhat.com
State New
Headers show

Commit Message

Peter Xu Nov. 22, 2016, 8:08 a.m. UTC
We are very strict in the past getting MSIs from commit
d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming
that MSI should be configured before hand when fetching. When we have
unrecognized configurations, we panic the system. However looks like
this is too strict to be working on some platform, and issues occured.
Firstly it's found on a ppc case and fixed by David in:

  6d17a01 vfio/pci: Fix regression in MSI routing configuration

However we encountered another case now with windows virtio driver and
reported (and possibly more):

  http://bugs.debian.org/844361

To make every driver/hardware happy, let's loosen the rule and go back
to the original behavior - instead of panic the system, when we try to
fetch MSI without configured MSI/MSI-X system, we just provide an empty
message to make drivers happy.

Reported-by: Maciej Kotliński <makotlinski@gmail.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/pci/pci.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Peter Xu Nov. 24, 2016, 5:29 a.m. UTC | #1
On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote:
> We are very strict in the past getting MSIs from commit
> d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming
> that MSI should be configured before hand when fetching. When we have
> unrecognized configurations, we panic the system. However looks like
> this is too strict to be working on some platform, and issues occured.
> Firstly it's found on a ppc case and fixed by David in:
> 
>   6d17a01 vfio/pci: Fix regression in MSI routing configuration
> 
> However we encountered another case now with windows virtio driver and
> reported (and possibly more):
> 
>   http://bugs.debian.org/844361
> 
> To make every driver/hardware happy, let's loosen the rule and go back
> to the original behavior - instead of panic the system, when we try to
> fetch MSI without configured MSI/MSI-X system, we just provide an empty
> message to make drivers happy.
> 
> Reported-by: Maciej Kotliński <makotlinski@gmail.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Sorry I should mark this as "for-2.8". Also cc stable since this bug
exists since 2.7.0.

Michael, do you think it can be a material for 2.8 rc2?

Thanks,

-- peterx
Michael S. Tsirkin Nov. 25, 2016, 4:11 a.m. UTC | #2
On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote:
> We are very strict in the past getting MSIs from commit
> d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming
> that MSI should be configured before hand when fetching. When we have
> unrecognized configurations, we panic the system. However looks like
> this is too strict to be working on some platform, and issues occured.
> Firstly it's found on a ppc case and fixed by David in:
> 
>   6d17a01 vfio/pci: Fix regression in MSI routing configuration
> 
> However we encountered another case now with windows virtio driver and
> reported (and possibly more):
> 
>   http://bugs.debian.org/844361
> 
> To make every driver/hardware happy, let's loosen the rule and go back
> to the original behavior - instead of panic the system, when we try to
> fetch MSI without configured MSI/MSI-X system, we just provide an empty
> message to make drivers happy.
> 
> Reported-by: Maciej Kotliński <makotlinski@gmail.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 24fae16..0cd2bb0 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
>      } else if (msi_enabled(dev)) {
>          msg = msi_get_message(dev, vector);
>      } else {
> -        /* Should never happen */
> -        error_report("%s: unknown interrupt type", __func__);
> -        abort();
> +        /*
> +         * Device is not configured with MSI/MSI-X yet, let's provide
> +         * an empty message to make all device drivers happy.
> +         */
> +        memset(&msg, 0, sizeof(msg));
>      }
>      return msg;
>  }

Looks like a hack to me. Even if callers happen to treat
0 message as a nop, there's no guarantee that's true for
all platforms. Pls change pci_get_msi_message to
return an error code, and fix callers to check that.

> -- 
> 2.7.4
Peter Xu Nov. 25, 2016, 5:28 a.m. UTC | #3
On Fri, Nov 25, 2016 at 06:11:01AM +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote:
> > We are very strict in the past getting MSIs from commit
> > d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming
> > that MSI should be configured before hand when fetching. When we have
> > unrecognized configurations, we panic the system. However looks like
> > this is too strict to be working on some platform, and issues occured.
> > Firstly it's found on a ppc case and fixed by David in:
> > 
> >   6d17a01 vfio/pci: Fix regression in MSI routing configuration
> > 
> > However we encountered another case now with windows virtio driver and
> > reported (and possibly more):
> > 
> >   http://bugs.debian.org/844361
> > 
> > To make every driver/hardware happy, let's loosen the rule and go back
> > to the original behavior - instead of panic the system, when we try to
> > fetch MSI without configured MSI/MSI-X system, we just provide an empty
> > message to make drivers happy.
> > 
> > Reported-by: Maciej Kotliński <makotlinski@gmail.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/pci/pci.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index 24fae16..0cd2bb0 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
> >      } else if (msi_enabled(dev)) {
> >          msg = msi_get_message(dev, vector);
> >      } else {
> > -        /* Should never happen */
> > -        error_report("%s: unknown interrupt type", __func__);
> > -        abort();
> > +        /*
> > +         * Device is not configured with MSI/MSI-X yet, let's provide
> > +         * an empty message to make all device drivers happy.
> > +         */
> > +        memset(&msg, 0, sizeof(msg));
> >      }
> >      return msg;
> >  }
> 
> Looks like a hack to me. Even if callers happen to treat
> 0 message as a nop, there's no guarantee that's true for
> all platforms. Pls change pci_get_msi_message to
> return an error code, and fix callers to check that.

Actually I'd say I still cannot understand why some guests will
encounter this issue - the driver just tries to fetch and enable
MSI/MSI-X messages without fully activate MSI/MSI-X in general (that's
why msi_enabled() and msix_enabled() both returned false).

But indeed some drivers won't work properly after commit d1f6af6a.
David fixed some case for ppc and vfio (6d17a01), but problem still
exist, like Debian's report (http://bugs.debian.org/844361).

To avoid going into every details of guest drivers, I was thinking
maybe we can just "recover" the behavior to the past: old QEMU (before
d1f6af6a) allows to fetch an meaningless message (e.g., when
msi_get_message() is called, it's possible that MSI is still not
setup, but IMHO that message is meaningless). So here we emulate that
case, and return a meaningless message. From the Debian bug report, we
can see that this does solve the issue (yeah I know this is not even
an excuse to agree on this patch, but again I just think this is
currently the best way to solve the problem...).

If we change this into return error, then kvm_irqchip_add_msi_route()
might fail, and that's a different behavior again, I don't know
whether it'll bring more troubles (only if I can test every guest
driver with that code, but that's merely impossible). So IMHO the best
way to solve the problem is use the current patch and try to keep the
old behavior.

I would really appreciate if you (or anyone) has better suggestion.

Thanks,

-- peterx
Michael S. Tsirkin Nov. 25, 2016, 6:47 a.m. UTC | #4
On Fri, Nov 25, 2016 at 01:28:36PM +0800, Peter Xu wrote:
> On Fri, Nov 25, 2016 at 06:11:01AM +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote:
> > > We are very strict in the past getting MSIs from commit
> > > d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming
> > > that MSI should be configured before hand when fetching. When we have
> > > unrecognized configurations, we panic the system. However looks like
> > > this is too strict to be working on some platform, and issues occured.
> > > Firstly it's found on a ppc case and fixed by David in:
> > > 
> > >   6d17a01 vfio/pci: Fix regression in MSI routing configuration
> > > 
> > > However we encountered another case now with windows virtio driver and
> > > reported (and possibly more):
> > > 
> > >   http://bugs.debian.org/844361
> > > 
> > > To make every driver/hardware happy, let's loosen the rule and go back
> > > to the original behavior - instead of panic the system, when we try to
> > > fetch MSI without configured MSI/MSI-X system, we just provide an empty
> > > message to make drivers happy.
> > > 
> > > Reported-by: Maciej Kotliński <makotlinski@gmail.com>
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  hw/pci/pci.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > index 24fae16..0cd2bb0 100644
> > > --- a/hw/pci/pci.c
> > > +++ b/hw/pci/pci.c
> > > @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
> > >      } else if (msi_enabled(dev)) {
> > >          msg = msi_get_message(dev, vector);
> > >      } else {
> > > -        /* Should never happen */
> > > -        error_report("%s: unknown interrupt type", __func__);
> > > -        abort();
> > > +        /*
> > > +         * Device is not configured with MSI/MSI-X yet, let's provide
> > > +         * an empty message to make all device drivers happy.
> > > +         */
> > > +        memset(&msg, 0, sizeof(msg));
> > >      }
> > >      return msg;
> > >  }
> > 
> > Looks like a hack to me. Even if callers happen to treat
> > 0 message as a nop, there's no guarantee that's true for
> > all platforms. Pls change pci_get_msi_message to
> > return an error code, and fix callers to check that.
> 
> Actually I'd say I still cannot understand why some guests will
> encounter this issue - the driver just tries to fetch and enable
> MSI/MSI-X messages without fully activate MSI/MSI-X in general (that's
> why msi_enabled() and msix_enabled() both returned false).
> 
> But indeed some drivers won't work properly after commit d1f6af6a.
> David fixed some case for ppc and vfio (6d17a01), but problem still
> exist, like Debian's report (http://bugs.debian.org/844361).
> 
> To avoid going into every details of guest drivers, I was thinking
> maybe we can just "recover" the behavior to the past: old QEMU (before
> d1f6af6a) allows to fetch an meaningless message (e.g., when
> msi_get_message() is called, it's possible that MSI is still not
> setup, but IMHO that message is meaningless). So here we emulate that
> case, and return a meaningless message. From the Debian bug report, we
> can see that this does solve the issue (yeah I know this is not even
> an excuse to agree on this patch, but again I just think this is
> currently the best way to solve the problem...).
> 
> If we change this into return error, then kvm_irqchip_add_msi_route()
> might fail, and that's a different behavior again, I don't know
> whether it'll bring more troubles (only if I can test every guest
> driver with that code, but that's merely impossible). So IMHO the best
> way to solve the problem is use the current patch and try to keep the
> old behavior.
> 
> I would really appreciate if you (or anyone) has better suggestion.
> 
> Thanks,
> 
> -- peterx

You need to figure out the call stack. What does driver do?
I'm fine with working around it for now but
1. maybe we should fix the driver anyway
2. we might want to emit an error message
Peter Xu Nov. 25, 2016, 7:05 a.m. UTC | #5
On Fri, Nov 25, 2016 at 08:47:26AM +0200, Michael S. Tsirkin wrote:
> On Fri, Nov 25, 2016 at 01:28:36PM +0800, Peter Xu wrote:
> > On Fri, Nov 25, 2016 at 06:11:01AM +0200, Michael S. Tsirkin wrote:
> > > On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote:
> > > > We are very strict in the past getting MSIs from commit
> > > > d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming
> > > > that MSI should be configured before hand when fetching. When we have
> > > > unrecognized configurations, we panic the system. However looks like
> > > > this is too strict to be working on some platform, and issues occured.
> > > > Firstly it's found on a ppc case and fixed by David in:
> > > > 
> > > >   6d17a01 vfio/pci: Fix regression in MSI routing configuration
> > > > 
> > > > However we encountered another case now with windows virtio driver and
> > > > reported (and possibly more):
> > > > 
> > > >   http://bugs.debian.org/844361
> > > > 
> > > > To make every driver/hardware happy, let's loosen the rule and go back
> > > > to the original behavior - instead of panic the system, when we try to
> > > > fetch MSI without configured MSI/MSI-X system, we just provide an empty
> > > > message to make drivers happy.
> > > > 
> > > > Reported-by: Maciej Kotliński <makotlinski@gmail.com>
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  hw/pci/pci.c | 8 +++++---
> > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > index 24fae16..0cd2bb0 100644
> > > > --- a/hw/pci/pci.c
> > > > +++ b/hw/pci/pci.c
> > > > @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
> > > >      } else if (msi_enabled(dev)) {
> > > >          msg = msi_get_message(dev, vector);
> > > >      } else {
> > > > -        /* Should never happen */
> > > > -        error_report("%s: unknown interrupt type", __func__);
> > > > -        abort();
> > > > +        /*
> > > > +         * Device is not configured with MSI/MSI-X yet, let's provide
> > > > +         * an empty message to make all device drivers happy.
> > > > +         */
> > > > +        memset(&msg, 0, sizeof(msg));
> > > >      }
> > > >      return msg;
> > > >  }
> > > 
> > > Looks like a hack to me. Even if callers happen to treat
> > > 0 message as a nop, there's no guarantee that's true for
> > > all platforms. Pls change pci_get_msi_message to
> > > return an error code, and fix callers to check that.
> > 
> > Actually I'd say I still cannot understand why some guests will
> > encounter this issue - the driver just tries to fetch and enable
> > MSI/MSI-X messages without fully activate MSI/MSI-X in general (that's
> > why msi_enabled() and msix_enabled() both returned false).
> > 
> > But indeed some drivers won't work properly after commit d1f6af6a.
> > David fixed some case for ppc and vfio (6d17a01), but problem still
> > exist, like Debian's report (http://bugs.debian.org/844361).
> > 
> > To avoid going into every details of guest drivers, I was thinking
> > maybe we can just "recover" the behavior to the past: old QEMU (before
> > d1f6af6a) allows to fetch an meaningless message (e.g., when
> > msi_get_message() is called, it's possible that MSI is still not
> > setup, but IMHO that message is meaningless). So here we emulate that
> > case, and return a meaningless message. From the Debian bug report, we
> > can see that this does solve the issue (yeah I know this is not even
> > an excuse to agree on this patch, but again I just think this is
> > currently the best way to solve the problem...).
> > 
> > If we change this into return error, then kvm_irqchip_add_msi_route()
> > might fail, and that's a different behavior again, I don't know
> > whether it'll bring more troubles (only if I can test every guest
> > driver with that code, but that's merely impossible). So IMHO the best
> > way to solve the problem is use the current patch and try to keep the
> > old behavior.
> > 
> > I would really appreciate if you (or anyone) has better suggestion.
> > 
> > Thanks,
> > 
> > -- peterx
> 
> You need to figure out the call stack. What does driver do?
> I'm fine with working around it for now but
> 1. maybe we should fix the driver anyway
> 2. we might want to emit an error message
> 
> -- 
> MST

Limin posted another comment here:

  http://bugs.debian.org/844361

We can see whether that bug is finally caused by pci-assign too, and
whether that pci-assign fix can solve the problem.

If that works, it'll be nice, then we won't need this patch at least
for now.

Thanks,

-- peterx
diff mbox

Patch

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 24fae16..0cd2bb0 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2606,9 +2606,11 @@  MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
     } else if (msi_enabled(dev)) {
         msg = msi_get_message(dev, vector);
     } else {
-        /* Should never happen */
-        error_report("%s: unknown interrupt type", __func__);
-        abort();
+        /*
+         * Device is not configured with MSI/MSI-X yet, let's provide
+         * an empty message to make all device drivers happy.
+         */
+        memset(&msg, 0, sizeof(msg));
     }
     return msg;
 }