Message ID | 1479802130-22640-1-git-send-email-peterx@redhat.com |
---|---|
State | New |
Headers | show |
On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote: > We are very strict in the past getting MSIs from commit > d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming > that MSI should be configured before hand when fetching. When we have > unrecognized configurations, we panic the system. However looks like > this is too strict to be working on some platform, and issues occured. > Firstly it's found on a ppc case and fixed by David in: > > 6d17a01 vfio/pci: Fix regression in MSI routing configuration > > However we encountered another case now with windows virtio driver and > reported (and possibly more): > > http://bugs.debian.org/844361 > > To make every driver/hardware happy, let's loosen the rule and go back > to the original behavior - instead of panic the system, when we try to > fetch MSI without configured MSI/MSI-X system, we just provide an empty > message to make drivers happy. > > Reported-by: Maciej Kotliński <makotlinski@gmail.com> > Signed-off-by: Peter Xu <peterx@redhat.com> Sorry I should mark this as "for-2.8". Also cc stable since this bug exists since 2.7.0. Michael, do you think it can be a material for 2.8 rc2? Thanks, -- peterx
On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote: > We are very strict in the past getting MSIs from commit > d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming > that MSI should be configured before hand when fetching. When we have > unrecognized configurations, we panic the system. However looks like > this is too strict to be working on some platform, and issues occured. > Firstly it's found on a ppc case and fixed by David in: > > 6d17a01 vfio/pci: Fix regression in MSI routing configuration > > However we encountered another case now with windows virtio driver and > reported (and possibly more): > > http://bugs.debian.org/844361 > > To make every driver/hardware happy, let's loosen the rule and go back > to the original behavior - instead of panic the system, when we try to > fetch MSI without configured MSI/MSI-X system, we just provide an empty > message to make drivers happy. > > Reported-by: Maciej Kotliński <makotlinski@gmail.com> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > hw/pci/pci.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > index 24fae16..0cd2bb0 100644 > --- a/hw/pci/pci.c > +++ b/hw/pci/pci.c > @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector) > } else if (msi_enabled(dev)) { > msg = msi_get_message(dev, vector); > } else { > - /* Should never happen */ > - error_report("%s: unknown interrupt type", __func__); > - abort(); > + /* > + * Device is not configured with MSI/MSI-X yet, let's provide > + * an empty message to make all device drivers happy. > + */ > + memset(&msg, 0, sizeof(msg)); > } > return msg; > } Looks like a hack to me. Even if callers happen to treat 0 message as a nop, there's no guarantee that's true for all platforms. Pls change pci_get_msi_message to return an error code, and fix callers to check that. > -- > 2.7.4
On Fri, Nov 25, 2016 at 06:11:01AM +0200, Michael S. Tsirkin wrote: > On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote: > > We are very strict in the past getting MSIs from commit > > d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming > > that MSI should be configured before hand when fetching. When we have > > unrecognized configurations, we panic the system. However looks like > > this is too strict to be working on some platform, and issues occured. > > Firstly it's found on a ppc case and fixed by David in: > > > > 6d17a01 vfio/pci: Fix regression in MSI routing configuration > > > > However we encountered another case now with windows virtio driver and > > reported (and possibly more): > > > > http://bugs.debian.org/844361 > > > > To make every driver/hardware happy, let's loosen the rule and go back > > to the original behavior - instead of panic the system, when we try to > > fetch MSI without configured MSI/MSI-X system, we just provide an empty > > message to make drivers happy. > > > > Reported-by: Maciej Kotliński <makotlinski@gmail.com> > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > hw/pci/pci.c | 8 +++++--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > > index 24fae16..0cd2bb0 100644 > > --- a/hw/pci/pci.c > > +++ b/hw/pci/pci.c > > @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector) > > } else if (msi_enabled(dev)) { > > msg = msi_get_message(dev, vector); > > } else { > > - /* Should never happen */ > > - error_report("%s: unknown interrupt type", __func__); > > - abort(); > > + /* > > + * Device is not configured with MSI/MSI-X yet, let's provide > > + * an empty message to make all device drivers happy. > > + */ > > + memset(&msg, 0, sizeof(msg)); > > } > > return msg; > > } > > Looks like a hack to me. Even if callers happen to treat > 0 message as a nop, there's no guarantee that's true for > all platforms. Pls change pci_get_msi_message to > return an error code, and fix callers to check that. Actually I'd say I still cannot understand why some guests will encounter this issue - the driver just tries to fetch and enable MSI/MSI-X messages without fully activate MSI/MSI-X in general (that's why msi_enabled() and msix_enabled() both returned false). But indeed some drivers won't work properly after commit d1f6af6a. David fixed some case for ppc and vfio (6d17a01), but problem still exist, like Debian's report (http://bugs.debian.org/844361). To avoid going into every details of guest drivers, I was thinking maybe we can just "recover" the behavior to the past: old QEMU (before d1f6af6a) allows to fetch an meaningless message (e.g., when msi_get_message() is called, it's possible that MSI is still not setup, but IMHO that message is meaningless). So here we emulate that case, and return a meaningless message. From the Debian bug report, we can see that this does solve the issue (yeah I know this is not even an excuse to agree on this patch, but again I just think this is currently the best way to solve the problem...). If we change this into return error, then kvm_irqchip_add_msi_route() might fail, and that's a different behavior again, I don't know whether it'll bring more troubles (only if I can test every guest driver with that code, but that's merely impossible). So IMHO the best way to solve the problem is use the current patch and try to keep the old behavior. I would really appreciate if you (or anyone) has better suggestion. Thanks, -- peterx
On Fri, Nov 25, 2016 at 01:28:36PM +0800, Peter Xu wrote: > On Fri, Nov 25, 2016 at 06:11:01AM +0200, Michael S. Tsirkin wrote: > > On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote: > > > We are very strict in the past getting MSIs from commit > > > d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming > > > that MSI should be configured before hand when fetching. When we have > > > unrecognized configurations, we panic the system. However looks like > > > this is too strict to be working on some platform, and issues occured. > > > Firstly it's found on a ppc case and fixed by David in: > > > > > > 6d17a01 vfio/pci: Fix regression in MSI routing configuration > > > > > > However we encountered another case now with windows virtio driver and > > > reported (and possibly more): > > > > > > http://bugs.debian.org/844361 > > > > > > To make every driver/hardware happy, let's loosen the rule and go back > > > to the original behavior - instead of panic the system, when we try to > > > fetch MSI without configured MSI/MSI-X system, we just provide an empty > > > message to make drivers happy. > > > > > > Reported-by: Maciej Kotliński <makotlinski@gmail.com> > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > --- > > > hw/pci/pci.c | 8 +++++--- > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > > > index 24fae16..0cd2bb0 100644 > > > --- a/hw/pci/pci.c > > > +++ b/hw/pci/pci.c > > > @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector) > > > } else if (msi_enabled(dev)) { > > > msg = msi_get_message(dev, vector); > > > } else { > > > - /* Should never happen */ > > > - error_report("%s: unknown interrupt type", __func__); > > > - abort(); > > > + /* > > > + * Device is not configured with MSI/MSI-X yet, let's provide > > > + * an empty message to make all device drivers happy. > > > + */ > > > + memset(&msg, 0, sizeof(msg)); > > > } > > > return msg; > > > } > > > > Looks like a hack to me. Even if callers happen to treat > > 0 message as a nop, there's no guarantee that's true for > > all platforms. Pls change pci_get_msi_message to > > return an error code, and fix callers to check that. > > Actually I'd say I still cannot understand why some guests will > encounter this issue - the driver just tries to fetch and enable > MSI/MSI-X messages without fully activate MSI/MSI-X in general (that's > why msi_enabled() and msix_enabled() both returned false). > > But indeed some drivers won't work properly after commit d1f6af6a. > David fixed some case for ppc and vfio (6d17a01), but problem still > exist, like Debian's report (http://bugs.debian.org/844361). > > To avoid going into every details of guest drivers, I was thinking > maybe we can just "recover" the behavior to the past: old QEMU (before > d1f6af6a) allows to fetch an meaningless message (e.g., when > msi_get_message() is called, it's possible that MSI is still not > setup, but IMHO that message is meaningless). So here we emulate that > case, and return a meaningless message. From the Debian bug report, we > can see that this does solve the issue (yeah I know this is not even > an excuse to agree on this patch, but again I just think this is > currently the best way to solve the problem...). > > If we change this into return error, then kvm_irqchip_add_msi_route() > might fail, and that's a different behavior again, I don't know > whether it'll bring more troubles (only if I can test every guest > driver with that code, but that's merely impossible). So IMHO the best > way to solve the problem is use the current patch and try to keep the > old behavior. > > I would really appreciate if you (or anyone) has better suggestion. > > Thanks, > > -- peterx You need to figure out the call stack. What does driver do? I'm fine with working around it for now but 1. maybe we should fix the driver anyway 2. we might want to emit an error message
On Fri, Nov 25, 2016 at 08:47:26AM +0200, Michael S. Tsirkin wrote: > On Fri, Nov 25, 2016 at 01:28:36PM +0800, Peter Xu wrote: > > On Fri, Nov 25, 2016 at 06:11:01AM +0200, Michael S. Tsirkin wrote: > > > On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote: > > > > We are very strict in the past getting MSIs from commit > > > > d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming > > > > that MSI should be configured before hand when fetching. When we have > > > > unrecognized configurations, we panic the system. However looks like > > > > this is too strict to be working on some platform, and issues occured. > > > > Firstly it's found on a ppc case and fixed by David in: > > > > > > > > 6d17a01 vfio/pci: Fix regression in MSI routing configuration > > > > > > > > However we encountered another case now with windows virtio driver and > > > > reported (and possibly more): > > > > > > > > http://bugs.debian.org/844361 > > > > > > > > To make every driver/hardware happy, let's loosen the rule and go back > > > > to the original behavior - instead of panic the system, when we try to > > > > fetch MSI without configured MSI/MSI-X system, we just provide an empty > > > > message to make drivers happy. > > > > > > > > Reported-by: Maciej Kotliński <makotlinski@gmail.com> > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > --- > > > > hw/pci/pci.c | 8 +++++--- > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > > > > index 24fae16..0cd2bb0 100644 > > > > --- a/hw/pci/pci.c > > > > +++ b/hw/pci/pci.c > > > > @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector) > > > > } else if (msi_enabled(dev)) { > > > > msg = msi_get_message(dev, vector); > > > > } else { > > > > - /* Should never happen */ > > > > - error_report("%s: unknown interrupt type", __func__); > > > > - abort(); > > > > + /* > > > > + * Device is not configured with MSI/MSI-X yet, let's provide > > > > + * an empty message to make all device drivers happy. > > > > + */ > > > > + memset(&msg, 0, sizeof(msg)); > > > > } > > > > return msg; > > > > } > > > > > > Looks like a hack to me. Even if callers happen to treat > > > 0 message as a nop, there's no guarantee that's true for > > > all platforms. Pls change pci_get_msi_message to > > > return an error code, and fix callers to check that. > > > > Actually I'd say I still cannot understand why some guests will > > encounter this issue - the driver just tries to fetch and enable > > MSI/MSI-X messages without fully activate MSI/MSI-X in general (that's > > why msi_enabled() and msix_enabled() both returned false). > > > > But indeed some drivers won't work properly after commit d1f6af6a. > > David fixed some case for ppc and vfio (6d17a01), but problem still > > exist, like Debian's report (http://bugs.debian.org/844361). > > > > To avoid going into every details of guest drivers, I was thinking > > maybe we can just "recover" the behavior to the past: old QEMU (before > > d1f6af6a) allows to fetch an meaningless message (e.g., when > > msi_get_message() is called, it's possible that MSI is still not > > setup, but IMHO that message is meaningless). So here we emulate that > > case, and return a meaningless message. From the Debian bug report, we > > can see that this does solve the issue (yeah I know this is not even > > an excuse to agree on this patch, but again I just think this is > > currently the best way to solve the problem...). > > > > If we change this into return error, then kvm_irqchip_add_msi_route() > > might fail, and that's a different behavior again, I don't know > > whether it'll bring more troubles (only if I can test every guest > > driver with that code, but that's merely impossible). So IMHO the best > > way to solve the problem is use the current patch and try to keep the > > old behavior. > > > > I would really appreciate if you (or anyone) has better suggestion. > > > > Thanks, > > > > -- peterx > > You need to figure out the call stack. What does driver do? > I'm fine with working around it for now but > 1. maybe we should fix the driver anyway > 2. we might want to emit an error message > > -- > MST Limin posted another comment here: http://bugs.debian.org/844361 We can see whether that bug is finally caused by pci-assign too, and whether that pci-assign fix can solve the problem. If that works, it'll be nice, then we won't need this patch at least for now. Thanks, -- peterx
diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 24fae16..0cd2bb0 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector) } else if (msi_enabled(dev)) { msg = msi_get_message(dev, vector); } else { - /* Should never happen */ - error_report("%s: unknown interrupt type", __func__); - abort(); + /* + * Device is not configured with MSI/MSI-X yet, let's provide + * an empty message to make all device drivers happy. + */ + memset(&msg, 0, sizeof(msg)); } return msg; }
We are very strict in the past getting MSIs from commit d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming that MSI should be configured before hand when fetching. When we have unrecognized configurations, we panic the system. However looks like this is too strict to be working on some platform, and issues occured. Firstly it's found on a ppc case and fixed by David in: 6d17a01 vfio/pci: Fix regression in MSI routing configuration However we encountered another case now with windows virtio driver and reported (and possibly more): http://bugs.debian.org/844361 To make every driver/hardware happy, let's loosen the rule and go back to the original behavior - instead of panic the system, when we try to fetch MSI without configured MSI/MSI-X system, we just provide an empty message to make drivers happy. Reported-by: Maciej Kotliński <makotlinski@gmail.com> Signed-off-by: Peter Xu <peterx@redhat.com> --- hw/pci/pci.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)