Message ID: 20170113211311.24057-1-dan.streetman@canonical.com
State: New
On 13.01.2017 22:13, Dan Streetman wrote:
> BugLink: http://bugs.launchpad.net/bugs/1656381
>
> Revert the main part of commit:
> af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>
> That commit introduced reading the pci device's msi message data to see
> if a pirq was previously configured for the device's msi/msix, and re-use
> that pirq.  At the time, that was the correct behavior.  However, a
> later change to Qemu caused it to call into the Xen hypervisor to unmap
> all pirqs for a pci device, when the pci device disables its MSI/MSIX
> vectors; specifically the Qemu commit:
> c976437c7dba9c7444fb41df45468968aaa326ad
> ("qemu-xen: free all the pirqs for msi/msix when driver unload")
>
> Once Qemu added this pirq unmapping, it was no longer correct for the
> kernel to re-use the pirq number cached in the pci device msi message
> data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
> pirqs when the pci device disables its MSI/MSIX vectors.

In the Trusty release we have qemu-2.0.0.  I wonder whether adding the
change to the kernel (hm, actually the same would apply to Xenial via the
HWE path) would cause regressions when not using the cloud-archive...

But then, changing the 2.0.0 qemu to add this unmapping might cause other
unexpected problems...

-Stefan

> This bug is causing failures to initialize multiple NVMe controllers
> under Xen, because the NVMe driver sets up a single MSIX vector for
> each controller (concurrently), and then after using that to talk to
> the controller for some configuration data, it disables the single MSIX
> vector and re-configures all the MSIX vectors it needs.  So the MSIX
> setup code tries to re-use the cached pirq from the first vector
> for each controller, but the hypervisor has already given away that
> pirq to another controller, and its initialization fails.
> This is discussed in more detail at:
> https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
>
> Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
> Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
> (backported from X/Y/Z patch for context/function name changes only)
> ---
>  arch/x86/pci/xen.c | 23 +++++++----------------
>  1 file changed, 7 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 5eee495..6969c76 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -227,23 +227,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  		return 1;
>
>  	list_for_each_entry(msidesc, &dev->msi_list, list) {
> -		__read_msi_msg(msidesc, &msg);
> -		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
> -			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
> -		    xen_irq_from_pirq(pirq) < 0) {
> -			pirq = xen_allocate_pirq_msi(dev, msidesc);
> -			if (pirq < 0) {
> -				irq = -ENODEV;
> -				goto error;
> -			}
> -			xen_msi_compose_msg(dev, pirq, &msg);
> -			__write_msi_msg(msidesc, &msg);
> -			dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
> -		} else {
> -			dev_dbg(&dev->dev,
> -				"xen: msi already bound to pirq=%d\n", pirq);
> +		pirq = xen_allocate_pirq_msi(dev, msidesc);
> +		if (pirq < 0) {
> +			irq = -ENODEV;
> +			goto error;
>  		}
> +		xen_msi_compose_msg(dev, pirq, &msg);
> +		__write_msi_msg(msidesc, &msg);
> +		dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>  		irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
> 					       (type == PCI_CAP_ID_MSIX) ?
> 					       "msi-x" : "msi",
On Mon, Jan 16, 2017 at 6:34 AM, Stefan Bader <stefan.bader@canonical.com> wrote:
> On 13.01.2017 22:13, Dan Streetman wrote:
>> Once Qemu added this pirq unmapping, it was no longer correct for the
>> kernel to re-use the pirq number cached in the pci device msi message
>> data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
>> pirqs when the pci device disables its MSI/MSIX vectors.
>
> In Trusty release we have qemu-2.0.0. I wonder whether adding the change to the
> kernel (hm, actually same would apply to xenial via hwe path) would cause
> regressions when not using the cloud-archive...

Yes, but applying this patch to the Trusty kernel is completely
unrelated to a Trusty-based Xen hypervisor.  This change is to the
guest kernel.  Meaning, if we don't apply this kernel patch to the
Trusty kernel, X/Y/Z patched kernels will still see problems if
they're running under a Trusty-based Xen hypervisor (i.e. qemu 2.0.0
or earlier), while an unpatched T kernel will see problems if it's
running under a newer (Vivid or later, or UCA Kilo or later) Xen
hypervisor.

Applying this to T as well as X/Y/Z means all of them will work right
under a qemu 2.1.0 or later Xen hypervisor, and all of them will not
work right under a qemu 2.0.0 or earlier Xen hypervisor.
> But then, changing the 2.0.0 qemu to add this unmapping might cause other
> unexpected problems...

I opened bug 1657489 to track fixing Trusty qemu.
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1657489

I included a matrix there of hypervisor version and guest kernel
version, which I'll include here.

1) qemu not patched (2.0.0 and earlier), guest kernel not patched: CORRECT behavior
   hypervisor: Trusty qemu or UCA Icehouse qemu
   guest: all without patch from bug 1656381
   failure: none

2) qemu not patched (2.0.0 and earlier), guest kernel patched: INCORRECT behavior
   hypervisor: Trusty qemu or UCA Icehouse qemu
   guest: all with patch from bug 1656381
   failure: MSI interrupts will fail to be configured for any device if
   the device disables and then re-enables its MSI.  Only the first time a
   device enables MSI will it work.  For example, unloading a driver will
   result in failure to enable MSI when the driver is reloaded.

3) qemu patched (2.1.0 and later), guest kernel not patched: INCORRECT behavior
   hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
   guest: all without patch from bug 1656381
   failure: MSI interrupts in the guest may not be correctly mapped if
   device B enables its MSI after device A has disabled its MSI; when
   device A re-enables its MSI, some of its interrupts will fail to be
   configured correctly.  NVMe shows this repeatedly with multiple NVMe
   controllers; usually only 1 NVMe controller will finish initialization
   correctly.
4) qemu patched (2.1.0 and later), guest kernel patched: CORRECT behavior
   hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
   guest: all with patch from bug 1656381
   failure: none

--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
On 18.01.2017 17:08, Dan Streetman wrote:
> On Mon, Jan 16, 2017 at 6:34 AM, Stefan Bader <stefan.bader@canonical.com> wrote:
>> In Trusty release we have qemu-2.0.0. I wonder whether adding the change to the
>> kernel (hm, actually same would apply to xenial via hwe path) would cause
>> regressions when not using the cloud-archive...
>
> Yes, but applying this patch to the Trusty kernel is completely
> unrelated to a Trusty-based Xen hypervisor.  This change is to the
> guest kernel.  Meaning, if we don't apply this kernel patch to the
> Trusty kernel, X/Y/Z patched kernels will still see problems if
> they're running under a Trusty-based Xen hypervisor (i.e. qemu 2.0.0
> or earlier), while an unpatched T kernel will see problems if it's
> running under a newer (Vivid or later, or UCA Kilo or later) Xen
> hypervisor.

You are right.  I just started to think about compat issues while looking
at the Trusty kernel and then wondering which qemu version would be around
in that release.

Put in simple words: patching any kernel will allow a guest VM to properly
use pci-passthrough on any host/hypervisor that uses qemu-2.1 onwards, and
breaks it on older systems.  That means Precise (too old to worry too much)
and Trusty-based Xen hosts (not having cloud-archive enabled).  And I do
not have real numbers on how often people run Trusty Xen + pci-passthrough.
Guess we will see.

Maybe the other unknown is which versions of qemu run on which instance-type
hosts on AWS.  Hopefully recent enough ones on those that offer pass-through...

But overall no reason not to change the Trusty kernel, too.

-Stefan
On Wed, Jan 18, 2017 at 11:42 AM, Stefan Bader <stefan.bader@canonical.com> wrote:
> You are right.  I just started to think about compat issues while looking
> at the Trusty kernel and then wondering which qemu version would be around
> in that release.
> Put in simple words: patching any kernel will allow a guest VM to properly
> use pci-passthrough on any host/hypervisor that uses qemu-2.1 onwards, and
> breaks it on older systems.  That means Precise (too old to worry too much)
> and Trusty-based Xen hosts (not having cloud-archive enabled).  And I do
> not have real numbers on how often people run Trusty Xen + pci-passthrough.
> Guess we will see.

Yep, exactly.

> Maybe the other unknown is which versions of qemu run on which instance-type
> hosts on AWS.  Hopefully recent enough ones on those that offer pass-through...

I'm checking on that now, but as AWS doesn't publicly discuss its
hypervisor implementation, I think it's unlikely we'll find out and/or
be able to discuss it.  But I have let AWS know that they will need to
patch any of their hypervisors that use qemu 2.0.0 or earlier (if they
have any with a qemu that old) - if they haven't already patched it,
which they may have.

> But overall no reason not to change the Trusty kernel, too.
>
> -Stefan
Applied to trusty master-next branch.

Cheers,
--
Luís
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 5eee495..6969c76 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -227,23 +227,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 		return 1;

 	list_for_each_entry(msidesc, &dev->msi_list, list) {
-		__read_msi_msg(msidesc, &msg);
-		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
-			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
-		if (msg.data != XEN_PIRQ_MSI_DATA ||
-		    xen_irq_from_pirq(pirq) < 0) {
-			pirq = xen_allocate_pirq_msi(dev, msidesc);
-			if (pirq < 0) {
-				irq = -ENODEV;
-				goto error;
-			}
-			xen_msi_compose_msg(dev, pirq, &msg);
-			__write_msi_msg(msidesc, &msg);
-			dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
-		} else {
-			dev_dbg(&dev->dev,
-				"xen: msi already bound to pirq=%d\n", pirq);
+		pirq = xen_allocate_pirq_msi(dev, msidesc);
+		if (pirq < 0) {
+			irq = -ENODEV;
+			goto error;
 		}
+		xen_msi_compose_msg(dev, pirq, &msg);
+		__write_msi_msg(msidesc, &msg);
+		dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
 		irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
 					       (type == PCI_CAP_ID_MSIX) ?
 					       "msi-x" : "msi",
BugLink: http://bugs.launchpad.net/bugs/1656381

Revert the main part of commit:
af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")

That commit introduced reading the pci device's msi message data to see
if a pirq was previously configured for the device's msi/msix, and re-use
that pirq.  At the time, that was the correct behavior.  However, a
later change to Qemu caused it to call into the Xen hypervisor to unmap
all pirqs for a pci device, when the pci device disables its MSI/MSIX
vectors; specifically the Qemu commit:
c976437c7dba9c7444fb41df45468968aaa326ad
("qemu-xen: free all the pirqs for msi/msix when driver unload")

Once Qemu added this pirq unmapping, it was no longer correct for the
kernel to re-use the pirq number cached in the pci device msi message
data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
pirqs when the pci device disables its MSI/MSIX vectors.

This bug is causing failures to initialize multiple NVMe controllers
under Xen, because the NVMe driver sets up a single MSIX vector for
each controller (concurrently), and then after using that to talk to
the controller for some configuration data, it disables the single MSIX
vector and re-configures all the MSIX vectors it needs.  So the MSIX
setup code tries to re-use the cached pirq from the first vector
for each controller, but the hypervisor has already given away that
pirq to another controller, and its initialization fails.

This is discussed in more detail at:
https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html

Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
(backported from X/Y/Z patch for context/function name changes only)

---
 arch/x86/pci/xen.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)