
[Trusty] UBUNTU: SAUCE: xen: do not re-use pirq number cached in pci device msi msg data

Message ID 20170113211311.24057-1-dan.streetman@canonical.com
State New

Commit Message

Dan Streetman Jan. 13, 2017, 9:13 p.m. UTC
BugLink: http://bugs.launchpad.net/bugs/1656381

Revert the main part of commit:
af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")

That commit introduced reading the pci device's msi message data to see
if a pirq was previously configured for the device's msi/msix, and re-use
that pirq.  At the time, that was the correct behavior.  However, a
later change to Qemu caused it to call into the Xen hypervisor to unmap
all pirqs for a pci device, when the pci device disables its MSI/MSIX
vectors; specifically the Qemu commit:
c976437c7dba9c7444fb41df45468968aaa326ad
("qemu-xen: free all the pirqs for msi/msix when driver unload")

Once Qemu added this pirq unmapping, it was no longer correct for the
kernel to re-use the pirq number cached in the pci device msi message
data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
pirqs when the pci device disables its MSI/MSIX vectors.
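Conceptually, the Qemu-side change amounts to the following when the guest
disables MSI/MSIX for a passed-through device (illustrative sketch only, not
the actual Qemu code; xen_xc, domid, pirqs[] and nr_vectors are placeholders,
while xc_physdev_unmap_pirq() is the libxenctrl call used to release a pirq):

	/* Sketch: on MSI/MSIX disable, release every pirq that was mapped
	 * for this device's vectors back to the hypervisor. */
	for (i = 0; i < nr_vectors; i++) {
		if (pirqs[i] > 0) {
			xc_physdev_unmap_pirq(xen_xc, domid, pirqs[i]);
			pirqs[i] = -1;	/* mark the vector as unmapped */
		}
	}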

This bug is causing failures to initialize multiple NVMe controllers
under Xen, because the NVMe driver sets up a single MSIX vector for
each controller (concurrently), and then after using that to talk to
the controller for some configuration data, it disables the single MSIX
vector and re-configures all the MSIX vectors it needs.  So the MSIX
setup code tries to re-use the cached pirq from the first vector
for each controller, but the hypervisor has already given away that
pirq to another controller, and its initialization fails.
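To make the failure mode concrete, each controller's bring-up is roughly the
following sequence (simplified sketch, not the actual NVMe driver code; the
pci_enable_msix()/pci_disable_msix() calls, entries and nr_io_queues are
illustrative placeholders):

	struct msix_entry entry = { .entry = 0 };

	/* 1. Enable a single MSIX vector for the admin queue; the Xen MSI
	 *    setup caches the allocated pirq in the device's msi msg data. */
	pci_enable_msix(pdev, &entry, 1);

	/* 2. Query the controller over the admin queue, then tear the single
	 *    vector down.  With Qemu 2.1.0 and later the hypervisor unmaps
	 *    the pirq here and may hand it to another controller that is
	 *    initializing concurrently. */
	pci_disable_msix(pdev);

	/* 3. Re-enable with the full vector count.  Without this patch the
	 *    guest re-uses the stale cached pirq and initialization fails. */
	pci_enable_msix(pdev, entries, nr_io_queues + 1);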

This is discussed in more detail at:
https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html

Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
(backported from X/Y/Z patch for context/function name changes only)

---
 arch/x86/pci/xen.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

Comments

Tim Gardner Jan. 13, 2017, 9:21 p.m. UTC | #1

Stefan Bader Jan. 16, 2017, 11:34 a.m. UTC | #2
On 13.01.2017 22:13, Dan Streetman wrote:
> BugLink: http://bugs.launchpad.net/bugs/1656381
> 
> Revert the main part of commit:
> af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
> 
> That commit introduced reading the pci device's msi message data to see
> if a pirq was previously configured for the device's msi/msix, and re-use
> that pirq.  At the time, that was the correct behavior.  However, a
> later change to Qemu caused it to call into the Xen hypervisor to unmap
> all pirqs for a pci device, when the pci device disables its MSI/MSIX
> vectors; specifically the Qemu commit:
> c976437c7dba9c7444fb41df45468968aaa326ad
> ("qemu-xen: free all the pirqs for msi/msix when driver unload")
> 
> Once Qemu added this pirq unmapping, it was no longer correct for the
> kernel to re-use the pirq number cached in the pci device msi message
> data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
> pirqs when the pci device disables its MSI/MSIX vectors.

In Trusty release we have qemu-2.0.0. I wonder whether adding the change to the
kernel (hm, actually same would apply to xenial via hwe path) would cause
regressions when not using the cloud-archive...
But then, changing the 2.0.0 qemu to add this unmapping might cause other
unexpected problems...

-Stefan

> 
> This bug is causing failures to initialize multiple NVMe controllers
> under Xen, because the NVMe driver sets up a single MSIX vector for
> each controller (concurrently), and then after using that to talk to
> the controller for some configuration data, it disables the single MSIX
> vector and re-configures all the MSIX vectors it needs.  So the MSIX
> setup code tries to re-use the cached pirq from the first vector
> for each controller, but the hypervisor has already given away that
> pirq to another controller, and its initialization fails.
> 
> This is discussed in more detail at:
> https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
> 
> Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
> Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
> (backported from X/Y/Z patch for context/function name changes only)
> 
> ---
>  arch/x86/pci/xen.c | 23 +++++++----------------
>  1 file changed, 7 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 5eee495..6969c76 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -227,23 +227,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  		return 1;
>  
>  	list_for_each_entry(msidesc, &dev->msi_list, list) {
> -		__read_msi_msg(msidesc, &msg);
> -		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
> -			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
> -		    xen_irq_from_pirq(pirq) < 0) {
> -			pirq = xen_allocate_pirq_msi(dev, msidesc);
> -			if (pirq < 0) {
> -				irq = -ENODEV;
> -				goto error;
> -			}
> -			xen_msi_compose_msg(dev, pirq, &msg);
> -			__write_msi_msg(msidesc, &msg);
> -			dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
> -		} else {
> -			dev_dbg(&dev->dev,
> -				"xen: msi already bound to pirq=%d\n", pirq);
> +		pirq = xen_allocate_pirq_msi(dev, msidesc);
> +		if (pirq < 0) {
> +			irq = -ENODEV;
> +			goto error;
>  		}
> +		xen_msi_compose_msg(dev, pirq, &msg);
> +		__write_msi_msg(msidesc, &msg);
> +		dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>  		irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
>  					       (type == PCI_CAP_ID_MSIX) ?
>  					       "msi-x" : "msi",
>
Dan Streetman Jan. 18, 2017, 4:08 p.m. UTC | #3
On Mon, Jan 16, 2017 at 6:34 AM, Stefan Bader
<stefan.bader@canonical.com> wrote:
> On 13.01.2017 22:13, Dan Streetman wrote:
>> BugLink: http://bugs.launchpad.net/bugs/1656381
>>
>> Revert the main part of commit:
>> af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>>
>> That commit introduced reading the pci device's msi message data to see
>> if a pirq was previously configured for the device's msi/msix, and re-use
>> that pirq.  At the time, that was the correct behavior.  However, a
>> later change to Qemu caused it to call into the Xen hypervisor to unmap
>> all pirqs for a pci device, when the pci device disables its MSI/MSIX
>> vectors; specifically the Qemu commit:
>> c976437c7dba9c7444fb41df45468968aaa326ad
>> ("qemu-xen: free all the pirqs for msi/msix when driver unload")
>>
>> Once Qemu added this pirq unmapping, it was no longer correct for the
>> kernel to re-use the pirq number cached in the pci device msi message
>> data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
>> pirqs when the pci device disables its MSI/MSIX vectors.
>
> In Trusty release we have qemu-2.0.0. I wonder whether adding the change to the
> kernel (hm, actually same would apply to xenial via hwe path) would cause
> regressions when not using the cloud-archive...

Yes, but applying this patch to the Trusty kernel is completely
unrelated to a Trusty-based Xen hypervisor.  This change is to the
guest kernel.  Meaning, if we don't apply this kernel patch to the
Trusty kernel, X/Y/Z patched kernels will still see problems if
they're running under a Trusty-based Xen hypervisor (i.e. qemu 2.0.0
or earlier), while T unpatched kernel will see problems if it's
running under a newer (Vivid or later, or UCA Kilo or later) Xen
hypervisor.

Applying this to T as well as X/Y/Z means, all of them will work right
under qemu 2.1.0 or later Xen hypervisor, and all of them will not
work right under qemu 2.0.0 or earlier Xen hypervisor.

> But then, changing the 2.0.0 qemu to add this unmapping might cause other
> unexpected problems...

I opened bug 1657489 to track fixing Trusty qemu.
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1657489

I included a matrix there of hypervisor version and guest kernel
version, which I'll include here.

1) qemu not patched (2.0.0 and earlier), guest kernel not patched:
CORRECT behavior
hypervisor: Trusty qemu or UCA Icehouse qemu
guest: all without patch from bug 1656381
failure: none

2) qemu not patched (2.0.0 and earlier), guest kernel patched:
INCORRECT behavior
hypervisor: Trusty qemu or UCA Icehouse qemu
guest: all with patch from bug 1656381
failure: MSI interrupts will fail to be configured for any device, if
the device disables and then re-enables its MSI. Only the first time a
device enables MSI will work. For example, unloading a driver will
result in failure to enable MSI when the driver is reloaded.

3) qemu patched (2.1.0 and later), guest kernel not patched: INCORRECT behavior
hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
guest: all without patch from bug 1656381
failure: MSI interrupts in the guest may not be correctly mapped if
device B enables its MSI after device A has disabled its MSI; when
device A re-enables its MSI, some of its interrupts will fail to be
configured correctly. NVMe shows this repeatedly with multiple NVMe
controllers; usually only 1 NVMe controller will finish initialization
correctly.

4) qemu patched (2.1.0 and later), guest kernel patched: CORRECT behavior
hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
guest: all with patch from bug 1656381
failure: none


>
> -Stefan
>
>>
>> This bug is causing failures to initialize multiple NVMe controllers
>> under Xen, because the NVMe driver sets up a single MSIX vector for
>> each controller (concurrently), and then after using that to talk to
>> the controller for some configuration data, it disables the single MSIX
>> vector and re-configures all the MSIX vectors it needs.  So the MSIX
>> setup code tries to re-use the cached pirq from the first vector
>> for each controller, but the hypervisor has already given away that
>> pirq to another controller, and its initialization fails.
>>
>> This is discussed in more detail at:
>> https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
>>
>> Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>> Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
>> (backported from X/Y/Z patch for context/function name changes only)
>>
>> ---
>>  arch/x86/pci/xen.c | 23 +++++++----------------
>>  1 file changed, 7 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> index 5eee495..6969c76 100644
>> --- a/arch/x86/pci/xen.c
>> +++ b/arch/x86/pci/xen.c
>> @@ -227,23 +227,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>>               return 1;
>>
>>       list_for_each_entry(msidesc, &dev->msi_list, list) {
>> -             __read_msi_msg(msidesc, &msg);
>> -             pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>> -                     ((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>> -             if (msg.data != XEN_PIRQ_MSI_DATA ||
>> -                 xen_irq_from_pirq(pirq) < 0) {
>> -                     pirq = xen_allocate_pirq_msi(dev, msidesc);
>> -                     if (pirq < 0) {
>> -                             irq = -ENODEV;
>> -                             goto error;
>> -                     }
>> -                     xen_msi_compose_msg(dev, pirq, &msg);
>> -                     __write_msi_msg(msidesc, &msg);
>> -                     dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>> -             } else {
>> -                     dev_dbg(&dev->dev,
>> -                             "xen: msi already bound to pirq=%d\n", pirq);
>> +             pirq = xen_allocate_pirq_msi(dev, msidesc);
>> +             if (pirq < 0) {
>> +                     irq = -ENODEV;
>> +                     goto error;
>>               }
>> +             xen_msi_compose_msg(dev, pirq, &msg);
>> +             __write_msi_msg(msidesc, &msg);
>> +             dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>>               irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
>>                                              (type == PCI_CAP_ID_MSIX) ?
>>                                              "msi-x" : "msi",
>>
>
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
>
Stefan Bader Jan. 18, 2017, 4:42 p.m. UTC | #4
On 18.01.2017 17:08, Dan Streetman wrote:
> On Mon, Jan 16, 2017 at 6:34 AM, Stefan Bader
> <stefan.bader@canonical.com> wrote:
>> On 13.01.2017 22:13, Dan Streetman wrote:
>>> BugLink: http://bugs.launchpad.net/bugs/1656381
>>>
>>> Revert the main part of commit:
>>> af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>>>
>>> That commit introduced reading the pci device's msi message data to see
>>> if a pirq was previously configured for the device's msi/msix, and re-use
>>> that pirq.  At the time, that was the correct behavior.  However, a
>>> later change to Qemu caused it to call into the Xen hypervisor to unmap
>>> all pirqs for a pci device, when the pci device disables its MSI/MSIX
>>> vectors; specifically the Qemu commit:
>>> c976437c7dba9c7444fb41df45468968aaa326ad
>>> ("qemu-xen: free all the pirqs for msi/msix when driver unload")
>>>
>>> Once Qemu added this pirq unmapping, it was no longer correct for the
>>> kernel to re-use the pirq number cached in the pci device msi message
>>> data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
>>> pirqs when the pci device disables its MSI/MSIX vectors.
>>
>> In Trusty release we have qemu-2.0.0. I wonder whether adding the change to the
>> kernel (hm, actually same would apply to xenial via hwe path) would cause
>> regressions when not using the cloud-archive...
> 
> Yes, but applying this patch to the Trusty kernel is completely
> unrelated to a Trusty-based Xen hypervisor.  This change is to the
> guest kernel.  Meaning, if we don't apply this kernel patch to the
> Trusty kernel, X/Y/Z patched kernels will still see problems if
> they're running under a Trusty-based Xen hypervisor (i.e. qemu 2.0.0
> or earlier), while T unpatched kernel will see problems if it's
> running under a newer (Vivid or later, or UCA Kilo or later) Xen
> hypervisor.

You are right. I just started to think about compat issues while looking at the
Trusty kernel and then wondering which qemu version would be around in that release.
Put in simple words: patching any kernel will allow a guest VM to properly use
pci-passthrough on any host/hypervisor that uses qemu 2.1 onwards, and breaks
it on older systems, which means Precise (too old to worry too much about) and
Trusty-based Xen hosts (without cloud-archive enabled). I do not have real
numbers on how often people run Trusty Xen + pci-passthrough; I guess we will see.
Maybe the other unknown is what versions of qemu run on which instance type
hosts on AWS. Hopefully recent enough ones on those that offer pass-through...

But overall no reason to not change the Trusty kernel, too.

-Stefan
> 
> Applying this to T as well as X/Y/Z means, all of them will work right
> under qemu 2.1.0 or later Xen hypervisor, and all of them will not
> work right under qemu 2.0.0 or earlier Xen hypervisor.
> 
>> But then, changing the 2.0.0 qemu to add this unmapping might cause other
>> unexpected problems...
> 
> I opened bug 1657489 to track fixing Trusty qemu.
> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1657489
> 
> I included a matrix there of hypervisor version and guest kernel
> version, which I'll include here.
> 
> 1) qemu not patched (2.0.0 and earlier), guest kernel not patched:
> CORRECT behavior
> hypervisor: Trusty qemu or UCA Icehouse qemu
> guest: all without patch from bug 1656381
> failure: none
> 
> 2) qemu not patched (2.0.0 and earlier), guest kernel patched:
> INCORRECT behavior
> hypervisor: Trusty qemu or UCA Icehouse qemu
> guest: all with patch from bug 1656381
> failure: MSI interrupts will fail to be configured for any device, if
> the device disables and then re-enables its MSI. Only the first time a
> device enables MSI will work. For example, unloading a driver will
> result in failure to enable MSI when the driver is reloaded.
> 
> 3) qemu patched (2.1.0 and later), guest kernel not patched: INCORRECT behavior
> hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
> guest: all without patch from bug 1656381
> failure: MSI interrupts in the guest may not be correctly mapped if
> device B enables its MSI after device A has disabled its MSI; when
> device A re-enables its MSI, some of its interrupts will fail to be
> configured correctly. NVMe shows this repeatedly with multiple NVMe
> controllers; usually only 1 NVMe controller will finish initialization
> correctly.
> 
> 4) qemu patched (2.1.0 and later), guest kernel patched: CORRECT behavior
> hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
> guest: all with patch from bug 1656381
> failure: none
> 
> 
>>
>> -Stefan
>>
>>>
>>> This bug is causing failures to initialize multiple NVMe controllers
>>> under Xen, because the NVMe driver sets up a single MSIX vector for
>>> each controller (concurrently), and then after using that to talk to
>>> the controller for some configuration data, it disables the single MSIX
>>> vector and re-configures all the MSIX vectors it needs.  So the MSIX
>>> setup code tries to re-use the cached pirq from the first vector
>>> for each controller, but the hypervisor has already given away that
>>> pirq to another controller, and its initialization fails.
>>>
>>> This is discussed in more detail at:
>>> https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
>>>
>>> Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>>> Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
>>> (backported from X/Y/Z patch for context/function name changes only)
>>>
>>> ---
>>>  arch/x86/pci/xen.c | 23 +++++++----------------
>>>  1 file changed, 7 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>>> index 5eee495..6969c76 100644
>>> --- a/arch/x86/pci/xen.c
>>> +++ b/arch/x86/pci/xen.c
>>> @@ -227,23 +227,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>>>               return 1;
>>>
>>>       list_for_each_entry(msidesc, &dev->msi_list, list) {
>>> -             __read_msi_msg(msidesc, &msg);
>>> -             pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>>> -                     ((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>>> -             if (msg.data != XEN_PIRQ_MSI_DATA ||
>>> -                 xen_irq_from_pirq(pirq) < 0) {
>>> -                     pirq = xen_allocate_pirq_msi(dev, msidesc);
>>> -                     if (pirq < 0) {
>>> -                             irq = -ENODEV;
>>> -                             goto error;
>>> -                     }
>>> -                     xen_msi_compose_msg(dev, pirq, &msg);
>>> -                     __write_msi_msg(msidesc, &msg);
>>> -                     dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>>> -             } else {
>>> -                     dev_dbg(&dev->dev,
>>> -                             "xen: msi already bound to pirq=%d\n", pirq);
>>> +             pirq = xen_allocate_pirq_msi(dev, msidesc);
>>> +             if (pirq < 0) {
>>> +                     irq = -ENODEV;
>>> +                     goto error;
>>>               }
>>> +             xen_msi_compose_msg(dev, pirq, &msg);
>>> +             __write_msi_msg(msidesc, &msg);
>>> +             dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>>>               irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
>>>                                              (type == PCI_CAP_ID_MSIX) ?
>>>                                              "msi-x" : "msi",
>>>
>>
>>
>>
>> --
>> kernel-team mailing list
>> kernel-team@lists.ubuntu.com
>> https://lists.ubuntu.com/mailman/listinfo/kernel-team
>>
Dan Streetman Jan. 18, 2017, 6:19 p.m. UTC | #5
On Wed, Jan 18, 2017 at 11:42 AM, Stefan Bader
<stefan.bader@canonical.com> wrote:
> On 18.01.2017 17:08, Dan Streetman wrote:
>> On Mon, Jan 16, 2017 at 6:34 AM, Stefan Bader
>> <stefan.bader@canonical.com> wrote:
>>> On 13.01.2017 22:13, Dan Streetman wrote:
>>>> BugLink: http://bugs.launchpad.net/bugs/1656381
>>>>
>>>> Revert the main part of commit:
>>>> af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>>>>
>>>> That commit introduced reading the pci device's msi message data to see
>>>> if a pirq was previously configured for the device's msi/msix, and re-use
>>>> that pirq.  At the time, that was the correct behavior.  However, a
>>>> later change to Qemu caused it to call into the Xen hypervisor to unmap
>>>> all pirqs for a pci device, when the pci device disables its MSI/MSIX
>>>> vectors; specifically the Qemu commit:
>>>> c976437c7dba9c7444fb41df45468968aaa326ad
>>>> ("qemu-xen: free all the pirqs for msi/msix when driver unload")
>>>>
>>>> Once Qemu added this pirq unmapping, it was no longer correct for the
>>>> kernel to re-use the pirq number cached in the pci device msi message
>>>> data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
>>>> pirqs when the pci device disables its MSI/MSIX vectors.
>>>
>>> In Trusty release we have qemu-2.0.0. I wonder whether adding the change to the
>>> kernel (hm, actually same would apply to xenial via hwe path) would cause
>>> regressions when not using the cloud-archive...
>>
>> Yes, but applying this patch to the Trusty kernel is completely
>> unrelated to a Trusty-based Xen hypervisor.  This change is to the
>> guest kernel.  Meaning, if we don't apply this kernel patch to the
>> Trusty kernel, X/Y/Z patched kernels will still see problems if
>> they're running under a Trusty-based Xen hypervisor (i.e. qemu 2.0.0
>> or earlier), while T unpatched kernel will see problems if it's
>> running under a newer (Vivid or later, or UCA Kilo or later) Xen
>> hypervisor.
>
> You are right. I just started to think about compat issues while looking at the
> Trusty kernel and then wondering which qemu version would be around in that release.
> Put in simple words: patching any kernel will allow a guest VM to properly use
> pci-passthrough on any host/hypervisor that uses qemu 2.1 onwards, and breaks
> it on older systems, which means Precise (too old to worry too much about) and
> Trusty-based Xen hosts (without cloud-archive enabled). I do not have real
> numbers on how often people run Trusty Xen + pci-passthrough; I guess we will see.

Yep, exactly.

> Maybe the other unknown is what versions of qemu run on which instance type
> hosts on AWS. Hopefully recent enough ones on those that offer pass-through...

I'm checking on that now, but as AWS doesn't publicly discuss its
hypervisor implementation, I think it's unlikely we'll find out and/or
be able to discuss that.  But I have let AWS know that they will need
to patch any of their hypervisors that use qemu 2.0.0 or earlier (if
they have any with a qemu that old) - if they haven't already patched
it, which they may have.

>
> But overall no reason to not change the Trusty kernel, too.
>
> -Stefan
>>
>> Applying this to T as well as X/Y/Z means, all of them will work right
>> under qemu 2.1.0 or later Xen hypervisor, and all of them will not
>> work right under qemu 2.0.0 or earlier Xen hypervisor.
>>
>>> But then, changing the 2.0.0 qemu to add this unmapping might cause other
>>> unexpected problems...
>>
>> I opened bug 1657489 to track fixing Trusty qemu.
>> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1657489
>>
>> I included a matrix there of hypervisor version and guest kernel
>> version, which I'll include here.
>>
>> 1) qemu not patched (2.0.0 and earlier), guest kernel not patched:
>> CORRECT behavior
>> hypervisor: Trusty qemu or UCA Icehouse qemu
>> guest: all without patch from bug 1656381
>> failure: none
>>
>> 2) qemu not patched (2.0.0 and earlier), guest kernel patched:
>> INCORRECT behavior
>> hypervisor: Trusty qemu or UCA Icehouse qemu
>> guest: all with patch from bug 1656381
>> failure: MSI interrupts will fail to be configured for any device, if
>> the device disables and then re-enables its MSI. Only the first time a
>> device enables MSI will work. For example, unloading a driver will
>> result in failure to enable MSI when the driver is reloaded.
>>
>> 3) qemu patched (2.1.0 and later), guest kernel not patched: INCORRECT behavior
>> hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
>> guest: all without patch from bug 1656381
>> failure: MSI interrupts in the guest may not be correctly mapped if
>> device B enables its MSI after device A has disabled its MSI; when
>> device A re-enables its MSI, some of its interrupts will fail to be
>> configured correctly. NVMe shows this repeatedly with multiple NVMe
>> controllers; usually only 1 NVMe controller will finish initialization
>> correctly.
>>
>> 4) qemu patched (2.1.0 and later), guest kernel patched: CORRECT behavior
>> hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
>> guest: all with patch from bug 1656381
>> failure: none
>>
>>
>>>
>>> -Stefan
>>>
>>>>
>>>> This bug is causing failures to initialize multiple NVMe controllers
>>>> under Xen, because the NVMe driver sets up a single MSIX vector for
>>>> each controller (concurrently), and then after using that to talk to
>>>> the controller for some configuration data, it disables the single MSIX
>>>> vector and re-configures all the MSIX vectors it needs.  So the MSIX
>>>> setup code tries to re-use the cached pirq from the first vector
>>>> for each controller, but the hypervisor has already given away that
>>>> pirq to another controller, and its initialization fails.
>>>>
>>>> This is discussed in more detail at:
>>>> https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
>>>>
>>>> Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>>>> Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
>>>> (backported from X/Y/Z patch for context/function name changes only)
>>>>
>>>> ---
>>>>  arch/x86/pci/xen.c | 23 +++++++----------------
>>>>  1 file changed, 7 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>>>> index 5eee495..6969c76 100644
>>>> --- a/arch/x86/pci/xen.c
>>>> +++ b/arch/x86/pci/xen.c
>>>> @@ -227,23 +227,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>>>>               return 1;
>>>>
>>>>       list_for_each_entry(msidesc, &dev->msi_list, list) {
>>>> -             __read_msi_msg(msidesc, &msg);
>>>> -             pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>>>> -                     ((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>>>> -             if (msg.data != XEN_PIRQ_MSI_DATA ||
>>>> -                 xen_irq_from_pirq(pirq) < 0) {
>>>> -                     pirq = xen_allocate_pirq_msi(dev, msidesc);
>>>> -                     if (pirq < 0) {
>>>> -                             irq = -ENODEV;
>>>> -                             goto error;
>>>> -                     }
>>>> -                     xen_msi_compose_msg(dev, pirq, &msg);
>>>> -                     __write_msi_msg(msidesc, &msg);
>>>> -                     dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>>>> -             } else {
>>>> -                     dev_dbg(&dev->dev,
>>>> -                             "xen: msi already bound to pirq=%d\n", pirq);
>>>> +             pirq = xen_allocate_pirq_msi(dev, msidesc);
>>>> +             if (pirq < 0) {
>>>> +                     irq = -ENODEV;
>>>> +                     goto error;
>>>>               }
>>>> +             xen_msi_compose_msg(dev, pirq, &msg);
>>>> +             __write_msi_msg(msidesc, &msg);
>>>> +             dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>>>>               irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
>>>>                                              (type == PCI_CAP_ID_MSIX) ?
>>>>                                              "msi-x" : "msi",
>>>>
>>>
>>>
>>>
>>> --
>>> kernel-team mailing list
>>> kernel-team@lists.ubuntu.com
>>> https://lists.ubuntu.com/mailman/listinfo/kernel-team
>>>
>
>
Luis Henriques Jan. 20, 2017, 12:57 p.m. UTC | #6
Applied to trusty master-next branch.

Cheers,
--
Luís

Patch

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 5eee495..6969c76 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -227,23 +227,14 @@  static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 		return 1;
 
 	list_for_each_entry(msidesc, &dev->msi_list, list) {
-		__read_msi_msg(msidesc, &msg);
-		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
-			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
-		if (msg.data != XEN_PIRQ_MSI_DATA ||
-		    xen_irq_from_pirq(pirq) < 0) {
-			pirq = xen_allocate_pirq_msi(dev, msidesc);
-			if (pirq < 0) {
-				irq = -ENODEV;
-				goto error;
-			}
-			xen_msi_compose_msg(dev, pirq, &msg);
-			__write_msi_msg(msidesc, &msg);
-			dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
-		} else {
-			dev_dbg(&dev->dev,
-				"xen: msi already bound to pirq=%d\n", pirq);
+		pirq = xen_allocate_pirq_msi(dev, msidesc);
+		if (pirq < 0) {
+			irq = -ENODEV;
+			goto error;
 		}
+		xen_msi_compose_msg(dev, pirq, &msg);
+		__write_msi_msg(msidesc, &msg);
+		dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
 		irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
 					       (type == PCI_CAP_ID_MSIX) ?
 					       "msi-x" : "msi",