Patchwork PCI/MSI: don't disable AMD IOMMU MSI on Xen dom0

login
register
mail settings
Submitter Wei Wang
Date June 21, 2012, 11:21 a.m.
Message ID <4FE303C4.3060705@amd.com>
Download mbox | patch
Permalink /patch/166294/
State Rejected
Headers show

Comments

Wei Wang - June 21, 2012, 11:21 a.m.
On 06/21/2012 11:59 AM, Jan Beulich wrote:
>>>> On 14.06.12 at 17:15, Wei Wang<wei.wang2@amd.com>  wrote:
>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>> Have you at all considered getting this fixed on the kernel side?
>>> As I don't have direct access to any AMD IOMMU capable
>>> system - can one, other than by enumerating the respective
>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>> devices in question (e.g. via vendor/class/sub-class or some
>>> such)? That might permit skipping those in the offending kernel
>>> code...
>>
>> AMD IOMMUs (both v1 and v2) uses class id 08 (System Base Peripheral)
>> and sub class id 06 (IOMMU). Combined with PCI_VENDEOR_ID_AMD, this
>> should be enough to identify amd iommu device. I could send you a kernel
>> patch for review using this approach. I would believe that fixing this
>> issue in 4.2, no matter how, is really important for amd iommu.
>
> As you didn't come forward with anything, here's my first
> take on this:

Hi Jan
Thanks a lot for the patch. Actually I plan to send my version today, 
which is based on 3.4 pv_ops but looks very similar to yours. So, Acked!

I also evaluated the possibility of hiding iommu device from dom0. I 
think the change is no quite a lot, at least, for io based pcicfg 
access. A proof-of-concept patch is attached.

Thanks,
Wei


>
> Commit d5dea7d95c48d7bc951cee4910a7fd9c0cd26fb0 ("PCI: msi: Disable msi
> interrupts when we initialize a pci device") caused MSI to get disabled
> on Xen Dom0 despite it having got turned on purposefully by the
> hypervisor. As an immediate band aid, suppress the disabling in this
> specific case.
>
> I don't think, however, that either the original change or this fix are
> really appropriate. The original fix, besides leaving aside the
> presence of a hypervisor like Xen, doesn't deal with all possible
> cases (in particular it has no effect if the secondary kernel was built
> with CONFIG_PCI_MSI unset). And taking into account a hypervisor like
> Xen, the logic like this should probably be skipped altogether (i.e. it
> should be entirely left to the hypervisor, being the entity in control
> of IRQs).
>
> Signed-off-by: Jan Beulich<jbeulich@suse.com>
> Cc: stable@kernel.org
>
> ---
>   drivers/pci/msi.c       |    7 +++++++
>   include/linux/pci_ids.h |    1 +
>   2 files changed, 8 insertions(+)
>
> --- 3.5-rc3/drivers/pci/msi.c
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/drivers/pci/msi.c
> @@ -20,6 +20,7 @@
>   #include<linux/errno.h>
>   #include<linux/io.h>
>   #include<linux/slab.h>
> +#include<xen/xen.h>
>
>   #include "pci.h"
>   #include "msi.h"
> @@ -1022,7 +1023,13 @@ void pci_msi_init_pci_dev(struct pci_dev
>   	/* Disable the msi hardware to avoid screaming interrupts
>   	 * during boot.  This is the power on reset default so
>   	 * usually this should be a noop.
> +	 * But on a Xen host don't do this for IOMMUs which the hypervisor
> +	 * is in control of (and hence has already enabled on purpose).
>   	 */
> +	if (xen_initial_domain()
> +	&&  (dev->class>>  8) == PCI_CLASS_SYSTEM_IOMMU
> +	&&  dev->vendor == PCI_VENDOR_ID_AMD)
> +		return;
>   	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
>   	if (pos)
>   		msi_set_enable(dev, pos, 0);
> --- 3.5-rc3/include/linux/pci_ids.h
> +++ 3.5-rc3-xen-pci-msi-no-iommu-disable/include/linux/pci_ids.h
> @@ -75,6 +75,7 @@
>   #define PCI_CLASS_SYSTEM_RTC		0x0803
>   #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
>   #define PCI_CLASS_SYSTEM_SDHCI		0x0805
> +#define PCI_CLASS_SYSTEM_IOMMU		0x0806
>   #define PCI_CLASS_SYSTEM_OTHER		0x0880
>
>   #define PCI_BASE_CLASS_INPUT		0x09
>
>
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Beulich - June 21, 2012, 12:06 p.m.
>>> On 21.06.12 at 13:21, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/21/2012 11:59 AM, Jan Beulich wrote:
>>>>> On 14.06.12 at 17:15, Wei Wang<wei.wang2@amd.com>  wrote:
>>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>> Have you at all considered getting this fixed on the kernel side?
>>>> As I don't have direct access to any AMD IOMMU capable
>>>> system - can one, other than by enumerating the respective
>>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>>> devices in question (e.g. via vendor/class/sub-class or some
>>>> such)? That might permit skipping those in the offending kernel
>>>> code...
>>>
>>> AMD IOMMUs (both v1 and v2) uses class id 08 (System Base Peripheral)
>>> and sub class id 06 (IOMMU). Combined with PCI_VENDEOR_ID_AMD, this
>>> should be enough to identify amd iommu device. I could send you a kernel
>>> patch for review using this approach. I would believe that fixing this
>>> issue in 4.2, no matter how, is really important for amd iommu.
>>
>> As you didn't come forward with anything, here's my first
>> take on this:
> 
> Hi Jan
> Thanks a lot for the patch. Actually I plan to send my version today, 
> which is based on 3.4 pv_ops but looks very similar to yours. So, Acked!
> 
> I also evaluated the possibility of hiding iommu device from dom0. I 
> think the change is no quite a lot, at least, for io based pcicfg 
> access. A proof-of-concept patch is attached.

This completely hides the device from Dom0, but only when
config space is accessed via method 1. Did you not see my
earlier patch doing this for MCFG as well (albeit only disallowing
writes, so allowing the device to still be seen by Dom0)?

Whether completely hiding the device is actually okay I can't
easily tell: Would IOMMUs always be either at func 0 of a single-
unction device, or at a non-zero func of a multi-function one? If
not, other devices may get hidden implicitly.

Also I noticed just now that guest_io_read() wouldn't really
behave correctly when pci_cfg_ok() returned false - it might
pass back 0xff even for a multi-byte read. I'll send a fix shortly.

Jan

> --- a/xen/arch/x86/traps.c      Thu Jun 21 11:30:59 2012 +0200
> +++ b/xen/arch/x86/traps.c      Thu Jun 21 13:19:02 2012 +0200
> @@ -73,6 +73,7 @@
>   #include <asm/hpet.h>
>   #include <public/arch-x86/cpuid.h>
>   #include <xsm/xsm.h>
> +#include <asm/hvm/svm/amd-iommu-proto.h>
> 
>   /*
>    * opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
> @@ -1686,10 +1687,19 @@ static int pci_cfg_ok(struct domain *d,
>   {
>       uint32_t machine_bdf;
>       uint16_t start, end;
> +    struct amd_iommu *iommu;
> +
>       if (!IS_PRIV(d))
>           return 0;
> 
>       machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
> +
> +    for_each_amd_iommu ( iommu )
> +    {
> +        if ( machine_bdf == iommu->bdf )
> +            return 0;
> +    }
> +
>       start = d->arch.pci_cf8 & 0xFF;
>       end = start + size - 1;
>       if (xsm_pci_config_permission(d, machine_bdf, start, end, write))


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wei Wang - June 21, 2012, 12:28 p.m.
On 06/21/2012 02:06 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 13:21, Wei Wang<wei.wang2@amd.com>  wrote:
>> On 06/21/2012 11:59 AM, Jan Beulich wrote:
>>>>>> On 14.06.12 at 17:15, Wei Wang<wei.wang2@amd.com>   wrote:
>>>> Am 14.06.2012 16:18, schrieb Jan Beulich:
>>>>> Have you at all considered getting this fixed on the kernel side?
>>>>> As I don't have direct access to any AMD IOMMU capable
>>>>> system - can one, other than by enumerating the respective
>>>>> PCI IDs or reading ACPI tables, reasonably easily identify the
>>>>> devices in question (e.g. via vendor/class/sub-class or some
>>>>> such)? That might permit skipping those in the offending kernel
>>>>> code...
>>>>
>>>> AMD IOMMUs (both v1 and v2) uses class id 08 (System Base Peripheral)
>>>> and sub class id 06 (IOMMU). Combined with PCI_VENDEOR_ID_AMD, this
>>>> should be enough to identify amd iommu device. I could send you a kernel
>>>> patch for review using this approach. I would believe that fixing this
>>>> issue in 4.2, no matter how, is really important for amd iommu.
>>>
>>> As you didn't come forward with anything, here's my first
>>> take on this:
>>
>> Hi Jan
>> Thanks a lot for the patch. Actually I plan to send my version today,
>> which is based on 3.4 pv_ops but looks very similar to yours. So, Acked!
>>
>> I also evaluated the possibility of hiding iommu device from dom0. I
>> think the change is no quite a lot, at least, for io based pcicfg
>> access. A proof-of-concept patch is attached.
>
> This completely hides the device from Dom0, but only when
> config space is accessed via method 1. Did you not see my
> earlier patch doing this for MCFG as well
Could you please provide a particular c/s number?... (I saw too many c/s 
might be related to this topic). so that I could work out a patch to 
support both i/o and mmcfg.

(albeit only disallowing
> writes, so allowing the device to still be seen by Dom0)?
Sounds better to me...this still allows user to check iommu status from 
lspci.

> Whether completely hiding the device is actually okay I can't
> easily tell: Would IOMMUs always be either at func 0 of a single-
> unction device, or at a non-zero func of a multi-function one? If
> not, other devices may get hidden implicitly.

AMD IOMMU is an independent pci-e endpoint, and this function will not 
be used for other purposes other than containing an iommu. So I don't 
see that iommu will share bdf value with other devices.

Thanks,
Wei

> Also I noticed just now that guest_io_read() wouldn't really
> behave correctly when pci_cfg_ok() returned false - it might
> pass back 0xff even for a multi-byte read. I'll send a fix shortly.
>
> Jan
>
>> --- a/xen/arch/x86/traps.c      Thu Jun 21 11:30:59 2012 +0200
>> +++ b/xen/arch/x86/traps.c      Thu Jun 21 13:19:02 2012 +0200
>> @@ -73,6 +73,7 @@
>>    #include<asm/hpet.h>
>>    #include<public/arch-x86/cpuid.h>
>>    #include<xsm/xsm.h>
>> +#include<asm/hvm/svm/amd-iommu-proto.h>
>>
>>    /*
>>     * opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
>> @@ -1686,10 +1687,19 @@ static int pci_cfg_ok(struct domain *d,
>>    {
>>        uint32_t machine_bdf;
>>        uint16_t start, end;
>> +    struct amd_iommu *iommu;
>> +
>>        if (!IS_PRIV(d))
>>            return 0;
>>
>>        machine_bdf = (d->arch.pci_cf8>>  8)&  0xFFFF;
>> +
>> +    for_each_amd_iommu ( iommu )
>> +    {
>> +        if ( machine_bdf == iommu->bdf )
>> +            return 0;
>> +    }
>> +
>>        start = d->arch.pci_cf8&  0xFF;
>>        end = start + size - 1;
>>        if (xsm_pci_config_permission(d, machine_bdf, start, end, write))
>
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Beulich - June 21, 2012, 12:45 p.m.
>>> On 21.06.12 at 14:28, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/21/2012 02:06 PM, Jan Beulich wrote:
>>>>> On 21.06.12 at 13:21, Wei Wang<wei.wang2@amd.com>  wrote:
>>> I also evaluated the possibility of hiding iommu device from dom0. I
>>> think the change is no quite a lot, at least, for io based pcicfg
>>> access. A proof-of-concept patch is attached.
>>
>> This completely hides the device from Dom0, but only when
>> config space is accessed via method 1. Did you not see my
>> earlier patch doing this for MCFG as well
> Could you please provide a particular c/s number?... (I saw too many c/s 
> might be related to this topic). so that I could work out a patch to 
> support both i/o and mmcfg.

I sent this to you yesterday, so you'd be able to test whether
it actually fulfills its purpose before we discuss whether this is
acceptable for 4.2. See
http://lists.xen.org/archives/html/xen-devel/2012-06/msg01129.html

> (albeit only disallowing
>> writes, so allowing the device to still be seen by Dom0)?
> Sounds better to me...this still allows user to check iommu status from 
> lspci.
> 
>> Whether completely hiding the device is actually okay I can't
>> easily tell: Would IOMMUs always be either at func 0 of a single-
>> unction device, or at a non-zero func of a multi-function one? If
>> not, other devices may get hidden implicitly.
> 
> AMD IOMMU is an independent pci-e endpoint, and this function will not 
> be used for other purposes other than containing an iommu. So I don't 
> see that iommu will share bdf value with other devices.

The question is not regarding bdf, but regarding whether under
the same seg:bus:dev there might be multiple functions, one of
which is the IOMMU, and if so, whether the IOMMU would be
guaranteed to have a non-zero function number.

Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wei Wang - June 21, 2012, 1:10 p.m.
On 06/21/2012 02:45 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 14:28, Wei Wang<wei.wang2@amd.com>  wrote:
>> On 06/21/2012 02:06 PM, Jan Beulich wrote:
>>>>>> On 21.06.12 at 13:21, Wei Wang<wei.wang2@amd.com>   wrote:
>>>> I also evaluated the possibility of hiding iommu device from dom0. I
>>>> think the change is no quite a lot, at least, for io based pcicfg
>>>> access. A proof-of-concept patch is attached.
>>>
>>> This completely hides the device from Dom0, but only when
>>> config space is accessed via method 1. Did you not see my
>>> earlier patch doing this for MCFG as well
>> Could you please provide a particular c/s number?... (I saw too many c/s
>> might be related to this topic). so that I could work out a patch to
>> support both i/o and mmcfg.
>
> I sent this to you yesterday, so you'd be able to test whether
> it actually fulfills its purpose before we discuss whether this is
> acceptable for 4.2. See
> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01129.html

Oh, yes I found it, my email filter did not work well so I did not see 
it at the right folder. I will test right now.

>
>> (albeit only disallowing
>>> writes, so allowing the device to still be seen by Dom0)?
>> Sounds better to me...this still allows user to check iommu status from
>> lspci.
>>
>>> Whether completely hiding the device is actually okay I can't
>>> easily tell: Would IOMMUs always be either at func 0 of a single-
>>> unction device, or at a non-zero func of a multi-function one? If
>>> not, other devices may get hidden implicitly.
>>
>> AMD IOMMU is an independent pci-e endpoint, and this function will not
>> be used for other purposes other than containing an iommu. So I don't
>> see that iommu will share bdf value with other devices.
>
> The question is not regarding bdf, but regarding whether under
> the same seg:bus:dev there might be multiple functions, one of
> which is the IOMMU, and if so, whether the IOMMU would be
> guaranteed to have a non-zero function number.

In a real system (single or multiple iommu), amd iommu shares the same 
device number with north bridge but has function number 2.. (e.g 
bus:00.2) Howerver according to spec, it does not guaranteed to have 
non-zero function number. So what is the problem you see if iommu uses 
fun0 on a multi-func device?

Thanks,
Wei

> Jan
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Beulich - June 21, 2012, 1:24 p.m.
>>> On 21.06.12 at 15:10, Wei Wang <wei.wang2@amd.com> wrote:
> On 06/21/2012 02:45 PM, Jan Beulich wrote:
>>>>> On 21.06.12 at 14:28, Wei Wang<wei.wang2@amd.com>  wrote:
>>> AMD IOMMU is an independent pci-e endpoint, and this function will not
>>> be used for other purposes other than containing an iommu. So I don't
>>> see that iommu will share bdf value with other devices.
>>
>> The question is not regarding bdf, but regarding whether under
>> the same seg:bus:dev there might be multiple functions, one of
>> which is the IOMMU, and if so, whether the IOMMU would be
>> guaranteed to have a non-zero function number.
> 
> In a real system (single or multiple iommu), amd iommu shares the same 
> device number with north bridge but has function number 2.. (e.g 
> bus:00.2) Howerver according to spec, it does not guaranteed to have 
> non-zero function number. So what is the problem you see if iommu uses 
> fun0 on a multi-func device?

If it's on func 0 and gets hidden completely (as done by your
partial patch), other functions won't be found when scanning
for them (because secondary functions get looked at only
when func 0 actually exists, as otherwise evaluating the header
type register is invalid).

Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wei Wang - June 21, 2012, 1:27 p.m.
On 06/21/2012 03:24 PM, Jan Beulich wrote:
>>>> On 21.06.12 at 15:10, Wei Wang<wei.wang2@amd.com>  wrote:
>> On 06/21/2012 02:45 PM, Jan Beulich wrote:
>>>>>> On 21.06.12 at 14:28, Wei Wang<wei.wang2@amd.com>   wrote:
>>>> AMD IOMMU is an independent pci-e endpoint, and this function will not
>>>> be used for other purposes other than containing an iommu. So I don't
>>>> see that iommu will share bdf value with other devices.
>>>
>>> The question is not regarding bdf, but regarding whether under
>>> the same seg:bus:dev there might be multiple functions, one of
>>> which is the IOMMU, and if so, whether the IOMMU would be
>>> guaranteed to have a non-zero function number.
>>
>> In a real system (single or multiple iommu), amd iommu shares the same
>> device number with north bridge but has function number 2.. (e.g
>> bus:00.2) Howerver according to spec, it does not guaranteed to have
>> non-zero function number. So what is the problem you see if iommu uses
>> fun0 on a multi-func device?
>
> If it's on func 0 and gets hidden completely (as done by your
> partial patch), other functions won't be found when scanning
> for them (because secondary functions get looked at only
> when func 0 actually exists, as otherwise evaluating the header
> type register is invalid).

OK, understood. Then I think we do need to allow pci cfg read for iommu 
device.

Thanks
Wei

> Jan
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff -r baa85434d0ec xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c      Thu Jun 21 11:30:59 2012 +0200
+++ b/xen/arch/x86/traps.c      Thu Jun 21 13:19:02 2012 +0200
@@ -73,6 +73,7 @@ 
  #include <asm/hpet.h>
  #include <public/arch-x86/cpuid.h>
  #include <xsm/xsm.h>
+#include <asm/hvm/svm/amd-iommu-proto.h>

  /*
   * opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
@@ -1686,10 +1687,19 @@  static int pci_cfg_ok(struct domain *d,
  {
      uint32_t machine_bdf;
      uint16_t start, end;
+    struct amd_iommu *iommu;
+
      if (!IS_PRIV(d))
          return 0;

      machine_bdf = (d->arch.pci_cf8 >> 8) & 0xFFFF;
+
+    for_each_amd_iommu ( iommu )
+    {
+        if ( machine_bdf == iommu->bdf )
+            return 0;
+    }
+
      start = d->arch.pci_cf8 & 0xFF;
      end = start + size - 1;
      if (xsm_pci_config_permission(d, machine_bdf, start, end, write))