Patchwork [v9] irq: add quirk for broken interrupt remapping on 55XX chipsets

Submitter Neil Horman
Date April 15, 2013, 10:41 p.m.
Message ID <1366065677-3431-1-git-send-email-nhorman@tuxdriver.com>
Permalink /patch/236758/
State Not Applicable

Comments

Neil Horman - April 15, 2013, 10:41 p.m.
A few years back Intel published a spec update:
http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf

for the 5520 and 5500 chipsets.  It contains an erratum (specifically erratum
53) noting that these chipsets can't properly do interrupt remapping, and as a
result Intel recommends that interrupt remapping be disabled in the BIOS.  While
many vendors have a BIOS update to do exactly that, not all do, and of course
not all users update their BIOS to a level that corrects the problem.  As a
result, occasionally interrupts can arrive at a cpu even after affinity for that
interrupt has been moved, leading to lost or spurious interrupts, usually
characterized by the message:
kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)

There have been several incidents recently of people seeing this error, and
investigation has shown that their systems run a BIOS level at which this
feature was not properly turned off.  As such, it would be good to
give them a reminder that their systems are vulnerable to this problem.  For
details of those that reported the problem, please see:
https://bugzilla.redhat.com/show_bug.cgi?id=887006

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Prarit Bhargava <prarit@redhat.com>
CC: Don Zickus <dzickus@redhat.com>
CC: Don Dutile <ddutile@redhat.com>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Asit Mallick <asit.k.mallick@intel.com>
CC: David Woodhouse <dwmw2@infradead.org>
CC: linux-pci@vger.kernel.org
CC: Joerg Roedel <joro@8bytes.org>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Arkadiusz Miśkiewicz <arekm@maven.pl>
---

Change notes:

v2)

* Moved the quirk to the x86 arch, since consensus seems to be that the 55XX
chipset series is x86 only.  I decided however to keep the quirk as a regular
quirk, not an early_quirk.  Early quirks currently have no way to determine if
the BIOS has properly disabled the feature in the iommu, at least not without
significant hacking, and since it's quite possible this will be a short-lived
quirk, should Don Z's workaround code prove successful (and it looks like it may
well), I don't think that's necessary.

* Removed the WARNING banner from the quirk, and added the HW_ERR token to the
string.  I opted to leave the newlines in place however, as I really couldn't
find a way to keep the text on a single line that is still legible from a code
perspective.  I think there's enough language in there that using cscope on just
about any substring will turn it up, and again, this may be a short-lived
quirk.

v3)

* Removed defines from pci_ids.h, and used direct id values as per request from
Bjorn.

v4)

* Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David
Woodhouse

v5)

* Moved check to an early quirk, and flagged the broken chip, so we could
reasonably disable irq remapping during bootup.

v6)

* Clean up of stupid extra thrash in quirks.c

v7)

* Move broken check to intel_irq_remapping.c
* Fixed another typo
* Finally made the reference bugzilla public

v8)

* Removed extraneous code from irq_remapping_enabled

v9)

* Fix stupid build break from rushing to shuffle similar header files about.
  Thanks to Arkadiusz Miśkiewicz for pointing it out
---
 arch/x86/kernel/early-quirks.c      | 26 ++++++++++++++++++++++++++
 drivers/iommu/intel_irq_remapping.c | 10 ++++++++++
 drivers/iommu/irq_remapping.c       |  1 +
 drivers/iommu/irq_remapping.h       |  2 ++
 4 files changed, 39 insertions(+)
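
The order of the checks that this patch adds to intel_irq_remapping_supported() can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct remap_state and its field names are hypothetical, the WARN_TAINT() call is reduced to a flag, and every later check in the real function is collapsed into a single dmar_capable field.

```c
#include <stdbool.h>

/* Hypothetical model of the globals the patch touches. */
struct remap_state {
	bool disabled;        /* models disable_irq_remap */
	bool chipset_broken;  /* models irq_remap_broken, set by the early quirk */
	bool dmar_capable;    /* stands in for all later capability checks */
	bool warned;          /* models "WARN_TAINT() fired" */
};

/* Mirrors the order of the checks in the patched function. */
bool remapping_supported(struct remap_state *s)
{
	if (s->disabled)
		return false;          /* already off: no warning needed */

	if (s->chipset_broken) {
		s->warned = true;      /* the kernel would WARN_TAINT() here */
		s->disabled = true;    /* force remapping off from now on */
		return false;
	}

	return s->dmar_capable;
}
```

In this model, a system where remapping is already disabled never reaches the warning, which appears to be the reason the broken-chipset test sits after the disable_irq_remap test in the patch.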
Yinghai Lu - April 15, 2013, 11:02 p.m.
On Mon, Apr 15, 2013 at 3:41 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> index 3755ef4..ef4ac6c 100644
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -18,6 +18,7 @@
>  #include <asm/apic.h>
>  #include <asm/iommu.h>
>  #include <asm/gart.h>
> +#include "../drivers/iommu/irq_remapping.h"

looks ugly.

>
>  static void __init fix_hypertransport_config(int num, int slot, int func)
>  {
> @@ -192,6 +193,27 @@ static void __init ati_bugs_contd(int num, int slot, int func)
>  }
>  #endif
>
> +#ifdef CONFIG_IRQ_REMAP
> +static void __init intel_remapping_check(int num, int slot, int func)
> +{
> +       u8 revision;
> +
> +       revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID);
> +
> +       /*
> +        * Revision 0x13 of this chipset supports irq remapping
> +        * but has an erratum that breaks its behavior, flag it as such
> +        */
> +       if (revision == 0x13)
> +               irq_remap_broken = 1;

change to more specific like:

intel_55xx_rev13_found?

> +
> +}
> +#else
> +static void __init intel_remapping_check(int num, int slot, int func)
> +{
> +}
> +#endif
> +
>  #define QFLAG_APPLY_ONCE       0x1
>  #define QFLAG_APPLIED          0x2
>  #define QFLAG_DONE             (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
> @@ -221,6 +243,10 @@ static struct chipset early_qrk[] __initdata = {
>           PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
>         { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
>           PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
> +       { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST,
> +         PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
> +       { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
> +         PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
>         {}
>  };
>
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index f3b8f23..5b19b2d 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -524,6 +524,16 @@ static int __init intel_irq_remapping_supported(void)
>
>         if (disable_irq_remap)
>                 return 0;
> +       if (irq_remap_broken) {
> +               WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND,
> +                          "This system BIOS has enabled interrupt remapping\n"
> +                          "on a chipset that contains an erratum making that\n"
> +                          "feature unstable.  To maintain system stability\n"
> +                          "interrupt remapping is being disabled.  Please\n"
> +                          "contact your BIOS vendor for an update\n");
> +               disable_irq_remap = 1;
> +               return 0;
> +       }
>
>         if (!dmar_ir_support())
>                 return 0;
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index d56f8c1..04d975f 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -19,6 +19,7 @@
>  int irq_remapping_enabled;
>
>  int disable_irq_remap;
> +int irq_remap_broken;
>  int disable_sourceid_checking;
>  int no_x2apic_optout;
>
> diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> index ecb6376..90c4dae 100644
> --- a/drivers/iommu/irq_remapping.h
> +++ b/drivers/iommu/irq_remapping.h
> @@ -32,6 +32,7 @@ struct pci_dev;
>  struct msi_msg;
>
>  extern int disable_irq_remap;
> +extern int irq_remap_broken;
>  extern int disable_sourceid_checking;
>  extern int no_x2apic_optout;
>  extern int irq_remapping_enabled;
> @@ -89,6 +90,7 @@ extern struct irq_remap_ops amd_iommu_irq_ops;
>
>  #define irq_remapping_enabled 0
>  #define disable_irq_remap     1
> +#define irq_remap_broken      0

this one is needed

>
>  #endif /* CONFIG_IRQ_REMAP */
>
> --
> 1.8.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman - April 16, 2013, 12:43 a.m.
On Mon, Apr 15, 2013 at 04:02:56PM -0700, Yinghai Lu wrote:
> On Mon, Apr 15, 2013 at 3:41 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> > A few years back intel published a spec update:
> > http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
> >
><snip>
> > diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> > index 3755ef4..ef4ac6c 100644
> > --- a/arch/x86/kernel/early-quirks.c
> > +++ b/arch/x86/kernel/early-quirks.c
> > @@ -18,6 +18,7 @@
> >  #include <asm/apic.h>
> >  #include <asm/iommu.h>
> >  #include <asm/gart.h>
> > +#include "../drivers/iommu/irq_remapping.h"
> 
> looks ugly.
> 
Yes, but I think it's acceptable, given that it makes sense to me to define the
irq_remap_broken flag in the common driver code.  We can certainly move the
header around, but I'd much rather do that in a separate patch.

> >
> >  static void __init fix_hypertransport_config(int num, int slot, int func)
> >  {
> > @@ -192,6 +193,27 @@ static void __init ati_bugs_contd(int num, int slot, int func)
> >  }
> >  #endif
> >
> > +#ifdef CONFIG_IRQ_REMAP
> > +static void __init intel_remapping_check(int num, int slot, int func)
> > +{
> > +       u8 revision;
> > +
> > +       revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID);
> > +
> > +       /*
> > +        * Revision 0x13 of this chipset supports irq remapping
> > +        * but has an erratum that breaks its behavior, flag it as such
> > +        */
> > +       if (revision == 0x13)
> > +               irq_remap_broken = 1;
> 
> change to more specific like:
> 
> intel_55xx_rev13_found?
> 
No.  This was discussed previously, and the consensus was that we
can use a generic name, should other chips have similarly broken functionality.

><snip>
> >
> >  int disable_irq_remap;
> > +int irq_remap_broken;
> >  int disable_sourceid_checking;
> >  int no_x2apic_optout;
> >
> > diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> > index ecb6376..90c4dae 100644
> > --- a/drivers/iommu/irq_remapping.h
> > +++ b/drivers/iommu/irq_remapping.h
> > @@ -32,6 +32,7 @@ struct pci_dev;
> >  struct msi_msg;
> >
> >  extern int disable_irq_remap;
> > +extern int irq_remap_broken;
> >  extern int disable_sourceid_checking;
> >  extern int no_x2apic_optout;
> >  extern int irq_remapping_enabled;
> > @@ -89,6 +90,7 @@ extern struct irq_remap_ops amd_iommu_irq_ops;
> >
> >  #define irq_remapping_enabled 0
> >  #define disable_irq_remap     1
> > +#define irq_remap_broken      0
> 
> this one is needed
> 
Um, yes?  I think you mean to say it's not needed, since all the users of this
check are only in code that's compiled conditionally with CONFIG_IRQ_REMAP.
You're correct, but I like to have it there for completeness, should that change
in the future.

Neil
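
The "stub define" pattern discussed above can be illustrated with a minimal, self-contained sketch. The header layout is condensed into one file and remapping_usable() is a hypothetical caller, not a function from the patch; only the irq_remap_broken name and the CONFIG_IRQ_REMAP switch come from the thread.

```c
#include <stdbool.h>

/*
 * Condensed stand-in for drivers/iommu/irq_remapping.h.
 * Build with -DCONFIG_IRQ_REMAP to get the real variable instead of the stub.
 */
#ifdef CONFIG_IRQ_REMAP
int irq_remap_broken;          /* would be set at boot by the chipset quirk */
#else
#define irq_remap_broken 0     /* constant stub: the test below folds away */
#endif

/* Hypothetical caller: it needs no #ifdef of its own in either build. */
bool remapping_usable(void)
{
	if (irq_remap_broken)
		return false;
	return true;
}
```

With the stub in place, a future caller outside CONFIG_IRQ_REMAP-only code would still compile, which is the "completeness" argument made above.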

Arkadiusz Miskiewicz - April 16, 2013, 6:20 a.m.
On Tuesday 16 of April 2013, Neil Horman wrote:
> A few years back Intel published a spec update:
> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
> 
> for the 5520 and 5500 chipsets.  It contains an erratum (specifically
> erratum 53) noting that these chipsets can't properly do interrupt
> remapping, and as a result Intel recommends that interrupt remapping be
> disabled in the BIOS.  While many vendors have a BIOS update to do exactly
> that, not all do, and of course not all users update their BIOS to a level
> that corrects the problem.  As a result, occasionally interrupts can
> arrive at a cpu even after affinity for that interrupt has been moved,
> leading to lost or spurious interrupts, usually characterized by the
> message:
> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
> 
> There have been several incidents recently of people seeing this error, and
> investigation has shown that their systems run a BIOS level at which this
> feature was not properly turned off.  As such, it would be
> good to give them a reminder that their systems are vulnerable to this
> problem.  For details of those that reported the problem, please see:
> https://bugzilla.redhat.com/show_bug.cgi?id=887006

Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl>

(on top of 3.7.10 kernel)

The stack trace looks useless to me in this case, but according to the
changelog this was already discussed.

[    0.137512] Freeing SMP alternatives: 20k freed
[    0.143539] ACPI: Core revision 20120913
[    0.156067] dmar: Host address width 40
[    0.160440] dmar: DRHD base: 0x000000fe710000 flags: 0x1
[    0.166467] dmar: IOMMU 0: reg_base_addr fe710000 ver 1:0 cap c90780106f0462 ecap f020f7
[    0.175618] dmar: RMRR base: 0x0000008f62f000 end: 0x0000008f631fff
[    0.182705] dmar: RMRR base: 0x0000008f61a000 end: 0x0000008f61afff
[    0.189792] dmar: RMRR base: 0x0000008f617000 end: 0x0000008f617fff
[    0.196871] dmar: RMRR base: 0x0000008f614000 end: 0x0000008f614fff
[    0.203960] dmar: RMRR base: 0x0000008f611000 end: 0x0000008f611fff
[    0.211047] dmar: RMRR base: 0x0000008f60e000 end: 0x0000008f60efff
[    0.218135] dmar: RMRR base: 0x0000008f60b000 end: 0x0000008f60bfff
[    0.225222] dmar: RMRR base: 0x0000008f608000 end: 0x0000008f608fff
[    0.232309] dmar: RMRR base: 0x0000008f605000 end: 0x0000008f605fff
[    0.239388] dmar: ATSR flags: 0x0
[    0.243273] ------------[ cut here ]------------
[    0.248515] WARNING: at /home/users/arekm/rpm/BUILD/kernel-3.7.10/linux-3.7/drivers/iommu/intel_irq_remapping.c:518 intel_irq_remapping_supported+0x37/0x7a()
[    0.264358] Hardware name: S5500WB
[    0.268238] This system BIOS has enabled interrupt remapping
on a chipset that contains an erratum making that
feature unstable.  To maintain system stability
interrupt remapping is being disabled.  Please
contact your BIOS vendor for an update
[    0.298811] Modules linked in:
[    0.302373] Pid: 1, comm: swapper/0 xid: #0 Not tainted 3.7.10-6 #1
[    0.309453] Call Trace:
[    0.312270]  [<ffffffff8105182a>] warn_slowpath_common+0x7a/0xb0
[    0.319061]  [<ffffffff810518ba>] warn_slowpath_fmt_taint+0x3a/0x40
[    0.326143]  [<ffffffff818e8371>] intel_irq_remapping_supported+0x37/0x7a
[    0.333810]  [<ffffffff813b7226>] irq_remapping_supported+0x26/0x30
[    0.340893]  [<ffffffff818bd1be>] enable_IR+0x9/0x3e
[    0.346521]  [<ffffffff818bd558>] enable_IR_x2apic+0xa0/0x1e3
[    0.353024]  [<ffffffff814cfdc4>] ? set_cpu_sibling_map+0x415/0x435
[    0.360108]  [<ffffffff818bf47a>] default_setup_apic_routing+0x12/0x6b
[    0.367483]  [<ffffffff818bb30b>] native_smp_prepare_cpus+0x2e7/0x336
[    0.374761]  [<ffffffff818abcd5>] kernel_init_freeable+0x89/0x1c4
[    0.381652]  [<ffffffff814b9380>] ? rest_init+0x70/0x70
[    0.387570]  [<ffffffff814b9389>] kernel_init+0x9/0x100
[    0.393489]  [<ffffffff814e8d7c>] ret_from_fork+0x7c/0xb0
[    0.399602]  [<ffffffff814b9380>] ? rest_init+0x70/0x70
[    0.405524] ---[ end trace bf40f410b44b3726 ]---
[    0.410830] Switched APIC routing to physical flat.
[    0.416872] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.456602] smpboot: CPU0: Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz (fam: 06, model: 1a, stepping: 04)
[    0.573650] Performance Events: PEBS fmt1+, 16-deep LBR, Nehalem events, Intel PMU driver.
[    0.583265] perf_event_intel: CPU erratum AAJ80 worked around
Joerg Roedel - April 16, 2013, 10:24 a.m.
On Mon, Apr 15, 2013 at 06:41:17PM -0400, Neil Horman wrote:
> +#ifdef CONFIG_IRQ_REMAP
> +static void __init intel_remapping_check(int num, int slot, int func)
> +{
> +	u8 revision;
> +
> +	revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID);
> +
> +	/*
> +	 * Revision 0x13 of this chipset supports irq remapping
> +	 * but has an erratum that breaks its behavior, flag it as such
> +	 */
> +	if (revision == 0x13)
> +		irq_remap_broken = 1;
> +
> +}
> +#else

Any reason why you don't check this in the Intel IOMMU init code? You
would save the ifdefs and you don't have to include
irq-remapping-internal header files somewhere else in the tree.


	Joerg


Neil Horman - April 16, 2013, 1:07 p.m.
On Tue, Apr 16, 2013 at 12:24:54PM +0200, Joerg Roedel wrote:
> On Mon, Apr 15, 2013 at 06:41:17PM -0400, Neil Horman wrote:
> > +#ifdef CONFIG_IRQ_REMAP
> > +static void __init intel_remapping_check(int num, int slot, int func)
> > +{
> > +	u8 revision;
> > +
> > +	revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID);
> > +
> > +	/*
> > +	 * Revision 0x13 of this chipset supports irq remapping
> > +	 * but has an erratum that breaks its behavior, flag it as such
> > +	 */
> > +	if (revision == 0x13)
> > +		irq_remap_broken = 1;
> > +
> > +}
> > +#else
> 
> Any reason why you don't check this in the Intel IOMMU init code? You
> would save the ifdefs and you don't have to include
> irq-remapping-internal header files somewhere else in the tree.
> 
> 
> 	Joerg
> 
Mostly because we spent so much time earlier in this thread talking about where
the quirk should go that, after this last revision, it didn't even occur to me
that, using this new approach, we don't even need a quirk anymore.  That makes
way more sense to me though, I'll revise the patch again :(.

Neil

Neil Horman - April 16, 2013, 1:35 p.m.
On Tue, Apr 16, 2013 at 12:24:54PM +0200, Joerg Roedel wrote:
> On Mon, Apr 15, 2013 at 06:41:17PM -0400, Neil Horman wrote:
> > +#ifdef CONFIG_IRQ_REMAP
> > +static void __init intel_remapping_check(int num, int slot, int func)
> > +{
> > +	u8 revision;
> > +
> > +	revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID);
> > +
> > +	/*
> > +	 * Revision 0x13 of this chipset supports irq remapping
> > +	 * but has an erratum that breaks its behavior, flag it as such
> > +	 */
> > +	if (revision == 0x13)
> > +		irq_remap_broken = 1;
> > +
> > +}
> > +#else
> 
> Any reason why you don't check this in the Intel IOMMU init code? You
> would save the ifdefs and you don't have to include
> irq-remapping-internal header files somewhere else in the tree.
> 
> 
> 	Joerg
> 
> 
> 
Actually, hold off on that last note, the intel iommu init code doesn't seem to
create any direct relationship between the set of IOMMUs and the pci_devs that
implement them.  In the intel_irq_remapping_supported path I can loop over each
dmar_drhd_unit, and interrogate each of the devices on its **devices list to see
if the device/vendor and revision ids match, but looking at the DRHD parsing
code, I'm not sure the iommu pci_dev is always going to be on that list.  That
seems like it's going to be pretty ugly in and of itself.  Do you have a
suggested way to identify the pci_dev of the device we need in that path without
having to simply iterate over every device in that scope?

Neil
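
The scope walk described above can be sketched with mock types. This is a user-space illustration only: mock_pci_dev, mock_drhd_unit and scope_has_broken_ioh() are hypothetical stand-ins, not kernel structures or functions. The device ids (0x3403, 0x3406) and revision (0x13) are taken from the patch, and 0x8086 is Intel's PCI vendor id.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct pci_dev and struct dmar_drhd_unit. */
struct mock_pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint8_t  revision;
};

struct mock_drhd_unit {
	struct mock_pci_dev **devices; /* device scope; slots may be NULL */
	size_t devices_cnt;
};

/* True if any device in any unit's scope is a rev 0x13 55XX host bridge. */
bool scope_has_broken_ioh(const struct mock_drhd_unit *units, size_t n_units)
{
	for (size_t u = 0; u < n_units; u++) {
		for (size_t d = 0; d < units[u].devices_cnt; d++) {
			const struct mock_pci_dev *dev = units[u].devices[d];

			if (!dev)
				continue;      /* unresolved scope entry */
			if (dev->vendor != 0x8086)
				continue;
			if (dev->device != 0x3403 && dev->device != 0x3406)
				continue;
			if (dev->revision == 0x13)
				return true;
		}
	}
	return false;
}
```

The weakness raised above still applies to this sketch: if the IOH device is not listed in any unit's scope, the walk returns false even on affected hardware, so a negative result proves nothing.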

Joerg Roedel - April 16, 2013, 4:37 p.m.
On Tue, Apr 16, 2013 at 09:35:56AM -0400, Neil Horman wrote:
> Actually, hold off on that last note, the intel iommu init code doesn't seem to
> create any direct relationship between the set of IOMMUs and the pci_devs that
> implement them.  In the intel_irq_remapping_supported path I can loop over each
> dmar_drhd_unit, and interrogate each of the devices on its **devices list to see
> if the device/vendor and revision ids match, but looking at the DRHD parsing
> code, I'm not sure the iommu pci_dev is always going to be on that list.  That
> seems like it's going to be pretty ugly in and of itself.  Do you have a
> suggested way to identify the pci_dev of the device we need in that path without
> having to simply iterate over every device in that scope?

Hmkay, looks like this is a non-trivial problem. Here is what I suggest:
Keep the early-quirk as in your current patch. But add a function to
drivers/iommu/irq_remapping.c to disable irq-remapping and export that
function via the header-file arch/x86/include/asm/irq_remapping.h. Use
that function in the quirk instead of setting the disable-flag directly.
This way you don't have to include any private header file from iommu
code.


	Joerg
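
One possible shape of the accessor approach suggested above, as a self-contained user-space model. All function names here (mark_irq_remapping_broken, irq_remapping_is_broken, intel_remapping_check_model) are hypothetical; the thread does not name the function, and the two "files" are collapsed into one translation unit for illustration.

```c
#include <stdbool.h>

/* --- models drivers/iommu/irq_remapping.c --- */

/* The flag stays private to the iommu code. */
static bool irq_remap_broken;

/* Hypothetical setter, which would be declared in the arch header. */
void mark_irq_remapping_broken(void)
{
	irq_remap_broken = true;
}

/* Hypothetical reader for the iommu side of the model. */
bool irq_remapping_is_broken(void)
{
	return irq_remap_broken;
}

/* --- models arch/x86/kernel/early-quirks.c --- */

/* The quirk only sees the setter, never the flag or a private header. */
void intel_remapping_check_model(unsigned char revision)
{
	if (revision == 0x13)
		mark_irq_remapping_broken();
}
```

The design point is that the quirk code depends on one exported function rather than on a variable declared in a driver-private header, so the `#include "../drivers/iommu/irq_remapping.h"` line would no longer be needed.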


Neil Horman - April 16, 2013, 5:25 p.m.
On Tue, Apr 16, 2013 at 06:37:05PM +0200, Joerg Roedel wrote:
> On Tue, Apr 16, 2013 at 09:35:56AM -0400, Neil Horman wrote:
> > Actually, hold off on that last note, the intel iommu init code doesn't seem to
> > create any direct relationship between the set of IOMMUs and the pci_devs that
> > implement them.  In the intel_irq_remapping_supported path I can loop over each
> > dmar_drhd_unit, and interrogate each of the devices on its **devices list to see
> > if the device/vendor and revision ids match, but looking at the DRHD parsing
> > code, I'm not sure the iommu pci_dev is always going to be on that list.  That
> > seems like it's going to be pretty ugly in and of itself.  Do you have a
> > suggested way to identify the pci_dev of the device we need in that path without
> > having to simply iterate over every device in that scope?
> 
> Hmkay, looks like this is a non-trivial problem. Here is what I suggest:
> Keep the early-quirk as in your current patch. But add a function to
> drivers/iommu/irq_remapping.c to disable irq-remapping and export that
> function via the header-file arch/x86/include/asm/irq_remapping.h. Use
> that function in the quirk instead of setting the disable-flag directly.
> This way you don't have to include any private header file from iommu
> code.
> 
Ok, that seems reasonable. I'll have a new patch in a day or so.

Thanks!
Neil


Patch

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 3755ef4..ef4ac6c 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -18,6 +18,7 @@ 
 #include <asm/apic.h>
 #include <asm/iommu.h>
 #include <asm/gart.h>
+#include "../drivers/iommu/irq_remapping.h"
 
 static void __init fix_hypertransport_config(int num, int slot, int func)
 {
@@ -192,6 +193,27 @@  static void __init ati_bugs_contd(int num, int slot, int func)
 }
 #endif
 
+#ifdef CONFIG_IRQ_REMAP
+static void __init intel_remapping_check(int num, int slot, int func)
+{
+	u8 revision;
+
+	revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID);
+
+	/*
+	 * Revision 0x13 of this chipset supports irq remapping
+	 * but has an erratum that breaks its behavior, flag it as such
+	 */
+	if (revision == 0x13)
+		irq_remap_broken = 1;
+
+}
+#else
+static void __init intel_remapping_check(int num, int slot, int func)
+{
+}
+#endif
+
 #define QFLAG_APPLY_ONCE 	0x1
 #define QFLAG_APPLIED		0x2
 #define QFLAG_DONE		(QFLAG_APPLY_ONCE|QFLAG_APPLIED)
@@ -221,6 +243,10 @@  static struct chipset early_qrk[] __initdata = {
 	  PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
 	{ PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
 	  PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
+	{ PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST,
+	  PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
+	{ PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
+	  PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
 	{}
 };
 
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index f3b8f23..5b19b2d 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -524,6 +524,16 @@  static int __init intel_irq_remapping_supported(void)
 
 	if (disable_irq_remap)
 		return 0;
+	if (irq_remap_broken) {
+		WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND,
+			   "This system BIOS has enabled interrupt remapping\n"
+			   "on a chipset that contains an erratum making that\n"
+			   "feature unstable.  To maintain system stability\n"
+			   "interrupt remapping is being disabled.  Please\n"
+			   "contact your BIOS vendor for an update\n");
+		disable_irq_remap = 1;
+		return 0;
+	}
 
 	if (!dmar_ir_support())
 		return 0;
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index d56f8c1..04d975f 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -19,6 +19,7 @@ 
 int irq_remapping_enabled;
 
 int disable_irq_remap;
+int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index ecb6376..90c4dae 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -32,6 +32,7 @@  struct pci_dev;
 struct msi_msg;
 
 extern int disable_irq_remap;
+extern int irq_remap_broken;
 extern int disable_sourceid_checking;
 extern int no_x2apic_optout;
 extern int irq_remapping_enabled;
@@ -89,6 +90,7 @@  extern struct irq_remap_ops amd_iommu_irq_ops;
 
 #define irq_remapping_enabled 0
 #define disable_irq_remap     1
+#define irq_remap_broken      0
 
 #endif /* CONFIG_IRQ_REMAP */
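
How the two new early_qrk entries would select a device can be sketched with a simplified table walk. The matching loop of early-quirks.c is not shown in this thread, so the comparison logic below (exact vendor match, device match or wildcard, class compared under class_mask) is an assumption, and quirk_entry, run_quirks() and irq_remap_broken_model are hypothetical names. The numeric values 0x8086, 0x0600 and 0x06 are the usual values of PCI_VENDOR_ID_INTEL, PCI_CLASS_BRIDGE_HOST and PCI_BASE_CLASS_BRIDGE.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffu  /* stand-in for PCI_ANY_ID in this 16-bit model */

struct quirk_entry {
	uint16_t vendor, device, dev_class, class_mask;
	void (*fn)(uint8_t revision);
};

int irq_remap_broken_model;   /* models the irq_remap_broken flag */

/* Models intel_remapping_check(): only revision 0x13 is flagged. */
static void remapping_check(uint8_t revision)
{
	if (revision == 0x13)
		irq_remap_broken_model = 1;
}

/* The two entries added by the patch, with constants written out. */
static const struct quirk_entry quirks[] = {
	{ 0x8086, 0x3403, 0x0600, 0x06, remapping_check },
	{ 0x8086, 0x3406, 0x0600, 0x06, remapping_check },
};

/* Runs every matching quirk; returns how many entries matched. */
int run_quirks(uint16_t vendor, uint16_t device, uint16_t dev_class,
	       uint8_t revision)
{
	int applied = 0;

	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		const struct quirk_entry *q = &quirks[i];

		if (q->vendor != vendor)
			continue;
		if (q->device != ANY_ID && q->device != device)
			continue;
		if ((q->dev_class ^ dev_class) & q->class_mask)
			continue;   /* class differs under the mask */
		q->fn(revision);
		applied++;
	}
	return applied;
}
```

In this model, a matching host bridge at any other revision still runs the check function but leaves the flag clear, mirroring the revision test in the patch.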