[SRU,C,v2,0/3] Add kernel parameter 'pci=clearmsi' to clear MSI(X)s early on boot

Message ID 20181108162638.17137-1-mfo@canonical.com

Mauricio Faria de Oliveira Nov. 8, 2018, 4:26 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1797990

[Changelog]
 * v2:
   - Reorder patch 1 as 3 to allow for the next change:
   - Gate the bus-scan differences with the cmdline option
     (patch 3 only).  Now all functional changes are gated.

[Impact]

 * A kexec/crash kernel might get stuck and fail to boot
   (for crash kernel, kdump fails to collect a crashdump)
   if a PCI device is buggy/stuck/looping and triggers a
   continuous flood of MSI(X) interrupts (that the kernel
   does not yet know about).

 * This fix made it possible to obtain crashdumps when
   debugging a heavy-load scenario, in which a heavily
   loaded network adapter wouldn't stop triggering MSI-X
   interrupts even after panic()->kdump kicked in.

 * This fix disables MSI(X) in all PCI devices on early
   boot (this is OK as it's (re-)enabled normally later)
   with a kernel cmdline parameter (disabled by default).
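
For illustration only, here is a minimal user-space sketch of what
"disabling MSI(X)" amounts to at the PCI configuration-space level. It is
not the code from the series: it models one function's 256-byte config
space as a byte array instead of using the kernel's early PCI accessors,
and the names clear_msi_enable_bits(), cfg_read16() and cfg_write16() are
hypothetical. The offsets and bit masks are the standard PCI ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI configuration-space offsets, capability IDs and bits. */
#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10
#define PCI_CAPABILITY_LIST   0x34
#define PCI_CAP_ID_MSI        0x05
#define PCI_CAP_ID_MSIX       0x11
#define PCI_MSI_FLAGS         2       /* message control, relative to cap */
#define PCI_MSI_FLAGS_ENABLE  0x0001
#define PCI_MSIX_FLAGS_ENABLE 0x8000

#define CFG_SIZE 256                  /* conventional config space size */

/* Little-endian 16-bit accessors over the modelled config space. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, unsigned off, uint16_t val)
{
    cfg[off] = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Walk the capability list of one function and clear the MSI and MSI-X
 * enable bits if they are set. Returns how many enable bits were cleared.
 */
int clear_msi_enable_bits(uint8_t *cfg)
{
    int cleared = 0;
    int ttl = 48;                     /* bound the walk on looping lists */
    unsigned pos;

    assert(cfg != NULL);

    /* No capability list advertised: nothing to do. */
    if (!(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
        return 0;

    pos = cfg[PCI_CAPABILITY_LIST] & ~3u;
    while (pos >= 0x40 && pos <= CFG_SIZE - 4 && ttl-- > 0) {
        uint8_t id = cfg[pos];
        uint16_t ctrl = cfg_read16(cfg, pos + PCI_MSI_FLAGS);
        uint16_t mask = 0;

        if (id == PCI_CAP_ID_MSI)
            mask = PCI_MSI_FLAGS_ENABLE;
        else if (id == PCI_CAP_ID_MSIX)
            mask = PCI_MSIX_FLAGS_ENABLE;

        if (ctrl & mask) {
            /* Clear only the enable bit; keep the other control bits. */
            cfg_write16(cfg, pos + PCI_MSI_FLAGS, (uint16_t)(ctrl & ~mask));
            cleared++;
        }
        pos = cfg[pos + 1] & ~3u;     /* next capability pointer */
    }
    return cleared;
}
```

Only the enable bits are touched, which is why a driver can (re-)enable
MSI(X) normally later in boot.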

[Test Case]

 * A synthetic test-case is not yet available; however,
   this particular system/workload triggered the problem
   consistently, and it was used for development/testing.

 * We'll update this bug once a synthetic test-case is
   available; we're working on patching QEMU for this.

 * $ dmesg | grep 'Clearing MSI'
   [    0.000000] Clearing MSI/MSI-X enable bits early in boot (quirk)

 * The output of 'dmesg -t | sort' was compared with the
   option disabled and enabled, in both boot & kexec modes,
   and only expected differences were found (MHz, PIDs, MIPS).

[Regression Potential] 

 * The potential area for regressions is early boot,
   particularly effects of applying quirks during PCI
   bus scan, which is changed/broader w/ these patches.

 * However, all quirks are applied based on PCI ID
   matching, so would only apply if actually targeting
   a new device.

 * Moreover, the new quirk is only applied based on
   a kernel cmdline parameter that is disabled by
   default, which further constrains when this
   is actually in effect.
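
As a rough sketch of the gating described above, the following user-space
C function checks whether an option such as 'clearmsi' is present in a
'pci=' parameter on a command-line string. It is an assumption-laden
illustration, not the parsing done by the series: pci_option_present() is
a hypothetical name, and the sketch only handles space-separated
parameters with comma-separated 'pci=' options (no quoting).

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if 'opt' appears as an exact, comma-separated token of any
 * "pci=" parameter in 'cmdline', and 0 otherwise.
 */
int pci_option_present(const char *cmdline, const char *opt)
{
    const char *p = cmdline;
    size_t optlen;

    assert(cmdline != NULL && opt != NULL);
    optlen = strlen(opt);

    while (*p) {
        size_t plen;

        while (*p == ' ')             /* skip separators */
            p++;
        plen = strcspn(p, " ");       /* length of this parameter */

        if (plen > 4 && strncmp(p, "pci=", 4) == 0) {
            const char *v = p + 4;
            const char *end = p + plen;

            while (v < end) {
                size_t tlen = strcspn(v, ", ");

                /* Exact match only: "clearmsix" must not match. */
                if (tlen == optlen && strncmp(v, opt, optlen) == 0)
                    return 1;
                v += tlen;
                if (v < end)
                    v++;              /* skip the comma */
            }
        }
        p += plen;
    }
    return 0;
}
```

With the option absent (the default), the function returns 0 and, in this
model, none of the new behavior would be taken.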

[Other Info]
 
 * The patch series is still under review/discussion
   upstream, but it's relatively important for Ubuntu
   users at this point, and after internal discussions
   we decided to submit it for SRU.

 * These are links to the linux-pci archive with the
   upstream patches [1, 2, 3]:

   [1] [PATCH 1/3] x86/quirks: Scan all busses for early PCI quirks
       https://lore.kernel.org/linux-pci/20181018183721.27467-1-gpiccoli@canonical.com/

   [2] [PATCH 2/3] x86/PCI: Export find_cap() to be used in early PCI code
       https://lore.kernel.org/linux-pci/20181018183721.27467-2-gpiccoli@canonical.com/

   [3] [PATCH 3/3] x86/quirks: Add parameter to clear MSIs early on boot
       https://lore.kernel.org/linux-pci/20181018183721.27467-3-gpiccoli@canonical.com/

Guilherme G. Piccoli (3):
  UBUNTU: SAUCE: x86/PCI: Export find_cap() to be used in early PCI code
  UBUNTU: SAUCE: x86/quirks: Add parameter to clear MSIs early on boot
  UBUNTU: SAUCE: x86/quirks: Scan all busses for early PCI quirks

 .../admin-guide/kernel-parameters.txt         |  6 +++
 arch/x86/include/asm/pci-direct.h             |  2 +
 arch/x86/kernel/aperture_64.c                 | 30 +-------------
 arch/x86/kernel/early-quirks.c                | 41 +++++++++++++++++++
 arch/x86/pci/common.c                         |  4 ++
 arch/x86/pci/early.c                          | 25 +++++++++++
 6 files changed, 80 insertions(+), 28 deletions(-)

Comments

Thadeu Lima de Souza Cascardo Nov. 8, 2018, 5:14 p.m. UTC | #1
Same considerations as for Xenial.
Khalid Elmously Nov. 12, 2018, 4:56 a.m. UTC | #2
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Khalid Elmously Nov. 12, 2018, 5:04 a.m. UTC | #3