Message ID | 20181108144636.15887-1-mfo@canonical.com |
---|---|
Headers | show |
Series | Add kernel parameter 'pci=clearmsi' to clear MSI(X)s early on boot | expand |
Same considerations as for Xenial.
On 2018-11-08 12:46:33 , Mauricio Faria de Oliveira wrote: > BugLink: https://bugs.launchpad.net/bugs/1797990 > > [Changelog] > * v2: > - Reorder patch 1 as 3 to allow for the next change: > - Gate the bus-scan differences with the cmdline option > (patch 3 only). Now all functional changes are gated. > > [Impact] > > * A kexec/crash kernel might get stuck and fail to boot > (for crash kernel, kdump fails to collect a crashdump) > if a PCI device is buggy/stuck/looping and triggers a > continuous flood of MSI(X) interrupts (that the kernel > does not yet know about). > > * This fix allowed to obtain crashdumps when debugging a > heavy-load scenario, in which a (heavy-loaded) network > adapter wouldn't stop triggering MSI-X interrupts ever > after panic()->kdump kicked in. > > * This fix disables MSI(X) in all PCI devices on early > boot (this is OK as it's (re-)enabled normally later) > with a kernel cmdline parameter (disabled by default). > > [Test Case] > > * A synthetic test-case is not yet available, however, > this particular system/workload triggered the problem > consistently, and it was used for development/testing. > > * We'll update this bug once a synthetic test-case is > available; we're working on patching QEMU for this. > > * $ dmesg | grep 'Clearing MSI' > [ 0.000000] Clearing MSI/MSI-X enable bits early in boot (quirk) > > * The comparison of 'dmesg -t | sort' has been reviewed > between option disabled/enabled on boot & kexec modes, > and only expected differences found (MHz, PIDs, MIPS). > > [Regression Potential] > > * The potential area for regressions is early boot, > particularly effects of applying quirks during PCI > bus scan, which is changed/broader w/ these patches. > > * However, all quirks are applied based on PCI ID > matching, so would only apply if actually targeting > a new device. > > * Moreover, the new quirk is only applied based on > a kernel cmdline parameter that is disabled by > default, which constraints even more when this > is actually in effect. > > [Other Info] > > * The patch series is still under review/discussion > upstream, but it's relatively important for Ubuntu > users at this point, and after internal discussions > we decided to submit it for SRU. > > * These are links to the linux-pci archive with the > patches [1, 2, 3] > > [1] [PATCH 1/3] x86/quirks: Scan all busses for early PCI quirks > https://lore.kernel.org/linux-pci/20181018183721.27467-1-gpiccoli@canonical.com/ > > [2] [PATCH 2/3] x86/PCI: Export find_cap() to be used in early PCI code > https://lore.kernel.org/linux-pci/20181018183721.27467-2-gpiccoli@canonical.com/ > > [3] [PATCH 3/3] x86/quirks: Add parameter to clear MSIs early on boot > https://lore.kernel.org/linux-pci/20181018183721.27467-3-gpiccoli@canonical.com/ > > Guilherme G. Piccoli (3): > UBUNTU: SAUCE: x86/PCI: Export find_cap() to be used in early PCI code > UBUNTU: SAUCE: x86/quirks: Add parameter to clear MSIs early on boot > UBUNTU: SAUCE: x86/quirks: Scan all busses for early PCI quirks > > .../admin-guide/kernel-parameters.txt | 6 +++ > arch/x86/include/asm/pci-direct.h | 2 + > arch/x86/kernel/aperture_64.c | 30 +------------- > arch/x86/kernel/early-quirks.c | 41 +++++++++++++++++++ > arch/x86/pci/common.c | 4 ++ > arch/x86/pci/early.c | 25 +++++++++++ > 6 files changed, 80 insertions(+), 28 deletions(-) > Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
On 2018-11-08 12:46:33 , Mauricio Faria de Oliveira wrote: > BugLink: https://bugs.launchpad.net/bugs/1797990 > > [Changelog] > * v2: > - Reorder patch 1 as 3 to allow for the next change: > - Gate the bus-scan differences with the cmdline option > (patch 3 only). Now all functional changes are gated. > > [Impact] > > * A kexec/crash kernel might get stuck and fail to boot > (for crash kernel, kdump fails to collect a crashdump) > if a PCI device is buggy/stuck/looping and triggers a > continuous flood of MSI(X) interrupts (that the kernel > does not yet know about). > > * This fix allowed to obtain crashdumps when debugging a > heavy-load scenario, in which a (heavy-loaded) network > adapter wouldn't stop triggering MSI-X interrupts ever > after panic()->kdump kicked in. > > * This fix disables MSI(X) in all PCI devices on early > boot (this is OK as it's (re-)enabled normally later) > with a kernel cmdline parameter (disabled by default). > > [Test Case] > > * A synthetic test-case is not yet available, however, > this particular system/workload triggered the problem > consistently, and it was used for development/testing. > > * We'll update this bug once a synthetic test-case is > available; we're working on patching QEMU for this. > > * $ dmesg | grep 'Clearing MSI' > [ 0.000000] Clearing MSI/MSI-X enable bits early in boot (quirk) > > * The comparison of 'dmesg -t | sort' has been reviewed > between option disabled/enabled on boot & kexec modes, > and only expected differences found (MHz, PIDs, MIPS). > > [Regression Potential] > > * The potential area for regressions is early boot, > particularly effects of applying quirks during PCI > bus scan, which is changed/broader w/ these patches. > > * However, all quirks are applied based on PCI ID > matching, so would only apply if actually targeting > a new device. > > * Moreover, the new quirk is only applied based on > a kernel cmdline parameter that is disabled by > default, which constraints even more when this > is actually in effect. > > [Other Info] > > * The patch series is still under review/discussion > upstream, but it's relatively important for Ubuntu > users at this point, and after internal discussions > we decided to submit it for SRU. > > * These are links to the linux-pci archive with the > patches [1, 2, 3] > > [1] [PATCH 1/3] x86/quirks: Scan all busses for early PCI quirks > https://lore.kernel.org/linux-pci/20181018183721.27467-1-gpiccoli@canonical.com/ > > [2] [PATCH 2/3] x86/PCI: Export find_cap() to be used in early PCI code > https://lore.kernel.org/linux-pci/20181018183721.27467-2-gpiccoli@canonical.com/ > > [3] [PATCH 3/3] x86/quirks: Add parameter to clear MSIs early on boot > https://lore.kernel.org/linux-pci/20181018183721.27467-3-gpiccoli@canonical.com/ > > Guilherme G. Piccoli (3): > UBUNTU: SAUCE: x86/PCI: Export find_cap() to be used in early PCI code > UBUNTU: SAUCE: x86/quirks: Add parameter to clear MSIs early on boot > UBUNTU: SAUCE: x86/quirks: Scan all busses for early PCI quirks > > .../admin-guide/kernel-parameters.txt | 6 +++ > arch/x86/include/asm/pci-direct.h | 2 + > arch/x86/kernel/aperture_64.c | 30 +------------- > arch/x86/kernel/early-quirks.c | 41 +++++++++++++++++++ > arch/x86/pci/common.c | 4 ++ > arch/x86/pci/early.c | 25 +++++++++++ > 6 files changed, 80 insertions(+), 28 deletions(-) > > -- > 2.17.1 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team