From patchwork Thu Nov 8 14:38:50 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 994914
From: Mauricio Faria de Oliveira
To: kernel-team@lists.ubuntu.com
Cc: gpiccoli@canonical.com
Subject: [SRU X][PATCH v2 2/3] UBUNTU: SAUCE: x86/quirks: Add parameter to clear MSIs early on boot
Date: Thu, 8 Nov 2018 12:38:50 -0200
Message-Id: <20181108143851.15758-3-mfo@canonical.com>
In-Reply-To: <20181108143851.15758-1-mfo@canonical.com>
References: <20181108143851.15758-1-mfo@canonical.com>
X-Mailer: git-send-email 2.17.1
List-Id: Kernel team discussions

From: "Guilherme G. Piccoli"

BugLink: https://bugs.launchpad.net/bugs/1797990

We observed a kdump failure in x86 that was narrowed down to MSI irq storm
coming from a PCI network device.
The bug manifests as a lack of progress in the boot process of the kdump
kernel, and a flood of kernel messages like:

[...]
[ 342.265294] do_IRQ: 0.155 No irq handler for vector
[ 342.266916] do_IRQ: 0.155 No irq handler for vector
[ 347.258422] do_IRQ: 14053260 callbacks suppressed
[...]

The root cause of the issue is that the kexec process of the kdump kernel
does not ensure that PCI devices are reset or that MSI capabilities are
disabled, so a PCI adapter can generate a huge number of IRQs that consume
all of the CPU's processing time (especially since the kdump kernel is
usually restricted to a single CPU).

This patch implements the kernel parameter "pci=clearmsi" to clear the
MSI/MSI-X enable bits in the Message Control register of all PCI devices
during early boot, thus preventing potential issues in the kexec'ed
kernel. The PCI spec also requires this: those enable bits must be 0 after
a system reset (see PCI Local Bus spec sections 6.8.1.3 and 6.8.2.3).

Suggested-by: Dan Streetman
Suggested-by: Gavin Shan
Signed-off-by: Guilherme G. Piccoli
[mfo: backport to ubuntu-xenial:
 - different path for Documentation/.../kernel-parameters.txt
 - update context lines in pci-direct.h and early-quirks.c]
Signed-off-by: Mauricio Faria de Oliveira
---
 Documentation/kernel-parameters.txt |  6 ++++++
 arch/x86/include/asm/pci-direct.h   |  1 +
 arch/x86/kernel/early-quirks.c      | 32 +++++++++++++++++++++++++++++
 arch/x86/pci/common.c               |  4 ++++
 4 files changed, 43 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index cffe6b0eaeb2..e4ed78be15f8 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2867,6 +2867,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 		nomsi		[MSI] If the PCI_MSI kernel config parameter is
 				enabled, this kernel boot option can be used to
 				disable the use of MSI interrupts system-wide.
+		clearmsi	[X86] Clears MSI/MSI-X enable bits early in boot
+				time in order to avoid issues like adapters
+				screaming irqs and preventing boot progress.
+				Also, it enforces the PCI Local Bus spec
+				rule that those bits should be 0 in system reset
+				events (useful for kexec/kdump cases).
 		noioapicquirk	[APIC] Disable all boot interrupt quirks.
 				Safety option to keep boot IRQs enabled. This
 				should never be necessary.
diff --git a/arch/x86/include/asm/pci-direct.h b/arch/x86/include/asm/pci-direct.h
index c5a8edf2dc8b..4af4c656a4cc 100644
--- a/arch/x86/include/asm/pci-direct.h
+++ b/arch/x86/include/asm/pci-direct.h
@@ -14,6 +14,7 @@ extern void write_pci_config(u8 bus, u8 slot, u8 func, u8 offset, u32 val);
 extern void write_pci_config_byte(u8 bus, u8 slot, u8 func, u8 offset, u8 val);
 extern void write_pci_config_16(u8 bus, u8 slot, u8 func, u8 offset, u16 val);
 
+extern unsigned int pci_early_clear_msi;
 extern int early_pci_allowed(void);
 
 extern unsigned int pci_early_dump_regs;
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index a257d6077d1b..22e8e88807d8 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -27,6 +27,37 @@
 #include
 #include
 
+static void __init early_pci_clear_msi(int bus, int slot, int func)
+{
+	int pos;
+	u16 ctrl;
+
+	if (likely(!pci_early_clear_msi))
+		return;
+
+	pr_info_once("Clearing MSI/MSI-X enable bits early in boot (quirk)\n");
+
+	pos = pci_early_find_cap(bus, slot, func, PCI_CAP_ID_MSI);
+	if (pos) {
+		ctrl = read_pci_config_16(bus, slot, func, pos + PCI_MSI_FLAGS);
+		ctrl &= ~PCI_MSI_FLAGS_ENABLE;
+		write_pci_config_16(bus, slot, func, pos + PCI_MSI_FLAGS, ctrl);
+
+		/* Read again to flush previous write */
+		ctrl = read_pci_config_16(bus, slot, func, pos + PCI_MSI_FLAGS);
+	}
+
+	pos = pci_early_find_cap(bus, slot, func, PCI_CAP_ID_MSIX);
+	if (pos) {
+		ctrl = read_pci_config_16(bus, slot, func, pos + PCI_MSIX_FLAGS);
+		ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
+		write_pci_config_16(bus, slot, func, pos + PCI_MSIX_FLAGS, ctrl);
+
+		/* Read again to flush previous write */
+		ctrl = read_pci_config_16(bus, slot, func, pos + PCI_MSIX_FLAGS);
+	}
+}
+
 #define dev_err(msg) pr_err("pci 0000:%02x:%02x.%d: %s", bus, slot, func, msg)
 
 static void __init fix_hypertransport_config(int num, int slot, int func)
@@ -701,6 +732,7 @@ static struct chipset early_qrk[] __initdata = {
 	  PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
 	{ PCI_VENDOR_ID_BROADCOM, 0x4331,
 	  PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, 0, apple_airport_reset},
+	{ PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, early_pci_clear_msi},
 	{}
 };
 
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 8fd6f44aee83..c049ccca03a5 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -35,6 +35,7 @@ int noioapicreroute = 1;
 #endif
 int pcibios_last_bus = -1;
 unsigned long pirq_table_addr;
+unsigned int pci_early_clear_msi;
 const struct pci_raw_ops *__read_mostly raw_pci_ops;
 const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
 
@@ -621,6 +622,9 @@ char *__init pcibios_setup(char *str)
 	} else if (!strcmp(str, "skip_isa_align")) {
 		pci_probe |= PCI_CAN_SKIP_ISA_ALIGN;
 		return NULL;
+	} else if (!strcmp(str, "clearmsi")) {
+		pci_early_clear_msi = 1;
+		return NULL;
 	} else if (!strcmp(str, "noioapicquirk")) {
 		noioapicquirk = 1;
 		return NULL;
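
For readers without a kernel tree at hand, the userspace sketch below mimics what early_pci_clear_msi() does, but against an in-memory copy of one device's 256-byte config space rather than real hardware. It is not part of the patch. The helper names (cfg_read16, cfg_write16, find_cap, clear_msi_enable, msi_sketch_selftest) and the sample register values are assumptions made for illustration; the offsets and bit masks are meant to mirror the PCI_* constants in include/uapi/linux/pci_regs.h. The sketch skips the Status register capability-list check and the read-back that the kernel code uses to flush the write, since neither matters for a plain byte array.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of kernel constants (names here are local to this sketch). */
#define CAP_LIST_PTR    0x34    /* PCI_CAPABILITY_LIST */
#define CAP_ID_MSI      0x05    /* PCI_CAP_ID_MSI */
#define CAP_ID_MSIX     0x11    /* PCI_CAP_ID_MSIX */
#define MSG_CTRL_OFFSET 2       /* PCI_MSI_FLAGS and PCI_MSIX_FLAGS */
#define MSI_ENABLE      0x0001  /* PCI_MSI_FLAGS_ENABLE */
#define MSIX_ENABLE     0x8000  /* PCI_MSIX_FLAGS_ENABLE */

/* Little-endian 16-bit accessors over a config-space byte array. */
static uint16_t cfg_read16(const uint8_t *cfg, uint8_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, uint8_t off, uint16_t val)
{
	cfg[off] = (uint8_t)(val & 0xff);
	cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Walk the capability list; return the capability offset, or 0 if absent. */
static uint8_t find_cap(const uint8_t *cfg, uint8_t cap_id)
{
	uint8_t pos = cfg[CAP_LIST_PTR] & 0xfc;
	int ttl = 48;	/* bound the walk in case the list loops */

	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xfc;	/* next-capability pointer */
	}
	return 0;
}

/* Clear the MSI and MSI-X enable bits, leaving all other bits untouched. */
static void clear_msi_enable(uint8_t *cfg)
{
	uint8_t pos;
	uint16_t ctrl;

	pos = find_cap(cfg, CAP_ID_MSI);
	if (pos) {
		ctrl = cfg_read16(cfg, pos + MSG_CTRL_OFFSET);
		ctrl &= (uint16_t)~MSI_ENABLE;
		cfg_write16(cfg, pos + MSG_CTRL_OFFSET, ctrl);
	}

	pos = find_cap(cfg, CAP_ID_MSIX);
	if (pos) {
		ctrl = cfg_read16(cfg, pos + MSG_CTRL_OFFSET);
		ctrl &= (uint16_t)~MSIX_ENABLE;
		cfg_write16(cfg, pos + MSG_CTRL_OFFSET, ctrl);
	}
}

/* Build a fake device with both capabilities enabled and check the result. */
static void msi_sketch_selftest(void)
{
	uint8_t cfg[256] = {0};

	cfg[CAP_LIST_PTR] = 0x50;
	cfg[0x50] = CAP_ID_MSI;
	cfg[0x51] = 0x70;			/* next capability */
	cfg_write16(cfg, 0x52, 0x0081);		/* enabled, 64-bit capable */
	cfg[0x70] = CAP_ID_MSIX;
	cfg[0x71] = 0x00;			/* end of list */
	cfg_write16(cfg, 0x72, 0x8007);		/* enabled, 8 table entries */

	clear_msi_enable(cfg);

	assert(cfg_read16(cfg, 0x52) == 0x0080);
	assert(cfg_read16(cfg, 0x72) == 0x0007);
}
```

Calling msi_sketch_selftest() from a main() exercises both paths. Only the enable bit changes in each Message Control word, which is why the kernel code uses a read-modify-write instead of writing zero.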