From patchwork Wed Jul 26 05:59:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kai-Heng Feng X-Patchwork-Id: 1812970 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=FpnXUhpJ; dkim-atps=neutral Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4R9jtR1tMSz1yZv for ; Wed, 26 Jul 2023 16:00:31 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1qOXZR-0003uN-Rr; Wed, 26 Jul 2023 06:00:25 +0000 Received: from smtp-relay-canonical-1.internal ([10.131.114.174] helo=smtp-relay-canonical-1.canonical.com) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1qOXZL-0003ki-KB for kernel-team@lists.ubuntu.com; Wed, 26 Jul 2023 06:00:19 +0000 Received: from localhost.localdomain (unknown [10.101.196.174]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-1.canonical.com (Postfix) with ESMTPSA id 2F04642A9B for ; Wed, 26 Jul 2023 06:00:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1690351219; bh=GaV4mvus0YD8Nqt/4gAtEtfdxq/JwNngg2lly4JtNps=; h=From:To:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=FpnXUhpJO8jRXxnkpFzWZFTZWeLBUtqq7JUhxmvslMFN9xtX73Y4qM8XyvUao9tgX e7AsdgRj0ceLgEkfDUA2iPpO7fTItEr3EZahXysc/IJCFBcTKdCTANK6t7L2ah189f xQPugOJSgUq9QzoeWUV8ngdJZ7I87FwknKw9+VB80q69CG1sAFduccSnWj7RdTa9sI u2i3937Fh8/1DyeqWEw1StMszb8bKq6lVLtdyStkCU+dyxzLs2MRd4pWQzYxerrgN3 ANpLvnHGKt9coAzB7X91SLRxhpO0bPGNLa3lx0L9KeJotu4diHFVZJy9KFLsxOP3dA N3sJ/lzA/aNiA== From: Kai-Heng Feng To: kernel-team@lists.ubuntu.com Subject: [L/Unstable] [PATCH 6/6] EDAC/i10nm: Skip the absent memory controllers Date: Wed, 26 Jul 2023 13:59:17 +0800 Message-Id: <20230726055917.3291633-7-kai.heng.feng@canonical.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230726055917.3291633-1-kai.heng.feng@canonical.com> References: <20230726055917.3291633-1-kai.heng.feng@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Qiuxu Zhuo BugLink: https://bugs.launchpad.net/bugs/2028746 Some Sapphire Rapids workstations' absent memory controllers still appear as PCIe devices that fool the i10nm_edac driver and result in "shift exponent -66 is negative" call traces from skx_get_dimm_info(). Skip the absent memory controllers to avoid the call traces. Reported-by: Kai-Heng Feng Closes: https://lore.kernel.org/linux-edac/CAAd53p41Ku1m1rapeqb1xtD+kKuk+BaUW=dumuoF0ZO3GhFjFA@mail.gmail.com/T/#m5de16dce60a8c836ec235868c7c16e3fefad0cc2 Tested-by: Kai-Heng Feng Reported-by: Koba Ko Closes: https://lore.kernel.org/linux-edac/SA1PR11MB71305B71CCCC3D9305835202892AA@SA1PR11MB7130.namprd11.prod.outlook.com/T/#t Tested-by: Koba Ko Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Qiuxu Zhuo Signed-off-by: Tony Luck Link: https://lore.kernel.org/r/20230710013232.59712-1-qiuxu.zhuo@intel.com (cherry picked from commit c545f5e412250555bd4e717d062b117f20bab418 linux-next) Signed-off-by: Kai-Heng Feng --- drivers/edac/i10nm_base.c | 54 +++++++++++++++++++++++++++++++++++---- 1 file changed, 49 insertions(+), 5 deletions(-) diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c index 46034310b78e..7ed98b85aa90 100644 --- a/drivers/edac/i10nm_base.c +++ b/drivers/edac/i10nm_base.c @@ -622,13 +622,49 @@ static struct pci_dev *get_ddr_munit(struct skx_dev *d, int i, u32 *offset, unsi return mdev; } +/** + * i10nm_imc_absent() - Check whether the memory controller @imc is absent + * + * @imc : The pointer to the structure of memory controller EDAC device. + * + * RETURNS : true if the memory controller EDAC device is absent, false otherwise. + */ +static bool i10nm_imc_absent(struct skx_imc *imc) +{ + u32 mcmtr; + int i; + + switch (res_cfg->type) { + case SPR: + for (i = 0; i < res_cfg->ddr_chan_num; i++) { + mcmtr = I10NM_GET_MCMTR(imc, i); + edac_dbg(1, "ch%d mcmtr reg %x\n", i, mcmtr); + if (mcmtr != ~0) + return false; + } + + /* + * Some workstations' absent memory controllers still + * appear as PCIe devices, misleading the EDAC driver. + * By observing that the MMIO registers of these absent + * memory controllers consistently hold the value of ~0. + * + * We identify a memory controller as absent by checking + * if its MMIO register "mcmtr" == ~0 in all its channels. + */ + return true; + default: + return false; + } +} + static int i10nm_get_ddr_munits(void) { struct pci_dev *mdev; void __iomem *mbase; unsigned long size; struct skx_dev *d; - int i, j = 0; + int i, lmc, j = 0; u32 reg, off; u64 base; @@ -654,7 +690,7 @@ static int i10nm_get_ddr_munits(void) edac_dbg(2, "socket%d mmio base 0x%llx (reg 0x%x)\n", j++, base, reg); - for (i = 0; i < res_cfg->ddr_imc_num; i++) { + for (lmc = 0, i = 0; i < res_cfg->ddr_imc_num; i++) { mdev = get_ddr_munit(d, i, &off, &size); if (i == 0 && !mdev) { @@ -664,8 +700,6 @@ static int i10nm_get_ddr_munits(void) if (!mdev) continue; - d->imc[i].mdev = mdev; - edac_dbg(2, "mc%d mmio base 0x%llx size 0x%lx (reg 0x%x)\n", i, base + off, size, reg); @@ -676,7 +710,17 @@ static int i10nm_get_ddr_munits(void) return -ENODEV; } - d->imc[i].mbase = mbase; + d->imc[lmc].mbase = mbase; + if (i10nm_imc_absent(&d->imc[lmc])) { + pci_dev_put(mdev); + iounmap(mbase); + d->imc[lmc].mbase = NULL; + edac_dbg(2, "Skip absent mc%d\n", i); + continue; + } else { + d->imc[lmc].mdev = mdev; + lmc++; + } } }