From patchwork Mon Nov 13 10:13:34 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 837410
From: Balbir Singh
To: alistair@popple.id.au
Cc: skiboot@lists.ozlabs.org, arbab@linux.vnet.ibm.com
Date: Mon, 13 Nov 2017 21:13:34 +1100
Message-Id: <20171113101334.31026-1-bsingharora@gmail.com>
X-Mailer: git-send-email 2.13.6
Subject: [Skiboot] [RFC] hw/npu2: Implement logging HMI actions
List-Id: Mailing list for skiboot development

Log HMI errors as a starting step. We'll need to get Linux to
deduce and interpret the HMI event. The expectation is that we'll
add a corresponding handler in the HMI handling code to deal with
the various errors that have not been handled.

Signed-off-by: Balbir Singh
---
 core/hmi.c          | 84 ++++++++++++++++++++++++++++++++++++++++++++-
 include/npu2-regs.h | 18 ++++++++++
 include/opal-api.h  | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 199 insertions(+), 1 deletion(-)

diff --git a/core/hmi.c b/core/hmi.c
index 07c08462..b450032c 100644
--- a/core/hmi.c
+++ b/core/hmi.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -517,6 +518,87 @@ static void find_nx_checkstop_reason(int flat_chip_id,
 	*event_generated = true;
 }
 
+static void find_npu2_checkstop_reason(int flat_chip_id,
+				      struct OpalHMIEvent *hmi_evt,
+				      bool *event_generated)
+{
+	struct phb *phb;
+	struct npu *p = NULL;
+	int i;
+
+	uint64_t npu2_fir;
+	uint64_t npu2_fir_mask;
+	uint64_t npu2_fir_action0;
+	uint64_t npu2_fir_action1;
+	uint64_t npu2_fir_addr;
+	uint64_t npu2_fir_mask_addr;
+	uint64_t npu2_fir_action0_addr;
+	uint64_t npu2_fir_action1_addr;
+	uint64_t fatal_errors;
+
+	/* Only check for NPU errors if the chip has a NPU */
+	if (PVR_TYPE(mfspr(SPR_PVR)) != PVR_TYPE_P9)
+		return;
+
+	/* Find the NPU on the chip associated with the HMI. */
+	for_each_phb(phb) {
+		/* NOTE: if a chip ever has >1 NPU this will need adjusting */
+		if (dt_node_is_compatible(phb->dt_node, "ibm,power9-npu-pciex") &&
+		    (dt_get_chip_id(phb->dt_node) == flat_chip_id)) {
+			p = phb_to_npu(phb);
+			break;
+		}
+	}
+
+	/* If we didn't find a NPU on the chip, it's not our checkstop. */
+	if (p == NULL)
+		return;
+
+	npu2_fir_addr = NPU2_FIR_REGISTER_0;
+	npu2_fir_mask_addr = NPU2_FIR_REGISTER_0_MASK;
+	npu2_fir_action0_addr = NPU2_FIR_REGISTER_0_ACTION0;
+	npu2_fir_action1_addr = NPU2_FIR_REGISTER_0_ACTION1;
+
+	for (i = 0; i < NPU2_TOTAL_FIR_REGISTERS; i++) {
+		/* Read all the registers necessary to find a checkstop condition. */
+		if (xscom_read(flat_chip_id, npu2_fir_addr, &npu2_fir) ||
+			xscom_read(flat_chip_id, npu2_fir_mask_addr, &npu2_fir_mask) ||
+			xscom_read(flat_chip_id, npu2_fir_action0_addr, &npu2_fir_action0) ||
+			xscom_read(flat_chip_id, npu2_fir_action1_addr, &npu2_fir_action1)) {
+			prerror("HMI: Couldn't read NPU FIR register%d with XSCOM\n", i);
+			continue;
+		}
+
+		fatal_errors = npu2_fir & ~npu2_fir_mask & npu2_fir_action0 & npu2_fir_action1;
+
+		/* If there's no errors, we don't need to do anything. */
+		if (!fatal_errors)
+			continue;
+
+		prlog(PR_ERR, "NPU: FIR#%d FIR 0x%016llx mask 0x%016llx\n",
+				i, npu2_fir, npu2_fir_mask);
+		prlog(PR_ERR, "NPU: ACTION0 0x%016llx, ACTION1 0x%016llx\n",
+				npu2_fir_action0, npu2_fir_action1);
+
+		/* Can't do a fence yet, we are just logging fir information for now */
+		npu2_fir_addr += NPU2_FIR_OFFSET;
+		npu2_fir_mask_addr += NPU2_FIR_OFFSET;
+		npu2_fir_action0_addr += NPU2_FIR_OFFSET;
+		npu2_fir_action1_addr += NPU2_FIR_OFFSET;
+
+	}
+
+	/* Set up the HMI event */
+	hmi_evt->severity = OpalHMI_SEV_WARNING;
+	hmi_evt->type = OpalHMI_ERROR_MALFUNC_ALERT;
+	hmi_evt->u.xstop_error.xstop_type = CHECKSTOP_TYPE_NPU;
+	hmi_evt->u.xstop_error.u.chip_id = flat_chip_id;
+
+	/* Marking the event as recoverable so that we don't crash */
+	queue_hmi_event(hmi_evt, 1);
+	*event_generated = true;
+}
+
 static void find_npu_checkstop_reason(int flat_chip_id,
 				      struct OpalHMIEvent *hmi_evt,
 				      bool *event_generated)
@@ -532,7 +614,7 @@ static void find_npu_checkstop_reason(int flat_chip_id,
 
 	/* Only check for NPU errors if the chip has a NPU */
 	if (PVR_TYPE(mfspr(SPR_PVR)) != PVR_TYPE_P8NVL)
-		return;
+		return find_npu2_checkstop_reason(flat_chip_id, hmi_evt, event_generated);
 
 	/* Find the NPU on the chip associated with the HMI. */
 	for_each_phb(phb) {
diff --git a/include/npu2-regs.h b/include/npu2-regs.h
index fd1367b7..b7522f4a 100644
--- a/include/npu2-regs.h
+++ b/include/npu2-regs.h
@@ -466,4 +466,22 @@ void npu2_write_mask(struct npu2 *p, uint64_t reg, uint64_t val, uint64_t mask);
 
 #define NPU2_DD1_MISC_SCOM_IND_SCOM_DATA	0x38f
 #define NPU2_MISC_SCOM_IND_SCOM_DATA	0x68f
+#define NPU2_FIR_REGISTER_0	0x0000000005013C00
+#define NPU2_FIR_REGISTER_0_MASK	0x0000000005013C03
+#define NPU2_FIR_REGISTER_0_ACTION0	0x0000000005013C06
+#define NPU2_FIR_REGISTER_0_ACTION1	0x0000000005013C07
+
+#define NPU2_FIR_REGISTER_1	0x0000000005013C40
+#define NPU2_FIR_REGISTER_1_MASK	0x0000000005013C43
+#define NPU2_FIR_REGISTER_1_ACTION0	0x0000000005013C46
+#define NPU2_FIR_REGISTER_1_ACTION1	0x0000000005013C47
+
+#define NPU2_FIR_REGISTER_2	0x0000000005013C80
+#define NPU2_FIR_REGISTER_2_MASK	0x0000000005013C83
+#define NPU2_FIR_REGISTER_2_ACTION0	0x0000000005013C86
+#define NPU2_FIR_REGISTER_2_ACTION1	0x0000000005013C87
+
+#define NPU2_TOTAL_FIR_REGISTERS	3
+#define NPU2_FIR_OFFSET	0x40
+
 #endif /* __NPU2_REGS_H */
diff --git a/include/opal-api.h b/include/opal-api.h
index 0bc036ed..cef88f68 100644
--- a/include/opal-api.h
+++ b/include/opal-api.h
@@ -725,6 +725,104 @@ enum OpalHMI_NestAccelXstopReason {
 	NX_CHECKSTOP_PBI_ISN_UE = 0x00002000,
 };
 
+/*
+ * Can't use enums for 64 bit values, use #defines
+ */
+#define NPU2_CHECKSTOP_REG0_NTL_ARRAY_CE	PPC_BIT(0)
+#define NPU2_CHECKSTOP_REG0_NTL_ARRAY_HDR_CE	PPC_BIT(1)
+#define NPU2_CHECKSTOP_REG0_NTL_ARRAY_DATA_UE	PPC_BIT(2)
+#define NPU2_CHECKSTOP_REG0_NTL_NVL_FLIT_PERR	PPC_BIT(3)
+#define NPU2_CHECKSTOP_REG0_NTL_NVL_DATA_PERR	PPC_BIT(4)
+#define NPU2_CHECKSTOP_REG0_NTL_NVL_PKT_MALFOR	PPC_BIT(5)
+#define NPU2_CHECKSTOP_REG0_NTL_NVL_PKT_UNSUPPORTED	PPC_BIT(6)
+#define NPU2_CHECKSTOP_REG0_NTL_NVL_CONFIG_ERR	PPC_BIT(7)
+#define NPU2_CHECKSTOP_REG0_NTL_NVL_CRC_ERR	PPC_BIT(8)
+#define NPU2_CHECKSTOP_REG0_NTL_PRI_ERR	PPC_BIT(9)
+#define NPU2_CHECKSTOP_REG0_NTL_LOGIC_ERR	PPC_BIT(10)
+#define NPU2_CHECKSTOP_REG0_NTL_LMD_POISON	PPC_BIT(11)
+#define NPU2_CHECKSTOP_REG0_NTL_ARRAY_DATA_SUE	PPC_BIT(12)
+#define NPU2_CHECKSTOP_REG0_CTL_ARRAY_CE	PPC_BIT(13)
+#define NPU2_CHECKSTOP_REG0_CTL_PBUS_RECOV_ERR	PPC_BIT(14)
+#define NPU2_CHECKSTOP_REG0_CTL_REG_RING_ERR	PPC_BIT(15)
+#define NPU2_CHECKSTOP_REG0_CTL_MMIO_ST_DATA_UE	PPC_BIT(16)
+#define NPU2_CHECKSTOP_REG0_CTL_PEF	PPC_BIT(17)
+#define NPU2_CHECKSTOP_REG0_CTL_NVL_CFG_ERR	PPC_BIT(18)
+#define NPU2_CHECKSTOP_REG0_CTL_NVL_FATAL_ERR	PPC_BIT(19)
+#define NPU2_CHECKSTOP_REG0_RESERVED_1	PPC_BIT(20)
+#define NPU2_CHECKSTOP_REG0_CTL_ARRAY_UE	PPC_BIT(21)
+#define NPU2_CHECKSTOP_REG0_CTL_PBUS_PERR	PPC_BIT(22)
+#define NPU2_CHECKSTOP_REG0_CTL_PBUS_FATAL_ERR	PPC_BIT(23)
+#define NPU2_CHECKSTOP_REG0_CTL_PBUS_CONFIG_ERR	PPC_BIT(24)
+#define NPU2_CHECKSTOP_REG0_CTL_FWD_PROGRESS_ERR	PPC_BIT(25)
+#define NPU2_CHECKSTOP_REG0_CTL_LOGIC_ERR	PPC_BIT(26)
+#define NPU2_CHECKSTOP_REG0_DAT_DATA_BE_UE	PPC_BIT(29)
+#define NPU2_CHECKSTOP_REG0_DAT_DATA_BE_CE	PPC_BIT(30)
+#define NPU2_CHECKSTOP_REG0_DAT_DATA_BE_PERR	PPC_BIT(31)
+#define NPU2_CHECKSTOP_REG0_DAT_CREG_PERR	PPC_BIT(32)
+#define NPU2_CHECKSTOP_REG0_DAT_RTAG_PERR	PPC_BIT(33)
+#define NPU2_CHECKSTOP_REG0_DAT_STATE_PERR	PPC_BIT(34)
+#define NPU2_CHECKSTOP_REG0_DAT_LOGIC_ERR	PPC_BIT(35)
+#define NPU2_CHECKSTOP_REG0_DAT_DATA_BE_SUE	PPC_BIT(36)
+#define NPU2_CHECKSTOP_REG0_DAT_PBRX_SUE	PPC_BIT(37)
+#define NPU2_CHECKSTOP_REG0_XTS_INT	PPC_BIT(40)
+#define NPU2_CHECKSTOP_REG0_XTS_SRAM_CE	PPC_BIT(41)
+#define NPU2_CHECKSTOP_REG0_XTS_SRAM_UE	PPC_BIT(42)
+#define NPU2_CHECKSTOP_REG0_XTS_PROTOCOL_CE	PPC_BIT(43)
+#define NPU2_CHECKSTOP_REG0_XTS_PROTOCOL_UE	PPC_BIT(44)
+#define NPU2_CHECKSTOP_REG0_XTS_PBUS_PROTOCOL	PPC_BIT(45)
+
+#define NPU2_CHECKSTOP_REG1_NDL_BRK0_STALL	PPC_BIT(0)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK0_NOSTALL	PPC_BIT(1)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK1_STALL	PPC_BIT(2)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK1_NOSTALL	PPC_BIT(3)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK2_STALL	PPC_BIT(4)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK2_NOSTALL	PPC_BIT(5)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK3_STALL	PPC_BIT(6)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK3_NOSTALL	PPC_BIT(7)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK4_STALL	PPC_BIT(8)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK4_NOSTALL	PPC_BIT(9)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK5_STALL	PPC_BIT(10)
+#define NPU2_CHECKSTOP_REG1_NDL_BRK5_NOSTALL	PPC_BIT(11)
+#define NPU2_CHECKSTOP_REG1_MISC_REG_RING_ERR	PPC_BIT(12)
+#define NPU2_CHECKSTOP_REG1_MISC_INT_RA_PERR	PPC_BIT(13)
+#define NPU2_CHECKSTOP_REG1_MISC_DA_ADDR_PERR	PPC_BIT(14)
+#define NPU2_CHECKSTOP_REG1_MISC_CTRL_PERR	PPC_BIT(15)
+#define NPU2_CHECKSTOP_REG1_MISC_NMMU_ERR	PPC_BIT(16)
+#define NPU2_CHECKSTOP_REG1_ATS_TVT_ENTRY_INVALID	PPC_BIT(17)
+#define NPU2_CHECKSTOP_REG1_ATS_TVT_ADDR_RANGE_ERR	PPC_BIT(18)
+#define NPU2_CHECKSTOP_REG1_ATS_TCE_PAGE_ACCESS_CA_ERR	PPC_BIT(19)
+#define NPU2_CHECKSTOP_REG1_ATS_TCE_CACHE_MULT_HIT_ERR	PPC_BIT(20)
+#define NPU2_CHECKSTOP_REG1_ATS_TCE_PAGE_ACCESS_TW_ERR	PPC_BIT(21)
+#define NPU2_CHECKSTOP_REG1_ATS_TCE_REQ_TO_ERR	PPC_BIT(22)
+#define NPU2_CHECKSTOP_REG1_ATS_TCD_PERR	PPC_BIT(23)
+#define NPU2_CHECKSTOP_REG1_ATS_TDR_PERR	PPC_BIT(24)
+#define NPU2_CHECKSTOP_REG1_ATS_AT_EA_UE	PPC_BIT(25)
+#define NPU2_CHECKSTOP_REG1_ATS_AT_EA_CE	PPC_BIT(26)
+#define NPU2_CHECKSTOP_REG1_ATS_AT_TDRMEM_UE	PPC_BIT(27)
+#define NPU2_CHECKSTOP_REG1_ATS_AT_TDRMEM_CE	PPC_BIT(28)
+#define NPU2_CHECKSTOP_REG1_ATS_AT_RSPOUT_UE	PPC_BIT(29)
+#define NPU2_CHECKSTOP_REG1_ATS_AT_RSPOUT_CE	PPC_BIT(30)
+#define NPU2_CHECKSTOP_REG1_ATS_TVT_PERR	PPC_BIT(31)
+#define NPU2_CHECKSTOP_REG1_ATS_IODA_ADDR_PERR	PPC_BIT(32)
+#define NPU2_CHECKSTOP_REG1_ATS_NPU_CTRL_PERR	PPC_BIT(33)
+#define NPU2_CHECKSTOP_REG1_ATS_NPU_TOR_PERR	PPC_BIT(34)
+#define NPU2_CHECKSTOP_REG1_ATS_INVAL_IODA_TBL_SEL	PPC_BIT(35)
+
+#define NPU2_CHECKSTOP_REG2_XSL_MMIO_INVALIDATE_REQ_WHILE_1_INPROG	PPC_BIT(36)
+#define NPU2_CHECKSTOP_REG2_XSL_UNEXPECTED_ITAG_PORT_0	PPC_BIT(37)
+#define NPU2_CHECKSTOP_REG2_XSL_UNEXPECTED_ITAG_PORT_1	PPC_BIT(38)
+#define NPU2_CHECKSTOP_REG2_XSL_UNEXPECTED_RD_PEE_COMPLETION	PPC_BIT(39)
+#define NPU2_CHECKSTOP_REG2_XSL_UNEXPECTED_CO_RESP	PPC_BIT(40)
+#define NPU2_CHECKSTOP_REG2_XSL_XLAT_REQ_WHILE_SPAP_INVALID	PPC_BIT(41)
+#define NPU2_CHECKSTOP_REG2_XSL_INVALID_PEE	PPC_BIT(42)
+#define NPU2_CHECKSTOP_REG2_XSL_BLOOM_FILTER_PROTECT_ERR	PPC_BIT(43)
+#define NPU2_CHECKSTOP_REG2_XSL_CE	PPC_BIT(46)
+#define NPU2_CHECKSTOP_REG2_XSL_UE	PPC_BIT(47)
+#define NPU2_CHECKSTOP_REG2_XSL_SLBI_TLBI_BUFF_OVERFLOW	PPC_BIT(48)
+#define NPU2_CHECKSTOP_REG2_XSL_SBE_CORR_ERR_PB_CHKOUT_RSP_DATA	PPC_BIT(49)
+#define NPU2_CHECKSTOP_REG2_XSL_UE_PB_CHKOUT_RSP_DATA	PPC_BIT(50)
+#define NPU2_CHECKSTOP_REG2_XSL_SUE_PB_CHKOUT_RSP_DATA	PPC_BIT(51)
+
 struct OpalHMIEvent {
 	uint8_t		version;	/* 0x00 */
 	uint8_t		severity;	/* 0x01 */
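
For reference, the FIR walk in find_npu2_checkstop_reason() can be sketched outside of skiboot. This is only an illustration under stated assumptions, not part of the patch: the sketch_* names are hypothetical, the register values are copied from the npu2-regs.h hunk, and the bit helper assumes the IBM numbering that PPC_BIT uses (bit 0 is the most significant bit). The sketch derives each XSCOM address from the FIR index instead of stepping running address variables, so an iteration that bails out early cannot leave the addresses pointing at the previous FIR set.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the npu2-regs.h hunk of this patch. */
#define NPU2_FIR_REGISTER_0		0x0000000005013C00ULL
#define NPU2_FIR_REGISTER_0_MASK	0x0000000005013C03ULL
#define NPU2_FIR_REGISTER_0_ACTION0	0x0000000005013C06ULL
#define NPU2_FIR_REGISTER_0_ACTION1	0x0000000005013C07ULL
#define NPU2_TOTAL_FIR_REGISTERS	3
#define NPU2_FIR_OFFSET			0x40

/* Hypothetical helper, IBM bit numbering: bit 0 is the MSB. */
static inline uint64_t sketch_ppc_bit(unsigned int bit)
{
	assert(bit < 64);
	return 0x8000000000000000ULL >> bit;
}

/* Hypothetical container for one FIR and its companion registers. */
struct sketch_fir_set {
	uint64_t fir;
	uint64_t mask;
	uint64_t action0;
	uint64_t action1;
};

/* Hypothetical: XSCOM address of register `base` within FIR set `idx`. */
static inline uint64_t sketch_fir_addr(uint64_t base, unsigned int idx)
{
	assert(idx < NPU2_TOTAL_FIR_REGISTERS);
	return base + (uint64_t)idx * NPU2_FIR_OFFSET;
}

/*
 * Same expression as in the patch: a bit counts as fatal when it is
 * set in the FIR, not masked, and set in both action registers.
 */
static inline uint64_t sketch_fatal_errors(const struct sketch_fir_set *s)
{
	return s->fir & ~s->mask & s->action0 & s->action1;
}
```

With these helpers, sketch_fir_addr(NPU2_FIR_REGISTER_0_MASK, 2) yields the same value as NPU2_FIR_REGISTER_2_MASK, and a FIR bit that is set but masked drops out of sketch_fatal_errors().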