From patchwork Wed Oct 7 12:09:28 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mahesh J Salgaonkar
X-Patchwork-Id: 1377981
From: Mahesh Salgaonkar
To: skiboot list
Date: Wed, 07 Oct 2020 17:39:28 +0530
Message-ID: <160207256805.2097386.5051694589375511182.stgit@jupiter>
In-Reply-To: <160207247879.2097386.9393389763183654717.stgit@jupiter>
References: <160207247879.2097386.9393389763183654717.stgit@jupiter>
User-Agent: StGit/0.21
Subject: [Skiboot] [PATCH v2 06/10] opal/eeh: Send an error callout on EEH error.
List-Id: Mailing list for skiboot development
Cc: Vasant Hegde

On an EEH error, send out an error log (eSEL) with a hardware callout. To
avoid generating multiple events for the same error, use a bit flag in the
generic PHB structure.
Whenever an EEH freeze/fence is detected, the PCI_EEH_ERR_LOG_SEND bit is
set; it is cleared once the error log has been sent. The error log includes
FRU details and PHB diag data.

This patch addresses full PHB fence events. Subsequent patches will address
single-PE and multi-PE freezes/fences.

The FRU details include the slot location of the root port, the I/O base
location code, and the processor chip FRU details, as shown below:

|                   Callout Section                    |
|                                                      |
| Additional Sections     : Disabled                   |
| Callout Count           : 3                          |
|                                                      |
| Normal Hardware FRU                                  |
|    Priority             : Medium Priority            |
|    Location Code        : U78D2.001.RCH0060-P1-C2    |
|                                                      |
| Normal Hardware FRU                                  |
|    Priority             : Medium Priority            |
|    Location Code        : U78D2.001.RCH0060-P1       |
|    Part Number          : 01EK968                    |
|    Serial Number        : Y230UF6C103M               |
|                                                      |
| Normal Hardware FRU                                  |
|    Priority             : Medium Priority            |
|    Location Code        : U78D2.001.RCH0060-P1-C48   |
|    Part Number          : 02CY253                    |
|    Serial Number        : YA1934460542               |
|                                                      |

Signed-off-by: Mahesh Salgaonkar
---
Change in v2:
- Introduce new elog type OPAL_RC_PCI_PHB_FREEZE.
---
 core/pci-opal.c    |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/phb3.c          |    3 ++
 hw/phb4.c          |    4 +++
 include/errorlog.h |    2 +
 include/pci.h      |    3 ++
 5 files changed, 82 insertions(+)

diff --git a/core/pci-opal.c b/core/pci-opal.c
index aa375c6aa..b0cb50069 100644
--- a/core/pci-opal.c
+++ b/core/pci-opal.c
@@ -13,6 +13,12 @@
 #include
 #include
 #include
+#include
+#include
+
+DEFINE_LOG_ENTRY(OPAL_RC_PCI_PHB_FREEZE, OPAL_INPUT_OUTPUT_ERR_EVT,
+		OPAL_PCI, OPAL_IO_DEVICES, OPAL_UNRECOVERABLE_ERR_GENERAL,
+		OPAL_NA);
 
 #define OPAL_PCICFG_ACCESS_READ(op, cb, type)	\
 static int64_t opal_pci_config_##op(uint64_t phb_id,	\
@@ -58,6 +64,66 @@ OPAL_PCICFG_ACCESS_WRITE(write_byte, write8, uint8_t)
 OPAL_PCICFG_ACCESS_WRITE(write_half_word, write16, uint16_t)
 OPAL_PCICFG_ACCESS_WRITE(write_word, write32, uint32_t)
 
+/* Generate and send an error log/eSEL */
+static void send_eeh_serviceable_event(struct phb *phb, struct errorlog *buf,
+					void *diag_buffer)
+{
+	const char *loc, *part, *serial;
+	uint32_t chip_id, len;
+	struct OpalIoPhbErrorCommon *common;
+
+	log_mark_serviceable(buf);
+
+	/* Add FRU callout for PHB base (backplane) */
+	loc = phb->base_loc_code;
+	part = phb->base_part_no;
+	serial = phb->base_serial_no;
+	log_add_callout_section(buf, loc, part, serial);
+
+	/* Add FRU callout of associated chip id */
+	chip_id = dt_get_chip_id(phb->dt_node);
+	loc = chip_loc_code(chip_id);
+	part = chip_part_number(chip_id);
+	serial = chip_serial_number(chip_id);
+	log_add_callout_section(buf, loc, part, serial);
+
+	if (!diag_buffer)
+		goto skip_to_commit;
+
+	/* Insert the phb diag data. */
+	common = diag_buffer;
+	len = be32_to_cpu(common->len);
+
+	log_add_section(buf, OPAL_ELOG_SEC_DIAG);
+	log_append_data(buf, diag_buffer, len);
+
+skip_to_commit:
+	log_commit(buf);
+}
+
+static void send_phb_freeze_event(struct phb *phb, void *diag_buffer)
+{
+	struct errorlog *buf;
+	struct pci_device *pd;
+	const char *loc = NULL;
+
+	buf = opal_elog_create(&e_info(OPAL_RC_PCI_PHB_FREEZE), 0);
+	if (!buf) {
+		prerror("Unable to send EEH error log (eSEL)\n");
+		return;
+	}
+
+	log_append_msg(buf, "PHB#%x Freeze/Fence detected!\n", phb->opal_id);
+
+	/* Add slot location info of RootPort */
+	pd = list_entry(phb->devices.n.next, struct pci_device, link);
+	loc = dt_prop_get_def(pd->dn, "ibm,slot-location-code", NULL);
+	log_add_callout_section(buf, loc, NULL, NULL);
+
+	send_eeh_serviceable_event(phb, buf, diag_buffer);
+	phb->flags &= ~PCI_EEH_ERR_LOG_SEND;
+}
+
 static int64_t opal_pci_config_read_half_word_be(uint64_t phb_id,
 						 uint64_t bus_dev_func,
 						 uint64_t offset,
@@ -1000,6 +1066,10 @@ static int64_t opal_pci_get_phb_diag_data2(uint64_t phb_id,
 		return OPAL_UNSUPPORTED;
 	phb_lock(phb);
 	rc = phb->ops->get_diag_data2(phb, diag_buffer, diag_buffer_len);
+
+	/* Send an error log if required */
+	if (phb->flags & PCI_EEH_ERR_LOG_SEND)
+		send_phb_freeze_event(phb, diag_buffer);
 	phb_unlock(phb);
 
 	return rc;
diff --git a/hw/phb3.c b/hw/phb3.c
index 8af6b6164..5465b62ae 100644
--- a/hw/phb3.c
+++ b/hw/phb3.c
@@ -68,6 +68,9 @@ static bool phb3_fenced(struct phb3 *p)
 	if (nfir & PPC_BIT(16)) {
 		p->flags |= PHB3_AIB_FENCED;
 
+		/* Mark flag to send an error log */
+		p->phb.flags |= PCI_EEH_ERR_LOG_SEND;
+
 		phb3_eeh_dump_regs(p, NULL);
 		return true;
 	}
diff --git a/hw/phb4.c b/hw/phb4.c
index 79bfdbf9a..cd50361fc 100644
--- a/hw/phb4.c
+++ b/hw/phb4.c
@@ -2554,6 +2554,9 @@ static bool phb4_fenced(struct phb4 *p)
 	/* Mark ourselves fenced */
 	p->flags |= PHB4_AIB_FENCED;
 
+	/* Mark flag to send an error log */
+	p->phb.flags |= PCI_EEH_ERR_LOG_SEND;
+
 	PHBERR(p, "PHB Freeze/Fence detected !\n");
 	phb4_dump_pec_err_regs(p);
 
@@ -3448,6 +3451,7 @@ static int64_t phb4_creset(struct pci_slot *slot)
 		p->flags &= ~PHB4_AIB_FENCED;
 		p->flags &= ~PHB4_CAPP_RECOVERY;
 		p->flags &= ~PHB4_CFG_USE_ASB;
+
 		phb4_init_hw(p);
 
 		pci_slot_set_state(slot, PHB4_SLOT_CRESET_FRESET);
diff --git a/include/errorlog.h b/include/errorlog.h
index a9c3250e8..9bd200903 100644
--- a/include/errorlog.h
+++ b/include/errorlog.h
@@ -287,6 +287,7 @@ enum opal_reasoncode {
 	OPAL_RC_PCI_ADD_SLOT	= OPAL_SRC_COMPONENT_PCI | 0x11,
 	OPAL_RC_PCI_SCAN	= OPAL_SRC_COMPONENT_PCI | 0x12,
 	OPAL_RC_PCI_RESET_PHB	= OPAL_SRC_COMPONENT_PCI | 0x10,
+	OPAL_RC_PCI_PHB_FREEZE	= OPAL_SRC_COMPONENT_PCI | 0x13,
 /* ATTN */
 	OPAL_RC_ATTN		= OPAL_SRC_COMPONENT_ATTN | 0x10,
 /* MEM_ERR */
@@ -341,6 +342,7 @@ enum opal_reasoncode {
 };
 
 #define OPAL_ELOG_SEC_DESC	0x44455343
+#define OPAL_ELOG_SEC_DIAG	0x44494147	/* For EEH diag data */
 
 #define DEFINE_LOG_ENTRY(reason, type, id, subsys,			\
 severity, subtype) static struct opal_err_info err_##reason =		\
diff --git a/include/pci.h b/include/pci.h
index 0b7a1f8a6..b2a9af3e8 100644
--- a/include/pci.h
+++ b/include/pci.h
@@ -384,6 +384,9 @@ struct phb {
 
 	/* Additional data the platform might need to attach */
 	void			*platform_data;
+
+	uint32_t		flags;
+#define PCI_EEH_ERR_LOG_SEND	0x1
 };
 
 static inline void phb_lock(struct phb *phb)
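
For illustration only (not part of this patch): the way the
PCI_EEH_ERR_LOG_SEND bit limits each detected error to one log can be
sketched in standalone C. Only the flag name and its value are taken from
the patch. struct demo_phb, demo_mark_fenced() and demo_get_diag_data() are
hypothetical stand-ins for struct phb, the phb3_fenced()/phb4_fenced()
detection paths and opal_pci_get_phb_diag_data2(); printf() stands in for
building and committing the eSEL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Same name and value as in the patch; everything else is a stand-in. */
#define PCI_EEH_ERR_LOG_SEND	0x1

/* Hypothetical, cut-down stand-in for skiboot's struct phb. */
struct demo_phb {
	uint32_t	opal_id;
	uint32_t	flags;
	unsigned int	logs_sent;	/* demo-only counter */
};

/* Detection path: only records that an error log is pending. */
static void demo_mark_fenced(struct demo_phb *phb)
{
	assert(phb != NULL);
	phb->flags |= PCI_EEH_ERR_LOG_SEND;
}

/*
 * Diag-data path: sends a log only if one is pending, then clears the
 * flag. Returns true when a log was sent by this call.
 */
static bool demo_get_diag_data(struct demo_phb *phb)
{
	assert(phb != NULL);

	if (!(phb->flags & PCI_EEH_ERR_LOG_SEND))
		return false;

	/* Stand-in for creating and committing the error log. */
	printf("PHB#%x Freeze/Fence detected!\n", (unsigned int)phb->opal_id);
	phb->logs_sent++;

	phb->flags &= ~PCI_EEH_ERR_LOG_SEND;
	return true;
}
```

In this sketch, marking the PHB fenced several times before the diag data
is read still produces a single log, because the flag only records that a
log is pending; a later read with the flag clear sends nothing.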