From patchwork Wed Mar 20 18:35:20 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Barrat
X-Patchwork-Id: 1059467
From: Frederic Barrat
To: skiboot@lists.ozlabs.org, andrew.donnellan@au1.ibm.com
Cc: clombard@linux.ibm.com, arbab@linux.ibm.com
Date: Wed, 20 Mar 2019 19:35:20 +0100
Message-Id: <20190320183522.8310-6-fbarrat@linux.ibm.com>
In-Reply-To: <20190320183522.8310-1-fbarrat@linux.ibm.com>
References: <20190320183522.8310-1-fbarrat@linux.ibm.com>
X-Mailer: git-send-email 2.19.1
List-Id: Mailing list for skiboot development
Subject: [Skiboot] [PATCH 5/7] hw/npu2: Report errors to the OS if an OpenCAPI brick is fenced

Now that the NPU may report interrupts due to the link going down
unexpectedly, report those errors to the OS when queried by the
'next_error' PHB callback.

The hardware doesn't support recovery of the link when it goes down
unexpectedly. So we report the PHB as dead, so that the OS can log the
proper message, notify the drivers and take the devices down.

Signed-off-by: Frederic Barrat
---
 hw/npu2-opencapi.c | 55 ++++++++++++++++++++++++++++++++++++++++++----
 include/npu2.h     |  1 +
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/hw/npu2-opencapi.c b/hw/npu2-opencapi.c
index 285615a5..9df51b22 100644
--- a/hw/npu2-opencapi.c
+++ b/hw/npu2-opencapi.c
@@ -1434,18 +1434,64 @@ static int64_t npu2_opencapi_ioda_reset(struct phb __unused *phb,
 	return OPAL_SUCCESS;
 }
 
-static int64_t npu2_opencapi_set_pe(struct phb __unused *phb,
-				    uint64_t __unused pe_num,
+static int64_t npu2_opencapi_set_pe(struct phb *phb,
+				    uint64_t pe_num,
 				    uint64_t __unused bdfn,
 				    uint8_t __unused bcompare,
 				    uint8_t __unused dcompare,
 				    uint8_t __unused fcompare,
 				    uint8_t __unused action)
 {
+	struct npu2_dev *dev = phb_to_npu2_dev_ocapi(phb);
 	/*
 	 * Ignored on OpenCAPI - we use fixed PE assignments. May need
 	 * addressing when we support dual-link devices.
+	 *
+	 * We nonetheless store the PE reported by the OS so that we
+	 * can send it back in case of error. If there are several PCI
+	 * functions on the device, the OS can define many PEs, we
+	 * only keep one, the OS will handle it.
 	 */
+	dev->linux_pe = pe_num;
+	return OPAL_SUCCESS;
+}
+
+static int64_t npu2_opencapi_freeze_status(struct phb *phb __unused,
+					   uint64_t pe_number __unused,
+					   uint8_t *freeze_state,
+					   uint16_t *pci_error_type,
+					   uint16_t *severity)
+{
+	*freeze_state = OPAL_EEH_STOPPED_NOT_FROZEN;
+	*pci_error_type = OPAL_EEH_NO_ERROR;
+	if (severity)
+		*severity = OPAL_EEH_SEV_NO_ERROR;
+
+	return OPAL_SUCCESS;
+}
+
+static int64_t npu2_opencapi_eeh_next_error(struct phb *phb,
+					    uint64_t *first_frozen_pe,
+					    uint16_t *pci_error_type,
+					    uint16_t *severity)
+{
+	struct npu2_dev *dev = phb_to_npu2_dev_ocapi(phb);
+	uint64_t reg;
+
+	if (!first_frozen_pe || !pci_error_type || !severity)
+		return OPAL_PARAMETER;
+
+	reg = npu2_read(dev->npu, NPU2_MISC_FENCE_STATE);
+	if (reg & PPC_BIT(dev->brick_index)) {
+		OCAPIERR(dev, "Brick %d fenced!\n", dev->brick_index);
+		*first_frozen_pe = dev->linux_pe;
+		*pci_error_type = OPAL_EEH_PHB_ERROR;
+		*severity = OPAL_EEH_SEV_PHB_DEAD;
+	} else {
+		*first_frozen_pe = -1;
+		*pci_error_type = OPAL_EEH_NO_ERROR;
+		*severity = OPAL_EEH_SEV_NO_ERROR;
+	}
 	return OPAL_SUCCESS;
 }
 
@@ -1646,6 +1692,7 @@ static void setup_device(struct npu2_dev *dev)
 	dev->phb_ocapi.scan_map = 0;
 
 	dev->bdfn = 0;
+	dev->linux_pe = -1;
 	dev->train_need_fence = false;
 	dev->train_fenced = false;
 
@@ -1765,10 +1812,10 @@ static const struct phb_ops npu2_opencapi_ops = {
 	.get_msi_64		= NULL,
 	.set_pe			= npu2_opencapi_set_pe,
 	.set_peltv		= NULL,
-	.eeh_freeze_status	= npu2_freeze_status,  /* TODO */
+	.eeh_freeze_status	= npu2_opencapi_freeze_status,
 	.eeh_freeze_clear	= NULL,
 	.eeh_freeze_set		= NULL,
-	.next_error		= NULL,
+	.next_error		= npu2_opencapi_eeh_next_error,
 	.err_inject		= NULL,
 	.get_diag_data		= NULL,
 	.get_diag_data2		= NULL,
diff --git a/include/npu2.h b/include/npu2.h
index 6c73679f..ef4e7aff 100644
--- a/include/npu2.h
+++ b/include/npu2.h
@@ -157,6 +157,7 @@ struct npu2_dev {
 
 	/* OpenCAPI */
 	struct phb		phb_ocapi;
+	uint64_t		linux_pe;
 	bool			train_need_fence;
 	bool			train_fenced;
 };
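
For readers who want to try the fence check from npu2_opencapi_eeh_next_error() outside of skiboot, here is a minimal standalone sketch of the same decision logic. It is an illustration only, not part of the patch. All sketch_* and SKETCH_* names are hypothetical stand-ins: the fence register is modelled as a plain struct field instead of an npu2_read() of NPU2_MISC_FENCE_STATE, the return, error-type and severity codes are placeholder values and not the real OPAL_* constants, and SKETCH_PPC_BIT assumes IBM bit numbering (bit 0 is the most significant bit of the 64-bit word).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed IBM bit numbering: bit 0 is the MSB of a 64-bit word. */
#define SKETCH_PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

/* Placeholder codes (hypothetical); the real OPAL_* values are not used here. */
enum sketch_rc { SKETCH_SUCCESS = 0, SKETCH_PARAMETER = -1 };
enum sketch_err_type { SKETCH_NO_ERROR = 0, SKETCH_PHB_ERROR = 1 };
enum sketch_severity { SKETCH_SEV_NO_ERROR = 0, SKETCH_SEV_PHB_DEAD = 1 };

/* Hypothetical stand-in for the struct npu2_dev fields the patch touches. */
struct sketch_dev {
	uint64_t fence_state;     /* models the fence state register */
	unsigned int brick_index; /* which bit of fence_state belongs to us */
	uint64_t linux_pe;        /* PE last recorded by set_pe, or -1 */
};

/* Report "PHB dead" if our brick's fence bit is set, "no error" otherwise. */
int sketch_next_error(const struct sketch_dev *dev,
		      uint64_t *first_frozen_pe,
		      uint16_t *pci_error_type,
		      uint16_t *severity)
{
	if (!dev || !first_frozen_pe || !pci_error_type || !severity)
		return SKETCH_PARAMETER;

	if (dev->fence_state & SKETCH_PPC_BIT(dev->brick_index)) {
		/* Hand back the PE the OS gave us so it can match the error. */
		*first_frozen_pe = dev->linux_pe;
		*pci_error_type = SKETCH_PHB_ERROR;
		*severity = SKETCH_SEV_PHB_DEAD;
	} else {
		*first_frozen_pe = (uint64_t)-1;
		*pci_error_type = SKETCH_NO_ERROR;
		*severity = SKETCH_SEV_NO_ERROR;
	}
	return SKETCH_SUCCESS;
}

/* Usage example: query once with the brick healthy, once with it fenced. */
void sketch_demo(void)
{
	struct sketch_dev dev = {
		.fence_state = 0, .brick_index = 2, .linux_pe = 5,
	};
	uint64_t pe = 0;
	uint16_t type = 0, sev = 0;

	assert(sketch_next_error(&dev, &pe, &type, &sev) == SKETCH_SUCCESS);
	assert(pe == (uint64_t)-1 && sev == SKETCH_SEV_NO_ERROR);

	dev.fence_state = SKETCH_PPC_BIT(2);	/* fence our brick */
	assert(sketch_next_error(&dev, &pe, &type, &sev) == SKETCH_SUCCESS);
	assert(pe == 5 && type == SKETCH_PHB_ERROR);
	assert(sev == SKETCH_SEV_PHB_DEAD);
}
```

The sketch reports one stored PE per device, following the patch's choice to keep only the last PE passed to set_pe and leave the rest to the OS.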