From patchwork Tue Mar 12 20:35:10 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frederic Barrat
X-Patchwork-Id: 1055752
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 44JmwR1C2hz9s3q for ; Wed, 13 Mar 2019 07:36:39 +1100 (AEDT)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com
Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 44JmwR05wSzDqF6 for ; Wed, 13 Mar 2019 07:36:39 +1100 (AEDT)
X-Original-To: skiboot@lists.ozlabs.org
Delivered-To: skiboot@lists.ozlabs.org
Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=linux.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=fbarrat@linux.ibm.com; receiver=)
Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 44Jmv41wt1zDq77 for ; Wed, 13 Mar 2019 07:35:27 +1100 (AEDT)
Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x2CKYNF5099142 for ; Tue, 12 Mar 2019 16:35:24 -0400
Received: from e06smtp01.uk.ibm.com (e06smtp01.uk.ibm.com [195.75.94.97]) by mx0a-001b2d01.pphosted.com with ESMTP id 2r6k3atkew-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 12 Mar 2019 16:35:24 -0400
Received: from localhost by e06smtp01.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 12 Mar 2019 20:35:22 -0000
Received: from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198) by e06smtp01.uk.ibm.com (192.168.101.131) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 12 Mar 2019 20:35:20 -0000
Received: from d06av24.portsmouth.uk.ibm.com (d06av24.portsmouth.uk.ibm.com [9.149.105.60]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x2CKZIlo33685522 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 12 Mar 2019 20:35:18 GMT
Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 92B4542049; Tue, 12 Mar 2019 20:35:18 +0000 (GMT)
Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4E4BE4204C; Tue, 12 Mar 2019 20:35:18 +0000 (GMT)
Received: from borneo.home (unknown [9.145.16.6]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 12 Mar 2019 20:35:18 +0000 (GMT)
From: Frederic Barrat
To: skiboot@lists.ozlabs.org, andrew.donnellan@au1.ibm.com
Date: Tue, 12 Mar 2019 21:35:10 +0100
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20190312203515.18520-1-fbarrat@linux.ibm.com>
References: <20190312203515.18520-1-fbarrat@linux.ibm.com>
MIME-Version: 1.0
X-TM-AS-GCONF: 00
x-cbid: 19031220-4275-0000-0000-0000031A2D67
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 19031220-4276-0000-0000-000038289961
Message-Id: <20190312203515.18520-3-fbarrat@linux.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-03-12_12:, , signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0
 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1903120139
Subject: [Skiboot] [PATCH v2 2/7] npu2-opencapi: Setup perf counters to detect CRC errors
X-BeenThere: skiboot@lists.ozlabs.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Mailing list for skiboot development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: clombard@linux.ibm.com, arbab@linux.ibm.com
Errors-To: skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org
Sender: "Skiboot"

It's possible to set up performance counters for the PLL to detect
various conditions for the links in nvlink or opencapi mode. Since
those counters are currently unused, let's configure them when an obus
is in opencapi mode to detect CRC errors on the link. Each link has
two counters:
- CRC error detected by the host
- CRC error detected by the DLx (NAK received by the host)

We also dump the counters shortly after the link trains, but they can
be read multiple times through cronus, pdbg or linux. The counters are
configured to be reset after each read.

Signed-off-by: Frederic Barrat
Reviewed-by: Andrew Donnellan
Reviewed-by: Christophe Lombard
---
v2: no change

 hw/npu2-opencapi.c  | 62 +++++++++++++++++++++++++++++++++++++++++++++
 include/npu2-regs.h | 17 +++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/hw/npu2-opencapi.c b/hw/npu2-opencapi.c
index 6ad561c4..6d642cde 100644
--- a/hw/npu2-opencapi.c
+++ b/hw/npu2-opencapi.c
@@ -909,6 +909,66 @@ static void reset_odl(uint32_t gcid, struct npu2_dev *dev)
 	xscom_write(gcid, config_xscom, reg);
 }
 
+static void setup_perf_counters(struct npu2_dev *dev)
+{
+	uint64_t addr, reg, link;
+
+	/*
+	 * setup the DLL perf counters to check CRC errors detected by
+	 * the NPU or the adapter.
+	 *
+	 * Counter 0: link 0/ODL0, CRC error detected by ODL
+	 * Counter 1: link 0/ODL0, CRC error detected by DLx
+	 * Counter 2: link 1/ODL1, CRC error detected by ODL
+	 * Counter 3: link 1/ODL1, CRC error detected by DLx
+	 */
+	if ((dev->brick_index == 2) || (dev->brick_index == 5))
+		link = 0;
+	else
+		link = 1;
+
+	addr = OB_DLL_PERF_MONITOR_CONFIG(dev->brick_index);
+	xscom_read(dev->npu->chip_id, addr, &reg);
+	if (link == 0) {
+		reg = SETFIELD(OB_DLL_PERF_MONITOR_CONFIG_ENABLE, reg,
+			OB_DLL_PERF_MONITOR_CONFIG_LINK0);
+		reg = SETFIELD(OB_DLL_PERF_MONITOR_CONFIG_ENABLE >> 2, reg,
+			OB_DLL_PERF_MONITOR_CONFIG_LINK0);
+	} else {
+		reg = SETFIELD(OB_DLL_PERF_MONITOR_CONFIG_ENABLE >> 4, reg,
+			OB_DLL_PERF_MONITOR_CONFIG_LINK1);
+		reg = SETFIELD(OB_DLL_PERF_MONITOR_CONFIG_ENABLE >> 6, reg,
+			OB_DLL_PERF_MONITOR_CONFIG_LINK1);
+	}
+	reg = SETFIELD(OB_DLL_PERF_MONITOR_CONFIG_SIZE, reg,
+		OB_DLL_PERF_MONITOR_CONFIG_SIZE16);
+	xscom_write(dev->npu->chip_id,
+		OB_DLL_PERF_MONITOR_CONFIG(dev->brick_index), reg);
+	OCAPIDBG(dev, "perf counter config %llx = %llx\n", addr, reg);
+
+	addr = OB_DLL_PERF_MONITOR_SELECT(dev->brick_index);
+	xscom_read(dev->npu->chip_id, addr, &reg);
+	reg = SETFIELD(OB_DLL_PERF_MONITOR_SELECT_COUNTER >> (link * 16),
+		reg, OB_DLL_PERF_MONITOR_SELECT_CRC_ODL);
+	reg = SETFIELD(OB_DLL_PERF_MONITOR_SELECT_COUNTER >> ((link * 16) + 8),
+		reg, OB_DLL_PERF_MONITOR_SELECT_CRC_DLX);
+	xscom_write(dev->npu->chip_id, addr, reg);
+	OCAPIDBG(dev, "perf counter select %llx = %llx\n", addr, reg);
+}
+
+static void check_perf_counters(struct npu2_dev *dev)
+{
+	uint64_t addr, reg, link0, link1;
+
+	addr = OB_DLL_PERF_COUNTER0(dev->brick_index);
+	xscom_read(dev->npu->chip_id, addr, &reg);
+	link0 = GETFIELD(PPC_BITMASK(0, 31), reg);
+	link1 = GETFIELD(PPC_BITMASK(32, 63), reg);
+	if (link0 || link1)
+		OCAPIERR(dev, "CRC error count link0=%08llx link1=%08llx\n",
+			link0, link1);
+}
+
 static void set_init_pattern(uint32_t gcid, struct npu2_dev *dev)
 {
 	uint64_t reg, config_xscom;
@@ -1048,6 +1108,7 @@ static int64_t npu2_opencapi_poll_link(struct pci_slot *slot)
 	case OCAPI_SLOT_LINK_TRAINED:
 		otl_enabletx(chip_id, dev->npu->xscom_base, dev);
 		pci_slot_set_state(slot, OCAPI_SLOT_NORMAL);
+		check_perf_counters(dev);
 		dev->phb_ocapi.scan_map = 1;
 		return OPAL_SUCCESS;
 
@@ -1569,6 +1630,7 @@ static void setup_device(struct npu2_dev *dev)
 	setup_afu_mmio_bars(dev->npu->chip_id, dev->npu->xscom_base, dev);
 	/* Procedure 13.1.3.9 - AFU Config BARs */
 	setup_afu_config_bars(dev->npu->chip_id, dev->npu->xscom_base, dev);
+	setup_perf_counters(dev);
 	set_fence_control(dev->npu->chip_id, dev->npu->xscom_base,
 			  dev->brick_index, 0b00);
 
diff --git a/include/npu2-regs.h b/include/npu2-regs.h
index 5190aeb7..ca311097 100644
--- a/include/npu2-regs.h
+++ b/include/npu2-regs.h
@@ -725,6 +725,23 @@ void npu2_scom_write(uint64_t gcid, uint64_t scom_base,
 #define PU_IOE_PB_FP_CFG_FP1_FMR_DISABLE	PPC_BIT(52)
 #define PU_IOE_PB_FP_CFG_FP1_PRS_DISABLE	PPC_BIT(57)
 
+#define OB_DLL_PERF_MONITOR_CONFIG(brick_index)		\
+	(0x901081C + ((brick_index - 2) >> 1) * 0x3000000)
+#define   OB_DLL_PERF_MONITOR_CONFIG_ENABLE	PPC_BITMASK(0, 1)
+#define   OB_DLL_PERF_MONITOR_CONFIG_LINK0	0b10
+#define   OB_DLL_PERF_MONITOR_CONFIG_LINK1	0b01
+#define   OB_DLL_PERF_MONITOR_CONFIG_SIZE	PPC_BITMASK(16, 23)
+#define   OB_DLL_PERF_MONITOR_CONFIG_SIZE16	0xFF
+#define OB_DLL_PERF_MONITOR_SELECT(brick_index)		\
+	(0x901081D + ((brick_index - 2) >> 1) * 0x3000000)
+#define   OB_DLL_PERF_MONITOR_SELECT_COUNTER	PPC_BITMASK(0, 7)
+#define   OB_DLL_PERF_MONITOR_SELECT_CRC_ODL	0x44
+#define   OB_DLL_PERF_MONITOR_SELECT_CRC_DLX	0x45
+#define OB_DLL_PERF_COUNTER0(brick_index)		\
+	(0x901081E + ((brick_index - 2) >> 1) * 0x3000000)
+#define   OB_DLL_PERF_COUNTER0_VAL		PPC_BITMASK(0, 31)
+
+
 #define OB_ODL_OFFSET(brick_index)					\
 	((((brick_index - 2) >> 1) * 0x3000000) + ((brick_index == 3 || brick_index == 4) ? 1 : 0))
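
For readers who want to check the register arithmetic in this patch by
hand, here is a minimal standalone sketch. It is not part of the patch
and does not touch hardware. The helpers ibm_mask(), set_field() and
get_field() are hypothetical stand-ins for PPC_BITMASK, SETFIELD and
GETFIELD, assumed to follow IBM bit numbering where bit 0 is the most
significant bit of a 64-bit value. Starting the select register from
zero is also an assumption; the patch reads the current value with
xscom_read() first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for PPC_BITMASK: IBM numbering, bit 0 is the MSB. */
static uint64_t ibm_mask(unsigned first, unsigned last)
{
	assert(first <= last && last <= 63);
	uint64_t width = (uint64_t)last - first + 1;
	uint64_t ones = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return ones << (63 - last);
}

/* Number of zero bits below the field, i.e. the shift needed for a value. */
static unsigned mask_shift(uint64_t mask)
{
	unsigned shift = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Hypothetical stand-in for SETFIELD(mask, reg, val). */
static uint64_t set_field(uint64_t mask, uint64_t reg, uint64_t val)
{
	return (reg & ~mask) | ((val << mask_shift(mask)) & mask);
}

/* Hypothetical stand-in for GETFIELD(mask, reg). */
static uint64_t get_field(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> mask_shift(mask);
}

/* Same arithmetic as OB_DLL_PERF_MONITOR_CONFIG(brick_index) in the patch. */
static uint64_t perf_config_addr(unsigned brick_index)
{
	assert(brick_index >= 2 && brick_index <= 5);
	return 0x901081CULL + (uint64_t)((brick_index - 2) >> 1) * 0x3000000ULL;
}

/* Bricks 2 and 5 use link 0, bricks 3 and 4 use link 1, as in the patch. */
static unsigned brick_to_link(unsigned brick_index)
{
	return (brick_index == 2 || brick_index == 5) ? 0 : 1;
}

/* Select-register value for one link, starting from 0 (assumption). */
static uint64_t perf_select_value(unsigned link)
{
	uint64_t counter = ibm_mask(0, 7);
	uint64_t reg = 0;

	reg = set_field(counter >> (link * 16), reg, 0x44);     /* CRC_ODL */
	reg = set_field(counter >> (link * 16 + 8), reg, 0x45); /* CRC_DLX */
	return reg;
}
```

Under these assumptions, bricks 2 and 3 share config address 0x901081C
and bricks 4 and 5 share 0xC01081C, while perf_select_value(0) gives
0x4445000000000000 and perf_select_value(1) gives 0x0000444500000000.
The two get_field() masks used in check_perf_counters() split the
counter register into its upper and lower 32 bits.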