From patchwork Thu Dec 7 16:13:00 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mahesh J Salgaonkar
X-Patchwork-Id: 845665
From: Mahesh J Salgaonkar
To: skiboot list
Date: Thu, 07 Dec 2017 21:43:00 +0530
User-Agent: StGit/0.17.1-dirty
Message-Id: <151266318054.20428.18312394104853955957.stgit@jupiter.in.ibm.com>
Subject: [Skiboot] [PATCH 1/2] opal/xscom: Move the delay inside xscom_reset() function.
List-Id: Mailing list for skiboot development

From: Mahesh Salgaonkar

Move the post-reset delay into xscom_reset() so that callers do not have
to add the delay separately.

Signed-off-by: Mahesh Salgaonkar
---
 hw/xscom.c |   27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/hw/xscom.c b/hw/xscom.c
index 5b3bd88..2621465 100644
--- a/hw/xscom.c
+++ b/hw/xscom.c
@@ -96,6 +96,7 @@ static void xscom_reset(uint32_t gcid)
 {
 	u64 hmer;
 	uint32_t recv_status_reg, log_reg, err_reg;
+	struct timespec ts;
 
 	/* Clear errors in HMER */
 	mtspr(SPR_HMER, HMER_CLR_MASK);
@@ -126,6 +127,19 @@ static void xscom_reset(uint32_t gcid)
 	hmer = xscom_wait_done();
 	if (hmer & SPR_HMER_XSCOM_FAIL)
 		goto fail;
+
+	/*
+	 * Its observed that sometimes immediate retry of
+	 * XSCOM operation returns wrong data. Adding a
+	 * delay for XSCOM reset to be effective. Delay of
+	 * 10 ms is found to be working fine experimentally.
+	 * FIXME: Replace 10ms delay by exact delay needed
+	 * or other alternate method to confirm XSCOM reset
+	 * completion, after checking from HW folks.
+	 */
+	ts.tv_sec = 0;
+	ts.tv_nsec = 10 * 1000;
+	nanosleep_nopoll(&ts, NULL);
 	return;
  fail:
 	/* Fatal error resetting XSCOM */
@@ -140,7 +154,6 @@ static void xscom_reset(uint32_t gcid)
 static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr,
 				  bool is_write, int64_t retries)
 {
-	struct timespec ts;
 	unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer);
 	int64_t rc = OPAL_HARDWARE;
 
@@ -160,18 +173,6 @@ static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_add
 				XSCOM_BUSY_RESET_THRESHOLD, retries);
 			xscom_reset(gcid);
 
-			/*
-			 * Its observed that sometimes immediate retry of
-			 * XSCOM operation returns wrong data. Adding a
-			 * delay for XSCOM reset to be effective. Delay of
-			 * 10 ms is found to be working fine experimentally.
-			 * FIXME: Replace 10ms delay by exact delay needed
-			 * or other alternate method to confirm XSCOM reset
-			 * completion, after checking from HW folks.
-			 */
-			ts.tv_sec = 0;
-			ts.tv_nsec = 10 * 1000;
-			nanosleep_nopoll(&ts, NULL);
 		}
 
 		/* Log error if we have retried enough and its still busy */

From patchwork Thu Dec 7 16:13:06 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mahesh J Salgaonkar
X-Patchwork-Id: 845664
From: Mahesh J Salgaonkar
To: skiboot list
Date: Thu, 07 Dec 2017 21:43:06 +0530
In-Reply-To: <151266318054.20428.18312394104853955957.stgit@jupiter.in.ibm.com>
References: <151266318054.20428.18312394104853955957.stgit@jupiter.in.ibm.com>
User-Agent: StGit/0.17.1-dirty
Message-Id: <151266318695.20428.3075500779619115039.stgit@jupiter.in.ibm.com>
Subject: [Skiboot] [PATCH 2/2] opal/xscom: Add recovery for lost core wakeup scom failures.
List-Id: Mailing list for skiboot development

From: Mahesh Salgaonkar

A hardware issue can delay a core's response to a SCOM because of thread
reconfiguration. This leaves the SCOM logic in a state where subsequent
SCOMs to that core can get errors. The issue affects Core PC SCOM
registers in the range 20010A80-20010ABF.

The workaround: if an XSCOM timeout occurs on one of the Core PC SCOM
registers in the range 20010A80-20010ABF, a clearing SCOM write is done
to 0x20010800 with data '0x00000000'. That write also gets a timeout,
but it clears the SCOM logic errors. After the clearing write is done,
the original SCOM operation can be retried.

The SCOM timeout is reported as status 0x4 (Invalid address) in
HMER[21-23].

Signed-off-by: Mahesh Salgaonkar
Reviewed-by: Nicholas Piggin
---
 hw/xscom.c      |   75 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 include/xscom.h |    8 ++++++
 2 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/hw/xscom.c b/hw/xscom.c
index 2621465..2ad5114 100644
--- a/hw/xscom.c
+++ b/hw/xscom.c
@@ -151,8 +151,64 @@ static void xscom_reset(uint32_t gcid)
 	 */
 }
 
+static int xscom_clear_error(uint32_t gcid, uint32_t pcb_addr)
+{
+	u64 hmer;
+	uint32_t base_xscom_addr;
+	uint32_t xscom_clear_reg = 0x20010800;
+
+	/* only in case of p9 */
+	if (proc_gen != proc_gen_p9)
+		return 0;
+
+	/*
+	 * A hardware issue can delay a core's response to a scom because
+	 * of thread reconfiguration. This leaves the scom logic in a state
+	 * where subsequent scoms to that core can get errors. The issue
+	 * affects Core PC scom registers in the range of
+	 * 20010A80-20010ABF.
+	 *
+	 * The workaround: if an xscom timeout occurs on one of the Core PC
+	 * scom registers in the range of 20010A80-20010ABF, a clearing scom
+	 * write is done to 0x20010800 with data of '0x00000000'. That write
+	 * also gets a timeout, but clears the scom logic errors. After the
+	 * clearing write is done the original scom operation can be retried.
+	 *
+	 * The scom timeout is reported as status 0x4 (Invalid address)
+	 * in HMER[21-23].
+	 */
+
+	base_xscom_addr = pcb_addr & XSCOM_CLEAR_RANGE_MASK;
+	if (!((base_xscom_addr >= XSCOM_CLEAR_RANGE_START) &&
+	      (base_xscom_addr <= XSCOM_CLEAR_RANGE_END)))
+		return 0;
+
+	/* Reset the XSCOM or next scom operation will fail. */
+	xscom_reset(gcid);
+
+	/* Clear errors in HMER */
+	mtspr(SPR_HMER, HMER_CLR_MASK);
+
+	/* Write 0 to clear the xscom logic errors on target chip */
+	out_be64(xscom_addr(gcid, xscom_clear_reg), 0);
+	hmer = xscom_wait_done();
+
+	/*
+	 * The clearing xscom write above will time out and error out with
+	 * an invalid access, as there is no register at that address. This
+	 * xscom operation just helps to clear the xscom logic error.
+	 *
+	 * On failure, reset the XSCOM or we'll hang on the next access.
+	 */
+	if (hmer & SPR_HMER_XSCOM_FAIL)
+		xscom_reset(gcid);
+
+	return 1;
+}
+
 static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr,
-				  bool is_write, int64_t retries)
+				  bool is_write, int64_t retries,
+				  int64_t *xscom_clear_retries)
 {
 	unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer);
 	int64_t rc = OPAL_HARDWARE;
@@ -191,6 +247,15 @@ static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_add
 		break;
 	case 4: /* Invalid address / address error */
 		rc = OPAL_XSCOM_ADDR_ERROR;
+		if (xscom_clear_error(gcid, pcb_addr)) {
+			/* return busy if retries still pending. */
+			if ((*xscom_clear_retries)--)
+				return OPAL_XSCOM_BUSY;
+
+			prlog(PR_DEBUG, "XSCOM: error recovery failed for "
+				"gcid=0x%x pcb_addr=0x%x\n", gcid, pcb_addr);
+
+		}
 		break;
 	case 5: /* Clock error */
 		rc = OPAL_XSCOM_CLOCK_ERROR;
@@ -253,6 +318,7 @@ static int __xscom_read(uint32_t gcid, uint32_t pcb_addr, uint64_t *val)
 {
 	uint64_t hmer;
 	int64_t ret, retries;
+	int64_t xscom_clear_retries = XSCOM_CLEAR_MAX_RETRIES;
 
 	if (!xscom_gcid_ok(gcid)) {
 		prerror("%s: invalid XSCOM gcid 0x%x\n", __func__, gcid);
@@ -276,7 +342,8 @@ static int __xscom_read(uint32_t gcid, uint32_t pcb_addr, uint64_t *val)
 			return OPAL_SUCCESS;
 
 		/* Handle error and possibly eventually retry */
-		ret = xscom_handle_error(hmer, gcid, pcb_addr, false, retries);
+		ret = xscom_handle_error(hmer, gcid, pcb_addr, false, retries,
+					 &xscom_clear_retries);
 		if (ret != OPAL_BUSY)
 			break;
 	}
@@ -303,6 +370,7 @@ static int __xscom_write(uint32_t gcid, uint32_t pcb_addr, uint64_t val)
 {
 	uint64_t hmer;
 	int64_t ret, retries = 0;
+	int64_t xscom_clear_retries = XSCOM_CLEAR_MAX_RETRIES;
 
 	if (!xscom_gcid_ok(gcid)) {
 		prerror("%s: invalid XSCOM gcid 0x%x\n", __func__, gcid);
@@ -326,7 +394,8 @@ static int __xscom_write(uint32_t gcid, uint32_t pcb_addr, uint64_t val)
 			return OPAL_SUCCESS;
 
 		/* Handle error and possibly eventually retry */
-		ret = xscom_handle_error(hmer, gcid, pcb_addr, true, retries);
+		ret = xscom_handle_error(hmer, gcid, pcb_addr, true, retries,
+					 &xscom_clear_retries);
 		if (ret != OPAL_BUSY)
 			break;
 	}
diff --git a/include/xscom.h b/include/xscom.h
index 5a5d0b9..3a1374c 100644
--- a/include/xscom.h
+++ b/include/xscom.h
@@ -206,6 +206,14 @@
 /* Max number of retries when XSCOM remains busy */
 #define XSCOM_BUSY_MAX_RETRIES		3000
 
+/* Max number of retries after xscom clearing is done */
+#define XSCOM_CLEAR_MAX_RETRIES		3
+
+/* xscom clear address range/mask */
+#define XSCOM_CLEAR_RANGE_START		0x20010A00
+#define XSCOM_CLEAR_RANGE_END		0x20010ABF
+#define XSCOM_CLEAR_RANGE_MASK		0x200FFBFF
+
 /* Retry count after which to reset XSCOM, if still busy */
 #define XSCOM_BUSY_RESET_THRESHOLD	1000
 