From patchwork Thu Sep 8 07:22:40 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stewart Smith
X-Patchwork-Id: 667321
From: Stewart Smith
To: skiboot@lists.ozlabs.org
Date: Thu, 8 Sep 2016 17:22:40 +1000
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1473314753-30007-1-git-send-email-benh@kernel.crashing.org>
References: <1473314753-30007-1-git-send-email-benh@kernel.crashing.org>
Message-Id: <1473319360-29454-1-git-send-email-stewart@linux.vnet.ibm.com>
Subject: [Skiboot] [PATCH v2] xscom: Trace XSCOMs that take more than 10us
List-Id: Mailing list for skiboot development

From: Benjamin Herrenschmidt

If an XSCOM takes more than 10us, this will display an entry in the
OPAL log, with the PCB address, the time, and a number which is either
an error code (negative: the access failed) or a number of retries
(positive: the number of times the HMER status returned "1").

We need to suspend that testing while the timebase is being
resynchronized, otherwise we will get bogus values.

Signed-off-by: Benjamin Herrenschmidt
[stewart@linux.vnet.ibm.com: bump to 20us, PR_DEBUG rather than printf]
Signed-off-by: Stewart Smith
---
 hw/chiptod.c    | 12 ++++++---
 hw/xscom.c      | 81 ++++++++++++++++++++++++++++++++++++++++++++++-----------
 include/xscom.h |  5 ++++
 3 files changed, 80 insertions(+), 18 deletions(-)

diff --git a/hw/chiptod.c b/hw/chiptod.c
index f647830aab32..e0fe397eb69b 100644
--- a/hw/chiptod.c
+++ b/hw/chiptod.c
@@ -635,13 +635,15 @@ static bool chiptod_to_tb(void)
 		return false;
 	}
 
+	xscom_set_tb_unreliable(true);
+
 	/* Make us ready to get the TB from the chipTOD */
 	mtspr(SPR_TFMR, base_tfmr | SPR_TFMR_MOVE_CHIP_TOD_TO_TB);
 
 	/* Tell the ChipTOD to send it */
 	if (xscom_writeme(TOD_CHIPTOD_TO_TB, PPC_BIT(0))) {
 		prerror("XSCOM error writing CHIPTOD_TO_TB\n");
-		return false;
+		goto fail;
 	}
 
 	/* Wait for it to complete */
@@ -649,16 +651,20 @@ static bool chiptod_to_tb(void)
 	do {
 		if (++timeout >= TIMEOUT_LOOPS) {
 			prerror("Chip to TB timeout\n");
-			return false;
+			goto fail;
 		}
 		tfmr = mfspr(SPR_TFMR);
 		if (tfmr & SPR_TFMR_TFMR_CORRUPT) {
 			prerror("MoveToTB: corrupt TFMR !\n");
-			return false;
+			goto fail;
 		}
 	} while(tfmr & SPR_TFMR_MOVE_CHIP_TOD_TO_TB);
 
+	xscom_set_tb_unreliable(false);
 	return true;
+ fail:
+	xscom_set_tb_unreliable(false);
+	return false;
 }
 
 static bool chiptod_check_tb_running(void)
diff --git a/hw/xscom.c b/hw/xscom.c
index 9e9dcee49bd8..6e2b5772a105 100644
--- a/hw/xscom.c
+++ b/hw/xscom.c
@@ -25,6 +25,9 @@
 #include
 #include
 
+/* Delay after which we warn that an XSCOM is taking a long time */
+#define XSCOM_DELAY_WARN_THRESHOLD	(usecs_to_tb(20))
+
 /* Mask of bits to clear in HMER before an access */
 #define HMER_CLR_MASK	(~(SPR_HMER_XSCOM_FAIL | \
 				   SPR_HMER_XSCOM_DONE | \
@@ -61,6 +64,7 @@ static struct {
  * use a global lock instead
  */
 static struct lock xscom_lock = LOCK_UNLOCKED;
+static int64_t xscom_tb_unreliable;
 
 static inline void *xscom_addr(uint32_t gcid, uint32_t pcb_addr)
 {
@@ -220,7 +224,7 @@ static bool xscom_gcid_ok(uint32_t gcid)
  * Low level XSCOM access functions, perform a single direct xscom
  * access via MMIO
  */
-static int __xscom_read(uint32_t gcid, uint32_t pcb_addr, uint64_t *val)
+static int64_t __xscom_read(uint32_t gcid, uint32_t pcb_addr, uint64_t *val)
 {
 	uint64_t hmer;
 	int64_t ret, retries;
@@ -244,7 +248,7 @@ static int __xscom_read(uint32_t gcid, uint32_t pcb_addr, uint64_t *val)
 
 		/* Check for error */
 		if (!(hmer & SPR_HMER_XSCOM_FAIL))
-			return OPAL_SUCCESS;
+			return retries;
 
 		/* Handle error and possibly eventually retry */
 		ret = xscom_handle_error(hmer, gcid, pcb_addr, false, retries);
@@ -256,7 +260,7 @@ static int __xscom_read(uint32_t gcid, uint32_t pcb_addr, uint64_t *val)
 	return ret;
 }
 
-static int __xscom_write(uint32_t gcid, uint32_t pcb_addr, uint64_t val)
+static int64_t __xscom_write(uint32_t gcid, uint32_t pcb_addr, uint64_t val)
 {
 	uint64_t hmer;
 	int64_t ret, retries = 0;
@@ -280,7 +284,7 @@ static int __xscom_write(uint32_t gcid, uint32_t pcb_addr, uint64_t val)
 
 		/* Check for error */
 		if (!(hmer & SPR_HMER_XSCOM_FAIL))
-			return OPAL_SUCCESS;
+			return retries;
 
 		/* Handle error and possibly eventually retry */
 		ret = xscom_handle_error(hmer, gcid, pcb_addr, true, retries);
@@ -295,11 +299,12 @@ static int __xscom_write(uint32_t gcid, uint32_t pcb_addr, uint64_t val)
 /*
  * Indirect XSCOM access functions
  */
-static int xscom_indirect_read(uint32_t gcid, uint64_t pcb_addr, uint64_t *val)
+static int64_t xscom_indirect_read(uint32_t gcid, uint64_t pcb_addr, uint64_t *val)
 {
 	uint32_t addr;
 	uint64_t data;
 	int rc, retries;
+	int64_t comp_retries = 0;
 
 	if (proc_gen < proc_gen_p8) {
 		*val = (uint64_t)-1;
@@ -311,14 +316,16 @@ static int xscom_indirect_read(uint32_t gcid, uint64_t pcb_addr, uint64_t *val)
 	data = XSCOM_DATA_IND_READ |
 		(pcb_addr & XSCOM_ADDR_IND_ADDR);
 	rc = __xscom_write(gcid, addr, data);
-	if (rc)
+	if (rc < 0)
 		goto bail;
+	comp_retries = rc;
 
 	/* Wait for completion */
 	for (retries = 0; retries < XSCOM_IND_MAX_RETRIES; retries++) {
 		rc = __xscom_read(gcid, addr, &data);
-		if (rc)
+		if (rc < 0)
 			goto bail;
+		comp_retries += rc;
 		if ((data & XSCOM_DATA_IND_COMPLETE) &&
 		    ((data & XSCOM_DATA_IND_ERR) == 0)) {
 			*val = data & XSCOM_DATA_IND_DATA;
@@ -333,16 +340,19 @@ static int xscom_indirect_read(uint32_t gcid, uint64_t pcb_addr, uint64_t *val)
 		}
 	}
  bail:
-	if (rc)
+	if (rc < 0) {
 		*val = (uint64_t)-1;
-	return rc;
+		return rc;
+	}
+	return comp_retries;
 }
 
-static int xscom_indirect_write(uint32_t gcid, uint64_t pcb_addr, uint64_t val)
+static int64_t xscom_indirect_write(uint32_t gcid, uint64_t pcb_addr, uint64_t val)
 {
 	uint32_t addr;
 	uint64_t data;
 	int rc, retries;
+	int64_t comp_retries = 0;
 
 	if (proc_gen < proc_gen_p8)
 		return OPAL_UNSUPPORTED;
@@ -353,14 +363,16 @@ static int xscom_indirect_write(uint32_t gcid, uint64_t pcb_addr, uint64_t val)
 	data |= val & XSCOM_ADDR_IND_DATA;
 
 	rc = __xscom_write(gcid, addr, data);
-	if (rc)
+	if (rc < 0)
 		goto bail;
+	comp_retries = rc;
 
 	/* Wait for completion */
 	for (retries = 0; retries < XSCOM_IND_MAX_RETRIES; retries++) {
 		rc = __xscom_read(gcid, addr, &data);
-		if (rc)
+		if (rc < 0)
 			goto bail;
+		comp_retries += rc;
 		if ((data & XSCOM_DATA_IND_COMPLETE) &&
 		    ((data & XSCOM_DATA_IND_ERR) == 0))
 			break;
@@ -373,7 +385,9 @@ static int xscom_indirect_write(uint32_t gcid, uint64_t pcb_addr, uint64_t val)
 		}
 	}
  bail:
-	return rc;
+	if (rc < 0)
+		return rc;
+	return comp_retries;
 }
 
 static uint32_t xscom_decode_chiplet(uint32_t partid, uint64_t *pcb_addr)
@@ -397,8 +411,9 @@ static uint32_t xscom_decode_chiplet(uint32_t partid, uint64_t *pcb_addr)
  */
 int xscom_read(uint32_t partid, uint64_t pcb_addr, uint64_t *val)
 {
+	uint64_t t_begin, t_end;
 	uint32_t gcid;
-	int rc;
+	int64_t rc;
 
 	/* Handle part ID decoding */
 	switch(partid >> 28) {
@@ -426,15 +441,27 @@ int xscom_read(uint32_t partid, uint64_t pcb_addr, uint64_t *val)
 
 	/* HW822317 requires us to do global locking */
 	lock(&xscom_lock);
+	t_begin = mftb();
 
 	/* Direct vs indirect access */
 	if (pcb_addr & XSCOM_ADDR_IND_FLAG)
 		rc = xscom_indirect_read(gcid, pcb_addr, val);
 	else
 		rc = __xscom_read(gcid, pcb_addr & 0x7fffffff, val);
+	t_end = mftb();
+
+	/* Check if we are told not to bother about warning on long delays... */
+	if (xscom_tb_unreliable)
+		t_end = t_begin;
 
 	/* Unlock it */
 	unlock(&xscom_lock);
+
+	if (tb_compare(t_begin + XSCOM_DELAY_WARN_THRESHOLD, t_end) == TB_ABEFOREB)
+		prlog(PR_DEBUG, "XSCOM Read from %08llx took %ld us with %lld (err/retries)\n",
+		      pcb_addr, tb_to_usecs(t_end - t_begin), rc);
+	if (rc > 0)
+		rc = 0;
 	return rc;
 }
 
@@ -442,8 +469,9 @@ opal_call(OPAL_XSCOM_READ, xscom_read, 3);
 
 int xscom_write(uint32_t partid, uint64_t pcb_addr, uint64_t val)
 {
+	uint64_t t_begin, t_end;
 	uint32_t gcid;
-	int rc;
+	int64_t rc;
 
 	/* Handle part ID decoding */
 	switch(partid >> 28) {
@@ -469,19 +497,42 @@ int xscom_write(uint32_t partid, uint64_t pcb_addr, uint64_t val)
 
 	/* HW822317 requires us to do global locking */
 	lock(&xscom_lock);
+	t_begin = mftb();
 
 	/* Direct vs indirect access */
 	if (pcb_addr & XSCOM_ADDR_IND_FLAG)
 		rc = xscom_indirect_write(gcid, pcb_addr, val);
 	else
 		rc = __xscom_write(gcid, pcb_addr & 0x7fffffff, val);
+	t_end = mftb();
+
+	/* Check if we are told not to bother about warning on long delays... */
+	if (xscom_tb_unreliable)
+		t_end = t_begin;
 
 	/* Unlock it */
 	unlock(&xscom_lock);
+
+	if (tb_compare(t_begin + XSCOM_DELAY_WARN_THRESHOLD, t_end) == TB_ABEFOREB)
+		prlog(PR_DEBUG, "XSCOM Write to %08llx took %ld us with %lld (err/retries)\n",
+		      pcb_addr, tb_to_usecs(t_end - t_begin), rc);
+	if (rc > 0)
+		rc = 0;
+
 	return rc;
 }
 opal_call(OPAL_XSCOM_WRITE, xscom_write, 3);
 
+void xscom_set_tb_unreliable(bool unreliable)
+{
+	lock(&xscom_lock);
+	if (unreliable)
+		xscom_tb_unreliable++;
+	else if (xscom_tb_unreliable)
+		xscom_tb_unreliable--;
+	unlock(&xscom_lock);
+}
+
 int xscom_readme(uint64_t pcb_addr, uint64_t *val)
 {
 	return xscom_read(this_cpu()->chip_id, pcb_addr, val);
diff --git a/include/xscom.h b/include/xscom.h
index f9550218d4f5..4457b82f9f6a 100644
--- a/include/xscom.h
+++ b/include/xscom.h
@@ -220,6 +220,11 @@ extern void xscom_init(void);
 /* Mark XSCOM lock as being in console path */
 extern void xscom_used_by_console(void);
 
+/* Mark XSCOM timebase warnings unreliable (used by the chiptod
+ * code when TB is being resynchronized)
+ */
+extern void xscom_set_tb_unreliable(bool unreliable);
+
 /* Returns true if XSCOM can be used. Typically this returns false if
  * the current CPU holds the XSCOM lock (to avoid re-entrancy from error path).
  */
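
As a reading aid for the patch above, here is a minimal standalone sketch of
its two ideas: a nesting counter that suppresses the slow-access trace while
the timebase is unreliable, and the folding of "negative error or positive
retry count" into the value callers see. Everything in it is an assumption
made for illustration: WARN_THRESHOLD_TICKS, set_tb_unreliable(),
should_trace(), public_rc() and sketch_selftest() are hypothetical names, one
tick is taken to be 1us, and plain unsigned subtraction stands in for
tb_compare(). It does no locking and is not skiboot code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; none of these are skiboot APIs. */
#define WARN_THRESHOLD_TICKS 20	/* assumes 1 tick == 1us */

/* Nesting counter in the spirit of xscom_tb_unreliable (no locking here). */
static int64_t tb_unreliable;

static void set_tb_unreliable(bool unreliable)
{
	if (unreliable)
		tb_unreliable++;
	else if (tb_unreliable)
		tb_unreliable--;	/* never drops below zero */
}

/* True when an access that ran from t_begin to t_end should be traced. */
static bool should_trace(uint64_t t_begin, uint64_t t_end)
{
	if (tb_unreliable)
		t_end = t_begin;	/* timebase in flux: discard the sample */
	return t_end - t_begin > WARN_THRESHOLD_TICKS;
}

/*
 * Internal rc: negative = error code, positive = retry count.
 * Callers only ever see 0 (success) or the negative error.
 */
static int64_t public_rc(int64_t rc)
{
	return rc > 0 ? 0 : rc;
}

/* Walk through the cases described in the commit message. */
static void sketch_selftest(void)
{
	assert(!should_trace(100, 120));	/* exactly at threshold: quiet */
	assert(should_trace(100, 121));		/* over threshold: traced */

	set_tb_unreliable(true);
	set_tb_unreliable(true);		/* nested suppression */
	assert(!should_trace(100, 5000));
	set_tb_unreliable(false);
	assert(!should_trace(100, 5000));	/* one level still held */
	set_tb_unreliable(false);
	assert(should_trace(100, 5000));	/* tracing is back on */

	assert(public_rc(3) == 0);		/* 3 retries, still a success */
	assert(public_rc(-6) == -6);		/* arbitrary negative error passes through */
}
```

Calling sketch_selftest() from a small main() exercises all of the cases. The
counter, rather than a plain flag, is what lets nested callers each pair a
"true" with a "false" without re-enabling the trace too early.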