From patchwork Wed Mar 14 10:08:08 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mahesh J Salgaonkar
X-Patchwork-Id: 885683
From: Mahesh J Salgaonkar
To: skiboot list
Date: Wed, 14 Mar 2018 15:38:08 +0530
In-Reply-To: <152102197095.14271.15217978286746180285.stgit@jupiter.in.ibm.com>
References: <152102197095.14271.15217978286746180285.stgit@jupiter.in.ibm.com>
User-Agent: StGit/0.17.1-dirty
Message-Id: <152102208893.14271.10456431992823909816.stgit@jupiter.in.ibm.com>
Subject: [Skiboot] [PATCH 08/14] opal/hmi: Do not send HMI event if no
 errors are found.
List-Id: Mailing list for skiboot development

From: Mahesh Salgaonkar

For TOD errors, all the cores in the chip get HMIs. Any one thread from
any core can fix the issue, after which the error conditions in TFMR are
cleared. The rest of the threads need not take any action if the TOD
errors have already been cleared. Hence thread 0 of every core should
get a fresh copy of TFMR before going down the recovery path.

Initialize recover = -1, so that a thread that finds no errors does not
send an HMI event to Linux. This stops every thread from flooding the
host with HMI events even when no errors are found.

Signed-off-by: Mahesh Salgaonkar
---
 core/hmi.c        |   21 +++++++++++++--------
 hw/chiptod.c      |    6 +++++-
 include/chiptod.h |    2 +-
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/core/hmi.c b/core/hmi.c
index 729861622..390e9852e 100644
--- a/core/hmi.c
+++ b/core/hmi.c
@@ -920,7 +920,7 @@ static int handle_thread_tfac_error(uint64_t tfmr, uint64_t *out_flags)
 static int handle_all_core_tfac_error(uint64_t tfmr, uint64_t *out_flags)
 {
 	struct cpu_thread *t, *t0;
-	int recover = 1;
+	int recover = -1;
 
 	t = this_cpu();
 	t0 = find_cpu_by_pir(cpu_get_thread0(t));
@@ -940,11 +940,15 @@ static int handle_all_core_tfac_error(uint64_t tfmr, uint64_t *out_flags)
 	if (tfmr & SPR_TFMR_TFMR_CORRUPT) {
 		/* Check if it's still in error state */
 		if (mfspr(SPR_TFMR) & SPR_TFMR_TFMR_CORRUPT)
-			if (!recover_corrupt_tfmr())
+			if (!recover_corrupt_tfmr()) {
+				unlock(&hmi_lock);
 				recover = 0;
+			}
 
-		if (!recover)
+		if (!recover) {
+			unlock(&hmi_lock);
 			goto error_out;
+		}
 
 		tfmr = mfspr(SPR_TFMR);
 
@@ -953,8 +957,10 @@ static int handle_all_core_tfac_error(uint64_t tfmr, uint64_t *out_flags)
 			recover = handle_thread_tfac_error(tfmr, out_flags);
 			tfmr &= ~SPR_TFMR_THREAD_ERRORS;
 		}
-		if (!recover)
+		if (!recover) {
+			unlock(&hmi_lock);
 			goto error_out;
+		}
 	}
 
 	/* Tell the OS ... */
@@ -988,8 +994,7 @@ static int handle_all_core_tfac_error(uint64_t tfmr, uint64_t *out_flags)
 
 	/* Now perform the actual TB recovery on thread 0 */
 	if (t == t0)
-		recover = chiptod_recover_tb_errors(tfmr,
-						    &this_cpu()->tb_resynced);
+		recover = chiptod_recover_tb_errors(&this_cpu()->tb_resynced);
 
 error_out:
 	/* Last rendez-vous */
@@ -1008,7 +1013,7 @@ error_out:
 static int handle_tfac_errors(uint64_t hmer, struct OpalHMIEvent *hmi_evt,
 			      uint64_t *out_flags)
 {
-	int recover = 1;
+	int recover = -1;
 	uint64_t tfmr = mfspr(SPR_TFMR);
 
 	/* A TFMR parity error makes us ignore all the local stuff */
@@ -1070,7 +1075,7 @@ static int handle_tfac_errors(uint64_t hmer, struct OpalHMIEvent *hmi_evt,
 		prlog(PR_ERR, "HMI: TB invalid without core error reported !\n");
 	}
 
-	if (hmi_evt) {
+	if (recover != -1 && hmi_evt) {
 		hmi_evt->severity = OpalHMI_SEV_ERROR_SYNC;
 		hmi_evt->type = OpalHMI_ERROR_TFAC;
 		hmi_evt->tfmr = tfmr;
diff --git a/hw/chiptod.c b/hw/chiptod.c
index a160e5a10..f6ef9a469 100644
--- a/hw/chiptod.c
+++ b/hw/chiptod.c
@@ -1505,8 +1505,9 @@ bool tfmr_clear_core_errors(uint64_t tfmr)
  * 1 <= Successfully recovered from errors
  * -1 <= No errors found. Errors are already been fixed.
  */
-int chiptod_recover_tb_errors(uint64_t tfmr, bool *out_resynced)
+int chiptod_recover_tb_errors(bool *out_resynced)
 {
+	uint64_t tfmr;
 	int rc = -1;
 
 	*out_resynced = false;
@@ -1516,6 +1517,9 @@ int chiptod_recover_tb_errors(uint64_t tfmr, bool *out_resynced)
 
 	lock(&chiptod_lock);
 
+	/* Get fresh copy of TFMR */
+	tfmr = mfspr(SPR_TFMR);
+
 	/*
 	 * Check for TB errors.
 	 * On Sync check error, bit 44 of TFMR is set. Check for it and
diff --git a/include/chiptod.h b/include/chiptod.h
index 7708d4899..667e6fd83 100644
--- a/include/chiptod.h
+++ b/include/chiptod.h
@@ -29,7 +29,7 @@ enum chiptod_topology {
 
 extern void chiptod_init(void);
 extern bool chiptod_wakeup_resync(void);
-extern int chiptod_recover_tb_errors(uint64_t tfmr, bool *out_resynced);
+extern int chiptod_recover_tb_errors(bool *out_resynced);
 extern bool tfmr_recover_local_errors(uint64_t tfmr);
 extern bool recover_corrupt_tfmr(void);
 extern void tfmr_cleanup_core_errors(uint64_t tfmr);
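
For readers following the recover = -1 convention described in the commit message, here is a minimal standalone sketch of the idea. It is not skiboot code: fake_tfmr_errors, recover_errors(), should_send_event() and simulate_threads() are hypothetical stand-ins, and the real TFMR read (mfspr(SPR_TFMR)) is replaced by a plain variable. It assumes threads reach the recovery path one after another, and it only illustrates that a thread which re-reads the register and finds nothing left to fix reports -1 and sends no event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three return values the patch documents. */
enum { RECOVER_NONE = -1, RECOVER_FAILED = 0, RECOVER_OK = 1 };

/* Hypothetical stand-in for the error bits of the shared TFMR register. */
static unsigned int fake_tfmr_errors;

/*
 * Take a fresh copy of the (fake) register instead of trusting a value
 * sampled earlier, then recover only if errors are still present.
 */
static int recover_errors(void)
{
	unsigned int errors = fake_tfmr_errors;	/* fresh copy */

	if (!errors)
		return RECOVER_NONE;	/* another thread already fixed it */

	fake_tfmr_errors = 0;		/* pretend the recovery succeeded */
	return RECOVER_OK;
}

/* An event is sent only when this thread actually handled an error. */
static bool should_send_event(int recover)
{
	return recover != RECOVER_NONE;
}

/*
 * Simulate nthreads threads entering the recovery path one after the
 * other for a single error, and count how many events would be sent.
 */
static int simulate_threads(int nthreads)
{
	int events = 0;

	fake_tfmr_errors = 0x1;
	for (int i = 0; i < nthreads; i++) {
		int recover = recover_errors();

		assert(recover == RECOVER_OK || recover == RECOVER_NONE);
		if (should_send_event(recover))
			events++;
	}
	return events;
}
```

With this sketch, simulate_threads(4) counts a single event: the first thread clears the error and the remaining three see RECOVER_NONE. A failed recovery (RECOVER_FAILED) would still be reported, since only -1 suppresses the event.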