From patchwork Mon Apr 16 17:33:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mahesh J Salgaonkar X-Patchwork-Id: 898823 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40PwV82vn2z9s3B for ; Tue, 17 Apr 2018 03:34:08 +1000 (AEST) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.vnet.ibm.com Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 40PwV81l3MzF1wG for ; Tue, 17 Apr 2018 03:34:08 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.vnet.ibm.com X-Original-To: skiboot@lists.ozlabs.org Delivered-To: skiboot@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=none (mailfrom) smtp.mailfrom=linux.vnet.ibm.com (client-ip=148.163.158.5; helo=mx0a-001b2d01.pphosted.com; envelope-from=mahesh@linux.vnet.ibm.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.vnet.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 40PwT41P5mzF1x4 for ; Tue, 17 Apr 2018 03:33:11 +1000 (AEST) Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w3GHVNke010119 for ; Mon, 16 Apr 2018 13:33:09 -0400 Received: from e06smtp10.uk.ibm.com (e06smtp10.uk.ibm.com [195.75.94.106]) by mx0a-001b2d01.pphosted.com with ESMTP id 2hcxhb5jqf-1 (version=TLSv1.2 cipher=AES256-SHA256 bits=256 verify=NOT) for ; Mon, 16 Apr 2018 13:33:09 -0400 Received: from localhost by e06smtp10.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 16 Apr 2018 18:33:07 +0100 Received: from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198) by e06smtp10.uk.ibm.com (192.168.101.140) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Mon, 16 Apr 2018 18:33:06 +0100 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w3GHX5D943253938; Mon, 16 Apr 2018 17:33:06 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E1942AE057; Mon, 16 Apr 2018 18:22:56 +0100 (BST) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 662CDAE055; Mon, 16 Apr 2018 18:22:56 +0100 (BST) Received: from jupiter.in.ibm.com (unknown [9.102.1.147]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 16 Apr 2018 18:22:56 +0100 (BST) From: Mahesh J Salgaonkar To: skiboot list Date: Mon, 16 Apr 2018 23:03:04 +0530 In-Reply-To: <152389987405.2566.355149283827806637.stgit@jupiter.in.ibm.com> References: <152389987405.2566.355149283827806637.stgit@jupiter.in.ibm.com> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 18041617-0040-0000-0000-0000042FA0B4 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18041617-0041-0000-0000-00002633A695 Message-Id: <152389998444.2566.14745965086804941860.stgit@jupiter.in.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-04-16_09:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1804160155 Subject: [Skiboot] [PATCH v2 02/15] opal/hmi: Remove races in clearing HMER X-BeenThere: skiboot@lists.ozlabs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Mailing list for skiboot development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Skiboot" From: Benjamin Herrenschmidt Writing to HMER acts as an "AND". The current code writes back the value we originally read with the bits we handled cleared. This is racy, if a new bit gets set in HW after the original read, we'll end up clearing it without handling it. Instead, use an all 1's mask with only the bit handled cleared. Signed-off-by: Benjamin Herrenschmidt --- core/hmi.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/core/hmi.c b/core/hmi.c index 8c100ade5..f4cdbd57f 100644 --- a/core/hmi.c +++ b/core/hmi.c @@ -1139,7 +1139,7 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt) { struct cpu_thread *cpu = this_cpu(); int recover = 1; - uint64_t tfmr; + uint64_t tfmr, handled = 0; /* * In case of split core, some of the Timer facility errors need @@ -1174,7 +1174,7 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt) } } - hmer &= ~SPR_HMER_PROC_RECV_DONE; + handled |= SPR_HMER_PROC_RECV_DONE; if (hmi_evt) { hmi_evt->severity = OpalHMI_SEV_NO_ERROR; hmi_evt->type = OpalHMI_ERROR_PROC_RECOV_DONE; @@ -1182,7 +1182,7 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt) } } if (hmer & SPR_HMER_PROC_RECV_ERROR_MASKED) { - hmer &= ~SPR_HMER_PROC_RECV_ERROR_MASKED; + handled |= SPR_HMER_PROC_RECV_ERROR_MASKED; if (hmi_evt) { hmi_evt->severity = OpalHMI_SEV_NO_ERROR; hmi_evt->type = OpalHMI_ERROR_PROC_RECOV_MASKED; @@ -1191,7 +1191,7 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt) hmi_print_debug("Processor recovery Done (masked).", hmer); } if (hmer & SPR_HMER_PROC_RECV_AGAIN) { - hmer &= ~SPR_HMER_PROC_RECV_AGAIN; + handled |= SPR_HMER_PROC_RECV_AGAIN; if (hmi_evt) { hmi_evt->severity = OpalHMI_SEV_NO_ERROR; hmi_evt->type = OpalHMI_ERROR_PROC_RECOV_DONE_AGAIN; @@ -1202,7 +1202,7 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt) } /* Assert if we see malfunction alert, we can not continue. */ if (hmer & SPR_HMER_MALFUNCTION_ALERT) { - hmer &= ~SPR_HMER_MALFUNCTION_ALERT; + handled |= SPR_HMER_MALFUNCTION_ALERT; hmi_print_debug("Malfunction Alert", hmer); if (hmi_evt) @@ -1211,7 +1211,7 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt) /* Assert if we see Hypervisor resource error, we can not continue. */ if (hmer & SPR_HMER_HYP_RESOURCE_ERR) { - hmer &= ~SPR_HMER_HYP_RESOURCE_ERR; + handled |= SPR_HMER_HYP_RESOURCE_ERR; hmi_print_debug("Hypervisor resource error", hmer); recover = 0; @@ -1228,10 +1228,10 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt) */ if (hmer & SPR_HMER_TFAC_ERROR) { tfmr = mfspr(SPR_TFMR); /* save original TFMR */ + handled |= SPR_HMER_TFAC_ERROR; hmi_print_debug("Timer Facility Error", hmer); - hmer &= ~SPR_HMER_TFAC_ERROR; recover = chiptod_recover_tb_errors(); if (hmi_evt) { hmi_evt->severity = OpalHMI_SEV_ERROR_SYNC; @@ -1242,7 +1242,7 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt) } if (hmer & SPR_HMER_TFMR_PARITY_ERROR) { tfmr = mfspr(SPR_TFMR); /* save original TFMR */ - hmer &= ~SPR_HMER_TFMR_PARITY_ERROR; + handled |= SPR_HMER_TFMR_PARITY_ERROR; hmi_print_debug("TFMR parity Error", hmer); recover = chiptod_recover_tb_errors(); @@ -1259,9 +1259,11 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt) /* * HMER bits are sticky, once set to 1 they remain set to 1 until * they are set to 0. Reset the error source bit to 0, otherwise - * we keep getting HMI interrupt again and again. + * we keep getting HMI interrupt again and again. Writing to HMER + * acts as an AND, so we write mask of all 1's except for the bits + * we want to clear. */ - mtspr(SPR_HMER, hmer); + mtspr(SPR_HMER, ~handled); hmi_exit(); /* Set the TB state looking at TFMR register before we head out. */ cpu->tb_invalid = !(mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID);