From patchwork Mon Sep 14 11:09:44 2015
X-Patchwork-Submitter: Mahesh J Salgaonkar
X-Patchwork-Id: 517367
From: Mahesh J Salgaonkar
To: skiboot list
Date: Mon, 14 Sep 2015 16:39:44 +0530
Message-ID: <20150914110944.11402.69473.stgit@mars.in.ibm.com>
Subject: [Skiboot] [PATCH 1/2] opal: Fix hang in time_wait* calls on HMI for TB errors.

From: Mahesh Salgaonkar

On TOD/TB errors, the timebase (TB) register stops (freezes) until HMI
error recovery brings TOD/TB back into the running state. However, while
HMI recovery is in progress, some code paths may invoke time_wait*()
calls, which depend on a running TB value. If TB is not moving, the
time_wait*() calls keep looping, resulting in a hang on that CPU.
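The hang can be illustrated with a small, self-contained C simulation (a sketch, not skiboot code). Here `mftb()` is a hypothetical mock that advances a counter on each read unless `tb_frozen` is set, and `tb_compare()` is a local, wraparound-safe stand-in for the comparison helper the loop uses. Because a truly frozen TB would spin forever, the hypothetical `naive_wait()` adds a `max_spins` cap that the real loop does not have, so the hang shows up as reaching the cap.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for a TB comparison; signed difference handles wraparound. */
enum tb_cmpval { TB_ABEFOREB = -1, TB_AEQUALB = 0, TB_AAFTERB = 1 };

static enum tb_cmpval tb_compare(unsigned long a, unsigned long b)
{
	if (a == b)
		return TB_AEQUALB;
	return ((long)(b - a)) > 0 ? TB_ABEFOREB : TB_AAFTERB;
}

/* Hypothetical mock timebase: advances one tick per read unless frozen. */
static unsigned long mock_tb;
static bool tb_frozen;

static unsigned long mftb(void)
{
	if (!tb_frozen)
		mock_tb++;
	return mock_tb;
}

/*
 * Same loop shape as the unpatched time_wait_nopoll(), plus a max_spins
 * cap so the simulation terminates. Returns the number of loop iterations.
 */
static unsigned long naive_wait(unsigned long duration, unsigned long max_spins)
{
	unsigned long end = mftb() + duration;
	unsigned long spins = 0;

	assert(max_spins > 0);	/* a zero cap would never trigger */
	while (tb_compare(mftb(), end) != TB_AAFTERB) {
		if (++spins == max_spins)
			break;	/* cap reached: the real loop has no such exit */
	}
	return spins;
}
```

With a running TB, `naive_wait(10, 1000)` returns 10; after setting `tb_frozen = true`, the same call only stops when it reaches the cap and returns 1000.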
On OpenPower systems we are seeing system hangs on TOD/TB errors. The
hang occurs inside the OPAL HMI handler while it invokes
prlog()/perror(). On OpenPower systems, prlog()/perror() depend on the
LPC UART console driver to flush log messages to the console, and the
UART read/write calls invoke time_wait_nopoll() inside the
opb_[read|write]() functions. When TB is stopped, this makes the
prlog()/perror() calls hang.

This patch fixes the issue by adding a per-CPU tb_invalid flag, which
the HMI handler sets from the TB_VALID bit of the TFMR register on entry
and again before exit, and by modifying time_wait_[no]poll() to check
this flag and return immediately when TB is invalid.

Signed-off-by: Mahesh Salgaonkar
---
 core/hmi.c      |    8 ++++++++
 core/timebase.c |   10 ++++++++++
 include/cpu.h   |    1 +
 3 files changed, 19 insertions(+)

diff --git a/core/hmi.c b/core/hmi.c
index cbd35e6..f4453c5 100644
--- a/core/hmi.c
+++ b/core/hmi.c
@@ -610,6 +610,12 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt)
 	pre_recovery_cleanup();
 
 	lock(&hmi_lock);
+	/*
+	 * Not all HMIs would move TB into invalid state. Set the TB state
+	 * looking at TFMR register. TFMR will tell us correct state of
+	 * TB register.
+	 */
+	this_cpu()->tb_invalid = !(mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID);
 	printf("HMI: Received HMI interrupt: HMER = 0x%016llx\n", hmer);
 	if (hmi_evt)
 		hmi_evt->hmer = hmer;
@@ -697,6 +703,8 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt)
 	 */
 	mtspr(SPR_HMER, hmer);
 	hmi_exit();
+	/* Set the TB state looking at TFMR register before we head out. */
+	this_cpu()->tb_invalid = !(mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID);
 	unlock(&hmi_lock);
 	return recover;
 }
diff --git a/core/timebase.c b/core/timebase.c
index b1d8196..4fcfae5 100644
--- a/core/timebase.c
+++ b/core/timebase.c
@@ -25,6 +25,11 @@ static void time_wait_poll(unsigned long duration)
 	unsigned long end = mftb() + duration;
 	unsigned long period = msecs_to_tb(5);
 
+	if (this_cpu()->tb_invalid) {
+		cpu_relax();
+		return;
+	}
+
 	while (tb_compare(mftb(), end) != TB_AAFTERB) {
 		/* Call pollers periodically but not continually to avoid
 		 * bouncing cachelines due to lock contention. */
@@ -57,6 +62,11 @@ void time_wait_nopoll(unsigned long duration)
 {
 	unsigned long end = mftb() + duration;
 
+	if (this_cpu()->tb_invalid) {
+		cpu_relax();
+		return;
+	}
+
 	while(tb_compare(mftb(), end) != TB_AAFTERB)
 		cpu_relax();
 }
diff --git a/include/cpu.h b/include/cpu.h
index d2c1825..03a51f9 100644
--- a/include/cpu.h
+++ b/include/cpu.h
@@ -85,6 +85,7 @@ struct cpu_thread {
 	uint32_t			*core_hmi_state_ptr;
 	/* Mask to indicate thread id in core. */
 	uint8_t				thread_mask;
+	bool				tb_invalid;
 };
 
 /* This global is set to 1 to allow secondaries to callin,
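As a rough illustration of the fix, the following sketch models the per-CPU `tb_invalid` flag outside of skiboot. Everything in it is a hypothetical stand-in: `struct mock_cpu` for `struct cpu_thread`, `mock_tfmr` for `mfspr(SPR_TFMR)`, `MOCK_TFMR_TB_VALID` (an arbitrary bit chosen for the sketch, not the real `SPR_TFMR_TB_VALID` value), and a counter for `mftb()`. It mirrors the refresh-on-entry and refresh-before-exit pattern in `handle_hmi_exception()` and the early return added to `time_wait_nopoll()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical TB-valid bit for the mocked TFMR; not the real SPR_TFMR_TB_VALID. */
#define MOCK_TFMR_TB_VALID	(1ULL << 20)

/* Hypothetical stand-in for struct cpu_thread, reduced to the new field. */
struct mock_cpu {
	bool tb_invalid;
};

static struct mock_cpu cpu0;
static uint64_t mock_tfmr;	/* stand-in for mfspr(SPR_TFMR) */
static unsigned long mock_tb;	/* stand-in for the timebase */
static bool tb_frozen;

static unsigned long mftb(void)
{
	if (!tb_frozen)
		mock_tb++;
	return mock_tb;
}

/* Same expression the patch evaluates on HMI entry and before exit. */
static void refresh_tb_state(struct mock_cpu *cpu)
{
	cpu->tb_invalid = !(mock_tfmr & MOCK_TFMR_TB_VALID);
}

/* Shape of the patched time_wait_nopoll(); returns iterations spun. */
static unsigned long patched_wait(struct mock_cpu *cpu, unsigned long duration)
{
	unsigned long end = mftb() + duration;
	unsigned long spins = 0;

	if (cpu->tb_invalid)
		return 0;	/* the patch does one cpu_relax() here, then returns */

	while ((long)(mftb() - end) <= 0)	/* spin until TB passes end */
		spins++;
	return spins;
}

/* Hypothetical HMI flow: TB may be invalid on entry, recovered by exit. */
static void mock_handle_hmi(struct mock_cpu *cpu)
{
	unsigned long spins;

	refresh_tb_state(cpu);		/* entry: read TB state from TFMR */
	spins = patched_wait(cpu, 10);	/* e.g. a UART write while logging */
	assert(!cpu->tb_invalid || spins == 0);

	mock_tfmr |= MOCK_TFMR_TB_VALID;	/* pretend recovery restarted TB */
	tb_frozen = false;
	refresh_tb_state(cpu);		/* exit: refresh before returning */
}
```

For example, with `mock_tfmr = 0` and `tb_frozen = true`, calling `mock_handle_hmi(&cpu0)` returns instead of spinning and leaves `cpu0.tb_invalid` false once the mocked recovery has set the valid bit again. Note that while the flag is set, the requested delay is skipped entirely, so callers such as the UART path get no real wait during recovery.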