From patchwork Fri Mar 22 14:58:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mahesh J Salgaonkar X-Patchwork-Id: 1061289 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 44QmxR31Cdz9sRj for ; Sat, 23 Mar 2019 01:58:19 +1100 (AEDT) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.vnet.ibm.com Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 44QmxQ4V7dzDqWw for ; Sat, 23 Mar 2019 01:58:18 +1100 (AEDT) X-Original-To: skiboot@lists.ozlabs.org Delivered-To: skiboot@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=none (mailfrom) smtp.mailfrom=linux.vnet.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=mahesh@linux.vnet.ibm.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.vnet.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 44QmxH6QmxzDqVr for ; Sat, 23 Mar 2019 01:58:11 +1100 (AEDT) Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x2MEnl6g147129 for ; Fri, 22 Mar 2019 10:58:09 -0400 Received: from e06smtp04.uk.ibm.com (e06smtp04.uk.ibm.com [195.75.94.100]) by mx0a-001b2d01.pphosted.com with ESMTP id 2rd19621ca-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 22 Mar 2019 10:58:08 -0400 Received: from localhost by e06smtp04.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 22 Mar 2019 14:57:57 -0000 Received: from b06cxnps4074.portsmouth.uk.ibm.com (9.149.109.196) by e06smtp04.uk.ibm.com (192.168.101.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Fri, 22 Mar 2019 14:57:54 -0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x2MEw2Wn42598622 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 22 Mar 2019 14:58:03 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D86E2A405C; Fri, 22 Mar 2019 14:58:02 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 67200A4054; Fri, 22 Mar 2019 14:58:02 +0000 (GMT) Received: from [192.168.0.24] (unknown [9.85.84.86]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Fri, 22 Mar 2019 14:58:02 +0000 (GMT) From: Mahesh J Salgaonkar To: skiboot list Date: Fri, 22 Mar 2019 20:28:00 +0530 User-Agent: StGit/unknown-version MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 19032214-0016-0000-0000-00000265FD6E X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19032214-0017-0000-0000-000032C1236E Message-Id: <155326667329.5850.3915679443209236584.stgit@jupiter> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-03-22_08:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1903220109 Subject: [Skiboot] [PATCH] Fix hang in pnv_platform_error_reboot path due to TOD failure. X-BeenThere: skiboot@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list for skiboot development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: skiboot stable list Errors-To: skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Skiboot" From: Mahesh Salgaonkar On TOD failure, with TB stuck, when linux heads down to pnv_platform_error_reboot() path due to unrecoverable hmi event, the panic cpu gets stuck in OPAL inside ipmi_queue_msg_sync(). At this time, rest all other cpus are in smp_handle_nmi_ipi() waiting for panic cpu to proceed. But with panic cpu stuck inside OPAL, linux never recovers/reboot. p0 c1 t0 NIA : 0x000000003001dd3c <.time_wait+0x64> CFAR : 0x000000003001dce4 <.time_wait+0xc> MSR : 0x9000000002803002 LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec> STACK: SP NIA 0x0000000031c236e0 0x0000000031c23760 (big-endian) 0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec> 0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c> 0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150> 0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc> 0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc> 0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc> 0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4> 0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0> 0x0000000031c23d20 0x00000000300051e4 0xc000001fea6e7870 0xc0000000000a9060 0xc000001fea6e78c0 0xc000000000030b84 0xc000001fea6e7960 0xc0000000000310b0 0xc000001fea6e7990 0xc0000000004792d4 0xc000001fea6e7ad0 0xc00000000018a570 0xc000001fea6e7b40 0xc000000000028e5c 0xc000001fea6e7b60 0xc0000000000a7168 0xc000001fea6e7bd0 0xc0000000000ac9b8 0xc000001fea6e7c80 0xc00000000012d6c8 0xc000001fea6e7d20 0xc00000000012da28 0xc000001fea6e7db0 0xc0000000001366f4 0xc000001fea6e7e20 0xc00000000000b65c This is because, there is a while loop towards the end of ipmi_queue_msg_sync() which keeps looping until "sync_msg" does not match with "msg". It loops over time_wait_ms() until exit condition is met. In normal scenario time_wait_ms() calls run pollers so that ipmi backend gets a chance to check ipmi response and set sync_msg to NULL. while (sync_msg == msg) time_wait_ms(10); But in the event when TB is in failed state time_wait_ms()->time_wait_poll() returns immediately without calling pollers and hence we end up looping forever. This patch fixes this hang by calling opal_run_pollers() in TB failed state as well. Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.") Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Mahesh Salgaonkar Reviewed-by: Vasant Hegde --- core/timebase.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/core/timebase.c b/core/timebase.c index 0ae11d253..b1a02a4a1 100644 --- a/core/timebase.c +++ b/core/timebase.c @@ -29,6 +29,16 @@ static void time_wait_poll(unsigned long duration) unsigned long period = msecs_to_tb(5); if (this_cpu()->tb_invalid) { + /* + * Run pollers to allow some backends to process response. + * + * In TOD failure case where TOD is unrecoverable, running + * pollers allows ipmi backend to deal with ipmi response + * from bmc and helps ipmi_queue_msg_sync() to get un-stuck. + * Thus it avoids linux kernel to hang during panic due to + * TOD failure. + */ + opal_run_pollers(); cpu_relax(); return; }