From patchwork Fri Dec 11 05:31:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Vasant Hegde X-Patchwork-Id: 1414739 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CsfXv0lyNz9sWY for ; Fri, 11 Dec 2020 16:31:43 +1100 (AEDT) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linux.vnet.ibm.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=X8cXlxno; dkim-atps=neutral Received: from bilbo.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 4CsfXt3FX3zDqtp for ; Fri, 11 Dec 2020 16:31:42 +1100 (AEDT) X-Original-To: skiboot@lists.ozlabs.org Delivered-To: skiboot@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=none (no SPF record) smtp.mailfrom=linux.vnet.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=hegdevasant@linux.vnet.ibm.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=linux.vnet.ibm.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=X8cXlxno; dkim-atps=neutral Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4CsfXf4T8RzDqs2 for ; Fri, 11 Dec 2020 16:31:30 +1100 (AEDT) Received: from pps.filterd (m0187473.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 0BB5OKVU192303; Fri, 11 Dec 2020 00:31:19 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : mime-version : content-type : content-transfer-encoding; s=pp1; bh=XoKsLD9DFH9TMXeExkWdcxm+4LSL9D9WHFODBCOl9is=; b=X8cXlxnoy4OEiMeA9mJ58Jfz79NOPRfhFkBKn+qm1ShqSBs+Lij7T7x87qs0r1eD0odb YJToYgA2XhhRFj2hid/Uq0Wy4m8FeWeVYg60Py+v8cf7hB2szQSOYqHwDSUibFq9qpss 3NQu0AWdcr8K0T3hvS3yYiDB6agygkp+EPgkXLNeCXdBn2JIoWyAz2aOL/BS4bJuGSsL BJmWEuc3ZdQY53BCzr+m8Q+wogeAojocaKZ2ukh45/PhURKWEq5eLUOJhiQtc/NX4FLa D1d/QDeQlM2xr8uF1d9lxMG9OTxsvSamCksfkbaz3b1Rp7piWgemS4vyFqIEN34hk+Ls qQ== Received: from ppma06ams.nl.ibm.com (66.31.33a9.ip4.static.sl-reverse.com [169.51.49.102]) by mx0a-001b2d01.pphosted.com with ESMTP id 35c2kgg3a8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 11 Dec 2020 00:31:18 -0500 Received: from pps.filterd (ppma06ams.nl.ibm.com [127.0.0.1]) by ppma06ams.nl.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 0BB5ReOH012868; Fri, 11 Dec 2020 05:31:16 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma06ams.nl.ibm.com with ESMTP id 3581fhpmd4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 11 Dec 2020 05:31:16 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 0BB5VEEg33751532 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 11 Dec 2020 05:31:14 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EA284A4064; Fri, 11 Dec 2020 05:31:13 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 108D0A4060; Fri, 11 Dec 2020 05:31:13 +0000 (GMT) Received: from hegdevasant.in.ibm.com (unknown [9.199.47.197]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Fri, 11 Dec 2020 05:31:12 +0000 (GMT) From: Vasant Hegde To: skiboot@lists.ozlabs.org Date: Fri, 11 Dec 2020 11:01:06 +0530 Message-Id: <20201211053106.273472-1-hegdevasant@linux.vnet.ibm.com> X-Mailer: git-send-email 2.26.2 MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.343, 18.0.737 definitions=2020-12-11_01:2020-12-09, 2020-12-11 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 lowpriorityscore=0 clxscore=1015 mlxscore=0 mlxlogscore=999 bulkscore=0 suspectscore=1 malwarescore=0 phishscore=0 adultscore=0 priorityscore=1501 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012110028 Subject: [Skiboot] [PATCH] Fix possible deadlock with DEBUG build X-BeenThere: skiboot@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list for skiboot development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?C=C3=A9dric_Le_Goater?= Errors-To: skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Skiboot" Sample output from Cédric: ------------------------- [ 88.294111649,7] cpu_idle_p9 called on cpu 0x063c with pm disabled [ 88.289365222,7] cpu_idle_p9 called on cpu 0x025f with pm disabled [ 88.289900684,7] cpu_idle_p9 called on cpu 0x045f with pm disabled [ 88.302621295,7] CHIPTOD: Base TFMR=0x2512000000000000 [ 88.289899701,7] cpu_idle_p9 called on cpu 0x0456 with pm disabled LOCK ERROR: Deadlock detected @0x30402740 (state: 0x0000000400000001) [ 88.332264757,3] *********************************************** [ 88.332300051,3] < assert failed at core/lock.c:32 > [ 88.332328282,3] . [ 88.332347335,3] . [ 88.332364894,3] . [ 88.332377963,3] OO__) [ 88.332395458,3] <"__/ [ 88.332412628,3] ^ ^ [ 88.332450246,3] Fatal TRAP at 00000000300286a0 .lock_error+0x64 MSR 9000000000021002 [ 88.332501812,3] CFAR : 00000000300414f4 MSR : 9000000000021002 [ 88.332536539,3] SRR0 : 00000000300286a0 SRR1 : 9000000000021002 [ 88.332574644,3] HSRR0: 0000000030020024 HSRR1: 9000000000001000 [ 88.332610635,3] DSISR: 00000000 DAR : 0000000000000000 [ 88.332650628,3] LR : 0000000030028690 CTR : 00000000300f9fa0 [ 88.332684451,3] CR : 20002000 XER : 00000000 [ 88.332712767,3] GPR00: 0000000030028690 GPR16: 0000000032c98000 [ 88.332748046,3] GPR01: 0000000032c9b0a0 GPR17: 0000000000000000 [ 88.332784060,3] GPR02: 0000000030169d00 GPR18: 0000000000000000 [ 88.332822091,3] GPR03: 0000000032c9b310 GPR19: 0000000000000000 [ 88.332861357,3] GPR04: 0000000030041480 GPR20: 0000000000000000 [ 88.332897229,3] GPR05: 0000000000000000 GPR21: 0000000000000000 [ 88.332937051,3] GPR06: 0000000000000010 GPR22: 0000000000000000 [ 88.332968463,3] GPR07: 0000000000000000 GPR23: 0000000000000000 [ 88.333007333,3] GPR08: 000000000002cbb5 GPR24: 0000000000000000 [ 88.333041971,3] GPR09: 0000000000000000 GPR25: 0000000000000000 [ 88.333081073,3] GPR10: 0000000000000000 GPR26: 0000000000000003 [ 88.333114301,3] GPR11: 3839616263646566 GPR27: 0000000000000211 [ 88.333156040,3] GPR12: 0000000020002000 GPR28: 000000003042a134 [ 88.333189222,3] GPR13: 0000000000000000 GPR29: 0000000030402740 [ 88.333225638,3] GPR14: 0000000000000000 GPR30: 0000000000000001 [ 88.333259730,3] GPR15: 0000000000000000 GPR31: 0000000000000000 CPU 0211 Backtrace: S: 0000000032c9b3b0 R: 0000000030028690 .lock_error+0x54 S: 0000000032c9b440 R: 0000000030028828 .add_lock_request+0xd0 S: 0000000032c9b4f0 R: 0000000030028a9c .lock_caller+0x8c S: 0000000032c9b5a0 R: 0000000030021b30 .__mcount_stack_check+0x70 S: 0000000032c9b650 R: 00000000300fabb0 .list_check_node+0x1c S: 0000000032c9b6f0 R: 00000000300fac98 .list_check+0x38 S: 0000000032c9b790 R: 00000000300289bc .try_lock_caller+0xac S: 0000000032c9b830 R: 0000000030028ad8 .lock_caller+0xc8 S: 0000000032c9b8e0 R: 0000000030028d74 .lock_recursive_caller+0x54 S: 0000000032c9b980 R: 0000000030020cb8 .console_write+0x48 S: 0000000032c9ba30 R: 00000000300445a8 .vprlog+0xc8 S: 0000000032c9bc20 R: 0000000030044630 ._prlog+0x50 S: 0000000032c9bcb0 R: 0000000030029204 .cpu_idle_p9+0x74 S: 0000000032c9bd40 R: 0000000030029628 .cpu_idle_pm+0x4c S: 0000000032c9bde0 R: 0000000030023fe0 .__secondary_cpu_entry+0xa0 S: 0000000032c9be70 R: 0000000030024034 .secondary_cpu_entry+0x40 S: 0000000032c9bf00 R: 0000000030003290 secondary_wait+0x8c CPU 0x4: opal_run_pollers -> check_stacks -> takes stack_check_lock lock prlog -> console_write -> waits for con_lock CPU 0x211 cpu_idle_p9 -> prlog -> console_write -> Takes con_lock lock list_check_node -> tries to take stack_check_lock and hits deadlock. I think we don't need to hold `stack_check_lock` while printing backtraces. Instead it makes sense to hold backtrace lock (bt_lock) and print output. Reported-by: Cédric Le Goater Signed-off-by: Vasant Hegde Tested-by: Cédric Le Goater --- core/stack.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/core/stack.c b/core/stack.c index 05506f6b3..3edf98411 100644 --- a/core/stack.c +++ b/core/stack.c @@ -250,7 +250,7 @@ void check_stacks(void) unlock(&stack_check_lock); } if (lowest) { - lock(&stack_check_lock); + lock(&bt_lock); prlog(PR_NOTICE, "CPU %04x lowest stack mark %lld bytes left" " pc=%08llx token=%lld\n", lowest->pir, lowest->stack_bot_mark, lowest->stack_bot_pc, @@ -258,7 +258,7 @@ void check_stacks(void) backtrace_print(lowest->stack_bot_bt, &lowest->stack_bot_bt_metadata, NULL, NULL, true); - unlock(&stack_check_lock); + unlock(&bt_lock); } this_cpu()->in_mcount = false;