From patchwork Thu Oct 12 19:18:32 2017
X-Patchwork-Id: 825058
From: Mahesh J Salgaonkar
To: Stewart Smith, skiboot list
Date: Fri, 13 Oct 2017 00:48:32 +0530
Message-Id: <150783587481.992.16332072755377718653.stgit@jupiter.in.ibm.com>
Subject: [Skiboot] [PATCH] opal/cpu: Mark the core as bad while disabling threads of the core.
From: Mahesh Salgaonkar

If any core fails to sync its timebase (TB) during chipTOD initialization,
all the threads of that core are disabled. However, this does not make the
Linux kernel ignore that core's CPUs, and the kernel crashes while bringing
them up with the following backtrace:

[   38.883898] kexec_core: Starting new kernel
cpu 0x0: Vector: 300 (Data Access) at [c0000003f277b730]
    pc: c0000000001b9890: internal_create_group+0x30/0x304
    lr: c0000000001b9880: internal_create_group+0x20/0x304
    sp: c0000003f277b9b0
   msr: 900000000280b033
   dar: 40
 dsisr: 40000000
  current = 0xc0000003f9f41000
  paca    = 0xc00000000fe00000   softe: 0        irq_happened: 0x01
    pid   = 2572, comm = kexec
Linux version 4.13.2-openpower1 (jenkins@p89) (gcc version 6.4.0 (Buildroot 2017.08-00006-g319c6e1)) #1 SMP Wed Sep 20 05:42:11 UTC 2017
enter ? for help
[c0000003f277b9b0] c0000000008a8780 (unreliable)
[c0000003f277ba50] c00000000041c3ac topology_add_dev+0x2c/0x40
[c0000003f277ba70] c00000000006b078 cpuhp_invoke_callback+0x88/0x170
[c0000003f277bac0] c00000000006b22c cpuhp_up_callbacks+0x54/0xb8
[c0000003f277bb10] c00000000006bc68 cpu_up+0x11c/0x168
[c0000003f277bbc0] c00000000002f0e0 default_machine_kexec+0x1fc/0x274
[c0000003f277bc50] c00000000002e2d8 machine_kexec+0x50/0x58
[c0000003f277bc70] c0000000000de4e8 kernel_kexec+0x98/0xb4
[c0000003f277bce0] c00000000008b0f0 SyS_reboot+0x1c8/0x1f4
[c0000003f277be30] c00000000000b118 system_call+0x58/0x6c
--- Exception: c01 (System Call) at 00007fff7f775074
SP (7fffe6c7bf10) is in userspace
0:mon>

This patch fixes the issue by setting the core's device-tree "status"
property to "bad".
Signed-off-by: Mahesh Salgaonkar
---
 core/cpu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/core/cpu.c b/core/cpu.c
index 78565b5..be0e451 100644
--- a/core/cpu.c
+++ b/core/cpu.c
@@ -766,14 +766,24 @@ void cpu_remove_node(const struct cpu_thread *t)
 void cpu_disable_all_threads(struct cpu_thread *cpu)
 {
 	unsigned int i;
+	struct dt_property *p;
 
 	for (i = 0; i <= cpu_max_pir; i++) {
 		struct cpu_thread *t = &cpu_stacks[i].cpu;
 
 		if (t->primary == cpu->primary)
 			t->state = cpu_state_disabled;
+
 	}
 
+	/* Mark this core as bad so that the Linux kernel doesn't use this CPU. */
+	prlog(PR_DEBUG, "CPU: Mark CPU bad (PIR 0x%04x)...\n", cpu->pir);
+	p = __dt_find_property(cpu->node, "status");
+	if (p)
+		dt_del_property(cpu->node, p);
+
+	dt_add_property_string(cpu->node, "status", "bad");
+
 	/* XXX Do something to actually stop the core */
 }