From patchwork Tue May 1 02:22:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 906939 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40ZlbW26xbz9s0W for ; Tue, 1 May 2018 12:24:23 +1000 (AEST) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="srway3Fe"; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 40ZlbW0PgMzF2RB for ; Tue, 1 May 2018 12:24:23 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="srway3Fe"; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:400e:c00::241; helo=mail-pf0-x241.google.com; envelope-from=npiggin@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="srway3Fe"; dkim-atps=neutral Received: from mail-pf0-x241.google.com (mail-pf0-x241.google.com [IPv6:2607:f8b0:400e:c00::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 40ZlYJ58VjzDqyL for ; Tue, 1 May 2018 12:22:28 +1000 (AEST) Received: by mail-pf0-x241.google.com with SMTP id o76so8194023pfi.5 for ; Mon, 30 Apr 2018 19:22:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id; bh=/F1X4N6xRTr5CvG0QQ2JC2FgNQH+B1rs9fvpzb80ar4=; b=srway3FeBmac98FeJ5L1w4RvJH17Q3EjeaexCiHKBbSTCACs9ThAb5zU+92aGc/zra gwm2NnX/u6GeXsx2HeAiu3CdVY8aRVrXF2i8NvCZV1Z/Af7D2ihZUhbgyf8KY5vXkHck q1zRVOe6kqToCaZ9wXRm0tcrLyJk8k7vPZT4ioCJIs/DdLa0IjggJKcFMxjrfjbDMxyi LubGqU0lnegn/COzBtkp0bCXPIUn4J1nzzw8QskAy6sq4oUZQ+VQVRPJm0qbTZBKLRhx f4OZ7wdKnKBdnZlvzkNnLCoZHNa2YDR7RAm2OqLiDkQ3D1LVNhQOQj7IV+k4o2jyev+G orzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=/F1X4N6xRTr5CvG0QQ2JC2FgNQH+B1rs9fvpzb80ar4=; b=i7xZS7xbT4DvG9wC/NA8TRONVZYRUm3K2pcr72YQSRmU/7v8HG6xAppVJdKpn4Kl6f pVOLU5ZaAtog7O7Ldj21mc8GeiQ0q6GGM6WnFeFt74arE3BqrsdhQOXNglVlFambn47G ARJ7dLzkyfUh/doFfeC3sdP6t0Y81pVR5jgVRPxemfF61rpc6WEyGBeSD3JTIP2u7Z7D 1//RNGGJ4GEz0h2yalraWr6RyMgLsqgyJuHT9y+Y1AfJUKhiq3LICNTU5xhpIOvmHPTC 0JveWDK3DvTdV/OG9tqbQtvyRDC2UYRZdBBq336KdCt7I4B5YFQ4IwcqfocN8pZHV6ev W7hw== X-Gm-Message-State: ALQs6tAhH+eQUjTCHVG46/vrpkGngoxs4abfNVYsdCHBrj0wmDUKlEIw ibbDFrIleXuQLbGkU0oCnylbvA== X-Google-Smtp-Source: AB8JxZpwFsuukUfp5fWnjds4QP5tocu3xeptV+LCzj8VT9YmBH+dXLkxeGY8Plpj6k9hbV4nPZnK5w== X-Received: by 10.98.231.16 with SMTP id s16mr14047559pfh.227.1525141345866; Mon, 30 Apr 2018 19:22:25 -0700 (PDT) Received: from roar.au.ibm.com (59-102-70-78.tpgi.com.au. [59.102.70.78]) by smtp.gmail.com with ESMTPSA id k84sm7294326pfh.93.2018.04.30.19.22.22 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 30 Apr 2018 19:22:24 -0700 (PDT) From: Nicholas Piggin To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH] powerpc/watchdog: provide more data in watchdog messages Date: Tue, 1 May 2018 12:22:16 +1000 Message-Id: <20180501022216.1420-1-npiggin@gmail.com> X-Mailer: git-send-email 2.17.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Nicholas Piggin Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Provide timebase and timebase of last heartbeat in watchdog lockup messages. Also provide a stack trace of when a CPU becomes un-stuck, which can be useful -- it could be where irqs are re-enabled, so it may be the end of the critical section which is responsible for the latency. Signed-off-by: Nicholas Piggin --- A lockup + unstuck event now looks like this (with irqtrace enabled): watchdog: CPU 1 self-detected hard LOCKUP @ udelay+0x40/0x60 watchdog: CPU 1 TB:82611697355, last heartbeat TB:75431975757 Modules linked in: irq event stamp: 2250184 hardirqs last enabled at (2250183): [] _raw_spin_unlock_irqrestore+0x58/0xd0 hardirqs last disabled at (2250184): [] kernel_init+0x7c/0x1b0 softirqs last enabled at (2243178): [] __do_softirq+0x4d8/0x6f8 softirqs last disabled at (2243159): [] irq_exit+0x108/0x1a0 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc3-00003-g7e2de68fdf39 #319 NIP: c000000000026d40 LR: c00000000000e77c CTR: 0000000000000000 REGS: c000000fffcebd80 TRAP: 0900 Not tainted (4.17.0-rc3-00003-g7e2de68fdf39) MSR: 9000000002009033 CR: 44000824 XER: 20040000 CFAR: c000000000026d4c SOFTE: 1 GPR00: c00000000000e77c c000000006a07dc0 c000000001306000 00000002625a0000 GPR04: c00000000000e76c 0000000000000038 8f5c28f5c28f5c29 c000000ff0e18cb0 GPR08: c0000000011a6000 00000001802d8dcf 00000011bbdcdf4c 7bc4830ddfa301bf GPR12: 0000000000000000 c000000fffffe300 c00000000000e6f8 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 NIP [c000000000026d40] udelay+0x40/0x60 LR [c00000000000e77c] kernel_init+0x8c/0x1b0 Call Trace: [c000000006a07dc0] [c00000000000e76c] kernel_init+0x7c/0x1b0 (unreliable) [c000000006a07e30] [c00000000000b9b0] ret_from_kernel_thread+0x5c/0xac Instruction dump: 3d22ffe9 e929d400 7c6349d2 7c210b78 7d4c42a6 7d2c42a6 7d2a4850 7fa34840 409d0020 60000000 60000000 60000000 <7d2c42a6> 7d2a4850 7fa34840 419dfff4 time 86406258554 watchdog: CPU 1 became unstuck TB:86406275034 irq event stamp: 2250201 hardirqs last enabled at (2250200): [] _raw_spin_unlock_irq+0x48/0x80 hardirqs last disabled at (2250201): [] vprintk_emit+0x98/0x5c0 softirqs last enabled at (2243178): [] __do_softirq+0x4d8/0x6f8 softirqs last disabled at (2250191): [] irq_exit+0x108/0x1a0 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc3-00003-g7e2de68fdf39 #319 NIP: c00000000000abdc LR: c0000000000186b8 CTR: 000000003003e65c REGS: c000000006a07b10 TRAP: 0901 Not tainted (4.17.0-rc3-00003-g7e2de68fdf39) MSR: 9000000002009033 CR: 48000824 XER: 00000000 CFAR: c000000000026d4c SOFTE: 0 GPR00: c000000000018660 c000000006a07d90 c000000001306000 0000000000000900 GPR04: 0000000000000001 0000000000000038 8f5c28f5c28f5c29 c000000ff0e17970 GPR08: c000000001348d98 00000002002d8dee 00000011bbdcdf4c 7bc4830ddfa301bf GPR12: 0000000000000000 c000000fffffe300 NIP [c00000000000abdc] replay_interrupt_return+0x0/0x4 LR [c0000000000186b8] arch_local_irq_restore+0xd8/0xf0 Call Trace: [c000000006a07d90] [c000000000018660] arch_local_irq_restore+0x80/0xf0 (unreliable) [c000000006a07dc0] [c00000000000e7ac] kernel_init+0xbc/0x1b0 [c000000006a07e30] [c00000000000b9b0] ret_from_kernel_thread+0x5c/0xac Instruction dump: 7d200026 618c8000 2c030900 4182e518 2c030500 4182f140 2c030f00 4182f278 2c030a00 4182ff9c 2c030e60 4182eea8 <4e800020> 7c781b78 480003a5 480003bd arch/powerpc/kernel/watchdog.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index 6256dc3b0087..8a0f7d97f8d9 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -111,7 +111,11 @@ static inline void wd_smp_unlock(unsigned long *flags) static void wd_lockup_ipi(struct pt_regs *regs) { - pr_emerg("CPU %d Hard LOCKUP\n", raw_smp_processor_id()); + int cpu = raw_smp_processor_id(); + + pr_emerg("CPU %d Hard LOCKUP\n", cpu); + pr_emerg("CPU %d TB:%lld, last heartbeat TB:%lld\n", + cpu, get_tb(), per_cpu(wd_timer_tb, cpu)); print_modules(); print_irqtrace_events(current); if (regs) @@ -154,6 +158,8 @@ static void watchdog_smp_panic(int cpu, u64 tb) pr_emerg("CPU %d detected hard LOCKUP on other CPUs %*pbl\n", cpu, cpumask_pr_args(&wd_smp_cpus_pending)); + pr_emerg("CPU %d TB:%lld, last SMP heartbeat TB:%lld\n", + cpu, tb, wd_smp_last_reset_tb); if (!sysctl_hardlockup_all_cpu_backtrace) { /* @@ -194,10 +200,19 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb) { if (!cpumask_test_cpu(cpu, &wd_smp_cpus_pending)) { if (unlikely(cpumask_test_cpu(cpu, &wd_smp_cpus_stuck))) { + struct pt_regs *regs = get_irq_regs(); unsigned long flags; - pr_emerg("CPU %d became unstuck\n", cpu); wd_smp_lock(&flags); + + pr_emerg("CPU %d became unstuck TB:%lld\n", + cpu, tb); + print_irqtrace_events(current); + if (regs) + show_regs(regs); + else + dump_stack(); + cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck); wd_smp_unlock(&flags); } @@ -245,8 +260,6 @@ void soft_nmi_interrupt(struct pt_regs *regs) tb = get_tb(); if (tb - per_cpu(wd_timer_tb, cpu) >= wd_panic_timeout_tb) { - per_cpu(wd_timer_tb, cpu) = tb; - wd_smp_lock(&flags); if (cpumask_test_cpu(cpu, &wd_smp_cpus_stuck)) { wd_smp_unlock(&flags); @@ -254,7 +267,10 @@ void soft_nmi_interrupt(struct pt_regs *regs) } set_cpu_stuck(cpu, tb); - pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n", cpu, (void *)regs->nip); + pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n", + cpu, (void *)regs->nip); + pr_emerg("CPU %d TB:%lld, last heartbeat TB:%lld\n", + cpu, get_tb(), per_cpu(wd_timer_tb, cpu)); print_modules(); print_irqtrace_events(current); show_regs(regs);