X-Patchwork-Id: 1902628
From: Bitao Hu
Date: Thu, 22 Feb 2024 17:34:17 +0800
Subject: [PATCHv9 0/3] *** Detect interrupt storm in softlockup ***
Message-Id: <20240222093420.13956-1-yaoma@linux.alibaba.com>
To: dianders@chromium.org, akpm@linux-foundation.org, liusong@linux.alibaba.com, tglx@linutronix.de, pmladek@suse.com, kernelfans@gmail.com, deller@gmx.de, npiggin@gmail.com, tsbogend@alpha.franken.de, James.Bottomley@HansenPartnership.com, jan.kiszka@siemens.com
Cc: yaoma@linux.alibaba.com, linuxppc-dev@lists.ozlabs.org,
  linux-kernel@vger.kernel.org, linux-parisc@vger.kernel.org, linux-mips@vger.kernel.org

Hi, guys.

I have implemented a low-overhead method for detecting interrupt storms in the softlockup detector. Please review it; all comments are welcome.

Changes from v8 to v9:
- Patch #1 remains unchanged.
- Per Thomas Gleixner's suggestion, split patch #2 into two patches: the interrupt infrastructure first, then the actual usage site in the watchdog code.

Changes from v7 to v8:
- Per Thomas Gleixner's suggestion, implement the statistics within the interrupt core code and provide sensible interfaces for the watchdog code.
- Patch #1 remains unchanged. Patch #2 has changed significantly based on Thomas's suggestions, so I have removed the Reviewed-by tags from Liu Song and Douglas on patch #2. Please review it again; all comments are welcome.

Changes from v6 to v7:
- Remove "READ_ONCE" in "start_counting_irqs".
- Replace the hard-coded 5 with the "NUM_SAMPLE_PERIODS" macro in "set_sample_period".
- Add blank lines to make the code easier to read.
- Remove the branch that processes IRQs where "counts_diff = 0".
- Add the Reviewed-by tags from Liu Song and Douglas.

Changes from v5 to v6:
- Run "./scripts/checkpatch.pl --strict" to catch a few extra style nits, and fix them.
- Squash patch #3 into patch #1, and wrap the help text to 80 columns.
- Sort the existing headers alphabetically in watchdog.c.
- Drop "softlockup_hardirq_cpus"; just read "hardirq_counts" and check whether it is non-NULL.
- Store "nr_irqs" in a local variable.
- Simplify the calculation of "cpu_diff".

Changes from v4 to v5:
- Rearrange variable placement to make the code look neater.

Changes from v3 to v4:
- Rename some variables and functions to make the code logic more readable.
- Move code around to avoid forward declarations.
- In tabulate_irq_count, just swap entries rather than using a double loop.
- Since nr_irqs can grow at runtime, add bounds-check logic.
- Add the SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob.

Changes from v2 to v3:
- Per Liu Song's suggestions: use an enum instead of macros for cpu_stats, shorten the name 'idx_to_stat' to 'stats', add 'get_16bit_precesion' instead of using right-shift operations, and use 'struct irq_counts'.
- Per the kernel test robot report, use '__this_cpu_read' and '__this_cpu_write' instead of accessing a per-CPU array directly, to avoid this warning:
  'sparse: incorrect type in initializer (different modifiers)'

Changes from v1 to v2:
- Per Douglas's suggestion, optimize the memory used by cpustats. With the maximum number of CPUs (8192), it now takes:
  2 * 8192 * 4 + 1 * 8192 * 5 * 4 + 1 * 8192 = 237,568 bytes.
- Per Liu Song's suggestion, refactor the code format and add necessary comments.
- Per Douglas's suggestion, use interrupt counts instead of interrupt time to determine the cause of a softlockup.
- Remove the cmdline parameter added in PATCHv1.

Bitao Hu (3):
  watchdog/softlockup: low-overhead detection of interrupt storm
  irq: use a struct for the kstat_irqs in the interrupt descriptor
  watchdog/softlockup: report the most frequent interrupts

 arch/mips/dec/setup.c                |   2 +-
 arch/parisc/kernel/smp.c             |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c |   2 +-
 include/linux/irqdesc.h              |   9 +-
 include/linux/kernel_stat.h          |   3 +
 kernel/irq/internals.h               |   2 +-
 kernel/irq/irqdesc.c                 |  34 ++++-
 kernel/irq/proc.c                    |   9 +-
 kernel/watchdog.c                    | 213 ++++++++++++++++++++++++++-
 lib/Kconfig.debug                    |  13 ++
 scripts/gdb/linux/interrupts.py      |   6 +-
 11 files changed, 268 insertions(+), 27 deletions(-)
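The ranking step that patch 3 ("report the most frequent interrupts") and the v3-to-v4 changelog describe (a single swap pass in tabulate_irq_count, plus bounds checking because nr_irqs can grow at runtime) can be sketched in userspace C roughly as follows. This is a minimal, hypothetical sketch, not the code from the series: the names struct irq_count, report_top_irqs and NUM_IRQ_REPORT, the value 5, and the use of plain arrays of per-IRQ counter snapshots in place of the kernel's per-descriptor kstat_irqs are all assumptions made for illustration.

```c
/*
 * Hypothetical userspace sketch: rank the IRQs that fired most often
 * during a sampling window. Names and NUM_IRQ_REPORT are illustrative
 * assumptions, not code taken from the patch series.
 */
#include <stddef.h>
#include <stdint.h>

#define NUM_IRQ_REPORT 5 /* how many top IRQs to report (assumed value) */

struct irq_count {
	int irq;         /* interrupt number, or -1 for an unused slot */
	uint32_t counts; /* interrupts seen during the window */
};

/*
 * Insert (irq, counts) into a table of 'rank' entries kept in descending
 * order. One pass suffices: whenever the carried entry beats the current
 * slot, swap them and keep carrying the displaced entry down the table.
 */
static void tabulate_irq_count(struct irq_count *top, int irq,
			       uint32_t counts, int rank)
{
	struct irq_count carry = { irq, counts };

	for (int i = 0; i < rank; i++) {
		if (carry.counts > top[i].counts) {
			struct irq_count tmp = top[i];

			top[i] = carry;
			carry = tmp;
		}
	}
}

/*
 * Compare per-IRQ counter snapshots taken at the start and end of a
 * window. The number of IRQs may have grown in between (nr_now >
 * nr_start), so indices past nr_start are bounds-checked and treated
 * as having started from zero.
 */
static void report_top_irqs(const uint32_t *start, size_t nr_start,
			    const uint32_t *now, size_t nr_now,
			    struct irq_count top[NUM_IRQ_REPORT])
{
	for (int i = 0; i < NUM_IRQ_REPORT; i++) {
		top[i].irq = -1;
		top[i].counts = 0;
	}

	for (size_t irq = 0; irq < nr_now; irq++) {
		uint32_t base = irq < nr_start ? start[irq] : 0;
		/* Unsigned subtraction stays correct if the counter wrapped once. */
		uint32_t diff = now[irq] - base;

		tabulate_irq_count(top, (int)irq, diff, NUM_IRQ_REPORT);
	}
}
```

Because the table stays sorted in descending order, carrying the displaced entry downward keeps it sorted in a single pass of at most NUM_IRQ_REPORT comparisons per IRQ, which is the point of replacing a double loop. The comparison is strict, so an IRQ with a zero delta never displaces an entry and ties keep their earlier position.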