From patchwork Sat May 19 04:35:53 2018
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 916756
From: Nicholas Piggin
To: linuxppc-dev@lists.ozlabs.org
Cc: Nicholas Piggin
Subject: [PATCH 2/3] powerpc: smp_send_stop do not offline stopped CPUs
Date: Sat, 19 May 2018 14:35:53 +1000
Message-Id: <20180519043554.26640-3-npiggin@gmail.com>
In-Reply-To: <20180519043554.26640-1-npiggin@gmail.com>
References: <20180519043554.26640-1-npiggin@gmail.com>
List-Id: Linux on PowerPC Developers Mail List

Marking CPUs stopped by smp_send_stop as offline can cause warnings
due to cross-CPU wakeups. This trace was noticed on a busy system
running a sysrq+c crash test, after the injected crash:

WARNING: CPU: 51 PID: 1546 at kernel/sched/core.c:1179 set_task_cpu+0x22c/0x240
CPU: 51 PID: 1546 Comm: kworker/u352:1 Tainted: G D
Workqueue: mlx5e mlx5e_update_stats_work [mlx5_core]
[...]
NIP [c00000000017c21c] set_task_cpu+0x22c/0x240
LR [c00000000017d580] try_to_wake_up+0x230/0x720
Call Trace:
[c000000001017700] runqueues+0x0/0xb00 (unreliable)
[c00000000017d580] try_to_wake_up+0x230/0x720
[c00000000015a214] insert_work+0x104/0x140
[c00000000015adb0] __queue_work+0x230/0x690
[c000003fc5007910] [c00000000015b26c] queue_work_on+0x5c/0x90
[c0080000135fc8f8] mlx5_cmd_exec+0x538/0xcb0 [mlx5_core]
[c008000013608fd0] mlx5_core_access_reg+0x140/0x1d0 [mlx5_core]
[c00800001362777c] mlx5e_update_pport_counters.constprop.59+0x6c/0x90 [mlx5_core]
[c008000013628868] mlx5e_update_ndo_stats+0x28/0x90 [mlx5_core]
[c008000013625558] mlx5e_update_stats_work+0x68/0xb0 [mlx5_core]
[c00000000015bcec] process_one_work+0x1bc/0x5f0
[c00000000015ecac] worker_thread+0xac/0x6b0
[c000000000168338] kthread+0x168/0x1b0
[c00000000000b628] ret_from_kernel_thread+0x5c/0xb4

This happens for two reasons. First, the CPU is not really offline in
the usual sense: processes and interrupts have not been migrated away.
Second, smp_send_stop does not happen atomically on all CPUs, so one
CPU can have marked itself offline while another CPU is still running
processes or interrupts that can affect the first CPU.

Fix this by just not marking the CPU as offline. It's more like frozen
in time, so offline does not really reflect its state properly anyway.
There should be nothing in the crash/panic path that walks online CPUs
and synchronously waits for them, so this change should not introduce
new hangs.
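
The ordering problem described above can be modelled in userspace. The
sketch below is illustrative only and is not kernel code: every name
prefixed with model_ is hypothetical, the online mask is a plain
integer, and it assumes the warning in set_task_cpu fires when the
wakeup target is not marked online.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the online mask: bit n set means CPU n is online. */
static unsigned int model_online_mask;

static bool model_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (model_online_mask >> cpu) & 1u;
}

/*
 * Model of a CPU handling the stop request. With mark_offline set it
 * clears its own online bit (the behaviour this patch removes);
 * otherwise it leaves the mask alone.
 */
static void model_stop_this_cpu(int cpu, bool mark_offline)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (mark_offline)
		model_online_mask &= ~(1u << cpu);
	/* The real CPU would now spin forever; the model just returns. */
}

/* Assumed check: waking a task onto a CPU that is not online warns. */
static bool model_wakeup_warns(int target_cpu)
{
	return !model_cpu_online(target_cpu);
}

/*
 * CPU 1 has already handled the stop request. CPU 0 has not yet, and
 * still wakes a task whose target is CPU 1.
 */
static bool model_race_warns(bool mark_offline)
{
	model_online_mask = (1u << MODEL_NR_CPUS) - 1;	/* all CPUs online */
	model_stop_this_cpu(1, mark_offline);
	return model_wakeup_warns(1);
}
```

In this model, model_race_warns(true) reports the warning and
model_race_warns(false) does not, because the stopped CPU is still seen
as online by the CPU that has not yet stopped.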

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/smp.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 9ca7148b5881..6d6cf14009cf 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -579,9 +579,6 @@ static void nmi_stop_this_cpu(struct pt_regs *regs)
 	nmi_ipi_busy_count--;
 	nmi_ipi_unlock();
 
-	/* Remove this CPU */
-	set_cpu_online(smp_processor_id(), false);
-
 	spin_begin();
 	while (1)
 		spin_cpu_relax();
@@ -596,9 +593,6 @@ void smp_send_stop(void)
 
 static void stop_this_cpu(void *dummy)
 {
-	/* Remove this CPU */
-	set_cpu_online(smp_processor_id(), false);
-
 	hard_irq_disable();
 	spin_begin();
 	while (1)