From patchwork Tue Jun 14 13:54:10 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Laurent Dufour
X-Patchwork-Id: 1643311
From: Laurent Dufour
To: mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org,
 nathanl@linux.ibm.com, haren@linux.vnet.ibm.com, npiggin@gmail.com
Subject: [PATCH v2 0/4] Extending NMI watchdog during LPM
Date: Tue, 14 Jun 2022 15:54:10 +0200
Message-Id: <20220614135414.37746-1-ldufour@linux.ibm.com>
X-Mailer: git-send-email 2.36.1
List-Id: Linux on PowerPC Developers Mail List
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org

When a partition is transferred, once it arrives at the destination node,
the partition is active but much of its memory must still be transferred
from the source node. How much depends on the activity in the partition,
but the more CPUs the partition has, the more memory is likely to remain to
be transferred. This adds latency when accessing pages that have not been
transferred yet, and for large partitions it often triggers the NMI
watchdog.

The NMI watchdog dumps the stack of the CPU at the point where it appears
to be stuck. In this case the dump does not bring much information, since
the stall can happen during any memory access made by the kernel. In
addition, the NMI interrupt mechanism is not safe and can trigger a system
dump if the interrupt is taken while MSR[RI]=0.

Depending on the LPAR size and load, it may be useful to extend the NMI
watchdog timer during a Live Partition Mobility (LPM) operation. This is
configurable through sysctl with the newly introduced, powerpc-specific
variable lpm_nmi_watchdog_factor. This value is the percentage added to
watchdog_thresh to set the NMI watchdog timeout during an LPM.

Changes in v2:
 - introduce a timer factor.

v1: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
https://lore.kernel.org/linuxppc-dev/20220601155315.35109-1-ldufour@linux.ibm.com/#r

Laurent Dufour (4):
  powerpc/mobility: Wait for memory transfer to complete
  watchdog: export watchdog_mutex and lockup_detector_reconfigure
  powerpc/watchdog: introduce a LPM factor
  pseries/mobility: Set NMI watchdog factor during LPM

 Documentation/admin-guide/sysctl/kernel.rst | 12 +++
 arch/powerpc/include/asm/nmi.h              |  2 +
 arch/powerpc/kernel/watchdog.c              | 22 ++++-
 arch/powerpc/platforms/pseries/mobility.c   | 90 ++++++++++++++++++++-
 include/linux/nmi.h                         |  3 +
 kernel/watchdog.c                           |  6 +-
 6 files changed, 129 insertions(+), 6 deletions(-)
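
For illustration only, the effect of the factor described in the cover
letter (a percentage added to the watchdog threshold) can be sketched in
plain C as below. This is a user-space sketch, not code from the patches:
the function name lpm_watchdog_timeout() and its parameter names are
hypothetical, and the use of integer arithmetic with truncation is an
assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the series.
 *
 * watchdog_thresh: base NMI watchdog threshold, in seconds.
 * factor_pct:      percentage added to the threshold during LPM
 *                  (the value the cover letter calls
 *                  lpm_nmi_watchdog_factor).
 *
 * Returns the extended timeout in seconds. Integer division truncates,
 * which is an assumption of this sketch.
 */
uint64_t lpm_watchdog_timeout(uint64_t watchdog_thresh, uint64_t factor_pct)
{
	return watchdog_thresh + (watchdog_thresh * factor_pct) / 100;
}
```

Under these assumptions, a threshold of 10 seconds with a factor of 200
gives a 30 second timeout during the LPM, and a factor of 0 leaves the
threshold unchanged.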