[{"id":2005788,"web_url":"http://patchwork.ozlabs.org/comment/2005788/","msgid":"<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>","date":"2018-10-08T15:39:11","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":5234,"url":"http://patchwork.ozlabs.org/api/people/5234/","name":"Christophe Leroy","email":"christophe.leroy@c-s.fr"},"content":"Hi Nick,\n\nOn 19/07/2017 at 08:59, Nicholas Piggin wrote:\n> Use nmi_enter similarly to system reset interrupts. This uses NMI\n> printk NMI buffers and turns off various debugging facilities that\n> helps avoid tripping on ourselves or other CPUs.\n> \n> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n> ---\n>   arch/powerpc/kernel/traps.c | 9 ++++++---\n>   1 file changed, 6 insertions(+), 3 deletions(-)\n> \n> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n> index 2849c4f50324..6d31f9d7c333 100644\n> --- a/arch/powerpc/kernel/traps.c\n> +++ b/arch/powerpc/kernel/traps.c\n> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n>   \n>   void machine_check_exception(struct pt_regs *regs)\n>   {\n> -\tenum ctx_state prev_state = exception_enter();\n>   \tint recover = 0;\n> +\tbool nested = in_nmi();\n> +\tif (!nested)\n> +\t\tnmi_enter();\n\nThis alters preempt_count, so when die() is called\nin_interrupt() returns true although the trap didn't happen in \ninterrupt, so oops_end() panics for \"fatal exception in interrupt\" \ninstead of gently sending SIGBUS to the faulting app.\n\nAny idea on how to fix this?\n\nChristophe\n\n>   \n>   \t__this_cpu_inc(irq_stat.mce_exceptions);\n>   \n> @@ -820,10 +822,11 @@ void machine_check_exception(struct pt_regs *regs)\n>   \n>   \t/* Must die if the interrupt is not recoverable */\n>   \tif (!(regs->msr & MSR_RI))\n> -\t\tpanic(\"Unrecoverable Machine check\");\n> +\t\tnmi_panic(regs, \"Unrecoverable Machine check\");\n>   \n>   bail:\n> -\texception_exit(prev_state);\n> +\tif 
(!nested)\n> +\t\tnmi_exit();\n>   }\n>   \n>   void SMIException(struct pt_regs *regs)\n>","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42TQY86fmDz9sB7\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 03:19:24 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42TQY85LGqzF3C1\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 03:19:24 +1100 (AEDT)","from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42TQWR4s2QzF3BS\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue,  9 Oct 2018 03:17:55 +1100 (AEDT)","from localhost (mailhub1-int [192.168.12.234])\n\tby localhost (Postfix) with ESMTP id 42TQW95s7qz9ttBZ;\n\tMon,  8 Oct 2018 18:17:41 +0200 (CEST)","from pegase1.c-s.fr ([192.168.12.234])\n\tby localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new,\n\tport 10024)\n\twith ESMTP id jzJwmQuxMRkR; Mon,  8 Oct 2018 18:17:41 +0200 (CEST)","from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192])\n\tby pegase1.c-s.fr (Postfix) with ESMTP id 42TQW95Kfpz9ttBY;\n\tMon,  8 Oct 2018 18:17:41 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id BD8178B7D0;\n\tMon,  8 Oct 2018 18:17:51 +0200 (CEST)","from messagerie.si.c-s.fr ([127.0.0.1])\n\tby localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,\n\tport 10023)\n\twith ESMTP id 
IysNyAM1mikM; Mon,  8 Oct 2018 18:17:51 +0200 (CEST)","from PO15451 (unknown [192.168.232.3])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id D1FAC8B7BD;\n\tMon,  8 Oct 2018 18:17:50 +0200 (CEST)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=c-s.fr\n\t(client-ip=93.17.236.30; helo=pegase1.c-s.fr;\n\tenvelope-from=christophe.leroy@c-s.fr; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr"],"X-Virus-Scanned":["Debian amavisd-new at c-s.fr","amavisd-new at c-s.fr"],"Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","To":"Nicholas Piggin <npiggin@gmail.com>, linuxppc-dev@lists.ozlabs.org","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>","From":"Christophe LEROY <christophe.leroy@c-s.fr>","Message-ID":"<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>","Date":"Mon, 8 Oct 2018 17:39:11 +0200","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.9.1","MIME-Version":"1.0","In-Reply-To":"<20170719065912.19183-4-npiggin@gmail.com>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Language":"fr","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2006163,"web_url":"http://patchwork.ozlabs.org/comment/2006163/","msgid":"<20181009143241.026f3e7f@roar.ozlabs.ibm.com>","date":"2018-10-09T04:32:41","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":69518,"url":"http://patchwork.ozlabs.org/api/people/69518/","name":"Nicholas Piggin","email":"npiggin@gmail.com"},"content":"On Mon, 8 Oct 2018 17:39:11 +0200\nChristophe LEROY <christophe.leroy@c-s.fr> wrote:\n\n> Hi Nick,\n> \n> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :\n> > Use nmi_enter similarly to system reset interrupts. 
This uses NMI\n> > printk NMI buffers and turns off various debugging facilities that\n> > helps avoid tripping on ourselves or other CPUs.\n> > \n> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n> > ---\n> >   arch/powerpc/kernel/traps.c | 9 ++++++---\n> >   1 file changed, 6 insertions(+), 3 deletions(-)\n> > \n> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n> > index 2849c4f50324..6d31f9d7c333 100644\n> > --- a/arch/powerpc/kernel/traps.c\n> > +++ b/arch/powerpc/kernel/traps.c\n> > @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n> >   \n> >   void machine_check_exception(struct pt_regs *regs)\n> >   {\n> > -\tenum ctx_state prev_state = exception_enter();\n> >   \tint recover = 0;\n> > +\tbool nested = in_nmi();\n> > +\tif (!nested)\n> > +\t\tnmi_enter();  \n> \n> This alters preempt_count, then when die() is called\n> in_interrupt() returns true allthough the trap didn't happen in \n> interrupt, so oops_end() panics for \"fatal exception in interrupt\" \n> instead of gently sending SIGBUS the faulting app.\n\nThanks for tracking that down.\n\n> Any idea on how to fix this ?\n\nI would say we have to deliver the sigbus by hand.\n\n    if ((user_mode(regs)))\n        _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n    else\n        die(\"Machine check\", regs, SIGBUS);","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42TksM4zgHz9s55\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 15:34:31 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org 
[IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42TksM3TzrzF3Mx\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 15:34:31 +1100 (AEDT)","from mail-pl1-x643.google.com (mail-pl1-x643.google.com\n\t[IPv6:2607:f8b0:4864:20::643])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42TkqQ2Hl9zF39C\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue,  9 Oct 2018 15:32:49 +1100 (AEDT)","by mail-pl1-x643.google.com with SMTP id c8-v6so154671plo.9\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tMon, 08 Oct 2018 21:32:49 -0700 (PDT)","from roar.ozlabs.ibm.com (60-240-121-136.tpgi.com.au.\n\t[60.240.121.136]) by smtp.gmail.com with ESMTPSA id\n\tr18-v6sm16618391pgv.17.2018.10.08.21.32.45\n\t(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);\n\tMon, 08 Oct 2018 21:32:47 -0700 (PDT)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=fail (p=none dis=none) header.from=gmail.com","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"WkK6NVqI\"; dkim-atps=neutral","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"WkK6NVqI\"; dkim-atps=neutral","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:4864:20::643; helo=mail-pl1-x643.google.com;\n\tenvelope-from=npiggin@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"WkK6NVqI\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20161025;\n\th=date:from:to:cc:subject:message-id:in-reply-to:references\n\t:mime-version:content-transfer-encoding;\n\tbh=yhpEXwVV+gBeM7/D6qo6xsBYADreV2L5BcJ/l365BbQ=;\n\tb=WkK6NVqImTlyp5pp5Q+sw6B24/8ZY6PMDsY0ircgrRYnA7TUobS0wwmPNk9gvSH3UL\n\t1kvYYByUQ94WYiwr5+sEMhHtcPlqgHnIZ2ZlWkyZ6wtoTwOkKBo6MjAngN6yJYliHV5D\n\taO9D/Jjj4oxi1GmmceaPxEH9BUzUryHiYA1Y773LFlYK8LyuR+f7oG/X9ZS54lqpMyUA\n\tooyJIBzlZwTzM8GBqEEIpY2j+qIQsSpqSYc3PID7taYTkTJlRo06jmeht3dTRy8nMpa4\n\tL/tvMJLpifTEH0tIS5Cl7QOWDD2KSBAgJCAZcrdSLaFp9duMOZsRa+hGhDbZWYPOdhil\n\tvOZg==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to\n\t:references:mime-version:content-transfer-encoding;\n\tbh=yhpEXwVV+gBeM7/D6qo6xsBYADreV2L5BcJ/l365BbQ=;\n\tb=uho7yHIkWRST93Jub3iwHSCkhOvL08Fxmbt1LSypE4ELDadl9mIQcqG/zV72cMCNce\n\tROZR1xWmqYIqec0bECE31JqU200eMZDoeOlvzYJW/Y6f5x+xdzXz4gHo5j59JcUdZjqt\n\tKpz3R25tpI+wd29zFfKQRjgwYAJhJaxpKjpT2XLAMbYZnelHFgN2r7+NV2ze8v615SqB\n\t5NEJDIxACDHGo4sxI9H07P5CDXNHV2JrjExkzVoi0AfmJ2oZMtsXCH51VaT581/so5pw\n\tm8gwReCxEEzlC373uaBeGplXxAzPSYRr5JjAO7Yho0gf3JC3GJHFvIf43d9w61RykwXe\n\tFX9g==","X-Gm-Message-State":"ABuFfoja8TsgaHyu/wj48b5DxQdlx8vTZmzVzdpCMSy03ROiV0VXpe/E\n\tFeiO5DcLelhHG+y03KRqzWc=","X-Google-Smtp-Source":"ACcGV63xcQWHBT5Ds3Uz7H97cSqVsdj69Cn8c1hka2fOOI6qQuvTZq744N4WM0FVMT2Pgs5cLkr3eg==","X-Received":"by 2002:a17:902:b696:: with SMTP id\n\tc22-v6mr11409892pls.37.1539059567944; \n\tMon, 08 Oct 2018 21:32:47 -0700 (PDT)","Date":"Tue, 9 Oct 2018 14:32:41 +1000","From":"Nicholas Piggin <npiggin@gmail.com>","To":"Christophe LEROY <christophe.leroy@c-s.fr>","Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable 
interrupt","Message-ID":"<20181009143241.026f3e7f@roar.ozlabs.ibm.com>","In-Reply-To":"<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>","X-Mailer":"Claws Mail 3.17.0 (GTK+ 2.24.32; x86_64-pc-linux-gnu)","MIME-Version":"1.0","Content-Type":"text/plain; charset=UTF-8","Content-Transfer-Encoding":"quoted-printable","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2006166,"web_url":"http://patchwork.ozlabs.org/comment/2006166/","msgid":"<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>","date":"2018-10-09T04:46:30","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":5234,"url":"http://patchwork.ozlabs.org/api/people/5234/","name":"Christophe Leroy","email":"christophe.leroy@c-s.fr"},"content":"Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :\n> On Mon, 8 Oct 2018 17:39:11 +0200\n> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n> \n>> Hi Nick,\n>>\n>> Le 19/07/2017 à 08:59, Nicholas Piggin 
a écrit :\n>>> Use nmi_enter similarly to system reset interrupts. This uses NMI\n>>> printk NMI buffers and turns off various debugging facilities that\n>>> helps avoid tripping on ourselves or other CPUs.\n>>>\n>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n>>> ---\n>>>    arch/powerpc/kernel/traps.c | 9 ++++++---\n>>>    1 file changed, 6 insertions(+), 3 deletions(-)\n>>>\n>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n>>> index 2849c4f50324..6d31f9d7c333 100644\n>>> --- a/arch/powerpc/kernel/traps.c\n>>> +++ b/arch/powerpc/kernel/traps.c\n>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n>>>    \n>>>    void machine_check_exception(struct pt_regs *regs)\n>>>    {\n>>> -\tenum ctx_state prev_state = exception_enter();\n>>>    \tint recover = 0;\n>>> +\tbool nested = in_nmi();\n>>> +\tif (!nested)\n>>> +\t\tnmi_enter();\n>>\n>> This alters preempt_count, then when die() is called\n>> in_interrupt() returns true allthough the trap didn't happen in\n>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n>> instead of gently sending SIGBUS the faulting app.\n> \n> Thanks for tracking that down.\n> \n>> Any idea on how to fix this ?\n> \n> I would say we have to deliver the sigbus by hand.\n> \n>      if ((user_mode(regs)))\n>          _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n>      else\n>          die(\"Machine check\", regs, SIGBUS);\n> \n\nAnd what about all the other things done by 'die()'?\n\nAnd what if it is a kernel thread?\n\nOn one of my boards, I have a kernel thread regularly checking the HW, \nand if it gets a machine check I expect it to gently stop and the die \nnotification to be delivered to all registered notifiers.\n\nBefore this patch, it was working 
well.\n\nChristophe","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42Tl8s1PKjz9s8T\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 15:47:57 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42Tl8r73j4zF3CT\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 15:47:56 +1100 (AEDT)","from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42Tl7J2rs9zF3CP\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue,  9 Oct 2018 15:46:35 +1100 (AEDT)","from localhost (mailhub1-int [192.168.12.234])\n\tby localhost (Postfix) with ESMTP id 42Tl712hhDz9ttBx;\n\tTue,  9 Oct 2018 06:46:21 +0200 (CEST)","from pegase1.c-s.fr ([192.168.12.234])\n\tby localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new,\n\tport 10024)\n\twith ESMTP id fL7PvXFbq8vy; Tue,  9 Oct 2018 06:46:21 +0200 (CEST)","from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192])\n\tby pegase1.c-s.fr (Postfix) with ESMTP id 42Tl7120Rkz9ttBR;\n\tTue,  9 Oct 2018 06:46:21 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 1B1FE8B775;\n\tTue,  9 Oct 2018 06:46:31 +0200 (CEST)","from messagerie.si.c-s.fr ([127.0.0.1])\n\tby localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,\n\tport 10023)\n\twith ESMTP id oyboibtDw4Se; Tue,  9 Oct 2018 06:46:31 +0200 (CEST)","from PO15451 (unknown 
[192.168.232.3])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 4F9E68B75B;\n\tTue,  9 Oct 2018 06:46:31 +0200 (CEST)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=c-s.fr\n\t(client-ip=93.17.236.30; helo=pegase1.c-s.fr;\n\tenvelope-from=christophe.leroy@c-s.fr; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr"],"X-Virus-Scanned":["Debian amavisd-new at c-s.fr","amavisd-new at c-s.fr"],"Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","To":"Nicholas Piggin <npiggin@gmail.com>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>","From":"Christophe LEROY <christophe.leroy@c-s.fr>","Message-ID":"<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>","Date":"Tue, 9 Oct 2018 06:46:30 +0200","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.9.1","MIME-Version":"1.0","In-Reply-To":"<20181009143241.026f3e7f@roar.ozlabs.ibm.com>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Language":"fr","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2006180,"web_url":"http://patchwork.ozlabs.org/comment/2006180/","msgid":"<20181009153058.2564e7a1@roar.ozlabs.ibm.com>","date":"2018-10-09T05:30:58","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":69518,"url":"http://patchwork.ozlabs.org/api/people/69518/","name":"Nicholas Piggin","email":"npiggin@gmail.com"},"content":"On Tue, 9 Oct 2018 06:46:30 +0200\nChristophe LEROY <christophe.leroy@c-s.fr> wrote:\n\n> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :\n> > On Mon, 8 Oct 2018 17:39:11 +0200\n> > Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n> >   \n> >> Hi Nick,\n> >>\n> >> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :  \n> >>> Use nmi_enter similarly to system reset interrupts. 
This uses NMI\n> >>> printk NMI buffers and turns off various debugging facilities that\n> >>> helps avoid tripping on ourselves or other CPUs.\n> >>>\n> >>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n> >>> ---\n> >>>    arch/powerpc/kernel/traps.c | 9 ++++++---\n> >>>    1 file changed, 6 insertions(+), 3 deletions(-)\n> >>>\n> >>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n> >>> index 2849c4f50324..6d31f9d7c333 100644\n> >>> --- a/arch/powerpc/kernel/traps.c\n> >>> +++ b/arch/powerpc/kernel/traps.c\n> >>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n> >>>    \n> >>>    void machine_check_exception(struct pt_regs *regs)\n> >>>    {\n> >>> -\tenum ctx_state prev_state = exception_enter();\n> >>>    \tint recover = 0;\n> >>> +\tbool nested = in_nmi();\n> >>> +\tif (!nested)\n> >>> +\t\tnmi_enter();  \n> >>\n> >> This alters preempt_count, then when die() is called\n> >> in_interrupt() returns true allthough the trap didn't happen in\n> >> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n> >> instead of gently sending SIGBUS the faulting app.  \n> > \n> > Thanks for tracking that down.\n> >   \n> >> Any idea on how to fix this ?  \n> > \n> > I would say we have to deliver the sigbus by hand.\n> > \n> >      if ((user_mode(regs)))\n> >          _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n> >      else\n> >          die(\"Machine check\", regs, SIGBUS);\n> >   \n> \n> And what about all the other things done by 'die()' ?\n> \n> And what if it is a kernel thread ?\n> \n> In one of my boards, I have a kernel thread regularly checking the HW, \n> and if it gets a machine check I expect it to gently stop and the die \n> notification to be delivered to all registered notifiers.\n> \n> Until before this patch, it was working well.\n\nI guess the alternative is we could check regs->trap for machine\ncheck in the die test. 
Complication is having to account for MCE\nin an interrupt handler.\n\n       if (in_interrupt()) {\n                if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))\n                    panic(\"Fatal exception in interrupt\");\n       }\n\nSomething like that might work for you? We needs a ppc64 macro for the\nMCE, and can probably add something like in_nmi_from_interrupt() for\nthe second part of the test.\n\nThanks,\nNick","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42Tm8v0SSFz9s89\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 16:33:03 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42Tm8t5z7hzF3Nh\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 16:33:02 +1100 (AEDT)","from mail-pf1-x442.google.com (mail-pf1-x442.google.com\n\t[IPv6:2607:f8b0:4864:20::442])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42Tm6f1JctzF3M2\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue,  9 Oct 2018 16:31:06 +1100 (AEDT)","by mail-pf1-x442.google.com with SMTP id r64-v6so220976pfb.13\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tMon, 08 Oct 2018 22:31:05 -0700 (PDT)","from roar.ozlabs.ibm.com (60-240-121-136.tpgi.com.au.\n\t[60.240.121.136]) by smtp.gmail.com with ESMTPSA id\n\tx69-v6sm26733261pff.175.2018.10.08.22.31.01\n\t(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);\n\tMon, 08 Oct 2018 
22:31:03 -0700 (PDT)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=fail (p=none dis=none) header.from=gmail.com","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"MFimOmF9\"; dkim-atps=neutral","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"MFimOmF9\"; dkim-atps=neutral","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:4864:20::442; helo=mail-pf1-x442.google.com;\n\tenvelope-from=npiggin@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"MFimOmF9\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=date:from:to:cc:subject:message-id:in-reply-to:references\n\t:mime-version:content-transfer-encoding;\n\tbh=PAvZi4B1j4NUthD1J1hm7F7WJLP0ZYFQKrPDBjle3/I=;\n\tb=MFimOmF9SksYsYcT8VPs6I3K/pPiB505J6wUH2b4mHsPIkbzygU2OdfrAhTOejnWI5\n\tIb3HIsPyTbs46dfcE2KNWmyp7SLYmarR2WzEVkgs61z1EzqEA+OPihc3Pur+0801wnId\n\tMHYJ6OTRQ8/NrghTAk0XtobAzNiomu1oG9fl1Z1EolH0Ta5J7B6K+9hqzwSA4BrkBVwz\n\tfBgReZeRCwOIw82ohiLt89prrdDnEhjfAsMPHxHhuian+Cg+y4Lxy81x0+boQ0Cy1FFB\n\ti5Sq6DO9yklXjTSAdNcvh+V1l3W93wWVRVxznr6N+NlowDmBjprYquxBtpigKYdTnT39\n\t+/Jw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; 
s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to\n\t:references:mime-version:content-transfer-encoding;\n\tbh=PAvZi4B1j4NUthD1J1hm7F7WJLP0ZYFQKrPDBjle3/I=;\n\tb=NT4zOWTUdu8bQKHxFag0vvdPkNZ22FErQos40rBWnzNzQT26z1glzvgkICgazDABbp\n\turiW+FPOz1WaHjG3lCymgeXRV9SzHDpJtr7mI25CjZKFiDdu9RZMUzA2IJxrREktz9g5\n\tSseHO4Ekpx2B7pQYmDDVvFpzNtiBwfWYI1D5mXgM9zMLR1YtrtZ5MxcEmOjoE2CO5Ntx\n\tqW7l9/I3kMnXTJ2h8AbE+AbN4guXuZVq28t6qZoUU4CJZ4QbcHk1zCrbvJ+YOEBgHC4w\n\t64Dt7+JUixqUALd1Gm3Y2H14FPvjvHKPalGwXiq2AjL6P9OtCfq36wJqWSN8pKK8mi+4\n\tapGQ==","X-Gm-Message-State":"ABuFfoiz8cvO0IRjkuw/ZRNLXP24C+vcCivmmgeHzPHAvBeFIQ+OmhCA\n\tag7oDYwvJ9BUXa8mgFaMW0w=","X-Google-Smtp-Source":"ACcGV62AQ6Y1odG1G6Vt4IF4Mx3jylHpUFFAuW5mqudSHYKGPFtd2Ly6FRxHIaRjh2pBbgRRfYTkHA==","X-Received":"by 2002:a62:42d4:: with SMTP id\n\th81-v6mr29132563pfd.0.1539063063910; \n\tMon, 08 Oct 2018 22:31:03 -0700 (PDT)","Date":"Tue, 9 Oct 2018 15:30:58 +1000","From":"Nicholas Piggin <npiggin@gmail.com>","To":"Christophe LEROY <christophe.leroy@c-s.fr>","Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","Message-ID":"<20181009153058.2564e7a1@roar.ozlabs.ibm.com>","In-Reply-To":"<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>","X-Mailer":"Claws Mail 3.17.0 (GTK+ 2.24.32; x86_64-pc-linux-gnu)","MIME-Version":"1.0","Content-Type":"text/plain; charset=UTF-8","Content-Transfer-Encoding":"quoted-printable","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2006407,"web_url":"http://patchwork.ozlabs.org/comment/2006407/","msgid":"<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>","date":"2018-10-09T09:36:18","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":5234,"url":"http://patchwork.ozlabs.org/api/people/5234/","name":"Christophe Leroy","email":"christophe.leroy@c-s.fr"},"content":"On 10/09/2018 05:30 AM, Nicholas Piggin wrote:\n> On Tue, 9 Oct 2018 06:46:30 +0200\n> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n> \n>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :\n>>> On Mon, 8 Oct 2018 17:39:11 +0200\n>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>    \n>>>> Hi Nick,\n>>>>\n>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :\n>>>>> Use nmi_enter similarly to system reset interrupts. 
This uses NMI\n>>>>> printk NMI buffers and turns off various debugging facilities that\n>>>>> helps avoid tripping on ourselves or other CPUs.\n>>>>>\n>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n>>>>> ---\n>>>>>     arch/powerpc/kernel/traps.c | 9 ++++++---\n>>>>>     1 file changed, 6 insertions(+), 3 deletions(-)\n>>>>>\n>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n>>>>> index 2849c4f50324..6d31f9d7c333 100644\n>>>>> --- a/arch/powerpc/kernel/traps.c\n>>>>> +++ b/arch/powerpc/kernel/traps.c\n>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n>>>>>     \n>>>>>     void machine_check_exception(struct pt_regs *regs)\n>>>>>     {\n>>>>> -\tenum ctx_state prev_state = exception_enter();\n>>>>>     \tint recover = 0;\n>>>>> +\tbool nested = in_nmi();\n>>>>> +\tif (!nested)\n>>>>> +\t\tnmi_enter();\n>>>>\n>>>> This alters preempt_count, then when die() is called\n>>>> in_interrupt() returns true allthough the trap didn't happen in\n>>>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n>>>> instead of gently sending SIGBUS the faulting app.\n>>>\n>>> Thanks for tracking that down.\n>>>    \n>>>> Any idea on how to fix this ?\n>>>\n>>> I would say we have to deliver the sigbus by hand.\n>>>\n>>>       if ((user_mode(regs)))\n>>>           _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n>>>       else\n>>>           die(\"Machine check\", regs, SIGBUS);\n>>>    \n>>\n>> And what about all the other things done by 'die()' ?\n>>\n>> And what if it is a kernel thread ?\n>>\n>> In one of my boards, I have a kernel thread regularly checking the HW,\n>> and if it gets a machine check I expect it to gently stop and the die\n>> notification to be delivered to all registered notifiers.\n>>\n>> Until before this patch, it was working well.\n> \n> I guess the alternative is we could check regs->trap for machine\n> check in the die test. 
Complication is having to account for MCE\n> in an interrupt handler.\n> \n>         if (in_interrupt()) {\n>                  if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))\n>                      panic(\"Fatal exception in interrupt\");\n>         }\n> \n> Something like that might work for you? We needs a ppc64 macro for the\n> MCE, and can probably add something like in_nmi_from_interrupt() for\n> the second part of the test.\n\nDon't know, I'm away from home on business trip so I won't be able to \ntest anything before next week. However it looks more or less like a \nhack, doesn't it ?\n\nWhat about the following ?\n\ndiff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\nindex fd58749b4d6b..1f09033a5103 100644\n--- a/arch/powerpc/kernel/traps.c\n+++ b/arch/powerpc/kernel/traps.c\n@@ -208,7 +208,7 @@ static unsigned long oops_begin(struct pt_regs *regs)\n  NOKPROBE_SYMBOL(oops_begin);\n\n  static void oops_end(unsigned long flags, struct pt_regs *regs,\n-\t\t\t       int signr)\n+\t\t     int signr, bool is_in_interrupt)\n  {\n  \tbust_spinlocks(0);\n  \tadd_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);\n@@ -247,7 +247,7 @@ static void oops_end(unsigned long flags, struct \npt_regs *regs,\n  \t\tmdelay(MSEC_PER_SEC);\n  \t}\n\n-\tif (in_interrupt())\n+\tif (is_in_interrupt)\n  \t\tpanic(\"Fatal exception in interrupt\");\n  \tif (panic_on_oops)\n  \t\tpanic(\"Fatal exception\");\n@@ -288,7 +288,7 @@ static int __die(const char *str, struct pt_regs \n*regs, long err)\n  }\n  NOKPROBE_SYMBOL(__die);\n\n-void die(const char *str, struct pt_regs *regs, long err)\n+static void nmi_die(const char *str, struct pt_regs *regs, long err, \nbool is_in_interrupt)\n  {\n  \tunsigned long flags;\n\n@@ -303,7 +303,13 @@ void die(const char *str, struct pt_regs *regs, \nlong err)\n  \tflags = oops_begin(regs);\n  \tif (__die(str, regs, err))\n  \t\terr = 0;\n-\toops_end(flags, regs, err);\n+\toops_end(flags, regs, err, 
is_in_interrupt);\n+}\n+NOKPROBE_SYMBOL(nmi_die);\n+\n+void die(const char *str, struct pt_regs *regs, long err)\n+{\n+\tnmi_die(str, regs, err, in_interrupt());\n  }\n  NOKPROBE_SYMBOL(die);\n\n@@ -737,6 +743,7 @@ int machine_check_generic(struct pt_regs *regs)\n  void machine_check_exception(struct pt_regs *regs)\n  {\n  \tint recover = 0;\n+\tbool is_in_interrupt = in_interrupt();\n  \tbool nested = in_nmi();\n  \tif (!nested)\n  \t\tnmi_enter();\n@@ -765,7 +772,7 @@ void machine_check_exception(struct pt_regs *regs)\n  \tif (check_io_access(regs))\n  \t\tgoto bail;\n\n-\tdie(\"Machine check\", regs, SIGBUS);\n+\tnmi_die(\"Machine check\", regs, SIGBUS, is_in_interrupt);\n\n  \t/* Must die if the interrupt is not recoverable */\n  \tif (!(regs->msr & MSR_RI))\n\n\nThanks\nChristophe","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42TthQ6J1sz9s5c\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 21:27:18 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42TthQ4xgfzF3Bm\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 21:27:18 +1100 (AEDT)","from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42Ttfn6HxjzF37B\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue,  9 Oct 2018 21:25:53 +1100 (AEDT)","from localhost (mailhub1-int [192.168.12.234])\n\tby localhost (Postfix) with ESMTP id 
42TtfS23vNz9ttCW;\n\tTue,  9 Oct 2018 12:25:36 +0200 (CEST)","from pegase1.c-s.fr ([192.168.12.234])\n\tby localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new,\n\tport 10024)\n\twith ESMTP id 7bD5Opdj_XaS; Tue,  9 Oct 2018 12:25:36 +0200 (CEST)","from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192])\n\tby pegase1.c-s.fr (Postfix) with ESMTP id 42TtfS1SFvz9ttCT;\n\tTue,  9 Oct 2018 12:25:36 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id A8EBF8B7FC;\n\tTue,  9 Oct 2018 12:25:47 +0200 (CEST)","from messagerie.si.c-s.fr ([127.0.0.1])\n\tby localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,\n\tport 10023)\n\twith ESMTP id lVx0-OH54nbz; Tue,  9 Oct 2018 12:25:47 +0200 (CEST)","from localhost.localdomain (unknown [192.168.232.3])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 5EB848B7FA;\n\tTue,  9 Oct 2018 12:25:47 +0200 (CEST)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=c-s.fr\n\t(client-ip=93.17.236.30; helo=pegase1.c-s.fr;\n\tenvelope-from=christophe.leroy@c-s.fr; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr"],"X-Virus-Scanned":["Debian amavisd-new at c-s.fr","amavisd-new at c-s.fr"],"Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","To":"Nicholas Piggin <npiggin@gmail.com>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>\n\t<20181009153058.2564e7a1@roar.ozlabs.ibm.com>","From":"Christophe Leroy <christophe.leroy@c-s.fr>","Message-ID":"<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>","Date":"Tue, 9 Oct 2018 09:36:18 
+0000","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.7.0","MIME-Version":"1.0","In-Reply-To":"<20181009153058.2564e7a1@roar.ozlabs.ibm.com>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Language":"en-US","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2006431,"web_url":"http://patchwork.ozlabs.org/comment/2006431/","msgid":"<20181009211650.042d428c@roar.ozlabs.ibm.com>","date":"2018-10-09T11:16:50","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":69518,"url":"http://patchwork.ozlabs.org/api/people/69518/","name":"Nicholas Piggin","email":"npiggin@gmail.com"},"content":"On Tue, 9 Oct 2018 09:36:18 +0000\nChristophe Leroy <christophe.leroy@c-s.fr> wrote:\n\n> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:\n> > On Tue, 9 Oct 2018 06:46:30 +0200\n> > Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n> >   \n> >> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :  \n> >>> On Mon, 8 Oct 2018 17:39:11 +0200\n> >>> Christophe LEROY 
<christophe.leroy@c-s.fr> wrote:\n> >>>      \n> >>>> Hi Nick,\n> >>>>\n> >>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :  \n> >>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI\n> >>>>> printk NMI buffers and turns off various debugging facilities that\n> >>>>> helps avoid tripping on ourselves or other CPUs.\n> >>>>>\n> >>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n> >>>>> ---\n> >>>>>     arch/powerpc/kernel/traps.c | 9 ++++++---\n> >>>>>     1 file changed, 6 insertions(+), 3 deletions(-)\n> >>>>>\n> >>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n> >>>>> index 2849c4f50324..6d31f9d7c333 100644\n> >>>>> --- a/arch/powerpc/kernel/traps.c\n> >>>>> +++ b/arch/powerpc/kernel/traps.c\n> >>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n> >>>>>     \n> >>>>>     void machine_check_exception(struct pt_regs *regs)\n> >>>>>     {\n> >>>>> -\tenum ctx_state prev_state = exception_enter();\n> >>>>>     \tint recover = 0;\n> >>>>> +\tbool nested = in_nmi();\n> >>>>> +\tif (!nested)\n> >>>>> +\t\tnmi_enter();  \n> >>>>\n> >>>> This alters preempt_count, then when die() is called\n> >>>> in_interrupt() returns true allthough the trap didn't happen in\n> >>>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n> >>>> instead of gently sending SIGBUS the faulting app.  \n> >>>\n> >>> Thanks for tracking that down.\n> >>>      \n> >>>> Any idea on how to fix this ?  
\n> >>>\n> >>> I would say we have to deliver the sigbus by hand.\n> >>>\n> >>>       if ((user_mode(regs)))\n> >>>           _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n> >>>       else\n> >>>           die(\"Machine check\", regs, SIGBUS);\n> >>>      \n> >>\n> >> And what about all the other things done by 'die()' ?\n> >>\n> >> And what if it is a kernel thread ?\n> >>\n> >> In one of my boards, I have a kernel thread regularly checking the HW,\n> >> and if it gets a machine check I expect it to gently stop and the die\n> >> notification to be delivered to all registered notifiers.\n> >>\n> >> Until before this patch, it was working well.  \n> > \n> > I guess the alternative is we could check regs->trap for machine\n> > check in the die test. Complication is having to account for MCE\n> > in an interrupt handler.\n> > \n> >         if (in_interrupt()) {\n> >                  if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))\n> >                      panic(\"Fatal exception in interrupt\");\n> >         }\n> > \n> > Something like that might work for you? We needs a ppc64 macro for the\n> > MCE, and can probably add something like in_nmi_from_interrupt() for\n> > the second part of the test.  \n> \n> Don't know, I'm away from home on business trip so I won't be able to \n> test anything before next week. However it looks more or less like a \n> hack, doesn't it ?\n\nI thought it seemed okay (with the right functions added). Actually it\ncould be a bit nicer to do this, then it works generally :\n\n         if (in_interrupt()) {\n                  if (!in_nmi() || in_nmi_from_interrupt())\n                      panic(\"Fatal exception in interrupt\");\n         }\n\n> \n> What about the following ?\n\nHmm, in some ways maybe it's nicer. One complication is I would like the\nsame thing to be available for platform specific machine check\nhandlers, so then you need to pass is_in_interrupt to them. 
Which you\ncan do without any problem... But is it cleaner than the above?\n\nI guess one advantage of yours is that a BUG somewhere in the NMI path\nwill panic the system. Or is that a disadvantage?\n\nThanks,\nNick\n\n\n> \n> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n> index fd58749b4d6b..1f09033a5103 100644\n> --- a/arch/powerpc/kernel/traps.c\n> +++ b/arch/powerpc/kernel/traps.c\n> @@ -208,7 +208,7 @@ static unsigned long oops_begin(struct pt_regs *regs)\n>   NOKPROBE_SYMBOL(oops_begin);\n> \n>   static void oops_end(unsigned long flags, struct pt_regs *regs,\n> -\t\t\t       int signr)\n> +\t\t     int signr, bool is_in_interrupt)\n>   {\n>   \tbust_spinlocks(0);\n>   \tadd_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);\n> @@ -247,7 +247,7 @@ static void oops_end(unsigned long flags, struct \n> pt_regs *regs,\n>   \t\tmdelay(MSEC_PER_SEC);\n>   \t}\n> \n> -\tif (in_interrupt())\n> +\tif (is_in_interrupt)\n>   \t\tpanic(\"Fatal exception in interrupt\");\n>   \tif (panic_on_oops)\n>   \t\tpanic(\"Fatal exception\");\n> @@ -288,7 +288,7 @@ static int __die(const char *str, struct pt_regs \n> *regs, long err)\n>   }\n>   NOKPROBE_SYMBOL(__die);\n> \n> -void die(const char *str, struct pt_regs *regs, long err)\n> +static void nmi_die(const char *str, struct pt_regs *regs, long err, \n> bool is_in_interrupt)\n>   {\n>   \tunsigned long flags;\n> \n> @@ -303,7 +303,13 @@ void die(const char *str, struct pt_regs *regs, \n> long err)\n>   \tflags = oops_begin(regs);\n>   \tif (__die(str, regs, err))\n>   \t\terr = 0;\n> -\toops_end(flags, regs, err);\n> +\toops_end(flags, regs, err, is_in_interrupt);\n> +}\n> +NOKPROBE_SYMBOL(nmi_die);\n> +\n> +void die(const char *str, struct pt_regs *regs, long err)\n> +{\n> +\tnmi_die(str, regs, err, in_interrupt());\n>   }\n>   NOKPROBE_SYMBOL(die);\n> \n> @@ -737,6 +743,7 @@ int machine_check_generic(struct pt_regs *regs)\n>   void machine_check_exception(struct pt_regs *regs)\n>   {\n>   \tint recover = 
0;\n> +\tbool is_in_interrupt = in_interrupt();\n>   \tbool nested = in_nmi();\n>   \tif (!nested)\n>   \t\tnmi_enter();\n> @@ -765,7 +772,7 @@ void machine_check_exception(struct pt_regs *regs)\n>   \tif (check_io_access(regs))\n>   \t\tgoto bail;\n> \n> -\tdie(\"Machine check\", regs, SIGBUS);\n> +\tnmi_die(\"Machine check\", regs, SIGBUS, is_in_interrupt);\n> \n>   \t/* Must die if the interrupt is not recoverable */\n>   \tif (!(regs->msr & MSR_RI))\n> \n> \n> Thanks\n> Christophe","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42Tvr073DDz9s89\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 22:18:56 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42Tvr04t3XzF39y\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 22:18:56 +1100 (AEDT)","from mail-pf1-x442.google.com (mail-pf1-x442.google.com\n\t[IPv6:2607:f8b0:4864:20::442])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42Tvnl1pBmzF39Y\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue,  9 Oct 2018 22:16:58 +1100 (AEDT)","by mail-pf1-x442.google.com with SMTP id l81-v6so686055pfg.3\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue, 09 Oct 2018 04:16:58 -0700 (PDT)","from roar.ozlabs.ibm.com (60-240-121-136.tpgi.com.au.\n\t[60.240.121.136]) by smtp.gmail.com with ESMTPSA id\n\tk70-v6sm33786692pfc.76.2018.10.09.04.16.54\n\t(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 
bits=256/256);\n\tTue, 09 Oct 2018 04:16:56 -0700 (PDT)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=fail (p=none dis=none) header.from=gmail.com","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"SuCZosiw\"; dkim-atps=neutral","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"SuCZosiw\"; dkim-atps=neutral","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:4864:20::442; helo=mail-pf1-x442.google.com;\n\tenvelope-from=npiggin@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"SuCZosiw\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=date:from:to:cc:subject:message-id:in-reply-to:references\n\t:mime-version:content-transfer-encoding;\n\tbh=pa7vEVoOVEMh1ttml1yYqW1Nzcq1XB8h3JQ8HTmJHU8=;\n\tb=SuCZosiwMDaLlXvZGEowuBCre7LF5CoCzrhBbFuMuyI87KKq6S+Uy+V8z47RhURy6M\n\t6acfH0Prn3Me8nY8BtW3t6gcHU0n2+A40smyC1OlNM1Cz3gY7NZdBeNLMq1Susm1hez1\n\t6FHMO4/0AT7Xf8mgTO3eLW/k5hxBivqXsYtZERwvlMbgA08dITV40GPa+RWJijwXwW0W\n\tDNxdgWoVF5JJaJsKxa5Xfes26RxjWmJNz1b9Q92BLIrPTpl1o7ZO+WYwzrKtrYtR7js4\n\t4med68eus83nphxa3Q0Nxv+sz0kp/g2P4RXQTSlkHPNXYVMzxhOV0nAHHkIwV4X2DjHz\n\t5Dmg==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; 
s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to\n\t:references:mime-version:content-transfer-encoding;\n\tbh=pa7vEVoOVEMh1ttml1yYqW1Nzcq1XB8h3JQ8HTmJHU8=;\n\tb=D3ioLuM98ER48TbtP3v1gr1ikP3KuSMgCCxmDQSet9lw0ZVFIrzlgeakNGbfPxDjAo\n\tQKQyV4GD8LxYJBRyozwH2t/nevzGGZQgUHibBhLGGK0tBEjzwVhsBuSzmrq7k0gQT1e4\n\t9olAsiF4wO8D5icJPSL/OM24N4EA23mWDVvJmyVwt/pRgCE11sjBaXXYQZii4BKncrYe\n\tuaEMtEaJ2780AEl6UifKM3G3JeBxp8HHQ8Q6SNYDK7+GV531AqBacDWZp3TmiS3QyMoo\n\tM3iLp/2vwfUFOsx3xzBQAm1sLzvSYgmZ8irKMU1b8aQQSFqwC1iUws0KuC2xsmJzUt0L\n\thXsg==","X-Gm-Message-State":"ABuFfoh+m1xZnoZavIMXimXWhmvNQOMYmU40TUKJawhQfLW+KLQDUXmr\n\tUfHkE2lZX2O7ST8PBR9Obag=","X-Google-Smtp-Source":"ACcGV63sR+Cp1dhLUkTrdCh+oeoJp+S0ssjvz0mv4fYTjTjSYRlWq5SWG8IttEhJNCnDwQwS6cSsbw==","X-Received":"by 2002:a63:904a:: with SMTP id\n\ta71-v6mr25784754pge.264.1539083816972; \n\tTue, 09 Oct 2018 04:16:56 -0700 (PDT)","Date":"Tue, 9 Oct 2018 21:16:50 +1000","From":"Nicholas Piggin <npiggin@gmail.com>","To":"Christophe Leroy <christophe.leroy@c-s.fr>","Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","Message-ID":"<20181009211650.042d428c@roar.ozlabs.ibm.com>","In-Reply-To":"<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>\n\t<20181009153058.2564e7a1@roar.ozlabs.ibm.com>\n\t<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>","X-Mailer":"Claws Mail 3.17.0 (GTK+ 2.24.32; x86_64-pc-linux-gnu)","MIME-Version":"1.0","Content-Type":"text/plain; charset=UTF-8","Content-Transfer-Encoding":"quoted-printable","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2006474,"web_url":"http://patchwork.ozlabs.org/comment/2006474/","msgid":"<e4c9e983-db3e-ab50-c30b-9d538e202147@c-s.fr>","date":"2018-10-09T12:01:37","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":5234,"url":"http://patchwork.ozlabs.org/api/people/5234/","name":"Christophe Leroy","email":"christophe.leroy@c-s.fr"},"content":"Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :\n> On Tue, 9 Oct 2018 09:36:18 +0000\n> Christophe Leroy <christophe.leroy@c-s.fr> wrote:\n> \n>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:\n>>> On Tue, 9 Oct 2018 06:46:30 +0200\n>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>    \n>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :\n>>>>> On Mon, 8 Oct 2018 17:39:11 +0200\n>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>>>       \n>>>>>> Hi Nick,\n>>>>>>\n>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :\n>>>>>>> Use nmi_enter similarly to system reset interrupts. 
This uses NMI\n>>>>>>> printk NMI buffers and turns off various debugging facilities that\n>>>>>>> helps avoid tripping on ourselves or other CPUs.\n>>>>>>>\n>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n>>>>>>> ---\n>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---\n>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)\n>>>>>>>\n>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n>>>>>>> index 2849c4f50324..6d31f9d7c333 100644\n>>>>>>> --- a/arch/powerpc/kernel/traps.c\n>>>>>>> +++ b/arch/powerpc/kernel/traps.c\n>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n>>>>>>>      \n>>>>>>>      void machine_check_exception(struct pt_regs *regs)\n>>>>>>>      {\n>>>>>>> -\tenum ctx_state prev_state = exception_enter();\n>>>>>>>      \tint recover = 0;\n>>>>>>> +\tbool nested = in_nmi();\n>>>>>>> +\tif (!nested)\n>>>>>>> +\t\tnmi_enter();\n>>>>>>\n>>>>>> This alters preempt_count, then when die() is called\n>>>>>> in_interrupt() returns true allthough the trap didn't happen in\n>>>>>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n>>>>>> instead of gently sending SIGBUS the faulting app.\n>>>>>\n>>>>> Thanks for tracking that down.\n>>>>>       \n>>>>>> Any idea on how to fix this ?\n>>>>>\n>>>>> I would say we have to deliver the sigbus by hand.\n>>>>>\n>>>>>        if ((user_mode(regs)))\n>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n>>>>>        else\n>>>>>            die(\"Machine check\", regs, SIGBUS);\n>>>>>       \n>>>>\n>>>> And what about all the other things done by 'die()' ?\n>>>>\n>>>> And what if it is a kernel thread ?\n>>>>\n>>>> In one of my boards, I have a kernel thread regularly checking the HW,\n>>>> and if it gets a machine check I expect it to gently stop and the die\n>>>> notification to be delivered to all registered notifiers.\n>>>>\n>>>> Until before this patch, it was working well.\n>>>\n>>> I guess the alternative is 
we could check regs->trap for machine\n>>> check in the die test. Complication is having to account for MCE\n>>> in an interrupt handler.\n>>>\n>>>          if (in_interrupt()) {\n>>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))\n>>>                       panic(\"Fatal exception in interrupt\");\n>>>          }\n>>>\n>>> Something like that might work for you? We needs a ppc64 macro for the\n>>> MCE, and can probably add something like in_nmi_from_interrupt() for\n>>> the second part of the test.\n>>\n>> Don't know, I'm away from home on business trip so I won't be able to\n>> test anything before next week. However it looks more or less like a\n>> hack, doesn't it ?\n> \n> I thought it seemed okay (with the right functions added). Actually it\n> could be a bit nicer to do this, then it works generally :\n> \n>           if (in_interrupt()) {\n>                    if (!in_nmi() || in_nmi_from_interrupt())\n>                        panic(\"Fatal exception in interrupt\");\n>           }\n\n\nYes looks nice, but:\n1/ what is in_nmi_from_interrupt() ? Is it (in_nmi() && (in_irq() || \nin_softirq()) ?\n2/ what about in_nmi_from_nmi(), how do we detect that ?\n\nChristophe\n\n> \n>>\n>> What about the following ?\n> \n> Hmm, in some ways maybe it's nicer. One complication is I would like the\n> same thing to be available for platform specific machine check\n> handlers, so then you need to pass is_in_interrupt to them. Which you\n> can do without any problem... But is it cleaner than the above?\n> \n> I guess one advantage of yours is that a BUG somewhere in the NMI path\n> will panic the system. 
Or is that a disadvantage?\n> \n> Thanks,\n> Nick\n> \n> \n>>\n>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n>> index fd58749b4d6b..1f09033a5103 100644\n>> --- a/arch/powerpc/kernel/traps.c\n>> +++ b/arch/powerpc/kernel/traps.c\n>> @@ -208,7 +208,7 @@ static unsigned long oops_begin(struct pt_regs *regs)\n>>    NOKPROBE_SYMBOL(oops_begin);\n>>\n>>    static void oops_end(unsigned long flags, struct pt_regs *regs,\n>> -\t\t\t       int signr)\n>> +\t\t     int signr, bool is_in_interrupt)\n>>    {\n>>    \tbust_spinlocks(0);\n>>    \tadd_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);\n>> @@ -247,7 +247,7 @@ static void oops_end(unsigned long flags, struct\n>> pt_regs *regs,\n>>    \t\tmdelay(MSEC_PER_SEC);\n>>    \t}\n>>\n>> -\tif (in_interrupt())\n>> +\tif (is_in_interrupt)\n>>    \t\tpanic(\"Fatal exception in interrupt\");\n>>    \tif (panic_on_oops)\n>>    \t\tpanic(\"Fatal exception\");\n>> @@ -288,7 +288,7 @@ static int __die(const char *str, struct pt_regs\n>> *regs, long err)\n>>    }\n>>    NOKPROBE_SYMBOL(__die);\n>>\n>> -void die(const char *str, struct pt_regs *regs, long err)\n>> +static void nmi_die(const char *str, struct pt_regs *regs, long err,\n>> bool is_in_interrupt)\n>>    {\n>>    \tunsigned long flags;\n>>\n>> @@ -303,7 +303,13 @@ void die(const char *str, struct pt_regs *regs,\n>> long err)\n>>    \tflags = oops_begin(regs);\n>>    \tif (__die(str, regs, err))\n>>    \t\terr = 0;\n>> -\toops_end(flags, regs, err);\n>> +\toops_end(flags, regs, err, is_in_interrupt);\n>> +}\n>> +NOKPROBE_SYMBOL(nmi_die);\n>> +\n>> +void die(const char *str, struct pt_regs *regs, long err)\n>> +{\n>> +\tnmi_die(str, regs, err, in_interrupt());\n>>    }\n>>    NOKPROBE_SYMBOL(die);\n>>\n>> @@ -737,6 +743,7 @@ int machine_check_generic(struct pt_regs *regs)\n>>    void machine_check_exception(struct pt_regs *regs)\n>>    {\n>>    \tint recover = 0;\n>> +\tbool is_in_interrupt = in_interrupt();\n>>    \tbool nested = in_nmi();\n>>    \tif 
(!nested)\n>>    \t\tnmi_enter();\n>> @@ -765,7 +772,7 @@ void machine_check_exception(struct pt_regs *regs)\n>>    \tif (check_io_access(regs))\n>>    \t\tgoto bail;\n>>\n>> -\tdie(\"Machine check\", regs, SIGBUS);\n>> +\tnmi_die(\"Machine check\", regs, SIGBUS, is_in_interrupt);\n>>\n>>    \t/* Must die if the interrupt is not recoverable */\n>>    \tif (!(regs->msr & MSR_RI))\n>>\n>>\n>> Thanks\n>> Christophe","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42TwqG513nz9s7h\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 23:03:22 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42TwqG3ZdBzF3G5\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 23:03:22 +1100 (AEDT)","from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42TwnL1LvxzDrP3\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue,  9 Oct 2018 23:01:42 +1100 (AEDT)","from localhost (mailhub1-int [192.168.12.234])\n\tby localhost (Postfix) with ESMTP id 42Twn21d0Mz9ttFY;\n\tTue,  9 Oct 2018 14:01:26 +0200 (CEST)","from pegase1.c-s.fr ([192.168.12.234])\n\tby localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new,\n\tport 10024)\n\twith ESMTP id yfWdA3zIZAuR; Tue,  9 Oct 2018 14:01:26 +0200 (CEST)","from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192])\n\tby pegase1.c-s.fr (Postfix) with ESMTP id 42Twn214h1z9ttCB;\n\tTue,  9 Oct 2018 
14:01:26 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 9C4DD8B803;\n\tTue,  9 Oct 2018 14:01:37 +0200 (CEST)","from messagerie.si.c-s.fr ([127.0.0.1])\n\tby localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,\n\tport 10023)\n\twith ESMTP id 5dieokrgOn6t; Tue,  9 Oct 2018 14:01:37 +0200 (CEST)","from PO15451 (unknown [192.168.232.3])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 326DE8B7FA;\n\tTue,  9 Oct 2018 14:01:37 +0200 (CEST)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=c-s.fr\n\t(client-ip=93.17.236.30; helo=pegase1.c-s.fr;\n\tenvelope-from=christophe.leroy@c-s.fr; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr"],"X-Virus-Scanned":["Debian amavisd-new at c-s.fr","amavisd-new at c-s.fr"],"Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","To":"Nicholas Piggin <npiggin@gmail.com>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>\n\t<20181009153058.2564e7a1@roar.ozlabs.ibm.com>\n\t<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>\n\t<20181009211650.042d428c@roar.ozlabs.ibm.com>","From":"Christophe LEROY <christophe.leroy@c-s.fr>","Message-ID":"<e4c9e983-db3e-ab50-c30b-9d538e202147@c-s.fr>","Date":"Tue, 9 Oct 2018 14:01:37 +0200","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.9.1","MIME-Version":"1.0","In-Reply-To":"<20181009211650.042d428c@roar.ozlabs.ibm.com>","Content-Type":"text/plain; charset=utf-8; 
format=flowed","Content-Language":"fr","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2006485,"web_url":"http://patchwork.ozlabs.org/comment/2006485/","msgid":"<20181009221446.33b926e3@roar.ozlabs.ibm.com>","date":"2018-10-09T12:14:46","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":69518,"url":"http://patchwork.ozlabs.org/api/people/69518/","name":"Nicholas Piggin","email":"npiggin@gmail.com"},"content":"On Tue, 9 Oct 2018 14:01:37 +0200\nChristophe LEROY <christophe.leroy@c-s.fr> wrote:\n\n> Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :\n> > On Tue, 9 Oct 2018 09:36:18 +0000\n> > Christophe Leroy <christophe.leroy@c-s.fr> wrote:\n> >   \n> >> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:  \n> >>> On Tue, 9 Oct 2018 06:46:30 +0200\n> >>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n> >>>      \n> >>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :  \n> >>>>> On Mon, 8 Oct 2018 17:39:11 +0200\n> >>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n> >>>>>         \n> >>>>>> Hi 
Nick,\n> >>>>>>\n> >>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :  \n> >>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI\n> >>>>>>> printk NMI buffers and turns off various debugging facilities that\n> >>>>>>> helps avoid tripping on ourselves or other CPUs.\n> >>>>>>>\n> >>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n> >>>>>>> ---\n> >>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---\n> >>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)\n> >>>>>>>\n> >>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n> >>>>>>> index 2849c4f50324..6d31f9d7c333 100644\n> >>>>>>> --- a/arch/powerpc/kernel/traps.c\n> >>>>>>> +++ b/arch/powerpc/kernel/traps.c\n> >>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n> >>>>>>>      \n> >>>>>>>      void machine_check_exception(struct pt_regs *regs)\n> >>>>>>>      {\n> >>>>>>> -\tenum ctx_state prev_state = exception_enter();\n> >>>>>>>      \tint recover = 0;\n> >>>>>>> +\tbool nested = in_nmi();\n> >>>>>>> +\tif (!nested)\n> >>>>>>> +\t\tnmi_enter();  \n> >>>>>>\n> >>>>>> This alters preempt_count, then when die() is called\n> >>>>>> in_interrupt() returns true allthough the trap didn't happen in\n> >>>>>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n> >>>>>> instead of gently sending SIGBUS the faulting app.  \n> >>>>>\n> >>>>> Thanks for tracking that down.\n> >>>>>         \n> >>>>>> Any idea on how to fix this ?  
\n> >>>>>\n> >>>>> I would say we have to deliver the sigbus by hand.\n> >>>>>\n> >>>>>        if ((user_mode(regs)))\n> >>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n> >>>>>        else\n> >>>>>            die(\"Machine check\", regs, SIGBUS);\n> >>>>>         \n> >>>>\n> >>>> And what about all the other things done by 'die()' ?\n> >>>>\n> >>>> And what if it is a kernel thread ?\n> >>>>\n> >>>> In one of my boards, I have a kernel thread regularly checking the HW,\n> >>>> and if it gets a machine check I expect it to gently stop and the die\n> >>>> notification to be delivered to all registered notifiers.\n> >>>>\n> >>>> Until before this patch, it was working well.  \n> >>>\n> >>> I guess the alternative is we could check regs->trap for machine\n> >>> check in the die test. Complication is having to account for MCE\n> >>> in an interrupt handler.\n> >>>\n> >>>          if (in_interrupt()) {\n> >>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))\n> >>>                       panic(\"Fatal exception in interrupt\");\n> >>>          }\n> >>>\n> >>> Something like that might work for you? We needs a ppc64 macro for the\n> >>> MCE, and can probably add something like in_nmi_from_interrupt() for\n> >>> the second part of the test.  \n> >>\n> >> Don't know, I'm away from home on business trip so I won't be able to\n> >> test anything before next week. However it looks more or less like a\n> >> hack, doesn't it ?  \n> > \n> > I thought it seemed okay (with the right functions added). Actually it\n> > could be a bit nicer to do this, then it works generally :\n> > \n> >           if (in_interrupt()) {\n> >                    if (!in_nmi() || in_nmi_from_interrupt())\n> >                        panic(\"Fatal exception in interrupt\");\n> >           }  \n> \n> \n> Yes looks nice, but:\n> 1/ what is in_nmi_from_interrupt() ? 
Is it (in_nmi() && (in_irq() || \n> in_softirq()) ?\n\n  return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET))) != 0;\n\n(basically just in_interrupt() with the nmi_enter undone)\n\n> 2/ what about in_nmi_from_nmi(), how do we detect that ?\n\nOh good point, I'm not sure. I guess we could irq_enter() in the\nnested case, I think that would make in_nmi_from_interrupt()\nreturn true.\n\nThanks,\nNick","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42Tx965nnkz9s8F\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 23:18:50 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42Tx9648Z8zF3Dc\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue,  9 Oct 2018 23:18:50 +1100 (AEDT)","from mail-pf1-x443.google.com (mail-pf1-x443.google.com\n\t[IPv6:2607:f8b0:4864:20::443])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42Tx4c4ntnzF39C\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue,  9 Oct 2018 23:14:56 +1100 (AEDT)","by mail-pf1-x443.google.com with SMTP id l17-v6so762242pff.2\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue, 09 Oct 2018 05:14:56 -0700 (PDT)","from roar.ozlabs.ibm.com (60-240-121-136.tpgi.com.au.\n\t[60.240.121.136]) by smtp.gmail.com with ESMTPSA id\n\tq2-v6sm69654360pfc.17.2018.10.09.05.14.50\n\t(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);\n\tTue, 09 Oct 2018 05:14:53 -0700 
(PDT)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=fail (p=none dis=none) header.from=gmail.com","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"OoRJIZIo\"; dkim-atps=neutral","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"OoRJIZIo\"; dkim-atps=neutral","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:4864:20::443; helo=mail-pf1-x443.google.com;\n\tenvelope-from=npiggin@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"OoRJIZIo\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=date:from:to:cc:subject:message-id:in-reply-to:references\n\t:mime-version:content-transfer-encoding;\n\tbh=atSgHrnNu+91+0dyDv68os0nFpmFIyy5S/5zdKfmp04=;\n\tb=OoRJIZIoFQvK15FbFXKjiGVc030BA4p1m6WfDnNHugVtIAh6lqxyZ1izzzaFjMiMSj\n\tm6rNI8oSAvjoeu083dVI2hkq9owXIjqWDNf6yS+3R7+T7W04Mei//z334H2Emv8Dbzg3\n\tTgMzwSaY2WH8NovV3zSrv6yEeuNeMmcz3zTm5kiGoUcNWqqJGzjrbQl33r/fjnt4bo4h\n\tBdNUvFEHqod1GCdlI9w05AxrgaWgwDfYmBh3L+FWRT8XD9RUYQ3iMSVUaNu4uZj/Nu25\n\tl//jGIZYsL8mboOivWv9KdJvK3vBPyfMnkV40otyQGSP30XcXvyT7hP+T4+cLqNbgDpJ\n\tzXwA==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; 
s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to\n\t:references:mime-version:content-transfer-encoding;\n\tbh=atSgHrnNu+91+0dyDv68os0nFpmFIyy5S/5zdKfmp04=;\n\tb=cwai7b0a6Rh7UHqJRZqXNfffhN9CbTpaVBAXOXGdKhevPzjaOvkv8e7u89/zeirxCN\n\tQDkckYMV4zBOdvSe22pHsrDWwT8JrVwDLTKS5eQNMCK0zaTTMRVqQ0xyqLFh+2MdCnLu\n\tP1o+37LfxWj7id3Y5sQZl2wKOzWhLT8Oi/JjooRVqqxom+gkf/i81UtV0vK8g3aBdyx/\n\tDCQKEiKSmNTiQHzH+0dRyaWgkqLAP2pXzK1eIPq9wNAbMpevyeO+IwgYEhR7fIEfAoO8\n\tmRdgjQfPI3TIkwIiHWMKZdwQPkQILDyCF4dbzd/zvzScfzI2i8Kc2nz/0pts6TsO/Fxn\n\tKeQw==","X-Gm-Message-State":"ABuFfoiSm6vRN/egaD/JxHQRBXqYKwLwC2lOojQwKO/XaPdzPPSh7tAH\n\tfxoDN3A/jU6EnmqdbcOrCpk=","X-Google-Smtp-Source":"ACcGV62C58vSS9oGpNiFW3t5S9jz9JMUXLgV4eANLJ6DO3ZWYoYu8WIu6bX2aUMzwzplbnVdTlesXQ==","X-Received":"by 2002:a62:939d:: with SMTP id\n\tr29-v6mr29959815pfk.55.1539087294059; \n\tTue, 09 Oct 2018 05:14:54 -0700 (PDT)","Date":"Tue, 9 Oct 2018 22:14:46 +1000","From":"Nicholas Piggin <npiggin@gmail.com>","To":"Christophe LEROY <christophe.leroy@c-s.fr>","Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","Message-ID":"<20181009221446.33b926e3@roar.ozlabs.ibm.com>","In-Reply-To":"<e4c9e983-db3e-ab50-c30b-9d538e202147@c-s.fr>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>\n\t<20181009153058.2564e7a1@roar.ozlabs.ibm.com>\n\t<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>\n\t<20181009211650.042d428c@roar.ozlabs.ibm.com>\n\t<e4c9e983-db3e-ab50-c30b-9d538e202147@c-s.fr>","X-Mailer":"Claws Mail 3.17.0 (GTK+ 2.24.32; x86_64-pc-linux-gnu)","MIME-Version":"1.0","Content-Type":"text/plain; charset=UTF-8","Content-Transfer-Encoding":"quoted-printable","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on 
PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2008302,"web_url":"http://patchwork.ozlabs.org/comment/2008302/","msgid":"<8281d664-6c4b-3476-ac2d-9fc9ba2c7e03@c-s.fr>","date":"2018-10-11T14:23:23","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":5234,"url":"http://patchwork.ozlabs.org/api/people/5234/","name":"Christophe Leroy","email":"christophe.leroy@c-s.fr"},"content":"Le 09/10/2018 à 14:14, Nicholas Piggin a écrit :\n> On Tue, 9 Oct 2018 14:01:37 +0200\n> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n> \n>> Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :\n>>> On Tue, 9 Oct 2018 09:36:18 +0000\n>>> Christophe Leroy <christophe.leroy@c-s.fr> wrote:\n>>>    \n>>>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:\n>>>>> On Tue, 9 Oct 2018 06:46:30 +0200\n>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>>>       \n>>>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :\n>>>>>>> On Mon, 8 Oct 2018 17:39:11 +0200\n>>>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>>>>>          \n>>>>>>>> Hi Nick,\n>>>>>>>>\n>>>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :\n>>>>>>>>> Use nmi_enter similarly to system reset 
interrupts. This uses NMI\n>>>>>>>>> printk NMI buffers and turns off various debugging facilities that\n>>>>>>>>> helps avoid tripping on ourselves or other CPUs.\n>>>>>>>>>\n>>>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n>>>>>>>>> ---\n>>>>>>>>>       arch/powerpc/kernel/traps.c | 9 ++++++---\n>>>>>>>>>       1 file changed, 6 insertions(+), 3 deletions(-)\n>>>>>>>>>\n>>>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n>>>>>>>>> index 2849c4f50324..6d31f9d7c333 100644\n>>>>>>>>> --- a/arch/powerpc/kernel/traps.c\n>>>>>>>>> +++ b/arch/powerpc/kernel/traps.c\n>>>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n>>>>>>>>>       \n>>>>>>>>>       void machine_check_exception(struct pt_regs *regs)\n>>>>>>>>>       {\n>>>>>>>>> -\tenum ctx_state prev_state = exception_enter();\n>>>>>>>>>       \tint recover = 0;\n>>>>>>>>> +\tbool nested = in_nmi();\n>>>>>>>>> +\tif (!nested)\n>>>>>>>>> +\t\tnmi_enter();\n>>>>>>>>\n>>>>>>>> This alters preempt_count, then when die() is called\n>>>>>>>> in_interrupt() returns true allthough the trap didn't happen in\n>>>>>>>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n>>>>>>>> instead of gently sending SIGBUS the faulting app.\n>>>>>>>\n>>>>>>> Thanks for tracking that down.\n>>>>>>>          \n>>>>>>>> Any idea on how to fix this ?\n>>>>>>>\n>>>>>>> I would say we have to deliver the sigbus by hand.\n>>>>>>>\n>>>>>>>         if ((user_mode(regs)))\n>>>>>>>             _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n>>>>>>>         else\n>>>>>>>             die(\"Machine check\", regs, SIGBUS);\n>>>>>>>          \n>>>>>>\n>>>>>> And what about all the other things done by 'die()' ?\n>>>>>>\n>>>>>> And what if it is a kernel thread ?\n>>>>>>\n>>>>>> In one of my boards, I have a kernel thread regularly checking the HW,\n>>>>>> and if it gets a machine check I expect it to gently stop and the die\n>>>>>> notification to be delivered to 
all registered notifiers.\n>>>>>>\n>>>>>> Until before this patch, it was working well.\n>>>>>\n>>>>> I guess the alternative is we could check regs->trap for machine\n>>>>> check in the die test. Complication is having to account for MCE\n>>>>> in an interrupt handler.\n>>>>>\n>>>>>           if (in_interrupt()) {\n>>>>>                    if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))\n>>>>>                        panic(\"Fatal exception in interrupt\");\n>>>>>           }\n>>>>>\n>>>>> Something like that might work for you? We needs a ppc64 macro for the\n>>>>> MCE, and can probably add something like in_nmi_from_interrupt() for\n>>>>> the second part of the test.\n>>>>\n>>>> Don't know, I'm away from home on business trip so I won't be able to\n>>>> test anything before next week. However it looks more or less like a\n>>>> hack, doesn't it ?\n>>>\n>>> I thought it seemed okay (with the right functions added). Actually it\n>>> could be a bit nicer to do this, then it works generally :\n>>>\n>>>            if (in_interrupt()) {\n>>>                     if (!in_nmi() || in_nmi_from_interrupt())\n>>>                         panic(\"Fatal exception in interrupt\");\n>>>            }\n>>\n>>\n>> Yes looks nice, but:\n>> 1/ what is in_nmi_from_interrupt() ? Is it (in_nmi() && (in_irq() ||\n>> in_softirq()) ?\n> \n>    return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET))) != 0;\n> \n> (basically just in_interrupt() with the nmi_enter undone)\n> \n>> 2/ what about in_nmi_from_nmi(), how do we detect that ?\n> \n> Oh good point, I'm not sure. 
I guess we could irq_enter() in the\n> nested case, I think that would make in_nmi_from_interrupt()\n> return true.\n\nYes we could, but I find it ugly.\n\nDon't you think it looks less strange to just check in_interrupt() \nbefore calling nmi_enter()  ?\n\nChristophe","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42WCsl5Hdgz9s9J\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 12 Oct 2018 01:24:59 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42WCsl3h9hzF3JG\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 12 Oct 2018 01:24:59 +1100 (AEDT)","from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42WCr63GrczF1F0\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tFri, 12 Oct 2018 01:23:34 +1100 (AEDT)","from localhost (mailhub1-int [192.168.12.234])\n\tby localhost (Postfix) with ESMTP id 42WCqs3dzzz9ttn0;\n\tThu, 11 Oct 2018 16:23:21 +0200 (CEST)","from pegase1.c-s.fr ([192.168.12.234])\n\tby localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new,\n\tport 10024)\n\twith ESMTP id sJaHp8wQ6q0Q; Thu, 11 Oct 2018 16:23:21 +0200 (CEST)","from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192])\n\tby pegase1.c-s.fr (Postfix) with ESMTP id 42WCqs33TSz9ttBX;\n\tThu, 11 Oct 2018 16:23:21 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 9E0008B881;\n\tThu, 11 Oct 2018 
16:23:23 +0200 (CEST)","from messagerie.si.c-s.fr ([127.0.0.1])\n\tby localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,\n\tport 10023)\n\twith ESMTP id oqLQZIft6Xuf; Thu, 11 Oct 2018 16:23:23 +0200 (CEST)","from PO15451 (unknown [192.168.232.3])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 487858B86C;\n\tThu, 11 Oct 2018 16:23:23 +0200 (CEST)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=c-s.fr\n\t(client-ip=93.17.236.30; helo=pegase1.c-s.fr;\n\tenvelope-from=christophe.leroy@c-s.fr; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr"],"X-Virus-Scanned":["Debian amavisd-new at c-s.fr","amavisd-new at c-s.fr"],"Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","To":"Nicholas Piggin <npiggin@gmail.com>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>\n\t<20181009153058.2564e7a1@roar.ozlabs.ibm.com>\n\t<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>\n\t<20181009211650.042d428c@roar.ozlabs.ibm.com>\n\t<e4c9e983-db3e-ab50-c30b-9d538e202147@c-s.fr>\n\t<20181009221446.33b926e3@roar.ozlabs.ibm.com>","From":"Christophe LEROY <christophe.leroy@c-s.fr>","Message-ID":"<8281d664-6c4b-3476-ac2d-9fc9ba2c7e03@c-s.fr>","Date":"Thu, 11 Oct 2018 16:23:23 +0200","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.9.1","MIME-Version":"1.0","In-Reply-To":"<20181009221446.33b926e3@roar.ozlabs.ibm.com>","Content-Type":"text/plain; charset=utf-8; 
format=flowed","Content-Language":"fr","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2008314,"web_url":"http://patchwork.ozlabs.org/comment/2008314/","msgid":"<9f0cbf48-d278-08bf-cb32-8b9608768025@c-s.fr>","date":"2018-10-11T14:31:16","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":5234,"url":"http://patchwork.ozlabs.org/api/people/5234/","name":"Christophe Leroy","email":"christophe.leroy@c-s.fr"},"content":"Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :\n> On Tue, 9 Oct 2018 09:36:18 +0000\n> Christophe Leroy <christophe.leroy@c-s.fr> wrote:\n> \n>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:\n>>> On Tue, 9 Oct 2018 06:46:30 +0200\n>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>    \n>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :\n>>>>> On Mon, 8 Oct 2018 17:39:11 +0200\n>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>>>       \n>>>>>> Hi Nick,\n>>>>>>\n>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :\n>>>>>>> Use nmi_enter similarly to system 
reset interrupts. This uses NMI\n>>>>>>> printk NMI buffers and turns off various debugging facilities that\n>>>>>>> helps avoid tripping on ourselves or other CPUs.\n>>>>>>>\n>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n>>>>>>> ---\n>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---\n>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)\n>>>>>>>\n>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\n>>>>>>> index 2849c4f50324..6d31f9d7c333 100644\n>>>>>>> --- a/arch/powerpc/kernel/traps.c\n>>>>>>> +++ b/arch/powerpc/kernel/traps.c\n>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)\n>>>>>>>      \n>>>>>>>      void machine_check_exception(struct pt_regs *regs)\n>>>>>>>      {\n>>>>>>> -\tenum ctx_state prev_state = exception_enter();\n>>>>>>>      \tint recover = 0;\n>>>>>>> +\tbool nested = in_nmi();\n>>>>>>> +\tif (!nested)\n>>>>>>> +\t\tnmi_enter();\n>>>>>>\n>>>>>> This alters preempt_count, then when die() is called\n>>>>>> in_interrupt() returns true allthough the trap didn't happen in\n>>>>>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n>>>>>> instead of gently sending SIGBUS the faulting app.\n>>>>>\n>>>>> Thanks for tracking that down.\n>>>>>       \n>>>>>> Any idea on how to fix this ?\n>>>>>\n>>>>> I would say we have to deliver the sigbus by hand.\n>>>>>\n>>>>>        if ((user_mode(regs)))\n>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n>>>>>        else\n>>>>>            die(\"Machine check\", regs, SIGBUS);\n>>>>>       \n>>>>\n>>>> And what about all the other things done by 'die()' ?\n>>>>\n>>>> And what if it is a kernel thread ?\n>>>>\n>>>> In one of my boards, I have a kernel thread regularly checking the HW,\n>>>> and if it gets a machine check I expect it to gently stop and the die\n>>>> notification to be delivered to all registered notifiers.\n>>>>\n>>>> Until before this patch, it was working well.\n>>>\n>>> I guess 
the alternative is we could check regs->trap for machine\n>>> check in the die test. Complication is having to account for MCE\n>>> in an interrupt handler.\n>>>\n>>>          if (in_interrupt()) {\n>>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))\n>>>                       panic(\"Fatal exception in interrupt\");\n>>>          }\n>>>\n>>> Something like that might work for you? We needs a ppc64 macro for the\n>>> MCE, and can probably add something like in_nmi_from_interrupt() for\n>>> the second part of the test.\n>>\n>> Don't know, I'm away from home on business trip so I won't be able to\n>> test anything before next week. However it looks more or less like a\n>> hack, doesn't it ?\n> \n> I thought it seemed okay (with the right functions added). Actually it\n> could be a bit nicer to do this, then it works generally :\n> \n>           if (in_interrupt()) {\n>                    if (!in_nmi() || in_nmi_from_interrupt())\n>                        panic(\"Fatal exception in interrupt\");\n>           }\n> \n>>\n>> What about the following ?\n> \n> Hmm, in some ways maybe it's nicer. One complication is I would like the\n> same thing to be available for platform specific machine check\n> handlers, so then you need to pass is_in_interrupt to them. Which you\n> can do without any problem... But is it cleaner than the above?\n\nFor me it looks cleaner than twiddle the preempt_count depending on \nwhether we were or not already in nmi() .\n\nLet's draft something and see what it looks like.\n\n\n> \n> I guess one advantage of yours is that a BUG somewhere in the NMI path\n> will panic the system. Or is that a disadvantage?\n\nWhy would it panic the system more than now ? And is it an issue at all \n? 
Doesn't BUG() panic in any case ?\n\nChristophe","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42WD2y2c9Dz9s7W\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 12 Oct 2018 01:32:58 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42WD2y1M33zF3KM\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 12 Oct 2018 01:32:58 +1100 (AEDT)","from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42WD1C6fClzF3HJ\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tFri, 12 Oct 2018 01:31:27 +1100 (AEDT)","from localhost (mailhub1-int [192.168.12.234])\n\tby localhost (Postfix) with ESMTP id 42WD0y5vqhz9ttn0;\n\tThu, 11 Oct 2018 16:31:14 +0200 (CEST)","from pegase1.c-s.fr ([192.168.12.234])\n\tby localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new,\n\tport 10024)\n\twith ESMTP id dtOnJQe3bLs1; Thu, 11 Oct 2018 16:31:14 +0200 (CEST)","from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192])\n\tby pegase1.c-s.fr (Postfix) with ESMTP id 42WD0y5DzMz9ttBX;\n\tThu, 11 Oct 2018 16:31:14 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id D47208B883;\n\tThu, 11 Oct 2018 16:31:16 +0200 (CEST)","from messagerie.si.c-s.fr ([127.0.0.1])\n\tby localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,\n\tport 10023)\n\twith ESMTP id HmOSWntsdchz; Thu, 11 Oct 2018 16:31:16 +0200 
(CEST)","from PO15451 (unknown [192.168.232.3])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 6B0998B86C;\n\tThu, 11 Oct 2018 16:31:16 +0200 (CEST)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=c-s.fr\n\t(client-ip=93.17.236.30; helo=pegase1.c-s.fr;\n\tenvelope-from=christophe.leroy@c-s.fr; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr"],"X-Virus-Scanned":["Debian amavisd-new at c-s.fr","amavisd-new at c-s.fr"],"Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","To":"Nicholas Piggin <npiggin@gmail.com>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>\n\t<20181009153058.2564e7a1@roar.ozlabs.ibm.com>\n\t<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>\n\t<20181009211650.042d428c@roar.ozlabs.ibm.com>","From":"Christophe LEROY <christophe.leroy@c-s.fr>","Message-ID":"<9f0cbf48-d278-08bf-cb32-8b9608768025@c-s.fr>","Date":"Thu, 11 Oct 2018 16:31:16 +0200","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.9.1","MIME-Version":"1.0","In-Reply-To":"<20181009211650.042d428c@roar.ozlabs.ibm.com>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Language":"fr","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2009796,"web_url":"http://patchwork.ozlabs.org/comment/2009796/","msgid":"<bb79aa53-8924-b334-93d6-fc907c8880e1@c-s.fr>","date":"2018-10-13T08:29:48","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":5234,"url":"http://patchwork.ozlabs.org/api/people/5234/","name":"Christophe Leroy","email":"christophe.leroy@c-s.fr"},"content":"On 10/11/2018 02:31 PM, Christophe LEROY wrote:\n> \n> \n> Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :\n>> On Tue, 9 Oct 2018 09:36:18 +0000\n>> Christophe Leroy <christophe.leroy@c-s.fr> wrote:\n>>\n>>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:\n>>>> On Tue, 9 Oct 2018 06:46:30 +0200\n>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :\n>>>>>> On Mon, 8 Oct 2018 17:39:11 +0200\n>>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>>>>> Hi Nick,\n>>>>>>>\n>>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :\n>>>>>>>> Use nmi_enter similarly to system reset interrupts. 
This uses NMI\n>>>>>>>> printk NMI buffers and turns off various debugging facilities that\n>>>>>>>> helps avoid tripping on ourselves or other CPUs.\n>>>>>>>>\n>>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n>>>>>>>> ---\n>>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---\n>>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)\n>>>>>>>>\n>>>>>>>> diff --git a/arch/powerpc/kernel/traps.c \n>>>>>>>> b/arch/powerpc/kernel/traps.c\n>>>>>>>> index 2849c4f50324..6d31f9d7c333 100644\n>>>>>>>> --- a/arch/powerpc/kernel/traps.c\n>>>>>>>> +++ b/arch/powerpc/kernel/traps.c\n>>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs \n>>>>>>>> *regs)\n>>>>>>>>      void machine_check_exception(struct pt_regs *regs)\n>>>>>>>>      {\n>>>>>>>> -    enum ctx_state prev_state = exception_enter();\n>>>>>>>>          int recover = 0;\n>>>>>>>> +    bool nested = in_nmi();\n>>>>>>>> +    if (!nested)\n>>>>>>>> +        nmi_enter();\n>>>>>>>\n>>>>>>> This alters preempt_count, then when die() is called\n>>>>>>> in_interrupt() returns true allthough the trap didn't happen in\n>>>>>>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n>>>>>>> instead of gently sending SIGBUS the faulting app.\n>>>>>>\n>>>>>> Thanks for tracking that down.\n>>>>>>> Any idea on how to fix this ?\n>>>>>>\n>>>>>> I would say we have to deliver the sigbus by hand.\n>>>>>>\n>>>>>>        if ((user_mode(regs)))\n>>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n>>>>>>        else\n>>>>>>            die(\"Machine check\", regs, SIGBUS);\n>>>>>\n>>>>> And what about all the other things done by 'die()' ?\n>>>>>\n>>>>> And what if it is a kernel thread ?\n>>>>>\n>>>>> In one of my boards, I have a kernel thread regularly checking the HW,\n>>>>> and if it gets a machine check I expect it to gently stop and the die\n>>>>> notification to be delivered to all registered notifiers.\n>>>>>\n>>>>> Until before this patch, it was working 
well.\n>>>>\n>>>> I guess the alternative is we could check regs->trap for machine\n>>>> check in the die test. Complication is having to account for MCE\n>>>> in an interrupt handler.\n>>>>\n>>>>          if (in_interrupt()) {\n>>>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - \n>>>> (NMI_OFFSET + HARDIRQ_OFFSET)))\n>>>>                       panic(\"Fatal exception in interrupt\");\n>>>>          }\n>>>>\n>>>> Something like that might work for you? We needs a ppc64 macro for the\n>>>> MCE, and can probably add something like in_nmi_from_interrupt() for\n>>>> the second part of the test.\n>>>\n>>> Don't know, I'm away from home on business trip so I won't be able to\n>>> test anything before next week. However it looks more or less like a\n>>> hack, doesn't it ?\n>>\n>> I thought it seemed okay (with the right functions added). Actually it\n>> could be a bit nicer to do this, then it works generally :\n>>\n>>           if (in_interrupt()) {\n>>                    if (!in_nmi() || in_nmi_from_interrupt())\n>>                        panic(\"Fatal exception in interrupt\");\n>>           }\n>>\n>>>\n>>> What about the following ?\n>>\n>> Hmm, in some ways maybe it's nicer. One complication is I would like the\n>> same thing to be available for platform specific machine check\n>> handlers, so then you need to pass is_in_interrupt to them. Which you\n>> can do without any problem... 
But is it cleaner than the above?\n> \n> For me it looks cleaner than twiddle the preempt_count depending on \n> whether we were or not already in nmi() .\n> \n> Let's draft something and see what it looks like.\n\nOk, finally I went to your solution, see below, as it avoids having to \nmodify all subarch and platform specific machine check handlers.\n\nUnfortunately it doesn't solve the issue, it only delays it:\n\noops_end() calls do_exit(), which has the following test:\n\n\tif (unlikely(in_interrupt()))\n\t\tpanic(\"Aiee, killing interrupt handler!\");\n\n\nSo at the time being I still have no idea how to fix that, have you ?\n\ndiff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c\nindex fd58749b4d6b..3569e826f0c2 100644\n--- a/arch/powerpc/kernel/traps.c\n+++ b/arch/powerpc/kernel/traps.c\n@@ -132,6 +132,21 @@ static void pmac_backlight_unblank(void)\n  static inline void pmac_backlight_unblank(void) { }\n  #endif\n\n+static bool from_interrupt(void)\n+{\n+\tif (!in_nmi())\n+\t\treturn in_interrupt();\n+\t/*\n+\t * if we are in NMI, we need to determine if we were already in\n+\t * interrupt before entering NMI. 
To do that, we recalculate irq_count()\n+\t * from before the call to nmi_enter().\n+\t * If we were already in NMI and reentered in a new one, we have\n+\t * increased the preempt count by HARDIRQ_OFFSET, so the calculated\n+\t * value will be not null\n+\t */\n+\treturn irq_count() - NMI_OFFSET - HARDIRQ_OFFSET;\n+}\n+\n  /*\n   * If oops/die is expected to crash the machine, return true here.\n   *\n@@ -147,8 +162,7 @@ bool die_will_crash(void)\n  \t\treturn true;\n  \tif (kexec_should_crash(current))\n  \t\treturn true;\n-\tif (in_interrupt() || panic_on_oops ||\n-\t\t\t!current->pid || is_global_init(current))\n+\tif (from_interrupt() || panic_on_oops || !current->pid || \nis_global_init(current))\n  \t\treturn true;\n\n  \treturn false;\n@@ -242,12 +256,12 @@ static void oops_end(unsigned long flags, struct \npt_regs *regs,\n  \t * know we are going to panic, delay for 1 second so we have a\n  \t * chance to get clean backtraces from all CPUs that are oopsing.\n  \t */\n-\tif (in_interrupt() || panic_on_oops || !current->pid ||\n+\tif (from_interrupt() || panic_on_oops || !current->pid ||\n  \t    is_global_init(current)) {\n  \t\tmdelay(MSEC_PER_SEC);\n  \t}\n\n-\tif (in_interrupt())\n+\tif (from_interrupt())\n  \t\tpanic(\"Fatal exception in interrupt\");\n  \tif (panic_on_oops)\n  \t\tpanic(\"Fatal exception\");\n@@ -378,15 +392,37 @@ void _exception(int signr, struct pt_regs *regs, \nint code, unsigned long addr)\n  \t_exception_pkey(signr, regs, code, addr, 0);\n  }\n\n+static bool exception_nmi_enter(void)\n+{\n+\tbool nested = in_nmi();\n+\n+\t/*\n+\t * In case we are already in an NMI, increase preempt_count by\n+\t * HARDIRQ_OFFSET in order to get from_interrupt() return true\n+\t */\n+\tif (nested)\n+\t\tpreempt_count_add(HARDIRQ_OFFSET);\n+\telse\n+\t\tnmi_enter();\n+\n+\treturn nested;\n+}\n+\n+static void exception_nmi_exit(bool nested)\n+{\n+\tif (nested)\n+\t\tpreempt_count_sub(HARDIRQ_OFFSET);\n+\telse\n+\t\tnmi_exit();\n+}\n+\n  void 
system_reset_exception(struct pt_regs *regs)\n  {\n  \t/*\n  \t * Avoid crashes in case of nested NMI exceptions. Recoverability\n  \t * is determined by RI and in_nmi\n  \t */\n-\tbool nested = in_nmi();\n-\tif (!nested)\n-\t\tnmi_enter();\n+\tbool nested = exception_nmi_enter();\n\n  \t__this_cpu_inc(irq_stat.sreset_irqs);\n\n@@ -435,8 +471,7 @@ void system_reset_exception(struct pt_regs *regs)\n  \tif (!(regs->msr & MSR_RI))\n  \t\tnmi_panic(regs, \"Unrecoverable System Reset\");\n\n-\tif (!nested)\n-\t\tnmi_exit();\n+\texception_nmi_exit(nested);\n\n  \t/* What should we do here? We could issue a shutdown or hard reset. */\n  }\n@@ -737,9 +772,7 @@ int machine_check_generic(struct pt_regs *regs)\n  void machine_check_exception(struct pt_regs *regs)\n  {\n  \tint recover = 0;\n-\tbool nested = in_nmi();\n-\tif (!nested)\n-\t\tnmi_enter();\n+\tbool nested = exception_nmi_enter();\n\n  \t__this_cpu_inc(irq_stat.mce_exceptions);\n\n@@ -772,8 +805,7 @@ void machine_check_exception(struct pt_regs *regs)\n  \t\tnmi_panic(regs, \"Unrecoverable Machine check\");\n\n  bail:\n-\tif (!nested)\n-\t\tnmi_exit();\n+\texception_nmi_exit(nested);\n  }\n\n  void SMIException(struct pt_regs *regs)\n\n> \n> \n>>\n>> I guess one advantage of yours is that a BUG somewhere in the NMI path\n>> will panic the system. Or is that a disadvantage?\n> \n> Why would it panic the system more than now ? And is it an issue at all \n> ? 
Doesn't BUG() panic in any case ?\n> \n\nChristophe","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42XHx10Kgqz9s2P\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 13 Oct 2018 19:31:33 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42XHx06CcDzF3LD\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 13 Oct 2018 19:31:32 +1100 (AEDT)","from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42XHvF6yQ3zF0dd\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tSat, 13 Oct 2018 19:30:01 +1100 (AEDT)","from localhost (mailhub1-int [192.168.12.234])\n\tby localhost (Postfix) with ESMTP id 42XHv32Sldz9ttFk;\n\tSat, 13 Oct 2018 10:29:51 +0200 (CEST)","from pegase1.c-s.fr ([192.168.12.234])\n\tby localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new,\n\tport 10024)\n\twith ESMTP id JD_jAP_Ae2By; Sat, 13 Oct 2018 10:29:51 +0200 (CEST)","from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192])\n\tby pegase1.c-s.fr (Postfix) with ESMTP id 42XHv31jCJz9ttFY;\n\tSat, 13 Oct 2018 10:29:51 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 854B48B782;\n\tSat, 13 Oct 2018 10:29:56 +0200 (CEST)","from messagerie.si.c-s.fr ([127.0.0.1])\n\tby localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,\n\tport 10023)\n\twith ESMTP id iGRmdS3gBAs1; Sat, 13 Oct 2018 10:29:56 +0200 
(CEST)","from pc13168vm.idsi0.si.c-s.fr (unknown [192.168.232.3])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 16F088B74B;\n\tSat, 13 Oct 2018 10:29:56 +0200 (CEST)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=c-s.fr\n\t(client-ip=93.17.236.30; helo=pegase1.c-s.fr;\n\tenvelope-from=christophe.leroy@c-s.fr; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr"],"X-Virus-Scanned":["Debian amavisd-new at c-s.fr","amavisd-new at c-s.fr"],"Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","From":"Christophe Leroy <christophe.leroy@c-s.fr>","To":"Nicholas Piggin <npiggin@gmail.com>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>\n\t<20181009153058.2564e7a1@roar.ozlabs.ibm.com>\n\t<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>\n\t<20181009211650.042d428c@roar.ozlabs.ibm.com>\n\t<9f0cbf48-d278-08bf-cb32-8b9608768025@c-s.fr>","Message-ID":"<bb79aa53-8924-b334-93d6-fc907c8880e1@c-s.fr>","Date":"Sat, 13 Oct 2018 08:29:48 +0000","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.7.0","MIME-Version":"1.0","In-Reply-To":"<9f0cbf48-d278-08bf-cb32-8b9608768025@c-s.fr>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Language":"en-US","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2009798,"web_url":"http://patchwork.ozlabs.org/comment/2009798/","msgid":"<20181013184815.6a80d196@roar.ozlabs.ibm.com>","date":"2018-10-13T08:48:15","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":69518,"url":"http://patchwork.ozlabs.org/api/people/69518/","name":"Nicholas Piggin","email":"npiggin@gmail.com"},"content":"On Sat, 13 Oct 2018 08:29:48 +0000\nChristophe Leroy <christophe.leroy@c-s.fr> wrote:\n\n> On 10/11/2018 02:31 PM, Christophe LEROY wrote:\n> > \n> > \n> > Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :  \n> >> On Tue, 9 Oct 2018 09:36:18 +0000\n> >> Christophe Leroy <christophe.leroy@c-s.fr> wrote:\n> >>  \n> >>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:  \n> >>>> On Tue, 9 Oct 2018 06:46:30 +0200\n> >>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:  \n> >>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :  \n> >>>>>> On Mon, 8 Oct 2018 17:39:11 +0200\n> >>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:  \n> >>>>>>> Hi Nick,\n> >>>>>>>\n> >>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :  \n> >>>>>>>> Use nmi_enter similarly to system reset interrupts. 
This uses NMI\n> >>>>>>>> printk NMI buffers and turns off various debugging facilities that\n> >>>>>>>> helps avoid tripping on ourselves or other CPUs.\n> >>>>>>>>\n> >>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n> >>>>>>>> ---\n> >>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---\n> >>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)\n> >>>>>>>>\n> >>>>>>>> diff --git a/arch/powerpc/kernel/traps.c \n> >>>>>>>> b/arch/powerpc/kernel/traps.c\n> >>>>>>>> index 2849c4f50324..6d31f9d7c333 100644\n> >>>>>>>> --- a/arch/powerpc/kernel/traps.c\n> >>>>>>>> +++ b/arch/powerpc/kernel/traps.c\n> >>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs \n> >>>>>>>> *regs)\n> >>>>>>>>      void machine_check_exception(struct pt_regs *regs)\n> >>>>>>>>      {\n> >>>>>>>> -    enum ctx_state prev_state = exception_enter();\n> >>>>>>>>          int recover = 0;\n> >>>>>>>> +    bool nested = in_nmi();\n> >>>>>>>> +    if (!nested)\n> >>>>>>>> +        nmi_enter();  \n> >>>>>>>\n> >>>>>>> This alters preempt_count, then when die() is called\n> >>>>>>> in_interrupt() returns true allthough the trap didn't happen in\n> >>>>>>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n> >>>>>>> instead of gently sending SIGBUS the faulting app.  \n> >>>>>>\n> >>>>>> Thanks for tracking that down.  \n> >>>>>>> Any idea on how to fix this ?  
\n> >>>>>>\n> >>>>>> I would say we have to deliver the sigbus by hand.\n> >>>>>>\n> >>>>>>        if ((user_mode(regs)))\n> >>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n> >>>>>>        else\n> >>>>>>            die(\"Machine check\", regs, SIGBUS);  \n> >>>>>\n> >>>>> And what about all the other things done by 'die()' ?\n> >>>>>\n> >>>>> And what if it is a kernel thread ?\n> >>>>>\n> >>>>> In one of my boards, I have a kernel thread regularly checking the HW,\n> >>>>> and if it gets a machine check I expect it to gently stop and the die\n> >>>>> notification to be delivered to all registered notifiers.\n> >>>>>\n> >>>>> Until before this patch, it was working well.  \n> >>>>\n> >>>> I guess the alternative is we could check regs->trap for machine\n> >>>> check in the die test. Complication is having to account for MCE\n> >>>> in an interrupt handler.\n> >>>>\n> >>>>          if (in_interrupt()) {\n> >>>>                   if (!IS_MCHECK_EXC(regs) || (irq_count() - \n> >>>> (NMI_OFFSET + HARDIRQ_OFFSET)))\n> >>>>                       panic(\"Fatal exception in interrupt\");\n> >>>>          }\n> >>>>\n> >>>> Something like that might work for you? We needs a ppc64 macro for the\n> >>>> MCE, and can probably add something like in_nmi_from_interrupt() for\n> >>>> the second part of the test.  \n> >>>\n> >>> Don't know, I'm away from home on business trip so I won't be able to\n> >>> test anything before next week. However it looks more or less like a\n> >>> hack, doesn't it ?  \n> >>\n> >> I thought it seemed okay (with the right functions added). Actually it\n> >> could be a bit nicer to do this, then it works generally :\n> >>\n> >>           if (in_interrupt()) {\n> >>                    if (!in_nmi() || in_nmi_from_interrupt())\n> >>                        panic(\"Fatal exception in interrupt\");\n> >>           }\n> >>  \n> >>>\n> >>> What about the following ?  \n> >>\n> >> Hmm, in some ways maybe it's nicer. 
One complication is I would like the\n> >> same thing to be available for platform specific machine check\n> >> handlers, so then you need to pass is_in_interrupt to them. Which you\n> >> can do without any problem... But is it cleaner than the above?  \n> > \n> > For me it looks cleaner than twiddle the preempt_count depending on \n> > whether we were or not already in nmi() .\n> > \n> > Let's draft something and see what it looks like.  \n> \n> Ok, finaly I went to your solution, see below, as it avoids having to \n> modify all subarch and platform specific machine check handlers.\n> \n> Unfortunately it doesn't solves the issue, it only delays it:\n> \n> oops_end() calls do_exit(), which has the following test:\n> \n> \tif (unlikely(in_interrupt()))\n> \t\tpanic(\"Aiee, killing interrupt handler!\");\n> \n> \n> So at the time being I still have no idea how to fix that, have you ?\n\nHuh, I'm not sure. x86's MCE handling looks like it does this:\n\n                /*\n                 * We might have interrupted pretty much anything.  In\n                 * fact, if we're a machine check, we can even interrupt\n                 * NMI processing.  We don't want in_nmi() to return true,\n                 * but we need to notify RCU.\n                 */\n                rcu_nmi_enter();\n\nBut I don't see why they don't want the full NMI treatment there. I\nthought the whole point was to do everything so you would get e.g.,\nthe NMI-safe printk and so on.\n\nThe reason the in_interrupt checks work below is because the synchronous\ntrap handlers e.g., for BUG do not enter interrupt context so the\nquestion is about the context they interrupted. 
Maybe the right way to\ngo is nmi_exit just before deciding to oops.\n\nPerhaps we could ask lkml.\n\nThanks,\nNick","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42XJLf4LPPz9s3Z\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 13 Oct 2018 19:50:18 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42XJLf2WW5zF3HP\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 13 Oct 2018 19:50:18 +1100 (AEDT)","from mail-pl1-x643.google.com (mail-pl1-x643.google.com\n\t[IPv6:2607:f8b0:4864:20::643])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42XJJZ1wH4zF3Df\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tSat, 13 Oct 2018 19:48:30 +1100 (AEDT)","by mail-pl1-x643.google.com with SMTP id y11-v6so6999143plt.3\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tSat, 13 Oct 2018 01:48:30 -0700 (PDT)","from roar.ozlabs.ibm.com (61-68-185-28.tpgi.com.au. 
[61.68.185.28])\n\tby smtp.gmail.com with ESMTPSA id\n\tw187-v6sm631360pfw.3.2018.10.13.01.48.25\n\t(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);\n\tSat, 13 Oct 2018 01:48:27 -0700 (PDT)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=fail (p=none dis=none) header.from=gmail.com","ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"nbamk+Jf\"; dkim-atps=neutral","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"nbamk+Jf\"; dkim-atps=neutral","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:4864:20::643; helo=mail-pl1-x643.google.com;\n\tenvelope-from=npiggin@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=pass (p=none dis=none) header.from=gmail.com","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"nbamk+Jf\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=date:from:to:cc:subject:message-id:in-reply-to:references\n\t:mime-version:content-transfer-encoding;\n\tbh=qSZC+eggdpDMiIzo3vE3ubtzhmbwNihKA0ZZgSKuLes=;\n\tb=nbamk+Jf9HiEAy+H77fDPUR9eQc0Ko2eey8VzDZ6wsjLzRR5Pj086F/cdiNFXUiBES\n\t5T0IbOvhSvpIEJU/oWIMSurD2H6AxU0qJq2qpRIkTwC41DrPgsLVKfNgaYQCIp9p6Lhh\n\t/eMegK3YR0+wxBs7ZDYfFAgk4y11Jzl2z0faubQD9NfzcTD+38QzXGATDyjOa9WI8kRF\n\t/YTCoTxr0sc2dVGiD/SAOD+e8dDOTYqaAPByTJf6FwhowEyNu59s8/2dRM24gcf7/eoN\n\tqx92rCSOWHhEC2ryWhCSWEb8mOZ3wn1/y8Qww918TtoRwwKppnYj/9jV1HscS7f2otgy\n\tKv8Q==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; 
s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to\n\t:references:mime-version:content-transfer-encoding;\n\tbh=qSZC+eggdpDMiIzo3vE3ubtzhmbwNihKA0ZZgSKuLes=;\n\tb=uV7zPH+TLH6BlwwrER0dGIj9ylMv0XL4zZQavSyXo/0bFjIgbm/e0Vpeg8BRmYEA/0\n\tw80DsgvKN6ozFx5utYRRBhg2v+/xl09use5QmcNUFoqavhdP3vMxeq7YkZ1NtR/fVRDl\n\tiXlQeXwEV6nh1DZeSfO334C+Sna2mF2VfURVClW4r9imunfaWy571FU6yOoNlr7ku4lT\n\t6FWRnW7eFZsiFG9pRpijXqNqHg6fIv9EWTzDNFvMhTv4NSWEeR7Td2Oi5tJauu90Sobr\n\t36V9vz9kvYoqp9BKxsHbzHf6PngWIY0Zi5TM3FiPD+mAx0xujCmJ2Cogk8uBRI6pqLom\n\t2v7w==","X-Gm-Message-State":"ABuFfoi0vvl0GIAFkkyOhAabYAqNJ2bh0MZFxqHFvTRCd7RPWmlKWjD2\n\tbrI6sOzcNvXzFIFxfoPkuQo=","X-Google-Smtp-Source":"ACcGV60Mtmt9gJJLBTerrXEz0vtOmSV4L7j7x/fEGDShpitgcmZTsVllq+uf+SMbIvP2DQ54kT4TxA==","X-Received":"by 2002:a17:902:bf09:: with SMTP id\n\tbi9-v6mr9039275plb.118.1539420508226; \n\tSat, 13 Oct 2018 01:48:28 -0700 (PDT)","Date":"Sat, 13 Oct 2018 18:48:15 +1000","From":"Nicholas Piggin <npiggin@gmail.com>","To":"Christophe Leroy <christophe.leroy@c-s.fr>","Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","Message-ID":"<20181013184815.6a80d196@roar.ozlabs.ibm.com>","In-Reply-To":"<bb79aa53-8924-b334-93d6-fc907c8880e1@c-s.fr>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>\n\t<20181009153058.2564e7a1@roar.ozlabs.ibm.com>\n\t<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>\n\t<20181009211650.042d428c@roar.ozlabs.ibm.com>\n\t<9f0cbf48-d278-08bf-cb32-8b9608768025@c-s.fr>\n\t<bb79aa53-8924-b334-93d6-fc907c8880e1@c-s.fr>","X-Mailer":"Claws Mail 3.17.0 (GTK+ 2.24.32; x86_64-pc-linux-gnu)","MIME-Version":"1.0","Content-Type":"text/plain; 
charset=UTF-8","Content-Transfer-Encoding":"quoted-printable","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":2009800,"web_url":"http://patchwork.ozlabs.org/comment/2009800/","msgid":"<7f8486d4-fe7e-59f0-371d-af2d9ab83bca@c-s.fr>","date":"2018-10-13T08:56:24","subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","submitter":{"id":5234,"url":"http://patchwork.ozlabs.org/api/people/5234/","name":"Christophe Leroy","email":"christophe.leroy@c-s.fr"},"content":"Le 13/10/2018 à 10:48, Nicholas Piggin a écrit :\n> On Sat, 13 Oct 2018 08:29:48 +0000\n> Christophe Leroy <christophe.leroy@c-s.fr> wrote:\n> \n>> On 10/11/2018 02:31 PM, Christophe LEROY wrote:\n>>>\n>>>\n>>> Le 09/10/2018 à 13:16, Nicholas Piggin a écrit :\n>>>> On Tue, 9 Oct 2018 09:36:18 +0000\n>>>> Christophe Leroy <christophe.leroy@c-s.fr> wrote:\n>>>>   \n>>>>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:\n>>>>>> On Tue, 9 Oct 2018 06:46:30 +0200\n>>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>>>>> Le 09/10/2018 à 06:32, Nicholas Piggin a écrit :\n>>>>>>>> On Mon, 8 Oct 2018 
17:39:11 +0200\n>>>>>>>> Christophe LEROY <christophe.leroy@c-s.fr> wrote:\n>>>>>>>>> Hi Nick,\n>>>>>>>>>\n>>>>>>>>> Le 19/07/2017 à 08:59, Nicholas Piggin a écrit :\n>>>>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses NMI\n>>>>>>>>>> printk NMI buffers and turns off various debugging facilities that\n>>>>>>>>>> helps avoid tripping on ourselves or other CPUs.\n>>>>>>>>>>\n>>>>>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>\n>>>>>>>>>> ---\n>>>>>>>>>>       arch/powerpc/kernel/traps.c | 9 ++++++---\n>>>>>>>>>>       1 file changed, 6 insertions(+), 3 deletions(-)\n>>>>>>>>>>\n>>>>>>>>>> diff --git a/arch/powerpc/kernel/traps.c\n>>>>>>>>>> b/arch/powerpc/kernel/traps.c\n>>>>>>>>>> index 2849c4f50324..6d31f9d7c333 100644\n>>>>>>>>>> --- a/arch/powerpc/kernel/traps.c\n>>>>>>>>>> +++ b/arch/powerpc/kernel/traps.c\n>>>>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs\n>>>>>>>>>> *regs)\n>>>>>>>>>>       void machine_check_exception(struct pt_regs *regs)\n>>>>>>>>>>       {\n>>>>>>>>>> -    enum ctx_state prev_state = exception_enter();\n>>>>>>>>>>           int recover = 0;\n>>>>>>>>>> +    bool nested = in_nmi();\n>>>>>>>>>> +    if (!nested)\n>>>>>>>>>> +        nmi_enter();\n>>>>>>>>>\n>>>>>>>>> This alters preempt_count, then when die() is called\n>>>>>>>>> in_interrupt() returns true allthough the trap didn't happen in\n>>>>>>>>> interrupt, so oops_end() panics for \"fatal exception in interrupt\"\n>>>>>>>>> instead of gently sending SIGBUS the faulting app.\n>>>>>>>>\n>>>>>>>> Thanks for tracking that down.\n>>>>>>>>> Any idea on how to fix this ?\n>>>>>>>>\n>>>>>>>> I would say we have to deliver the sigbus by hand.\n>>>>>>>>\n>>>>>>>>         if ((user_mode(regs)))\n>>>>>>>>             _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);\n>>>>>>>>         else\n>>>>>>>>             die(\"Machine check\", regs, SIGBUS);\n>>>>>>>\n>>>>>>> And what about all the other things done by 'die()' ?\n>>>>>>>\n>>>>>>> And 
what if it is a kernel thread ?\n>>>>>>>\n>>>>>>> In one of my boards, I have a kernel thread regularly checking the HW,\n>>>>>>> and if it gets a machine check I expect it to gently stop and the die\n>>>>>>> notification to be delivered to all registered notifiers.\n>>>>>>>\n>>>>>>> Until before this patch, it was working well.\n>>>>>>\n>>>>>> I guess the alternative is we could check regs->trap for machine\n>>>>>> check in the die test. Complication is having to account for MCE\n>>>>>> in an interrupt handler.\n>>>>>>\n>>>>>>           if (in_interrupt()) {\n>>>>>>                    if (!IS_MCHECK_EXC(regs) || (irq_count() -\n>>>>>> (NMI_OFFSET + HARDIRQ_OFFSET)))\n>>>>>>                        panic(\"Fatal exception in interrupt\");\n>>>>>>           }\n>>>>>>\n>>>>>> Something like that might work for you? We needs a ppc64 macro for the\n>>>>>> MCE, and can probably add something like in_nmi_from_interrupt() for\n>>>>>> the second part of the test.\n>>>>>\n>>>>> Don't know, I'm away from home on business trip so I won't be able to\n>>>>> test anything before next week. However it looks more or less like a\n>>>>> hack, doesn't it ?\n>>>>\n>>>> I thought it seemed okay (with the right functions added). Actually it\n>>>> could be a bit nicer to do this, then it works generally :\n>>>>\n>>>>            if (in_interrupt()) {\n>>>>                     if (!in_nmi() || in_nmi_from_interrupt())\n>>>>                         panic(\"Fatal exception in interrupt\");\n>>>>            }\n>>>>   \n>>>>>\n>>>>> What about the following ?\n>>>>\n>>>> Hmm, in some ways maybe it's nicer. One complication is I would like the\n>>>> same thing to be available for platform specific machine check\n>>>> handlers, so then you need to pass is_in_interrupt to them. Which you\n>>>> can do without any problem... 
But is it cleaner than the above?\n>>>\n>>> For me it looks cleaner than twiddle the preempt_count depending on\n>>> whether we were or not already in nmi() .\n>>>\n>>> Let's draft something and see what it looks like.\n>>\n>> Ok, finaly I went to your solution, see below, as it avoids having to\n>> modify all subarch and platform specific machine check handlers.\n>>\n>> Unfortunately it doesn't solves the issue, it only delays it:\n>>\n>> oops_end() calls do_exit(), which has the following test:\n>>\n>> \tif (unlikely(in_interrupt()))\n>> \t\tpanic(\"Aiee, killing interrupt handler!\");\n>>\n>>\n>> So at the time being I still have no idea how to fix that, have you ?\n> \n> Huh, I'm not sure. x86's MCE handling looks like it does this:\n> \n>                  /*\n>                   * We might have interrupted pretty much anything.  In\n>                   * fact, if we're a machine check, we can even interrupt\n>                   * NMI processing.  We don't want in_nmi() to return true,\n>                   * but we need to notify RCU.\n>                   */\n>                  rcu_nmi_enter();\n> \n> But I don't see why they don't want the full NMI treatment there. I\n> thought the whole point was to do everything so you would get e.g.,\n> the NMI-safe printk and so on.\n> \n> The reason the in_interrupt checks work below is because the synchronous\n> trap handlers e.g., for BUG do not enter interrupt context so the\n> question is about they context they interrupted. Maybe the right way to\n> go is nmi_exit just before deciding to oops.\n\nYes I arrived at the same conclusion. I tested it just now and it works \nfor me. 
Thanks.\n\nChristophe\n\n> \n> Perhaps we could ask lkml.\n> \n> Thanks,\n> Nick\n>","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 42XJWN0l8Sz9s3Z\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 13 Oct 2018 19:57:52 +1100 (AEDT)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 42XJWM6fqCzDqnG\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 13 Oct 2018 19:57:51 +1100 (AEDT)","from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 42XJTm5zV2zF3Df\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tSat, 13 Oct 2018 19:56:28 +1100 (AEDT)","from localhost (mailhub1-int [192.168.12.234])\n\tby localhost (Postfix) with ESMTP id 42XJTc04HRz9ttFk;\n\tSat, 13 Oct 2018 10:56:20 +0200 (CEST)","from pegase1.c-s.fr ([192.168.12.234])\n\tby localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new,\n\tport 10024)\n\twith ESMTP id EpsjHLgGVGwk; Sat, 13 Oct 2018 10:56:19 +0200 (CEST)","from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192])\n\tby pegase1.c-s.fr (Postfix) with ESMTP id 42XJTb6MW3z9ttFY;\n\tSat, 13 Oct 2018 10:56:19 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id 349C38B782;\n\tSat, 13 Oct 2018 10:56:25 +0200 (CEST)","from messagerie.si.c-s.fr ([127.0.0.1])\n\tby localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,\n\tport 10023)\n\twith ESMTP id wula8U0IY4ln; Sat, 13 Oct 
2018 10:56:25 +0200 (CEST)","from PO15451 (unknown [192.168.232.3])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id BF9C28B74B;\n\tSat, 13 Oct 2018 10:56:24 +0200 (CEST)"],"Authentication-Results":["ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr","lists.ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=c-s.fr\n\t(client-ip=93.17.236.30; helo=pegase1.c-s.fr;\n\tenvelope-from=christophe.leroy@c-s.fr; receiver=<UNKNOWN>)","lists.ozlabs.org;\n\tdmarc=none (p=none dis=none) header.from=c-s.fr"],"X-Virus-Scanned":["Debian amavisd-new at c-s.fr","amavisd-new at c-s.fr"],"Subject":"Re: [PATCH v2 3/3] powerpc: machine check interrupt is a\n\tnon-maskable interrupt","To":"Nicholas Piggin <npiggin@gmail.com>","References":"<20170719065912.19183-1-npiggin@gmail.com>\n\t<20170719065912.19183-4-npiggin@gmail.com>\n\t<30487984-752a-960d-6aae-6571c55c7ba5@c-s.fr>\n\t<20181009143241.026f3e7f@roar.ozlabs.ibm.com>\n\t<ccf61fc2-1f2e-7b67-c481-fa4c00a65aae@c-s.fr>\n\t<20181009153058.2564e7a1@roar.ozlabs.ibm.com>\n\t<0539727f-8420-3176-30b5-f4a6a1ccd4a4@c-s.fr>\n\t<20181009211650.042d428c@roar.ozlabs.ibm.com>\n\t<9f0cbf48-d278-08bf-cb32-8b9608768025@c-s.fr>\n\t<bb79aa53-8924-b334-93d6-fc907c8880e1@c-s.fr>\n\t<20181013184815.6a80d196@roar.ozlabs.ibm.com>","From":"Christophe LEROY <christophe.leroy@c-s.fr>","Message-ID":"<7f8486d4-fe7e-59f0-371d-af2d9ab83bca@c-s.fr>","Date":"Sat, 13 Oct 2018 10:56:24 +0200","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.9.1","MIME-Version":"1.0","In-Reply-To":"<20181013184815.6a80d196@roar.ozlabs.ibm.com>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Language":"fr","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>,\n\tlinuxppc-dev@lists.ozlabs.org","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}}]