[{"id":1766686,"web_url":"http://patchwork.ozlabs.org/comment/1766686/","msgid":"<20170912150344.2d6702d1@roar.ozlabs.ibm.com>","date":"2017-09-12T05:03:44","subject":"Re: [PATCH v1 0/4] Revisit MCE handling for UE Errors","submitter":{"id":69518,"url":"http://patchwork.ozlabs.org/api/people/69518/","name":"Nicholas Piggin","email":"npiggin@gmail.com"},"content":"Hi Balbir,\n\nVery cool. How are you testing it? Is it failing memory pages\nand poisoning them out properly?\n\nLooks like you have a printk in the machine_check_early path,\nwhich you shouldn't. I guess because we don't mark that context\nas an NMI. Which we could... but I think you want to put as\nlittle as possible in that path, so avoiding the print would\nbe preferable. Perhaps you could mark the mce event somehow that\nthe failure can be reported during processing it?\n\nFirmware logging is a good question, I could not really see\nwhere this all gets plumbed through. If this is expected to be\na common problem for some types of attached memory, then we\nreally need to build up a log of these errors that can be used\nto exclude the memory after a reboot too. Do we have anything\nlike this capability in firmware?\n\nThanks,\nNick\n\nOn Tue, 12 Sep 2017 14:38:55 +1000\nBalbir Singh <bsingharora@gmail.com> wrote:\n\n> This patch series is designed to hook up memory_failure on\n> UE errors, this is especially helpful for user_mode UE errors.\n> \n> The first patch is a cleanup patch, it removes dead code.\n> I could not find any users of get_mce_fault_addr().\n> The second patch walks kernel/user mode page tables in\n> real mode to extract the effective address of the instruction\n> that caused the UE error and the effective address it was\n> trying to access (for load/store). 
The third patch hooks\n> up the pfn for instruction UE errors (ierror).\n> \n> The fourth patch hooks up memory_failure to the MCE patch.\n> \n> TODO:\n> Log the address in NVRAM, so that we can recover from\n> bad pages at boot and keep the blacklist persistent.\n> \n> Changelog v2:\n> \t- address review comments from Nick and Mahesh\n> \t(initialization of pfn and more comments on failure\n> \twhen addr_to_pfn() or analyse_instr() fail)\n> \t- Hookup ierrors to the framework as well\n> \t(comments from Mahesh)\n> \n> Balbir Singh (4):\n>   powerpc/mce.c: Remove unused function get_mce_fault_addr()\n>   powerpc/mce: Hookup derror (load/store) UE errors\n>   powerpc/mce: Hookup ierror (instruction) UE errors\n>   powerpc/mce: hookup memory_failure for UE errors\n> \n>  arch/powerpc/include/asm/mce.h  |   4 +-\n>  arch/powerpc/kernel/mce.c       | 108 ++++++++++++++++++++++++----------------\n>  arch/powerpc/kernel/mce_power.c | 105 +++++++++++++++++++++++++++++++++++---\n>  3 files changed, 163 insertions(+), 54 deletions(-)\n>","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xrt6p2wc7z9s8J\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 12 Sep 2017 15:06:10 +1000 (AEST)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3xrt6p1dn5zDqjr\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 12 Sep 2017 15:06:10 +1000 (AEST)","from mail-pf0-x243.google.com (mail-pf0-x243.google.com\n\t[IPv6:2607:f8b0:400e:c00::243])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 
(128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3xrt4G33M7zDqjf;\n\tTue, 12 Sep 2017 15:03:58 +1000 (AEST)","by mail-pf0-x243.google.com with SMTP id f84so5813014pfj.3;\n\tMon, 11 Sep 2017 22:03:58 -0700 (PDT)","from roar.ozlabs.ibm.com (203-219-56-202.tpgi.com.au.\n\t[203.219.56.202]) by smtp.gmail.com with ESMTPSA id\n\tp88sm16478012pfi.174.2017.09.11.22.03.52\n\t(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);\n\tMon, 11 Sep 2017 22:03:55 -0700 (PDT)"],"Authentication-Results":["ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"eV6IdUk6\"; dkim-atps=neutral","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"eV6IdUk6\"; dkim-atps=neutral","ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:400e:c00::243; helo=mail-pf0-x243.google.com;\n\tenvelope-from=npiggin@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"eV6IdUk6\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=date:from:to:cc:subject:message-id:in-reply-to:references\n\t:organization:mime-version:content-transfer-encoding;\n\tbh=B+c/NjT19kVZpP/AOfg/NZIF5LO9Xdvo+5wQjgMh1bs=;\n\tb=eV6IdUk6kWs2iLpXu17ekDYWi0sAJK3FXDIWca46rFvD7jE40JTzaLZh3Ms9xzULlO\n\tTvnuVBmC+abhbTgy2RcNLmhoI16N4IxqBsEy4m2lOb1FBoZzwpUi84Aq9/Nja9fZJiy5\n\tAHU97uJo+tyoXiePTuwU7PaCjS3MCpJtyUTf62gwkMfeKops1c0lcwHAqtAB+n1MERkw\n\tc6c1ADok+WpRl4BBGcKASAhLar8P6zvoMK8+mC8SRCdEOfgoyBTFtzFMEVzJGHv2zgGO\n\tcU3U/LeZkGiFALKzido9dwWK9gUT3ByDoWuy8sFUm/sbhiIG5S6qKA6B0+6tuMXyzaoY\n\tYyDw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; 
s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to\n\t:references:organization:mime-version:content-transfer-encoding;\n\tbh=B+c/NjT19kVZpP/AOfg/NZIF5LO9Xdvo+5wQjgMh1bs=;\n\tb=tAiIbEWlf11w9GDpDLaqWO4xyJeq5rsKFPrxpCkajzLE/13kcqBBoHMupt67263JdQ\n\tD2OssAF0A50MTeqBQc7o3hERgEOB9Sa3KDp7y8sZVSQOTr05MJr+3QzZDMKhzzrQ+xf4\n\twViGPocZ+fwSFopfg7MYRyu3NyzxwCLF7uXUojn2I4gPrpMPEViUC82KNsqKPu4M+YmH\n\t9gSC/MOk0EXqcUzF5W5h/47Y9DYhEkeg+AUuIJTlKwTj/+UJ0Yyy2PEMErBdHI8JhEjC\n\tl4wlcUOAMC1dF+/td8LivxID+IWnkJXmWjEF7Dd3+Af4jzJrGbEhb8CFOgnHupOUoF8m\n\taVVg==","X-Gm-Message-State":"AHPjjUieVsS4AwAIdiCuR8PomHbm1okbq1tlbB8rfLnfZYJVr8Bo6B1e\n\teWxTTmow6zLuQg==","X-Google-Smtp-Source":"AOwi7QCciPwCg/she7tKFCLAmD+X+1EsIkN59PV77iE0xLwulBkmSecJsvtaPV+pU9E1ewYkXL8pZQ==","X-Received":"by 10.84.211.9 with SMTP id b9mr1098947pli.105.1505192636351;\n\tMon, 11 Sep 2017 22:03:56 -0700 (PDT)","Date":"Tue, 12 Sep 2017 15:03:44 +1000","From":"Nicholas Piggin <npiggin@gmail.com>","To":"Balbir Singh <bsingharora@gmail.com>","Subject":"Re: [PATCH v1 0/4] Revisit MCE handling for UE Errors","Message-ID":"<20170912150344.2d6702d1@roar.ozlabs.ibm.com>","In-Reply-To":"<20170912043859.32473-1-bsingharora@gmail.com>","References":"<20170912043859.32473-1-bsingharora@gmail.com>","Organization":"IBM","X-Mailer":"Claws Mail 3.15.0-dirty (GTK+ 2.24.31; x86_64-pc-linux-gnu)","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII","Content-Transfer-Encoding":"7bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.23","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"linuxppc-dev@lists.ozlabs.org, skiboot@lists.ozlabs.org,\n\tmahesh@linux.vnet.ibm.com","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":1766712,"web_url":"http://patchwork.ozlabs.org/comment/1766712/","msgid":"<CAKTCnzmSn+gyYKj0Bksiu0iBLnRkd+PwMjvE=LYe4x7dFrrHeA@mail.gmail.com>","date":"2017-09-12T07:11:42","subject":"Re: [PATCH v1 0/4] Revisit MCE handling for UE Errors","submitter":{"id":9347,"url":"http://patchwork.ozlabs.org/api/people/9347/","name":"Balbir Singh","email":"bsingharora@gmail.com"},"content":"On Tue, Sep 12, 2017 at 3:03 PM, Nicholas Piggin <npiggin@gmail.com> wrote:\n> Hi Balbir,\n>\n> Very cool. How are you testing it? Is it failing memory pages\n> and poisoning them out properly?\n>\n\nYep, I tested it and it seems to work correctly so far. I am testing this\non a simulator with injected MCE UE errors for both the data and\ninstruction side.\n\n> Looks like you have a printk in the machine_check_early path,\n> which you shouldn't. I guess because we don't mark that context\n> as an NMI. Which we could... but I think you want to put as\n> little as possible in that path, so avoiding the print would\n> be preferable. 
Perhaps you could mark the mce event somehow that\n> the failure can be reported during processing it?\n>\n\nGood point, I did see that printk handles stuff via printk_nmi_enter/exit,\nbut it's best avoided. Will spin v2\n\n> Firmware logging is a good question, I could not really see\n> where this all gets plumbed through. If this is expected to be\n> a common problem for some types of attached memory, then we\n> really need to build up a log of these errors that can be used\n> to exclude the memory after a reboot too. Do we have anything\n> like this capability in firmware?\n\nIt's to be built, we should log these to NVRAM and revisit at every\nboot to isolate these pages\n\nBalbir Singh.","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [103.22.144.68])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xrwxK5XYYz9s8J\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 12 Sep 2017 17:13:09 +1000 (AEST)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3xrwxK4PN4zDrKL\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 12 Sep 2017 17:13:09 +1000 (AEST)","from mail-ua0-x22b.google.com (mail-ua0-x22b.google.com\n\t[IPv6:2607:f8b0:400c:c08::22b])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3xrwvj1S06zDrG1;\n\tTue, 12 Sep 2017 17:11:45 +1000 (AEST)","by mail-ua0-x22b.google.com with SMTP id q29so13938993uaf.3;\n\tTue, 12 Sep 2017 00:11:44 -0700 (PDT)","by 10.176.75.25 with HTTP; Tue, 12 Sep 2017 00:11:42 -0700 
(PDT)"],"Authentication-Results":["ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"nR62K/pY\"; dkim-atps=neutral","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"nR62K/pY\"; dkim-atps=neutral","ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:400c:c08::22b; helo=mail-ua0-x22b.google.com;\n\tenvelope-from=bsingharora@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"nR62K/pY\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=mime-version:in-reply-to:references:from:date:message-id:subject:to\n\t:cc; bh=lG3lxtaONdOjLZ0Oy5FfJZDRPRor7L4yQ7/7NsNbC0w=;\n\tb=nR62K/pYnSSGCVgwaEi7lsBp9Q2SAufxhOrEAGLHDwNYS7YkR2HfR5uR7mlfXLGY5J\n\tT7fYzU6arzygDnwDbDuC1cT5vR8AlQwTzODX+h1E0SPn1OnBa/Z48Vmo8A/riB4ZGtqY\n\tvQRKG+iydM0XjhWXeud12327oFTb8n7pkwO/zUwVDcJSoKarecQ4e1Ls/kHXMFBprNFA\n\tYSu4rgH97Rw/cgakccNmijVkxTlk1Am1qTT4/Am/e9+RSh+qhAnoV6gHbTFseOsfbPtG\n\txRKx388oIGtA+lg91jMnDG23/hLtTYQEF8zTKtSxmbjQvs4upO4Zp8nVbb4iy1Yj9uK5\n\tBkBw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; 
s=20161025;\n\th=x-gm-message-state:mime-version:in-reply-to:references:from:date\n\t:message-id:subject:to:cc;\n\tbh=lG3lxtaONdOjLZ0Oy5FfJZDRPRor7L4yQ7/7NsNbC0w=;\n\tb=V8up7QW696N9/flM5hDgTCA985IAQwrg8Mob38DKjDSCjAuAK7PDZd21+a9d1pIgXP\n\tbthKCk9Je2vKKXd3F6j/Hoom4Axk95JvWyOu+odC9L2cQIfAzxIPmAMN0lBKxbngsztV\n\t0m+5KZZTy/NeB0TYYor+GIlnyXjnuhoBUGzJxt8LFYoUk8IOkY552RwtT7usFKbwI22p\n\tuMXGAfdtPjelY25v9VEkDTG1Txr0ACjB37qtz9VcMM5tLtpLXMyd7oZ6NtyAmfRxv+lB\n\tThwQ6ZVg+DD83LKjA2M9tmVOfoPSs+AExyE0UQM3veKs7AICSNB8HCt039dW/Lk3QVQY\n\tOIBA==","X-Gm-Message-State":"AHPjjUgSjFSIriqxw2FNuEuT0AYP5MCZTmAjVclKPBZVuLP1PuggQvl7\n\tNqmZxfkoC/Lp6uYzbueXEGSrFxYS0Q==","X-Google-Smtp-Source":"AOwi7QBWgLadArFbJ9ZT8cQT4Z4gGVBF60EgZCVT8ZdO4DQFMLrNVZKGddBzhSoWbhobacl2DznqdbFPRV27hm7S7+k=","X-Received":"by 10.159.61.111 with SMTP id m47mr11587718uai.33.1505200302573; \n\tTue, 12 Sep 2017 00:11:42 -0700 (PDT)","MIME-Version":"1.0","In-Reply-To":"<20170912150344.2d6702d1@roar.ozlabs.ibm.com>","References":"<20170912043859.32473-1-bsingharora@gmail.com>\n\t<20170912150344.2d6702d1@roar.ozlabs.ibm.com>","From":"Balbir Singh <bsingharora@gmail.com>","Date":"Tue, 12 Sep 2017 17:11:42 +1000","Message-ID":"<CAKTCnzmSn+gyYKj0Bksiu0iBLnRkd+PwMjvE=LYe4x7dFrrHeA@mail.gmail.com>","Subject":"Re: [PATCH v1 0/4] Revisit MCE handling for UE Errors","To":"Nicholas Piggin <npiggin@gmail.com>","Content-Type":"text/plain; charset=\"UTF-8\"","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.23","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"\"open list:LINUX FOR POWERPC \\(32-BIT AND 64-BIT\\)\"\n\t<linuxppc-dev@lists.ozlabs.org>,\n\tskiboot list <skiboot@lists.ozlabs.org>, \n\tMahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}}]