{"id":781468,"url":"http://patchwork.ozlabs.org/api/patches/781468/?format=json","web_url":"http://patchwork.ozlabs.org/project/skiboot/patch/20170628170415.38d744e7@roar.ozlabs.ibm.com/","project":{"id":44,"url":"http://patchwork.ozlabs.org/api/projects/44/?format=json","name":"skiboot firmware development","link_name":"skiboot","list_id":"skiboot.lists.ozlabs.org","list_email":"skiboot@lists.ozlabs.org","web_url":"http://github.com/open-power/skiboot","scm_url":"http://github.com/open-power/skiboot","webscm_url":"","list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20170628170415.38d744e7@roar.ozlabs.ibm.com>","list_archive_url":null,"date":"2017-06-28T07:04:15","name":"[RFC] hmi: clear xscom and unknown bits from HMER","commit_ref":null,"pull_url":null,"state":"rejected","archived":false,"hash":"5e9526094899ca850f6695dbea0510b0ecf70e67","submitter":{"id":69518,"url":"http://patchwork.ozlabs.org/api/people/69518/?format=json","name":"Nicholas Piggin","email":"npiggin@gmail.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/skiboot/patch/20170628170415.38d744e7@roar.ozlabs.ibm.com/mbox/","series":[],"comments":"http://patchwork.ozlabs.org/api/patches/781468/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/781468/checks/","tags":{},"related":[],"headers":{"Return-Path":"<skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","skiboot@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","skiboot@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [103.22.144.68])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3wyDLf2y0wz9s74\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 28 Jun 2017 17:04:42 +1000 (AEST)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3wyDLf1wdWzDr3y\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 28 Jun 2017 17:04:42 +1000 (AEST)","from mail-pf0-x241.google.com (mail-pf0-x241.google.com\n\t[IPv6:2607:f8b0:400e:c00::241])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3wyDLT71rGzDr3p\n\tfor <skiboot@lists.ozlabs.org>; Wed, 28 Jun 2017 17:04:33 +1000 (AEST)","by mail-pf0-x241.google.com with SMTP id d5so7860222pfe.1\n\tfor <skiboot@lists.ozlabs.org>; Wed, 28 Jun 2017 00:04:33 -0700 (PDT)","from roar.ozlabs.ibm.com (59-102-83-48.tpgi.com.au. [59.102.83.48])\n\tby smtp.gmail.com with ESMTPSA id\n\tt5sm2045833pgt.19.2017.06.28.00.04.27\n\t(version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);\n\tWed, 28 Jun 2017 00:04:30 -0700 (PDT)"],"Authentication-Results":["ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"AMIAqivf\"; dkim-atps=neutral","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"AMIAqivf\"; dkim-atps=neutral","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"AMIAqivf\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=date:from:to:cc:subject:message-id:in-reply-to:references\n\t:organization:mime-version:content-transfer-encoding;\n\tbh=lN+Xy6iEGZvGg6bh3EPv1fucCP0c/yT/GokGQg6egWg=;\n\tb=AMIAqivfhXFLfg5uC/NfAA667Gr8d5O8vRD/WiRDwkLAjXQ0CFqFZVBbLbl/3lYgL0\n\tbDXEDxhsC5YiyXnbsW+J8zLT2VqHw5lDcFJIw8FW0xaQzEg5BVxiAKfP7GxQkJQxVTHb\n\tV8sY4u84/eEtk7ab2hcKQxL2PqTnNwDBCrdHF6+rKF7kJrrCqms1srQp+y44XbXmsTAY\n\tFAMk2EuMJMzASjOMA8hs2E1NQ2m5y8xyesk9tCrwqD8XYuia1PdNDe51WSwU/CkK9yXX\n\thSLPF5a19Q27RN21MKfTb4xemBNqHV4CQp4uutXMinKxyWxVi39/nBaUTB/MOtxXVYkT\n\tHWvA==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to\n\t:references:organization:mime-version:content-transfer-encoding;\n\tbh=lN+Xy6iEGZvGg6bh3EPv1fucCP0c/yT/GokGQg6egWg=;\n\tb=Rw3TFSkA5jO9nOg8nLEw2k42EdfAlF/CmUlMhIAAP4wf3+VEKqhOZF9Fjbg0XeNqPL\n\tETR1DH2TPrb/fPbFiYJW3LfOfSYo90YTLSEPRPFc15ldxysytvj1nNzEL1k6UzQ+l0CA\n\tSBJVDdeK70cJ9bKwdG0BtLKie8KfnWPce5ADnRByHk4c5gr4mL0cLhjGcj9pB34jc0Zq\n\tFgVuH3pQKABU9umZwxUmZ8Wd1rfv8k7277eoSyGNJpQyKPPSpqmItO/OiQ2+25VRn67W\n\tkHxA9kzutDWYqfKIUE5mshdklfR0zP9s0oU9j6YbPddCeET2X28YHRF5OW12kKhFyE/S\n\tKVHw==","X-Gm-Message-State":"AKS2vOyYa/fz2k357O16LHqY8Wx/eRo1wLDXUcmzETMwbokHwbg7aNq6\n\tGu7graqN8HQDGQ==","X-Received":"by 10.84.148.203 with SMTP id y11mr10030783plg.211.1498633471698;\n\tWed, 28 Jun 2017 00:04:31 -0700 (PDT)","Date":"Wed, 28 Jun 2017 17:04:15 +1000","From":"Nicholas Piggin <npiggin@gmail.com>","To":"Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>","Message-ID":"<20170628170415.38d744e7@roar.ozlabs.ibm.com>","In-Reply-To":"<1005db51-d5ba-abbf-2f2d-68604ecb83ff@linux.vnet.ibm.com>","References":"<20170623121101.30781-1-npiggin@gmail.com>\n\t<0ca45776-6b27-edcc-6675-267886ec3aa0@linux.vnet.ibm.com>\n\t<1498566765.3651.31.camel@kernel.crashing.org>\n\t<f7d7d224-cda2-1754-4506-2a77293fdc67@linux.vnet.ibm.com>\n\t<20170628144156.6e71c220@roar.ozlabs.ibm.com>\n\t<1005db51-d5ba-abbf-2f2d-68604ecb83ff@linux.vnet.ibm.com>","Organization":"IBM","X-Mailer":"Claws Mail 3.14.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu)","MIME-Version":"1.0","Subject":"Re: [Skiboot] [RFC][PATCH] hmi: clear xscom and unknown bits from\n\tHMER","X-BeenThere":"skiboot@lists.ozlabs.org","X-Mailman-Version":"2.1.23","Precedence":"list","List-Id":"Mailing list for skiboot development <skiboot.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/skiboot>,\n\t<mailto:skiboot-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/skiboot/>","List-Post":"<mailto:skiboot@lists.ozlabs.org>","List-Help":"<mailto:skiboot-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/skiboot>,\n\t<mailto:skiboot-request@lists.ozlabs.org?subject=subscribe>","Cc":"Ryan Grimm <grimm@linux.vnet.ibm.com>, aksadiga@in.ibm.com,\n\tskiboot@lists.ozlabs.org","Content-Type":"text/plain; charset=\"utf-8\"","Content-Transfer-Encoding":"base64","Errors-To":"skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org","Sender":"\"Skiboot\"\n\t<skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org>"},"content":"On Wed, 28 Jun 2017 12:02:55 +0530\nMahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:\n\n> On 06/28/2017 10:11 AM, Nicholas Piggin wrote:\n> > On Wed, 28 Jun 2017 09:00:05 +0530\n> > Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com> wrote:\n> >   \n> >> On 06/27/2017 06:02 PM, Benjamin Herrenschmidt wrote:  \n> >>> On Tue, 2017-06-27 at 10:46 +0530, Mahesh Jagannath Salgaonkar wrote:    \n> >>>> On 06/23/2017 05:41 PM, Nicholas Piggin wrote:    \n> >>>>> It has been observed the xscom bit in HMER gets stuck (as-yet    \n> >>>>\n> >>>> We see that stuck because opal never clears it after scom read/write.\n> >>>> The bit is cleared just before the next scom read/write. I am not sure\n> >>>> what it was left uncleared until next scom read/write kicks in.    \n> >>>\n> >>> Because we don't care ?     \n> >>\n> >> looking at the code it looks like we didn't care. I sent out a patch\n> >> that clears them once scom operation is complete.\n> >>  \n> >>> It should not be enabled in HMEER...    \n> >>\n> >> Yes, we don't enable them in HMEER.\n> >>  \n> >>>>    \n> >>>>> unkonwn root cause -- HMEER should disable those exceptions).\n> >>>>> This causes HMIs to be continually taken.\n> >>>>>\n> >>>>> HMI: Received HMI interrupt: HMER = 0x0040000000000000\n> >>>>>\n> >>>>> Add some attempt to handle this by clearing the HMER and HMEER.\n> >>>>>\n> >>>>> Try to clear HMER for other unknown HMIs (alternative is to not\n> >>>>> recover).    \n> >>>>\n> >>>> I think we should be just ok with clearing out and masking them again.    \n> >>>\n> >>> Right but we need to understand why we are taking the HMI in the first\n> >>> place since it's not enabled in HMEER unless something's wrong there.\n> >>> Is that reproduceable ?    \n> >>\n> >> We did debug it yesterday and found the reason. Akshay sent out a patch\n> >> that fixes the issue. http://patchwork.ozlabs.org/patch/781434/  \n> > \n> > Given that this bug was caused by Linux, and not due to an actual\n> > HMI (and therefore would not be fixed by clearing the HMER/HMEER\n> > bits), I wonder if this patch is still warranted. HMEER could be\n> > messed up somehow, so maybe a simplified version that just notes\n> > the unexpected HMI and masks out HMEER.\n> > \n> > Any opinions?  \n> \n> Yeah I agree with having simplified version so that it will help us to\n> detect if we at all mess up with HMEER in future.\n\nHow about something very simple?\n\n[PATCH] hmi: report HMEER and fail when encountering unknown HMI\n\nIn the interest of minimising corruption and improving debugging, when\nencountering an unknown HMI, report HMEER and HMER with PR_ERR, and do\nnot recover.\n\nThis is an improvement on existing behaviour, which in the best case\nwill just continue to take HMIs because the exception remains asserted\nand this will cause the system to become unstable.\n\nSigned-off-by: Nicholas Piggin <npiggin@gmail.com>\n---\n core/hmi.c | 9 +++++++++\n 1 file changed, 9 insertions(+)","diff":"diff --git a/core/hmi.c b/core/hmi.c\nindex 84f2c2d6..3329d055 100644\n--- a/core/hmi.c\n+++ b/core/hmi.c\n@@ -823,6 +823,15 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt)\n \t\t}\n \t}\n \n+\tif (hmer) {\n+\t\tuint64_t hmeer = mfspr(SPR_HMEER);\n+\n+\t\tprlog(PR_ERR, \"HMI: Unexpected exception HMEER = 0x%016llx \"\n+\t\t\t      \"HMER = 0x%016llx (not recovering)\\n\",\n+\t\t\t      hmeer, hmer);\n+\t\trecover = 0;\n+\t}\n+\n \tif (recover == 0)\n \t\tdisable_fast_reboot(\"Unrecoverable HMI\");\n \t/*\n","prefixes":["RFC"]}