From patchwork Wed Jan 17 12:08:30 2018
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 862244
From: Nicholas Piggin
To: linuxppc-dev@lists.ozlabs.org, skiboot@lists.ozlabs.org
Date: Wed, 17 Jan 2018 22:08:30 +1000
Message-Id: <20180117120831.22533-1-npiggin@gmail.com>
Subject: [Skiboot] [PATCH][RFC] powerpc/powernv: Taking non-maskable interrupts in OPAL
List-Id: Mailing list for skiboot development

- Linux uses r13 as a per-cpu register.
- Linux does not restore r13 when returning from an interrupt to kernel
  context (MSR[PR]=0), to accommodate migrating to a different CPU.
- OPAL runs with MSR[PR]=0.
- OPAL uses r13 for its own purposes.
- OPAL runs with MSR[RI]=1.

This means that if we take an interrupt in OPAL (sreset or machine
check), Linux may return to OPAL context with the wrong r13, because
the return path treats it as an ordinary kernel-mode return and leaves
the Linux value of r13 in place.

I propose to fix this for now by restoring r13 of the interrupted
kernel context if MSR[EE]=0. Linux process context won't migrate
between CPUs if interrupts are disabled. Possibly we do something a
bit smarter in future like paca->in_opal, but this may be a minimal
fix.

Another issue I ran into when testing this is that the platform error
shutdown code we call from the machine check handler makes various
OPAL calls (write nvram, flush console, reboot). Upstream skiboot now
has some checks to detect re-entrant calls and reject them. This
causes a few issues (infinite loops due to OPAL_BUSY mainly, but we
could change that to a different error code).

So we can avoid the unnecessary calls if our regs came from "in_opal",
but we may still want to perform a few calls. OPAL_SIGNAL_SYSTEM_RESET,
OPAL_QUIESCE, and OPAL_CEC_REBOOT/REBOOT2/SHUTDOWN come to mind.

Once we make such a re-entrant call, the interrupted OPAL stack is
destroyed so we can never return. That's probably okay for the machine
check error shutdown path: better to attempt a reboot than leave the
machine hung.

It's not clear exactly what we want to allow and exclude though.
Should we try to write to the console? NVRAM? The more we do, the more
chance we have of getting stuck somewhere or corrupting things
further.
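
Going back to the r13 rule at the top of this mail, the proposed
condition can be modelled in a few lines of C. This is only a rough
userspace sketch of the decision, not the interrupt return code
itself: should_restore_r13() and r13_rule_selftest() are hypothetical
names, and the only things taken from the architecture are the MSR[EE]
and MSR[PR] bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR bit values (bit 15 and bit 14 counting from the LSB). */
#define MSR_EE	0x8000ULL	/* external interrupts enabled */
#define MSR_PR	0x4000ULL	/* problem state, i.e. userspace */

/*
 * Hypothetical helper: should the interrupt return path reload r13
 * from the saved register frame? saved_msr is the MSR of the
 * interrupted context.
 */
static bool should_restore_r13(uint64_t saved_msr)
{
	if (saved_msr & MSR_PR)
		return true;	/* returning to userspace: always restore */

	/* Kernel-mode return: restore only if interrupts were disabled. */
	return !(saved_msr & MSR_EE);
}

/* Usage example covering the three cases discussed above. */
void r13_rule_selftest(void)
{
	assert(should_restore_r13(MSR_PR | MSR_EE));	/* userspace */
	assert(!should_restore_r13(MSR_EE));	/* kernel, EE=1: may have migrated */
	assert(should_restore_r13(0));		/* kernel or OPAL, EE=0 */
}
```

The EE=0 case is the one that matters here: it assumes an interrupted
OPAL call shows up as a kernel-mode context with interrupts disabled.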
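
The "in_opal" test mentioned above can be as simple as a range check
on the interrupted NIP against the firmware image. A minimal
standalone sketch, assuming the image occupies one contiguous region:
struct opal_range, nip_in_opal() and the addresses used are made up
for illustration, only the half-open range comparison is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for where the firmware image is mapped. */
struct opal_range {
	uint64_t base;
	uint64_t size;
};

/* True if nip lies in the half-open range [base, base + size). */
static bool nip_in_opal(const struct opal_range *opal, uint64_t nip)
{
	return nip >= opal->base && nip < opal->base + opal->size;
}

/* Usage example with made-up addresses. */
void nip_in_opal_selftest(void)
{
	const struct opal_range opal = {
		.base = 0x30000000ULL,
		.size = 0x100000ULL,
	};

	assert(nip_in_opal(&opal, 0x30000000ULL));	/* first byte */
	assert(nip_in_opal(&opal, 0x300fffffULL));	/* last byte */
	assert(!nip_in_opal(&opal, 0x30100000ULL));	/* one past the end */
	assert(!nip_in_opal(&opal, 0x2fffffffULL));	/* just below base */
}
```

A check like this only says where the interrupt landed. It says
nothing about which OPAL call was in progress, which is part of why
something like paca->in_opal might be preferable later.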

What I've done is a quick proof of concept for Linux and skiboot which
gives some idea of what we can do. It works, but it feels a bit ad
hoc. Not sure. It would be good to have us decide how to deal with all
this.

There are actually several other questions that open up if we consider
that we may want to get debugging information out of OPAL if it's
interrupted by a system reset or machine check. We can't re-enter and
trash the stack, because then we'd be unable to get a backtrace. Maybe
a special crash call could tidy things up for us and print useful
information.

Anyway, I think we can get some minimal fixes in which mostly make
things work, but this needs more careful thinking in the longer term.

I haven't finished the patches completely yet, so I'll repost more
polished versions, hopefully after comments.

Thanks,
Nick

---
 arch/powerpc/kernel/entry_64.S        | 16 +++++++++++++---
 arch/powerpc/platforms/powernv/opal.c | 10 ++++++----
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2748584b767d..ca25d1056874 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -266,7 +266,7 @@ BEGIN_FTR_SECTION
 	HMT_MEDIUM_LOW
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
-	ld	r13,GPR13(r1)	/* only restore r13 if returning to usermode */
+	ld	r13,GPR13(r1)	/* restore r13 if returning to usermode */
 	ld	r2,GPR2(r1)
 	ld	r1,GPR1(r1)
 	mtlr	r4
@@ -277,7 +277,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	b	.	/* prevent speculative execution */
 
 	/* exit to kernel */
-1:	ld	r2,GPR2(r1)
+1:
+	andi.	r6,r8,MSR_EE
+	bne	2f
+	ld	r13,GPR13(r1)	/* also restore r13 if EE=0 */
+2:
+	ld	r2,GPR2(r1)
 	ld	r1,GPR1(r1)
 	mtlr	r4
 	mtcr	r5
@@ -908,7 +913,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	RFI_TO_USER
 	b	.	/* prevent speculative execution */
 
-1:	mtspr	SPRN_SRR1,r3
+1:
+	andi.	r0,r3,MSR_EE
+	bne+	2f
+	REST_GPR(13, r1)
+2:
+	mtspr	SPRN_SRR1,r3
 
 	ld	r2,_CCR(r1)
 	mtcrf	0xFF,r2
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 041ddbd1fc57..5b2997d6e894 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -475,10 +475,12 @@ void pnv_platform_error_reboot(struct pt_regs *regs, const char *msg)
 	show_regs(regs);
 	smp_send_stop();
 	printk_safe_flush_on_panic();
-	kmsg_dump(KMSG_DUMP_PANIC);
-	bust_spinlocks(0);
-	debug_locks_off();
-	console_flush_on_panic();
+	if (!(regs->nip >= opal.base && regs->nip < opal.base + opal.size)) {
+		kmsg_dump(KMSG_DUMP_PANIC);
+		bust_spinlocks(0);
+		debug_locks_off();
+		console_flush_on_panic();
+	}
 
 	/*
 	 * Don't bother to shut things down because this will