Message ID | 20180406041409.8353-1-npiggin@gmail.com |
---|---|
State | Superseded |
Series | [RFC] KVM: PPC: Book3S HV: send kvmppc_bad_interrupt NMIs to Linux handlers |
On Fri, Apr 06, 2018 at 02:14:09PM +1000, Nicholas Piggin wrote:
> It's possible to take a SRESET or MCE in these paths due to a bug
> in the host code or a NMI IPI etc. A recent bug attempting to load
> a virtual address gave a complete but cryptic error, abridged:
>
>   Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
>   LE SMP NR_CPUS=2048 NUMA PowerNV
>   CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted
>   NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
>   REGS: c000000fff76dd80 TRAP: 0200 Not tainted
>   MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000
>   CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
>   NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
>   LR [c0000000000c2430] do_tlbies+0x230/0x2f0
>
> Sending NMIs through the Linux handlers gives a nicer output:
>
>   Severe Machine check interrupt [Not recovered]
>   NIP [c0000000000155ac]: perf_trace_tlbie+0x2c/0x1a0
>   Initiator: CPU
>   Error type: Real address [Load (bad)]
>   Effective address: d00017fffcc01a28
>   opal: Machine check interrupt unrecoverable: MSR(RI=0)
>   opal: Hardware platform error: Unrecoverable Machine Check exception
>   CPU: 0 PID: 6700 Comm: qemu-system-ppc Tainted: G M
>   NIP: c0000000000155ac LR: c0000000000c23c0 CTR: c000000000015580
>   REGS: c000000fff9e9d80 TRAP: 0200 Tainted: G M
>   MSR: 9000000000201001 <SF,HV,ME,LE> CR: 48082222 XER: 00000000
>   CFAR: 000000010cbc1a30 DAR: d00017fffcc01a28 DSISR: 00000040 SOFTE: 3
>   NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
>   LR [c0000000000c23c0] do_tlbies+0x1c0/0x280
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> This is lightly tested only, not sure if there would be a better way
> to do it?

The patch seems reasonable. I have just one comment below. Do you
want to re-send it without the RFC tag, or should I just take it?

>  arch/powerpc/kvm/book3s_hv_builtin.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index de18299f92b7..0b9b8e188bfa 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -18,6 +18,7 @@
>  #include <linux/cma.h>
>  #include <linux/bitops.h>
>
> +#include <asm/asm-prototypes.h>
>  #include <asm/cputable.h>
>  #include <asm/kvm_ppc.h>
>  #include <asm/kvm_book3s.h>
> @@ -633,6 +634,17 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
>
>  void kvmppc_bad_interrupt(struct pt_regs *regs)
>  {
> +	regs->msr &= ~MSR_RI;

This is worth mentioning in the commit message. I assume you're doing
it to make sure we won't try to continue execution under any
circumstances?

> +	/*
> +	 * 100 could happen at any time, 200 can happen due to
> +	 * invalid real address access for example.
> +	 */
> +	if (TRAP(regs) == 0x100) {
> +		get_paca()->in_nmi++;
> +		system_reset_exception(regs);
> +	}
> +	if (TRAP(regs) == 0x200)
> +		machine_check_exception(regs);
>  	die("Bad interrupt in KVM entry/exit code", regs, SIGABRT);
>  	panic("Bad KVM trap");
>  }
> --
> 2.16.3

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 17 May 2018 16:33:13 +1000 Paul Mackerras <paulus@ozlabs.org> wrote:
> On Fri, Apr 06, 2018 at 02:14:09PM +1000, Nicholas Piggin wrote:
> > It's possible to take a SRESET or MCE in these paths due to a bug
> > in the host code or a NMI IPI etc. A recent bug attempting to load
> > a virtual address gave a complete but cryptic error, abridged:
> >
> >   Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
> >   LE SMP NR_CPUS=2048 NUMA PowerNV
> >   CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted
> >   NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
> >   REGS: c000000fff76dd80 TRAP: 0200 Not tainted
> >   MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000
> >   CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
> >   NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
> >   LR [c0000000000c2430] do_tlbies+0x230/0x2f0
> >
> > Sending NMIs through the Linux handlers gives a nicer output:
> >
> >   Severe Machine check interrupt [Not recovered]
> >   NIP [c0000000000155ac]: perf_trace_tlbie+0x2c/0x1a0
> >   Initiator: CPU
> >   Error type: Real address [Load (bad)]
> >   Effective address: d00017fffcc01a28
> >   opal: Machine check interrupt unrecoverable: MSR(RI=0)
> >   opal: Hardware platform error: Unrecoverable Machine Check exception
> >   CPU: 0 PID: 6700 Comm: qemu-system-ppc Tainted: G M
> >   NIP: c0000000000155ac LR: c0000000000c23c0 CTR: c000000000015580
> >   REGS: c000000fff9e9d80 TRAP: 0200 Tainted: G M
> >   MSR: 9000000000201001 <SF,HV,ME,LE> CR: 48082222 XER: 00000000
> >   CFAR: 000000010cbc1a30 DAR: d00017fffcc01a28 DSISR: 00000040 SOFTE: 3
> >   NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
> >   LR [c0000000000c23c0] do_tlbies+0x1c0/0x280
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> > This is lightly tested only, not sure if there would be a better way
> > to do it?
>
> The patch seems reasonable. I have just one comment below. Do you
> want to re-send it without the RFC tag, or should I just take it?

I can re-send if you'd like an updated changelog and comment for that
MSR_RI bit.

> >  arch/powerpc/kvm/book3s_hv_builtin.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> > index de18299f92b7..0b9b8e188bfa 100644
> > --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> > +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> > @@ -18,6 +18,7 @@
> >  #include <linux/cma.h>
> >  #include <linux/bitops.h>
> >
> > +#include <asm/asm-prototypes.h>
> >  #include <asm/cputable.h>
> >  #include <asm/kvm_ppc.h>
> >  #include <asm/kvm_book3s.h>
> > @@ -633,6 +634,17 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
> >
> >  void kvmppc_bad_interrupt(struct pt_regs *regs)
> >  {
> > +	regs->msr &= ~MSR_RI;
>
> This is worth mentioning in the commit message. I assume you're doing
> it to make sure we won't try to continue execution under any
> circumstances?

That's right, sure I'll mention it. Probably a comment here wouldn't
hurt either.

Thanks,
Nick
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index de18299f92b7..0b9b8e188bfa 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -18,6 +18,7 @@
 #include <linux/cma.h>
 #include <linux/bitops.h>
 
+#include <asm/asm-prototypes.h>
 #include <asm/cputable.h>
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
@@ -633,6 +634,17 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 
 void kvmppc_bad_interrupt(struct pt_regs *regs)
 {
+	regs->msr &= ~MSR_RI;
+	/*
+	 * 100 could happen at any time, 200 can happen due to
+	 * invalid real address access for example.
+	 */
+	if (TRAP(regs) == 0x100) {
+		get_paca()->in_nmi++;
+		system_reset_exception(regs);
+	}
+	if (TRAP(regs) == 0x200)
+		machine_check_exception(regs);
 	die("Bad interrupt in KVM entry/exit code", regs, SIGABRT);
 	panic("Bad KVM trap");
 }
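
For anyone who wants to poke at the control flow outside the kernel, the following is a minimal user-space sketch of the dispatch in the hunk above, not the kernel code itself. Every name prefixed `sketch_` is hypothetical. The RI bit position (bit 1) is inferred from the two MSR values in the oops output (`...1003` with RI set, `...1001` without), and masking the low four bits of the trap value is an assumption about what `TRAP()` does. The real function calls the Linux handler and then still falls through to `die()` and `panic()`; this model only reports which reporting path would run first.

```c
#include <stdint.h>

/* Hypothetical stand-ins for kernel definitions; not the real headers. */
#define SKETCH_MSR_RI      (1ULL << 1)  /* Recoverable Interrupt bit (inferred) */
#define SKETCH_TRAP_SRESET 0x100UL      /* system reset vector */
#define SKETCH_TRAP_MCE    0x200UL      /* machine check vector */

struct sketch_regs {
	uint64_t msr;
	unsigned long trap;
};

enum sketch_path {
	SKETCH_PATH_SRESET,  /* would go to system_reset_exception() */
	SKETCH_PATH_MCE,     /* would go to machine_check_exception() */
	SKETCH_PATH_DIE,     /* any other trap: straight to die()/panic() */
};

/*
 * Model of the patched kvmppc_bad_interrupt(): clear RI first so that no
 * handler treats the interrupt as recoverable, then pick the reporting path.
 */
static enum sketch_path sketch_bad_interrupt(struct sketch_regs *regs)
{
	unsigned long vec;

	regs->msr &= ~SKETCH_MSR_RI;

	/* Assumption: TRAP() ignores the low four bits of regs->trap. */
	vec = regs->trap & ~0xFUL;

	if (vec == SKETCH_TRAP_SRESET)
		return SKETCH_PATH_SRESET;
	if (vec == SKETCH_TRAP_MCE)
		return SKETCH_PATH_MCE;
	return SKETCH_PATH_DIE;
}
```

Feeding it the register state from the first oops (MSR 9000000000201003, TRAP 0200) selects the machine check path and leaves the MSR at 9000000000201001, which is the value shown in the second oops.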
It's possible to take a SRESET or MCE in these paths due to a bug
in the host code or a NMI IPI etc. A recent bug attempting to load
a virtual address gave a complete but cryptic error, abridged:

  Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
  LE SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted
  NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
  REGS: c000000fff76dd80 TRAP: 0200 Not tainted
  MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000
  CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
  NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
  LR [c0000000000c2430] do_tlbies+0x230/0x2f0

Sending NMIs through the Linux handlers gives a nicer output:

  Severe Machine check interrupt [Not recovered]
  NIP [c0000000000155ac]: perf_trace_tlbie+0x2c/0x1a0
  Initiator: CPU
  Error type: Real address [Load (bad)]
  Effective address: d00017fffcc01a28
  opal: Machine check interrupt unrecoverable: MSR(RI=0)
  opal: Hardware platform error: Unrecoverable Machine Check exception
  CPU: 0 PID: 6700 Comm: qemu-system-ppc Tainted: G M
  NIP: c0000000000155ac LR: c0000000000c23c0 CTR: c000000000015580
  REGS: c000000fff9e9d80 TRAP: 0200 Tainted: G M
  MSR: 9000000000201001 <SF,HV,ME,LE> CR: 48082222 XER: 00000000
  CFAR: 000000010cbc1a30 DAR: d00017fffcc01a28 DSISR: 00000040 SOFTE: 3
  NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
  LR [c0000000000c23c0] do_tlbies+0x1c0/0x280

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
This is lightly tested only, not sure if there would be a better way
to do it?

 arch/powerpc/kvm/book3s_hv_builtin.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
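
The only difference between the two MSR lines in the changelog is the RI flag. As a worked illustration, the sketch below decodes an MSR value into the flag list printed in the oops. It is a hypothetical helper, not the kernel's own register printer: it knows only the five flags that appear in the output above, with bit positions assumed to be SF=63, HV=60, ME=12, RI=1 and LE=0, and it silently ignores any other bit that happens to be set.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical table: only the flags that appear in the oops output. */
struct sketch_msr_bit {
	unsigned int shift;
	const char *name;
};

static const struct sketch_msr_bit sketch_msr_bits[] = {
	{ 63, "SF" }, { 60, "HV" }, { 12, "ME" }, { 1, "RI" }, { 0, "LE" },
};

/*
 * Write a comma-separated list of the known flags set in msr into buf.
 * Returns buf, which is always NUL-terminated when len > 0.
 */
static char *sketch_decode_msr(uint64_t msr, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(sketch_msr_bits) / sizeof(sketch_msr_bits[0]); i++) {
		int n;

		if (!(msr & (1ULL << sketch_msr_bits[i].shift)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", sketch_msr_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;  /* out of space: stop with what fits */
		used += (size_t)n;
	}
	return buf;
}
```

With a 32-byte buffer, 0x9000000000201003 decodes to SF,HV,ME,RI,LE and 0x9000000000201001 to SF,HV,ME,LE, matching the two MSR lines quoted in the changelog.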