Message ID | 20180405175631.31381-2-npiggin@gmail.com |
---|---|
State | Superseded |
Series | KVM powerpc tlbie scalability improvement |
On Fri, Apr 6, 2018 at 3:56 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> This crashes with a "Bad real address for load" attempting to load
> from the vmalloc region in realmode (faulting address is in DAR).
>
> Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted 4.16.0-01530-g43d1859f0994
> NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
> REGS: c000000fff76dd80 TRAP: 0200 Not tainted (4.16.0-01530-g43d1859f0994)
> MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000
> CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
> NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
> LR [c0000000000c2430] do_tlbies+0x230/0x2f0
>
> I suspect the reason is the per-cpu data is not in the linear chunk.
> This could be restored if that was able to be fixed, but for now,
> just remove the tracepoints.

Could you share the stack trace as well? I've not observed this in my
testing. Maybe I don't have as many CPUs. I presume you're talking about
the per-cpu data offsets for per-cpu trace data?

Balbir Singh.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 8 Apr 2018 20:17:47 +1000
Balbir Singh <bsingharora@gmail.com> wrote:

> On Fri, Apr 6, 2018 at 3:56 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> > This crashes with a "Bad real address for load" attempting to load
> > from the vmalloc region in realmode (faulting address is in DAR).
> > [...]
> > I suspect the reason is the per-cpu data is not in the linear chunk.
> > This could be restored if that was able to be fixed, but for now,
> > just remove the tracepoints.
>
> Could you share the stack trace as well? I've not observed this in my testing.

I can't seem to find it; I can try to reproduce it tomorrow. It was coming
from the h_remove hcall from the guest. The machine has 176 logical CPUs.

> Maybe I don't have as many CPUs. I presume you're talking about the per-cpu
> data offsets for per-cpu trace data?

It looked like it was dereferencing virtually mapped per-cpu data, yes.
Probably the perf_events deref.

Thanks,
Nick
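The mechanism Nick describes can be illustrated with a simplified userspace model; this is not kernel code. It assumes the 64-bit Book3S hash-MMU layout implied by the oops, where the top nibble of a kernel address selects the region (0xc for the linear map, 0xd for vmalloc space). `model_per_cpu_ptr()` and `safe_in_realmode()` are hypothetical helpers that only mimic the idea behind `per_cpu_ptr()`: a per-cpu pointer plus the target CPU's offset can resolve into the linear map (embedded first chunk) or into vmalloc space (a later, vmalloc-backed chunk).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: hash-MMU region IDs, taken from the top nibble of a
 * 64-bit kernel effective address. */
#define REGION_LINEAR  0xcULL
#define REGION_VMALLOC 0xdULL

static inline uint64_t region_of(uint64_t ea)
{
    return ea >> 60;
}

/* Hypothetical stand-in for per_cpu_ptr(): the per-cpu "pointer" plus the
 * offset of the target CPU's unit. Models the idea only. */
static inline uint64_t model_per_cpu_ptr(uint64_t pcpu_ptr,
                                         const uint64_t *cpu_offset, int cpu)
{
    return pcpu_ptr + cpu_offset[cpu];
}

/* With the MMU off, only linear-map addresses reach the intended memory,
 * so only those are treated as safe to dereference in this model. */
static inline bool safe_in_realmode(uint64_t ea)
{
    return region_of(ea) == REGION_LINEAR;
}
```

For example, the DAR from the oops, 0xd00017fffd941a28, falls in the 0xd region, so under these assumptions it is not something realmode code can safely dereference, whereas the REGS address c000000fff76dd80 is in the linear map.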
Nicholas Piggin <npiggin@gmail.com> writes:
> On Sun, 8 Apr 2018 20:17:47 +1000
> Balbir Singh <bsingharora@gmail.com> wrote:
> [...]
>> Maybe I don't have as many CPUs. I presume you're talking about the per-cpu
>> data offsets for per-cpu trace data?
>
> It looked like it was dereferencing virtually mapped per-cpu data, yes.
> Probably the perf_events deref.

Naveen has posted a series to (hopefully) fix this, which just missed
the merge window:

https://patchwork.ozlabs.org/patch/894757/

cheers
Michael Ellerman wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> [...]
>> It looked like it was dereferencing virtually mapped per-cpu data, yes.
>> Probably the perf_events deref.
>
> Naveen has posted a series to (hopefully) fix this, which just missed
> the merge window:
>
> https://patchwork.ozlabs.org/patch/894757/

I'm afraid that won't actually help here :(
That series is specific to the function tracer, while this is using
static tracepoints.
We could convert trace_tlbie() to a TRACE_EVENT_CONDITION() and guard it
with a check for paca->ftrace_enabled, but that would only be useful
if the do_tlbies() callsites can ever be hit outside of KVM guest mode.

- Naveen
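Naveen's suggestion can be sketched as a userspace model of the control flow that `TRACE_EVENT_CONDITION()` adds: the condition is tested before any probe runs, so when it is false the probe body (`perf_trace_tlbie()` in the oops) never touches per-cpu buffers. `struct model_paca`, `model_trace_tlbie()` and `model_probe_tlbie()` are hypothetical names; only the `ftrace_enabled` flag comes from the thread. It also shows his caveat: if every call happens with the flag cleared, nothing is ever recorded.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-CPU paca; only the
 * ftrace_enabled flag mentioned in the thread is modelled. */
struct model_paca {
    bool ftrace_enabled;
};

/* How many events reached the (modelled) trace buffer. */
static unsigned long events_recorded;

/* Stand-in for the probe body: in the kernel this is the part that
 * dereferences per-cpu trace data. */
static void model_probe_tlbie(unsigned long lpid, unsigned long local,
                              unsigned long rb)
{
    (void)lpid;
    (void)local;
    (void)rb;
    events_recorded++;
}

/* Model of a conditional tracepoint: enabled check first, then the
 * TP_CONDITION(), and only then the probe. */
static void model_trace_tlbie(const struct model_paca *paca, bool tracepoint_on,
                              unsigned long lpid, unsigned long local,
                              unsigned long rb)
{
    if (!tracepoint_on)          /* tracepoint not enabled at all */
        return;
    if (!paca->ftrace_enabled)   /* condition false, e.g. while in real mode */
        return;
    model_probe_tlbie(lpid, local, rb);
}
```

In this model a call made with `ftrace_enabled` cleared is silently dropped, which is why the guard only pays off if the same callsite is also reached with the flag set.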
On Tue, 10 Apr 2018 11:25:02 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> Michael Ellerman wrote:
> > Nicholas Piggin <npiggin@gmail.com> writes:
> > [...]
> >
> > Naveen has posted a series to (hopefully) fix this, which just missed
> > the merge window:
> >
> > https://patchwork.ozlabs.org/patch/894757/
>
> I'm afraid that won't actually help here :(
> That series is specific to the function tracer, while this is using
> static tracepoints.
>
> We could convert trace_tlbie() to a TRACE_EVENT_CONDITION() and guard it
> with a check for paca->ftrace_enabled, but that would only be useful
> if the do_tlbies() callsites can ever be hit outside of KVM guest mode.

Right, removing the tracepoints is the right thing to do here. Supporting
tracing in real mode would be a whole effort in itself, I'd expect. The
other option would be to disable realmode handling of HPT hcalls while
the tracepoints are active.

Thanks,
Nick
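Nick's second option, declining to handle HPT hcalls in real mode while tracing is on, could look roughly like the model below. KVM's real-mode handlers can return H_TOO_HARD to have an hcall retried in virtual mode; the enum, the handler names and the `tracing_active` flag here are hypothetical stand-ins (the flag plays the role of a check such as the tracepoint's generated `trace_tlbie_enabled()`), and the counters exist only to make the chosen path observable.

```c
#include <stdbool.h>

/* Hypothetical results of a real-mode hcall handler; the real code uses
 * H_TOO_HARD for "retry in virtual mode". */
enum model_rm_result {
    MODEL_RM_DONE,     /* fully handled with the MMU off */
    MODEL_RM_TOO_HARD, /* punt: redo the hcall in virtual mode */
};

/* Counters that record which path did the work. */
struct model_stats {
    unsigned long realmode_tlbies; /* issued in real mode, untraced */
    unsigned long virtmode_tlbies; /* issued in virtual mode, traceable */
};

/* Real-mode handler: refuse the work while the tracepoint is enabled,
 * because its probe may touch memory unreachable with the MMU off. */
static enum model_rm_result model_rm_h_remove(bool tracing_active,
                                              struct model_stats *st)
{
    if (tracing_active)
        return MODEL_RM_TOO_HARD;
    st->realmode_tlbies++;         /* ...remove the HPTE and tlbie here... */
    return MODEL_RM_DONE;
}

/* Virtual-mode handler: the MMU is on, so tracing would be safe here. */
static void model_vm_h_remove(struct model_stats *st)
{
    st->virtmode_tlbies++;         /* ...tlbie plus trace_tlbie() here... */
}

/* Entry point: try real mode first, fall back to virtual mode. */
static void model_h_remove(bool tracing_active, struct model_stats *st)
{
    if (model_rm_h_remove(tracing_active, st) == MODEL_RM_TOO_HARD)
        model_vm_h_remove(st);
}
```

The trade-off in this sketch is that every hcall pays the slower virtual-mode path while the tracepoint is on, which is presumably acceptable only because tracing is an opt-in debugging aid.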
On Thu, 2018-04-05 at 17:56:30 UTC, Nicholas Piggin wrote:
> This crashes with a "Bad real address for load" attempting to load
> from the vmalloc region in realmode (faulting address is in DAR).
> [...]
> Fixes: 0428491cba ("powerpc/mm: Trace tlbie(l) instructions")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/19ce7909ed11c49f7eddf59e7f49cd

cheers
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index e1c083fbe434..78e6a392330f 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -470,8 +470,6 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
 		for (i = 0; i < npages; ++i) {
 			asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
 				     "r" (rbvalues[i]), "r" (kvm->arch.lpid));
-			trace_tlbie(kvm->arch.lpid, 0, rbvalues[i],
-				kvm->arch.lpid, 0, 0, 0);
 		}
 
 		if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
@@ -492,8 +490,6 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
 		for (i = 0; i < npages; ++i) {
 			asm volatile(PPC_TLBIEL(%0,%1,0,0,0) : :
 				     "r" (rbvalues[i]), "r" (0));
-			trace_tlbie(kvm->arch.lpid, 1, rbvalues[i],
-				0, 0, 0, 0);
 		}
 		asm volatile("ptesync" : : : "memory");
 	}
This crashes with a "Bad real address for load" attempting to load
from the vmalloc region in realmode (faulting address is in DAR).

Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted 4.16.0-01530-g43d1859f0994
NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
REGS: c000000fff76dd80 TRAP: 0200 Not tainted (4.16.0-01530-g43d1859f0994)
MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000
CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
LR [c0000000000c2430] do_tlbies+0x230/0x2f0

I suspect the reason is the per-cpu data is not in the linear chunk.
This could be restored if that was able to be fixed, but for now,
just remove the tracepoints.

Fixes: 0428491cba ("powerpc/mm: Trace tlbie(l) instructions")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 4 ----
 1 file changed, 4 deletions(-)
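Why a vmalloc address ends up as a "Bad real address" can be shown with a small model. It assumes that with the MMU off the CPU ignores the top four bits of the effective address, which is what lets realmode code use linear-map (0xc...) pointers directly: those translate to their physical address, while a vmalloc (0xd...) address turns into an unrelated real address. `model_realmode_ra()` and `model_ra_is_backed()` are hypothetical helpers, and the memory size passed to the latter is a made-up parameter, since the report does not give the machine's RAM size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: in hypervisor real mode the top 4 bits of the effective
 * address are ignored. */
#define MODEL_REALMODE_EA_MASK 0x0fffffffffffffffULL

/* Real address the CPU would access for a given effective address. */
static inline uint64_t model_realmode_ra(uint64_t ea)
{
    return ea & MODEL_REALMODE_EA_MASK;
}

/* Hypothetical check: does the real address fall inside installed memory?
 * mem_size is a parameter because the actual RAM size is unknown. */
static inline bool model_ra_is_backed(uint64_t ra, uint64_t mem_size)
{
    return ra < mem_size;
}
```

With the DAR from the oops, 0xd00017fffd941a28, the model gives real address 0x17fffd941a28, just under 24 TiB, which is likely beyond installed memory and consistent with the machine check (TRAP: 0200) reported above. The REGS address c000000fff76dd80, by contrast, maps to 0xfff76dd80 as intended.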