Message ID | 20100417172220.GB15037@nowhere |
---|---|
State | RFC |
Delegated to: | David Miller |
Headers | show |
From: Frederic Weisbecker <fweisbec@gmail.com> Date: Sat, 17 Apr 2010 19:22:21 +0200 > This looks like the true cause. > > I tested the following patch and it fixes the issue after several > manual loops of: > > echo function_graph > current_tracer > echo nop > current_tracer > > > Before it was triggering after two trials. Now it doesn't happen > after 10 iterations. > > I'm going to hunt the culprit that causes this, we certainly need > a new __notrace somewhere. Hmmm, interesting... and see my other email about how my per-cpu traces with the non-graph tracer look corrupted when that hrtimer timeout triggers.... Let me know where this __notrace marker is needed though, sounds interesting! -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Frederic Weisbecker <fweisbec@gmail.com> Date: Sat, 17 Apr 2010 19:22:21 +0200 > I tested the following patch and it fixes the issue after several > manual loops of: Wait, this is completely normal. NMIs can come in at any time and nmi_enter() emits a trace via the rcu_nmi_enter(). This happens on x86 too. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Apr 17, 2010 at 02:29:06PM -0700, David Miller wrote: > From: Frederic Weisbecker <fweisbec@gmail.com> > Date: Sat, 17 Apr 2010 19:22:21 +0200 > > > I tested the following patch and it fixes the issue after several > > manual loops of: > > Wait, this is completely normal. > > NMIs can come in at any time and nmi_enter() emits a trace via the > rcu_nmi_enter(). > > This happens on x86 too. Yeah but I've dumped the recursion cases I could find and there was no NMI path. I haven't started the watchdog nor perf, I guess NMI don't trigger in other cases, right? For now, the only reentrancy I could find was irqs that interrupt the tracing path. Which means no good clue there. That said I have only logged recursivity on trace entry path, not yet on return. I'm disabling the protections on entry, just to narrow down the recursivity place, in case it only happens on return. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Apr 17, 2010 at 02:29:06PM -0700, David Miller wrote: > From: Frederic Weisbecker <fweisbec@gmail.com> > Date: Sat, 17 Apr 2010 19:22:21 +0200 > > > I tested the following patch and it fixes the issue after several > > manual loops of: > > Wait, this is completely normal. > > NMIs can come in at any time and nmi_enter() emits a trace via the > rcu_nmi_enter(). > > This happens on x86 too. Ok, I've just dropped the protection from the entry path and it runs fine. The problem happens on function return path. I'm going to do some dumps there. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Frederic Weisbecker <fweisbec@gmail.com> Date: Sat, 17 Apr 2010 23:34:15 +0200 > I haven't started the watchdog nor perf, I guess NMI don't trigger > in other cases, right? They do, for the NMI watchdog, every few seconds. > For now, the only reentrancy I could find was irqs that interrupt > the tracing path. Which means no good clue there. That said I > have only logged recursivity on trace entry path, not yet > on return. > > I'm disabling the protections on entry, just to narrow down > the recursivity place, in case it only happens on return. No need to do so much work, when you hit this case simply disable tracing and dump_stack(). That way you'll see it clearly. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Apr 17, 2010 at 02:38:37PM -0700, David Miller wrote: > From: Frederic Weisbecker <fweisbec@gmail.com> > Date: Sat, 17 Apr 2010 23:34:15 +0200 > > > I haven't started the watchdog nor perf, I guess NMI don't trigger > > in other cases, right? > > They do, for the NMI watchdog, every few seconds. > > > For now, the only reentrancy I could find was irqs that interrupt > > the tracing path. Which means no good clue there. That said I > > have only logged recursivity on trace entry path, not yet > > on return. > > > > I'm disabling the protections on entry, just to narrow down > > the recursivity place, in case it only happens on return. > > No need to do so much work, when you hit this case simply > disable tracing and dump_stack(). That way you'll see it > clearly. This is what I'm doing. But I needed to know where the tracing recursion happens: on tracing entry or return. And obviously it looks like that only happens on return. And now I'm trying the dumps on return -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/sparc/kernel/ftrace.c b/arch/sparc/kernel/ftrace.c index 03ab022..3dc19b6 100644 --- a/arch/sparc/kernel/ftrace.c +++ b/arch/sparc/kernel/ftrace.c @@ -134,18 +134,24 @@ unsigned long prepare_ftrace_return(unsigned long parent, if (unlikely(atomic_read(¤t->tracing_graph_pause))) return parent + 8UL; + pause_graph_tracing(); + if (ftrace_push_return_trace(parent, self_addr, &trace.depth, - frame_pointer) == -EBUSY) + frame_pointer) == -EBUSY) { + unpause_graph_tracing(); return parent + 8UL; + } trace.func = self_addr; /* Only trace if the calling function expects to */ if (!ftrace_graph_entry(&trace)) { current->curr_ret_stack--; + unpause_graph_tracing(); return parent + 8UL; } + unpause_graph_tracing(); return return_hooker; } #endif /* CONFIG_FUNCTION_GRAPH_TRACER */ diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c index 9aed1a5..251a46a 100644 --- a/kernel/trace/trace_functions_graph.c +++ b/kernel/trace/trace_functions_graph.c @@ -163,6 +163,8 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer) struct ftrace_graph_ret trace; unsigned long ret; + pause_graph_tracing(); + ftrace_pop_return_trace(&trace, &ret, frame_pointer); trace.rettime = trace_clock_local(); ftrace_graph_return(&trace); @@ -176,6 +178,8 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer) ret = (unsigned long)panic; } + unpause_graph_tracing(); + return ret; }