Message ID | 20200319232219.446480829@goodmis.org |
---|---|
Headers | show |
Series | ring-buffer/tracing: Remove disabling of ring buffer while reading trace file | expand |
From: Steven Rostedt > Sent: 19 March 2020 23:22 ... > > This patch series attempts to satisfy that request, by creating a > temporary buffer in each of the per cpu iterators to place the > read event into, such that it can be passed to users without worrying > about a writer to corrupt the event while it was being written out. > It also uses the fact that the ring buffer is broken up into pages, > where each page has its own timestamp that gets updated when a > writer crosses over to it. By copying it to the temp buffer, and > doing a "before and after" test of the time stamp with memory barriers, > can allow the events to be saved. Does this mean the you will no longer be able to look at a snapshot of the trace by running 'less trace' (and typically going to the end to get info for all cpus). A lot of the time trace is being written far too fast for it to make any sense to try to read it continuously. Also, if BPF start using ftrace, no one will be able to use it for 'normal debugging' on such systems. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Sat, 21 Mar 2020 19:13:51 +0000 David Laight <David.Laight@ACULAB.COM> wrote: > From: Steven Rostedt > > Sent: 19 March 2020 23:22 > ... > > > > This patch series attempts to satisfy that request, by creating a > > temporary buffer in each of the per cpu iterators to place the > > read event into, such that it can be passed to users without worrying > > about a writer to corrupt the event while it was being written out. > > It also uses the fact that the ring buffer is broken up into pages, > > where each page has its own timestamp that gets updated when a > > writer crosses over to it. By copying it to the temp buffer, and > > doing a "before and after" test of the time stamp with memory barriers, > > can allow the events to be saved. > > Does this mean the you will no longer be able to look at a snapshot > of the trace by running 'less trace' (and typically going to the end > to get info for all cpus). If there's a use case for this, it will be trivial to add an option to bring back the old behavior. If you want that, I can do that, and even add a config that makes it the default. > > A lot of the time trace is being written far too fast for it to make > any sense to try to read it continuously. > > Also, if BPF start using ftrace, no one will be able to use it for > 'normal debugging' on such systems. I believe its used for debugging bpf, not for normal tracing. BPF only uses this when it has their trace_printk() using it. Which gives that nasty "THIS IS A DEBUG KERNEL" message ;-) Thus, I don't think you need to worry about bpf having this in production. -- Steve
On Sat, 21 Mar 2020 19:13:51 +0000 David Laight <David.Laight@ACULAB.COM> wrote: > From: Steven Rostedt > > Sent: 19 March 2020 23:22 > ... > > > > This patch series attempts to satisfy that request, by creating a > > temporary buffer in each of the per cpu iterators to place the > > read event into, such that it can be passed to users without worrying > > about a writer to corrupt the event while it was being written out. > > It also uses the fact that the ring buffer is broken up into pages, > > where each page has its own timestamp that gets updated when a > > writer crosses over to it. By copying it to the temp buffer, and > > doing a "before and after" test of the time stamp with memory barriers, > > can allow the events to be saved. > > Does this mean the you will no longer be able to look at a snapshot > of the trace by running 'less trace' (and typically going to the end > to get info for all cpus). I changed patch 9 to be this: It adds an option "pause-on-trace" that when set, will bring back the old behavior of pausing recording to the ring buffer when the trace file is open. If needed, I can add a kernel command line option and a Kconfig that makes this set to true by default. -- Steve From 71f44d604e5b16cc239d6276b447a515448f582f Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Date: Tue, 17 Mar 2020 17:32:31 -0400 Subject: [PATCH] tracing: Do not disable tracing when reading the trace file When opening the "trace" file, it is no longer necessary to disable tracing. Note, a new option is created called "pause-on-trace", when set, will cause the trace file to emulate its original behavior. Link: http://lkml.kernel.org/r/20200317213416.903351225@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> --- Documentation/trace/ftrace.rst | 6 ++++++ kernel/trace/trace.c | 9 ++++++--- kernel/trace/trace.h | 1 + 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index 99a0890e20ec..c33950a35d65 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -1125,6 +1125,12 @@ Here are the available options: the trace displays additional information about the latency, as described in "Latency trace format". + pause-on-trace + When set, opening the trace file for read, will pause + writing to the ring buffer (as if tracing_on was set to zero). + This simulates the original behavior of the trace file. + When the file is closed, tracing will be enabled again. + record-cmd When any event or tracer is enabled, a hook is enabled in the sched_switch trace point to fill comm cache diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 47889123be7f..650fa81fffe8 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4273,8 +4273,11 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot) if (trace_clocks[tr->clock_id].in_ns) iter->iter_flags |= TRACE_FILE_TIME_IN_NS; - /* stop the trace while dumping if we are not opening "snapshot" */ - if (!iter->snapshot) + /* + * If pause-on-trace is enabled, then stop the trace while + * dumping, unless this is the "snapshot" file + */ + if (!iter->snapshot && (tr->trace_flags & TRACE_ITER_PAUSE_ON_TRACE)) tracing_stop_tr(tr); if (iter->cpu_file == RING_BUFFER_ALL_CPUS) { @@ -4371,7 +4374,7 @@ static int tracing_release(struct inode *inode, struct file *file) if (iter->trace && iter->trace->close) iter->trace->close(iter); - if (!iter->snapshot) + if (!iter->snapshot && tr->stop_count) /* reenable tracing if it was previously enabled */ tracing_start_tr(tr); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index c61e1b1c85a6..f37e05135986 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1302,6 +1302,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf, C(IRQ_INFO, "irq-info"), \ C(MARKERS, "markers"), \ C(EVENT_FORK, "event-fork"), \ + C(PAUSE_ON_TRACE, "pause-on-trace"), \ FUNCTION_FLAGS \ FGRAPH_FLAGS \ STACK_FLAGS \
From: Steven Rostedt > Sent: 27 March 2020 01:46 > On Sat, 21 Mar 2020 19:13:51 +0000 > David Laight <David.Laight@ACULAB.COM> wrote: > > > From: Steven Rostedt > > > Sent: 19 March 2020 23:22 > > ... > > > > > > This patch series attempts to satisfy that request, by creating a > > > temporary buffer in each of the per cpu iterators to place the > > > read event into, such that it can be passed to users without worrying > > > about a writer to corrupt the event while it was being written out. > > > It also uses the fact that the ring buffer is broken up into pages, > > > where each page has its own timestamp that gets updated when a > > > writer crosses over to it. By copying it to the temp buffer, and > > > doing a "before and after" test of the time stamp with memory barriers, > > > can allow the events to be saved. > > > > Does this mean the you will no longer be able to look at a snapshot > > of the trace by running 'less trace' (and typically going to the end > > to get info for all cpus). > > I changed patch 9 to be this: > > It adds an option "pause-on-trace" that when set, will bring back the > old behavior of pausing recording to the ring buffer when the trace > file is open. > > If needed, I can add a kernel command line option and a Kconfig that > makes this set to true by default. Maybe a different file 'trace_no_pause' ? Along with the one that lets you read the raw trace and get EOF. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, 27 Mar 2020 10:07:00 +0000 David Laight <David.Laight@ACULAB.COM> wrote: > > If needed, I can add a kernel command line option and a Kconfig that > > makes this set to true by default. > > Maybe a different file 'trace_no_pause' ? I rather not add another file, it adds more complexity, and confuses the interface even more. I'll leave this as is. > Along with the one that lets you read the raw trace and get EOF. Can you explain this more? I think we talked about this before, but I don't remember the details. -- Steve
From: Steven Rostedt > Sent: 27 March 2020 14:31 .. > > Along with the one that lets you read the raw trace and get EOF. > > Can you explain this more? I think we talked about this before, but I don't > remember the details. I was trying to use schedviz (on github from google). It reads out the raw trace (for some scheduler events) and then postprocesses it to generate to nice pictures. However the shell script it uses for the captures has to run the copies in the background, sleep a random time, and then kill all the copies having hoped it has waited long enough - truly horrid. There is also some confusion (IIRC between a header and your library code) about the 'time-delta' on 'pad' entries. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)