
[v2] ring-buffer: Replace this_cpu_*() with __this_cpu_*()

Message ID 20150317104038.312e73d1@gandalf.local.home
State New

Commit Message

Steven Rostedt March 17, 2015, 2:40 p.m. UTC
It has come to my attention that this_cpu_read/write are horrible on
architectures other than x86. Worse yet, they actually disable
preemption or interrupts! This caused some unexpected tracing results
on ARM.

   101.356868: preempt_count_add <-ring_buffer_lock_reserve
   101.356870: preempt_count_sub <-ring_buffer_lock_reserve

The ring_buffer_lock_reserve() has recursion protection that requires
accessing a per-cpu variable. But since preempt_disable() is traced, it
too got traced while accessing the variable that is supposed to prevent
the recursion, producing the extra entries shown above.

The generic versions of this_cpu_read() and this_cpu_write() are:

#define _this_cpu_generic_read(pcp)					\
({	typeof(pcp) ret__;						\
	preempt_disable();						\
	ret__ = *this_cpu_ptr(&(pcp));					\
	preempt_enable();						\
	ret__;								\
})

#define _this_cpu_generic_to_op(pcp, val, op)				\
do {									\
	unsigned long flags;						\
	raw_local_irq_save(flags);					\
	*__this_cpu_ptr(&(pcp)) op val;					\
	raw_local_irq_restore(flags);					\
} while (0)


This is unacceptable for code paths that already know they are running
with preemption or interrupts disabled.
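
For contrast, the raw __this_cpu_*() accessors skip the preempt/irq
dance and reduce to a plain dereference of the per-cpu pointer (a
paraphrase of the generic implementation with debug checks omitted,
not verbatim kernel source):

#define __this_cpu_generic_read(pcp)				\
	(*__this_cpu_ptr(&(pcp)))

The trade-off is that the caller, not the accessor, must guarantee the
task cannot migrate to another CPU in the middle of the access.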

Paul McKenney stated that the __this_cpu_*() versions produce much better
code on other architectures than this_cpu_*() does, if we know that the
call is made in a preempt-disabled location.
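
A minimal sketch of that pattern (my_counter and bump_counter() are
hypothetical examples, not part of this patch; the caller is assumed to
have preemption disabled already):

static DEFINE_PER_CPU(unsigned int, my_counter);

static void bump_counter(void)
{
	/*
	 * Preemption is already off, so the task cannot migrate
	 * between the read and the write, and the cheap raw
	 * accessors are safe here.
	 */
	unsigned int val = __this_cpu_read(my_counter);

	__this_cpu_write(my_counter, val + 1);
}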

I also changed trace_recursive_unlock() to use two local variables instead
of accessing the per-cpu variable twice.

Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Cc: Christoph Lameter <cl@linux.com>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
Changes since v1:
  Use __this_cpu_*() instead of this_cpu_ptr()

Comments

Uwe Kleine-König March 17, 2015, 2:47 p.m. UTC | #1
Good morning Steven,

On Tue, Mar 17, 2015 at 10:40:38AM -0400, Steven Rostedt wrote:
>  static __always_inline void trace_recursive_unlock(void)
>  {
> -	unsigned int val = this_cpu_read(current_context);
> +	unsigned int val = __this_cpu_read(current_context);
> +	unsigned int val2;
>  
> -	val--;
> -	val &= this_cpu_read(current_context);
> -	this_cpu_write(current_context, val);
> +	val2 = val - 1;
> +	val &= val2;
> +	__this_cpu_write(current_context, val);
You could use:

	unsigned int val = __this_cpu_read(current_context);

	val = val & (val - 1);

	__this_cpu_write(current_context, val);

and save a few lines and still make it more readable (IMHO).
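
(As an aside, val & (val - 1) is the classic idiom for clearing the
lowest set bit, which here is the bit the matching
trace_recursive_lock() took for the current context:

	val             = 0b1100
	val - 1         = 0b1011
	val & (val - 1) = 0b1000
)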

BTW, this patch makes the additional lines in the trace disappear, so if
you think that makes a Tested-by applicable, feel free to add it.

Best regards
Uwe
Steven Rostedt March 17, 2015, 3:07 p.m. UTC | #2
On Tue, 17 Mar 2015 15:47:01 +0100
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> wrote:

> Good morning Steven,
> 
> On Tue, Mar 17, 2015 at 10:40:38AM -0400, Steven Rostedt wrote:
> >  static __always_inline void trace_recursive_unlock(void)
> >  {
> > -	unsigned int val = this_cpu_read(current_context);
> > +	unsigned int val = __this_cpu_read(current_context);
> > +	unsigned int val2;
> >  
> > -	val--;
> > -	val &= this_cpu_read(current_context);
> > -	this_cpu_write(current_context, val);
> > +	val2 = val - 1;
> > +	val &= val2;
> > +	__this_cpu_write(current_context, val);
> You could use:
> 
> 	unsigned int val = __this_cpu_read(current_context);
> 
> 	val = val & (val - 1);
> 
> 	__this_cpu_write(current_context, val);
> 
> and save a few lines and still make it more readable (IMHO).

I find it more readable too. My version came from looking at too much
assembly, and val2 just happened to be another register in my mind.

> 
> BTW, this patch makes the additional lines in the trace disappear, so if
> you think that makes a Tested-by applicable, feel free to add it.

OK, will do. Thanks.

Christoph, you happy with this version?

-- Steve
Christoph Lameter (Ampere) March 19, 2015, 4:20 p.m. UTC | #3
On Tue, 17 Mar 2015, Steven Rostedt wrote:

> Christoph, you happy with this version?

Yes looks good.

Acked-by: Christoph Lameter <cl@linux.com>
Steven Rostedt March 19, 2015, 4:31 p.m. UTC | #4
On Thu, 19 Mar 2015 11:20:53 -0500 (CDT)
Christoph Lameter <cl@linux.com> wrote:

> On Tue, 17 Mar 2015, Steven Rostedt wrote:
> 
> > Christoph, you happy with this version?
> 
> Yes looks good.
> 
> Acked-by: Christoph Lameter <cl@linux.com>

Thanks, will add your ack.

-- Steve
Christoph Lameter (Ampere) March 19, 2015, 4:33 p.m. UTC | #5
If you are redoing it then please get the comments a bit cleared up. The
heaviness of the fallback version of this_cpu_read/write can usually
easily be remedied by arch-specific definitions. The per-cpu
offset is somewhere in a register, and one needs to define a macro that
emits an instruction doing a fetch from that register plus the
variable's offset into the per-cpu area. This is similarly easy
for the write path. But then it's often easier to just use the __this_cpu
instructions, since preemption is often off in these code paths.

I have had code for IA64 in the past that does this.
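
A rough sketch of that idea (hypothetical names, not real kernel or
IA64 code; my_cpu_offset_reg() stands in for a single-instruction read
of the dedicated per-cpu base register):

#define arch_this_cpu_read(pcp)					\
({									\
	/* one register read, no preempt/irq toggling */	\
	unsigned long off__ = my_cpu_offset_reg();		\
	*(typeof(pcp) *)((unsigned long)&(pcp) + off__);	\
})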
Steven Rostedt March 19, 2015, 4:40 p.m. UTC | #6
On Thu, 19 Mar 2015 11:33:30 -0500 (CDT)
Christoph Lameter <cl@linux.com> wrote:

> If you are redoing it then please get the comments a bit cleared up. The

What comments should I clear up? This version did not have a comment.
It just switched this_cpu_* to __this_cpu_*, and also reworked how the
local variables are used.

-- Steve


> heaviness of the fallback version of this_cpu_read/write can usually
> easily be remedied by arch-specific definitions. The per-cpu
> offset is somewhere in a register, and one needs to define a macro that
> emits an instruction doing a fetch from that register plus the
> variable's offset into the per-cpu area. This is similarly easy
> for the write path. But then it's often easier to just use the __this_cpu
> instructions, since preemption is often off in these code paths.
> 
> I have had code for IA64 in the past that does this.
Christoph Lameter (Ampere) March 24, 2015, 6:49 p.m. UTC | #7
On Thu, 19 Mar 2015, Steven Rostedt wrote:

> On Thu, 19 Mar 2015 11:33:30 -0500 (CDT)
> Christoph Lameter <cl@linux.com> wrote:
>
> > If you are redoing it then please get the comments a bit cleared up. The
>
> What comments should I clear up? This version did not have a comment.
> It just switched this_cpu_* to __this_cpu_*, and also reworked how the
> local variables are used.

Ok fine if there is no comment.

Patch

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5040d44fe5a3..363b9ec58aae 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2679,7 +2679,7 @@  static DEFINE_PER_CPU(unsigned int, current_context);
 
 static __always_inline int trace_recursive_lock(void)
 {
-	unsigned int val = this_cpu_read(current_context);
+	unsigned int val = __this_cpu_read(current_context);
 	int bit;
 
 	if (in_interrupt()) {
@@ -2696,18 +2696,19 @@  static __always_inline int trace_recursive_lock(void)
 		return 1;
 
 	val |= (1 << bit);
-	this_cpu_write(current_context, val);
+	__this_cpu_write(current_context, val);
 
 	return 0;
 }
 
 static __always_inline void trace_recursive_unlock(void)
 {
-	unsigned int val = this_cpu_read(current_context);
+	unsigned int val = __this_cpu_read(current_context);
+	unsigned int val2;
 
-	val--;
-	val &= this_cpu_read(current_context);
-	this_cpu_write(current_context, val);
+	val2 = val - 1;
+	val &= val2;
+	__this_cpu_write(current_context, val);
 }
 
 #else