Patchwork [BUG] percpu misaligned allocation

login
register
mail settings
Submitter Tejun Heo
Date March 18, 2010, 9:30 a.m.
Message ID <4BA1F2BA.30604@kernel.org>
Download mbox | patch
Permalink /patch/48013/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Tejun Heo - March 18, 2010, 9:30 a.m.
Hello,

On 03/18/2010 01:49 PM, Frederic Weisbecker wrote:
> Hi,
> 
> While using the lock events through perf in a sparc box, I can see
> the following message repeated many times:
> 
> 	Kernel unaligned access at TPC[49357c] perf_trace_lock_acquire+0xb4/0x180
> 
> It actually hangs the box as the messages are sent to a serial console.
> 
> When used with perf, the trace events use a per cpu buffer allocated
> in kernel/trace/trace_event_perf.c, and the allocation appears to return
> a misaligned percpu pointer. It is aligned to 4 while it seems it
> requires to be aligned to 8.

Does this fix the problem?
Frédéric Weisbecker - March 18, 2010, 7:50 p.m.
On Thu, Mar 18, 2010 at 06:30:34PM +0900, Tejun Heo wrote:
> Hello,
> 
> On 03/18/2010 01:49 PM, Frederic Weisbecker wrote:
> > Hi,
> > 
> > While using the lock events through perf in a sparc box, I can see
> > the following message repeated many times:
> > 
> > 	Kernel unaligned access at TPC[49357c] perf_trace_lock_acquire+0xb4/0x180
> > 
> > It actually hangs the box as the messages are sent to a serial console.
> > 
> > When used with perf, the trace events use a per cpu buffer allocated
> > in kernel/trace/trace_event_perf.c, and the allocation appears to return
> > a misaligned percpu pointer. It is aligned to 4 while it seems it
> > requires to be aligned to 8.
> 
> Does this fix the problem?
> 
> diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
> index c1cc3ab..d3f7d1b 100644
> --- a/kernel/trace/trace_event_profile.c
> +++ b/kernel/trace/trace_event_profile.c
> @@ -27,13 +27,15 @@ static int ftrace_profile_enable_event(struct ftrace_event_call *event)
>  		return 0;
>  
>  	if (!total_profile_count) {
> -		buf = (char *)alloc_percpu(perf_trace_t);
> +		buf = (char *)__alloc_percpu(sizeof(perf_trace_t),
> +					     __alignof__(unsigned long));
>  		if (!buf)
>  			goto fail_buf;
>  
>  		rcu_assign_pointer(perf_trace_buf, buf);
>  
> -		buf = (char *)alloc_percpu(perf_trace_t);
> +		buf = (char *)__alloc_percpu(sizeof(perf_trace_t),
> +					     __alignof__(unsigned long));
>  		if (!buf)
>  			goto fail_buf_nmi;


Yep, it does the trick.

In case you test, I have two other misalignments, one is in
perf_trace_buf_prepare but it is my bad and it is nothing
related to percpu. I'm going to fix it.
Another is in the ring buffer and Steve has a pending fix.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - March 19, 2010, 12:54 a.m.
From: Tejun Heo <tj@kernel.org>
Date: Thu, 18 Mar 2010 18:30:34 +0900

>  
>  	if (!total_profile_count) {
> -		buf = (char *)alloc_percpu(perf_trace_t);
> +		buf = (char *)__alloc_percpu(sizeof(perf_trace_t),
> +					     __alignof__(unsigned long));
>  		if (!buf)
>  			goto fail_buf;

Why not make perf_trace_t have the proper alignment?

That's better than patching around it like this.

Defining it as an array of char[]'s is just asking
for lots of trouble.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frédéric Weisbecker - March 19, 2010, 1:31 a.m.
On Thu, Mar 18, 2010 at 05:54:13PM -0700, David Miller wrote:
> From: Tejun Heo <tj@kernel.org>
> Date: Thu, 18 Mar 2010 18:30:34 +0900
> 
> >  
> >  	if (!total_profile_count) {
> > -		buf = (char *)alloc_percpu(perf_trace_t);
> > +		buf = (char *)__alloc_percpu(sizeof(perf_trace_t),
> > +					     __alignof__(unsigned long));
> >  		if (!buf)
> >  			goto fail_buf;
> 
> Why not make perf_trace_t have the proper alignment?


So, making perf_trace_t as align(8) would do the trick?
I lack the knowledge about alignment layout for archs that
need aligned accesses.
At a first glance, what I would except is that every buffer
has a base address aligned, no?


> 
> That's better than patching around it like this.
> 
> Defining it as an array of char[]'s is just asking
> for lots of trouble.


Yeah but we need a generic type. This is because
our buffer can be of any random type to match all
the trace event layouts we have, all of them being
generated by macros.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - March 19, 2010, 1:51 a.m.
Hello,

On 03/19/2010 10:31 AM, Frederic Weisbecker wrote:
> On Thu, Mar 18, 2010 at 05:54:13PM -0700, David Miller wrote:
>> From: Tejun Heo <tj@kernel.org>
>> Date: Thu, 18 Mar 2010 18:30:34 +0900
>>
>>>  
>>>  	if (!total_profile_count) {
>>> -		buf = (char *)alloc_percpu(perf_trace_t);
>>> +		buf = (char *)__alloc_percpu(sizeof(perf_trace_t),
>>> +					     __alignof__(unsigned long));
>>>  		if (!buf)
>>>  			goto fail_buf;
>>
>> Why not make perf_trace_t have the proper alignment?

Sure, I just wanted to verify the cause of the problem.

> So, making perf_trace_t as align(8) would do the trick?
> I lack the knowledge about alignment layout for archs that
> need aligned accesses.

If you can't make it a proper type, __alignof__(unsigned long long)
would be better.

> Yeah but we need a generic type. This is because
> our buffer can be of any random type to match all
> the trace event layouts we have, all of them being
> generated by macros.

I hope those macros align properly according to types.

Thanks.
David Miller - March 19, 2010, 1:57 a.m.
From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Fri, 19 Mar 2010 02:31:22 +0100

> On Thu, Mar 18, 2010 at 05:54:13PM -0700, David Miller wrote:
>> From: Tejun Heo <tj@kernel.org>
>> Date: Thu, 18 Mar 2010 18:30:34 +0900
>> 
>> >  
>> >  	if (!total_profile_count) {
>> > -		buf = (char *)alloc_percpu(perf_trace_t);
>> > +		buf = (char *)__alloc_percpu(sizeof(perf_trace_t),
>> > +					     __alignof__(unsigned long));
>> >  		if (!buf)
>> >  			goto fail_buf;
>> 
>> Why not make perf_trace_t have the proper alignment?
> 
> 
> So, making perf_trace_t as align(8) would do the trick?
> I lack the knowledge about alignment layout for archs that
> need aligned accesses.
> At a first glance, what I would except is that every buffer
> has a base address aligned, no?

Make it of the largest type that could appeat
in a trace entry.

I would use u64 so something like:

	u64 [FTRACE_MAX_PROFILE_SIZE / sizeof(u64)]
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - March 19, 2010, 2:18 a.m.
On 03/19/2010 10:57 AM, David Miller wrote:
> I would use u64 so something like:
> 
> 	u64 [FTRACE_MAX_PROFILE_SIZE / sizeof(u64)]

<paranoid>DIV_ROUND_UP() would be safer than division</paranoid>

Thanks.
Frédéric Weisbecker - March 19, 2010, 2:30 a.m.
On Fri, Mar 19, 2010 at 11:18:51AM +0900, Tejun Heo wrote:
> On 03/19/2010 10:57 AM, David Miller wrote:
> > I would use u64 so something like:
> > 
> > 	u64 [FTRACE_MAX_PROFILE_SIZE / sizeof(u64)]
> 
> <paranoid>DIV_ROUND_UP() would be safer than division</paranoid>
> 
> Thanks.


Ok, thanks guys, I'll try this out.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - March 19, 2010, 3:02 a.m.
From: Tejun Heo <tj@kernel.org>
Date: Fri, 19 Mar 2010 11:18:51 +0900

> On 03/19/2010 10:57 AM, David Miller wrote:
>> I would use u64 so something like:
>> 
>> 	u64 [FTRACE_MAX_PROFILE_SIZE / sizeof(u64)]
> 
> <paranoid>DIV_ROUND_UP() would be safer than division</paranoid>

There's potential real trouble if it isn't a multiple of sizeof(u64)
so better:

	BUILD_BUG_ON(FTRACE_MAX_PROFILE_SIZE % sizeof(u64));

:-)

What a mess, just because this thing can't be typed properly :-/
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Richard Kennedy - March 19, 2010, 3:48 p.m.
On 19/03/10 03:02, David Miller wrote:
> From: Tejun Heo <tj@kernel.org>
> Date: Fri, 19 Mar 2010 11:18:51 +0900
> 
>> On 03/19/2010 10:57 AM, David Miller wrote:
>>> I would use u64 so something like:
>>>
>>> 	u64 [FTRACE_MAX_PROFILE_SIZE / sizeof(u64)]
>>
>> <paranoid>DIV_ROUND_UP() would be safer than division</paranoid>
> 
> There's potential real trouble if it isn't a multiple of sizeof(u64)
> so better:
> 
> 	BUILD_BUG_ON(FTRACE_MAX_PROFILE_SIZE % sizeof(u64));
> 
> :-)
> 
> What a mess, just because this thing can't be typed properly :-/
> --
> To unsubscribe from this list: send the line "unsubscribe sparclinux" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Couldn't you use a union?

For example if you have
union test {
	long t;
	char buffer[50];
};
gcc will then do the right thing.

on x86_64 sizeof(union test) = 56
but on x86_32 it's only 52.

regards
Richard

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index c1cc3ab..d3f7d1b 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -27,13 +27,15 @@  static int ftrace_profile_enable_event(struct ftrace_event_call *event)
 		return 0;
 
 	if (!total_profile_count) {
-		buf = (char *)alloc_percpu(perf_trace_t);
+		buf = (char *)__alloc_percpu(sizeof(perf_trace_t),
+					     __alignof__(unsigned long));
 		if (!buf)
 			goto fail_buf;
 
 		rcu_assign_pointer(perf_trace_buf, buf);
 
-		buf = (char *)alloc_percpu(perf_trace_t);
+		buf = (char *)__alloc_percpu(sizeof(perf_trace_t),
+					     __alignof__(unsigned long));
 		if (!buf)
 			goto fail_buf_nmi;