diff mbox

Bug#609371: linux-image-2.6.37-trunk-sparc64: module scsi_mod: Unknown relocation: 36

Message ID 20110119050844.GA8776@Krystal
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Mathieu Desnoyers Jan. 19, 2011, 5:08 a.m. UTC
* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Tue, 2011-01-18 at 15:13 -0500, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@goodmis.org) wrote:
> > > On Tue, 2011-01-18 at 13:16 -0500, Steven Rostedt wrote:
> > > > On Tue, 2011-01-18 at 12:33 -0500, Steven Rostedt wrote:
> > > > > On Tue, 2011-01-18 at 11:46 -0500, Mathieu Desnoyers wrote:
> > > > 
> > > > > > Also align TRACE_PRINTKS on 8 bytes to make sure the beginning of the section is
> > > > > > aligned on pointer size.
> > > > > 
> > > > > If I can make it crash without the alignments and this fixes the issue,
> > > > > I'll apply both patches.
> > > > 
> > > > After applying David's patch, I do indeed get a crash. I'll now apply
> > > > yours and see if it fixes the issue.
> > > 
> > > Your patch doesn't seem to fix it either. I'll investigate this further.
> > 
> > I think David's patch missed kernel/trace/trace_export.c
> > 
> > struct ftrace_event_call __used                                         \
> > __attribute__((__aligned__(4)))                                         \
> > __attribute__((section("_ftrace_events"))) event_##call = {             \
> > 
> > and kernel/trace/trace.h:
> > 
> > #define FTRACE_ENTRY(call, struct_name, id, tstruct, print)             \
> >         extern struct ftrace_event_call                                 \
> >         __attribute__((__aligned__(4))) event_##call;
> > 
> > does it help if you remove these ?
> 
> I doubt it, but I'll try it anyway.

The following works fine for me now. Comments are welcome.


tracing: use __aligned__() variable and type attributes to work around gcc bug

I just played with the possible combinations, and here are my current results,
with the appropriately aligned linker script sections (with my patch applied):

- No aligned() type attribute nor variable attribute. I get a crash on x86_64
  (NULL pointer exception when executing __trace_add_event_call, the 5th call).
  __alignof__(struct ftrace_event_call) is worth 8.

- adding type attribute aligned(32) on the struct ftrace_event_call and
  struct syscall_metadata declarations seems to work fine, but consumes space
  needlessly (struct ftrace_event_call grows from 136 to 160 bytes). Note that
  tracepoints use the type attribute aligned(32), which falls in this category
  (waste of space). We might want to do better.

- adding type attribute aligned(8) crashes. Note that the gcc documentation for
  type attributes explains that this specifies the minimum alignment for the
  type, so the compiler is free to choose a larger one.

- adding type attribute (packed, aligned(8)) should force the compiler to
  consider a 8-byte alignment for the type (as shown by __alignof__), but it
  doesn't work (crash).

I really don't like specifying alignment as variable attributes (rather than
type attributes), because the extern array that iterates on the attributes has
no clue of the alignment used (same for pointer arithmetic). Unfortunately, it
seems to be the only way to really make sure gcc does not use an alignment
larger than prescribed.

So far, my somewhat premature conclusion is that gcc honours the aligned() type
attribute for the array iteration (thus expecting a 136 bytes array aligned on 8
bytes boundaries), but does not apply the aligned() type attribute to the
structure definition when used in combination with the "section" variable
attribute, choosing to align the structure on 32 bytes instead. My hunch is thatthis is a gcc bug that appeared a few months before Fri Nov 14 17:47:47 2008
-0500, which led me to change the tracepoint structure alignment from 8 to 32
in commit 7e066fb870fcd1025ec3ba7bbde5d541094f4ce1 because of a very similar
problem.

One way to work around this could be to specify the aligned(8) type attribute to
the structure declarations *and* the aligned(8) variable attribute to the
structure definitions.

On 32-bit architectures, we really want a aligned(4), and on 64-bit
architectures, aligned(8). Represent this by creating:

#define __long_aligned __attribute__((__aligned__(__alignof__(long))))

The following patch passes the initial smoke test (it boots fine, without any
NULL pointer exception on x86_64).
(this patch is based on a 2.6.37 kernel)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/compiler.h     |    8 +++++---
 include/linux/ftrace_event.h |    2 +-
 include/linux/syscalls.h     |   20 ++++++++++++--------
 include/trace/ftrace.h       |    8 ++++----
 include/trace/syscall.h      |    2 +-
 kernel/trace/trace.h         |    2 +-
 kernel/trace/trace_export.c  |    2 +-
 7 files changed, 25 insertions(+), 19 deletions(-)

Comments

David Miller Jan. 19, 2011, 5:16 a.m. UTC | #1
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Wed, 19 Jan 2011 00:08:45 -0500

> The following works fine for me now. Comments are welcome.

Thanks for doing this work Mathieu.

> - No aligned() type attribute nor variable attribute. I get a crash on x86_64
>   (NULL pointer exception when executing __trace_add_event_call, the 5th call).
>   __alignof__(struct ftrace_event_call) is worth 8.

This is really bizarre.  Does it only happen on x86_64?

I'm wondering if GCC does something bizarre like work with different
default alignments based upon the section or something like that.

If so, maybe adding the section attribute to the array definition will
"fix" things?

> On 32-bit architectures, we really want a aligned(4), and on 64-bit
> architectures, aligned(8). Represent this by creating:
> 
> #define __long_aligned __attribute__((__aligned__(__alignof__(long))))

Do any of these datastructures have, or will have, "u64" or "long
long" types in them?  If so, then we will need to use "8"
unconditionally or "__alignof__(long long)".

I'll see if I can work out why using no align directive explodes
on x86-64.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 19, 2011, 6:32 a.m. UTC | #2
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Wed, 19 Jan 2011 00:08:45 -0500

> - No aligned() type attribute nor variable attribute. I get a crash on x86_64
>   (NULL pointer exception when executing __trace_add_event_call, the 5th call).
>   __alignof__(struct ftrace_event_call) is worth 8.

I think I figured it out.

It's the static vs. non-static thing, or some other crazyness wrt.
how x86-64 implements it's alignment rules.

GCC on x86-64 uses a completely different policy for aligning local
(ie. "static") data objects vs. globally visible ones, for one thing.
It also has different alignment policies for objects that are part
of an array vs. those which are not.

On both counts, we're lying to the compiler, so maybe it's sort of our
fault.

As far as GCC can see, the object is static and also not part of an
array or any other C construct for which things like this could matter
as long as the alignment it chooses meets the minimum alignment
requirements of the ABI.

So it doesn't let us do this trick where we put the individual event
markers into a special section, yet mark it __used and static, then
later access them as if they were part of a globally visible array.

If you look these static objects, they are emitted with a leading
".align 32" directive.  This is what screws everything up.

When the linker sees this, it aligns the start of every individual
"_ftrace_events" section, and that's where the "gaps" come from and
the crashes.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 19, 2011, 7:20 a.m. UTC | #3
From: David Miller <davem@davemloft.net>
Date: Tue, 18 Jan 2011 22:32:47 -0800 (PST)

> As far as GCC can see, the object is static and also not part of an
> array or any other C construct for which things like this could matter
> as long as the alignment it chooses meets the minimum alignment
> requirements of the ABI.
> 
> So it doesn't let us do this trick where we put the individual event
> markers into a special section, yet mark it __used and static, then
> later access them as if they were part of a globally visible array.
> 
> If you look these static objects, they are emitted with a leading
> ".align 32" directive.  This is what screws everything up.
> 
> When the linker sees this, it aligns the start of every individual
> "_ftrace_events" section, and that's where the "gaps" come from and
> the crashes.

I've come up with one possible way to deal with this.

Put pointers to the trace objects into the special section, and
interate over those instead.

I was wondering why this x86-64 weird alignment behavior doesn't bite
us for our init funcs.  And the reason is that all of these weird
alignment cases only trigger for aggregates (ie. structs and unions).

So we could do:

	static struct ftace_event_call foo_event = { ... };
	static struct ftrace_event_call * __used
		__attribute__((section("_ftrace_event_ptrs")))
		foo_event_ptr = &foo_event;

and

	extern struct ftrace_event_call *__start_ftrace_event_ptrs[];
	extern struct ftrace_event_call *__end_ftrace_event_ptrs[];

	struct ftrace_event_call **p = __start_ftrace_event_ptrs;
	while (p < &__end_ftrace_event_ptrs[0]) {
		struct ftrace_event_call *event = *p++;

		__trace_add_event_call(event, ...);
	}

you get the idea.

And we could mark this entire point section as "initdata" and thus
free'able after bootup and post module load.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Jan. 19, 2011, 3:10 p.m. UTC | #4
* David Miller (davem@davemloft.net) wrote:
> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Date: Wed, 19 Jan 2011 00:08:45 -0500
> 
> > The following works fine for me now. Comments are welcome.
> 
> Thanks for doing this work Mathieu.
> 
> > - No aligned() type attribute nor variable attribute. I get a crash on x86_64
> >   (NULL pointer exception when executing __trace_add_event_call, the 5th call).
> >   __alignof__(struct ftrace_event_call) is worth 8.
> 
> This is really bizarre.  Does it only happen on x86_64?

Sadly, my ppc32 test machine is currently broken, so I could not check on other
than x86 archs.

> I'm wondering if GCC does something bizarre like work with different
> default alignments based upon the section or something like that.
> 
> If so, maybe adding the section attribute to the array definition will
> "fix" things?

Well, I thought about it in my sleep, and it looks like gcc is within its rights
to align these statically declared structures on a larger alignment: gcc has no
clue that we're going to do tricks with the linker to access the structures as
an array, so aligning on a larger alignment *should* be fine for the compiler,
but we suffer because we're doing something non-standard.

> 
> > On 32-bit architectures, we really want a aligned(4), and on 64-bit
> > architectures, aligned(8). Represent this by creating:
> > 
> > #define __long_aligned __attribute__((__aligned__(__alignof__(long))))
> 
> Do any of these datastructures have, or will have, "u64" or "long
> long" types in them?  If so, then we will need to use "8"
> unconditionally or "__alignof__(long long)".

If my memory serves me correctly, I think "long long" is aligned on 4 bytes on
ppc32, but on 8 bytes on x86_32 (yeah, that's weird). How about we create a
#define __long_long_aligned __attribute__((__aligned__(__alignof__(long long))))

?

Thanks,

Mathieu

> 
> I'll see if I can work out why using no align directive explodes
> on x86-64.
Mathieu Desnoyers Jan. 19, 2011, 3:11 p.m. UTC | #5
* David Miller (davem@davemloft.net) wrote:
> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Date: Wed, 19 Jan 2011 00:08:45 -0500
> 
> > - No aligned() type attribute nor variable attribute. I get a crash on x86_64
> >   (NULL pointer exception when executing __trace_add_event_call, the 5th call).
> >   __alignof__(struct ftrace_event_call) is worth 8.
> 
> I think I figured it out.
> 
> It's the static vs. non-static thing, or some other crazyness wrt.
> how x86-64 implements it's alignment rules.
> 
> GCC on x86-64 uses a completely different policy for aligning local
> (ie. "static") data objects vs. globally visible ones, for one thing.
> It also has different alignment policies for objects that are part
> of an array vs. those which are not.
> 
> On both counts, we're lying to the compiler, so maybe it's sort of our
> fault.

Yep, we're in strong agreement here.

> 
> As far as GCC can see, the object is static and also not part of an
> array or any other C construct for which things like this could matter
> as long as the alignment it chooses meets the minimum alignment
> requirements of the ABI.
> 
> So it doesn't let us do this trick where we put the individual event
> markers into a special section, yet mark it __used and static, then
> later access them as if they were part of a globally visible array.
> 
> If you look these static objects, they are emitted with a leading
> ".align 32" directive.  This is what screws everything up.

Ah, yep, good way to identify the culprit.

> 
> When the linker sees this, it aligns the start of every individual
> "_ftrace_events" section, and that's where the "gaps" come from and
> the crashes.

OK (more to come in the following email response).

Mathieu
Mathieu Desnoyers Jan. 19, 2011, 3:33 p.m. UTC | #6
* David Miller (davem@davemloft.net) wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 18 Jan 2011 22:32:47 -0800 (PST)
> 
> > As far as GCC can see, the object is static and also not part of an
> > array or any other C construct for which things like this could matter
> > as long as the alignment it chooses meets the minimum alignment
> > requirements of the ABI.
> > 
> > So it doesn't let us do this trick where we put the individual event
> > markers into a special section, yet mark it __used and static, then
> > later access them as if they were part of a globally visible array.
> > 
> > If you look these static objects, they are emitted with a leading
> > ".align 32" directive.  This is what screws everything up.
> > 
> > When the linker sees this, it aligns the start of every individual
> > "_ftrace_events" section, and that's where the "gaps" come from and
> > the crashes.
> 
> I've come up with one possible way to deal with this.
> 
> Put pointers to the trace objects into the special section, and
> interate over those instead.
> 
> I was wondering why this x86-64 weird alignment behavior doesn't bite
> us for our init funcs.  And the reason is that all of these weird
> alignment cases only trigger for aggregates (ie. structs and unions).
> 
> So we could do:
> 
> 	static struct ftace_event_call foo_event = { ... };
> 	static struct ftrace_event_call * __used
> 		__attribute__((section("_ftrace_event_ptrs")))
> 		foo_event_ptr = &foo_event;
> 
> and
> 
> 	extern struct ftrace_event_call *__start_ftrace_event_ptrs[];
> 	extern struct ftrace_event_call *__end_ftrace_event_ptrs[];
> 
> 	struct ftrace_event_call **p = __start_ftrace_event_ptrs;
> 	while (p < &__end_ftrace_event_ptrs[0]) {
> 		struct ftrace_event_call *event = *p++;
> 
> 		__trace_add_event_call(event, ...);
> 	}
> 
> you get the idea.
> 
> And we could mark this entire point section as "initdata" and thus
> free'able after bootup and post module load.

There are three downsides to this approach compared to forcing both the type and
variable alignments with attributes:

1) It consumes extra space: This approach lets gcc align foo_event on 32-byte
   boundaries, which is unneeded in this case. For structures repeated very
   often, this is a bad thing.

2) It mixes .data and struct ftrace_event_call definitions, thus polluting .data
   cache-lines (actually, the 32-byte alignment will leave some of these
   cachelines partly unused). This would be fixable by specifying yet another
   section name to hold the struct ftace_event_call definitions.

3) Freeing _ftrace_event_ptrs is only possible after init here because Ftrace
   data layout is sub-optimal. The linked-list created at init time by Ftrace
   kind of duplicates the arrays already implicit to the section. If we look at
   Tracepoints for example, which present the exact same section alignment
   problem, we iterate on the tracepoint section each time a tracepoint is
   activated or deactivated. So we need to keep the section there after init.
   Therefore, your indirection approach would grow the data footprint.
   The trade-off here is that walking the table is a very rare operation that
   does not need to be fast, so we accept the performance hit of walking the
   tracepoint table for each enabled tracepoint to shrink the memory footprint.

Especially for reasons (1) and (2), I'd be tempted to go for the
__long_long_aligned type and variable attribute instead. Thoughts ?

I'm still unsure that __long_long_aligned is needed over __long_aligned though.
AFAIK, the only requirement we have for, e.g. tracepoints, is to align on the
pointer size (sizeof(long)), so RCU pointer updates are performed atomically.
Aligning on the pointer size also allows the architecture to efficiently read
the field content. What does aligning on sizeof(long long) bring to us ? Is it
that you are concerned about the fact that the "aligned" type attribute, when
applied to a structure, is only used as a lower-bound by the compiler ? In that
case, we might want to consider using "packed" too:

#define __long_packed_aligned __attribute__((__packed__, __aligned__(__alignof__(long))))

I would just like to know if this attribute causes gcc to emit slower memory
access instructions on ppc/sparc/mips (I remember that at least one of these
emit slower instructions for unaligned read/writes, so I wonder if the compiler
uses them as soon as it sees the "packed" attribute, or if it is more clever
than that).

Thanks,

Mathieu
Sam Ravnborg Jan. 19, 2011, 4:14 p.m. UTC | #7
> 
> If my memory serves me correctly, I think "long long" is aligned on 4 bytes on
> ppc32, but on 8 bytes on x86_32 (yeah, that's weird). How about we create a
> #define __long_long_aligned __attribute__((__aligned__(__alignof__(long long))))

#define __u64_aligned __attribute__((__aligned__(__alignof__(long long))))

A bit shorter but maybe less obvious.

	Sam
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Jan. 19, 2011, 4:18 p.m. UTC | #8
* Sam Ravnborg (sam@ravnborg.org) wrote:
> > 
> > If my memory serves me correctly, I think "long long" is aligned on 4 bytes on
> > ppc32, but on 8 bytes on x86_32 (yeah, that's weird). How about we create a
> > #define __long_long_aligned __attribute__((__aligned__(__alignof__(long long))))
> 
> #define __u64_aligned __attribute__((__aligned__(__alignof__(long long))))
> 
> A bit shorter but maybe less obvious.

Yep, that would make sense.

I'm tempted to try creating

#defined __u64_packed_aligned __attribute__((__packed__, __aligned__(__alignof__(long long))))

in the hope that gcc sees this as a strict alignment requirement (including a
max bound) rather than just a hint. From what I gather in my reading of 

http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html

"The aligned attribute can only increase the alignment; but you can decrease it
by specifying packed as well. See below."

gcc seems to support having both specified. I think this would provide the kind
of alignment guarantees we really need here: both specifying the minimum _and_
maximum alignment.

Thoughts ?

Mathieu

> 
> 	Sam
David Miller Jan. 19, 2011, 9:40 p.m. UTC | #9
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Wed, 19 Jan 2011 10:33:26 -0500

> I'm still unsure that __long_long_aligned is needed over __long_aligned though.
> AFAIK, the only requirement we have for, e.g. tracepoints, is to align on the
> pointer size (sizeof(long)), so RCU pointer updates are performed atomically.
> Aligning on the pointer size also allows the architecture to efficiently read
> the field content. What does aligning on sizeof(long long) bring to us ? Is it
> that you are concerned about the fact that the "aligned" type attribute, when
> applied to a structure, is only used as a lower-bound by the compiler ? In that
> case, we might want to consider using "packed" too:

My concern is that if there is ever a u64 or similarly "long long"
typed member in these tracing structures, it will not be aligned
sufficiently to avoid unaligned access traps on 32-bit systems.

If your suggestion defines the lowest possible alignment and GCC will
do the right thing and "up-align" the structure if necessary, then
fine.

If you add "packed" it is going to screw everything up and we'll
essentially be back to square one.

On RISC like sparc64, "packed" causes even 16-bit words to be read and
written a byte at a time.

Never use "packed" under any circumstances unless absolutely
unavoidable.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt Jan. 19, 2011, 10 p.m. UTC | #10
On Wed, 2011-01-19 at 13:40 -0800, David Miller wrote:

> My concern is that if there is ever a u64 or similarly "long long"
> typed member in these tracing structures, it will not be aligned
> sufficiently to avoid unaligned access traps on 32-bit systems.

The structure that gets placed in this section is the ftrace_event_call.
It consists only of pointers, a struct list_head, a couple of "int", and
a struct trace_event, which consists of: a struct hlist_node, a struct
list_head, an int, and a pointer.

None of these are more than "long" and I don't foresee them needing a
long long type. I think assuming that a long is the largest item should
due.

We can add a comment next to these structures specifying this
dependency, and hopefully it would be updated if we ever do include a
long long in them.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 19, 2011, 10:09 p.m. UTC | #11
From: Steven Rostedt <rostedt@goodmis.org>
Date: Wed, 19 Jan 2011 17:00:23 -0500

> We can add a comment next to these structures specifying this
> dependency, and hopefully it would be updated if we ever do include a
> long long in them.

Yes, I think a huge comment should be placed somewhere and also the
commit message for the final version of this fix should be quite
verbose :-)
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Jan. 19, 2011, 10:13 p.m. UTC | #12
* David Miller (davem@davemloft.net) wrote:
> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Date: Wed, 19 Jan 2011 10:33:26 -0500
> 
> > I'm still unsure that __long_long_aligned is needed over __long_aligned though.
> > AFAIK, the only requirement we have for, e.g. tracepoints, is to align on the
> > pointer size (sizeof(long)), so RCU pointer updates are performed atomically.
> > Aligning on the pointer size also allows the architecture to efficiently read
> > the field content. What does aligning on sizeof(long long) bring to us ? Is it
> > that you are concerned about the fact that the "aligned" type attribute, when
> > applied to a structure, is only used as a lower-bound by the compiler ? In that
> > case, we might want to consider using "packed" too:
> 
> My concern is that if there is ever a u64 or similarly "long long"
> typed member in these tracing structures, it will not be aligned
> sufficiently to avoid unaligned access traps on 32-bit systems.

Hrm, I'd like to see what kind of ill-conceived 32-bit architecture would
generate a unaligned access for a 32-bit aligned u64. Do you have examples in
mind ? By definition, the memory accesses should be at most 32-bit, no ? AFAIK,
gcc treats u64 as two distinct reads on all 32-bit architectures.

> If your suggestion defines the lowest possible alignment and GCC will
> do the right thing and "up-align" the structure if necessary, then
> fine.

Well, I must admit that my assumption is that aligning on the "long" size should
be the only alignment required, both on 32-bit and 64-bit. But I'm curious to
see if there are indeed architectures that break this assumption.

Ideally, I'd like to avoid letting gcc up-align a structure, because it is then
hard to know for sure what the alignment value of the section should be (in the
linker script, we can safely choose 32, but it's more a "safe choice" than
anything else). Moreover, I'm not convinced that gcc will choose to up-align the
structure with the exact same alignment values for both the type declaration and
the variable definition (I'm deeply distrusting gcc to do the right thing here).

> If you add "packed" it is going to screw everything up and we'll
> essentially be back to square one.
> 
> On RISC like sparc64, "packed" causes even 16-bit words to be read and
> written a byte at a time.
> 
> Never use "packed" under any circumstances unless absolutely
> unavoidable.

gcc on my sparc64 box (32-bit userland) disagrees with you here ;) Using
gcc (Debian 4.3.3-14) 4.3.3, here is a demonstration that, indeed, "packed"
generates aweful code, but that "packed, aligned(4 or 8)" generates pretty
decent code:

compiling for sparc32:

struct test {
        unsigned long a;
        unsigned long b;
};

Storing to test "a" field in a main() that returns 0, with -O0:

000104f0 <main>:
   104f0:       9d e3 bf 90     save  %sp, -112, %sp
   104f4:       03 00 00 81     sethi  %hi(0x20400), %g1
   104f8:       84 10 63 9c     or  %g1, 0x39c, %g2     ! 2079c <blah>
   104fc:       82 10 20 2a     mov  0x2a, %g1
   10500:       c2 20 80 00     st  %g1, [ %g2 ]
   10504:       82 10 20 00     clr  %g1
   10508:       b0 10 00 01     mov  %g1, %i0
   1050c:       81 e8 00 00     restore 
   10510:       81 c3 e0 08     retl 
   10514:       01 00 00 00     nop 

__attribute__((packed))

000104f0 <main>:
   104f0:       9d e3 bf 90     save  %sp, -112, %sp
   104f4:       03 00 00 81     sethi  %hi(0x20400), %g1
   104f8:       84 10 63 dc     or  %g1, 0x3dc, %g2     ! 207dc <blah>
   104fc:       c2 08 80 00     ldub  [ %g2 ], %g1
   10500:       82 08 60 00     and  %g1, 0, %g1
   10504:       c2 28 80 00     stb  %g1, [ %g2 ]
   10508:       c2 08 a0 01     ldub  [ %g2 + 1 ], %g1
   1050c:       82 08 60 00     and  %g1, 0, %g1
   10510:       c2 28 a0 01     stb  %g1, [ %g2 + 1 ]
   10514:       c2 08 a0 02     ldub  [ %g2 + 2 ], %g1
   10518:       82 08 60 00     and  %g1, 0, %g1
   1051c:       c2 28 a0 02     stb  %g1, [ %g2 + 2 ]
   10520:       c2 08 a0 03     ldub  [ %g2 + 3 ], %g1
   10524:       82 08 60 00     and  %g1, 0, %g1
   10528:       82 10 60 2a     or  %g1, 0x2a, %g1
   1052c:       c2 28 a0 03     stb  %g1, [ %g2 + 3 ]
   10530:       82 10 20 00     clr  %g1
   10534:       b0 10 00 01     mov  %g1, %i0
   10538:       81 e8 00 00     restore 
   1053c:       81 c3 e0 08     retl 
   10540:       01 00 00 00     nop 

__attribute__((packed, aligned(4)))

000104f0 <main>:
   104f0:       9d e3 bf 90     save  %sp, -112, %sp
   104f4:       03 00 00 81     sethi  %hi(0x20400), %g1
   104f8:       84 10 63 9c     or  %g1, 0x39c, %g2     ! 2079c <blah>
   104fc:       82 10 20 2a     mov  0x2a, %g1
   10500:       c2 20 80 00     st  %g1, [ %g2 ]
   10504:       82 10 20 00     clr  %g1
   10508:       b0 10 00 01     mov  %g1, %i0
   1050c:       81 e8 00 00     restore 
   10510:       81 c3 e0 08     retl 
   10514:       01 00 00 00     nop 

__attribute__((packed, aligned(8)))

000104f0 <main>:
   104f0:       9d e3 bf 90     save  %sp, -112, %sp
   104f4:       03 00 00 81     sethi  %hi(0x20400), %g1
   104f8:       84 10 63 a0     or  %g1, 0x3a0, %g2     ! 207a0 <blah>
   104fc:       82 10 20 2a     mov  0x2a, %g1
   10500:       c2 20 80 00     st  %g1, [ %g2 ]
   10504:       82 10 20 00     clr  %g1
   10508:       b0 10 00 01     mov  %g1, %i0
   1050c:       81 e8 00 00     restore 
   10510:       81 c3 e0 08     retl 
   10514:       01 00 00 00     nop 

Now about :

struct test {
        unsigned long long a;
        unsigned long long b;
};

__attribute__((packed, aligned(8)))
(and without attribute)

000104f0 <main>:
   104f0:       9d e3 bf 90     save  %sp, -112, %sp
   104f4:       03 00 00 81     sethi  %hi(0x20400), %g1
   104f8:       82 10 63 a0     or  %g1, 0x3a0, %g1     ! 207a0 <blah>
   104fc:       84 10 20 00     clr  %g2
   10500:       86 10 20 2a     mov  0x2a, %g3
   10504:       c4 38 40 00     std  %g2, [ %g1 ]
   10508:       82 10 20 00     clr  %g1
   1050c:       b0 10 00 01     mov  %g1, %i0
   10510:       81 e8 00 00     restore 
   10514:       81 c3 e0 08     retl 
   10518:       01 00 00 00     nop 
   1051c:       00 00 00 00     illtrap  0

__attribute__((packed, aligned(4)))

000104f0 <main>:
   104f0:       9d e3 bf 90     save  %sp, -112, %sp
   104f4:       03 00 00 81     sethi  %hi(0x20400), %g1
   104f8:       84 10 63 9c     or  %g1, 0x39c, %g2     ! 2079c <blah>
   104fc:       82 10 20 2a     mov  0x2a, %g1
   10500:       c2 20 a0 04     st  %g1, [ %g2 + 4 ]
   10504:       c0 20 80 00     clr  [ %g2 ]
   10508:       82 10 20 00     clr  %g1
   1050c:       b0 10 00 01     mov  %g1, %i0
   10510:       81 e8 00 00     restore 
   10514:       81 c3 e0 08     retl 
   10518:       01 00 00 00     nop 
   1051c:       00 00 00 00     illtrap  0

So the packed, aligned(__alignof__(long)) options does not look that bad.

Mathieu
David Miller Jan. 19, 2011, 10:21 p.m. UTC | #13
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Wed, 19 Jan 2011 17:13:27 -0500

> Hrm, I'd like to see what kind of ill-conceived 32-bit architecture would
> generate a unaligned access for a 32-bit aligned u64. Do you have examples in
> mind ? By definition, the memory accesses should be at most 32-bit, no ? AFAIK,
> gcc treats u64 as two distinct reads on all 32-bit architectures.

Sparc 32-bit has 64-bit loads and stores, GCC uses them because the ABI
specifies that every structure is at least 8 byte aligned.

> gcc on my sparc64 box (32-bit userland) disagrees with you here ;) Using
> gcc (Debian 4.3.3-14) 4.3.3, here is a demonstration that, indeed, "packed"
> generates aweful code, but that "packed, aligned(4 or 8)" generates pretty
> decent code:

Amazing, if this works then do it.

But please document this fully with comments and such :-)
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Jan. 19, 2011, 10:21 p.m. UTC | #14
* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Wed, 2011-01-19 at 13:40 -0800, David Miller wrote:
> 
> > My concern is that if there is ever a u64 or similarly "long long"
> > typed member in these tracing structures, it will not be aligned
> > sufficiently to avoid unaligned access traps on 32-bit systems.
> 
> The structure that gets placed in this section is the ftrace_event_call.
> It consists only of pointers, a struct list_head, a couple of "int", and
> a struct trace_event, which consists of: a struct hlist_node, a struct
> list_head, an int, and a pointer.
> 
> None of these are more than "long" and I don't foresee them needing a
> long long type. I think assuming that a long is the largest item should
> due.
> 
> We can add a comment next to these structures specifying this
> dependency, and hopefully it would be updated if we ever do include a
> long long in them.

I still wonder how a 32-bit system can generate an unaligned access trap for an
access to a 64-bit variable aligned on 32-bit, given that there is, by
definition, no 64-bit memory accesses available on the architecture ?

Mathieu
David Miller Jan. 19, 2011, 10:23 p.m. UTC | #15
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Wed, 19 Jan 2011 17:21:44 -0500

> I still wonder how a 32-bit system can generate an unaligned access trap for an
> access to a 64-bit variable aligned on 32-bit, given that there is, by
> definition, no 64-bit memory accesses available on the architecture ?

Sparc 32-bit (and I believe MIPS 32-bit as well) have such 64-bit
load and store instructions.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sam Ravnborg Jan. 19, 2011, 10:32 p.m. UTC | #16
> 
> I still wonder how a 32-bit system can generate an unaligned access trap for an
> access to a 64-bit variable aligned on 32-bit, given that there is, by
> definition, no 64-bit memory accesses available on the architecture ?

From the SPARC V8 manual (this is the 32 bit version of SPARC):

    Load/Store Instructions

    ...
    Integer load and store instructions support byte (8-bit), halfword (16-bit), word
    (32-bit), and doubleword (64-bit) accesses.
    ...

    Alignment Restrictions

    Halfword accesses must be aligned on a 2-byte boundary, word accesses (which
    include instruction fetches) must be aligned on a 4-byte boundary, and doubleword
    accesses must be aligned on an 8-byte boundary. An improperly aligned
    address causes a load or store instruction to generate a mem_address_not_aligned
    trap.


	Sam
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Jan. 19, 2011, 10:33 p.m. UTC | #17
* David Miller (davem@davemloft.net) wrote:
> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Date: Wed, 19 Jan 2011 17:13:27 -0500
> 
> > Hrm, I'd like to see what kind of ill-conceived 32-bit architecture would
> > generate a unaligned access for a 32-bit aligned u64. Do you have examples in
> > mind ? By definition, the memory accesses should be at most 32-bit, no ? AFAIK,
> > gcc treats u64 as two distinct reads on all 32-bit architectures.
> 
> Sparc 32-bit has 64-bit loads and stores, GCC uses them because the ABI
> specifies that every structure is at least 8 byte aligned.

Ah, that's the answer I was looking for, thanks!

> 
> > gcc on my sparc64 box (32-bit userland) disagrees with you here ;) Using
> > gcc (Debian 4.3.3-14) 4.3.3, here is a demonstration that, indeed, "packed"
> > generates aweful code, but that "packed, aligned(4 or 8)" generates pretty
> > decent code:
> 
> Amazing, if this works then do it.
> 
> But please document this fully with comments and such :-)

I will, I will! ;)

So I guess we go for the following. Is it verbose enough ?

/*
 * __u64_packed_aligned:
 *
 * Forces gcc to use the u64 type alignment, up-aligning or down-aligning the
 * target type if necessary. The memory accesses to the target structure are
 * efficient (does not require bytewise memory accesses) and the atomic pointer
 * update guarantees required by RCU are kept. u64 is considered as the largest
 * type that can generate a trap for unaligned accesses (u64 on sparc32 needs to
 * be aligned on 64-bit).
 *
 * Specifying both "packed" and "aligned" generates decent code (without the
 * bytewise memory accesses generated by simply using "packed"), and forces
 * gcc to down-align the structure alignment to the alignment of a u64 type.
 *
 * This alignment should be used for both structure definitions and declarations
 * (as *both* the type and variable attribute) when using the "section"
 * attribute to generate arrays of structures.
 */
#define __u64_packed_aligned \
        __attribute__((__packed__, __aligned__(__alignof__(long long))))

Thanks,

Mathieu
Mathieu Desnoyers Jan. 19, 2011, 10:34 p.m. UTC | #18
* Sam Ravnborg (sam@ravnborg.org) wrote:
> > 
> > I still wonder how a 32-bit system can generate an unaligned access trap for an
> > access to a 64-bit variable aligned on 32-bit, given that there is, by
> > definition, no 64-bit memory accesses available on the architecture ?
> 
> From the SPARC V8 manual (this is the 32 bit version of SPARC):
> 
>     Load/Store Instructions
> 
>     ...
>     Integer load and store instructions support byte (8-bit), halfword (16-bit), word
>     (32-bit), and doubleword (64-bit) accesses.
>     ...
> 
>     Alignment Restrictions
> 
>     Halfword accesses must be aligned on a 2-byte boundary, word accesses (which
>     include instruction fetches) must be aligned on a 4-byte boundary, and doubleword
>     accesses must be aligned on an 8-byte boundary. An improperly aligned
>     address causes a load or store instruction to generate a mem_address_not_aligned
>     trap.

Ah! There is always an odd case ;) Thanks!

Mathieu
David Miller Jan. 20, 2011, 12:41 a.m. UTC | #19
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Wed, 19 Jan 2011 17:33:39 -0500

> So I guess we go for the following. Is it verbose enough ?

It's got all of the details that seem to matter, thanks.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Jan. 21, 2011, 12:04 a.m. UTC | #20
* David Miller (davem@davemloft.net) wrote:
> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Date: Wed, 19 Jan 2011 17:33:39 -0500
> 
> > So I guess we go for the following. Is it verbose enough ?
> 
> It's got all of the details that seem to matter, thanks.

I'm letting people following this thread and the bug report know that I posted
the patch we discussed about here in the following thread:

https://lkml.org/lkml/2011/1/19/509

Feedback is welcome,

Mathieu
Richard Mortimer Jan. 21, 2011, 6:06 p.m. UTC | #21
On 21/01/2011 00:04, Mathieu Desnoyers wrote:
> * David Miller (davem@davemloft.net) wrote:
>> From: Mathieu Desnoyers<mathieu.desnoyers@efficios.com>
>> Date: Wed, 19 Jan 2011 17:33:39 -0500
>>
>>> So I guess we go for the following. Is it verbose enough ?
>>
>> It's got all of the details that seem to matter, thanks.
>
> I'm letting people following this thread and the bug report know that I posted
> the patch we discussed about here in the following thread:
>
> https://lkml.org/lkml/2011/1/19/509
>
> Feedback is welcome,
>
Hi Mathieu,

Thanks for this.

The good news is that I have a 2.6.37 kernel + your patches booting and 
it has been running for about an hour on sparc64/Sun Fire V120. I did a 
bit of poking around in /debug and turned on/off a bit of tracing and 
nothing broke.

I did have a few problems applying your patch series. Your diffs have 
semicolons at the end of a couple of the macros that don't appear in the 
mainline 2.6.37. One was circa line 174 of include/linux/tracepoint.h 
but I don't have the other to hand at the moment and I don't have much 
time before I need to go out.

I'm also getting a lot of Kernel unaligned access errors from the 
kernel. I don't know if they are related to this or not and this is the 
first time that I personally have got 2.6.37 to boot on sparc64. The 
messages that I am getting seem to be repeats of

[ 4376.807811] Kernel unaligned access at TPC[456e94] 
try_to_wake_up+0x58/0xec
[ 4376.807908] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
[ 4376.808044] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
[ 4376.808871] Kernel unaligned access at TPC[456e94] 
try_to_wake_up+0x58/0xec
[ 4376.808965] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
[ 4381.813354] log_unaligned: 337 callbacks suppressed

I have to go out now but will be around later/over the weekend.

Regards

Richard

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Jan. 21, 2011, 6:52 p.m. UTC | #22
* Richard Mortimer (richm@oldelvet.org.uk) wrote:
>
>
> On 21/01/2011 00:04, Mathieu Desnoyers wrote:
>> * David Miller (davem@davemloft.net) wrote:
>>> From: Mathieu Desnoyers<mathieu.desnoyers@efficios.com>
>>> Date: Wed, 19 Jan 2011 17:33:39 -0500
>>>
>>>> So I guess we go for the following. Is it verbose enough ?
>>>
>>> It's got all of the details that seem to matter, thanks.
>>
>> I'm letting people following this thread and the bug report know that I posted
>> the patch we discussed about here in the following thread:
>>
>> https://lkml.org/lkml/2011/1/19/509
>>
>> Feedback is welcome,
>>

Hi Richard,

> Hi Mathieu,
>
> Thanks for this.
>
> The good news is that I have a 2.6.37 kernel + your patches booting and  
> it has been running for about an hour on sparc64/Sun Fire V120. I did a  
> bit of poking around in /debug and turned on/off a bit of tracing and  
> nothing broke.

That's a good start :)

> I did have a few problems applying your patch series. Your diffs have  
> semicolons at the end of a couple of the macros that don't appear in the  
> mainline 2.6.37. One was circa line 174 of include/linux/tracepoint.h  
> but I don't have the other to hand at the moment and I don't have much  
> time before I need to go out.

I know what's causing this, I'll rebase my patch before my next post (I was
working at the tail of my lttng patch queue). Thanks for letting me know.

> I'm also getting a lot of Kernel unaligned access errors from the  
> kernel. I don't know if they are related to this or not and this is the  
> first time that I personally have got 2.6.37 to boot on sparc64. The  
> messages that I am getting seem to be repeats of
>
> [ 4376.807811] Kernel unaligned access at TPC[456e94]  
> try_to_wake_up+0x58/0xec
> [ 4376.807908] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
> [ 4376.808044] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
> [ 4376.808871] Kernel unaligned access at TPC[456e94]  
> try_to_wake_up+0x58/0xec
> [ 4376.808965] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
> [ 4381.813354] log_unaligned: 337 callbacks suppressed
>
> I have to go out now but will be around later/over the weekend.

Can you send me your .config ? I'd really like to see which of tracepoint/static
jump patching are enabled.

Do you get this message as soon as the system boots, or only when you enable
tracing ?

Thanks,

Mathieu

>
> Regards
>
> Richard
>
Mathieu Desnoyers Jan. 21, 2011, 7:15 p.m. UTC | #23
* Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> * Richard Mortimer (richm@oldelvet.org.uk) wrote:
[...]
> > I'm also getting a lot of Kernel unaligned access errors from the  
> > kernel. I don't know if they are related to this or not and this is the  
> > first time that I personally have got 2.6.37 to boot on sparc64. The  
> > messages that I am getting seem to be repeats of
> >
> > [ 4376.807811] Kernel unaligned access at TPC[456e94]  
> > try_to_wake_up+0x58/0xec
> > [ 4376.807908] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
> > [ 4376.808044] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
> > [ 4376.808871] Kernel unaligned access at TPC[456e94]  
> > try_to_wake_up+0x58/0xec
> > [ 4376.808965] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
> > [ 4381.813354] log_unaligned: 337 callbacks suppressed
> >
> > I have to go out now but will be around later/over the weekend.

OK, I pinpointed it to my use of the "packed" attribute. So within the
structure:

struct test {
        const char *a;
        int b;
        void *c;
        void *d;
        void *e;
} __attribute__((packed, aligned(8)));

(on sparc64)

It provides the following offsets:

0 8 12 20 28

which is clearly wrong in terms of inner alignment: it removes the padding
between b and c. I am really tempted to just remove the "packed" attribute from
there: our goal is really to make sure the 8-byte accesses are all aligned after
all. So theoretically gcc could decide to align all struct test arrays and
pointers on an alignment larger than 8 if we just specify the aligned(8) type
attribute (because the type attribute is just a minimum floor alignment value),
but the only reason I would see for gcc to align these on larger alignment would
be that the structures would contain a field that requires such largish
alignment (which I doubt we have in the kernel).

I'll prepare updated patches shortly.

Thanks,

Mathieu
Richard Mortimer Jan. 21, 2011, 8:14 p.m. UTC | #24
On 21/01/2011 18:52, Mathieu Desnoyers wrote:
> * Richard Mortimer (richm@oldelvet.org.uk) wrote:
...

>> I'm also getting a lot of Kernel unaligned access errors from the
>> kernel. I don't know if they are related to this or not and this is the
>> first time that I personally have got 2.6.37 to boot on sparc64. The
>> messages that I am getting seem to be repeats of
>>
>> [ 4376.807811] Kernel unaligned access at TPC[456e94]
>> try_to_wake_up+0x58/0xec
>> [ 4376.807908] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
>> [ 4376.808044] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
>> [ 4376.808871] Kernel unaligned access at TPC[456e94]
>> try_to_wake_up+0x58/0xec
>> [ 4376.808965] Kernel unaligned access at TPC[75541c] schedule+0x454/0x660
>> [ 4381.813354] log_unaligned: 337 callbacks suppressed
>>
>> I have to go out now but will be around later/over the weekend.
>
> Can you send me your .config ? I'd really like to see which of tracepoint/static
> jump patching are enabled.
>
It is basically a clone of the Debian 2.6.37 experimental kernel 
.config. It has been through a couple of make oldconfig cycles with 
different kernel versions so it should be pretty similar to what the 
Debian folks intend. Anyhow if it is still useful you can download it from

http://bridge.oldelvet.org.uk/download/110121_linux-2.6.37_sparc64.config

> Do you get this message as soon as the system boots, or only when you enable
> tracing ?
Good question. Looking back through the logs it started about 14 minutes 
after I booted. I don't remember how that relates to me starting to play 
around with tracing but I'm guessing that you have a good idea what is 
happening.

I've managed to grab disassembly of the two locations that are reporting 
problems just in case that helps.

0000000000456e3c <try_to_wake_up>:
   456e3c:       9d e3 bf 50     save  %sp, -176, %sp
   456e40:       a5 52 00 00     rdpr  %pil, %l2
   456e44:       82 14 a0 0e     or  %l2, 0xe, %g1
   456e48:       91 90 60 00     wrpr  %g1, 0, %pil
   456e4c:       c2 5e 00 00     ldx  [ %i0 ], %g1
   456e50:       b2 0e 40 01     and  %i1, %g1, %i1
   456e54:       02 ce 40 2b     brz  %i1, 456f00 <try_to_wake_up+0xc4>
   456e58:       a2 10 20 00     clr  %l1
   456e5c:       c2 06 20 70     ld  [ %i0 + 0x70 ], %g1
   456e60:       80 a0 60 00     cmp  %g1, 0
   456e64:       12 48 00 07     bne  %icc, 456e80 <try_to_wake_up+0x44>
   456e68:       03 00 22 a5     sethi  %hi(0x8a9400), %g1
   456e6c:       90 10 00 18     mov  %i0, %o0
   456e70:       7f ff ff dd     call  456de4 <T.1233>
   456e74:       92 10 20 01     mov  1, %o1
   456e78:       a2 10 20 01     mov  1, %l1
   456e7c:       03 00 22 a5     sethi  %hi(0x8a9400), %g1
   456e80:       a6 10 00 11     mov  %l1, %l3
   456e84:       c4 00 62 80     ld  [ %g1 + 0x280 ], %g2
   456e88:       80 a0 a0 00     cmp  %g2, 0
   456e8c:       02 48 00 0f     be  %icc, 456ec8 <try_to_wake_up+0x8c>
   456e90:       a8 0c 60 ff     and  %l1, 0xff, %l4
   456e94:       e0 58 62 94     ldx  [ %g1 + 0x294 ], %l0
   456e98:       02 c4 00 0d     brz,pn   %l0, 456ecc <try_to_wake_up+0x90>
   456e9c:       11 00 21 93     sethi  %hi(0x864c00), %o0
   456ea0:       a9 3d 20 00     sra  %l4, 0, %l4
   456ea4:       c2 5c 00 00     ldx  [ %l0 ], %g1
   456ea8:       92 10 00 18     mov  %i0, %o1
   456eac:       94 10 00 14     mov  %l4, %o2
   456eb0:       d0 5c 20 08     ldx  [ %l0 + 8 ], %o0
   456eb4:       9f c0 40 00     call  %g1
   456eb8:       a0 04 20 10     add  %l0, 0x10, %l0
   456ebc:       c2 5c 00 00     ldx  [ %l0 ], %g1


...
   7553f8:       81 58 00 00     flushw
   7553fc:       05 00 22 a5     sethi  %hi(0x8a9400), %g2
   755400:       84 10 a2 c8     or  %g2, 0x2c8, %g2     ! 8a96c8 
<__tracepoint_sched_switch>
   755404:       c2 00 a0 08     ld  [ %g2 + 8 ], %g1
   755408:       80 a0 60 00     cmp  %g1, 0
   75540c:       22 48 00 11     be,a   %icc, 755450 <schedule+0x488>
   755410:       e4 5c 21 50     ldx  [ %l0 + 0x150 ], %l2
   755414:       07 00 22 a5     sethi  %hi(0x8a9400), %g3
   755418:       86 10 e2 e4     or  %g3, 0x2e4, %g3     ! 8a96e4 
<__tracepoint_sched_switch+0x1c>
   75541c:       e4 58 c0 00     ldx  [ %g3 ], %l2
   755420:       22 c4 80 0c     brz,a,pn   %l2, 755450 <schedule+0x488>
   755424:       e4 5c 21 50     ldx  [ %l0 + 0x150 ], %l2
   755428:       c2 5c 80 00     ldx  [ %l2 ], %g1

Regards

Richard

P.S. I saw your followup mail so hopefully this matches what you have found!
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Jan. 21, 2011, 8:40 p.m. UTC | #25
* Richard Mortimer (richm@oldelvet.org.uk) wrote:
[...]
> P.S. I saw your followup mail so hopefully this matches what you have found!

Thanks for the info! At first glance, it does not seem to contradict my
findings. When you find time, can you have a try at v3 I just posted ?

Make sure to start tracing in your tests, as I think the unaligned access only
happens when accessing the struct tracepoint fields below the "int state" field.

Thanks,

Mathieu
Richard Mortimer Jan. 21, 2011, 10:50 p.m. UTC | #26
On 21/01/2011 20:40, Mathieu Desnoyers wrote:
> * Richard Mortimer (richm@oldelvet.org.uk) wrote:
> [...]
>> P.S. I saw your followup mail so hopefully this matches what you have found!
>
> Thanks for the info! At first glance, it does not seem to contradict my
> findings. When you find time, can you have a try at v3 I just posted ?
>
I'm setting the compilation off now. I will give it a whirl some time 
Saturday or Sunday.

Before I did that I created kernel/sched.s and have verified that the 
two error locations are accesses to it_func_ptr as you suspected.


ldx     [%g1+%lo(__tracepoint_sched_wakeup)+28], %l0    !, it_func_ptr
and
ldx     [%g3], %l2      !, it_func_ptr

with v3 of you patches applied it uses +32 to get the offset of 
it_func_ptr so that looks good.

> Make sure to start tracing in your tests, as I think the unaligned access only
> happens when accessing the struct tracepoint fields below the "int state" field.
>
Will do.

Regards

Richard
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Richard Mortimer Jan. 22, 2011, 6:42 p.m. UTC | #27
On 21/01/2011 22:50, Richard Mortimer wrote:
>
>
> On 21/01/2011 20:40, Mathieu Desnoyers wrote:
>> * Richard Mortimer (richm@oldelvet.org.uk) wrote:
>>
>> Thanks for the info! At first glance, it does not seem to contradict my
>> findings. When you find time, can you have a try at v3 I just posted ?
>>
> I'm setting the compilation off now. I will give it a whirl some time
> Saturday or Sunday.
>
...

>
>> Make sure to start tracing in your tests, as I think the unaligned
>> access only
>> happens when accessing the struct tracepoint fields below the "int
>> state" field.
>>

I have the revised patch running now with tracing enabled. Everything 
looks good to me.

Thanks.

Richard
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers Jan. 22, 2011, 6:53 p.m. UTC | #28
* Richard Mortimer (richm@oldelvet.org.uk) wrote:
> On 21/01/2011 22:50, Richard Mortimer wrote:
>>
>>
>> On 21/01/2011 20:40, Mathieu Desnoyers wrote:
>>> * Richard Mortimer (richm@oldelvet.org.uk) wrote:
>>>
>>> Thanks for the info! At first glance, it does not seem to contradict my
>>> findings. When you find time, can you have a try at v3 I just posted ?
>>>
>> I'm setting the compilation off now. I will give it a whirl some time
>> Saturday or Sunday.
>>
> ...
>
>>
>>> Make sure to start tracing in your tests, as I think the unaligned
>>> access only
>>> happens when accessing the struct tracepoint fields below the "int
>>> state" field.
>>>
>
> I have the revised patch running now with tracing enabled. Everything  
> looks good to me.

Thanks for testing it!

Mathieu

>
> Thanks.
>
> Richard
diff mbox

Patch

Index: linux-2.6-lttng/include/linux/compiler.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/compiler.h
+++ linux-2.6-lttng/include/linux/compiler.h
@@ -57,6 +57,8 @@  extern void __chk_io_ptr(const volatile
 # include <linux/compiler-intel.h>
 #endif
 
+#define __long_aligned __attribute__((__aligned__(__alignof__(long))))
+
 /*
  * Generic compiler-dependent macros required for kernel
  * build go below this comment. Actual compiler/compiler version
@@ -78,7 +80,7 @@  struct ftrace_branch_data {
 		};
 		unsigned long miss_hit[2];
 	};
-};
+} __long_aligned;
 
 /*
  * Note: DISABLE_BRANCH_PROFILING can be used by special lowlevel code
@@ -94,7 +96,7 @@  void ftrace_likely_update(struct ftrace_
 #define __branch_check__(x, expect) ({					\
 			int ______r;					\
 			static struct ftrace_branch_data		\
-				__attribute__((__aligned__(4)))		\
+				__long_aligned				\
 				__attribute__((section("_ftrace_annotated_branch"))) \
 				______f = {				\
 				.func = __func__,			\
@@ -129,7 +131,7 @@  void ftrace_likely_update(struct ftrace_
 	({								\
 		int ______r;						\
 		static struct ftrace_branch_data			\
-			__attribute__((__aligned__(4)))			\
+			__long_aligned					\
 			__attribute__((section("_ftrace_branch")))	\
 			______f = {					\
 				.func = __func__,			\
Index: linux-2.6-lttng/include/linux/syscalls.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/syscalls.h
+++ linux-2.6-lttng/include/linux/syscalls.h
@@ -126,11 +126,13 @@  extern struct trace_event_functions exit
 
 #define SYSCALL_TRACE_ENTER_EVENT(sname)				\
 	static struct syscall_metadata					\
-	__attribute__((__aligned__(4))) __syscall_meta_##sname;		\
+	__long_aligned							\
+	__syscall_meta_##sname;						\
 	static struct ftrace_event_call					\
-	__attribute__((__aligned__(4))) event_enter_##sname;		\
+	__long_aligned							\
+	event_enter_##sname;						\
 	static struct ftrace_event_call __used				\
-	  __attribute__((__aligned__(4)))				\
+	  __long_aligned						\
 	  __attribute__((section("_ftrace_events")))			\
 	  event_enter_##sname = {					\
 		.name                   = "sys_enter"#sname,		\
@@ -141,11 +143,13 @@  extern struct trace_event_functions exit
 
 #define SYSCALL_TRACE_EXIT_EVENT(sname)					\
 	static struct syscall_metadata					\
-	__attribute__((__aligned__(4))) __syscall_meta_##sname;		\
+	__long_aligned							\
+	__syscall_meta_##sname;						\
 	static struct ftrace_event_call					\
-	__attribute__((__aligned__(4))) event_exit_##sname;		\
+	__long_aligned							\
+	event_exit_##sname;						\
 	static struct ftrace_event_call __used				\
-	  __attribute__((__aligned__(4)))				\
+	  __long_aligned						\
 	  __attribute__((section("_ftrace_events")))			\
 	  event_exit_##sname = {					\
 		.name                   = "sys_exit"#sname,		\
@@ -158,7 +162,7 @@  extern struct trace_event_functions exit
 	SYSCALL_TRACE_ENTER_EVENT(sname);			\
 	SYSCALL_TRACE_EXIT_EVENT(sname);			\
 	static struct syscall_metadata __used			\
-	  __attribute__((__aligned__(4)))			\
+	  __long_aligned					\
 	  __attribute__((section("__syscalls_metadata")))	\
 	  __syscall_meta_##sname = {				\
 		.name 		= "sys"#sname,			\
@@ -174,7 +178,7 @@  extern struct trace_event_functions exit
 	SYSCALL_TRACE_ENTER_EVENT(_##sname);			\
 	SYSCALL_TRACE_EXIT_EVENT(_##sname);			\
 	static struct syscall_metadata __used			\
-	  __attribute__((__aligned__(4)))			\
+	  __long_aligned					\
 	  __attribute__((section("__syscalls_metadata")))	\
 	  __syscall_meta__##sname = {				\
 		.name 		= "sys_"#sname,			\
Index: linux-2.6-lttng/include/trace/ftrace.h
===================================================================
--- linux-2.6-lttng.orig/include/trace/ftrace.h
+++ linux-2.6-lttng/include/trace/ftrace.h
@@ -79,7 +79,7 @@ 
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args)	\
 	static struct ftrace_event_call	__used		\
-	__attribute__((__aligned__(4))) event_##name;
+	__long_aligned event_##name;
 
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
@@ -447,7 +447,7 @@  static inline notrace int ftrace_get_off
  * };
  *
  * static struct ftrace_event_call __used
- * __attribute__((__aligned__(4)))
+ * __long_aligned
  * __attribute__((section("_ftrace_events"))) event_<call> = {
  *	.name			= "<call>",
  *	.class			= event_class_<template>,
@@ -581,7 +581,7 @@  static struct ftrace_event_class __used
 #define DEFINE_EVENT(template, call, proto, args)			\
 									\
 static struct ftrace_event_call __used					\
-__attribute__((__aligned__(4)))						\
+__long_aligned								\
 __attribute__((section("_ftrace_events"))) event_##call = {		\
 	.name			= #call,				\
 	.class			= &event_class_##template,		\
@@ -595,7 +595,7 @@  __attribute__((section("_ftrace_events")
 static const char print_fmt_##call[] = print;				\
 									\
 static struct ftrace_event_call __used					\
-__attribute__((__aligned__(4)))						\
+__long_aligned								\
 __attribute__((section("_ftrace_events"))) event_##call = {		\
 	.name			= #call,				\
 	.class			= &event_class_##template,		\
Index: linux-2.6-lttng/kernel/trace/trace.h
===================================================================
--- linux-2.6-lttng.orig/kernel/trace/trace.h
+++ linux-2.6-lttng/kernel/trace/trace.h
@@ -749,7 +749,7 @@  extern const char *__stop___trace_bprint
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(call, struct_name, id, tstruct, print)		\
 	extern struct ftrace_event_call					\
-	__attribute__((__aligned__(4))) event_##call;
+	__long_aligned event_##call;
 #undef FTRACE_ENTRY_DUP
 #define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print)		\
 	FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
Index: linux-2.6-lttng/kernel/trace/trace_export.c
===================================================================
--- linux-2.6-lttng.orig/kernel/trace/trace_export.c
+++ linux-2.6-lttng/kernel/trace/trace_export.c
@@ -156,7 +156,7 @@  struct ftrace_event_class event_class_ft
 };									\
 									\
 struct ftrace_event_call __used						\
-__attribute__((__aligned__(4)))						\
+__long_aligned								\
 __attribute__((section("_ftrace_events"))) event_##call = {		\
 	.name			= #call,				\
 	.event.type		= etype,				\
Index: linux-2.6-lttng/include/linux/ftrace_event.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/ftrace_event.h
+++ linux-2.6-lttng/include/linux/ftrace_event.h
@@ -194,7 +194,7 @@  struct ftrace_event_call {
 	int				perf_refcount;
 	struct hlist_head __percpu	*perf_events;
 #endif
-};
+} __long_aligned;
 
 #define PERF_MAX_TRACE_SIZE	2048
 
Index: linux-2.6-lttng/include/trace/syscall.h
===================================================================
--- linux-2.6-lttng.orig/include/trace/syscall.h
+++ linux-2.6-lttng/include/trace/syscall.h
@@ -29,7 +29,7 @@  struct syscall_metadata {
 
 	struct ftrace_event_call *enter_event;
 	struct ftrace_event_call *exit_event;
-};
+} __long_aligned;
 
 #ifdef CONFIG_FTRACE_SYSCALLS
 extern unsigned long arch_syscall_addr(int nr);