Patchwork longjmp question

login
register
mail settings
Submitter David Miller
Date Oct. 8, 2011, 6:43 a.m.
Message ID <20111008.024355.585508909596524922.davem@davemloft.net>
Download mbox | patch
Permalink /patch/118411/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

David Miller - Oct. 8, 2011, 6:43 a.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 8 Oct 2011 00:22:10 +0100

> #define RW_FP [%fp + 0x48]

Let's just do away with this value entirely and do this the
safest way possible, since this is the slow path:

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 8, 2011, 12:55 p.m.
On Sat, Oct 08, 2011 at 02:43:55AM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sat, 8 Oct 2011 00:22:10 +0100
> 
> > #define RW_FP [%fp + 0x48]
> 
> Let's just do away with this value entirely and do this the
> safest way possible, since this is the slow path:
> 
> diff --git a/sysdeps/sparc/sparc32/__longjmp.S b/sysdeps/sparc/sparc32/__longjmp.S
> index a5453b4..70e8e29 100644
> --- a/sysdeps/sparc/sparc32/__longjmp.S
> +++ b/sysdeps/sparc/sparc32/__longjmp.S
> @@ -22,7 +22,6 @@
>  #include <jmpbuf-offsets.h>
>  #define ENV(base,reg) [%base + (reg * 4)]
>  #define ST_FLUSH_WINDOWS 3
> -#define RW_FP [%fp + 0x48]
>  
>  ENTRY(__longjmp)
>  	/* Store our arguments in global registers so we can still
> @@ -55,6 +54,7 @@ LOC(loop):
>  	 ld ENV(g1,JB_SP), %o0	/* Delay slot: extract target SP.  */
>  
>  LOC(thread):
> +	save	%sp, -96, %sp
>  	/*
>  	 * Do a "flush register windows trap".  The trap handler in the
>  	 * kernel writes all the register windows to their stack slots, and
> @@ -67,15 +67,13 @@ LOC(thread):
>  #ifdef PTR_DEMANGLE
>  	ld	ENV(g1,JB_PC), %g5 /* Set return PC. */
>  	ld	ENV(g1,JB_SP), %g1 /* Set saved SP on restore below. */
> -	PTR_DEMANGLE2 (%o7, %g5, %g4)
> +	PTR_DEMANGLE2 (%i7, %g5, %g4)
>  	PTR_DEMANGLE2 (%fp, %g1, %g4)
>  #else
> -	ld	ENV(g1,JB_PC), %o7 /* Set return PC. */
> +	ld	ENV(g1,JB_PC), %i7 /* Set return PC. */
>  	ld	ENV(g1,JB_SP), %fp /* Set saved SP on restore below. */
>  #endif
> -	sub	%fp, 64, %sp	/* Allocate a register frame. */
> -	st	%g3, RW_FP	/* Set saved FP on restore below. */
> -	retl
> +	jmp	%i7 + 8
>  	 restore %g2, 0, %o0	/* Restore values from above register frame. */

I don't really see anything in the code which would ensure that after 
we execute this restore, fp is going to be set to target fp value 
(available as ENV(g1,JB_FP) or g3. It might be the responsibility of 
the unwinding part, but then it would not explain why such code was 
explicitly present in the old version.

Best regards,
David Miller - Oct. 8, 2011, 7:22 p.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 8 Oct 2011 13:55:35 +0100

> On Sat, Oct 08, 2011 at 02:43:55AM -0400, David Miller wrote:
>> @@ -67,15 +67,13 @@ LOC(thread):
>>  #ifdef PTR_DEMANGLE
>>  	ld	ENV(g1,JB_PC), %g5 /* Set return PC. */
>>  	ld	ENV(g1,JB_SP), %g1 /* Set saved SP on restore below. */
>> -	PTR_DEMANGLE2 (%o7, %g5, %g4)
>> +	PTR_DEMANGLE2 (%i7, %g5, %g4)
>>  	PTR_DEMANGLE2 (%fp, %g1, %g4)
>>  #else
>> -	ld	ENV(g1,JB_PC), %o7 /* Set return PC. */
>> +	ld	ENV(g1,JB_PC), %i7 /* Set return PC. */
>>  	ld	ENV(g1,JB_SP), %fp /* Set saved SP on restore below. */
>>  #endif
>> -	sub	%fp, 64, %sp	/* Allocate a register frame. */
>> -	st	%g3, RW_FP	/* Set saved FP on restore below. */
>> -	retl
>> +	jmp	%i7 + 8
>>  	 restore %g2, 0, %o0	/* Restore values from above register frame. */
> 
> I don't really see anything in the code which would ensure that after 
> we execute this restore, fp is going to be set to target fp value 
> (available as ENV(g1,JB_FP) or g3. It might be the responsibility of 
> the unwinding part, but then it would not explain why such code was 
> explicitly present in the old version.

Because the restore traps, causing a window fill, and that trap
handler will load up the register window at JB_SP, inside of which
JB_FP is contained.

The only reason we store JB_FP is for the implementation the optimized
longjmp loop.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 12, 2011, 9:22 p.m.
On Sat, Oct 08, 2011 at 03:22:51PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sat, 8 Oct 2011 13:55:35 +0100
> 
> > On Sat, Oct 08, 2011 at 02:43:55AM -0400, David Miller wrote:
> >> @@ -67,15 +67,13 @@ LOC(thread):
> >>  #ifdef PTR_DEMANGLE
> >>  	ld	ENV(g1,JB_PC), %g5 /* Set return PC. */
> >>  	ld	ENV(g1,JB_SP), %g1 /* Set saved SP on restore below. */
> >> -	PTR_DEMANGLE2 (%o7, %g5, %g4)
> >> +	PTR_DEMANGLE2 (%i7, %g5, %g4)
> >>  	PTR_DEMANGLE2 (%fp, %g1, %g4)
> >>  #else
> >> -	ld	ENV(g1,JB_PC), %o7 /* Set return PC. */
> >> +	ld	ENV(g1,JB_PC), %i7 /* Set return PC. */
> >>  	ld	ENV(g1,JB_SP), %fp /* Set saved SP on restore below. */
> >>  #endif
> >> -	sub	%fp, 64, %sp	/* Allocate a register frame. */
> >> -	st	%g3, RW_FP	/* Set saved FP on restore below. */
> >> -	retl
> >> +	jmp	%i7 + 8
> >>  	 restore %g2, 0, %o0	/* Restore values from above register frame. */
> > 
> > I don't really see anything in the code which would ensure that after 
> > we execute this restore, fp is going to be set to target fp value 
> > (available as ENV(g1,JB_FP) or g3. It might be the responsibility of 
> > the unwinding part, but then it would not explain why such code was 
> > explicitly present in the old version.
> 
> Because the restore traps, causing a window fill, and that trap
> handler will load up the register window at JB_SP, inside of which
> JB_FP is contained.
> 
> The only reason we store JB_FP is for the implementation the optimized
> longjmp loop.

Thanks for the explanation.

I've built eglibc with this patch, and nothing exploded in an obvious 
way. Ruby test is still failing though, so I'm pretty sure now that 
this was not a problem even before the patch.

On an unrelated note, I was not able to decipher what's the purpose of 
the following code in __longjmp:

0:
        xor %fp, %g3, %o0
        add %fp, 512, %o1
        andncc %o0, 4095, %o0
        bne LOC(thread)
         cmp %o1, %g3
        bl LOC(thread)

It looks like this code is *very* old (was added back in 1998, it 
seems [0]), so I was not able to find any description for it. If you 
don't mind explaining what's going on here, I would appreciate it.

[0] http://lwn.net/1998/1022/a/sparc-glibc-patch.html

Best regards,
David Miller - Oct. 12, 2011, 10:09 p.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Wed, 12 Oct 2011 22:22:51 +0100

> 0:
>         xor %fp, %g3, %o0
>         add %fp, 512, %o1
>         andncc %o0, 4095, %o0
>         bne LOC(thread)
>          cmp %o1, %g3
>         bl LOC(thread)
> 
> It looks like this code is *very* old (was added back in 1998, it 
> seems [0]), so I was not able to find any description for it. If you 
> don't mind explaining what's going on here, I would appreciate it.
> 
> [0] http://lwn.net/1998/1022/a/sparc-glibc-patch.html
> 
> Best regards,

This code is seeing if the current stack/frame pointers are on a
different "page" or very far from the stack/frame we are unwinding
into.

The thinking is that if the distance is very large, then the loop is
unlikely to be more efficient, or we are unwinding out-of or into
a different kind of stack (thread stack, process stack, signal stack,
etc.).
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 12, 2011, 10:17 p.m.
FYI, I'm taking a quick look into this specific Ruby situation.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 12, 2011, 11:06 p.m.
Jurij, how do I setup this testcase?

I checked out Ruby from SVN and built it, but I can't find this
miniruby thing so that I can run the command line in that Ruby bug
report.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 12, 2011, 11:21 p.m.
On Wed, Oct 12, 2011 at 07:06:17PM -0400, David Miller wrote:
> 
> Jurij, how do I setup this testcase?
> 
> I checked out Ruby from SVN and built it, but I can't find this
> miniruby thing so that I can run the command line in that Ruby bug
> report.
> 
> Thanks.

Thanks for looking at it! Attached is a script which I use to set up 
the environment and start gdb for the binary (you probably will need 
directory names adjusted, if you are building from directly from svn 
and not from Debian package). I'm still in the process of trying to 
understand what's Ruby is trying to do with all the machine state 
saves and restores, so I was not able to make a lot of progress so 
far.

Best regads,
David Miller - Oct. 12, 2011, 11:42 p.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Thu, 13 Oct 2011 00:21:28 +0100

> On Wed, Oct 12, 2011 at 07:06:17PM -0400, David Miller wrote:
>> 
>> Jurij, how do I setup this testcase?
>> 
>> I checked out Ruby from SVN and built it, but I can't find this
>> miniruby thing so that I can run the command line in that Ruby bug
>> report.
>> 
>> Thanks.
> 
> Thanks for looking at it! Attached is a script which I use to set up 
> the environment and start gdb for the binary (you probably will need 
> directory names adjusted, if you are building from directly from svn 
> and not from Debian package). I'm still in the process of trying to 
> understand what's Ruby is trying to do with all the machine state 
> saves and restores, so I was not able to make a lot of progress so 
> far.

Thanks for the script.

Ruby is too clever for it's own good.  Right before it calls setjmp()
it copies the stack frame down to the current bottom of the stack
into a save area.

Later, right before it longjmp()'s, it copies back to the stack from
the save area.

Look at the "workaround" for x86-86 in cont_restore_1(), and all of
the special case code they need in order to get IA64 right.

What a mess.

This is only going to get worse when GCC's support for shrink-wrapping
and other interesting features propagates.  There is no guarentee that
the stack won't expand further downward between when Ruby saves the
stack frames and when it does it's setjmp() call, and it very much
relies upon that such a stack expansion not happening.

Anyways I'll see if there is some way to salvage this and make it
work.  I suspect that Solaris doesn't have the restore loop
optimization we do in longjmp, and that's why Ruby works there with
the same compilers on sparc.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 13, 2011, 10:06 p.m.
On Wed, Oct 12, 2011 at 07:42:28PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Thu, 13 Oct 2011 00:21:28 +0100
> 
> > On Wed, Oct 12, 2011 at 07:06:17PM -0400, David Miller wrote:
> >> 
> >> Jurij, how do I setup this testcase?
> >> 
> >> I checked out Ruby from SVN and built it, but I can't find this
> >> miniruby thing so that I can run the command line in that Ruby bug
> >> report.
> >> 
> >> Thanks.
> > 
> > Thanks for looking at it! Attached is a script which I use to set up 
> > the environment and start gdb for the binary (you probably will need 
> > directory names adjusted, if you are building from directly from svn 
> > and not from Debian package). I'm still in the process of trying to 
> > understand what's Ruby is trying to do with all the machine state 
> > saves and restores, so I was not able to make a lot of progress so 
> > far.
> 
> Thanks for the script.
> 
> Ruby is too clever for it's own good.  Right before it calls setjmp()
> it copies the stack frame down to the current bottom of the stack
> into a save area.
> 
> Later, right before it longjmp()'s, it copies back to the stack from
> the save area.
> 
> Look at the "workaround" for x86-86 in cont_restore_1(), and all of
> the special case code they need in order to get IA64 right.
> 
> What a mess.
> 
> This is only going to get worse when GCC's support for shrink-wrapping
> and other interesting features propagates.  There is no guarentee that
> the stack won't expand further downward between when Ruby saves the
> stack frames and when it does it's setjmp() call, and it very much
> relies upon that such a stack expansion not happening.
> 
> Anyways I'll see if there is some way to salvage this and make it
> work.  I suspect that Solaris doesn't have the restore loop
> optimization we do in longjmp, and that's why Ruby works there with
> the same compilers on sparc.

I think I've figured it out (famous last words :-). The problem 
appears to be in cont_save_machine_stack in cont.c. The part where new 
memory is allocated and the machine state is saved using memcpy from
cont->machine_stack_src to cont->machine_stack generates the following 
assembler code:

   0xf7f4d728 <+584>:   mov  %i4, %o0                           // %o0 == 437, size of memory to allocate in words
   0xf7f4d72c <+588>:   call  0xf7fb4004 <ruby_xmalloc2@plt>
   0xf7f4d730 <+592>:   mov  4, %o1                             // %o1 == 4, word size
   0xf7f4d734 <+596>:   ld  [ %fp + -12 ], %g3                  // load 'cont' address into g3.
=> 0xf7f4d738 <+600>:   st  %o0, [ %g3 + 0x1c ]                 // %o0 contains the address returned by ruby_xmalloc2, store it in cont->machine_stack
   0xf7f4d73c <+604>:   flushw                                  // flush register windows
   0xf7f4d740 <+608>:   ld  [ %fp + -12 ], %g1                  // load 'cont' address into g1. But flushw might have caused a spill trap and changed fp!
   0xf7f4d744 <+612>:   sll  %i4, 2, %o2                        // 437*4, total amount of memory to copy, goes into o2, third arg for memcpy
   0xf7f4d748 <+616>:   call  0xf7fb2b1c <memcpy@plt>
   0xf7f4d74c <+620>:   ld  [ %g1 + 0x20 ], %o1                 // we load what we think to be cont->machine_stack_src into second arg
   0xf7f4d750 <+624>:   ld  [ %fp + -12 ], %g2
   0xf7f4d754 <+628>:   call  0xf7fb29b4 <_setjmp@plt>
   0xf7f4d758 <+632>:   add  %g2, 0x278, %o0
   0xf7f4d75c <+636>:   cmp  %o0, 0

I believe that whenever flushw causes a spill trap, we are going to 
load an incorrect source address (cont->machine_stack_src) as a second 
memcpy argument. A couple of observations support it: if you 
insert a breakpoint right after memcpy, you find that memory regions 
pointed to by cont->machine_stack and cont->machine_stack_src are not 
synchronized, as one would expect. Furthermore, breaking anywhere 
*before* will make the problem magically go away (perhaps because gdb 
flushes register windows itself on breakpoints, and then flushw in 
cont_capture is effectively a noop?)

I hope it makes at least some sense :-).

Best regards,
David Miller - Oct. 13, 2011, 10:35 p.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Thu, 13 Oct 2011 23:06:17 +0100

> I believe that whenever flushw causes a spill trap, we are going to 
> load an incorrect source address (cont->machine_stack_src) as a second 
> memcpy argument. A couple of observations support it: if you 
> insert a breakpoint right after memcpy, you find that memory regions 
> pointed to by cont->machine_stack and cont->machine_stack_src are not 
> synchronized, as one would expect. Furthermore, breaking anywhere 
> *before* will make the problem magically go away (perhaps because gdb 
> flushes register windows itself on breakpoints, and then flushw in 
> cont_capture is effectively a noop?)
> 
> I hope it makes at least some sense :-).

Good detective work, did I mention that this Ruby continuation stuff
is extremely fragile?

Can you show me what values %sp and %fp have right before the flushw
is executed?

The effect of taking a breakpoint right before the flushw ought to
be the same as executing a flushw.  When a process being debugged
by GDB takes a breakpoint, we flush all the user register windows
out of the cpu and onto the process stack, the wake up the parent
(GDB) and context switch.

Obviously, something different is happening when you just let the
flushw execute without an immediately preceeding breakpoint, so
we have to figure out exactly what that is :-)

Something you might want to try, compile cont.c into an assembler
file cont.s, then insert the following around the flushw

	mov	%fp, %g1
	flushw
	mov	%fp, %g2

Then compile that into an object and link up ruby.

In the debugger, breakpoint right after that "mov %fp, %g2" and
print out from GDB the values of %g1 and %g2.  This might give
some hints as to what's going on exactly.

Another test, go into Ruby's defines.h and get rid of the:

# if defined(__sparc_v9__) || defined(__sparcv9) || defined(__arch64__)
        ("flushw")
# else

and make it always use "ta 0x03" instead of "flushw".  This might
explain why the Ruby developers can't reproduce this on Solaris.  That
could happen if for some reason their Solaris build isn't setting the
defines that guard the flushw instruction usage.

If using "ta 0x03" instead of "flushw" makes a difference that would
be a huge clue.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 14, 2011, 11:06 p.m.
On Thu, Oct 13, 2011 at 06:35:18PM -0400, David Miller wrote:
> Something you might want to try, compile cont.c into an assembler
> file cont.s, then insert the following around the flushw
> 
> 	mov	%fp, %g1
> 	flushw
> 	mov	%fp, %g2
> 
> Then compile that into an object and link up ruby.
> 
> In the debugger, breakpoint right after that "mov %fp, %g2" and
> print out from GDB the values of %g1 and %g2.  This might give
> some hints as to what's going on exactly.
> 
> Another test, go into Ruby's defines.h and get rid of the:
> 
> # if defined(__sparc_v9__) || defined(__sparcv9) || defined(__arch64__)
>         ("flushw")
> # else
> 
> and make it always use "ta 0x03" instead of "flushw".  This might
> explain why the Ruby developers can't reproduce this on Solaris.  That
> could happen if for some reason their Solaris build isn't setting the
> defines that guard the flushw instruction usage.
> 
> If using "ta 0x03" instead of "flushw" makes a difference that would
> be a huge clue.

Replacing "flushw" with "ta 0x03" makes the problem go away. What is 
the difference between the two? I would naively think that the effect 
of both should be saving register windows on the stack, allocating a
new stack frame for each of them, so fp would get adjusted in either 
case. Then I would expect that the correct fix would be to indicate to 
the compiler that flushw is clobbering fp/sp registers, so it cannot 
rely on their contents afterwards. The fact that using "ta 0x03" fixes 
it makes me feel lost again :-).

Unfortunately, I was unable to do a conclusive second test (copying fp 
to some registers before and after flushw) so far, as inserting a 
break even after flushw but before memcpy() call makes the problem go 
away as well. I can break after memcpy, but it might clobber the 
registers, so the saving part would need to be a bit more elaborate.

Best regards,
David Miller - Oct. 14, 2011, 11:26 p.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 15 Oct 2011 00:06:53 +0100

> Replacing "flushw" with "ta 0x03" makes the problem go away. What is 
> the difference between the two? I would naively think that the effect 
> of both should be saving register windows on the stack, allocating a
> new stack frame for each of them, so fp would get adjusted in either 
> case. Then I would expect that the correct fix would be to indicate to 
> the compiler that flushw is clobbering fp/sp registers, so it cannot 
> rely on their contents afterwards. The fact that using "ta 0x03" fixes 
> it makes me feel lost again :-).

Taking a trap has a side effect, in that the %g* and %o* registers
will be saved and restored by the trap entry and exit respectively.

Trap entry also grabs a register window (for the kernel), which
is restored from on trap exit.

The register window flush is performed between this trap statesave and
restore.

Furthermore, it also means that the current register window will be
saved by the "ta 0x03" case.

This is probably why certain gdb breakpoints also make the problem go
away.

Essentially, "ta 0x03" is kind of like:

	save	%sp, -WHATEVER, %sp
	flushw
	restore

so it will restore one more register window than an actual 'flushw'
in userspace would.

I'm starting to become convinced that if you look at the stack
backtrace at the time of the flushw done by ruby, you'll see that
there are multiple stack frames using the same memory regions.

I'll try to look at this more closely myself, especially since you've
given me excellent tips on how to reproduce this and run it under gdb,
but I'm currently fighting a gcc bug which I want to clear away first.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 16, 2011, 5:07 p.m.
On Fri, Oct 14, 2011 at 07:26:50PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sat, 15 Oct 2011 00:06:53 +0100
> 
> > Replacing "flushw" with "ta 0x03" makes the problem go away. What is 
> > the difference between the two? I would naively think that the effect 
> > of both should be saving register windows on the stack, allocating a
> > new stack frame for each of them, so fp would get adjusted in either 
> > case. Then I would expect that the correct fix would be to indicate to 
> > the compiler that flushw is clobbering fp/sp registers, so it cannot 
> > rely on their contents afterwards. The fact that using "ta 0x03" fixes 
> > it makes me feel lost again :-).
> 
> Taking a trap has a side effect, in that the %g* and %o* registers
> will be saved and restored by the trap entry and exit respectively.
> 
> Trap entry also grabs a register window (for the kernel), which
> is restored from on trap exit.
> 
> The register window flush is performed between this trap statesave and
> restore.
> 
> Furthermore, it also means that the current register window will be
> saved by the "ta 0x03" case.
> 
> This is probably why certain gdb breakpoints also make the problem go
> away.
> 
> Essentially, "ta 0x03" is kind of like:
> 
> 	save	%sp, -WHATEVER, %sp
> 	flushw
> 	restore
> 
> so it will restore one more register window than an actual 'flushw'
> in userspace would.
> 
> I'm starting to become convinced that if you look at the stack
> backtrace at the time of the flushw done by ruby, you'll see that
> there are multiple stack frames using the same memory regions.
> 
> I'll try to look at this more closely myself, especially since you've
> given me excellent tips on how to reproduce this and run it under gdb,
> but I'm currently fighting a gcc bug which I want to clear away first.
> 
> Thanks!

Thank you!

In the meantime, I've recognized that I can store fp before and after 
flushw in %l0 and %l1, and memcpy is not allowed to touch them. After 
changing the code to

  mov %fp, %l0
  flushw
  mov %fp, %l1

I've found that value of %fp does not change as a result of flushw 
after all, even in the case when it crashes later. So, as far as I can 
tell, memcpy is receiving correct arguments. Furthermore, looking at 
memcpy implementation (backed by __memcpy_ultra3 in my case), I see 
that it's likely (I've not examined all possible paths) that before it 
branches to 'out', o1 will contain the current source address and 
o3 will contain the distance between source and destination. I checked 
these values after our memcpy call, and they are consistent, i.e. o1 
points at the end of source region, and o3 is the difference between 
the end of source and destination regions. That made me wonder whether 
we do copy at least part of the data, and it appears that only the 
beginning of the memory regions is not copied correctly. Here's an 
example dump of the first 32 bytes in the crashing case after memcpy:

(gdb) x/32xw cont->machine_stack_src
0xffffc96c:     0x00000001      0xffffc9e0      0xffffc9e0      0x00000000
0xffffc97c:     0x00000000      0x00000000      0x00000000      0x00000000
0xffffc98c:     0xf7fb1cb8      0xffffca40      0x00003910      0x00000000
0xffffc99c:     0x00022b88      0x00022b88      0x000001b5      0xffffc9e0
0xffffc9ac:     0xf7f4d7a4      0x00000000      0xf7ffc4d0      0xf7decc32
0xffffc9bc:     0xf7de8888      0xf7de46c8      0xffffc9f8      0x00000000
0xffffc9cc:     0x00000001      0xffffffff      0x001d6620      0x00047508
0xffffc9dc:     0x00047508      0x00000000      0x00000000      0x00000000
(gdb) x/32xw cont->machine_stack
0x1d6938:       0x00000001      0x00000000      0x00000000      0x00000000
0x1d6948:       0xf7fb1cb8      0x000c8170      0x00000001      0x000c76a9
0x1d6958:       0xffffcc94      0x00000001      0x000c76a8      0xffffcc30
0x1d6968:       0xf7ebc03c      0x00000000      0x00000000      0x00000005
0x1d6978:       0x000001db      0x00000000      0xf7ffc4d0      0xf7decc32
0x1d6988:       0xf7de8888      0xf7de46c8      0xffffc9f8      0x00000000
0x1d6998:       0x00000001      0xffffffff      0x001d6620      0x00047508
0x1d69a8:       0x00047508      0x00000000      0x00000000      0x00000000

As you can see, the first 17 words of the memory regions differ, but 
after the data appears to be copied correctly (total amount of data 
copied in this case is 437 words).

Assuming that the analysis is correct, and memcpy does receive correct 
arguments, it might be a bug in __memcpy_ultra3 (which would be very 
exciting :-). If you are using an UltraSparc III machine as well, and 
could try it on a different architecture, I would be a very interested 
in the result.

Best regards,
David Miller - Oct. 17, 2011, 12:56 a.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Sun, 16 Oct 2011 18:07:38 +0100

> Assuming that the analysis is correct, and memcpy does receive correct 
> arguments, it might be a bug in __memcpy_ultra3 (which would be very 
> exciting :-). If you are using an UltraSparc III machine as well, and 
> could try it on a different architecture, I would be a very interested 
> in the result.

I reproduced your crash last week on a Niagara3 system, therefore I
don't think it is dependent upon the memcpy implementation.

If you are still convinced it is some memcpy issue :-) I can only
suggest that you check those buffer pointers passed to memcpy, if
(given the size) they overlap at all, that would be a bug.  You can't
use memcpy() for overlapping buffers, one must use memmove() instead.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 18, 2011, 8:46 p.m.
On Sun, Oct 16, 2011 at 08:56:52PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sun, 16 Oct 2011 18:07:38 +0100
> 
> > Assuming that the analysis is correct, and memcpy does receive correct 
> > arguments, it might be a bug in __memcpy_ultra3 (which would be very 
> > exciting :-). If you are using an UltraSparc III machine as well, and 
> > could try it on a different architecture, I would be a very interested 
> > in the result.
> 
> I reproduced your crash last week on a Niagara3 system, therefore I
> don't think it is dependent upon the memcpy implementation.
> 
> If you are still convinced it is some memcpy issue :-) I can only
> suggest that you check those buffer pointers passed to memcpy, if
> (given the size) they overlap at all, that would be a bug.  You can't
> use memcpy() for overlapping buffers, one must use memmove() instead.

Ok, I'm now convinced that memcpy() is not broken :-). However, it 
took a while to explain the behaviour I see. Here's what happens:

1. We flush the windows before performing the memcpy. This should 
flush all register windows to memory, except the current one in use 
(this is a crucial observation). So, for the current window we have 
valid values in registers, but junk in memory, pointed to by sp.

2. The source address happen to differ from the current sp only by 4 
bytes:

(gdb) info reg sp
sp             0xffffc970       0xffffc970
(gdb) print cont->machine_stack_src
$3 = (VALUE *) 0xffffc96c
(gdb) 

I guess that the expectation is that all register windows (including 
the current one!) are correctly represented in memory after flushw. 
But for the current register window this is not true, because flushw 
is not supposed to flush it (according to Sparc Architecture Manual, 
3.2.7).

3. We perform the memcpy(), copying the junk we believe to be valid
register values (sp to sp + 16 words) to cont->machine_stack. 
Restoring it later causes a crash.

That explains why 'ta 0x03' works - when it is executed, it actually 
does save, which increments register window, then flushw, making sure 
that the register window corresponding to cont_capture state is 
getting flushed as well, and then restores/returns. As a result, we 
end up with correct register content for the current window in memory 
before memcpy.

Finally, the "failing" memcpy is also easy to explain now: when we 
invoke the memcpy, we do it with the correct arguments, but we really 
do have incorrect memory contents in the source buffer, which end up 
in the destination. However, as soon as gdb breaks, it flushes *all* 
register windows of the interrupted process, including the current 
one, altering the memory contents of the source buffer, and making it 
appear like memcpy failed to synchronize the source and the 
destination.

Best regards,
David Miller - Oct. 18, 2011, 8:53 p.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Tue, 18 Oct 2011 21:46:20 +0100

> Finally, the "failing" memcpy is also easy to explain now: when we 
> invoke the memcpy, we do it with the correct arguments, but we really 
> do have incorrect memory contents in the source buffer, which end up 
> in the destination. However, as soon as gdb breaks, it flushes *all* 
> register windows of the interrupted process, including the current 
> one, altering the memory contents of the source buffer, and making it 
> appear like memcpy failed to synchronize the source and the 
> destination.

Now this makes a lot of sense, thanks for working to get to the bottom
of this.

I would suggest fixing this in Ruby by one or two possible methods:

1) Always do "ta 0x03", this is the simplest but most expensive fix.

2) Force the "flushw" into a helper function and force GCC to allocate
   a register window by clobbering the return address register, something
   like:

	void flushw_helper(void)
	{
	  /* Clobber %o7 so that the compiler is forced to allocate
	     a register window for this function.  */
	  __asm__ __volatile__("flushw" : : : "o7");
	}

   That results (on 32-bit) in something like:

	save	%sp, -96, %sp
	flushw
	return	%i7 + 8
	 nop

   which gives us exactly what we need.

Thanks again Jurij.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 22, 2011, 9 a.m.
On Tue, Oct 18, 2011 at 04:53:53PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Tue, 18 Oct 2011 21:46:20 +0100
> 
> > Finally, the "failing" memcpy is also easy to explain now: when we 
> > invoke the memcpy, we do it with the correct arguments, but we really 
> > do have incorrect memory contents in the source buffer, which end up 
> > in the destination. However, as soon as gdb breaks, it flushes *all* 
> > register windows of the interrupted process, including the current 
> > one, altering the memory contents of the source buffer, and making it 
> > appear like memcpy failed to synchronize the source and the 
> > destination.
> 
> Now this makes a lot of sense, thanks for working to get to the bottom
> of this.
> 
> I would suggest fixing this in Ruby by one or two possible methods:
> 
> 1) Always do "ta 0x03", this is the simplest but most expensive fix.

It turns out that freebsd/sparc64 is also suffering from this bug, 
and, as far as I can tell, there is no equivalent of 'ta 0x03' there. 
So, to fix it in a more generic way we probably should push for the 
second option.
 
> 2) Force the "flushw" into a helper function and force GCC to allocate
>    a register window by clobbering the return address register, something
>    like:
> 
> 	void flushw_helper(void)
> 	{
> 	  /* Clobber %o7 so that the compiler is forced to allocate
> 	     a register window for this function.  */
> 	  __asm__ __volatile__("flushw" : : : "o7");
> 	}
> 
>    That results (on 32-bit) in something like:
> 
> 	save	%sp, -96, %sp
> 	flushw
> 	return	%i7 + 8
> 	 nop
> 
>    which gives us exactly what we need.

Is something conceptually wrong with just doing this:

asm volatile ("save; flush; restore");

It fixes the problem for freebsd (will test on linux later today), and 
does not rely on compiler doing (or not doing) the "right" 
optimization.

Best regards,
David Miller - Oct. 22, 2011, 9:05 a.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 22 Oct 2011 10:00:24 +0100

> Is something conceptually wrong with just doing this:
> 
> asm volatile ("save; flush; restore");
> 
> It fixes the problem for freebsd (will test on linux later today), and 
> does not rely on compiler doing (or not doing) the "right" 
> optimization.

I would not do a "naked" save here, please just use the construct
I showed you with the %o7 register clobber.

If you don't allocate space for the new register window, you will
be writing the local registers to the parent's register window.

Maybe you get lucky and this works, but after spending multiple weeks
on this bug why take any chances at all at this point?
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 22, 2011, 9:43 a.m.
On Sat, Oct 22, 2011 at 05:05:05AM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sat, 22 Oct 2011 10:00:24 +0100
> 
> > Is something conceptually wrong with just doing this:
> > 
> > asm volatile ("save; flush; restore");
> > 
> > It fixes the problem for freebsd (will test on linux later today), and 
> > does not rely on compiler doing (or not doing) the "right" 
> > optimization.
> 
> I would not do a "naked" save here, please just use the construct
> I showed you with the %o7 register clobber.

Yes, I missed that save is not allocating any memory, which is 
wrong. What I actually had in mind was this:

asm volatile ("save %sp, -96, %sp; flushw; restore");

My concern with the approach you are proposing is that compiler 
may (in the future, perhaps) just optimize out the helper function, 
making arrangements to save %o7 in one of the unused registers, thus 
removing allocation of another register window and breaking it again.

Best regards,
David Miller - Oct. 22, 2011, 10:54 p.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 22 Oct 2011 10:43:39 +0100

> On Sat, Oct 22, 2011 at 05:05:05AM -0400, David Miller wrote:
>> From: Jurij Smakov <jurij@wooyd.org>
>> Date: Sat, 22 Oct 2011 10:00:24 +0100
>> 
>> > Is something conceptually wrong with just doing this:
>> > 
>> > asm volatile ("save; flush; restore");
>> > 
>> > It fixes the problem for freebsd (will test on linux later today), and 
>> > does not rely on compiler doing (or not doing) the "right" 
>> > optimization.
>> 
>> I would not do a "naked" save here, please just use the construct
>> I showed you with the %o7 register clobber.
> 
> Yes, I missed that save is not allocating any memory, which is 
> wrong. What I actually had in mind was this:
> 
> asm volatile ("save %sp, -96, %sp; flushw; restore");

Only works on 32-bit, will crash on 64-bit.

Just let the compiler do it for you, please... the compiler absolutely cannot
optimize away a function that uses a volatile asm statement.  And it absolutely
cannot make a function that clobbers %o7 a leaf function.

Please just use the approach I suggested, it covers all cases and is in fact
future proof.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jurij Smakov - Oct. 23, 2011, 1:47 p.m.
On Sat, Oct 22, 2011 at 06:54:05PM -0400, David Miller wrote:
>
> Just let the compiler do it for you, please... the compiler absolutely cannot
> optimize away a function that uses a volatile asm statement.  And it absolutely
> cannot make a function that clobbers %o7 a leaf function.

That's not what I see. I tried writing up a patch for Ruby using this, 
and here's what I get:

1. Preprocessed source of cont.c contains flush_register_windows 
definition:

# 219 "./include/ruby/defines.h"
static void
flush_register_windows(void)
{
    asm __volatile__ ("flushw" : : : "o7");
}

2. Preprocessed code of cont_save_machine_stack contains a call to 
flush_register_windows.

3. In cont.s both cont_save_machine_stack and flush_register_windows 
are completely optimized out when compiled with -O2. Relevant 
snippet:

.LL47:
        call    ruby_xmalloc2, 0
         mov    4, %o1
        ld      [%fp-12], %g3
        st      %o0, [%g3+28]
.LLBB115:
.LLBB114:
        .loc 2 222 0
#APP
! 222 "./include/ruby/defines.h" 1
        flushw
! 0 "" 2
#NO_APP
.LLBE114:
.LLBE115:
        .loc 1 353 0
        ld      [%fp-12], %g1
        sll     %i4, 2, %o2
        call    memcpy, 0
         ld     [%g1+32], %o1

In assembled form (objdump -d) it looks like this:

     54c:       40 00 00 00     call  54c <cont_capture+0x24c>
     550:       92 10 20 04     mov  4, %o1
     554:       c6 07 bf f4     ld  [ %fp + -12 ], %g3
     558:       d0 20 e0 1c     st  %o0, [ %g3 + 0x1c ]
     55c:       81 58 00 00     flushw 
     560:       c2 07 bf f4     ld  [ %fp + -12 ], %g1
     564:       95 2f 20 02     sll  %i4, 2, %o2
     568:       40 00 00 00     call  568 <cont_capture+0x268>
     56c:       d2 00 60 20     ld  [ %g1 + 0x20 ], %o1

I also tried listing the clobbered register as "%o7" instead of "o7", 
as the examples I could find prepend the register names with % in the 
clobber list. The results were the same.

Best regards,
David Miller - Oct. 23, 2011, 8:35 p.m.
From: Jurij Smakov <jurij@wooyd.org>
Date: Sun, 23 Oct 2011 14:47:27 +0100

> On Sat, Oct 22, 2011 at 06:54:05PM -0400, David Miller wrote:
>>
>> Just let the compiler do it for you, please... the compiler absolutely cannot
>> optimize away a function that uses a volatile asm statement.  And it absolutely
>> cannot make a function that clobbers %o7 a leaf function.
> 
> That's not what I see. I tried writing up a patch for Ruby using this, 
> and here's what I get:
> 
> 1. Preprocessed source of cont.c contains flush_register_windows 
> definition:
> 
> # 219 "./include/ruby/defines.h"
> static void
> flush_register_windows(void)
> {
>     asm __volatile__ ("flushw" : : : "o7");
> }

You can't put it in a header file, GCC will inline it.  You have to
put it in a function in a seperate C file, and this is what I said to
do the other week in my original email.

You're only other option is to explicitly mark this
__attribute__((__noinline__))
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/sysdeps/sparc/sparc32/__longjmp.S b/sysdeps/sparc/sparc32/__longjmp.S
index a5453b4..70e8e29 100644
--- a/sysdeps/sparc/sparc32/__longjmp.S
+++ b/sysdeps/sparc/sparc32/__longjmp.S
@@ -22,7 +22,6 @@ 
 #include <jmpbuf-offsets.h>
 #define ENV(base,reg) [%base + (reg * 4)]
 #define ST_FLUSH_WINDOWS 3
-#define RW_FP [%fp + 0x48]
 
 ENTRY(__longjmp)
 	/* Store our arguments in global registers so we can still
@@ -55,6 +54,7 @@  LOC(loop):
 	 ld ENV(g1,JB_SP), %o0	/* Delay slot: extract target SP.  */
 
 LOC(thread):
+	save	%sp, -96, %sp
 	/*
 	 * Do a "flush register windows trap".  The trap handler in the
 	 * kernel writes all the register windows to their stack slots, and
@@ -67,15 +67,13 @@  LOC(thread):
 #ifdef PTR_DEMANGLE
 	ld	ENV(g1,JB_PC), %g5 /* Set return PC. */
 	ld	ENV(g1,JB_SP), %g1 /* Set saved SP on restore below. */
-	PTR_DEMANGLE2 (%o7, %g5, %g4)
+	PTR_DEMANGLE2 (%i7, %g5, %g4)
 	PTR_DEMANGLE2 (%fp, %g1, %g4)
 #else
-	ld	ENV(g1,JB_PC), %o7 /* Set return PC. */
+	ld	ENV(g1,JB_PC), %i7 /* Set return PC. */
 	ld	ENV(g1,JB_SP), %fp /* Set saved SP on restore below. */
 #endif
-	sub	%fp, 64, %sp	/* Allocate a register frame. */
-	st	%g3, RW_FP	/* Set saved FP on restore below. */
-	retl
+	jmp	%i7 + 8
 	 restore %g2, 0, %o0	/* Restore values from above register frame. */
 
 LOC(found):