diff mbox

Misaligned stack on 32-bit s390?

Message ID 5461DBC4.9090508@redhat.com
State New
Headers show

Commit Message

Carlos O'Donell Nov. 11, 2014, 9:49 a.m. UTC
On 11/11/2014 04:28 AM, Richard Henderson wrote:
> On 11/11/2014 06:31 AM, Carlos O'Donell wrote:
>> Any clever ideas on how to fix this without copying up a large
>> portion of the stack?
> 
> Nope, because other targets do in fact have to do just that.

Right.
 
> I'm actually surprised that almost all of them don't.  I suppose
> that just depends on how the ABI is set up to pass parameters to
> the user _start...

Yes, on hppa the ABI was such that I didn't depend on the layout,
but s390 does, thus I think it is non-trivial.

The argv might be trivial to copy, but the size of envp is unknown,
and _start from the application side in start.S expects argv and
envp to be right up against eachother AFAICT. Worse it wants to
scan past them in order to find auxv. The saving grace is that
the auxv scan parses an arbitrary number of zeros between envp
and auxv. So I can move the gap to be between envp and auxv.
 
> Fortunately, s390 has a block copy instruction, so the move
> should be trivial to implement.

For argv only. What instruction is the block copy? Are you
talking about lm/stm?

The most naive fix I have working is as follows (I've handed
off to Siddhesh to have a look since I'm out of time this
evening/morning).

---

Cheers,
Carlos.

Comments

Richard Henderson Nov. 11, 2014, 10:19 a.m. UTC | #1
On 11/11/2014 10:49 AM, Carlos O'Donell wrote:
> For argv only. What instruction is the block copy? Are you
> talking about lm/stm?

I was thinking of EX combined with MVC.

But of course you're right about envp and argp, where you'd
have to count the number of entries before forming the count.

> +	# Now we have to zero out the envp entries after NULL to allow\n\
> +	# start.S to properly find auxv by skipping zeroes.\n\
> +	# zero out loop:\n\
> +	lhi   %r7,0\n\
> +.L3:	st    %r7,0(%r6)	# Store zero.\n\
> +	ahi   %r6,4		# Advance dest pointer.\n\
> +	ahi   %r1,-1		# Subtract one from the word count.\n\
> +	ltr   %r1,%r1\n\
> +	jne    .L3		# Keep copying if the word count is non-zero.\n\

In the alpha port, I just copy auxv down as well.


r~
Siddhesh Poyarekar Nov. 11, 2014, 1:35 p.m. UTC | #2
On Tue, Nov 11, 2014 at 04:49:56AM -0500, Carlos O'Donell wrote:
> +	# Set the back chain to zero again\n\
> +.L4:	xc    0(4,%r15),0(%r15)\n\

I don't think this is needed now, since you're not shifting the stack
pointer anymore.

Siddhesh
Andreas Krebbel Nov. 11, 2014, 2:54 p.m. UTC | #3
On 11/11/2014 10:49 AM, Carlos O'Donell wrote:
> On 11/11/2014 04:28 AM, Richard Henderson wrote:
>> On 11/11/2014 06:31 AM, Carlos O'Donell wrote:
>>> Any clever ideas on how to fix this without copying up a large
>>> portion of the stack?
>>
>> Nope, because other targets do in fact have to do just that.
> 
> Right.
> 
>> I'm actually surprised that almost all of them don't.  I suppose
>> that just depends on how the ABI is set up to pass parameters to
>> the user _start...
> 
> Yes, on hppa the ABI was such that I didn't depend on the layout,
> but s390 does, thus I think it is non-trivial.
> 
> The argv might be trivial to copy, but the size of envp is unknown,
> and _start from the application side in start.S expects argv and
> envp to be right up against eachother AFAICT. Worse it wants to
> scan past them in order to find auxv. The saving grace is that
> the auxv scan parses an arbitrary number of zeros between envp
> and auxv. So I can move the gap to be between envp and auxv.
> 
>> Fortunately, s390 has a block copy instruction, so the move
>> should be trivial to implement.
> 
> For argv only. What instruction is the block copy? Are you
> talking about lm/stm?
> 
> The most naive fix I have working is as follows (I've handed
> off to Siddhesh to have a look since I'm out of time this
> evening/morning).

Thanks for working on this!  Really surprising that this stayed unnoticed for that long. I thought
the testsuite invokes the testcases as ld.so parameter as well?

As I understand it an alignment fix is only required if _dl_skip_args is odd.  So what about
checking the least significant bit (tmll) and copying argv/envv down by 4 bytes? That way we would
not need to copy anything in 50% of the cases?

Would you like to continue working on it or should we try to take over? (Stefan or myself)

Bye,

-Andreas-

> 
> diff --git a/sysdeps/s390/s390-32/dl-machine.h b/sysdeps/s390/s390-32/dl-machine.h
> index c56185c..b189552 100644
> --- a/sysdeps/s390/s390-32/dl-machine.h
> +++ b/sysdeps/s390/s390-32/dl-machine.h
> @@ -166,18 +166,47 @@ _dl_start_user:\n\
>  	# See if we were run as a command with the executable file\n\
>  	# name as an extra leading argument.\n\
>  	l     %r1,_dl_skip_args@GOT12(0,%r12)\n\
> -	l     %r1,0(%r1)          # load _dl_skip_args\n\
> +	l     %r1,0(%r1)	# load _dl_skip_args\n\
> +	ltr   %r1,%r1\n\
> +	je    .L4		# Skip the arg adjustment if there were none.\n\
>  	# Get the original argument count.\n\
>  	l     %r0,96(%r15)\n\
>  	# Subtract _dl_skip_args from it.\n\
>  	sr    %r0,%r1\n\
> -	# Adjust the stack pointer to skip _dl_skip_args words.\n\
> -	sll   %r1,2\n\
> -	ar    %r15,%r1\n\
> -	# Set the back chain to zero again\n\
> -	xc    0(4,%r15),0(%r15)\n\
>  	# Store back the modified argument count.\n\
>  	st    %r0,96(%r15)\n\
> +	# Copy argv and envp forward to account for skipped argv entries.\n\
> +	# We skipped at least one argument or we would not get here.\n\
> +	la    %r6,100(%r15)	# Destination pointer i.e. &argv[0]\n\
> +	lr    %r5,%r6\n\
> +	lr    %r0,%r1\n\
> +	sll   %r0,2\n\		# Number of skipped bytes.\n\
> +	ar    %r5,%r0		# Source pointer = Dest + Skipped args.\n\
> +	# argv copy loop:\n\
> +.L1:	l     %r7,0(%r5)	# Load a word from the source.\n\
> +	st    %r7,0(%r6)	# Store the word in the destination.\n\
> +	ahi   %r5,4\n\
> +	ahi   %r6,4\n\
> +	ltr   %r7,%r7\n\
> +	jne   .L1		# Stop after copying the NULL.\n\
> +	# envp copy loop:\n\
> +.L2:	l     %r7,0(%r5)	# Load a word from the source.\n\
> +	st    %r7,0(%r6)	# Store the word in the destination.\n\
> +	ahi   %r5,4\n\
> +	ahi   %r6,4\n\
> +	ltr   %r7,%r7\n\
> +	jne   .L2		# Stop after copying the NULL.\n\
> +	# Now we have to zero out the envp entries after NULL to allow\n\
> +	# start.S to properly find auxv by skipping zeroes.\n\
> +	# zero out loop:\n\
> +	lhi   %r7,0\n\
> +.L3:	st    %r7,0(%r6)	# Store zero.\n\
> +	ahi   %r6,4		# Advance dest pointer.\n\
> +	ahi   %r1,-1		# Subtract one from the word count.\n\
> +	ltr   %r1,%r1\n\
> +	jne    .L3		# Keep copying if the word count is non-zero.\n\
> +	# Set the back chain to zero again\n\
> +.L4:	xc    0(4,%r15),0(%r15)\n\
>  	# The special initializer gets called with the stack just\n\
>  	# as the application's entry point will see it; it can\n\
>  	# switch stacks if it moves these contents over.\n\
> ---
> 
> Cheers,
> Carlos.
>
Andreas Schwab Nov. 11, 2014, 4:33 p.m. UTC | #4
Andreas Krebbel <krebbel@linux.vnet.ibm.com> writes:

> As I understand it an alignment fix is only required if _dl_skip_args is
> odd.  So what about
> checking the least significant bit (tmll) and copying argv/envv down by 4
> bytes? That way we would
> not need to copy anything in 50% of the cases?

I don't think it makes sense optimizing for this case, which is rare
outside of runing the glibc testsuite.

Andreas.
Andreas Krebbel Nov. 11, 2014, 6:19 p.m. UTC | #5
On 11/11/2014 10:49 AM, Carlos O'Donell wrote:
> On 11/11/2014 04:28 AM, Richard Henderson wrote:
>> On 11/11/2014 06:31 AM, Carlos O'Donell wrote:
>>> Any clever ideas on how to fix this without copying up a large
>>> portion of the stack?
>>
>> Nope, because other targets do in fact have to do just that.
> 
> Right.
> 
>> I'm actually surprised that almost all of them don't.  I suppose
>> that just depends on how the ABI is set up to pass parameters to
>> the user _start...
> 
> Yes, on hppa the ABI was such that I didn't depend on the layout,
> but s390 does, thus I think it is non-trivial.
> 
> The argv might be trivial to copy, but the size of envp is unknown,
> and _start from the application side in start.S expects argv and
> envp to be right up against eachother AFAICT. Worse it wants to
> scan past them in order to find auxv. The saving grace is that
> the auxv scan parses an arbitrary number of zeros between envp
> and auxv. So I can move the gap to be between envp and auxv.
> 
>> Fortunately, s390 has a block copy instruction, so the move
>> should be trivial to implement.
> 
> For argv only. What instruction is the block copy? Are you
> talking about lm/stm?
> 
> The most naive fix I have working is as follows (I've handed
> off to Siddhesh to have a look since I'm out of time this
> evening/morning).
> 
> diff --git a/sysdeps/s390/s390-32/dl-machine.h b/sysdeps/s390/s390-32/dl-machine.h
> index c56185c..b189552 100644
> --- a/sysdeps/s390/s390-32/dl-machine.h
> +++ b/sysdeps/s390/s390-32/dl-machine.h
> @@ -166,18 +166,47 @@ _dl_start_user:\n\
>  	# See if we were run as a command with the executable file\n\
>  	# name as an extra leading argument.\n\
>  	l     %r1,_dl_skip_args@GOT12(0,%r12)\n\
> -	l     %r1,0(%r1)          # load _dl_skip_args\n\
> +	l     %r1,0(%r1)	# load _dl_skip_args\n\
> +	ltr   %r1,%r1\n\
> +	je    .L4		# Skip the arg adjustment if there were none.\n\
>  	# Get the original argument count.\n\
>  	l     %r0,96(%r15)\n\
>  	# Subtract _dl_skip_args from it.\n\
>  	sr    %r0,%r1\n\
> -	# Adjust the stack pointer to skip _dl_skip_args words.\n\
> -	sll   %r1,2\n\
> -	ar    %r15,%r1\n\
> -	# Set the back chain to zero again\n\
> -	xc    0(4,%r15),0(%r15)\n\
>  	# Store back the modified argument count.\n\
>  	st    %r0,96(%r15)\n\
> +	# Copy argv and envp forward to account for skipped argv entries.\n\
> +	# We skipped at least one argument or we would not get here.\n\
> +	la    %r6,100(%r15)	# Destination pointer i.e. &argv[0]\n\
> +	lr    %r5,%r6\n\
> +	lr    %r0,%r1\n\
> +	sll   %r0,2\n\		# Number of skipped bytes.\n\
> +	ar    %r5,%r0		# Source pointer = Dest + Skipped args.\n\
> +	# argv copy loop:\n\
> +.L1:	l     %r7,0(%r5)	# Load a word from the source.\n\
> +	st    %r7,0(%r6)	# Store the word in the destination.\n\
> +	ahi   %r5,4\n\
> +	ahi   %r6,4\n\
> +	ltr   %r7,%r7\n\
> +	jne   .L1		# Stop after copying the NULL.\n\
> +	# envp copy loop:\n\
> +.L2:	l     %r7,0(%r5)	# Load a word from the source.\n\
> +	st    %r7,0(%r6)	# Store the word in the destination.\n\
> +	ahi   %r5,4\n\
> +	ahi   %r6,4\n\
> +	ltr   %r7,%r7\n\
> +	jne   .L2		# Stop after copying the NULL.\n\
> +	# Now we have to zero out the envp entries after NULL to allow\n\
> +	# start.S to properly find auxv by skipping zeroes.\n\
> +	# zero out loop:\n\
> +	lhi   %r7,0\n\
> +.L3:	st    %r7,0(%r6)	# Store zero.\n\
> +	ahi   %r6,4		# Advance dest pointer.\n\
> +	ahi   %r1,-1		# Subtract one from the word count.\n\
> +	ltr   %r1,%r1\n\
> +	jne    .L3		# Keep copying if the word count is non-zero.\n\
> +	# Set the back chain to zero again\n\
> +.L4:	xc    0(4,%r15),0(%r15)\n\
>  	# The special initializer gets called with the stack just\n\
>  	# as the application's entry point will see it; it can\n\
>  	# switch stacks if it moves these contents over.\n\

The patch is ok to apply with the xc removed. Thanks!

A minor improvement might be to make use of an index reg in the loops. Perhaps something like this
(untested):

 +	# argv copy loop:\n\
 +	lhi   %r8,0
 +.L1:	l     %r7,0(%r8,%r5)	# Load a word from the source.\n\
 +	st    %r7,0(%r8,%r6)	# Store the word in the destination.\n\
 +	ahi   %r8,4\n\
 +	ltr   %r7,%r7\n\
 +	jne   .L1		# Stop after copying the NULL.\n\
 +	# envp copy loop:\n\
 +.L2:	l     %r7,0(%r8,%r5)	# Load a word from the source.\n\
 +	st    %r7,0(%r8,%r6)	# Store the word in the destination.\n\
 +	ahi   %r8,4\n\
 +	ltr   %r7,%r7\n\
 +	jne   .L2		# Stop after copying the NULL.\n\
 +	# Now we have to zero out the envp entries after NULL to allow\n\
 +	# start.S to properly find auxv by skipping zeroes.\n\
 +	# zero out loop:\n\
 +	lhi   %r7,0\n\
 +.L3:	st    %r7,0(%r8,%r6)	# Store zero.\n\
 +	ahi   %r8,4		# Advance dest pointer.\n\
 +	ahi   %r1,-1		# Subtract one from the word count.\n\
 +	ltr   %r1,%r1\n\
 +	jne    .L3		# Keep copying if the word count is non-zero.\n\
 +	# Set the back chain to zero again\n\
 +.L4:

Bye,

-Andreas-
Carlos O'Donell Nov. 12, 2014, 4 a.m. UTC | #6
On 11/11/2014 09:54 AM, Andreas Krebbel wrote:
> Would you like to continue working on it or should we try to take
> over? (Stefan or myself)

We can get it done. It would be great to have you review the final
patch.

Please note that the patch I posted is incomplete, it fails to
readjust _dl_argv which is cached by the loader and needs to be
changed if argv is moved. Simple fix though.

I also don't know how long s390 lasted without this breaking
something. I guess aligned stacks don't really matter all that
much ;-)

Cheers,
Carlos.
Andreas Krebbel Nov. 12, 2014, 9:05 a.m. UTC | #7
On 11/12/2014 05:00 AM, Carlos O'Donell wrote:
> On 11/11/2014 09:54 AM, Andreas Krebbel wrote:
>> Would you like to continue working on it or should we try to take
>> over? (Stefan or myself)
> 
> We can get it done. It would be great to have you review the final
> patch.
> 
> Please note that the patch I posted is incomplete, it fails to
> readjust _dl_argv which is cached by the loader and needs to be
> changed if argv is moved. Simple fix though.
> 
> I also don't know how long s390 lasted without this breaking
> something. I guess aligned stacks don't really matter all that
> much ;-)

The reason probably is that the broken alignment never reaches the executable. In _start.S we
correct the alignment again. So it really is only a problem for code executed between the argv
adjustments done by ld.so and the entry point of the executable.

-Andreas-

> 
> Cheers,
> Carlos.
>
Rich Felker Nov. 12, 2014, 9:46 p.m. UTC | #8
On Wed, Nov 12, 2014 at 10:05:45AM +0100, Andreas Krebbel wrote:
> On 11/12/2014 05:00 AM, Carlos O'Donell wrote:
> > On 11/11/2014 09:54 AM, Andreas Krebbel wrote:
> >> Would you like to continue working on it or should we try to take
> >> over? (Stefan or myself)
> > 
> > We can get it done. It would be great to have you review the final
> > patch.
> > 
> > Please note that the patch I posted is incomplete, it fails to
> > readjust _dl_argv which is cached by the loader and needs to be
> > changed if argv is moved. Simple fix though.
> > 
> > I also don't know how long s390 lasted without this breaking
> > something. I guess aligned stacks don't really matter all that
> > much ;-)
> 
> The reason probably is that the broken alignment never reaches the executable. In _start.S we
> correct the alignment again. So it really is only a problem for code executed between the argv
> adjustments done by ld.so and the entry point of the executable.

How/why does any code get executed in that interval anyway?

Rich
Andreas Schwab Nov. 12, 2014, 10:17 p.m. UTC | #9
Rich Felker <dalias@libc.org> writes:

> On Wed, Nov 12, 2014 at 10:05:45AM +0100, Andreas Krebbel wrote:
>> The reason probably is that the broken alignment never reaches the executable. In _start.S we
>> correct the alignment again. So it really is only a problem for code executed between the argv
>> adjustments done by ld.so and the entry point of the executable.
>
> How/why does any code get executed in that interval anyway?

It's the dynamic linker itself and the init code.

Andreas.
diff mbox

Patch

diff --git a/sysdeps/s390/s390-32/dl-machine.h b/sysdeps/s390/s390-32/dl-machine.h
index c56185c..b189552 100644
--- a/sysdeps/s390/s390-32/dl-machine.h
+++ b/sysdeps/s390/s390-32/dl-machine.h
@@ -166,18 +166,47 @@  _dl_start_user:\n\
 	# See if we were run as a command with the executable file\n\
 	# name as an extra leading argument.\n\
 	l     %r1,_dl_skip_args@GOT12(0,%r12)\n\
-	l     %r1,0(%r1)          # load _dl_skip_args\n\
+	l     %r1,0(%r1)	# load _dl_skip_args\n\
+	ltr   %r1,%r1\n\
+	je    .L4		# Skip the arg adjustment if there were none.\n\
 	# Get the original argument count.\n\
 	l     %r0,96(%r15)\n\
 	# Subtract _dl_skip_args from it.\n\
 	sr    %r0,%r1\n\
-	# Adjust the stack pointer to skip _dl_skip_args words.\n\
-	sll   %r1,2\n\
-	ar    %r15,%r1\n\
-	# Set the back chain to zero again\n\
-	xc    0(4,%r15),0(%r15)\n\
 	# Store back the modified argument count.\n\
 	st    %r0,96(%r15)\n\
+	# Copy argv and envp forward to account for skipped argv entries.\n\
+	# We skipped at least one argument or we would not get here.\n\
+	la    %r6,100(%r15)	# Destination pointer i.e. &argv[0]\n\
+	lr    %r5,%r6\n\
+	lr    %r0,%r1\n\
+	sll   %r0,2\n\		# Number of skipped bytes.\n\
+	ar    %r5,%r0		# Source pointer = Dest + Skipped args.\n\
+	# argv copy loop:\n\
+.L1:	l     %r7,0(%r5)	# Load a word from the source.\n\
+	st    %r7,0(%r6)	# Store the word in the destination.\n\
+	ahi   %r5,4\n\
+	ahi   %r6,4\n\
+	ltr   %r7,%r7\n\
+	jne   .L1		# Stop after copying the NULL.\n\
+	# envp copy loop:\n\
+.L2:	l     %r7,0(%r5)	# Load a word from the source.\n\
+	st    %r7,0(%r6)	# Store the word in the destination.\n\
+	ahi   %r5,4\n\
+	ahi   %r6,4\n\
+	ltr   %r7,%r7\n\
+	jne   .L2		# Stop after copying the NULL.\n\
+	# Now we have to zero out the envp entries after NULL to allow\n\
+	# start.S to properly find auxv by skipping zeroes.\n\
+	# zero out loop:\n\
+	lhi   %r7,0\n\
+.L3:	st    %r7,0(%r6)	# Store zero.\n\
+	ahi   %r6,4		# Advance dest pointer.\n\
+	ahi   %r1,-1		# Subtract one from the word count.\n\
+	ltr   %r1,%r1\n\
+	jne    .L3		# Keep copying if the word count is non-zero.\n\
+	# Set the back chain to zero again\n\
+.L4:	xc    0(4,%r15),0(%r15)\n\
 	# The special initializer gets called with the stack just\n\
 	# as the application's entry point will see it; it can\n\
 	# switch stacks if it moves these contents over.\n\