diff mbox

[PR64164] drop copyrename, integrate into expand

Message ID or4mgtmpgs.fsf@livre.home
State New
Headers show

Commit Message

Alexandre Oliva Nov. 10, 2015, 10:58 p.m. UTC
On Nov 10, 2015, Alan Lawrence <alan.lawrence@arm.com> wrote:

> FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O2

Ugh, sorry.  I even checked that testcase by hand before submitting the
patch, because I knew it took the paths I was changing, but I didn't
realize the stack store and load would amount to shifts when the stack
slot was bypassed.

With the following patch, we get a lsr and a ubfx, without the sp
adjustments.  Please let me know if it causes any further problems.  So
far, I've tested it on x86_64-linux-gnu, i686-linux-gnu, and
ppc64le-linux-gnu; the ppc64-linux-gnu test run is running slower and
probably won't be done before I call it a day, but I wanted to give you
something before taking off for the day.

Is this ok to install if ppc64-linux-gnu also regstraps successfully?


[PR67753] adjust for padding when bypassing memory in assign_parm_setup_block

From: Alexandre Oliva <aoliva@redhat.com>

Storing a register in memory as a full word and then accessing the
same memory address under a smaller-than-word mode amounts to
right-shifting of the register word on big endian machines.  So, if
BLOCK_REG_PADDING chooses upward padding for BYTES_BIG_ENDIAN, and
we're copying from the entry_parm REG directly to a pseudo, bypassing
any stack slot, perform the shifting explicitly.

This fixes the miscompile of function_return_val_10 in
gcc.target/aarch64/aapcs64/func-ret-4.c for target aarch64_be-elf
introduced in the first patch for 67753.

for  gcc/ChangeLog

	PR rtl-optimization/67753
	PR rtl-optimization/64164
	* function.c (assign_parm_setup_block): Right-shift
	upward-padded big-endian args when bypassing the stack slot.
---
 gcc/function.c |   44 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

Comments

Jeff Law Nov. 10, 2015, 11:42 p.m. UTC | #1
On 11/10/2015 03:58 PM, Alexandre Oliva wrote:
> On Nov 10, 2015, Alan Lawrence <alan.lawrence@arm.com> wrote:
>
>> FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O2
>
> Ugh, sorry.  I even checked that testcase by hand before submitting the
> patch, because I knew it took the paths I was changing, but I didn't
> realize the stack store and load would amount to shifts when the stack
> slot was bypassed.
>
> With the following patch, we get a lsr and a ubfx, without the sp
> adjustments.  Please let me know if it causes any further problems.  So
> far, I've tested it on x86_64-linux-gnu, i686-linux-gnu, and
> ppc64le-linux-gnu; the ppc64-linux-gnu test run is running slower and
> probably won't be done before I call it a day, but I wanted to give you
> something before taking off for the day.
>
> Is this ok to install if ppc64-linux-gnu also regstraps successfully?
>
>
> [PR67753] adjust for padding when bypassing memory in assign_parm_setup_block
>
> From: Alexandre Oliva <aoliva@redhat.com>
>
> Storing a register in memory as a full word and then accessing the
> same memory address under a smaller-than-word mode amounts to
> right-shifting of the register word on big endian machines.  So, if
> BLOCK_REG_PADDING chooses upward padding for BYTES_BIG_ENDIAN, and
> we're copying from the entry_parm REG directly to a pseudo, bypassing
> any stack slot, perform the shifting explicitly.
>
> This fixes the miscompile of function_return_val_10 in
> gcc.target/aarch64/aapcs64/func-ret-4.c for target aarch64_be-elf
> introduced in the first patch for 67753.
>
> for  gcc/ChangeLog
>
> 	PR rtl-optimization/67753
> 	PR rtl-optimization/64164
> 	* function.c (assign_parm_setup_block): Right-shift
> 	upward-padded big-endian args when bypassing the stack slot.
Don't you need to check the value of BLOCK_REG_PADDING at runtime?  The 
padding is essentially allowed to vary.

If you  look at the other places where BLOCK_REG_PADDING is used, it's 
checked in a #ifdef, then again inside a if conditional.



Jeff
Alexandre Oliva Nov. 11, 2015, 6:10 p.m. UTC | #2
On Nov 10, 2015, Jeff Law <law@redhat.com> wrote:

>> * function.c (assign_parm_setup_block): Right-shift
>> upward-padded big-endian args when bypassing the stack slot.
> Don't you need to check the value of BLOCK_REG_PADDING at runtime?
> The padding is essentially allowed to vary.

Well, yeah, it's the result of BLOCK_REG_PADDING that tells whether
upward-padding occurred and shifting is required.

> If you  look at the other places where BLOCK_REG_PADDING is used, it's
> checked in a #ifdef, then again inside a if conditional.

That's what I do in the patch too.

That said, the initial conditions in the if/else-if/else chain for the
no-larger-than-a-word case cover all of the non-BLOCK_REG_PADDING cases
correctly, so that, if BLOCK_REG_PADDING is not defined, we can just
skip the !MEM_P block altogether.  That's also the reason why we can go
straight to shifting when we get there.

I tried to document my reasoning in the comments, but maybe it was still
too obscure?
Jeff Law Nov. 13, 2015, 6:33 a.m. UTC | #3
On 11/11/2015 11:10 AM, Alexandre Oliva wrote:
> On Nov 10, 2015, Jeff Law <law@redhat.com> wrote:
>
>>> * function.c (assign_parm_setup_block): Right-shift
>>> upward-padded big-endian args when bypassing the stack slot.
>> Don't you need to check the value of BLOCK_REG_PADDING at runtime?
>> The padding is essentially allowed to vary.
>
> Well, yeah, it's the result of BLOCK_REG_PADDING that tells whether
> upward-padding occurred and shifting is required.
>
>> If you  look at the other places where BLOCK_REG_PADDING is used, it's
>> checked in a #ifdef, then again inside a if conditional.
>
> That's what I do in the patch too.
?  I don't see the runtime check in your patch.  I see a couple 
gcc_asserts, but no runtime check of BLOCK_REG_PADDING.

>
> That said, the initial conditions in the if/else-if/else chain for the
> no-larger-than-a-word case cover all of the non-BLOCK_REG_PADDING cases
> correctly, so that, if BLOCK_REG_PADDING is not defined, we can just
> skip the !MEM_P block altogether.  That's also the reason why we can go
> straight to shifting when we get there.
>
> I tried to document my reasoning in the comments, but maybe it was still
> too obscure?
Certainly seems that way.  Is it your assertion that the new code is 
what we want regardless of the *value* of REG_BLOCK_PADDING? 
Essentially meaning the check in the IF is covering both cases?

What am I missing here?

Jeff
Alexandre Oliva Nov. 17, 2015, 12:07 a.m. UTC | #4
On Nov 13, 2015, Jeff Law <law@redhat.com> wrote:

> On 11/11/2015 11:10 AM, Alexandre Oliva wrote:
>> On Nov 10, 2015, Jeff Law <law@redhat.com> wrote:
>> 
>>>> * function.c (assign_parm_setup_block): Right-shift
>>>> upward-padded big-endian args when bypassing the stack slot.
>>> Don't you need to check the value of BLOCK_REG_PADDING at runtime?
>>> The padding is essentially allowed to vary.
>> 
>> Well, yeah, it's the result of BLOCK_REG_PADDING that tells whether
>> upward-padding occurred and shifting is required.
>> 
>>> If you  look at the other places where BLOCK_REG_PADDING is used, it's
>>> checked in a #ifdef, then again inside a if conditional.
>> 
>> That's what I do in the patch too.
> ?  I don't see the runtime check in your patch.  I see a couple
> gcc_asserts, but no runtime check of BLOCK_REG_PADDING.

The check is not in my patch, indeed.  That's because the previous block
performs the runtime check, and it only lets through two cases: the one
we handle, and the one nobody uses.

The previous block tests this:

	  if (mode != BLKmode
#ifdef BLOCK_REG_PADDING
	      && (size == UNITS_PER_WORD
		  || (BLOCK_REG_PADDING (mode, data->passed_type, 1)
		      != (BYTES_BIG_ENDIAN ? upward : downward)))
#endif
	      )

i.e., whether we know the mode of the passed value, and its word-sized,
or its padded such that the passed value is in the lowpart of the word.
Since this is in a block that runs when size <= UNITS_PER_WORD, this
catches (and works for) all cases of default padding (when
BLOCK_REG_PADDING is not defined), and for cases in which
BLOCK_REG_PADDING is defined so as to behave like the default padding,
at least for smaller-than-word modes.

So, since this handles little-endian bytes with upward padding and
big-endian bytes with downward padding, what remains to be handled is
little-endian bytes with downward padding and big-endian bytes with
upward padding.  I found no evidence that the former is ever used
anywhere, or why anyone would ever force shifting for both REG and MEM
use, and I don't see how the code would have dealt with this case
anyway, so I left it unhandled.  The other case, big-endian bytes with
upward padding, is precisely the one that my previous patch broke on
AArch64: we have the passed values pushed to the upper part of the REG
so that it could be stored in memory as a whole word and then accessed
in the smaller mode at the same address.  After checking that this is
the case at hand, we shift the value as if we stored it in memory as a
word and loaded it in the value mode.

>> That said, the initial conditions in the if/else-if/else chain for the
>> no-larger-than-a-word case cover all of the non-BLOCK_REG_PADDING cases
>> correctly, so that, if BLOCK_REG_PADDING is not defined, we can just
>> skip the !MEM_P block altogether.  That's also the reason why we can go
>> straight to shifting when we get there.
>> 
>> I tried to document my reasoning in the comments, but maybe it was still
>> too obscure?
> Certainly seems that way.  Is it your assertion that the new code is
> what we want regardless of the *value* of REG_BLOCK_PADDING?

Sort of.  If we get to that point, there's only one reasonable value of
BLOCK_REG_PADDING (*), although there's another possible value that we
historically haven't handled and that makes very little sense to
support.  We'd have silently corrupted it before, while now we'd get an
assertion failure.  I count that as an improvement, though it's unlikely
we'd ever hit it: anyone trying to define BLOCK_REG_PADDING so as to pad
small args downward on little-endian bytes would AFAICT soon find out it
doesn't work.

(*) unless mode is BLKmode, which the newly-added code implicitly
excludes by testing that we don't have a MEM, but rather a REG.

> Essentially meaning the check in the IF is covering both cases?

Among all cases for arguments that are word-sized or smaller, the
initial IF (not present in the patch) covers all of the "usual" cases.
The remaining blocks, including the one I added, cover the remaining
handled case, namely, BIG_ENDIAN_BYTES and upward BLOCK_REG_PADDING, or
BLKmode BIG_ENDIAN_BYTES and downward BLOCK_REG_PADDING (that needs the
opposite padding when storing to big-endian mem, so that the value can
be accessed at the address in which the full word is stored).

> What am I missing here?

I agree that the way the remaining tests are written doesn't make it
clear that they're all handling a single case, which makes things
confusing.  In part, that's because they really aren't; they also deal
with BLKmode MEMs with "usual" padding.  But that's not a case that the
patch affects, because we don't have BLKmode REGs.

Any suggestions on how to improve the comments so that they convey
enough of this reasoning to make sense, without our having to write a
book :-) on the topic?

Thanks,
Jeff Law Nov. 24, 2015, 5:38 a.m. UTC | #5
On 11/16/2015 05:07 PM, Alexandre Oliva wrote:
>
> The check is not in my patch, indeed.  That's because the previous block
> performs the runtime check, and it only lets through two cases: the one
> we handle, and the one nobody uses.
That was the conclusion I was starting to come to, but expressed so 
poorly in my last message.  Sadly it was non-obvious from staring at the 
current code.  Though I must admit that after a week, I can see it 
better now.  Maybe that's a result of re-reading your message a 
half-dozen more times with the current code and your patch all visible 
in windows next to each other :-)


Prior to your change we'd just blindly copy from ENTRY_PARM to MEM, 
which would result in missing the implicit shift if MEM wasn't actually 
a memory.

You're just moving that conditional up and handling the shift 
explicitly.  You've got asserts for the cases you're not handling (and 
no, I'm not aware of the need for this on any LE architecture, while I 
am aware of BE architectures that align in both directions).


> Any suggestions on how to improve the comments so that they convey
> enough of this reasoning to make sense, without our having to write a
> book :-) on the topic?
Refer back to this thread? :-)  Seriously though, looking at things a 
week later, I can see it much better now.  Thanks for your patience on this.

OK for the trunk,
jeff
diff mbox

Patch

diff --git a/gcc/function.c b/gcc/function.c
index a637cb3..1ee092c 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3002,6 +3002,38 @@  assign_parm_setup_block (struct assign_parm_data_all *all,
 	      emit_move_insn (change_address (mem, mode, 0), reg);
 	    }
 
+#ifdef BLOCK_REG_PADDING
+	  /* Storing the register in memory as a full word, as
+	     move_block_from_reg below would do, and then using the
+	     MEM in a smaller mode, has the effect of shifting right
+	     if BYTES_BIG_ENDIAN.  If we're bypassing memory, the
+	     shifting must be explicit.  */
+	  else if (!MEM_P (mem))
+	    {
+	      rtx x;
+
+	      /* If the assert below fails, we should have taken the
+		 mode != BLKmode path above, unless we have downward
+		 padding of smaller-than-word arguments on a machine
+		 with little-endian bytes, which would likely require
+		 additional changes to work correctly.  */
+	      gcc_checking_assert (BYTES_BIG_ENDIAN
+				   && (BLOCK_REG_PADDING (mode,
+							  data->passed_type, 1)
+				       == upward));
+
+	      int by = (UNITS_PER_WORD - size) * BITS_PER_UNIT;
+
+	      x = gen_rtx_REG (word_mode, REGNO (entry_parm));
+	      x = expand_shift (RSHIFT_EXPR, word_mode, x, by,
+				NULL_RTX, 1);
+	      x = force_reg (word_mode, x);
+	      x = gen_lowpart_SUBREG (GET_MODE (mem), x);
+
+	      emit_move_insn (mem, x);
+	    }
+#endif
+
 	  /* Blocks smaller than a word on a BYTES_BIG_ENDIAN
 	     machine must be aligned to the left before storing
 	     to memory.  Note that the previous test doesn't
@@ -3023,14 +3055,20 @@  assign_parm_setup_block (struct assign_parm_data_all *all,
 	      tem = change_address (mem, word_mode, 0);
 	      emit_move_insn (tem, x);
 	    }
-	  else if (!MEM_P (mem))
-	    emit_move_insn (mem, entry_parm);
 	  else
 	    move_block_from_reg (REGNO (entry_parm), mem,
 				 size_stored / UNITS_PER_WORD);
 	}
       else if (!MEM_P (mem))
-	emit_move_insn (mem, entry_parm);
+	{
+	  gcc_checking_assert (size > UNITS_PER_WORD);
+#ifdef BLOCK_REG_PADDING
+	  gcc_checking_assert (BLOCK_REG_PADDING (GET_MODE (mem),
+						  data->passed_type, 0)
+			       == upward);
+#endif
+	  emit_move_insn (mem, entry_parm);
+	}
       else
 	move_block_from_reg (REGNO (entry_parm), mem,
 			     size_stored / UNITS_PER_WORD);