diff mbox

[PR64164] drop copyrename, integrate into expand

Message ID or7flxhw2r.fsf@livre.home
State New
Headers show

Commit Message

Alexandre Oliva Nov. 5, 2015, 5:08 a.m. UTC
On Sep 23, 2015, Alexandre Oliva <aoliva@redhat.com> wrote:

> @@ -2982,38 +2887,39 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
[snip]
> +      if (GET_CODE (reg) != CONCAT)
> +        stack_parm = reg;
> +      else
> +        /* This will use or allocate a stack slot that we'd rather
> +           avoid.  FIXME: Could we avoid it in more cases?  */
> +        target_reg = reg;

It turns out that we can, and that helps fixing PR67753.  In the end, I
ended up using the ABI-reserved stack slot if there is one, but just
allocating an unsplit complex pseudo fixes all remaining cases that used
to require the allocation of a stack slot.  Yay!

As for pr67753 proper, we emitted the store of the PARALLEL entry_parm
into the stack parm only in the conversion seq, which was ultimately
emitted after the copy from stack_parm to target_reg that was supposed
to copy the value originally in entry_parm.  So we copied an
uninitialized stack slot, and the subsequent store in the conversion seq
was optimized out as dead.

This caused a number of regressions on hppa-linux-gnu.  The fix for this
is to arrange for the copy to target_reg to be emitted in the conversion
seq if the copy to stack_parm was.  I can't determine whether this fix
all reported regressions, but from visual inspection of the generated
code I'm pretty sure it fixes at least gcc.c-torture/execute/pr38969.c.


When we do NOT have an ABI-reserved stack slot, the store of the
PARALLEL entry_parm into the intermediate pseudo doesn't need to go in
the conversion seq (emit_group_store from a PARALLEL to a pseudo only
uses registers, according to another comment in function.c), so I've
simplified that case.


This was regstrapped on x86_64-linux-gnu, i686-linux-gnu,
ppc64-linux-gnu, ppc64el-linux-gnu, and cross-build-tested for all
targets for which I've tested the earlier patches in the patchset.
Ok to install?



[PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg

From: Alexandre Oliva <aoliva@redhat.com>

In assign_parms_setup_block, the copy of args in PARALLELs from
entry_parm to stack_parm is deferred to the parm conversion insn seq,
but the copy from stack_parm to target_reg was inserted in the normal
copy seq, that is executed before the conversion insn seq.  Oops.

We could do away with the need for an actual stack_parm in general,
which would have avoided the need for emitting the copy to target_reg
in the conversion seq, but at least on pa, due to the need for stack
to copy between SI and SF modes, it seems like using the reserved
stack slot is beneficial, so I put in logic to use a pre-reserved
stack slot when there is one, and emit the copy to target_reg in the
conversion seq if stack_parm was set up there.

for  gcc/ChangeLog

	PR rtl-optimization/67753
	PR rtl-optimization/64164
	* function.c (assign_parm_setup_block): Avoid allocating a
	stack slot if we don't have an ABI-reserved one.  Emit the
	copy to target_reg in the conversion seq if the copy from
	entry_parm is in it too.  Don't use the conversion seq to copy
	a PARALLEL to a REG or a CONCAT.
---
 gcc/function.c |   39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

Comments

Richard Biener Nov. 5, 2015, 1:44 p.m. UTC | #1
On Thu, Nov 5, 2015 at 6:08 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Sep 23, 2015, Alexandre Oliva <aoliva@redhat.com> wrote:
>
>> @@ -2982,38 +2887,39 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
> [snip]
>> +      if (GET_CODE (reg) != CONCAT)
>> +        stack_parm = reg;
>> +      else
>> +        /* This will use or allocate a stack slot that we'd rather
>> +           avoid.  FIXME: Could we avoid it in more cases?  */
>> +        target_reg = reg;
>
> It turns out that we can, and that helps fixing PR67753.  In the end, I
> ended up using the ABI-reserved stack slot if there is one, but just
> allocating an unsplit complex pseudo fixes all remaining cases that used
> to require the allocation of a stack slot.  Yay!
>
> As for pr67753 proper, we emitted the store of the PARALLEL entry_parm
> into the stack parm only in the conversion seq, which was ultimately
> emitted after the copy from stack_parm to target_reg that was supposed
> to copy the value originally in entry_parm.  So we copied an
> uninitialized stack slot, and the subsequent store in the conversion seq
> was optimized out as dead.
>
> This caused a number of regressions on hppa-linux-gnu.  The fix for this
> is to arrange for the copy to target_reg to be emitted in the conversion
> seq if the copy to stack_parm was.  I can't determine whether this fix
> all reported regressions, but from visual inspection of the generated
> code I'm pretty sure it fixes at least gcc.c-torture/execute/pr38969.c.
>
>
> When we do NOT have an ABI-reserved stack slot, the store of the
> PARALLEL entry_parm into the intermediate pseudo doesn't need to go in
> the conversion seq (emit_group_store from a PARALLEL to a pseudo only
> uses registers, according to another comment in function.c), so I've
> simplified that case.
>
>
> This was regstrapped on x86_64-linux-gnu, i686-linux-gnu,
> ppc64-linux-gnu, ppc64el-linux-gnu, and cross-build-tested for all
> targets for which I've tested the earlier patches in the patchset.
> Ok to install?

Ok.

Thanks,
Richard.

>
>
> [PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg
>
> From: Alexandre Oliva <aoliva@redhat.com>
>
> In assign_parms_setup_block, the copy of args in PARALLELs from
> entry_parm to stack_parm is deferred to the parm conversion insn seq,
> but the copy from stack_parm to target_reg was inserted in the normal
> copy seq, that is executed before the conversion insn seq.  Oops.
>
> We could do away with the need for an actual stack_parm in general,
> which would have avoided the need for emitting the copy to target_reg
> in the conversion seq, but at least on pa, due to the need for stack
> to copy between SI and SF modes, it seems like using the reserved
> stack slot is beneficial, so I put in logic to use a pre-reserved
> stack slot when there is one, and emit the copy to target_reg in the
> conversion seq if stack_parm was set up there.
>
> for  gcc/ChangeLog
>
>         PR rtl-optimization/67753
>         PR rtl-optimization/64164
>         * function.c (assign_parm_setup_block): Avoid allocating a
>         stack slot if we don't have an ABI-reserved one.  Emit the
>         copy to target_reg in the conversion seq if the copy from
>         entry_parm is in it too.  Don't use the conversion seq to copy
>         a PARALLEL to a REG or a CONCAT.
> ---
>  gcc/function.c |   39 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/function.c b/gcc/function.c
> index aaf49a4..156c72b 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -2879,6 +2879,7 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>    rtx entry_parm = data->entry_parm;
>    rtx stack_parm = data->stack_parm;
>    rtx target_reg = NULL_RTX;
> +  bool in_conversion_seq = false;
>    HOST_WIDE_INT size;
>    HOST_WIDE_INT size_stored;
>
> @@ -2895,9 +2896,23 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>        if (GET_CODE (reg) != CONCAT)
>         stack_parm = reg;
>        else
> -       /* This will use or allocate a stack slot that we'd rather
> -          avoid.  FIXME: Could we avoid it in more cases?  */
> -       target_reg = reg;
> +       {
> +         target_reg = reg;
> +         /* Avoid allocating a stack slot, if there isn't one
> +            preallocated by the ABI.  It might seem like we should
> +            always prefer a pseudo, but converting between
> +            floating-point and integer modes goes through the stack
> +            on various machines, so it's better to use the reserved
> +            stack slot than to risk wasting it and allocating more
> +            for the conversion.  */
> +         if (stack_parm == NULL_RTX)
> +           {
> +             int save = generating_concat_p;
> +             generating_concat_p = 0;
> +             stack_parm = gen_reg_rtx (mode);
> +             generating_concat_p = save;
> +           }
> +       }
>        data->stack_parm = NULL;
>      }
>
> @@ -2938,7 +2953,9 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>        mem = validize_mem (copy_rtx (stack_parm));
>
>        /* Handle values in multiple non-contiguous locations.  */
> -      if (GET_CODE (entry_parm) == PARALLEL)
> +      if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (mem))
> +       emit_group_store (mem, entry_parm, data->passed_type, size);
> +      else if (GET_CODE (entry_parm) == PARALLEL)
>         {
>           push_to_sequence2 (all->first_conversion_insn,
>                              all->last_conversion_insn);
> @@ -2946,6 +2963,7 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>           all->first_conversion_insn = get_insns ();
>           all->last_conversion_insn = get_last_insn ();
>           end_sequence ();
> +         in_conversion_seq = true;
>         }
>
>        else if (size == 0)
> @@ -3025,11 +3043,22 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>        all->first_conversion_insn = get_insns ();
>        all->last_conversion_insn = get_last_insn ();
>        end_sequence ();
> +      in_conversion_seq = true;
>      }
>
>    if (target_reg)
>      {
> -      emit_move_insn (target_reg, stack_parm);
> +      if (!in_conversion_seq)
> +       emit_move_insn (target_reg, stack_parm);
> +      else
> +       {
> +         push_to_sequence2 (all->first_conversion_insn,
> +                            all->last_conversion_insn);
> +         emit_move_insn (target_reg, stack_parm);
> +         all->first_conversion_insn = get_insns ();
> +         all->last_conversion_insn = get_last_insn ();
> +         end_sequence ();
> +       }
>        stack_parm = target_reg;
>      }
>
>
>
> --
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Alan Lawrence Nov. 10, 2015, 3:31 p.m. UTC | #2
On 05/11/15 05:08, Alexandre Oliva wrote:
> [PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg
> for  gcc/ChangeLog
>
> 	PR rtl-optimization/67753
> 	PR rtl-optimization/64164
> 	* function.c (assign_parm_setup_block): Avoid allocating a
> 	stack slot if we don't have an ABI-reserved one.  Emit the
> 	copy to target_reg in the conversion seq if the copy from
> 	entry_parm is in it too.  Don't use the conversion seq to copy
> 	a PARALLEL to a REG or a CONCAT.

Since this change, we have on aarch64_be:

FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O1
FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O2
FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O3 -g
FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -Os
FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -Og -g

The difference in the assembler looks as follows (this is at -Og):

  func_return_val_10:
-	sub	sp, sp, #16
-	lsr	x2, x1, 48
-	lsr	x1, x1, 32
+	ubfx	x2, x1, 16, 16
  	fmov	x3, d0
  	// Start of user assembly
  // 23 "func-ret-4.c" 1
  	mov x0, x30
  // 0 "" 2
  	// End of user assembly
  	adrp	x3, saved_return_address
  	str	x0, [x3, #:lo12:saved_return_address]
  	adrp	x0, myfunc
  	add	x0, x0, :lo12:myfunc
  	// Start of user assembly
  // 23 "func-ret-4.c" 1
  	mov x30, x0
  // 0 "" 2
  	// End of user assembly
  	bfi	w0, w2, 16, 16
  	bfi	w0, w1, 0, 16
  	lsl	x0, x0, 32
-	add	sp, sp, 16

(ubfx is a bitfield extract, the first immediate is the lsbit, the second the 
width. lsr = logical shift right.) And in the RTL dump, this (before the patch):

(insn 4 3 5 2 (set (mem/c:DI (plus:DI (reg/f:DI 68 virtual-stack-vars)
                 (const_int -8 [0xfffffffffffffff8])) [0 t+0 S8 A64])
         (reg:DI 1 x1)) func-ret-4.c:23 -1
      (nil))
(insn 5 4 6 2 (set (reg:HI 78 [ t ])
         (mem/c:HI (plus:DI (reg/f:DI 68 virtual-stack-vars)
                 (const_int -8 [0xfffffffffffffff8])) [0 t+0 S2 A64])) 
func-ret-4.c:23 -1
      (nil))
(insn 6 5 7 2 (set (reg:HI 79 [ t+2 ])
         (mem/c:HI (plus:DI (reg/f:DI 68 virtual-stack-vars)
                 (const_int -6 [0xfffffffffffffffa])) [0 t+2 S2 A16])) 
func-ret-4.c:23 -1
      (nil))

becomes (after the patch):

(insn 4 3 5 2 (set (subreg:SI (reg:CHI 80) 0)
         (reg:SI 1 x1 [ t ])) func-ret-4.c:23 -1
      (nil))
(insn 5 4 6 2 (set (reg:SI 81)
         (subreg:SI (reg:CHI 80) 0)) func-ret-4.c:23 -1
      (nil))
(insn 6 5 7 2 (set (subreg:DI (reg:HI 82) 0)
         (zero_extract:DI (subreg:DI (reg:SI 81) 0)
             (const_int 16 [0x10])
             (const_int 16 [0x10]))) func-ret-4.c:23 -1
      (nil))
(insn 7 6 8 2 (set (reg:HI 78 [ t ])
         (reg:HI 82)) func-ret-4.c:23 -1
      (nil))
(insn 8 7 9 2 (set (reg:SI 83)
         (subreg:SI (reg:CHI 80) 0)) func-ret-4.c:23 -1
      (nil))
(insn 9 8 10 2 (set (reg:HI 79 [ t+2 ])
         (subreg:HI (reg:SI 83) 2)) func-ret-4.c:23 -1
      (nil))

--Alan
diff mbox

Patch

diff --git a/gcc/function.c b/gcc/function.c
index aaf49a4..156c72b 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -2879,6 +2879,7 @@  assign_parm_setup_block (struct assign_parm_data_all *all,
   rtx entry_parm = data->entry_parm;
   rtx stack_parm = data->stack_parm;
   rtx target_reg = NULL_RTX;
+  bool in_conversion_seq = false;
   HOST_WIDE_INT size;
   HOST_WIDE_INT size_stored;
 
@@ -2895,9 +2896,23 @@  assign_parm_setup_block (struct assign_parm_data_all *all,
       if (GET_CODE (reg) != CONCAT)
 	stack_parm = reg;
       else
-	/* This will use or allocate a stack slot that we'd rather
-	   avoid.  FIXME: Could we avoid it in more cases?  */
-	target_reg = reg;
+	{
+	  target_reg = reg;
+	  /* Avoid allocating a stack slot, if there isn't one
+	     preallocated by the ABI.  It might seem like we should
+	     always prefer a pseudo, but converting between
+	     floating-point and integer modes goes through the stack
+	     on various machines, so it's better to use the reserved
+	     stack slot than to risk wasting it and allocating more
+	     for the conversion.  */
+	  if (stack_parm == NULL_RTX)
+	    {
+	      int save = generating_concat_p;
+	      generating_concat_p = 0;
+	      stack_parm = gen_reg_rtx (mode);
+	      generating_concat_p = save;
+	    }
+	}
       data->stack_parm = NULL;
     }
 
@@ -2938,7 +2953,9 @@  assign_parm_setup_block (struct assign_parm_data_all *all,
       mem = validize_mem (copy_rtx (stack_parm));
 
       /* Handle values in multiple non-contiguous locations.  */
-      if (GET_CODE (entry_parm) == PARALLEL)
+      if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (mem))
+	emit_group_store (mem, entry_parm, data->passed_type, size);
+      else if (GET_CODE (entry_parm) == PARALLEL)
 	{
 	  push_to_sequence2 (all->first_conversion_insn,
 			     all->last_conversion_insn);
@@ -2946,6 +2963,7 @@  assign_parm_setup_block (struct assign_parm_data_all *all,
 	  all->first_conversion_insn = get_insns ();
 	  all->last_conversion_insn = get_last_insn ();
 	  end_sequence ();
+	  in_conversion_seq = true;
 	}
 
       else if (size == 0)
@@ -3025,11 +3043,22 @@  assign_parm_setup_block (struct assign_parm_data_all *all,
       all->first_conversion_insn = get_insns ();
       all->last_conversion_insn = get_last_insn ();
       end_sequence ();
+      in_conversion_seq = true;
     }
 
   if (target_reg)
     {
-      emit_move_insn (target_reg, stack_parm);
+      if (!in_conversion_seq)
+	emit_move_insn (target_reg, stack_parm);
+      else
+	{
+	  push_to_sequence2 (all->first_conversion_insn,
+			     all->last_conversion_insn);
+	  emit_move_insn (target_reg, stack_parm);
+	  all->first_conversion_insn = get_insns ();
+	  all->last_conversion_insn = get_last_insn ();
+	  end_sequence ();
+	}
       stack_parm = target_reg;
     }