diff mbox

[RFA,middle-end/53623] Improve extension elimination

Message ID 52B3CE40.6090103@redhat.com
State New
Headers show

Commit Message

Jeff Law Dec. 20, 2013, 4:57 a.m. UTC
PR53623 is a case where transformations by the tree-ssa optimizers are 
inhibiting the RTL optimizers.

As noted in c#5 we have something like this:

   (set (reg:HI %ax) (mem:HI (whatever)))
   (set (reg:DI %rdx) (sign_extend:DI (reg:HI %ax))
   (set (reg:DI %rax) (zero_extend:DI (reg:QI %al))

The first extension can't be combined with the memory ref by REE because 
the source & destination registers are different.  The second extention 
can't be combined either because of the existence of the first extension.

What we'd really like is something like this:

   (set (reg:DI %rdx) (sign_extend:DI (mem:HI (whatever))))
   (set (reg:DI %rax) (zero_extend:DI (reg:QI %rdx))


Jakub had the idea that we could extend REE to handle elimination of 
extentions where the source and destination registers are not the same. 
  That's precisely what this patch does.

Rather than create the desired form directly, we instead generate this:

   (set (reg:DI %rdx) (sign_extend:DI (mem:HI (whatever))))
   (set (reg:DI %rax) (reg:DI $rdx)
   (set (reg:DI %rax) (zero_extend:DI (reg:QI %rax))


With the structure of REE, that's actually far simpler to generate and 
the hard register copy propagation which runs after REE will propagate 
away the unnecessary copy.

The extension is pretty simple.  First we remove the check that the 
source & destination registers must be the same in 
add_removable_extension.  That gets the first extension on the list of 
things to consider.

combine_reaching_defs has all the checks to ensure this is safe.  When 
the source & destination in the extension are not the same register, we 
require that:

1. There only be one reaching def for the source register of the 
extention.  This means we only have to fix up that single reaching def 
rather than multiple reaching defs.  We could certainly handle multiple 
reaching defs if we felt the need in the future, it's just more work.

2. We require both insns to not have already been modified.  This is 
probably overly conservative too.

3. We require both insns to be in the same block.  This simplifies the 
checks is #4.

4. We require the destination of the extension to be neither set nor 
modified between the memory reference and the extension.

[ All other tests for REE must also be passed, those are additional 
tests for this specific case. ]

Assuming all those tests pass, we allow combination of the extension 
with reaching def statement.  We change the destination of the reaching 
def to be the destination of the extension and delete the now dead 
extension.

The problem is that there may be other uses of the original destination 
of the memory load.  So we copy from the new destination of the load 
back into the old destination of the load.

And it "just works".  It triggers all over the place in a bootstrap.  A 
slightly older version triggered 24000 times during a bootstrap (yes, 
24000).

Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Obviously fixes 53623 (P2 regression).  OK for the trunk?


Jeff
* ree.c (combine_set_extension): Handle case where source
	and destination registers in an extension insn are different.
	(combine_reaching_defs): Allow source and destination
	registers in extension to be different under limited
	circumstances.
	(add_removable_extension): Remove restriction that the
	source and destination registers in the extension are the
	same.
	(find_and_remove_re): Emit a copy from the extension's
	destination to its source after the defining insn if
	the source and destination registers are different.


testsuite/

	* gcc.target/i386/pr53623.c: New test.

Comments

Jakub Jelinek Dec. 20, 2013, 8:24 a.m. UTC | #1
On Thu, Dec 19, 2013 at 09:57:36PM -0700, Jeff Law wrote:
> 	* ree.c (combine_set_extension): Handle case where source
> 	and destination registers in an extension insn are different.
> 	(combine_reaching_defs): Allow source and destination
> 	registers in extension to be different under limited
> 	circumstances.
> 	(add_removable_extension): Remove restriction that the
> 	source and destination registers in the extension are the
> 	same.
> 	(find_and_remove_re): Emit a copy from the extension's
> 	destination to its source after the defining insn if
> 	the source and destination registers are different.
> 
> 
> testsuite/
> 
> 	* gcc.target/i386/pr53623.c: New test.

Thanks for working on this, the only thing I'd worry about are
HARD_REGNO_NREGS > 1 registers if the two hard regs might overlap.
Perhaps it is fine as is and dunno how many targets actually allow partial
overlap in between the multi-register REGs.  If you aren't sure this
would be handled properly, perhaps the check could go to
add_removable_extension below the REG_P check add
  && !reg_overlap_mentioned_p (dest, XEXP (src, 0))

> diff --git a/gcc/ree.c b/gcc/ree.c
> index 9938e98..63ad86c 100644
> --- a/gcc/ree.c
> +++ b/gcc/ree.c
> @@ -282,9 +282,21 @@ static bool
>  combine_set_extension (ext_cand *cand, rtx curr_insn, rtx *orig_set)
>  {
>    rtx orig_src = SET_SRC (*orig_set);
> -  rtx new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (*orig_set)));
>    rtx new_set;
>  
> +  /* If the extension's source/destination registers are not the same
> +     then we need to change the original load to reference the destination
> +     of the extension.  Then we need to emit a copy from that destination
> +     to the original destination of the load.  */
> +  rtx new_reg;
> +  bool copy_needed
> +    = REGNO (SET_DEST (PATTERN (cand->insn)))
> +	     != REGNO (XEXP (SET_SRC (PATTERN (cand->insn)), 0));

Perhaps the right formatting here would be
  bool copy_needed
    = (REGNO (SET_DEST (PATTERN (cand->insn)))
       != REGNO (XEXP (SET_SRC (PATTERN (cand->insn)), 0)));
? ()s for emacs, and aligning != under REGNO.

> +  if (copy_needed)
> +    new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (PATTERN (cand->insn))));

Too long line.

Looks good to me otherwise.

	Jakub
Jeff Law Dec. 20, 2013, 4:26 p.m. UTC | #2
On 12/20/13 01:24, Jakub Jelinek wrote:
>
> Thanks for working on this, the only thing I'd worry about are
> HARD_REGNO_NREGS > 1 registers if the two hard regs might overlap.
The reg_set_between_p and reg_used_between_p calls when you dig down 
into them eventually use reg_overlap_mentioned_p which should do the 
right thing in this regard.

I'll audit ree for problems of this nature.

>>
>> +  /* If the extension's source/destination registers are not the same
>> +     then we need to change the original load to reference the destination
>> +     of the extension.  Then we need to emit a copy from that destination
>> +     to the original destination of the load.  */
>> +  rtx new_reg;
>> +  bool copy_needed
>> +    = REGNO (SET_DEST (PATTERN (cand->insn)))
>> +	     != REGNO (XEXP (SET_SRC (PATTERN (cand->insn)), 0));
>
> Perhaps the right formatting here would be
>    bool copy_needed
>      = (REGNO (SET_DEST (PATTERN (cand->insn)))
>         != REGNO (XEXP (SET_SRC (PATTERN (cand->insn)), 0)));
> ? ()s for emacs, and aligning != under REGNO.
Agreed.  I was factoring out that test and didn't fix the formatting.

>
>> +  if (copy_needed)
>> +    new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (PATTERN (cand->insn))));
>
> Too long line.
Ugh.  80 exactly.  There are times I hate our conventions :(

jeff
Jakub Jelinek Dec. 20, 2013, 4:45 p.m. UTC | #3
On Fri, Dec 20, 2013 at 09:26:10AM -0700, Jeff Law wrote:
> >Thanks for working on this, the only thing I'd worry about are
> >HARD_REGNO_NREGS > 1 registers if the two hard regs might overlap.
> The reg_set_between_p and reg_used_between_p calls when you dig down
> into them eventually use reg_overlap_mentioned_p which should do the
> right thing in this regard.
> 
> I'll audit ree for problems of this nature.

The two reg_*_between_p functions check only insns in between the two, but
not the insns themselves.  What I meant is if it e.g. couldn't be possible
to have HARD_REGNO_NREGS == 2 registers say 1 and 2, where the first insn
would load into the low half of 1, then one insn that say sign extends
1 into {2,3}, then perhaps {2,3} is used and finally 1 is zero extended
into {1,2} and that is used later on.  For little endian this would work,
while after the transformation which sign extends the memory into {2,3}
and then copies that into {1,2} (thus overwriting content of low half of
{2,3} with the high half of it).  Perhaps the RA will never allow this.

	Jakub
Jeff Law Dec. 20, 2013, 5:17 p.m. UTC | #4
On 12/20/13 09:45, Jakub Jelinek wrote:
> On Fri, Dec 20, 2013 at 09:26:10AM -0700, Jeff Law wrote:
>>> Thanks for working on this, the only thing I'd worry about are
>>> HARD_REGNO_NREGS > 1 registers if the two hard regs might overlap.
>> The reg_set_between_p and reg_used_between_p calls when you dig down
>> into them eventually use reg_overlap_mentioned_p which should do the
>> right thing in this regard.
>>
>> I'll audit ree for problems of this nature.
>
> The two reg_*_between_p functions check only insns in between the two, but
> not the insns themselves.  What I meant is if it e.g. couldn't be possible
> to have HARD_REGNO_NREGS == 2 registers say 1 and 2, where the first insn
> would load into the low half of 1, then one insn that say sign extends
> 1 into {2,3}, then perhaps {2,3} is used and finally 1 is zero extended
> into {1,2} and that is used later on.  For little endian this would work,
> while after the transformation which sign extends the memory into {2,3}
> and then copies that into {1,2} (thus overwriting content of low half of
> {2,3} with the high half of it).  Perhaps the RA will never allow this.
ISTM if we're presented with something like that (and I don't think 
there's anything in RA which explicitly disallows such code), then what 
we have to evaluate is whether or not the transformation preserves the 
semantics.

So, incoming would look like this (assuming a 32 bit target):


r1:SI = mem:SI
r2:DI = sext:DI (r1:SI)
[ Use r2/r3 ]
r1:DI = zext:DI (r1:SI)

And that would be transformed into:

r2:DI = sext:DI (mem:SI)
r1:DI = r2:DI
[ Use r2/r3 ]
r1:DI = zext:DI (r1:SI)

Where r2 will have the wrong value in the use statements.  ISTM we can 
check for an overlap between the destination of the memory load and the 
destination of the first extension.  Right?

Is that the case you're worrying about?

jeff
Jakub Jelinek Dec. 20, 2013, 5:25 p.m. UTC | #5
On Fri, Dec 20, 2013 at 10:17:10AM -0700, Jeff Law wrote:
> ISTM if we're presented with something like that (and I don't think
> there's anything in RA which explicitly disallows such code), then
> what we have to evaluate is whether or not the transformation
> preserves the semantics.
> 
> So, incoming would look like this (assuming a 32 bit target):
> 
> 
> r1:SI = mem:SI
> r2:DI = sext:DI (r1:SI)
> [ Use r2/r3 ]
> r1:DI = zext:DI (r1:SI)
> 
> And that would be transformed into:
> 
> r2:DI = sext:DI (mem:SI)
> r1:DI = r2:DI
> [ Use r2/r3 ]
> r1:DI = zext:DI (r1:SI)
> 
> Where r2 will have the wrong value in the use statements.  ISTM we
> can check for an overlap between the destination of the memory load
> and the destination of the first extension.  Right?
> 
> Is that the case you're worrying about?

Yes.  So my suggestion actually was not correct for that:
  && !reg_overlap_mentioned_p (dest, XEXP (src, 0))
because the first extension above has r1:SI and r2:DI which don't
overlap, only r1:DI and r2:DI overlap.  So it probably should be checked
in combine_reaching_defs instead where you have already both the registers
in the right modes available and can call reg_overlap_mentioned_p on them
directly.  One argument would be SET_DEST (def_insn) and one SET_DEST
(cand->insn), right?

	Jakub
Jeff Law Dec. 20, 2013, 5:35 p.m. UTC | #6
On 12/20/13 10:25, Jakub Jelinek wrote:
   So it probably should be checked
> in combine_reaching_defs instead where you have already both the registers
> in the right modes available and can call reg_overlap_mentioned_p on them
> directly.  One argument would be SET_DEST (def_insn) and one SET_DEST
> (cand->insn), right?
Certainly my preference is to have all the tests for this exception live 
in combine_reaching_defs.

We actually need to test in the widened mode, so we have to generate a 
new reg expressing SET_DEST (def_insn) in the widened mode.  The other 
argument for the reg_overlap_mentioned_p call is SET_DEST (cand->insn).

I find myself wondering if we should be using the widened mode register 
for the calls to reg_{used,set}_between_p calls.  I'll have to ponder 
that for a few minutes.

jeff
diff mbox

Patch

diff --git a/gcc/ree.c b/gcc/ree.c
index 9938e98..63ad86c 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -282,9 +282,21 @@  static bool
 combine_set_extension (ext_cand *cand, rtx curr_insn, rtx *orig_set)
 {
   rtx orig_src = SET_SRC (*orig_set);
-  rtx new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (*orig_set)));
   rtx new_set;
 
+  /* If the extension's source/destination registers are not the same
+     then we need to change the original load to reference the destination
+     of the extension.  Then we need to emit a copy from that destination
+     to the original destination of the load.  */
+  rtx new_reg;
+  bool copy_needed
+    = REGNO (SET_DEST (PATTERN (cand->insn)))
+	     != REGNO (XEXP (SET_SRC (PATTERN (cand->insn)), 0));
+  if (copy_needed)
+    new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (PATTERN (cand->insn))));
+  else
+    new_reg = gen_rtx_REG (cand->mode, REGNO (SET_DEST (*orig_set)));
+
   /* Merge constants by directly moving the constant into the register under
      some conditions.  Recall that RTL constants are sign-extended.  */
   if (GET_CODE (orig_src) == CONST_INT
@@ -342,7 +354,8 @@  combine_set_extension (ext_cand *cand, rtx curr_insn, rtx *orig_set)
       if (dump_file)
         {
           fprintf (dump_file,
-		   "Tentatively merged extension with definition:\n");
+		   "Tentatively merged extension with definition %s:\n",
+		   (copy_needed) ? "(copy needed)" : "");
           print_rtl_single (dump_file, curr_insn);
         }
       return true;
@@ -662,6 +675,45 @@  combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
   if (!outcome)
     return false;
 
+  /* If the destination operand of the extension is a different
+     register than the source operand, then additional restrictions
+     are needed.  */
+  if ((REGNO (SET_DEST (PATTERN (cand->insn)))
+       != REGNO (XEXP (SET_SRC (PATTERN (cand->insn)), 0))))
+    {
+      /* In theory we could handle more than one reaching def, it
+	 just makes the code to update the insn stream more complex.  */
+      if (state->defs_list.length () != 1)
+	return false;
+
+      /* We require the candidate not already be modified.  This may
+	 be overly conservative.  */
+      if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE)
+	return false;
+
+      /* There's only one reaching def.  */
+      rtx def_insn = state->defs_list[0];
+
+      /* The defining statement must not have been modified either.  */
+      if (state->modified[INSN_UID (def_insn)].kind != EXT_MODIFIED_NONE)
+	return false;
+
+      /* The defining statement and candidate insn must be in the same block.
+	 This is merely to keep the test for safety and updating the insn
+	 stream simple.  */
+      if (BLOCK_FOR_INSN (cand->insn) != BLOCK_FOR_INSN (def_insn))
+	return false;
+
+      /* The destination register of the extension insn must not be
+	 used or set between the def_insn and cand->insn exclusive.  */
+      if (reg_used_between_p (SET_DEST (PATTERN (cand->insn)),
+			      def_insn, cand->insn)
+	  || reg_set_between_p (SET_DEST (PATTERN (cand->insn)),
+				def_insn, cand->insn))
+	return false;
+    }
+
+
   /* If cand->insn has been already modified, update cand->mode to a wider
      mode if possible, or punt.  */
   if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE)
@@ -778,8 +830,7 @@  add_removable_extension (const_rtx expr, rtx insn,
 
   if (REG_P (dest)
       && (code == SIGN_EXTEND || code == ZERO_EXTEND)
-      && REG_P (XEXP (src, 0))
-      && REGNO (dest) == REGNO (XEXP (src, 0)))
+      && REG_P (XEXP (src, 0)))
     {
       struct df_link *defs, *def;
       ext_cand *cand;
@@ -863,6 +914,7 @@  find_and_remove_re (void)
   int num_re_opportunities = 0, num_realized = 0, i;
   vec<ext_cand> reinsn_list;
   auto_vec<rtx> reinsn_del_list;
+  auto_vec<rtx> reinsn_copy_list;
   ext_state state;
 
   /* Construct DU chain to get all reaching definitions of each
@@ -899,11 +951,41 @@  find_and_remove_re (void)
           if (dump_file)
             fprintf (dump_file, "Eliminated the extension.\n");
           num_realized++;
-          reinsn_del_list.safe_push (curr_cand->insn);
+	  if (REGNO (SET_DEST (PATTERN (curr_cand->insn)))
+	      != REGNO (XEXP (SET_SRC (PATTERN (curr_cand->insn)), 0)))
+	    {
+              reinsn_copy_list.safe_push (curr_cand->insn);
+              reinsn_copy_list.safe_push (state.defs_list[0]);
+	    }
+	  reinsn_del_list.safe_push (curr_cand->insn);
 	  state.modified[INSN_UID (curr_cand->insn)].deleted = 1;
         }
     }
 
+  /* The copy list contains pairs of insns which describe copies we
+     need to insert into the INSN stream.
+
+     The first insn in each pair is the extension insn, from which
+     we derive the source and destination of the copy.
+
+     The second insn in each pair is the memory reference where the
+     extension will ultimately happen.  We emit the new copy
+     immediately after this insn.
+
+     It may first appear that the arguments for the copy are reversed.
+     Remember that the memory reference will be changed to refer to the
+     destination of the extention.  So we're actually emitting a copy
+     from the new destination to the old destination.  */
+  for (unsigned int i = 0; i < reinsn_copy_list.length (); i += 2)
+    {
+      rtx curr_insn = reinsn_copy_list[i];
+      rtx pat = PATTERN (curr_insn);
+      rtx new_reg = gen_rtx_REG (GET_MODE (SET_DEST (pat)),
+				 REGNO (XEXP (SET_SRC (pat), 0)));
+      rtx set = gen_rtx_SET (VOIDmode, new_reg, SET_DEST (pat));
+      emit_insn_after (set, reinsn_copy_list[i + 1]);
+    }
+
   /* Delete all useless extensions here in one sweep.  */
   FOR_EACH_VEC_ELT (reinsn_del_list, i, curr_insn)
     delete_insn (curr_insn);
diff --git a/gcc/testsuite/gcc.target/i386/pr53623.c b/gcc/testsuite/gcc.target/i386/pr53623.c
new file mode 100644
index 0000000..5375b61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr53623.c
@@ -0,0 +1,25 @@ 
+/* { dg-do compile { target { x86_64-*-* } } } */
+/* { dg-options "-O2 -fdump-rtl-ree" } */
+
+
+#include <stdint.h>
+
+typedef (*inst_t)(int64_t rdi, int64_t rsi, int64_t rdx);
+
+int16_t code[256];
+inst_t dispatch[256];
+
+void an_inst(int64_t rdi, int64_t rsi, int64_t rdx) {
+  rdx = code[rdx];
+  uint8_t inst = (uint8_t) rdx;
+  rdx >>= 8;
+  dispatch[inst](rdi, rsi, rdx);
+}
+
+int main(void) {
+  return 0;
+}
+
+/* { dg-final { scan-rtl-dump "copy needed" "ree" } } */
+/* { dg-final { cleanup-rtl-dump "ree" } } */
+