diff mbox

Don't combine param and return value copies

Message ID 20150525005635.GA14752@bubble.grove.modra.org
State New
Headers show

Commit Message

Alan Modra May 25, 2015, 12:56 a.m. UTC
On Sat, May 23, 2015 at 08:45:23AM -0500, Segher Boessenkool wrote:
> Thanks.  Did you see improvements around return as well, or mostly /
> only related to the function start?

The rlwinm vs. rldicl change was in dwarf2out.c add_ranges_num for r3
one insn before a blr.  I'm sure that was due to not combining return
value copies.  In fact, I'm sure all the improvements I saw were due
to changing the exit..  See below.

[snip]
> I agree it looks like a bug waiting to happen.  Please post it as a
> separate patch though.

OK.

> > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB
> > +   that we don't want to combine with other instructions.  */
> > +
> > +static void
> > +twiddle_first_block (basic_block bb, basic_block to)
> 
> I don't like this much.  Messing with global state makes things harder
> to change later.  If it is hard to come up with a good name for a
> factor, it usually means it is not such a good factor.
> 
> You can do these inside can_combine_{def,use}_p as far as I can see?
> Probably need to give those an extra parameter then: for _def, a bool
> that says "don't allow moves from hard regs", set when the scan has
> encountered a NOTE_INSN_FUNCTION_BEG in this bb; and for _use, a regset
> of those hard regs we don't want to allow moves to (those seen in USEs
> later in that same block).  Or do it in the main loop itself, _{def,use}
> are each called in one place only; whatever works best.

Huh, that's the way I started writing this patch..  The reason I went
with modifying BLOCK_FOR_INSN is that the loop setting up LOG_LINKS
needs to test that anyway.  Changing code in the main loop means
every insn in a function will need to be tested for additional
conditions.  I thought that might be slower.  I can see your point
though, especially if someone wanted to wean combine off LOG_LINKS.
I'll rewrite the patch, which I realized was buggy anyway, and didn't
prevent param copies from being combined.  :-(

> What makes return values different from CALL args here, btw?  Why do
> you not want to combine return values, but you leave alone call args?

I don't think there is much difference between SETs for return values
and SETs for call args.  The reason I deliberately left them out is
that in the original discussion we were talking about parameters and
return values.  So I thought I'd better restrict the change to just
those SETs.

It would be a much simpler patch to make combine ignore all SETs
that are a reg,reg copy with one of them a hard reg.  I was a little
worried I'd regress some target if I tried that.  (Results on
powerpc64le-linux for such a change look good.  A lot more cases where
code is better, due to catching the parameter reg,reg copies.  In
looking at gcc/*.o I haven't yet seen any regressions in code quality.)

> > +/* Fill in log links field for all insns that we wish to combine.  */
> 
> Please don't change this comment; it was more correct before.

But it wasn't correct!  LOG_LINKS are not set up for *all* insns.
Perhaps I should add "that we might wish to combine" rather than
"that we wish to combine"?

> > @@ -1103,7 +1192,7 @@ create_log_links (void)
> >  		 we might wind up changing the semantics of the insn,
> >  		 even if reload can make what appear to be valid
> >  		 assignments later.  */
> > -	      if (regno < FIRST_PSEUDO_REGISTER
> > +	      if (HARD_REGISTER_NUM_P (regno)
> >  		  && asm_noperands (PATTERN (use_insn)) >= 0)
> >  		continue;
> 
> An independent change.

Yeah, I guess.  I tidy these if I'm working on a piece of code.
Here's the whole file done.  Boostrapped and regression tested
powerpc64le-linux and x86_64-linux.

	* combine.c: Use HARD_REGISTER_P and HARD_REGISTER_NUM_P as
	appropriate throughout file in place of comparing regno
	against FIRST_PSEUDO_REGISTER.

Comments

Alan Modra May 25, 2015, 1:31 a.m. UTC | #1
On Mon, May 25, 2015 at 10:26:35AM +0930, Alan Modra wrote:
> looking at gcc/*.o I haven't yet seen any regressions in code quality.)

Well that didn't last very long.  There are regressions, and just from
looking at disassembled object files it would appear to be frame
pointer related.  So the simple patch won't work.
Segher Boessenkool May 25, 2015, 8:12 p.m. UTC | #2
On Mon, May 25, 2015 at 10:26:35AM +0930, Alan Modra wrote:
> On Sat, May 23, 2015 at 08:45:23AM -0500, Segher Boessenkool wrote:
> > Thanks.  Did you see improvements around return as well, or mostly /
> > only related to the function start?
> 
> The rlwinm vs. rldicl change was in dwarf2out.c add_ranges_num for r3
> one insn before a blr.  I'm sure that was due to not combining return
> value copies.  In fact, I'm sure all the improvements I saw were due
> to changing the exit..  See below.
> 
> > > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB
> > > +   that we don't want to combine with other instructions.  */
> > > +
> > > +static void
> > > +twiddle_first_block (basic_block bb, basic_block to)
> > 
> > I don't like this much.  Messing with global state makes things harder
> > to change later.  If it is hard to come up with a good name for a
> > factor, it usually means it is not such a good factor.
> > 
> > You can do these inside can_combine_{def,use}_p as far as I can see?
> > Probably need to give those an extra parameter then: for _def, a bool
> > that says "don't allow moves from hard regs", set when the scan has
> > encountered a NOTE_INSN_FUNCTION_BEG in this bb; and for _use, a regset
> > of those hard regs we don't want to allow moves to (those seen in USEs
> > later in that same block).  Or do it in the main loop itself, _{def,use}
> > are each called in one place only; whatever works best.
> 
> Huh, that's the way I started writing this patch..  The reason I went
> with modifying BLOCK_FOR_INSN is that the loop setting up LOG_LINKS
> needs to test that anyway.  Changing code in the main loop means
> every insn in a function will need to be tested for additional
> conditions.  I thought that might be slower.  I can see your point
> though, especially if someone wanted to wean combine off LOG_LINKS.

The setup of the LOG_LINKS is a simple fast linear loop, much less
work than the rest of combine.  So don't worry about performance too
much :-)

> > What makes return values different from CALL args here, btw?  Why do
> > you not want to combine return values, but you leave alone call args?
> 
> I don't think there is much difference between SETs for return values
> and SETs for call args.  The reason I deliberately left them out is
> that in the original discussion we were talking about parameters and
> return values.  So I thought I'd better restrict the change to just
> those SETs.
> 
> It would be a much simpler patch to make combine ignore all SETs
> that are a reg,reg copy with one of them a hard reg.  I was a little
> worried I'd regress some target if I tried that.

_All_ moves to/from hard regs includes much more (register asm, fixed
registers, maybe some targets do weird things as well?).  So yes I share
those worries.

Since we do not want any other pass (before RA) to foul up these
register moves either, maybe a better solution would be to mark them
some way at expand time?

> (Results on
> powerpc64le-linux for such a change look good.  A lot more cases where
> code is better, due to catching the parameter reg,reg copies.  In
> looking at gcc/*.o I haven't yet seen any regressions in code quality.)

That is good news :-)

> > > +/* Fill in log links field for all insns that we wish to combine.  */
> > 
> > Please don't change this comment; it was more correct before.
> 
> But it wasn't correct!  LOG_LINKS are not set up for *all* insns.

It "sets" all LOG_LINKS to some value ("sets", it doesn't actually
zero them at the start, it has an assert for that though).

> > > @@ -1103,7 +1192,7 @@ create_log_links (void)
> > >  		 we might wind up changing the semantics of the insn,
> > >  		 even if reload can make what appear to be valid
> > >  		 assignments later.  */
> > > -	      if (regno < FIRST_PSEUDO_REGISTER
> > > +	      if (HARD_REGISTER_NUM_P (regno)
> > >  		  && asm_noperands (PATTERN (use_insn)) >= 0)
> > >  		continue;
> > 
> > An independent change.
> 
> Yeah, I guess.  I tidy these if I'm working on a piece of code.

Pretty far away in this case ;-)

> Here's the whole file done.  Boostrapped and regression tested
> powerpc64le-linux and x86_64-linux.
> 
> 	* combine.c: Use HARD_REGISTER_P and HARD_REGISTER_NUM_P as
> 	appropriate throughout file in place of comparing regno
> 	against FIRST_PSEUDO_REGISTER.

Have we decided we want to convert the whole compiler to this?  It
is a pretty lousy interface IMO: HARD_REGISTER_P does not check if
its arg is a register; it does not check if its arg is a hard register
either (it checks if it is not a pseudo); it cannot be used in all
places in all passes because of that (which means, not in all macros
and hooks either).


Segher
diff mbox

Patch

Index: combine.c
===================================================================
--- combine.c	(revision 223463)
+++ combine.c	(working copy)
@@ -1103,7 +1103,7 @@  create_log_links (void)
 		 we might wind up changing the semantics of the insn,
 		 even if reload can make what appear to be valid
 		 assignments later.  */
-	      if (regno < FIRST_PSEUDO_REGISTER
+	      if (HARD_REGISTER_NUM_P (regno)
 		  && asm_noperands (PATTERN (use_insn)) >= 0)
 		continue;
 
@@ -1730,7 +1730,7 @@  set_nonzero_bits_and_sign_copies (rtx x, const_rtx
   rtx_insn *insn = (rtx_insn *) data;
 
   if (REG_P (x)
-      && REGNO (x) >= FIRST_PSEUDO_REGISTER
+      && !HARD_REGISTER_P (x)
       /* If this register is undefined at the start of the file, we can't
 	 say what its contents were.  */
       && ! REGNO_REG_SET_P
@@ -1897,7 +1897,7 @@  can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
 			  && (REGNO (XEXP (i3elt, 0)) == regno
 			      ? reg_set_between_p (XEXP (elt, 0),
 						   PREV_INSN (insn), i3)
-			      : regno >= FIRST_PSEUDO_REGISTER))
+			      : !HARD_REGISTER_NUM_P (regno)))
 			return 0;
 		    }
 		  while (--i >= 0);
@@ -1971,7 +1971,7 @@  can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
       || (CALL_P (i3)
 	  && (find_reg_fusage (i3, USE, dest)
 	      || (REG_P (dest)
-		  && REGNO (dest) < FIRST_PSEUDO_REGISTER
+		  && HARD_REGISTER_P (dest)
 		  && global_regs[REGNO (dest)])))
       /* Don't substitute into an incremented register.  */
       || FIND_REG_INC_NOTE (i3, dest)
@@ -2021,7 +2021,7 @@  can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
 	 register.  */
 
       if (REG_P (src)
-	  && ((REGNO (dest) < FIRST_PSEUDO_REGISTER
+	  && ((HARD_REGISTER_P (dest)
 	       && ! HARD_REGNO_MODE_OK (REGNO (dest), GET_MODE (dest)))
 	      /* Don't extend the life of a hard register unless it is
 		 user variable (if we have few registers) or it can't
@@ -2030,7 +2030,7 @@  can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
 		 Also avoid substituting a return register into I3, because
 		 reload can't handle a conflict with constraints of other
 		 inputs.  */
-	      || (REGNO (src) < FIRST_PSEUDO_REGISTER
+	      || (HARD_REGISTER_P (src)
 		  && ! HARD_REGNO_MODE_OK (REGNO (src), GET_MODE (src)))))
 	return 0;
     }
@@ -2052,7 +2052,7 @@  can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
 	     we leave it up to the machine description to either accept or
 	     reject use-and-clobber patterns.  */
 	  if (!REG_P (reg)
-	      || REGNO (reg) >= FIRST_PSEUDO_REGISTER
+	      || !HARD_REGISTER_P (reg)
 	      || !fixed_regs[REGNO (reg)])
 	    if (reg_overlap_mentioned_p (reg, src))
 	      return 0;
@@ -2075,7 +2075,7 @@  can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i
      to be an explicit register variable, and was chosen for a reason.  */
 
   if (GET_CODE (src) == ASM_OPERANDS
-      && REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER)
+      && REG_P (dest) && HARD_REGISTER_P (dest))
     return 0;
 
   /* If INSN contains volatile references (specifically volatile MEMs),
@@ -2221,7 +2221,7 @@  combinable_i3pat (rtx_insn *i3, rtx *loc, rtx i2de
 	     checks this; here, we do a more specific test for this case.  */
 
 	  || (REG_P (inner_dest)
-	      && REGNO (inner_dest) < FIRST_PSEUDO_REGISTER
+	      && HARD_REGISTER_P (inner_dest)
 	      && (! HARD_REGNO_MODE_OK (REGNO (inner_dest),
 					GET_MODE (inner_dest))))
 	  || (i1_not_in_src && reg_overlap_mentioned_p (i1dest, src))
@@ -2469,7 +2469,7 @@  can_change_dest_mode (rtx x, int added_sets, machi
   regno = REGNO (x);
   /* Allow hard registers if the new mode is legal, and occupies no more
      registers than the old mode.  */
-  if (regno < FIRST_PSEUDO_REGISTER)
+  if (HARD_REGISTER_NUM_P (regno))
     return (HARD_REGNO_MODE_OK (regno, mode)
 	    && REG_NREGS (x) >= hard_regno_nregs[regno][mode]);
 
@@ -2790,7 +2790,7 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 
   if (i1 == 0 && NONJUMP_INSN_P (i3) && GET_CODE (PATTERN (i3)) == SET
       && REG_P (SET_SRC (PATTERN (i3)))
-      && REGNO (SET_SRC (PATTERN (i3))) >= FIRST_PSEUDO_REGISTER
+      && !HARD_REGISTER_P (SET_SRC (PATTERN (i3)))
       && find_reg_note (i3, REG_DEAD, SET_SRC (PATTERN (i3)))
       && GET_CODE (PATTERN (i2)) == PARALLEL
       && ! side_effects_p (SET_DEST (PATTERN (i3)))
@@ -3219,7 +3219,7 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 		{
 		  unsigned int regno = REGNO (newpat_dest);
 		  compare_mode = new_mode;
-		  if (regno < FIRST_PSEUDO_REGISTER)
+		  if (HARD_REGISTER_NUM_P (regno))
 		    newpat_dest = gen_rtx_REG (compare_mode, regno);
 		  else
 		    {
@@ -3588,7 +3588,7 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 	      machine_mode old_mode = GET_MODE (i2dest);
 	      rtx ni2dest;
 
-	      if (REGNO (i2dest) < FIRST_PSEUDO_REGISTER)
+	      if (HARD_REGISTER_P (i2dest))
 		ni2dest = gen_rtx_REG (new_mode, REGNO (i2dest));
 	      else
 		{
@@ -3604,7 +3604,7 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 	      m_split_insn = combine_split_insns (parallel, i3);
 
 	      if (m_split_insn == 0
-		  && REGNO (i2dest) >= FIRST_PSEUDO_REGISTER)
+		  && !HARD_REGISTER_P (i2dest))
 		{
 		  struct undo *buf;
 
@@ -3725,7 +3725,7 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
 	     validated that we can do this.  */
 	  if (GET_MODE (i2dest) != split_mode && split_mode != VOIDmode)
 	    {
-	      if (REGNO (i2dest) < FIRST_PSEUDO_REGISTER)
+	      if (HARD_REGISTER_P (i2dest))
 		newdest = gen_rtx_REG (split_mode, REGNO (i2dest));
 	      else
 		{
@@ -5390,7 +5390,7 @@  subst (rtx x, rtx from, rtx to, int in_dest, int i
 
 		  if (code == SUBREG
 		      && REG_P (to)
-		      && REGNO (to) < FIRST_PSEUDO_REGISTER
+		      && HARD_REGISTER_P (to)
 		      && simplify_subreg_regno (REGNO (to), GET_MODE (to),
 						SUBREG_BYTE (x),
 						GET_MODE (x)) < 0)
@@ -6645,7 +6645,7 @@  simplify_set (rtx x)
 	      unsigned int regno = REGNO (dest);
 	      rtx new_dest;
 
-	      if (regno < FIRST_PSEUDO_REGISTER)
+	      if (HARD_REGISTER_NUM_P (regno))
 		new_dest = gen_rtx_REG (compare_mode, regno);
 	      else
 		{
@@ -6750,7 +6750,7 @@  simplify_set (rtx x)
 	< GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))
 #endif
 #ifdef CANNOT_CHANGE_MODE_CLASS
-      && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
+      && ! (REG_P (dest) && HARD_REGISTER_P (dest)
 	    && REG_CANNOT_CHANGE_MODE_P (REGNO (dest),
 					 GET_MODE (SUBREG_REG (src)),
 					 GET_MODE (src)))
@@ -9815,7 +9815,7 @@  reg_nonzero_bits_for_combine (const_rtx x, machine
 	   && rsp->last_set_label < label_tick)
 	  || (rsp->last_set_label == label_tick
               && DF_INSN_LUID (rsp->last_set) < subst_low_luid)
-	  || (REGNO (x) >= FIRST_PSEUDO_REGISTER
+	  || (!HARD_REGISTER_P (x)
 	      && REGNO (x) < reg_n_sets_max
 	      && REG_N_SETS (REGNO (x)) == 1
 	      && !REGNO_REG_SET_P
@@ -9879,7 +9879,7 @@  reg_num_sign_bit_copies_for_combine (const_rtx x,
 	   && rsp->last_set_label < label_tick)
 	  || (rsp->last_set_label == label_tick
               && DF_INSN_LUID (rsp->last_set) < subst_low_luid)
-	  || (REGNO (x) >= FIRST_PSEUDO_REGISTER
+	  || (!HARD_REGISTER_P (x)
 	      && REGNO (x) < reg_n_sets_max
 	      && REG_N_SETS (REGNO (x)) == 1
 	      && !REGNO_REG_SET_P
@@ -12921,7 +12921,7 @@  record_truncated_value (rtx x)
     }
   /* ??? For hard-regs we now record everything.  We might be able to
      optimize this using last_set_mode.  */
-  else if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
+  else if (REG_P (x) && HARD_REGISTER_P (x))
     truncated_mode = GET_MODE (x);
   else
     return false;
@@ -13012,7 +13012,7 @@  get_last_value_validate (rtx *loc, rtx_insn *insn,
 	  if (rsp->last_set_invalid
 	      /* If this is a pseudo-register that was only set once and not
 		 live at the beginning of the function, it is always valid.  */
-	      || (! (regno >= FIRST_PSEUDO_REGISTER
+	      || (! (!HARD_REGISTER_NUM_P (regno)
 		     && regno < reg_n_sets_max
 		     && REG_N_SETS (regno) == 1
 		     && (!REGNO_REG_SET_P
@@ -13129,7 +13129,7 @@  get_last_value (const_rtx x)
 
   if (value == 0
       || (rsp->last_set_label < label_tick_ebb_start
-	  && (regno < FIRST_PSEUDO_REGISTER
+	  && (HARD_REGISTER_NUM_P (regno)
 	      || regno >= reg_n_sets_max
 	      || REG_N_SETS (regno) != 1
 	      || REGNO_REG_SET_P
@@ -13257,7 +13257,7 @@  reg_dead_at_p (rtx reg, rtx_insn *insn)
   /* Check that reg isn't mentioned in NEWPAT_USED_REGS.  For fixed registers
      we allow the machine description to decide whether use-and-clobber
      patterns are OK.  */
-  if (reg_dead_regno < FIRST_PSEUDO_REGISTER)
+  if (HARD_REGISTER_NUM_P (reg_dead_regno))
     {
       for (i = reg_dead_regno; i < reg_dead_endregno; i++)
 	if (!fixed_regs[i] && TEST_HARD_REG_BIT (newpat_used_regs, i))
@@ -13331,7 +13331,7 @@  mark_used_regs_combine (rtx x)
       regno = REGNO (x);
       /* A hard reg in a wide mode may really be multiple registers.
 	 If so, mark all of them just like the first.  */
-      if (regno < FIRST_PSEUDO_REGISTER)
+      if (HARD_REGISTER_NUM_P (regno))
 	{
 	  /* None of this applies to the stack, frame or arg pointers.  */
 	  if (regno == STACK_POINTER_REGNUM
@@ -13449,7 +13449,7 @@  move_deaths (rtx x, rtx maybe_kill_insn, int from_
 	     including X.  In that case, we must put REG_DEAD notes for
 	     the remaining registers in place of NOTE.  */
 
-	  if (note != 0 && regno < FIRST_PSEUDO_REGISTER
+	  if (note != 0 && HARD_REGISTER_NUM_P (regno)
 	      && (GET_MODE_SIZE (GET_MODE (XEXP (note, 0)))
 		  > GET_MODE_SIZE (GET_MODE (x))))
 	    {
@@ -13472,7 +13472,7 @@  move_deaths (rtx x, rtx maybe_kill_insn, int from_
 		    || (note != 0
 			&& (GET_MODE_SIZE (GET_MODE (XEXP (note, 0)))
 			    < GET_MODE_SIZE (GET_MODE (x)))))
-		   && regno < FIRST_PSEUDO_REGISTER
+		   && HARD_REGISTER_NUM_P (regno)
 		   && REG_NREGS (x) > 1)
 	    {
 	      unsigned int ourend = END_REGNO (x);
@@ -13588,7 +13588,7 @@  reg_bitfield_target_p (rtx x, rtx body)
 	return 0;
 
       tregno = REGNO (target), regno = REGNO (x);
-      if (tregno >= FIRST_PSEUDO_REGISTER || regno >= FIRST_PSEUDO_REGISTER)
+      if (!HARD_REGISTER_NUM_P (tregno) || !HARD_REGISTER_NUM_P (regno))
 	return target == x;
 
       endtregno = end_hard_regno (GET_MODE (target), tregno);
@@ -13922,7 +13922,7 @@  distribute_notes (rtx notes, rtx_insn *from_insn,
 		     TEM_INSN is doing.  If so, delete TEM_INSN.  Otherwise, make this
 		     into a REG_UNUSED note instead. Don't delete sets to
 		     global register vars.  */
-		  if ((REGNO (XEXP (note, 0)) >= FIRST_PSEUDO_REGISTER
+		  if ((!HARD_REGISTER_P (XEXP (note, 0))
 		       || !global_regs[REGNO (XEXP (note, 0))])
 		      && reg_set_p (XEXP (note, 0), PATTERN (tem_insn)))
 		    {