Find more shrink-wrapping opportunities

Message ID 4E8CC286.80107@codesourcery.com
State New

Commit Message

Bernd Schmidt Oct. 5, 2011, 8:48 p.m. UTC
This adds a little mini-pass to shrink-wrapping, to eliminate a common
case that often makes shrink-wrapping unavailable. If a move insn copies
an argument register to a call-saved register, the prologue must be
emitted before this insn. We should therefore try to delay such moves
for as long as possible.
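For instance, in a hypothetical function like

  extern void bar (void);

  int
  foo (int x)
  {
    if (x == 0)
      return 0;
    bar ();        /* x is live across this call...  */
    return x * 2;  /* ...so it is copied to a call-saved register.  */
  }

the copy of the incoming argument register into a call-saved register
is emitted at the very start of the function, which forces the prologue
onto the early-return path as well.  Delaying the copy past the branch
lets that path run without a prologue.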

A followup patch will add an extra cprop pass, which replaces uses of
the call-saved register with uses of the incoming argument register,
thereby allowing us to move the copies in even more cases.

Bootstrapped and tested on i686-linux. Ok?


Bernd
* function.c (prepare_shrink_wrap, bb_active_p): New functions.
	(thread_prologue_and_epilogue_insns): Use bb_active_p.
	Call prepare_shrink_wrap, then recompute bb_active_p for the
	last block.

Comments

Steven Bosscher Oct. 5, 2011, 9:23 p.m. UTC | #1
On Wed, Oct 5, 2011 at 10:48 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> Bootstrapped and tested on i686-linux. Ok?

> +/* Return true if BB has any active insns.  */
> +static bool
> +bb_active_p (basic_block bb)
> +{
> +  rtx label;
> +
> +  /* Test whether there are active instructions in the last block.  */
> +  label = BB_END (bb);
> +  while (label && !LABEL_P (label))
> +    {
> +      if (active_insn_p (label))
> +	break;
> +      label = PREV_INSN (label);
> +    }
> +  return BB_HEAD (bb) != label || !LABEL_P (label);
> +}

This assumes that all basic blocks start with a label, but I don't think
that's true. It seems better to use FOR_BB_INSNS_REVERSE here.
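Something like this (untested) sketch, which doesn't care whether the
block starts with a label:

  /* Return true if BB has any active insns.  */
  static bool
  bb_active_p (basic_block bb)
  {
    rtx insn;

    FOR_BB_INSNS_REVERSE (bb, insn)
      if (active_insn_p (insn))
        return true;
    return false;
  }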

Ciao!
Steven
Bernd Schmidt Oct. 5, 2011, 9:38 p.m. UTC | #2
On 10/05/11 23:23, Steven Bosscher wrote:
> On Wed, Oct 5, 2011 at 10:48 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> Bootstrapped and tested on i686-linux. Ok?
> 
>> +/* Return true if BB has any active insns.  */
>> +static bool
>> +bb_active_p (basic_block bb)
>> +{
>> +  rtx label;
>> +
>> +  /* Test whether there are active instructions in the last block.  */
>> +  label = BB_END (bb);
>> +  while (label && !LABEL_P (label))
>> +    {
>> +      if (active_insn_p (label))
>> +	break;
>> +      label = PREV_INSN (label);
>> +    }
>> +  return BB_HEAD (bb) != label || !LABEL_P (label);
>> +}
> 
> This assumes that all basic blocks start with a label, but I don't think
> that's true. It seems better to use FOR_BB_INSNS_REVERSE here.

It's the same code as before... and if you get a block without a label
and without active insns, then cfgcleanup hasn't done its job.


Bernd
Jeff Law Oct. 6, 2011, 8:16 p.m. UTC | #3

On 10/05/11 14:48, Bernd Schmidt wrote:
> This adds a little mini-pass to shrink-wrapping, to eliminate a
> common case that often makes shrink-wrapping unavailable. If a move
> insn copies an argument register to a call-saved register, the
> prologue must be emitted before this insn. We should therefore try
> to delay such moves for as long as possible.
> 
> A followup patch will add an extra cprop pass, which replaces uses
> of the call-saved register with uses of the incoming argument
> register, thereby allowing us to move the copies in even more
> cases.
> 
> Bootstrapped and tested on i686-linux. Ok?
Just a couple comments.

Inside bb_active_p there's a comment referring to "last block", which
should be BB.  Just a nit that didn't get changed when refactoring that
code out of thread_prologue_and_epilogue_insns.

In the loop over edges within prepare_shrink_wrap, I think a comment
is warranted.  If I read the code correctly, you're looking for the
case where the destination register is live on one and only one edge,
right?  But it'd be easier if there were just a comment to that effect.
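Something along these lines, say:

      /* Try to find a single edge across which the register is live.
         If we find one, we'll try to move the set across this edge.  */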

OK with the two comment fixes.

jeff
Richard Sandiford Nov. 26, 2011, 11:50 a.m. UTC | #4
Bernd Schmidt <bernds@codesourcery.com> writes:
> +		  CLEAR_HARD_REG_SET (set_regs);
> +		  note_stores (PATTERN (scan), record_hard_reg_sets,
> +			       &set_regs);
> +		  if (CALL_P (scan))
> +		    IOR_HARD_REG_SET (set_regs, call_used_reg_set);
> +		  for (link = REG_NOTES (scan); link; link = XEXP (link, 1))
> +		    if (REG_NOTE_KIND (link) == REG_INC)
> +		      record_hard_reg_sets (XEXP (link, 0), NULL, &set_regs);
> +
> +		  if (TEST_HARD_REG_BIT (set_regs, srcreg)
> +		      || reg_referenced_p (SET_DEST (set),
> +					   PATTERN (scan)))
> +		    {
> +		      scan = NULL_RTX;
> +		      break;
> +		    }
> +		  if (CALL_P (scan))
> +		    {
> +		      rtx link = CALL_INSN_FUNCTION_USAGE (scan);
> +		      while (link)
> +			{
> +			  rtx tmp = XEXP (link, 0);
> +			  if (GET_CODE (tmp) == USE
> +			      && reg_referenced_p (SET_DEST (set), tmp))
> +			    break;
> +			  link = XEXP (link, 1);
> +			}
> +		      if (link)
> +			{
> +			  scan = NULL_RTX;
> +			  break;
> +			}
> +		    }

Could we use DF_REF_USES/DEFS here instead?

I'd like to get to a stage where new code treats DF_REF_USES/DEFS
as the default way of testing for register usage.  I think this sort
of ad-hoc liveness stuff should be a last resort, and should have a
comment saying why DF_REF_USES/DEFS isn't suitable.

Rather than walk all the instructions of intermediate blocks,
you could just test DF_LR_BB_INFO (bb)->use to see whether the
block uses the register at all.
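Roughly (untested sketch, reusing the srcreg/destreg variables from
your loop): an intermediate block BB can be crossed without scanning
its individual insns whenever

      !REGNO_REG_SET_P (&DF_LR_BB_INFO (bb)->use, destreg)
      && !REGNO_REG_SET_P (&DF_LR_BB_INFO (bb)->def, destreg)
      && !REGNO_REG_SET_P (&DF_LR_BB_INFO (bb)->def, srcreg)

holds, i.e. when BB neither mentions the destination nor clobbers the
source.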

> +  FOR_BB_INSNS_SAFE (entry_block, insn, curr)

Is there any particular reason for doing a forward walk rather than
a backward walk?  A backward walk would allow chains to be moved,
and would avoid the quadratic behaviour of the current approach:

  for each insn I in bb
    for each insn J after I in bb

since you could then keep an up-to-date record of what registers
are used or set by the instructions that you aren't moving.
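Something like this (untested sketch; try_to_sink is just a made-up
name for whatever routine actually tries to sink INSN and returns
true on success):

  rtx insn, curr, x;
  HARD_REG_SET uses, defs;
  df_ref *ref;

  CLEAR_HARD_REG_SET (uses);
  CLEAR_HARD_REG_SET (defs);
  FOR_BB_INSNS_REVERSE_SAFE (entry_block, insn, curr)
    if (NONDEBUG_INSN_P (insn)
        && !try_to_sink (entry_block, insn, uses, defs))
      {
        /* INSN stays put: record what it sets and uses so that earlier
           candidates are checked against these accumulated sets rather
           than by rescanning the insns below them.  */
        for (ref = DF_INSN_DEFS (insn); *ref; ref++)
          {
            x = DF_REF_REG (*ref);
            if (REG_P (x) && HARD_REGISTER_P (x))
              SET_HARD_REG_BIT (defs, REGNO (x));
          }
        for (ref = DF_INSN_USES (insn); *ref; ref++)
          {
            x = DF_REF_REG (*ref);
            if (REG_P (x) && HARD_REGISTER_P (x))
              SET_HARD_REG_BIT (uses, REGNO (x));
          }
      }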

Also:

+/* Look for sets of call-saved registers in the first block of the
+   function, and move them down into successor blocks if the register
+   is used only on one path.  This exposes more opportunities for
+   shrink-wrapping.
+   These kinds of sets often occur when incoming argument registers are
+   moved to call-saved registers because their values are live across
+   one or more calls during the function.  */
+
+static void
+prepare_shrink_wrap (basic_block entry_block)

There's no check for call-savedness.  Is that deliberate?  I'm seeing
a case where we have:

   (set (reg 2) (reg 15))
   (set (reg 15) (reg 2))

(yes, it's silly, but bear with me) for call-clobbered registers 2 and 15.
We move the second instruction as far as possible, making 2 live for
much longer.  So if the prologue uses 2 as a temporary register, this
would actually prevent shrink-wrapping.

The reason I'm suddenly "reviewing" the code now is that it
doesn't prevent shrink-wrapping, because nothing adds register 2
to the liveness info of the affected blocks.  The temporary prologue
value of register 2 is then moved into register 15.

Testcase is gcc.c-torture/execute/920428-2.c on mips64-linux-gnu cc1
(although any MIPS should do) with:

    -O2 -mabi=32 -mips16 -mno-shared -mabicalls -march=mips32

Richard
Bernd Schmidt Nov. 29, 2011, 4:20 p.m. UTC | #5
On 11/26/11 12:50, Richard Sandiford wrote:
> Could we use DF_REF_USES/DEFS here instead?

I suppose we could, but I can't really get excited about it...

> Also:
> 
> +/* Look for sets of call-saved registers in the first block of the
> +   function, and move them down into successor blocks if the register
> +   is used only on one path.  This exposes more opportunities for
> +   shrink-wrapping.
> +   These kinds of sets often occur when incoming argument registers are
> +   moved to call-saved registers because their values are live across
> +   one or more calls during the function.  */
> +
> +static void
> +prepare_shrink_wrap (basic_block entry_block)
> 
> There's no check for call-savedness.  Is that deliberate?

At least one could argue that the transformation always saves some
unnecessary work on at least one path.

> The reason I'm suddenly "reviewing" the code now is that it
> doesn't prevent shrink-wrapping, because nothing adds register 2
> to the liveness info of the affected blocks.  The temporary prologue
> value of register 2 is then moved into register 15.

Hmm. Are we just missing another df_analyze call?


Bernd
Richard Sandiford Nov. 29, 2011, 7:02 p.m. UTC | #6
Bernd Schmidt <bernds@codesourcery.com> writes:
>> The reason I'm suddenly "reviewing" the code now is that it
>> doesn't prevent shrink-wrapping, because nothing adds register 2
>> to the liveness info of the affected blocks.  The temporary prologue
>> value of register 2 is then moved into register 15.
>
> Hmm. Are we just missing another df_analyze call?

Well, if we do the kind of backwards walk I was thinking about (so that
we can handle chains), it might be better to update the liveness sets
we care about as we go.  A full df_analyze after each move would be
pretty expensive.

Richard
Richard Sandiford Nov. 29, 2011, 8:23 p.m. UTC | #7
Richard Sandiford <rdsandiford@googlemail.com> writes:
> Bernd Schmidt <bernds@codesourcery.com> writes:
>>> The reason I'm suddenly "reviewing" the code now is that it
>>> doesn't prevent shrink-wrapping, because nothing adds register 2
>>> to the liveness info of the affected blocks.  The temporary prologue
>>> value of register 2 is then moved into register 15.
>>
>> Hmm. Are we just missing another df_analyze call?
>
> Well, if we do the kind of backwards walk I was thinking about (so that
> we can handle chains), it might be better to update the liveness sets
> we care about as we go.  A full df_analyze after each move would be
> pretty expensive.

FWIW, I've got a patch that I'll try to test this weekend.

Richard
Bernd Schmidt Nov. 29, 2011, 10:03 p.m. UTC | #8
On 11/29/11 20:02, Richard Sandiford wrote:
> Bernd Schmidt <bernds@codesourcery.com> writes:
>>> The reason I'm suddenly "reviewing" the code now is that it
>>> doesn't prevent shrink-wrapping, because nothing adds register 2
>>> to the liveness info of the affected blocks.  The temporary prologue
>>> value of register 2 is then moved into register 15.
>>
>> Hmm. Are we just missing another df_analyze call?
> 
> Well, if we do the kind of backwards walk I was thinking about (so that
> we can handle chains), it might be better to update the liveness sets
> we care about as we go.  A full df_analyze after each move would be
> pretty expensive.

Doesn't that have some cleverness to update only modified basic blocks?


Bernd
Richard Sandiford Nov. 30, 2011, 9:26 a.m. UTC | #9
Bernd Schmidt <bernds@codesourcery.com> writes:
> On 11/29/11 20:02, Richard Sandiford wrote:
>> Bernd Schmidt <bernds@codesourcery.com> writes:
>>>> The reason I'm suddenly "reviewing" the code now is that it
>>>> doesn't prevent shrink-wrapping, because nothing adds register 2
>>>> to the liveness info of the affected blocks.  The temporary prologue
>>>> value of register 2 is then moved into register 15.
>>>
>>> Hmm. Are we just missing another df_analyze call?
>> 
>> Well, if we do the kind of backwards walk I was thinking about (so that
>> we can handle chains), it might be better to update the liveness sets
>> we care about as we go.  A full df_analyze after each move would be
>> pretty expensive.
>
> Doesn't that have some cleverness to update only modified basic blocks?

Yes, but it still involves recomputing the LR info for each bb,
which in turn means walking the insns in the affected blocks.
Then we enter the iterative dataflow solver to propagate the
changes to other blocks (although it will of course converge
quickly in this case).  Doing that once per move would be expensive.

Richard
Andrew Pinski Dec. 2, 2011, 2:47 a.m. UTC | #10
On Wed, Oct 5, 2011 at 1:48 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> This adds a little mini-pass to shrink-wrapping, to eliminate a common
> case that often makes shrink-wrapping unavailable. If a move insn copies
> an argument register to a call-saved register, the prologue must be
> emitted before this insn. We should therefore try to delay such moves
> for as long as possible.
>
> A followup patch will add an extra cprop pass, which replaces uses of
> the call-saved register with uses of the incoming argument register,
> thereby allowing us to move the copies in even more cases.

By the way, the testcase in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474 seems to be the
testcase that needs the extra cprop pass to allow for shrink-wrapping.

Thanks,
Andrew Pinski
Richard Sandiford Dec. 4, 2011, 12:27 p.m. UTC | #11
Richard Sandiford <rdsandiford@googlemail.com> writes:
> Richard Sandiford <rdsandiford@googlemail.com> writes:
>> Bernd Schmidt <bernds@codesourcery.com> writes:
>>>> The reason I'm suddenly "reviewing" the code now is that it
>>>> doesn't prevent shrink-wrapping, because nothing adds register 2
>>>> to the liveness info of the affected blocks.  The temporary prologue
>>>> value of register 2 is then moved into register 15.
>>>
>>> Hmm. Are we just missing another df_analyze call?
>>
>> Well, if we do the kind of backwards walk I was thinking about (so that
>> we can handle chains), it might be better to update the liveness sets
>> we care about as we go.  A full df_analyze after each move would be
>> pretty expensive.
>
> FWIW, I've got a patch that I'll try to test this weekend.

OK, here it is.  As well as switching to the backward scan and incremental
liveness updates, I added a test for another case that I stumbled over:

  /* Reject targets of abnormal edges.  This is needed for correctness
     on ports like Alpha and MIPS, whose pic_offset_table_rtx can die on
     exception edges even though it is generally treated as call-saved
     for the majority of the compilation.  Moving across abnormal edges
     isn't going to be interesting for shrink-wrap usage anyway.  */
  if (live_edge->flags & EDGE_ABNORMAL)
    return NULL;

The point is that when Alpha and MIPS have call-clobbered global pointers,
they say that the global pointer is call-saved and ensure that the call
patterns restore the gp where necessary.  These combined "call and load"
patterns get split after reload (or in MIPS's case, after prologue/
epilogue generation).  The ports then make sure that exception_receiver
also restores the gp.  The problem is that, if we insert a move from
pic_offset_table_rtx before exception_receiver, we end up moving the
wrong value.

Now all this is really a bit of a hack.  The information ought to be
made more explicit to the target-independent parts of the compiler.
That's going to be quite hard to fix though, and probably isn't stage 3
material.  So while I admit I don't like the test above, it feels like
the best fix for now.

Also, it seemed easiest to drop the single-register restriction at
the same time.  Hopefully this will help for things like DF moves
on 32-bit MIPS FPUs.

While moving bb_note to cfgrtl.c, I was tempted to make rtl_split_block
use it rather than first_insn_after_basic_block_note.  It isn't exactly
a one-for-one transformation though, at least not as far as null checks
go, so I'll leave it for a possible stage 1 cleanup.

Tested on mips64-linux-gnu.  OK to install?

Richard


gcc/
	* sched-int.h (bb_note): Move to...
	* basic-block.h: ...here.
	* haifa-sched.c (bb_note): Move to...
	* cfgrtl.c: ...here.
	* function.c (next_block_for_reg): New function.
	(move_insn_for_shrink_wrap): Likewise.
	(prepare_shrink_wrap): Rewrite to use the above.

Index: gcc/sched-int.h
===================================================================
*** gcc/sched-int.h	2011-12-04 08:52:37.000000000 +0000
--- gcc/sched-int.h	2011-12-04 12:03:51.000000000 +0000
*************** extern void sched_insns_init (rtx);
*** 130,136 ****
  extern void sched_insns_finish (void);
  
  extern void *xrecalloc (void *, size_t, size_t, size_t);
- extern rtx bb_note (basic_block);
  
  extern void reemit_notes (rtx);
  
--- 130,135 ----
Index: gcc/basic-block.h
===================================================================
*** gcc/basic-block.h	2011-12-04 08:52:37.000000000 +0000
--- gcc/basic-block.h	2011-12-04 12:03:51.000000000 +0000
*************** extern void flow_edge_list_print (const
*** 801,806 ****
--- 801,807 ----
  
  /* In cfgrtl.c  */
  extern rtx block_label (basic_block);
+ extern rtx bb_note (basic_block);
  extern bool purge_all_dead_edges (void);
  extern bool purge_dead_edges (basic_block);
  extern bool fixup_abnormal_edges (void);
Index: gcc/haifa-sched.c
===================================================================
*** gcc/haifa-sched.c	2011-12-04 08:52:37.000000000 +0000
--- gcc/haifa-sched.c	2011-12-04 12:03:51.000000000 +0000
*************** add_jump_dependencies (rtx insn, rtx jum
*** 6489,6508 ****
    gcc_assert (!sd_lists_empty_p (jump, SD_LIST_BACK));
  }
  
- /* Return the NOTE_INSN_BASIC_BLOCK of BB.  */
- rtx
- bb_note (basic_block bb)
- {
-   rtx note;
- 
-   note = BB_HEAD (bb);
-   if (LABEL_P (note))
-     note = NEXT_INSN (note);
- 
-   gcc_assert (NOTE_INSN_BASIC_BLOCK_P (note));
-   return note;
- }
- 
  /* Extend data structures for logical insn UID.  */
  void
  sched_extend_luids (void)
--- 6489,6494 ----
Index: gcc/cfgrtl.c
===================================================================
*** gcc/cfgrtl.c	2011-12-04 08:52:37.000000000 +0000
--- gcc/cfgrtl.c	2011-12-04 12:03:51.000000000 +0000
*************** update_bb_for_insn (basic_block bb)
*** 500,505 ****
--- 500,519 ----
  }
  
  
+ /* Return the NOTE_INSN_BASIC_BLOCK of BB.  */
+ rtx
+ bb_note (basic_block bb)
+ {
+   rtx note;
+ 
+   note = BB_HEAD (bb);
+   if (LABEL_P (note))
+     note = NEXT_INSN (note);
+ 
+   gcc_assert (NOTE_INSN_BASIC_BLOCK_P (note));
+   return note;
+ }
+ 
  /* Return the INSN immediately following the NOTE_INSN_BASIC_BLOCK
     note associated with the BLOCK.  */
  
Index: gcc/function.c
===================================================================
*** gcc/function.c	2011-12-04 08:52:37.000000000 +0000
--- gcc/function.c	2011-12-04 12:21:12.000000000 +0000
*************** requires_stack_frame_p (rtx insn, HARD_R
*** 5330,5455 ****
    return false;
  }
  
! /* Look for sets of call-saved registers in the first block of the
!    function, and move them down into successor blocks if the register
!    is used only on one path.  This exposes more opportunities for
!    shrink-wrapping.
!    These kinds of sets often occur when incoming argument registers are
!    moved to call-saved registers because their values are live across
!    one or more calls during the function.  */
  
! static void
! prepare_shrink_wrap (basic_block entry_block)
  {
!   rtx insn, curr;
!   FOR_BB_INSNS_SAFE (entry_block, insn, curr)
      {
!       basic_block next_bb;
!       edge e, live_edge;
!       edge_iterator ei;
!       rtx set, scan;
!       unsigned destreg, srcreg;
! 
!       if (!NONDEBUG_INSN_P (insn))
! 	continue;
!       set = single_set (insn);
!       if (!set)
! 	continue;
! 
!       if (!REG_P (SET_SRC (set)) || !REG_P (SET_DEST (set)))
! 	continue;
!       srcreg = REGNO (SET_SRC (set));
!       destreg = REGNO (SET_DEST (set));
!       if (hard_regno_nregs[srcreg][GET_MODE (SET_SRC (set))] > 1
! 	  || hard_regno_nregs[destreg][GET_MODE (SET_DEST (set))] > 1)
! 	continue;
  
!       next_bb = entry_block;
!       scan = insn;
  
!       for (;;)
  	{
! 	  live_edge = NULL;
! 	  /* Try to find a single edge across which the register is live.
! 	     If we find one, we'll try to move the set across this edge.  */
! 	  FOR_EACH_EDGE (e, ei, next_bb->succs)
! 	    {
! 	      if (REGNO_REG_SET_P (df_get_live_in (e->dest), destreg))
! 		{
! 		  if (live_edge)
! 		    {
! 		      live_edge = NULL;
! 		      break;
! 		    }
! 		  live_edge = e;
! 		}
! 	    }
! 	  if (!live_edge)
! 	    break;
! 	  /* We can sometimes encounter dead code.  Don't try to move it
! 	     into the exit block.  */
! 	  if (live_edge->dest == EXIT_BLOCK_PTR)
! 	    break;
! 	  if (EDGE_COUNT (live_edge->dest->preds) > 1)
! 	    break;
! 	  while (scan != BB_END (next_bb))
! 	    {
! 	      scan = NEXT_INSN (scan);
! 	      if (NONDEBUG_INSN_P (scan))
! 		{
! 		  rtx link;
! 		  HARD_REG_SET set_regs;
! 
! 		  CLEAR_HARD_REG_SET (set_regs);
! 		  note_stores (PATTERN (scan), record_hard_reg_sets,
! 			       &set_regs);
! 		  if (CALL_P (scan))
! 		    IOR_HARD_REG_SET (set_regs, call_used_reg_set);
! 		  for (link = REG_NOTES (scan); link; link = XEXP (link, 1))
! 		    if (REG_NOTE_KIND (link) == REG_INC)
! 		      record_hard_reg_sets (XEXP (link, 0), NULL, &set_regs);
! 
! 		  if (TEST_HARD_REG_BIT (set_regs, srcreg)
! 		      || reg_referenced_p (SET_DEST (set),
! 					   PATTERN (scan)))
! 		    {
! 		      scan = NULL_RTX;
! 		      break;
! 		    }
! 		  if (CALL_P (scan))
! 		    {
! 		      rtx link = CALL_INSN_FUNCTION_USAGE (scan);
! 		      while (link)
! 			{
! 			  rtx tmp = XEXP (link, 0);
! 			  if (GET_CODE (tmp) == USE
! 			      && reg_referenced_p (SET_DEST (set), tmp))
! 			    break;
! 			  link = XEXP (link, 1);
! 			}
! 		      if (link)
! 			{
! 			  scan = NULL_RTX;
! 			  break;
! 			}
! 		    }
! 		}
! 	    }
! 	  if (!scan)
! 	    break;
! 	  next_bb = live_edge->dest;
  	}
  
!       if (next_bb != entry_block)
  	{
! 	  rtx after = BB_HEAD (next_bb);
! 	  while (!NOTE_P (after)
! 		 || NOTE_KIND (after) != NOTE_INSN_BASIC_BLOCK)
! 	    after = NEXT_INSN (after);
! 	  emit_insn_after (PATTERN (insn), after);
! 	  delete_insn (insn);
  	}
      }
  }
  
  #endif
--- 5330,5511 ----
    return false;
  }
  
! /* See whether BB has a single successor that uses [REGNO, END_REGNO),
!    and if BB is its only predecessor.  Return that block if so,
!    otherwise return null.  */
  
! static basic_block
! next_block_for_reg (basic_block bb, int regno, int end_regno)
  {
!   edge e, live_edge;
!   edge_iterator ei;
!   bitmap live;
!   int i;
! 
!   live_edge = NULL;
!   FOR_EACH_EDGE (e, ei, bb->succs)
      {
!       live = df_get_live_in (e->dest);
!       for (i = regno; i < end_regno; i++)
! 	if (REGNO_REG_SET_P (live, i))
! 	  {
! 	    if (live_edge && live_edge != e)
! 	      return NULL;
! 	    live_edge = e;
! 	  }
!     }
! 
!   /* We can sometimes encounter dead code.  Don't try to move it
!      into the exit block.  */
!   if (!live_edge || live_edge->dest == EXIT_BLOCK_PTR)
!     return NULL;
! 
!   /* Reject targets of abnormal edges.  This is needed for correctness
!      on ports like Alpha and MIPS, whose pic_offset_table_rtx can die on
!      exception edges even though it is generally treated as call-saved
!      for the majority of the compilation.  Moving across abnormal edges
!      isn't going to be interesting for shrink-wrap usage anyway.  */
!   if (live_edge->flags & EDGE_ABNORMAL)
!     return NULL;
  
!   if (EDGE_COUNT (live_edge->dest->preds) > 1)
!     return NULL;
  
!   return live_edge->dest;
! }
! 
! /* Try to move INSN from BB to a successor.  Return true on success.
!    USES and DEFS are the set of registers that are used and defined
!    after INSN in BB.  */
! 
! static bool
! move_insn_for_shrink_wrap (basic_block bb, rtx insn,
! 			   const HARD_REG_SET uses,
! 			   const HARD_REG_SET defs)
! {
!   rtx set, src, dest;
!   bitmap live_out, live_in, bb_uses, bb_defs;
!   unsigned int i, dregno, end_dregno, sregno, end_sregno;
!   basic_block next_block;
! 
!   /* Look for a simple register copy.  */
!   set = single_set (insn);
!   if (!set)
!     return false;
!   src = SET_SRC (set);
!   dest = SET_DEST (set);
!   if (!REG_P (dest) || !REG_P (src))
!     return false;
! 
!   /* Make sure that the source register isn't defined later in BB.  */
!   sregno = REGNO (src);
!   end_sregno = END_REGNO (src);
!   if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
!     return false;
! 
!   /* Make sure that the destination register isn't referenced later in BB.  */
!   dregno = REGNO (dest);
!   end_dregno = END_REGNO (dest);
!   if (overlaps_hard_reg_set_p (uses, GET_MODE (dest), dregno)
!       || overlaps_hard_reg_set_p (defs, GET_MODE (dest), dregno))
!     return false;
! 
!   /* See whether there is a successor block to which we could move INSN.  */
!   next_block = next_block_for_reg (bb, dregno, end_dregno);
!   if (!next_block)
!     return false;
! 
!   /* At this point we are committed to moving INSN, but let's try to
!      move it as far as we can.  */
!   do
!     {
!       live_out = df_get_live_out (bb);
!       live_in = df_get_live_in (next_block);
!       bb = next_block;
! 
!       /* Check whether BB uses DEST or clobbers DEST.  We need to add
! 	 INSN to BB if so.  Either way, DEST is no longer live on entry,
! 	 except for any part that overlaps SRC (next loop).  */
!       bb_uses = &DF_LR_BB_INFO (bb)->use;
!       bb_defs = &DF_LR_BB_INFO (bb)->def;
!       for (i = dregno; i < end_dregno; i++)
  	{
! 	  if (REGNO_REG_SET_P (bb_uses, i) || REGNO_REG_SET_P (bb_defs, i))
! 	    next_block = NULL;
! 	  CLEAR_REGNO_REG_SET (live_out, i);
! 	  CLEAR_REGNO_REG_SET (live_in, i);
  	}
  
!       /* Check whether BB clobbers SRC.  We need to add INSN to BB if so.
! 	 Either way, SRC is now live on entry.  */
!       for (i = sregno; i < end_sregno; i++)
  	{
! 	  if (REGNO_REG_SET_P (bb_defs, i))
! 	    next_block = NULL;
! 	  SET_REGNO_REG_SET (live_out, i);
! 	  SET_REGNO_REG_SET (live_in, i);
  	}
+ 
+       /* If we don't need to add the move to BB, look for a single
+ 	 successor block.  */
+       if (next_block)
+ 	next_block = next_block_for_reg (next_block, dregno, end_dregno);
      }
+   while (next_block);
+ 
+   /* BB now defines DEST.  It only uses the parts of DEST that overlap SRC
+      (next loop).  */
+   for (i = dregno; i < end_dregno; i++)
+     {
+       CLEAR_REGNO_REG_SET (bb_uses, i);
+       SET_REGNO_REG_SET (bb_defs, i);
+     }
+ 
+   /* BB now uses SRC.  */
+   for (i = sregno; i < end_sregno; i++)
+     SET_REGNO_REG_SET (bb_uses, i);
+ 
+   emit_insn_after (PATTERN (insn), bb_note (bb));
+   delete_insn (insn);
+   return true;
+ }
+ 
+ /* Look for register copies in the first block of the function, and move
+    them down into successor blocks if the register is used only on one
+    path.  This exposes more opportunities for shrink-wrapping.  These
+    kinds of sets often occur when incoming argument registers are moved
+    to call-saved registers because their values are live across one or
+    more calls during the function.  */
+ 
+ static void
+ prepare_shrink_wrap (basic_block entry_block)
+ {
+   rtx insn, curr, x;
+   HARD_REG_SET uses, defs;
+   df_ref *ref;
+ 
+   CLEAR_HARD_REG_SET (uses);
+   CLEAR_HARD_REG_SET (defs);
+   FOR_BB_INSNS_REVERSE_SAFE (entry_block, insn, curr)
+     if (NONDEBUG_INSN_P (insn)
+ 	&& !move_insn_for_shrink_wrap (entry_block, insn, uses, defs))
+       {
+ 	/* Add all defined registers to DEFs.  */
+ 	for (ref = DF_INSN_DEFS (insn); *ref; ref++)
+ 	  {
+ 	    x = DF_REF_REG (*ref);
+ 	    if (REG_P (x) && HARD_REGISTER_P (x))
+ 	      SET_HARD_REG_BIT (defs, REGNO (x));
+ 	  }
+ 
+ 	/* Add all used registers to USESs.  */
+ 	for (ref = DF_INSN_USES (insn); *ref; ref++)
+ 	  {
+ 	    x = DF_REF_REG (*ref);
+ 	    if (REG_P (x) && HARD_REGISTER_P (x))
+ 	      SET_HARD_REG_BIT (uses, REGNO (x));
+ 	  }
+       }
  }
  
  #endif
Bernd Schmidt Jan. 9, 2012, 3:08 p.m. UTC | #12
On 12/04/2011 01:27 PM, Richard Sandiford wrote:

> OK, here it is.  As well as switching to the backward scan and incremental
> liveness updates, I added a test for another case that I stumbled over:
> 
>   /* Reject targets of abnormal edges.  This is needed for correctness
>      on ports like Alpha and MIPS, whose pic_offset_table_rtx can die on
>      exception edges even though it is generally treated as call-saved
>      for the majority of the compilation.  Moving across abnormal edges
>      isn't going to be interesting for shrink-wrap usage anyway.  */
>   if (live_edge->flags & EDGE_ABNORMAL)
>     return NULL;
> 

> 	* sched-int.h (bb_note): Move to...
> 	* basic-block.h: ...here.
> 	* haifa-sched.c (bb_note): Move to...
> 	* cfgrtl.c: ...here.
> 	* function.c (next_block_for_reg): New function.
> 	(move_insn_for_shrink_wrap): Likewise.
> 	(prepare_shrink_wrap): Rewrite to use the above.

Ok and thanks.


Bernd

Patch

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 179577)
+++ gcc/function.c	(working copy)
@@ -5352,6 +5352,127 @@  requires_stack_frame_p (rtx insn, HARD_R
       return true;
   return false;
 }
+
+/* Look for sets of call-saved registers in the first block of the
+   function, and move them down into successor blocks if the register
+   is used only on one path.  This exposes more opportunities for
+   shrink-wrapping.
+   These kinds of sets often occur when incoming argument registers are
+   moved to call-saved registers because their values are live across
+   one or more calls during the function.  */
+
+static void
+prepare_shrink_wrap (basic_block entry_block)
+{
+  rtx insn, curr;
+  FOR_BB_INSNS_SAFE (entry_block, insn, curr)
+    {
+      basic_block next_bb;
+      edge e, live_edge;
+      edge_iterator ei;
+      rtx set, scan;
+      unsigned destreg, srcreg;
+
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      set = single_set (insn);
+      if (!set)
+	continue;
+
+      if (!REG_P (SET_SRC (set)) || !REG_P (SET_DEST (set)))
+	continue;
+      srcreg = REGNO (SET_SRC (set));
+      destreg = REGNO (SET_DEST (set));
+      if (hard_regno_nregs[srcreg][GET_MODE (SET_SRC (set))] > 1
+	  || hard_regno_nregs[destreg][GET_MODE (SET_DEST (set))] > 1)
+	continue;
+
+      next_bb = entry_block;
+      scan = insn;
+
+      for (;;)
+	{
+	  live_edge = NULL;
+	  FOR_EACH_EDGE (e, ei, next_bb->succs)
+	    {
+	      if (REGNO_REG_SET_P (df_get_live_in (e->dest), destreg))
+		{
+		  if (live_edge)
+		    {
+		      live_edge = NULL;
+		      break;
+		    }
+		  live_edge = e;
+		}
+	    }
+	  if (!live_edge)
+	    break;
+	  /* We can sometimes encounter dead code.  Don't try to move it
+	     into the exit block.  */
+	  if (live_edge->dest == EXIT_BLOCK_PTR)
+	    break;
+	  if (EDGE_COUNT (live_edge->dest->preds) > 1)
+	    break;
+	  while (scan != BB_END (next_bb))
+	    {
+	      scan = NEXT_INSN (scan);
+	      if (NONDEBUG_INSN_P (scan))
+		{
+		  rtx link;
+		  HARD_REG_SET set_regs;
+
+		  CLEAR_HARD_REG_SET (set_regs);
+		  note_stores (PATTERN (scan), record_hard_reg_sets,
+			       &set_regs);
+		  if (CALL_P (scan))
+		    IOR_HARD_REG_SET (set_regs, call_used_reg_set);
+		  for (link = REG_NOTES (scan); link; link = XEXP (link, 1))
+		    if (REG_NOTE_KIND (link) == REG_INC)
+		      record_hard_reg_sets (XEXP (link, 0), NULL, &set_regs);
+
+		  if (TEST_HARD_REG_BIT (set_regs, srcreg)
+		      || reg_referenced_p (SET_DEST (set),
+					   PATTERN (scan)))
+		    {
+		      scan = NULL_RTX;
+		      break;
+		    }
+		  if (CALL_P (scan))
+		    {
+		      rtx link = CALL_INSN_FUNCTION_USAGE (scan);
+		      while (link)
+			{
+			  rtx tmp = XEXP (link, 0);
+			  if (GET_CODE (tmp) == USE
+			      && reg_referenced_p (SET_DEST (set), tmp))
+			    break;
+			  link = XEXP (link, 1);
+			}
+		      if (link)
+			{
+			  scan = NULL_RTX;
+			  break;
+			}
+		    }
+		}
+	    }
+	  if (!scan)
+	    break;
+	  next_bb = live_edge->dest;
+	}
+
+      if (next_bb != entry_block)
+	{
+	  rtx after = BB_HEAD (next_bb);
+	  while (!NOTE_P (after)
+		 || NOTE_KIND (after) != NOTE_INSN_BASIC_BLOCK)
+	    after = NEXT_INSN (after);
+	  emit_insn_after (PATTERN (insn), after);
+	  delete_insn (insn);
+	}
+    }
+}
+
 #endif
 
 #ifdef HAVE_return
@@ -5400,6 +5521,23 @@  emit_return_into_block (bool simple_p, b
 }
 #endif
 
+/* Return true if BB has any active insns.  */
+static bool
+bb_active_p (basic_block bb)
+{
+  rtx label;
+
+  /* Test whether there are active instructions in the last block.  */
+  label = BB_END (bb);
+  while (label && !LABEL_P (label))
+    {
+      if (active_insn_p (label))
+	break;
+      label = PREV_INSN (label);
+    }
+  return BB_HEAD (bb) != label || !LABEL_P (label);
+}
+
 /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
    this into place with notes indicating where the prologue ends and where
    the epilogue begins.  Update the basic block information when possible.
@@ -5486,19 +5624,8 @@  thread_prologue_and_epilogue_insns (void
   exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds);
   if (exit_fallthru_edge != NULL)
     {
-      rtx label;
-
       last_bb = exit_fallthru_edge->src;
-      /* Test whether there are active instructions in the last block.  */
-      label = BB_END (last_bb);
-      while (label && !LABEL_P (label))
-	{
-	  if (active_insn_p (label))
-	    break;
-	  label = PREV_INSN (label);
-	}
-
-      last_bb_active = BB_HEAD (last_bb) != label || !LABEL_P (label);
+      last_bb_active = bb_active_p (last_bb);
     }
   else
     {
@@ -5607,6 +5734,12 @@  thread_prologue_and_epilogue_insns (void
 	  note_stores (PATTERN (p_insn), record_hard_reg_sets,
 		       &prologue_clobbered);
 
+      prepare_shrink_wrap (entry_edge->dest);
+
+      /* That may have inserted instructions into the last block.  */
+      if (last_bb && !last_bb_active)
+	last_bb_active = bb_active_p (last_bb);
+
       bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack);
       bitmap_initialize (&bb_on_list, &bitmap_default_obstack);