
ifcvt/crossjump patch: Fix PR 42496, 21803

Message ID 4C460A1F.4000509@codesourcery.com
State New

Commit Message

Bernd Schmidt July 20, 2010, 8:42 p.m. UTC
On 04/20/2010 12:05 AM, Eric Botcazou wrote:
> Bernd Schmidt wrote: 
>> Here's the second part.  This one should help most architectures, not
>> just the ones with conditional execution.  I've observed it helps on
>> i686 and arm, with the following being a typical result:
>>
>>  .L18:
>>         ldr     r1, [r2, #4]
>>         cmp     r1, #34
>> -       it      hi
>> -       ldrhi   r3, .L98+12
>> -       bhi     .L28
>>         ldr     r3, .L98+12
>> +       bhi     .L28
>>         ldrb    r2, [r3, #4]    @ zero_extendqisi2
>>         cbz     r2, .L29
>>         ldr     r3, [r3, #8]
> 
> I'm uncomfortable with this patch because I'm not sure it belongs in ifcvt.c.
> Conceptually it's a reversed form of cross jumping so it could be implemented 
> more generally in cfgcleanup.c.  And other transformations should already be
> able to apply this kind of optimization.  Do you have testcases?

Here's a new patch.  A testcase is included; as I mentioned before, this
triggers quite frequently.  This is PR44374.

I've moved and reused code from dead_or_predicable for a new function,
can_move_insns_across.  The tests in dead_or_predicable were still
somewhat ad hoc; after the patch, I believe it uses exactly the
necessary and sufficient conditions for moving code.
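
To spell out the core condition (just a restatement of the bitmap tests
in the patch below, not additional logic): an insn from the merge range
can be moved across the test range only if it neither sets a register
that range uses, nor uses a register that range sets.  In code:

  /* Restatement of the per-insn legality test in can_move_insns_across;
     the bitmap names match those used in the patch.  */
  if (bitmap_intersect_p (merge_set, test_use)
      || bitmap_intersect_p (merge_use, test_set))
    break;	/* this insn, and everything after it, cannot be moved */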

Bootstrapped and regression tested on i686-linux.  Ok?


Bernd

	PR rtl-optimization/44374
	* ifcvt.c (find_memory): Remove function.
	(dead_or_predicable): Use can_move_insns_across.
	* df.h (can_move_insns_across): Declare function.
	* cfgcleanup.c (block_was_dirty): New static variable.
	(try_head_merge_bb): New static function.
	(try_optimize_cfg): Call it.  Call df_analyze if block_was_dirty
	is set.
	* df-problems.c: Include "target.h".
	(df_simulate_find_uses): New static function.
	(MEMREF_NORMAL, MEMREF_VOLATILE): New macros.
	(find_memory, find_memory_store): New static functions.
	(can_move_insns_across): New function.
	* Makefile.in (df-problems.o): Update dependencies.

testsuite/
	PR rtl-optimization/44374
	* gcc.target/arm/headmerge-1.c: New test.

Comments

Eric Botcazou July 22, 2010, 7:47 p.m. UTC | #1
> Here's a new patch.  A testcase is included; as I mentioned before, this
> triggers quite frequently.  This is PR44374.
>
> I've moved and reused code from dead_or_predicable for a new function,
> can_move_insns_across.  The tests in dead_or_predicable were still
> somewhat ad hoc; after the patch, I believe it uses exactly the
> necessary and sufficient conditions for moving code.

I'll look into it tomorrow.  Btw, would you mind taking a look at the audit 
trail of PR rtl-opt/44484?  TIA.
Jeff Law Aug. 2, 2010, 3:56 p.m. UTC | #2
On 07/20/10 14:42, Bernd Schmidt wrote:
> On 04/20/2010 12:05 AM, Eric Botcazou wrote:
>> Bernd Schmidt wrote:
>>> Here's the second part.  This one should help most architectures, not
>>> just the ones with conditional execution.  I've observed it helps on
>>> i686 and arm, with the following being a typical result:
>>>
>>>   .L18:
>>>          ldr     r1, [r2, #4]
>>>          cmp     r1, #34
>>> -       it      hi
>>> -       ldrhi   r3, .L98+12
>>> -       bhi     .L28
>>>          ldr     r3, .L98+12
>>> +       bhi     .L28
>>>          ldrb    r2, [r3, #4]    @ zero_extendqisi2
>>>          cbz     r2, .L29
>>>          ldr     r3, [r3, #8]
>> I'm uncomfortable with this patch because I'm not sure it belongs in ifcvt.c.
>> Conceptually it's a reversed form of cross jumping so it could be implemented
>> more generally in cfgcleanup.c.  And other transformations should already be
>> able to apply this kind of optimization.  Do you have testcases?
> Here's a new patch.  A testcase is included; as I mentioned before, this
> triggers quite frequently.  This is PR44374.
>
> I've moved and reused code from dead_or_predicable for a new function,
> can_move_insns_across.  The tests in dead_or_predicable were still
> somewhat ad hoc; after the patch, I believe it uses exactly the
> necessary and sufficient conditions for moving code.
>
> Bootstrapped and regression tested on i686-linux.  Ok?
Was this the last version of the patch?  It looks pretty good to me.

jeff
Bernd Schmidt Aug. 2, 2010, 3:59 p.m. UTC | #3
On 08/02/2010 05:56 PM, Jeff Law wrote:
>  On 07/20/10 14:42, Bernd Schmidt wrote:
>> Bootstrapped and regression tested on i686-linux.  Ok?
> Was this the last version of the patch?  It looks pretty good to me.

Yes, but some changes are necessary and I expect I'll resubmit a new one
shortly.


Bernd
Jeff Law Aug. 2, 2010, 4:05 p.m. UTC | #4
On 08/02/10 09:59, Bernd Schmidt wrote:
> On 08/02/2010 05:56 PM, Jeff Law wrote:
>>   On 07/20/10 14:42, Bernd Schmidt wrote:
>>> Bootstrapped and regression tested on i686-linux.  Ok?
>> Was this the last version of the patch?  It looks pretty good to me.
> Yes, but some changes are necessary and I expect I'll resubmit a new one
> shortly.
>
OK.  If you could highlight in a quick blurb what changed, it'd be 
appreciated -- it'll save me from having to look over the whole thing 
again to figure out what changed from the previous version.

Thanks,

jeff
Bernd Schmidt Aug. 2, 2010, 4:15 p.m. UTC | #5
On 08/02/2010 06:05 PM, Jeff Law wrote:
>  On 08/02/10 09:59, Bernd Schmidt wrote:
>> On 08/02/2010 05:56 PM, Jeff Law wrote:
>>>   On 07/20/10 14:42, Bernd Schmidt wrote:
>>>> Bootstrapped and regression tested on i686-linux.  Ok?
>>> Was this the last version of the patch?  It looks pretty good to me.
>> Yes, but some changes are necessary and I expect I'll resubmit a new one
>> shortly.
>>
> OK.  If you could highlight in a quick blurb what changed it'd be
> appreciated -- it'll save me from having to look over the whole thing
> again to figure out what changed from the previous version.

I intend to make the change I previously mentioned: add a per-bb flag
that records whether the block has been modified, so that we can use it
on the second pass to decide whether to try to optimize the block,
rather than using df_get_bb_dirty (which gets cleared on df_analyze).
Earlier versions of gcc had a BB_DIRTY bit in bb->flags; I'll
reintroduce that as BB_MODIFIED.  That's cheaper to test anyway.
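
Roughly, the idea is the following (a standalone sketch, not GCC code;
the flag bit value is arbitrary):

  #include <stdbool.h>

  /* Sketch: a per-block "modified" bit kept in the block's flags word.
     Unlike df_get_bb_dirty, it is not cleared when df_analyze runs,
     and testing a flags bit is cheaper than a df lookup.  */
  #define BB_MODIFIED (1u << 12)	/* illustrative bit value */

  struct bb_stub { unsigned int flags; };

  static void
  mark_modified (struct bb_stub *bb)
  {
    bb->flags |= BB_MODIFIED;
  }

  static bool
  was_modified (const struct bb_stub *bb)
  {
    return (bb->flags & BB_MODIFIED) != 0;
  }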

The other change I'll make is to be slightly more careful wrt. volatile
asms, not moving memory references across them.
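
For illustration, the kind of motion that has to be rejected (a
contrived standalone example, not a reproducer for this pass):

  extern int global;

  int
  read_around_barrier (void)
  {
    int before = global;
    /* A volatile asm may read or write any memory, so the load below
       must not be repositioned across it.  */
    __asm__ volatile ("" : : : "memory");
    int after = global;
    return before + after;
  }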


Bernd

Patch

Index: ifcvt.c
===================================================================
--- ifcvt.c.orig
+++ ifcvt.c
@@ -101,7 +101,6 @@  static int noce_find_if_block (basic_blo
 static int cond_exec_find_if_block (ce_if_block_t *);
 static int find_if_case_1 (basic_block, edge, edge);
 static int find_if_case_2 (basic_block, edge, edge);
-static int find_memory (rtx *, void *);
 static int dead_or_predicable (basic_block, basic_block, basic_block,
 			       basic_block, int);
 static void noce_emit_move_insn (rtx, rtx);
@@ -3877,15 +3876,6 @@  find_if_case_2 (basic_block test_bb, edg
   return TRUE;
 }
 
-/* A subroutine of dead_or_predicable called through for_each_rtx.
-   Return 1 if a memory is found.  */
-
-static int
-find_memory (rtx *px, void *data ATTRIBUTE_UNUSED)
-{
-  return MEM_P (*px);
-}
-
 /* Used by the code above to perform the actual rtl transformations.
    Return TRUE if successful.
 
@@ -3987,131 +3977,32 @@  dead_or_predicable (basic_block test_bb,
       earliest = jump;
     }
 #endif
+  /* If we allocated new pseudos (e.g. in the conditional move
+     expander called from noce_emit_cmove), we must resize the
+     array first.  */
+  if (max_regno < max_reg_num ())
+    max_regno = max_reg_num ();
+
   /* Try the NCE path if the CE path did not result in any changes.  */
   if (n_validated_changes == 0)
     {
+      rtx cond;
       /* In the non-conditional execution case, we have to verify that there
 	 are no trapping operations, no calls, no references to memory, and
 	 that any registers modified are dead at the branch site.  */
 
-      rtx insn, cond, prev;
-      bitmap merge_set, merge_set_noclobber, test_live, test_set;
-      unsigned i, fail = 0;
-      bitmap_iterator bi;
-
-      /* Check for no calls or trapping operations.  */
-      for (insn = head; ; insn = NEXT_INSN (insn))
-	{
-	  if (CALL_P (insn))
-	    return FALSE;
-	  if (NONDEBUG_INSN_P (insn))
-	    {
-	      if (may_trap_p (PATTERN (insn)))
-		return FALSE;
-
-	      /* ??? Even non-trapping memories such as stack frame
-		 references must be avoided.  For stores, we collect
-		 no lifetime info; for reads, we'd have to assert
-		 true_dependence false against every store in the
-		 TEST range.  */
-	      if (for_each_rtx (&PATTERN (insn), find_memory, NULL))
-		return FALSE;
-	    }
-	  if (insn == end)
-	    break;
-	}
-
-      if (! any_condjump_p (jump))
+      if (!any_condjump_p (jump))
 	return FALSE;
 
       /* Find the extent of the conditional.  */
       cond = noce_get_condition (jump, &earliest, false);
-      if (! cond)
+      if (!cond)
 	return FALSE;
 
-      /* Collect:
-	   MERGE_SET = set of registers set in MERGE_BB
-	   MERGE_SET_NOCLOBBER = like MERGE_SET, but only includes registers
-	     that are really set, not just clobbered.
-	   TEST_LIVE = set of registers live at EARLIEST
-	   TEST_SET = set of registers set between EARLIEST and the
-	     end of the block.  */
-
-      merge_set = BITMAP_ALLOC (&reg_obstack);
-      merge_set_noclobber = BITMAP_ALLOC (&reg_obstack);
-      test_live = BITMAP_ALLOC (&reg_obstack);
-      test_set = BITMAP_ALLOC (&reg_obstack);
-
-      /* ??? bb->local_set is only valid during calculate_global_regs_live,
-	 so we must recompute usage for MERGE_BB.  Not so bad, I suppose,
-         since we've already asserted that MERGE_BB is small.  */
-      /* If we allocated new pseudos (e.g. in the conditional move
-	 expander called from noce_emit_cmove), we must resize the
-	 array first.  */
-      if (max_regno < max_reg_num ())
-	max_regno = max_reg_num ();
-
-      FOR_BB_INSNS (merge_bb, insn)
-	{
-	  if (NONDEBUG_INSN_P (insn))
-	    {
-	      df_simulate_find_defs (insn, merge_set);
-	      df_simulate_find_noclobber_defs (insn, merge_set_noclobber);
-	    }
-	}
-
-      /* For small register class machines, don't lengthen lifetimes of
-	 hard registers before reload.  */
-      if (! reload_completed
-	  && targetm.small_register_classes_for_mode_p (VOIDmode))
-	{
-          EXECUTE_IF_SET_IN_BITMAP (merge_set_noclobber, 0, i, bi)
-	    {
-	      if (i < FIRST_PSEUDO_REGISTER
-		  && ! fixed_regs[i]
-		  && ! global_regs[i])
-		fail = 1;
-	    }
-	}
-
-      /* For TEST, we're interested in a range of insns, not a whole block.
-	 Moreover, we're interested in the insns live from OTHER_BB.  */
-
-      /* The loop below takes the set of live registers
-         after JUMP, and calculates the live set before EARLIEST. */
-      bitmap_copy (test_live, df_get_live_in (other_bb));
-      df_simulate_initialize_backwards (test_bb, test_live);
-      for (insn = jump; ; insn = prev)
-	{
-	  if (INSN_P (insn))
-	    {
-	      df_simulate_find_defs (insn, test_set);
-	      df_simulate_one_insn_backwards (test_bb, insn, test_live);
-	    }
-	  prev = PREV_INSN (insn);
-	  if (insn == earliest)
-	    break;
-	}
-
-      /* We can perform the transformation if
-	   MERGE_SET_NOCLOBBER & TEST_SET
-	 and
-	   MERGE_SET & TEST_LIVE)
-	 and
-	   TEST_SET & DF_LIVE_IN (merge_bb)
-	 are empty.  */
-
-      if (bitmap_intersect_p (test_set, merge_set_noclobber)
-	  || bitmap_intersect_p (test_live, merge_set)
-	  || bitmap_intersect_p (test_set, df_get_live_in (merge_bb)))
-	fail = 1;
-
-      BITMAP_FREE (merge_set_noclobber);
-      BITMAP_FREE (merge_set);
-      BITMAP_FREE (test_live);
-      BITMAP_FREE (test_set);
-
-      if (fail)
+      if (!can_move_insns_across (head, end, earliest, jump,
+				  merge_bb, df_get_live_out (merge_bb),
+				  BB_END (merge_bb),
+				  df_get_live_in (other_bb), NULL))
 	return FALSE;
     }
 
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c.orig
+++ cfgcleanup.c
@@ -66,6 +66,10 @@  static bool first_pass;
 /* Set to true if crossjumps occured in the latest run of try_optimize_cfg.  */
 static bool crossjumps_occured;
 
+/* Set to true if we couldn't run an optimization due to stale liveness
+   information; we should run df_analyze to enable more opportunities.  */
+static bool block_was_dirty;
+
 static bool try_crossjump_to_edge (int, edge, edge);
 static bool try_crossjump_bb (int, basic_block);
 static bool outgoing_edges_match (int, basic_block, basic_block);
@@ -1927,6 +1931,171 @@  try_crossjump_bb (int mode, basic_block 
   return changed;
 }
 
+/* Search the successors of BB for common insn sequences.  When found,
+   share code between them by moving it across the basic block
+   boundary.  Return true if any changes made.  */
+
+static bool
+try_head_merge_bb (basic_block bb)
+{
+  int max_match = INT_MAX;
+  edge e0;
+  bool changed;
+  unsigned ix;
+  rtx e0_last_head;
+  unsigned nedges = EDGE_COUNT (bb->succs);
+
+  /* Nothing to do if there are not at least two outgoing edges.  */
+  if (nedges < 2)
+    return false;
+
+  /* Don't crossjump if this block ends in a computed jump,
+     unless we are optimizing for size.  */
+  if (optimize_bb_for_size_p (bb)
+      && bb != EXIT_BLOCK_PTR
+      && computed_jump_p (BB_END (bb)))
+    return false;
+
+  for (ix = 0; ix < nedges; ix++)
+    {
+      edge e = EDGE_SUCC (bb, ix);
+      basic_block other_bb = e->dest;
+      if ((e->flags & EDGE_ABNORMAL)
+	  || EDGE_COUNT (other_bb->preds) != 1)
+	return false;
+    }
+
+  e0 = EDGE_SUCC (bb, 0);
+  e0_last_head = NULL_RTX;
+  changed = false;
+  if (df_get_bb_dirty (e0->dest))
+    {
+      block_was_dirty = true;
+      return false;
+    }
+
+  for (ix = 1; ix < nedges; ix++)
+    {
+      edge e = EDGE_SUCC (bb, ix);
+      rtx e0_last, e_last;
+      int nmatch;
+
+      nmatch = flow_find_head_matching_sequence (e0->dest, e->dest,
+						 &e0_last, &e_last, 0);
+      if (nmatch == 0)
+	return false;
+
+      if (nmatch < max_match)
+	{
+	  max_match = nmatch;
+	  e0_last_head = e0_last;
+	}
+    }
+
+  /* If we matched an entire block, we probably have to avoid moving the
+     last insn.  */
+  if (max_match > 0
+      && e0_last_head == BB_END (e0->dest)
+      && (find_reg_note (e0_last_head, REG_EH_REGION, 0)
+	  || control_flow_insn_p (e0_last_head)))
+    {
+      max_match--;
+      if (max_match == 0)
+	return false;
+      do
+	e0_last_head = prev_real_insn (e0_last_head);
+      while (DEBUG_INSN_P (e0_last_head));
+    }
+
+  if (max_match > 0)
+    {
+      bool moveall;
+      rtx jump = BB_END (bb);
+      rtx cond, move_before;
+      rtx *currptr = XNEWVEC (rtx, nedges);
+      rtx *headptr = XNEWVEC (rtx, nedges);
+
+      cond = get_condition (jump, &move_before, true, false);
+
+      if (cond == NULL_RTX)
+	move_before = jump;
+      for (ix = 0; ix < nedges; ix++)
+	{
+	  rtx head = BB_HEAD (EDGE_SUCC (bb, ix)->dest);
+	  while (!NONDEBUG_INSN_P (head))
+	    head = NEXT_INSN (head);
+	  currptr[ix] = head;
+	  headptr[ix] = head;
+	}
+
+      do
+	{
+	  rtx move_upto;
+	  moveall = can_move_insns_across (currptr[0], e0_last_head,
+					   move_before, jump, e0->dest,
+					   df_get_live_out (e0->dest),
+					   BB_END (e0->dest), NULL, &move_upto);
+	  if (!moveall && move_upto == NULL_RTX)
+	    {
+	      if (jump == move_before)
+		break;
+
+	      /* Try again, using a different insertion point.  */
+	      move_before = jump;
+	      continue;
+	    }
+
+	  changed = true;
+	  for (;;)
+	    {
+	      if (currptr[0] == move_upto)
+		break;
+	      for (ix = 0; ix < nedges; ix++)
+		{
+		  rtx curr = currptr[ix];
+		  do
+		    curr = NEXT_INSN (curr);
+		  while (!NONDEBUG_INSN_P (curr));
+		  currptr[ix] = curr;
+		}
+	    }
+
+	  reorder_insns (headptr[0], currptr[0], PREV_INSN (move_before));
+	  df_set_bb_dirty (EDGE_SUCC (bb, 0)->dest);
+	  df_set_bb_dirty (bb);
+	  for (ix = 1; ix < nedges; ix++)
+	    {
+	      df_set_bb_dirty (EDGE_SUCC (bb, ix)->dest);
+	      delete_insn_chain (headptr[ix], currptr[ix], false);
+	    }
+	  if (!moveall)
+	    {
+	      if (jump == move_before)
+		break;
+
+	      /* Try again, using a different insertion point.  */
+	      move_before = jump;
+	      for (ix = 0; ix < nedges; ix++)
+		{
+		  rtx curr = currptr[ix];
+		  do
+		    curr = NEXT_INSN (curr);
+		  while (!NONDEBUG_INSN_P (curr));
+		  currptr[ix] = headptr[ix] = curr;
+		}
+	    }
+	}
+      while (!moveall);
+
+      free (currptr);
+      free (headptr);
+    }
+
+  crossjumps_occured |= changed;
+
+  return changed;
+}
+
 /* Return true if BB contains just bb note, or bb note followed
    by only DEBUG_INSNs.  */
 
@@ -1972,6 +2141,7 @@  try_optimize_cfg (int mode)
 	 one predecessor, they may be combined.  */
       do
 	{
+	  block_was_dirty = false;
 	  changed = false;
 	  iterations++;
 
@@ -2170,6 +2340,13 @@  try_optimize_cfg (int mode)
 		  && try_crossjump_bb (mode, b))
 		changed_here = true;
 
+	      if ((mode & CLEANUP_CROSSJUMP)
+		  /* This can lengthen register lifetimes.  Do it only after
+		     reload.  */
+		  && reload_completed
+		  && try_head_merge_bb (b))
+		changed_here = true;
+
 	      /* Don't get confused by the index shift caused by
 		 deleting blocks.  */
 	      if (!changed_here)
@@ -2182,6 +2359,9 @@  try_optimize_cfg (int mode)
 	      && try_crossjump_bb (mode, EXIT_BLOCK_PTR))
 	    changed = true;
 
+	  if (block_was_dirty)
+	    df_analyze ();
+
 #ifdef ENABLE_CHECKING
 	  if (changed)
 	    verify_flow_info ();
@@ -2366,8 +2546,7 @@  cleanup_cfg (int mode)
 	  if ((mode & CLEANUP_EXPENSIVE) && !reload_completed
 	      && !delete_trivially_dead_insns (get_insns (), max_reg_num ()))
 	    break;
-	  else if ((mode & CLEANUP_CROSSJUMP)
-		   && crossjumps_occured)
+	  if ((mode & CLEANUP_CROSSJUMP) && crossjumps_occured)
 	    run_fast_dce ();
 	}
       else
Index: df.h
===================================================================
--- df.h.orig
+++ df.h
@@ -992,7 +992,8 @@  extern void df_simulate_one_insn_backwar
 extern void df_simulate_finalize_backwards (basic_block, bitmap);
 extern void df_simulate_initialize_forwards (basic_block, bitmap);
 extern void df_simulate_one_insn_forwards (basic_block, rtx, bitmap);
-
+extern bool can_move_insns_across (rtx, rtx, rtx, rtx, basic_block, regset,
+				   rtx, regset, rtx *);
 /* Functions defined in df-scan.c.  */
 
 extern void df_scan_alloc (bitmap);
Index: df-problems.c
===================================================================
--- df-problems.c.orig
+++ df-problems.c
@@ -39,6 +39,7 @@  along with GCC; see the file COPYING3.  
 #include "basic-block.h"
 #include "sbitmap.h"
 #include "bitmap.h"
+#include "target.h"
 #include "timevar.h"
 #include "df.h"
 #include "except.h"
@@ -3804,6 +3805,27 @@  df_simulate_find_defs (rtx insn, bitmap 
     }
 }
 
+/* Find the set of uses for INSN.  This includes partial defs.  */
+
+static void
+df_simulate_find_uses (rtx insn, bitmap uses)
+{
+  df_ref *rec;
+  unsigned int uid = INSN_UID (insn);
+
+  for (rec = DF_INSN_UID_DEFS (uid); *rec; rec++)
+    {
+      df_ref def = *rec;
+      if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+	bitmap_set_bit (uses, DF_REF_REGNO (def));
+    }
+  for (rec = DF_INSN_UID_USES (uid); *rec; rec++)
+    {
+      df_ref use = *rec;
+      bitmap_set_bit (uses, DF_REF_REGNO (use));
+    }
+}
+
 /* Find the set of real DEFs, which are not clobbers, for INSN.  */
 
 void
@@ -4031,7 +4053,272 @@  df_simulate_one_insn_forwards (basic_blo
     }
   df_simulate_fixup_sets (bb, live);
 }
+
+/* Used by the next two functions to encode information about the
+   memory references we found.  */
+#define MEMREF_NORMAL 1
+#define MEMREF_VOLATILE 2
+
+/* A subroutine of can_move_insns_across called through for_each_rtx.
+   Return either MEMREF_NORMAL or MEMREF_VOLATILE if a memory is found.  */
+
+static int
+find_memory (rtx *px, void *data ATTRIBUTE_UNUSED)
+{
+  rtx x = *px;
+  if (!MEM_P (x))
+    return 0;
+  return MEM_VOLATILE_P (x) ? MEMREF_VOLATILE : MEMREF_NORMAL;
+}
+
+/* A subroutine of can_move_insns_across called through note_stores.
+   DATA points to an integer in which we set either the bit for
+   MEMREF_NORMAL or the bit for MEMREF_VOLATILE if we find a MEM
+   of either kind.  */
+
+static void
+find_memory_stores (rtx x, const_rtx pat ATTRIBUTE_UNUSED,
+		    void *data ATTRIBUTE_UNUSED)
+{
+  int *pflags = (int *)data;
+  if (GET_CODE (x) == SUBREG)
+    x = XEXP (x, 0);
+  /* Treat stores to SP as stores to memory, this will prevent problems
+     when there are references to the stack frame.  */
+  if (x == stack_pointer_rtx)
+    *pflags |= MEMREF_VOLATILE;
+  if (!MEM_P (x))
+    return;
+  *pflags |= MEM_VOLATILE_P (x) ? MEMREF_VOLATILE : MEMREF_NORMAL;
+}
 
+/* Return true if it is safe to move a group of insns, described by
+   the range FROM to TO, backwards across another group of insns,
+   described by ACROSS_FROM to ACROSS_TO.  It is assumed that there
+   are no insns between ACROSS_TO and FROM, but they may be in
+   different basic blocks; MERGE_BB and ACROSS_BB say which.  The
+   caller must also pass some lifetime information; LIVE is the set of
+   live registers at a point LIVE_POINT, which may either be TO, or a
+   later insn in the same basic block, suitable for scanning backwards
+   and reaching TO.
+
+   This function may be called in one of two cases: either we try to
+   move identical instructions from all successor blocks into their
+   predecessor, or we try to move from only one successor block.  If
+   OTHER_BRANCH_LIVE is nonnull, it indicates that we're dealing with
+   the second case.  It should contain a set of registers live at the
+   end of ACROSS_TO which must not be clobbered by moving the insns.
+   In that case, we're also more careful about moving memory references
+   and trapping insns.
+
+   We return false if it is not safe to move the entire group, but it
+   may still be possible to move a subgroup.  PMOVE_UPTO, if nonnull,
+   is set to point at the last moveable insn in such a case.  */
+
+bool
+can_move_insns_across (rtx from, rtx to, rtx across_from, rtx across_to,
+		       basic_block merge_bb, regset live, rtx live_point,
+		       regset other_branch_live, rtx *pmove_upto)
+{
+  rtx insn, next, max_to;
+  bitmap merge_set, merge_use, merge_live;
+  bitmap test_set, test_use;
+  unsigned i, fail = 0;
+  bitmap_iterator bi;
+  int memrefs_in_across = 0;
+  int mem_sets_in_across = 0;
+  bool trapping_insns_in_across = false;
+
+  if (pmove_upto != NULL)
+    *pmove_upto = NULL_RTX;
+
+  /* Find real bounds, ignoring debug insns.  */
+  while (!NONDEBUG_INSN_P (from) && from != to)
+    from = NEXT_INSN (from);
+  while (!NONDEBUG_INSN_P (to) && from != to)
+    to = PREV_INSN (to);
+
+  for (insn = across_to; ; insn = next)
+    {
+      if (NONDEBUG_INSN_P (insn))
+	{
+	  memrefs_in_across |= for_each_rtx (&PATTERN (insn), find_memory,
+					     NULL);
+	  note_stores (PATTERN (insn), find_memory_stores,
+		       &mem_sets_in_across);
+	  /* This is used just to find sets of the stack pointer.  */
+	  memrefs_in_across |= mem_sets_in_across;
+	  trapping_insns_in_across |= may_trap_p (PATTERN (insn));
+	}
+      next = PREV_INSN (insn);
+      if (insn == across_from)
+	break;
+    }
+
+  /* Collect:
+     MERGE_SET = set of registers set in MERGE_BB
+     MERGE_USE = set of registers used in MERGE_BB and live at its top
+     MERGE_LIVE = set of registers live at the point inside the MERGE
+     range that we've reached during scanning
+     TEST_SET = set of registers set between ACROSS_FROM and ACROSS_TO.
+     TEST_USE = set of registers used between ACROSS_FROM and ACROSS_TO,
+     and live before ACROSS_FROM.  */
+
+  merge_set = BITMAP_ALLOC (&reg_obstack);
+  merge_use = BITMAP_ALLOC (&reg_obstack);
+  merge_live = BITMAP_ALLOC (&reg_obstack);
+  test_set = BITMAP_ALLOC (&reg_obstack);
+  test_use = BITMAP_ALLOC (&reg_obstack);
+
+  /* Compute the set of registers set and used in the ACROSS range.  */
+  if (other_branch_live != NULL)
+    bitmap_copy (test_use, other_branch_live);
+  df_simulate_initialize_backwards (merge_bb, test_use);
+  for (insn = across_to; ; insn = next)
+    {
+      if (NONDEBUG_INSN_P (insn))
+	{
+	  df_simulate_find_defs (insn, test_set);
+	  df_simulate_defs (insn, test_use);
+	  df_simulate_uses (insn, test_use);
+	}
+      next = PREV_INSN (insn);
+      if (insn == across_from)
+	break;
+    }
+
+  /* Compute an upper bound for the amount of insns moved, by finding
+     the first insn in MERGE that sets a register in TEST_USE, or uses
+     a register in TEST_SET.  We also check for calls, trapping operations,
+     and memory references.  */
+  max_to = NULL_RTX;
+  for (insn = from; ; insn = next)
+    {
+      if (CALL_P (insn))
+	break;
+      if (NONDEBUG_INSN_P (insn))
+	{
+	  if (may_trap_p (PATTERN (insn))
+	      && (trapping_insns_in_across || other_branch_live != NULL))
+	    break;
+
+	  /* We cannot move memory stores past each other, or move memory
+	     reads past stores, at least not without tracking them and
+	     calling true_dependence on every pair.
+
+	     If there is no other branch and no memory references or
+	     sets in the ACROSS range, we can move memory references
+	     freely, even volatile ones.
+
+	     Otherwise, the rules are as follows: volatile memory
+	     references and stores can't be moved at all, and any type
+	     of memory reference can't be moved if there are volatile
+	     accesses or stores in the ACROSS range.  That leaves
+	     normal reads, which can be moved, as the trapping case is
+	     dealt with elsewhere.  */
+	  if (other_branch_live != NULL || memrefs_in_across != 0)
+	    {
+	      int mem_ref_flags = 0;
+	      int mem_set_flags = 0;
+	      note_stores (PATTERN (insn), find_memory_stores, &mem_set_flags);
+	      mem_ref_flags = for_each_rtx (&PATTERN (insn), find_memory,
+					    NULL);
+	      /* Catch sets of the stack pointer.  */
+	      mem_ref_flags |= mem_set_flags;
+
+	      if ((mem_ref_flags | mem_set_flags) & MEMREF_VOLATILE)
+		break;
+	      if ((memrefs_in_across & MEMREF_VOLATILE) && mem_ref_flags != 0)
+		break;
+	      if (mem_set_flags != 0
+		  || (mem_sets_in_across != 0 && mem_ref_flags != 0))
+		break;
+	    }
+	  df_simulate_find_uses (insn, merge_use);
+	  /* We're only interested in uses which use a value live at
+	     the top, not one previously set in this block.  */
+	  bitmap_and_compl_into (merge_use, merge_set);
+	  df_simulate_find_defs (insn, merge_set);
+	  if (bitmap_intersect_p (merge_set, test_use)
+	      || bitmap_intersect_p (merge_use, test_set))
+	    break;
+	  max_to = insn;
+	}
+      next = NEXT_INSN (insn);
+      if (insn == to)
+	break;
+    }
+  if (max_to != to)
+    fail = 1;
+
+  if (max_to == NULL_RTX || (fail && pmove_upto == NULL))
+    goto out;
+
+  /* Now, lower this upper bound by also taking into account that
+     a range of insns moved across ACROSS must not leave a register
+     live at the end that will be clobbered in ACROSS.  */
+  bitmap_copy (merge_live, live);
+  df_simulate_initialize_backwards (merge_bb, merge_live);
+  /* Scan and update life information until we reach the point we're
+     interested in.  */
+  for (insn = live_point; insn != max_to; insn = PREV_INSN (insn))
+    df_simulate_one_insn_backwards (merge_bb, insn, merge_live);
+
+  /* Now scan backwards to find a point where TEST_SET & LIVE == 0.
+     Insns in the MERGE range that set registers which are also set
+     in the ACROSS range may still be moved as long as we also move
+     later insns which use the results of the set, and make the
+     register dead again.  This is verified by the condition stated
+     above.  */
+  for (; ; insn = next)
+    {
+      if (NONDEBUG_INSN_P (insn))
+	{
+	  if (!bitmap_intersect_p (test_set, merge_live))
+	    {
+	      max_to = insn;
+	      break;
+	    }
+
+	  df_simulate_one_insn_backwards (merge_bb, insn, merge_live);
+	}
+      next = PREV_INSN (insn);
+      if (insn == from)
+	{
+	  fail = 1;
+	  goto out;
+	}
+    }
+
+  if (max_to != to)
+    fail = 1;
+
+  if (pmove_upto)
+    *pmove_upto = max_to;
+
+  /* For small register class machines, don't lengthen lifetimes of
+     hard registers before reload.  */
+  if (! reload_completed
+      && targetm.small_register_classes_for_mode_p (VOIDmode))
+    {
+      EXECUTE_IF_SET_IN_BITMAP (merge_set, 0, i, bi)
+	{
+	  if (i < FIRST_PSEUDO_REGISTER
+	      && ! fixed_regs[i]
+	      && ! global_regs[i])
+	    fail = 1;
+	}
+    }
+
+ out:
+  BITMAP_FREE (merge_set);
+  BITMAP_FREE (merge_use);
+  BITMAP_FREE (merge_live);
+  BITMAP_FREE (test_set);
+  BITMAP_FREE (test_use);
+
+  return !fail;
+}
 
 
 /*----------------------------------------------------------------------------
Index: Makefile.in
===================================================================
--- Makefile.in.orig
+++ Makefile.in
@@ -3154,7 +3154,7 @@  df-core.o : df-core.c $(CONFIG_H) $(SYST
 df-problems.o : df-problems.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(RTL_H) insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
-   $(TM_P_H) $(FLAGS_H) output.h $(EXCEPT_H) dce.h vecprim.h
+   $(TM_P_H) $(TARGET_H) $(FLAGS_H) output.h $(EXCEPT_H) dce.h vecprim.h
 df-scan.o : df-scan.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: testsuite/gcc.target/arm/headmerge-1.c
===================================================================
--- /dev/null
+++ testsuite/gcc.target/arm/headmerge-1.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile }  */
+/* { dg-options "-O2" }  */
+/* { dg-final { scan-assembler-times "#120" 1 } } */
+
+extern void foo1 (int);
+extern void foo2 (int);
+
+void t (int x, int y)
+{
+  if (y < 5)
+    foo1 (120);
+  else
+    foo2 (120);
+}