diff mbox

[stage1] Move insns without introducing new temporaries in loop2_invariant

Message ID 000101d0572a$2de1f390$89a5dab0$@arm.com
State New
Headers show

Commit Message

Thomas Preud'homme March 5, 2015, 9:53 a.m. UTC
Note: this is stage1 material.

Currently loop2_invariant pass hoist instructions out of loop by creating a new temporary for the destination register of that instruction and leaving there a mov from new temporary to old register as shown below:

loop header
start of loop body
//stuff
(set (reg 128) (const_int 0))
//other stuff
end of loop body

becomes:

(set (reg 129) (const_int 0))
loop header
start of loop body
//stuff
(set (reg 128) (reg 128))
//other stuff
end of loop body

This is one of the errors that led to a useless ldr ending up inside a loop (PR64616). This patch fix this specific bit (some other bit was fixed in [1]) by simply moving the instruction if it's known to be safe. This is decided by looking at all the uses of the register set in the instruction and checking that (i) they were all dominated by the instruction and (ii) there is no other def in the loop that could end up reaching one of the use.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00933.html

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2015-02-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        * dominance.c (nearest_common_dominator_for_set): Fix A_Dominated_by_B
        code in comment.
        * loop-invariant.c (can_move_invariant_reg): New.
        (move_invariant_reg): Call above new function to decide whether
        instruction can just be moved, skipping creation of temporary
        register.

*** gcc/testsuite/ChangeLog ***

2015-02-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        * gcc.dg/loop-7.c: Run on all targets and check for loop2_invariant
        being able to move instructions without introducing new temporary
        register.
        * gcc.dg/loop-8.c: New test.


Testsuite was run in QEMU when compiled by an arm-none-eabi GCC cross-compiler targeting Cortex-M3 and a bootstrapped x86_64 native GCC without any regression.

Is this ok for stage1?

Best regards,

Thomas

Comments

Richard Biener March 5, 2015, 11:12 a.m. UTC | #1
On Thu, Mar 5, 2015 at 10:53 AM, Thomas Preud'homme
<thomas.preudhomme@arm.com> wrote:
> Note: this is stage1 material.
>
> Currently loop2_invariant pass hoist instructions out of loop by creating a new temporary for the destination register of that instruction and leaving there a mov from new temporary to old register as shown below:
>
> loop header
> start of loop body
> //stuff
> (set (reg 128) (const_int 0))
> //other stuff
> end of loop body
>
> becomes:
>
> (set (reg 129) (const_int 0))
> loop header
> start of loop body
> //stuff
> (set (reg 128) (reg 128))
> //other stuff
> end of loop body
>
> This is one of the errors that led to a useless ldr ending up inside a loop (PR64616). This patch fix this specific bit (some other bit was fixed in [1]) by simply moving the instruction if it's known to be safe. This is decided by looking at all the uses of the register set in the instruction and checking that (i) they were all dominated by the instruction and (ii) there is no other def in the loop that could end up reaching one of the use.

Why doesn't copy-propagation clean this up?  It's run after loop2.

Richard.

> [1] https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00933.html
>
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2015-02-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         * dominance.c (nearest_common_dominator_for_set): Fix A_Dominated_by_B
>         code in comment.
>         * loop-invariant.c (can_move_invariant_reg): New.
>         (move_invariant_reg): Call above new function to decide whether
>         instruction can just be moved, skipping creation of temporary
>         register.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2015-02-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         * gcc.dg/loop-7.c: Run on all targets and check for loop2_invariant
>         being able to move instructions without introducing new temporary
>         register.
>         * gcc.dg/loop-8.c: New test.
>
> diff --git a/gcc/dominance.c b/gcc/dominance.c
> index 33d4ae4..09c8c90 100644
> --- a/gcc/dominance.c
> +++ b/gcc/dominance.c
> @@ -982,7 +982,7 @@ nearest_common_dominator_for_set (enum cdi_direction dir, bitmap blocks)
>
>     A_Dominated_by_B (node A, node B)
>     {
> -     return DFS_Number_In(A) >= DFS_Number_In(A)
> +     return DFS_Number_In(A) >= DFS_Number_In(B)
>              && DFS_Number_Out (A) <= DFS_Number_Out(B);
>     }  */
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index f79b497..ab2a45c 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1512,6 +1512,99 @@ replace_uses (struct invariant *inv, rtx reg, bool in_group)
>    return 1;
>  }
>
> +/* Whether invariant INV setting REG can be moved out of LOOP, at the end of
> +   the block preceding its header.  */
> +
> +static bool
> +can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx reg)
> +{
> +  df_ref def, use;
> +  bool ret = false;
> +  unsigned int i, dest_regno, defs_in_loop_count = 0;
> +  rtx_insn *insn = inv->insn;
> +  bitmap may_exit, has_exit, always_executed;
> +  basic_block *body, bb = BLOCK_FOR_INSN (inv->insn);
> +
> +  /* We ignore hard register and memory access for cost and complexity reasons.
> +     Hard register are few at this stage and expensive to consider as they
> +     require building a separate data flow.  Memory access would require using
> +     df_simulate_* and can_move_insns_across functions and is more complex.  */
> +  if (!REG_P (reg) || HARD_REGISTER_P (reg))
> +    return false;
> +
> +  /* Check whether the set is always executed.  We could omit this condition if
> +     we know that the register is unused outside of the loop, but it does not
> +     seem worth finding out.  */
> +  may_exit = BITMAP_ALLOC (NULL);
> +  has_exit = BITMAP_ALLOC (NULL);
> +  always_executed = BITMAP_ALLOC (NULL);
> +  body = get_loop_body_in_dom_order (loop);
> +  find_exits (loop, body, may_exit, has_exit);
> +  compute_always_reached (loop, body, has_exit, always_executed);
> +  /* Find bit position for basic block bb.  */
> +  for (i = 0; i < loop->num_nodes && body[i] != bb; i++);
> +  if (!bitmap_bit_p (always_executed, i))
> +    goto cleanup;
> +
> +  /* Check that all uses reached by the def in insn would still be reached
> +     it.  */
> +  dest_regno = REGNO (reg);
> +  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
> +    {
> +      rtx ref;
> +      basic_block use_bb;
> +
> +      ref = DF_REF_INSN (use);
> +      use_bb = BLOCK_FOR_INSN (ref);
> +
> +      /* Ignore instruction considered for moving.  */
> +      if (ref == insn)
> +       continue;
> +
> +      /* Don't consider uses outside loop.  */
> +      if (!flow_bb_inside_loop_p (loop, use_bb))
> +       continue;
> +
> +      /* Don't move if a use is not dominated by def in insn.  */
> +      if (use_bb == bb && DF_INSN_LUID (insn) > DF_INSN_LUID (ref))
> +       goto cleanup;
> +      if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb))
> +       goto cleanup;
> +
> +      /* Check for other defs.  Any other def in the loop might reach a use
> +        currently reached by the def in insn.  */
> +      if (!defs_in_loop_count)
> +       {
> +         for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = DF_REF_NEXT_REG (def))
> +           {
> +             basic_block def_bb = BLOCK_FOR_INSN (DF_REF_INSN (def));
> +
> +             /* Defs in exit block cannot reach a use they weren't
> +                already.  */
> +             if (single_succ_p (def_bb))
> +               {
> +                 basic_block def_bb_succ;
> +
> +                 def_bb_succ = single_succ (def_bb);
> +                 if (!flow_bb_inside_loop_p (loop, def_bb_succ))
> +                   continue;
> +               }
> +
> +             if (++defs_in_loop_count > 1)
> +               goto cleanup;
> +           }
> +       }
> +    }
> +  ret = true;
> +
> +cleanup:
> +  BITMAP_FREE (always_executed);
> +  BITMAP_FREE (may_exit);
> +  BITMAP_FREE (has_exit);
> +  free (body);
> +  return ret;
> +}
> +
>  /* Move invariant INVNO out of the LOOP.  Returns true if this succeeds, false
>     otherwise.  */
>
> @@ -1545,11 +1638,8 @@ move_invariant_reg (struct loop *loop, unsigned invno)
>             }
>         }
>
> -      /* Move the set out of the loop.  If the set is always executed (we could
> -        omit this condition if we know that the register is unused outside of
> -        the loop, but it does not seem worth finding out) and it has no uses
> -        that would not be dominated by it, we may just move it (TODO).
> -        Otherwise we need to create a temporary register.  */
> +      /* If possible, just move the set out of the loop.  Otherwise, we
> +        need to create a temporary register.  */
>        set = single_set (inv->insn);
>        reg = dest = SET_DEST (set);
>        if (GET_CODE (reg) == SUBREG)
> @@ -1557,19 +1647,25 @@ move_invariant_reg (struct loop *loop, unsigned invno)
>        if (REG_P (reg))
>         regno = REGNO (reg);
>
> -      reg = gen_reg_rtx_and_attrs (dest);
> +      if (!can_move_invariant_reg (loop, inv, reg))
> +       {
> +         reg = gen_reg_rtx_and_attrs (dest);
>
> -      /* Try replacing the destination by a new pseudoregister.  */
> -      validate_change (inv->insn, &SET_DEST (set), reg, true);
> +         /* Try replacing the destination by a new pseudoregister.  */
> +         validate_change (inv->insn, &SET_DEST (set), reg, true);
>
> -      /* As well as all the dominated uses.  */
> -      replace_uses (inv, reg, true);
> +         /* As well as all the dominated uses.  */
> +         replace_uses (inv, reg, true);
>
> -      /* And validate all the changes.  */
> -      if (!apply_change_group ())
> -       goto fail;
> +         /* And validate all the changes.  */
> +         if (!apply_change_group ())
> +           goto fail;
>
> -      emit_insn_after (gen_move_insn (dest, reg), inv->insn);
> +         emit_insn_after (gen_move_insn (dest, reg), inv->insn);
> +       }
> +      else if (dump_file)
> +       fprintf (dump_file, "Invariant %d moved without introducing a new "
> +                           "temporary register\n", invno);
>        reorder_insns (inv->insn, inv->insn, BB_END (preheader));
>
>        /* If there is a REG_EQUAL note on the insn we just moved, and the
> diff --git a/gcc/testsuite/gcc.dg/loop-7.c b/gcc/testsuite/gcc.dg/loop-7.c
> index 0457add..eac645c 100644
> --- a/gcc/testsuite/gcc.dg/loop-7.c
> +++ b/gcc/testsuite/gcc.dg/loop-7.c
> @@ -1,6 +1,6 @@
>  /* PR rtl-optimization/31360  */
>
> -/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-do compile { target } } */
>  /* { dg-options "-O1 -fdump-rtl-loop2_invariant" } */
>
>  void f(int *a)
> @@ -12,5 +12,6 @@ void f(int *a)
>
>  /* Load of 0 is moved out of the loop.  */
>  /* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */
> +/* { dg-final { scan-rtl-dump-times "without introducing a new temporary register" 1 "loop2_invariant" } } */
>  /* { dg-final { cleanup-rtl-dump "loop2_invariant" } } */
>
> diff --git a/gcc/testsuite/gcc.dg/loop-8.c b/gcc/testsuite/gcc.dg/loop-8.c
> new file mode 100644
> index 0000000..592e54c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/loop-8.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-rtl-loop2_invariant" } */
> +
> +void
> +f (int *a, int *b)
> +{
> +  int i;
> +
> +  for (i = 0; i < 100; i++)
> +    {
> +      int d = 42;
> +
> +      a[i] = d;
> +      if (i % 2)
> +       d = i;
> +      b[i] = d;
> +    }
> +}
> +
> +/* Load of 42 is moved out of the loop, introducing a new pseudo register.  */
> +/* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */
> +/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" "loop2_invariant" } } */
> +/* { dg-final { cleanup-rtl-dump "loop2_invariant" } } */
> +
>
> Testsuite was run in QEMU when compiled by an arm-none-eabi GCC cross-compiler targeting Cortex-M3 and a bootstrapped x86_64 native GCC without any regression.
>
> Is this ok for stage1?
>
> Best regards,
>
> Thomas
>
>
>
Thomas Preud'homme March 6, 2015, 8:59 a.m. UTC | #2
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Thursday, March 05, 2015 7:12 PM
> >
> > loop header
> > start of loop body
> > //stuff
> > (set (reg 128) (const_int 0))
> > //other stuff
> > end of loop body
> >
> > becomes:
> >
> > (set (reg 129) (const_int 0))
> > loop header
> > start of loop body
> > //stuff
> > (set (reg 128) (reg 128))
> > //other stuff
> > end of loop body
> >
> 
> Why doesn't copy-propagation clean this up?  It's run after loop2.

Actually cprop3 is what makes the situation worse in this case as it
will copy the constant that is set outside the loop in the mov that is in
the loop. In the case or PR64616 the constant is a symbol_ref which
makes it a memory access so it propagates the memory access in the
loop, making the load executed many times.

Note that as I said in the intro this bug is also solved by [1] which is
the first thing that goes wrong for this example. That being said,
loop invariant pass ought to simply move instructions if it can safely
do so.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00933.html

Best regards,

Thomas
Jiong Wang March 6, 2015, 12:09 p.m. UTC | #3
On 05/03/15 09:53, Thomas Preud'homme wrote:
> *** gcc/testsuite/ChangeLog ***
>
> 2015-02-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>          * gcc.dg/loop-7.c: Run on all targets and check for loop2_invariant
>          being able to move instructions without introducing new temporary
Thomas,

   Can you please confirm this relax on all target will not fail on 
AArch64? It do fails on my quick test.
Thomas Preud'homme March 9, 2015, 10:45 a.m. UTC | #4
> From: Jiong Wang [mailto:jiong.wang@arm.com]
> Sent: Friday, March 06, 2015 8:10 PM
> 
> On 05/03/15 09:53, Thomas Preud'homme wrote:
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2015-02-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> >
> >          * gcc.dg/loop-7.c: Run on all targets and check for loop2_invariant
> >          being able to move instructions without introducing new
> temporary
> Thomas,
> 
>    Can you please confirm this relax on all target will not fail on
> AArch64? It do fails on my quick test.

Indeed, I made a very naïve assumption here and should have tested on a wider
range of targets. I'll rework the testcases associated with this patch.

Thanks for catching it Jiong.

Best regards,

Thomas
Steven Bosscher March 9, 2015, 11:48 a.m. UTC | #5
On Thu, Mar 5, 2015 at 10:53 AM, Thomas Preud'homme wrote:
> diff --git a/gcc/dominance.c b/gcc/dominance.c
> index 33d4ae4..09c8c90 100644
> --- a/gcc/dominance.c
> +++ b/gcc/dominance.c
> @@ -982,7 +982,7 @@ nearest_common_dominator_for_set (enum cdi_direction dir, bitmap blocks)
>
>     A_Dominated_by_B (node A, node B)
>     {
> -     return DFS_Number_In(A) >= DFS_Number_In(A)
> +     return DFS_Number_In(A) >= DFS_Number_In(B)
>              && DFS_Number_Out (A) <= DFS_Number_Out(B);
>     }  */

This hunk is obvious enough ;-)


> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index f79b497..ab2a45c 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
...
> +  /* Check whether the set is always executed.  We could omit this condition if
> +     we know that the register is unused outside of the loop, but it does not
> +     seem worth finding out.  */
> +  may_exit = BITMAP_ALLOC (NULL);
> +  has_exit = BITMAP_ALLOC (NULL);
> +  always_executed = BITMAP_ALLOC (NULL);
> +  body = get_loop_body_in_dom_order (loop);
> +  find_exits (loop, body, may_exit, has_exit);
> +  compute_always_reached (loop, body, has_exit, always_executed);
> +  /* Find bit position for basic block bb.  */
> +  for (i = 0; i < loop->num_nodes && body[i] != bb; i++);
> +  if (!bitmap_bit_p (always_executed, i))
> +    goto cleanup;

It looks like this would run for all candidate loop invariants, right?

If so, you're creating run time of O(n_invariants*n_bbs_in_loop), a
potential compile time hog for large loops.

But why compute this at all? Perhaps I'm missing something, but you
already have inv->always_executed available, no?




> +  /* Check that all uses reached by the def in insn would still be reached
> +     it.  */
> +  dest_regno = REGNO (reg);
> +  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
> +    {
> +      rtx ref;

Would be nice if we can start using rtx_insn for new code. You do so
further up, I suggest you continue that good citizenship here :-)


> +      basic_block use_bb;
> +
> +      ref = DF_REF_INSN (use);
> +      use_bb = BLOCK_FOR_INSN (ref);

You can use DF_REF_BB.



> +      /* Ignore instruction considered for moving.  */
> +      if (ref == insn)
> +       continue;
> +
> +      /* Don't consider uses outside loop.  */
> +      if (!flow_bb_inside_loop_p (loop, use_bb))
> +       continue;
> +
> +      /* Don't move if a use is not dominated by def in insn.  */
> +      if (use_bb == bb && DF_INSN_LUID (insn) > DF_INSN_LUID (ref))
> +       goto cleanup;

You're safer with ">=" even if you've already checked ref==insn.


Ciao!
Steven
Thomas Preud'homme March 10, 2015, 7:35 a.m. UTC | #6
> From: Steven Bosscher [mailto:stevenb.gcc@gmail.com]
> Sent: Monday, March 09, 2015 7:48 PM
> To: Thomas Preud'homme
> Cc: GCC Patches; Eric Botcazou
> Subject: Re: [PATCH, stage1] Move insns without introducing new
> temporaries in loop2_invariant
> 
> On Thu, Mar 5, 2015 at 10:53 AM, Thomas Preud'homme wrote:
> > diff --git a/gcc/dominance.c b/gcc/dominance.c
> > index 33d4ae4..09c8c90 100644
> > --- a/gcc/dominance.c
> > +++ b/gcc/dominance.c
> > @@ -982,7 +982,7 @@ nearest_common_dominator_for_set (enum
> cdi_direction dir, bitmap blocks)
> >
> >     A_Dominated_by_B (node A, node B)
> >     {
> > -     return DFS_Number_In(A) >= DFS_Number_In(A)
> > +     return DFS_Number_In(A) >= DFS_Number_In(B)
> >              && DFS_Number_Out (A) <= DFS_Number_Out(B);
> >     }  */
> 
> This hunk is obvious enough ;-)

Thus committed.

Best regards,

Thomas
diff mbox

Patch

diff --git a/gcc/dominance.c b/gcc/dominance.c
index 33d4ae4..09c8c90 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -982,7 +982,7 @@  nearest_common_dominator_for_set (enum cdi_direction dir, bitmap blocks)
 
    A_Dominated_by_B (node A, node B)
    {
-     return DFS_Number_In(A) >= DFS_Number_In(A)
+     return DFS_Number_In(A) >= DFS_Number_In(B)
             && DFS_Number_Out (A) <= DFS_Number_Out(B);
    }  */
 
diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index f79b497..ab2a45c 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1512,6 +1512,99 @@  replace_uses (struct invariant *inv, rtx reg, bool in_group)
   return 1;
 }
 
+/* Whether invariant INV setting REG can be moved out of LOOP, at the end of
+   the block preceding its header.  */
+
+static bool
+can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx reg)
+{
+  df_ref def, use;
+  bool ret = false;
+  unsigned int i, dest_regno, defs_in_loop_count = 0;
+  rtx_insn *insn = inv->insn;
+  bitmap may_exit, has_exit, always_executed;
+  basic_block *body, bb = BLOCK_FOR_INSN (inv->insn);
+
+  /* We ignore hard register and memory access for cost and complexity reasons.
+     Hard register are few at this stage and expensive to consider as they
+     require building a separate data flow.  Memory access would require using
+     df_simulate_* and can_move_insns_across functions and is more complex.  */
+  if (!REG_P (reg) || HARD_REGISTER_P (reg))
+    return false;
+
+  /* Check whether the set is always executed.  We could omit this condition if
+     we know that the register is unused outside of the loop, but it does not
+     seem worth finding out.  */
+  may_exit = BITMAP_ALLOC (NULL);
+  has_exit = BITMAP_ALLOC (NULL);
+  always_executed = BITMAP_ALLOC (NULL);
+  body = get_loop_body_in_dom_order (loop);
+  find_exits (loop, body, may_exit, has_exit);
+  compute_always_reached (loop, body, has_exit, always_executed);
+  /* Find bit position for basic block bb.  */
+  for (i = 0; i < loop->num_nodes && body[i] != bb; i++);
+  if (!bitmap_bit_p (always_executed, i))
+    goto cleanup;
+
+  /* Check that all uses reached by the def in insn would still be reached
+     it.  */
+  dest_regno = REGNO (reg);
+  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
+    {
+      rtx ref;
+      basic_block use_bb;
+
+      ref = DF_REF_INSN (use);
+      use_bb = BLOCK_FOR_INSN (ref);
+
+      /* Ignore instruction considered for moving.  */
+      if (ref == insn)
+	continue;
+
+      /* Don't consider uses outside loop.  */
+      if (!flow_bb_inside_loop_p (loop, use_bb))
+	continue;
+
+      /* Don't move if a use is not dominated by def in insn.  */
+      if (use_bb == bb && DF_INSN_LUID (insn) > DF_INSN_LUID (ref))
+	goto cleanup;
+      if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb))
+	goto cleanup;
+
+      /* Check for other defs.  Any other def in the loop might reach a use
+	 currently reached by the def in insn.  */
+      if (!defs_in_loop_count)
+	{
+	  for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = DF_REF_NEXT_REG (def))
+	    {
+	      basic_block def_bb = BLOCK_FOR_INSN (DF_REF_INSN (def));
+
+	      /* Defs in exit block cannot reach a use they weren't
+		 already.  */
+	      if (single_succ_p (def_bb))
+		{
+		  basic_block def_bb_succ;
+
+		  def_bb_succ = single_succ (def_bb);
+		  if (!flow_bb_inside_loop_p (loop, def_bb_succ))
+		    continue;
+		}
+
+	      if (++defs_in_loop_count > 1)
+		goto cleanup;
+	    }
+	}
+    }
+  ret = true;
+
+cleanup:
+  BITMAP_FREE (always_executed);
+  BITMAP_FREE (may_exit);
+  BITMAP_FREE (has_exit);
+  free (body);
+  return ret;
+}
+
 /* Move invariant INVNO out of the LOOP.  Returns true if this succeeds, false
    otherwise.  */
 
@@ -1545,11 +1638,8 @@  move_invariant_reg (struct loop *loop, unsigned invno)
 	    }
 	}
 
-      /* Move the set out of the loop.  If the set is always executed (we could
-	 omit this condition if we know that the register is unused outside of
-	 the loop, but it does not seem worth finding out) and it has no uses
-	 that would not be dominated by it, we may just move it (TODO).
-	 Otherwise we need to create a temporary register.  */
+      /* If possible, just move the set out of the loop.  Otherwise, we
+	 need to create a temporary register.  */
       set = single_set (inv->insn);
       reg = dest = SET_DEST (set);
       if (GET_CODE (reg) == SUBREG)
@@ -1557,19 +1647,25 @@  move_invariant_reg (struct loop *loop, unsigned invno)
       if (REG_P (reg))
 	regno = REGNO (reg);
 
-      reg = gen_reg_rtx_and_attrs (dest);
+      if (!can_move_invariant_reg (loop, inv, reg))
+	{
+	  reg = gen_reg_rtx_and_attrs (dest);
 
-      /* Try replacing the destination by a new pseudoregister.  */
-      validate_change (inv->insn, &SET_DEST (set), reg, true);
+	  /* Try replacing the destination by a new pseudoregister.  */
+	  validate_change (inv->insn, &SET_DEST (set), reg, true);
 
-      /* As well as all the dominated uses.  */
-      replace_uses (inv, reg, true);
+	  /* As well as all the dominated uses.  */
+	  replace_uses (inv, reg, true);
 
-      /* And validate all the changes.  */
-      if (!apply_change_group ())
-	goto fail;
+	  /* And validate all the changes.  */
+	  if (!apply_change_group ())
+	    goto fail;
 
-      emit_insn_after (gen_move_insn (dest, reg), inv->insn);
+	  emit_insn_after (gen_move_insn (dest, reg), inv->insn);
+	}
+      else if (dump_file)
+	fprintf (dump_file, "Invariant %d moved without introducing a new "
+			    "temporary register\n", invno);
       reorder_insns (inv->insn, inv->insn, BB_END (preheader));
 
       /* If there is a REG_EQUAL note on the insn we just moved, and the
diff --git a/gcc/testsuite/gcc.dg/loop-7.c b/gcc/testsuite/gcc.dg/loop-7.c
index 0457add..eac645c 100644
--- a/gcc/testsuite/gcc.dg/loop-7.c
+++ b/gcc/testsuite/gcc.dg/loop-7.c
@@ -1,6 +1,6 @@ 
 /* PR rtl-optimization/31360  */
 
-/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-do compile { target } } */
 /* { dg-options "-O1 -fdump-rtl-loop2_invariant" } */
 
 void f(int *a)
@@ -12,5 +12,6 @@  void f(int *a)
 
 /* Load of 0 is moved out of the loop.  */
 /* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */
+/* { dg-final { scan-rtl-dump-times "without introducing a new temporary register" 1 "loop2_invariant" } } */
 /* { dg-final { cleanup-rtl-dump "loop2_invariant" } } */
 
diff --git a/gcc/testsuite/gcc.dg/loop-8.c b/gcc/testsuite/gcc.dg/loop-8.c
new file mode 100644
index 0000000..592e54c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-8.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-loop2_invariant" } */
+
+void
+f (int *a, int *b)
+{
+  int i;
+
+  for (i = 0; i < 100; i++)
+    {
+      int d = 42;
+
+      a[i] = d;
+      if (i % 2)
+	d = i;
+      b[i] = d;
+    }
+}
+
+/* Load of 42 is moved out of the loop, introducing a new pseudo register.  */
+/* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */
+/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" "loop2_invariant" } } */
+/* { dg-final { cleanup-rtl-dump "loop2_invariant" } } */
+