diff mbox

[PR,10474] Split live-ranges of function arguments to help shrink-wrapping

Message ID 20131025151905.GA29905@virgil.suse
State New
Headers show

Commit Message

Martin Jambor Oct. 25, 2013, 3:19 p.m. UTC
Hi,

On Thu, Oct 24, 2013 at 01:02:51AM +0200, Steven Bosscher wrote:
> On Wed, Oct 23, 2013 at 6:46 PM, Martin Jambor wrote:
> 
> >  /* Perform the second half of the transformation started in
> > @@ -4522,7 +4704,15 @@ ira (FILE *f)
> >       allocation because of -O0 usage or because the function is too
> >       big.  */
> >    if (ira_conflicts_p)
> > -    find_moveable_pseudos ();
> > +    {
> > +      df_analyze ();
> > +      calculate_dominance_info (CDI_DOMINATORS);
> > +
> > +      find_moveable_pseudos ();
> > +      split_live_ranges_for_shrink_wrap ();
> > +
> > +      free_dominance_info (CDI_DOMINATORS);
> > +    }
> >
> 
> You probably want to add another df_analyze if
> split_live_ranges_for_shrink_wrap makes code transformations. AFAIU
> find_moveable_pseudos doesn't change global liveness but your
> transformation might. IRA/LRA need up-to-date DF_LR results to compute
> allocno live ranges.
> 

OK, I have changed the patch to fo that (it is below, still bootstraps
and passes tests on x86_64 fine).  However, I have noticed that the
corresponding part in function ira now looks like:

  /* ... */
  if (delete_trivially_dead_insns (get_insns (), max_reg_num ()))
    df_analyze ();

  /* It is not worth to do such improvement when we use a simple
     allocation because of -O0 usage or because the function is too
     big.  */
  if (ira_conflicts_p)
    {
      df_analyze ();
      calculate_dominance_info (CDI_DOMINATORS);

      find_moveable_pseudos ();
      if (split_live_ranges_for_shrink_wrap ())
	df_analyze ();

      free_dominance_info (CDI_DOMINATORS);
    }
  /* ... */

So, that left me wondering whether the first call to df_analyze is
actually necessary, or whether perhaps the data are actually already
up to date.  What do you think?

Thanks for all the feedback,

Martin


2013-10-23  Martin Jambor  <mjambor@suse.cz>

	PR rtl-optimization/10474
	* ira.c (find_moveable_pseudos): Do not calculate dominance info
	nor df analysis.
	(interesting_dest_for_shprep): New function.
	(split_live_ranges_for_shrink_wrap): Likewise.
	(ira): Calculate dominance info and df analysis. Call
	split_live_ranges_for_shrink_wrap.

testsuite/
	* gcc.dg/pr10474.c: New testcase.
	* gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
	* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.

Comments

Vladimir Makarov Oct. 29, 2013, 4:01 p.m. UTC | #1
On 10/25/2013 11:19 AM, Martin Jambor wrote:
> Hi,
>
> On Thu, Oct 24, 2013 at 01:02:51AM +0200, Steven Bosscher wrote:
>> On Wed, Oct 23, 2013 at 6:46 PM, Martin Jambor wrote:
>>
>>>  /* Perform the second half of the transformation started in
>>> @@ -4522,7 +4704,15 @@ ira (FILE *f)
>>>       allocation because of -O0 usage or because the function is too
>>>       big.  */
>>>    if (ira_conflicts_p)
>>> -    find_moveable_pseudos ();
>>> +    {
>>> +      df_analyze ();
>>> +      calculate_dominance_info (CDI_DOMINATORS);
>>> +
>>> +      find_moveable_pseudos ();
>>> +      split_live_ranges_for_shrink_wrap ();
>>> +
>>> +      free_dominance_info (CDI_DOMINATORS);
>>> +    }
>>>
>> You probably want to add another df_analyze if
>> split_live_ranges_for_shrink_wrap makes code transformations. AFAIU
>> find_moveable_pseudos doesn't change global liveness but your
>> transformation might. IRA/LRA need up-to-date DF_LR results to compute
>> allocno live ranges.
>>
> OK, I have changed the patch to fo that (it is below, still bootstraps
> and passes tests on x86_64 fine).  However, I have noticed that the
> corresponding part in function ira now looks like:
>
>   /* ... */
>   if (delete_trivially_dead_insns (get_insns (), max_reg_num ()))
>     df_analyze ();
>
>   /* It is not worth to do such improvement when we use a simple
>      allocation because of -O0 usage or because the function is too
>      big.  */
>   if (ira_conflicts_p)
>     {
>       df_analyze ();
>       calculate_dominance_info (CDI_DOMINATORS);
>
>       find_moveable_pseudos ();
>       if (split_live_ranges_for_shrink_wrap ())
> 	df_analyze ();
>
>       free_dominance_info (CDI_DOMINATORS);
>     }
>   /* ... */
>
> So, that left me wondering whether the first call to df_analyze is
> actually necessary, or whether perhaps the data are actually already
> up to date.  What do you think?
I guess it needs some investigation.  delete_trivially_dead_insns code
was taken from the old RA.  First of all, I don't know how many insns
are really trivially dead before RA in optimization and non-optimization
mode.  May be the code can be removed at all.  I'll put it on my todo list.
The patch is ok to commit.  Thanks for working on this, Martin.
>
> 2013-10-23  Martin Jambor  <mjambor@suse.cz>
>
> 	PR rtl-optimization/10474
> 	* ira.c (find_moveable_pseudos): Do not calculate dominance info
> 	nor df analysis.
> 	(interesting_dest_for_shprep): New function.
> 	(split_live_ranges_for_shrink_wrap): Likewise.
> 	(ira): Calculate dominance info and df analysis. Call
> 	split_live_ranges_for_shrink_wrap.
>
> testsuite/
> 	* gcc.dg/pr10474.c: New testcase.
> 	* gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
> 	* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
>
>
Jakub Jelinek Oct. 30, 2013, 11:16 p.m. UTC | #2
On Fri, Oct 25, 2013 at 05:19:06PM +0200, Martin Jambor wrote:
> 2013-10-23  Martin Jambor  <mjambor@suse.cz>
> 
> 	PR rtl-optimization/10474
> 	* ira.c (find_moveable_pseudos): Do not calculate dominance info
> 	nor df analysis.
> 	(interesting_dest_for_shprep): New function.
> 	(split_live_ranges_for_shrink_wrap): Likewise.
> 	(ira): Calculate dominance info and df analysis. Call
> 	split_live_ranges_for_shrink_wrap.
> 
> testsuite/
> 	* gcc.dg/pr10474.c: New testcase.
> 	* gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
> 	* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.

Unfortunately this patch breaks i686-linux bootstrap,
in r204204 compare passes, while in r204205 I'm getting .bad_compare
gcc/fortran/module.o differs                                                                                                                       
gcc/ipa.o differs                                                                                                                                  
gcc/go/gogo.o differs                                                                                                                              
gcc/go/statements.o differs                                                                                                                        
Most likely combine.o is miscompiled, but haven't verified that yet.

The way I'm configuring this on x86_64-linux is:
mkdir ~/hbin
cat > ~/hbin/as <<\EOF2
#!/bin/sh
exec /usr/bin/as --32 "$@"
EOF2
cat > ~/hbin/g++ <<\EOF2
#!/bin/sh
exec /usr/bin/g++ -m32 "$@"
EOF2
cat > ~/hbin/gcc <<\EOF2
#!/bin/sh
exec /usr/bin/gcc -m32 "$@"
EOF2
cat > ~/hbin/ld <<\EOF2
#!/bin/sh
case "$*" in
  --version) cat <<\EOF
GNU ld version 2.20.52.0.1-10.fc17 20100131
Copyright 2012 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later
version.
This program has absolutely no warranty.
EOF
  exit 0;;
esac
exec /usr/bin/ld -m elf_i386 -L /usr/lib/ "$@"
EOF2
chmod 755 ~/hbin/*
PATH=~/hbin:$PATH i386 ../configure --enable-languages=all,obj-c++,lto,go --enable-checking=yes,rtl
PATH=~/hbin:$PATH i386 make -j48

	Jakub
diff mbox

Patch

diff --git a/gcc/ira.c b/gcc/ira.c
index 203fbff..532db31 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3989,9 +3989,6 @@  find_moveable_pseudos (void)
   pseudo_replaced_reg.release ();
   pseudo_replaced_reg.safe_grow_cleared (max_regs);
 
-  df_analyze ();
-  calculate_dominance_info (CDI_DOMINATORS);
-
   i = 0;
   bitmap_initialize (&live, 0);
   bitmap_initialize (&used, 0);
@@ -4311,7 +4308,196 @@  find_moveable_pseudos (void)
   regstat_free_ri ();
   regstat_init_n_sets_and_refs ();
   regstat_compute_ri ();
-  free_dominance_info (CDI_DOMINATORS);
+}
+
+
+/* If insn is interesting for parameter range-splitting shring-wrapping
+   preparation, i.e. it is a single set from a hard register to a pseudo, which
+   is live at CALL_DOM, return the destination.  Otherwise return NULL.  */
+
+static rtx
+interesting_dest_for_shprep (rtx insn, basic_block call_dom)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return NULL;
+  rtx src = SET_SRC (set);
+  rtx dest = SET_DEST (set);
+  if (!REG_P (src) || !HARD_REGISTER_P (src)
+      || !REG_P (dest) || HARD_REGISTER_P (dest)
+      || (call_dom && !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest))))
+    return NULL;
+  return dest;
+}
+
+/* Split live ranges of pseudos that are loaded from hard registers in the
+   first BB in a BB that dominates all non-sibling call if such a BB can be
+   found and is not in a loop.  Return true if the function has made any
+   changes.  */
+
+static bool
+split_live_ranges_for_shrink_wrap (void)
+{
+  basic_block bb, call_dom = NULL;
+  basic_block first = single_succ (ENTRY_BLOCK_PTR);
+  rtx insn, last_interesting_insn = NULL;
+  bitmap_head need_new, reachable;
+  vec<basic_block> queue;
+
+  if (!flag_shrink_wrap)
+    return false;
+
+  bitmap_initialize (&need_new, 0);
+  bitmap_initialize (&reachable, 0);
+  queue.create (n_basic_blocks);
+
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+      if (CALL_P (insn) && !SIBLING_CALL_P (insn))
+	{
+	  if (bb == first)
+	    {
+	      bitmap_clear (&need_new);
+	      bitmap_clear (&reachable);
+	      queue.release ();
+	      return false;
+	    }
+
+	  bitmap_set_bit (&need_new, bb->index);
+	  bitmap_set_bit (&reachable, bb->index);
+	  queue.quick_push (bb);
+	  break;
+	}
+
+  if (queue.is_empty ())
+    {
+      bitmap_clear (&need_new);
+      bitmap_clear (&reachable);
+      queue.release ();
+      return false;
+    }
+
+  while (!queue.is_empty ())
+    {
+      edge e;
+      edge_iterator ei;
+
+      bb = queue.pop ();
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	if (e->dest != EXIT_BLOCK_PTR
+	    && bitmap_set_bit (&reachable, e->dest->index))
+	  queue.quick_push (e->dest);
+    }
+  queue.release ();
+
+  FOR_BB_INSNS (first, insn)
+    {
+      rtx dest = interesting_dest_for_shprep (insn, NULL);
+      if (!dest)
+	continue;
+
+      if (DF_REG_DEF_COUNT (REGNO (dest)) > 1)
+	{
+	  bitmap_clear (&need_new);
+	  bitmap_clear (&reachable);
+	  return false;
+	}
+
+      for (df_ref use = DF_REG_USE_CHAIN (REGNO(dest));
+	   use;
+	   use = DF_REF_NEXT_REG (use))
+	{
+	  if (NONDEBUG_INSN_P (DF_REF_INSN (use))
+	      && GET_CODE (DF_REF_REG (use)) == SUBREG)
+	    {
+	      /* This is necessary to avoid hitting an assert at
+		 postreload.c:2294 in libstc++ testcases on x86_64-linux.  I'm
+		 not really sure what the probblem actually is there.  */
+	      bitmap_clear (&need_new);
+	      bitmap_clear (&reachable);
+	      return false;
+	    }
+
+	  int ubbi = DF_REF_BB (use)->index;
+	  if (bitmap_bit_p (&reachable, ubbi))
+	    bitmap_set_bit (&need_new, ubbi);
+	}
+      last_interesting_insn = insn;
+    }
+
+  bitmap_clear (&reachable);
+  if (!last_interesting_insn)
+    {
+      bitmap_clear (&need_new);
+      return false;
+    }
+
+  call_dom = nearest_common_dominator_for_set (CDI_DOMINATORS, &need_new);
+  bitmap_clear (&need_new);
+  if (call_dom == first)
+    return false;
+
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
+  while (bb_loop_depth (call_dom) > 0)
+    call_dom = get_immediate_dominator (CDI_DOMINATORS, call_dom);
+  loop_optimizer_finalize ();
+
+  if (call_dom == first)
+    return false;
+
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+  if (dominated_by_p (CDI_POST_DOMINATORS, first, call_dom))
+    {
+      free_dominance_info (CDI_POST_DOMINATORS);
+      return false;
+    }
+  free_dominance_info (CDI_POST_DOMINATORS);
+
+  if (dump_file)
+    fprintf (dump_file, "Will split live ranges of parameters at BB %i\n",
+	     call_dom->index);
+
+  bool ret = false;
+  FOR_BB_INSNS (first, insn)
+    {
+      rtx dest = interesting_dest_for_shprep (insn, call_dom);
+      if (!dest)
+	continue;
+
+      rtx newreg = NULL_RTX;
+      df_ref use, next;
+      for (use = DF_REG_USE_CHAIN (REGNO(dest)); use; use = next)
+	{
+	  rtx uin = DF_REF_INSN (use);
+	  next = DF_REF_NEXT_REG (use);
+
+	  basic_block ubb = BLOCK_FOR_INSN (uin);
+	  if (ubb == call_dom
+	      || dominated_by_p (CDI_DOMINATORS, ubb, call_dom))
+	    {
+	      if (!newreg)
+		newreg = ira_create_new_reg (dest);
+	      validate_change (uin, DF_REF_LOC (use), newreg, true);
+	    }
+	}
+
+      if (newreg)
+	{
+	  rtx new_move = gen_move_insn (newreg, dest);
+	  emit_insn_after (new_move, bb_note (call_dom));
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Split live-range of register ");
+	      print_rtl_single (dump_file, dest);
+	    }
+	  ret = true;
+	}
+
+      if (insn == last_interesting_insn)
+	break;
+    }
+  apply_change_group ();
+  return ret;
 }
 
 /* Perform the second half of the transformation started in
@@ -4522,7 +4708,16 @@  ira (FILE *f)
      allocation because of -O0 usage or because the function is too
      big.  */
   if (ira_conflicts_p)
-    find_moveable_pseudos ();
+    {
+      df_analyze ();
+      calculate_dominance_info (CDI_DOMINATORS);
+
+      find_moveable_pseudos ();
+      if (split_live_ranges_for_shrink_wrap ())
+	df_analyze ();
+
+      free_dominance_info (CDI_DOMINATORS);
+    }
 
   max_regno_before_ira = max_reg_num ();
   ira_setup_eliminable_regset (true);
diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
new file mode 100644
index 0000000..fe497c2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-ira -fdump-rtl-pro_and_epilogue"  } */
+
+int __attribute__((noinline, noclone))
+foo (int a)
+{
+  return a + 5;
+}
+
+static int g;
+
+int __attribute__((noinline, noclone))
+bar (int a)
+{
+  int r;
+
+  if (a)
+    {
+      r = foo (a);
+      g = r + a;
+    }
+  else
+    r = a+1;
+  return r;
+}
+
+/* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira"  } } */
+/* { dg-final { scan-rtl-dump "Split live-range of register" "ira"  } } */
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue"  } } */
+/* { dg-final { cleanup-rtl-dump "ira" } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */
diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
new file mode 100644
index 0000000..872a757
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-ira -fdump-rtl-pro_and_epilogue"  } */
+
+int __attribute__((noinline, noclone))
+foo (int a)
+{
+  return a + 5;
+}
+
+static int g;
+
+int __attribute__((noinline, noclone))
+bar (int a)
+{
+  int r;
+
+  if (a)
+    {
+      r = a;
+      while (r < 500)
+	if (r % 2)
+	  r = foo (r);
+	else
+	  r = foo (r+1);
+      g = r + a;
+    }
+  else
+    r = g+1;
+  return r;
+}
+
+/* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira"  } } */
+/* { dg-final { scan-rtl-dump "Split live-range of register" "ira"  } } */
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue"  } } */
+/* { dg-final { cleanup-rtl-dump "ira" } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */
diff --git a/gcc/testsuite/gcc.dg/pr10474.c b/gcc/testsuite/gcc.dg/pr10474.c
new file mode 100644
index 0000000..ee085c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr10474.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-pro_and_epilogue"  } */
+
+void f(int *i)
+{
+	if (!i)
+		return;
+	else
+	{
+		__builtin_printf("Hi");
+		*i=0;
+	}
+}
+
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue"  } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */