diff mbox series

Improve code quality across partition boundaries

Message ID 20180409141745.GA86100@kam.mff.cuni.cz
State New
Headers show
Series Improve code quality across partition boundaries | expand

Commit Message

Jan Hubicka April 9, 2018, 2:17 p.m. UTC
Hi,
this patch fixes most offending artifacts across the function partition
bounary with -freorder-blocks-and-partitions which is now on by default
for i386 (no other targets).  The problem is that most of cfgcleanup and
cfgrtl is disabled on crossing edges and thus we produce pretty ugly
code.

The comment I removed reffers to non-existing comment, but the fixup
of partitioning only looks for crossing jumps. It generates new
patterns for unconditional jumps and ads trampolines for conditional
jumps if taret requests it (i386 doesn't).

It is thus safe to turn any far jump to near jump (as this patch does)
and based on target we can do more (this patch doesn't)

For this late of stage4 I think we are fine with enabling edge forwarding
only. This saves about 0.5% of code segment on cc1 (measured with and
without patch so the code pays back at least) with profiledbootstrap.
I have patches to fix other issues but I think they can wait for next
stage1.  In some cases we would get better code with crossjumping
too, but it does not seem critical.

profiledbootstrapped/regtested x86_64, ltoprofiledbootstrapped x86_64 and also
profiledbootstrapped on power. I am running firefox build with LTO+FDO on
x86_64 too and regtesting on power. OK?

Honza

	PR rtl/84058
	* cfgcleanup.c (try_forward_edges): Do not give up on crossing
	jumps; choose last target that matches the criteria (i.e.
	no partition changes for non-crossing jumps).
	* cfgrtl.c (cfg_layout_redirect_edge_and_branch): Add basic
	support for redirecting crossing jumps to non-crossing.

Comments

Richard Biener April 9, 2018, 3:43 p.m. UTC | #1
On April 9, 2018 4:17:45 PM GMT+02:00, Jan Hubicka <hubicka@ucw.cz> wrote:
>Hi,
>this patch fixes most offending artifacts across the function partition
>bounary with -freorder-blocks-and-partitions which is now on by default
>for i386 (no other targets).  The problem is that most of cfgcleanup
>and
>cfgrtl is disabled on crossing edges and thus we produce pretty ugly
>code.
>
>The comment I removed reffers to non-existing comment, but the fixup
>of partitioning only looks for crossing jumps. It generates new
>patterns for unconditional jumps and ads trampolines for conditional
>jumps if taret requests it (i386 doesn't).
>
>It is thus safe to turn any far jump to near jump (as this patch does)
>and based on target we can do more (this patch doesn't)
>
>For this late of stage4 I think we are fine with enabling edge
>forwarding
>only. This saves about 0.5% of code segment on cc1 (measured with and
>without patch so the code pays back at least) with profiledbootstrap.
>I have patches to fix other issues but I think they can wait for next
>stage1.  In some cases we would get better code with crossjumping
>too, but it does not seem critical.
>
>profiledbootstrapped/regtested x86_64, ltoprofiledbootstrapped x86_64
>and also
>profiledbootstrapped on power. I am running firefox build with LTO+FDO
>on
>x86_64 too and regtesting on power. OK?

OK. 

Richard. 

>Honza
>
>	PR rtl/84058
>	* cfgcleanup.c (try_forward_edges): Do not give up on crossing
>	jumps; choose last target that matches the criteria (i.e.
>	no partition changes for non-crossing jumps).
>	* cfgrtl.c (cfg_layout_redirect_edge_and_branch): Add basic
>	support for redirecting crossing jumps to non-crossing.
>Index: cfgcleanup.c
>===================================================================
>--- cfgcleanup.c	(revision 259167)
>+++ cfgcleanup.c	(working copy)
>@@ -394,19 +394,6 @@
>   edge_iterator ei;
>   edge e, *threaded_edges = NULL;
> 
>-  /* If we are partitioning hot/cold basic blocks, we don't want to
>-     mess up unconditional or indirect jumps that cross between hot
>-     and cold sections.
>-
>-     Basic block partitioning may result in some jumps that appear to
>-     be optimizable (or blocks that appear to be mergeable), but which
>really
>-     must be left untouched (they are required to make it safely
>across
>-     partition boundaries).  See the comments at the top of
>-     bb-reorder.c:partition_hot_cold_basic_blocks for complete
>details.  */
>-
>-  if (JUMP_P (BB_END (b)) && CROSSING_JUMP_P (BB_END (b)))
>-    return false;
>-
>   for (ei = ei_start (b->succs); (e = ei_safe_edge (ei)); )
>     {
>       basic_block target, first;
>@@ -415,6 +402,7 @@
>       bool threaded = false;
>       int nthreaded_edges = 0;
>       bool may_thread = first_pass || (b->flags & BB_MODIFIED) != 0;
>+      bool new_target_threaded = false;
> 
>       /* Skip complex edges because we don't know how to update them.
> 
>@@ -431,29 +419,12 @@
>       counter = NUM_FIXED_BLOCKS;
>       goto_locus = e->goto_locus;
> 
>-      /* If we are partitioning hot/cold basic_blocks, we don't want
>to mess
>-	 up jumps that cross between hot/cold sections.
>-
>-	 Basic block partitioning may result in some jumps that appear
>-	 to be optimizable (or blocks that appear to be mergeable), but which
>-	 really must be left untouched (they are required to make it safely
>-	 across partition boundaries).  See the comments at the top of
>-	 bb-reorder.c:partition_hot_cold_basic_blocks for complete
>-	 details.  */
>-
>-      if (first != EXIT_BLOCK_PTR_FOR_FN (cfun)
>-	  && JUMP_P (BB_END (first))
>-	  && CROSSING_JUMP_P (BB_END (first)))
>-	return changed;
>-
>       while (counter < n_basic_blocks_for_fn (cfun))
> 	{
> 	  basic_block new_target = NULL;
>-	  bool new_target_threaded = false;
> 	  may_thread |= (target->flags & BB_MODIFIED) != 0;
> 
> 	  if (FORWARDER_BLOCK_P (target)
>-	      && !(single_succ_edge (target)->flags & EDGE_CROSSING)
> 	      && single_succ (target) != EXIT_BLOCK_PTR_FOR_FN (cfun))
> 	    {
> 	      /* Bypass trivial infinite loops.  */
>@@ -543,8 +514,14 @@
> 	    break;
> 
> 	  counter++;
>-	  target = new_target;
>-	  threaded |= new_target_threaded;
>+	  /* Do not turn non-crossing jump to crossing.  Depending on target
>+	     it may require different instruction pattern.  */
>+	  if ((e->flags & EDGE_CROSSING)
>+	      || BB_PARTITION (first) == BB_PARTITION (new_target))
>+	    {
>+	      target = new_target;
>+	      threaded |= new_target_threaded;
>+	    }
> 	}
> 
>       if (counter >= n_basic_blocks_for_fn (cfun))
>Index: cfgrtl.c
>===================================================================
>--- cfgrtl.c	(revision 259167)
>+++ cfgrtl.c	(working copy)
>@@ -4361,6 +4361,18 @@
>   if (e->dest == dest)
>     return e;
> 
>+  if (e->flags & EDGE_CROSSING
>+      && BB_PARTITION (e->src) == BB_PARTITION (dest)
>+      && simplejump_p (BB_END (src)))
>+    {
>+      if (dump_file)
>+	fprintf (dump_file,
>+	  	 "Removing crossing jump while redirecting edge form %i to %i\n",
>+		 e->src->index, dest->index);
>+      delete_insn (BB_END (src));
>+      e->flags |= EDGE_FALLTHRU;
>+    }
>+
>   if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
>       && (ret = try_redirect_by_replacing_jump (e, dest, true)))
>     {
>@@ -4424,8 +4436,9 @@
>   else
>     ret = redirect_branch_edge (e, dest);
> 
>+  fixup_partition_crossing (ret);
>  /* We don't want simplejumps in the insn stream during cfglayout.  */
>-  gcc_assert (!simplejump_p (BB_END (src)));
>+  gcc_assert (!simplejump_p (BB_END (src)) || CROSSING_JUMP_P (BB_END
>(src)));
> 
>   df_set_bb_dirty (src);
>   return ret;
diff mbox series

Patch

Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c	(revision 259167)
+++ cfgcleanup.c	(working copy)
@@ -394,19 +394,6 @@ 
   edge_iterator ei;
   edge e, *threaded_edges = NULL;
 
-  /* If we are partitioning hot/cold basic blocks, we don't want to
-     mess up unconditional or indirect jumps that cross between hot
-     and cold sections.
-
-     Basic block partitioning may result in some jumps that appear to
-     be optimizable (or blocks that appear to be mergeable), but which really
-     must be left untouched (they are required to make it safely across
-     partition boundaries).  See the comments at the top of
-     bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
-
-  if (JUMP_P (BB_END (b)) && CROSSING_JUMP_P (BB_END (b)))
-    return false;
-
   for (ei = ei_start (b->succs); (e = ei_safe_edge (ei)); )
     {
       basic_block target, first;
@@ -415,6 +402,7 @@ 
       bool threaded = false;
       int nthreaded_edges = 0;
       bool may_thread = first_pass || (b->flags & BB_MODIFIED) != 0;
+      bool new_target_threaded = false;
 
       /* Skip complex edges because we don't know how to update them.
 
@@ -431,29 +419,12 @@ 
       counter = NUM_FIXED_BLOCKS;
       goto_locus = e->goto_locus;
 
-      /* If we are partitioning hot/cold basic_blocks, we don't want to mess
-	 up jumps that cross between hot/cold sections.
-
-	 Basic block partitioning may result in some jumps that appear
-	 to be optimizable (or blocks that appear to be mergeable), but which
-	 really must be left untouched (they are required to make it safely
-	 across partition boundaries).  See the comments at the top of
-	 bb-reorder.c:partition_hot_cold_basic_blocks for complete
-	 details.  */
-
-      if (first != EXIT_BLOCK_PTR_FOR_FN (cfun)
-	  && JUMP_P (BB_END (first))
-	  && CROSSING_JUMP_P (BB_END (first)))
-	return changed;
-
       while (counter < n_basic_blocks_for_fn (cfun))
 	{
 	  basic_block new_target = NULL;
-	  bool new_target_threaded = false;
 	  may_thread |= (target->flags & BB_MODIFIED) != 0;
 
 	  if (FORWARDER_BLOCK_P (target)
-	      && !(single_succ_edge (target)->flags & EDGE_CROSSING)
 	      && single_succ (target) != EXIT_BLOCK_PTR_FOR_FN (cfun))
 	    {
 	      /* Bypass trivial infinite loops.  */
@@ -543,8 +514,14 @@ 
 	    break;
 
 	  counter++;
-	  target = new_target;
-	  threaded |= new_target_threaded;
+	  /* Do not turn non-crossing jump to crossing.  Depending on target
+	     it may require different instruction pattern.  */
+	  if ((e->flags & EDGE_CROSSING)
+	      || BB_PARTITION (first) == BB_PARTITION (new_target))
+	    {
+	      target = new_target;
+	      threaded |= new_target_threaded;
+	    }
 	}
 
       if (counter >= n_basic_blocks_for_fn (cfun))
Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 259167)
+++ cfgrtl.c	(working copy)
@@ -4361,6 +4361,18 @@ 
   if (e->dest == dest)
     return e;
 
+  if (e->flags & EDGE_CROSSING
+      && BB_PARTITION (e->src) == BB_PARTITION (dest)
+      && simplejump_p (BB_END (src)))
+    {
+      if (dump_file)
+	fprintf (dump_file,
+	  	 "Removing crossing jump while redirecting edge form %i to %i\n",
+		 e->src->index, dest->index);
+      delete_insn (BB_END (src));
+      e->flags |= EDGE_FALLTHRU;
+    }
+
   if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
       && (ret = try_redirect_by_replacing_jump (e, dest, true)))
     {
@@ -4424,8 +4436,9 @@ 
   else
     ret = redirect_branch_edge (e, dest);
 
+  fixup_partition_crossing (ret);
   /* We don't want simplejumps in the insn stream during cfglayout.  */
-  gcc_assert (!simplejump_p (BB_END (src)));
+  gcc_assert (!simplejump_p (BB_END (src)) || CROSSING_JUMP_P (BB_END (src)));
 
   df_set_bb_dirty (src);
   return ret;