
[2/4] bb-reorder: Add the "simple" algorithm

Message ID a691f5437e32cec13ab9f9776975e95be67c4811.1443028412.git.segher@kernel.crashing.org
State New

Commit Message

Segher Boessenkool Sept. 23, 2015, 10:06 p.m. UTC
This is the meat of this series: a new algorithm to do basic block
reordering.  It uses the simple greedy approach to maximum weighted
matching, where the weights are the predicted execution frequency of
the edges.  This always finds a solution that is within a factor two
of optimal, if you disregard loops (which we cannot allow) and the
complications of block partitioning.
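The greedy scheme can be sketched on a toy CFG model (a sketch only, with made-up names; blocks are plain integers, and the partitioning and ENTRY/EXIT handling of the real patch are omitted):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

/* Hypothetical stand-in for the CFG: blocks are 0..n_blocks-1 and an
   edge carries a predicted execution frequency.  */
struct cfg_edge { int src, dst, freq; };

/* Pick fallthrough edges greedily, heaviest first.  An edge is accepted
   only if its source has no fallthrough successor yet, its destination
   has no fallthrough predecessor yet, and accepting it would not close
   a loop (checked with union-find over the growing chains).  */
std::vector<cfg_edge>
greedy_fallthrough (int n_blocks, std::vector<cfg_edge> edges)
{
  std::stable_sort (edges.begin (), edges.end (),
                    [] (const cfg_edge &a, const cfg_edge &b)
                    { return a.freq > b.freq; });

  /* Union-find over chains, with path halving.  */
  std::vector<int> parent (n_blocks);
  std::iota (parent.begin (), parent.end (), 0);
  auto find = [&parent] (int x)
    {
      while (parent[x] != x)
        x = parent[x] = parent[parent[x]];
      return x;
    };

  std::vector<bool> out_used (n_blocks), in_used (n_blocks);
  std::vector<cfg_edge> chosen;
  for (const cfg_edge &e : edges)
    {
      if (out_used[e.src] || in_used[e.dst]
          || find (e.src) == find (e.dst))  /* would create a loop */
        continue;
      out_used[e.src] = in_used[e.dst] = true;
      parent[find (e.src)] = find (e.dst);
      chosen.push_back (e);
    }
  return chosen;
}
```

The factor-two bound comes from the greedy matching itself: every edge the greedy pass rejects shares an endpoint with an already-chosen edge of at least equal weight.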


2015-09-23   Segher Boessenkool  <segher@kernel.crashing.org>

	* bb-reorder.c (reorder_basic_blocks_software_trace_cache): Print
	a header to the dump file.
	(edge_order): New function.
	(reorder_basic_blocks_simple): New function.
	(reorder_basic_blocks): Choose between the STC and the simple
	algorithms (always choose the former).

---
 gcc/bb-reorder.c | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 1 deletion(-)

Comments

Bernd Schmidt Sept. 24, 2015, 10:32 a.m. UTC | #1
On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> This is the meat of this series: a new algorithm to do basic block
> reordering.  It uses the simple greedy approach to maximum weighted
> matching, where the weights are the predicted execution frequency of
> the edges.  This always finds a solution that is within a factor two
> of optimal, if you disregard loops (which we cannot allow) and the
> complications of block partitioning.

Looks really good for the most part.

The comment at the top of the file should be updated to mention both 
algorithms.

> +  /* Sort the edges, the most desirable first.  */
> +
> +  std::stable_sort (edges, edges + n, edge_order);

Any thoughts on this vs qsort? Do you need a stable sort?

> +  int j;
> +  for (j = 0; j < n; j++)

for (int j ...
here and in the other loop that uses j.

> +  /* If the entry edge no longer falls through we have to make a new
> +     block so it can do so again.  */
> +
> +  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> +  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
> +    {
> +      force_nonfallthru (e);
> +      e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
> +      BB_COPY_PARTITION (e->src, e->dest);
> +    }
> +}

That's a bit odd, can this situation be prevented earlier? Why wouldn't 
we force the entry edge to fall thru?


Bernd
Segher Boessenkool Sept. 24, 2015, 1:39 p.m. UTC | #2
On Thu, Sep 24, 2015 at 12:32:59PM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >This is the meat of this series: a new algorithm to do basic block
> >reordering.  It uses the simple greedy approach to maximum weighted
> >matching, where the weights are the predicted execution frequency of
> >the edges.  This always finds a solution that is within a factor two
> >of optimal, if you disregard loops (which we cannot allow) and the
> >complications of block partitioning.
> 
> Looks really good for the most part.
> 
> The comment at the top of the file should be updated to mention both 
> algorithms.

Will do.

> >+  /* Sort the edges, the most desirable first.  */
> >+
> >+  std::stable_sort (edges, edges + n, edge_order);
> 
> Any thoughts on this vs qsort? Do you need a stable sort?

We always need stable sorts in GCC; things are not reproducible across
targets with qsort (not every qsort is the same).

Also, you sometimes have two edges back-to-back, with the same weights;
a stable sort ensures we don't put a jump in the middle of that if we
can help it.
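A toy illustration of that point (hypothetical data; only the frequency is compared, so the two freq == 5 records are "equal" to the comparator):

```cpp
#include <algorithm>
#include <vector>

/* Toy records standing in for edges.  */
struct edge_rec { int freq, original_pos; };

std::vector<edge_rec>
sort_heaviest_first (std::vector<edge_rec> edges)
{
  /* qsort leaves the relative order of equal elements unspecified, so
     it can differ from one host C library to another; stable_sort
     keeps equal entries in their input order on every host.  */
  std::stable_sort (edges.begin (), edges.end (),
                    [] (const edge_rec &a, const edge_rec &b)
                    { return a.freq > b.freq; });
  return edges;
}
```

With stable_sort the result is fully deterministic: 9, 7, then the two 5s in their original relative order.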

> >+  int j;
> >+  for (j = 0; j < n; j++)
> 
> for (int j ...
> here and in the other loop that uses j.

That is so ugly.  Will change though :-)

> >+  /* If the entry edge no longer falls through we have to make a new
> >+     block so it can do so again.  */
> >+
> >+  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> >+  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
> >+    {
> >+      force_nonfallthru (e);
> >+      e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
> >+      BB_COPY_PARTITION (e->src, e->dest);
> >+    }
> >+}
> 
> That's a bit odd, can this situation be prevented earlier?

We could always add an extra jump at the start, but that's about the
same code so not helpful.

> Why wouldn't we force the entry edge to fall thru?

Because it pessimises code.  If the model thinks the first block after
entry belongs somewhere in the middle of a fall-through sequence, it
usually is right (typical examples are loops that start with the loop
test).  This algorithm does not peel loops (it does no duplication at
all).

All the optimisable blocks end with an unconditional jump, and this algo
tries to remove as many of those as it can; logically, the start block
has such a jump as well.


Segher
Segher Boessenkool Sept. 24, 2015, 2:31 p.m. UTC | #3
On Thu, Sep 24, 2015 at 08:39:30AM -0500, Segher Boessenkool wrote:
> > Any thoughts on this vs qsort? Do you need a stable sort?
> 
> We always need stable sorts in GCC; things are not reproducible across
> targets with qsort (not every qsort is the same).

s/targets/hosts/
Steven Bosscher Sept. 24, 2015, 4:03 p.m. UTC | #4
On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote:
> +  /* First, collect all edges that can be optimized by reordering blocks:
> +     simple jumps and conditional jumps, as well as the function entry edge.  */
> +
> +  int n = 0;
> +  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      rtx_insn *end = BB_END (bb);
> +
> +      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
> +       continue;

Should handle ASM jumps.


> +  FOR_ALL_BB_FN (bb, cfun)
> +    bb->aux = bb;

Bit tricky for the ENTRY and EXIT blocks, that are not really basic
blocks. After the pass, EXIT should not end up pointing to itself.
Maybe use FOR_EACH_BB_FN and set ENTRY separately?


Other than that, looks good to me.

Ciao!
Steven
Segher Boessenkool Sept. 24, 2015, 4:38 p.m. UTC | #5
On Thu, Sep 24, 2015 at 06:03:33PM +0200, Steven Bosscher wrote:
> On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote:
> > +  /* First, collect all edges that can be optimized by reordering blocks:
> > +     simple jumps and conditional jumps, as well as the function entry edge.  */
> > +
> > +  int n = 0;
> > +  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> > +
> > +  basic_block bb;
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +    {
> > +      rtx_insn *end = BB_END (bb);
> > +
> > +      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
> > +       continue;
> 
> Should handle ASM jumps.

Right, those are currently considered optimisable, although they are not.
Will fix.

> > +  FOR_ALL_BB_FN (bb, cfun)
> > +    bb->aux = bb;
> 
> Bit tricky for the ENTRY and EXIT blocks, that are not really basic
> blocks. After the pass, EXIT should not end up pointing to itself.

But it doesn't; the next line already takes care of it.


Segher
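The ->AUX endpoint bookkeeping discussed above can be modelled on plain integers (a sketch with made-up names; no ENTRY/EXIT blocks or hot/cold partitions): each chain endpoint stores the opposite endpoint, interior blocks store nothing, and connecting tail_a -> head_b splices two chains in O(1).

```cpp
#include <vector>

/* opposite[b] is the other endpoint of b's chain (b itself for a
   singleton block), or -1 once b has become interior; has_fallthru[b]
   means b was already given a fallthrough successor.  */
struct chains
{
  std::vector<int> opposite;
  std::vector<bool> has_fallthru;

  explicit chains (int n) : opposite (n), has_fallthru (n)
  {
    for (int b = 0; b < n; b++)
      opposite[b] = b;  /* every block starts as its own chain */
  }

  /* Try to make TAIL_A -> HEAD_B a fallthrough edge.  */
  bool connect (int tail_a, int head_b)
  {
    int head_a = opposite[tail_a];
    int tail_b = opposite[head_b];

    if (has_fallthru[tail_a]     /* src already has a successor */
        || tail_b < 0            /* dest is interior */
        || (!has_fallthru[head_b] && head_b != tail_b)
                                 /* dest already has a predecessor */
        || tail_a == tail_b)     /* same chain: would create a loop */
      return false;

    /* Same update order as the patch, so singleton chains fall out
       correctly when tail_a == head_a or head_b == tail_b.  */
    opposite[tail_a] = -1;
    opposite[head_b] = -1;
    opposite[head_a] = tail_b;
    opposite[tail_b] = head_a;
    has_fallthru[tail_a] = true;
    return true;
  }
};
```

Only the four endpoints are touched per accepted edge, which is what makes the greedy pass linear in the number of candidate edges after the sort.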

Patch

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 725cdc3..40e9e50 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2231,6 +2231,9 @@  update_crossing_jump_flags (void)
 static void
 reorder_basic_blocks_software_trace_cache (void)
 {
+  if (dump_file)
+    fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");
+
   int n_traces;
   int i;
   struct trace *traces;
@@ -2261,6 +2264,158 @@  reorder_basic_blocks_software_trace_cache (void)
   FREE (bbd);
 }
 
+/* Return true if edge E1 is more desirable as a fallthrough edge than
+   edge E2 is.  */
+
+static bool
+edge_order (edge e1, edge e2)
+{
+  return EDGE_FREQUENCY (e1) > EDGE_FREQUENCY (e2);
+}
+
+/* Reorder basic blocks using the "simple" algorithm.  This tries to
+   maximize the dynamic number of branches that are fallthrough, without
+   copying instructions.  The algorithm is greedy, looking at the most
+   frequently executed branch first.  */
+
+static void
+reorder_basic_blocks_simple (void)
+{
+  if (dump_file)
+    fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");
+
+  edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)];
+
+  /* First, collect all edges that can be optimized by reordering blocks:
+     simple jumps and conditional jumps, as well as the function entry edge.  */
+
+  int n = 0;
+  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      rtx_insn *end = BB_END (bb);
+
+      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
+	continue;
+
+      if (any_condjump_p (end))
+	{
+	  edges[n++] = EDGE_SUCC (bb, 0);
+	  edges[n++] = EDGE_SUCC (bb, 1);
+	}
+      else if (single_succ_p (bb))
+	edges[n++] = EDGE_SUCC (bb, 0);
+    }
+
+  /* Sort the edges, the most desirable first.  */
+
+  std::stable_sort (edges, edges + n, edge_order);
+
+  /* Now decide which of those edges to make fallthrough edges.  We set
+     BB_VISITED if a block already has a fallthrough successor assigned
+     to it.  We make ->AUX of an endpoint point to the opposite endpoint
+     of a sequence of blocks that fall through, and ->AUX will be NULL
+     for a block that is in such a sequence but not an endpoint anymore.
+
+     To start with, everything points to itself, nothing is assigned yet.  */
+
+  FOR_ALL_BB_FN (bb, cfun)
+    bb->aux = bb;
+
+  EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
+
+  /* Now for all edges, the most desirable first, see if that edge can
+     connect two sequences.  If it can, update AUX and BB_VISITED; if it
+     cannot, zero out the edge in the table.  */
+
+  int j;
+  for (j = 0; j < n; j++)
+    {
+      edge e = edges[j];
+
+      basic_block tail_a = e->src;
+      basic_block head_b = e->dest;
+      basic_block head_a = (basic_block) tail_a->aux;
+      basic_block tail_b = (basic_block) head_b->aux;
+
+      /* An edge cannot connect two sequences if:
+	 - it crosses partitions;
+	 - its src is not a current endpoint;
+	 - its dest is not a current endpoint;
+	 - or, it would create a loop.  */
+
+      if (e->flags & EDGE_CROSSING
+	  || tail_a->flags & BB_VISITED
+	  || !tail_b
+	  || (!(head_b->flags & BB_VISITED) && head_b != tail_b)
+	  || tail_a == tail_b)
+	{
+	  edges[j] = 0;
+	  continue;
+	}
+
+      tail_a->aux = 0;
+      head_b->aux = 0;
+      head_a->aux = tail_b;
+      tail_b->aux = head_a;
+      tail_a->flags |= BB_VISITED;
+    }
+
+  /* Put the pieces together, in the same order that the start blocks of
+     the sequences already had.  The hot/cold partitioning gives a little
+     complication: as a first pass only do this for blocks in the same
+     partition as the start block, and (if there is anything left to do)
+     in a second pass handle the other partition.  */
+
+  basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
+
+  int current_partition = BB_PARTITION (last_tail);
+  bool need_another_pass = true;
+
+  for (int pass = 0; pass < 2 && need_another_pass; pass++)
+    {
+      need_another_pass = false;
+
+      FOR_EACH_BB_FN (bb, cfun)
+	if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb)
+	  {
+	    if (BB_PARTITION (bb) != current_partition)
+	      {
+		need_another_pass = true;
+		continue;
+	      }
+
+	    last_tail->aux = bb;
+	    last_tail = (basic_block) bb->aux;
+	  }
+
+      current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
+    }
+
+  last_tail->aux = 0;
+
+  /* Finally, link all the chosen fallthrough edges.  */
+
+  for (j = 0; j < n; j++)
+    if (edges[j])
+      edges[j]->src->aux = edges[j]->dest;
+
+  delete[] edges;
+
+  /* If the entry edge no longer falls through we have to make a new
+     block so it can do so again.  */
+
+  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
+  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
+    {
+      force_nonfallthru (e);
+      e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
+      BB_COPY_PARTITION (e->src, e->dest);
+    }
+}
+
 /* Reorder basic blocks.  The main entry point to this file.  */
 
 static void
@@ -2274,7 +2429,10 @@  reorder_basic_blocks (void)
   set_edge_can_fallthru_flag ();
   mark_dfs_back_edges ();
 
-  reorder_basic_blocks_software_trace_cache ();
+  if (1)
+    reorder_basic_blocks_software_trace_cache ();
+  else
+    reorder_basic_blocks_simple ();
 
   relink_block_chain (/*stay_in_cfglayout_mode=*/true);