diff mbox

Enable multiple duplicated blocks on threading path

Message ID 528C1692.4090702@redhat.com
State New
Headers show

Commit Message

Jeff Law Nov. 20, 2013, 1:55 a.m. UTC
So when we first started looking at the FSA/FSM optimization for 
coremark, I speculated that it could be done in the jump threader and 
that by doing so we'd see benefits in more general codes.

This patch enables the code to allow multiple duplicated blocks on a 
threading path.  It won't catch the coremark case as-is because of the 
loop issues (which I'll be returning to immediately).

We can measure the benefits quite easily using valgrind/cachegrind.

I've used the trunk compiler with and without this patch to build 
gcc-4.7.3 (no special options, so it includes checking).  I then run .i 
files through that 4.7.3 compiler under the watchful eye of valgrind.

Valgrind gives me nice data about the number of dynamic branches, # 
instructions and the like.  Some of you may recall similar tests in 2011 :-)

This patch allows GCC to eliminate .34% of the dynamic conditional 
branches and .26% of the total number of dynamic instructions. 
Furthermore, if you compare the total number of dynamic branches 
eliminated vs total number of instructions eliminated, you'd find that 
for each dynamic branch eliminated, we also eliminate 3.3 other 
instructions.

To compare, the 2011 testing saw a .63% reduction in dynamic branches 
and a .50% reduction in total dynamic instructions eliminated.    We 
eliminated 3.4 other instructions for each dynamic branch eliminated.

So this picks up fewer cases that the 2011 changes.  But it's still 
significant.  The secondary effects are consistent,  which isn't a great 
surprise I suppose.

There's still some obvious improvements that can be made here, but this 
code is far enough along that it should go in now so that focus can move 
to the loop issues to enable the FSA/FSM coremark optimization.



Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed on the trunk.
* tree-ssa-threadupdate.c: Fix trailing whitespace.
	* tree-ssa-threadupdate.h: Likewise.

Comments

David Malcolm Nov. 20, 2013, 2:33 a.m. UTC | #1
On Tue, 2013-11-19 at 18:55 -0700, Jeff Law wrote:
> So when we first started looking at the FSA/FSM optimization for 
> coremark, I speculated that it could be done in the jump threader and 
> that by doing so we'd see benefits in more general codes.
> 
> This patch enables the code to allow multiple duplicated blocks on a 
> threading path.  It won't catch the coremark case as-is because of the 
> loop issues (which I'll be returning to immediately).
> 
> We can measure the benefits quite easily using valgrind/cachegrind.
> 
> I've used the trunk compiler with and without this patch to build 
> gcc-4.7.3 (no special options, so it includes checking).  I then run .i 
> files through that 4.7.3 compiler under the watchful eye of valgrind.
> 
> Valgrind gives me nice data about the number of dynamic branches, # 
> instructions and the like.  Some of you may recall similar tests in 2011 :-)
> 
> This patch allows GCC to eliminate .34% of the dynamic conditional 
> branches and .26% of the total number of dynamic instructions. 
> Furthermore, if you compare the total number of dynamic branches 
> eliminated vs total number of instructions eliminated, you'd find that 
> for each dynamic branch eliminated, we also eliminate 3.3 other 
> instructions.
> 
> To compare, the 2011 testing saw a .63% reduction in dynamic branches 
> and a .50% reduction in total dynamic instructions eliminated.    We 
> eliminated 3.4 other instructions for each dynamic branch eliminated.
> 
> So this picks up fewer cases that the 2011 changes.  But it's still 
> significant.  The secondary effects are consistent,  which isn't a great 
> surprise I suppose.
> 
> There's still some obvious improvements that can be made here, but this 
> code is far enough along that it should go in now so that focus can move 
> to the loop issues to enable the FSA/FSM coremark optimization.
> 
> 
> 
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
> Installed on the trunk.
> 
> plain text document attachment (patch)
> 	* tree-ssa-threadupdate.c: Fix trailing whitespace.
> 	* tree-ssa-threadupdate.h: Likewise.

FWIW, it looks like you attached the whitespace cleanup patch again,
rather than the one you discuss above.

For the archives, it looks like your email is referring to r205074
(though that itself seems to have some purely whitespace fixes, together
with the real changes).
diff mbox

Patch

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index afd7ac4..1b7c73d 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -119,7 +119,7 @@  struct redirection_data : typed_free_remove<redirection_data>
      and wire up its single remaining outgoing edge to the thread path.
 
      The other is a joiner block where we leave the control statement
-     in place, but wire one of the outgoing edges to a thread path. 
+     in place, but wire one of the outgoing edges to a thread path.
 
      In theory we could have multiple block duplicates in a jump
      threading path, but I haven't tried that.
@@ -470,7 +470,7 @@  ssa_fix_duplicate_block_edges (struct redirection_data *rd,
   vec<jump_thread_edge *> *path = THREAD_PATH (e);
 
   for (unsigned int count = 0, i = 1; i < path->length (); i++)
-    { 
+    {
       /* If we were threading through an joiner block, then we want
 	 to keep its control statement and redirect an outgoing edge.
 	 Else we want to remove the control statement & edges, then create
@@ -549,10 +549,10 @@  ssa_create_duplicates (struct redirection_data **slot,
   struct redirection_data *rd = *slot;
 
   /* The second duplicated block in a jump threading path is specific
-     to the path.  So it gets stored in RD rather than in LOCAL_DATA. 
+     to the path.  So it gets stored in RD rather than in LOCAL_DATA.
 	
      Each time we're called, we have to look through the path and see
-     if a second block needs to be duplicated. 
+     if a second block needs to be duplicated.
 
      Note the search starts with the third edge on the path.  The first
      edge is the incoming edge, the second edge always has its source
@@ -567,7 +567,7 @@  ssa_create_duplicates (struct redirection_data **slot,
 	  break;
 	}
     }
-  
+
   /* Create a template block if we have not done so already.  Otherwise
      use the template to create a new block.  */
   if (local_info->template_block == NULL)
@@ -732,7 +732,7 @@  redirection_block_p (basic_block bb)
    the appropriate duplicate of BB.
 
    If NOLOOP_ONLY is true, we only perform the threading as long as it
-   does not affect the structure of the loops in a nontrivial way. 
+   does not affect the structure of the loops in a nontrivial way.
 
    If JOINERS is true, then thread through joiner blocks as well.  */
 
@@ -892,7 +892,7 @@  thread_block_1 (basic_block bb, bool noloop_only, bool joiners)
    By doing things this way we can be as aggressive as possible and
    not worry that copying a joiner block will create a jump threading
    opportunity.  */
-  
+
 static bool
 thread_block (basic_block bb, bool noloop_only)
 {
@@ -1591,7 +1591,7 @@  thread_through_all_blocks (bool may_peel_loop_headers)
     }
 
   /* Assume we had a jump thread path which went from the latch to the exit
-     and a path which goes from outside to inside the same loop.  
+     and a path which goes from outside to inside the same loop.
 
      If the latch to exit was handled first, we will thread it and clear
      loop->header.
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 4617b9c..4950170 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -1,5 +1,5 @@ 
 /* Communication between registering jump thread requests and
-   updating the SSA/CFG for jump threading. 
+   updating the SSA/CFG for jump threading.
    Copyright (C) 2013 Free Software Foundation, Inc.
 
 This file is part of GCC.