Patchwork Speedup loop header copying [part of PR 46590]

login
register
mail settings
Submitter Michael Matz
Date Sept. 2, 2012, 7:35 p.m.
Message ID <alpine.LNX.2.00.1209022054160.9940@wotan.suse.de>
Download mbox | patch
Permalink /patch/181228/
State New
Headers show

Comments

Michael Matz - Sept. 2, 2012, 7:35 p.m.
Hi,

as the bug report tells us one speed problem is loop header copying, in 
particular the update_ssa call that is done for each and every copied loop 
header but touches all blocks in a function.

Now, one idea was to use an optimized update_ssa that works only on the 
relevant subset of blocks (it's dominance frontiers that are the problem).  
I've experimented with the original formulation of frontiers as per Cytron 
which allows to calculate the domfrontier of one basic block lazily.  The 
end result was none, no speedup, no slowdown.  I haven't investigated, 
but I guess the problem is that too often most of the blocks are relevant 
for most of the header copies, either because of virtual ops or because 
calculating the domfrontier of a block needs all domfrontiers of all 
dominated childs (i.e. the domfrontier of the entry needs domfrontiers of 
all blocks).  So, no cake there.

But actually there's no reason that we need to keep SSA form uptodate 
during the multiple header copyings.  We use gimple_duplicate_sese_region 
to do the copying which updates the SSA web before returning (actually 
loop header copying is the only caller of it).  The next thing done is 
just another call to gimple_duplicate_sese_region to copy some other BBs, 
then some split edges, repeat from start.  We can just as well defer the 
whole SSA web updating to after we've duplicated everything we want.

That's what this patch does.  Time for various things on the testcase 
(with -O1):

                         without     with patch
tree SSA rewrite         26.2s         4.8s
tree SSA incremental     21.7s         4.6s
dominance computation    15.0s         4.2s
dominance frontiers      25.6s         6.7s
TOTAL                   135.6s        67.8s

Regstrapped on x86_64-linux, no regressions.  Okay for trunk?


Ciao,
Michael.
------------------
	PR tree-optimization/46590

	* tree-cfg.c (gimple_duplicate_sese_region): Don't update
	SSA web here ...
	* tree-ssa-loop-ch.c (copy_loop_headers): ... but here.
Richard Guenther - Sept. 3, 2012, 1:14 p.m.
On Sun, Sep 2, 2012 at 9:35 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> as the bug report tells us one speed problem is loop header copying, in
> particular the update_ssa call that is done for each and every copied loop
> header but touches all blocks in a function.
>
> Now, one idea was to use an optimized update_ssa that works only on the
> relevant subset of blocks (it's dominance frontiers that are the problem).
> I've experimented with the original formulation of frontiers as per Cytron
> which allows to calculate the domfrontier of one basic block lazily.  The
> end result was none, no speedup, no slowdown.  I haven't investigated,
> but I guess the problem is that too often most of the blocks are relevant
> for most of the header copies, either because of virtual ops or because
> calculating the domfrontier of a block needs all domfrontiers of all
> dominated childs (i.e. the domfrontier of the entry needs domfrontiers of
> all blocks).  So, no cake there.
>
> But actually there's no reason that we need to keep SSA form uptodate
> during the multiple header copyings.  We use gimple_duplicate_sese_region
> to do the copying which updates the SSA web before returning (actually
> loop header copying is the only caller of it).  The next thing done is
> just another call to gimple_duplicate_sese_region to copy some other BBs,
> then some split edges, repeat from start.  We can just as well defer the
> whole SSA web updating to after we've duplicated everything we want.
>
> That's what this patch does.  Time for various things on the testcase
> (with -O1):
>
>                          without     with patch
> tree SSA rewrite         26.2s         4.8s
> tree SSA incremental     21.7s         4.6s
> dominance computation    15.0s         4.2s
> dominance frontiers      25.6s         6.7s
> TOTAL                   135.6s        67.8s
>
> Regstrapped on x86_64-linux, no regressions.  Okay for trunk?

Ok.

Thanks,
Richard.

>
> Ciao,
> Michael.
> ------------------
>         PR tree-optimization/46590
>
>         * tree-cfg.c (gimple_duplicate_sese_region): Don't update
>         SSA web here ...
>         * tree-ssa-loop-ch.c (copy_loop_headers): ... but here.
>
> Index: tree-cfg.c
> ===================================================================
> --- tree-cfg.c  (revision 190803)
> +++ tree-cfg.c  (working copy)
> @@ -5530,9 +5530,10 @@ add_phi_args_after_copy (basic_block *re
>     important exit edge EXIT.  By important we mean that no SSA name defined
>     inside region is live over the other exit edges of the region.  All entry
>     edges to the region must go to ENTRY->dest.  The edge ENTRY is redirected
> -   to the duplicate of the region.  SSA form, dominance and loop information
> -   is updated.  The new basic blocks are stored to REGION_COPY in the same
> -   order as they had in REGION, provided that REGION_COPY is not NULL.
> +   to the duplicate of the region.  Dominance and loop information is
> +   updated, but not the SSA web.  The new basic blocks are stored to
> +   REGION_COPY in the same order as they had in REGION, provided that
> +   REGION_COPY is not NULL.
>     The function returns false if it is unable to copy the region,
>     true otherwise.  */
>
> @@ -5593,8 +5594,6 @@ gimple_duplicate_sese_region (edge entry
>        free_region_copy = true;
>      }
>
> -  gcc_assert (!need_ssa_update_p (cfun));
> -
>    /* Record blocks outside the region that are dominated by something
>       inside.  */
>    doms = NULL;
> @@ -5663,9 +5662,6 @@ gimple_duplicate_sese_region (edge entry
>    /* Add the other PHI node arguments.  */
>    add_phi_args_after_copy (region_copy, n_region, NULL);
>
> -  /* Update the SSA web.  */
> -  update_ssa (TODO_update_ssa);
> -
>    if (free_region_copy)
>      free (region_copy);
>
> Index: tree-ssa-loop-ch.c
> ===================================================================
> --- tree-ssa-loop-ch.c  (revision 190803)
> +++ tree-ssa-loop-ch.c  (working copy)
> @@ -241,6 +241,7 @@ copy_loop_headers (void)
>        split_edge (loop_latch_edge (loop));
>      }
>
> +  update_ssa (TODO_update_ssa);
>    free (bbs);
>    free (copied_bbs);
>

Patch

Index: tree-cfg.c
===================================================================
--- tree-cfg.c	(revision 190803)
+++ tree-cfg.c	(working copy)
@@ -5530,9 +5530,10 @@  add_phi_args_after_copy (basic_block *re
    important exit edge EXIT.  By important we mean that no SSA name defined
    inside region is live over the other exit edges of the region.  All entry
    edges to the region must go to ENTRY->dest.  The edge ENTRY is redirected
-   to the duplicate of the region.  SSA form, dominance and loop information
-   is updated.  The new basic blocks are stored to REGION_COPY in the same
-   order as they had in REGION, provided that REGION_COPY is not NULL.
+   to the duplicate of the region.  Dominance and loop information is
+   updated, but not the SSA web.  The new basic blocks are stored to
+   REGION_COPY in the same order as they had in REGION, provided that
+   REGION_COPY is not NULL.
    The function returns false if it is unable to copy the region,
    true otherwise.  */
 
@@ -5593,8 +5594,6 @@  gimple_duplicate_sese_region (edge entry
       free_region_copy = true;
     }
 
-  gcc_assert (!need_ssa_update_p (cfun));
-
   /* Record blocks outside the region that are dominated by something
      inside.  */
   doms = NULL;
@@ -5663,9 +5662,6 @@  gimple_duplicate_sese_region (edge entry
   /* Add the other PHI node arguments.  */
   add_phi_args_after_copy (region_copy, n_region, NULL);
 
-  /* Update the SSA web.  */
-  update_ssa (TODO_update_ssa);
-
   if (free_region_copy)
     free (region_copy);
 
Index: tree-ssa-loop-ch.c
===================================================================
--- tree-ssa-loop-ch.c	(revision 190803)
+++ tree-ssa-loop-ch.c	(working copy)
@@ -241,6 +241,7 @@  copy_loop_headers (void)
       split_edge (loop_latch_edge (loop));
     }
 
+  update_ssa (TODO_update_ssa);
   free (bbs);
   free (copied_bbs);