diff mbox

[PR43864] Gimple level duplicate block cleanup.

Message ID CAFiYyc2ZP-J8szZ9w9FHq0aJeGamnc2O_c-v44045vktZMg7DQ@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener July 22, 2011, 3:36 p.m. UTC
On Sun, Jul 17, 2011 at 8:33 PM, Tom de Vries <vries@codesourcery.com> wrote:

> Bootstrapped and reg-tested on x86_64.  Ok for trunk (after ARM testing)?

+static int
+same_succ_equal (const void *ve1, const void *ve2)
+{
...
+  if (bitmap_bit_p (e1->bbs, ENTRY_BLOCK)
+      || bitmap_bit_p (e1->bbs, EXIT_BLOCK)
+      || bitmap_bit_p (e2->bbs, ENTRY_BLOCK)
+      || bitmap_bit_p (e2->bbs, EXIT_BLOCK))
+    return 0;

that's odd - what are these checks for?

+  if (dump_file)
+    {
+      fprintf (dump_file, "initial worklist:\n");

with dump_flags & TDF_DETAILS

I'm now at merge_calls and wondering about alias info again.  We are
probably safe for the per-pointer information because we are not
operating flow-sensitive for memory and for merging require value-equivalence
for SSA names.  For calls the same should be true - we are not
flow- or context-sensitive, and even if we were context-sentitive we
require equivalent arguments (for memory arguments we should be safe
because of the non-flow-sensitivity).

So, did you actually run into problems?  If not then I suggest to remove
merge_calls completely (and the related changes that it requires).

+/* Create or update a vop phi in BB2.  Use VUSE1 arguments for all the
+   REDIRECTED_EDGES, or if VUSE1 is NULL_TREE, use BB_VOP_AT_EXIT.  If a new
+   phis is created, use the phi instead of VUSE2 in BB2.  */
+
+static void
+update_vuses (tree vuse1, tree vuse2, basic_block bb2,
+              VEC (edge,heap) *redirected_edges)

...

+  if (vuse2 == NULL_TREE)
+    return;

hm, that's the case when there is no VUSE that is dominated by BB2
(or is in BB2).  Ok, might happen.

+             locus1 = gimple_location (SSA_NAME_DEF_STMT (arg));
+             add_phi_arg (phi, arg, e, locus1);

I don't think virtual operand PHIs should have locations, just use
UNKNOWN_LOCATION here.

+  locus2 = gimple_location (def_stmt2);

Likewise.

+  /* Create a phi, first with default argument vuse2 for all preds.  */
+  lhs = make_ssa_name (SSA_NAME_VAR (vuse2), NULL);
+  VN_INFO_GET (lhs);
+  phi = create_phi_node (lhs, bb2);
+  SSA_NAME_DEF_STMT (lhs) = phi;
+  FOR_EACH_EDGE (e, ei, bb2->preds)
+    add_phi_arg (phi, vuse2, e, locus2);
+
+  /* Now overwrite the arguments associated with the redirected edges with
+     vuse1.  */
+  for (i = 0; i < EDGE_COUNT (redirected_edges); ++i)
+    {
+      e = VEC_index (edge, redirected_edges, i);
+      gcc_assert (PHI_ARG_DEF_FROM_EDGE (phi, e));

No need for this assert.

+      if (vuse1)
+       arg = vuse1;
+      else
+       arg = BB_VOP_AT_EXIT (e->src);
+      SET_PHI_ARG_DEF (phi, e->dest_idx, arg);
+      locus1 = gimple_location (SSA_NAME_DEF_STMT (arg));

See above.

+      gimple_phi_arg_set_location (phi, e->dest_idx, locus1);
+    }


Can you maybe merge this with the update-existing-phi-case?  They
look all too similar.

+  /* Replace uses of vuse2 in bb2 with phi.  */
+  FOR_EACH_IMM_USE_STMT (stmt, iter, vuse2)
+    {
+      if (gimple_code (stmt) == GIMPLE_PHI)

Does FOR_EACH_IMM_USE_ON_STMT really not work for PHIs?
Other code doesn't seem to care.

+       {
+         edge e;
+         if (stmt == phi)
+           continue;
+         e = find_edge (bb2, gimple_bb (stmt));
+         if (e == NULL)
+           continue;
+         use_p = PHI_ARG_DEF_PTR_FROM_EDGE (stmt, e);
+         SET_USE (use_p, lhs);
+         update_stmt (stmt);
+       }
+      else if (gimple_bb (stmt) == bb2)

That check looks odd.  A use can very well appear in a forwarder block.

+       {
+         FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+           SET_USE (use_p, lhs);
+         update_stmt (stmt);
+       }
+    }

+/* Scans the vdefs and vuses of the insn of BB, and returns the vop at entry in
+   VOP_AT_ENTRY, and the vop at exit in VOP_AT_EXIT.  */
+
+static void
+insn_vops (basic_block bb, tree *vop_at_entry, tree *vop_at_exit)

it's easier to start from the bb end and walk until you see the
first vdef or vuse.  Then you have *vop_at_exit.  From there
just walk the SSA_NAME_DEF_STMTs of the vuse until you
hit one whose definition is not in BB - and you have *vop_at_entry.
That way you can avoid scanning most of the stmts.

The function also has an odd name ;)  It should be something like
vops_at_bb_entry_and_exit.

+static void
+vop_at_entry (basic_block bb1, basic_block bb2, tree *vop_at_entry1,
+             tree *vop_at_entry2)

so you don't need the vop at exit at all?  The function is a bit unclear
to me given it does so much stuff other than just computing the BBs
entry VOPs ...

+static void
+replace_block_by (basic_block bb1, basic_block bb2, bool update_vops)
+{

can you add some comments before the different phases of update?
I _think_ I understand what it does, but ...

+/* Runs tail merge optimization.  */
+
+unsigned int
+tail_merge_optimize (unsigned int todo)
+{
+  int nr_bbs_removed_total = 0;
+  int nr_bbs_removed;
+  bool loop_entered = false;
+  int iteration_nr = 0;
+  bool update_vops = ((todo & TODO_update_ssa_only_virtuals) == 0
+                     || !symbol_marked_for_renaming (gimple_vop (cfun)));

you need to simplify this to

  bool update_vops = !symbol_marked_for_renaming (gimple_vop (cfun));

+      if (nr_bbs_removed == 0)
+       break;
+
+      free_dominance_info (CDI_DOMINATORS);

we might want to limit the number of iterations we perform - especially
as you are re-computing dominators on each iteration.  What's the
maximum number of iterations you see during a GCC bootstrap?

+  if (dump_file)
+    fprintf (dump_file, "htab collision / search: %f\n",
+            htab_collisions (same_succ_htab));

in general without dump_flags & TDF_DETAILS passes should print
at most things when they did a transformation (some even don't do that).
Please double-check all your dump-prints.

+      todo |= (TODO_verify_ssa | TODO_verify_stmts | TODO_verify_flow
+              | TODO_dump_func);

should all be already set.

@ -4945,6 +4944,9 @@ execute_pre (bool do_fre)
   scev_finalize ();
   fini_pre (do_fre);

+  todo |= tail_merge_optimize (todo);
+  free_scc_vn ();

Please only run tail_merge_optimize once.  As we are running through
this code three times at -O2.  I suggest to try it in the !do_fre case
only which we only run once (as PRE).  If that doesn't work out
nicely we need to find other means (like introduce a
pass_fre_and_tail_merge which passes down another flag and replace
one FRE pass with that new combined pass).

(this review took 40min again ...) so I appreciate a second look from
somebody else.

Btw, can you expand a bit on the amount of testcases?

Thanks,
Richard.


> Thanks,
> - Tom
>
> 2011-07-17  Tom de Vries  <tom@codesourcery.com>
>
>        PR middle-end/43864
>        * tree-ssa-tail-merge.c: New file.
>        (struct same_succ): Define.
>        (same_succ_t, const_same_succ_t): New typedef.
>        (struct bb_cluster): Define.
>        (bb_cluster_t, const_bb_cluster_t): New typedef.
>        (struct aux_bb_info): Define.
>        (BB_SIZE, BB_SAME_SUCC, BB_CLUSTER, BB_VOP_AT_EXIT): Define.
>        (gvn_uses_equal): New function.
>        (same_succ_print, same_succ_print_traverse, same_succ_hash)
>        (inverse_flags, same_succ_equal, same_succ_alloc, same_succ_delete)
>        (same_succ_reset): New function.
>        (same_succ_htab, same_succ_edge_flags)
>        (deleted_bbs, deleted_bb_preds): New var.
>        (debug_same_succ): New function.
>        (worklist): New var.
>        (print_worklist, add_to_worklist, find_same_succ_bb, find_same_succ)
>        (init_worklist, delete_worklist, delete_basic_block_same_succ)
>        (same_succ_flush_bbs, update_worklist): New function.
>        (print_cluster, debug_cluster, same_predecessors)
>        (add_bb_to_cluster, new_cluster, delete_cluster): New function.
>        (all_clusters): New var.
>        (alloc_cluster_vectors, reset_cluster_vectors, delete_cluster_vectors)
>        (merge_clusters, set_cluster): New function.
>        (gimple_equal_p, find_duplicate, same_phi_alternatives_1)
>        (same_phi_alternatives, bb_has_non_vop_phi, find_clusters_1)
>        (find_clusters): New function.
>        (merge_calls, update_vuses, vop_phi, insn_vops, vop_at_entry)
>        (replace_block_by): New function.
>        (update_bbs): New var.
>        (apply_clusters): New function.
>        (update_debug_stmt, update_debug_stmts): New function.
>        (tail_merge_optimize): New function.
>        tree-flow.h (tail_merge_optimize): Declare.
>        * tree-ssa-pre.c (execute_pre): Use tail_merge_optimize.
>        * Makefile.in (OBJS-common): Add tree-ssa-tail-merge.o.
>        (tree-ssa-tail-merge.o): New rule.
>        * opts.c (default_options_table): Set OPT_ftree_tail_merge by default at
>        OPT_LEVELS_2_PLUS.
>        * tree-ssa-sccvn.c (vn_valueize): Move to ...
>        * tree-ssa-sccvn.h (vn_valueize): Here.
>        * tree-ssa-alias.h (pt_solution_ior_into_shared): Declare.
>        * tree-ssa-structalias.c (find_what_var_points_to): Factor out and
>        use ...
>        (pt_solution_share): New function.
>        (pt_solution_unshare, pt_solution_ior_into_shared): New function.
>        (delete_points_to_sets): Nullify shared_bitmap_table after deletion.
>        * timevar.def (TV_TREE_TAIL_MERGE): New timevar.
>        * common.opt (ftree-tail-merge): New switch.
>        * params.def (PARAM_MAX_TAIL_MERGE_COMPARISONS): New parameter.
>        * doc/invoke.texi (Optimization Options, -O2): Add -ftree-tail-merge.
>        (-ftree-tail-merge, max-tail-merge-comparisons): New item.
>

Comments

Tom de Vries Aug. 24, 2011, 7:38 a.m. UTC | #1
On 07/22/2011 05:36 PM, Richard Guenther wrote:
> That said - I'm reasonably happy with the pass now, but it's rather large
> (this review took 40min again ...) so I appreciate a second look from
> somebody else.
> 

Ian,

Do you have a moment to give a second look to a gimple CFG optimization?  The
optimization removes duplicate basic blocks and reduces code size by 1-2%.

The latest patch is posted at
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01602.html.

Thanks,
- Tom
Ian Lance Taylor Aug. 24, 2011, 7 p.m. UTC | #2
Tom de Vries <vries@codesourcery.com> writes:

> Do you have a moment to give a second look to a gimple CFG optimization?  The
> optimization removes duplicate basic blocks and reduces code size by 1-2%.
>
> The latest patch is posted at
> http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01602.html.


I'm not really the best person to look at this patch, since it applies
to areas of the compiler with which I am less familiar..  However, since
you ask, I did read through the patch, and it looks OK to me.  Since
Richi OK'ed it, this patch is OK with the following changes.


> +typedef struct same_succ *same_succ_t;
> +typedef const struct same_succ *const_same_succ_t;

Don't name new types ending with "_t".  POSIX reserves names ending with
"_t" when <sys/types.h> is #included.  Name these something else.

> +typedef struct bb_cluster *bb_cluster_t;
> +typedef const struct bb_cluster *const_bb_cluster_t;

Same here.


> +@item -ftree-tail-merge
> +Merges identical blocks with same successors.  This flag is enabled by default
> +at @option{-O2} and higher.  The run time of this pass can be limited using
> +@option{max-tail-merge-comparisons} parameter.

I think this text can be improved to be more meaningful to compiler
users.  I suggest something like:

  Look for identical code sequences.  When found, replace one with a
  jump to the other.  This optimization is known as tail merging or
  cross jumping.  This flag is enabled [now same as above]


Thanks.

Ian
Richard Biener Aug. 25, 2011, 7:31 a.m. UTC | #3
On Wed, Aug 24, 2011 at 9:00 PM, Ian Lance Taylor <iant@google.com> wrote:
> Tom de Vries <vries@codesourcery.com> writes:
>
>> Do you have a moment to give a second look to a gimple CFG optimization?  The
>> optimization removes duplicate basic blocks and reduces code size by 1-2%.
>>
>> The latest patch is posted at
>> http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01602.html.
>
>
> I'm not really the best person to look at this patch, since it applies
> to areas of the compiler with which I am less familiar..  However, since
> you ask, I did read through the patch, and it looks OK to me.  Since
> Richi OK'ed it, this patch is OK with the following changes.
>
>
>> +typedef struct same_succ *same_succ_t;
>> +typedef const struct same_succ *const_same_succ_t;
>
> Don't name new types ending with "_t".  POSIX reserves names ending with
> "_t" when <sys/types.h> is #included.  Name these something else.
>
>> +typedef struct bb_cluster *bb_cluster_t;
>> +typedef const struct bb_cluster *const_bb_cluster_t;
>
> Same here.
>
>
>> +@item -ftree-tail-merge
>> +Merges identical blocks with same successors.  This flag is enabled by default
>> +at @option{-O2} and higher.  The run time of this pass can be limited using
>> +@option{max-tail-merge-comparisons} parameter.
>
> I think this text can be improved to be more meaningful to compiler
> users.  I suggest something like:
>
>  Look for identical code sequences.  When found, replace one with a
>  jump to the other.  This optimization is known as tail merging or
>  cross jumping.  This flag is enabled [now same as above]

Can you also add a --param for the maximum number of iterations
you perform (16 sounds quite high for GCC bootstrap), I'd default it
to 2 which seems to catch 99% of all cases.

If you already committed the patch just do it as a followup please.

Thanks,
Richard.

>
> Thanks.
>
> Ian
>
diff mbox

Patch

Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h     (revision 175801)
+++ gcc/tree-flow.h     (working copy)
@@ -806,6 +806,9 @@  bool multiplier_allowed_in_address_p (HO
 unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
 bool may_be_nonaddressable_p (tree expr);

+/* In tree-ssa-tail-merge.c.  */
+extern unsigned int tail_merge_optimize (unsigned int);

Eh, tree-flow.h kitchen-sink ;)  Please put it into tree-pass.h instead.

That said - I'm reasonably happy with the pass now, but it's rather large