Patchwork [PR43864] Gimple level duplicate block cleanup.

login
register
mail settings
Submitter Tom de Vries
Date June 10, 2011, 4:54 p.m.
Message ID <4DF24C2C.7080808@codesourcery.com>
Download mbox | patch
Permalink /patch/99936/
State New
Headers show

Comments

Tom de Vries - June 10, 2011, 4:54 p.m.
Hi Richard,

thanks for the review.

On 06/08/2011 11:55 AM, Richard Guenther wrote:
> On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries <vries@codesourcery.com> wrote:
>> Hi Richard,
>>
>> I have a patch for PR43864. The patch adds a gimple level duplicate block
>> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and
>> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the following
>> table (%, lower is better).
>>
>>                     none            pic
>>                thumb1  thumb2  thumb1 thumb2
>> spec2000         99.9    99.9    99.8   99.8
>>
>> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure that the
>> optimizations proposed in PR20070 would fix this PR.
>>
>> The problem in this PR is that when compiling with -O2, the example below should
>> only have one call to free. The original problem is formulated in terms of -Os,
>> but currently we generate one call to free with -Os, although still not the
>> smallest code possible. I'll show here the -O2 case, since that's similar to the
>> original PR.
>>

Example A. (naming it for reference below)

>> #include <stdio.h>
>> void foo (char*, FILE*);
>> char* hprofStartupp(char *outputFileName, char *ctx)
>> {
>>    char fileName[1000];
>>    FILE *fp;
>>    sprintf(fileName, outputFileName);
>>    if (access(fileName, 1) == 0) {
>>        free(ctx);
>>        return 0;
>>    }
>>
>>    fp = fopen(fileName, 0);
>>    if (fp == 0) {
>>        free(ctx);
>>        return 0;
>>    }
>>
>>    foo(outputFileName, fp);
>>
>>    return ctx;
>> }
>>
>> AFAIU, there are 2 complementary methods of rtl optimizations proposed in PR20070.
>> - Merging 2 blocks which are identical expect for input registers, by using a
>>  conditional move to choose between the different input registers.
>> - Merging 2 blocks which have different local registers, by ignoring those
>>  differences
>>
>> Blocks .L6 and.L7 have no difference in local registers, but they have a
>> difference in input registers: r3 and r1. Replacing the move to r5 by a
>> conditional move would probably be benificial in terms of size, but it's not
>> clear what condition the conditional move should be using. Calculating such a
>> condition would add in size and increase the execution path.
>>
>> gcc -O2 -march=armv7-a -mthumb pr43864.c -S:
>> ...
>>        push    {r4, r5, lr}
>>        mov     r4, r0
>>        sub     sp, sp, #1004
>>        mov     r5, r1
>>        mov     r0, sp
>>        mov     r1, r4
>>        bl      sprintf
>>        mov     r0, sp
>>        movs    r1, #1
>>        bl      access
>>        mov     r3, r0
>>        cbz     r0, .L6
>>        movs    r1, #0
>>        mov     r0, sp
>>        bl      fopen
>>        mov     r1, r0
>>        cbz     r0, .L7
>>        mov     r0, r4
>>        bl      foo
>> .L3:
>>        mov     r0, r5
>>        add     sp, sp, #1004
>>        pop     {r4, r5, pc}
>> .L6:
>>        mov     r0, r5
>>        mov     r5, r3
>>        bl      free
>>        b       .L3
>> .L7:
>>        mov     r0, r5
>>        mov     r5, r1
>>        bl      free
>>        b       .L3
>> ...
>>
>> The proposed patch solved the problem by dealing with the 2 blocks at a level
>> when they are still identical: at gimple level. It detect that the 2 blocks are
>> identical, and removes one of them.
>>
>> The following table shows the impact of the patch on the example in terms of
>> size for -march=armv7-a:
>>
>>          without     with    delta
>> Os      :     108      104       -4
>> O2      :     120      104      -16
>> Os thumb:      68       64       -4
>> O2 thumb:      76       64      -12
>>
>> The gain in size for -O2 is that of removing the entire block, plus the
>> replacement of 2 moves by a constant set, which also decreases the execution
>> path. The patch ensures optimal code for both -O2 and -Os.
>>
>>
>> By keeping track of equivalent definitions in the 2 blocks, we can ignore those
>> differences in comparison. Without this feature, we would only match blocks with
>> resultless operations, due the the ssa-nature of gimples.
>> For example, with this feature, we reduce the following function to its minimum
>> at gimple level, rather than at rtl level.
>>

Example B. (naming it for reference below)

>> int f(int c, int b, int d)
>> {
>>  int r, e;
>>
>>  if (c)
>>    r = b + d;
>>  else
>>    {
>>      e = b + d;
>>      r = e;
>>    }
>>
>>  return r;
>> }
>>
>> ;; Function f (f)
>>
>> f (int c, int b, int d)
>> {
>>  int e;
>>
>> <bb 2>:
>>  e_6 = b_3(D) + d_4(D);
>>  return e_6;
>>
>> }
>>
>> I'll send the patch with the testcases in a separate email.
>>
>> OK for trunk?
> 
> I don't like that you hook this into cleanup_tree_cfg - that is called
> _way_ too often.
> 

Here is a reworked patch that addresses several concerns, particularly the
compile time overhead.

Changes:
- The optimization is now in a separate file.
- The optimization is now a pass rather than a cleanup. That allowed me to
  remove the test for pass-local flags.
  New is the pass driver tail_merge_optimize, based on
  tree-cfgcleanup.c:cleanup_tree_cfg_1.
- The pass is run once, on SSA. Before, the patch would
  fix example A only before SSA and example B only on SSA.
  In order to fix example A on SSA, I added these changes:
  - handle the vop state at entry of bb1 and bb2 as equal (gimple_equal_p)
  - insert vop phi in bb2, and use that one (update_vuses)
  - complete pt_solutions_equal_p.

Passed x86_64 bootstrapping and regression testing, currently regtesting on ARM.

I placed the pass at the earliest point where it fixes example B: After copy
propagation and dead code elimination, specifically, after the first invocation
of pass_cd_dce. Do you know (any other points) where the pass should be scheduled?

> This also duplicates the literal matching done on the RTL level - instead
> I think this optimization would be more related to value-numbering
> (either that of SCCVN/FRE/PRE or that of DOM which also does
> jump-threading).

The pass currently does just duplicate block elimination, not cross-jumping.
If we would like to extend this to cross-jumping, I think we need to do the
reverse of value numbering: walk backwards over the bb, and keep track of the
way values are used rather than defined. This will allows us to make a cut
halfway a basic block.
In general, we cannot do cut halfway a basic block in the current implementation
(of value numbering and forward matching), since we assume equivalence of the
incoming vops at bb entry. This assumption is in general only valid if we indeed
replace the entire block by another entire block.
I imagine that a cross-jumping heuristic would be based on the length of the
match and the amount of non-vop phis it would introduce. Then value numbering
would be something orthogonal to this optimization, which would reduce amount of
phis needed for a cross-jump.
I think it would make sense to use SCCVN value numbering at the point that we
have this backward matching.

I'm not sure whether it's a good idea to try to replace the current forward
local value numbering with SCCVN value numbering, since we currently declare
vops equal, which are, in the global sense, not equal. And once we go to
backward matching, we'll need something to keep track of the uses, and we can
reuse the current infrastructure for that, but not the SCCVN value numbering.

Does that make any sense?

Ok for trunk, once ARM testing finishes ok?

Thanks,
- Tom

2011-06-10  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/43864
	* tree-ssa-tail-merge.c: New file.
	(int_int_splay_lookup, int_int_splay_insert)
	(int_int_splay_node_contained_in, int_int_splay_contained_in)
	(equiv_lookup, equiv_insert, equiv_contained_in, equiv_init)
	(equiv_delete, gimple_base_equal_p, pt_solution_equal_p, gimple_equal_p)
	(bb_gimple_equal_p, update_debug_stmts, update_vuses)
	(cleanup_duplicate_preds_1, same_or_local_phi_alternatives)
	(cleanup_duplicate_preds, tail_merge_optimize, gate_tail_merge): New
	function.
	(pass_tail_merge): New gimple pass.
	* tree-pass.h (pass_tail_merge): Declare new pass.
	* passes.c (init_optimization_passes): Use new pass.
	* Makefile.in (OBJS-common): Add tree-ssa-tail-merge.o.
	(tree-ssa-tail-merge.o): New rule.
Richard Guenther - June 14, 2011, 3:01 p.m.
On Fri, Jun 10, 2011 at 6:54 PM, Tom de Vries <vries@codesourcery.com> wrote:
> Hi Richard,
>
> thanks for the review.
>
> On 06/08/2011 11:55 AM, Richard Guenther wrote:
>> On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries <vries@codesourcery.com> wrote:
>>> Hi Richard,
>>>
>>> I have a patch for PR43864. The patch adds a gimple level duplicate block
>>> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and
>>> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the following
>>> table (%, lower is better).
>>>
>>>                     none            pic
>>>                thumb1  thumb2  thumb1 thumb2
>>> spec2000         99.9    99.9    99.8   99.8
>>>
>>> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure that the
>>> optimizations proposed in PR20070 would fix this PR.
>>>
>>> The problem in this PR is that when compiling with -O2, the example below should
>>> only have one call to free. The original problem is formulated in terms of -Os,
>>> but currently we generate one call to free with -Os, although still not the
>>> smallest code possible. I'll show here the -O2 case, since that's similar to the
>>> original PR.
>>>
>
> Example A. (naming it for reference below)
>
>>> #include <stdio.h>
>>> void foo (char*, FILE*);
>>> char* hprofStartupp(char *outputFileName, char *ctx)
>>> {
>>>    char fileName[1000];
>>>    FILE *fp;
>>>    sprintf(fileName, outputFileName);
>>>    if (access(fileName, 1) == 0) {
>>>        free(ctx);
>>>        return 0;
>>>    }
>>>
>>>    fp = fopen(fileName, 0);
>>>    if (fp == 0) {
>>>        free(ctx);
>>>        return 0;
>>>    }
>>>
>>>    foo(outputFileName, fp);
>>>
>>>    return ctx;
>>> }
>>>
>>> AFAIU, there are 2 complementary methods of rtl optimizations proposed in PR20070.
>>> - Merging 2 blocks which are identical expect for input registers, by using a
>>>  conditional move to choose between the different input registers.
>>> - Merging 2 blocks which have different local registers, by ignoring those
>>>  differences
>>>
>>> Blocks .L6 and.L7 have no difference in local registers, but they have a
>>> difference in input registers: r3 and r1. Replacing the move to r5 by a
>>> conditional move would probably be benificial in terms of size, but it's not
>>> clear what condition the conditional move should be using. Calculating such a
>>> condition would add in size and increase the execution path.
>>>
>>> gcc -O2 -march=armv7-a -mthumb pr43864.c -S:
>>> ...
>>>        push    {r4, r5, lr}
>>>        mov     r4, r0
>>>        sub     sp, sp, #1004
>>>        mov     r5, r1
>>>        mov     r0, sp
>>>        mov     r1, r4
>>>        bl      sprintf
>>>        mov     r0, sp
>>>        movs    r1, #1
>>>        bl      access
>>>        mov     r3, r0
>>>        cbz     r0, .L6
>>>        movs    r1, #0
>>>        mov     r0, sp
>>>        bl      fopen
>>>        mov     r1, r0
>>>        cbz     r0, .L7
>>>        mov     r0, r4
>>>        bl      foo
>>> .L3:
>>>        mov     r0, r5
>>>        add     sp, sp, #1004
>>>        pop     {r4, r5, pc}
>>> .L6:
>>>        mov     r0, r5
>>>        mov     r5, r3
>>>        bl      free
>>>        b       .L3
>>> .L7:
>>>        mov     r0, r5
>>>        mov     r5, r1
>>>        bl      free
>>>        b       .L3
>>> ...
>>>
>>> The proposed patch solved the problem by dealing with the 2 blocks at a level
>>> when they are still identical: at gimple level. It detect that the 2 blocks are
>>> identical, and removes one of them.
>>>
>>> The following table shows the impact of the patch on the example in terms of
>>> size for -march=armv7-a:
>>>
>>>          without     with    delta
>>> Os      :     108      104       -4
>>> O2      :     120      104      -16
>>> Os thumb:      68       64       -4
>>> O2 thumb:      76       64      -12
>>>
>>> The gain in size for -O2 is that of removing the entire block, plus the
>>> replacement of 2 moves by a constant set, which also decreases the execution
>>> path. The patch ensures optimal code for both -O2 and -Os.
>>>
>>>
>>> By keeping track of equivalent definitions in the 2 blocks, we can ignore those
>>> differences in comparison. Without this feature, we would only match blocks with
>>> resultless operations, due the the ssa-nature of gimples.
>>> For example, with this feature, we reduce the following function to its minimum
>>> at gimple level, rather than at rtl level.
>>>
>
> Example B. (naming it for reference below)
>
>>> int f(int c, int b, int d)
>>> {
>>>  int r, e;
>>>
>>>  if (c)
>>>    r = b + d;
>>>  else
>>>    {
>>>      e = b + d;
>>>      r = e;
>>>    }
>>>
>>>  return r;
>>> }
>>>
>>> ;; Function f (f)
>>>
>>> f (int c, int b, int d)
>>> {
>>>  int e;
>>>
>>> <bb 2>:
>>>  e_6 = b_3(D) + d_4(D);
>>>  return e_6;
>>>
>>> }
>>>
>>> I'll send the patch with the testcases in a separate email.
>>>
>>> OK for trunk?
>>
>> I don't like that you hook this into cleanup_tree_cfg - that is called
>> _way_ too often.
>>
>
> Here is a reworked patch that addresses several concerns, particularly the
> compile time overhead.
>
> Changes:
> - The optimization is now in a separate file.
> - The optimization is now a pass rather than a cleanup. That allowed me to
>  remove the test for pass-local flags.
>  New is the pass driver tail_merge_optimize, based on
>  tree-cfgcleanup.c:cleanup_tree_cfg_1.
> - The pass is run once, on SSA. Before, the patch would
>  fix example A only before SSA and example B only on SSA.
>  In order to fix example A on SSA, I added these changes:
>  - handle the vop state at entry of bb1 and bb2 as equal (gimple_equal_p)
>  - insert vop phi in bb2, and use that one (update_vuses)
>  - complete pt_solutions_equal_p.
>
> Passed x86_64 bootstrapping and regression testing, currently regtesting on ARM.
>
> I placed the pass at the earliest point where it fixes example B: After copy
> propagation and dead code elimination, specifically, after the first invocation
> of pass_cd_dce. Do you know (any other points) where the pass should be scheduled?

It's probably reasonable to run it after IPA inlining has taken place which
means insert it somewhen after the second pass_fre (I'd suggest after
pass_merge_phi).

But my general comment still applies - I don't like the structural
comparison code at all and you should really use the value-numbering
machineries we have or even better, merge this pass with FRE itself
(or DOM if that suits you more).  For FRE you'd want to hook into
tree-ssa-pre.c:eliminate().

>> This also duplicates the literal matching done on the RTL level - instead
>> I think this optimization would be more related to value-numbering
>> (either that of SCCVN/FRE/PRE or that of DOM which also does
>> jump-threading).
>
> The pass currently does just duplicate block elimination, not cross-jumping.
> If we would like to extend this to cross-jumping, I think we need to do the
> reverse of value numbering: walk backwards over the bb, and keep track of the
> way values are used rather than defined. This will allows us to make a cut
> halfway a basic block.

I don't understand - I propose to do literal matching but using value-numbering
for tracking equivalences to avoid literal matching for stuff we know is
equivalent.  In fact I think it will be mostly calls and stores where we
need to do literal matching, but never intermediate computations on
registers.

But maybe I miss something here.

> In general, we cannot do cut halfway a basic block in the current implementation
> (of value numbering and forward matching), since we assume equivalence of the
> incoming vops at bb entry. This assumption is in general only valid if we indeed
> replace the entire block by another entire block.

Why are VOPs of concern at all?

> I imagine that a cross-jumping heuristic would be based on the length of the
> match and the amount of non-vop phis it would introduce. Then value numbering
> would be something orthogonal to this optimization, which would reduce amount of
> phis needed for a cross-jump.
> I think it would make sense to use SCCVN value numbering at the point that we
> have this backward matching.
>
> I'm not sure whether it's a good idea to try to replace the current forward
> local value numbering with SCCVN value numbering, since we currently declare
> vops equal, which are, in the global sense, not equal. And once we go to
> backward matching, we'll need something to keep track of the uses, and we can
> reuse the current infrastructure for that, but not the SCCVN value numbering.
>
> Does that make any sense?

Ok, let me think about this a bit.

For now about the patch in general.  The functions need renaming to
something more sensible now that this isn't cfg-cleanup anymore.

I miss a general overview of the pass - it's hard to reverse engineer
its working for me.  Like (working backwards), you are detecting
duplicate predecessors - that obviously doesn't work for duplicates
without any successors, like those ending in noreturn calls.

+  n = EDGE_COUNT (bb->preds);
+
+  for (i = 0; i < n; ++i)
+    {
+      e1 = EDGE_PRED (bb, i);
+      if (e1->flags & EDGE_COMPLEX)
+        continue;
+      for (j = i + 1; j < n; ++j)
+        {

that's quadratic in the number of predecessors.

+          /* Block e1->src might be deleted.  If bb and e1->src are the same
+             block, delete e2->src instead, by swapping e1 and e2.  */
+          e1_swapped = (bb == e1->src) ? e2: e1;
+          e2_swapped = (bb == e1->src) ? e1: e2;

is that because you incrementally merge preds two at a time?  As you
are deleting blocks don't you need to adjust the quadratic walking?
Thus, with say four equivalent preds won't your code crash anyway?

I think the code needs to delay the CFG manipulation to the end
of this function.

+/* Returns whether for all phis in E1->dest the phi alternatives for E1 and
+   E2 are either:
+   - equal, or
+   - defined locally in E1->src and E2->src.
+   In the latter case, register the alternatives in *PHI_EQUIV.  */
+
+static bool
+same_or_local_phi_alternatives (equiv_t *phi_equiv, edge e1, edge e2)
+{
+  int n1 = e1->dest_idx;
+  int n2 = e2->dest_idx;
+  gimple_stmt_iterator gsi;
+  basic_block dest = e1->dest;
+  gcc_assert (dest == e2->dest);

too many asserts in general - I'd say for this case pass in the destination
block as argument.

+      gcc_assert (val1 != NULL_TREE);
+      gcc_assert (val2 != NULL_TREE);

superfluous.

+static bool
+cleanup_duplicate_preds_1 (equiv_t phi_equiv, edge e1, edge e2)
...
+  VEC (edge,heap) *redirected_edges;
+  gcc_assert (bb == e2->dest);

same.

+  if (e1->flags != e2->flags)
+    return false;

that's bad - it should handle EDGE_TRUE/FALSE_VALUE mismatches
by swapping edges in the preds.

+  /* TODO: We could allow multiple successor edges here, as long as bb1 and bb2
+     have the same successors.  */
+  if (EDGE_COUNT (bb1->succs) != 1 || EDGE_COUNT (bb2->succs) != 1)
+    return false;

hm, ok - that would need fixing, too.  Same or mergeable successors
of course, which makes me wonder if doing this whole transformation
incrementally and locally is a good idea ;)   Also

+  /* Calculate the changes to be made to the dominator info.
+     Calculate bb2_dom.  */
...

wouldn't be necessary I suppose (just throw away dom info after the
pass).

That is, I'd globally record BB equivalences (thus, "value-number"
BBs) and apply the CFG manipulations at a single point.

Btw, I miss where you insert PHI nodes for all uses that flow in
from the preds preds - you do that for VOPs but not for real
operands?

+  /* Replace uses of vuse2 with uses of the phi.  */
+  for (gsi = gsi_start_bb (bb2); !gsi_end_p (gsi); gsi_next (&gsi))
+    {

why not walk immediate uses of the old PHI and SET_USE to
the new one instead (for those uses in the duplicate BB of course)?

+    case GSS_CALL:
+      if (!pt_solution_equal_p (gimple_call_use_set (s1),
+                                gimple_call_use_set (s2))

I don't understand why you are concerned about equality of
points-to information.  Why not simply ior it (pt_solution_ior_into - note
they are shared so you need to unshare them first).

+/* Return true if p1 and p2 can be considered equal.  */
+
+static bool
+pt_solution_equal_p (struct pt_solution *p1, struct pt_solution *p2)

would go into tree-ssa-structalias.c instead.

+static bool
+gimple_base_equal_p (gimple s1, gimple s2)
+{
...
+  if (gimple_modified_p (s1) || gimple_modified_p (s2))
+    return false;

that shouldn't be of concern.

+  if (s1->gsbase.subcode != s2->gsbase.subcode)
+    return false;

for assigns that are of class GIMPLE_SINGLE_RHS we do not
update subcode during transformations so it can differ for now
equal statements.

I'm not sure if a splay tree for the SSA name version equivalency
map is the best representation - I would have used a simple
array of num_ssa_names size and assign value-numbers
(the lesser version for example).

Thus equiv_insert would do

  value = MIN (SSA_NAME_VERSION (val1), SSA_NAME_VERSION (val2));
  values[SSA_NAME_VERSION (val1)] = value;
  values[SSA_NAME_VERSION (val2)] = value;

if the names are not defined in bb1 resp. bb2 we would have to insert
a PHI node in the merged block - that would be a cost thingy for
doing this value-numbering in a more global way.

You don't seem to be concerned about the SSA names points-to
information, but it surely has the same issues as that of the calls
(so either they should be equal or they should be conservatively
merged).  But as-is you don't allow any values to flow into the
merged blocks that are not equal for both edges, no?

+  TV_TREE_CFG,                          /* tv_id */

add a new timevar.  We wan to be able to turn the pass off,
so also add a new option (I can see it might make debugging harder
in some cases).

Can you assess the effect of the patch on GCC itself (for example
when building cc1?)? What's the size benefit and the compile-time
overhead?

Thanks,
Richard.


> Thanks,
> - Tom
>
> 2011-06-10  Tom de Vries  <tom@codesourcery.com>
>
>        PR middle-end/43864
>        * tree-ssa-tail-merge.c: New file.
>        (int_int_splay_lookup, int_int_splay_insert)
>        (int_int_splay_node_contained_in, int_int_splay_contained_in)
>        (equiv_lookup, equiv_insert, equiv_contained_in, equiv_init)
>        (equiv_delete, gimple_base_equal_p, pt_solution_equal_p, gimple_equal_p)
>        (bb_gimple_equal_p, update_debug_stmts, update_vuses)
>        (cleanup_duplicate_preds_1, same_or_local_phi_alternatives)
>        (cleanup_duplicate_preds, tail_merge_optimize, gate_tail_merge): New
>        function.
>        (pass_tail_merge): New gimple pass.
>        * tree-pass.h (pass_tail_merge): Declare new pass.
>        * passes.c (init_optimization_passes): Use new pass.
>        * Makefile.in (OBJS-common): Add tree-ssa-tail-merge.o.
>        (tree-ssa-tail-merge.o): New rule.
>

Patch

Index: gcc/tree-ssa-tail-merge.c
===================================================================
--- /dev/null (new file)
+++ gcc/tree-ssa-tail-merge.c (revision 0)
@@ -0,0 +1,732 @@ 
+/* Tail merging for gimple.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Tom de Vries (tom@codesourcery.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "tm_p.h"
+#include "basic-block.h"
+#include "output.h"
+#include "flags.h"
+#include "function.h"
+#include "tree-flow.h"
+#include "timevar.h"
+#include "tree-pass.h"
+#include "splay-tree.h"
+#include "bitmap.h"
+#include "tree-ssa-alias.h"
+
+static bitmap altered_bbs;
+
+/* Returns true if S contains (I1, I2).  */
+
+static bool
+int_int_splay_lookup (splay_tree s, unsigned int i1, unsigned int i2)
+{
+  splay_tree_node node;
+
+  if (s == NULL)
+    return false;
+
+  node = splay_tree_lookup (s, i1);
+  return node && node->value == i2;
+}
+
+/* Attempts to insert (I1, I2) into *S.  Returns true if successful.
+   Allocates *S if necessary.  */
+
+static bool
+int_int_splay_insert (splay_tree *s, unsigned int i1 , unsigned int i2)
+{
+  if (*s != NULL)
+    {
+      /* Check for existing element, which would otherwise be silently
+         overwritten by splay_tree_insert.  */
+      if (splay_tree_lookup (*s, i1))
+        return false;
+    }
+  else
+    *s = splay_tree_new (splay_tree_compare_ints, 0, 0);
+
+  splay_tree_insert (*s, i1, i2);
+  return true;
+}
+
+/* Returns 0 if (NODE->value, NODE->key) is an element of S.  Otherwise,
+   returns 1.  */
+
+static int
+int_int_splay_node_contained_in (splay_tree_node node, void *s)
+{
+  splay_tree_node snode = splay_tree_lookup ((splay_tree)s, node->key);
+  return (!snode || node->value != snode->value) ? 1 : 0;
+}
+
+/* Returns true if all elements of S1 are also in S2.  */
+
+static bool
+int_int_splay_contained_in (splay_tree s1, splay_tree s2)
+{
+  if (s1 == NULL)
+    return true;
+  if (s2 == NULL)
+    return false;
+  return splay_tree_foreach (s1, int_int_splay_node_contained_in, s2) == 0;
+}
+
+typedef splay_tree equiv_t;
+
+/* Returns true if EQUIV contains (SSA_NAME_VERSION (VAL1),
+                                   SSA_NAME_VERSION (VAL2)).  */
+
+static bool
+equiv_lookup (equiv_t equiv, tree val1, tree val2)
+{
+  if (val1 == NULL_TREE || val2 == NULL_TREE
+      || TREE_CODE (val1) != SSA_NAME || TREE_CODE (val2) != SSA_NAME)
+    return false;
+
+  return int_int_splay_lookup (equiv, SSA_NAME_VERSION (val1),
+                               SSA_NAME_VERSION (val2));
+}
+
+/* Attempts to insert (SSA_NAME_VERSION (VAL1), SSA_NAME_VERSION (VAL2)) into
+   EQUIV, provided they are defined BB1 and BB2.  Returns true if successful.
+   Allocates *EQUIV if necessary.  */
+
+static bool
+equiv_insert (equiv_t *equiv, tree val1, tree val2,
+              basic_block bb1, basic_block bb2)
+{
+  if (val1 == NULL_TREE || val2 == NULL_TREE
+      || TREE_CODE (val1) != SSA_NAME || TREE_CODE (val2) != SSA_NAME
+      || gimple_bb (SSA_NAME_DEF_STMT (val1)) != bb1
+      || gimple_bb (SSA_NAME_DEF_STMT (val2)) != bb2)
+    return false;
+
+  return int_int_splay_insert (equiv, SSA_NAME_VERSION (val1),
+                               SSA_NAME_VERSION (val2));
+}
+
+/* Returns true if all elements of S1 are also in S2.  */
+
+static bool
+equiv_contained_in (equiv_t s1, equiv_t s2)
+{
+  return int_int_splay_contained_in (s1, s2);
+}
+
+/* Init equiv_t *S.  */
+
+static void
+equiv_init (equiv_t *s)
+{
+  *s = NULL;
+}
+
+/* Delete equiv_t *S and reinit.  */
+
+static void
+equiv_delete (equiv_t *s)
+{
+  if (!*s)
+    return;
+
+  splay_tree_delete (*s);
+  *s = NULL;
+}
+
+/* Check whether S1 and S2 are equal, considering the fields in
+   gimple_statement_base.  Ignores fields uid, location, bb, and block, and the
+   pass-local flags visited and plf.  */
+
+static bool
+gimple_base_equal_p (gimple s1, gimple s2)
+{
+  if (gimple_code (s1) != gimple_code (s2))
+    return false;
+
+  if (gimple_no_warning_p (s1) != gimple_no_warning_p (s2))
+    return false;
+
+  if (is_gimple_assign (s1)
+      && (gimple_assign_nontemporal_move_p (s1)
+          != gimple_assign_nontemporal_move_p (s2)))
+    return false;
+
+  if (gimple_modified_p (s1) || gimple_modified_p (s2))
+    return false;
+
+  if (gimple_has_volatile_ops (s1) != gimple_has_volatile_ops (s2))
+    return false;
+
+  if (s1->gsbase.subcode != s2->gsbase.subcode)
+    return false;
+
+  if (gimple_num_ops (s1) != gimple_num_ops (s2))
+    return false;
+
+  return true;
+}
+
+/* Return true if p1 and p2 can be considered equal.  */
+
+static bool
+pt_solution_equal_p (struct pt_solution *p1, struct pt_solution *p2)
+{
+  if (p1->anything != p2->anything
+      || p1->nonlocal != p2->nonlocal
+      || p1->escaped != p2->escaped
+      || p1->ipa_escaped != p2->ipa_escaped
+      || p1->null != p2->null
+      || p1->vars_contains_global != p2->vars_contains_global
+      || p1->vars_contains_restrict != p2->vars_contains_restrict)
+    return false;
+
+  if ((p1->vars == NULL) != (p2->vars == NULL))
+    return false;
+
+  if (p1->vars == NULL)
+    return true;
+
+  return bitmap_equal_p (p1->vars, p2->vars);
+}
+
+/* Return true if gimple statements S1 and S2 are equal.  At entry, EQUIV
+   contains pairs of local defs that can be considered equivalent.  If S1 and S2
+   are equal, at exit EQUAL contains the defs and vdefs of S1 and S2.  If found,
+   return vop state at bb entry in PHI_USE1 for BB1 and PHI_USE2 for BB2.  */
+
+static bool
+gimple_equal_p (equiv_t *equiv, gimple s1, gimple s2,
+                tree *phi_vuse1, tree *phi_vuse2)
+{
+  unsigned int i;
+  enum gimple_statement_structure_enum gss;
+  tree lhs1, lhs2;
+  basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
+
+  /* Handle omp gimples conservatively.  */
+  if (is_gimple_omp (s1) || is_gimple_omp (s2))
+    return false;
+
+  if (!gimple_base_equal_p (s1, s2))
+    return false;
+
+  gss = gimple_statement_structure (s1);
+  switch (gss)
+    {
+    case GSS_CALL:
+      if (!pt_solution_equal_p (gimple_call_use_set (s1),
+                                gimple_call_use_set (s2))
+          || !pt_solution_equal_p (gimple_call_clobber_set (s1),
+                                   gimple_call_clobber_set (s2))
+          || !gimple_call_same_target_p (s1, s2))
+        return false;
+      /* Falthru.  */
+
+    case GSS_WITH_MEM_OPS_BASE:
+    case GSS_WITH_MEM_OPS:
+      {
+        tree vdef1 = gimple_vdef (s1), vdef2 = gimple_vdef (s2);
+        tree vuse1 = gimple_vuse (s1), vuse2 = gimple_vuse (s2);
+        if (vuse1 == NULL_TREE || vuse2 == NULL_TREE)
+          {
+            if (vuse1 != vuse2)
+              return false;
+          }
+        else if (*phi_vuse1 == NULL_TREE)
+          {
+            *phi_vuse1 = vuse1;
+            *phi_vuse2 = vuse2;
+          }
+        else if (vuse1 != vuse2 && !(vuse1 == *phi_vuse1 && vuse2 == *phi_vuse2)
+                 &&!equiv_lookup (*equiv, vuse1, vuse2))
+          return false;
+
+        if (vdef1 == NULL_TREE || vdef2 == NULL_TREE)
+          {
+            if (vdef1 != vdef2)
+              return false;
+          }
+        else if (!equiv_insert (equiv, vdef1, vdef2, bb1, bb2))
+          return false;
+      }
+      /* Falthru.  */
+
+    case GSS_WITH_OPS:
+      /* Ignore gimple_def_ops and gimple_use_ops.  They are duplicates of
+         gimple_vdef, gimple_vuse and gimple_ops, which are checked
+         elsewhere.  */
+      /* Falthru.  */
+
+    case GSS_BASE:
+      break;
+
+    default:
+      return false;
+    }
+
+  /* Find lhs.  */
+  lhs1 = gimple_get_lhs (s1);
+  lhs2 = gimple_get_lhs (s2);
+
+  /* Handle ops.  */
+  for (i = 0; i < gimple_num_ops (s1); ++i)
+    {
+      tree t1 = gimple_op (s1, i);
+      tree t2 = gimple_op (s2, i);
+      if (t1 == NULL_TREE && t2 == NULL_TREE)
+        continue;
+      if (t1 == NULL_TREE || t2 == NULL_TREE)
+        return false;
+      /* Skip lhs.  */
+      if (lhs1 == t1 && i == 0)
+        continue;
+      if (!operand_equal_p (t1, t2, 0) && !equiv_lookup (*equiv, t1, t2))
+        return false;
+    }
+
+  /* Handle lhs.  */
+  if (lhs1 != lhs2 && !equiv_insert (equiv, lhs1, lhs2, bb1, bb2))
+    return false;
+
+  return true;
+}
+
+/* Return true if BB1 and BB2 contain the same non-debug gimple statements, and
+   if the def pairs in PHI_EQUIV are found to be equivalent defs in BB1 and
+   BB2.  Return vop state at bb entry in PHI_USE1 for BB1 and PHI_USE2 for
+   BB2.  */
+
+static bool
+bb_gimple_equal_p (equiv_t phi_equiv, basic_block bb1, basic_block bb2,
+                   tree *phi_vuse1, tree *phi_vuse2)
+{
+  gimple_stmt_iterator gsi1 = gsi_start_nondebug_bb (bb1);
+  gimple_stmt_iterator gsi2 = gsi_start_nondebug_bb (bb2);
+  bool end1, end2;
+  equiv_t equiv;
+  bool equal = true;
+
+  end1 = gsi_end_p (gsi1);
+  end2 = gsi_end_p (gsi2);
+
+  /* Don't handle empty blocks, these are handled elsewhere in the cleanup.  */
+  if (end1 || end2)
+    return false;
+
+  /* TODO: handle blocks with phi-nodes.  We'll have find corresponding
+     phi-nodes in bb1 and bb2, with the same alternatives for the same
+     preds.  */
+  if (phi_nodes (bb1) != NULL || phi_nodes (bb2) != NULL)
+    return false;
+
+  equiv_init (&equiv);
+  while (true)
+    {
+      if (end1 && end2)
+        break;
+      if (end1 || end2
+          || !gimple_equal_p (&equiv, gsi_stmt (gsi1), gsi_stmt (gsi2),
+                              phi_vuse1, phi_vuse2))
+        {
+          equal = false;
+          break;
+        }
+
+      gsi_next_nondebug (&gsi1);
+      gsi_next_nondebug (&gsi2);
+      end1 = gsi_end_p (gsi1);
+      end2 = gsi_end_p (gsi2);
+    }
+
+  /* equiv now contains all bb1,bb2 def pairs which are equivalent.
+     Check if the phi alternatives are indeed equivalent.  */
+  equal = equal && equiv_contained_in (phi_equiv, equiv);
+
+  equiv_delete (&equiv);
+
+  return equal;
+}
+
+/* Resets all debug statements in BBUSE that have uses that are not
+   dominated by their defs.  */
+
+static void
+update_debug_stmts (basic_block bbuse)
+{
+  use_operand_p use_p;
+  ssa_op_iter oi;
+  basic_block bbdef;
+  gimple stmt, def_stmt;
+  gimple_stmt_iterator gsi;
+  tree name;
+
+  for (gsi = gsi_start_bb (bbuse); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (!is_gimple_debug (stmt))
+        continue;
+      gcc_assert (gimple_debug_bind_p (stmt));
+
+      FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, oi, SSA_OP_USE)
+        {
+          name = USE_FROM_PTR (use_p);
+          gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+          def_stmt = SSA_NAME_DEF_STMT (name);
+          gcc_assert (def_stmt != NULL);
+
+          bbdef = gimple_bb (def_stmt);
+          if (bbdef == NULL || bbuse == bbdef
+              || dominated_by_p (CDI_DOMINATORS, bbuse, bbdef))
+            continue;
+
+          gimple_debug_bind_reset_value (stmt);
+          update_stmt (stmt);
+        }
+    }
+}
+
+/* Create a vop phi in BB2, with VUSE1 arguments for all the REDIRECTED_EDGES,
+   and VUSE2 for the other edges.  Then use the phi instead of VUSE2 in BB2.  */
+
+static void
+update_vuses (tree vuse1, tree vuse2, basic_block bb2,
+              VEC (edge,heap) *redirected_edges)
+{
+  gimple stmt, phi = NULL;
+  tree lhs, vuse, vdef;
+  unsigned int i;
+  gimple def_stmt1, def_stmt2;
+  gimple_stmt_iterator gsi;
+  source_location locus1, locus2;
+
+  if (vuse1 == NULL_TREE && vuse2 == NULL_TREE)
+    return;
+  gcc_assert (vuse1 != NULL_TREE && vuse2 != NULL_TREE);
+
+  def_stmt1 = SSA_NAME_DEF_STMT (vuse1);
+  locus1 = gimple_location (def_stmt1);
+  def_stmt2 = SSA_NAME_DEF_STMT (vuse2);
+  locus2 = gimple_location (def_stmt2);
+
+  /* This is not triggered yet, since we bail out if bb2 has more than
+     1 predecessor.  */
+  gcc_assert (gimple_bb (def_stmt2) != bb2);
+
+  /* No need to create a phi with 2 equal arguments.  */
+  if (vuse1 == vuse2)
+    return;
+
+  /* Create a phi, first with default argument vuse2 for all preds.  */
+  lhs = make_ssa_name (SSA_NAME_VAR (vuse2), NULL);
+  phi = create_phi_node (lhs, bb2);
+  SSA_NAME_DEF_STMT (lhs) = phi;
+  for (i = 0; i < EDGE_COUNT (bb2->preds); ++i)
+    add_phi_arg (phi, vuse2, EDGE_PRED (bb2, i), locus2);
+
+  /* Now overwrite the arguments associated with the redirected edges with
+     vuse1.  */
+  for (i = 0; i < EDGE_COUNT (redirected_edges); ++i)
+    {
+      edge e = VEC_index (edge, redirected_edges, i);
+      gcc_assert (PHI_ARG_DEF_FROM_EDGE (phi, e));
+      SET_PHI_ARG_DEF (phi, e->dest_idx, vuse1);
+      gimple_phi_arg_set_location (phi, e->dest_idx, locus1);
+    }
+
+  /* Replace uses of vuse2 with uses of the phi.  */
+  for (gsi = gsi_start_bb (bb2); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      vuse = gimple_vuse (stmt);
+      vdef = gimple_vdef (stmt);
+      if (vuse != NULL_TREE)
+        {
+          gcc_assert (vuse == vuse2);
+          gimple_set_vuse (stmt,  lhs);
+          update_stmt (stmt);
+        }
+      if (vdef != NULL_TREE)
+        break;
+    }
+}
+
+/* E1 and E2 have a common dest.  Detect if E1->src and E2->src are duplicates,
+   and if so, redirect the predecessor edges of E1->src to E2->src and remove
+   E1->src.  Returns true if any changes were made.  */
+
+static bool
+cleanup_duplicate_preds_1 (equiv_t phi_equiv, edge e1, edge e2)
+{
+  edge pred_edge;
+  basic_block bb1, bb2, pred;
+  basic_block bb_dom = NULL, bb2_dom = NULL;
+  unsigned int i;
+  basic_block bb = e1->dest;
+  tree phi_vuse1 = NULL_TREE, phi_vuse2 = NULL_TREE;
+  VEC (edge,heap) *redirected_edges;
+  gcc_assert (bb == e2->dest);
+
+  if (e1->flags != e2->flags)
+    return false;
+
+  bb1 = e1->src;
+  bb2 = e2->src;
+
+  /* TODO: We could allow multiple successor edges here, as long as bb1 and bb2
+     have the same successors.  */
+  if (EDGE_COUNT (bb1->succs) != 1 || EDGE_COUNT (bb2->succs) != 1)
+    return false;
+
+  if (!bb_gimple_equal_p (phi_equiv, bb1, bb2, &phi_vuse1, &phi_vuse2))
+    return false;
+
+  if (dump_file)
+    fprintf (dump_file, "cleanup_duplicate_preds: "
+             "cleaning up <bb %d>, duplicate of <bb %d>\n", bb1->index,
+             bb2->index);
+
+  /* Calculate the changes to be made to the dominator info.
+     Calculate bb2_dom.  */
+  bb2_dom = nearest_common_dominator (CDI_DOMINATORS, bb2, bb1);
+  if (bb2_dom == bb1 || bb2_dom == bb2)
+    bb2_dom = get_immediate_dominator (CDI_DOMINATORS, bb2_dom);
+
+  /* Calculate bb_dom.  */
+  bb_dom = get_immediate_dominator (CDI_DOMINATORS, bb);
+  if (bb == bb2)
+    bb_dom = bb2_dom;
+  else if (bb_dom == bb1 || bb_dom == bb2)
+    bb_dom = bb2;
+  else
+    {
+      /* Count the predecessors of bb (other than bb1 or bb2), not dominated
+         by bb.  If there are none, merging bb1 and bb2 will mean that bb2
+         dominates bb.  */
+      int not_dominated = 0;
+      for (i = 0; i < EDGE_COUNT (bb->preds); ++i)
+        {
+          pred_edge = EDGE_PRED (bb, i);
+          pred = pred_edge->src;
+          if (pred == bb1 || pred == bb2)
+            continue;
+          if (dominated_by_p (CDI_DOMINATORS, pred, bb))
+            continue;
+          not_dominated++;
+        }
+      if (not_dominated == 0)
+        bb_dom = bb2;
+    }
+
+  redirected_edges = VEC_alloc (edge, heap, 10);
+
+  /* Redirect the incoming edges of bb1 to bb2.  */
+  for (i = EDGE_COUNT (bb1->preds); i > 0 ; --i)
+    {
+      pred_edge = EDGE_PRED (bb1, i - 1);
+      pred = pred_edge->src;
+      pred_edge = redirect_edge_and_branch (pred_edge, bb2);
+      gcc_assert (pred_edge != NULL);
+      VEC_safe_push (edge, heap, redirected_edges, pred_edge);
+    }
+
+  /* The set of predecessors has changed for both bb and bb2.  */
+  bitmap_set_bit (altered_bbs, bb->index);
+  bitmap_set_bit (altered_bbs, bb2->index);
+
+  /* bb1 has no incoming edges anymore, and has become unreachable.  */
+  delete_basic_block (bb1);
+  bitmap_clear_bit (altered_bbs, bb1->index);
+
+  /* Update dominator info.  Note: update order is relevant.  */
+  set_immediate_dominator (CDI_DOMINATORS, bb2, bb2_dom);
+  if (bb != bb2)
+    set_immediate_dominator (CDI_DOMINATORS, bb, bb_dom);
+
+  /* Reset invalidated debug statements.  */
+  update_debug_stmts (bb2);
+
+  /* Insert vop phi, and update vuses.  */
+  update_vuses (phi_vuse1, phi_vuse2, bb2, redirected_edges);
+
+  VEC_free (edge, heap, redirected_edges);
+
+  return true;
+}
+
+/* Returns whether for all phis in E1->dest the phi alternatives for E1 and
+   E2 are either:
+   - equal, or
+   - defined locally in E1->src and E2->src.
+   In the latter case, register the alternatives in *PHI_EQUIV.  */
+
+static bool
+same_or_local_phi_alternatives (equiv_t *phi_equiv, edge e1, edge e2)
+{
+  int n1 = e1->dest_idx;
+  int n2 = e2->dest_idx;
+  gimple_stmt_iterator gsi;
+  basic_block dest = e1->dest;
+  gcc_assert (dest == e2->dest);
+
+  for (gsi = gsi_start_phis (dest); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple phi = gsi_stmt (gsi);
+      tree val1 = gimple_phi_arg_def (phi, n1);
+      tree val2 = gimple_phi_arg_def (phi, n2);
+
+      gcc_assert (val1 != NULL_TREE);
+      gcc_assert (val2 != NULL_TREE);
+
+      if (operand_equal_for_phi_arg_p (val1, val2))
+        continue;
+
+      if (equiv_insert (phi_equiv, val1, val2, e1->src, e2->src))
+        continue;
+
+      /* TODO: handle case that val1 and val2 are vops which are not locally
+         defined.  */
+      return false;
+    }
+
+  return true;
+}
+
+/* Detect duplicate predecessor blocks of BB and clean them up.  Return true if
+   any changes were made.  */
+
+static bool
+cleanup_duplicate_preds (basic_block bb)
+{
+  edge e1, e2, e1_swapped, e2_swapped;
+  unsigned int i, j, n;
+  equiv_t phi_equiv;
+  bool changed;
+
+  n = EDGE_COUNT (bb->preds);
+
+  for (i = 0; i < n; ++i)
+    {
+      e1 = EDGE_PRED (bb, i);
+      if (e1->flags & EDGE_COMPLEX)
+        continue;
+      for (j = i + 1; j < n; ++j)
+        {
+          e2 = EDGE_PRED (bb, j);
+          if (e2->flags & EDGE_COMPLEX)
+            continue;
+
+          /* Block e1->src might be deleted.  If bb and e1->src are the same
+             block, delete e2->src instead, by swapping e1 and e2.  */
+          e1_swapped = (bb == e1->src) ? e2: e1;
+          e2_swapped = (bb == e1->src) ? e1: e2;
+
+          /* For all phis in bb, the phi alternatives for e1 and e2 need to have
+             the same value.  */
+          equiv_init (&phi_equiv);
+          if (same_or_local_phi_alternatives (&phi_equiv, e1_swapped,
+                                              e2_swapped))
+            /* Collapse e1->src and e2->src if they are duplicates.  */
+            changed = cleanup_duplicate_preds_1 (phi_equiv, e1_swapped,
+                                                 e2_swapped);
+          else
+            changed = false;
+
+          equiv_delete (&phi_equiv);
+
+          if (changed)
+            return true;
+        }
+    }
+
+  return false;
+}
+
+/* Runs tail merge optimization.  */
+
+static unsigned int
+tail_merge_optimize (void)
+{
+  basic_block bb;
+  unsigned i, n;
+
+  calculate_dominance_info (CDI_DOMINATORS);
+
+  /* Initialize worklist.  */
+  n = last_basic_block - NUM_FIXED_BLOCKS;
+  altered_bbs = BITMAP_ALLOC (NULL);
+  bitmap_set_range (altered_bbs, NUM_FIXED_BLOCKS, n);
+
+  /* Now process the altered blocks, as long as any are available.  */
+  while (!bitmap_empty_p (altered_bbs))
+    {
+      i = bitmap_first_set_bit (altered_bbs);
+      bitmap_clear_bit (altered_bbs, i);
+      if (i < NUM_FIXED_BLOCKS)
+        continue;
+
+      bb = BASIC_BLOCK (i);
+      if (!bb)
+        continue;
+
+      cleanup_duplicate_preds (bb);
+    }
+
+  BITMAP_FREE (altered_bbs);
+
+#ifdef ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+
+  return 0;
+}
+
+/* Returns true if tail merge pass should be run.  */
+
+static bool
+gate_tail_merge (void)
+{
+  return optimize >= 2;
+}
+
+struct gimple_opt_pass pass_tail_merge =
+{
+ {
+  GIMPLE_PASS,
+  "tailmerge",                          /* name */
+  gate_tail_merge,                      /* gate */
+  tail_merge_optimize,                  /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_TREE_CFG,                          /* tv_id */
+  PROP_ssa | PROP_cfg,                  /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_verify_ssa | TODO_verify_stmts
+  | TODO_cleanup_cfg | TODO_dump_func   /* todo_flags_finish */
+ }
+};
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h (revision 173734)
+++ gcc/tree-pass.h (working copy)
@@ -446,6 +446,7 @@  extern struct gimple_opt_pass pass_trace
 extern struct gimple_opt_pass pass_warn_unused_result;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_tail_merge;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in (revision 173734)
+++ gcc/Makefile.in (working copy)
@@ -1441,6 +1441,7 @@  OBJS-common = \
 	tree-ssa-sccvn.o \
 	tree-ssa-sink.o \
 	tree-ssa-structalias.o \
+	tree-ssa-tail-merge.o \
 	tree-ssa-ter.o \
 	tree-ssa-threadedge.o \
 	tree-ssa-threadupdate.o \
@@ -2395,6 +2396,13 @@  stor-layout.o : stor-layout.c $(CONFIG_H
    $(TREE_H) $(PARAMS_H) $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) output.h $(RTL_H) \
    $(GGC_H) $(TM_P_H) $(TARGET_H) langhooks.h $(REGS_H) gt-stor-layout.h \
    $(DIAGNOSTIC_CORE_H) $(CGRAPH_H) $(TREE_INLINE_H) $(TREE_DUMP_H) $(GIMPLE_H)
+tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
+   $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \
+   $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) output.h \
+   $(TREE_H) $(TREE_FLOW_H) $(TREE_INLINE_H) \
+   $(GIMPLE_H) $(FUNCTION_H) \
+   $(TREE_PASS_H) $(TIMEVAR_H) $(SPLAY_TREE_H) \
+   $(CGRAPH_H)
 tree-ssa-structalias.o: tree-ssa-structalias.c \
    $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(GGC_H) $(OBSTACK_H) $(BITMAP_H) \
    $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) output.h \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c (revision 173734)
+++ gcc/passes.c (working copy)
@@ -767,6 +767,7 @@  init_optimization_passes (void)
 	  NEXT_PASS (pass_copy_prop);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
+	  NEXT_PASS (pass_tail_merge);
 	  NEXT_PASS (pass_early_ipa_sra);
 	  NEXT_PASS (pass_tail_recursion);
 	  NEXT_PASS (pass_convert_switch);