From patchwork Sun Jul 17 18:33:01 2011
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 105080
Message-ID: <4E232ADD.3020803@codesourcery.com>
Date: Sun, 17 Jul 2011 20:33:01 +0200
From: Tom de Vries
To: Richard Guenther
CC: Steven Bosscher, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, PR43864] Gimple level duplicate block cleanup.
References: <4DEF4408.4040001@codesourcery.com> <4DF24C2C.7080808@codesourcery.com> <4E1C3A19.8060706@codesourcery.com>
In-Reply-To:

On 07/12/2011 04:07 PM, Richard Guenther wrote:
> On Tue, Jul 12, 2011 at 2:12 PM, Tom de Vries wrote:
>> Hi Richard,
>>
>> here's a new version of the pass. I attempted to address your comments as
>> much as possible. The pass was bootstrapped and reg-tested on x86_64.
>>
>> On 06/14/2011 05:01 PM, Richard Guenther wrote:
>>> On Fri, Jun 10, 2011 at 6:54 PM, Tom de Vries wrote:
>>>> Hi Richard,
>>>>
>>>> thanks for the review.
>>>>
>>>> On 06/08/2011 11:55 AM, Richard Guenther wrote:
>>>>> On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> I have a patch for PR43864. The patch adds a gimple level duplicate block
>>>>>> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and
>>>>>> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the
>>>>>> following table (%, lower is better).
>>>>>>
>>>>>>                 none            pic
>>>>>>             thumb1  thumb2  thumb1  thumb2
>>>>>> spec2000     99.9    99.9    99.8    99.8
>>>>>>
>>>>>> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure
>>>>>> that the optimizations proposed in PR20070 would fix this PR.
>>>>>>
>>>>>> The problem in this PR is that when compiling with -O2, the example below
>>>>>> should only have one call to free. The original problem is formulated in
>>>>>> terms of -Os, but currently we generate one call to free with -Os,
>>>>>> although still not the smallest code possible. I'll show here the -O2
>>>>>> case, since that's similar to the original PR.
>>>>
>>>> Example A. (naming it for reference below)
>>>>
>>>>>> #include <stdio.h>
>>>>>> void foo (char*, FILE*);
>>>>>> char* hprofStartupp(char *outputFileName, char *ctx)
>>>>>> {
>>>>>>   char fileName[1000];
>>>>>>   FILE *fp;
>>>>>>   sprintf(fileName, outputFileName);
>>>>>>   if (access(fileName, 1) == 0) {
>>>>>>     free(ctx);
>>>>>>     return 0;
>>>>>>   }
>>>>>>
>>>>>>   fp = fopen(fileName, 0);
>>>>>>   if (fp == 0) {
>>>>>>     free(ctx);
>>>>>>     return 0;
>>>>>>   }
>>>>>>
>>>>>>   foo(outputFileName, fp);
>>>>>>
>>>>>>   return ctx;
>>>>>> }
>>>>>>
>>>>>> AFAIU, there are 2 complementary methods of rtl optimization proposed in
>>>>>> PR20070:
>>>>>> - Merging 2 blocks which are identical except for input registers, by
>>>>>>   using a conditional move to choose between the different input
>>>>>>   registers.
>>>>>> - Merging 2 blocks which have different local registers, by ignoring
>>>>>>   those differences.
>>>>>>
>>>>>> Blocks .L6 and .L7 have no difference in local registers, but they have a
>>>>>> difference in input registers: r3 and r1. Replacing the move to r5 by a
>>>>>> conditional move would probably be beneficial in terms of size, but it's
>>>>>> not clear what condition the conditional move should be using.
>>>>>> Calculating such a condition would add size and lengthen the execution
>>>>>> path.
>>>>>>
>>>>>> gcc -O2 -march=armv7-a -mthumb pr43864.c -S:
>>>>>> ...
>>>>>>         push    {r4, r5, lr}
>>>>>>         mov     r4, r0
>>>>>>         sub     sp, sp, #1004
>>>>>>         mov     r5, r1
>>>>>>         mov     r0, sp
>>>>>>         mov     r1, r4
>>>>>>         bl      sprintf
>>>>>>         mov     r0, sp
>>>>>>         movs    r1, #1
>>>>>>         bl      access
>>>>>>         mov     r3, r0
>>>>>>         cbz     r0, .L6
>>>>>>         movs    r1, #0
>>>>>>         mov     r0, sp
>>>>>>         bl      fopen
>>>>>>         mov     r1, r0
>>>>>>         cbz     r0, .L7
>>>>>>         mov     r0, r4
>>>>>>         bl      foo
>>>>>> .L3:
>>>>>>         mov     r0, r5
>>>>>>         add     sp, sp, #1004
>>>>>>         pop     {r4, r5, pc}
>>>>>> .L6:
>>>>>>         mov     r0, r5
>>>>>>         mov     r5, r3
>>>>>>         bl      free
>>>>>>         b       .L3
>>>>>> .L7:
>>>>>>         mov     r0, r5
>>>>>>         mov     r5, r1
>>>>>>         bl      free
>>>>>>         b       .L3
>>>>>> ...
>>>>>>
>>>>>> The proposed patch solves the problem by dealing with the 2 blocks at a
>>>>>> level where they are still identical: at gimple level. It detects that
>>>>>> the 2 blocks are identical, and removes one of them.
>>>>>>
>>>>>> The following table shows the impact of the patch on the example in terms
>>>>>> of size for -march=armv7-a:
>>>>>>
>>>>>>             without  with  delta
>>>>>> Os       :      108   104     -4
>>>>>> O2       :      120   104    -16
>>>>>> Os thumb :       68    64     -4
>>>>>> O2 thumb :       76    64    -12
>>>>>>
>>>>>> The gain in size for -O2 is that of removing the entire block, plus the
>>>>>> replacement of 2 moves by a constant set, which also shortens the
>>>>>> execution path. The patch ensures optimal code for both -O2 and -Os.
>>>>>>
>>>>>> By keeping track of equivalent definitions in the 2 blocks, we can ignore
>>>>>> those differences in comparison. Without this feature, we would only
>>>>>> match blocks with resultless operations, due to the SSA nature of gimple.
>>>>>> For example, with this feature, we reduce the following function to its
>>>>>> minimum at gimple level, rather than at rtl level.
>>>>
>>>> Example B. (naming it for reference below)
>>>>
>>>>>> int f(int c, int b, int d)
>>>>>> {
>>>>>>   int r, e;
>>>>>>
>>>>>>   if (c)
>>>>>>     r = b + d;
>>>>>>   else
>>>>>>     {
>>>>>>       e = b + d;
>>>>>>       r = e;
>>>>>>     }
>>>>>>
>>>>>>   return r;
>>>>>> }
>>>>>>
>>>>>> ;; Function f (f)
>>>>>>
>>>>>> f (int c, int b, int d)
>>>>>> {
>>>>>>   int e;
>>>>>>
>>>>>> <bb 2>:
>>>>>>   e_6 = b_3(D) + d_4(D);
>>>>>>   return e_6;
>>>>>>
>>>>>> }
>>>>>>
>>>>>> I'll send the patch with the testcases in a separate email.
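(For intuition, an editorial illustration, not output of the pass: merging the
two identical { free (ctx); return 0; } blocks of example A corresponds at
source level to something like the function below. The name
hprofStartupp_merged is made up for the illustration.)

  #include <stdio.h>
  void foo (char*, FILE*);
  char *
  hprofStartupp_merged (char *outputFileName, char *ctx)
  {
    char fileName[1000];
    FILE *fp = 0;
    sprintf (fileName, outputFileName);
    /* Both failure paths now share a single free/return block; the
       short-circuit || preserves the original control flow.  */
    if (access (fileName, 1) == 0
        || (fp = fopen (fileName, 0)) == 0)
      {
        free (ctx);
        return 0;
      }
    foo (outputFileName, fp);
    return ctx;
  }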
>>>>>> >>>>>> OK for trunk? >>>>> >>>>> I don't like that you hook this into cleanup_tree_cfg - that is called >>>>> _way_ too often. >>>>> >>>> >>>> Here is a reworked patch that addresses several concerns, particularly the >>>> compile time overhead. >>>> >>>> Changes: >>>> - The optimization is now in a separate file. >>>> - The optimization is now a pass rather than a cleanup. That allowed me to >>>> remove the test for pass-local flags. >>>> New is the pass driver tail_merge_optimize, based on >>>> tree-cfgcleanup.c:cleanup_tree_cfg_1. >>>> - The pass is run once, on SSA. Before, the patch would >>>> fix example A only before SSA and example B only on SSA. >>>> In order to fix example A on SSA, I added these changes: >>>> - handle the vop state at entry of bb1 and bb2 as equal (gimple_equal_p) >>>> - insert vop phi in bb2, and use that one (update_vuses) >>>> - complete pt_solutions_equal_p. >>>> >>>> Passed x86_64 bootstrapping and regression testing, currently regtesting on ARM. >>>> >>>> I placed the pass at the earliest point where it fixes example B: After copy >>>> propagation and dead code elimination, specifically, after the first invocation >>>> of pass_cd_dce. Do you know (any other points) where the pass should be scheduled? >>> >>> It's probably reasonable to run it after IPA inlining has taken place which >>> means insert it somewhen after the second pass_fre (I'd suggest after >>> pass_merge_phi). >>> >> >> I placed it there, but I ran into some interaction with >> pass_late_warn_uninitialized. Addition of the pass makes test >> gcc.dg/uninit-pred-2_c.c fail. >> >> FAIL: gcc.dg/uninit-pred-2_c.c bogus uninitialized var warning >> (test for bogus messages, line 43) >> FAIL: gcc.dg/uninit-pred-2_c.c real uninitialized var warning >> (test for warnings, line 45) >> >> int foo_2 (int n, int m, int r) >> { >> int flag = 0; >> int v; >> >> if (n) >> { >> v = r; >> flag = 1; >> } >> >> if (m) g++; >> else bar (); >> >> if (flag) >> blah (v); { dg-bogus "uninitialized" "bogus uninitialized var warning" } >> else >> blah (v); { dg-warning "uninitialized" "real uninitialized var warning" } >> >> return 0; >> } >> >> The pass replaces the second call to blah with the first one, and eliminates >> the if. After that, the uninitialized warning is issued for the line number >> of the first call to blah, while at source level the warning only makes sense >> for the second call to blah. >> >> Shall I try putting the pass after pass_late_warn_uninitialized? > > No, simply pass -fno-tree-tail-merge in the testcase. > >>> But my general comment still applies - I don't like the structural >>> comparison code at all and you should really use the value-numbering >>> machineries we have >> >> I now use sccvn. > > Good. > >>> or even better, merge this pass with FRE itself >>> (or DOM if that suits you more). For FRE you'd want to hook into >>> tree-ssa-pre.c:eliminate(). >>> >> >> If we need to do the transformation after pass_late_warn_uninitialized, it needs >> to stay on its own, I suppose. > > I suppose part of the high cost of the pass is running SCCVN, so it > makes sense to share that with the existing FRE run. Done. > Any reason > you use VN_NOWALK? > No, that was just a first-try value. >>>>> This also duplicates the literal matching done on the RTL level - instead >>>>> I think this optimization would be more related to value-numbering >>>>> (either that of SCCVN/FRE/PRE or that of DOM which also does >>>>> jump-threading). 
>>>> >>>> The pass currently does just duplicate block elimination, not cross-jumping. >>>> If we would like to extend this to cross-jumping, I think we need to do the >>>> reverse of value numbering: walk backwards over the bb, and keep track of the >>>> way values are used rather than defined. This will allows us to make a cut >>>> halfway a basic block. >>> >>> I don't understand - I propose to do literal matching but using value-numbering >>> for tracking equivalences to avoid literal matching for stuff we know is >>> equivalent. In fact I think it will be mostly calls and stores where we >>> need to do literal matching, but never intermediate computations on >>> registers. >>> >> >> I tried to implement that scheme now. >> >>> But maybe I miss something here. >>> >>>> In general, we cannot do cut halfway a basic block in the current implementation >>>> (of value numbering and forward matching), since we assume equivalence of the >>>> incoming vops at bb entry. This assumption is in general only valid if we indeed >>>> replace the entire block by another entire block. >>> >>> Why are VOPs of concern at all? >>> >> >> In the previous version, I inserted the phis for the vops manually. >> In the current version of the pass, I let TODO_update_ssa_only_virtuals deal >> with vops, so it's not relevant anymore. >> >>>> I imagine that a cross-jumping heuristic would be based on the length of the >>>> match and the amount of non-vop phis it would introduce. Then value numbering >>>> would be something orthogonal to this optimization, which would reduce amount of >>>> phis needed for a cross-jump. >>>> I think it would make sense to use SCCVN value numbering at the point that we >>>> have this backward matching. >>>> >>>> I'm not sure whether it's a good idea to try to replace the current forward >>>> local value numbering with SCCVN value numbering, since we currently declare >>>> vops equal, which are, in the global sense, not equal. And once we go to >>>> backward matching, we'll need something to keep track of the uses, and we can >>>> reuse the current infrastructure for that, but not the SCCVN value numbering. >>>> >>>> Does that make any sense? >>> >>> Ok, let me think about this a bit. >>> >> >> I tried to to be more clear on this in the header comment of the pass. >> >>> For now about the patch in general. The functions need renaming to >>> something more sensible now that this isn't cfg-cleanup anymore. >>> >>> I miss a general overview of the pass - it's hard to reverse engineer >>> its working for me. >> >> I added a header comment. >> >>> Like (working backwards), you are detecting >>> duplicate predecessors >>> - that obviously doesn't work for duplicates >>> without any successors, like those ending in noreturn calls. >>> >> >> Merging of blocks without successors works now. >> >>> + n = EDGE_COUNT (bb->preds); >>> + >>> + for (i = 0; i < n; ++i) >>> + { >>> + e1 = EDGE_PRED (bb, i); >>> + if (e1->flags & EDGE_COMPLEX) >>> + continue; >>> + for (j = i + 1; j < n; ++j) >>> + { >>> >>> that's quadratic in the number of predecessors. >>> >> >> The quadratic comparison is now limited by PARAM_TAIL_MERGE_MAX_COMPARISONS. >> Each bb is compared to maximally PARAM_TAIL_MERGE_MAX_COMPARISONS similar bbs >> per worklist iteration. >> >>> + /* Block e1->src might be deleted. If bb and e1->src are the same >>> + block, delete e2->src instead, by swapping e1 and e2. */ >>> + e1_swapped = (bb == e1->src) ? e2: e1; >>> + e2_swapped = (bb == e1->src) ? 
e1: e2; >>> >>> is that because you incrementally merge preds two at a time? As you >>> are deleting blocks don't you need to adjust the quadratic walking? >>> Thus, with say four equivalent preds won't your code crash anyway? >>> >> >> I think it was to make calculation of dominator info easier, but I use now >> functions from dominance.c for that, so this piece of code is gone. >> >>> I think the code needs to delay the CFG manipulation to the end >>> of this function. >>> >> >> I now delay the cfg manipulation till after each analysis phase. >> >>> +/* Returns whether for all phis in E1->dest the phi alternatives for E1 and >>> + E2 are either: >>> + - equal, or >>> + - defined locally in E1->src and E2->src. >>> + In the latter case, register the alternatives in *PHI_EQUIV. */ >>> + >>> +static bool >>> +same_or_local_phi_alternatives (equiv_t *phi_equiv, edge e1, edge e2) >>> +{ >>> + int n1 = e1->dest_idx; >>> + int n2 = e2->dest_idx; >>> + gimple_stmt_iterator gsi; >>> + basic_block dest = e1->dest; >>> + gcc_assert (dest == e2->dest); >>> >>> too many asserts in general - I'd say for this case pass in the destination >>> block as argument. >>> >>> + gcc_assert (val1 != NULL_TREE); >>> + gcc_assert (val2 != NULL_TREE); >>> >>> superfluous. >>> >>> +static bool >>> +cleanup_duplicate_preds_1 (equiv_t phi_equiv, edge e1, edge e2) >>> ... >>> + VEC (edge,heap) *redirected_edges; >>> + gcc_assert (bb == e2->dest); >>> >>> same. >>> >>> + if (e1->flags != e2->flags) >>> + return false; >>> >>> that's bad - it should handle EDGE_TRUE/FALSE_VALUE mismatches >>> by swapping edges in the preds. >>> >> >> That's handled now. >> >>> + /* TODO: We could allow multiple successor edges here, as long as bb1 and bb2 >>> + have the same successors. */ >>> + if (EDGE_COUNT (bb1->succs) != 1 || EDGE_COUNT (bb2->succs) != 1) >>> + return false; >>> >>> hm, ok - that would need fixing, too. Same or mergeable successors >>> of course, which makes me wonder if doing this whole transformation >>> incrementally and locally is a good idea ;) Also >>> >> >> Also handled now. >> >>> + /* Calculate the changes to be made to the dominator info. >>> + Calculate bb2_dom. */ >>> ... >>> >>> wouldn't be necessary I suppose (just throw away dom info after the >>> pass). >>> >>> That is, I'd globally record BB equivalences (thus, "value-number" >>> BBs) and apply the CFG manipulations at a single point. >>> >> >> I delay the cfg manipulation till after each analysis phase. Delaying the cfg >> manipulation till the end of the pass instead might make the analysis code more >> convoluted. >> >>> Btw, I miss where you insert PHI nodes for all uses that flow in >>> from the preds preds - you do that for VOPs but not for real >>> operands? >>> >> >> Indeed, inserting phis for non-vops is a todo. >> >>> + /* Replace uses of vuse2 with uses of the phi. */ >>> + for (gsi = gsi_start_bb (bb2); !gsi_end_p (gsi); gsi_next (&gsi)) >>> + { >>> >>> why not walk immediate uses of the old PHI and SET_USE to >>> the new one instead (for those uses in the duplicate BB of course)? >>> >> >> And I no longer insert VOP phis, but let a TODO handle that, so this code is gone. > > Ok. Somewhat costly in comparison though. > I tried to add that back, guarded by update_vops. Handled in update_vuses, vop_phi, insn_vops, vop_at_entry, replace_block_by. 
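For reference, this is roughly the shape of the manual vop update (a
simplified sketch of what update_vuses/vop_phi in the patch below have to do;
the name vop_phi_sketch is mine, and the single-predecessor and no-mismatch
cases are omitted):

  /* BB2 replaces BB1, so BB2 gains BB1's incoming edges.  If the vop
     state at entry differs per incoming edge, merge the states with a
     virtual phi and rewrite the vuses in BB2 to its result.  */
  static void
  vop_phi_sketch (basic_block bb2, tree vuse1, edge e1, tree vuse2, edge e2)
  {
    gimple phi = create_phi_node (SSA_NAME_VAR (vuse2), bb2);
    add_phi_arg (phi, vuse1, e1, UNKNOWN_LOCATION);
    add_phi_arg (phi, vuse2, e2, UNKNOWN_LOCATION);
    /* ... then walk the immediate uses of vuse2 and SET_USE those
       located in bb2 to gimple_phi_result (phi) ...  */
  }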
>>> + case GSS_CALL: >>> + if (!pt_solution_equal_p (gimple_call_use_set (s1), >>> + gimple_call_use_set (s2)) >>> >>> I don't understand why you are concerned about equality of >>> points-to information. Why not simply ior it (pt_solution_ior_into - note >>> they are shared so you need to unshare them first). >>> >> >> I let a todo handle the alias info now. > > Hmm, that's not going to work if it's needed for correctness. > Should be handed my merge_calls now. >>> +/* Return true if p1 and p2 can be considered equal. */ >>> + >>> +static bool >>> +pt_solution_equal_p (struct pt_solution *p1, struct pt_solution *p2) >>> >>> would go into tree-ssa-structalias.c instead. >>> >>> +static bool >>> +gimple_base_equal_p (gimple s1, gimple s2) >>> +{ >>> ... >>> + if (gimple_modified_p (s1) || gimple_modified_p (s2)) >>> + return false; >>> >>> that shouldn't be of concern. >>> >>> + if (s1->gsbase.subcode != s2->gsbase.subcode) >>> + return false; >>> >>> for assigns that are of class GIMPLE_SINGLE_RHS we do not >>> update subcode during transformations so it can differ for now >>> equal statements. >>> >> >> handled properly now. >> >>> I'm not sure if a splay tree for the SSA name version equivalency >>> map is the best representation - I would have used a simple >>> array of num_ssa_names size and assign value-numbers >>> (the lesser version for example). >>> >>> Thus equiv_insert would do >>> >>> value = MIN (SSA_NAME_VERSION (val1), SSA_NAME_VERSION (val2)); >>> values[SSA_NAME_VERSION (val1)] = value; >>> values[SSA_NAME_VERSION (val2)] = value; >>> >>> if the names are not defined in bb1 resp. bb2 we would have to insert >>> a PHI node in the merged block - that would be a cost thingy for >>> doing this value-numbering in a more global way. >>> >> >> local value numbering code has been removed. >> >>> You don't seem to be concerned about the SSA names points-to >>> information, but it surely has the same issues as that of the calls >>> (so either they should be equal or they should be conservatively >>> merged). But as-is you don't allow any values to flow into the >>> merged blocks that are not equal for both edges, no? >>> >> >> Correct, that's still a todo. >> >>> + TV_TREE_CFG, /* tv_id */ >>> >>> add a new timevar. We wan to be able to turn the pass off, >>> so also add a new option (I can see it might make debugging harder >>> in some cases). >>> >> >> I added -ftree-tail-merge and TV_TREE_TAIL_MERGE. >> >>> Can you assess the effect of the patch on GCC itself (for example >>> when building cc1?)? What's the size benefit and the compile-time >>> overhead? >>> >> >> effect on building cc1: >> >> real user sys >> without: 19m50.158s 19m 2.090s 0m20.860s >> with: 19m59.456s 19m17.170s 0m20.350s >> ---------- >> +15.080s >> +1.31% > > That's quite a lot of time. > Measurement for this version: real user sys without 19m59.995s 19m 9.970s 0m21.050s with 19m56.160s 19m14.830s 0m21.530s ---------- +4.86s +0.42% text data bss dec hex filename 17547657 41736 1364384 18953777 1213631 without/cc1 17211049 41736 1364384 18617169 11c1351 with/cc1 -------- -336608 -1.92% >> $ size without/cc1 with/cc1 >> text data bss dec hex filename >> 17515986 41320 1364352 18921658 120b8ba without/cc1 >> 17399226 41320 1364352 18804898 11ef0a2 with/cc1 >> -------- >> -116760 >> -0.67% >> >> OK for trunk, provided build & reg-testing on ARM is ok? > > I miss additions to the testsuite. > I will send an updated patch on thread http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00625.html. 
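Coming back to the points-to merging for calls above, the conservative merge
boils down to something like this sketch (merge_calls in the patch is the
authoritative version; merge_call_pt_sketch is a made-up name, and I assume
pt_solution_ior_into_shared takes (dest, src) like pt_solution_ior_into):

  /* Union the points-to info of CALL2 into CALL1, so that CALL1 stays
     correct when it replaces CALL2.  */
  static void
  merge_call_pt_sketch (gimple call1, gimple call2)
  {
    /* The pt_solution bitmaps may be shared between statements, so a
       plain pt_solution_ior_into could corrupt other users; the patch
       adds pt_solution_ior_into_shared for exactly this.  */
    pt_solution_ior_into_shared (gimple_call_use_set (call1),
                                 gimple_call_use_set (call2));
    pt_solution_ior_into_shared (gimple_call_clobber_set (call1),
                                 gimple_call_clobber_set (call2));
  }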
> > +static bool > +bb_dominated_by_p (basic_block bb1, basic_block bb2) > > please use > > + if (TREE_CODE (val1) == SSA_NAME) > + { > + if (!same_preds > && !SSA_NAME_IS_DEFAULT_DEF (val1) > && !dominated_by_p (bb2, gimple_bb (SSA_NAME_DEF_STMT (val1))) > return false; > > instead. All stmts should have a BB apart from def stmts of default defs > (which are gimple_nops). > Done. > +/* Return the canonical scc_vn tree for X, if we can use scc_vn_info. > + Otherwise, return X. */ > + > +static tree > +gvn_val (tree x) > +{ > + return ((scc_vn_ok && x != NULL && TREE_CODE (x) == SSA_NAME) > + ? VN_INFO ((x))->valnum : x); > +} > > I suppose we want to export vn_valueize from tree-ssa-sccvn.c instead > which seems to perform the same. Done. > Do you ever call the above > when scc_vn_ok is false or x is NULL? Not in this version. Earlier, I also ran the pass if sccvn bailed out, but pre and fre only run if sccvn succeeded. > > +static bool > +gvn_uses_equal (tree val1, tree val2, basic_block bb1, > + basic_block bb2, bool same_preds) > +{ > + gimple def1, def2; > + basic_block def1_bb, def2_bb; > + > + if (val1 == NULL_TREE || val2 == NULL_TREE) > + return false; > > does this ever happen? Not in the current version. Removed. > > + if (gvn_val (val1) != gvn_val (val2)) > + return false; > > I suppose a shortcut > > if (val1 == val2) > return true; > > is possible? > Indeed. Added. > +static int *bb_size; > + > +/* Init bb_size administration. */ > + > +static void > +init_bb_size (void) > +{ > > if you need more per-BB info you can hook it into bb->aux. What's the > size used for (I guess I'll see below ...)? > > + for (gsi = gsi_start_nondebug_bb (bb); > + !gsi_end_p (gsi); gsi_next_nondebug (&gsi)) > + size++; > > is pretty rough. I guess for a quick false result for comparing BBs > (which means you could initialize the info lazily?) Done. > > +struct same_succ > +{ > + /* The bbs that have the same successor bbs. */ > + bitmap bbs; > + /* The successor bbs. */ > + bitmap succs; > + /* Indicates whether the EDGE_TRUE/FALSE_VALUEs of succ_flags are swapped for > + bb. */ > + bitmap inverse; > + /* The edge flags for each of the successor bbs. */ > + VEC (int, heap) *succ_flags; > + /* Indicates whether the struct is in the worklist. */ > + bool in_worklist; > +}; > > looks somewhat odd at first sight - maybe a overall comment what this > is used for is missing. Well, let's see. > Tried to add an overall comment. > +static hashval_t > +same_succ_hash (const void *ve) > +{ > + const_same_succ_t e = (const_same_succ_t)ve; > + hashval_t hashval = bitmap_hash (e->succs); > + int flags; > + unsigned int i; > + unsigned int first = bitmap_first_set_bit (e->bbs); > + int size = bb_size [first]; > + gimple_stmt_iterator gsi; > + gimple stmt; > + basic_block bb = BASIC_BLOCK (first); > + > + hashval = iterative_hash_hashval_t (size, hashval); > + for (gsi = gsi_start_nondebug_bb (bb); > + !gsi_end_p (gsi); gsi_next_nondebug (&gsi)) > + { > + stmt = gsi_stmt (gsi); > + hashval = iterative_hash_hashval_t (gimple_code (stmt), hashval); > + if (!is_gimple_call (stmt)) > + continue; > + if (gimple_call_internal_p (stmt)) > + hashval = iterative_hash_hashval_t > + ((hashval_t) gimple_call_internal_fn (stmt), hashval); > + else > + hashval = iterative_hash_expr (gimple_call_fn (stmt), hashval); > > you could also keep a cache of the BB hash as you keep a cache > of the size (if this function is called multiple times per BB). Right, I forgot it's a closed hash table. Added the cache of the bb hash. 
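Concretely, caching the hash buys an O(1) early-out in the equality callback
of the closed hash table, before any statement is compared; this is the first
check in same_succ_equal in the patch below:

  /* hashval is computed once, when the entry is inserted.  */
  if (e1->hashval != e2->hashval)
    return 0;
  /* ... only on a genuine hash match do we compare the successor
     bitmaps, the edge flags and the statements themselves ...  */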
> The hash looks relatively weak - for all asignments it will hash > in GIMPLE_ASSIGN only ... I'd at least hash in gimple_assign_rhs_code. Done. > The call handling OTOH looks overly complicated to me ;) > That's an attempt to handle compiling insn-recog.c efficiently. All bbs without successors are grouped together, and even after selecting on same function name, there are still thousands of bbs in that group. I added now the args as well. > The hash will be dependent on stmt ordering even if that doesn't matter, > like > > i = i + 1; > j = j - 1; > > vs. the swapped variant. Right, that's a todo, added that in the header comment. > Similar the successor edges are not sorted, > so true/false edges may be in different order. > I keep a cache of the successor edge flags, in order of bbs. > Not sure yet if your comparison function would make those BBs > unequal anyway. > > +static bool > +inverse_flags (const_same_succ_t e1, const_same_succ_t e2) > +{ > + int f1a, f1b, f2a, f2b; > + int mask = ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); > + > + if (VEC_length (int, e1->succ_flags) != 2) > + return false; > ... > > I wonder why you keep a VEC of successor edges in same_succ_t > instead of using the embedded successor edge vector in the basic_block > structure? > To keep the information in order of bbs, for quick comparison. > + bb_to_same_succ[bb->index] = *slot; > > looks like a candidate for per-BB info in bb->aux, too. > Done. > +static void > +find_same_succ (void) > +{ > + int i; > + same_succ_t same = same_succ_alloc (); > + > + for (i = 0; i < last_basic_block; ++i) > + { > + find_same_succ_bb (BASIC_BLOCK (i), &same); > + if (same == NULL) > + same = same_succ_alloc (); > + } > > I suppose you want FOR_EACH_BB (excluding entry/exit block) or > FOR_ALL_BB (including them). The above also can > have BASIC_BLOCK(i) == NULL. Similar in other places. > Done. > + for (i = 0; i < n1; ++i) > + { > + ei = EDGE_PRED (bb1, i); > + for (j = 0; j < n2; ++j) > + { > + ej = EDGE_PRED (bb2, j); > + if (ei->src != ej->src) > + continue; > + nr_matches++; > + break; > + } > + } > > FOR_EACH_EDGE (ei, iterator, bb1->preds) > if (!find_edge (ei->src, bb2)) > return false; > > is easier to parse. > Done. > +static bool > +gimple_subcode_equal_p (gimple s1, gimple s2, bool inv_cond) > +{ > + tree var, var_type; > + bool honor_nans; > + > + if (is_gimple_assign (s1) > + && gimple_assign_rhs_class (s1) == GIMPLE_SINGLE_RHS) > + return true; > > the subcode for GIMPLE_SINGLE_RHS is gimple_assign_rhs_code > (TREE_CODE of gimple_assign_rhs1 actually). > > +static bool > +gimple_base_equal_p (gimple s1, gimple s2, bool inv_cond) > > I wonder if you still need this given .. > > +static bool > +gimple_equal_p (gimple s1, gimple s2, bool same_preds, bool inv_cond) > +{ > + unsigned int i; > + enum gimple_statement_structure_enum gss; > + tree lhs1, lhs2; > + basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2); > + > + /* Handle omp gimples conservatively. */ > + if (is_gimple_omp (s1) || is_gimple_omp (s2)) > + return false; > + > + /* Handle lhs. */ > + lhs1 = gimple_get_lhs (s1); > + lhs2 = gimple_get_lhs (s2); > + if (lhs1 != NULL_TREE && lhs2 != NULL_TREE) > + return (same_preds && TREE_CODE (lhs1) == SSA_NAME > + && TREE_CODE (lhs2) == SSA_NAME > + && gvn_val (lhs1) == gvn_val (lhs2)); > + else if (!(lhs1 == NULL_TREE && lhs2 == NULL_TREE)) > + return false; > > all lhs equivalency is defered to GVN (which means all GIMPLE_ASSIGN > and GIMPLE_CALL stmts with a lhs). 
> > That leaves the case of calls without a lhs. I'd rather structure this > function like > > if (gimple_code (s1) != gimple_code (s2)) > return false; > swithc (gimple_code (s1)) > { > case GIMPLE_CALL: > ... compare arguments ... > if equal ok, if not and we have a lhs use GVN. > > case GIMPLE_ASSIGN: > ... compare GVN of the LHS ... > > case GIMPLE_COND: > ... compare operands ... > > default: > return false; > } > > Done. > +static bool > +bb_gimple_equal_p (basic_block bb1, basic_block bb2, bool same_preds, > + bool inv_cond) > + > +{ > > you don't do an early out by comparing the pre-computed sizes. This function is only called for bb with the same size. The hash table equal fuction does have the size comparison. > Mind > you can have hashtable collisions where they still differ (did you > check hashtable stats on it? how is the collision rate?) > I managed to lower the collision rate by specifying n_basic_blocks as hash table size. While compiling insn-recog.c, the highest collision rate for a function is 2.69. > +static bool > +bb_has_non_vop_phi (basic_block bb) > +{ > + gimple_seq phis = phi_nodes (bb); > + gimple phi; > + > + if (phis == NULL) > + return false; > + > + if (!gimple_seq_singleton_p (phis)) > + return true; > + > + phi = gimple_seq_first_stmt (phis); > + return !VOID_TYPE_P (TREE_TYPE (gimple_phi_result (phi))); > > return is_gimple_reg (gimple_phi_result (phi)); > Done. > +static void > +update_debug_stmts (void) > +{ > + int i; > + basic_block bb; > + > + for (i = 0; i < last_basic_block; ++i) > + { > + gimple stmt; > + gimple_stmt_iterator gsi; > + > + bb = BASIC_BLOCK (i); > > FOR_EACH_BB > > it must be possible to avoid scanning basic-blocks that are not affected > by the transform, no? In fact the only affected basic-blocks should be > those that were merged with another block? Done. I also check for MAY_HAVE_DEBUG_STMTS now. > > + /* Mark vops for updating. Without this, TODO_update_ssa_only_virtuals > + won't do anything. */ > + mark_sym_for_renaming (gimple_vop (cfun)); > > it won't insert any PHIs, that's correct. Still somewhat ugly, a manual > update of PHI nodes should be possible. > Added. I'm trying to be lazy about it though: + bool update_vops = ((todo & TODO_update_ssa_only_virtuals) == 0 + || !symbol_marked_for_renaming (gimple_vop (cfun))); If we're going to insert those phis anyway given the current todo, we don't bother. > + if (dump_file) > + { > + fprintf (dump_file, "Before TODOs.\n"); > > with TDF_DETAILS only please. > Done. > + free_dominance_info (CDI_DOMINATORS); > > if you keep dominance info up-to-date there is no need to free it. > Indeed. And by not freeing it, the info was checked, and I hit validation errors in that info, so the updating code had problems. I reverted back to calculating when needed and freeing when changed. > + TODO_verify_ssa | TODO_verify_stmts > + | TODO_verify_flow | TODO_update_ssa_only_virtuals > + | TODO_rebuild_alias > > please no TODO_rebuild_alias, simply remove it - alias info in merged > paths should be compatible enough if there is value-equivalence between > SSA names. At least you can't rely on TODO_rebuild_alias for > correctness - it is skipped if IPA PTA was run for example. > Done. > + | TODO_cleanup_cfg > > is that needed? If so return it from your execute function if you changed > anything only. But I doubt your transformation introduces cleanup > opportunities? 
> If all the predeccessor blocks of a block are merged, the block and it's remaining predecessor block might be merged, so that is a cleanup opportunity. Removed for the moment. > New options and params need documentation in doc/invoke.texi. > Added. > Thanks, > Richard. > Bootstrapped and reg-tested on x86_64. Ok for trunk (after ARM testing)? Thanks, - Tom 2011-07-17 Tom de Vries PR middle-end/43864 * tree-ssa-tail-merge.c: New file. (struct same_succ): Define. (same_succ_t, const_same_succ_t): New typedef. (struct bb_cluster): Define. (bb_cluster_t, const_bb_cluster_t): New typedef. (struct aux_bb_info): Define. (BB_SIZE, BB_SAME_SUCC, BB_CLUSTER, BB_VOP_AT_EXIT): Define. (gvn_uses_equal): New function. (same_succ_print, same_succ_print_traverse, same_succ_hash) (inverse_flags, same_succ_equal, same_succ_alloc, same_succ_delete) (same_succ_reset): New function. (same_succ_htab, same_succ_edge_flags) (deleted_bbs, deleted_bb_preds): New var. (debug_same_succ): New function. (worklist): New var. (print_worklist, add_to_worklist, find_same_succ_bb, find_same_succ) (init_worklist, delete_worklist, delete_basic_block_same_succ) (same_succ_flush_bbs, update_worklist): New function. (print_cluster, debug_cluster, same_predecessors) (add_bb_to_cluster, new_cluster, delete_cluster): New function. (all_clusters): New var. (alloc_cluster_vectors, reset_cluster_vectors, delete_cluster_vectors) (merge_clusters, set_cluster): New function. (gimple_equal_p, find_duplicate, same_phi_alternatives_1) (same_phi_alternatives, bb_has_non_vop_phi, find_clusters_1) (find_clusters): New function. (merge_calls, update_vuses, vop_phi, insn_vops, vop_at_entry) (replace_block_by): New function. (update_bbs): New var. (apply_clusters): New function. (update_debug_stmt, update_debug_stmts): New function. (tail_merge_optimize): New function. tree-flow.h (tail_merge_optimize): Declare. * tree-ssa-pre.c (execute_pre): Use tail_merge_optimize. * Makefile.in (OBJS-common): Add tree-ssa-tail-merge.o. (tree-ssa-tail-merge.o): New rule. * opts.c (default_options_table): Set OPT_ftree_tail_merge by default at OPT_LEVELS_2_PLUS. * tree-ssa-sccvn.c (vn_valueize): Move to ... * tree-ssa-sccvn.h (vn_valueize): Here. * tree-ssa-alias.h (pt_solution_ior_into_shared): Declare. * tree-ssa-structalias.c (find_what_var_points_to): Factor out and use ... (pt_solution_share): New function. (pt_solution_unshare, pt_solution_ior_into_shared): New function. (delete_points_to_sets): Nullify shared_bitmap_table after deletion. * timevar.def (TV_TREE_TAIL_MERGE): New timevar. * common.opt (ftree-tail-merge): New switch. * params.def (PARAM_MAX_TAIL_MERGE_COMPARISONS): New parameter. * doc/invoke.texi (Optimization Options, -O2): Add -ftree-tail-merge. (-ftree-tail-merge, max-tail-merge-comparisons): New item. Index: gcc/tree-ssa-tail-merge.c =================================================================== --- gcc/tree-ssa-tail-merge.c (revision 0) +++ gcc/tree-ssa-tail-merge.c (revision 0) @@ -0,0 +1,1676 @@ +/* Tail merging for gimple. + Copyright (C) 2011 Free Software Foundation, Inc. + Contributed by Tom de Vries (tom@codesourcery.com) + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. 
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Pass overview.
+
+
+   MOTIVATIONAL EXAMPLE
+
+   gimple representation of gcc/testsuite/gcc.dg/pr43864.c at
+
+   hprofStartupp (charD.1 * outputFileNameD.2600, charD.1 * ctxD.2601)
+   {
+     struct FILED.1638 * fpD.2605;
+     charD.1 fileNameD.2604[1000];
+     intD.0 D.3915;
+     const charD.1 * restrict outputFileName.0D.3914;
+
+     # BLOCK 2 freq:10000
+     # PRED: ENTRY [100.0%]  (fallthru,exec)
+     # PT = nonlocal { D.3926 } (restr)
+     outputFileName.0D.3914_3
+       = (const charD.1 * restrict) outputFileNameD.2600_2(D);
+     # .MEMD.3923_13 = VDEF <.MEMD.3923_12(D)>
+     # USE = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     # CLB = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     sprintfD.759 (&fileNameD.2604, outputFileName.0D.3914_3);
+     # .MEMD.3923_14 = VDEF <.MEMD.3923_13>
+     # USE = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     # CLB = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     D.3915_4 = accessD.2606 (&fileNameD.2604, 1);
+     if (D.3915_4 == 0)
+       goto <bb 3>;
+     else
+       goto <bb 4>;
+     # SUCC: 3 [10.0%]  (true,exec) 4 [90.0%]  (false,exec)
+
+     # BLOCK 3 freq:1000
+     # PRED: 2 [10.0%]  (true,exec)
+     # .MEMD.3923_15 = VDEF <.MEMD.3923_14>
+     # USE = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     # CLB = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     freeD.898 (ctxD.2601_5(D));
+     goto <bb 7>;
+     # SUCC: 7 [100.0%]  (fallthru,exec)
+
+     # BLOCK 4 freq:9000
+     # PRED: 2 [90.0%]  (false,exec)
+     # .MEMD.3923_16 = VDEF <.MEMD.3923_14>
+     # PT = nonlocal escaped
+     # USE = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     # CLB = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     fpD.2605_8 = fopenD.1805 (&fileNameD.2604[0], 0B);
+     if (fpD.2605_8 == 0B)
+       goto <bb 5>;
+     else
+       goto <bb 6>;
+     # SUCC: 5 [1.9%]  (true,exec) 6 [98.1%]  (false,exec)
+
+     # BLOCK 5 freq:173
+     # PRED: 4 [1.9%]  (true,exec)
+     # .MEMD.3923_17 = VDEF <.MEMD.3923_16>
+     # USE = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     # CLB = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     freeD.898 (ctxD.2601_5(D));
+     goto <bb 7>;
+     # SUCC: 7 [100.0%]  (fallthru,exec)
+
+     # BLOCK 6 freq:8827
+     # PRED: 4 [98.1%]  (false,exec)
+     # .MEMD.3923_18 = VDEF <.MEMD.3923_16>
+     # USE = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     # CLB = nonlocal null { fileNameD.2604 D.3926 } (restr)
+     fooD.2599 (outputFileNameD.2600_2(D), fpD.2605_8);
+     # SUCC: 7 [100.0%]  (fallthru,exec)
+
+     # BLOCK 7 freq:10000
+     # PRED: 3 [100.0%]  (fallthru,exec) 5 [100.0%]  (fallthru,exec)
+             6 [100.0%]  (fallthru,exec)
+     # PT = nonlocal null
+
+     # ctxD.2601_1 = PHI <0B(3), 0B(5), ctxD.2601_5(D)(6)>
+     # .MEMD.3923_11 = PHI <.MEMD.3923_15(3), .MEMD.3923_17(5),
+                            .MEMD.3923_18(6)>
+     # VUSE <.MEMD.3923_11>
+     return ctxD.2601_1;
+     # SUCC: EXIT [100.0%]
+   }
+
+   bb 3 and bb 5 can be merged.  The blocks have different predecessors, but
+   the same successors, and the same operations.
+
+
+   CONTEXT
+
+   A technique called tail merging (or cross jumping) can fix the example
+   above.  For a block, we look for common code at the end (the tail) of the
+   predecessor blocks, and insert jumps from one block to the other.
+   The example is a special case for tail merging, in that 2 whole blocks
+   can be merged, rather than just the end parts of it.
+   We currently only focus on whole block merging, so in that sense
+   calling this pass tail merge is a bit of a misnomer.
+
+   We distinguish 2 kinds of situations in which blocks can be merged:
+   - same operations, same predecessors.  The successor edges coming from one
+     block are redirected to come from the other block.
+   - same operations, same successors.  The predecessor edges entering one
+     block are redirected to enter the other block.  Note that this operation
+     might involve introducing phi operations.
+
+   For efficient implementation, we would like to value number the blocks, and
+   have a comparison operator that tells us whether the blocks are equal.
+   Besides being runtime efficient, block value numbering should also abstract
+   from irrelevant differences in order of operations, much like normal value
+   numbering abstracts from irrelevant order of operations.
+
+   For the first situation (same operations, same predecessors), normal value
+   numbering fits well.  We can calculate a block value number based on the
+   value numbers of the defs and vdefs.
+
+   For the second situation (same operations, same successors), this approach
+   doesn't work so well.  We can illustrate this using the example.  The calls
+   to free use different vdefs: MEMD.3923_16 and MEMD.3923_14, and these will
+   remain different in value numbering, since they represent different memory
+   states.  So the resulting vdefs of the frees will be different in value
+   numbering, so the block value numbers will be different.
+
+   The reason why we call the blocks equal is not because they define the same
+   values, but because uses in the blocks use (possibly different) defs in the
+   same way.  To be able to detect this efficiently, we need to do some kind
+   of reverse value numbering, meaning number the uses rather than the defs,
+   and calculate a block value number based on the value number of the uses.
+   Ideally, a block comparison operator will also indicate which phis are
+   needed to merge the blocks.
+
+   For the moment, we don't do block value numbering, but we do insn-by-insn
+   matching, using scc value numbers to match operations with results, and
+   structural comparison otherwise, while ignoring vop mismatches.
+
+
+   IMPLEMENTATION
+
+   1. The pass first determines all groups of blocks with the same successor
+      blocks.
+   2. Within each group, it tries to determine clusters of equal basic blocks.
+   3. The clusters are applied.
+   4. The same successor groups are updated.
+   5. This process is repeated from 2 onwards, until no more changes.
+
+
+   LIMITATIONS/TODO
+
+   - block only
+   - handles only 'same operations, same successors'.
+     It handles same predecessors as a special subcase though.
+   - does not implement the reverse value numbering and block value numbering.
+   - does not abstract from statement order.  In order to do this, we need to
+     abstract from statement order in the hash function, and bb comparison
+     functions.
+   - improve memory allocation: use garbage collected memory, obstacks,
+     allocpools where appropriate.
+   - no insertion of gimple_reg phis; we only introduce vop-phis.
+   - handle blocks with gimple_reg phi_nodes.
+
+
+   SWITCHES
+
+   - ftree-tail-merge.  On at -O2.  We may have to enable it only at -Os.
*/ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "tree.h" +#include "tm_p.h" +#include "basic-block.h" +#include "output.h" +#include "flags.h" +#include "function.h" +#include "tree-flow.h" +#include "timevar.h" +#include "bitmap.h" +#include "tree-ssa-alias.h" +#include "params.h" +#include "tree-pretty-print.h" +#include "hashtab.h" +#include "gimple-pretty-print.h" +#include "tree-ssa-sccvn.h" +#include "tree-dump.h" + +/* Describes a group of bbs with the same successors. The successor bbs are + cached in succs, and the successor edge flags are cached in succ_flags. + If a bb has the EDGE_TRUE/VALSE_VALUE flags swapped compared to succ_flags, + it's marked in inverse. + Additionally, the hash value for the struct is cached in hashval, and + in_worklist indicates whether it's currently part of worklist. */ + +struct same_succ +{ + /* The bbs that have the same successor bbs. */ + bitmap bbs; + /* The successor bbs. */ + bitmap succs; + /* Indicates whether the EDGE_TRUE/FALSE_VALUEs of succ_flags are swapped for + bb. */ + bitmap inverse; + /* The edge flags for each of the successor bbs. */ + VEC (int, heap) *succ_flags; + /* Indicates whether the struct is currently in the worklist. */ + bool in_worklist; + /* The hash value of the struct. */ + hashval_t hashval; +}; +typedef struct same_succ *same_succ_t; +typedef const struct same_succ *const_same_succ_t; + +/* A group of bbs where 1 bb from bbs can replace the other bbs. */ + +struct bb_cluster +{ + /* The bbs in the cluster. */ + bitmap bbs; + /* The preds of the bbs in the cluster. */ + bitmap preds; + /* index in all_clusters vector. */ + int index; +}; +typedef struct bb_cluster *bb_cluster_t; +typedef const struct bb_cluster *const_bb_cluster_t; + +/* Per bb-info. */ + +struct aux_bb_info +{ + /* The number of non-debug statements in the bb. */ + int size; + /* The same_succ that this bb is a member of. */ + same_succ_t same_succ; + /* The cluster that this bb is a member of. */ + bb_cluster_t cluster; + /* The vop state at the exit of a bb. This is shortlived data, used to + communicate data between update_block_by and update_vuses. */ + tree vop_at_exit; +}; + +/* Macros to access the fields of struct aux_bb_info. */ + +#define BB_SIZE(bb) (((struct aux_bb_info *)bb->aux)->size) +#define BB_SAME_SUCC(bb) (((struct aux_bb_info *)bb->aux)->same_succ) +#define BB_CLUSTER(bb) (((struct aux_bb_info *)bb->aux)->cluster) +#define BB_VOP_AT_EXIT(bb) (((struct aux_bb_info *)bb->aux)->vop_at_exit) + +/* VAL1 and VAL2 are either: + - uses in BB1 and BB2, or + - phi alternatives for BB1 and BB2. + SAME_PREDS indicates whether BB1 and BB2 have the same predecessors. + Return true if the uses have the same gvn value, and if the corresponding + defs can be used in both BB1 and BB2. */ + +static bool +gvn_uses_equal (tree val1, tree val2, basic_block bb1, + basic_block bb2, bool same_preds) +{ + gcc_checking_assert (val1 != NULL_TREE && val2 != NULL_TREE); + + if (val1 == val2) + return true; + + if (vn_valueize (val1) != vn_valueize (val2)) + return false; + + /* If BB1 and BB2 have the same predecessors, the same values are defined at + entry of BB1 and BB2. Otherwise, we need to check. 
*/ + + if (TREE_CODE (val1) == SSA_NAME) + { + if (!same_preds + && !SSA_NAME_IS_DEFAULT_DEF (val1) + && !dominated_by_p (CDI_DOMINATORS, bb2, + gimple_bb (SSA_NAME_DEF_STMT (val1)))) + return false; + } + else if (!CONSTANT_CLASS_P (val1)) + return false; + + if (TREE_CODE (val2) == SSA_NAME) + { + if (!same_preds + && !SSA_NAME_IS_DEFAULT_DEF (val2) + && !dominated_by_p (CDI_DOMINATORS, bb1, + gimple_bb (SSA_NAME_DEF_STMT (val2)))) + return false; + } + else if (!CONSTANT_CLASS_P (val2)) + return false; + + return true; +} + +/* Prints E to FILE. */ + +static void +same_succ_print (FILE *file, const same_succ_t e) +{ + unsigned int i; + bitmap_print (file, e->bbs, "bbs:", "\n"); + bitmap_print (file, e->succs, "succs:", "\n"); + bitmap_print (file, e->inverse, "inverse:", "\n"); + fprintf (file, "flags:"); + for (i = 0; i < VEC_length (int, e->succ_flags); ++i) + fprintf (file, " %x", VEC_index (int, e->succ_flags, i)); + fprintf (file, "\n"); +} + +/* Prints same_succ VE to VFILE. */ + +static int +same_succ_print_traverse (void **ve, void *vfile) +{ + const same_succ_t e = *((const same_succ_t *)ve); + FILE *file = ((FILE*)vfile); + same_succ_print (file, e); + return 1; +} + +/* Calculates hash value for same_succ VE. */ + +static hashval_t +same_succ_hash (const void *ve) +{ + const_same_succ_t e = (const_same_succ_t)ve; + hashval_t hashval = bitmap_hash (e->succs); + int flags; + unsigned int i; + unsigned int first = bitmap_first_set_bit (e->bbs); + basic_block bb = BASIC_BLOCK (first); + int size = 0; + gimple_stmt_iterator gsi; + gimple stmt; + tree arg; + + for (gsi = gsi_start_nondebug_bb (bb); + !gsi_end_p (gsi); gsi_next_nondebug (&gsi)) + { + size++; + stmt = gsi_stmt (gsi); + hashval = iterative_hash_hashval_t (gimple_code (stmt), hashval); + if (is_gimple_assign (stmt)) + hashval = iterative_hash_hashval_t (gimple_assign_rhs_code (stmt), + hashval); + if (!is_gimple_call (stmt)) + continue; + if (gimple_call_internal_p (stmt)) + hashval = iterative_hash_hashval_t + ((hashval_t) gimple_call_internal_fn (stmt), hashval); + else + hashval = iterative_hash_expr (gimple_call_fn (stmt), hashval); + for (i = 0; i < gimple_call_num_args (stmt); i++) + { + arg = gimple_call_arg (stmt, i); + arg = vn_valueize (arg); + hashval = iterative_hash_expr (arg, hashval); + } + } + hashval = iterative_hash_hashval_t (size, hashval); + BB_SIZE (bb) = size; + + for (i = 0; i < VEC_length (int, e->succ_flags); ++i) + { + flags = VEC_index (int, e->succ_flags, i); + flags = flags & ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); + hashval = iterative_hash_hashval_t (flags, hashval); + } + return hashval; +} + +/* Returns true if E1 and E2 have 2 successors, and if the successor flags + are inverse for the EDGE_TRUE_VALUE and EDGE_FALSE_VALUE flags, and equal for + the other edge flags. */ + +static bool +inverse_flags (const_same_succ_t e1, const_same_succ_t e2) +{ + int f1a, f1b, f2a, f2b; + int mask = ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); + + if (VEC_length (int, e1->succ_flags) != 2) + return false; + + f1a = VEC_index (int, e1->succ_flags, 0); + f1b = VEC_index (int, e1->succ_flags, 1); + f2a = VEC_index (int, e2->succ_flags, 0); + f2b = VEC_index (int, e2->succ_flags, 1); + + if (f1a == f2a && f1b == f2b) + return false; + + return (f1a & mask) == (f2a & mask) && (f1b & mask) == (f2b & mask); +} + +/* Compares SAME_SUCCs VE1 and VE2. 
*/ + +static int +same_succ_equal (const void *ve1, const void *ve2) +{ + const_same_succ_t e1 = (const_same_succ_t)ve1; + const_same_succ_t e2 = (const_same_succ_t)ve2; + unsigned int i, first1, first2; + gimple_stmt_iterator gsi1, gsi2; + gimple s1, s2; + basic_block bb1, bb2; + + if (e1->hashval != e2->hashval) + return 0; + + if (bitmap_bit_p (e1->bbs, ENTRY_BLOCK) + || bitmap_bit_p (e1->bbs, EXIT_BLOCK) + || bitmap_bit_p (e2->bbs, ENTRY_BLOCK) + || bitmap_bit_p (e2->bbs, EXIT_BLOCK)) + return 0; + + if (VEC_length (int, e1->succ_flags) != VEC_length (int, e2->succ_flags)) + return 0; + + if (!bitmap_equal_p (e1->succs, e2->succs)) + return 0; + + if (!inverse_flags (e1, e2)) + { + for (i = 0; i < VEC_length (int, e1->succ_flags); ++i) + if (VEC_index (int, e1->succ_flags, i) + != VEC_index (int, e1->succ_flags, i)) + return 0; + } + + first1 = bitmap_first_set_bit (e1->bbs); + first2 = bitmap_first_set_bit (e2->bbs); + + bb1 = BASIC_BLOCK (first1); + bb2 = BASIC_BLOCK (first2); + + if (BB_SIZE (bb1) != BB_SIZE (bb2)) + return 0; + + gsi1 = gsi_start_nondebug_bb (bb1); + gsi2 = gsi_start_nondebug_bb (bb2); + while (!(gsi_end_p (gsi1) || gsi_end_p (gsi2))) + { + s1 = gsi_stmt (gsi1); + s2 = gsi_stmt (gsi2); + if (gimple_code (s1) != gimple_code (s2)) + return 0; + if (is_gimple_call (s1) && !gimple_call_same_target_p (s1, s2)) + return 0; + gsi_next_nondebug (&gsi1); + gsi_next_nondebug (&gsi2); + } + + return 1; +} + +/* Alloc and init a new SAME_SUCC. */ + +static same_succ_t +same_succ_alloc (void) +{ + same_succ_t same = XNEW (struct same_succ); + + same->bbs = BITMAP_ALLOC (NULL); + same->succs = BITMAP_ALLOC (NULL); + same->inverse = BITMAP_ALLOC (NULL); + same->succ_flags = VEC_alloc (int, heap, 10); + same->in_worklist = false; + + return same; +} + +/* Delete same_succ VE. */ + +static void +same_succ_delete (void *ve) +{ + same_succ_t e = (same_succ_t)ve; + + BITMAP_FREE (e->bbs); + BITMAP_FREE (e->succs); + BITMAP_FREE (e->inverse); + VEC_free (int, heap, e->succ_flags); + + XDELETE (ve); +} + +/* Reset same_succ SAME. */ + +static void +same_succ_reset (same_succ_t same) +{ + bitmap_clear (same->bbs); + bitmap_clear (same->succs); + bitmap_clear (same->inverse); + VEC_truncate (int, same->succ_flags, 0); +} + +/* Hash table with all same_succ entries. */ + +static htab_t same_succ_htab; + +/* Array that is used to store the edge flags for a successor. */ + +static int *same_succ_edge_flags; + +/* Bitmap that is used to mark bbs that are recently deleted. */ + +static bitmap deleted_bbs; + +/* Bitmap that is used to mark predecessors of bbs that are + deleted. */ + +static bitmap deleted_bb_preds; + +/* Prints same_succ_htab to stderr. */ + +extern void debug_same_succ (void); +DEBUG_FUNCTION void +debug_same_succ ( void) +{ + htab_traverse (same_succ_htab, same_succ_print_traverse, stderr); +} + +DEF_VEC_P (same_succ_t); +DEF_VEC_ALLOC_P (same_succ_t, heap); + +/* Vector of bbs to process. */ + +static VEC (same_succ_t, heap) *worklist; + +/* Prints worklist to FILE. */ + +static void +print_worklist (FILE *file) +{ + unsigned int i; + for (i = 0; i < VEC_length (same_succ_t, worklist); ++i) + same_succ_print (file, VEC_index (same_succ_t, worklist, i)); +} + +/* Adds SAME to worklist. */ + +static void +add_to_worklist (same_succ_t same) +{ + if (same->in_worklist) + return; + + if (bitmap_count_bits (same->bbs) < 2) + return; + + same->in_worklist = true; + VEC_safe_push (same_succ_t, heap, worklist, same); +} + +/* Add BB to same_succ_htab. 
*/ + +static void +find_same_succ_bb (basic_block bb, same_succ_t *same_p) +{ + unsigned int j; + bitmap_iterator bj; + same_succ_t same = *same_p; + same_succ_t *slot; + edge_iterator ei; + edge e; + + if (bb == NULL) + return; + bitmap_set_bit (same->bbs, bb->index); + FOR_EACH_EDGE (e, ei, bb->succs) + { + int index = e->dest->index; + bitmap_set_bit (same->succs, index); + same_succ_edge_flags[index] = e->flags; + } + EXECUTE_IF_SET_IN_BITMAP (same->succs, 0, j, bj) + VEC_safe_push (int, heap, same->succ_flags, same_succ_edge_flags[j]); + + same->hashval = same_succ_hash (same); + + slot = (same_succ_t *) htab_find_slot_with_hash (same_succ_htab, same, + same->hashval, INSERT); + if (*slot == NULL) + { + *slot = same; + BB_SAME_SUCC (bb) = same; + add_to_worklist (same); + *same_p = NULL; + } + else + { + bitmap_set_bit ((*slot)->bbs, bb->index); + BB_SAME_SUCC (bb) = *slot; + add_to_worklist (*slot); + if (inverse_flags (same, *slot)) + bitmap_set_bit ((*slot)->inverse, bb->index); + same_succ_reset (same); + } +} + +/* Find bbs with same successors. */ + +static void +find_same_succ (void) +{ + same_succ_t same = same_succ_alloc (); + basic_block bb; + + FOR_EACH_BB (bb) + { + find_same_succ_bb (bb, &same); + if (same == NULL) + same = same_succ_alloc (); + } + + same_succ_delete (same); +} + +/* Initializes worklist administration. */ + +static void +init_worklist (void) +{ + alloc_aux_for_blocks (sizeof (struct aux_bb_info)); + same_succ_htab + = htab_create (n_basic_blocks, same_succ_hash, same_succ_equal, + same_succ_delete); + same_succ_edge_flags = XCNEWVEC (int, last_basic_block); + deleted_bbs = BITMAP_ALLOC (NULL); + deleted_bb_preds = BITMAP_ALLOC (NULL); + worklist = VEC_alloc (same_succ_t, heap, n_basic_blocks); + find_same_succ (); + + if (dump_file) + { + fprintf (dump_file, "initial worklist:\n"); + print_worklist (dump_file); + } +} + +/* Deletes worklist administration. */ + +static void +delete_worklist (void) +{ + free_aux_for_blocks (); + htab_delete (same_succ_htab); + same_succ_htab = NULL; + XDELETEVEC (same_succ_edge_flags); + same_succ_edge_flags = NULL; + BITMAP_FREE (deleted_bbs); + BITMAP_FREE (deleted_bb_preds); + VEC_free (same_succ_t, heap, worklist); +} + +/* Mark BB as deleted, and mark its predecessors. */ + +static void +delete_basic_block_same_succ (basic_block bb) +{ + edge e; + edge_iterator ei; + + bitmap_set_bit (deleted_bbs, bb->index); + + FOR_EACH_EDGE (e, ei, bb->preds) + bitmap_set_bit (deleted_bb_preds, e->src->index); +} + +/* Removes all bbs in BBS from their corresponding same_succ. */ + +static void +same_succ_flush_bbs (bitmap bbs) +{ + unsigned int i; + bitmap_iterator bi; + + EXECUTE_IF_SET_IN_BITMAP (bbs, 0, i, bi) + { + basic_block bb = BASIC_BLOCK (i); + same_succ_t same = BB_SAME_SUCC (bb); + BB_SAME_SUCC (bb) = NULL; + if (bitmap_single_bit_set_p (same->bbs)) + htab_remove_elt_with_hash (same_succ_htab, same, same->hashval); + else + bitmap_clear_bit (same->bbs, i); + } +} + +/* For deleted_bb_preds, find bbs with same successors. 
*/ + +static void +update_worklist (void) +{ + unsigned int i; + bitmap_iterator bi; + basic_block bb; + same_succ_t same; + + bitmap_and_compl_into (deleted_bb_preds, deleted_bbs); + bitmap_clear_bit (deleted_bb_preds, ENTRY_BLOCK); + same_succ_flush_bbs (deleted_bbs); + same_succ_flush_bbs (deleted_bb_preds); + + EXECUTE_IF_SET_IN_BITMAP (deleted_bbs, 0, i, bi) + delete_basic_block (BASIC_BLOCK (i)); + + same = same_succ_alloc (); + EXECUTE_IF_SET_IN_BITMAP (deleted_bb_preds, 0, i, bi) + { + bb = BASIC_BLOCK (i); + gcc_assert (bb != NULL); + find_same_succ_bb (bb, &same); + if (same == NULL) + same = same_succ_alloc (); + } + same_succ_delete (same); + + bitmap_clear (deleted_bbs); + bitmap_clear (deleted_bb_preds); +} + +/* Prints cluster C to FILE. */ + +static void +print_cluster (FILE *file, bb_cluster_t c) +{ + if (c == NULL) + return; + bitmap_print (file, c->bbs, "bbs:", "\n"); + bitmap_print (file, c->preds, "preds:", "\n"); +} + +/* Prints cluster C to stderr. */ + +extern void debug_cluster (bb_cluster_t); +DEBUG_FUNCTION void +debug_cluster (bb_cluster_t c) +{ + print_cluster (stderr, c); +} + +/* Returns true if bb1 and bb2 have the same predecessors. */ + +static bool +same_predecessors (basic_block bb1, basic_block bb2) +{ + edge e; + edge_iterator ei; + unsigned int n1 = EDGE_COUNT (bb1->preds), n2 = EDGE_COUNT (bb2->preds); + + if (n1 != n2) + return false; + + FOR_EACH_EDGE (e, ei, bb1->preds) + if (!find_edge (e->src, bb2)) + return false; + + return true; +} + +/* Add BB to cluster C. Sets BB in C->bbs, and preds of BB in C->preds. */ + +static void +add_bb_to_cluster (bb_cluster_t c, basic_block bb) +{ + edge e; + edge_iterator ei; + + bitmap_set_bit (c->bbs, bb->index); + + FOR_EACH_EDGE (e, ei, bb->preds) + bitmap_set_bit (c->preds, e->src->index); +} + +/* Allocate and init new cluster. */ + +static bb_cluster_t +new_cluster (void) +{ + bb_cluster_t c; + c = XCNEW (struct bb_cluster); + c->bbs = BITMAP_ALLOC (NULL); + c->preds = BITMAP_ALLOC (NULL); + return c; +} + +/* Delete clusters. */ + +static void +delete_cluster (bb_cluster_t c) +{ + if (c == NULL) + return; + BITMAP_FREE (c->bbs); + BITMAP_FREE (c->preds); + XDELETE (c); +} + +DEF_VEC_P (bb_cluster_t); +DEF_VEC_ALLOC_P (bb_cluster_t, heap); + +/* Array that contains all clusters. */ + +static VEC (bb_cluster_t, heap) *all_clusters; + +/* Allocate all cluster vectors. */ + +static void +alloc_cluster_vectors (void) +{ + all_clusters = VEC_alloc (bb_cluster_t, heap, n_basic_blocks); +} + +/* Reset all cluster vectors. */ + +static void +reset_cluster_vectors (void) +{ + unsigned int i; + basic_block bb; + for (i = 0; i < VEC_length (bb_cluster_t, all_clusters); ++i) + delete_cluster (VEC_index (bb_cluster_t, all_clusters, i)); + VEC_truncate (bb_cluster_t, all_clusters, 0); + FOR_EACH_BB (bb) + BB_CLUSTER (bb) = NULL; +} + +/* Delete all cluster vectors. */ + +static void +delete_cluster_vectors (void) +{ + unsigned int i; + for (i = 0; i < VEC_length (bb_cluster_t, all_clusters); ++i) + delete_cluster (VEC_index (bb_cluster_t, all_clusters, i)); + VEC_free (bb_cluster_t, heap, all_clusters); +} + +/* Merge cluster C2 into C1. */ + +static void +merge_clusters (bb_cluster_t c1, bb_cluster_t c2) +{ + bitmap_ior_into (c1->bbs, c2->bbs); + bitmap_ior_into (c1->preds, c2->preds); +} + +/* Register equivalence of BB1 and BB2 (members of cluster C). Store c in + all_clusters, or merge c with existing cluster. 
+
+static void
+set_cluster (basic_block bb1, basic_block bb2)
+{
+  basic_block merge_bb, other_bb;
+  bb_cluster_t merge, old, c;
+
+  if (BB_CLUSTER (bb1) == NULL && BB_CLUSTER (bb2) == NULL)
+    {
+      c = new_cluster ();
+      add_bb_to_cluster (c, bb1);
+      add_bb_to_cluster (c, bb2);
+      BB_CLUSTER (bb1) = c;
+      BB_CLUSTER (bb2) = c;
+      c->index = VEC_length (bb_cluster_t, all_clusters);
+      VEC_safe_push (bb_cluster_t, heap, all_clusters, c);
+    }
+  else if (BB_CLUSTER (bb1) == NULL || BB_CLUSTER (bb2) == NULL)
+    {
+      merge_bb = BB_CLUSTER (bb1) == NULL ? bb2 : bb1;
+      other_bb = BB_CLUSTER (bb1) == NULL ? bb1 : bb2;
+      merge = BB_CLUSTER (merge_bb);
+      add_bb_to_cluster (merge, other_bb);
+      BB_CLUSTER (other_bb) = merge;
+    }
+  else if (BB_CLUSTER (bb1) != BB_CLUSTER (bb2))
+    {
+      unsigned int i;
+      bitmap_iterator bi;
+
+      old = BB_CLUSTER (bb2);
+      merge = BB_CLUSTER (bb1);
+      merge_clusters (merge, old);
+      EXECUTE_IF_SET_IN_BITMAP (old->bbs, 0, i, bi)
+        BB_CLUSTER (BASIC_BLOCK (i)) = merge;
+      VEC_replace (bb_cluster_t, all_clusters, old->index, NULL);
+      delete_cluster (old);
+    }
+  else
+    gcc_unreachable ();
+}
+
+/* Return true if gimple statements S1 and S2 are equal.  SAME_PREDS indicates
+   whether gimple_bb (s1) and gimple_bb (s2) (members of SAME_SUCC) have the
+   same predecessors.  */
+
+static bool
+gimple_equal_p (same_succ_t same_succ, gimple s1, gimple s2, bool same_preds)
+{
+  unsigned int i;
+  tree lhs1, lhs2;
+  basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
+  tree t1, t2;
+  bool equal, inv_cond;
+  enum tree_code code1, code2;
+
+  if (gimple_code (s1) != gimple_code (s2))
+    return false;
+
+  switch (gimple_code (s1))
+    {
+    case GIMPLE_CALL:
+      if (gimple_call_num_args (s1) != gimple_call_num_args (s2))
+        return false;
+      if (!gimple_call_same_target_p (s1, s2))
+        return false;
+
+      equal = true;
+      for (i = 0; i < gimple_call_num_args (s1); ++i)
+        {
+          t1 = gimple_call_arg (s1, i);
+          t2 = gimple_call_arg (s2, i);
+          if (operand_equal_p (t1, t2, 0))
+            continue;
+          if (gvn_uses_equal (t1, t2, bb1, bb2, same_preds))
+            continue;
+          equal = false;
+          break;
+        }
+      if (equal)
+        return true;
+
+      lhs1 = gimple_get_lhs (s1);
+      lhs2 = gimple_get_lhs (s2);
+      return (lhs1 != NULL_TREE && lhs2 != NULL_TREE && same_preds
+              && TREE_CODE (lhs1) == SSA_NAME && TREE_CODE (lhs2) == SSA_NAME
+              && vn_valueize (lhs1) == vn_valueize (lhs2));
+
+    case GIMPLE_ASSIGN:
+      lhs1 = gimple_get_lhs (s1);
+      lhs2 = gimple_get_lhs (s2);
+      return (same_preds && TREE_CODE (lhs1) == SSA_NAME
+              && TREE_CODE (lhs2) == SSA_NAME
+              && vn_valueize (lhs1) == vn_valueize (lhs2));
+
+    case GIMPLE_COND:
+      t1 = gimple_cond_lhs (s1);
+      t2 = gimple_cond_lhs (s2);
+      if (!operand_equal_p (t1, t2, 0)
+          && !gvn_uses_equal (t1, t2, bb1, bb2, same_preds))
+        return false;
+
+      t1 = gimple_cond_rhs (s1);
+      t2 = gimple_cond_rhs (s2);
+      if (!operand_equal_p (t1, t2, 0)
+          && !gvn_uses_equal (t1, t2, bb1, bb2, same_preds))
+        return false;
+
+      code1 = gimple_expr_code (s1);
+      code2 = gimple_expr_code (s2);
+      inv_cond = (bitmap_bit_p (same_succ->inverse, bb1->index)
+                  != bitmap_bit_p (same_succ->inverse, bb2->index));
+      if (inv_cond)
+        {
+          bool honor_nans
+            = HONOR_NANS (TYPE_MODE (TREE_TYPE (gimple_cond_lhs (s1))));
+          code2 = invert_tree_comparison (code2, honor_nans);
+        }
+      return code1 == code2;
+
+    default:
+      return false;
+    }
+}
+
+/* Determines whether BB1 and BB2 (members of same_succ) are duplicates.  If
+   so, clusters them.  SAME_PREDS indicates whether BB1 and BB2 have the same
+   predecessors.  */
+
+static void
+find_duplicate (same_succ_t same_succ, basic_block bb1, basic_block bb2,
+                bool same_preds)
+{
+  gimple_stmt_iterator gsi1 = gsi_last_nondebug_bb (bb1);
+  gimple_stmt_iterator gsi2 = gsi_last_nondebug_bb (bb2);
+  bool end1 = gsi_end_p (gsi1);
+  bool end2 = gsi_end_p (gsi2);
+
+  while (!end1 && !end2)
+    {
+      if (!gimple_equal_p (same_succ, gsi_stmt (gsi1), gsi_stmt (gsi2),
+                           same_preds))
+        return;
+
+      gsi_prev_nondebug (&gsi1);
+      gsi_prev_nondebug (&gsi2);
+      end1 = gsi_end_p (gsi1);
+      end2 = gsi_end_p (gsi2);
+    }
+
+  if (!(end1 && end2))
+    return;
+
+  if (dump_file)
+    fprintf (dump_file, "find_duplicates: <bb %d> duplicate of <bb %d>\n",
+             bb1->index, bb2->index);
+
+  set_cluster (bb1, bb2);
+}
+
+/* Returns whether for all phis in DEST the phi alternatives for E1 and
+   E2 are equal.  SAME_PREDS indicates whether BB1 and BB2 have the same
+   predecessors.  */
+
+static bool
+same_phi_alternatives_1 (basic_block dest, edge e1, edge e2, bool same_preds)
+{
+  int n1 = e1->dest_idx, n2 = e2->dest_idx;
+  basic_block bb1 = e1->src, bb2 = e2->src;
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_phis (dest); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple phi = gsi_stmt (gsi);
+      tree lhs = gimple_phi_result (phi);
+      tree val1 = gimple_phi_arg_def (phi, n1);
+      tree val2 = gimple_phi_arg_def (phi, n2);
+
+      if (!is_gimple_reg (lhs))
+        continue;
+
+      if (operand_equal_for_phi_arg_p (val1, val2))
+        continue;
+      if (gvn_uses_equal (val1, val2, bb1, bb2, same_preds))
+        continue;
+
+      return false;
+    }
+
+  return true;
+}
+
+/* Returns whether for all successors of BB1 and BB2 (members of SAME_SUCC),
+   the phi alternatives for BB1 and BB2 are equal.  SAME_PREDS indicates
+   whether BB1 and BB2 have the same predecessors.  */
+
+static bool
+same_phi_alternatives (same_succ_t same_succ, basic_block bb1, basic_block bb2,
+                       bool same_preds)
+{
+  unsigned int s;
+  bitmap_iterator bs;
+  edge e1, e2;
+  basic_block succ;
+
+  EXECUTE_IF_SET_IN_BITMAP (same_succ->succs, 0, s, bs)
+    {
+      succ = BASIC_BLOCK (s);
+      e1 = find_edge (bb1, succ);
+      e2 = find_edge (bb2, succ);
+      if (e1->flags & EDGE_COMPLEX
+          || e2->flags & EDGE_COMPLEX)
+        return false;
+
+      /* For all phis in bb, the phi alternatives for e1 and e2 need to have
+         the same value.  */
+      if (!same_phi_alternatives_1 (succ, e1, e2, same_preds))
+        return false;
+    }
+
+  return true;
+}
+
+/* Return true if BB has non-vop phis.  */
+
+static bool
+bb_has_non_vop_phi (basic_block bb)
+{
+  gimple_seq phis = phi_nodes (bb);
+  gimple phi;
+
+  if (phis == NULL)
+    return false;
+
+  if (!gimple_seq_singleton_p (phis))
+    return true;
+
+  phi = gimple_seq_first_stmt (phis);
+  return is_gimple_reg (gimple_phi_result (phi));
+}
+
+/* Within SAME_SUCC->bbs, find clusters of bbs which can be merged.  */
+
+static void
+find_clusters_1 (same_succ_t same_succ)
+{
+  basic_block bb1, bb2;
+  unsigned int i, j;
+  bitmap_iterator bi, bj;
+  bool same_preds;
+  int nr_comparisons;
+  int max_comparisons = PARAM_VALUE (PARAM_MAX_TAIL_MERGE_COMPARISONS);
+
+  if (same_succ == NULL)
+    return;
+
+  EXECUTE_IF_SET_IN_BITMAP (same_succ->bbs, 0, i, bi)
+    {
+      bb1 = BASIC_BLOCK (i);
+
+      /* TODO: handle blocks with phi-nodes.  We'll have to find corresponding
+         phi-nodes in bb1 and bb2, with the same alternatives for the same
+         preds.  */
+      if (bb_has_non_vop_phi (bb1))
+        continue;
+
+      nr_comparisons = 0;
+      EXECUTE_IF_SET_IN_BITMAP (same_succ->bbs, i + 1, j, bj)
+        {
+          bb2 = BASIC_BLOCK (j);
+
+          if (bb_has_non_vop_phi (bb2))
+            continue;
+
+          if (BB_CLUSTER (bb1) != NULL
+              && BB_CLUSTER (bb1) == BB_CLUSTER (bb2))
+            continue;
+
+          /* Limit quadratic behaviour.  */
+          nr_comparisons++;
+          if (nr_comparisons > max_comparisons)
+            break;
+
+          same_preds = same_predecessors (bb1, bb2);
+
+          if (!same_phi_alternatives (same_succ, bb1, bb2, same_preds))
+            continue;
+          find_duplicate (same_succ, bb1, bb2, same_preds);
+        }
+    }
+}
+
+/* Find clusters of bbs which can be merged.  */
+
+static void
+find_clusters (void)
+{
+  same_succ_t same;
+
+  while (!VEC_empty (same_succ_t, worklist))
+    {
+      same = VEC_pop (same_succ_t, worklist);
+      same->in_worklist = false;
+      if (dump_file)
+        {
+          fprintf (dump_file, "processing worklist entry\n");
+          same_succ_print (dump_file, same);
+        }
+      find_clusters_1 (same);
+    }
+}
+
+/* Merge the alias info of the calls in BB1 into the calls in BB2.  */
+
+static void
+merge_calls (basic_block bb1, basic_block bb2)
+{
+  gimple_stmt_iterator gsi1 = gsi_start_nondebug_bb (bb1);
+  gimple_stmt_iterator gsi2 = gsi_start_nondebug_bb (bb2);
+  bool end1, end2;
+  gimple s1, s2;
+
+  end1 = gsi_end_p (gsi1);
+  end2 = gsi_end_p (gsi2);
+
+  while (true)
+    {
+      if (end1 && end2)
+        return;
+      gcc_assert (!end1 && !end2);
+      s1 = gsi_stmt (gsi1);
+      s2 = gsi_stmt (gsi2);
+
+      if (is_gimple_call (s1) && is_gimple_call (s2))
+        {
+          pt_solution_ior_into_shared (gimple_call_use_set (s2),
+                                       gimple_call_use_set (s1));
+          pt_solution_ior_into_shared (gimple_call_clobber_set (s2),
+                                       gimple_call_clobber_set (s1));
+        }
+      else
+        gcc_assert (!is_gimple_call (s1) && !is_gimple_call (s2));
+
+      gsi_next_nondebug (&gsi1);
+      gsi_next_nondebug (&gsi2);
+      end1 = gsi_end_p (gsi1);
+      end2 = gsi_end_p (gsi2);
+    }
+}
+
+/* Create or update a vop phi in BB2.  Use VUSE1 arguments for all the
+   REDIRECTED_EDGES, or if VUSE1 is NULL_TREE, use BB_VOP_AT_EXIT.  If a new
+   phi is created, use the phi instead of VUSE2 in BB2.  */
+
+static void
+update_vuses (tree vuse1, tree vuse2, basic_block bb2,
+              VEC (edge,heap) *redirected_edges)
+{
+  gimple stmt, phi = NULL;
+  tree lhs, arg, current_arg;
+  unsigned int i;
+  gimple def_stmt2;
+  source_location locus1, locus2;
+  imm_use_iterator iter;
+  use_operand_p use_p;
+  edge_iterator ei;
+  edge e;
+
+  if (vuse2 == NULL_TREE)
+    return;
+
+  def_stmt2 = SSA_NAME_DEF_STMT (vuse2);
+
+  /* Update existing phi.  */
+  if (gimple_bb (def_stmt2) == bb2)
+    {
+      phi = def_stmt2;
+
+      for (i = 0; i < EDGE_COUNT (redirected_edges); ++i)
+        {
+          e = VEC_index (edge, redirected_edges, i);
+          if (vuse1)
+            arg = vuse1;
+          else
+            arg = BB_VOP_AT_EXIT (e->src);
+          current_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+          if (current_arg == NULL)
+            {
+              locus1 = gimple_location (SSA_NAME_DEF_STMT (arg));
+              add_phi_arg (phi, arg, e, locus1);
+            }
+          else
+            gcc_assert (arg == current_arg);
+        }
+      return;
+    }
+
+  /* No need to create a phi with 2 equal arguments.  */
+  if (vuse1 == vuse2)
+    return;
+
+  locus2 = gimple_location (def_stmt2);
+
+  /* Create a phi, first with default argument vuse2 for all preds.  */
+  lhs = make_ssa_name (SSA_NAME_VAR (vuse2), NULL);
+  VN_INFO_GET (lhs);
+  phi = create_phi_node (lhs, bb2);
+  SSA_NAME_DEF_STMT (lhs) = phi;
+  FOR_EACH_EDGE (e, ei, bb2->preds)
+    add_phi_arg (phi, vuse2, e, locus2);
+
+  /* Now overwrite the arguments associated with the redirected edges with
+     vuse1.  */
+  for (i = 0; i < EDGE_COUNT (redirected_edges); ++i)
+    {
+      e = VEC_index (edge, redirected_edges, i);
+      gcc_assert (PHI_ARG_DEF_FROM_EDGE (phi, e));
+      if (vuse1)
+        arg = vuse1;
+      else
+        arg = BB_VOP_AT_EXIT (e->src);
+      SET_PHI_ARG_DEF (phi, e->dest_idx, arg);
+      locus1 = gimple_location (SSA_NAME_DEF_STMT (arg));
+      gimple_phi_arg_set_location (phi, e->dest_idx, locus1);
+    }
+
+  /* Replace uses of vuse2 in bb2 with phi.  */
+  FOR_EACH_IMM_USE_STMT (stmt, iter, vuse2)
+    {
+      if (gimple_code (stmt) == GIMPLE_PHI)
+        {
+          edge e;
+          if (stmt == phi)
+            continue;
+          e = find_edge (bb2, gimple_bb (stmt));
+          if (e == NULL)
+            continue;
+          use_p = PHI_ARG_DEF_PTR_FROM_EDGE (stmt, e);
+          SET_USE (use_p, lhs);
+          update_stmt (stmt);
+        }
+      else if (gimple_bb (stmt) == bb2)
+        {
+          FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+            SET_USE (use_p, lhs);
+          update_stmt (stmt);
+        }
+    }
+}
+
+/* Returns the vop phi of BB, if any.  */
+
+static gimple
+vop_phi (basic_block bb)
+{
+  gimple stmt;
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (is_gimple_reg (gimple_phi_result (stmt)))
+        continue;
+      return stmt;
+    }
+  return NULL;
+}
+
+/* Scans the vdefs and vuses of the insns of BB, and returns the vop at entry
+   in VOP_AT_ENTRY, and the vop at exit in VOP_AT_EXIT.  */
+
+static void
+insn_vops (basic_block bb, tree *vop_at_entry, tree *vop_at_exit)
+{
+  gimple stmt;
+  gimple_stmt_iterator gsi;
+  tree vuse, vdef;
+  tree last_vdef = NULL_TREE;
+
+  if (*vop_at_entry != NULL_TREE && *vop_at_exit != NULL_TREE)
+    return;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      vuse = gimple_vuse (stmt);
+      vdef = gimple_vdef (stmt);
+      if (vuse != NULL_TREE && *vop_at_entry == NULL_TREE)
+        {
+          *vop_at_entry = vuse;
+          if (*vop_at_exit != NULL_TREE)
+            return;
+        }
+      if (vdef != NULL_TREE)
+        last_vdef = vdef;
+    }
+
+  *vop_at_exit = last_vdef != NULL_TREE ? last_vdef : *vop_at_entry;
+}
+
+/* Returns the vop at entry of BB1 in VOP_AT_ENTRY1, and the one of BB2 in
+   VOP_AT_ENTRY2, where BB1 and BB2 have the same successors.  */
+
+static void
+vop_at_entry (basic_block bb1, basic_block bb2, tree *vop_at_entry1,
+              tree *vop_at_entry2)
+{
+  gimple succ_phi, bb1_phi, bb2_phi;
+  basic_block succ;
+  tree vop_at_exit1 = NULL_TREE, vop_at_exit2 = NULL_TREE;
+  bool same_at_exit;
+
+  bb1_phi = vop_phi (bb1);
+  bb2_phi = vop_phi (bb2);
+
+  *vop_at_entry1 = bb1_phi != NULL ? gimple_phi_result (bb1_phi) : NULL_TREE;
+  *vop_at_entry2 = bb2_phi != NULL ? gimple_phi_result (bb2_phi) : NULL_TREE;
+
+  if (*vop_at_entry1 != NULL_TREE && *vop_at_entry2 != NULL_TREE)
+    return;
+
+  if (EDGE_COUNT (bb1->succs) != 0)
+    {
+      succ = EDGE_SUCC (bb1, 0)->dest;
+      succ_phi = vop_phi (succ);
+      if (succ_phi != NULL)
+        {
+          vop_at_exit1
+            = PHI_ARG_DEF_FROM_EDGE (succ_phi, find_edge (bb1, succ));
+          vop_at_exit2
+            = PHI_ARG_DEF_FROM_EDGE (succ_phi, find_edge (bb2, succ));
+        }
+    }
+
+  same_at_exit = vop_at_exit1 == vop_at_exit2;
+
+  if (*vop_at_entry1 == NULL_TREE && vop_at_exit1 != NULL_TREE
+      && gimple_bb (SSA_NAME_DEF_STMT (vop_at_exit1)) != bb1)
+    *vop_at_entry1 = vop_at_exit1;
+
+  if (*vop_at_entry2 == NULL_TREE && vop_at_exit2 != NULL_TREE
+      && gimple_bb (SSA_NAME_DEF_STMT (vop_at_exit2)) != bb2)
+    *vop_at_entry2 = vop_at_exit2;
+
+  if (*vop_at_entry1 != NULL_TREE && *vop_at_entry2 != NULL_TREE)
+    return;
+
+  insn_vops (bb1, vop_at_entry1, &vop_at_exit1);
+  insn_vops (bb2, vop_at_entry2, &vop_at_exit2);
+
+  if (*vop_at_entry1 != NULL_TREE && *vop_at_entry2 != NULL_TREE)
+    return;
+
+  if (same_at_exit && vop_at_exit1 != NULL_TREE
+      && *vop_at_entry2 == NULL_TREE
+      && dominated_by_p (CDI_DOMINATORS, bb2, bb1))
+    *vop_at_entry2 = vop_at_exit1;
+
+  if (same_at_exit && vop_at_exit2 != NULL_TREE
+      && *vop_at_entry1 == NULL_TREE
+      && dominated_by_p (CDI_DOMINATORS, bb1, bb2))
+    *vop_at_entry1 = vop_at_exit2;
+
+  if (*vop_at_entry1 != NULL_TREE && *vop_at_entry2 != NULL_TREE)
+    return;
+
+  gcc_assert (*vop_at_entry1 == NULL_TREE && *vop_at_entry2 == NULL_TREE);
+}
+
+/* Redirects all edges from BB1 to BB2, marks BB1 for removal, and if
+   UPDATE_VOPS, inserts vop phis.  */
+
+static void
+replace_block_by (basic_block bb1, basic_block bb2, bool update_vops)
+{
+  edge pred_edge;
+  unsigned int i;
+  tree phi_vuse1, phi_vuse2, arg;
+  VEC (edge,heap) *redirected_edges = NULL;
+  edge e;
+  edge_iterator ei;
+
+  if (update_vops)
+    {
+      vop_at_entry (bb1, bb2, &phi_vuse1, &phi_vuse2);
+
+      if (phi_vuse1 != NULL_TREE
+          && gimple_bb (SSA_NAME_DEF_STMT (phi_vuse1)) == bb1)
+        {
+          FOR_EACH_EDGE (e, ei, bb1->preds)
+            {
+              arg = PHI_ARG_DEF_FROM_EDGE (SSA_NAME_DEF_STMT (phi_vuse1), e);
+              BB_VOP_AT_EXIT (e->src) = arg;
+            }
+          phi_vuse1 = NULL_TREE;
+        }
+      redirected_edges = VEC_alloc (edge, heap, 10);
+    }
+
+  delete_basic_block_same_succ (bb1);
+
+  /* Redirect the incoming edges of bb1 to bb2.  */
+  for (i = EDGE_COUNT (bb1->preds); i > 0; --i)
+    {
+      pred_edge = EDGE_PRED (bb1, i - 1);
+      pred_edge = redirect_edge_and_branch (pred_edge, bb2);
+      gcc_assert (pred_edge != NULL);
+      if (update_vops)
+        VEC_safe_push (edge, heap, redirected_edges, pred_edge);
+    }
+
+  if (update_vops)
+    {
+      update_vuses (phi_vuse1, phi_vuse2, bb2, redirected_edges);
+      VEC_free (edge, heap, redirected_edges);
+    }
+
+  merge_calls (bb1, bb2);
+}
+
+/* Bbs for which update_debug_stmt needs to be called.  */
+
+static bitmap update_bbs;
+
+/* For each cluster in all_clusters, merge all cluster->bbs.  Returns the
+   number of bbs removed.  Insert vop phis if UPDATE_VOPS.  */
+
+static int
+apply_clusters (bool update_vops)
+{
+  basic_block bb1, bb2;
+  bb_cluster_t c;
+  unsigned int i, j;
+  bitmap_iterator bj;
+  int nr_bbs_removed = 0;
+
+  for (i = 0; i < VEC_length (bb_cluster_t, all_clusters); ++i)
+    {
+      c = VEC_index (bb_cluster_t, all_clusters, i);
+      if (c == NULL)
+        continue;
+
+      bb2 = BASIC_BLOCK (bitmap_first_set_bit (c->bbs));
+      gcc_assert (bb2 != NULL);
+
+      bitmap_set_bit (update_bbs, bb2->index);
+      EXECUTE_IF_SET_IN_BITMAP (c->bbs, 0, j, bj)
+        {
+          bb1 = BASIC_BLOCK (j);
+          gcc_assert (bb1 != NULL);
+          if (bb1 == bb2)
+            continue;
+
+          bitmap_clear_bit (update_bbs, bb1->index);
+          replace_block_by (bb1, bb2, update_vops);
+          nr_bbs_removed++;
+        }
+    }
+
+  return nr_bbs_removed;
+}
+
+/* Resets debug statement STMT if it has uses that are not dominated by their
+   defs.  */
+
+static void
+update_debug_stmt (gimple stmt)
+{
+  use_operand_p use_p;
+  ssa_op_iter oi;
+  basic_block bbdef, bbuse;
+  gimple def_stmt;
+  tree name;
+
+  if (!gimple_debug_bind_p (stmt))
+    return;
+
+  bbuse = gimple_bb (stmt);
+  FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, oi, SSA_OP_USE)
+    {
+      name = USE_FROM_PTR (use_p);
+      gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+      def_stmt = SSA_NAME_DEF_STMT (name);
+      gcc_assert (def_stmt != NULL);
+
+      bbdef = gimple_bb (def_stmt);
+      if (bbdef == NULL || bbuse == bbdef
+          || dominated_by_p (CDI_DOMINATORS, bbuse, bbdef))
+        continue;
+
+      gimple_debug_bind_reset_value (stmt);
+      update_stmt (stmt);
+    }
+}
+
+/* Resets all debug statements that have uses that are not
+   dominated by their defs.  */
+
+static void
+update_debug_stmts (void)
+{
+  basic_block bb;
+  bitmap_iterator bi;
+  unsigned int i;
+
+  if (!MAY_HAVE_DEBUG_STMTS)
+    return;
+
+  EXECUTE_IF_SET_IN_BITMAP (update_bbs, 0, i, bi)
+    {
+      gimple stmt;
+      gimple_stmt_iterator gsi;
+
+      bb = BASIC_BLOCK (i);
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+          stmt = gsi_stmt (gsi);
+          if (!is_gimple_debug (stmt))
+            continue;
+          update_debug_stmt (stmt);
+        }
+    }
+}
+
+/* Runs tail merge optimization.  */
+
+unsigned int
+tail_merge_optimize (unsigned int todo)
+{
+  int nr_bbs_removed_total = 0;
+  int nr_bbs_removed;
+  bool loop_entered = false;
+  int iteration_nr = 0;
+  bool update_vops = ((todo & TODO_update_ssa_only_virtuals) == 0
+                      || !symbol_marked_for_renaming (gimple_vop (cfun)));
+
+  if (!flag_tree_tail_merge)
+    return 0;
+
+  timevar_push (TV_TREE_TAIL_MERGE);
+
+  init_worklist ();
+
+  while (!VEC_empty (same_succ_t, worklist))
+    {
+      if (!loop_entered)
+        {
+          loop_entered = true;
+          alloc_cluster_vectors ();
+          update_bbs = BITMAP_ALLOC (NULL);
+        }
+      else
+        reset_cluster_vectors ();
+
+      iteration_nr++;
+      if (dump_file)
+        fprintf (dump_file, "worklist iteration #%d\n", iteration_nr);
+
+      calculate_dominance_info (CDI_DOMINATORS);
+      find_clusters ();
+      gcc_assert (VEC_empty (same_succ_t, worklist));
+      if (VEC_empty (bb_cluster_t, all_clusters))
+        break;
+
+      nr_bbs_removed = apply_clusters (update_vops);
+      nr_bbs_removed_total += nr_bbs_removed;
+      if (nr_bbs_removed == 0)
+        break;
+
+      free_dominance_info (CDI_DOMINATORS);
+      update_worklist ();
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "htab collision / search: %f\n",
+             htab_collisions (same_succ_htab));
+
+  if (nr_bbs_removed_total > 0)
+    {
+      calculate_dominance_info (CDI_DOMINATORS);
+      update_debug_stmts ();
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Before TODOs.\n");
+          dump_function_to_file (current_function_decl, dump_file, dump_flags);
+        }
+
+      todo |= (TODO_verify_ssa | TODO_verify_stmts | TODO_verify_flow
+               | TODO_dump_func);
+    }
+
+  delete_worklist ();
+  if (loop_entered)
+    {
+      delete_cluster_vectors ();
+      BITMAP_FREE (update_bbs);
+    }
+
+  timevar_pop (TV_TREE_TAIL_MERGE);
+
+  return todo;
+}
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	(revision 175801)
+++ gcc/tree-ssa-sccvn.c	(working copy)
@@ -2872,19 +2872,6 @@ simplify_unary_expression (gimple stmt)
   return NULL_TREE;
 }
 
-/* Valueize NAME if it is an SSA name, otherwise just return it.  */
-
-static inline tree
-vn_valueize (tree name)
-{
-  if (TREE_CODE (name) == SSA_NAME)
-    {
-      tree tem = SSA_VAL (name);
-      return tem == VN_TOP ? name : tem;
-    }
-  return name;
-}
-
 /* Try to simplify RHS using equivalences and constant folding.  */
 
 static tree
Index: gcc/tree-ssa-sccvn.h
===================================================================
--- gcc/tree-ssa-sccvn.h	(revision 175801)
+++ gcc/tree-ssa-sccvn.h	(working copy)
@@ -209,4 +209,18 @@ unsigned int get_constant_value_id (tree
 unsigned int get_or_alloc_constant_value_id (tree);
 bool value_id_constant_p (unsigned int);
 tree fully_constant_vn_reference_p (vn_reference_t);
+
+/* Valueize NAME if it is an SSA name, otherwise just return it.  */
+
+static inline tree
+vn_valueize (tree name)
+{
+  if (TREE_CODE (name) == SSA_NAME)
+    {
+      tree tem = VN_INFO (name)->valnum;
+      return tem == VN_TOP ? name : tem;
+    }
+  return name;
+}
+
 #endif /* TREE_SSA_SCCVN_H */
Index: gcc/tree-ssa-alias.h
===================================================================
--- gcc/tree-ssa-alias.h	(revision 175801)
+++ gcc/tree-ssa-alias.h	(working copy)
@@ -134,6 +134,8 @@ extern bool pt_solutions_same_restrict_b
 extern void pt_solution_reset (struct pt_solution *);
 extern void pt_solution_set (struct pt_solution *, bitmap, bool, bool);
 extern void pt_solution_set_var (struct pt_solution *, tree);
+extern void pt_solution_ior_into_shared (struct pt_solution *,
+                                         struct pt_solution *);
 extern void dump_pta_stats (FILE *);
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 175801)
+++ gcc/opts.c	(working copy)
@@ -484,6 +484,7 @@ static const struct default_options defa
     { OPT_LEVELS_2_PLUS, OPT_falign_jumps, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_falign_labels, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_falign_functions, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 175801)
+++ gcc/timevar.def	(working copy)
@@ -127,6 +127,7 @@ DEFTIMEVAR (TV_TREE_GIMPLIFY	     , "tre
 DEFTIMEVAR (TV_TREE_EH		     , "tree eh")
 DEFTIMEVAR (TV_TREE_CFG		     , "tree CFG construction")
 DEFTIMEVAR (TV_TREE_CLEANUP_CFG	     , "tree CFG cleanup")
+DEFTIMEVAR (TV_TREE_TAIL_MERGE       , "tree tail merge")
 DEFTIMEVAR (TV_TREE_VRP              , "tree VRP")
 DEFTIMEVAR (TV_TREE_COPY_PROP	     , "tree copy propagation")
 DEFTIMEVAR (TV_FIND_REFERENCED_VARS  , "tree find ref. vars")
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c	(revision 175801)
+++ gcc/tree-ssa-pre.c	(working copy)
@@ -4935,7 +4935,6 @@ execute_pre (bool do_fre)
   statistics_counter_event (cfun, "Constified", pre_stats.constified);
 
   clear_expression_ids ();
-  free_scc_vn ();
   if (!do_fre)
    {
      remove_dead_inserted_code ();
@@ -4945,6 +4944,9 @@ execute_pre (bool do_fre)
    scev_finalize ();
   fini_pre (do_fre);
 
+  todo |= tail_merge_optimize (todo);
+  free_scc_vn ();
+
   return todo;
 }
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 175801)
+++ gcc/common.opt	(working copy)
@@ -1937,6 +1937,10 @@ ftree-dominator-opts
 Common Report Var(flag_tree_dom) Optimization
 Enable dominator optimizations
 
+ftree-tail-merge
+Common Report Var(flag_tree_tail_merge) Optimization
+Enable tail merging on trees
+
 ftree-dse
 Common Report Var(flag_tree_dse) Optimization
 Enable dead store elimination
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h	(revision 175801)
+++ gcc/tree-flow.h	(working copy)
@@ -806,6 +806,9 @@ bool multiplier_allowed_in_address_p (HO
 unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
 bool may_be_nonaddressable_p (tree expr);
 
+/* In tree-ssa-tail-merge.c.  */
+extern unsigned int tail_merge_optimize (unsigned int);
+
 /* In tree-ssa-threadupdate.c.  */
 extern bool thread_through_all_blocks (bool);
 extern void register_jump_thread (edge, edge, edge);
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 175801)
+++ gcc/Makefile.in	(working copy)
@@ -1466,6 +1466,7 @@ OBJS = \
 	tree-ssa-sccvn.o \
 	tree-ssa-sink.o \
 	tree-ssa-structalias.o \
+	tree-ssa-tail-merge.o \
 	tree-ssa-ter.o \
 	tree-ssa-threadedge.o \
 	tree-ssa-threadupdate.o \
@@ -2427,6 +2428,13 @@ stor-layout.o : stor-layout.c $(CONFIG_H
    $(TREE_H) $(PARAMS_H) $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) output.h $(RTL_H) \
    $(GGC_H) $(TM_P_H) $(TARGET_H) langhooks.h $(REGS_H) gt-stor-layout.h \
    $(DIAGNOSTIC_CORE_H) $(CGRAPH_H) $(TREE_INLINE_H) $(TREE_DUMP_H) $(GIMPLE_H)
+tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
+   $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \
+   $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) output.h \
+   $(TREE_H) $(TREE_FLOW_H) $(TREE_INLINE_H) \
+   $(GIMPLE_H) $(FUNCTION_H) \
+   $(TIMEVAR_H) tree-ssa-sccvn.h \
+   $(CGRAPH_H) gimple-pretty-print.h tree-pretty-print.h $(PARAMS_H)
 tree-ssa-structalias.o: tree-ssa-structalias.c \
    $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(GGC_H) $(OBSTACK_H) $(BITMAP_H) \
    $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) output.h \
Index: gcc/tree-ssa-structalias.c
===================================================================
--- gcc/tree-ssa-structalias.c	(revision 175801)
+++ gcc/tree-ssa-structalias.c	(working copy)
@@ -5688,6 +5688,48 @@ shared_bitmap_add (bitmap pt_vars)
   *slot = (void *) sbi;
 }
 
+/* Unshares the points-to bitmap of PT.  */
+
+static void
+pt_solution_unshare (struct pt_solution *pt)
+{
+  bitmap copy;
+
+  if (pt == NULL || pt->vars == NULL || shared_bitmap_table == NULL)
+    return;
+
+  copy = BITMAP_GGC_ALLOC ();
+  bitmap_copy (copy, pt->vars);
+  pt->vars = copy;
+}
+
+/* Shares the points-to bitmap of PT.  */
+
+static void
+pt_solution_share (struct pt_solution *pt)
+{
+  bitmap shared;
+
+  if (pt == NULL || pt->vars == NULL || shared_bitmap_table == NULL)
+    return;
+
+  shared = shared_bitmap_lookup (pt->vars);
+
+  if (!shared)
+    {
+      /* Share unshared bitmap.  */
+      shared_bitmap_add (pt->vars);
+      return;
+    }
+
+  /* Already using shared bitmap.  */
+  if (shared == pt->vars)
+    return;
+
+  /* Use shared bitmap.  */
+  bitmap_clear (pt->vars);
+  pt->vars = shared;
+}
+
 /* Set bits in INTO corresponding to the variable uids in solution set
    FROM.  */
 
@@ -5734,7 +5776,6 @@ find_what_var_points_to (varinfo_t orig_
   unsigned int i;
   bitmap_iterator bi;
   bitmap finished_solution;
-  bitmap result;
   varinfo_t vi;
 
   memset (pt, 0, sizeof (struct pt_solution));
@@ -5788,17 +5829,8 @@ find_what_var_points_to (varinfo_t orig_
   stats.points_to_sets_created++;
 
   set_uids_in_ptset (finished_solution, vi->solution, pt);
-  result = shared_bitmap_lookup (finished_solution);
-  if (!result)
-    {
-      shared_bitmap_add (finished_solution);
-      pt->vars = finished_solution;
-    }
-  else
-    {
-      pt->vars = result;
-      bitmap_clear (finished_solution);
-    }
+  pt->vars = finished_solution;
+  pt_solution_share (pt);
 }
 
 /* Given a pointer variable P, fill in its points-to set.  */
@@ -5921,6 +5953,25 @@ pt_solution_ior_into (struct pt_solution
   bitmap_ior_into (dest->vars, src->vars);
 }
 
+/* Like pt_solution_ior_into, but may be used if the points-to bitmap
+   of *DEST might be shared.  */
+
+void
+pt_solution_ior_into_shared (struct pt_solution *dest, struct pt_solution *src)
+{
+  if (!src->vars)
+    return;
+  if (!dest->vars)
+    {
+      dest->vars = src->vars;
+      return;
+    }
+
+  pt_solution_unshare (dest);
+  pt_solution_ior_into (dest, src);
+  pt_solution_share (dest);
+}
+
 /* Return true if the points-to solution *PT is empty.  */
 
 bool
@@ -6600,6 +6651,7 @@ delete_points_to_sets (void)
   unsigned int i;
 
   htab_delete (shared_bitmap_table);
+  shared_bitmap_table = NULL;
   if (dump_file && (dump_flags & TDF_STATS))
     fprintf (dump_file, "Points to sets created:%d\n",
	     stats.points_to_sets_created);
Index: gcc/params.def
===================================================================
--- gcc/params.def	(revision 175801)
+++ gcc/params.def	(working copy)
@@ -892,6 +892,11 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
          "Maximum number of conditional store pairs that can be sunk",
          2, 0, 0)
 
+DEFPARAM (PARAM_MAX_TAIL_MERGE_COMPARISONS,
+          "max-tail-merge-comparisons",
+          "Maximum number of similar bbs to compare a bb with",
+          10, 0, 0)
+
 
 /*
 Local variables:
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 175801)
+++ gcc/doc/invoke.texi	(working copy)
@@ -404,7 +404,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
--ftree-sink -ftree-sra -ftree-switch-conversion @gol
+-ftree-sink -ftree-sra -ftree-switch-conversion -ftree-tail-merge @gol
 -ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
@@ -6091,7 +6091,7 @@ also turns on the following optimization
 -fsched-interblock  -fsched-spec @gol
 -fschedule-insns  -fschedule-insns2 @gol
 -fstrict-aliasing -fstrict-overflow @gol
--ftree-switch-conversion @gol
+-ftree-switch-conversion -ftree-tail-merge @gol
 -ftree-pre @gol
 -ftree-vrp}
 
@@ -6974,6 +6974,11 @@ Perform conversion of simple initializat
 initializations from a scalar array.  This flag is enabled by default
 at @option{-O2} and higher.
 
+@item -ftree-tail-merge
+Merges identical blocks with the same successors.  This flag is enabled by
+default at @option{-O2} and higher.  The run time of this pass can be limited
+using the @option{max-tail-merge-comparisons} parameter.
+
 @item -ftree-dce
 @opindex ftree-dce
 Perform dead code elimination (DCE) on trees.  This flag is enabled by
@@ -8541,6 +8546,10 @@ This is used to avoid quadratic behavior
 The value of 0 will avoid limiting the search, but may slow down compilation
 of huge functions.  The default value is 30.
 
+@item max-tail-merge-comparisons
+The maximum number of similar bbs to compare a bb with.  This is used to
+avoid quadratic behaviour in tree tail merging.  The default value is 10.
+
 @item max-unrolled-insns
 The maximum number of instructions that a loop should have if that loop
 is unrolled, and if the loop is unrolled, it determines how many times
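
For reference, the new option and parameter documented above could be exercised from the command
line roughly as follows; test.c is a placeholder for any input exhibiting mergeable tails, and the
exact invocations are a sketch rather than part of the patch:

  # Tail merging is enabled by default at -O2 and higher.
  gcc -O2 -S test.c

  # The Common option gets a negative form automatically, so the pass can be disabled with:
  gcc -O2 -fno-tree-tail-merge -S test.c

  # The per-bb comparison limit (default 10) can be adjusted via the new param:
  gcc -O2 --param max-tail-merge-comparisons=20 -S test.c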