From patchwork Wed Nov 11 20:38:10 2015
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 543092
Subject: Re: [Patch, tree-optimization]: Add new path Splitting pass on tree ssa representation
From: Jeff Law
To: Ajit Kumar Agarwal, Richard Biener
Cc: GCC Patches, Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Message-ID: <5643A732.4040707@redhat.com>
In-Reply-To: <37378DC5BCD0EE48BA4B082E0B55DFAA4297704C@XAP-PVEXMBX02.xlnx.xilinx.com>
Date: Wed, 11 Nov 2015 13:38:10 -0700
On 09/04/2015 11:36 AM, Ajit Kumar Agarwal wrote:
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index 6b66f8f..20ddf3d 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
>>        NEXT_PASS (pass_ccp);
>>        /* After CCP we rewrite no longer addressed locals into SSA
>>           form if possible. */
>> +      NEXT_PASS (pass_path_split);
>>        NEXT_PASS (pass_forwprop);
>>        NEXT_PASS (pass_sra_early);
> I can't recall if we've discussed the location of the pass at all.  I'm
> not objecting to this location, but would like to hear why you chose
> this particular location in the optimization pipeline.

So, returning to the question of where this should live in the
optimization pipeline and how it interacts with if-conversion and
vectorization.

The concern with placing it late in the pipeline was that we'd miss
VRP/DCE/CSE opportunities.  I'm not sure if you're aware, but we
actually run those passes more than once.  So it would be possible to
run path splitting after if-conversion and vectorization, but before
the second pass of VRP and DOM.

But trying that seems to result in something scrambling the loop enough
that the path splitting opportunity is missed.  That might be worth
deeper investigation if we can't come up with a heuristic to fire or
suppress path splitting.

Other random notes as I look over the code:

Call the pass "path-split", not "path_split".  I don't think we have
any passes with underscores in their names, dump files, etc.

You factored out the code for transform_duplicate.  When you create new
functions, they should all have a block comment indicating what they
do, their return values, etc.

I asked you to trim down the #includes in tree-ssa-path-split.c.  Most
were ultimately unnecessary; the trimmed list is just 11 headers.

Various functions in tree-ssa-path-split.c were missing their block
comments, and there were several places in tree-ssa-path-split.c that I
felt deserved a comment.  While you are familiar with the code, it's
likely someone else will have to look at and modify it at some point in
the future.  The comments help make that easier.

In find_trace_loop_latch_same_as_join_blk, we find the immediate
dominator of the latch and verify it ends in a conditional.  That's
fine.  Then we look at the predecessors of the latch to see if one is
succeeded only by the latch and falls through to the latch.  That is
the block we'll end up redirecting to a copy of the latch.  Also fine.

Note how there is no testing of the relationship between the immediate
dominator of the latch and the predecessors of the latch.  ISTM that we
can have a fairly arbitrary region in the THEN/ELSE arms of the
conditional.  Was this intentional?  Would it be advisable to verify
that the THEN/ELSE arms are single blocks?  Do we want to verify that
neither of the THEN/ELSE arms transfers control anywhere other than to
the latch?  Do we want to verify that the predecessors of the latch are
immediate successors of the latch's immediate dominator?

The is_feasible_trace routine was still checking if the block had a
conversion and rejecting it.  I removed that check.  It does seem to me
that we need an upper limit on the number of statements, and I wonder
if we should factor out the maximum-statements-to-copy check from jump
threading and use it for both jump threading and path splitting.
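One possible shape for that sharing, sketched against the knob jump
threading already consults.  This assumes the existing
PARAM_MAX_JUMP_THREAD_DUPLICATION_STMTS parameter and a "params.h"
include; it's only a sketch, not something that has been tested here:

  /* Sketch only: bound the number of non-debug statements we are
     willing to duplicate, using the same limit as jump threading.
     Assumes "params.h" has been included.  */
  static bool
  is_feasible_trace (basic_block bb)
  {
    int num_stmts = 0;

    for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
         !gsi_end_p (gsi); gsi_next (&gsi))
      if (!is_gimple_debug (gsi_stmt (gsi)))
        num_stmts++;

    /* Require something worth copying, but refuse huge blocks.  */
    return (num_stmts > 1
            && num_stmts <= PARAM_VALUE (PARAM_MAX_JUMP_THREAD_DUPLICATION_STMTS));
  }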
Instead of creating a loop with multiple latches, whatever happened to
the idea of duplicating the latch block twice -- once into each path?
Remove the control statement in each duplicate, then remove everything
but the control statement in the original latch.

I added some direct dump support.  Essentially any time we split the
path, we output something like this:

  Split path in loop: latch block 9, predecessor 7.

That allows tests in the testsuite to look for the "Split path in loop"
string rather than inferring the information from the SSA graph
update's replacement table.  It also allows us to do things like count
how many paths get split if we have more complex tests.

On the topic of tests: is the one you provided something where path
splitting results in a significant improvement?  From looking at the
x86_64 output, I can see the path splitting transformation occur, but
not any improvement in the final code.  While the existing test is
useful, testing on code that actually improves as a result of path
splitting is better.  Ideally we'd test both that path splitting
occurred and that the secondary optimizations we wanted actually
triggered.  The tests should go into gcc.dg/tree-ssa rather than just
gcc.dg.
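A hedged sketch of the kind of kernel such a test might use -- this is
illustrative only and not taken from the patch.  The shape is what
matters: both arms of the if-else fall into shared statements at the
bottom of the loop body, which is what the pass duplicates into each
arm, giving later passes a chance to simplify each copy in the context
of the arm it came from:

  unsigned char
  saturate_min (const unsigned char *a, const unsigned char *b, int n)
  {
    unsigned char sum = 0;
    for (int i = 0; i < n; i++)
      {
        unsigned char x;
        if (a[i] < b[i])      /* Immediate dominator of the latch.  */
          x = a[i];           /* THEN arm.  */
        else
          x = b[i];           /* ELSE arm.  */
        sum += 255 - x;       /* Shared join/latch region that gets copied.  */
      }
    return sum;
  }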
Anyway, here's my work-in-progress.  Your thoughts on the various
questions, concerns, and ideas noted above would be appreciated.
Obviously I'd like to wrap things up quickly and include this patch in
GCC 6.  Note, I haven't bootstrapped or regression tested this version.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 34d2356..6613e83 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1474,6 +1474,7 @@ OBJS = \
 	tree-ssa-loop.o \
 	tree-ssa-math-opts.o \
 	tree-ssa-operands.o \
+	tree-ssa-path-split.o \
 	tree-ssa-phionlycprop.o \
 	tree-ssa-phiopt.o \
 	tree-ssa-phiprop.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 757ce85..9cc94e2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2403,6 +2403,10 @@ ftree-vrp
 Common Report Var(flag_tree_vrp) Init(0) Optimization
 Perform Value Range Propagation on trees.
 
+ftree-path-split
+Common Report Var(flag_tree_path_split) Init(0) Optimization
+Perform Path Splitting on trees for loop backedges.
+
 funit-at-a-time
 Common Report Var(flag_unit_at_a_time) Init(1)
 Compile whole compilation unit at a time.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 213a9d0..b1e95da 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -354,6 +354,7 @@ Objective-C and Objective-C++ Dialects}.
 -fdump-tree-fre@r{[}-@var{n}@r{]} @gol
 -fdump-tree-vtable-verify @gol
 -fdump-tree-vrp@r{[}-@var{n}@r{]} @gol
+-fdump-tree-path-split@r{[}-@var{n}@r{]} @gol
 -fdump-tree-storeccp@r{[}-@var{n}@r{]} @gol
 -fdump-final-insns=@var{file} @gol
 -fcompare-debug@r{[}=@var{opts}@r{]} -fcompare-debug-second @gol
@@ -462,7 +463,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
 -ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
--ftree-vectorize -ftree-vrp @gol
+-ftree-vectorize -ftree-vrp @gol -ftree-path-split @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
 -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
@@ -7169,6 +7170,11 @@ output on to @file{stderr}.
 If two conflicting dump filenames are given for the same pass, then the
 latter option overrides the earlier one.
 
+@item path-split
+@opindex fdump-tree-path-split
+Dump each function after path splitting.  The file name is made by
+appending @file{.path-split} to the source file name.
+
 @item all
 Turn on all options, except @option{raw}, @option{slim}, @option{verbose}
 and @option{lineno}.
@@ -7811,6 +7817,7 @@ also turns on the following optimization flags:
 -ftree-switch-conversion -ftree-tail-merge @gol
 -ftree-pre @gol
 -ftree-vrp @gol
+-ftree-path-split @gol
 -fipa-ra}
 
 Please note the warning under @option{-fgcse} about
@@ -8819,7 +8826,7 @@ currently enabled, but may be enabled by @option{-O2} in the future.
 
 @item -ftree-sink
 @opindex ftree-sink
-Perform forward store motion on trees. This flag is
+Perform forward store motion on trees.  This flag is
 enabled by default at @option{-O} and higher.
 
 @item -ftree-bit-ccp
@@ -9125,6 +9132,13 @@ enabled by default at @option{-O2} and higher.
 Null pointer check elimination is only done if
 @option{-fdelete-null-pointer-checks} is enabled.
 
+@item -ftree-path-split
+@opindex ftree-path-split
+Perform Path Splitting on trees.  When the two execution paths of an
+if-then-else merge at the loop latch node, try to duplicate the
+merge node into two paths.  This is enabled by default at @option{-O2}
+and above.
+
 @item -fsplit-ivs-in-unroller
 @opindex fsplit-ivs-in-unroller
 Enables expression of values of induction variables in later iterations
diff --git a/gcc/opts.c b/gcc/opts.c
index 9a3fbb3..9a0b27c 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -509,6 +509,7 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 },
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
diff --git a/gcc/passes.def b/gcc/passes.def
index c0ab6b9..e0c1cd8 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_remove_cgraph_callee_edges);
       NEXT_PASS (pass_object_sizes);
       NEXT_PASS (pass_ccp);
+      NEXT_PASS (pass_path_split);
       /* After CCP we rewrite no longer addressed locals into SSA
 	 form if possible.  */
       NEXT_PASS (pass_forwprop);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c
new file mode 100644
index 0000000..4b8637b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-path-split-details " } */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#define RGBMAX 255
+
+int
+test ()
+{
+  int i, Pels;
+  unsigned char sum = 0;
+  unsigned char xr, xg, xb;
+  unsigned char xc, xm, xy, xk;
+  unsigned char *ReadPtr, *EritePtr;
+
+  ReadPtr = (unsigned char *) malloc (sizeof (unsigned char) * 100);
+  EritePtr = (unsigned char *) malloc (sizeof (unsigned char) * 100);
+
+  for (i = 0; i < 100; i++)
+    {
+      ReadPtr[i] = 100 - i;
+    }
+
+  for (i = 0; i < 100; i++)
+    {
+      xr = *ReadPtr++;
+      xg = *ReadPtr++;
+      xb = *ReadPtr++;
+
+      xc = (unsigned char) (RGBMAX - xr);
+      xm = (unsigned char) (RGBMAX - xg);
+      xy = (unsigned char) (RGBMAX - xb);
+
+      if (xc < xm)
+        {
+          xk = (unsigned char) (xc < xy ? xc : xy);
+        }
+      else
+        {
+          xk = (unsigned char) (xm < xy ? xm : xy);
+        }
+
+      xc = (unsigned char) (xc - xk);
+      xm = (unsigned char) (xm - xk);
+      xy = (unsigned char) (xy - xk);
+
+      *EritePtr++ = xc;
+      *EritePtr++ = xm;
+      *EritePtr++ = xy;
+      *EritePtr++ = xk;
+      sum += *EritePtr;
+    }
+  return sum;
+}
+
+int
+main ()
+{
+  if (test () != 33)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Split path in loop:" "path-split" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index b429faf..3dba68b 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -300,3 +300,4 @@ DEFTIMEVAR (TV_LINK                  , "link JIT code")
 DEFTIMEVAR (TV_LOAD                  , "load JIT result")
 DEFTIMEVAR (TV_JIT_ACQUIRING_MUTEX   , "acquiring JIT mutex")
 DEFTIMEVAR (TV_JIT_CLIENT_CODE       , "JIT client code")
+DEFTIMEVAR (TV_TREE_PATH_SPLIT       , "tree path split")
diff --git a/gcc/tracer.c b/gcc/tracer.c
index 941dc20..c7b5150 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -51,9 +51,9 @@
 #include "tree-inline.h"
 #include "cfgloop.h"
 #include "fibonacci_heap.h"
+#include "tracer.h"
 
 static int count_insns (basic_block);
-static bool ignore_bb_p (const_basic_block);
 static bool better_p (const_edge, const_edge);
 static edge find_best_successor (basic_block);
 static edge find_best_predecessor (basic_block);
@@ -85,7 +85,7 @@ bb_seen_p (basic_block bb)
 }
 
 /* Return true if we should ignore the basic block for purposes of tracing.  */
-static bool
+bool
 ignore_bb_p (const_basic_block bb)
 {
   if (bb->index < NUM_FIXED_BLOCKS)
@@ -226,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace)
   return i;
 }
 
+/* Duplicate block BB2, placing it after BB in the CFG.  Return the
+   newly created block.  */
+basic_block
+transform_duplicate (basic_block bb, basic_block bb2)
+{
+  edge e;
+  basic_block copy;
+
+  e = find_edge (bb, bb2);
+
+  copy = duplicate_block (bb2, e, bb);
+  flush_pending_stmts (e);
+
+  add_phi_args_after_copy (&copy, 1, NULL);
+
+  return (copy);
+}
+
 /* Look for basic blocks in frequency order, construct traces and tail
    duplicate if profitable.  */
@@ -321,17 +339,8 @@
 		 entries or at least rotate the loop.  */
 	      && bb2->loop_father->header != bb2)
 	    {
-	      edge e;
-	      basic_block copy;
-
-	      nduplicated += counts [bb2->index];
-
-	      e = find_edge (bb, bb2);
-
-	      copy = duplicate_block (bb2, e, bb);
-	      flush_pending_stmts (e);
-
-	      add_phi_args_after_copy (&copy, 1, NULL);
+	      nduplicated += counts [bb2->index];
+	      basic_block copy = transform_duplicate (bb, bb2);
 
 	      /* Reconsider the original copy of block we've duplicated.
 	         Removing the most common predecessor may make it to be
diff --git a/gcc/tracer.h b/gcc/tracer.h
new file mode 100644
index 0000000..454d3b7
--- /dev/null
+++ b/gcc/tracer.h
@@ -0,0 +1,26 @@
+/* Header file for Tracer.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_TRACER_H
+#define GCC_TRACER_H
+
+extern basic_block transform_duplicate (basic_block bb, basic_block bb2);
+extern bool ignore_bb_p (const_basic_block bb);
+
+#endif /* GCC_TRACER_H */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 49e22a9..6963acc 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -390,6 +390,7 @@ extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_path_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-path-split.c b/gcc/tree-ssa-path-split.c
new file mode 100644
index 0000000..33dbc4d
--- /dev/null
+++ b/gcc/tree-ssa-path-split.c
@@ -0,0 +1,254 @@
+/* Support routines for Path Splitting.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Ajit Kumar Agarwal.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "cfganal.h"
+#include "cfgloop.h"
+#include "gimple-iterator.h"
+#include "tracer.h"
+
+/* Given LOOP, if its latch has an immediate dominator that ends
+   in a simple conditional, then search for a block that is a
+   predecessor of the latch that falls through to the latch and
+   has no other successors.
+
+   When found, put that block into TRACE[0] and the latch block into
+   TRACE[1].  Otherwise do nothing.  */
+
+static void
+find_trace_loop_latch_same_as_join_blk (loop_p loop, basic_block *trace)
+{
+  basic_block latch = loop->latch;
+
+  /* We don't support path splitting if the latch has more than two
+     predecessors.  */
+  if (EDGE_COUNT (latch->preds) == 2)
+    {
+      basic_block bb = get_immediate_dominator (CDI_DOMINATORS, latch);
+      gimple *last = gsi_stmt (gsi_last_bb (bb));
+
+      /* The immediate dominator of the latch must end in a conditional.  */
+      if (last && gimple_code (last) != GIMPLE_COND)
+	return;
+
+      /* This looks for a predecessor of the latch which has a single
+	 fallthru successor (that is the latch).  Only one of the blocks
+	 can match that pattern.
+
+	 Do we need to check that E->src is a successor of BB?  */
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, latch->preds)
+	{
+	  if (!single_succ_p (e->src)
+	      || !(single_succ_edge (e->src)->flags & EDGE_FALLTHRU))
+	    break;
+	  else
+	    {
+	      trace[0] = e->src;
+	      trace[1] = latch;
+	      break;
+	    }
+	}
+    }
+}
+
+/* Return TRUE if BB is a reasonable block to duplicate by examining
+   its size, false otherwise.  BB will always be a loop latch block.
+
+   Should this use the same tests as we do for jump threading?  */
+
+static bool
+is_feasible_trace (basic_block bb)
+{
+  int num_stmt = 0;
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      if (!is_gimple_debug (stmt))
+	num_stmt++;
+    }
+
+  /* We may want to limit how many statements we copy.  */
+  if (num_stmt > 1)
+    return true;
+
+  return false;
+}
+
+/* If the immediate dominator of the loop latch is a block with a
+   conditional branch, then the loop latch is duplicated into its
+   predecessor paths, preserving the SSA semantics.
+
+   CFG before the transformation.
+
+   :
+   xk_35 = MIN_EXPR ;
+   goto ;
+
+   :
+   xk_36 = MIN_EXPR ;
+
+   :
+   # xk_4 = PHI
+   xc_37 = xc_32 - xk_4;
+   xm_38 = xm_33 - xk_4;
+   xy_39 = xy_34 - xk_4;
+
+   CFG after the path splitting transformation,
+   before the cleanup phase.
+
+   :
+   xk_35 = MIN_EXPR ;
+
+   :
+   # xk_29 = PHI
+   xc_56 = xc_32 - xk_29;
+   xm_57 = xm_33 - xk_29;
+   xy_58 = xy_34 - xk_29;
+   goto ;
+
+   :
+   xk_36 = MIN_EXPR ;
+
+   :
+   # xk_4 = PHI
+   xc_37 = xc_32 - xk_4;
+   xm_38 = xm_33 - xk_4;
+   xy_39 = xy_34 - xk_4;
+
+   : .......  */
+
+static bool
+perform_path_splitting ()
+{
+  bool changed = false;
+  basic_block trace[2] = {NULL, NULL};
+  loop_p loop;
+
+  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
+  initialize_original_copy_tables ();
+  calculate_dominance_info (CDI_DOMINATORS);
+
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      /* If we're optimizing for size, or should not duplicate
+	 the latch block for some other reason, then ignore this
+	 loop.  */
+      if (ignore_bb_p (loop->latch))
+	continue;
+
+      /* Get the latch's fall-through predecessor and the latch itself,
+	 storing them into trace[0] and trace[1] respectively.  */
+      find_trace_loop_latch_same_as_join_blk (loop, trace);
+
+      if (trace[0] && trace[1] && is_feasible_trace (trace[1]))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Split path in loop: latch block %d, predecessor %d.\n",
+		     trace[1]->index, trace[0]->index);
+	  transform_duplicate (trace[0], trace[1]);
+	  trace[0] = NULL;
+	  trace[1] = NULL;
+	  changed = true;
+	}
+    }
+
+  loop_optimizer_finalize ();
+  free_original_copy_tables ();
+  return changed;
+}
+
+/* Main entry point for path splitting.  Returns TODO_cleanup_cfg if any
+   paths were split, zero otherwise.  */
+
+static unsigned int
+execute_path_split (void)
+{
+  /* If we don't have at least 2 real blocks and backedges in the
+     CFG, then there's no point in trying to perform path splitting.  */
+  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
+      || !mark_dfs_back_edges ())
+    return 0;
+
+  bool changed = perform_path_splitting ();
+  if (changed)
+    {
+      free_dominance_info (CDI_DOMINATORS);
+      /* If we changed the CFG, schedule loops for fixup by cleanup_cfg.  */
+      if (current_loops)
+	loops_state_set (LOOPS_NEED_FIXUP);
+    }
+
+  return changed ? TODO_cleanup_cfg : 0;
+
+}
+
+static bool
+gate_path_split (void)
+{
+  return flag_tree_path_split != 0;
+}
+
+namespace {
+
+const pass_data pass_data_path_split =
+{
+  GIMPLE_PASS, /* type */
+  "path-split", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_PATH_SPLIT, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_path_split : public gimple_opt_pass
+{
+ public:
+  pass_path_split (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_path_split, ctxt)
+  {}
+  /* opt_pass methods: */
+  opt_pass * clone () { return new pass_path_split (m_ctxt); }
+  virtual bool gate (function *) { return gate_path_split (); }
+  virtual unsigned int execute (function *) { return execute_path_split (); }
+
+}; // class pass_path_split
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_path_split (gcc::context *ctxt)
+{
+  return new pass_path_split (ctxt);
+}