From patchwork Wed Nov 11 20:38:10 2015
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 543092
Subject: Re: [Patch, tree-optimization]: Add new path Splitting pass on tree ssa representation
From: Jeff Law
To: Ajit Kumar Agarwal, Richard Biener
Cc: GCC Patches, Vinod Kathail, Shail Aditya Gupta, Vidhumouli Hunsigida, Nagaraju Mekala
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Message-ID: <5643A732.4040707@redhat.com>
In-Reply-To: <37378DC5BCD0EE48BA4B082E0B55DFAA4297704C@XAP-PVEXMBX02.xlnx.xilinx.com>
Date: Wed, 11 Nov 2015 13:38:10 -0700
On 09/04/2015 11:36 AM, Ajit Kumar Agarwal wrote:
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index 6b66f8f..20ddf3d 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
>>        NEXT_PASS (pass_ccp);
>>        /* After CCP we rewrite no longer addressed locals into SSA
>>           form if possible. */
>> +      NEXT_PASS (pass_path_split);
>>        NEXT_PASS (pass_forwprop);
>>        NEXT_PASS (pass_sra_early);
> I can't recall if we've discussed the location of the pass at all.  I'm
> not objecting to this location, but would like to hear why you chose
> this particular location in the optimization pipeline.

So, returning to the question of where this should live in the
optimization pipeline and how it interacts with if-conversion and
vectorization.

The concern with placing it late in the pipeline was that we'd miss
VRP/DCE/CSE opportunities.  I'm not sure if you're aware, but we
actually run those passes more than once.  So it would be possible to
run path splitting after if-conversion and vectorization, but before
the second pass of VRP and DOM.

But trying that seems to result in something scrambling the loop enough
that the path splitting opportunity is missed.  That might be worth
deeper investigation if we can't come up with a heuristic to fire or
suppress path splitting.

Other random notes as I look over the code:

Call the pass "path-split", not "path_split".  I don't think we have
any passes with underscores in their names, dump files, etc.

You factored out the code for transform_duplicate.  When you create new
functions, they should all have a block comment indicating what they
do, their return values, etc.

I asked you to trim down the #includes in tree-ssa-path-split.c.  Most
were ultimately unnecessary; the trimmed list is just 11 headers.

Various functions in tree-ssa-path-split.c were missing their block
comments, and there were several places in tree-ssa-path-split.c that I
felt deserved a comment.  While you are familiar with the code, it's
likely someone else will have to look at and modify it at some point in
the future.  The comments help make that easier.

In find_trace_loop_latch_same_as_join_blk, we find the immediate
dominator of the latch and verify it ends in a conditional.  That's
fine.  Then we look at the predecessors of the latch to see if one is
succeeded only by the latch and falls through to the latch.  That is
the block we'll end up redirecting to a copy of the latch.  Also fine.

Note how there is no testing of the relationship between the immediate
dominator of the latch and the predecessors of the latch.  ISTM that we
can have a fairly arbitrary region in the THEN/ELSE arms of the
conditional.  Was this intentional?  Would it be advisable to verify
that the THEN/ELSE arms are single blocks?  Do we want to verify that
neither of the THEN/ELSE arms transfers control anywhere other than to
the latch?  Do we want to verify that the predecessors of the latch are
immediate successors of the latch's immediate dominator?

The is_feasible_trace routine was still checking if the block had a
conversion and rejecting it.  I removed that check.  It does seem to me
that we need an upper limit on the number of statements, and I wonder
if we should factor out the maximum-statements-to-copy check from jump
threading and use it for both jump threading and path splitting.
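One possible shape for that sharing, sketched against the knob jump
threading already consults.  This assumes the existing
PARAM_MAX_JUMP_THREAD_DUPLICATION_STMTS parameter and a "params.h"
include; it's only a sketch, not something that has been tested here:

  /* Sketch only: bound the number of non-debug statements we are
     willing to duplicate, using the same limit as jump threading.
     Assumes "params.h" has been included.  */
  static bool
  is_feasible_trace (basic_block bb)
  {
    int num_stmts = 0;

    for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
         !gsi_end_p (gsi); gsi_next (&gsi))
      if (!is_gimple_debug (gsi_stmt (gsi)))
        num_stmts++;

    /* Require something worth copying, but refuse huge blocks.  */
    return (num_stmts > 1
            && num_stmts <= PARAM_VALUE (PARAM_MAX_JUMP_THREAD_DUPLICATION_STMTS));
  }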
Instead of creating a loop with multiple latches, whatever happened to
the idea of duplicating the latch block twice -- once into each path?
Remove the control statement in each duplicate, then remove everything
but the control statement in the original latch.

I added some direct dump support.  Essentially any time we split the
path, we output something like this:

  Split path in loop: latch block 9, predecessor 7.

That allows tests in the testsuite to look for the "Split path in loop"
string rather than inferring the information from the SSA graph
update's replacement table.  It also allows us to do things like count
how many paths get split if we have more complex tests.

On the topic of tests: is the one you provided something where path
splitting results in a significant improvement?  From looking at the
x86_64 output, I can see the path splitting transformation occur, but
not any improvement in the final code.  While the existing test is
useful, testing on code that actually improves as a result of path
splitting is better.  Ideally we'd test both that path splitting
occurred and that the secondary optimizations we wanted actually
triggered.  The tests should go into gcc.dg/tree-ssa rather than just
gcc.dg.
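A hedged sketch of the kind of kernel such a test might use -- this is
illustrative only and not taken from the patch.  The shape is what
matters: both arms of the if-else fall into shared statements at the
bottom of the loop body, which is what the pass duplicates into each
arm, giving later passes a chance to simplify each copy in the context
of the arm it came from:

  unsigned char
  saturate_min (const unsigned char *a, const unsigned char *b, int n)
  {
    unsigned char sum = 0;
    for (int i = 0; i < n; i++)
      {
        unsigned char x;
        if (a[i] < b[i])      /* Immediate dominator of the latch.  */
          x = a[i];           /* THEN arm.  */
        else
          x = b[i];           /* ELSE arm.  */
        sum += 255 - x;       /* Shared join/latch region that gets copied.  */
      }
    return sum;
  }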
Anyway, here's my work-in-progress.  Your thoughts on the various
questions, concerns, and ideas noted above would be appreciated.
Obviously I'd like to wrap things up quickly and include this patch in
GCC 6.  Note, I haven't bootstrapped or regression tested this version.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 34d2356..6613e83 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1474,6 +1474,7 @@ OBJS = \
 	tree-ssa-loop.o \
 	tree-ssa-math-opts.o \
 	tree-ssa-operands.o \
+	tree-ssa-path-split.o \
 	tree-ssa-phionlycprop.o \
 	tree-ssa-phiopt.o \
 	tree-ssa-phiprop.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 757ce85..9cc94e2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2403,6 +2403,10 @@ ftree-vrp
 Common Report Var(flag_tree_vrp) Init(0) Optimization
 Perform Value Range Propagation on trees.
 
+ftree-path-split
+Common Report Var(flag_tree_path_split) Init(0) Optimization
+Perform Path Splitting on trees for loop backedges.
+
 funit-at-a-time
 Common Report Var(flag_unit_at_a_time) Init(1)
 Compile whole compilation unit at a time.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 213a9d0..b1e95da 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -354,6 +354,7 @@ Objective-C and Objective-C++ Dialects}.
 -fdump-tree-fre@r{[}-@var{n}@r{]} @gol
 -fdump-tree-vtable-verify @gol
 -fdump-tree-vrp@r{[}-@var{n}@r{]} @gol
+-fdump-tree-path-split@r{[}-@var{n}@r{]} @gol
 -fdump-tree-storeccp@r{[}-@var{n}@r{]} @gol
 -fdump-final-insns=@var{file} @gol
 -fcompare-debug@r{[}=@var{opts}@r{]} -fcompare-debug-second @gol
@@ -462,7 +463,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
 -ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
--ftree-vectorize -ftree-vrp @gol
+-ftree-vectorize -ftree-vrp @gol -ftree-path-split @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
 -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
@@ -7169,6 +7170,11 @@ output on to @file{stderr}.
 If two conflicting dump filenames are given for the same pass, then the
 latter option overrides the earlier one.
 
+@item path-split
+@opindex fdump-tree-path-split
+Dump each function after path splitting.  The file name is made by
+appending @file{.path-split} to the source file name.
+
 @item all
 Turn on all options, except @option{raw}, @option{slim}, @option{verbose}
 and @option{lineno}.
@@ -7811,6 +7817,7 @@ also turns on the following optimization flags:
 -ftree-switch-conversion -ftree-tail-merge @gol
 -ftree-pre @gol
 -ftree-vrp @gol
+-ftree-path-split @gol
 -fipa-ra}
 
 Please note the warning under @option{-fgcse} about
@@ -8819,7 +8826,7 @@ currently enabled, but may be enabled by @option{-O2} in the future.
 
 @item -ftree-sink
 @opindex ftree-sink
-Perform forward store motion on trees. This flag is
+Perform forward store motion on trees.  This flag is
 enabled by default at @option{-O} and higher.
 
 @item -ftree-bit-ccp
@@ -9125,6 +9132,13 @@ enabled by default at @option{-O2} and higher.
 Null pointer check elimination is only done if
 @option{-fdelete-null-pointer-checks} is enabled.
 
+@item -ftree-path-split
+@opindex ftree-path-split
+Perform Path Splitting on trees.  When the two execution paths of an
+if-then-else merge at the loop latch node, try to duplicate the
+merge node into two paths.  This is enabled by default at @option{-O2}
+and above.
+
 @item -fsplit-ivs-in-unroller
 @opindex fsplit-ivs-in-unroller
 Enables expression of values of induction variables in later iterations
diff --git a/gcc/opts.c b/gcc/opts.c
index 9a3fbb3..9a0b27c 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -509,6 +509,7 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_ftree_path_split, NULL, 1 },
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
diff --git a/gcc/passes.def b/gcc/passes.def
index c0ab6b9..e0c1cd8 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_remove_cgraph_callee_edges);
       NEXT_PASS (pass_object_sizes);
       NEXT_PASS (pass_ccp);
+      NEXT_PASS (pass_path_split);
       /* After CCP we rewrite no longer addressed locals into SSA
 	 form if possible.  */
       NEXT_PASS (pass_forwprop);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c
new file mode 100644
index 0000000..4b8637b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/path-split-1.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-path-split-details " } */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#define RGBMAX 255
+
+int
+test ()
+{
+  int i, Pels;
+  unsigned char sum = 0;
+  unsigned char xr, xg, xb;
+  unsigned char xc, xm, xy, xk;
+  unsigned char *ReadPtr, *EritePtr;
+
+  ReadPtr = (unsigned char *) malloc (sizeof (unsigned char) * 100);
+  EritePtr = (unsigned char *) malloc (sizeof (unsigned char) * 100);
+
+  for (i = 0; i < 100; i++)
+    {
+      ReadPtr[i] = 100 - i;
+    }
+
+  for (i = 0; i < 100; i++)
+    {
+      xr = *ReadPtr++;
+      xg = *ReadPtr++;
+      xb = *ReadPtr++;
+
+      xc = (unsigned char) (RGBMAX - xr);
+      xm = (unsigned char) (RGBMAX - xg);
+      xy = (unsigned char) (RGBMAX - xb);
+
+      if (xc < xm)
+        {
+          xk = (unsigned char) (xc < xy ? xc : xy);
+        }
+      else
+        {
+          xk = (unsigned char) (xm < xy ? xm : xy);
+        }
+
+      xc = (unsigned char) (xc - xk);
+      xm = (unsigned char) (xm - xk);
+      xy = (unsigned char) (xy - xk);
+
+      *EritePtr++ = xc;
+      *EritePtr++ = xm;
+      *EritePtr++ = xy;
+      *EritePtr++ = xk;
+      sum += *EritePtr;
+    }
+  return sum;
+}
+
+int
+main ()
+{
+  if (test () != 33)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Split path in loop:" "path-split" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index b429faf..3dba68b 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -300,3 +300,4 @@ DEFTIMEVAR (TV_LINK                  , "link JIT code")
 DEFTIMEVAR (TV_LOAD                  , "load JIT result")
 DEFTIMEVAR (TV_JIT_ACQUIRING_MUTEX   , "acquiring JIT mutex")
 DEFTIMEVAR (TV_JIT_CLIENT_CODE       , "JIT client code")
+DEFTIMEVAR (TV_TREE_PATH_SPLIT       , "tree path split")
diff --git a/gcc/tracer.c b/gcc/tracer.c
index 941dc20..c7b5150 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -51,9 +51,9 @@
 #include "tree-inline.h"
 #include "cfgloop.h"
 #include "fibonacci_heap.h"
+#include "tracer.h"
 
 static int count_insns (basic_block);
-static bool ignore_bb_p (const_basic_block);
 static bool better_p (const_edge, const_edge);
 static edge find_best_successor (basic_block);
 static edge find_best_predecessor (basic_block);
@@ -85,7 +85,7 @@ bb_seen_p (basic_block bb)
 }
 
 /* Return true if we should ignore the basic block for purposes of tracing.  */
-static bool
+bool
 ignore_bb_p (const_basic_block bb)
 {
   if (bb->index < NUM_FIXED_BLOCKS)
@@ -226,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace)
   return i;
 }
 
+/* Duplicate block BB2, placing it after BB in the CFG.  Return the
+   newly created block.  */
+basic_block
+transform_duplicate (basic_block bb, basic_block bb2)
+{
+  edge e;
+  basic_block copy;
+
+  e = find_edge (bb, bb2);
+
+  copy = duplicate_block (bb2, e, bb);
+  flush_pending_stmts (e);
+
+  add_phi_args_after_copy (&copy, 1, NULL);
+
+  return (copy);
+}
+
 /* Look for basic blocks in frequency order, construct traces and tail
    duplicate if profitable.  */
@@ -321,17 +339,8 @@
 		 entries or at least rotate the loop.  */
 	      && bb2->loop_father->header != bb2)
 	    {
-	      edge e;
-	      basic_block copy;
-
-	      nduplicated += counts [bb2->index];
-
-	      e = find_edge (bb, bb2);
-
-	      copy = duplicate_block (bb2, e, bb);
-	      flush_pending_stmts (e);
-
-	      add_phi_args_after_copy (&copy, 1, NULL);
+	      nduplicated += counts [bb2->index];
+	      basic_block copy = transform_duplicate (bb, bb2);
 
 	      /* Reconsider the original copy of block we've duplicated.
 	         Removing the most common predecessor may make it to be
diff --git a/gcc/tracer.h b/gcc/tracer.h
new file mode 100644
index 0000000..454d3b7
--- /dev/null
+++ b/gcc/tracer.h
@@ -0,0 +1,26 @@
+/* Header file for Tracer.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_TRACER_H
+#define GCC_TRACER_H
+
+extern basic_block transform_duplicate (basic_block bb, basic_block bb2);
+extern bool ignore_bb_p (const_basic_block bb);
+
+#endif /* GCC_TRACER_H */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 49e22a9..6963acc 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -390,6 +390,7 @@ extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_path_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-path-split.c b/gcc/tree-ssa-path-split.c
new file mode 100644
index 0000000..33dbc4d
--- /dev/null
+++ b/gcc/tree-ssa-path-split.c
@@ -0,0 +1,254 @@
+/* Support routines for Path Splitting.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Ajit Kumar Agarwal.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "cfganal.h"
+#include "cfgloop.h"
+#include "gimple-iterator.h"
+#include "tracer.h"
+
+/* Given LOOP, if its latch has an immediate dominator that ends
+   in a simple conditional, then search for a block that is a
+   predecessor of the latch that falls through to the latch and
+   has no other successors.
+
+   When found, put that block into TRACE[0] and the latch block into
+   TRACE[1].  Otherwise do nothing.  */
+
+static void
+find_trace_loop_latch_same_as_join_blk (loop_p loop, basic_block *trace)
+{
+  basic_block latch = loop->latch;
+
+  /* We don't support path splitting if the latch has more than two
+     predecessors.  */
+  if (EDGE_COUNT (latch->preds) == 2)
+    {
+      basic_block bb = get_immediate_dominator (CDI_DOMINATORS, latch);
+      gimple *last = gsi_stmt (gsi_last_bb (bb));
+
+      /* The immediate dominator of the latch must end in a conditional.  */
+      if (last && gimple_code (last) != GIMPLE_COND)
+	return;
+
+      /* This looks for a predecessor of the latch which has a single
+	 fallthru successor (that is the latch).  Only one of the blocks
+	 can match that pattern.
+
+	 Do we need to check that E->src is a successor of BB?  */
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, latch->preds)
+	{
+	  if (!single_succ_p (e->src)
+	      || !(single_succ_edge (e->src)->flags & EDGE_FALLTHRU))
+	    break;
+	  else
+	    {
+	      trace[0] = e->src;
+	      trace[1] = latch;
+	      break;
+	    }
+	}
+    }
+}
+
+/* Return TRUE if BB is a reasonable block to duplicate by examining
+   its size, false otherwise.  BB will always be a loop latch block.
+
+   Should this use the same tests as we do for jump threading?  */
+
+static bool
+is_feasible_trace (basic_block bb)
+{
+  int num_stmt = 0;
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      if (!is_gimple_debug (stmt))
+	num_stmt++;
+    }
+
+  /* We may want to limit how many statements we copy.  */
+  if (num_stmt > 1)
+    return true;
+
+  return false;
+}
+
+/* If the immediate dominator of the loop latch is a block with a
+   conditional branch, then the loop latch is duplicated into its
+   predecessor paths, preserving the SSA semantics.
+
+   CFG before the transformation.
+
+   :
+   xk_35 = MIN_EXPR ;
+   goto ;
+
+   :
+   xk_36 = MIN_EXPR ;
+
+   :
+   # xk_4 = PHI
+   xc_37 = xc_32 - xk_4;
+   xm_38 = xm_33 - xk_4;
+   xy_39 = xy_34 - xk_4;
+
+   CFG after the path splitting transformation,
+   before the cleanup phase.
+
+   :
+   xk_35 = MIN_EXPR ;
+
+   :
+   # xk_29 = PHI
+   xc_56 = xc_32 - xk_29;
+   xm_57 = xm_33 - xk_29;
+   xy_58 = xy_34 - xk_29;
+   goto ;
+
+   :
+   xk_36 = MIN_EXPR ;
+
+   :
+   # xk_4 = PHI
+   xc_37 = xc_32 - xk_4;
+   xm_38 = xm_33 - xk_4;
+   xy_39 = xy_34 - xk_4;
+
+   : .......  */
+
+static bool
+perform_path_splitting ()
+{
+  bool changed = false;
+  basic_block trace[2] = {NULL, NULL};
+  loop_p loop;
+
+  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
+  initialize_original_copy_tables ();
+  calculate_dominance_info (CDI_DOMINATORS);
+
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      /* If we're optimizing for size, or should not duplicate
+	 the latch block for some other reason, then ignore this
+	 loop.  */
+      if (ignore_bb_p (loop->latch))
+	continue;
+
+      /* Get the latch's fall-through predecessor and the latch itself,
+	 storing them into trace[0] and trace[1] respectively.  */
+      find_trace_loop_latch_same_as_join_blk (loop, trace);
+
+      if (trace[0] && trace[1] && is_feasible_trace (trace[1]))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Split path in loop: latch block %d, predecessor %d.\n",
+		     trace[1]->index, trace[0]->index);
+	  transform_duplicate (trace[0], trace[1]);
+	  trace[0] = NULL;
+	  trace[1] = NULL;
+	  changed = true;
+	}
+    }
+
+  loop_optimizer_finalize ();
+  free_original_copy_tables ();
+  return changed;
+}
+
+/* Main entry point for path splitting.  Returns TODO_cleanup_cfg if any
+   paths were split, zero otherwise.  */
+
+static unsigned int
+execute_path_split (void)
+{
+  /* If we don't have at least 2 real blocks and backedges in the
+     CFG, then there's no point in trying to perform path splitting.  */
+  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
+      || !mark_dfs_back_edges ())
+    return 0;
+
+  bool changed = perform_path_splitting ();
+  if (changed)
+    {
+      free_dominance_info (CDI_DOMINATORS);
+      /* If we changed the CFG, schedule loops for fixup by cleanup_cfg.  */
+      if (current_loops)
+	loops_state_set (LOOPS_NEED_FIXUP);
+    }
+
+  return changed ? TODO_cleanup_cfg : 0;
+
+}
+
+static bool
+gate_path_split (void)
+{
+  return flag_tree_path_split != 0;
+}
+
+namespace {
+
+const pass_data pass_data_path_split =
+{
+  GIMPLE_PASS, /* type */
+  "path-split", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_PATH_SPLIT, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_path_split : public gimple_opt_pass
+{
+ public:
+  pass_path_split (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_path_split, ctxt)
+  {}
+  /* opt_pass methods: */
+  opt_pass * clone () { return new pass_path_split (m_ctxt); }
+  virtual bool gate (function *) { return gate_path_split (); }
+  virtual unsigned int execute (function *) { return execute_path_split (); }
+
+}; // class pass_path_split
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_path_split (gcc::context *ctxt)
+{
+  return new pass_path_split (ctxt);
+}