From patchwork Fri Dec 19 10:20:07 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Zamyatin, Igor" <igor.zamyatin@intel.com>
X-Patchwork-Id: 422843
Return-Path: 
 <gcc-patches-return-387796-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 8CEBF140081
	for <incoming@patchwork.ozlabs.org>;
	Fri, 19 Dec 2014 21:20:23 +1100 (AEDT)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:content-type
	:content-transfer-encoding:mime-version; q=dns; s=default; b=jBh
	FMqKGJuRzMgJyDU3IeGg11stPA0KBoxVkHn+jq2dpi5s50wQh3PT8FBysj48+kIP
	Aun+v/JUayXIMHjCDn4SfInaUm8ij/kmtabqCRWUhBic5VHfxJSdnEK5W8U8b/Sc
	OXa8l83m9xPpXFIAb6V87j52fCvs3ukAuJdrM7vo=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:content-type
	:content-transfer-encoding:mime-version; s=default; bh=qOCrABjgG
	mcLarg4CDUG2ZH3F6Q=; b=yfcdGrGgr9xI6FcTez9h18WdT1HRyLnSNTkcxzwTE
	qr5omjnGZkhOfU4S24mMzDQRIQ+dx+2OLjKqqhH+4nZxRGILftcAlHp+gwJfastj
	S54/3DjsiG6/HpkM7KkdTwgf9ENWG6EVCMx6MtSVBGTgcFYEE05uIPXtUKZmkpuq
	UE=
Received: (qmail 26104 invoked by alias); 19 Dec 2014 10:20:15 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 26091 invoked by uid 89); 19 Dec 2014 10:20:14 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL, BAYES_00,
	SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2
X-HELO: mga11.intel.com
Received: from mga11.intel.com (HELO mga11.intel.com) (192.55.52.93) by
	sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Fri, 19 Dec 2014 10:20:11 +0000
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by
	fmsmga102.fm.intel.com with ESMTP; 19 Dec 2014 02:20:10 -0800
X-ExtLoop1: 1
Received: from irsmsx104.ger.corp.intel.com ([163.33.3.159]) by
	fmsmga002.fm.intel.com with ESMTP; 19 Dec 2014 02:20:09 -0800
Received: from irsmsx101.ger.corp.intel.com ([169.254.1.126]) by
	IRSMSX104.ger.corp.intel.com ([169.254.5.209]) with mapi id
	14.03.0195.001; Fri, 19 Dec 2014 10:20:08 +0000
From: "Zamyatin, Igor" <igor.zamyatin@intel.com>
To: "GCC Patches (gcc-patches@gcc.gnu.org)" <gcc-patches@gcc.gnu.org>
CC: "ysrumyan@gmail.com" <ysrumyan@gmail.com>
Subject: [PATCH] Fix for PR64081 in RTL loop unroller
Date: Fri, 19 Dec 2014 10:20:07 +0000
Message-ID: 
 <0EFAB2BDD0F67E4FB6CCC8B9F87D756969CF7BFC@IRSMSX101.ger.corp.intel.com>
MIME-Version: 1.0
X-IsSubscribed: yes

Hi!

This is an attempt to extend RTL unroller to allow cases like mentioned in the PR - 
namely when loop has duplicated exit blocks and back edges.

Bootstrapped and regtested on x86_64, also checking wide range of benchmarks - spec2K, spec2006, EEMBC
Is it ok for trunk in case if no testing issues?

Thanks,
Igor

Changelog:

Gcc:

2014-12-19  Igor Zamyatin  <igor.zamyatin@intel.com>

	PR rtl-optimization/64081
	* loop-iv.c (def_pred_latch_p): New function.
	(latch_dominating_def): Allow specific cases with non-single
	definitions.
	(iv_get_reaching_def): Likewise.
	(check_complex_exit_p): New function.
	(check_simple_exit): Use check_complex_exit_p to allow certain cases
	with exits not executing on any iteration.


Testsuite:

2014-12-19  Igor Zamyatin  <igor.zamyatin@intel.com>

	PR rtl-optimization/64081
	* gcc.dg/pr64081.c: New test.

diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index f55cea2..d5d48f1 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -314,34 +314,67 @@ iv_analysis_loop_init (struct loop *loop)
   check_iv_ref_table_size ();
 }
 
+/* Checks that def is in an immediate predecessor of the latch block.  */
+
+static bool
+def_pred_latch_p (df_ref d_ref)
+{
+  basic_block bb = DF_REF_BB (d_ref);
+  edge_iterator ei;
+  edge e;
+
+  FOR_EACH_EDGE (e, ei, current_loop->latch->preds)
+    {
+      if (e->src == bb)
+	return true;
+    }
+  return false;
+}
+
 /* Finds the definition of REG that dominates loop latch and stores
    it to DEF.  Returns false if there is not a single definition
-   dominating the latch.  If REG has no definition in loop, DEF
+   dominating the latch or all defs are same and they are on different
+   predecessors of loop latch.  If REG has no definition in loop, DEF
    is set to NULL and true is returned.  */
 
 static bool
 latch_dominating_def (rtx reg, df_ref *def)
 {
   df_ref single_rd = NULL, adef;
-  unsigned regno = REGNO (reg);
+  unsigned regno = REGNO (reg), def_num = 0;
   struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (current_loop->latch);
 
   for (adef = DF_REG_DEF_CHAIN (regno); adef; adef = DF_REF_NEXT_REG (adef))
     {
+      /* Initialize this to true for the very first iteration when
+	 SINGLE_RD is NULL.  */
+      bool def_pred_latch = true;
+
       if (!bitmap_bit_p (df->blocks_to_analyze, DF_REF_BBNO (adef))
 	  || !bitmap_bit_p (&bb_info->out, DF_REF_ID (adef)))
 	continue;
 
-      /* More than one reaching definition.  */
+      /* More than one reaching definition is ok in case definitions are
+	 in predecessors of latch block and those definitions are the same.
+	 Probably this could be relaxed and check for sub-dominance instead
+	 predecessor.  */
       if (single_rd)
-	return false;
-
-      if (!just_once_each_iteration_p (current_loop, DF_REF_BB (adef)))
-	return false;
+	{
+	  def_num++;
+	  if (!(def_pred_latch = def_pred_latch_p (adef))
+	      || !rtx_equal_p( PATTERN (DF_REF_INSN (single_rd)),
+			       PATTERN (DF_REF_INSN (adef))))
+	    return false;
+	}
 
       single_rd = adef;
     }
 
+  /* If we have single definition it has to be excuted on each iteration.  */
+  if (!def_num && single_rd
+      && !just_once_each_iteration_p (current_loop, DF_REF_BB (single_rd)))
+    return false;
+
   *def = single_rd;
   return true;
 }
@@ -351,10 +384,10 @@ latch_dominating_def (rtx reg, df_ref *def)
 static enum iv_grd_result
 iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
 {
-  df_ref use, adef;
+  df_ref use, adef = NULL;
   basic_block def_bb, use_bb;
   rtx_insn *def_insn;
-  bool dom_p;
+  bool dom_p, dom_latch_p = false;
 
   *def = NULL;
   if (!simple_reg_p (reg))
@@ -369,11 +402,26 @@ iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
   if (!DF_REF_CHAIN (use))
     return GRD_INVARIANT;
 
-  /* More than one reaching def.  */
+  adef = DF_REF_CHAIN (use)->ref;
+  /* Having more than one reaching def is ok once all defs are in blocks which
+     are latch's predecessors.  */
   if (DF_REF_CHAIN (use)->next)
-    return GRD_INVALID;
+    {
+      df_link* defs;
+      unsigned int def_num = 0;
 
-  adef = DF_REF_CHAIN (use)->ref;
+      for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
+	{
+	  if (!def_pred_latch_p (defs->ref))
+	    return GRD_INVALID;
+	  def_num++;
+	}
+      /* Make sure all preds contain definitions.  */
+      if (def_num != EDGE_COUNT (current_loop->latch->preds))
+	return GRD_INVALID;
+
+      dom_latch_p = true;
+    }
 
   /* We do not handle setting only part of the register.  */
   if (DF_REF_FLAGS (adef) & DF_REF_READ_WRITE)
@@ -396,8 +444,8 @@ iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
 
   /* The definition does not dominate the use.  This is still OK if
      this may be a use of a biv, i.e. if the def_bb dominates loop
-     latch.  */
-  if (just_once_each_iteration_p (current_loop, def_bb))
+     latch or all defs are in latch's predecessors.  */
+  if (dom_latch_p || just_once_each_iteration_p (current_loop, def_bb))
     return GRD_MAYBE_BIV;
 
   return GRD_INVALID;
@@ -2910,6 +2958,49 @@ fail:
   return;
 }
 
+/* Check whether exit is not simple but still good for further analysis.
+   Looks like such loops mostly are a result of jump threading.  */
+
+static bool
+check_complex_exit_p (struct loop* loop, basic_block bb)
+{
+  edge e;
+  basic_block pred, exit;
+
+  if (EDGE_COUNT (bb->preds) > 1)
+    return false;
+
+  e = EDGE_PRED (bb, 0);
+
+  pred = e->src;
+  if (EDGE_COUNT (pred->succs) != 2)
+    return false;
+
+  /* Predecessor must be tested (at least) once during any iteration.  */
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, pred))
+    return false;
+
+  if (EDGE_SUCC (pred, 0)->dest == bb)
+    exit = EDGE_SUCC (pred, 1)->dest;
+  else
+    exit = EDGE_SUCC (pred, 0)->dest;
+
+  /* Check that EXIT is really loop exit.  */
+  if (flow_bb_inside_loop_p (loop, exit))
+    {
+      edge_iterator eei;
+      edge ee;
+
+      FOR_EACH_EDGE (ee, eei, exit->succs)
+	{
+	  if (!flow_bb_inside_loop_p (loop, ee->dest))
+	    return true;
+	}
+    }
+  return false;
+
+}
+
 /* Checks whether E is a simple exit from LOOP and stores its description
    into DESC.  */
 
@@ -2929,7 +3020,8 @@ check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
     return;
 
   /* It must be tested (at least) once during any iteration.  */
-  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, exit_bb))
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, exit_bb)
+      && !check_complex_exit_p (loop, exit_bb))
     return;
 
   /* It must end in a simple conditional jump.  */
diff --git a/gcc/testsuite/gcc.dg/pr64081.c b/gcc/testsuite/gcc.dg/pr64081.c
new file mode 100644
index 0000000..6151d00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr64081.c
@@ -0,0 +1,41 @@
+/* PR rtl-optimization/64081 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -funroll-loops -fdump-rtl-loop2_unroll" } */
+
+long token;
+long *data;
+long *arr1;
+long *arr2;
+
+int main (int argc, const char* argv[])
+{
+  unsigned long i;
+  static long pos, start, count;
+  static int dir;
+
+  pos = 0;
+  dir = 1;
+
+  for (i = 0; i < argc; i++)
+    {
+      start = pos;
+      for (count = 0; count <= 400; count++)
+	{
+	  if (token == data[pos])
+	    break;
+	  if (dir == 1)
+	    pos = arr1[pos];
+	  else
+	    pos = arr2[pos];
+	  if (pos == -1)
+	    {
+	      pos = start;
+	      dir = !dir;
+	    }
+	}
+    }
+  return pos + dir;
+}
+
+/* { dg-final { scan-rtl-dump "loop unrolled 7 times" "loop2_unroll" } } */
+/* { dg-final { cleanup-rtl-dump "loop*" } } */