From patchwork Mon May 30 15:26:33 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Hubicka <hubicka@ucw.cz>
X-Patchwork-Id: 627871
Date: Mon, 30 May 2016 17:26:33 +0200
From: Jan Hubicka <hubicka@ucw.cz>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: Richard Biener <rguenther@suse.de>,
	Sandra Loosemore <sandra@codesourcery.com>, gcc-patches@gcc.gnu.org
Subject: Re: Enable loop peeling at -O3
Message-ID: <20160530152633.GA96777@kam.mff.cuni.cz>
References: <20160527131928.GE44464@kam.mff.cuni.cz>
	<57486D96.8090508@codesourcery.com>
	<20160528150444.GB5812@kam.mff.cuni.cz>
	<alpine.LSU.2.11.1605301123000.1493@t29.fhfr.qr>
	<20160530110740.GC2770@kam.mff.cuni.cz>
	<alpine.LSU.2.11.1605301327270.1493@t29.fhfr.qr>
	<20160530113921.GD2770@kam.mff.cuni.cz>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20160530113921.GD2770@kam.mff.cuni.cz>
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi,
this is the version of the patch I intend to commit after re-testing on
x86_64-linux with loop peeling enabled at -O3.
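
For readers unfamiliar with the transformation, here is a rough hand-written
sketch of what peeling means for a loop shaped like the one in the new peel2.c
testcase. It is only an illustration in plain C, not what the pass emits: the
names add_rolled, add_peeled and same_result are made up for this sketch, and
the peel count of three is picked by hand (GCC derives it from its iteration
estimates and the max-peel-times / max-peeled-insns params).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Original form: same loop shape as in peel2.c.  */
void add_rolled (int *a, int l)
{
  for (int i = 0; i < l; i++)
    a[i]++;
}

/* Hand-peeled form: the first three iterations are copied in front of
   the loop, each keeping its own exit test.  */
void add_peeled (int *a, int l)
{
  int i = 0;
  if (i >= l) return;
  a[i]++; i++;
  if (i >= l) return;
  a[i]++; i++;
  if (i >= l) return;
  a[i]++; i++;
  /* Remainder loop: only reached when more than three iterations run.  */
  for (; i < l; i++)
    a[i]++;
}

/* Illustrative check that both forms agree for a trip count L in 0..8.  */
bool same_result (int l)
{
  assert (l >= 0 && l <= 8);
  int x[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  int y[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  add_rolled (x, l);
  add_peeled (y, l);
  return memcmp (x, y, sizeof x) == 0;
}
```

When the loop rolls only a few times, the peeled copies handle every
iteration and the remainder loop is never entered.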

It drops -fpeel-all-loops, adds logic to avoid peeling the same loop multiple
times, and fixes profile updating.
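
The "do not peel twice" part boils down to remembering loop numbers in a bit
set and bailing out early when the bit is already set. A minimal standalone
sketch of that idea follows; it assumes a fixed-size 64-bit mask instead of
GCC's bitmap type, and MAX_LOOPS, try_peel and reset_peeled are hypothetical
names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_LOOPS 64		/* Hypothetical limit for this sketch.  */

/* One bit per loop number; set once the loop has been peeled.  */
static uint64_t peeled_mask;

/* Return true if the loop numbered LOOP_NUM gets peeled by this call,
   false if it was already peeled earlier.  */
bool try_peel (int loop_num)
{
  assert (loop_num >= 0 && loop_num < MAX_LOOPS);
  uint64_t bit = UINT64_C (1) << loop_num;

  if (peeled_mask & bit)
    {
      printf ("Not peeling: loop %d is already peeled\n", loop_num);
      return false;
    }

  /* ... the actual peeling transformation would go here ...  */

  peeled_mask |= bit;
  return true;
}

/* Forget all recorded loops, e.g. when the pass finishes.  */
void reset_peeled (void)
{
  peeled_mask = 0;
}
```

Calling try_peel (1) twice in a row peels on the first call and refuses on
the second; after reset_peeled () the loop is eligible again.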

Bootstrapped/regtested x86_64-linux

Honza

	* doc/invoke.texi (-fpeel-loops,-O3): Update documentation.
	* opts.c (default_options): Enable -fpeel-loops at -O3.
	* tree-ssa-loop-ivcanon.c (peeled_loops): New static var.
	(try_peel_loop): Do not re-peel already peeled loops;
	use likely upper bounds; fix profile updating.
	(pass_complete_unroll::execute): Initialize peeled_loops.
	
	* gcc.dg/tree-ssa/peel1.c: New testcase.
	* gcc.dg/tree-ssa/peel2.c: New testcase.
	* gcc.dg/tree-ssa/pr61743-1.c: Disable loop peeling.
	* gcc.dg/tree-ssa/pr61743-2.c: Disable loop peeling.
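
One of the try_peel_loop changes in the diff below is that each upper bound
is now compared against itself, rather than against nb_iterations_estimate,
before npeel is subtracted from it. The sketch here shows the difference
with plain integers. It is a simplification under stated assumptions:
uint64_t stands in for GCC's wide integers, the function names are made up,
and clamping to zero mirrors only the nb_iterations_upper_bound case (the
diff context does not show the full else branch for the likely bound).

```c
#include <assert.h>
#include <stdint.h>

/* Old shape: the guard looks at the estimate, not at the bound that is
   actually decremented.  */
uint64_t bound_after_peel_old (uint64_t bound, uint64_t estimate,
			       uint64_t npeel)
{
  return npeel < estimate ? bound - npeel : 0;
}

/* New shape: guard and subtraction use the same bound.  */
uint64_t bound_after_peel_new (uint64_t bound, uint64_t npeel)
{
  return npeel < bound ? bound - npeel : 0;
}

/* Illustrative numbers only.  */
void check_bound_update (void)
{
  /* Bound 10, peel 4: up to six iterations may remain.  */
  assert (bound_after_peel_new (10, 4) == 6);
  /* With an estimate of 2 the old guard drops the bound to zero.  */
  assert (bound_after_peel_old (10, 2, 4) == 0);
  /* Peeling at least as many copies as the bound leaves zero.  */
  assert (bound_after_peel_new (3, 5) == 0);
}
```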

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 236873)
+++ doc/invoke.texi	(working copy)
@@ -6338,7 +6338,8 @@ by @option{-O2} and also turns on the @o
 @option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
 @option{-ftree-loop-distribute-patterns}, @option{-fsplit-paths}
 @option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
-@option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
+@option{-ftree-partial-pre}, @option{-fpeel-loops}
+and @option{-fipa-cp-clone} options.
 
 @item -O0
 @opindex O0
@@ -8661,10 +8662,11 @@ the loop is entered.  This usually makes
 @item -fpeel-loops
 @opindex fpeel-loops
 Peels loops for which there is enough information that they do not
-roll much (from profile feedback).  It also turns on complete loop peeling
-(i.e.@: complete removal of loops with small constant number of iterations).
+roll much (from profile feedback or static analysis).  It also turns on
+complete loop peeling (i.e.@: complete removal of loops with small constant
+number of iterations).
 
-Enabled with @option{-fprofile-use}.
+Enabled with @option{-O3} and/or @option{-fprofile-use}.
 
 @item -fmove-loop-invariants
 @opindex fmove-loop-invariants
Index: opts.c
===================================================================
--- opts.c	(revision 236873)
+++ opts.c	(working copy)
@@ -535,6 +535,7 @@ static const struct default_options defa
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
 
     /* -Ofast adds optimizations to -O3.  */
     { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
Index: testsuite/gcc.dg/tree-ssa/peel1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel1.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+struct foo {int b; int a[3];} foo;
+void add(struct foo *a,int l)
+{
+  int i;
+  for (i=0;i<l;i++)
+    a->a[i]++;
+}
+/* { dg-final { scan-tree-dump "Loop 1 likely iterates at most 3 times." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." "cunroll"} } */
Index: testsuite/gcc.dg/tree-ssa/peel2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel2.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpeel-all-loops -fdump-tree-cunroll-details --param max-peel-times=16 --param max-peeled-insns=100" } */
+void add(int *a,int l)
+{
+  int i;
+  for (i=0;i<l;i++)
+    a[i]++;
+}
+/* { dg-final { scan-tree-dump "Peeled loop 1, 16 times." "cunroll"} } */
Index: testsuite/gcc.dg/tree-ssa/pr61743-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr61743-1.c	(revision 236873)
+++ testsuite/gcc.dg/tree-ssa/pr61743-1.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details" } */
+/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fno-peel-loops" } */
 
 #define N 8
 #define M 14
Index: testsuite/gcc.dg/tree-ssa/pr61743-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr61743-2.c	(revision 236873)
+++ testsuite/gcc.dg/tree-ssa/pr61743-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details" } */
+/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fno-peel-loops" } */
 
 #define N 8
 #define M 14
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 236878)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -594,6 +594,8 @@ remove_redundant_iv_tests (struct loop *
 /* Stores loops that will be unlooped after we process whole loop tree. */
 static vec<loop_p> loops_to_unloop;
 static vec<int> loops_to_unloop_nunroll;
+/* Stores loops that have been peeled.  */
+static bitmap peeled_loops;
 
 /* Cancel all fully unrolled loops by putting __builtin_unreachable
    on the latch edge.  
@@ -962,14 +964,16 @@ try_peel_loop (struct loop *loop,
   vec<edge> to_remove = vNULL;
   edge e;
 
-  /* If the iteration bound is known and large, then we can safely eliminate
-     the check in peeled copies.  */
-  if (TREE_CODE (niter) != INTEGER_CST)
-    exit = NULL;
-
   if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0)
     return false;
 
+  if (bitmap_bit_p (peeled_loops, loop->num))
+    {
+      if (dump_file)
+        fprintf (dump_file, "Not peeling: loop is already peeled\n");
+      return false;
+    }
+
   /* Peel only innermost loops.
      While the code is perfectly capable of peeling non-innermost loops,
      the heuristics would probably need some improvements. */
@@ -990,6 +994,8 @@ try_peel_loop (struct loop *loop,
   /* Check if there is an estimate on the number of iterations.  */
   npeel = estimated_loop_iterations_int (loop);
   if (npeel < 0)
+    npeel = likely_max_loop_iterations_int (loop);
+  if (npeel < 0)
     {
       if (dump_file)
         fprintf (dump_file, "Not peeling: number of iterations is not "
@@ -1036,8 +1042,7 @@ try_peel_loop (struct loop *loop,
       && wi::leu_p (npeel, wi::to_widest (niter)))
     {
       bitmap_ones (wont_exit);
-      if (wi::eq_p (wi::to_widest (niter), npeel))
-        bitmap_clear_bit (wont_exit, 0);
+      bitmap_clear_bit (wont_exit, 0);
     }
   else
     {
@@ -1074,14 +1079,14 @@ try_peel_loop (struct loop *loop,
     }
   if (loop->any_upper_bound)
     {
-      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+      if (wi::ltu_p (npeel, loop->nb_iterations_upper_bound))
         loop->nb_iterations_upper_bound -= npeel;
       else
         loop->nb_iterations_upper_bound = 0;
     }
   if (loop->any_likely_upper_bound)
     {
-      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+      if (wi::ltu_p (npeel, loop->nb_iterations_likely_upper_bound))
 	loop->nb_iterations_likely_upper_bound -= npeel;
       else
 	{
@@ -1107,6 +1112,7 @@ try_peel_loop (struct loop *loop,
   else if (loop->header->frequency)
     scale = RDIV (entry_freq * REG_BR_PROB_BASE, loop->header->frequency);
   scale_loop_profile (loop, scale, 0);
+  bitmap_set_bit (peeled_loops, loop->num);
   return true;
 }
 /* Adds a canonical induction variable to LOOP if suitable.
@@ -1519,9 +1526,20 @@ pass_complete_unroll::execute (function
   if (number_of_loops (fun) <= 1)
     return 0;
 
-  return tree_unroll_loops_completely (flag_unroll_loops
-				       || flag_peel_loops
-				       || optimize >= 3, true);
+  /* If we ever decide to run loop peeling more than once, we will need to
+     track loops already peeled in loop structures themselves to avoid
+     re-peeling the same loop multiple times.  */
+  if (flag_peel_loops)
+    peeled_loops = BITMAP_ALLOC (NULL);
+  int val = tree_unroll_loops_completely (flag_unroll_loops
+					  || flag_peel_loops
+					  || optimize >= 3, true);
+  if (peeled_loops)
+    {
+      BITMAP_FREE (peeled_loops);
+      peeled_loops = NULL;
+    }
+  return val;
 }
 
 } // anon namespace