From patchwork Mon May 30 15:26:33 2016
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 627871
Date: Mon, 30 May 2016 17:26:33 +0200
From: Jan Hubicka
To: Jan Hubicka
Cc: Richard Biener, Sandra Loosemore, gcc-patches@gcc.gnu.org
Subject: Re: Enable loop peeling at -O3
Message-ID: <20160530152633.GA96777@kam.mff.cuni.cz>
In-Reply-To: <20160530113921.GD2770@kam.mff.cuni.cz>

Hi,
this is the version of the patch I intend to commit after re-testing on
x86_64-linux with loop peeling enabled at -O3.  It drops -fpeel-all-loops,
adds logic to avoid peeling the same loop more than once, and fixes profile
updating.

Bootstrapped/regtested x86_64-linux.

Honza

	* doc/invoke.texi (-fpeel-loops,-O3): Update documentation.
	* opts.c (default_options): Enable -fpeel-loops at -O3.
	* tree-ssa-loop-ivcanon.c (peeled_loops): New static var.
	(try_peel_loop): Do not re-peel already peeled loops;
	use likely upper bounds; fix profile updating.
	(pass_complete_unroll::execute): Initialize peeled_loops.

	* gcc.dg/tree-ssa/peel1.c: New testcase.
	* gcc.dg/tree-ssa/peel2.c: New testcase.
	* gcc.dg/tree-ssa/pr61743-1.c: Disable loop peeling.
	* gcc.dg/tree-ssa/pr61743-2.c: Disable loop peeling.
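The following is a source-level illustration of what peeling does to a loop shaped like the new peel1.c testcase. It is a minimal sketch under stated assumptions and is not how GCC implements the transformation. GCC works on GIMPLE inside the cunroll pass.

The helper names `add_loop`, `add_peeled` and `peeled_matches_original` are hypothetical. The choice of three peeled copies is only for illustration. The count GCC actually picks (4 in the expected peel1.c dump) follows the compiler's own iteration accounting, which this sketch does not model.

Each peeled copy keeps its exit test, the conservative form required when the trip count is not a compile-time constant (as with the parameter `l` in peel1.c). The residual loop keeps the code correct if the likely bound turns out to be wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Original loop, shaped like the peel1.c testcase.  */
static void
add_loop (int *a, int l)
{
  for (int i = 0; i < l; i++)
    a[i]++;
}

/* Hand-peeled equivalent, assuming a likely trip count of at most 3.
   Every peeled copy keeps its exit test because L is not known at
   compile time; the residual loop handles any extra iterations.  */
static void
add_peeled (int *a, int l)
{
  int i = 0;
  if (i >= l)
    return;
  a[i]++; i++;                  /* peeled copy 1 */
  if (i >= l)
    return;
  a[i]++; i++;                  /* peeled copy 2 */
  if (i >= l)
    return;
  a[i]++; i++;                  /* peeled copy 3 */
  for (; i < l; i++)            /* residual loop, expected to be cold */
    a[i]++;
}

#define MAX_N 16

/* Returns true if both versions leave identical arrays for every
   trip count from 0 to MAX_L.  */
bool
peeled_matches_original (int max_l)
{
  assert (max_l >= 0 && max_l <= MAX_N);
  for (int l = 0; l <= max_l; l++)
    {
      int x[MAX_N] = {0}, y[MAX_N] = {0};
      add_loop (x, l);
      add_peeled (y, l);
      if (memcmp (x, y, sizeof x) != 0)
        return false;
    }
  return true;
}
```

When the likely bound holds, the hot path runs straight-line code and never enters the residual loop.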
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 236873)
+++ doc/invoke.texi	(working copy)
@@ -6338,7 +6338,8 @@ by @option{-O2} and also turns on the @o
 @option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
 @option{-ftree-loop-distribute-patterns}, @option{-fsplit-paths}
 @option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
-@option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
+@option{-ftree-partial-pre}, @option{-fpeel-loops}
+and @option{-fipa-cp-clone} options.
 
 @item -O0
 @opindex O0
@@ -8661,10 +8662,11 @@ the loop is entered. This usually makes
 @item -fpeel-loops
 @opindex fpeel-loops
 Peels loops for which there is enough information that they do not
-roll much (from profile feedback). It also turns on complete loop peeling
-(i.e.@: complete removal of loops with small constant number of iterations).
+roll much (from profile feedback or static analysis). It also turns on
+complete loop peeling (i.e.@: complete removal of loops with small constant
+number of iterations).
 
-Enabled with @option{-fprofile-use}.
+Enabled with @option{-O3} and/or @option{-fprofile-use}.
 
 @item -fmove-loop-invariants
 @opindex fmove-loop-invariants
Index: opts.c
===================================================================
--- opts.c	(revision 236873)
+++ opts.c	(working copy)
@@ -535,6 +535,7 @@ static const struct default_options defa
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
 
     /* -Ofast adds optimizations to -O3.  */
     { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
Index: testsuite/gcc.dg/tree-ssa/peel1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel1.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+struct foo {int b; int a[3];} foo;
+void add(struct foo *a,int l)
+{
+  int i;
+  for (i=0;i<l;i++)
+    a->a[i]++;
+}
+/* { dg-final { scan-tree-dump "Loop 1 likely iterates at most 3 times." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." "cunroll"} } */
Index: testsuite/gcc.dg/tree-ssa/peel2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel2.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpeel-all-loops -fdump-tree-cunroll-details --param max-peel-times=16 --param max-peeled-insns=100" } */
+void add(int *a,int l)
+{
+  int i;
+  for (i=0;i loops_to_unloop;
 static vec loops_to_unloop_nunroll;
+/* Stores loops that has been peeled.  */
+static bitmap peeled_loops;
 
 /* Cancel all fully unrolled loops by putting __builtin_unreachable
    on the latch edge.
@@ -962,14 +964,16 @@ try_peel_loop (struct loop *loop,
   vec to_remove = vNULL;
   edge e;
 
-  /* If the iteration bound is known and large, then we can safely eliminate
-     the check in peeled copies.  */
-  if (TREE_CODE (niter) != INTEGER_CST)
-    exit = NULL;
-
   if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0)
     return false;
 
+  if (bitmap_bit_p (peeled_loops, loop->num))
+    {
+      if (dump_file)
+        fprintf (dump_file, "Not peeling: loop is already peeled\n");
+      return false;
+    }
+
   /* Peel only innermost loops.
      While the code is perfectly capable of peeling non-innermost loops,
      the heuristics would probably need some improvements. */
@@ -990,6 +994,8 @@ try_peel_loop (struct loop *loop,
   /* Check if there is an estimate on the number of iterations.  */
   npeel = estimated_loop_iterations_int (loop);
   if (npeel < 0)
+    npeel = likely_max_loop_iterations_int (loop);
+  if (npeel < 0)
     {
       if (dump_file)
         fprintf (dump_file, "Not peeling: number of iterations is not "
@@ -1036,8 +1042,7 @@ try_peel_loop (struct loop *loop,
       && wi::leu_p (npeel, wi::to_widest (niter)))
     {
       bitmap_ones (wont_exit);
-      if (wi::eq_p (wi::to_widest (niter), npeel))
-        bitmap_clear_bit (wont_exit, 0);
+      bitmap_clear_bit (wont_exit, 0);
     }
   else
     {
@@ -1074,14 +1079,14 @@ try_peel_loop (struct loop *loop,
     }
   if (loop->any_upper_bound)
     {
-      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+      if (wi::ltu_p (npeel, loop->nb_iterations_upper_bound))
         loop->nb_iterations_upper_bound -= npeel;
       else
         loop->nb_iterations_upper_bound = 0;
     }
   if (loop->any_likely_upper_bound)
     {
-      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
+      if (wi::ltu_p (npeel, loop->nb_iterations_likely_upper_bound))
         loop->nb_iterations_likely_upper_bound -= npeel;
       else
         {
@@ -1107,6 +1112,7 @@ try_peel_loop (struct loop *loop,
   else if (loop->header->frequency)
     scale = RDIV (entry_freq * REG_BR_PROB_BASE, loop->header->frequency);
   scale_loop_profile (loop, scale, 0);
+  bitmap_set_bit (peeled_loops, loop->num);
   return true;
 }
 /* Adds a canonical induction variable to LOOP if suitable.
@@ -1519,9 +1526,20 @@ pass_complete_unroll::execute (function
   if (number_of_loops (fun) <= 1)
     return 0;
 
-  return tree_unroll_loops_completely (flag_unroll_loops
-                                       || flag_peel_loops
-                                       || optimize >= 3, true);
+  /* If we ever decide to run loop peeling more than once, we will need to
+     track loops already peeled in loop structures themselves to avoid
+     re-peeling the same loop multiple times.  */
+  if (flag_peel_loops)
+    peeled_loops = BITMAP_ALLOC (NULL);
+  int val = tree_unroll_loops_completely (flag_unroll_loops
+                                          || flag_peel_loops
+                                          || optimize >= 3, true);
+  if (peeled_loops)
+    {
+      BITMAP_FREE (peeled_loops);
+      peeled_loops = NULL;
+    }
+  return val;
 }
 
 } // anon namespace
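The bookkeeping that try_peel_loop gains in this patch can be modelled in plain C. This is a minimal sketch under simplifying assumptions, and the following names are hypothetical:

- `struct toy_loop`, `toy_try_peel`, `drop_bound`, `profile_scale` and `reset_peeled` are invented for this sketch.
- A fixed `bool` array stands in for the `peeled_loops` bitmap.
- `PROB_BASE` assumes GCC's `REG_BR_PROB_BASE` value of 10000.
- `RDIV` is modelled on GCC's rounding-division macro.

The sketch shows four pieces of the patch:

- the re-peel guard;
- the fallback from `estimated_loop_iterations_int` to `likely_max_loop_iterations_int`;
- clamping each upper bound against that same bound, which is the comparison the patch corrects;
- the profile scale factor.

The real pass applies further limits (such as `--param max-peel-times`) that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stands in for GCC's REG_BR_PROB_BASE (10000).  */
#define PROB_BASE 10000
/* Rounding division, modelled on GCC's RDIV macro.  */
#define RDIV(X, Y) (((X) + (Y) / 2) / (Y))
#define MAX_LOOPS 64

/* Hypothetical, heavily simplified stand-in for struct loop.  */
struct toy_loop
{
  int num;                      /* loop number, index into peeled[] */
  long estimate;                /* estimated iterations, -1 if unknown */
  long likely_max;              /* likely upper bound, -1 if unknown */
  bool any_upper_bound;
  uint64_t upper_bound;
  bool any_likely_upper_bound;
  uint64_t likely_upper_bound;
};

/* Plays the role of the peeled_loops bitmap.  */
static bool peeled[MAX_LOOPS];

/* Clears the record, like BITMAP_ALLOC/BITMAP_FREE around one pass run.  */
void
reset_peeled (void)
{
  memset (peeled, 0, sizeof peeled);
}

/* Removes NPEEL iterations from BOUND, clamping at zero.  The test is
   made against BOUND itself, as in the corrected code.  */
static uint64_t
drop_bound (uint64_t bound, uint64_t npeel)
{
  return npeel < bound ? bound - npeel : 0;
}

/* Returns how many iterations to peel, or -1 if LOOP is left alone.  */
long
toy_try_peel (struct toy_loop *loop)
{
  assert (loop->num >= 0 && loop->num < MAX_LOOPS);
  if (peeled[loop->num])
    return -1;                  /* never re-peel the same loop */

  long npeel = loop->estimate;
  if (npeel < 0)
    npeel = loop->likely_max;   /* fall back to the likely upper bound */
  if (npeel < 0)
    return -1;                  /* nothing known about the trip count */

  if (loop->any_upper_bound)
    loop->upper_bound = drop_bound (loop->upper_bound, (uint64_t) npeel);
  if (loop->any_likely_upper_bound)
    loop->likely_upper_bound
      = drop_bound (loop->likely_upper_bound, (uint64_t) npeel);

  peeled[loop->num] = true;
  return npeel;
}

/* Scale factor for the remaining loop's profile, in PROB_BASE units.  */
long
profile_scale (long entry_freq, long header_freq)
{
  assert (header_freq > 0);
  return RDIV (entry_freq * PROB_BASE, header_freq);
}
```

Here `reset_peeled` plays the role of the allocate/free pair in pass_complete_unroll::execute. The record lives only for a single run of the pass. This is why the patch's comment notes that running peeling more than once would require storing the information in the loop structures themselves.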