From patchwork Mon Nov 19 12:46:44 2012
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 199998
Date: Mon, 19 Nov 2012 13:46:44 +0100
From: Jan Hubicka
To: Eric Botcazou
Cc: Jan Hubicka, gcc-patches@gcc.gnu.org, jakub@redhat.com
Subject: Re: Reduce complette unrolling & peeling limits
Message-ID: <20121119124644.GA7359@kam.mff.cuni.cz>
In-Reply-To: <3701798.hzb6LFUzXT@polaris>
References: <20121114233407.GC12910@kam.mff.cuni.cz>
 <20121118165230.GC27832@atrey.karlin.mff.cuni.cz>
 <20121118170852.GA18650@kam.mff.cuni.cz>
 <3701798.hzb6LFUzXT@polaris>

Hi, this is
a patch I will try to test once I have a chance :)  It simply prevents the
unroller from analyzing loops when they are already too large.

	* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Add
	UPPER_BOUND parameter.
	(try_unroll_loop_completely): Update.

Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 193598)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -1,5 +1,5 @@
-/* Induction variable canonicalization.
-   Copyright (C) 2004, 2005, 2007, 2008, 2010
+/* Induction variable canonicalization and loop peeling.
+   Copyright (C) 2004, 2005, 2007, 2008, 2010, 2012
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -29,9 +29,12 @@ along with GCC; see the file COPYING3.
    variables.  In that case the created optimization possibilities are
    likely to pay up.
 
-   Additionally in case we detect that it is beneficial to unroll the
-   loop completely, we do it right here to expose the optimization
-   possibilities to the following passes.  */
+   We also perform
+     - complete unrolling (or peeling) when the loop rolls few enough
+       times
+     - simple peeling (i.e. copying a few initial iterations prior to
+       the loop) when the number of iterations estimate is known
+       (typically from the profile info).  */
 
 #include "config.h"
 #include "system.h"
@@ -207,10 +210,12 @@ constant_after_peeling (tree op, gimple
    iteration of the loop.
    EDGE_TO_CANCEL (if non-NULL) is an non-exit edge eliminated in the last
    iteration of loop.
-   Return results in SIZE, estimate benefits for complete unrolling exiting by EXIT.  */
+   Return results in SIZE, estimate benefits for complete unrolling exiting by EXIT.
+   Stop estimating after UPPER_BOUND is met.  Return true in this case.  */
 
-static void
-tree_estimate_loop_size (struct loop *loop, edge exit, edge edge_to_cancel, struct loop_size *size)
+static bool
+tree_estimate_loop_size (struct loop *loop, edge exit, edge edge_to_cancel, struct loop_size *size,
+			 int upper_bound)
 {
   basic_block *body = get_loop_body (loop);
   gimple_stmt_iterator gsi;
@@ -316,6 +321,12 @@ tree_estimate_loop_size (struct loop *lo
 	      if (likely_eliminated || likely_eliminated_last)
 		size->last_iteration_eliminated_by_peeling += num;
 	    }
+	  if ((size->overall - size->eliminated_by_peeling
+	       - size->last_iteration_eliminated_by_peeling) > upper_bound)
+	    {
+	      free (body);
+	      return true;
+	    }
 	}
     }
   while (path.length ())
@@ -357,6 +368,7 @@ tree_estimate_loop_size (struct loop *lo
 	     size->last_iteration_eliminated_by_peeling);
 
   free (body);
+  return false;
 }
 
 /* Estimate number of insns of completely unrolled loop.
@@ -699,12 +711,23 @@ try_unroll_loop_completely (struct loop
   sbitmap wont_exit;
   edge e;
   unsigned i;
+  bool large;
   vec<edge> to_remove = vec<edge> ();
 
   if (ul == UL_SINGLE_ITER)
     return false;
 
-  tree_estimate_loop_size (loop, exit, edge_to_cancel, &size);
+  large = tree_estimate_loop_size
+	    (loop, exit, edge_to_cancel, &size,
+	     ul == UL_NO_GROWTH ? 0
+	     : PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS) * 2);
   ninsns = size.overall;
+  if (large)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Not unrolling loop %d: it is too large.\n",
+		 loop->num);
+      return false;
+    }
   unr_insns = estimated_unrolled_size (&size, n_unroll);
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -865,6 +888,133 @@ try_unroll_loop_completely (struct loop
   return true;
 }
 
+/* Return number of instructions after peeling.  */
+
+static unsigned HOST_WIDE_INT
+estimated_peeled_sequence_size (struct loop_size *size,
+				unsigned HOST_WIDE_INT npeel)
+{
+  return MAX (npeel * (HOST_WIDE_INT) (size->overall
+				       - size->eliminated_by_peeling), 1);
+}
+
+/* If the loop is expected to iterate N times and is
+   small enough, duplicate the loop body N+1 times before
+   the loop itself.  This way the hot path will never
+   enter the loop.
+   Parameters are the same as for try_unroll_loops_completely.  */
+
+static bool
+try_peel_loop (struct loop *loop,
+	       edge exit, tree niter,
+	       HOST_WIDE_INT maxiter)
+{
+  int npeel;
+  struct loop_size size;
+  int peeled_size;
+  sbitmap wont_exit;
+  unsigned i;
+  vec<edge> to_remove = vec<edge> ();
+  edge e;
+
+  /* If the iteration bound is known and large, then we can safely eliminate
+     the check in peeled copies.  */
+  if (TREE_CODE (niter) != INTEGER_CST)
+    exit = NULL;
+
+  if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0)
+    return false;
+
+  /* Peel only innermost loops.  */
+  if (loop->inner)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not peeling: outer loop\n");
+      return false;
+    }
+
+  if (!optimize_loop_for_speed_p (loop))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not peeling: cold loop\n");
+      return false;
+    }
+
+  /* Check if there is an estimate on the number of iterations.  */
+  npeel = estimated_loop_iterations_int (loop);
+  if (npeel < 0)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not peeling: number of iterations is not "
+		 "estimated\n");
+      return false;
+    }
+  if (maxiter >= 0 && maxiter <= npeel)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not peeling: upper bound is known so can "
+		 "unroll completely\n");
+      return false;
+    }
+
+  /* We want to peel estimated number of iterations + 1 (so we never
+     enter the loop on quick path).  Check against PARAM_MAX_PEEL_TIMES
+     and be sure to avoid overflows.  */
+  if (npeel > PARAM_VALUE (PARAM_MAX_PEEL_TIMES) - 1)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not peeling: rolls too much "
+		 "(%i + 1 > --param max-peel-times)\n", npeel);
+      return false;
+    }
+  npeel++;
+
+  /* Check the size of the peeled sequence.  */
+  tree_estimate_loop_size (loop, exit, NULL, &size,
+			   PARAM_VALUE (PARAM_MAX_PEELED_INSNS));
+  if ((peeled_size = estimated_peeled_sequence_size (&size, npeel))
+      > PARAM_VALUE (PARAM_MAX_PEELED_INSNS))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not peeling: peeled sequence size is too large "
+		 "(%i insns > --param max-peel-insns)", peeled_size);
+      return false;
+    }
+
+  /* Duplicate possibly eliminating the exits.  */
+  initialize_original_copy_tables ();
+  wont_exit = sbitmap_alloc (npeel + 1);
+  bitmap_ones (wont_exit);
+  bitmap_clear_bit (wont_exit, 0);
+  if (!gimple_duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
+					     npeel, wont_exit,
+					     exit, &to_remove,
+					     DLTHE_FLAG_UPDATE_FREQ
+					     | DLTHE_FLAG_COMPLETTE_PEEL))
+    {
+      free_original_copy_tables ();
+      free (wont_exit);
+      return false;
+    }
+  FOR_EACH_VEC_ELT (to_remove, i, e)
+    {
+      bool ok = remove_path (e);
+      gcc_assert (ok);
+    }
+  free (wont_exit);
+  free_original_copy_tables ();
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Peeled loop %d, %i times.\n",
+	       loop->num, npeel);
+    }
+  if (loop->any_upper_bound)
+    loop->nb_iterations_upper_bound -= double_int::from_uhwi (npeel);
+  loop->nb_iterations_estimate = double_int_zero;
+  /* Make sure to mark loop cold so we do not try to peel it more.  */
+  scale_loop_profile (loop, 1, 0);
+  loop->header->count = 0;
+  return true;
+}
+
 /* Adds a canonical induction variable to LOOP if suitable.
    CREATE_IV is true if we may create a new iv.  UL determines
    which loops we are allowed to completely unroll.  If TRY_EVAL is true, we try
@@ -939,6 +1089,9 @@ canonicalize_loop_induction_variables (s
       && exit && just_once_each_iteration_p (loop, exit->src))
     create_canonical_iv (loop, exit, niter);
 
+  if (ul == UL_ALL)
+    modified |= try_peel_loop (loop, exit, niter, maxiter);
+
   return modified;
 }
 
@@ -981,8 +1134,10 @@ canonicalize_induction_variables (void)
     }
   BITMAP_FREE (loop_closed_ssa_invalidated);
 
+  /* Update virtuals because we possibly introduced a
+     __builtin_unreachable call.  */
   if (changed)
-    return TODO_cleanup_cfg;
+    return TODO_cleanup_cfg | TODO_update_ssa_only_virtuals;
   return 0;
 }