From patchwork Fri Sep 13 16:15:52 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Xinliang David Li
X-Patchwork-Id: 274804
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Date: Fri, 13 Sep 2013 09:15:52 -0700
Subject: Re: New GCC options for loop vectorization
From: Xinliang David Li
To: Richard Biener
Cc: "Joseph S. Myers", GCC Patches

New patch attached.

1) the peeling part is removed
2) the new patch implements the last-one-wins logic. -ftree-vectorize
behaves like a true alias. -fno-tree-vectorize can override previous
-ftree-xxx-vectorize.

Ok for trunk after testing?

thanks,

David

On Fri, Sep 13, 2013 at 8:16 AM, Xinliang David Li wrote:
> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener wrote:
>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li wrote:
>>> Currently -ftree-vectorize turns on both loop and slp vectorizations,
>>> but there is no simple way to turn on loop vectorization alone. The
>>> logic for default O3 setting is also complicated.
>>>
>>> In this patch, two new options are introduced:
>>>
>>> 1) -ftree-loop-vectorize
>>>
>>> This option is used to turn on loop vectorization only. Option
>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny
>>> business of Init(2) is needed. With this change, -ftree-vectorize
>>> becomes a simple alias to -ftree-loop-vectorize +
>>> -ftree-slp-vectorize.
>>>
>>> For instance, to turn on only slp vectorize at O3, the old way is:
>>>
>>> -O3 -fno-tree-vectorize -ftree-slp-vectorize
>>>
>>> With the new change it becomes:
>>>
>>> -O3 -fno-tree-loop-vectorize
>>>
>>>
>>> To turn on only loop vectorize at O2, the old way is
>>>
>>> -O2 -ftree-vectorize -fno-tree-slp-vectorize
>>>
>>> The new way is
>>>
>>> -O2 -ftree-loop-vectorize
>>>
>>>
>>>
>>> 2) -ftree-vect-loop-peeling
>>>
>>> This option is used to turn on/off loop peeling for alignment. In the
>>> long run, this should be folded into the cheap cost model proposed by
>>> Richard. This option is also useful in scenarios where peeling can
>>> introduce runtime problems:
>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html which happens to be
>>> common in practice.
>>>
>>>
>>>
>>> Patch attached. Compiler bootstrapped. Ok after testing?
>>
>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).
>
> Ok. Can you also comment on 2) ?
>
>>
>> I've stopped a quick try doing 1) myself because
>>
>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>        opts->x_flag_ipa_reference = false;
>>        break;
>>
>> +    case OPT_ftree_vectorize:
>> +      if (!opts_set->x_flag_tree_loop_vectorize)
>> +       opts->x_flag_tree_loop_vectorize = value;
>> +      if (!opts_set->x_flag_tree_slp_vectorize)
>> +       opts->x_flag_tree_slp_vectorize = value;
>> +      break;
>>
>> doesn't look obviously correct. Does that handle
>>
>>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>
>> or
>>
>>   -ftree-loop-vectorize -fno-tree-vectorize
>>
>> properly? Currently at least
>>
>>   -ftree-slp-vectorize -fno-tree-vectorize
>>
>> doesn't "work".
>
>
> Right -- same is true for -fprofile-use option. FDO enables some
> passes, but can not re-enable them if they are flipped off before.
>
>>
>> That said, the option machinery doesn't handle an option being an alias
>> for two other options, so its mechanism to contract positives/negatives
>> doesn't work here and the override hooks do not work reliably for
>> repeated options.
>>
>> Or am I wrong here? Should we care at all? Joseph?
>
> We should probably just document the behavior. Even better, we should
> deprecate the old option.
>
> thanks,
>
> David
>
>>
>> Thanks,
>> Richard.
>>
>>>
>>> thanks,
>>>
>>> David

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 202540)
+++ ChangeLog	(working copy)
@@ -1,3 +1,22 @@
+2013-09-12  Xinliang David Li
+
+	* tree-if-conv.c (main_tree_if_conversion): Check new flag.
+	* omp-low.c (omp_max_vf): Ditto.
+	(expand_omp_simd): Ditto.
+	* tree-vectorizer.c (vectorize_loops): Ditto.
+	(gate_vect_slp): Ditto.
+	(gate_increase_alignment): Ditto.
+	* tree-ssa-pre.c (inhibit_phi_insertion): Ditto.
+	* tree-ssa-loop.c (gate_tree_vectorize): Ditto.
+	(gate_tree_vectorize): Name change.
+	(tree_vectorize): Ditto.
+	(pass_vectorize::gate): Call new function.
+	(pass_vectorize::execute): Ditto.
+	* opts.c: O3 default setting change.
+	(finish_options): Check new flag.
+	* doc/invoke.texi: Document new flags.
+	* common.opt: New flags.
+
 2013-09-12  Vladimir Makarov
 
 	PR middle-end/58335
Index: cp/lambda.c
===================================================================
--- cp/lambda.c	(revision 202540)
+++ cp/lambda.c	(working copy)
@@ -792,7 +792,7 @@ maybe_add_lambda_conv_op (tree type)
      particular, parameter pack expansions are marked PACK_EXPANSION_LOCAL_P in
      the body CALL, but not in DECLTYPE_CALL.  */
 
-  vec *direct_argvec;
+  vec *direct_argvec = NULL;
   tree decltype_call = 0, call;
   tree fn_result = TREE_TYPE (TREE_TYPE (callop));
 
Index: tree-ssa-loop.c
===================================================================
--- tree-ssa-loop.c	(revision 202540)
+++ tree-ssa-loop.c	(working copy)
@@ -303,7 +303,7 @@ make_pass_predcom (gcc::context *ctxt)
 /* Loop autovectorization.  */
 
 static unsigned int
-tree_vectorize (void)
+tree_loop_vectorize (void)
 {
   if (number_of_loops (cfun) <= 1)
     return 0;
@@ -312,9 +312,9 @@ tree_vectorize (void)
 }
 
 static bool
-gate_tree_vectorize (void)
+gate_tree_loop_vectorize (void)
 {
-  return flag_tree_vectorize || cfun->has_force_vect_loops;
+  return flag_tree_loop_vectorize || cfun->has_force_vect_loops;
 }
 
 namespace {
@@ -342,8 +342,8 @@ public:
   {}
 
   /* opt_pass methods: */
-  bool gate () { return gate_tree_vectorize (); }
-  unsigned int execute () { return tree_vectorize (); }
+  bool gate () { return gate_tree_loop_vectorize (); }
+  unsigned int execute () { return tree_loop_vectorize (); }
 
 }; // class pass_vectorize
 
Index: common.opt
===================================================================
--- common.opt	(revision 202540)
+++ common.opt	(working copy)
@@ -2263,15 +2263,19 @@ Common Report Var(flag_var_tracking_unin
 Perform variable tracking and also tag variables that are uninitialized
 
 ftree-vectorize
-Common Report Var(flag_tree_vectorize) Optimization
-Enable loop vectorization on trees
+Common Report Optimization
+Enable vectorization on trees
 
 ftree-vectorizer-verbose=
 Common RejectNegative Joined UInteger Var(common_deferred_options) Defer
 -ftree-vectorizer-verbose=	This switch is deprecated. Use -fopt-info instead.
 
+ftree-loop-vectorize
+Common Report Var(flag_tree_loop_vectorize) Optimization
+Enable loop vectorization on trees
+
 ftree-slp-vectorize
-Common Report Var(flag_tree_slp_vectorize) Init(2) Optimization
+Common Report Var(flag_tree_slp_vectorize) Optimization
 Enable basic block vectorization (SLP) on trees
 
 fvect-cost-model
Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c	(revision 202540)
+++ tree-if-conv.c	(working copy)
@@ -1789,7 +1789,7 @@ main_tree_if_conversion (void)
   FOR_EACH_LOOP (li, loop, 0)
     if (flag_tree_loop_if_convert == 1
         || flag_tree_loop_if_convert_stores == 1
-        || flag_tree_vectorize
+        || flag_tree_loop_vectorize
         || loop->force_vect)
     changed |= tree_if_conversion (loop);
 
@@ -1815,7 +1815,7 @@ main_tree_if_conversion (void)
 static bool
 gate_tree_if_conversion (void)
 {
-  return (((flag_tree_vectorize || cfun->has_force_vect_loops)
+  return (((flag_tree_loop_vectorize || cfun->has_force_vect_loops)
            && flag_tree_loop_if_convert != 0)
           || flag_tree_loop_if_convert == 1
           || flag_tree_loop_if_convert_stores == 1);
Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c	(revision 202540)
+++ tree-ssa-pre.c	(working copy)
@@ -3026,7 +3026,7 @@ inhibit_phi_insertion (basic_block bb, p
   unsigned i;
 
   /* If we aren't going to vectorize we don't inhibit anything.  */
-  if (!flag_tree_vectorize)
+  if (!flag_tree_loop_vectorize)
     return false;
 
   /* Otherwise we inhibit the insertion when the address of the
Index: opts.c
===================================================================
--- opts.c	(revision 202540)
+++ opts.c	(working copy)
@@ -498,7 +498,8 @@ static const struct default_options defa
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
-    { OPT_LEVELS_3_PLUS, OPT_ftree_vectorize, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
@@ -826,7 +827,8 @@ finish_options (struct gcc_options *opts
 
   /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
      is disabled.  */
-  if (!opts->x_flag_tree_vectorize || !opts->x_flag_tree_loop_if_convert)
+  if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
+       || !opts->x_flag_tree_loop_if_convert)
     maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0,
                            opts->x_param_values, opts_set->x_param_values);
 
@@ -1660,8 +1662,10 @@ common_handle_option (struct gcc_options
         opts->x_flag_unswitch_loops = value;
       if (!opts_set->x_flag_gcse_after_reload)
         opts->x_flag_gcse_after_reload = value;
-      if (!opts_set->x_flag_tree_vectorize)
-        opts->x_flag_tree_vectorize = value;
+      if (!opts_set->x_flag_tree_loop_vectorize)
+        opts->x_flag_tree_loop_vectorize = value;
+      if (!opts_set->x_flag_tree_slp_vectorize)
+        opts->x_flag_tree_slp_vectorize = value;
       if (!opts_set->x_flag_vect_cost_model)
         opts->x_flag_vect_cost_model = value;
       if (!opts_set->x_flag_tree_loop_distribute_patterns)
@@ -1691,6 +1695,10 @@ common_handle_option (struct gcc_options
       opts->x_flag_ipa_reference = false;
       break;
 
+    case OPT_ftree_vectorize:
+      opts->x_flag_tree_loop_vectorize = value;
+      opts->x_flag_tree_slp_vectorize = value;
+      break;
     case OPT_fshow_column:
       dc->show_column = value;
       break;
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 202540)
+++ doc/invoke.texi	(working copy)
@@ -419,10 +419,11 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-loop-if-convert-stores -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
+-ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge @gol
--ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
+-ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
+-ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
 -fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
@@ -6751,8 +6752,8 @@ invoking @option{-O2} on programs that u
 Optimize yet more. @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-vectorize},
-@option{-fvect-cost-model},
+@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
+@option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
 @option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
 
 @item -O0
@@ -8011,8 +8012,13 @@ higher.
 
 @item -ftree-vectorize
 @opindex ftree-vectorize
+Perform vectorization on trees. This flag enables @option{-ftree-loop-vectorize}
+and @option{-ftree-slp-vectorize} if neither option is explicitly specified.
+
+@item -ftree-loop-vectorize
+@opindex ftree-loop-vectorize
 Perform loop vectorization on trees. This flag is enabled by default at
-@option{-O3}.
+@option{-O3} and when @option{-ftree-vectorize} is enabled.
 
 @item -ftree-slp-vectorize
 @opindex ftree-slp-vectorize
Index: omp-low.c
===================================================================
--- omp-low.c	(revision 202540)
+++ omp-low.c	(working copy)
@@ -2305,8 +2305,8 @@ omp_max_vf (void)
 {
   if (!optimize
       || optimize_debug
-      || (!flag_tree_vectorize
-          && global_options_set.x_flag_tree_vectorize))
+      || (!flag_tree_loop_vectorize
+          && global_options_set.x_flag_tree_loop_vectorize))
     return 1;
 
   int vs = targetm.vectorize.autovectorize_vector_sizes ();
@@ -5684,10 +5684,10 @@ expand_omp_simd (struct omp_region *regi
           loop->simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
           cfun->has_simduid_loops = true;
         }
-      /* If not -fno-tree-vectorize, hint that we want to vectorize
+      /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
          the loop.  */
-      if ((flag_tree_vectorize
-           || !global_options_set.x_flag_tree_vectorize)
+      if ((flag_tree_loop_vectorize
+           || !global_options_set.x_flag_tree_loop_vectorize)
           && loop->safelen > 1)
         {
           loop->force_vect = true;
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c	(revision 202540)
+++ tree-vectorizer.c	(working copy)
@@ -341,7 +341,7 @@ vectorize_loops (void)
      than all previously defined loops.  This fact allows us to run
      only over initial loops skipping newly generated ones.  */
   FOR_EACH_LOOP (li, loop, 0)
-    if ((flag_tree_vectorize && optimize_loop_nest_for_speed_p (loop))
+    if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop))
         || loop->force_vect)
       {
         loop_vec_info loop_vinfo;
@@ -486,10 +486,7 @@ execute_vect_slp (void)
 static bool
 gate_vect_slp (void)
 {
-  /* Apply SLP either if the vectorizer is on and the user didn't specify
-     whether to run SLP or not, or if the SLP flag was set by the user.  */
-  return ((flag_tree_vectorize != 0 && flag_tree_slp_vectorize != 0)
-          || flag_tree_slp_vectorize == 1);
+  return flag_tree_slp_vectorize != 0;
 }
 
 namespace {
@@ -579,7 +576,7 @@ increase_alignment (void)
 static bool
 gate_increase_alignment (void)
 {
-  return flag_section_anchors && flag_tree_vectorize;
+  return flag_section_anchors && flag_tree_loop_vectorize;
 }