From patchwork Fri Nov 17 10:23:56 2017
X-Patchwork-Submitter: Eric Botcazou
X-Patchwork-Id: 838934
From: Eric Botcazou
To: gcc-patches@gcc.gnu.org
Cc: Mike Stump
Subject: [patch] Add support for #pragma GCC unroll
Date: Fri, 17 Nov 2017 11:23:56 +0100
Message-ID: <3136854.5zjt3GTbhu@polaris>

Hi,

this is a cleaned up and updated revision of Mike's latest posted patch
implementing #pragma GCC unroll in the C and C++ compilers.
To be honest, we're not so much interested in the front-end bits as in the
middle-end bits, because the latter would at last make the Ada version of the
pragma work, but the front-end bits are a significant part of the whole thing
so it's probably fair to rescue them as well.

The C and C++ front-end bits are (almost) verbatim from Mike. The cleanup
comprises making the new 3rd operand of ANNOTATE_EXPR mandatory, so that you
don't have to add guards all over the place, polishing a few rough edges and
eliminating a few preexisting nits in the unrolling code.

Tested on x86_64-suse-linux, OK for the mainline?


2017-11-17  Mike Stump
	    Eric Botcazou

ChangeLog:
	* doc/extend.texi (Loop-Specific Pragmas): Document pragma GCC unroll.
	* doc/generic.texi (ANNOTATE_EXPR): Document 3rd operand.
	* cfgloop.h (struct loop): Add unroll field.
	* function.h (struct function): Add has_unroll bitfield.
	* gimplify.c (gimple_boolify) : Deal with unroll kind.
	(gimplify_expr) : Propagate 3rd operand.
	* loop-init.c (pass_loop2::gate): Return true if cfun->has_unroll.
	(pass_rtl_unroll_loops::gate): Likewise.
	* loop-unroll.c (decide_unrolling): Tweak note message. Skip loops if
	loop->unroll == 1 and force unrolling if loop->unroll > 1.
	(decide_unroll_constant_iterations): Use note for consistency and
	return early if loop->unroll is set.
	(decide_unroll_runtime_iterations): Use note for consistency and
	take loop->unroll into account.
	(decide_unroll_stupid): Likewise.
	* lto-streamer-in.c (input_cfg): Read loop->unroll.
	* lto-streamer-out.c (output_cfg): Write loop->unroll.
	* tree-cfg.c (replace_loop_annotate_in_block) New.
	(replace_loop_annotate) : Likewise.
	(print_loop): Print loop->unroll if set.
	* tree-core.h (enum annot_expr_kind): Add annot_expr_unroll_kind.
	* tree-inline.c (copy_loops): Copy unroll and set cfun->has_unroll.
	* tree-pretty-print.c (dump_generic_node) : New.
	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Bail out if
	loop->unroll is set and smaller than the trip count.
	Otherwise bypass entirely the heuristics if loop->unroll is set.
	Remove dead note. Fix off-by-one bug in other node.
	(try_peel_loop): Bail out if loop->unroll is set. Fix formatting.
	(tree_unroll_loops_completely_1): Force unrolling if loop->unroll
	is greater than 1.
	* tree.def (ANNOTATE_EXPR): Add 3rd operand.

ada/ChangeLog:
	* gcc-interface/trans.c (gnat_gimplify_stmt) : Add 3rd operand to
	ANNOTATE_EXPR and pass unrolling hints.

c-family/ChangeLog:
	* c-pragma.c (init_pragma): Register pragma GCC unroll.
	* c-pragma.h (enum pragma_kind): Add PRAGMA_UNROLL.

c/ChangeLog:
	* c-parser.c (c_parser_while_statement): Add unroll parameter and
	build ANNOTATE_EXPR if present. Add 3rd operand to ANNOTATE_EXPR.
	(c_parser_do_statement): Likewise.
	(c_parser_for_statement): Likewise.
	(c_parser_statement_after_labels): Adjust calls to above.
	(c_parse_pragma_ivdep): New static function.
	(c_parser_pragma_unroll): Likewise.
	(c_parser_pragma) : Add support for pragma Unroll.
	: New case.

cp/ChangeLog:
	* constexpr.c (cxx_eval_constant_expression) : Remove assertion on
	2nd operand.
	(potential_constant_expression_1): Likewise.
	* cp-array-notation.c (create_an_loop): Adjust call to finish_for_cond.
	* cp-tree.h (cp_convert_range_for): Adjust prototype.
	(finish_while_stmt_cond): Likewise.
	(finish_do_stmt): Likewise.
	(finish_for_cond): Likewise.
	* init.c (build_vec_init): Adjust call to finish_for_cond.
	* parser.c (cp_parser_statement): Adjust call to
	cp_parser_iteration_statement.
	(cp_parser_for): Add unroll parameter and pass it in calls to
	cp_parser_range_for and cp_parser_c_for.
	(cp_parser_c_for): Add unroll parameter and pass it in call to
	finish_for_cond.
	(cp_parser_range_for): Add unroll parameter and pass it in call to
	cp_convert_range_for.
	(cp_convert_range_for): Add unroll parameter and pass it in call to
	finish_for_cond.
	(cp_parser_iteration_statement): Add unroll parameter and pass it in
	calls to finish_while_stmt_cond, finish_do_stmt and cp_parser_for.
	(cp_parser_pragma_ivdep): New static function.
	(cp_parser_pragma_unroll): Likewise.
	(cp_parser_pragma) : Add support for pragma Unroll.
	: New case.
	* pt.c (tsubst_expr): Adjust calls to finish_for_cond,
	cp_convert_range_for, finish_while_stmt_cond and finish_do_stmt.
	: Propagate 3rd operand.
	* semantics.c (finish_while_stmt_cond): Add unroll parameter and
	build ANNOTATE_EXPR if present. Add 3rd operand to ANNOTATE_EXPR.
	(finish_do_stmt): Likewise.
	(finish_for_cond): Likewise.

fortran/ChangeLog:
	* trans-stmt.c (gfc_trans_forall_loop): Add 3rd operand to
	ANNOTATE_EXPR.

testsuite/ChangeLog:
	* c-c++-common/unroll-1.c: New test.
	* c-c++-common/unroll-2.c: Likewise.
	* c-c++-common/unroll-3.c: Likewise.
	* c-c++-common/unroll-4.c: Likewise.
	* gcc.dg/tree-prof/unroll-1.c: Use detailed dump and adjust scan.
	* gcc.dg/unroll-2.c (foo): Adjust message.
	(foo2): Likewise.
	* gcc.dg/unroll-3.c: Adjust scan.
	* gcc.dg/unroll-4.c: Likewise.
	* gcc.dg/unroll-5.c: Likewise.
	* gcc.dg/unroll-7.c: Use detailed dump and adjust scan.
	* gnat.dg/unroll1.ad[sb]: New test.
	* gnat.dg/unroll2.ad[sb]: Likewise.
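
For readers who have not used the pragma, here is a minimal usage sketch in C. It is an illustration only and is not taken from the patch or its testsuite. The function names are made up, and normalize_unroll is a hypothetical helper that only mirrors the range check visible in c_parser_pragma_unroll (negative or too-large values are rejected, and 0 is mapped to 1, i.e. "do not unroll"). A compiler without support for the pragma is assumed to ignore it, possibly with a warning.

```c
#include <limits.h>
#include <stddef.h>

/* Sum N ints.  The pragma asks GCC to unroll the loop 4 times; it must
   immediately precede a for, while or do statement.  */
long
sum_unroll4 (const int *a, size_t n)
{
  long s = 0;
#pragma GCC unroll 4
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Same loop, but a factor of 1 asks GCC not to unroll it.  */
long
sum_no_unroll (const int *a, size_t n)
{
  long s = 0;
#pragma GCC unroll 1
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Hypothetical helper, not part of the patch: mirrors the value handling
   of the pragma argument.  Returns 0 when the front end would report an
   error and drop the hint, 1 when 0 was requested, else the value.  */
unsigned short
normalize_unroll (long requested)
{
  if (requested < 0 || requested > USHRT_MAX)
    return 0;
  if (requested == 0)
    return 1;
  return (unsigned short) requested;
}
```

The pragma is only a hint about code generation, so both sum functions return the same result whatever factor is requested.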

 ada/gcc-interface/trans.c             |  25 ++-
 c-family/c-pragma.c                   |   4
 c-family/c-pragma.h                   |   1
 c/c-parser.c                          | 151 +++++++++++++++----
 cfgloop.h                             |   5
 cp/constexpr.c                        |   2
 cp/cp-array-notation.c                |   2
 cp/cp-tree.h                          |   9 -
 cp/init.c                             |   2
 cp/parser.c                           | 122 ++++++++++++---
 cp/pt.c                               |  16 +-
 cp/semantics.c                        |  42 ++++-
 doc/extend.texi                       |  12 +
 doc/generic.texi                      |   2
 fortran/trans-stmt.c                  |   5
 function.h                            |   5
 gimplify.c                            |   4
 loop-init.c                           |   6
 loop-unroll.c                         |  66 +++++---
 lto-streamer-in.c                     |   1
 lto-streamer-out.c                    |   1
 testsuite/c-c++-common/unroll-1.c     |  41 +++++
 testsuite/c-c++-common/unroll-2.c     |  41 +++++
 testsuite/c-c++-common/unroll-3.c     |  20 ++
 testsuite/c-c++-common/unroll-4.c     |  29 +++
 testsuite/gcc.dg/tree-prof/unroll-1.c |   4
 testsuite/gcc.dg/unroll-2.c           |   4
 testsuite/gcc.dg/unroll-3.c           |   2
 testsuite/gcc.dg/unroll-4.c           |   2
 testsuite/gcc.dg/unroll-5.c           |   2
 testsuite/gcc.dg/unroll-7.c           |   4
 testsuite/gnat.dg/unroll1.adb         |  27 +++
 testsuite/gnat.dg/unroll1.ads         |   9 +
 testsuite/gnat.dg/unroll2.adb         |  26 +++
 testsuite/gnat.dg/unroll2.ads         |   9 +
 tree-cfg.c                            |   8 +
 tree-core.h                           |   1
 tree-inline.c                         |   5
 tree-pretty-print.c                   |   4
 tree-ssa-loop-ivcanon.c               | 269 ++++++++++++++++++--------------
 tree.def                              |   5
 41 files changed, 755 insertions(+), 240 deletions(-)

Index: ada/gcc-interface/trans.c
===================================================================
--- ada/gcc-interface/trans.c (revision 254797)
+++ ada/gcc-interface/trans.c (working copy)
@@ -8506,17 +8506,30 @@ gnat_gimplify_stmt (tree *stmt_p)
             {
               /* Deal with the optimization hints. */
               if (LOOP_STMT_IVDEP (stmt))
-                gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
                                    build_int_cst (integer_type_node,
-                                                  annot_expr_ivdep_kind));
+                                                  annot_expr_ivdep_kind),
+                                   integer_zero_node);
+              if (LOOP_STMT_NO_UNROLL (stmt))
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                                   build_int_cst (integer_type_node,
+                                                  annot_expr_unroll_kind),
+                                   integer_one_node);
+              if (LOOP_STMT_UNROLL (stmt))
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                                   build_int_cst (integer_type_node,
+                                                  annot_expr_unroll_kind),
+                                   build_int_cst (NULL_TREE, USHRT_MAX));
               if (LOOP_STMT_NO_VECTOR (stmt))
-                gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
                                    build_int_cst (integer_type_node,
-                                                  annot_expr_no_vector_kind));
+                                                  annot_expr_no_vector_kind),
+                                   integer_zero_node);
               if (LOOP_STMT_VECTOR (stmt))
-                gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
                                    build_int_cst (integer_type_node,
-                                                  annot_expr_vector_kind));
+                                                  annot_expr_vector_kind),
+                                   integer_zero_node);
               gnu_cond
                 = build3 (COND_EXPR, void_type_node, gnu_cond, NULL_TREE,
Index: c/c-parser.c
===================================================================
--- c/c-parser.c (revision 254797)
+++ c/c-parser.c (working copy)
@@ -1408,9 +1408,9 @@ static tree c_parser_c99_block_statement
                                               location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, bool *);
-static void c_parser_do_statement (c_parser *, bool);
-static void c_parser_for_statement (c_parser *, bool, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -5495,13 +5495,13 @@ c_parser_statement_after_labels (c_parse
           c_parser_switch_statement (parser, if_p);
           break;
         case RID_WHILE:
-          c_parser_while_statement (parser, false, if_p);
+          c_parser_while_statement (parser, false, 0, if_p);
           break;
         case RID_DO:
-          c_parser_do_statement (parser, false);
+          c_parser_do_statement (parser, 0, false);
           break;
         case RID_FOR:
-          c_parser_for_statement (parser, false, if_p);
+          c_parser_for_statement (parser, false, 0, if_p);
           break;
         case RID_CILK_FOR:
           if (!flag_cilkplus)
@@ -6035,7 +6035,8 @@ c_parser_switch_statement (c_parser *par
    implement -Wparentheses. */
 static void
-c_parser_while_statement (c_parser *parser, bool ivdep, bool *if_p)
+c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+                          bool *if_p)
 {
   tree block, cond, body, save_break, save_cont;
   location_t loc;
@@ -6051,9 +6052,15 @@ c_parser_while_statement (c_parser *pars
                      "%<_Cilk_spawn%> statement cannot be used as a condition for while statement"))
     cond = error_mark_node;
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
                    build_int_cst (integer_type_node,
-                                  annot_expr_ivdep_kind));
+                                  annot_expr_ivdep_kind),
+                   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+                   build_int_cst (integer_type_node,
+                                  annot_expr_unroll_kind),
+                   build_int_cst (integer_type_node, unroll));
   save_break = c_break_label;
   c_break_label = NULL_TREE;
   save_cont = c_cont_label;
@@ -6088,7 +6095,7 @@ c_parser_while_statement (c_parser *pars
 */
 static void
-c_parser_do_statement (c_parser *parser, bool ivdep)
+c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree block, cond, body, save_break, save_cont, new_break, new_cont;
   location_t loc;
@@ -6116,9 +6123,16 @@ c_parser_do_statement (c_parser *parser,
                      "%<_Cilk_spawn%> statement cannot be used as a condition for a do-while statement"))
     cond = error_mark_node;
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+                   build_int_cst (integer_type_node,
+                                  annot_expr_ivdep_kind),
+                   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+                   build_int_cst (integer_type_node,
+                                  annot_expr_unroll_kind),
                    build_int_cst (integer_type_node,
-                                  annot_expr_ivdep_kind));
+                                  unroll));
   if (!c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
     c_parser_skip_to_end_of_block_or_statement (parser);
   c_finish_loop (loc, cond, NULL, body, new_break, new_cont, false);
@@ -6185,7 +6199,8 @@ c_parser_do_statement (c_parser *parser,
    implement -Wparentheses. */
 static void
-c_parser_for_statement (c_parser *parser, bool ivdep, bool *if_p)
+c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+                        bool *if_p)
 {
   tree block, cond, incr, save_break, save_cont, body;
   /* The following are only used when parsing an ObjC foreach statement. */
@@ -6306,6 +6321,12 @@ c_parser_for_statement (c_parser *parser
                               "% pragma");
               cond = error_mark_node;
             }
+          else if (unroll)
+            {
+              c_parser_error (parser, "missing loop condition in loop with "
+                              "% pragma");
+              cond = error_mark_node;
+            }
           else
             {
               c_parser_consume_token (parser);
@@ -6323,9 +6344,15 @@ c_parser_for_statement (c_parser *parser
                                  "expected %<;%>");
             }
           if (ivdep && cond != error_mark_node)
-            cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+            cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
                            build_int_cst (integer_type_node,
-                                          annot_expr_ivdep_kind));
+                                          annot_expr_ivdep_kind),
+                           integer_zero_node);
+          if (unroll && cond != error_mark_node)
+            cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+                           build_int_cst (integer_type_node,
+                                          annot_expr_unroll_kind),
+                           build_int_cst (integer_type_node, unroll));
         }
       /* Parse the increment expression (the third expression in a
          for-statement). In the case of a foreach-statement, this is
@@ -11035,6 +11062,45 @@ c_parser_objc_at_dynamic_declaration (c_
 }
+static bool
+c_parse_pragma_ivdep (c_parser *parser)
+{
+  c_parser_consume_pragma (parser);
+  c_parser_skip_to_pragma_eol (parser);
+  return true;
+}
+
+static unsigned short
+c_parser_pragma_unroll (c_parser *parser)
+{
+  unsigned short unroll;
+  c_parser_consume_pragma (parser);
+  location_t location = c_parser_peek_token (parser)->location;
+  tree expr = c_parser_expr_no_commas (parser, NULL).value;
+  mark_exp_read (expr);
+  expr = c_fully_fold (expr, false, NULL);
+  HOST_WIDE_INT lunroll = 0;
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+      || TREE_CODE (expr) != INTEGER_CST
+      || (lunroll = tree_to_shwi (expr)) < 0
+      || lunroll > USHRT_MAX)
+    {
+      error_at (location, "%<#pragma GCC unroll%> requires an"
+                " assignment-expression that evaluates to a non-negative"
+                " integral constant less than or equal to %u", USHRT_MAX);
+      unroll = 0;
+    }
+  else
+    {
+      unroll = (unsigned short)lunroll;
+      if (unroll == 0)
+        unroll = 1;
+    }
+
+  c_parser_skip_to_pragma_eol (parser);
+  return unroll;
+}
+
 /* Handle pragmas. Some OpenMP pragmas are associated with, and therefore
    should be considered, statements. ALLOW_STMT is true if we're within
    the context of a function and such pragmas are to be allowed. Returns
@@ -11177,21 +11243,46 @@ c_parser_pragma (c_parser *parser, enum
       return c_parser_omp_ordered (parser, context, if_p);
     case PRAGMA_IVDEP:
-      c_parser_consume_pragma (parser);
-      c_parser_skip_to_pragma_eol (parser);
-      if (!c_parser_next_token_is_keyword (parser, RID_FOR)
-          && !c_parser_next_token_is_keyword (parser, RID_WHILE)
-          && !c_parser_next_token_is_keyword (parser, RID_DO))
-        {
-          c_parser_error (parser, "for, while or do statement expected");
-          return false;
-        }
-      if (c_parser_next_token_is_keyword (parser, RID_FOR))
-        c_parser_for_statement (parser, true, if_p);
-      else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
-        c_parser_while_statement (parser, true, if_p);
-      else
-        c_parser_do_statement (parser, true);
+      {
+        bool ivdep = c_parse_pragma_ivdep (parser);
+        unsigned short unroll = 0;
+        if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL)
+          unroll = c_parser_pragma_unroll (parser);
+        if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+            && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+            && !c_parser_next_token_is_keyword (parser, RID_DO))
+          {
+            c_parser_error (parser, "for, while or do statement expected");
+            return false;
+          }
+        if (c_parser_next_token_is_keyword (parser, RID_FOR))
+          c_parser_for_statement (parser, ivdep, unroll, if_p);
+        else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+          c_parser_while_statement (parser, ivdep, unroll, if_p);
+        else
+          c_parser_do_statement (parser, ivdep, unroll);
+      }
+      return false;
+    case PRAGMA_UNROLL:
+      {
+        unsigned short unroll = c_parser_pragma_unroll (parser);
+        bool ivdep = false;
+        if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP)
+          ivdep = c_parse_pragma_ivdep (parser);
+        if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+            && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+            && !c_parser_next_token_is_keyword (parser, RID_DO))
+          {
+            c_parser_error (parser, "for, while or do statement expected");
+            return false;
+          }
+        if (c_parser_next_token_is_keyword (parser, RID_FOR))
+          c_parser_for_statement (parser, ivdep, unroll, if_p);
+        else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+          c_parser_while_statement (parser, ivdep, unroll, if_p);
+        else
+          c_parser_do_statement (parser, ivdep, unroll);
+      }
       return false;
     case PRAGMA_GCC_PCH_PREPROCESS:
Index: c-family/c-pragma.c
===================================================================
--- c-family/c-pragma.c (revision 254797)
+++ c-family/c-pragma.c (working copy)
@@ -1544,6 +1544,10 @@ init_pragma (void)
     cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
                                   false);
+  if (!flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
+                                  false, false);
+
   if (flag_cilkplus)
     cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
                                   PRAGMA_CILK_GRAINSIZE, true, false);
Index: c-family/c-pragma.h
===================================================================
--- c-family/c-pragma.h (revision 254797)
+++ c-family/c-pragma.h (working copy)
@@ -75,6 +75,7 @@ enum pragma_kind {
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
+  PRAGMA_UNROLL,
   PRAGMA_FIRST_EXTERNAL
 };
Index: cfgloop.h
===================================================================
--- cfgloop.h (revision 254797)
+++ cfgloop.h (working copy)
@@ -221,6 +221,11 @@ struct GTY ((chain_next ("%h.next"))) lo
   /* True if the loop is part of an oacc kernels region. */
   unsigned in_oacc_kernels_region : 1;
 
+  /* The number of times to unroll the loop. 0, means no information
+     given, just do what we always do. A value of 1, means don't unroll
+     the loop. */
+  unsigned short unroll;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins. */
Index: cp/constexpr.c
===================================================================
--- cp/constexpr.c (revision 254797)
+++ cp/constexpr.c (working copy)
@@ -4631,7 +4631,6 @@ cxx_eval_constant_expression (const cons
       return t;
     case ANNOTATE_EXPR:
-      gcc_assert (tree_to_uhwi (TREE_OPERAND (t, 1)) == annot_expr_ivdep_kind);
       r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0),
                                         lval,
                                         non_constant_p, overflow_p,
@@ -5879,7 +5878,6 @@ potential_constant_expression_1 (tree t,
       }
     case ANNOTATE_EXPR:
-      gcc_assert (tree_to_uhwi (TREE_OPERAND (t, 1)) == annot_expr_ivdep_kind);
       return RECUR (TREE_OPERAND (t, 0), rval);
     default:
Index: cp/cp-array-notation.c
===================================================================
--- cp/cp-array-notation.c (revision 254797)
+++ cp/cp-array-notation.c (working copy)
@@ -67,7 +67,7 @@ create_an_loop (tree init, tree cond, tr
   finish_expr_stmt (init);
   for_stmt = begin_for_stmt (NULL_TREE, NULL_TREE);
   finish_init_stmt (for_stmt);
-  finish_for_cond (cond, for_stmt, false);
+  finish_for_cond (cond, for_stmt, false, 0);
   finish_for_expr (incr, for_stmt);
   finish_expr_stmt (body);
   finish_for_stmt (for_stmt);
Index: cp/cp-tree.h
===================================================================
--- cp/cp-tree.h (revision 254797)
+++ cp/cp-tree.h (working copy)
@@ -6402,7 +6402,8 @@ extern tree implicitly_declare_fn
 extern bool maybe_clone_body (tree);
 /* In parser.c */
-extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool);
+extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
+                                  unsigned short);
 extern bool parsing_nsdmi (void);
 extern bool parsing_default_capturing_generic_lambda_in_template (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
@@ -6687,16 +6688,16 @@ extern void begin_else_clause (tree);
 extern void finish_else_clause (tree);
 extern void finish_if_stmt (tree);
 extern tree begin_while_stmt (void);
-extern void finish_while_stmt_cond (tree, tree, bool);
+extern void finish_while_stmt_cond (tree, tree, bool, unsigned short);
 extern void finish_while_stmt (tree);
 extern tree begin_do_stmt (void);
 extern void finish_do_body (tree);
-extern void finish_do_stmt (tree, tree, bool);
+extern void finish_do_stmt (tree, tree, bool, unsigned short);
 extern tree finish_return_stmt (tree);
 extern tree begin_for_scope (tree *);
 extern tree begin_for_stmt (tree, tree);
 extern void finish_init_stmt (tree);
-extern void finish_for_cond (tree, tree, bool);
+extern void finish_for_cond (tree, tree, bool, unsigned short);
 extern void finish_for_expr (tree, tree);
 extern void finish_for_stmt (tree);
 extern tree begin_range_for_stmt (tree, tree);
Index: cp/init.c
===================================================================
--- cp/init.c (revision 254797)
+++ cp/init.c (working copy)
@@ -4319,7 +4319,7 @@ build_vec_init (tree base, tree maxindex
       finish_init_stmt (for_stmt);
       finish_for_cond (build2 (GT_EXPR, boolean_type_node, iterator,
                                build_int_cst (TREE_TYPE (iterator), -1)),
-                       for_stmt, false);
+                       for_stmt, false, 0);
       elt_init = cp_build_unary_op (PREDECREMENT_EXPR, iterator, false,
                                     complain);
       if (elt_init == error_mark_node)
Index: cp/parser.c
===================================================================
--- cp/parser.c (revision 254797)
+++ cp/parser.c (working copy)
@@ -2119,15 +2119,15 @@ static tree cp_parser_selection_statemen
 static tree cp_parser_condition
   (cp_parser *);
 static tree cp_parser_iteration_statement
-  (cp_parser *, bool *, bool);
+  (cp_parser *, bool *, bool, unsigned short);
 static bool cp_parser_init_statement
   (cp_parser *, tree *decl);
 static tree cp_parser_for
-  (cp_parser *, bool);
+  (cp_parser *, bool, unsigned short);
 static tree cp_parser_c_for
-  (cp_parser *, tree, tree, bool);
+  (cp_parser *, tree, tree, bool, unsigned short);
 static tree cp_parser_range_for
-  (cp_parser *, tree, tree, tree, bool);
+  (cp_parser *, tree, tree, tree, bool, unsigned short);
 static void do_range_for_auto_deduction
   (tree, tree);
 static tree cp_parser_perform_range_for_lookup
@@ -10875,7 +10875,7 @@ cp_parser_statement (cp_parser* parser,
         case RID_WHILE:
         case RID_DO:
         case RID_FOR:
-          statement = cp_parser_iteration_statement (parser, if_p, false);
+          statement = cp_parser_iteration_statement (parser, if_p, false, 0);
           break;
         case RID_CILK_FOR:
@@ -11742,7 +11742,7 @@ cp_parser_condition (cp_parser* parser)
    not included. */
 static tree
-cp_parser_for (cp_parser *parser, bool ivdep)
+cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree init, scope, decl;
   bool is_range_for;
@@ -11754,13 +11754,14 @@ cp_parser_for (cp_parser *parser, bool i
   is_range_for = cp_parser_init_statement (parser, &decl);
   if (is_range_for)
-    return cp_parser_range_for (parser, scope, init, decl, ivdep);
+    return cp_parser_range_for (parser, scope, init, decl, ivdep, unroll);
   else
-    return cp_parser_c_for (parser, scope, init, ivdep);
+    return cp_parser_c_for (parser, scope, init, ivdep, unroll);
 }
 static tree
-cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep)
+cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
+                 unsigned short unroll)
 {
   /* Normal for loop */
   tree condition = NULL_TREE;
@@ -11781,7 +11782,13 @@ cp_parser_c_for (cp_parser *parser, tree
                        "% pragma");
       condition = error_mark_node;
     }
-  finish_for_cond (condition, stmt, ivdep);
+  else if (unroll)
+    {
+      cp_parser_error (parser, "missing loop condition in loop with "
+                       "% pragma");
+      condition = error_mark_node;
+    }
+  finish_for_cond (condition, stmt, ivdep, unroll);
   /* Look for the `;'. */
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
@@ -11805,7 +11812,7 @@ cp_parser_c_for (cp_parser *parser, tree
 static tree
 cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
-                     bool ivdep)
+                     bool ivdep, unsigned short unroll)
 {
   tree stmt, range_expr;
   auto_vec bindings;
@@ -11874,6 +11881,8 @@ cp_parser_range_for (cp_parser *parser,
       stmt = begin_range_for_stmt (scope, init);
       if (ivdep)
         RANGE_FOR_IVDEP (stmt) = 1;
+      if (unroll)
+        /* TODO */(void)0;
       finish_range_for_decl (stmt, range_decl, range_expr);
       if (!type_dependent_expression_p (range_expr)
           /* do_auto_deduction doesn't mess with template init-lists. */
@@ -11884,7 +11893,8 @@ cp_parser_range_for (cp_parser *parser,
     {
       stmt = begin_for_stmt (scope, init);
       stmt = cp_convert_range_for (stmt, range_decl, range_expr,
-                                   decomp_first_name, decomp_cnt, ivdep);
+                                   decomp_first_name, decomp_cnt, ivdep,
+                                   unroll);
     }
   return stmt;
 }
@@ -11978,7 +11988,7 @@ do_range_for_auto_deduction (tree decl,
 tree
 cp_convert_range_for (tree statement, tree range_decl, tree range_expr,
                       tree decomp_first_name, unsigned int decomp_cnt,
-                      bool ivdep)
+                      bool ivdep, unsigned short unroll)
 {
   tree begin, end;
   tree iter_type, begin_expr, end_expr;
@@ -12039,7 +12049,7 @@ cp_convert_range_for (tree statement, tr
                                  begin, ERROR_MARK,
                                  end, ERROR_MARK,
                                  NULL, tf_warning_or_error);
-  finish_for_cond (condition, statement, ivdep);
+  finish_for_cond (condition, statement, ivdep, unroll);
   /* The new increment expression. */
   expression = finish_unary_op_expr (input_location,
@@ -12214,7 +12224,8 @@ cp_parser_range_for_member_function (tre
    Returns the new WHILE_STMT, DO_STMT, FOR_STMT or RANGE_FOR_STMT. */
 static tree
-cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep)
+cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
+                               unsigned short unroll)
 {
   cp_token *token;
   enum rid keyword;
@@ -12248,7 +12259,7 @@ cp_parser_iteration_statement (cp_parser
         parens.require_open (parser);
         /* Parse the condition. */
         condition = cp_parser_condition (parser);
-        finish_while_stmt_cond (condition, statement, ivdep);
+        finish_while_stmt_cond (condition, statement, ivdep, unroll);
         /* Look for the `)'. */
         parens.require_close (parser);
         /* Parse the dependent statement. */
@@ -12279,7 +12290,7 @@ cp_parser_iteration_statement (cp_parser
         /* Parse the expression. */
         expression = cp_parser_expression (parser);
         /* We're done with the do-statement. */
-        finish_do_stmt (expression, statement, ivdep);
+        finish_do_stmt (expression, statement, ivdep, unroll);
         /* Look for the `)'. */
         parens.require_close (parser);
         /* Look for the `;'. */
@@ -12293,7 +12304,7 @@ cp_parser_iteration_statement (cp_parser
         matching_parens parens;
         parens.require_open (parser);
-        statement = cp_parser_for (parser, ivdep);
+        statement = cp_parser_for (parser, ivdep, unroll);
         /* Look for the `)'. */
         parens.require_close (parser);
@@ -38672,6 +38683,41 @@ cp_parser_cilk_grainsize (cp_parser *par
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 }
+static bool
+cp_parser_pragma_ivdep (cp_parser *parser, cp_token *pragma_tok)
+{
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  return true;
+}
+
+static unsigned short
+cp_parser_pragma_unroll (cp_parser *parser, cp_token *pragma_tok)
+{
+  location_t location = cp_lexer_peek_token (parser->lexer)->location;
+  tree expr = cp_parser_constant_expression (parser);
+  unsigned short unroll;
+  expr = maybe_constant_value (expr);
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  HOST_WIDE_INT lunroll = 0;
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+      || TREE_CODE (expr) != INTEGER_CST
+      || (lunroll = tree_to_shwi (expr)) < 0
+      || lunroll > USHRT_MAX)
+    {
+      error_at (location, "%<#pragma GCC unroll%> requires an"
+                " assignment-expression that evaluates to a non-negative"
+                " integral constant less than or equal to %u", USHRT_MAX);
+      unroll = 0;
+    }
+  else
+    {
+      unroll = (unsigned short)lunroll;
+      if (unroll == 0)
+        unroll = 1;
+    }
+  return unroll;
+}
+
 /* Normal parsing of a pragma token. Here we can (and must) use the
    regular lexer. */
@@ -38914,9 +38960,45 @@ cp_parser_pragma (cp_parser *parser, enu
                       "%<#pragma GCC ivdep%> must be inside a function");
             break;
           }
-        cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+        bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
+        unsigned short unroll = 0;
         cp_token *tok;
         tok = cp_lexer_peek_token (the_parser->lexer);
+        if (tok->type == CPP_PRAGMA
+            && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
+          {
+            unroll = cp_parser_pragma_unroll (parser, pragma_tok);
+            tok = cp_lexer_peek_token (the_parser->lexer);
+          }
+        if (tok->type != CPP_KEYWORD
+            || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE
+                && tok->keyword != RID_DO))
+          {
+            cp_parser_error (parser, "for, while or do statement expected");
+            return false;
+          }
+        cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
+        return true;
+      }
+
+    case PRAGMA_UNROLL:
+      {
+        if (context == pragma_external)
+          {
+            error_at (pragma_tok->location,
+                      "%<#pragma GCC unroll%> must be inside a function");
+            break;
+          }
+        unsigned short unroll = cp_parser_pragma_unroll (parser, pragma_tok);
+        bool ivdep = false;
+        cp_token *tok;
+        tok = cp_lexer_peek_token (the_parser->lexer);
+        if (tok->type == CPP_PRAGMA
+            && cp_parser_pragma_kind (tok) == PRAGMA_IVDEP)
+          {
+            ivdep = cp_parser_pragma_ivdep (parser, tok);
+            tok = cp_lexer_peek_token (the_parser->lexer);
+          }
         if (tok->type != CPP_KEYWORD
             || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE
                 && tok->keyword != RID_DO))
@@ -38924,7 +39006,7 @@ cp_parser_pragma (cp_parser *parser, enu
             cp_parser_error (parser, "for, while or do statement expected");
             return false;
           }
-        cp_parser_iteration_statement (parser, if_p, true);
+        cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
         return true;
       }
Index: cp/pt.c
===================================================================
--- cp/pt.c (revision 254797)
+++ cp/pt.c (working copy)
@@ -16090,7 +16090,7 @@ tsubst_expr (tree t, tree args, tsubst_f
       RECUR (FOR_INIT_STMT (t));
       finish_init_stmt (stmt);
       tmp = RECUR (FOR_COND (t));
-      finish_for_cond (tmp, stmt, false);
+      finish_for_cond (tmp, stmt, false, 0);
       tmp = RECUR (FOR_EXPR (t));
       finish_for_expr (tmp, stmt);
       RECUR (FOR_BODY (t));
@@ -16112,11 +16112,11 @@ tsubst_expr (tree t, tree args, tsubst_f
             decl = tsubst_decomp_names (decl, RANGE_FOR_DECL (t), args,
                                         complain, in_decl, &first, &cnt);
             stmt = cp_convert_range_for (stmt, decl, expr, first, cnt,
-                                         RANGE_FOR_IVDEP (t));
+                                         RANGE_FOR_IVDEP (t), 0);
           }
         else
           stmt = cp_convert_range_for (stmt, decl, expr, NULL_TREE, 0,
-                                       RANGE_FOR_IVDEP (t));
+                                       RANGE_FOR_IVDEP (t), 0);
         RECUR (RANGE_FOR_BODY (t));
         finish_for_stmt (stmt);
       }
@@ -16125,7 +16125,7 @@ tsubst_expr (tree t, tree args, tsubst_f
     case WHILE_STMT:
       stmt = begin_while_stmt ();
       tmp = RECUR (WHILE_COND (t));
-      finish_while_stmt_cond (tmp, stmt, false);
+      finish_while_stmt_cond (tmp, stmt, false, 0);
       RECUR (WHILE_BODY (t));
       finish_while_stmt (stmt);
       break;
@@ -16135,7 +16135,7 @@ tsubst_expr (tree t, tree args, tsubst_f
       RECUR (DO_BODY (t));
       finish_do_body (stmt);
       tmp = RECUR (DO_COND (t));
-      finish_do_stmt (tmp, stmt, false);
+      finish_do_stmt (tmp, stmt, false, 0);
       break;
     case IF_STMT:
@@ -16699,8 +16699,10 @@ tsubst_expr (tree t, tree args, tsubst_f
     case ANNOTATE_EXPR:
       tmp = RECUR (TREE_OPERAND (t, 0));
-      RETURN (build2_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
-                          TREE_TYPE (tmp), tmp, RECUR (TREE_OPERAND (t, 1))));
+      RETURN (build3_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
+                          TREE_TYPE (tmp), tmp,
+                          RECUR (TREE_OPERAND (t, 1)),
+                          RECUR (TREE_OPERAND (t, 2))));
     default:
       gcc_assert (!STATEMENT_CODE_P (TREE_CODE (t)));
Index: cp/semantics.c
===================================================================
--- cp/semantics.c (revision 254797)
+++ cp/semantics.c (working copy)
@@ -802,7 +802,8 @@ begin_while_stmt (void)
    WHILE_STMT. */
 void
-finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep)
+finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep,
+                        unsigned short unroll)
 {
   if (check_no_cilk (cond,
       "Cilk array notation cannot be used as a condition for while statement",
@@ -812,11 +813,20 @@ finish_while_stmt_cond (tree cond, tree
   finish_cond (&WHILE_COND (while_stmt), cond);
   begin_maybe_infinite_loop (cond);
   if (ivdep && cond != error_mark_node)
-    WHILE_COND (while_stmt) = build2 (ANNOTATE_EXPR,
+    WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR,
                                       TREE_TYPE (WHILE_COND (while_stmt)),
                                       WHILE_COND (while_stmt),
                                       build_int_cst (integer_type_node,
-                                                     annot_expr_ivdep_kind));
+                                                     annot_expr_ivdep_kind),
+                                      integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR,
+                                      TREE_TYPE (WHILE_COND (while_stmt)),
+                                      WHILE_COND (while_stmt),
+                                      build_int_cst (integer_type_node,
+                                                     annot_expr_unroll_kind),
+                                      build_int_cst (integer_type_node,
+                                                     unroll));
   simplify_loop_decl_cond (&WHILE_COND (while_stmt), WHILE_BODY (while_stmt));
 }
@@ -861,7 +871,7 @@ finish_do_body (tree do_stmt)
    COND is as indicated.
*/ void -finish_do_stmt (tree cond, tree do_stmt, bool ivdep) +finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll) { if (check_no_cilk (cond, "Cilk array notation cannot be used as a condition for a do-while statement", @@ -870,8 +880,13 @@ finish_do_stmt (tree cond, tree do_stmt, cond = maybe_convert_cond (cond); end_maybe_infinite_loop (cond); if (ivdep && cond != error_mark_node) - cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, - build_int_cst (integer_type_node, annot_expr_ivdep_kind)); + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, annot_expr_ivdep_kind), + integer_zero_node); + if (unroll && cond != error_mark_node) + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, annot_expr_unroll_kind), + build_int_cst (integer_type_node, unroll)); DO_COND (do_stmt) = cond; } @@ -980,7 +995,7 @@ finish_init_stmt (tree for_stmt) FOR_STMT. */ void -finish_for_cond (tree cond, tree for_stmt, bool ivdep) +finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll) { if (check_no_cilk (cond, "Cilk array notation cannot be used in a condition for a for-loop", @@ -990,11 +1005,20 @@ finish_for_cond (tree cond, tree for_stm finish_cond (&FOR_COND (for_stmt), cond); begin_maybe_infinite_loop (cond); if (ivdep && cond != error_mark_node) - FOR_COND (for_stmt) = build2 (ANNOTATE_EXPR, + FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR, TREE_TYPE (FOR_COND (for_stmt)), FOR_COND (for_stmt), build_int_cst (integer_type_node, - annot_expr_ivdep_kind)); + annot_expr_ivdep_kind), + integer_zero_node); + if (unroll && cond != error_mark_node) + FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR, + TREE_TYPE (FOR_COND (for_stmt)), + FOR_COND (for_stmt), + build_int_cst (integer_type_node, + annot_expr_unroll_kind), + build_int_cst (integer_type_node, + unroll)); simplify_loop_decl_cond (&FOR_COND (for_stmt), FOR_BODY (for_stmt)); } Index: doc/extend.texi 
=================================================================== --- doc/extend.texi (revision 254797) +++ doc/extend.texi (working copy) @@ -22376,6 +22376,18 @@ void ignore_vec_dep (int *a, int k, int @} @end smallexample +@table @code +@item #pragma GCC unroll @var{n} +@cindex pragma GCC unroll @var{n} + +With this pragma, the programmer informs the optimizer how many times +a loop should be unrolled. A value of 0 or 1 tells the compiler not to +perform any loop unrolling. The pragma must be immediately before +@samp{#pragma GCC ivdep} or a @code{for}, @code{while} or @code{do} loop +and applies only to the loop that follows. @var{n} is an +assignment-expression that evaluates to an integer constant. + +@end table @node Unnamed Fields @section Unnamed Structure and Union Fields Index: doc/generic.texi =================================================================== --- doc/generic.texi (revision 254797) +++ doc/generic.texi (working copy) @@ -1686,7 +1686,7 @@ its sole argument yields the representat @item ANNOTATE_EXPR This node is used to attach markers to an expression. The first operand is the annotated expression, the second is an @code{INTEGER_CST} with -a value from @code{enum annot_expr_kind}. +a value from @code{enum annot_expr_kind}, and the third is an @code{INTEGER_CST}. 
@end table Index: fortran/trans-stmt.c =================================================================== --- fortran/trans-stmt.c (revision 254797) +++ fortran/trans-stmt.c (working copy) @@ -3453,9 +3453,10 @@ gfc_trans_forall_loop (forall_info *fora cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, count, build_int_cst (TREE_TYPE (count), 0)); if (forall_tmp->do_concurrent) - cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, build_int_cst (integer_type_node, - annot_expr_ivdep_kind)); + annot_expr_ivdep_kind), + integer_zero_node); tmp = build1_v (GOTO_EXPR, exit_label); tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, Index: function.h =================================================================== --- function.h (revision 254797) +++ function.h (working copy) @@ -385,8 +385,11 @@ struct GTY(()) function { nonzero value in loop->simduid. */ unsigned int has_simduid_loops : 1; - /* Set when the tail call has been identified. */ + /* Nonzero when the tail call has been identified. */ unsigned int tail_call_marked : 1; + + /* Nonzero if the current function contains a #pragma GCC unroll. */ + unsigned int has_unroll : 1; }; /* Add the decl D to the local_decls list of FUN. 
*/ Index: gimplify.c =================================================================== --- gimplify.c (revision 254797) +++ gimplify.c (working copy) @@ -3747,6 +3747,7 @@ gimple_boolify (tree expr) switch ((enum annot_expr_kind) TREE_INT_CST_LOW (TREE_OPERAND (expr, 1))) { case annot_expr_ivdep_kind: + case annot_expr_unroll_kind: case annot_expr_no_vector_kind: case annot_expr_vector_kind: TREE_OPERAND (expr, 0) = gimple_boolify (TREE_OPERAND (expr, 0)); @@ -11389,6 +11390,7 @@ gimplify_expr (tree *expr_p, gimple_seq { tree cond = TREE_OPERAND (*expr_p, 0); tree kind = TREE_OPERAND (*expr_p, 1); + tree data = TREE_OPERAND (*expr_p, 2); tree type = TREE_TYPE (cond); if (!INTEGRAL_TYPE_P (type)) { @@ -11399,7 +11401,7 @@ gimplify_expr (tree *expr_p, gimple_seq tree tmp = create_tmp_var (type); gimplify_arg (&cond, pre_p, EXPR_LOCATION (*expr_p)); gcall *call - = gimple_build_call_internal (IFN_ANNOTATE, 2, cond, kind); + = gimple_build_call_internal (IFN_ANNOTATE, 3, cond, kind, data); gimple_call_set_lhs (call, tmp); gimplify_seq_add_stmt (pre_p, call); *expr_p = tmp; Index: loop-init.c =================================================================== --- loop-init.c (revision 254797) +++ loop-init.c (working copy) @@ -361,8 +361,8 @@ pass_loop2::gate (function *fun) && (flag_move_loop_invariants || flag_unswitch_loops || flag_unroll_loops - || (flag_branch_on_count_reg - && targetm.have_doloop_end ()))) + || (flag_branch_on_count_reg && targetm.have_doloop_end ()) + || cfun->has_unroll)) return true; else { @@ -560,7 +560,7 @@ public: /* opt_pass methods: */ virtual bool gate (function *) { - return (flag_unroll_loops || flag_unroll_all_loops); + return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll); } virtual unsigned int execute (function *); Index: loop-unroll.c =================================================================== --- loop-unroll.c (revision 254797) +++ loop-unroll.c (working copy) @@ -212,7 +212,7 @@ report_unroll (struct 
loop *loop, locati /* Decide whether unroll loops and how much. */ static void -decide_unrolling (int flags) +decide_unrolling (int base_flags) { struct loop *loop; @@ -224,9 +224,16 @@ decide_unrolling (int flags) if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, locus, - ";; *** Considering loop %d at BB %d for " - "unrolling ***\n", - loop->num, loop->header->index); + "considering unrolling loop %d at BB %d\n", + loop->num, loop->header->index); + + if (loop->unroll == 1) + { + if (dump_file) + fprintf (dump_file, + ";; Not unrolling loop, user didn't want it unrolled\n"); + continue; + } /* Do not peel cold areas. */ if (optimize_loop_for_size_p (loop)) @@ -258,6 +265,9 @@ decide_unrolling (int flags) /* Try transformations one by one in decreasing order of priority. */ + int flags = base_flags; + if (loop->unroll > 1) + flags = UAP_UNROLL | UAP_UNROLL_ALL; decide_unroll_constant_iterations (loop, flags); if (loop->lpt_decision.decision == LPT_NONE) @@ -353,13 +363,13 @@ decide_unroll_constant_iterations (struc return; } - if (dump_file) - fprintf (dump_file, - "\n;; Considering unrolling loop with constant " - "number of iterations\n"); + if (dump_enabled_p ()) + dump_printf (MSG_NOTE, + "considering unrolling loop with constant " + "number of iterations\n"); /* nunroll = total number of copies of the original loop body in - unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */ + unrolled loop (i.e. if it is 2, we have to duplicate loop body once). */ nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns; nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns; @@ -391,6 +401,14 @@ decide_unroll_constant_iterations (struc return; } + /* Check for an explicit unrolling factor. */ + if (loop->unroll) + { + loop->lpt_decision.decision = LPT_UNROLL_CONSTANT; + loop->lpt_decision.times = MIN ((unsigned) loop->unroll - 1, desc->niter); + return; + } + /* Check whether the loop rolls enough to consider. 
Consult also loop bounds and profile; in the case the loop has more than one exit it may well loop less than determined maximal number @@ -657,10 +675,10 @@ decide_unroll_runtime_iterations (struct return; } - if (dump_file) - fprintf (dump_file, - "\n;; Considering unrolling loop with runtime " - "computable number of iterations\n"); + if (dump_enabled_p ()) + dump_printf (MSG_NOTE, + "considering unrolling loop with runtime-" + "computable number of iterations\n"); /* nunroll = total number of copies of the original loop body in unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */ @@ -674,6 +692,9 @@ decide_unroll_runtime_iterations (struct if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); + if (loop->unroll) + nunroll = loop->unroll; + /* Skip big loops. */ if (nunroll <= 1) { @@ -712,8 +733,9 @@ decide_unroll_runtime_iterations (struct return; } - /* Success; now force nunroll to be power of 2, as we are unable to - cope with overflows in computation of number of iterations. */ + /* Success; now force nunroll to be power of 2, as code-gen + requires it, we are unable to cope with overflows in + computation of number of iterations. */ for (i = 1; 2 * i <= nunroll; i *= 2) continue; @@ -824,9 +846,10 @@ compare_and_jump_seq (rtx op0, rtx op1, return seq; } -/* Unroll LOOP for which we are able to count number of iterations in runtime - LOOP->LPT_DECISION.TIMES times. The transformation does this (with some - extra care for case n < 0): +/* Unroll LOOP for which we are able to count number of iterations in + runtime LOOP->LPT_DECISION.TIMES times. The times value must be a + power of two. 
The transformation does this (with some extra care + for case n < 0): for (i = 0; i < n; i++) body; @@ -1139,8 +1162,8 @@ decide_unroll_stupid (struct loop *loop, return; } - if (dump_file) - fprintf (dump_file, "\n;; Considering unrolling loop stupidly\n"); + if (dump_enabled_p ()) + dump_printf (MSG_NOTE, "considering unrolling loop stupidly\n"); /* nunroll = total number of copies of the original loop body in unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */ @@ -1155,6 +1178,9 @@ decide_unroll_stupid (struct loop *loop, if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); + if (loop->unroll) + nunroll = loop->unroll; + /* Skip big loops. */ if (nunroll <= 1) { Index: lto-streamer-in.c =================================================================== --- lto-streamer-in.c (revision 254797) +++ lto-streamer-in.c (working copy) @@ -825,6 +825,7 @@ input_cfg (struct lto_input_block *ib, s /* Read OMP SIMD related info. */ loop->safelen = streamer_read_hwi (ib); + loop->unroll = streamer_read_hwi (ib); loop->dont_vectorize = streamer_read_hwi (ib); loop->force_vectorize = streamer_read_hwi (ib); loop->simduid = stream_read_tree (ib, data_in); Index: lto-streamer-out.c =================================================================== --- lto-streamer-out.c (revision 254797) +++ lto-streamer-out.c (working copy) @@ -1929,6 +1929,7 @@ output_cfg (struct output_block *ob, str /* Write OMP SIMD related info. 
*/ streamer_write_hwi (ob, loop->safelen); + streamer_write_hwi (ob, loop->unroll); streamer_write_hwi (ob, loop->dont_vectorize); streamer_write_hwi (ob, loop->force_vectorize); stream_write_tree (ob, loop->simduid, true); Index: testsuite/c-c++-common/unroll-1.c =================================================================== --- testsuite/c-c++-common/unroll-1.c (revision 0) +++ testsuite/c-c++-common/unroll-1.c (working copy) @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdisable-tree-cunroll -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */ + +extern void bar (int); + +int j; + +void test (void) +{ + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + /* { dg-final { scan-tree-dump "11:.*: note: loop with 8 iterations completely unrolled" "cunrolli" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 7; ++i) + bar(i); + /* { dg-final { scan-tree-dump "16:.*: note: loop with 7 iterations completely unrolled" "cunrolli" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 15; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "21:.*: note: loop unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 7 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */ + + unsigned long i = 0; + #pragma GCC unroll 3 + do { + bar(i); + } while (++i < 9); + /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */ +} Index: testsuite/c-c++-common/unroll-2.c =================================================================== --- testsuite/c-c++-common/unroll-2.c (revision 0) +++ testsuite/c-c++-common/unroll-2.c (working copy) @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 
-fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */ + +extern void bar (int); + +int j; + +void test (void) +{ + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + /* { dg-final { scan-tree-dump "11:.*: note: loop with 8 iterations completely unrolled" "cunrolli" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 7; ++i) + bar(i); + /* { dg-final { scan-tree-dump "16:.*: note: loop with 7 iterations completely unrolled" "cunrolli" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 15; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "21:.*: note: loop unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 7 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */ + + unsigned long i = 0; + #pragma GCC unroll 3 + do { + bar(i); + } while (++i < 9); + /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */ +} Index: testsuite/c-c++-common/unroll-3.c =================================================================== --- testsuite/c-c++-common/unroll-3.c (revision 0) +++ testsuite/c-c++-common/unroll-3.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */ + +extern void bar (int); + +int j; + +void test (void) +{ + #pragma GCC unroll 0 + for (unsigned long i = 1; i <= 3; ++i) + bar(i); + + #pragma GCC unroll 0 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + + /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */ + /* { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */ +} Index: 
testsuite/c-c++-common/unroll-4.c =================================================================== --- testsuite/c-c++-common/unroll-4.c (revision 0) +++ testsuite/c-c++-common/unroll-4.c (working copy) @@ -0,0 +1,29 @@ +/* { dg-do compile } */ + +extern void bar (int); + +int j; + +void test (void) +{ + #pragma GCC unroll 4+4 + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + + #pragma GCC unroll -1 /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */ + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + + #pragma GCC unroll 20000000000 /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */ + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + + #pragma GCC unroll j /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */ + /* { dg-error "cannot appear in a constant-expression|is not usable in a constant expression" "" { target c++ } 21 } */ + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + + #pragma GCC unroll 4.2 /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */ + for (unsigned long i = 1; i <= 8; ++i) + bar(i); +} Index: testsuite/gcc.dg/tree-prof/unroll-1.c =================================================================== --- testsuite/gcc.dg/tree-prof/unroll-1.c (revision 254797) +++ testsuite/gcc.dg/tree-prof/unroll-1.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-options "-O3 -fdump-rtl-loop2_unroll -funroll-loops -fno-peel-loops" } */ +/* { dg-options "-O3 -fdump-rtl-loop2_unroll-details -funroll-loops -fno-peel-loops" } */ void abort (); int a[1000]; @@ -20,4 +20,4 @@ main() t(); return 0; } -/* { dg-final-use { scan-rtl-dump "Considering unrolling loop with constant number of iterations" "loop2_unroll" } } */ +/* { dg-final-use { scan-rtl-dump "considering 
unrolling loop with constant number of iterations" "loop2_unroll" } } */ Index: testsuite/gcc.dg/unroll-2.c =================================================================== --- testsuite/gcc.dg/unroll-2.c (revision 254797) +++ testsuite/gcc.dg/unroll-2.c (working copy) @@ -15,7 +15,7 @@ int foo(void) { int i; bar(); - for (i = 0; i < 2; i++) /* { dg-message "note: loop with 3 iterations completely unrolled" } */ + for (i = 0; i < 2; i++) /* { dg-message "note: loop with 2 iterations completely unrolled" } */ { a[i]= b[i] + 1; } @@ -25,7 +25,7 @@ int foo(void) int foo2(void) { int i; - for (i = 0; i < 2; i++) /* { dg-message "note: loop with 3 iterations completely unrolled" } */ + for (i = 0; i < 2; i++) /* { dg-message "note: loop with 2 iterations completely unrolled" } */ { a[i]= b[i] + 1; } Index: testsuite/gcc.dg/unroll-3.c =================================================================== --- testsuite/gcc.dg/unroll-3.c (revision 254797) +++ testsuite/gcc.dg/unroll-3.c (working copy) @@ -28,4 +28,4 @@ int foo2(void) return 1; } -/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */ +/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */ Index: testsuite/gcc.dg/unroll-4.c =================================================================== --- testsuite/gcc.dg/unroll-4.c (revision 254797) +++ testsuite/gcc.dg/unroll-4.c (working copy) @@ -28,4 +28,4 @@ int foo2(void) return 1; } -/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */ +/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */ Index: testsuite/gcc.dg/unroll-5.c =================================================================== --- testsuite/gcc.dg/unroll-5.c (revision 254797) +++ testsuite/gcc.dg/unroll-5.c (working copy) @@ -28,4 +28,4 @@ int foo2(void) return 1; } -/* { dg-final { scan-tree-dump-times "loop 
with 3 iterations completely unrolled" 1 "cunrolli" } } */ +/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */ Index: testsuite/gcc.dg/unroll-7.c =================================================================== --- testsuite/gcc.dg/unroll-7.c (revision 254797) +++ testsuite/gcc.dg/unroll-7.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -funroll-loops" } */ +/* { dg-options "-O2 -fdump-rtl-loop2_unroll-details -funroll-loops" } */ /* { dg-require-effective-target int32plus } */ extern int *a; @@ -14,5 +14,5 @@ int t(void) /* { dg-final { scan-rtl-dump "number of iterations: .const_int 999999" "loop2_unroll" } } */ /* { dg-final { scan-rtl-dump "upper bound: 999999" "loop2_unroll" } } */ /* { dg-final { scan-rtl-dump "realistic bound: 999999" "loop2_unroll" } } */ -/* { dg-final { scan-rtl-dump "Considering unrolling loop with constant number of iterations" "loop2_unroll" } } */ +/* { dg-final { scan-rtl-dump "considering unrolling loop with constant number of iterations" "loop2_unroll" } } */ /* { dg-final { scan-rtl-dump-not "Invalid sum" "loop2_unroll" } } */ Index: testsuite/gnat.dg/unroll1.adb =================================================================== --- testsuite/gnat.dg/unroll1.adb (revision 0) +++ testsuite/gnat.dg/unroll1.adb (working copy) @@ -0,0 +1,27 @@ +-- { dg-do compile } +-- { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } + +package body Unroll1 is + + function "+" (X, Y : Sarray) return Sarray is + R : Sarray; + begin + for I in Sarray'Range loop + pragma Loop_Optimize (No_Unroll); + R(I) := X(I) + Y(I); + end loop; + return R; + end; + + procedure Add (X, Y : Sarray; R : out Sarray) is + begin + for I in Sarray'Range loop + pragma Loop_Optimize (No_Unroll); + R(I) := X(I) + Y(I); + end loop; + end; + +end Unroll1; + +-- { dg-final { scan-tree-dump-times "Not unrolling loop .: 
user didn't want it unrolled completely" 2 "cunrolli" } } */ +-- { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */ Index: testsuite/gnat.dg/unroll1.ads =================================================================== --- testsuite/gnat.dg/unroll1.ads (revision 0) +++ testsuite/gnat.dg/unroll1.ads (working copy) @@ -0,0 +1,9 @@ +package Unroll1 is + + type Sarray is array (1 .. 4) of Float; + for Sarray'Alignment use 16; + + function "+" (X, Y : Sarray) return Sarray; + procedure Add (X, Y : Sarray; R : out Sarray); + +end Unroll1; Index: testsuite/gnat.dg/unroll2.adb =================================================================== --- testsuite/gnat.dg/unroll2.adb (revision 0) +++ testsuite/gnat.dg/unroll2.adb (working copy) @@ -0,0 +1,26 @@ +-- { dg-do compile } +-- { dg-options "-O2 -fdump-tree-cunrolli-details" } + +package body Unroll2 is + + function "+" (X, Y : Sarray) return Sarray is + R : Sarray; + begin + for I in Sarray'Range loop + pragma Loop_Optimize (Unroll); + R(I) := X(I) + Y(I); + end loop; + return R; + end; + + procedure Add (X, Y : Sarray; R : out Sarray) is + begin + for I in Sarray'Range loop + pragma Loop_Optimize (Unroll); + R(I) := X(I) + Y(I); + end loop; + end; + +end Unroll2; + +-- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunrolli" } } */ Index: testsuite/gnat.dg/unroll2.ads =================================================================== --- testsuite/gnat.dg/unroll2.ads (revision 0) +++ testsuite/gnat.dg/unroll2.ads (working copy) @@ -0,0 +1,9 @@ +package Unroll2 is + + type Sarray is array (1 .. 
4) of Float; + for Sarray'Alignment use 16; + + function "+" (X, Y : Sarray) return Sarray; + procedure Add (X, Y : Sarray; R : out Sarray); + +end Unroll2; Index: tree-cfg.c =================================================================== --- tree-cfg.c (revision 254797) +++ tree-cfg.c (working copy) @@ -280,6 +280,11 @@ replace_loop_annotate_in_block (basic_bl case annot_expr_ivdep_kind: loop->safelen = INT_MAX; break; + case annot_expr_unroll_kind: + loop->unroll + = (unsigned short) tree_to_shwi (gimple_call_arg (stmt, 2)); + cfun->has_unroll = true; + break; case annot_expr_no_vector_kind: loop->dont_vectorize = true; break; @@ -334,6 +339,7 @@ replace_loop_annotate (void) switch ((annot_expr_kind) tree_to_shwi (gimple_call_arg (stmt, 1))) { case annot_expr_ivdep_kind: + case annot_expr_unroll_kind: case annot_expr_no_vector_kind: case annot_expr_vector_kind: break; @@ -7991,6 +7997,8 @@ print_loop (FILE *file, struct loop *loo fprintf (file, ", estimate = "); print_decu (loop->nb_iterations_estimate, file); } + if (loop->unroll) + fprintf (file, ", unroll = %d", loop->unroll); fprintf (file, ")\n"); /* Print loop's body. 
*/ Index: tree-core.h =================================================================== --- tree-core.h (revision 254797) +++ tree-core.h (working copy) @@ -851,6 +851,7 @@ enum tree_node_kind { enum annot_expr_kind { annot_expr_ivdep_kind, + annot_expr_unroll_kind, annot_expr_no_vector_kind, annot_expr_vector_kind, annot_expr_kind_last Index: tree-inline.c =================================================================== --- tree-inline.c (revision 254797) +++ tree-inline.c (working copy) @@ -2596,6 +2596,11 @@ copy_loops (copy_body_data *id, flow_loop_tree_node_add (dest_parent, dest_loop); dest_loop->safelen = src_loop->safelen; + if (src_loop->unroll) + { + dest_loop->unroll = src_loop->unroll; + cfun->has_unroll = true; + } dest_loop->dont_vectorize = src_loop->dont_vectorize; if (src_loop->force_vectorize) { Index: tree-pretty-print.c =================================================================== --- tree-pretty-print.c (revision 254797) +++ tree-pretty-print.c (working copy) @@ -2632,6 +2632,10 @@ dump_generic_node (pretty_printer *pp, t case annot_expr_ivdep_kind: pp_string (pp, ", ivdep"); break; + case annot_expr_unroll_kind: + pp_printf (pp, ", unroll %d", + (int) TREE_INT_CST_LOW (TREE_OPERAND (node, 2))); + break; case annot_expr_no_vector_kind: pp_string (pp, ", no-vector"); break; Index: tree-ssa-loop-ivcanon.c =================================================================== --- tree-ssa-loop-ivcanon.c (revision 254797) +++ tree-ssa-loop-ivcanon.c (working copy) @@ -681,11 +681,9 @@ try_unroll_loop_completely (struct loop HOST_WIDE_INT maxiter, location_t locus) { - unsigned HOST_WIDE_INT n_unroll = 0, ninsns, unr_insns; - struct loop_size size; + unsigned HOST_WIDE_INT n_unroll = 0; bool n_unroll_found = false; edge edge_to_cancel = NULL; - dump_flags_t report_flags = MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS; /* See if we proved number of iterations to be low constant. 
@@ -726,7 +724,8 @@ try_unroll_loop_completely (struct loop
   if (!n_unroll_found)
     return false;
 
-  if (n_unroll > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES))
+  if (!loop->unroll
+      && n_unroll > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
         fprintf (dump_file, "Not unrolling loop %d "
@@ -740,121 +739,137 @@ try_unroll_loop_completely (struct loop
 
   if (n_unroll)
     {
-      bool large;
       if (ul == UL_SINGLE_ITER)
         return false;
 
-      /* EXIT can be removed only if we are sure it passes first N_UNROLL
-         iterations. */
-      bool remove_exit = (exit && niter
-                          && TREE_CODE (niter) == INTEGER_CST
-                          && wi::leu_p (n_unroll, wi::to_widest (niter)));
-
-      large = tree_estimate_loop_size
-                (loop, remove_exit ? exit : NULL, edge_to_cancel, &size,
-                 PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS));
-      ninsns = size.overall;
-      if (large)
+      if (loop->unroll)
         {
-          if (dump_file && (dump_flags & TDF_DETAILS))
-            fprintf (dump_file, "Not unrolling loop %d: it is too large.\n",
-                     loop->num);
-          return false;
+          /* If the unrolling factor is too large, bail out. */
+          if (n_unroll > (unsigned)loop->unroll)
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file,
+                         "Not unrolling loop %d: "
+                         "user didn't want it unrolled completely.\n",
+                         loop->num);
+              return false;
+            }
         }
-
-      unr_insns = estimated_unrolled_size (&size, n_unroll);
-      if (dump_file && (dump_flags & TDF_DETAILS))
+      else
         {
-          fprintf (dump_file, " Loop size: %d\n", (int) ninsns);
-          fprintf (dump_file, " Estimated size after unrolling: %d\n",
-                   (int) unr_insns);
-        }
+          struct loop_size size;
+          /* EXIT can be removed only if we are sure it passes first N_UNROLL
+             iterations. */
+          bool remove_exit = (exit && niter
+                              && TREE_CODE (niter) == INTEGER_CST
+                              && wi::leu_p (n_unroll, wi::to_widest (niter)));
+          bool large
+            = tree_estimate_loop_size
+                (loop, remove_exit ? exit : NULL, edge_to_cancel, &size,
+                 PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS));
+          if (large)
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file, "Not unrolling loop %d: it is too large.\n",
+                         loop->num);
+              return false;
+            }
 
-      /* If the code is going to shrink, we don't need to be extra cautious
-         on guessing if the unrolling is going to be profitable. */
-      if (unr_insns
-          /* If there is IV variable that will become constant, we save
-             one instruction in the loop prologue we do not account
-             otherwise. */
-          <= ninsns + (size.constant_iv != false))
-        ;
-      /* We unroll only inner loops, because we do not consider it profitable
-         otheriwse. We still can cancel loopback edge of not rolling loop;
-         this is always a good idea. */
-      else if (ul == UL_NO_GROWTH)
-        {
-          if (dump_file && (dump_flags & TDF_DETAILS))
-            fprintf (dump_file, "Not unrolling loop %d: size would grow.\n",
-                     loop->num);
-          return false;
-        }
-      /* Outer loops tend to be less interesting candidates for complete
-         unrolling unless we can do a lot of propagation into the inner loop
-         body. For now we disable outer loop unrolling when the code would
-         grow. */
-      else if (loop->inner)
-        {
-          if (dump_file && (dump_flags & TDF_DETAILS))
-            fprintf (dump_file, "Not unrolling loop %d: "
-                     "it is not innermost and code would grow.\n",
-                     loop->num);
-          return false;
-        }
-      /* If there is call on a hot path through the loop, then
-         there is most probably not much to optimize. */
-      else if (size.num_non_pure_calls_on_hot_path)
-        {
-          if (dump_file && (dump_flags & TDF_DETAILS))
-            fprintf (dump_file, "Not unrolling loop %d: "
-                     "contains call and code would grow.\n",
-                     loop->num);
-          return false;
-        }
-      /* If there is pure/const call in the function, then we
-         can still optimize the unrolled loop body if it contains
-         some other interesting code than the calls and code
-         storing or cumulating the return value. */
-      else if (size.num_pure_calls_on_hot_path
-               /* One IV increment, one test, one ivtmp store
-                  and one useful stmt. That is about minimal loop
-                  doing pure call. */
-               && (size.non_call_stmts_on_hot_path
-                   <= 3 + size.num_pure_calls_on_hot_path))
-        {
-          if (dump_file && (dump_flags & TDF_DETAILS))
-            fprintf (dump_file, "Not unrolling loop %d: "
-                     "contains just pure calls and code would grow.\n",
-                     loop->num);
-          return false;
-        }
-      /* Complete unrolling is a major win when control flow is removed and
-         one big basic block is created. If the loop contains control flow
-         the optimization may still be a win because of eliminating the loop
-         overhead but it also may blow the branch predictor tables.
-         Limit number of branches on the hot path through the peeled
-         sequence. */
-      else if (size.num_branches_on_hot_path * (int)n_unroll
-               > PARAM_VALUE (PARAM_MAX_PEEL_BRANCHES))
-        {
+          unsigned HOST_WIDE_INT ninsns = size.overall;
+          unsigned HOST_WIDE_INT unr_insns
+            = estimated_unrolled_size (&size, n_unroll);
           if (dump_file && (dump_flags & TDF_DETAILS))
-            fprintf (dump_file, "Not unrolling loop %d: "
-                     " number of branches on hot path in the unrolled sequence"
-                     " reach --param max-peel-branches limit.\n",
-                     loop->num);
-          return false;
-        }
-      else if (unr_insns
-               > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS))
-        {
-          if (dump_file && (dump_flags & TDF_DETAILS))
-            fprintf (dump_file, "Not unrolling loop %d: "
-                     "(--param max-completely-peeled-insns limit reached).\n",
-                     loop->num);
-          return false;
+            {
+              fprintf (dump_file, " Loop size: %d\n", (int) ninsns);
+              fprintf (dump_file, " Estimated size after unrolling: %d\n",
+                       (int) unr_insns);
+            }
+
+          /* If the code is going to shrink, we don't need to be extra
+             cautious on guessing if the unrolling is going to be
+             profitable. */
+          if (unr_insns
+              /* If there is IV variable that will become constant, we
+                 save one instruction in the loop prologue we do not
+                 account otherwise. */
+              <= ninsns + (size.constant_iv != false))
+            ;
+          /* We unroll only inner loops, because we do not consider it
+             profitable otheriwse. We still can cancel loopback edge
+             of not rolling loop; this is always a good idea. */
+          else if (ul == UL_NO_GROWTH)
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file, "Not unrolling loop %d: size would grow.\n",
+                         loop->num);
+              return false;
+            }
+          /* Outer loops tend to be less interesting candidates for
+             complete unrolling unless we can do a lot of propagation
+             into the inner loop body. For now we disable outer loop
+             unrolling when the code would grow. */
+          else if (loop->inner)
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file, "Not unrolling loop %d: "
+                         "it is not innermost and code would grow.\n",
+                         loop->num);
+              return false;
+            }
+          /* If there is call on a hot path through the loop, then
+             there is most probably not much to optimize. */
+          else if (size.num_non_pure_calls_on_hot_path)
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file, "Not unrolling loop %d: "
+                         "contains call and code would grow.\n",
+                         loop->num);
+              return false;
+            }
+          /* If there is pure/const call in the function, then we can
+             still optimize the unrolled loop body if it contains some
+             other interesting code than the calls and code storing or
+             cumulating the return value. */
+          else if (size.num_pure_calls_on_hot_path
+                   /* One IV increment, one test, one ivtmp store and
+                      one useful stmt. That is about minimal loop
+                      doing pure call. */
+                   && (size.non_call_stmts_on_hot_path
+                       <= 3 + size.num_pure_calls_on_hot_path))
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file, "Not unrolling loop %d: "
+                         "contains just pure calls and code would grow.\n",
+                         loop->num);
+              return false;
+            }
+          /* Complete unrolling is major win when control flow is
+             removed and one big basic block is created. If the loop
+             contains control flow the optimization may still be a win
+             because of eliminating the loop overhead but it also may
+             blow the branch predictor tables. Limit number of
+             branches on the hot path through the peeled sequence. */
+          else if (size.num_branches_on_hot_path * (int)n_unroll
+                   > PARAM_VALUE (PARAM_MAX_PEEL_BRANCHES))
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file, "Not unrolling loop %d: "
+                         "number of branches on hot path in the unrolled "
+                         "sequence reaches --param max-peel-branches limit.\n",
+                         loop->num);
+              return false;
+            }
+          else if (unr_insns
+                   > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS))
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file, "Not unrolling loop %d: "
+                         "number of insns in the unrolled sequence reaches "
+                         "--param max-completely-peeled-insns limit.\n",
+                         loop->num);
+              return false;
+            }
         }
-      if (!n_unroll)
-        dump_printf_loc (report_flags, locus,
-                         "loop turned into non-loop; it never loops.\n");
 
       initialize_original_copy_tables ();
       auto_sbitmap wont_exit (n_unroll + 1);
@@ -898,8 +913,8 @@ try_unroll_loop_completely (struct loop
       else
         gimple_cond_make_true (cond);
       update_stmt (cond);
-      /* Do not remove the path. Doing so may remove outer loop
-         and confuse bookkeeping code in tree_unroll_loops_completelly. */
+      /* Do not remove the path, as doing so may remove outer loop and
+         confuse bookkeeping code in tree_unroll_loops_completely. */
     }
 
   /* Store the loop for later unlooping and exit removal. */
@@ -915,7 +930,7 @@ try_unroll_loop_completely (struct loop
     {
       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS, locus,
                        "loop with %d iterations completely unrolled",
-                       (int) (n_unroll + 1));
+                       (int) n_unroll);
       if (loop->header->count.initialized_p ())
         dump_printf (MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS,
                      " (header execution count %d)",
@@ -963,7 +978,8 @@ try_peel_loop (struct loop *loop,
   struct loop_size size;
   int peeled_size;
 
-  if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0
+  if (!flag_peel_loops
+      || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0
       || !peeled_loops)
     return false;
 
@@ -974,20 +990,29 @@ try_peel_loop (struct loop *loop,
       return false;
     }
 
+  /* We don't peel loops that will be unrolled as this can duplicate a
+     loop more times than the user requested. */
+  if (loop->unroll)
+    {
+      if (dump_file)
+        fprintf (dump_file, "Not peeling: user didn't want it peeled.\n");
+      return false;
+    }
+
   /* Peel only innermost loops.
      While the code is perfectly capable of peeling non-innermost loops,
      the heuristics would probably need some improvements. */
   if (loop->inner)
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: outer loop\n");
+        fprintf (dump_file, "Not peeling: outer loop\n");
       return false;
     }
 
   if (!optimize_loop_for_speed_p (loop))
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: cold loop\n");
+        fprintf (dump_file, "Not peeling: cold loop\n");
       return false;
     }
 
@@ -1005,7 +1030,7 @@ try_peel_loop (struct loop *loop,
   if (maxiter >= 0 && maxiter <= npeel)
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: upper bound is known so can "
+        fprintf (dump_file, "Not peeling: upper bound is known so can "
                  "unroll completely\n");
       return false;
     }
@@ -1016,7 +1041,7 @@ try_peel_loop (struct loop *loop,
   if (npeel > PARAM_VALUE (PARAM_MAX_PEEL_TIMES) - 1)
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: rolls too much "
+        fprintf (dump_file, "Not peeling: rolls too much "
                  "(%i + 1 > --param max-peel-times)\n", (int) npeel);
       return false;
     }
@@ -1029,7 +1054,7 @@ try_peel_loop (struct loop *loop,
       > PARAM_VALUE (PARAM_MAX_PEELED_INSNS))
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: peeled sequence size is too large "
+        fprintf (dump_file, "Not peeling: peeled sequence size is too large "
                  "(%i insns > --param max-peel-insns)", peeled_size);
       return false;
     }
@@ -1317,7 +1342,9 @@ tree_unroll_loops_completely_1 (bool may
   if (!loop_father)
     return false;
 
-  if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
+  if (loop->unroll > 1)
+    ul = UL_ALL;
+  else if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
       /* Unroll outermost loops only if asked to do so or they do not
          cause code growth. */
       && (unroll_outer || loop_outer (loop_father)))
@@ -1566,7 +1593,7 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return optimize >= 2; }
+  virtual bool gate (function *) { return optimize >= 2 || cfun->has_unroll; }
   virtual unsigned int execute (function *);
 
 }; // class pass_complete_unrolli
Index: tree.def
===================================================================
--- tree.def (revision 254797)
+++ tree.def (working copy)
@@ -1410,8 +1410,9 @@ DEFTREECODE (TARGET_OPTION_NODE, "target
 
 /* ANNOTATE_EXPR.
    Operand 0 is the expression to be annotated.
-   Operand 1 is the annotation kind. */
-DEFTREECODE (ANNOTATE_EXPR, "annotate_expr", tcc_expression, 2)
+   Operand 1 is the annotation kind.
+   Operand 2 is additional data. */
+DEFTREECODE (ANNOTATE_EXPR, "annotate_expr", tcc_expression, 3)
 
 /* Cilk spawn statement
    Operand 0 is the CALL_EXPR. */