Add support for #pragma GCC unroll v2

Message ID 2195708.5NdSF6Gi6B@polaris
State New

Commit Message

Eric Botcazou Nov. 22, 2017, 10:46 a.m. UTC
Hi,

this is a revised version of:
  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01452.html

with the following changes:
 1. integration of Bernhard's patch for the Fortran front-end,
 2. Sandra's fix for the documentation,
 3. minor tweaks to the C and C++ front-end,
 4. change at the GIMPLE level for the cunrolli pass,
 5. change at the RTL level with a fix for a thinko,
 6. more testcases for all languages.

This makes it so that the presence of a pragma GCC unroll doesn't have a 
global effect on the function: at the GIMPLE level, the cunrolli pass is no 
longer forced, so that the unrolling is done by the first activated pass 
(cunroll at -O1, cunrolli at -O2 and above); at the RTL level, this was 
already the case but the code no longer fiddles with the unrolling flag.
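
For readers unfamiliar with the pragma, here is a minimal usage sketch in C. The function names `sum4` and `scale` are hypothetical and not taken from the patch or its testcases; the sketch assumes a GCC release that includes this feature (other compilers may ignore the pragma with a warning). The pragma applies only to the loop that immediately follows it, which is the per-loop behavior described above.

```c
/* Sum the four elements of A.  The pragma requests an unrolling factor
   of 4 for the loop that immediately follows it, and only for that loop.  */
int
sum4 (const int *a)
{
  int s = 0;
#pragma GCC unroll 4
  for (int i = 0; i < 4; i++)
    s += a[i];
  return s;
}

/* Multiply the N elements of X by K.  A factor of 1 asks the compiler
   not to unroll this particular loop.  */
void
scale (float *x, int n, float k)
{
#pragma GCC unroll 1
  for (int i = 0; i < n; i++)
    x[i] *= k;
}
```

The pragma is a hint about code generation only; both functions compute the same results whether or not the compiler honors it.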

Tested on x86_64-suse-linux, OK for the mainline?


2017-11-22  Mike Stump  <mikestump@comcast.net>
            Eric Botcazou  <ebotcazou@adacore.com>
            Bernhard Reutner-Fischer  <aldot@gcc.gnu.org>

ChangeLog:
        * doc/extend.texi (Loop-Specific Pragmas): Document pragma GCC unroll.
        * doc/generic.texi (ANNOTATE_EXPR): Document 3rd operand.
        * cfgloop.h (struct loop): Add unroll field.
        * function.h (struct function): Add has_unroll bitfield.
        * gimplify.c (gimple_boolify) <ANNOTATE_EXPR>: Deal with unroll kind.
        (gimplify_expr) <ANNOTATE_EXPR>: Propagate 3rd operand.
        * loop-init.c (pass_loop2::gate): Return true if cfun->has_unroll.
        (pass_rtl_unroll_loops::gate): Likewise.
        * loop-unroll.c (decide_unrolling): Tweak note message.  Skip loops
        for which loop->unroll==1.
        (decide_unroll_constant_iterations): Use note for consistency and
        take loop->unroll into account.  Return early if loop->unroll is set.
	Fix thinko in existing test.
        (decide_unroll_runtime_iterations): Use note for consistency and
        take loop->unroll into account.
        (decide_unroll_stupid): Likewise.
        * lto-streamer-in.c (input_cfg): Read loop->unroll.
        * lto-streamer-out.c (output_cfg): Write loop->unroll.
        * tree-cfg.c (replace_loop_annotate_in_block) <annot_expr_unroll_kind>
        New.
        (replace_loop_annotate) <annot_expr_unroll_kind>: Likewise.
        (print_loop): Print loop->unroll if set.
        * tree-core.h (enum annot_expr_kind): Add annot_expr_unroll_kind.
        * tree-inline.c (copy_loops): Copy unroll and set cfun->has_unroll.
        * tree-pretty-print.c (dump_generic_node) <annot_expr_unroll_kind>:
        New.
        * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Bail out if
        loop->unroll is set and smaller than the trip count.  Otherwise bypass
        entirely the heuristics if loop->unroll is set.  Remove dead note.
        Fix off-by-one bug in other node.
        (try_peel_loop): Bail out if loop->unroll is set.  Fix formatting.
        (tree_unroll_loops_completely_1): Force unrolling if loop->unroll
        is greater than 1.
	(tree_unroll_loops_completely): Make static.
	(pass_complete_unroll::execute): Use correct type for variable.
	(pass_complete_unrolli::execute): Fix formatting.
        * tree.def (ANNOTATE_EXPR): Add 3rd operand.

ada/ChangeLog:
        * gcc-interface/trans.c (gnat_gimplify_stmt) <LOOP_STMT>: Add 3rd
        operand to ANNOTATE_EXPR and pass unrolling hints.

c-family/ChangeLog:
        * c-pragma.c (init_pragma): Register pragma GCC unroll.
        * c-pragma.h (enum pragma_kind): Add PRAGMA_UNROLL.

c/ChangeLog:
        * c-parser.c (c_parser_while_statement): Add unroll parameter and
        build ANNOTATE_EXPR if present.  Add 3rd operand to ANNOTATE_EXPR.
        (c_parser_do_statement): Likewise.
        (c_parser_for_statement): Likewise.
        (c_parser_statement_after_labels): Adjust calls to above.
        (c_parse_pragma_ivdep): New static function.
        (c_parser_pragma_unroll): Likewise.
        (c_parser_pragma) <PRAGMA_IVDEP>: Add support for pragma Unroll.
        <PRAGMA_UNROLL>: New case.

cp/ChangeLog:
        * constexpr.c (cxx_eval_constant_expression) <ANNOTATE_EXPR>: Remove
        assertion on 2nd operand.
        (potential_constant_expression_1): Likewise.
        * cp-array-notation.c (create_an_loop): Adjust call to finish_for_cond.
        * cp-tree.h (cp_convert_range_for): Adjust prototype.
        (finish_while_stmt_cond): Likewise.
        (finish_do_stmt): Likewise.
        (finish_for_cond): Likewise.
        * init.c (build_vec_init): Adjust call to finish_for_cond.
        * parser.c (cp_parser_statement): Adjust call to
        cp_parser_iteration_statement.
        (cp_parser_for): Add unroll parameter and pass it in calls to
        cp_parser_range_for and cp_parser_c_for.
        (cp_parser_c_for): Add unroll parameter and pass it in call to
        finish_for_cond.
        (cp_parser_range_for): Add unroll parameter and pass it in call to
        cp_convert_range_for.
        (cp_convert_range_for): Add unroll parameter and pass it in call to
        finish_for_cond.
        (cp_parser_iteration_statement): Add unroll parameter and pass it in
        calls to finish_while_stmt_cond, finish_do_stmt and cp_parser_for.
        (cp_parser_pragma_ivdep): New static function.
        (cp_parser_pragma_unroll): Likewise.
        (cp_parser_pragma) <PRAGMA_IVDEP>: Add support for pragma Unroll.
        <PRAGMA_UNROLL>: New case.
        * pt.c (tsubst_expr): Adjust calls to finish_for_cond,
        cp_convert_range_for, finish_while_stmt_cond and finish_do_stmt.
        <ANNOTATE_EXPR>: Propagate 3rd operand.
        * semantics.c (finish_while_stmt_cond): Add unroll parameter and
        build ANNOTATE_EXPR if present.  Add 3rd operand to ANNOTATE_EXPR.
        (finish_do_stmt): Likewise.
        (finish_for_cond): Likewise.

fortran/ChangeLog:
	* array.c (gfc_copy_iterator): Copy unroll field.
	* decl.c (directive_unroll): New global variable.
	(gfc_match_gcc_unroll): New function.
	* gfortran.h (gfc_iterator): Add unroll field.
	(directive_unroll): Declare.
	* match.c (gfc_match_do): Use memset to initialize the iterator.
	* match.h (gfc_match_gcc_unroll): New prototype.
	* parse.c (decode_gcc_attribute): Match "unroll".
	(parse_do_block): Set iterator's unroll.
	(parse_executable): Diagnose misplaced unroll directive.
	* trans-stmt.c (gfc_trans_simple_do): Annotate loop condition with
	annot_expr_unroll_kind.
	(gfc_trans_do): Likewise.
	(gfc_trans_forall_loop): Add 3rd operand to ANNOTATE_EXPR.

testsuite/ChangeLog:
        * c-c++-common/unroll-1.c: New test.
        * c-c++-common/unroll-2.c: Likewise.
        * c-c++-common/unroll-3.c: Likewise.
        * c-c++-common/unroll-4.c: Likewise.
	* c-c++-common/unroll-5.c: Likewise.
	* gcc.dg/pr64277.c: Adjust scan.
        * gcc.dg/tree-prof/unroll-1.c: Use detailed dump and adjust scan.
	* gcc.dg/tree-ssa/cunroll-1.c: Adjust scan.
	* gcc.dg/tree-ssa/cunroll-12.c: Likewise.
	* gcc.dg/tree-ssa/cunroll-13.c: Likewise.
	* gcc.dg/tree-ssa/cunroll-14.c: Likewise.
	* gcc.dg/tree-ssa/cunroll-2.c: Likewise.
	* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
	* gcc.dg/tree-ssa/cunroll-5.c: Likewise.
	* gcc.dg/tree-ssa/loop-1.c: Likewise.
	* gcc.dg/tree-ssa/loop-23.c: Likewise.
	* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
	* gcc.dg/tree-ssa/pr61743-2.c: Likewise.
        * gcc.dg/unroll-2.c (foo): Adjust message.
        (foo2): Likewise.
        * gcc.dg/unroll-3.c: Adjust scan.
        * gcc.dg/unroll-4.c: Likewise.
        * gcc.dg/unroll-5.c: Likewise.
        * gcc.dg/unroll-7.c: Use detailed dump and adjust scan.
	* gfortran.dg/directive_unroll_1.f90: New test.
	* gfortran.dg/directive_unroll_2.f90: Likewise.
	* gfortran.dg/directive_unroll_3.f90: Likewise.
	* gfortran.dg/directive_unroll_4.f90: Likewise.
	* gfortran.dg/directive_unroll_5.f90: Likewise.
        * gnat.dg/unroll1.ad[sb]: New test.
        * gnat.dg/unroll2.ad[sb]: Likewise.
	* gnat.dg/unroll3.ad[sb]: Likewise.

 ada/gcc-interface/trans.c                    |   25 +-
 c-family/c-pragma.c                          |    4 
 c-family/c-pragma.h                          |    1 
 c/c-parser.c                                 |  160 ++++++++++++---
 cfgloop.h                                    |    5 
 cp/constexpr.c                               |    2 
 cp/cp-array-notation.c                       |    2 
 cp/cp-tree.h                                 |    9 
 cp/init.c                                    |    2 
 cp/parser.c                                  |  129 ++++++++++--
 cp/pt.c                                      |   16 -
 cp/semantics.c                               |   42 +++-
 doc/extend.texi                              |   18 +
 doc/generic.texi                             |    2 
 fortran/array.c                              |    1 
 fortran/decl.c                               |   38 +++
 fortran/gfortran.h                           |    2 
 fortran/match.c                              |    2 
 fortran/match.h                              |    1 
 fortran/parse.c                              |   13 +
 fortran/trans-stmt.c                         |   15 +
 function.h                                   |    5 
 gimplify.c                                   |    4 
 loop-init.c                                  |    6 
 loop-unroll.c                                |  107 ++++++----
 lto-streamer-in.c                            |    1 
 lto-streamer-out.c                           |    1 
 testsuite/c-c++-common/unroll-1.c            |   41 +++
 testsuite/c-c++-common/unroll-2.c            |   41 +++
 testsuite/c-c++-common/unroll-3.c            |   41 +++
 testsuite/c-c++-common/unroll-4.c            |   20 +
 testsuite/c-c++-common/unroll-5.c            |   29 ++
 testsuite/gcc.dg/pr64277.c                   |    2 
 testsuite/gcc.dg/tree-prof/unroll-1.c        |    4 
 testsuite/gcc.dg/tree-ssa/cunroll-1.c        |    2 
 testsuite/gcc.dg/tree-ssa/cunroll-12.c       |    2 
 testsuite/gcc.dg/tree-ssa/cunroll-13.c       |    2 
 testsuite/gcc.dg/tree-ssa/cunroll-14.c       |    2 
 testsuite/gcc.dg/tree-ssa/cunroll-2.c        |    2 
 testsuite/gcc.dg/tree-ssa/cunroll-3.c        |    2 
 testsuite/gcc.dg/tree-ssa/cunroll-5.c        |    2 
 testsuite/gcc.dg/tree-ssa/loop-1.c           |    2 
 testsuite/gcc.dg/tree-ssa/loop-23.c          |    3 
 testsuite/gcc.dg/tree-ssa/pr61743-1.c        |    4 
 testsuite/gcc.dg/tree-ssa/pr61743-2.c        |    4 
 testsuite/gcc.dg/unroll-2.c                  |    4 
 testsuite/gcc.dg/unroll-3.c                  |    2 
 testsuite/gcc.dg/unroll-4.c                  |    2 
 testsuite/gcc.dg/unroll-5.c                  |    2 
 testsuite/gcc.dg/unroll-7.c                  |    4 
 testsuite/gfortran.dg/directive_unroll_1.f90 |   52 +++++
 testsuite/gfortran.dg/directive_unroll_2.f90 |   52 +++++
 testsuite/gfortran.dg/directive_unroll_3.f90 |   52 +++++
 testsuite/gfortran.dg/directive_unroll_4.f90 |   29 ++
 testsuite/gfortran.dg/directive_unroll_5.f90 |   38 +++
 testsuite/gnat.dg/unroll1.adb                |   27 ++
 testsuite/gnat.dg/unroll1.ads                |    9 
 testsuite/gnat.dg/unroll2.adb                |   26 ++
 testsuite/gnat.dg/unroll2.ads                |    9 
 testsuite/gnat.dg/unroll3.adb                |   26 ++
 testsuite/gnat.dg/unroll3.ads                |    9 
 tree-cfg.c                                   |    8 
 tree-core.h                                  |    1 
 tree-inline.c                                |    5 
 tree-pretty-print.c                          |    4 
 tree-ssa-loop-ivcanon.c                      |  278 +++++++++++++------------
 tree.def                                     |    5 
 67 files changed, 1165 insertions(+), 297 deletions(-)

Comments

Richard Biener Nov. 23, 2017, 8:58 a.m. UTC | #1
On Wed, Nov 22, 2017 at 11:46 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Hi,
>
> this is a revised version of:
>   https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01452.html
>
> with the following changes:
>  1. integration of Bernhard's patch for the Fortran front-end,
>  2. Sandra's fix for the documentation,
>  3. minor tweaks to the C and C++ front-end,
>  4. change at the GIMPLE level for the cunrolli pass,
>  5. change at the RTL level with a fix for a thinko,
>  6. More testcases for all languages.
>
> This makes it so that the presence of a pragma GCC unroll doesn't have a
> global effect on the function: at the GIMPLE level, the cunrolli pass is no
> longer forced, so that the unrolling is done by the first activated pass
> (cunroll at -O1, cunrolli at -O2 and above); at the RTL level, this was
> already the case but the code no longer fiddles with the unrolling flag.
>
> Tested on x86_64-suse-linux, OK for the mainline?

The middle-end, testsuite and boilerplate changes in the FEs are ok.

Pragma support in the FEs needs FE maintainer approval.

Thanks,
Richard.

Eric Botcazou Nov. 27, 2017, 11:57 a.m. UTC | #2
> The middle-end, testsuite and boilerplate changes in the FEs are ok.

Thanks.  But it turns out that the Ada compiler needs a way to convey a pragma 
unroll without explicit unrolling factor, because otherwise the RTL unroller 
will happily try to unroll some loops USHRT_MAX times...
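
The convention can be illustrated with a small standalone sketch in C. The struct `loop_hint` and the function `pick_unroll_factor` are hypothetical stand-ins, not GCC code; only the condition `unroll > 0 && unroll < USHRT_MAX` is taken from the patch below. The idea is that USHRT_MAX acts as a sentinel for "unroll, but let the heuristics choose the factor", so it must never be used as a factor itself.

```c
#include <limits.h>

/* Hypothetical stand-in for the unroll field of GCC's struct loop.  */
struct loop_hint
{
  unsigned short unroll;
};

/* Return the unrolling factor to use for LOOP, given the factor HEURISTIC
   that the unroller would have picked on its own.  */
unsigned
pick_unroll_factor (const struct loop_hint *loop, unsigned heuristic)
{
  /* Only an explicit factor overrides the heuristic.  Both 0 (no
     information) and USHRT_MAX (unroll, no specific factor) fall
     through to the heuristic value.  */
  if (loop->unroll > 0 && loop->unroll < USHRT_MAX)
    return loop->unroll;

  return heuristic;
}
```

With a heuristic factor of 8, a hint of 4 yields 4, while hints of 0 and USHRT_MAX both yield 8. A hint of 1 yields 1 in this sketch; in the patch series such loops are skipped earlier, in decide_unrolling.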

Tested on x86_64-suse-linux, applied on the mainline.

> Pragma support in the FEs needs FE maintainer approval.

Yes, I have posted separate patches for the C/C++ and Fortran front-ends.


2017-11-27  Eric Botcazou  <ebotcazou@adacore.com>

	* cfgloop.h (struct loop): Document usage of USHRT_MAX for unroll.
	* loop-unroll.c (decide_unroll_constant_iterations): Implement it.
	(decide_unroll_runtime_iterations): Likewise.
	(decide_unroll_stupid): Likewise.


2017-11-27  Eric Botcazou  <ebotcazou@adacore.com>

	* gnat.dg/unroll1.ads: Remove alignment clause.
	* gnat.dg/unroll2.ads: Likewise.
	* gnat.dg/unroll3.ads: Likewise.
	* gnat.dg/unroll1.adb: Remove bogus comment terminator.
	* gnat.dg/unroll2.adb: Likewise.
	* gnat.dg/unroll3.adb: Likewise.
	* gnat.dg/unroll4.ad[sb]: New testcase.
	* gnat.dg/unroll4_pkg.ads: New helper.
	
--
Eric Botcazou
Index: cfgloop.h
===================================================================
--- cfgloop.h	(revision 255147)
+++ cfgloop.h	(working copy)
@@ -221,9 +221,10 @@ struct GTY ((chain_next ("%h.next"))) lo
   /* True if the loop is part of an oacc kernels region.  */
   unsigned in_oacc_kernels_region : 1;
 
-  /* The number of times to unroll the loop.  0, means no information
-     given, just do what we always do.  A value of 1, means don't unroll
-     the loop.  */
+  /* The number of times to unroll the loop.  0 means no information given,
+     just do what we always do.  A value of 1 means do not unroll the loop.
+     A value of USHRT_MAX means unroll with no specific unrolling factor.
+     Other values mean unroll with the given unrolling factor.  */
   unsigned short unroll;
 
   /* For SIMD loops, this is a unique identifier of the loop, referenced
Index: loop-unroll.c
===================================================================
--- loop-unroll.c	(revision 255147)
+++ loop-unroll.c	(working copy)
@@ -395,7 +395,7 @@ decide_unroll_constant_iterations (struc
     }
 
   /* Check for an explicit unrolling factor.  */
-  if (loop->unroll)
+  if (loop->unroll > 0 && loop->unroll < USHRT_MAX)
     {
       /* However we cannot unroll completely at the RTL level a loop with
 	 constant number of iterations; it should have been peeled instead.  */
@@ -693,7 +693,7 @@ decide_unroll_runtime_iterations (struct
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
 
-  if (loop->unroll)
+  if (loop->unroll > 0 && loop->unroll < USHRT_MAX)
     nunroll = loop->unroll;
 
   /* Skip big loops.  */
@@ -1177,7 +1177,7 @@ decide_unroll_stupid (struct loop *loop,
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
 
-  if (loop->unroll)
+  if (loop->unroll > 0 && loop->unroll < USHRT_MAX)
     nunroll = loop->unroll;
 
   /* Skip big loops.  */
Index: testsuite/gnat.dg/unroll1.adb
===================================================================
--- testsuite/gnat.dg/unroll1.adb	(revision 255147)
+++ testsuite/gnat.dg/unroll1.adb	(working copy)
@@ -23,5 +23,5 @@ package body Unroll1 is
 
 end Unroll1;
 
--- { dg-final { scan-tree-dump-times "Not unrolling loop .: user didn't want it unrolled completely" 2 "cunrolli" } } */
--- { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */
+-- { dg-final { scan-tree-dump-times "Not unrolling loop .: user didn't want it unrolled completely" 2 "cunrolli" } }
+-- { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } }
Index: testsuite/gnat.dg/unroll1.ads
===================================================================
--- testsuite/gnat.dg/unroll1.ads	(revision 255147)
+++ testsuite/gnat.dg/unroll1.ads	(working copy)
@@ -1,7 +1,6 @@
 package Unroll1 is
 
    type Sarray is array (1 .. 4) of Float;
-   for Sarray'Alignment use 16;
 
    function "+" (X, Y : Sarray) return Sarray;
    procedure Add (X, Y : Sarray; R : out Sarray);
Index: testsuite/gnat.dg/unroll2.adb
===================================================================
--- testsuite/gnat.dg/unroll2.adb	(revision 255147)
+++ testsuite/gnat.dg/unroll2.adb	(working copy)
@@ -23,4 +23,4 @@ package body Unroll2 is
 
 end Unroll2;
 
--- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunrolli" } } */
+-- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunrolli" } }
Index: testsuite/gnat.dg/unroll2.ads
===================================================================
--- testsuite/gnat.dg/unroll2.ads	(revision 255147)
+++ testsuite/gnat.dg/unroll2.ads	(working copy)
@@ -1,7 +1,6 @@
 package Unroll2 is
 
    type Sarray is array (1 .. 4) of Float;
-   for Sarray'Alignment use 16;
 
    function "+" (X, Y : Sarray) return Sarray;
    procedure Add (X, Y : Sarray; R : out Sarray);
Index: testsuite/gnat.dg/unroll3.adb
===================================================================
--- testsuite/gnat.dg/unroll3.adb	(revision 255147)
+++ testsuite/gnat.dg/unroll3.adb	(working copy)
@@ -23,4 +23,4 @@ package body Unroll3 is
 
 end Unroll3;
 
--- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunroll" } } */
+-- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunroll" } }
Index: testsuite/gnat.dg/unroll3.ads
===================================================================
--- testsuite/gnat.dg/unroll3.ads	(revision 255147)
+++ testsuite/gnat.dg/unroll3.ads	(working copy)
@@ -1,7 +1,6 @@
 package Unroll3 is
 
    type Sarray is array (1 .. 4) of Float;
-   for Sarray'Alignment use 16;
 
    function "+" (X, Y : Sarray) return Sarray;
    procedure Add (X, Y : Sarray; R : out Sarray);
Index: testsuite/gnat.dg/unroll4.adb
===================================================================
--- testsuite/gnat.dg/unroll4.adb	(revision 0)
+++ testsuite/gnat.dg/unroll4.adb	(working copy)
@@ -0,0 +1,26 @@
+-- { dg-do compile }
+-- { dg-options "-O -fdump-rtl-loop2_unroll-details" }
+
+package body Unroll4 is
+
+   function "+" (X, Y : Sarray) return Sarray is
+      R : Sarray;
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+      return R;
+   end;
+
+   procedure Add (X, Y : Sarray; R : out Sarray) is
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+   end;
+
+end Unroll4;
+
+-- { dg-final { scan-rtl-dump-times "note: loop unrolled 7 times" 2 "loop2_unroll" } }
Index: testsuite/gnat.dg/unroll4.ads
===================================================================
--- testsuite/gnat.dg/unroll4.ads	(revision 0)
+++ testsuite/gnat.dg/unroll4.ads	(working copy)
@@ -0,0 +1,10 @@
+with Unroll4_Pkg; use Unroll4_Pkg;
+
+package Unroll4 is
+
+   type Sarray is array (1 .. N) of Float;
+
+   function "+" (X, Y : Sarray) return Sarray;
+   procedure Add (X, Y : Sarray; R : out Sarray);
+
+end Unroll4;
Index: testsuite/gnat.dg/unroll4_pkg.ads
===================================================================
--- testsuite/gnat.dg/unroll4_pkg.ads	(revision 0)
+++ testsuite/gnat.dg/unroll4_pkg.ads	(working copy)
@@ -0,0 +1,5 @@
+package Unroll4_Pkg is
+
+   function N return Positive;
+
+end Unroll4_Pkg;
Patch

Index: ada/gcc-interface/trans.c
===================================================================
--- ada/gcc-interface/trans.c	(revision 255000)
+++ ada/gcc-interface/trans.c	(working copy)
@@ -8506,17 +8506,30 @@  gnat_gimplify_stmt (tree *stmt_p)
 	  {
 	    /* Deal with the optimization hints.  */
 	    if (LOOP_STMT_IVDEP (stmt))
-	      gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
 				 build_int_cst (integer_type_node,
-						annot_expr_ivdep_kind));
+						annot_expr_ivdep_kind),
+				 integer_zero_node);
+	    if (LOOP_STMT_NO_UNROLL (stmt))
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+				 build_int_cst (integer_type_node,
+						annot_expr_unroll_kind),
+				 integer_one_node);
+	    if (LOOP_STMT_UNROLL (stmt))
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+				 build_int_cst (integer_type_node,
+						annot_expr_unroll_kind),
+				 build_int_cst (NULL_TREE, USHRT_MAX));
 	    if (LOOP_STMT_NO_VECTOR (stmt))
-	      gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
 				 build_int_cst (integer_type_node,
-						annot_expr_no_vector_kind));
+						annot_expr_no_vector_kind),
+				 integer_zero_node);
 	    if (LOOP_STMT_VECTOR (stmt))
-	      gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
 				 build_int_cst (integer_type_node,
-						annot_expr_vector_kind));
+						annot_expr_vector_kind),
+				 integer_zero_node);
 
 	    gnu_cond
 	      = build3 (COND_EXPR, void_type_node, gnu_cond, NULL_TREE,
Index: c/c-parser.c
===================================================================
--- c/c-parser.c	(revision 255000)
+++ c/c-parser.c	(working copy)
@@ -1410,9 +1410,9 @@  static tree c_parser_c99_block_statement
 					  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec<tree> *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, bool *);
-static void c_parser_do_statement (c_parser *, bool);
-static void c_parser_for_statement (c_parser *, bool, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -5499,13 +5499,13 @@  c_parser_statement_after_labels (c_parse
 	  c_parser_switch_statement (parser, if_p);
 	  break;
 	case RID_WHILE:
-	  c_parser_while_statement (parser, false, if_p);
+	  c_parser_while_statement (parser, false, 0, if_p);
 	  break;
 	case RID_DO:
-	  c_parser_do_statement (parser, false);
+	  c_parser_do_statement (parser, false, 0);
 	  break;
 	case RID_FOR:
-	  c_parser_for_statement (parser, false, if_p);
+	  c_parser_for_statement (parser, false, 0, if_p);
 	  break;
 	case RID_CILK_FOR:
 	  if (!flag_cilkplus)
@@ -6039,7 +6039,8 @@  c_parser_switch_statement (c_parser *par
    implement -Wparentheses.  */
 
 static void
-c_parser_while_statement (c_parser *parser, bool ivdep, bool *if_p)
+c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+			  bool *if_p)
 {
   tree block, cond, body, save_break, save_cont;
   location_t loc;
@@ -6055,9 +6056,15 @@  c_parser_while_statement (c_parser *pars
 	 "%<_Cilk_spawn%> statement cannot be used as a condition for while statement"))
     cond = error_mark_node;
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		   build_int_cst (integer_type_node,
-		   annot_expr_ivdep_kind));
+				  annot_expr_ivdep_kind),
+		   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_unroll_kind),
+		   build_int_cst (integer_type_node, unroll));
   save_break = c_break_label;
   c_break_label = NULL_TREE;
   save_cont = c_cont_label;
@@ -6092,7 +6099,7 @@  c_parser_while_statement (c_parser *pars
 */
 
 static void
-c_parser_do_statement (c_parser *parser, bool ivdep)
+c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree block, cond, body, save_break, save_cont, new_break, new_cont;
   location_t loc;
@@ -6120,9 +6127,16 @@  c_parser_do_statement (c_parser *parser,
 	 "%<_Cilk_spawn%> statement cannot be used as a condition for a do-while statement"))
     cond = error_mark_node;
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_ivdep_kind),
+		   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_unroll_kind),
 		   build_int_cst (integer_type_node,
-		   annot_expr_ivdep_kind));
+				  unroll));
   if (!c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
     c_parser_skip_to_end_of_block_or_statement (parser);
   c_finish_loop (loc, cond, NULL, body, new_break, new_cont, false);
@@ -6189,7 +6203,8 @@  c_parser_do_statement (c_parser *parser,
    implement -Wparentheses.  */
 
 static void
-c_parser_for_statement (c_parser *parser, bool ivdep, bool *if_p)
+c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+			bool *if_p)
 {
   tree block, cond, incr, save_break, save_cont, body;
   /* The following are only used when parsing an ObjC foreach statement.  */
@@ -6310,6 +6325,12 @@  c_parser_for_statement (c_parser *parser
 				  "%<GCC ivdep%> pragma");
 		  cond = error_mark_node;
 		}
+	      else if (unroll)
+		{
+		  c_parser_error (parser, "missing loop condition in loop with "
+				  "%<GCC unroll%> pragma");
+		  cond = error_mark_node;
+		}
 	      else
 		{
 		  c_parser_consume_token (parser);
@@ -6327,9 +6348,15 @@  c_parser_for_statement (c_parser *parser
 					 "expected %<;%>");
 	    }
 	  if (ivdep && cond != error_mark_node)
-	    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+	    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+			   build_int_cst (integer_type_node,
+					  annot_expr_ivdep_kind),
+			   integer_zero_node);
+	  if (unroll && cond != error_mark_node)
+	    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 			   build_int_cst (integer_type_node,
-			   annot_expr_ivdep_kind));
+					  annot_expr_unroll_kind),
+			   build_int_cst (integer_type_node, unroll));
 	}
       /* Parse the increment expression (the third expression in a
 	 for-statement).  In the case of a foreach-statement, this is
@@ -11039,6 +11066,49 @@  c_parser_objc_at_dynamic_declaration (c_
 }
 
 
+/* Parse a pragma GCC ivdep.  */
+
+static bool
+c_parse_pragma_ivdep (c_parser *parser)
+{
+  c_parser_consume_pragma (parser);
+  c_parser_skip_to_pragma_eol (parser);
+  return true;
+}
+
+/* Parse a pragma GCC unroll.  */
+
+static unsigned short
+c_parser_pragma_unroll (c_parser *parser)
+{
+  unsigned short unroll;
+  c_parser_consume_pragma (parser);
+  location_t location = c_parser_peek_token (parser)->location;
+  tree expr = c_parser_expr_no_commas (parser, NULL).value;
+  mark_exp_read (expr);
+  expr = c_fully_fold (expr, false, NULL);
+  HOST_WIDE_INT lunroll = 0;
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+      || TREE_CODE (expr) != INTEGER_CST
+      || (lunroll = tree_to_shwi (expr)) < 0
+      || lunroll > USHRT_MAX)
+    {
+      error_at (location, "%<#pragma GCC unroll%> requires an"
+		" assignment-expression that evaluates to a non-negative"
+		" integral constant less than or equal to %u", USHRT_MAX);
+      unroll = 0;
+    }
+  else
+    {
+      unroll = (unsigned short) lunroll;
+      if (unroll == 0)
+	unroll = 1;
+    }
+
+  c_parser_skip_to_pragma_eol (parser);
+  return unroll;
+}
+
 /* Handle pragmas.  Some OpenMP pragmas are associated with, and therefore
    should be considered, statements.  ALLOW_STMT is true if we're within
    the context of a function and such pragmas are to be allowed.  Returns
@@ -11181,21 +11251,51 @@  c_parser_pragma (c_parser *parser, enum
       return c_parser_omp_ordered (parser, context, if_p);
 
     case PRAGMA_IVDEP:
-      c_parser_consume_pragma (parser);
-      c_parser_skip_to_pragma_eol (parser);
-      if (!c_parser_next_token_is_keyword (parser, RID_FOR)
-	  && !c_parser_next_token_is_keyword (parser, RID_WHILE)
-	  && !c_parser_next_token_is_keyword (parser, RID_DO))
-	{
-	  c_parser_error (parser, "for, while or do statement expected");
-	  return false;
-	}
-      if (c_parser_next_token_is_keyword (parser, RID_FOR))
-	c_parser_for_statement (parser, true, if_p);
-      else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
-	c_parser_while_statement (parser, true, if_p);
-      else
-	c_parser_do_statement (parser, true);
+      {
+	const bool ivdep = c_parse_pragma_ivdep (parser);
+	unsigned short unroll;
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL)
+	  unroll = c_parser_pragma_unroll (parser);
+	else
+	  unroll = 0;
+	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+	    && !c_parser_next_token_is_keyword (parser, RID_DO))
+	  {
+	    c_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	if (c_parser_next_token_is_keyword (parser, RID_FOR))
+	  c_parser_for_statement (parser, ivdep, unroll, if_p);
+	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+	  c_parser_while_statement (parser, ivdep, unroll, if_p);
+	else
+	  c_parser_do_statement (parser, ivdep, unroll);
+      }
+      return false;
+
+    case PRAGMA_UNROLL:
+      {
+	unsigned short unroll = c_parser_pragma_unroll (parser);
+	bool ivdep;
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP)
+	  ivdep = c_parse_pragma_ivdep (parser);
+	else
+	  ivdep = false;
+	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+	    && !c_parser_next_token_is_keyword (parser, RID_DO))
+	  {
+	    c_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	if (c_parser_next_token_is_keyword (parser, RID_FOR))
+	  c_parser_for_statement (parser, ivdep, unroll, if_p);
+	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+	  c_parser_while_statement (parser, ivdep, unroll, if_p);
+	else
+	  c_parser_do_statement (parser, ivdep, unroll);
+      }
       return false;
 
     case PRAGMA_GCC_PCH_PREPROCESS:
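
The range check performed by c_parser_pragma_unroll above can be read as a
small pure function.  The following is only a standalone sketch of that logic,
not GCC code: normalize_unroll is a hypothetical name, and a plain long long
stands in for the folded INTEGER_CST operand.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the checks in c_parser_pragma_unroll.
   VALUE stands in for the folded constant operand of the pragma.
   Returns false and stores 0 when the operand is out of range;
   otherwise stores the factor, mapping 0 to 1 so that both values
   mean "do not unroll".  */
static bool
normalize_unroll (long long value, unsigned short *unroll)
{
  if (value < 0 || value > USHRT_MAX)
    {
      /* 0 means that no unroll annotation is attached to the loop.  */
      *unroll = 0;
      return false;
    }

  *unroll = (unsigned short) value;
  if (*unroll == 0)
    *unroll = 1;
  return true;
}
```

Returning 0 on error matters because the callers test "if (unroll && ...)"
before building the ANNOTATE_EXPR, so a rejected operand leaves the loop
condition unannotated.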
Index: c-family/c-pragma.c
===================================================================
--- c-family/c-pragma.c	(revision 255000)
+++ c-family/c-pragma.c	(working copy)
@@ -1544,6 +1544,10 @@  init_pragma (void)
     cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
 				  false);
 
+  if (!flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
+				  false, false);
+
   if (flag_cilkplus)
     cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
 				  PRAGMA_CILK_GRAINSIZE, true, false);
Index: c-family/c-pragma.h
===================================================================
--- c-family/c-pragma.h	(revision 255000)
+++ c-family/c-pragma.h	(working copy)
@@ -75,6 +75,7 @@  enum pragma_kind {
 
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
+  PRAGMA_UNROLL,
 
   PRAGMA_FIRST_EXTERNAL
 };
Index: cfgloop.h
===================================================================
--- cfgloop.h	(revision 255000)
+++ cfgloop.h	(working copy)
@@ -221,6 +221,11 @@  struct GTY ((chain_next ("%h.next"))) lo
   /* True if the loop is part of an oacc kernels region.  */
   unsigned in_oacc_kernels_region : 1;
 
+  /* The number of times to unroll the loop.  0 means no information
+     given, just do what we always do.  A value of 1 means do not unroll
+     the loop.  */
+  unsigned short unroll;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins.  */
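
As a reading aid for the new field, the values can be classified as in the
sketch below.  The enum and function names are hypothetical and do not exist
in GCC.  The meaning given to USHRT_MAX is an assumption inferred from the Ada
hunk, which passes USHRT_MAX for LOOP_STMT_UNROLL, where the source pragma
carries no factor.

```c
#include <limits.h>

/* Hypothetical classification of the `unroll' field; these names are
   illustrative only.  */
enum unroll_request
{
  UNROLL_NO_INFO,	/* 0: no pragma, use the usual heuristics.  */
  UNROLL_DISABLED,	/* 1: do not unroll the loop.  */
  UNROLL_NO_FACTOR,	/* USHRT_MAX: assumed "unroll, factor unspecified".  */
  UNROLL_FACTOR		/* Anything else: the requested factor.  */
};

static enum unroll_request
classify_unroll (unsigned short unroll)
{
  if (unroll == 0)
    return UNROLL_NO_INFO;
  if (unroll == 1)
    return UNROLL_DISABLED;
  if (unroll == USHRT_MAX)
    return UNROLL_NO_FACTOR;
  return UNROLL_FACTOR;
}
```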
Index: cp/constexpr.c
===================================================================
--- cp/constexpr.c	(revision 255000)
+++ cp/constexpr.c	(working copy)
@@ -4672,7 +4672,6 @@  cxx_eval_constant_expression (const cons
       return t;
 
     case ANNOTATE_EXPR:
-      gcc_assert (tree_to_uhwi (TREE_OPERAND (t, 1)) == annot_expr_ivdep_kind);
       r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0),
 					lval,
 					non_constant_p, overflow_p,
@@ -5920,7 +5919,6 @@  potential_constant_expression_1 (tree t,
       }
 
     case ANNOTATE_EXPR:
-      gcc_assert (tree_to_uhwi (TREE_OPERAND (t, 1)) == annot_expr_ivdep_kind);
       return RECUR (TREE_OPERAND (t, 0), rval);
 
     default:
Index: cp/cp-array-notation.c
===================================================================
--- cp/cp-array-notation.c	(revision 255000)
+++ cp/cp-array-notation.c	(working copy)
@@ -67,7 +67,7 @@  create_an_loop (tree init, tree cond, tr
   finish_expr_stmt (init);
   for_stmt = begin_for_stmt (NULL_TREE, NULL_TREE);
   finish_init_stmt (for_stmt);
-  finish_for_cond (cond, for_stmt, false);
+  finish_for_cond (cond, for_stmt, false, 0);
   finish_for_expr (incr, for_stmt);
   finish_expr_stmt (body);
   finish_for_stmt (for_stmt);
Index: cp/cp-tree.h
===================================================================
--- cp/cp-tree.h	(revision 255000)
+++ cp/cp-tree.h	(working copy)
@@ -6409,7 +6409,8 @@  extern tree implicitly_declare_fn
 extern bool maybe_clone_body			(tree);
 
 /* In parser.c */
-extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool);
+extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
+				  unsigned short);
 extern bool parsing_nsdmi (void);
 extern bool parsing_default_capturing_generic_lambda_in_template (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
@@ -6694,16 +6695,16 @@  extern void begin_else_clause			(tree);
 extern void finish_else_clause			(tree);
 extern void finish_if_stmt			(tree);
 extern tree begin_while_stmt			(void);
-extern void finish_while_stmt_cond		(tree, tree, bool);
+extern void finish_while_stmt_cond	(tree, tree, bool, unsigned short);
 extern void finish_while_stmt			(tree);
 extern tree begin_do_stmt			(void);
 extern void finish_do_body			(tree);
-extern void finish_do_stmt			(tree, tree, bool);
+extern void finish_do_stmt		(tree, tree, bool, unsigned short);
 extern tree finish_return_stmt			(tree);
 extern tree begin_for_scope			(tree *);
 extern tree begin_for_stmt			(tree, tree);
 extern void finish_init_stmt			(tree);
-extern void finish_for_cond			(tree, tree, bool);
+extern void finish_for_cond		(tree, tree, bool, unsigned short);
 extern void finish_for_expr			(tree, tree);
 extern void finish_for_stmt			(tree);
 extern tree begin_range_for_stmt		(tree, tree);
Index: cp/init.c
===================================================================
--- cp/init.c	(revision 255000)
+++ cp/init.c	(working copy)
@@ -4319,7 +4319,7 @@  build_vec_init (tree base, tree maxindex
       finish_init_stmt (for_stmt);
       finish_for_cond (build2 (GT_EXPR, boolean_type_node, iterator,
 			       build_int_cst (TREE_TYPE (iterator), -1)),
-		       for_stmt, false);
+		       for_stmt, false, 0);
       elt_init = cp_build_unary_op (PREDECREMENT_EXPR, iterator, false,
 				    complain);
       if (elt_init == error_mark_node)
Index: cp/parser.c
===================================================================
--- cp/parser.c	(revision 255000)
+++ cp/parser.c	(working copy)
@@ -2121,15 +2121,15 @@  static tree cp_parser_selection_statemen
 static tree cp_parser_condition
   (cp_parser *);
 static tree cp_parser_iteration_statement
-  (cp_parser *, bool *, bool);
+  (cp_parser *, bool *, bool, unsigned short);
 static bool cp_parser_init_statement
   (cp_parser *, tree *decl);
 static tree cp_parser_for
-  (cp_parser *, bool);
+  (cp_parser *, bool, unsigned short);
 static tree cp_parser_c_for
-  (cp_parser *, tree, tree, bool);
+  (cp_parser *, tree, tree, bool, unsigned short);
 static tree cp_parser_range_for
-  (cp_parser *, tree, tree, tree, bool);
+  (cp_parser *, tree, tree, tree, bool, unsigned short);
 static void do_range_for_auto_deduction
   (tree, tree);
 static tree cp_parser_perform_range_for_lookup
@@ -10878,7 +10878,7 @@  cp_parser_statement (cp_parser* parser,
 	case RID_WHILE:
 	case RID_DO:
 	case RID_FOR:
-	  statement = cp_parser_iteration_statement (parser, if_p, false);
+	  statement = cp_parser_iteration_statement (parser, if_p, false, 0);
 	  break;
 
 	case RID_CILK_FOR:
@@ -11745,7 +11745,7 @@  cp_parser_condition (cp_parser* parser)
    not included. */
 
 static tree
-cp_parser_for (cp_parser *parser, bool ivdep)
+cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree init, scope, decl;
   bool is_range_for;
@@ -11757,13 +11757,14 @@  cp_parser_for (cp_parser *parser, bool i
   is_range_for = cp_parser_init_statement (parser, &decl);
 
   if (is_range_for)
-    return cp_parser_range_for (parser, scope, init, decl, ivdep);
+    return cp_parser_range_for (parser, scope, init, decl, ivdep, unroll);
   else
-    return cp_parser_c_for (parser, scope, init, ivdep);
+    return cp_parser_c_for (parser, scope, init, ivdep, unroll);
 }
 
 static tree
-cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep)
+cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
+		 unsigned short unroll)
 {
   /* Normal for loop */
   tree condition = NULL_TREE;
@@ -11784,7 +11785,13 @@  cp_parser_c_for (cp_parser *parser, tree
 		       "%<GCC ivdep%> pragma");
       condition = error_mark_node;
     }
-  finish_for_cond (condition, stmt, ivdep);
+  else if (unroll)
+    {
+      cp_parser_error (parser, "missing loop condition in loop with "
+		       "%<GCC unroll%> pragma");
+      condition = error_mark_node;
+    }
+  finish_for_cond (condition, stmt, ivdep, unroll);
   /* Look for the `;'.  */
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
 
@@ -11808,7 +11815,7 @@  cp_parser_c_for (cp_parser *parser, tree
 
 static tree
 cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
-		     bool ivdep)
+		     bool ivdep, unsigned short unroll)
 {
   tree stmt, range_expr;
   auto_vec <cxx_binding *, 16> bindings;
@@ -11877,6 +11884,8 @@  cp_parser_range_for (cp_parser *parser,
       stmt = begin_range_for_stmt (scope, init);
       if (ivdep)
 	RANGE_FOR_IVDEP (stmt) = 1;
+      if (unroll)
+	/* TODO */(void)0;
       finish_range_for_decl (stmt, range_decl, range_expr);
       if (!type_dependent_expression_p (range_expr)
 	  /* do_auto_deduction doesn't mess with template init-lists.  */
@@ -11887,7 +11896,8 @@  cp_parser_range_for (cp_parser *parser,
     {
       stmt = begin_for_stmt (scope, init);
       stmt = cp_convert_range_for (stmt, range_decl, range_expr,
-				   decomp_first_name, decomp_cnt, ivdep);
+				   decomp_first_name, decomp_cnt, ivdep,
+				   unroll);
     }
   return stmt;
 }
@@ -11981,7 +11991,7 @@  do_range_for_auto_deduction (tree decl,
 tree
 cp_convert_range_for (tree statement, tree range_decl, tree range_expr,
 		      tree decomp_first_name, unsigned int decomp_cnt,
-		      bool ivdep)
+		      bool ivdep, unsigned short unroll)
 {
   tree begin, end;
   tree iter_type, begin_expr, end_expr;
@@ -12042,7 +12052,7 @@  cp_convert_range_for (tree statement, tr
 				 begin, ERROR_MARK,
 				 end, ERROR_MARK,
 				 NULL, tf_warning_or_error);
-  finish_for_cond (condition, statement, ivdep);
+  finish_for_cond (condition, statement, ivdep, unroll);
 
   /* The new increment expression.  */
   expression = finish_unary_op_expr (input_location,
@@ -12217,7 +12227,8 @@  cp_parser_range_for_member_function (tre
    Returns the new WHILE_STMT, DO_STMT, FOR_STMT or RANGE_FOR_STMT.  */
 
 static tree
-cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep)
+cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
+			       unsigned short unroll)
 {
   cp_token *token;
   enum rid keyword;
@@ -12251,7 +12262,7 @@  cp_parser_iteration_statement (cp_parser
 	parens.require_open (parser);
 	/* Parse the condition.  */
 	condition = cp_parser_condition (parser);
-	finish_while_stmt_cond (condition, statement, ivdep);
+	finish_while_stmt_cond (condition, statement, ivdep, unroll);
 	/* Look for the `)'.  */
 	parens.require_close (parser);
 	/* Parse the dependent statement.  */
@@ -12282,7 +12293,7 @@  cp_parser_iteration_statement (cp_parser
 	/* Parse the expression.  */
 	expression = cp_parser_expression (parser);
 	/* We're done with the do-statement.  */
-	finish_do_stmt (expression, statement, ivdep);
+	finish_do_stmt (expression, statement, ivdep, unroll);
 	/* Look for the `)'.  */
 	parens.require_close (parser);
 	/* Look for the `;'.  */
@@ -12296,7 +12307,7 @@  cp_parser_iteration_statement (cp_parser
 	matching_parens parens;
 	parens.require_open (parser);
 
-	statement = cp_parser_for (parser, ivdep);
+	statement = cp_parser_for (parser, ivdep, unroll);
 
 	/* Look for the `)'.  */
 	parens.require_close (parser);
@@ -38748,6 +38759,45 @@  cp_parser_cilk_grainsize (cp_parser *par
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 }
 
+/* Parse a pragma GCC ivdep.  */
+
+static bool
+cp_parser_pragma_ivdep (cp_parser *parser, cp_token *pragma_tok)
+{
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  return true;
+}
+
+/* Parse a pragma GCC unroll.  */
+
+static unsigned short
+cp_parser_pragma_unroll (cp_parser *parser, cp_token *pragma_tok)
+{
+  location_t location = cp_lexer_peek_token (parser->lexer)->location;
+  tree expr = cp_parser_constant_expression (parser);
+  unsigned short unroll;
+  expr = maybe_constant_value (expr);
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  HOST_WIDE_INT lunroll = 0;
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+      || TREE_CODE (expr) != INTEGER_CST
+      || (lunroll = tree_to_shwi (expr)) < 0
+      || lunroll > USHRT_MAX)
+    {
+      error_at (location, "%<#pragma GCC unroll%> requires an"
+		" assignment-expression that evaluates to a non-negative"
+		" integral constant less than or equal to %u", USHRT_MAX);
+      unroll = 0;
+    }
+  else
+    {
+      unroll = (unsigned short) lunroll;
+      if (unroll == 0)
+	unroll = 1;
+    }
+  return unroll;
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -38984,15 +39034,42 @@  cp_parser_pragma (cp_parser *parser, enu
 
     case PRAGMA_IVDEP:
       {
-	if (context == pragma_external)
+	const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
+	unsigned short unroll;
+	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
+	if (tok->type == CPP_PRAGMA
+	    && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
 	  {
-	    error_at (pragma_tok->location,
-		      "%<#pragma GCC ivdep%> must be inside a function");
-	    break;
+	    unroll = cp_parser_pragma_unroll (parser, pragma_tok);
+	    tok = cp_lexer_peek_token (the_parser->lexer);
 	  }
-	cp_parser_skip_to_pragma_eol (parser, pragma_tok);
-	cp_token *tok;
-	tok = cp_lexer_peek_token (the_parser->lexer);
+	else
+	  unroll = 0;
+	if (tok->type != CPP_KEYWORD
+	    || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE
+		&& tok->keyword != RID_DO))
+	  {
+	    cp_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
+	return true;
+      }
+
+    case PRAGMA_UNROLL:
+      {
+	const unsigned short unroll
+	  = cp_parser_pragma_unroll (parser, pragma_tok);
+	bool ivdep;
+	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
+	if (tok->type == CPP_PRAGMA
+	    && cp_parser_pragma_kind (tok) == PRAGMA_IVDEP)
+	  {
+	    ivdep = cp_parser_pragma_ivdep (parser, tok);
+	    tok = cp_lexer_peek_token (the_parser->lexer);
+	  }
+	else
+	  ivdep = false;
 	if (tok->type != CPP_KEYWORD
 	    || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE
 		&& tok->keyword != RID_DO))
@@ -39000,7 +39077,7 @@  cp_parser_pragma (cp_parser *parser, enu
 	    cp_parser_error (parser, "for, while or do statement expected");
 	    return false;
 	  }
-	cp_parser_iteration_statement (parser, if_p, true);
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
 	return true;
       }
 
Index: cp/pt.c
===================================================================
--- cp/pt.c	(revision 255000)
+++ cp/pt.c	(working copy)
@@ -16119,7 +16119,7 @@  tsubst_expr (tree t, tree args, tsubst_f
       RECUR (FOR_INIT_STMT (t));
       finish_init_stmt (stmt);
       tmp = RECUR (FOR_COND (t));
-      finish_for_cond (tmp, stmt, false);
+      finish_for_cond (tmp, stmt, false, 0);
       tmp = RECUR (FOR_EXPR (t));
       finish_for_expr (tmp, stmt);
       RECUR (FOR_BODY (t));
@@ -16141,11 +16141,11 @@  tsubst_expr (tree t, tree args, tsubst_f
 	    decl = tsubst_decomp_names (decl, RANGE_FOR_DECL (t), args,
 					complain, in_decl, &first, &cnt);
 	    stmt = cp_convert_range_for (stmt, decl, expr, first, cnt,
-					 RANGE_FOR_IVDEP (t));
+					 RANGE_FOR_IVDEP (t), 0);
 	  }
 	else
 	  stmt = cp_convert_range_for (stmt, decl, expr, NULL_TREE, 0,
-				       RANGE_FOR_IVDEP (t));
+				       RANGE_FOR_IVDEP (t), 0);
         RECUR (RANGE_FOR_BODY (t));
         finish_for_stmt (stmt);
       }
@@ -16154,7 +16154,7 @@  tsubst_expr (tree t, tree args, tsubst_f
     case WHILE_STMT:
       stmt = begin_while_stmt ();
       tmp = RECUR (WHILE_COND (t));
-      finish_while_stmt_cond (tmp, stmt, false);
+      finish_while_stmt_cond (tmp, stmt, false, 0);
       RECUR (WHILE_BODY (t));
       finish_while_stmt (stmt);
       break;
@@ -16164,7 +16164,7 @@  tsubst_expr (tree t, tree args, tsubst_f
       RECUR (DO_BODY (t));
       finish_do_body (stmt);
       tmp = RECUR (DO_COND (t));
-      finish_do_stmt (tmp, stmt, false);
+      finish_do_stmt (tmp, stmt, false, 0);
       break;
 
     case IF_STMT:
@@ -16728,8 +16728,10 @@  tsubst_expr (tree t, tree args, tsubst_f
 
     case ANNOTATE_EXPR:
       tmp = RECUR (TREE_OPERAND (t, 0));
-      RETURN (build2_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
-			  TREE_TYPE (tmp), tmp, RECUR (TREE_OPERAND (t, 1))));
+      RETURN (build3_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
+			  TREE_TYPE (tmp), tmp,
+			  RECUR (TREE_OPERAND (t, 1)),
+			  RECUR (TREE_OPERAND (t, 2))));
 
     default:
       gcc_assert (!STATEMENT_CODE_P (TREE_CODE (t)));
Index: cp/semantics.c
===================================================================
--- cp/semantics.c	(revision 255000)
+++ cp/semantics.c	(working copy)
@@ -802,7 +802,8 @@  begin_while_stmt (void)
    WHILE_STMT.  */
 
 void
-finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep)
+finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep,
+			unsigned short unroll)
 {
   if (check_no_cilk (cond,
       "Cilk array notation cannot be used as a condition for while statement",
@@ -812,11 +813,20 @@  finish_while_stmt_cond (tree cond, tree
   finish_cond (&WHILE_COND (while_stmt), cond);
   begin_maybe_infinite_loop (cond);
   if (ivdep && cond != error_mark_node)
-    WHILE_COND (while_stmt) = build2 (ANNOTATE_EXPR,
+    WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR,
 				      TREE_TYPE (WHILE_COND (while_stmt)),
 				      WHILE_COND (while_stmt),
 				      build_int_cst (integer_type_node,
-						     annot_expr_ivdep_kind));
+						     annot_expr_ivdep_kind),
+				      integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR,
+				      TREE_TYPE (WHILE_COND (while_stmt)),
+				      WHILE_COND (while_stmt),
+				      build_int_cst (integer_type_node,
+						     annot_expr_unroll_kind),
+				      build_int_cst (integer_type_node,
+						     unroll));
   simplify_loop_decl_cond (&WHILE_COND (while_stmt), WHILE_BODY (while_stmt));
 }
 
@@ -861,7 +871,7 @@  finish_do_body (tree do_stmt)
    COND is as indicated.  */
 
 void
-finish_do_stmt (tree cond, tree do_stmt, bool ivdep)
+finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll)
 {
   if (check_no_cilk (cond,
   "Cilk array notation cannot be used as a condition for a do-while statement",
@@ -870,8 +880,13 @@  finish_do_stmt (tree cond, tree do_stmt,
   cond = maybe_convert_cond (cond);
   end_maybe_infinite_loop (cond);
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
-		   build_int_cst (integer_type_node, annot_expr_ivdep_kind));
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node, annot_expr_ivdep_kind),
+		   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node, annot_expr_unroll_kind),
+		   build_int_cst (integer_type_node, unroll));
   DO_COND (do_stmt) = cond;
 }
 
@@ -980,7 +995,7 @@  finish_init_stmt (tree for_stmt)
    FOR_STMT.  */
 
 void
-finish_for_cond (tree cond, tree for_stmt, bool ivdep)
+finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll)
 {
   if (check_no_cilk (cond,
 	 "Cilk array notation cannot be used in a condition for a for-loop",
@@ -990,11 +1005,20 @@  finish_for_cond (tree cond, tree for_stm
   finish_cond (&FOR_COND (for_stmt), cond);
   begin_maybe_infinite_loop (cond);
   if (ivdep && cond != error_mark_node)
-    FOR_COND (for_stmt) = build2 (ANNOTATE_EXPR,
+    FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR,
 				  TREE_TYPE (FOR_COND (for_stmt)),
 				  FOR_COND (for_stmt),
 				  build_int_cst (integer_type_node,
-						 annot_expr_ivdep_kind));
+						 annot_expr_ivdep_kind),
+				  integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR,
+				  TREE_TYPE (FOR_COND (for_stmt)),
+				  FOR_COND (for_stmt),
+				  build_int_cst (integer_type_node,
+						 annot_expr_unroll_kind),
+				  build_int_cst (integer_type_node,
+						 unroll));
   simplify_loop_decl_cond (&FOR_COND (for_stmt), FOR_BODY (for_stmt));
 }
 
Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(revision 255000)
+++ doc/extend.texi	(working copy)
@@ -22332,9 +22332,7 @@  function.  The parenthesis around the op
 
 The @code{#pragma GCC target} pragma is presently implemented for
 x86, ARM, AArch64, PowerPC, S/390, and Nios II targets only.
-@end table
 
-@table @code
 @item #pragma GCC optimize (@var{"string"}...)
 @cindex pragma GCC optimize
 
@@ -22345,9 +22343,7 @@  if @code{attribute((optimize("STRING")))
 function.  The parenthesis around the options is optional.
 @xref{Function Attributes}, for more information about the
 @code{optimize} attribute and the attribute syntax.
-@end table
 
-@table @code
 @item #pragma GCC push_options
 @itemx #pragma GCC pop_options
 @cindex pragma GCC push_options
@@ -22358,15 +22354,14 @@  options.  It is intended for include fil
 to switch to using a different @samp{#pragma GCC target} or
 @samp{#pragma GCC optimize} and then to pop back to the previous
 options.
-@end table
 
-@table @code
 @item #pragma GCC reset_options
 @cindex pragma GCC reset_options
 
 This pragma clears the current @code{#pragma GCC target} and
 @code{#pragma GCC optimize} to use the default switches as specified
 on the command line.
+
 @end table
 
 @node Loop-Specific Pragmas
@@ -22375,7 +22370,6 @@  on the command line.
 @table @code
 @item #pragma GCC ivdep
 @cindex pragma GCC ivdep
-@end table
 
 With this pragma, the programmer asserts that there are no loop-carried
 dependencies which would prevent consecutive iterations of
@@ -22410,6 +22404,16 @@  void ignore_vec_dep (int *a, int k, int
 @}
 @end smallexample
 
+@item #pragma GCC unroll @var{n}
+@cindex pragma GCC unroll @var{n}
+
+You can use this pragma to control how many times a loop should be
+unrolled.  It must be placed immediately before a @code{for},
+@code{while} or @code{do} loop or a @samp{#pragma GCC ivdep}, and applies
+only to the loop that follows.  @var{n} is an integer constant
+expression; a value of 0 or 1 disables unrolling of the loop.
+
+@end table
 
 @node Unnamed Fields
 @section Unnamed Structure and Union Fields
Index: doc/generic.texi
===================================================================
--- doc/generic.texi	(revision 255000)
+++ doc/generic.texi	(working copy)
@@ -1686,7 +1686,7 @@  its sole argument yields the representat
 @item ANNOTATE_EXPR
 This node is used to attach markers to an expression. The first operand
 is the annotated expression, the second is an @code{INTEGER_CST} with
-a value from @code{enum annot_expr_kind}.
+a value from @code{enum annot_expr_kind}, and the third is an @code{INTEGER_CST}.
 @end table
 
 
Index: fortran/array.c
===================================================================
--- fortran/array.c	(revision 255000)
+++ fortran/array.c	(working copy)
@@ -2123,6 +2123,7 @@  gfc_copy_iterator (gfc_iterator *src)
   dest->start = gfc_copy_expr (src->start);
   dest->end = gfc_copy_expr (src->end);
   dest->step = gfc_copy_expr (src->step);
+  dest->unroll = src->unroll;
 
   return dest;
 }
Index: fortran/decl.c
===================================================================
--- fortran/decl.c	(revision 255000)
+++ fortran/decl.c	(working copy)
@@ -95,6 +95,9 @@  gfc_symbol *gfc_new_block;
 
 bool gfc_matching_function;
 
+/* Set upon parsing a !GCC$ unroll n directive for use in the next loop.  */
+int directive_unroll = -1;
+
 /* If a kind expression of a component of a parameterized derived type is
    parameterized, temporarily store the expression here.  */
 static gfc_expr *saved_kind_expr = NULL;
@@ -104,7 +107,6 @@  static gfc_expr *saved_kind_expr = NULL;
 static gfc_actual_arglist *decl_type_param_list;
 static gfc_actual_arglist *type_param_spec_list;
 
-
 /********************* DATA statement subroutines *********************/
 
 static bool in_match_data = false;
@@ -10943,3 +10945,37 @@  syntax:
   gfc_error ("Syntax error in !GCC$ ATTRIBUTES statement at %C");
   return MATCH_ERROR;
 }
+
+
+/* Match a !GCC$ UNROLL statement of the form:
+      !GCC$ UNROLL n
+
+   The parameter n is the number of times we are supposed to unroll.
+
+   When we come here, we have already matched the !GCC$ UNROLL string.  */
+match
+gfc_match_gcc_unroll (void)
+{
+  int value;
+
+  if (gfc_match_small_int (&value) == MATCH_YES)
+    {
+      if (value < 0 || value > USHRT_MAX)
+	{
+	  gfc_error ("%<GCC unroll%> directive requires a"
+	      " non-negative integral constant"
+	      " less than or equal to %u at %C",
+	      USHRT_MAX
+	  );
+	  return MATCH_ERROR;
+	}
+      if (gfc_match_eos () == MATCH_YES)
+	{
+	  directive_unroll = value == 0 ? 1 : value;
+	  return MATCH_YES;
+	}
+    }
+
+  gfc_error ("Syntax error in !GCC$ UNROLL directive at %C");
+  return MATCH_ERROR;
+}
Index: fortran/gfortran.h
===================================================================
--- fortran/gfortran.h	(revision 255000)
+++ fortran/gfortran.h	(working copy)
@@ -2350,6 +2350,7 @@  gfc_case;
 typedef struct
 {
   gfc_expr *var, *start, *end, *step;
+  unsigned short unroll;
 }
 gfc_iterator;
 
@@ -2724,6 +2725,7 @@  gfc_finalizer;
 /* decl.c */
 bool gfc_in_match_data (void);
 match gfc_match_char_spec (gfc_typespec *);
+extern int directive_unroll;
 
 /* Handling Parameterized Derived Types  */
 bool gfc_insert_kind_parameter_exprs (gfc_expr *);
Index: fortran/match.c
===================================================================
--- fortran/match.c	(revision 255000)
+++ fortran/match.c	(working copy)
@@ -2539,8 +2539,8 @@  gfc_match_do (void)
 
   old_loc = gfc_current_locus;
 
+  memset (&iter, '\0', sizeof (gfc_iterator));
   label = NULL;
-  iter.var = iter.start = iter.end = iter.step = NULL;
 
   m = gfc_match_label ();
   if (m == MATCH_ERROR)
Index: fortran/match.h
===================================================================
--- fortran/match.h	(revision 255000)
+++ fortran/match.h	(working copy)
@@ -241,6 +241,7 @@  match gfc_match_contiguous (void);
 match gfc_match_dimension (void);
 match gfc_match_external (void);
 match gfc_match_gcc_attributes (void);
+match gfc_match_gcc_unroll (void);
 match gfc_match_import (void);
 match gfc_match_intent (void);
 match gfc_match_intrinsic (void);
Index: fortran/parse.c
===================================================================
--- fortran/parse.c	(revision 255000)
+++ fortran/parse.c	(working copy)
@@ -1063,6 +1063,7 @@  decode_gcc_attribute (void)
   old_locus = gfc_current_locus;
 
   match ("attributes", gfc_match_gcc_attributes, ST_ATTR_DECL);
+  match ("unroll", gfc_match_gcc_unroll, ST_NONE);
 
   /* All else has failed, so give up.  See if any of the matchers has
      stored an error message of some sort.  */
@@ -4634,7 +4635,14 @@  parse_do_block (void)
   s.ext.end_do_label = new_st.label1;
 
   if (new_st.ext.iterator != NULL)
-    stree = new_st.ext.iterator->var->symtree;
+    {
+      stree = new_st.ext.iterator->var->symtree;
+      if (directive_unroll != -1)
+	{
+	  new_st.ext.iterator->unroll = directive_unroll;
+	  directive_unroll = -1;
+	}
+    }
   else
     stree = NULL;
 
@@ -5392,6 +5400,9 @@  parse_executable (gfc_statement st)
 	  return st;
 	}
 
+      if (directive_unroll != -1)
+	gfc_error ("%<GCC unroll%> directive does not commence a loop at %C");
+
       st = next_statement ();
     }
 }
Index: fortran/trans-stmt.c
===================================================================
--- fortran/trans-stmt.c	(revision 255000)
+++ fortran/trans-stmt.c	(working copy)
@@ -1980,6 +1980,11 @@  gfc_trans_simple_do (gfc_code * code, st
 			    fold_convert (type, to));
 
   cond = gfc_evaluate_now_loc (loc, cond, &body);
+  if (code->ext.iterator->unroll && cond != error_mark_node)
+    cond
+      = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		build_int_cst (integer_type_node, annot_expr_unroll_kind),
+		build_int_cst (integer_type_node, code->ext.iterator->unroll));
 
   /* The loop exit.  */
   tmp = fold_build1_loc (loc, GOTO_EXPR, void_type_node, exit_label);
@@ -2306,6 +2311,11 @@  gfc_trans_do (gfc_code * code, tree exit
   /* End with the loop condition.  Loop until countm1t == 0.  */
   cond = fold_build2_loc (loc, EQ_EXPR, logical_type_node, countm1t,
 			  build_int_cst (utype, 0));
+  if (code->ext.iterator->unroll && cond != error_mark_node)
+    cond
+      = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		build_int_cst (integer_type_node, annot_expr_unroll_kind),
+		build_int_cst (integer_type_node, code->ext.iterator->unroll));
   tmp = fold_build1_loc (loc, GOTO_EXPR, void_type_node, exit_label);
   tmp = fold_build3_loc (loc, COND_EXPR, void_type_node,
 			 cond, tmp, build_empty_stmt (loc));
@@ -3460,9 +3470,10 @@  gfc_trans_forall_loop (forall_info *fora
       cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
 			      count, build_int_cst (TREE_TYPE (count), 0));
       if (forall_tmp->do_concurrent)
-	cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		       build_int_cst (integer_type_node,
-				      annot_expr_parallel_kind));
+				      annot_expr_parallel_kind),
+		       integer_zero_node);
 
       tmp = build1_v (GOTO_EXPR, exit_label);
       tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node,
Index: function.h
===================================================================
--- function.h	(revision 255000)
+++ function.h	(working copy)
@@ -385,8 +385,11 @@  struct GTY(()) function {
      nonzero value in loop->simduid.  */
   unsigned int has_simduid_loops : 1;
 
-  /* Set when the tail call has been identified.  */
+  /* Nonzero when the tail call has been identified.  */
   unsigned int tail_call_marked : 1;
+
+  /* Nonzero if the current function contains a #pragma GCC unroll.  */
+  unsigned int has_unroll : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
Index: gimplify.c
===================================================================
--- gimplify.c	(revision 255000)
+++ gimplify.c	(working copy)
@@ -3747,6 +3747,7 @@  gimple_boolify (tree expr)
       switch ((enum annot_expr_kind) TREE_INT_CST_LOW (TREE_OPERAND (expr, 1)))
 	{
 	case annot_expr_ivdep_kind:
+	case annot_expr_unroll_kind:
 	case annot_expr_no_vector_kind:
 	case annot_expr_vector_kind:
 	case annot_expr_parallel_kind:
@@ -11390,6 +11391,7 @@  gimplify_expr (tree *expr_p, gimple_seq
 	  {
 	    tree cond = TREE_OPERAND (*expr_p, 0);
 	    tree kind = TREE_OPERAND (*expr_p, 1);
+	    tree data = TREE_OPERAND (*expr_p, 2);
 	    tree type = TREE_TYPE (cond);
 	    if (!INTEGRAL_TYPE_P (type))
 	      {
@@ -11400,7 +11402,7 @@  gimplify_expr (tree *expr_p, gimple_seq
 	    tree tmp = create_tmp_var (type);
 	    gimplify_arg (&cond, pre_p, EXPR_LOCATION (*expr_p));
 	    gcall *call
-	      = gimple_build_call_internal (IFN_ANNOTATE, 2, cond, kind);
+	      = gimple_build_call_internal (IFN_ANNOTATE, 3, cond, kind, data);
 	    gimple_call_set_lhs (call, tmp);
 	    gimplify_seq_add_stmt (pre_p, call);
 	    *expr_p = tmp;
Index: loop-init.c
===================================================================
--- loop-init.c	(revision 255000)
+++ loop-init.c	(working copy)
@@ -361,8 +361,8 @@  pass_loop2::gate (function *fun)
       && (flag_move_loop_invariants
 	  || flag_unswitch_loops
 	  || flag_unroll_loops
-	  || (flag_branch_on_count_reg
-	      && targetm.have_doloop_end ())))
+	  || (flag_branch_on_count_reg && targetm.have_doloop_end ())
+	  || cfun->has_unroll))
     return true;
   else
     {
@@ -560,7 +560,7 @@  public:
   /* opt_pass methods: */
   virtual bool gate (function *)
     {
-      return (flag_unroll_loops || flag_unroll_all_loops);
+      return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll);
     }
 
   virtual unsigned int execute (function *);
Index: loop-unroll.c
===================================================================
--- loop-unroll.c	(revision 255000)
+++ loop-unroll.c	(working copy)
@@ -224,9 +224,16 @@  decide_unrolling (int flags)
 
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, locus,
-                         ";; *** Considering loop %d at BB %d for "
-                         "unrolling ***\n",
-                         loop->num, loop->header->index);
+			 "considering unrolling loop %d at BB %d\n",
+			 loop->num, loop->header->index);
+
+      if (loop->unroll == 1)
+	{
+	  if (dump_file)
+	    fprintf (dump_file,
+		     ";; Not unrolling loop, user didn't want it unrolled\n");
+	  continue;
+	}
 
       /* Do not peel cold areas.  */
       if (optimize_loop_for_size_p (loop))
@@ -256,9 +263,7 @@  decide_unrolling (int flags)
       loop->ninsns = num_loop_insns (loop);
       loop->av_ninsns = average_num_loop_insns (loop);
 
-      /* Try transformations one by one in decreasing order of
-	 priority.  */
-
+      /* Try transformations one by one in decreasing order of priority.  */
       decide_unroll_constant_iterations (loop, flags);
       if (loop->lpt_decision.decision == LPT_NONE)
 	decide_unroll_runtime_iterations (loop, flags);
@@ -347,19 +352,17 @@  decide_unroll_constant_iterations (struc
   struct niter_desc *desc;
   widest_int iterations;
 
-  if (!(flags & UAP_UNROLL))
-    {
-      /* We were not asked to, just return back silently.  */
-      return;
-    }
-
-  if (dump_file)
-    fprintf (dump_file,
-	     "\n;; Considering unrolling loop with constant "
-	     "number of iterations\n");
+  /* If we were not asked to unroll this loop, just return back silently.  */
+  if (!(flags & UAP_UNROLL) && !loop->unroll)
+    return;
+
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE,
+		 "considering unrolling loop with constant "
+		 "number of iterations\n");
 
   /* nunroll = total number of copies of the original loop body in
-     unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
+     unrolled loop (i.e. if it is 2, we have to duplicate loop body once).  */
   nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
   nunroll_by_av
     = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
@@ -391,6 +394,24 @@  decide_unroll_constant_iterations (struc
       return;
     }
 
+  /* Check for an explicit unrolling factor.  */
+  if (loop->unroll)
+    {
+      /* However, we cannot completely unroll a loop with a constant number
+	 of iterations at the RTL level; it should have been peeled instead.  */
+      if ((unsigned) loop->unroll - 1 > desc->niter - 2)
+	{
+	  if (dump_file)
+	    fprintf (dump_file, ";; Loop should have been peeled\n");
+	}
+      else
+	{
+	  loop->lpt_decision.decision = LPT_UNROLL_CONSTANT;
+	  loop->lpt_decision.times = loop->unroll - 1;
+	}
+      return;
+    }
+
   /* Check whether the loop rolls enough to consider.  
      Consult also loop bounds and profile; in the case the loop has more
      than one exit it may well loop less than determined maximal number
@@ -412,7 +433,7 @@  decide_unroll_constant_iterations (struc
   best_copies = 2 * nunroll + 10;
 
   i = 2 * nunroll + 2;
-  if (i - 1 >= desc->niter)
+  if (i > desc->niter - 2)
     i = desc->niter - 2;
 
   for (; i >= nunroll - 1; i--)
@@ -651,16 +672,14 @@  decide_unroll_runtime_iterations (struct
   struct niter_desc *desc;
   widest_int iterations;
 
-  if (!(flags & UAP_UNROLL))
-    {
-      /* We were not asked to, just return back silently.  */
-      return;
-    }
-
-  if (dump_file)
-    fprintf (dump_file,
-	     "\n;; Considering unrolling loop with runtime "
-	     "computable number of iterations\n");
+  /* If we were not asked to unroll this loop, just return back silently.  */
+  if (!(flags & UAP_UNROLL) && !loop->unroll)
+    return;
+
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE,
+		 "considering unrolling loop with runtime-"
+		 "computable number of iterations\n");
 
   /* nunroll = total number of copies of the original loop body in
      unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
@@ -674,6 +693,9 @@  decide_unroll_runtime_iterations (struct
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
 
+  if (loop->unroll)
+    nunroll = loop->unroll;
+
   /* Skip big loops.  */
   if (nunroll <= 1)
     {
@@ -712,8 +734,9 @@  decide_unroll_runtime_iterations (struct
       return;
     }
 
-  /* Success; now force nunroll to be power of 2, as we are unable to
-     cope with overflows in computation of number of iterations.  */
+  /* Success; now force nunroll to be a power of 2, as code-gen
+     requires it: we are unable to cope with overflows in the
+     computation of the number of iterations.  */
   for (i = 1; 2 * i <= nunroll; i *= 2)
     continue;
 
@@ -824,9 +847,10 @@  compare_and_jump_seq (rtx op0, rtx op1,
   return seq;
 }
 
-/* Unroll LOOP for which we are able to count number of iterations in runtime
-   LOOP->LPT_DECISION.TIMES times.  The transformation does this (with some
-   extra care for case n < 0):
+/* Unroll LOOP for which we are able to count number of iterations in
+   runtime LOOP->LPT_DECISION.TIMES times.  The times value must be a
+   power of two.  The transformation does this (with some extra care
+   for case n < 0):
 
    for (i = 0; i < n; i++)
      body;
@@ -1133,14 +1157,12 @@  decide_unroll_stupid (struct loop *loop,
   struct niter_desc *desc;
   widest_int iterations;
 
-  if (!(flags & UAP_UNROLL_ALL))
-    {
-      /* We were not asked to, just return back silently.  */
-      return;
-    }
+  /* If we were not asked to unroll this loop, just return back silently.  */
+  if (!(flags & UAP_UNROLL_ALL) && !loop->unroll)
+    return;
 
-  if (dump_file)
-    fprintf (dump_file, "\n;; Considering unrolling loop stupidly\n");
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE, "considering unrolling loop stupidly\n");
 
   /* nunroll = total number of copies of the original loop body in
      unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
@@ -1155,6 +1177,9 @@  decide_unroll_stupid (struct loop *loop,
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
 
+  if (loop->unroll)
+    nunroll = loop->unroll;
+
   /* Skip big loops.  */
   if (nunroll <= 1)
     {
@@ -1170,7 +1195,7 @@  decide_unroll_stupid (struct loop *loop,
   if (desc->simple_p && !desc->assumptions)
     {
       if (dump_file)
-	fprintf (dump_file, ";; The loop is simple\n");
+	fprintf (dump_file, ";; Loop is simple\n");
       return;
     }
 
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 255000)
+++ lto-streamer-in.c	(working copy)
@@ -825,6 +825,7 @@  input_cfg (struct lto_input_block *ib, s
 
       /* Read OMP SIMD related info.  */
       loop->safelen = streamer_read_hwi (ib);
+      loop->unroll = streamer_read_hwi (ib);
       loop->dont_vectorize = streamer_read_hwi (ib);
       loop->force_vectorize = streamer_read_hwi (ib);
       loop->simduid = stream_read_tree (ib, data_in);
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 255000)
+++ lto-streamer-out.c	(working copy)
@@ -1929,6 +1929,7 @@  output_cfg (struct output_block *ob, str
 
       /* Write OMP SIMD related info.  */
       streamer_write_hwi (ob, loop->safelen);
+      streamer_write_hwi (ob, loop->unroll);
       streamer_write_hwi (ob, loop->dont_vectorize);
       streamer_write_hwi (ob, loop->force_vectorize);
       stream_write_tree (ob, loop->simduid, true);
Index: testsuite/c-c++-common/unroll-1.c
===================================================================
--- testsuite/c-c++-common/unroll-1.c	(revision 0)
+++ testsuite/c-c++-common/unroll-1.c	(working copy)
@@ -0,0 +1,41 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdump-rtl-loop2_unroll-details" } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+  /* { dg-final { scan-tree-dump "11:.*: note: loop with 8 iterations completely unrolled" "cunrolli" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 7; ++i)
+    bar(i);
+  /* { dg-final { scan-tree-dump "16:.*: note: loop with 7 iterations completely unrolled" "cunrolli" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 15; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "21:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 7
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */
+
+  unsigned long i = 0;
+  #pragma GCC unroll 3
+  do {
+    bar(i);
+  } while (++i < 9);
+  /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */
+}
Index: testsuite/c-c++-common/unroll-2.c
===================================================================
--- testsuite/c-c++-common/unroll-2.c	(revision 0)
+++ testsuite/c-c++-common/unroll-2.c	(working copy)
@@ -0,0 +1,41 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-cunroll-details -fdump-rtl-loop2_unroll-details" } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+  /* { dg-final { scan-tree-dump "11:.*: note: loop with 7 iterations completely unrolled" "cunroll" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 7; ++i)
+    bar(i);
+  /* { dg-final { scan-tree-dump "16:.*: note: loop with 6 iterations completely unrolled" "cunroll" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 15; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "21:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 7
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */
+
+  unsigned long i = 0;
+  #pragma GCC unroll 3
+  do {
+    bar(i);
+  } while (++i < 9);
+  /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */
+}
Index: testsuite/c-c++-common/unroll-3.c
===================================================================
--- testsuite/c-c++-common/unroll-3.c	(revision 0)
+++ testsuite/c-c++-common/unroll-3.c	(working copy)
@@ -0,0 +1,41 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -fdisable-tree-cunroll -fdump-rtl-loop2_unroll-details" } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump-not "11:.*: note: loop unrolled" "loop2_unroll" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 7; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump-not "16:.*: note: loop unrolled" "loop2_unroll" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 15; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "21:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 7
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */
+
+  unsigned long i = 0;
+  #pragma GCC unroll 3
+  do {
+    bar(i);
+  } while (++i < 9);
+  /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */
+}
Index: testsuite/c-c++-common/unroll-4.c
===================================================================
--- testsuite/c-c++-common/unroll-4.c	(revision 0)
+++ testsuite/c-c++-common/unroll-4.c	(working copy)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 0
+  for (unsigned long i = 1; i <= 3; ++i)
+    bar(i);
+
+  #pragma GCC unroll 0
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+
+  /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */
+  /* { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */
+}
Index: testsuite/c-c++-common/unroll-5.c
===================================================================
--- testsuite/c-c++-common/unroll-5.c	(revision 0)
+++ testsuite/c-c++-common/unroll-5.c	(working copy)
@@ -0,0 +1,29 @@ 
+/* { dg-do compile } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 4+4
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll -1	/* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll 20000000000	/* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll j	/* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+                        /* { dg-error "cannot appear in a constant-expression|is not usable in a constant expression" "" { target c++ } 21 } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll  4.2	/* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+}
Index: testsuite/gcc.dg/pr64277.c
===================================================================
--- testsuite/gcc.dg/pr64277.c	(revision 255000)
+++ testsuite/gcc.dg/pr64277.c	(working copy)
@@ -1,8 +1,8 @@ 
 /* PR tree-optimization/64277 */
 /* { dg-do compile } */
 /* { dg-options "-O3 -Wall -Werror -fdump-tree-cunroll-details" } */
+/* { dg-final { scan-tree-dump "loop with 4 iterations completely unrolled" "cunroll" } } */
 /* { dg-final { scan-tree-dump "loop with 5 iterations completely unrolled" "cunroll" } } */
-/* { dg-final { scan-tree-dump "loop with 6 iterations completely unrolled" "cunroll" } } */
 
 #if __SIZEOF_INT__ < 4
   __extension__ typedef __INT32_TYPE__ int32_t;
Index: testsuite/gcc.dg/tree-prof/unroll-1.c
===================================================================
--- testsuite/gcc.dg/tree-prof/unroll-1.c	(revision 255000)
+++ testsuite/gcc.dg/tree-prof/unroll-1.c	(working copy)
@@ -1,4 +1,4 @@ 
-/* { dg-options "-O3 -fdump-rtl-loop2_unroll -funroll-loops -fno-peel-loops" } */
+/* { dg-options "-O3 -fdump-rtl-loop2_unroll-details -funroll-loops -fno-peel-loops" } */
 void abort ();
 
 int a[1000];
@@ -20,4 +20,4 @@  main()
     t();
   return 0;
 }
-/* { dg-final-use { scan-rtl-dump "Considering unrolling loop with constant number of iterations" "loop2_unroll" } } */
+/* { dg-final-use { scan-rtl-dump "considering unrolling loop with constant number of iterations" "loop2_unroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/cunroll-1.c	(working copy)
@@ -9,5 +9,5 @@  test(int c)
     a[i]=5;
 }
 /* Array bounds says the loop will not roll much.  */
-/* { dg-final { scan-tree-dump "loop with 3 iterations completely unrolled" "cunrolli"} } */
+/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli"} } */
 /* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli"} } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-12.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-12.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/cunroll-12.c	(working copy)
@@ -7,5 +7,5 @@  t(struct a *a)
   for (int i=0;a->a[i];i++)
     a->a[i]++;
 }
-/* { dg-final { scan-tree-dump-times "loop with 7 iterations completely unrolled" 1 "cunroll" } } */
+/* { dg-final { scan-tree-dump-times "loop with 6 iterations completely unrolled" 1 "cunroll" } } */
 /* { dg-final { scan-tree-dump-not "Invalid sum" "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-13.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-13.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/cunroll-13.c	(working copy)
@@ -19,5 +19,5 @@  t(struct a *a)
 /* { dg-final { scan-tree-dump-times "Loop 1 iterates 123454 times" 1 "cunroll" } } */
 /* { dg-final { scan-tree-dump-times "Last iteration exit edge was proved true" 1 "cunroll" } } */
 /* { dg-final { scan-tree-dump-times "Exit condition of peeled iterations was eliminated" 1 "cunroll" } } */
-/* { dg-final { scan-tree-dump-times "loop with 7 iterations completely unrolled" 1 "cunroll" } } */
+/* { dg-final { scan-tree-dump-times "loop with 6 iterations completely unrolled" 1 "cunroll" } } */
 /* { dg-final { scan-tree-dump-not "Invalid sum" "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-14.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-14.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/cunroll-14.c	(working copy)
@@ -7,7 +7,7 @@  t(struct a *a)
   for (int i=0;i<5 && a->a[i];i++)
     a->a[i]++;
 }
-/* { dg-final { scan-tree-dump-times "loop with 5 iterations completely unrolled" 1 "cunroll" } } */
+/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 1 "cunroll" } } */
 /* { dg-final { scan-tree-dump-not "Invalid sum" "cunroll" } } */
 /* { dg-final { scan-tree-dump-times "Loop 1 iterates 4 times" 1 "cunroll" } } */
 /* { dg-final { scan-tree-dump-times "Last iteration exit edge was proved true" 1 "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/cunroll-2.c	(working copy)
@@ -14,4 +14,4 @@  test(int c)
     }
 }
 /* We are not able to get rid of the final conditional because the loop has two exits.  */
-/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunroll"} } */
+/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunroll"} } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/cunroll-3.c	(working copy)
@@ -12,4 +12,4 @@  test(int c)
 }
 /* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */
 
-/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli"} } */
+/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunrolli"} } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/cunroll-5.c	(working copy)
@@ -9,6 +9,6 @@  test(int c)
     a[i]=5;
 }
 /* Basic testcase for complette unrolling.  */
-/* { dg-final { scan-tree-dump "loop with 6 iterations completely unrolled" "cunroll"} } */
+/* { dg-final { scan-tree-dump "loop with 5 iterations completely unrolled" "cunroll"} } */
 /* { dg-final { scan-tree-dump "Exit condition of peeled iterations was eliminated." "cunroll"} } */
 /* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunroll"} } */
Index: testsuite/gcc.dg/tree-ssa/loop-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/loop-1.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/loop-1.c	(working copy)
@@ -34,7 +34,7 @@  int xxx(void)
 /* We should be able to find out that the loop iterates four times and unroll it completely.  */
 
 /* { dg-final { scan-tree-dump-times "Added canonical iv to loop 1, 4 iterations" 1 "ivcanon"} } */
-/* { dg-final { scan-tree-dump-times "loop with 5 iterations completely unrolled" 1 "cunroll"} } */
+/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 1 "cunroll"} } */
 /* { dg-final { scan-tree-dump-times "foo" 5 "optimized"} } */
 
 /* Because hppa, ia64 and Windows targets include an external declaration
Index: testsuite/gcc.dg/tree-ssa/loop-23.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/loop-23.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/loop-23.c	(working copy)
@@ -24,5 +24,4 @@  int foo(void)
   return sum;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 1 "cunroll" } } */
-
+/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/pr61743-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr61743-1.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/pr61743-1.c	(working copy)
@@ -48,5 +48,5 @@  int foo1 (e_u8 a[4][N], int b1, int b2,
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 8 "cunroll" } } */
-/* { dg-final { scan-tree-dump-times "loop with 9 iterations completely unrolled" 2 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 8 "cunroll" } } */
+/* { dg-final { scan-tree-dump-times "loop with 8 iterations completely unrolled" 2 "cunrolli" } } */
Index: testsuite/gcc.dg/tree-ssa/pr61743-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr61743-2.c	(revision 255000)
+++ testsuite/gcc.dg/tree-ssa/pr61743-2.c	(working copy)
@@ -48,5 +48,5 @@  int foo1 (e_u8 a[4][N], int b1, int b2,
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 2 "cunroll" } } */
-/* { dg-final { scan-tree-dump-times "loop with 8 iterations completely unrolled" 2 "cunroll" } } */
+/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 2 "cunroll" } } */
+/* { dg-final { scan-tree-dump-times "loop with 7 iterations completely unrolled" 2 "cunroll" } } */
Index: testsuite/gcc.dg/unroll-2.c
===================================================================
--- testsuite/gcc.dg/unroll-2.c	(revision 255000)
+++ testsuite/gcc.dg/unroll-2.c	(working copy)
@@ -15,7 +15,7 @@  int foo(void)
 {
   int i;
   bar();
-  for (i = 0; i < 2; i++) /* { dg-message "note: loop with 3 iterations completely unrolled" } */
+  for (i = 0; i < 2; i++) /* { dg-message "note: loop with 2 iterations completely unrolled" } */
   {
      a[i]= b[i] + 1;
   }
@@ -25,7 +25,7 @@  int foo(void)
 int foo2(void)
 {
   int i;
-  for (i = 0; i < 2; i++) /* { dg-message "note: loop with 3 iterations completely unrolled" } */
+  for (i = 0; i < 2; i++) /* { dg-message "note: loop with 2 iterations completely unrolled" } */
   {
      a[i]= b[i] + 1;
   }
Index: testsuite/gcc.dg/unroll-3.c
===================================================================
--- testsuite/gcc.dg/unroll-3.c	(revision 255000)
+++ testsuite/gcc.dg/unroll-3.c	(working copy)
@@ -28,4 +28,4 @@  int foo2(void)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
Index: testsuite/gcc.dg/unroll-4.c
===================================================================
--- testsuite/gcc.dg/unroll-4.c	(revision 255000)
+++ testsuite/gcc.dg/unroll-4.c	(working copy)
@@ -28,4 +28,4 @@  int foo2(void)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
Index: testsuite/gcc.dg/unroll-5.c
===================================================================
--- testsuite/gcc.dg/unroll-5.c	(revision 255000)
+++ testsuite/gcc.dg/unroll-5.c	(working copy)
@@ -28,4 +28,4 @@  int foo2(void)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
Index: testsuite/gcc.dg/unroll-7.c
===================================================================
--- testsuite/gcc.dg/unroll-7.c	(revision 255000)
+++ testsuite/gcc.dg/unroll-7.c	(working copy)
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -funroll-loops" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll-details -funroll-loops" } */
 /* { dg-require-effective-target int32plus } */
 
 extern int *a;
@@ -14,5 +14,5 @@  int t(void)
 /* { dg-final { scan-rtl-dump "number of iterations: .const_int 999999" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "upper bound: 999999" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "realistic bound: 999999" "loop2_unroll" } } */
-/* { dg-final { scan-rtl-dump "Considering unrolling loop with constant number of iterations" "loop2_unroll" } } */
+/* { dg-final { scan-rtl-dump "considering unrolling loop with constant number of iterations" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump-not "Invalid sum" "loop2_unroll" } } */
Index: testsuite/gfortran.dg/directive_unroll_1.f90
===================================================================
--- testsuite/gfortran.dg/directive_unroll_1.f90	(revision 0)
+++ testsuite/gfortran.dg/directive_unroll_1.f90	(working copy)
@@ -0,0 +1,52 @@ 
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-cunrolli-details -fdump-rtl-loop2_unroll-details" }
+! Test that
+! the GCC$ unroll n directive
+! works
+
+subroutine test1(a)
+  implicit NONE
+  integer :: a(8)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=1, 8, 1
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-tree-dump "12:.*: note: loop with 8 iterations completely unrolled" "cunrolli" } }
+end subroutine test1
+
+subroutine test2(a, n)
+  implicit NONE
+  integer :: a(n)
+  integer (kind=1), intent(in) :: n
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=1, n, 1
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump "24:.: note: loop unrolled 7 times" "loop2_unroll" } }
+end subroutine test2
+
+subroutine test3(a, n)
+  implicit NONE
+  integer (kind=1), intent(in) :: n
+  integer :: a(n)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=n, 1, -1
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump "36:.: note: loop unrolled 7 times" "loop2_unroll" } }
+end subroutine test3
+
+subroutine test4(a, n)
+  implicit NONE
+  integer (kind=1), intent(in) :: n
+  integer :: a(n)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=1, n, 2
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump "48:.: note: loop unrolled 7 times" "loop2_unroll" } }
+end subroutine test4
Index: testsuite/gfortran.dg/directive_unroll_2.f90
===================================================================
--- testsuite/gfortran.dg/directive_unroll_2.f90	(revision 0)
+++ testsuite/gfortran.dg/directive_unroll_2.f90	(working copy)
@@ -0,0 +1,52 @@ 
+! { dg-do compile }
+! { dg-options "-O -fdump-tree-cunroll-details -fdump-rtl-loop2_unroll-details" }
+! Test that
+! the GCC$ unroll n directive
+! works
+
+subroutine test1(a)
+  implicit NONE
+  integer :: a(8)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=1, 8, 1
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-tree-dump "12:.*: note: loop with 7 iterations completely unrolled" "cunroll" } }
+end subroutine test1
+
+subroutine test2(a, n)
+  implicit NONE
+  integer :: a(n)
+  integer (kind=1), intent(in) :: n
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=1, n, 1
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump "24:.: note: loop unrolled 7 times" "loop2_unroll" } }
+end subroutine test2
+
+subroutine test3(a, n)
+  implicit NONE
+  integer (kind=1), intent(in) :: n
+  integer :: a(n)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=n, 1, -1
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump "36:.: note: loop unrolled 7 times" "loop2_unroll" } }
+end subroutine test3
+
+subroutine test4(a, n)
+  implicit NONE
+  integer (kind=1), intent(in) :: n
+  integer :: a(n)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=1, n, 2
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump "48:.: note: loop unrolled 7 times" "loop2_unroll" } }
+end subroutine test4
Index: testsuite/gfortran.dg/directive_unroll_3.f90
===================================================================
--- testsuite/gfortran.dg/directive_unroll_3.f90	(revision 0)
+++ testsuite/gfortran.dg/directive_unroll_3.f90	(working copy)
@@ -0,0 +1,52 @@ 
+! { dg-do compile }
+! { dg-options "-O -fdisable-tree-cunroll -fdump-rtl-loop2_unroll-details" }
+! Test that
+! the GCC$ unroll n directive
+! works
+
+subroutine test1(a)
+  implicit NONE
+  integer :: a(8)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=1, 8, 1
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump-not "12:.: note: loop unrolled" "loop2_unroll" } }
+end subroutine test1
+
+subroutine test2(a, n)
+  implicit NONE
+  integer :: a(n)
+  integer (kind=1), intent(in) :: n
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=1, n, 1
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump "24:.: note: loop unrolled 7 times" "loop2_unroll" } }
+end subroutine test2
+
+subroutine test3(a, n)
+  implicit NONE
+  integer (kind=1), intent(in) :: n
+  integer :: a(n)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=n, 1, -1
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump "36:.: note: loop unrolled 7 times" "loop2_unroll" } }
+end subroutine test3
+
+subroutine test4(a, n)
+  implicit NONE
+  integer (kind=1), intent(in) :: n
+  integer :: a(n)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  DO i=1, n, 2
+    call dummy(a(i))
+  ENDDO
+! { dg-final { scan-rtl-dump "48:.: note: loop unrolled 7 times" "loop2_unroll" } }
+end subroutine test4
Index: testsuite/gfortran.dg/directive_unroll_4.f90
===================================================================
--- testsuite/gfortran.dg/directive_unroll_4.f90	(revision 0)
+++ testsuite/gfortran.dg/directive_unroll_4.f90	(working copy)
@@ -0,0 +1,29 @@ 
+! { dg-do compile }
+! { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" }
+! Test that
+! the GCC$ unroll n directive
+! works
+
+subroutine test1(a)
+  implicit NONE
+  integer :: a(8)
+  integer (kind=4) :: i
+!GCC$ unroll 0
+  DO i=1, 8, 1
+    call dummy(a(i))
+  ENDDO
+end subroutine test1
+
+subroutine test2(a, n)
+  implicit NONE
+  integer :: a(n)
+  integer (kind=1), intent(in) :: n
+  integer (kind=4) :: i
+!GCC$ unroll 0
+  DO i=1, n, 1
+    call dummy(a(i))
+  ENDDO
+end subroutine test2
+
+! { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } }
+! { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } }
Index: testsuite/gfortran.dg/directive_unroll_5.f90
===================================================================
--- testsuite/gfortran.dg/directive_unroll_5.f90	(revision 0)
+++ testsuite/gfortran.dg/directive_unroll_5.f90	(working copy)
@@ -0,0 +1,38 @@ 
+! { dg-do compile }
+
+! Test that
+! the GCC$ unroll n directive
+! rejects invalid n and improper use
+
+subroutine wrong1(n)
+  implicit NONE
+  integer (kind=1), intent(in) :: n
+  integer (kind=4) :: i
+!GCC$ unroll 999999999 ! { dg-error "non-negative integral constant less than" }
+  DO i=0, n, 1
+    call dummy1(i)
+  ENDDO
+end subroutine wrong1
+
+subroutine wrong2(a, b, n)
+  implicit NONE
+  integer (kind=1), intent(in) :: n
+  integer :: a(n), b(n)
+  integer (kind=4) :: i
+!GCC$ unroll -1 ! { dg-error "non-negative integral constant less than" }
+  DO i=1, n, 2
+    call dummy2(a(i), b(i), i)
+  ENDDO
+end subroutine wrong2
+
+subroutine wrong3(a, b, n)
+  implicit NONE
+  integer (kind=1), intent(in) :: n
+  integer :: a(n), b(n)
+  integer (kind=4) :: i
+!GCC$ unroll 8
+  write (*,*) "wrong"! { dg-error "directive does not commence a loop" }
+  DO i=n, 1, -1
+    call dummy2(a(i), b(i), i)
+  ENDDO
+end subroutine wrong3
Index: testsuite/gnat.dg/unroll1.adb
===================================================================
--- testsuite/gnat.dg/unroll1.adb	(revision 0)
+++ testsuite/gnat.dg/unroll1.adb	(working copy)
@@ -0,0 +1,27 @@ 
+-- { dg-do compile }
+-- { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" }
+
+package body Unroll1 is
+
+   function "+" (X, Y : Sarray) return Sarray is
+      R : Sarray;
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (No_Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+      return R;
+   end;
+
+   procedure Add (X, Y : Sarray; R : out Sarray) is
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (No_Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+   end;
+
+end Unroll1;
+
+-- { dg-final { scan-tree-dump-times "Not unrolling loop .: user didn't want it unrolled completely" 2 "cunrolli" } }
+-- { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } }
Index: testsuite/gnat.dg/unroll1.ads
===================================================================
--- testsuite/gnat.dg/unroll1.ads	(revision 0)
+++ testsuite/gnat.dg/unroll1.ads	(working copy)
@@ -0,0 +1,9 @@ 
+package Unroll1 is
+
+   type Sarray is array (1 .. 4) of Float;
+   for Sarray'Alignment use 16;
+
+   function "+" (X, Y : Sarray) return Sarray;
+   procedure Add (X, Y : Sarray; R : out Sarray);
+
+end Unroll1;
Index: testsuite/gnat.dg/unroll2.adb
===================================================================
--- testsuite/gnat.dg/unroll2.adb	(revision 0)
+++ testsuite/gnat.dg/unroll2.adb	(working copy)
@@ -0,0 +1,26 @@ 
+-- { dg-do compile }
+-- { dg-options "-O2 -fdump-tree-cunrolli-details" }
+
+package body Unroll2 is
+
+   function "+" (X, Y : Sarray) return Sarray is
+      R : Sarray;
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+      return R;
+   end;
+
+   procedure Add (X, Y : Sarray; R : out Sarray) is
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+   end;
+
+end Unroll2;
+
+-- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunrolli" } }
Index: testsuite/gnat.dg/unroll2.ads
===================================================================
--- testsuite/gnat.dg/unroll2.ads	(revision 0)
+++ testsuite/gnat.dg/unroll2.ads	(working copy)
@@ -0,0 +1,9 @@ 
+package Unroll2 is
+
+   type Sarray is array (1 .. 4) of Float;
+   for Sarray'Alignment use 16;
+
+   function "+" (X, Y : Sarray) return Sarray;
+   procedure Add (X, Y : Sarray; R : out Sarray);
+
+end Unroll2;
Index: testsuite/gnat.dg/unroll3.adb
===================================================================
--- testsuite/gnat.dg/unroll3.adb	(revision 0)
+++ testsuite/gnat.dg/unroll3.adb	(working copy)
@@ -0,0 +1,26 @@ 
+-- { dg-do compile }
+-- { dg-options "-O -fdump-tree-cunroll-details" }
+
+package body Unroll3 is
+
+   function "+" (X, Y : Sarray) return Sarray is
+      R : Sarray;
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+      return R;
+   end;
+
+   procedure Add (X, Y : Sarray; R : out Sarray) is
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+   end;
+
+end Unroll3;
+
+-- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunroll" } }
Index: testsuite/gnat.dg/unroll3.ads
===================================================================
--- testsuite/gnat.dg/unroll3.ads	(revision 0)
+++ testsuite/gnat.dg/unroll3.ads	(working copy)
@@ -0,0 +1,9 @@ 
+package Unroll3 is
+
+   type Sarray is array (1 .. 4) of Float;
+   for Sarray'Alignment use 16;
+
+   function "+" (X, Y : Sarray) return Sarray;
+   procedure Add (X, Y : Sarray; R : out Sarray);
+
+end Unroll3;
Index: tree-cfg.c
===================================================================
--- tree-cfg.c	(revision 255000)
+++ tree-cfg.c	(working copy)
@@ -280,6 +280,11 @@  replace_loop_annotate_in_block (basic_bl
 	case annot_expr_ivdep_kind:
 	  loop->safelen = INT_MAX;
 	  break;
+	case annot_expr_unroll_kind:
+	  loop->unroll
+	    = (unsigned short) tree_to_shwi (gimple_call_arg (stmt, 2));
+	  cfun->has_unroll = true;
+	  break;
 	case annot_expr_no_vector_kind:
 	  loop->dont_vectorize = true;
 	  break;
@@ -338,6 +343,7 @@  replace_loop_annotate (void)
 	  switch ((annot_expr_kind) tree_to_shwi (gimple_call_arg (stmt, 1)))
 	    {
 	    case annot_expr_ivdep_kind:
+	    case annot_expr_unroll_kind:
 	    case annot_expr_no_vector_kind:
 	    case annot_expr_vector_kind:
 	      break;
@@ -7993,6 +7999,8 @@  print_loop (FILE *file, struct loop *loo
       fprintf (file, ", estimate = ");
       print_decu (loop->nb_iterations_estimate, file);
     }
+  if (loop->unroll)
+    fprintf (file, ", unroll = %d", loop->unroll);
   fprintf (file, ")\n");
 
   /* Print loop's body.  */
Index: tree-core.h
===================================================================
--- tree-core.h	(revision 255000)
+++ tree-core.h	(working copy)
@@ -851,6 +851,7 @@  enum tree_node_kind {
 
 enum annot_expr_kind {
   annot_expr_ivdep_kind,
+  annot_expr_unroll_kind,
   annot_expr_no_vector_kind,
   annot_expr_vector_kind,
   annot_expr_parallel_kind,
Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 255000)
+++ tree-inline.c	(working copy)
@@ -2597,6 +2597,11 @@  copy_loops (copy_body_data *id,
 	  flow_loop_tree_node_add (dest_parent, dest_loop);
 
 	  dest_loop->safelen = src_loop->safelen;
+	  if (src_loop->unroll)
+	    {
+	      dest_loop->unroll = src_loop->unroll;
+	      cfun->has_unroll = true;
+	    }
 	  dest_loop->dont_vectorize = src_loop->dont_vectorize;
 	  if (src_loop->force_vectorize)
 	    {
Index: tree-pretty-print.c
===================================================================
--- tree-pretty-print.c	(revision 255000)
+++ tree-pretty-print.c	(working copy)
@@ -2632,6 +2632,10 @@  dump_generic_node (pretty_printer *pp, t
 	case annot_expr_ivdep_kind:
 	  pp_string (pp, ", ivdep");
 	  break;
+	case annot_expr_unroll_kind:
+	  pp_printf (pp, ", unroll %d",
+		     (int) TREE_INT_CST_LOW (TREE_OPERAND (node, 2)));
+	  break;
 	case annot_expr_no_vector_kind:
 	  pp_string (pp, ", no-vector");
 	  break;
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 255000)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -681,11 +681,9 @@  try_unroll_loop_completely (struct loop
 			    HOST_WIDE_INT maxiter,
 			    location_t locus)
 {
-  unsigned HOST_WIDE_INT n_unroll = 0, ninsns, unr_insns;
-  struct loop_size size;
+  unsigned HOST_WIDE_INT n_unroll = 0;
   bool n_unroll_found = false;
   edge edge_to_cancel = NULL;
-  dump_flags_t report_flags = MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS;
 
   /* See if we proved number of iterations to be low constant.
 
@@ -726,7 +724,8 @@  try_unroll_loop_completely (struct loop
   if (!n_unroll_found)
     return false;
 
-  if (n_unroll > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES))
+  if (!loop->unroll
+      && n_unroll > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "Not unrolling loop %d "
@@ -740,121 +739,137 @@  try_unroll_loop_completely (struct loop
 
   if (n_unroll)
     {
-      bool large;
       if (ul == UL_SINGLE_ITER)
 	return false;
 
-      /* EXIT can be removed only if we are sure it passes first N_UNROLL
-	 iterations.  */
-      bool remove_exit = (exit && niter
-			  && TREE_CODE (niter) == INTEGER_CST
-			  && wi::leu_p (n_unroll, wi::to_widest (niter)));
-
-      large = tree_estimate_loop_size
-		 (loop, remove_exit ? exit : NULL, edge_to_cancel, &size,
-		  PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS));
-      ninsns = size.overall;
-      if (large)
+      if (loop->unroll)
 	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: it is too large.\n",
-		     loop->num);
-	  return false;
+	  /* If the unrolling factor is too large, bail out.  */
+	  if (n_unroll > (unsigned) loop->unroll)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file,
+			 "Not unrolling loop %d: "
+			 "user didn't want it unrolled completely.\n",
+			 loop->num);
+	      return false;
+	    }
 	}
-
-      unr_insns = estimated_unrolled_size (&size, n_unroll);
-      if (dump_file && (dump_flags & TDF_DETAILS))
+      else
 	{
-	  fprintf (dump_file, "  Loop size: %d\n", (int) ninsns);
-	  fprintf (dump_file, "  Estimated size after unrolling: %d\n",
-		   (int) unr_insns);
-	}
+	  struct loop_size size;
+	  /* EXIT can be removed only if we are sure it passes first N_UNROLL
+	     iterations.  */
+	  bool remove_exit = (exit && niter
+			      && TREE_CODE (niter) == INTEGER_CST
+			      && wi::leu_p (n_unroll, wi::to_widest (niter)));
+	  bool large
+	    = tree_estimate_loop_size
+		(loop, remove_exit ? exit : NULL, edge_to_cancel, &size,
+		 PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS));
+	  if (large)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: it is too large.\n",
+			 loop->num);
+	      return false;
+	    }
 
-      /* If the code is going to shrink, we don't need to be extra cautious
-	 on guessing if the unrolling is going to be profitable.  */
-      if (unr_insns
-	  /* If there is IV variable that will become constant, we save
-	     one instruction in the loop prologue we do not account
-	     otherwise.  */
-	  <= ninsns + (size.constant_iv != false))
-	;
-      /* We unroll only inner loops, because we do not consider it profitable
-	 otheriwse.  We still can cancel loopback edge of not rolling loop;
-	 this is always a good idea.  */
-      else if (ul == UL_NO_GROWTH)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: size would grow.\n",
-		     loop->num);
-	  return false;
-	}
-      /* Outer loops tend to be less interesting candidates for complete
-	 unrolling unless we can do a lot of propagation into the inner loop
-	 body.  For now we disable outer loop unrolling when the code would
-	 grow.  */
-      else if (loop->inner)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     "it is not innermost and code would grow.\n",
-		     loop->num);
-	  return false;
-	}
-      /* If there is call on a hot path through the loop, then
-	 there is most probably not much to optimize.  */
-      else if (size.num_non_pure_calls_on_hot_path)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     "contains call and code would grow.\n",
-		     loop->num);
-	  return false;
-	}
-      /* If there is pure/const call in the function, then we
-	 can still optimize the unrolled loop body if it contains
-	 some other interesting code than the calls and code
-	 storing or cumulating the return value.  */
-      else if (size.num_pure_calls_on_hot_path
-	       /* One IV increment, one test, one ivtmp store
-		  and one useful stmt.  That is about minimal loop
-		  doing pure call.  */
-	       && (size.non_call_stmts_on_hot_path
-		   <= 3 + size.num_pure_calls_on_hot_path))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     "contains just pure calls and code would grow.\n",
-		     loop->num);
-	  return false;
-	}
-      /* Complete unrolling is a major win when control flow is removed and
-	 one big basic block is created.  If the loop contains control flow
-	 the optimization may still be a win because of eliminating the loop
-	 overhead but it also may blow the branch predictor tables.
-	 Limit number of branches on the hot path through the peeled
-	 sequence.  */
-      else if (size.num_branches_on_hot_path * (int)n_unroll
-	       > PARAM_VALUE (PARAM_MAX_PEEL_BRANCHES))
-	{
+	  unsigned HOST_WIDE_INT ninsns = size.overall;
+	  unsigned HOST_WIDE_INT unr_insns
+	    = estimated_unrolled_size (&size, n_unroll);
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     " number of branches on hot path in the unrolled sequence"
-		     " reach --param max-peel-branches limit.\n",
-		     loop->num);
-	  return false;
-	}
-      else if (unr_insns
-	       > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     "(--param max-completely-peeled-insns limit reached).\n",
-		     loop->num);
-	  return false;
+	    {
+	      fprintf (dump_file, "  Loop size: %d\n", (int) ninsns);
+	      fprintf (dump_file, "  Estimated size after unrolling: %d\n",
+		       (int) unr_insns);
+	    }
+
+	  /* If the code is going to shrink, we don't need to be extra
+	     cautious on guessing if the unrolling is going to be
+	     profitable.  */
+	  if (unr_insns
+	      /* If there is an IV that will become constant, we save
+		 one instruction in the loop prologue that we do not
+		 account for otherwise.  */
+	      <= ninsns + (size.constant_iv != false))
+	    ;
+	  /* We unroll only inner loops, because we do not consider it
+	     profitable otherwise.  We can still cancel the loopback edge
+	     of a loop that does not roll; this is always a good idea.  */
+	  else if (ul == UL_NO_GROWTH)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: size would grow.\n",
+			 loop->num);
+	      return false;
+	    }
+	  /* Outer loops tend to be less interesting candidates for
+	     complete unrolling unless we can do a lot of propagation
+	     into the inner loop body.  For now we disable outer loop
+	     unrolling when the code would grow.  */
+	  else if (loop->inner)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "it is not innermost and code would grow.\n",
+			 loop->num);
+	      return false;
+	    }
+	  /* If there is a call on a hot path through the loop, then
+	     there is most probably not much to optimize.  */
+	  else if (size.num_non_pure_calls_on_hot_path)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "contains call and code would grow.\n",
+			 loop->num);
+	      return false;
+	    }
+	  /* If there is a pure/const call in the function, then we can
+	     still optimize the unrolled loop body if it contains some
+	     other interesting code than the calls and code storing or
+	     accumulating the return value.  */
+	  else if (size.num_pure_calls_on_hot_path
+		   /* One IV increment, one test, one ivtmp store and
+		      one useful stmt.  That is about the minimal loop
+		      doing a pure call.  */
+		   && (size.non_call_stmts_on_hot_path
+		       <= 3 + size.num_pure_calls_on_hot_path))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "contains just pure calls and code would grow.\n",
+			 loop->num);
+	      return false;
+	    }
+	  /* Complete unrolling is a major win when control flow is
+	     removed and one big basic block is created.  If the loop
+	     contains control flow the optimization may still be a win
+	     because of eliminating the loop overhead but it also may
+	     blow the branch predictor tables.  Limit the number of
+	     branches on the hot path through the peeled sequence.  */
+	  else if (size.num_branches_on_hot_path * (int) n_unroll
+		   > PARAM_VALUE (PARAM_MAX_PEEL_BRANCHES))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "number of branches on hot path in the unrolled "
+			 "sequence reaches --param max-peel-branches limit.\n",
+			 loop->num);
+	      return false;
+	    }
+	  else if (unr_insns
+		   > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "number of insns in the unrolled sequence reaches "
+			 "--param max-completely-peeled-insns limit.\n",
+			 loop->num);
+	      return false;
+	    }
 	}
-      if (!n_unroll)
-        dump_printf_loc (report_flags, locus,
-                         "loop turned into non-loop; it never loops.\n");
 
       initialize_original_copy_tables ();
       auto_sbitmap wont_exit (n_unroll + 1);
@@ -898,8 +913,8 @@  try_unroll_loop_completely (struct loop
       else
 	gimple_cond_make_true (cond);
       update_stmt (cond);
-      /* Do not remove the path. Doing so may remove outer loop
-	 and confuse bookkeeping code in tree_unroll_loops_completelly.  */
+      /* Do not remove the path, as doing so may remove the outer loop
+	 and confuse the bookkeeping code in tree_unroll_loops_completely.  */
     }
 
   /* Store the loop for later unlooping and exit removal.  */
@@ -915,7 +930,7 @@  try_unroll_loop_completely (struct loop
         {
           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS, locus,
                            "loop with %d iterations completely unrolled",
-			   (int) (n_unroll + 1));
+			   (int) n_unroll);
           if (loop->header->count.initialized_p ())
             dump_printf (MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS,
                          " (header execution count %d)",
@@ -963,7 +978,8 @@  try_peel_loop (struct loop *loop,
   struct loop_size size;
   int peeled_size;
 
-  if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0
+  if (!flag_peel_loops
+      || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0
       || !peeled_loops)
     return false;
 
@@ -974,20 +990,29 @@  try_peel_loop (struct loop *loop,
       return false;
     }
 
+  /* We don't peel loops that will be unrolled as this can duplicate a
+     loop more times than the user requested.  */
+  if (loop->unroll)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not peeling: user didn't want it peeled.\n");
+      return false;
+    }
+
   /* Peel only innermost loops.
      While the code is perfectly capable of peeling non-innermost loops,
      the heuristics would probably need some improvements. */
   if (loop->inner)
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: outer loop\n");
+	fprintf (dump_file, "Not peeling: outer loop\n");
       return false;
     }
 
   if (!optimize_loop_for_speed_p (loop))
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: cold loop\n");
+	fprintf (dump_file, "Not peeling: cold loop\n");
       return false;
     }
 
@@ -1005,7 +1030,7 @@  try_peel_loop (struct loop *loop,
   if (maxiter >= 0 && maxiter <= npeel)
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: upper bound is known so can "
+	fprintf (dump_file, "Not peeling: upper bound is known so can "
 		 "unroll completely\n");
       return false;
     }
@@ -1016,7 +1041,7 @@  try_peel_loop (struct loop *loop,
   if (npeel > PARAM_VALUE (PARAM_MAX_PEEL_TIMES) - 1)
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: rolls too much "
+	fprintf (dump_file, "Not peeling: rolls too much "
 		 "(%i + 1 > --param max-peel-times)\n", (int) npeel);
       return false;
     }
@@ -1029,7 +1054,7 @@  try_peel_loop (struct loop *loop,
       > PARAM_VALUE (PARAM_MAX_PEELED_INSNS))
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: peeled sequence size is too large "
+	fprintf (dump_file, "Not peeling: peeled sequence size is too large "
 		 "(%i insns > --param max-peel-insns)", peeled_size);
       return false;
     }
@@ -1317,7 +1342,9 @@  tree_unroll_loops_completely_1 (bool may
   if (!loop_father)
     return false;
 
-  if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
+  if (loop->unroll > 1)
+    ul = UL_ALL;
+  else if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
       /* Unroll outermost loops only if asked to do so or they do
 	 not cause code growth.  */
       && (unroll_outer || loop_outer (loop_father)))
@@ -1345,7 +1372,7 @@  tree_unroll_loops_completely_1 (bool may
    MAY_INCREASE_SIZE is true, perform the unrolling only if the
    size of the code does not increase.  */
 
-unsigned int
+static unsigned int
 tree_unroll_loops_completely (bool may_increase_size, bool unroll_outer)
 {
   bitmap father_bbs = BITMAP_ALLOC (NULL);
@@ -1522,9 +1549,9 @@  pass_complete_unroll::execute (function
      re-peeling the same loop multiple times.  */
   if (flag_peel_loops)
     peeled_loops = BITMAP_ALLOC (NULL);
-  int val = tree_unroll_loops_completely (flag_unroll_loops
-					  || flag_peel_loops
-					  || optimize >= 3, true);
+  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
+						   || flag_peel_loops
+						   || optimize >= 3, true);
   if (peeled_loops)
     {
       BITMAP_FREE (peeled_loops);
@@ -1576,8 +1603,7 @@  pass_complete_unrolli::execute (function
 {
   unsigned ret = 0;
 
-  loop_optimizer_init (LOOPS_NORMAL
-		       | LOOPS_HAVE_RECORDED_EXITS);
+  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
   if (number_of_loops (fun) > 1)
     {
       scev_initialize ();
Index: tree.def
===================================================================
--- tree.def	(revision 255000)
+++ tree.def	(working copy)
@@ -1410,8 +1410,9 @@  DEFTREECODE (TARGET_OPTION_NODE, "target
 
 /* ANNOTATE_EXPR.
    Operand 0 is the expression to be annotated.
-   Operand 1 is the annotation kind.  */
-DEFTREECODE (ANNOTATE_EXPR, "annotate_expr", tcc_expression, 2)
+   Operand 1 is the annotation kind.
+   Operand 2 is additional data.  */
+DEFTREECODE (ANNOTATE_EXPR, "annotate_expr", tcc_expression, 3)
 
 /* Cilk spawn statement
    Operand 0 is the CALL_EXPR.  */