Add support for #pragma GCC unroll

Message ID 3136854.5zjt3GTbhu@polaris
State New

Commit Message

Eric Botcazou Nov. 17, 2017, 10:23 a.m. UTC
Hi,

this is a cleaned up and updated revision of Mike's latest posted patch 
implementing #pragma GCC unroll in the C and C++ compilers.  To be honest, 
we're not so much interested in the front-end bits as in the middle-end bits, 
because the latter would at last make the Ada version of the pragma work, but 
the front-end bits are a significant part of the whole thing so it's probably 
fair to rescue them as well.

The C and C++ front-end bits are (almost) verbatim from Mike.  The cleanup 
comprises making the new 3rd operand of ANNOTATE_EXPR mandatory, so that you 
don't have to add guards all over the place, polishing a few rough edges and 
eliminating a few preexisting nits in the unrolling code.
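
For readers unfamiliar with ANNOTATE_EXPR, the effect of a mandatory third
operand can be modelled outside GCC.  This is a minimal sketch, not GCC code:
struct annotation, make_annotation, annotation_unroll and the ANNOT_*
enumerators are hypothetical stand-ins for the tree node and for enum
annot_expr_kind.  The operand values (0 for kinds that take no argument, 1 for
"do not unroll", USHRT_MAX for an unroll request without an explicit factor)
mirror what the Ada hunk of the patch passes.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for enum annot_expr_kind.  */
enum annot_kind { ANNOT_IVDEP, ANNOT_UNROLL, ANNOT_NO_VECTOR, ANNOT_VECTOR };

/* Hypothetical stand-in for a three-operand ANNOTATE_EXPR.  The loop
   condition is reduced to an int for brevity.  */
struct annotation
{
  int cond;              /* operand 0: the annotated loop condition */
  enum annot_kind kind;  /* operand 1: which hint this is */
  unsigned short data;   /* operand 2: always present, 0 when unused */
};

/* Build an annotation; kinds other than unroll must pass 0 as data.  */
struct annotation
make_annotation (int cond, enum annot_kind kind, unsigned short data)
{
  struct annotation a;
  assert (kind == ANNOT_UNROLL || data == 0);
  a.cond = cond;
  a.kind = kind;
  a.data = data;
  return a;
}

/* Because the third operand always exists, a consumer can read it without
   first checking how many operands the node has.  */
unsigned short
annotation_unroll (const struct annotation *a)
{
  return a->kind == ANNOT_UNROLL ? a->data : 0;
}
```

With this shape, an ivdep hint would be built as make_annotation (cond,
ANNOT_IVDEP, 0) and an Ada-style unroll request as make_annotation (cond,
ANNOT_UNROLL, USHRT_MAX).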

Tested on x86_64-suse-linux, OK for the mainline?


2017-11-17  Mike Stump  <mikestump@comcast.net>
            Eric Botcazou  <ebotcazou@adacore.com>

ChangeLog:
	* doc/extend.texi (Loop-Specific Pragmas): Document pragma GCC unroll.
	* doc/generic.texi (ANNOTATE_EXPR): Document 3rd operand.
	* cfgloop.h (struct loop): Add unroll field.
	* function.h (struct function): Add has_unroll bitfield.
	* gimplify.c (gimple_boolify) <ANNOTATE_EXPR>: Deal with unroll kind.
	(gimplify_expr) <ANNOTATE_EXPR>: Propagate 3rd operand.
	* loop-init.c (pass_loop2::gate): Return true if cfun->has_unroll.
	(pass_rtl_unroll_loops::gate): Likewise.
	* loop-unroll.c (decide_unrolling): Tweak note message.  Skip loops
	if loop->unroll == 1 and force unrolling if loop->unroll > 1.
	(decide_unroll_constant_iterations): Use note for consistency and
	return early if loop->unroll is set.
	(decide_unroll_runtime_iterations): Use note for consistency and take
	loop->unroll into account.
	(decide_unroll_stupid): Likewise.
	* lto-streamer-in.c (input_cfg): Read loop->unroll.
	* lto-streamer-out.c (output_cfg): Write loop->unroll.
	* tree-cfg.c (replace_loop_annotate_in_block) <annot_expr_unroll_kind>
	New.
	(replace_loop_annotate) <annot_expr_unroll_kind>: Likewise.
	(print_loop): Print loop->unroll if set.
	* tree-core.h (enum annot_expr_kind): Add annot_expr_unroll_kind.
	* tree-inline.c (copy_loops): Copy unroll and set cfun->has_unroll.
	* tree-pretty-print.c (dump_generic_node) <annot_expr_unroll_kind>:
	New.
	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Bail out if
	loop->unroll is set and smaller than the trip count.  Otherwise bypass
	entirely the heuristics if loop->unroll is set.  Remove dead note.
	Fix off-by-one bug in other note.
	(try_peel_loop): Bail out if loop->unroll is set.  Fix formatting.
	(tree_unroll_loops_completely_1): Force unrolling if loop->unroll
	is greater than 1.
	* tree.def (ANNOTATE_EXPR): Add 3rd operand.

ada/ChangeLog:

	* gcc-interface/trans.c (gnat_gimplify_stmt) <LOOP_STMT>: Add 3rd
	operand to ANNOTATE_EXPR and pass unrolling hints.

c-family/ChangeLog:

	* c-pragma.c (init_pragma): Register pragma GCC unroll.
	* c-pragma.h (enum pragma_kind): Add PRAGMA_UNROLL.

c/ChangeLog:

	* c-parser.c (c_parser_while_statement): Add unroll parameter and
	build ANNOTATE_EXPR if present.  Add 3rd operand to ANNOTATE_EXPR.
	(c_parser_do_statement): Likewise.
	(c_parser_for_statement): Likewise.
	(c_parser_statement_after_labels): Adjust calls to above.
	(c_parse_pragma_ivdep): New static function.
	(c_parser_pragma_unroll): Likewise.
	(c_parser_pragma) <PRAGMA_IVDEP>: Add support for pragma Unroll.
	<PRAGMA_UNROLL>: New case.

cp/ChangeLog:

	* constexpr.c (cxx_eval_constant_expression) <ANNOTATE_EXPR>: Remove
	assertion on 2nd operand.
	(potential_constant_expression_1): Likewise.
	* cp-array-notation.c (create_an_loop): Adjust call to finish_for_cond.
	* cp-tree.h (cp_convert_range_for): Adjust prototype.
	(finish_while_stmt_cond): Likewise.
	(finish_do_stmt): Likewise.
	(finish_for_cond): Likewise.
	* init.c (build_vec_init): Adjust call to finish_for_cond.
	* parser.c (cp_parser_statement): Adjust call to
	cp_parser_iteration_statement.
	(cp_parser_for): Add unroll parameter and pass it in calls to
	cp_parser_range_for and cp_parser_c_for.
	(cp_parser_c_for): Add unroll parameter and pass it in call to
	finish_for_cond.
	(cp_parser_range_for): Add unroll parameter and pass it in call to
	cp_convert_range_for.
	(cp_convert_range_for): Add unroll parameter and pass it in call to
	finish_for_cond.
	(cp_parser_iteration_statement): Add unroll parameter and pass it in
	calls to finish_while_stmt_cond, finish_do_stmt and cp_parser_for.
	(cp_parser_pragma_ivdep): New static function.
	(cp_parser_pragma_unroll): Likewise.
	(cp_parser_pragma) <PRAGMA_IVDEP>: Add support for pragma Unroll.
	<PRAGMA_UNROLL>: New case.
	* pt.c (tsubst_expr): Adjust calls to finish_for_cond,
	cp_convert_range_for, finish_while_stmt_cond and finish_do_stmt.
	<ANNOTATE_EXPR>: Propagate 3rd operand.
	* semantics.c (finish_while_stmt_cond): Add unroll parameter and
	build ANNOTATE_EXPR if present.  Add 3rd operand to ANNOTATE_EXPR.
	(finish_do_stmt): Likewise.
	(finish_for_cond): Likewise.

fortran/ChangeLog:

	* trans-stmt.c (gfc_trans_forall_loop): Add 3rd operand to
	ANNOTATE_EXPR.

testsuite/ChangeLog:

	* c-c++-common/unroll-1.c: New test.
	* c-c++-common/unroll-2.c: Likewise.
	* c-c++-common/unroll-3.c: Likewise.
	* c-c++-common/unroll-4.c: Likewise.
	* gcc.dg/tree-prof/unroll-1.c: Use detailed dump and adjust scan.
	* gcc.dg/unroll-2.c (foo): Adjust message.
	(foo2): Likewise.
	* gcc.dg/unroll-3.c: Adjust scan.
	* gcc.dg/unroll-4.c: Likewise.
	* gcc.dg/unroll-5.c: Likewise.
	* gcc.dg/unroll-7.c: Use detailed dump and adjust scan.
	* gnat.dg/unroll1.ad[sb]: New test.
	* gnat.dg/unroll2.ad[sb]: Likewise.
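
The loop->unroll semantics described in the loop-unroll.c and
tree-ssa-loop-ivcanon.c entries above can be sketched as a small standalone
model.  It is an illustration under stated assumptions rather than the code
from the patch: classify_unroll_hint, complete_unroll_allowed and the
UNROLL_* enumerators are hypothetical names, and the trip count is taken as a
plain number instead of being computed from the loop.

```c
#include <stdbool.h>

/* What an unroller does with a loop, given its unroll hint.  */
enum unroll_decision
{
  UNROLL_SKIP,        /* hint == 1: leave the loop alone */
  UNROLL_FORCE,       /* hint > 1: unroll regardless of heuristics */
  UNROLL_HEURISTICS   /* hint == 0: no hint, use the usual heuristics */
};

/* Map the value stored in loop->unroll to a decision, following the
   loop-unroll.c entry: skip if 1, force if greater than 1.  */
enum unroll_decision
classify_unroll_hint (unsigned short unroll)
{
  if (unroll == 1)
    return UNROLL_SKIP;
  if (unroll > 1)
    return UNROLL_FORCE;
  return UNROLL_HEURISTICS;
}

/* Following the try_unroll_loop_completely entry: complete unrolling bails
   out when a hint is set and is smaller than the trip count.  */
bool
complete_unroll_allowed (unsigned short unroll, unsigned long trip_count)
{
  if (unroll != 0 && unroll < trip_count)
    return false;
  return true;
}
```

For example, a hint of 4 on a loop with 16 iterations would rule out complete
unrolling in this model, while a hint of 16 on a loop with 4 iterations would
not.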

 ada/gcc-interface/trans.c             |   25 ++-
 c-family/c-pragma.c                   |    4 
 c-family/c-pragma.h                   |    1 
 c/c-parser.c                          |  151 +++++++++++++++----
 cfgloop.h                             |    5 
 cp/constexpr.c                        |    2 
 cp/cp-array-notation.c                |    2 
 cp/cp-tree.h                          |    9 -
 cp/init.c                             |    2 
 cp/parser.c                           |  122 ++++++++++++---
 cp/pt.c                               |   16 +-
 cp/semantics.c                        |   42 ++++-
 doc/extend.texi                       |   12 +
 doc/generic.texi                      |    2 
 fortran/trans-stmt.c                  |    5 
 function.h                            |    5 
 gimplify.c                            |    4 
 loop-init.c                           |    6 
 loop-unroll.c                         |   66 +++++---
 lto-streamer-in.c                     |    1 
 lto-streamer-out.c                    |    1 
 testsuite/c-c++-common/unroll-1.c     |   41 +++++
 testsuite/c-c++-common/unroll-2.c     |   41 +++++
 testsuite/c-c++-common/unroll-3.c     |   20 ++
 testsuite/c-c++-common/unroll-4.c     |   29 +++
 testsuite/gcc.dg/tree-prof/unroll-1.c |    4 
 testsuite/gcc.dg/unroll-2.c           |    4 
 testsuite/gcc.dg/unroll-3.c           |    2 
 testsuite/gcc.dg/unroll-4.c           |    2 
 testsuite/gcc.dg/unroll-5.c           |    2 
 testsuite/gcc.dg/unroll-7.c           |    4 
 testsuite/gnat.dg/unroll1.adb         |   27 +++
 testsuite/gnat.dg/unroll1.ads         |    9 +
 testsuite/gnat.dg/unroll2.adb         |   26 +++
 testsuite/gnat.dg/unroll2.ads         |    9 +
 tree-cfg.c                            |    8 +
 tree-core.h                           |    1 
 tree-inline.c                         |    5 
 tree-pretty-print.c                   |    4 
 tree-ssa-loop-ivcanon.c               |  269 ++++++++++++++++++--------------
 tree.def                              |    5 
 41 files changed, 755 insertions(+), 240 deletions(-)

Comments

Richard Biener Nov. 17, 2017, 1:31 p.m. UTC | #1
On Fri, Nov 17, 2017 at 11:23 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Hi,
>
> this is a cleaned up and updated revision of Mike's latest posted patch
> implementing #pragma GCC unroll in the C and C++ compilers.  To be honest,
> we're not so much interested in the front-end bits as in the middle-end bits,
> because the latter would at last make the Ada version of the pragma work, but
> the front-end bits are a significant part of the whole thing so it's probably
> fair to rescue them as well.
>
> The C and C++ front-end bits are (almost) verbatim from Mike.  The cleanup
> comprises making the new 3rd operand of ANNOTATE_EXPR mandatory, so that you
> don't have to add guards all over the place, polishing a few rough edges and
> eliminating a few preexisting nits in the unrolling code.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Looking at the middle-end changes.  The change to ANNOTATE_EXPR to three
operands is approved also for existing frontends (just in case you
don't get review).
Found the missing possibility of an argument limiting myself...

Your changes to the RTL unrolling pass as far as I see enable the optimization
in its general form for all loops in a function if cfun->has_unroll,
not just for
the marked loops.  I think that is unintended?
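
To make the concern concrete, here is a minimal standalone model of a
function-wide gate combined with a per-loop check.  Everything in it is an
assumption for illustration: struct loop_model, struct function_model,
pass_gate and should_consider_loop are hypothetical names, the command-line
conditions are collapsed into a single boolean, and the per-loop filter is
only one possible way to restrict the pass to marked loops, not what the
patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct loop_model { unsigned short unroll; };   /* models loop->unroll */
struct function_model { bool has_unroll; };     /* models cfun->has_unroll */

/* Function-wide gate: the pass runs if the usual flags enable it or if
   the function contains at least one annotated loop.  */
bool
pass_gate (bool enabled_by_flags, const struct function_model *fn)
{
  assert (fn != NULL);
  return enabled_by_flags || fn->has_unroll;
}

/* Per-loop filter: when the gate opened only because of has_unroll,
   unmarked loops are left alone.  A hint of 1 always disables unrolling.  */
bool
should_consider_loop (bool enabled_by_flags, const struct loop_model *loop)
{
  assert (loop != NULL);
  if (loop->unroll == 1)
    return false;
  return enabled_by_flags || loop->unroll > 1;
}
```

Without the second check, an open gate would expose every loop in the
function to the general heuristics.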

I think you want to change copy_loop_info () to copy the flag as well.

@@ -1566,7 +1593,7 @@ public:
   {}

   /* opt_pass methods: */
-  virtual bool gate (function *) { return optimize >= 2; }
+  virtual bool gate (function *) { return optimize >= 2 || cfun->has_unroll; }
   virtual unsigned int execute (function *);

 }; // class pass_complete_unrolli

I think this has the same issue as the RTL unroller change.

Otherwise the middle-end changes look ok.

Thanks,
Richard.



Bernhard Reutner-Fischer Nov. 17, 2017, 4:52 p.m. UTC | #2
On 17 November 2017 14:31:45 CET, Richard Biener <richard.guenther@gmail.com> wrote:
>On Fri, Nov 17, 2017 at 11:23 AM, Eric Botcazou <ebotcazou@adacore.com>
>wrote:
>> Hi,
>>
>> this is a cleaned up and updated revision of Mike's latest posted
>patch
>> implementing #pragma GCC unroll in the C and C++ compilers.  To be
>honest,
>> we're not so much interested in the front-end bits as in the
>middle-end bits,
>> because the latter would at last make the Ada version of the pragma
>work, but
>> the front-end bits are a significant part of the whole thing so it's
>probably
>> fair to rescue them as well.
>>
>> The C and C++ front-end bits are (almost) verbatim from Mike.  The
>cleanup
>> comprises making the new 3rd operand of ANNOTATE_EXPR mandatory, so
>that you
>> don't have to add guards all over the place, polishing a few rough
>edges and
>> eliminating a few preexisting nits in the unrolling code.
>>
>> Tested on x86_64-suse-linux, OK for the mainline?
>
>Looking at the middle-end changes.  The change to ANNOTATE_EXPR to
>three
>operands is approved also for existing frontends (just in case you
>don't get review).
>Found the missing possibility of an argument limiting myself...
>
>Your changes to the RTL unrolling pass as far as I see enable the
>optimization
>in its general form for all loops in a function if cfun->has_unroll,
>not just for
>the marked loops.  I think that is unintended?
>
>I think you want to change copy_loop_info () to copy the flag as well.
>
>@@ -1566,7 +1593,7 @@ public:
>   {}
>
>   /* opt_pass methods: */
>-  virtual bool gate (function *) { return optimize >= 2; }
>+  virtual bool gate (function *) { return optimize >= 2 ||
>cfun->has_unroll; }
>   virtual unsigned int execute (function *);
>
> }; // class pass_complete_unrolli
>
>I think this has the same issue as the RTL unroller change.
>
>Otherwise the middle-end changes look ok.

If anybody finds the time to push the corresponding Fortran changes then I'd be grateful. I won't have time for this until end of stage 1...

https://gcc.gnu.org/ml/fortran/2015-02/msg00014.html

TIA
Eric Botcazou Nov. 20, 2017, 11:26 a.m. UTC | #3
> If anybody finds the time to push the corresponding Fortran changes then I'd
> be grateful. I won't have time for this until end of stage 1...
> 
> https://gcc.gnu.org/ml/fortran/2015-02/msg00014.html

OK, I'm going to merge it in the main patch.
Eric Botcazou Nov. 20, 2017, 11:36 a.m. UTC | #4
> Looking at the middle-end changes.  The change to ANNOTATE_EXPR to three
> operands is approved also for existing frontends (just in case you
> don't get review).
> Found the missing possibility of an argument limiting myself...

I see. ;-)

> Your changes to the RTL unrolling pass as far as I see enable the
> optimization in its general form for all loops in a function if
> cfun->has_unroll, not just for the marked loops.  I think that is
> unintended?

Probably not, let me try and see what can be done.

> I think you want to change copy_loop_info () to copy the flag as well.

The other optimization hints (ivdep, [no-]vector) aren't copied either.

> I think this has the same issue as the RTL unroller change.

Right.
Bernhard Reutner-Fischer Nov. 20, 2017, 11:57 a.m. UTC | #5
On 20 November 2017 at 12:26, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> If anybody finds the time to push the corresponding Fortran changes then I'd
>> be grateful. I won't have time for this until end of stage 1...
>>
>> https://gcc.gnu.org/ml/fortran/2015-02/msg00014.html
>
> OK, I'm going to merge it in the main patch.

[CCing fortran@]
Thanks a lot in advance!
>
> --
> Eric Botcazou
Steve Kargl Nov. 20, 2017, 2:45 p.m. UTC | #6
On Mon, Nov 20, 2017 at 12:57:47PM +0100, Bernhard Reutner-Fischer wrote:
> On 20 November 2017 at 12:26, Eric Botcazou <ebotcazou@adacore.com> wrote:
> >> If anybody finds the time to push the corresponding Fortran changes then I'd
> >> be grateful. I won't have time for this until end of stage 1...
> >>
> >> https://gcc.gnu.org/ml/fortran/2015-02/msg00014.html
> >
> > OK, I'm going to merge it in the main patch.
> 
> [CCing fortran@]
> Thanks a lot in advance!

The URL points to a nearly 3 year old patch.  I noticed
that there is no documentation of the new Fortran directive.
Section 7.2 of gfortran.info should be updated.  In particular,
does '!GCC$ UNROLL 4' affect only the immediately following
DO-LOOP or all DO-LOOPs that follow the directive until another
'GCC$ UNROLL ...' is found?  How does this new directive interface
with OpenMP and OpenACC?
Sandra Loosemore Nov. 20, 2017, 7:50 p.m. UTC | #7
On 11/17/2017 03:23 AM, Eric Botcazou wrote:

> Index: doc/extend.texi
> ===================================================================
> --- doc/extend.texi	(revision 254797)
> +++ doc/extend.texi	(working copy)
> @@ -22376,6 +22376,18 @@ void ignore_vec_dep (int *a, int k, int
>  @}
>  @end smallexample
>  
> +@table @code
> +@item #pragma GCC unroll @var{n}
> +@cindex pragma GCC unroll @var{n}
> +
> +With this pragma, the programmer informs the optimizer how many times
> +a loop should be unrolled.  A 0 or 1 informs the compiler to not
> +perform any loop unrolling.  The pragma must be immediately before
> +@samp{#pragma ivdep} or a @code{for}, @code{while} or @code{do} loop
> +and applies only to the loop that follows.  @var{n} is an
> +assignment-expression that evaluates to an integer constant.
> +
> +@end table
>  
>  @node Unnamed Fields
>  @section Unnamed Structure and Union Fields

This documentation patch needs some work.

First of all, the structuring in this section is screwed up.  The 
discussion and examples for the previous item (#pragma ivdep) should be 
moved inside the @table so that you don't have to introduce another 
@table here, just insert another entry into the existing one.

Second, we shouldn't be talking about "the programmer" in the third 
person; programmers are "you", the readers of the manual.  The paragraph 
structure and phrasing seem awkward as well.  How about something like 
this instead?

You can use this pragma to control how many times a loop should be 
unrolled.  It must be placed immediately before a @code{for}, 
@code{while} or @code{do} loop or a @samp{#pragma ivdep}, and applies 
only to the loop that follows.  @var{n} is an integer constant 
expression; a value of 0 or 1 disables unrolling of the loop.
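
A short usage example may help alongside that wording.  The sketch below is
not taken from the patch or its testsuite: sum_unroll4 and sum_no_unroll are
hypothetical functions, and a compiler that does not implement the pragma
would normally ignore it, possibly with a warning.

```c
#include <assert.h>
#include <stddef.h>

/* Request that the loop be unrolled 4 times.  The pragma sits immediately
   before the loop it applies to.  */
long
sum_unroll4 (const int *a, size_t n)
{
  long s = 0;
  assert (a != NULL || n == 0);
#pragma GCC unroll 4
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* A value of 1 (or 0) asks the compiler not to unroll this loop.  */
long
sum_no_unroll (const int *a, size_t n)
{
  long s = 0;
  assert (a != NULL || n == 0);
#pragma GCC unroll 1
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}
```

Both functions compute the same result; the pragma only affects how the loop
is compiled, not what it computes.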

-Sandra
Bernhard Reutner-Fischer Nov. 20, 2017, 8:21 p.m. UTC | #8
On 20 November 2017 at 15:45, Steve Kargl
<sgk@troutmask.apl.washington.edu> wrote:
> On Mon, Nov 20, 2017 at 12:57:47PM +0100, Bernhard Reutner-Fischer wrote:
>> On 20 November 2017 at 12:26, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> >> If anybody finds the time to push the corresponding Fortran changes then I'd
>> >> be grateful. I won't have time for this until end of stage 1...
>> >>
>> >> https://gcc.gnu.org/ml/fortran/2015-02/msg00014.html
>> >
>> > OK, I'm going to merge it in the main patch.
>>
>> [CCing fortran@]
>> Thanks a lot in advance!
>
> The URL points to a nearly 3 year old patch.  I noticed
> that there is no documentation of the new Fortran directive.
> Section 7.2 of gfortran.info should be updated.  In particular,
> does '!GCC$ UNROLL 4' affect only the immediately following
> DO-LOOP or all DO-LOOPs that follow the directive until another
> 'GCC$ UNROLL ...' is found?  How does this new directive interface
> with OpenMP and OpenACC?

The documentation for the directive is missing indeed. We can fix this
during stage3.

Currently the directive works on the whole function (see
gfc_cfun_has_unroll()) and instructs the loop-optimizers to run on
that function.
The loop-optimizers will discover the ANNOTATE_EXPR and act accordingly.
Richard B. already noted that the RTL unroller might do more than
intended, see https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01468.html
I expect updates to C and C++ in this area to be reflected in Fortran too.

The interaction with OpenMP and OpenACC in the Fortran FE is the same
as in the other frontends, obviously.
Eric's current respin of Mike's patch is here, FYI:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01452.html

thanks,
Eric Botcazou Nov. 21, 2017, 10:18 a.m. UTC | #9
> First of all, the structuring in this section is screwed up.  The
> discussion and examples for the previous item (#pragma ivdep) should be
> moved inside the @table so that you don't have to introduce another
> @table here, just insert another entry into the existing one.

That's also the case for the entire subsection just above, namely "Function 
Specific Option Pragmas".  I presume the tables must be merged there too?

> Second, we shouldn't be talking about "the programmer" in the third
> person; programmers are "you", the readers of the manual.  The paragraph
> structure and phrasing seem awkward as well.  How about something like
> this instead?
> 
> You can use this pragma to control how many times a loop should be
> unrolled.  It must be placed immediately before a @code{for},
> @code{while} or @code{do} loop or a @samp{#pragma ivdep}, and applies
> only to the loop that follows.  @var{n} is an integer constant
> expression; a value of 0 or 1 disables unrolling of the loop.

Thanks, integrated into the patch.
Eric Botcazou Nov. 21, 2017, 10:21 a.m. UTC | #10
> The documentation for the directive is missing indeed. We can fix this
> during stage3.

Someone who speaks Fortran will have to write it down...

> Currently the directive works on the whole function (see
> gfc_cfun_has_unroll()) and instructs the loop-optimizers to run on
> that function.

gfc_cfun_has_unroll is superfluous and has already been dropped because the 
flag will be set by the middle-end, but this doesn't change the behavior.

> The loop-optimizers will discover the ANNOTATE_EXPR and act accordingly.
> Richard B. already noted that the RTL unroller might do more than
> intended, see https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01468.html
> I expect updates to the C and C++ in this area to be reflected to Fortran
> too.

Yes, it's a generic issue.
Sandra Loosemore Nov. 22, 2017, 4:45 p.m. UTC | #11
On 11/21/2017 03:18 AM, Eric Botcazou wrote:
>> First of all, the structuring in this section is screwed up.  The
>> discussion and examples for the previous item (#pragma ivdep) should be
>> moved inside the @table so that you don't have to introduce another
>> @table here, just insert another entry into the existing one.
> 
> That's also the case for the entire subsection just above, namely "Function
> Specific Option Pragmas".  I presume the tables must be merged there too?

That would be a good thing generally, but you don't have to fix that as 
part of this patch (since you're not touching that section otherwise).

-Sandra

Patch

Index: ada/gcc-interface/trans.c
===================================================================
--- ada/gcc-interface/trans.c	(revision 254797)
+++ ada/gcc-interface/trans.c	(working copy)
@@ -8506,17 +8506,30 @@  gnat_gimplify_stmt (tree *stmt_p)
 	  {
 	    /* Deal with the optimization hints.  */
 	    if (LOOP_STMT_IVDEP (stmt))
-	      gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
 				 build_int_cst (integer_type_node,
-						annot_expr_ivdep_kind));
+						annot_expr_ivdep_kind),
+				 integer_zero_node);
+	    if (LOOP_STMT_NO_UNROLL (stmt))
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+				 build_int_cst (integer_type_node,
+						annot_expr_unroll_kind),
+				 integer_one_node);
+	    if (LOOP_STMT_UNROLL (stmt))
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+				 build_int_cst (integer_type_node,
+						annot_expr_unroll_kind),
+				 build_int_cst (NULL_TREE, USHRT_MAX));
 	    if (LOOP_STMT_NO_VECTOR (stmt))
-	      gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
 				 build_int_cst (integer_type_node,
-						annot_expr_no_vector_kind));
+						annot_expr_no_vector_kind),
+				 integer_zero_node);
 	    if (LOOP_STMT_VECTOR (stmt))
-	      gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+	      gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
 				 build_int_cst (integer_type_node,
-						annot_expr_vector_kind));
+						annot_expr_vector_kind),
+				 integer_zero_node);
 
 	    gnu_cond
 	      = build3 (COND_EXPR, void_type_node, gnu_cond, NULL_TREE,
Index: c/c-parser.c
===================================================================
--- c/c-parser.c	(revision 254797)
+++ c/c-parser.c	(working copy)
@@ -1408,9 +1408,9 @@  static tree c_parser_c99_block_statement
 					  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec<tree> *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, bool *);
-static void c_parser_do_statement (c_parser *, bool);
-static void c_parser_for_statement (c_parser *, bool, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -5495,13 +5495,13 @@  c_parser_statement_after_labels (c_parse
 	  c_parser_switch_statement (parser, if_p);
 	  break;
 	case RID_WHILE:
-	  c_parser_while_statement (parser, false, if_p);
+	  c_parser_while_statement (parser, false, 0, if_p);
 	  break;
 	case RID_DO:
-	  c_parser_do_statement (parser, false);
+	  c_parser_do_statement (parser, false, 0);
 	  break;
 	case RID_FOR:
-	  c_parser_for_statement (parser, false, if_p);
+	  c_parser_for_statement (parser, false, 0, if_p);
 	  break;
 	case RID_CILK_FOR:
 	  if (!flag_cilkplus)
@@ -6035,7 +6035,8 @@  c_parser_switch_statement (c_parser *par
    implement -Wparentheses.  */
 
 static void
-c_parser_while_statement (c_parser *parser, bool ivdep, bool *if_p)
+c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+			  bool *if_p)
 {
   tree block, cond, body, save_break, save_cont;
   location_t loc;
@@ -6051,9 +6052,15 @@  c_parser_while_statement (c_parser *pars
 	 "%<_Cilk_spawn%> statement cannot be used as a condition for while statement"))
     cond = error_mark_node;
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		   build_int_cst (integer_type_node,
-		   annot_expr_ivdep_kind));
+				  annot_expr_ivdep_kind),
+		   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_unroll_kind),
+		   build_int_cst (integer_type_node, unroll));
   save_break = c_break_label;
   c_break_label = NULL_TREE;
   save_cont = c_cont_label;
@@ -6088,7 +6095,7 @@  c_parser_while_statement (c_parser *pars
 */
 
 static void
-c_parser_do_statement (c_parser *parser, bool ivdep)
+c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree block, cond, body, save_break, save_cont, new_break, new_cont;
   location_t loc;
@@ -6116,9 +6123,16 @@  c_parser_do_statement (c_parser *parser,
 	 "%<_Cilk_spawn%> statement cannot be used as a condition for a do-while statement"))
     cond = error_mark_node;
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_ivdep_kind),
+		   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_unroll_kind),
 		   build_int_cst (integer_type_node,
-		   annot_expr_ivdep_kind));
+				  unroll));
   if (!c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
     c_parser_skip_to_end_of_block_or_statement (parser);
   c_finish_loop (loc, cond, NULL, body, new_break, new_cont, false);
@@ -6185,7 +6199,8 @@  c_parser_do_statement (c_parser *parser,
    implement -Wparentheses.  */
 
 static void
-c_parser_for_statement (c_parser *parser, bool ivdep, bool *if_p)
+c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+			bool *if_p)
 {
   tree block, cond, incr, save_break, save_cont, body;
   /* The following are only used when parsing an ObjC foreach statement.  */
@@ -6306,6 +6321,12 @@  c_parser_for_statement (c_parser *parser
 				  "%<GCC ivdep%> pragma");
 		  cond = error_mark_node;
 		}
+	      else if (unroll)
+		{
+		  c_parser_error (parser, "missing loop condition in loop with "
+				  "%<GCC unroll%> pragma");
+		  cond = error_mark_node;
+		}
 	      else
 		{
 		  c_parser_consume_token (parser);
@@ -6323,9 +6344,15 @@  c_parser_for_statement (c_parser *parser
 					 "expected %<;%>");
 	    }
 	  if (ivdep && cond != error_mark_node)
-	    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+	    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 			   build_int_cst (integer_type_node,
-			   annot_expr_ivdep_kind));
+					  annot_expr_ivdep_kind),
+			   integer_zero_node);
+	  if (unroll && cond != error_mark_node)
+	    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+			   build_int_cst (integer_type_node,
+					  annot_expr_unroll_kind),
+			   build_int_cst (integer_type_node, unroll));
 	}
       /* Parse the increment expression (the third expression in a
 	 for-statement).  In the case of a foreach-statement, this is
@@ -11035,6 +11062,45 @@  c_parser_objc_at_dynamic_declaration (c_
 }
 
 
+static bool
+c_parse_pragma_ivdep (c_parser *parser)
+{
+  c_parser_consume_pragma (parser);
+  c_parser_skip_to_pragma_eol (parser);
+  return true;
+}
+
+static unsigned short
+c_parser_pragma_unroll (c_parser *parser)
+{
+  unsigned short unroll;
+  c_parser_consume_pragma (parser);
+  location_t location = c_parser_peek_token (parser)->location;
+  tree expr = c_parser_expr_no_commas (parser, NULL).value;
+  mark_exp_read (expr);
+  expr = c_fully_fold (expr, false, NULL);
+  HOST_WIDE_INT lunroll = 0;
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+      || TREE_CODE (expr) != INTEGER_CST
+      || (lunroll = tree_to_shwi (expr)) < 0
+      || lunroll > USHRT_MAX)
+    {
+      error_at (location, "%<#pragma GCC unroll%> requires an"
+		" assignment-expression that evaluates to a non-negative"
+		" integral constant less than or equal to %u", USHRT_MAX);
+      unroll = 0;
+    }
+  else
+    {
+      unroll = (unsigned short) lunroll;
+      if (unroll == 0)
+	unroll = 1;
+    }
+
+  c_parser_skip_to_pragma_eol (parser);
+  return unroll;
+}
+
 /* Handle pragmas.  Some OpenMP pragmas are associated with, and therefore
    should be considered, statements.  ALLOW_STMT is true if we're within
    the context of a function and such pragmas are to be allowed.  Returns
@@ -11177,21 +11243,46 @@  c_parser_pragma (c_parser *parser, enum
       return c_parser_omp_ordered (parser, context, if_p);
 
     case PRAGMA_IVDEP:
-      c_parser_consume_pragma (parser);
-      c_parser_skip_to_pragma_eol (parser);
-      if (!c_parser_next_token_is_keyword (parser, RID_FOR)
-	  && !c_parser_next_token_is_keyword (parser, RID_WHILE)
-	  && !c_parser_next_token_is_keyword (parser, RID_DO))
-	{
-	  c_parser_error (parser, "for, while or do statement expected");
-	  return false;
-	}
-      if (c_parser_next_token_is_keyword (parser, RID_FOR))
-	c_parser_for_statement (parser, true, if_p);
-      else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
-	c_parser_while_statement (parser, true, if_p);
-      else
-	c_parser_do_statement (parser, true);
+      {
+	bool ivdep = c_parse_pragma_ivdep (parser);
+	unsigned short unroll = 0;
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL)
+	  unroll = c_parser_pragma_unroll (parser);
+	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+	    && !c_parser_next_token_is_keyword (parser, RID_DO))
+	  {
+	    c_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	if (c_parser_next_token_is_keyword (parser, RID_FOR))
+	  c_parser_for_statement (parser, ivdep, unroll, if_p);
+	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+	  c_parser_while_statement (parser, ivdep, unroll, if_p);
+	else
+	  c_parser_do_statement (parser, ivdep, unroll);
+      }
+      return false;
+    case PRAGMA_UNROLL:
+      {
+	unsigned short unroll = c_parser_pragma_unroll (parser);
+	bool ivdep = false;
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP)
+	  ivdep = c_parse_pragma_ivdep (parser);
+	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+	    && !c_parser_next_token_is_keyword (parser, RID_DO))
+	  {
+	    c_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	if (c_parser_next_token_is_keyword (parser, RID_FOR))
+	  c_parser_for_statement (parser, ivdep, unroll, if_p);
+	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+	  c_parser_while_statement (parser, ivdep, unroll, if_p);
+	else
+	  c_parser_do_statement (parser, ivdep, unroll);
+      }
       return false;
 
     case PRAGMA_GCC_PCH_PREPROCESS:
Index: c-family/c-pragma.c
===================================================================
--- c-family/c-pragma.c	(revision 254797)
+++ c-family/c-pragma.c	(working copy)
@@ -1544,6 +1544,10 @@  init_pragma (void)
     cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
 				  false);
 
+  if (!flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
+				  false, false);
+
   if (flag_cilkplus)
     cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
 				  PRAGMA_CILK_GRAINSIZE, true, false);
Index: c-family/c-pragma.h
===================================================================
--- c-family/c-pragma.h	(revision 254797)
+++ c-family/c-pragma.h	(working copy)
@@ -75,6 +75,7 @@  enum pragma_kind {
 
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
+  PRAGMA_UNROLL,
 
   PRAGMA_FIRST_EXTERNAL
 };
Index: cfgloop.h
===================================================================
--- cfgloop.h	(revision 254797)
+++ cfgloop.h	(working copy)
@@ -221,6 +221,11 @@  struct GTY ((chain_next ("%h.next"))) lo
   /* True if the loop is part of an oacc kernels region.  */
   unsigned in_oacc_kernels_region : 1;
 
+  /* The number of times to unroll the loop.  0 means no information
+     given, just do what we always do.  A value of 1 means don't unroll
+     the loop.  */
+  unsigned short unroll;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins.  */
Index: cp/constexpr.c
===================================================================
--- cp/constexpr.c	(revision 254797)
+++ cp/constexpr.c	(working copy)
@@ -4631,7 +4631,6 @@  cxx_eval_constant_expression (const cons
       return t;
 
     case ANNOTATE_EXPR:
-      gcc_assert (tree_to_uhwi (TREE_OPERAND (t, 1)) == annot_expr_ivdep_kind);
       r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0),
 					lval,
 					non_constant_p, overflow_p,
@@ -5879,7 +5878,6 @@  potential_constant_expression_1 (tree t,
       }
 
     case ANNOTATE_EXPR:
-      gcc_assert (tree_to_uhwi (TREE_OPERAND (t, 1)) == annot_expr_ivdep_kind);
       return RECUR (TREE_OPERAND (t, 0), rval);
 
     default:
Index: cp/cp-array-notation.c
===================================================================
--- cp/cp-array-notation.c	(revision 254797)
+++ cp/cp-array-notation.c	(working copy)
@@ -67,7 +67,7 @@  create_an_loop (tree init, tree cond, tr
   finish_expr_stmt (init);
   for_stmt = begin_for_stmt (NULL_TREE, NULL_TREE);
   finish_init_stmt (for_stmt);
-  finish_for_cond (cond, for_stmt, false);
+  finish_for_cond (cond, for_stmt, false, 0);
   finish_for_expr (incr, for_stmt);
   finish_expr_stmt (body);
   finish_for_stmt (for_stmt);
Index: cp/cp-tree.h
===================================================================
--- cp/cp-tree.h	(revision 254797)
+++ cp/cp-tree.h	(working copy)
@@ -6402,7 +6402,8 @@  extern tree implicitly_declare_fn
 extern bool maybe_clone_body			(tree);
 
 /* In parser.c */
-extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool);
+extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
+				  unsigned short);
 extern bool parsing_nsdmi (void);
 extern bool parsing_default_capturing_generic_lambda_in_template (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
@@ -6687,16 +6688,16 @@  extern void begin_else_clause			(tree);
 extern void finish_else_clause			(tree);
 extern void finish_if_stmt			(tree);
 extern tree begin_while_stmt			(void);
-extern void finish_while_stmt_cond		(tree, tree, bool);
+extern void finish_while_stmt_cond	(tree, tree, bool, unsigned short);
 extern void finish_while_stmt			(tree);
 extern tree begin_do_stmt			(void);
 extern void finish_do_body			(tree);
-extern void finish_do_stmt			(tree, tree, bool);
+extern void finish_do_stmt		(tree, tree, bool, unsigned short);
 extern tree finish_return_stmt			(tree);
 extern tree begin_for_scope			(tree *);
 extern tree begin_for_stmt			(tree, tree);
 extern void finish_init_stmt			(tree);
-extern void finish_for_cond			(tree, tree, bool);
+extern void finish_for_cond		(tree, tree, bool, unsigned short);
 extern void finish_for_expr			(tree, tree);
 extern void finish_for_stmt			(tree);
 extern tree begin_range_for_stmt		(tree, tree);
Index: cp/init.c
===================================================================
--- cp/init.c	(revision 254797)
+++ cp/init.c	(working copy)
@@ -4319,7 +4319,7 @@  build_vec_init (tree base, tree maxindex
       finish_init_stmt (for_stmt);
       finish_for_cond (build2 (GT_EXPR, boolean_type_node, iterator,
 			       build_int_cst (TREE_TYPE (iterator), -1)),
-		       for_stmt, false);
+		       for_stmt, false, 0);
       elt_init = cp_build_unary_op (PREDECREMENT_EXPR, iterator, false,
 				    complain);
       if (elt_init == error_mark_node)
Index: cp/parser.c
===================================================================
--- cp/parser.c	(revision 254797)
+++ cp/parser.c	(working copy)
@@ -2119,15 +2119,15 @@  static tree cp_parser_selection_statemen
 static tree cp_parser_condition
   (cp_parser *);
 static tree cp_parser_iteration_statement
-  (cp_parser *, bool *, bool);
+  (cp_parser *, bool *, bool, unsigned short);
 static bool cp_parser_init_statement
   (cp_parser *, tree *decl);
 static tree cp_parser_for
-  (cp_parser *, bool);
+  (cp_parser *, bool, unsigned short);
 static tree cp_parser_c_for
-  (cp_parser *, tree, tree, bool);
+  (cp_parser *, tree, tree, bool, unsigned short);
 static tree cp_parser_range_for
-  (cp_parser *, tree, tree, tree, bool);
+  (cp_parser *, tree, tree, tree, bool, unsigned short);
 static void do_range_for_auto_deduction
   (tree, tree);
 static tree cp_parser_perform_range_for_lookup
@@ -10875,7 +10875,7 @@  cp_parser_statement (cp_parser* parser,
 	case RID_WHILE:
 	case RID_DO:
 	case RID_FOR:
-	  statement = cp_parser_iteration_statement (parser, if_p, false);
+	  statement = cp_parser_iteration_statement (parser, if_p, false, 0);
 	  break;
 
 	case RID_CILK_FOR:
@@ -11742,7 +11742,7 @@  cp_parser_condition (cp_parser* parser)
    not included. */
 
 static tree
-cp_parser_for (cp_parser *parser, bool ivdep)
+cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree init, scope, decl;
   bool is_range_for;
@@ -11754,13 +11754,14 @@  cp_parser_for (cp_parser *parser, bool i
   is_range_for = cp_parser_init_statement (parser, &decl);
 
   if (is_range_for)
-    return cp_parser_range_for (parser, scope, init, decl, ivdep);
+    return cp_parser_range_for (parser, scope, init, decl, ivdep, unroll);
   else
-    return cp_parser_c_for (parser, scope, init, ivdep);
+    return cp_parser_c_for (parser, scope, init, ivdep, unroll);
 }
 
 static tree
-cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep)
+cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
+		 unsigned short unroll)
 {
   /* Normal for loop */
   tree condition = NULL_TREE;
@@ -11781,7 +11782,13 @@  cp_parser_c_for (cp_parser *parser, tree
 		       "%<GCC ivdep%> pragma");
       condition = error_mark_node;
     }
-  finish_for_cond (condition, stmt, ivdep);
+  else if (unroll)
+    {
+      cp_parser_error (parser, "missing loop condition in loop with "
+		       "%<GCC unroll%> pragma");
+      condition = error_mark_node;
+    }
+  finish_for_cond (condition, stmt, ivdep, unroll);
   /* Look for the `;'.  */
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
 
@@ -11805,7 +11812,7 @@  cp_parser_c_for (cp_parser *parser, tree
 
 static tree
 cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
-		     bool ivdep)
+		     bool ivdep, unsigned short unroll)
 {
   tree stmt, range_expr;
   auto_vec <cxx_binding *, 16> bindings;
@@ -11874,6 +11881,8 @@  cp_parser_range_for (cp_parser *parser,
       stmt = begin_range_for_stmt (scope, init);
       if (ivdep)
 	RANGE_FOR_IVDEP (stmt) = 1;
+      if (unroll)
+	/* TODO */(void)0;
       finish_range_for_decl (stmt, range_decl, range_expr);
       if (!type_dependent_expression_p (range_expr)
 	  /* do_auto_deduction doesn't mess with template init-lists.  */
@@ -11884,7 +11893,8 @@  cp_parser_range_for (cp_parser *parser,
     {
       stmt = begin_for_stmt (scope, init);
       stmt = cp_convert_range_for (stmt, range_decl, range_expr,
-				   decomp_first_name, decomp_cnt, ivdep);
+				   decomp_first_name, decomp_cnt, ivdep,
+				   unroll);
     }
   return stmt;
 }
@@ -11978,7 +11988,7 @@  do_range_for_auto_deduction (tree decl,
 tree
 cp_convert_range_for (tree statement, tree range_decl, tree range_expr,
 		      tree decomp_first_name, unsigned int decomp_cnt,
-		      bool ivdep)
+		      bool ivdep, unsigned short unroll)
 {
   tree begin, end;
   tree iter_type, begin_expr, end_expr;
@@ -12039,7 +12049,7 @@  cp_convert_range_for (tree statement, tr
 				 begin, ERROR_MARK,
 				 end, ERROR_MARK,
 				 NULL, tf_warning_or_error);
-  finish_for_cond (condition, statement, ivdep);
+  finish_for_cond (condition, statement, ivdep, unroll);
 
   /* The new increment expression.  */
   expression = finish_unary_op_expr (input_location,
@@ -12214,7 +12224,8 @@  cp_parser_range_for_member_function (tre
    Returns the new WHILE_STMT, DO_STMT, FOR_STMT or RANGE_FOR_STMT.  */
 
 static tree
-cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep)
+cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
+			       unsigned short unroll)
 {
   cp_token *token;
   enum rid keyword;
@@ -12248,7 +12259,7 @@  cp_parser_iteration_statement (cp_parser
 	parens.require_open (parser);
 	/* Parse the condition.  */
 	condition = cp_parser_condition (parser);
-	finish_while_stmt_cond (condition, statement, ivdep);
+	finish_while_stmt_cond (condition, statement, ivdep, unroll);
 	/* Look for the `)'.  */
 	parens.require_close (parser);
 	/* Parse the dependent statement.  */
@@ -12279,7 +12290,7 @@  cp_parser_iteration_statement (cp_parser
 	/* Parse the expression.  */
 	expression = cp_parser_expression (parser);
 	/* We're done with the do-statement.  */
-	finish_do_stmt (expression, statement, ivdep);
+	finish_do_stmt (expression, statement, ivdep, unroll);
 	/* Look for the `)'.  */
 	parens.require_close (parser);
 	/* Look for the `;'.  */
@@ -12293,7 +12304,7 @@  cp_parser_iteration_statement (cp_parser
 	matching_parens parens;
 	parens.require_open (parser);
 
-	statement = cp_parser_for (parser, ivdep);
+	statement = cp_parser_for (parser, ivdep, unroll);
 
 	/* Look for the `)'.  */
 	parens.require_close (parser);
@@ -38672,6 +38683,41 @@  cp_parser_cilk_grainsize (cp_parser *par
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 }
 
+static bool
+cp_parser_pragma_ivdep (cp_parser *parser, cp_token *pragma_tok)
+{
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  return true;
+}
+
+static unsigned short
+cp_parser_pragma_unroll (cp_parser *parser, cp_token *pragma_tok)
+{
+  location_t location = cp_lexer_peek_token (parser->lexer)->location;
+  tree expr = cp_parser_constant_expression (parser);
+  unsigned short unroll;
+  expr = maybe_constant_value (expr);
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  HOST_WIDE_INT lunroll = 0;
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+      || TREE_CODE (expr) != INTEGER_CST
+      || (lunroll = tree_to_shwi (expr)) < 0
+      || lunroll > USHRT_MAX)
+    {
+      error_at (location, "%<#pragma GCC unroll%> requires an"
+		" assignment-expression that evaluates to a non-negative"
+		" integral constant less than or equal to %u", USHRT_MAX);
+      unroll = 0;
+    }
+  else
+    {
+      unroll = (unsigned short) lunroll;
+      if (unroll == 0)
+	unroll = 1;
+    }
+  return unroll;
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -38914,9 +38960,45 @@  cp_parser_pragma (cp_parser *parser, enu
 		      "%<#pragma GCC ivdep%> must be inside a function");
 	    break;
 	  }
-	cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+	bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
+	unsigned short unroll = 0;
 	cp_token *tok;
 	tok = cp_lexer_peek_token (the_parser->lexer);
+	if (tok->type == CPP_PRAGMA
+	    && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
+	  {
+	    unroll = cp_parser_pragma_unroll (parser, pragma_tok);
+	    tok = cp_lexer_peek_token (the_parser->lexer);
+	  }
+	if (tok->type != CPP_KEYWORD
+	    || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE
+		&& tok->keyword != RID_DO))
+	  {
+	    cp_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
+	return true;
+      }
+
+    case PRAGMA_UNROLL:
+      {
+	if (context == pragma_external)
+	  {
+	    error_at (pragma_tok->location,
+		      "%<#pragma GCC unroll%> must be inside a function");
+	    break;
+	  }
+	unsigned short unroll = cp_parser_pragma_unroll (parser, pragma_tok);
+	bool ivdep = false;
+	cp_token *tok;
+	tok = cp_lexer_peek_token (the_parser->lexer);
+	if (tok->type == CPP_PRAGMA
+	    && cp_parser_pragma_kind (tok) == PRAGMA_IVDEP)
+	  {
+	    ivdep = cp_parser_pragma_ivdep (parser, tok);
+	    tok = cp_lexer_peek_token (the_parser->lexer);
+	  }
 	if (tok->type != CPP_KEYWORD
 	    || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE
 		&& tok->keyword != RID_DO))
@@ -38924,7 +39006,7 @@  cp_parser_pragma (cp_parser *parser, enu
 	    cp_parser_error (parser, "for, while or do statement expected");
 	    return false;
 	  }
-	cp_parser_iteration_statement (parser, if_p, true);
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
 	return true;
       }
 
Index: cp/pt.c
===================================================================
--- cp/pt.c	(revision 254797)
+++ cp/pt.c	(working copy)
@@ -16090,7 +16090,7 @@  tsubst_expr (tree t, tree args, tsubst_f
       RECUR (FOR_INIT_STMT (t));
       finish_init_stmt (stmt);
       tmp = RECUR (FOR_COND (t));
-      finish_for_cond (tmp, stmt, false);
+      finish_for_cond (tmp, stmt, false, 0);
       tmp = RECUR (FOR_EXPR (t));
       finish_for_expr (tmp, stmt);
       RECUR (FOR_BODY (t));
@@ -16112,11 +16112,11 @@  tsubst_expr (tree t, tree args, tsubst_f
 	    decl = tsubst_decomp_names (decl, RANGE_FOR_DECL (t), args,
 					complain, in_decl, &first, &cnt);
 	    stmt = cp_convert_range_for (stmt, decl, expr, first, cnt,
-					 RANGE_FOR_IVDEP (t));
+					 RANGE_FOR_IVDEP (t), 0);
 	  }
 	else
 	  stmt = cp_convert_range_for (stmt, decl, expr, NULL_TREE, 0,
-				       RANGE_FOR_IVDEP (t));
+				       RANGE_FOR_IVDEP (t), 0);
         RECUR (RANGE_FOR_BODY (t));
         finish_for_stmt (stmt);
       }
@@ -16125,7 +16125,7 @@  tsubst_expr (tree t, tree args, tsubst_f
     case WHILE_STMT:
       stmt = begin_while_stmt ();
       tmp = RECUR (WHILE_COND (t));
-      finish_while_stmt_cond (tmp, stmt, false);
+      finish_while_stmt_cond (tmp, stmt, false, 0);
       RECUR (WHILE_BODY (t));
       finish_while_stmt (stmt);
       break;
@@ -16135,7 +16135,7 @@  tsubst_expr (tree t, tree args, tsubst_f
       RECUR (DO_BODY (t));
       finish_do_body (stmt);
       tmp = RECUR (DO_COND (t));
-      finish_do_stmt (tmp, stmt, false);
+      finish_do_stmt (tmp, stmt, false, 0);
       break;
 
     case IF_STMT:
@@ -16699,8 +16699,10 @@  tsubst_expr (tree t, tree args, tsubst_f
 
     case ANNOTATE_EXPR:
       tmp = RECUR (TREE_OPERAND (t, 0));
-      RETURN (build2_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
-			  TREE_TYPE (tmp), tmp, RECUR (TREE_OPERAND (t, 1))));
+      RETURN (build3_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
+			  TREE_TYPE (tmp), tmp,
+			  RECUR (TREE_OPERAND (t, 1)),
+			  RECUR (TREE_OPERAND (t, 2))));
 
     default:
       gcc_assert (!STATEMENT_CODE_P (TREE_CODE (t)));
Index: cp/semantics.c
===================================================================
--- cp/semantics.c	(revision 254797)
+++ cp/semantics.c	(working copy)
@@ -802,7 +802,8 @@  begin_while_stmt (void)
    WHILE_STMT.  */
 
 void
-finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep)
+finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep,
+			unsigned short unroll)
 {
   if (check_no_cilk (cond,
       "Cilk array notation cannot be used as a condition for while statement",
@@ -812,11 +813,20 @@  finish_while_stmt_cond (tree cond, tree
   finish_cond (&WHILE_COND (while_stmt), cond);
   begin_maybe_infinite_loop (cond);
   if (ivdep && cond != error_mark_node)
-    WHILE_COND (while_stmt) = build2 (ANNOTATE_EXPR,
+    WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR,
 				      TREE_TYPE (WHILE_COND (while_stmt)),
 				      WHILE_COND (while_stmt),
 				      build_int_cst (integer_type_node,
-						     annot_expr_ivdep_kind));
+						     annot_expr_ivdep_kind),
+				      integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR,
+				      TREE_TYPE (WHILE_COND (while_stmt)),
+				      WHILE_COND (while_stmt),
+				      build_int_cst (integer_type_node,
+						     annot_expr_unroll_kind),
+				      build_int_cst (integer_type_node,
+						     unroll));
   simplify_loop_decl_cond (&WHILE_COND (while_stmt), WHILE_BODY (while_stmt));
 }
 
@@ -861,7 +871,7 @@  finish_do_body (tree do_stmt)
    COND is as indicated.  */
 
 void
-finish_do_stmt (tree cond, tree do_stmt, bool ivdep)
+finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll)
 {
   if (check_no_cilk (cond,
   "Cilk array notation cannot be used as a condition for a do-while statement",
@@ -870,8 +880,13 @@  finish_do_stmt (tree cond, tree do_stmt,
   cond = maybe_convert_cond (cond);
   end_maybe_infinite_loop (cond);
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
-		   build_int_cst (integer_type_node, annot_expr_ivdep_kind));
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node, annot_expr_ivdep_kind),
+		   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node, annot_expr_unroll_kind),
+		   build_int_cst (integer_type_node, unroll));
   DO_COND (do_stmt) = cond;
 }
 
@@ -980,7 +995,7 @@  finish_init_stmt (tree for_stmt)
    FOR_STMT.  */
 
 void
-finish_for_cond (tree cond, tree for_stmt, bool ivdep)
+finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll)
 {
   if (check_no_cilk (cond,
 	 "Cilk array notation cannot be used in a condition for a for-loop",
@@ -990,11 +1005,20 @@  finish_for_cond (tree cond, tree for_stm
   finish_cond (&FOR_COND (for_stmt), cond);
   begin_maybe_infinite_loop (cond);
   if (ivdep && cond != error_mark_node)
-    FOR_COND (for_stmt) = build2 (ANNOTATE_EXPR,
+    FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR,
 				  TREE_TYPE (FOR_COND (for_stmt)),
 				  FOR_COND (for_stmt),
 				  build_int_cst (integer_type_node,
-						 annot_expr_ivdep_kind));
+						 annot_expr_ivdep_kind),
+				  integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR,
+				  TREE_TYPE (FOR_COND (for_stmt)),
+				  FOR_COND (for_stmt),
+				  build_int_cst (integer_type_node,
+						 annot_expr_unroll_kind),
+				  build_int_cst (integer_type_node,
+						 unroll));
   simplify_loop_decl_cond (&FOR_COND (for_stmt), FOR_BODY (for_stmt));
 }
 
Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(revision 254797)
+++ doc/extend.texi	(working copy)
@@ -22376,6 +22376,18 @@  void ignore_vec_dep (int *a, int k, int
 @}
 @end smallexample
 
+@table @code
+@item #pragma GCC unroll @var{n}
+@cindex pragma GCC unroll @var{n}
+
+With this pragma, the programmer tells the optimizer how many times
+a loop should be unrolled.  A value of 0 or 1 tells the compiler not to
+unroll the loop at all.  The pragma must appear immediately before
+@samp{#pragma GCC ivdep} or a @code{for}, @code{while} or @code{do} loop
+and applies only to the loop that follows.  @var{n} is an
+assignment-expression that evaluates to a non-negative integer constant.
+
+@end table
 
 @node Unnamed Fields
 @section Unnamed Structure and Union Fields
Index: doc/generic.texi
===================================================================
--- doc/generic.texi	(revision 254797)
+++ doc/generic.texi	(working copy)
@@ -1686,7 +1686,7 @@  its sole argument yields the representat
 @item ANNOTATE_EXPR
 This node is used to attach markers to an expression. The first operand
 is the annotated expression, the second is an @code{INTEGER_CST} with
-a value from @code{enum annot_expr_kind}.
+a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
 @end table
 
 
Index: fortran/trans-stmt.c
===================================================================
--- fortran/trans-stmt.c	(revision 254797)
+++ fortran/trans-stmt.c	(working copy)
@@ -3453,9 +3453,10 @@  gfc_trans_forall_loop (forall_info *fora
       cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
 			      count, build_int_cst (TREE_TYPE (count), 0));
       if (forall_tmp->do_concurrent)
-	cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		       build_int_cst (integer_type_node,
-				      annot_expr_ivdep_kind));
+				      annot_expr_ivdep_kind),
+		       integer_zero_node);
 
       tmp = build1_v (GOTO_EXPR, exit_label);
       tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node,
Index: function.h
===================================================================
--- function.h	(revision 254797)
+++ function.h	(working copy)
@@ -385,8 +385,11 @@  struct GTY(()) function {
      nonzero value in loop->simduid.  */
   unsigned int has_simduid_loops : 1;
 
-  /* Set when the tail call has been identified.  */
+  /* Nonzero when the tail call has been identified.  */
   unsigned int tail_call_marked : 1;
+
+  /* Nonzero if the current function contains a #pragma GCC unroll.  */
+  unsigned int has_unroll : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
Index: gimplify.c
===================================================================
--- gimplify.c	(revision 254797)
+++ gimplify.c	(working copy)
@@ -3747,6 +3747,7 @@  gimple_boolify (tree expr)
       switch ((enum annot_expr_kind) TREE_INT_CST_LOW (TREE_OPERAND (expr, 1)))
 	{
 	case annot_expr_ivdep_kind:
+	case annot_expr_unroll_kind:
 	case annot_expr_no_vector_kind:
 	case annot_expr_vector_kind:
 	  TREE_OPERAND (expr, 0) = gimple_boolify (TREE_OPERAND (expr, 0));
@@ -11389,6 +11390,7 @@  gimplify_expr (tree *expr_p, gimple_seq
 	  {
 	    tree cond = TREE_OPERAND (*expr_p, 0);
 	    tree kind = TREE_OPERAND (*expr_p, 1);
+	    tree data = TREE_OPERAND (*expr_p, 2);
 	    tree type = TREE_TYPE (cond);
 	    if (!INTEGRAL_TYPE_P (type))
 	      {
@@ -11399,7 +11401,7 @@  gimplify_expr (tree *expr_p, gimple_seq
 	    tree tmp = create_tmp_var (type);
 	    gimplify_arg (&cond, pre_p, EXPR_LOCATION (*expr_p));
 	    gcall *call
-	      = gimple_build_call_internal (IFN_ANNOTATE, 2, cond, kind);
+	      = gimple_build_call_internal (IFN_ANNOTATE, 3, cond, kind, data);
 	    gimple_call_set_lhs (call, tmp);
 	    gimplify_seq_add_stmt (pre_p, call);
 	    *expr_p = tmp;
Index: loop-init.c
===================================================================
--- loop-init.c	(revision 254797)
+++ loop-init.c	(working copy)
@@ -361,8 +361,8 @@  pass_loop2::gate (function *fun)
       && (flag_move_loop_invariants
 	  || flag_unswitch_loops
 	  || flag_unroll_loops
-	  || (flag_branch_on_count_reg
-	      && targetm.have_doloop_end ())))
+	  || (flag_branch_on_count_reg && targetm.have_doloop_end ())
+	  || cfun->has_unroll))
     return true;
   else
     {
@@ -560,7 +560,7 @@  public:
   /* opt_pass methods: */
   virtual bool gate (function *)
     {
-      return (flag_unroll_loops || flag_unroll_all_loops);
+      return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll);
     }
 
   virtual unsigned int execute (function *);
Index: loop-unroll.c
===================================================================
--- loop-unroll.c	(revision 254797)
+++ loop-unroll.c	(working copy)
@@ -212,7 +212,7 @@  report_unroll (struct loop *loop, locati
 
 /* Decide whether unroll loops and how much.  */
 static void
-decide_unrolling (int flags)
+decide_unrolling (int base_flags)
 {
   struct loop *loop;
 
@@ -224,9 +224,16 @@  decide_unrolling (int flags)
 
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, locus,
-                         ";; *** Considering loop %d at BB %d for "
-                         "unrolling ***\n",
-                         loop->num, loop->header->index);
+			 "considering unrolling loop %d at BB %d\n",
+			 loop->num, loop->header->index);
+
+      if (loop->unroll == 1)
+	{
+	  if (dump_file)
+	    fprintf (dump_file,
+		     ";; Not unrolling loop, user didn't want it unrolled\n");
+	  continue;
+	}
 
       /* Do not peel cold areas.  */
       if (optimize_loop_for_size_p (loop))
@@ -258,6 +265,9 @@  decide_unrolling (int flags)
 
       /* Try transformations one by one in decreasing order of
 	 priority.  */
+      int flags = base_flags;
+      if (loop->unroll > 1)
+	flags = UAP_UNROLL | UAP_UNROLL_ALL;
 
       decide_unroll_constant_iterations (loop, flags);
       if (loop->lpt_decision.decision == LPT_NONE)
@@ -353,13 +363,13 @@  decide_unroll_constant_iterations (struc
       return;
     }
 
-  if (dump_file)
-    fprintf (dump_file,
-	     "\n;; Considering unrolling loop with constant "
-	     "number of iterations\n");
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE,
+		 "considering unrolling loop with constant "
+		 "number of iterations\n");
 
   /* nunroll = total number of copies of the original loop body in
-     unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
+     unrolled loop (i.e. if it is 2, we have to duplicate loop body once).  */
   nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
   nunroll_by_av
     = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
@@ -391,6 +401,14 @@  decide_unroll_constant_iterations (struc
       return;
     }
 
+  /* Check for an explicit unrolling factor.  */
+  if (loop->unroll)
+    {
+      loop->lpt_decision.decision = LPT_UNROLL_CONSTANT;
+      loop->lpt_decision.times = MIN ((unsigned) loop->unroll - 1, desc->niter);
+      return;
+    }
+
   /* Check whether the loop rolls enough to consider.  
      Consult also loop bounds and profile; in the case the loop has more
      than one exit it may well loop less than determined maximal number
@@ -657,10 +675,10 @@  decide_unroll_runtime_iterations (struct
       return;
     }
 
-  if (dump_file)
-    fprintf (dump_file,
-	     "\n;; Considering unrolling loop with runtime "
-	     "computable number of iterations\n");
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE,
+		 "considering unrolling loop with runtime-"
+		 "computable number of iterations\n");
 
   /* nunroll = total number of copies of the original loop body in
      unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
@@ -674,6 +692,9 @@  decide_unroll_runtime_iterations (struct
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
 
+  if (loop->unroll)
+    nunroll = loop->unroll;
+
   /* Skip big loops.  */
   if (nunroll <= 1)
     {
@@ -712,8 +733,9 @@  decide_unroll_runtime_iterations (struct
       return;
     }
 
-  /* Success; now force nunroll to be power of 2, as we are unable to
-     cope with overflows in computation of number of iterations.  */
+  /* Success; now force nunroll to be power of 2, as code-gen
+     requires it and we are unable to cope with overflows in
+     computation of number of iterations.  */
   for (i = 1; 2 * i <= nunroll; i *= 2)
     continue;
 
@@ -824,9 +846,10 @@  compare_and_jump_seq (rtx op0, rtx op1,
   return seq;
 }
 
-/* Unroll LOOP for which we are able to count number of iterations in runtime
-   LOOP->LPT_DECISION.TIMES times.  The transformation does this (with some
-   extra care for case n < 0):
+/* Unroll LOOP for which we are able to count number of iterations in
+   runtime LOOP->LPT_DECISION.TIMES times.  The times value must be a
+   power of two.  The transformation does this (with some extra care
+   for case n < 0):
 
    for (i = 0; i < n; i++)
      body;
@@ -1139,8 +1162,8 @@  decide_unroll_stupid (struct loop *loop,
       return;
     }
 
-  if (dump_file)
-    fprintf (dump_file, "\n;; Considering unrolling loop stupidly\n");
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE, "considering unrolling loop stupidly\n");
 
   /* nunroll = total number of copies of the original loop body in
      unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
@@ -1155,6 +1178,9 @@  decide_unroll_stupid (struct loop *loop,
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
 
+  if (loop->unroll)
+    nunroll = loop->unroll;
+
   /* Skip big loops.  */
   if (nunroll <= 1)
     {
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 254797)
+++ lto-streamer-in.c	(working copy)
@@ -825,6 +825,7 @@  input_cfg (struct lto_input_block *ib, s
 
       /* Read OMP SIMD related info.  */
       loop->safelen = streamer_read_hwi (ib);
+      loop->unroll = streamer_read_hwi (ib);
       loop->dont_vectorize = streamer_read_hwi (ib);
       loop->force_vectorize = streamer_read_hwi (ib);
       loop->simduid = stream_read_tree (ib, data_in);
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 254797)
+++ lto-streamer-out.c	(working copy)
@@ -1929,6 +1929,7 @@  output_cfg (struct output_block *ob, str
 
       /* Write OMP SIMD related info.  */
       streamer_write_hwi (ob, loop->safelen);
+      streamer_write_hwi (ob, loop->unroll);
       streamer_write_hwi (ob, loop->dont_vectorize);
       streamer_write_hwi (ob, loop->force_vectorize);
       stream_write_tree (ob, loop->simduid, true);
Index: testsuite/c-c++-common/unroll-1.c
===================================================================
--- testsuite/c-c++-common/unroll-1.c	(revision 0)
+++ testsuite/c-c++-common/unroll-1.c	(working copy)
@@ -0,0 +1,41 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdisable-tree-cunroll -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+  /* { dg-final { scan-tree-dump "11:.*: note: loop with 8 iterations completely unrolled" "cunrolli" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 7; ++i)
+    bar(i);
+  /* { dg-final { scan-tree-dump "16:.*: note: loop with 7 iterations completely unrolled" "cunrolli" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 15; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "21:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 7
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */
+
+  unsigned long i = 0;
+  #pragma GCC unroll 3
+  do {
+    bar(i);
+  } while (++i < 9);
+  /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */
+}
Index: testsuite/c-c++-common/unroll-2.c
===================================================================
--- testsuite/c-c++-common/unroll-2.c	(revision 0)
+++ testsuite/c-c++-common/unroll-2.c	(working copy)
@@ -0,0 +1,41 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+  /* { dg-final { scan-tree-dump "11:.*: note: loop with 8 iterations completely unrolled" "cunrolli" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 7; ++i)
+    bar(i);
+  /* { dg-final { scan-tree-dump "16:.*: note: loop with 7 iterations completely unrolled" "cunrolli" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= 15; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "21:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 8
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */
+
+  #pragma GCC unroll 7
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+  /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */
+
+  unsigned long i = 0;
+  #pragma GCC unroll 3
+  do {
+    bar(i);
+  } while (++i < 9);
+  /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */
+}
Index: testsuite/c-c++-common/unroll-3.c
===================================================================
--- testsuite/c-c++-common/unroll-3.c	(revision 0)
+++ testsuite/c-c++-common/unroll-3.c	(working copy)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 0
+  for (unsigned long i = 1; i <= 3; ++i)
+    bar(i);
+
+  #pragma GCC unroll 0
+  for (unsigned long i = 1; i <= j; ++i)
+    bar(i);
+
+  /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */
+  /* { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */
+}
Index: testsuite/c-c++-common/unroll-4.c
===================================================================
--- testsuite/c-c++-common/unroll-4.c	(revision 0)
+++ testsuite/c-c++-common/unroll-4.c	(working copy)
@@ -0,0 +1,29 @@ 
+/* { dg-do compile } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 4+4
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll -1	/* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll 20000000000	/* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll j	/* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+                        /* { dg-error "cannot appear in a constant-expression|is not usable in a constant expression" "" { target c++ } 21 } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll  4.2	/* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+}
Index: testsuite/gcc.dg/tree-prof/unroll-1.c
===================================================================
--- testsuite/gcc.dg/tree-prof/unroll-1.c	(revision 254797)
+++ testsuite/gcc.dg/tree-prof/unroll-1.c	(working copy)
@@ -1,4 +1,4 @@ 
-/* { dg-options "-O3 -fdump-rtl-loop2_unroll -funroll-loops -fno-peel-loops" } */
+/* { dg-options "-O3 -fdump-rtl-loop2_unroll-details -funroll-loops -fno-peel-loops" } */
 void abort ();
 
 int a[1000];
@@ -20,4 +20,4 @@  main()
     t();
   return 0;
 }
-/* { dg-final-use { scan-rtl-dump "Considering unrolling loop with constant number of iterations" "loop2_unroll" } } */
+/* { dg-final-use { scan-rtl-dump "considering unrolling loop with constant number of iterations" "loop2_unroll" } } */
Index: testsuite/gcc.dg/unroll-2.c
===================================================================
--- testsuite/gcc.dg/unroll-2.c	(revision 254797)
+++ testsuite/gcc.dg/unroll-2.c	(working copy)
@@ -15,7 +15,7 @@  int foo(void)
 {
   int i;
   bar();
-  for (i = 0; i < 2; i++) /* { dg-message "note: loop with 3 iterations completely unrolled" } */
+  for (i = 0; i < 2; i++) /* { dg-message "note: loop with 2 iterations completely unrolled" } */
   {
      a[i]= b[i] + 1;
   }
@@ -25,7 +25,7 @@  int foo(void)
 int foo2(void)
 {
   int i;
-  for (i = 0; i < 2; i++) /* { dg-message "note: loop with 3 iterations completely unrolled" } */
+  for (i = 0; i < 2; i++) /* { dg-message "note: loop with 2 iterations completely unrolled" } */
   {
      a[i]= b[i] + 1;
   }
Index: testsuite/gcc.dg/unroll-3.c
===================================================================
--- testsuite/gcc.dg/unroll-3.c	(revision 254797)
+++ testsuite/gcc.dg/unroll-3.c	(working copy)
@@ -28,4 +28,4 @@  int foo2(void)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
Index: testsuite/gcc.dg/unroll-4.c
===================================================================
--- testsuite/gcc.dg/unroll-4.c	(revision 254797)
+++ testsuite/gcc.dg/unroll-4.c	(working copy)
@@ -28,4 +28,4 @@  int foo2(void)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
Index: testsuite/gcc.dg/unroll-5.c
===================================================================
--- testsuite/gcc.dg/unroll-5.c	(revision 254797)
+++ testsuite/gcc.dg/unroll-5.c	(working copy)
@@ -28,4 +28,4 @@  int foo2(void)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
Index: testsuite/gcc.dg/unroll-7.c
===================================================================
--- testsuite/gcc.dg/unroll-7.c	(revision 254797)
+++ testsuite/gcc.dg/unroll-7.c	(working copy)
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -funroll-loops" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll-details -funroll-loops" } */
 /* { dg-require-effective-target int32plus } */
 
 extern int *a;
@@ -14,5 +14,5 @@  int t(void)
 /* { dg-final { scan-rtl-dump "number of iterations: .const_int 999999" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "upper bound: 999999" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "realistic bound: 999999" "loop2_unroll" } } */
-/* { dg-final { scan-rtl-dump "Considering unrolling loop with constant number of iterations" "loop2_unroll" } } */
+/* { dg-final { scan-rtl-dump "considering unrolling loop with constant number of iterations" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump-not "Invalid sum" "loop2_unroll" } } */
Index: testsuite/gnat.dg/unroll1.adb
===================================================================
--- testsuite/gnat.dg/unroll1.adb	(revision 0)
+++ testsuite/gnat.dg/unroll1.adb	(working copy)
@@ -0,0 +1,27 @@ 
+-- { dg-do compile }
+-- { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" }
+
+package body Unroll1 is
+
+   function "+" (X, Y : Sarray) return Sarray is
+      R : Sarray;
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (No_Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+      return R;
+   end;
+
+   procedure Add (X, Y : Sarray; R : out Sarray) is
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (No_Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+   end;
+
+end Unroll1;
+
+-- { dg-final { scan-tree-dump-times "Not unrolling loop .: user didn't want it unrolled completely" 2 "cunrolli" } }
+-- { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } }
Index: testsuite/gnat.dg/unroll1.ads
===================================================================
--- testsuite/gnat.dg/unroll1.ads	(revision 0)
+++ testsuite/gnat.dg/unroll1.ads	(working copy)
@@ -0,0 +1,9 @@ 
+package Unroll1 is
+
+   type Sarray is array (1 .. 4) of Float;
+   for Sarray'Alignment use 16;
+
+   function "+" (X, Y : Sarray) return Sarray;
+   procedure Add (X, Y : Sarray; R : out Sarray);
+
+end Unroll1;
Index: testsuite/gnat.dg/unroll2.adb
===================================================================
--- testsuite/gnat.dg/unroll2.adb	(revision 0)
+++ testsuite/gnat.dg/unroll2.adb	(working copy)
@@ -0,0 +1,26 @@ 
+-- { dg-do compile }
+-- { dg-options "-O2 -fdump-tree-cunrolli-details" }
+
+package body Unroll2 is
+
+   function "+" (X, Y : Sarray) return Sarray is
+      R : Sarray;
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+      return R;
+   end;
+
+   procedure Add (X, Y : Sarray; R : out Sarray) is
+   begin
+      for I in Sarray'Range loop
+         pragma Loop_Optimize (Unroll);
+         R(I) := X(I) + Y(I);
+      end loop;
+   end;
+
+end Unroll2;
+
+-- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunrolli" } }
Index: testsuite/gnat.dg/unroll2.ads
===================================================================
--- testsuite/gnat.dg/unroll2.ads	(revision 0)
+++ testsuite/gnat.dg/unroll2.ads	(working copy)
@@ -0,0 +1,9 @@ 
+package Unroll2 is
+
+   type Sarray is array (1 .. 4) of Float;
+   for Sarray'Alignment use 16;
+
+   function "+" (X, Y : Sarray) return Sarray;
+   procedure Add (X, Y : Sarray; R : out Sarray);
+
+end Unroll2;
Index: tree-cfg.c
===================================================================
--- tree-cfg.c	(revision 254797)
+++ tree-cfg.c	(working copy)
@@ -280,6 +280,11 @@  replace_loop_annotate_in_block (basic_bl
 	case annot_expr_ivdep_kind:
 	  loop->safelen = INT_MAX;
 	  break;
+	case annot_expr_unroll_kind:
+	  loop->unroll
+	    = (unsigned short) tree_to_shwi (gimple_call_arg (stmt, 2));
+	  cfun->has_unroll = true;
+	  break;
 	case annot_expr_no_vector_kind:
 	  loop->dont_vectorize = true;
 	  break;
@@ -334,6 +339,7 @@  replace_loop_annotate (void)
 	  switch ((annot_expr_kind) tree_to_shwi (gimple_call_arg (stmt, 1)))
 	    {
 	    case annot_expr_ivdep_kind:
+	    case annot_expr_unroll_kind:
 	    case annot_expr_no_vector_kind:
 	    case annot_expr_vector_kind:
 	      break;
@@ -7991,6 +7997,8 @@  print_loop (FILE *file, struct loop *loo
       fprintf (file, ", estimate = ");
       print_decu (loop->nb_iterations_estimate, file);
     }
+  if (loop->unroll)
+    fprintf (file, ", unroll = %d", loop->unroll);
   fprintf (file, ")\n");
 
   /* Print loop's body.  */
Index: tree-core.h
===================================================================
--- tree-core.h	(revision 254797)
+++ tree-core.h	(working copy)
@@ -851,6 +851,7 @@  enum tree_node_kind {
 
 enum annot_expr_kind {
   annot_expr_ivdep_kind,
+  annot_expr_unroll_kind,
   annot_expr_no_vector_kind,
   annot_expr_vector_kind,
   annot_expr_kind_last
Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 254797)
+++ tree-inline.c	(working copy)
@@ -2596,6 +2596,11 @@  copy_loops (copy_body_data *id,
 	  flow_loop_tree_node_add (dest_parent, dest_loop);
 
 	  dest_loop->safelen = src_loop->safelen;
+	  if (src_loop->unroll)
+	    {
+	      dest_loop->unroll = src_loop->unroll;
+	      cfun->has_unroll = true;
+	    }
 	  dest_loop->dont_vectorize = src_loop->dont_vectorize;
 	  if (src_loop->force_vectorize)
 	    {
Index: tree-pretty-print.c
===================================================================
--- tree-pretty-print.c	(revision 254797)
+++ tree-pretty-print.c	(working copy)
@@ -2632,6 +2632,10 @@  dump_generic_node (pretty_printer *pp, t
 	case annot_expr_ivdep_kind:
 	  pp_string (pp, ", ivdep");
 	  break;
+	case annot_expr_unroll_kind:
+	  pp_printf (pp, ", unroll %d",
+		     (int) TREE_INT_CST_LOW (TREE_OPERAND (node, 2)));
+	  break;
 	case annot_expr_no_vector_kind:
 	  pp_string (pp, ", no-vector");
 	  break;
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 254797)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -681,11 +681,9 @@  try_unroll_loop_completely (struct loop
 			    HOST_WIDE_INT maxiter,
 			    location_t locus)
 {
-  unsigned HOST_WIDE_INT n_unroll = 0, ninsns, unr_insns;
-  struct loop_size size;
+  unsigned HOST_WIDE_INT n_unroll = 0;
   bool n_unroll_found = false;
   edge edge_to_cancel = NULL;
-  dump_flags_t report_flags = MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS;
 
   /* See if we proved number of iterations to be low constant.
 
@@ -726,7 +724,8 @@  try_unroll_loop_completely (struct loop
   if (!n_unroll_found)
     return false;
 
-  if (n_unroll > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES))
+  if (!loop->unroll
+      && n_unroll > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "Not unrolling loop %d "
@@ -740,121 +739,137 @@  try_unroll_loop_completely (struct loop
 
   if (n_unroll)
     {
-      bool large;
       if (ul == UL_SINGLE_ITER)
 	return false;
 
-      /* EXIT can be removed only if we are sure it passes first N_UNROLL
-	 iterations.  */
-      bool remove_exit = (exit && niter
-			  && TREE_CODE (niter) == INTEGER_CST
-			  && wi::leu_p (n_unroll, wi::to_widest (niter)));
-
-      large = tree_estimate_loop_size
-		 (loop, remove_exit ? exit : NULL, edge_to_cancel, &size,
-		  PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS));
-      ninsns = size.overall;
-      if (large)
+      if (loop->unroll)
 	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: it is too large.\n",
-		     loop->num);
-	  return false;
+	  /* If the unrolling factor is too large, bail out.  */
+	  if (n_unroll > (unsigned) loop->unroll)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file,
+			 "Not unrolling loop %d: "
+			 "user didn't want it unrolled completely.\n",
+			 loop->num);
+	      return false;
+	    }
 	}
-
-      unr_insns = estimated_unrolled_size (&size, n_unroll);
-      if (dump_file && (dump_flags & TDF_DETAILS))
+      else
 	{
-	  fprintf (dump_file, "  Loop size: %d\n", (int) ninsns);
-	  fprintf (dump_file, "  Estimated size after unrolling: %d\n",
-		   (int) unr_insns);
-	}
+	  struct loop_size size;
+	  /* EXIT can be removed only if we are sure it passes first N_UNROLL
+	     iterations.  */
+	  bool remove_exit = (exit && niter
+			      && TREE_CODE (niter) == INTEGER_CST
+			      && wi::leu_p (n_unroll, wi::to_widest (niter)));
+	  bool large
+	    = tree_estimate_loop_size
+		(loop, remove_exit ? exit : NULL, edge_to_cancel, &size,
+		 PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS));
+	  if (large)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: it is too large.\n",
+			 loop->num);
+	      return false;
+	    }
 
-      /* If the code is going to shrink, we don't need to be extra cautious
-	 on guessing if the unrolling is going to be profitable.  */
-      if (unr_insns
-	  /* If there is IV variable that will become constant, we save
-	     one instruction in the loop prologue we do not account
-	     otherwise.  */
-	  <= ninsns + (size.constant_iv != false))
-	;
-      /* We unroll only inner loops, because we do not consider it profitable
-	 otheriwse.  We still can cancel loopback edge of not rolling loop;
-	 this is always a good idea.  */
-      else if (ul == UL_NO_GROWTH)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: size would grow.\n",
-		     loop->num);
-	  return false;
-	}
-      /* Outer loops tend to be less interesting candidates for complete
-	 unrolling unless we can do a lot of propagation into the inner loop
-	 body.  For now we disable outer loop unrolling when the code would
-	 grow.  */
-      else if (loop->inner)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     "it is not innermost and code would grow.\n",
-		     loop->num);
-	  return false;
-	}
-      /* If there is call on a hot path through the loop, then
-	 there is most probably not much to optimize.  */
-      else if (size.num_non_pure_calls_on_hot_path)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     "contains call and code would grow.\n",
-		     loop->num);
-	  return false;
-	}
-      /* If there is pure/const call in the function, then we
-	 can still optimize the unrolled loop body if it contains
-	 some other interesting code than the calls and code
-	 storing or cumulating the return value.  */
-      else if (size.num_pure_calls_on_hot_path
-	       /* One IV increment, one test, one ivtmp store
-		  and one useful stmt.  That is about minimal loop
-		  doing pure call.  */
-	       && (size.non_call_stmts_on_hot_path
-		   <= 3 + size.num_pure_calls_on_hot_path))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     "contains just pure calls and code would grow.\n",
-		     loop->num);
-	  return false;
-	}
-      /* Complete unrolling is a major win when control flow is removed and
-	 one big basic block is created.  If the loop contains control flow
-	 the optimization may still be a win because of eliminating the loop
-	 overhead but it also may blow the branch predictor tables.
-	 Limit number of branches on the hot path through the peeled
-	 sequence.  */
-      else if (size.num_branches_on_hot_path * (int)n_unroll
-	       > PARAM_VALUE (PARAM_MAX_PEEL_BRANCHES))
-	{
+	  unsigned HOST_WIDE_INT ninsns = size.overall;
+	  unsigned HOST_WIDE_INT unr_insns
+	    = estimated_unrolled_size (&size, n_unroll);
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     " number of branches on hot path in the unrolled sequence"
-		     " reach --param max-peel-branches limit.\n",
-		     loop->num);
-	  return false;
-	}
-      else if (unr_insns
-	       > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Not unrolling loop %d: "
-		     "(--param max-completely-peeled-insns limit reached).\n",
-		     loop->num);
-	  return false;
+	    {
+	      fprintf (dump_file, "  Loop size: %d\n", (int) ninsns);
+	      fprintf (dump_file, "  Estimated size after unrolling: %d\n",
+		       (int) unr_insns);
+	    }
+
+	  /* If the code is going to shrink, we don't need to be extra
+	     cautious on guessing if the unrolling is going to be
+	     profitable.  */
+	  if (unr_insns
+	      /* If there is IV variable that will become constant, we
+		 save one instruction in the loop prologue we do not
+		 account otherwise.  */
+	      <= ninsns + (size.constant_iv != false))
+	    ;
+	  /* We unroll only inner loops, because we do not consider it
+	     profitable otherwise.  We still can cancel loopback edge
+	     of not rolling loop; this is always a good idea.  */
+	  else if (ul == UL_NO_GROWTH)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: size would grow.\n",
+			 loop->num);
+	      return false;
+	    }
+	  /* Outer loops tend to be less interesting candidates for
+	     complete unrolling unless we can do a lot of propagation
+	     into the inner loop body.  For now we disable outer loop
+	     unrolling when the code would grow.  */
+	  else if (loop->inner)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "it is not innermost and code would grow.\n",
+			 loop->num);
+	      return false;
+	    }
+	  /* If there is call on a hot path through the loop, then
+	     there is most probably not much to optimize.  */
+	  else if (size.num_non_pure_calls_on_hot_path)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "contains call and code would grow.\n",
+			 loop->num);
+	      return false;
+	    }
+	  /* If there is pure/const call in the function, then we can
+	     still optimize the unrolled loop body if it contains some
+	     other interesting code than the calls and code storing or
+	     cumulating the return value.  */
+	  else if (size.num_pure_calls_on_hot_path
+		   /* One IV increment, one test, one ivtmp store and
+		      one useful stmt.  That is about minimal loop
+		      doing pure call.  */
+		   && (size.non_call_stmts_on_hot_path
+		       <= 3 + size.num_pure_calls_on_hot_path))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "contains just pure calls and code would grow.\n",
+			 loop->num);
+	      return false;
+	    }
+	  /* Complete unrolling is a major win when control flow is
+	     removed and one big basic block is created.  If the loop
+	     contains control flow the optimization may still be a win
+	     because of eliminating the loop overhead but it also may
+	     blow the branch predictor tables.  Limit number of
+	     branches on the hot path through the peeled sequence.  */
+	  else if (size.num_branches_on_hot_path * (int)n_unroll
+		   > PARAM_VALUE (PARAM_MAX_PEEL_BRANCHES))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "number of branches on hot path in the unrolled "
+			 "sequence reaches --param max-peel-branches limit.\n",
+			 loop->num);
+	      return false;
+	    }
+	  else if (unr_insns
+		   > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Not unrolling loop %d: "
+			 "number of insns in the unrolled sequence reaches "
+			 "--param max-completely-peeled-insns limit.\n",
+			 loop->num);
+	      return false;
+	    }
 	}
-      if (!n_unroll)
-        dump_printf_loc (report_flags, locus,
-                         "loop turned into non-loop; it never loops.\n");
 
       initialize_original_copy_tables ();
       auto_sbitmap wont_exit (n_unroll + 1);
@@ -898,8 +913,8 @@  try_unroll_loop_completely (struct loop
       else
 	gimple_cond_make_true (cond);
       update_stmt (cond);
-      /* Do not remove the path. Doing so may remove outer loop
-	 and confuse bookkeeping code in tree_unroll_loops_completelly.  */
+      /* Do not remove the path, as doing so may remove outer loop and
+	 confuse bookkeeping code in tree_unroll_loops_completely.  */
     }
 
   /* Store the loop for later unlooping and exit removal.  */
@@ -915,7 +930,7 @@  try_unroll_loop_completely (struct loop
         {
           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS, locus,
                            "loop with %d iterations completely unrolled",
-			   (int) (n_unroll + 1));
+			   (int) n_unroll);
           if (loop->header->count.initialized_p ())
             dump_printf (MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS,
                          " (header execution count %d)",
@@ -963,7 +978,8 @@  try_peel_loop (struct loop *loop,
   struct loop_size size;
   int peeled_size;
 
-  if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0
+  if (!flag_peel_loops
+      || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0
       || !peeled_loops)
     return false;
 
@@ -974,20 +990,29 @@  try_peel_loop (struct loop *loop,
       return false;
     }
 
+  /* We don't peel loops that will be unrolled as this can duplicate a
+     loop more times than the user requested.  */
+  if (loop->unroll)
+    {
+      if (dump_file)
+        fprintf (dump_file, "Not peeling: user didn't want it peeled.\n");
+      return false;
+    }
+
   /* Peel only innermost loops.
      While the code is perfectly capable of peeling non-innermost loops,
      the heuristics would probably need some improvements. */
   if (loop->inner)
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: outer loop\n");
+	fprintf (dump_file, "Not peeling: outer loop\n");
       return false;
     }
 
   if (!optimize_loop_for_speed_p (loop))
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: cold loop\n");
+	fprintf (dump_file, "Not peeling: cold loop\n");
       return false;
     }
 
@@ -1005,7 +1030,7 @@  try_peel_loop (struct loop *loop,
   if (maxiter >= 0 && maxiter <= npeel)
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: upper bound is known so can "
+	fprintf (dump_file, "Not peeling: upper bound is known so can "
 		 "unroll completely\n");
       return false;
     }
@@ -1016,7 +1041,7 @@  try_peel_loop (struct loop *loop,
   if (npeel > PARAM_VALUE (PARAM_MAX_PEEL_TIMES) - 1)
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: rolls too much "
+	fprintf (dump_file, "Not peeling: rolls too much "
 		 "(%i + 1 > --param max-peel-times)\n", (int) npeel);
       return false;
     }
@@ -1029,7 +1054,7 @@  try_peel_loop (struct loop *loop,
       > PARAM_VALUE (PARAM_MAX_PEELED_INSNS))
     {
       if (dump_file)
-        fprintf (dump_file, "Not peeling: peeled sequence size is too large "
+	fprintf (dump_file, "Not peeling: peeled sequence size is too large "
 		 "(%i insns > --param max-peel-insns)", peeled_size);
       return false;
     }
@@ -1317,7 +1342,9 @@  tree_unroll_loops_completely_1 (bool may
   if (!loop_father)
     return false;
 
-  if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
+  if (loop->unroll > 1)
+    ul = UL_ALL;
+  else if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
       /* Unroll outermost loops only if asked to do so or they do
 	 not cause code growth.  */
       && (unroll_outer || loop_outer (loop_father)))
@@ -1566,7 +1593,7 @@  public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return optimize >= 2; }
+  virtual bool gate (function *) { return optimize >= 2 || cfun->has_unroll; }
   virtual unsigned int execute (function *);
 
 }; // class pass_complete_unrolli
Index: tree.def
===================================================================
--- tree.def	(revision 254797)
+++ tree.def	(working copy)
@@ -1410,8 +1410,9 @@  DEFTREECODE (TARGET_OPTION_NODE, "target
 
 /* ANNOTATE_EXPR.
    Operand 0 is the expression to be annotated.
-   Operand 1 is the annotation kind.  */
-DEFTREECODE (ANNOTATE_EXPR, "annotate_expr", tcc_expression, 2)
+   Operand 1 is the annotation kind.
+   Operand 2 is additional data.  */
+DEFTREECODE (ANNOTATE_EXPR, "annotate_expr", tcc_expression, 3)
 
 /* Cilk spawn statement
    Operand 0 is the CALL_EXPR.  */