diff mbox

[Loop,optimizer] : Add logic to disable certain loop optimizations on pre-/post-loops

Message ID D4C76825A6780047854A11E93CDE84D004C176842F@SAUSEXMBP01.amd.com
State New
Headers show

Commit Message

Fang, Changpeng Jan. 4, 2011, 10:30 p.m. UTC
Hi,

Thanks, Jack. Hopefully the 6% protein gain (for -m64) is not just noise. 

I updated the patch based on the current trunk (REV 168477).

Is it OK to commit now?

Thanks,

Changpeng

Comments

Richard Biener Jan. 5, 2011, 11:13 a.m. UTC | #1
On Tue, Jan 4, 2011 at 11:30 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Hi,
>
> Thanks, Jack. Hopefully the 6% protein gain (for -m64) is not just noise.
>
> I updated the patch based on the current trunk (REV 168477).
>
> Is it OK to commit now?

The patch doesn't contain a testcase nor a reference to a bug so isn't
appropriate
for stage4 (or even stage3).  Also using BB flags for this doesn't
sound quite right
as you'd have to make sure CFG manipulations properly copy/unset this flag
(which is likely the reason for the polyhedron regressions seen).

That said, I don't like the patch too much anyway, it looks like a hack.

The proper fix is to finally go the way of preserving loop information
and putting
such flag in the loop information (or well, just manually setting the
upper bound
for the number of iterations therein).

Please postpone this for stage1 of GCC 4.7.  Thanks.

Richard.

> Thanks,
>
> Changpeng
>
>
>
> ________________________________________
> From: Jack Howarth [howarth@bromo.med.uc.edu]
> Sent: Monday, January 03, 2011 9:04 PM
> To: Fang, Changpeng
> Cc: Zdenek Dvorak; Richard Guenther; Xinliang David Li; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops
>
> On Fri, Dec 17, 2010 at 01:14:49AM -0600, Fang, Changpeng wrote:
>> Hi, Jack:
>>
>> Thanks for the testing.
>>
>> This patch is not supposed to slow down a program by 10% (rnflow and test_fpu).
>> It would be helpful if you can provide analysis why they are slowed down.
>
> Changpeng,
>   The corrected merge against gcc trunk of...
>
> Index: gcc/basic-block.h
> ===================================================================
> --- gcc/basic-block.h   (revision 168437)
> +++ gcc/basic-block.h   (working copy)
> @@ -247,11 +247,14 @@
>      Only used in cfgcleanup.c.  */
>   BB_NONTHREADABLE_BLOCK = 1 << 11,
>
> +  /* Set on blocks that are headers of non-rolling loops.  */
> +  BB_HEADER_OF_NONROLLING_LOOP = 1 << 12,
> +
>   /* Set on blocks that were modified in some way.  This bit is set in
>      df_set_bb_dirty, but not cleared by df_analyze, so it can be used
>      to test whether a block has been modified prior to a df_analyze
>      call.  */
> -  BB_MODIFIED = 1 << 12
> +  BB_MODIFIED = 1 << 13
>  };
>
>  /* Dummy flag for convenience in the hot/cold partitioning code.  */
>
> for the proposed patch from http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01344.html
> eliminated the performance regressions on x86_64-apple-darwin10. I now get...
>
> Compile Command : gfortran -ffast-math -funroll-loops -O3 %n.f90 -o %n
>
>                   Execution Time
> -m32
>                 stock       patched  %increase
> ac               10.59       10.59      0.0
> aermod           19.49       19.13     -1.8
> air               6.07        6.07      0.0
> capacita         44.60       44.61      0.0
> channel           1.98        1.98      0.0
> doduc            31.19       31.31      0.4
> fatigue           9.90       10.29      3.9
> gas_dyn           4.72        4.71     -0.2
> induct           13.93       13.93      0.0
> linpk            15.50       15.49     -0.1
> mdbx             11.28       11.26     -0.2
> nf               27.62       27.58     -0.1
> protein          38.70       38.60     -0.3
> rnflow           24.68       24.68      0.0
> test_fpu         10.13       10.13      0.0
> tfft              1.92        1.92      0.0
>
> Geometric Mean   12.06       12.08      0.2
> Execution Time
>
> -m64
>                 stock       patched   %increase
> ac                8.80        8.80      0.0
> aermod           17.34       17.17     -1.0
> air               5.48        5.52      0.7
> capacita         32.38       32.50      0.4
> channel           1.84        1.84      0.0
> doduc            26.50       26.52      0.1
> fatigue           8.35        8.33     -0.2
> gas_dyn           4.30        4.29     -0.2
> induct           12.83       12.83      0.0
> linpk            15.49       15.49      0.0
> mdbx             11.23       11.22     -0.1
> nf               30.21       30.16     -0.2
> protein          34.13       32.07     -6.0
> rnflow           23.18       23.19      0.0
> test_fpu          8.04        8.02     -0.2
> tfft              1.87        1.86     -0.5
>
> Geometric Mean   10.87       10.82     -0.5
> Execution Time
>
Jack Howarth Jan. 5, 2011, 2:10 p.m. UTC | #2
On Tue, Jan 04, 2011 at 04:30:38PM -0600, Fang, Changpeng wrote:
> Hi,
> 
> Thanks, Jack. Hopefully the 6% protein gain (for -m64) is not just noise. 
> 
> I updated the patch based on the current trunk (REV 168477).
> 
> Is it OK to commit now?

I can't approve it but the benchmarks now seem reasonable with
the patch (although you might want to investigate why fatigue 
runs 3.9% slower at -m32).

> 
> Thanks,
> 
> Changpeng
> 
> 
> 
> ________________________________________
> From: Jack Howarth [howarth@bromo.med.uc.edu]
> Sent: Monday, January 03, 2011 9:04 PM
> To: Fang, Changpeng
> Cc: Zdenek Dvorak; Richard Guenther; Xinliang David Li; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops
> 
> On Fri, Dec 17, 2010 at 01:14:49AM -0600, Fang, Changpeng wrote:
> > Hi, Jack:
> >
> > Thanks for the testing.
> >
> > This patch is not supposed to slow down a program by 10% (rnflow and test_fpu).
> > It would be helpful if you can provide analysis why they are slowed down.
> 
> Changpeng,
>    The corrected merge against gcc trunk of...
> 
> Index: gcc/basic-block.h
> ===================================================================
> --- gcc/basic-block.h   (revision 168437)
> +++ gcc/basic-block.h   (working copy)
> @@ -247,11 +247,14 @@
>       Only used in cfgcleanup.c.  */
>    BB_NONTHREADABLE_BLOCK = 1 << 11,
> 
> +  /* Set on blocks that are headers of non-rolling loops.  */
> +  BB_HEADER_OF_NONROLLING_LOOP = 1 << 12,
> +
>    /* Set on blocks that were modified in some way.  This bit is set in
>       df_set_bb_dirty, but not cleared by df_analyze, so it can be used
>       to test whether a block has been modified prior to a df_analyze
>       call.  */
> -  BB_MODIFIED = 1 << 12
> +  BB_MODIFIED = 1 << 13
>  };
> 
>  /* Dummy flag for convenience in the hot/cold partitioning code.  */
> 
> for the proposed patch from http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01344.html
> eliminated the performance regressions on x86_64-apple-darwin10. I now get...
> 
> Compile Command : gfortran -ffast-math -funroll-loops -O3 %n.f90 -o %n
> 
>                    Execution Time
> -m32
>                  stock       patched  %increase
> ac               10.59       10.59      0.0
> aermod           19.49       19.13     -1.8
> air               6.07        6.07      0.0
> capacita         44.60       44.61      0.0
> channel           1.98        1.98      0.0
> doduc            31.19       31.31      0.4
> fatigue           9.90       10.29      3.9
> gas_dyn           4.72        4.71     -0.2
> induct           13.93       13.93      0.0
> linpk            15.50       15.49     -0.1
> mdbx             11.28       11.26     -0.2
> nf               27.62       27.58     -0.1
> protein          38.70       38.60     -0.3
> rnflow           24.68       24.68      0.0
> test_fpu         10.13       10.13      0.0
> tfft              1.92        1.92      0.0
> 
> Geometric Mean   12.06       12.08      0.2
> Execution Time
> 
> -m64
>                  stock       patched   %increase
> ac                8.80        8.80      0.0
> aermod           17.34       17.17     -1.0
> air               5.48        5.52      0.7
> capacita         32.38       32.50      0.4
> channel           1.84        1.84      0.0
> doduc            26.50       26.52      0.1
> fatigue           8.35        8.33     -0.2
> gas_dyn           4.30        4.29     -0.2
> induct           12.83       12.83      0.0
> linpk            15.49       15.49      0.0
> mdbx             11.23       11.22     -0.1
> nf               30.21       30.16     -0.2
> protein          34.13       32.07     -6.0
> rnflow           23.18       23.19      0.0
> test_fpu          8.04        8.02     -0.2
> tfft              1.87        1.86     -0.5
> 
> Geometric Mean   10.87       10.82     -0.5
> Execution Time

Content-Description: 0001-Consider-a-loop-not-hot-if-it-rolls-only-a-few-times.patch
> From 7d10fcfe379e3fe4387cf192fbbdbea7daff0637 Mon Sep 17 00:00:00 2001
> From: Changpeng Fang <chfang@houghton.(none)>
> Date: Tue, 4 Jan 2011 14:36:50 -0800
> Subject: [PATCH] Consider a loop not hot if it rolls only a few times (non-rolling)
> 
> 	* basic-block.h (bb_flags): Add a new flag BB_HEADER_OF_NONROLLING_LOOP.
> 	* cfg.c (clear_bb_flags): Keep BB_HEADER_OF_NONROLLING marker.
> 	* cfgloop.h (mark_non_rolling_loop): New function declaration.
> 	  (non_rolling_loop_p): New function declaration.
> 	* predict.c (optimize_loop_for_size_p): Return true if the loop was marked
> 	  NON-ROLLING.  (optimize_loop_for_speed_p): Return false if the loop was
> 	  marked NON-ROLLING.
> 	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the
> 	  non-rolling loop.
> 	* tree-ssa-loop-niter.c (mark_non_rolling_loop): Implement the new
> 	  function.  (non_rolling_loop_p): Implement the new function.
> 	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the
> 	  non-rolling loop.  (vect_do_peeling_for_alignment): Mark the non-rolling
> 	  loop.
> ---
>  gcc/basic-block.h          |    5 ++++-
>  gcc/cfg.c                  |    7 ++++---
>  gcc/cfgloop.h              |    2 ++
>  gcc/predict.c              |    6 ++++++
>  gcc/tree-ssa-loop-manip.c  |    3 +++
>  gcc/tree-ssa-loop-niter.c  |   20 ++++++++++++++++++++
>  gcc/tree-vect-loop-manip.c |    8 ++++++++
>  7 files changed, 47 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/basic-block.h b/gcc/basic-block.h
> index 3594eea..081c175 100644
> --- a/gcc/basic-block.h
> +++ b/gcc/basic-block.h
> @@ -251,7 +251,10 @@ enum bb_flags
>       df_set_bb_dirty, but not cleared by df_analyze, so it can be used
>       to test whether a block has been modified prior to a df_analyze
>       call.  */
> -  BB_MODIFIED = 1 << 12
> +  BB_MODIFIED = 1 << 12,
> +
> + /* Set on blocks that are headers of non-rolling loops.  */
> +  BB_HEADER_OF_NONROLLING_LOOP = 1 << 13
>  };
>  
>  /* Dummy flag for convenience in the hot/cold partitioning code.  */
> diff --git a/gcc/cfg.c b/gcc/cfg.c
> index c8ef799..e59a637 100644
> --- a/gcc/cfg.c
> +++ b/gcc/cfg.c
> @@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred)
>    connect_src (e);
>  }
>  
> -/* Clear all basic block flags, with the exception of partitioning and
> -   setjmp_target.  */
> +/* Clear all basic block flags, with the exception of partitioning,
> +   setjmp_target, and the non-rolling loop marker.  */
>  void
>  clear_bb_flags (void)
>  {
> @@ -434,7 +434,8 @@ clear_bb_flags (void)
>  
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>      bb->flags = (BB_PARTITION (bb)
> -		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET)));
> +		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET
> +				 + BB_HEADER_OF_NONROLLING_LOOP)));
>  }
>  
>  /* Check the consistency of profile information.  We can't do that
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index f7bb134..0f48115 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx);
>  void estimate_numbers_of_iterations_loop (struct loop *, bool);
>  HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
>  bool estimated_loop_iterations (struct loop *, bool, double_int *);
> +void mark_non_rolling_loop (struct loop *);
> +bool non_rolling_loop_p (struct loop *);
>  
>  /* Loop manipulation.  */
>  extern bool can_duplicate_loop_p (const struct loop *loop);
> diff --git a/gcc/predict.c b/gcc/predict.c
> index a86708a..34d7dff 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -280,6 +280,9 @@ optimize_insn_for_speed_p (void)
>  bool
>  optimize_loop_for_size_p (struct loop *loop)
>  {
> +  /* Loops marked NON-ROLLING are not likely to be hot.  */
> +  if (non_rolling_loop_p (loop))
> +    return true;
>    return optimize_bb_for_size_p (loop->header);
>  }
>  
> @@ -288,6 +291,9 @@ optimize_loop_for_size_p (struct loop *loop)
>  bool
>  optimize_loop_for_speed_p (struct loop *loop)
>  {
> +  /* Loops marked NON-ROLLING are not likely to be hot.  */
> +  if (non_rolling_loop_p (loop))
> +    return false;
>    return optimize_bb_for_speed_p (loop->header);
>  }
>  
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index 8176ed8..3ff0790 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
>    gcc_assert (new_loop != NULL);
>    update_ssa (TODO_update_ssa);
>  
> +  /* NEW_LOOP is a non-rolling loop.  */
> +  mark_non_rolling_loop (new_loop);
> +
>    /* Determine the probability of the exit edge of the unrolled loop.  */
>    new_est_niter = est_niter / factor;
>  
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index ee85f6f..ec108c2 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p)
>    fold_undefer_and_ignore_overflow_warnings ();
>  }
>  
> +/* Mark LOOP as a non-rolling loop.  */
> +
> +void
> +mark_non_rolling_loop (struct loop *loop)
> +{
> +  gcc_assert (loop && loop->header);
> +  loop->header->flags |= BB_HEADER_OF_NONROLLING_LOOP;
> +}
> +
> +/* Return true if LOOP is a non-rolling loop.  */
> +
> +bool
> +non_rolling_loop_p (struct loop *loop)
> +{
> +  int masked_flags;
> +  gcc_assert (loop && loop->header);
> +  masked_flags = (loop->header->flags & BB_HEADER_OF_NONROLLING_LOOP);
> + return (masked_flags != 0);
> +}
> +
>  /* Returns true if statement S1 dominates statement S2.  */
>  
>  bool
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 28b75f1..9bbe68b 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
>  					    cond_expr, cond_expr_stmt_list);
>    gcc_assert (new_loop);
>    gcc_assert (loop_num == loop->num);
> +
> +  /* NEW_LOOP is a non-rolling loop.  */
> +  mark_non_rolling_loop (new_loop);
> +
>  #ifdef ENABLE_CHECKING
>    slpeel_verify_cfg_after_peeling (loop, new_loop);
>  #endif
> @@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo)
>  				   th, true, NULL_TREE, NULL);
>  
>    gcc_assert (new_loop);
> +
> +  /* NEW_LOOP is a non-rolling loop.  */
> +  mark_non_rolling_loop (new_loop);
> +
>  #ifdef ENABLE_CHECKING
>    slpeel_verify_cfg_after_peeling (new_loop, loop);
>  #endif
> -- 
> 1.6.3.3
>
diff mbox

Patch

From 7d10fcfe379e3fe4387cf192fbbdbea7daff0637 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@houghton.(none)>
Date: Tue, 4 Jan 2011 14:36:50 -0800
Subject: [PATCH] Consider a loop not hot if it rolls only a few times (non-rolling)

	* basic-block.h (bb_flags): Add a new flag BB_HEADER_OF_NONROLLING_LOOP.
	* cfg.c (clear_bb_flags): Keep BB_HEADER_OF_NONROLLING marker.
	* cfgloop.h (mark_non_rolling_loop): New function declaration.
	  (non_rolling_loop_p): New function declaration.
	* predict.c (optimize_loop_for_size_p): Return true if the loop was marked
	  NON-ROLLING.  (optimize_loop_for_speed_p): Return false if the loop was
	  marked NON-ROLLING.
	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the
	  non-rolling loop.
	* tree-ssa-loop-niter.c (mark_non_rolling_loop): Implement the new
	  function.  (non_rolling_loop_p): Implement the new function.
	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the
	  non-rolling loop.  (vect_do_peeling_for_alignment): Mark the non-rolling
	  loop.
---
 gcc/basic-block.h          |    5 ++++-
 gcc/cfg.c                  |    7 ++++---
 gcc/cfgloop.h              |    2 ++
 gcc/predict.c              |    6 ++++++
 gcc/tree-ssa-loop-manip.c  |    3 +++
 gcc/tree-ssa-loop-niter.c  |   20 ++++++++++++++++++++
 gcc/tree-vect-loop-manip.c |    8 ++++++++
 7 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index 3594eea..081c175 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -251,7 +251,10 @@  enum bb_flags
      df_set_bb_dirty, but not cleared by df_analyze, so it can be used
      to test whether a block has been modified prior to a df_analyze
      call.  */
-  BB_MODIFIED = 1 << 12
+  BB_MODIFIED = 1 << 12,
+
+ /* Set on blocks that are headers of non-rolling loops.  */
+  BB_HEADER_OF_NONROLLING_LOOP = 1 << 13
 };
 
 /* Dummy flag for convenience in the hot/cold partitioning code.  */
diff --git a/gcc/cfg.c b/gcc/cfg.c
index c8ef799..e59a637 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -425,8 +425,8 @@  redirect_edge_pred (edge e, basic_block new_pred)
   connect_src (e);
 }
 
-/* Clear all basic block flags, with the exception of partitioning and
-   setjmp_target.  */
+/* Clear all basic block flags, with the exception of partitioning,
+   setjmp_target, and the non-rolling loop marker.  */
 void
 clear_bb_flags (void)
 {
@@ -434,7 +434,8 @@  clear_bb_flags (void)
 
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     bb->flags = (BB_PARTITION (bb)
-		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET)));
+		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET
+				 + BB_HEADER_OF_NONROLLING_LOOP)));
 }
 
 /* Check the consistency of profile information.  We can't do that
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index f7bb134..0f48115 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -279,6 +279,8 @@  extern rtx doloop_condition_get (rtx);
 void estimate_numbers_of_iterations_loop (struct loop *, bool);
 HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
 bool estimated_loop_iterations (struct loop *, bool, double_int *);
+void mark_non_rolling_loop (struct loop *);
+bool non_rolling_loop_p (struct loop *);
 
 /* Loop manipulation.  */
 extern bool can_duplicate_loop_p (const struct loop *loop);
diff --git a/gcc/predict.c b/gcc/predict.c
index a86708a..34d7dff 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -280,6 +280,9 @@  optimize_insn_for_speed_p (void)
 bool
 optimize_loop_for_size_p (struct loop *loop)
 {
+  /* Loops marked NON-ROLLING are not likely to be hot.  */
+  if (non_rolling_loop_p (loop))
+    return true;
   return optimize_bb_for_size_p (loop->header);
 }
 
@@ -288,6 +291,9 @@  optimize_loop_for_size_p (struct loop *loop)
 bool
 optimize_loop_for_speed_p (struct loop *loop)
 {
+  /* Loops marked NON-ROLLING are not likely to be hot.  */
+  if (non_rolling_loop_p (loop))
+    return false;
   return optimize_bb_for_speed_p (loop->header);
 }
 
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 8176ed8..3ff0790 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -931,6 +931,9 @@  tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
   gcc_assert (new_loop != NULL);
   update_ssa (TODO_update_ssa);
 
+  /* NEW_LOOP is a non-rolling loop.  */
+  mark_non_rolling_loop (new_loop);
+
   /* Determine the probability of the exit edge of the unrolled loop.  */
   new_est_niter = est_niter / factor;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index ee85f6f..ec108c2 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -3011,6 +3011,26 @@  estimate_numbers_of_iterations (bool use_undefined_p)
   fold_undefer_and_ignore_overflow_warnings ();
 }
 
+/* Mark LOOP as a non-rolling loop.  */
+
+void
+mark_non_rolling_loop (struct loop *loop)
+{
+  gcc_assert (loop && loop->header);
+  loop->header->flags |= BB_HEADER_OF_NONROLLING_LOOP;
+}
+
+/* Return true if LOOP is a non-rolling loop.  */
+
+bool
+non_rolling_loop_p (struct loop *loop)
+{
+  int masked_flags;
+  gcc_assert (loop && loop->header);
+  masked_flags = (loop->header->flags & BB_HEADER_OF_NONROLLING_LOOP);
+ return (masked_flags != 0);
+}
+
 /* Returns true if statement S1 dominates statement S2.  */
 
 bool
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 28b75f1..9bbe68b 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1938,6 +1938,10 @@  vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
 					    cond_expr, cond_expr_stmt_list);
   gcc_assert (new_loop);
   gcc_assert (loop_num == loop->num);
+
+  /* NEW_LOOP is a non-rolling loop.  */
+  mark_non_rolling_loop (new_loop);
+
 #ifdef ENABLE_CHECKING
   slpeel_verify_cfg_after_peeling (loop, new_loop);
 #endif
@@ -2191,6 +2195,10 @@  vect_do_peeling_for_alignment (loop_vec_info loop_vinfo)
 				   th, true, NULL_TREE, NULL);
 
   gcc_assert (new_loop);
+
+  /* NEW_LOOP is a non-rolling loop.  */
+  mark_non_rolling_loop (new_loop);
+
 #ifdef ENABLE_CHECKING
   slpeel_verify_cfg_after_peeling (new_loop, loop);
 #endif
-- 
1.6.3.3