diff mbox series

Fix PR68212, unrolled loop no longer aligned

Message ID fa6027a5-f4b5-9244-931c-4af23dc275e2@linux.ibm.com
State New
Headers show
Series Fix PR68212, unrolled loop no longer aligned | expand

Commit Message

Pat Haugen Nov. 28, 2018, 9:28 p.m. UTC
The following patch fixes the case where unrolling in the absence of profile information can cause a loop to no longer look hot and therefor not get aligned. In this case, instead of dividing by the unroll factor we now just scale by profile_probability::likely (). The diff looks worse than what really changed, just the addition of an if test and putting the existing 'scalemain' setting code into the else leg. Bootstrap/regtest on powerpc64le with no new regressions. I also ran a CPU2006 comparison, there were 4 benchmarks that improved 2-3% with the others in the noise range.

Ok for trunk?

-Pat



2018-11-28  Pat Haugen  <pthaugen@us.ibm.com>

        PR rtl-optimization/68212
        * cfgloopmanip.c (duplicate_loop_to_header_edge): Adjust scale factor.

testsuite/ChangeLog:
2018-11-28  Pat Haugen  <pthaugen@us.ibm.com>

        PR rtl-optimization/68212
        * gcc.dg/pr68212.c: New test.

Comments

Jan Hubicka Nov. 28, 2018, 11:01 p.m. UTC | #1
> 2018-11-28  Pat Haugen  <pthaugen@us.ibm.com>
> 
>         PR rtl-optimization/68212
>         * cfgloopmanip.c (duplicate_loop_to_header_edge): Adjust scale factor.
> 
> testsuite/ChangeLog:
> 2018-11-28  Pat Haugen  <pthaugen@us.ibm.com>
> 
>         PR rtl-optimization/68212
>         * gcc.dg/pr68212.c: New test.
> 
> 
> 
> Index: gcc/cfgloopmanip.c
> ===================================================================
> --- gcc/cfgloopmanip.c	(revision 266522)
> +++ gcc/cfgloopmanip.c	(working copy)
> @@ -1242,16 +1242,25 @@ duplicate_loop_to_header_edge (struct lo
>  	  profile_probability prob_pass_main = bitmap_bit_p (wont_exit, 0)
>  							? prob_pass_wont_exit
>  							: prob_pass_thru;
> -	  profile_probability p = prob_pass_main;
> -	  profile_count scale_main_den = count_in;
> -	  for (i = 0; i < ndupl; i++)
> +
> +	  /* If not using real profile data then don't scale the loop by ndupl.
> +	     This can lead to the loop no longer looking hot wrt surrounding
> +	     code.  */
> +	  if (profile_status_for_fn (cfun) == PROFILE_GUESSED)
> +	    scale_main = profile_probability::likely ();

profile_status_for_fn is not very informative when you inline FDO to
non-FDO code (via LTO or when profile was lost for COMDAT).
You want to test count_in.reliable_p () which will return true for FDO
profiles but not for guessed profiles and Auto-FDO.

If we know number of iterations at compile time (which is relatively
frequent for unrolled loops) we actually put in correct profile with a
cap given by --param max-predicted-iterations (which may make sense to
get higher).  In that case we probably do not want to drop this
information from profile.

Can't we get to scale only if the profile looks wrong - count_in
gets unrealistically small compared to cfgloop expected number of
iterations info says (or when it is absent)?

Honza
diff mbox series

Patch

Index: gcc/cfgloopmanip.c
===================================================================
--- gcc/cfgloopmanip.c	(revision 266522)
+++ gcc/cfgloopmanip.c	(working copy)
@@ -1242,16 +1242,25 @@  duplicate_loop_to_header_edge (struct lo
 	  profile_probability prob_pass_main = bitmap_bit_p (wont_exit, 0)
 							? prob_pass_wont_exit
 							: prob_pass_thru;
-	  profile_probability p = prob_pass_main;
-	  profile_count scale_main_den = count_in;
-	  for (i = 0; i < ndupl; i++)
+
+	  /* If not using real profile data then don't scale the loop by ndupl.
+	     This can lead to the loop no longer looking hot wrt surrounding
+	     code.  */
+	  if (profile_status_for_fn (cfun) == PROFILE_GUESSED)
+	    scale_main = profile_probability::likely ();
+	  else
 	    {
-	      scale_main_den += count_in.apply_probability (p);
-	      p = p * scale_step[i];
+	      profile_probability p = prob_pass_main;
+	      profile_count scale_main_den = count_in;
+	      for (i = 0; i < ndupl; i++)
+		{
+		  scale_main_den += count_in.apply_probability (p);
+		  p = p * scale_step[i];
+		}
+	      /* If original loop is executed COUNT_IN times, the unrolled
+		 loop will account SCALE_MAIN_DEN times.  */
+	      scale_main = count_in.probability_in (scale_main_den);
 	    }
-	  /* If original loop is executed COUNT_IN times, the unrolled
-	     loop will account SCALE_MAIN_DEN times.  */
-	  scale_main = count_in.probability_in (scale_main_den);
 	  scale_act = scale_main * prob_pass_main;
 	}
       else
Index: gcc/testsuite/gcc.dg/pr68212.c
===================================================================
--- gcc/testsuite/gcc.dg/pr68212.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/pr68212.c	(working copy)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param max-unroll-times=4 -fdump-rtl-alignments" } */
+
+void foo(long int *a, long int *b, long int n)
+{
+  long int i;
+
+  for (i = 0; i < n; i++)
+    a[i] = *b;
+}
+
+/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 "alignments"} } */