diff mbox series

Loop-split improvements, part 2

Message ID ZMN1DOizIGX8AdGF@kam.mff.cuni.cz
State New
Headers show
Series Loop-split improvements, part 2 | expand

Commit Message

Jan Hubicka July 28, 2023, 7:58 a.m. UTC
Hi,
this patch fixes profile update in the first case of loop splitting.
The pass still gives up on very basic testcases:

__attribute__ ((noinline,noipa))
void test1 (int n)
{
  if (n <= 0 || n > 100000)
    return; 
  for (int i = 0; i <= n; i++)
    {
      if (i < n)
	do_something ();
      if (a[i])
	do_something2();
    }
}

Here I needed to do the conditoinal that enforces sane value range of n.
The reason is that it gives up on:
      !number_of_iterations_exit (loop1, exit1, &niter, false, true)
and without the conditonal we get assumption that n>=0 and not INT_MAX.
I think from overflow we shold derive that INT_MAX test is not needed and since
the loop does nothing for n<0 it is also just an paranoia.

I am not sure how to fix this though :(.  In general the pass does not really
need to compute iteration count.  It only needs to know what direction the IVs
go so it can detect tests that fires in first part of iteration space.

Rich, any idea what the correct test should be?

In testcase:
  for (int i = 0; i < 200; i++)
    if (i < 150)
      do_something ();
    else
      do_something2 ();
the old code did wrong update of the exit condition probabilities.
We know that first loop iterates 150 times and the second loop 50 times
and we get it by simply scaling loop body by the probability of inner test.

With the patch we now get:

  <bb 2> [count: 1000]:

  <bb 3> [count: 150000]:    <- loop 1 correctly iterates 149 times
  # i_10 = PHI <i_7(8), 0(2)>
  do_something ();
  i_7 = i_10 + 1;
  if (i_7 <= 149)
    goto <bb 8>; [99.33%]
  else
    goto <bb 17>; [0.67%]

  <bb 8> [count: 149000]:
  goto <bb 3>; [100.00%]

  <bb 16> [count: 1000]:
  # i_15 = PHI <i_18(17)>

  <bb 9> [count: 49975]:    <- loop 2 should iterate 50 times but
			       we are slightly wrong
  # i_3 = PHI <i_15(16), i_14(13)>
  do_something2 ();
  i_14 = i_3 + 1;
  if (i_14 != 200)
    goto <bb 13>; [98.00%]
  else
    goto <bb 7>; [2.00%]

  <bb 13> [count: 48975]:
  goto <bb 9>; [100.00%]

  <bb 17> [count: 1000]:   <- this test is always true becuase it is
			      reached form bb 3
  # i_18 = PHI <i_7(3)>
  if (i_18 != 200)
    goto <bb 16>; [99.95%]
  else
    goto <bb 7>; [0.05%]

  <bb 7> [count: 1000]:
  return;

The reason why we are slightly wrong is the condtion in bb17 that 
is always true but the pass does not konw it.

Rich any idea how to do that?  I think connect_loops should work out
the cas where the loop exit conditon is never satisfied at the time
the splitted condition fails for first time.

Also we do not update loop iteration expectancies.  If we were able to 
work out if one of the loop has constant iteration count, we could do it
perfectly.

Before patch on hmmer we get a lot of mismatches:
Profile report here claims:
dump id |static mismat|dynamic mismatch                                     |       
        |in count     |in count                  |time                      |       
lsplit  |      5    +5|   8151850567  +8151850567| 531506481006       +57.9%| 
ldist   |      9    +4|  15345493501  +7193642934| 606848841056       +14.2%| 
ifcvt   |     10    +1|  15487514871   +142021370| 689469797790       +13.6%| 
vect    |     35   +25|  17558425961  +2070911090| 517375405715       -25.0%| 
cunroll |     42    +7|  16898736178   -659689783| 452445796198        -4.9%|  
loopdone|     33    -9|   2678017188 -14220718990| 330969127663             |       
tracer  |     34    +1|   2678018710        +1522| 330613415364        +0.0%|  
fre     |     33    -1|   2676980249     -1038461| 330465677073        -0.0%|  
expand  |     28    -5|   2497468467   -179511782|--------------------------|

With patch

lsplit  |      0      |            0             | 328723360744        -2.3%|
ldist   |      0      |            0             | 396193562452       +20.6%|
ifcvt   |      1    +1|     71010686    +71010686| 478743508522       +20.8%|
vect    |     14   +13|    697518955   +626508269| 299398068323       -37.5%|
cunroll |     13    -1|    489349408   -208169547| 257777839725       -10.5%|
loopdone|     11    -2|    402558559    -86790849| 201010712702             |
tracer  |     13    +2|    402977200      +418641| 200651036623        +0.0%|
fre     |     13      |    402622146      -355054| 200344398654        -0.2%|
expand  |     11    -2|    333608636    -69013510|--------------------------|

So no mismatches for lsplit and ldist and also lsplit thinks it improves
speed by 2.3% rather than regressig it by 57%. 

Update is still not perfect since we do not work out that the second loop
never iterates.  Also ldist is still wrong siince time should not go up.

Ifcft wrecks profile by desing since it insert conditonals with both arms 100%
that will be eliminated later after vect.  It is not clear to me what happens
in vect though.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

	PR middle-end/106293
	* tree-ssa-loop-split.cc (connect_loops): Change probability
	of the test preconditioning second loop to very_likely.
	(fix_loop_bb_probability): Handle correctly case where
	on of the arms of the conditional is empty.
	(split_loop): Fold the test guarding first condition to
	see if it is constant true; Set correct entry block
	probabilities of the split loops; determine correct loop
	eixt probabilities.

gcc/testsuite/ChangeLog:

	PR middle-end/106293
	* gcc.dg/tree-prof/loop-split-1.c: New test.
	* gcc.dg/tree-prof/loop-split-2.c: New test.
	* gcc.dg/tree-prof/loop-split-3.c: New test.

Comments

Richard Biener July 28, 2023, 12:37 p.m. UTC | #1
On Fri, Jul 28, 2023 at 9:58 AM Jan Hubicka via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
> this patch fixes profile update in the first case of loop splitting.
> The pass still gives up on very basic testcases:
>
> __attribute__ ((noinline,noipa))
> void test1 (int n)
> {
>   if (n <= 0 || n > 100000)
>     return;
>   for (int i = 0; i <= n; i++)
>     {
>       if (i < n)
>         do_something ();
>       if (a[i])
>         do_something2();
>     }
> }
> Here I needed to do the conditoinal that enforces sane value range of n.
> The reason is that it gives up on:
>       !number_of_iterations_exit (loop1, exit1, &niter, false, true)
> and without the conditonal we get assumption that n>=0 and not INT_MAX.
> I think from overflow we shold derive that INT_MAX test is not needed and since
> the loop does nothing for n<0 it is also just an paranoia.

I only get n != 2147483647 (loop header copying does the n >= 0).  Indeed
this test looks odd.  It's because we turn i <= n into i < n + 1 and analyze
that (our canonical test is LT_EXPR), for this to work n may not be INT_MAX.

> I am not sure how to fix this though :(.  In general the pass does not really
> need to compute iteration count.  It only needs to know what direction the IVs
> go so it can detect tests that fires in first part of iteration space.
>
> Rich, any idea what the correct test should be?

In principle it could just look at the scalar evolution for the IV in
the exit test.
Aka use simple_iv () and check ->no_overflow?

> In testcase:
>   for (int i = 0; i < 200; i++)
>     if (i < 150)
>       do_something ();
>     else
>       do_something2 ();
> the old code did wrong update of the exit condition probabilities.
> We know that first loop iterates 150 times and the second loop 50 times
> and we get it by simply scaling loop body by the probability of inner test.
>
> With the patch we now get:
>
>   <bb 2> [count: 1000]:
>
>   <bb 3> [count: 150000]:    <- loop 1 correctly iterates 149 times
>   # i_10 = PHI <i_7(8), 0(2)>
>   do_something ();
>   i_7 = i_10 + 1;
>   if (i_7 <= 149)
>     goto <bb 8>; [99.33%]
>   else
>     goto <bb 17>; [0.67%]
>
>   <bb 8> [count: 149000]:
>   goto <bb 3>; [100.00%]
>
>   <bb 16> [count: 1000]:
>   # i_15 = PHI <i_18(17)>
>
>   <bb 9> [count: 49975]:    <- loop 2 should iterate 50 times but
>                                we are slightly wrong
>   # i_3 = PHI <i_15(16), i_14(13)>
>   do_something2 ();
>   i_14 = i_3 + 1;
>   if (i_14 != 200)
>     goto <bb 13>; [98.00%]
>   else
>     goto <bb 7>; [2.00%]
>
>   <bb 13> [count: 48975]:
>   goto <bb 9>; [100.00%]
>
>   <bb 17> [count: 1000]:   <- this test is always true becuase it is
>                               reached form bb 3
>   # i_18 = PHI <i_7(3)>
>   if (i_18 != 200)
>     goto <bb 16>; [99.95%]
>   else
>     goto <bb 7>; [0.05%]
>
>   <bb 7> [count: 1000]:
>   return;
>
> The reason why we are slightly wrong is the condtion in bb17 that
> is always true but the pass does not konw it.
>
> Rich any idea how to do that?  I think connect_loops should work out
> the cas where the loop exit conditon is never satisfied at the time
> the splitted condition fails for first time.
>
> Also we do not update loop iteration expectancies.  If we were able to
> work out if one of the loop has constant iteration count, we could do it
> perfectly.
>
> Before patch on hmmer we get a lot of mismatches:
> Profile report here claims:
> dump id |static mismat|dynamic mismatch                                     |
>         |in count     |in count                  |time                      |
> lsplit  |      5    +5|   8151850567  +8151850567| 531506481006       +57.9%|
> ldist   |      9    +4|  15345493501  +7193642934| 606848841056       +14.2%|
> ifcvt   |     10    +1|  15487514871   +142021370| 689469797790       +13.6%|
> vect    |     35   +25|  17558425961  +2070911090| 517375405715       -25.0%|
> cunroll |     42    +7|  16898736178   -659689783| 452445796198        -4.9%|
> loopdone|     33    -9|   2678017188 -14220718990| 330969127663             |
> tracer  |     34    +1|   2678018710        +1522| 330613415364        +0.0%|
> fre     |     33    -1|   2676980249     -1038461| 330465677073        -0.0%|
> expand  |     28    -5|   2497468467   -179511782|--------------------------|
>
> With patch
>
> lsplit  |      0      |            0             | 328723360744        -2.3%|
> ldist   |      0      |            0             | 396193562452       +20.6%|
> ifcvt   |      1    +1|     71010686    +71010686| 478743508522       +20.8%|
> vect    |     14   +13|    697518955   +626508269| 299398068323       -37.5%|
> cunroll |     13    -1|    489349408   -208169547| 257777839725       -10.5%|
> loopdone|     11    -2|    402558559    -86790849| 201010712702             |
> tracer  |     13    +2|    402977200      +418641| 200651036623        +0.0%|
> fre     |     13      |    402622146      -355054| 200344398654        -0.2%|
> expand  |     11    -2|    333608636    -69013510|--------------------------|
>
> So no mismatches for lsplit and ldist and also lsplit thinks it improves
> speed by 2.3% rather than regressig it by 57%.
>
> Update is still not perfect since we do not work out that the second loop
> never iterates.  Also ldist is still wrong siince time should not go up.
>
> Ifcft wrecks profile by desing since it insert conditonals with both arms 100%
> that will be eliminated later after vect.  It is not clear to me what happens
> in vect though.
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> gcc/ChangeLog:
>
>         PR middle-end/106293
>         * tree-ssa-loop-split.cc (connect_loops): Change probability
>         of the test preconditioning second loop to very_likely.
>         (fix_loop_bb_probability): Handle correctly case where
>         on of the arms of the conditional is empty.
>         (split_loop): Fold the test guarding first condition to
>         see if it is constant true; Set correct entry block
>         probabilities of the split loops; determine correct loop
>         eixt probabilities.
>
> gcc/testsuite/ChangeLog:
>
>         PR middle-end/106293
>         * gcc.dg/tree-prof/loop-split-1.c: New test.
>         * gcc.dg/tree-prof/loop-split-2.c: New test.
>         * gcc.dg/tree-prof/loop-split-3.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c
> new file mode 100644
> index 00000000000..46f7ae66fa3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c
> @@ -0,0 +1,33 @@
> +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks -fdump-tree-optimized-details-blocks" } */
> +
> +
> +void
> +__attribute__ ((noinline,noipa))
> +do_something()
> +{
> +}
> +void
> +__attribute__ ((noinline,noipa))
> +do_something2()
> +{
> +}
> +
> +__attribute__ ((noinline,noipa))
> +void test1 (int n, int ga)
> +{
> +  for (int i = 0; i < 200; i++)
> +         if (i < 150)
> +                 do_something ();
> +         else
> +                 do_something2 ();
> +}
> +int
> +main(int, char **)
> +{
> +       for (int i = 0 ; i < 1000; i++)
> +         test1(10, 10);
> +       return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c
> new file mode 100644
> index 00000000000..de2c9f58301
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c
> @@ -0,0 +1,36 @@
> +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks -fdump-tree-optimized-details-blocks" } */
> +
> +int M = 100;
> +
> +void
> +__attribute__ ((noinline,noipa))
> +do_something()
> +{
> +}
> +void
> +__attribute__ ((noinline,noipa))
> +do_something2()
> +{
> +}
> +
> +__attribute__ ((noinline,noipa))
> +void test1 (int n)
> +{
> +  if (n <= 0 || n > 100000)
> +        return;
> +  for (int i = 0; i <= n; i++)
> +          if (i < n)
> +                  do_something ();
> +          else
> +                  do_something2 ();
> +}
> +int
> +main(int, char **)
> +{
> +        for (int i = 0 ; i < 1000; i++)
> +          test1(M);
> +        return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c
> new file mode 100644
> index 00000000000..a88bc1f8663
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c
> @@ -0,0 +1,41 @@
> +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks -fdump-tree-optimized-details-blocks" } */
> +
> +int M = 100;
> +int a[1000];
> +
> +void
> +__attribute__ ((noinline,noipa))
> +do_something()
> +{
> +}
> +void
> +__attribute__ ((noinline,noipa))
> +do_something2()
> +{
> +}
> +
> +__attribute__ ((noinline,noipa))
> +void test1 (int n)
> +{
> +  if (n <= 0 || n > 100000)
> +        return;
> +  for (int i = 0; i <= n; i++)
> +    {
> +      if (i < n)
> +       do_something ();
> +      if (a[i])
> +       do_something2();
> +    }
> +}
> +int
> +main(int, char **)
> +{
> +  for (int i = 0 ; i < 1000; i+=3)
> +    a[i]=1;
> +  for (int i = 0 ; i < 1000; i++)
> +    test1(M);
> +  return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "optimized" } } */
> diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
> index f441f3fe1b5..70cd0aaefa7 100644
> --- a/gcc/tree-ssa-loop-split.cc
> +++ b/gcc/tree-ssa-loop-split.cc
> @@ -355,7 +356,7 @@ connect_loops (class loop *loop1, class loop *loop2)
>        new_e->flags |= EDGE_TRUE_VALUE;
>      }
>
> -  new_e->probability = profile_probability::likely ();
> +  new_e->probability = profile_probability::very_likely ();
>    skip_e->probability = new_e->probability.invert ();
>
>    return new_e;
> @@ -496,6 +497,8 @@ fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge true_edge,
>    unsigned j;
>    for (j = 0; j < loop1->num_nodes; j++)
>      if (bbs1[j] == loop1->latch
> +       /* Watch for case where the true conditional is empty.  */
> +       || !single_pred_p (true_edge->dest)
>         || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
>        bbs1[j]->count
>         = bbs1[j]->count.apply_probability (true_edge->probability);
> @@ -507,6 +510,8 @@ fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge true_edge,
>    bbs2 = get_loop_body (loop2);
>    for (j = 0; j < loop2->num_nodes; j++)
>      if (bbs2[j] == loop2->latch
> +       /* Watch for case where the flase conditional is empty.  */
> +       || !single_pred_p (bbi_copy)
>         || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
>        bbs2[j]->count
>         = bbs2[j]->count.apply_probability (true_edge->probability.invert ());
> @@ -564,6 +569,7 @@ split_loop (class loop *loop1)
>    for (i = 0; i < loop1->num_nodes; i++)
>      if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
>        {
> +       profile_count entry_count = loop_preheader_edge (loop1)->count ();
>         /* Handling opposite steps is not implemented yet.  Neither
>            is handling different step sizes.  */
>         if ((tree_int_cst_sign_bit (iv.step)
> @@ -610,7 +616,8 @@ split_loop (class loop *loop1)
>         if (stmts2)
>           gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
>                                             stmts2);
> -       tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
> +       tree cond = fold_build2 (guard_code, boolean_type_node,
> +                                guard_init, border);
>         if (!initial_true)
>           cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>
> @@ -622,13 +629,71 @@ split_loop (class loop *loop1)
>         initialize_original_copy_tables ();
>         basic_block cond_bb;
>
> +       profile_probability loop1_prob
> +         = integer_onep (cond) ? profile_probability::always ()
> +                               : true_edge->probability;
> +       /* TODO: It is commonly a case that we know that both loops will be
> +          entered.  very_likely below is the probability that second loop will
> +          be entered given by connect_loops.  We should work out the common
> +          case it is always true.  */
>         class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -                                         true_edge->probability,
> -                                         true_edge->probability.invert (),
> +                                         loop1_prob,
> +                                         /* Pass always as we will later
> +                                            redirect first loop to second
> +                                            loop.  */
>                                           profile_probability::always (),
>                                           profile_probability::always (),
> +                                         profile_probability::very_likely (),
>                                           true);
>         gcc_assert (loop2);
> +       /* Correct probability of edge  cond_bb->preheader_of_loop2.  */
> +       single_pred_edge
> +               (loop_preheader_edge (loop2)->src)->probability
> +                       = loop1_prob.invert ();
> +
> +       fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
> +
> +       /* Fix loop's exit probability after scaling.  */
> +
> +       if (entry_count.initialized_p () && entry_count.nonzero_p ())
> +         {
> +           edge exit_to_latch1;
> +           if (EDGE_SUCC (exit1->src, 0) == exit1)
> +             exit_to_latch1 = EDGE_SUCC (exit1->src, 1);
> +           else
> +             exit_to_latch1 = EDGE_SUCC (exit1->src, 0);
> +           if (exit1->src->count.nonzero_p ())
> +             {
> +               /* First loop is entered loop1_prob * entry_count times
> +                  and it needs to exit the same number of times.  */
> +               exit1->probability
> +                       = entry_count.apply_probability
> +                               (loop1_prob).probability_in (exit1->src->count);
> +               exit_to_latch1->probability = exit1->probability.invert ();
> +               scale_dominated_blocks_in_loop (loop1, exit1->src,
> +                                               exit_to_latch1->count (),
> +                                               exit_to_latch1->dest->count);
> +             }
> +
> +           edge exit_to_latch2, exit2 = single_exit (loop2);
> +           if (EDGE_SUCC (exit2->src, 0) == exit2)
> +             exit_to_latch2 = EDGE_SUCC (exit2->src, 1);
> +           else
> +             exit_to_latch2 = EDGE_SUCC (exit2->src, 0);
> +           if (exit2->src->count.nonzero_p ())
> +             {
> +               /* Second loop is entered very_likely * entry_count times
> +                  and it needs to exit the same number of times.  */
> +               exit2->probability
> +                       = entry_count.apply_probability
> +                               (profile_probability::very_likely ())
> +                               .probability_in (exit2->src->count);
> +               exit_to_latch2->probability = exit2->probability.invert ();
> +               scale_dominated_blocks_in_loop (loop2, exit2->src,
> +                                               exit_to_latch2->count (),
> +                                               exit_to_latch2->dest->count);
> +             }
> +         }
>
>         edge new_e = connect_loops (loop1, loop2);
>         connect_loop_phis (loop1, loop2, new_e);
> @@ -647,16 +712,7 @@ split_loop (class loop *loop1)
>         tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
>         patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>
> -       fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
> -
> -       /* Fix first loop's exit probability after scaling.  */
> -       edge exit_to_latch1;
> -       if (EDGE_SUCC (exit1->src, 0) == exit1)
> -         exit_to_latch1 = EDGE_SUCC (exit1->src, 1);
> -       else
> -         exit_to_latch1 = EDGE_SUCC (exit1->src, 0);
> -       exit_to_latch1->probability *= true_edge->probability;
> -       exit1->probability = exit_to_latch1->probability.invert ();
> +       /* TODO: Update any_esitmate and upper bounds.  */
>
>         /* Finally patch out the two copies of the condition to be always
>            true/false (or opposite).  */
Jan Hubicka July 28, 2023, 12:50 p.m. UTC | #2
> On Fri, Jul 28, 2023 at 9:58 AM Jan Hubicka via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> > this patch fixes profile update in the first case of loop splitting.
> > The pass still gives up on very basic testcases:
> >
> > __attribute__ ((noinline,noipa))
> > void test1 (int n)
> > {
> >   if (n <= 0 || n > 100000)
> >     return;
> >   for (int i = 0; i <= n; i++)
> >     {
> >       if (i < n)
> >         do_something ();
> >       if (a[i])
> >         do_something2();
> >     }
> > }
> > Here I needed to do the conditoinal that enforces sane value range of n.
> > The reason is that it gives up on:
> >       !number_of_iterations_exit (loop1, exit1, &niter, false, true)
> > and without the conditonal we get assumption that n>=0 and not INT_MAX.
> > I think from overflow we shold derive that INT_MAX test is not needed and since
> > the loop does nothing for n<0 it is also just an paranoia.
> 
> I only get n != 2147483647 (loop header copying does the n >= 0).  Indeed
> this test looks odd.  It's because we turn i <= n into i < n + 1 and analyze
> that (our canonical test is LT_EXPR), for this to work n may not be INT_MAX.

Yep, I can't think on how that can disturb loop splitting.  The loop
above is similar to one in hmmer so people do loops like that.
We should be able to use the fact that i can not overflow to get rid of
this assumtion, but I am not that famililar with that code...

I think it would help elsewhere too?
> 
> In principle it could just look at the scalar evolution for the IV in
> the exit test.
> Aka use simple_iv () and check ->no_overflow?

Yep, I tink that should be enough.  It uses simple_iv to analyze the
in-loop conditionals.  I will look into that.

Honza
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c
new file mode 100644
index 00000000000..46f7ae66fa3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c
@@ -0,0 +1,33 @@ 
+/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks -fdump-tree-optimized-details-blocks" } */
+
+
+void
+__attribute__ ((noinline,noipa))
+do_something()
+{
+}
+void
+__attribute__ ((noinline,noipa))
+do_something2()
+{
+}
+
+__attribute__ ((noinline,noipa))
+void test1 (int n, int ga)
+{
+  for (int i = 0; i < 200; i++)
+	  if (i < 150)
+		  do_something ();
+	  else
+		  do_something2 ();
+}
+int
+main(int, char **)
+{
+	for (int i = 0 ; i < 1000; i++)
+	  test1(10, 10);
+	return 0;
+}
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" } } */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit" } } */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c
new file mode 100644
index 00000000000..de2c9f58301
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c
@@ -0,0 +1,36 @@ 
+/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks -fdump-tree-optimized-details-blocks" } */
+
+int M = 100;
+
+void
+__attribute__ ((noinline,noipa))
+do_something()
+{
+}
+void
+__attribute__ ((noinline,noipa))
+do_something2()
+{
+}
+
+__attribute__ ((noinline,noipa))
+void test1 (int n)
+{
+  if (n <= 0 || n > 100000)
+        return; 
+  for (int i = 0; i <= n; i++)
+          if (i < n)
+                  do_something ();
+          else
+                  do_something2 ();
+}
+int
+main(int, char **)
+{
+        for (int i = 0 ; i < 1000; i++)
+          test1(M);
+        return 0;
+}
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" } } */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit" } } */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c
new file mode 100644
index 00000000000..a88bc1f8663
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c
@@ -0,0 +1,41 @@ 
+/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks -fdump-tree-optimized-details-blocks" } */
+
+int M = 100;
+int a[1000];
+
+void
+__attribute__ ((noinline,noipa))
+do_something()
+{
+}
+void
+__attribute__ ((noinline,noipa))
+do_something2()
+{
+}
+
+__attribute__ ((noinline,noipa))
+void test1 (int n)
+{
+  if (n <= 0 || n > 100000)
+        return; 
+  for (int i = 0; i <= n; i++)
+    {
+      if (i < n)
+	do_something ();
+      if (a[i])
+	do_something2();
+    }
+}
+int
+main(int, char **)
+{
+  for (int i = 0 ; i < 1000; i+=3)
+    a[i]=1;
+  for (int i = 0 ; i < 1000; i++)
+    test1(M);
+  return 0;
+}
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" } } */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit" } } */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "optimized" } } */
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index f441f3fe1b5..70cd0aaefa7 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -355,7 +356,7 @@  connect_loops (class loop *loop1, class loop *loop2)
       new_e->flags |= EDGE_TRUE_VALUE;
     }
 
-  new_e->probability = profile_probability::likely ();
+  new_e->probability = profile_probability::very_likely ();
   skip_e->probability = new_e->probability.invert ();
 
   return new_e;
@@ -496,6 +497,8 @@  fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge true_edge,
   unsigned j;
   for (j = 0; j < loop1->num_nodes; j++)
     if (bbs1[j] == loop1->latch
+	/* Watch for case where the true conditional is empty.  */
+	|| !single_pred_p (true_edge->dest)
 	|| !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
       bbs1[j]->count
 	= bbs1[j]->count.apply_probability (true_edge->probability);
@@ -507,6 +510,8 @@  fix_loop_bb_probability (class loop *loop1, class loop *loop2, edge true_edge,
   bbs2 = get_loop_body (loop2);
   for (j = 0; j < loop2->num_nodes; j++)
     if (bbs2[j] == loop2->latch
+	/* Watch for case where the flase conditional is empty.  */
+	|| !single_pred_p (bbi_copy)
 	|| !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
       bbs2[j]->count
 	= bbs2[j]->count.apply_probability (true_edge->probability.invert ());
@@ -564,6 +569,7 @@  split_loop (class loop *loop1)
   for (i = 0; i < loop1->num_nodes; i++)
     if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
       {
+	profile_count entry_count = loop_preheader_edge (loop1)->count ();
 	/* Handling opposite steps is not implemented yet.  Neither
 	   is handling different step sizes.  */
 	if ((tree_int_cst_sign_bit (iv.step)
@@ -610,7 +616,8 @@  split_loop (class loop *loop1)
 	if (stmts2)
 	  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
 					    stmts2);
-	tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
+	tree cond = fold_build2 (guard_code, boolean_type_node,
+				 guard_init, border);
 	if (!initial_true)
 	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
 
@@ -622,13 +629,71 @@  split_loop (class loop *loop1)
 	initialize_original_copy_tables ();
 	basic_block cond_bb;
 
+	profile_probability loop1_prob
+	  = integer_onep (cond) ? profile_probability::always ()
+				: true_edge->probability;
+	/* TODO: It is commonly a case that we know that both loops will be
+	   entered.  very_likely below is the probability that second loop will
+	   be entered given by connect_loops.  We should work out the common
+	   case it is always true.  */
 	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-					  true_edge->probability,
-					  true_edge->probability.invert (),
+					  loop1_prob,
+					  /* Pass always as we will later
+					     redirect first loop to second
+					     loop.  */
 					  profile_probability::always (),
 					  profile_probability::always (),
+					  profile_probability::very_likely (),
 					  true);
 	gcc_assert (loop2);
+	/* Correct probability of edge  cond_bb->preheader_of_loop2.  */
+	single_pred_edge
+		(loop_preheader_edge (loop2)->src)->probability
+			= loop1_prob.invert ();
+
+	fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
+
+	/* Fix loop's exit probability after scaling.  */
+
+	if (entry_count.initialized_p () && entry_count.nonzero_p ())
+	  {
+	    edge exit_to_latch1;
+	    if (EDGE_SUCC (exit1->src, 0) == exit1)
+	      exit_to_latch1 = EDGE_SUCC (exit1->src, 1);
+	    else
+	      exit_to_latch1 = EDGE_SUCC (exit1->src, 0);
+	    if (exit1->src->count.nonzero_p ())
+	      {
+		/* First loop is entered loop1_prob * entry_count times
+		   and it needs to exit the same number of times.  */
+		exit1->probability
+			= entry_count.apply_probability
+				(loop1_prob).probability_in (exit1->src->count);
+		exit_to_latch1->probability = exit1->probability.invert ();
+		scale_dominated_blocks_in_loop (loop1, exit1->src,
+						exit_to_latch1->count (),
+						exit_to_latch1->dest->count);
+	      }
+
+	    edge exit_to_latch2, exit2 = single_exit (loop2);
+	    if (EDGE_SUCC (exit2->src, 0) == exit2)
+	      exit_to_latch2 = EDGE_SUCC (exit2->src, 1);
+	    else
+	      exit_to_latch2 = EDGE_SUCC (exit2->src, 0);
+	    if (exit2->src->count.nonzero_p ())
+	      {
+		/* Second loop is entered very_likely * entry_count times
+		   and it needs to exit the same number of times.  */
+		exit2->probability
+			= entry_count.apply_probability
+				(profile_probability::very_likely ())
+				.probability_in (exit2->src->count);
+		exit_to_latch2->probability = exit2->probability.invert ();
+		scale_dominated_blocks_in_loop (loop2, exit2->src,
+						exit_to_latch2->count (),
+						exit_to_latch2->dest->count);
+	      }
+	  }
 
 	edge new_e = connect_loops (loop1, loop2);
 	connect_loop_phis (loop1, loop2, new_e);
@@ -647,16 +712,7 @@  split_loop (class loop *loop1)
 	tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1));
 	patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
 
-	fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
-
-	/* Fix first loop's exit probability after scaling.  */
-	edge exit_to_latch1;
-	if (EDGE_SUCC (exit1->src, 0) == exit1)
-	  exit_to_latch1 = EDGE_SUCC (exit1->src, 1);
-	else
-	  exit_to_latch1 = EDGE_SUCC (exit1->src, 0);
-	exit_to_latch1->probability *= true_edge->probability;
-	exit1->probability = exit_to_latch1->probability.invert ();
+	/* TODO: Update any_esitmate and upper bounds.  */
 
 	/* Finally patch out the two copies of the condition to be always
 	   true/false (or opposite).  */