Patchwork PR middle-end/39326 - limit LIM

Submitter Richard Guenther
Date March 11, 2013, 9:52 a.m.
Message ID <CAFiYyc0dhanJC4GdA2TSz15ECsA=P8pFmHxCKfeq47-NybrBVA@mail.gmail.com>
Permalink /patch/226506/
State New

Comments

Richard Guenther - March 11, 2013, 9:52 a.m.
On Sun, Mar 10, 2013 at 2:08 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hello,
>
> The attached patch fixes another one of the scalability problems
> reported in PR middle-end/39326.
>
> This problem is that tree loop-invariant code motion explodes on basic
> blocks with many memory references. Compile time is quadratic in the
> number of memory references in the basic block, and so are the memory
> requirements when the dependences or independences are propagated
> bottom-up through the loop tree.
>
> The fix is to give up on loops with too many memory references to
> handle. There is already a param for that for loop dependence
> analysis, and this patch makes LIM use the same param.
>
> Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu.
> OK for trunk?

Given

+   well.  Return true if all is well, false if something happened
+   that is fatal to the rest of the LIM pass.  */

-static void
+static bool
 gather_mem_refs_stmt (struct loop *loop, gimple stmt)

and

   FOR_EACH_BB (bb)
     {
...
+      for (bsi = gsi_start_bb (bb);
+          !gsi_end_p (bsi) && all_ok;
+          gsi_next (&bsi))
+       all_ok = gather_mem_refs_stmt (loop, gsi_stmt (bsi));
+
+      if (! all_ok)
+        bitmap_set_bit (loops_with_too_many_memrefs, loop->num);
+    }
+
+  /* Propagate the information about loops with too many memory
+     references up the loop hierarchy.  */
+  FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
+    {
+      struct loop *outer = loop_outer (loop);
+      if (outer == current_loops->tree_root
+         || ! bitmap_bit_p (loops_with_too_many_memrefs, loop->num))
+       continue;
+      bitmap_set_bit (loops_with_too_many_memrefs, outer->num);
     }

I don't see how this propagation works correctly as you start to mark
BBs as not-ok starting from a "random" basic-block in the loop tree.
You of course also end up disqualifying very small loops completely
if they happen to be analyzed after a very big one you disqualify.
Of course that's partly because memory_accesses contains all
memory accesses in the function - so I think rather than limiting
on length of memory_accesses you want to limit on the length of
memory_accesses.refs_in_loop (well, on memory_accesses.all_refs_in_loop).
And you want the initial walk over all BBs to instead walk on BBs
FOR_EACH_LOOP and LI_FROM_INNERMOST (you can then do the
propagation to fill all_refs_in_loop there, too).  I realize there isn't a good
existing BB walker for this (with a suitable place to re-set all-ok) - a simple
walk over get_loop_body via LI_FROM_INNERMOST would have you
visit sub-loop bodies multiple times (easily skipped by a bb->loop_father
test, of course, but still ...)

That is, something like the following preparatory patch to change the iteration:


At this point this should be stage1 material, eventually backported for 4.8.1.

And yes, aside from the above the rest of the patch looks good to me
(move loops_with_too_many_memrefs into the memory_accesses struct?)

Thanks,
Richard.

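[The iteration Richard suggests above — walk loops innermost-first and visit only blocks whose loop_father is the current loop, so sub-loop bodies are not rescanned — can be modeled in plain C. The arrays below are a made-up loop tree; the names echo GCC's (loop_father, LI_FROM_INNERMOST) purely for illustration.]

```c
#include <assert.h>
#include <stdio.h>

/* Toy loop tree: loop 0 is the root; parent[] gives the outer loop.
   Each "basic block" records its innermost containing loop.  */
#define N_LOOPS 4
#define N_BBS 6

static const int parent[N_LOOPS]    = { -1, 0, 1, 0 };  /* 2 nested in 1 */
static const int loop_father[N_BBS] = { 1, 2, 2, 1, 3, 0 };

static int visit_count[N_LOOPS];

int
main (void)
{
  /* LI_FROM_INNERMOST stand-in: children before parents.  */
  static const int order[] = { 2, 1, 3 };
  for (int k = 0; k < 3; k++)
    {
      int loop = order[k];
      for (int bb = 0; bb < N_BBS; bb++)
        {
          /* get_loop_body would also hand us sub-loop blocks; the
             bb->loop_father != loop test skips them, so every block
             is processed exactly once, in its innermost loop.  */
          if (loop_father[bb] != loop)
            continue;
          visit_count[loop]++;
        }
    }
  assert (visit_count[1] == 2 && visit_count[2] == 2 && visit_count[3] == 1);
  puts ("each block visited exactly once");
  return 0;
}
```
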
> Ciao!
> Steven
>
> (The ChangeLog is a bit long but the patch is relatively straight-forward.)
>
>         * tree-flow.h (enum move_pos): Moved to tree-ssa-loop-im.c.
>         * tree-ssa-loop-im.c: Include diagnostic-core.h for warning_at()
>         (enum move_pos): Moved here.
>         (movement_possibility): Made static.  Reject memory operations in
>         loops with too many memory references to handle.
>         (determine_max_movement): Take loops_with_too_many_memrefs argument.
>         For statements referencing memory, find the outermost superloop
>         that is not in the loops_with_too_many_memrefs set.
>         (determine_invariantness_stmt): Take loops_with_too_many_memrefs
>         via dom_walk_data.global_data, and pass it along where necessary.
>         Hoist "pos == MOVE_POSSIBLE" test.
>         (determine_invariantness): Take loops_with_too_many_memrefs argument.
>         (move_computations): Likewise, but unused for now.
>         (gather_mem_refs_stmt): Fail if there are too many memory references.
>         Use PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS as threshold.  Add disabled
>         optimization warning.
>         (gather_mem_refs_in_loops): Take loops_with_too_many_memrefs argument.
>         Propagate it from inner loops to outer loops.  Do not propagate
>         recorded memory references for loops on which memory optimizations
>         are disabled.
>         (create_vop_ref_mapping): Take loops_with_too_many_memrefs argument.
>         Don't create a mapping on loops that are in this set.
>         (analyze_memory_references): Take loops_with_too_many_memrefs argument
>         and let subroutines fill it.
>         (store_motion): Take loops_with_too_many_memrefs argument.
>         Skip loops that are in this set.
>         (tree_ssa_lim): Allocate, pass, and free loops_with_too_many_memrefs.

Steven Bosscher - March 11, 2013, 6:24 p.m.
On Mon, Mar 11, 2013 at 10:52 AM, Richard Biener wrote:
> Given
>
> +   well.  Return true if all is well, false if something happened
> +   that is fatal to the rest of the LIM pass.  */
>
> -static void
> +static bool
>  gather_mem_refs_stmt (struct loop *loop, gimple stmt)
>
> and
>
>    FOR_EACH_BB (bb)
>      {
> ...
> +      for (bsi = gsi_start_bb (bb);
> +          !gsi_end_p (bsi) && all_ok;
> +          gsi_next (&bsi))
> +       all_ok = gather_mem_refs_stmt (loop, gsi_stmt (bsi));
> +
> +      if (! all_ok)
> +        bitmap_set_bit (loops_with_too_many_memrefs, loop->num);
> +    }
> +
> +  /* Propagate the information about loops with too many memory
> +     references up the loop hierarchy.  */
> +  FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
> +    {
> +      struct loop *outer = loop_outer (loop);
> +      if (outer == current_loops->tree_root
> +         || ! bitmap_bit_p (loops_with_too_many_memrefs, loop->num))
> +       continue;
> +      bitmap_set_bit (loops_with_too_many_memrefs, outer->num);
>      }
>
> I don't see how this propagation works correctly as you start to mark
> BBs as not-ok starting from a "random" basic-block in the loop tree.

Not at all. The function looks like this:

static void
gather_mem_refs_in_loops (bitmap loops_with_too_many_memrefs)
{
  FOR_EACH_BB (bb)
    {
      for each gimple statement
        all_ok = gather_mem_refs_stmt (loop, gsi_stmt (bsi));
      if (! all_ok)
        bitmap_set_bit (loops_with_too_many_memrefs, loop->num);
    }

  /* Propagate the information about loops with too many memory
     references up the loop hierarchy.  */
  FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
    {
      struct loop *outer = loop_outer (loop);
      if (outer == current_loops->tree_root
          || ! bitmap_bit_p (loops_with_too_many_memrefs, loop->num))
        continue;
      bitmap_set_bit (loops_with_too_many_memrefs, outer->num);
    }

  /* Propagate the information about accessed memory references up
     the loop hierarchy.  */
  FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
    /* Propagate stuff */
}

So all basic blocks are visited first.
Note it is also like this without my patch.


> You of course also end up disqualifying very small loops completely
> if they happen to be analyzed after a very big one you disqualify.
> Of course that's partly because memory_accesses contains all
> memory accesses in the function - so I think rather than limiting
> on length of memory_accesses you want to limit on the length of
> memory_accesses.refs_in_loop (well, on memory_accesses.all_refs_in_loop).

Right, I guess the limit should be per-loop, and it's "global" now.


> And you want the initial walk over all BBs to instead walk on BBs
> FOR_EACH_LOOP and LI_FROM_INNERMOST (you can then do the
> propagation to fill all_refs_in_loop there, too).

That is already what happens.


> At this point this should be stage1 material, eventually backported for 4.8.1.

Obviously.

> And yes, aside from the above the rest of the patch looks good to me
> (move loops_with_too_many_memrefs into the memory_accesses struct?)

That's a good idea.

I'll come back with an updated patch for trunk GCC 4.9.

Ciao!
Steven

Patch

Index: gcc/tree-ssa-loop-im.c
===================================================================
--- gcc/tree-ssa-loop-im.c      (revision 196547)
+++ gcc/tree-ssa-loop-im.c      (working copy)
@@ -1629,29 +1629,30 @@  gather_mem_refs_in_loops (void)
   loop_iterator li;
   bitmap lrefs, alrefs, alrefso;

-  FOR_EACH_BB (bb)
-    {
-      loop = bb->loop_father;
-      if (loop == current_loops->tree_root)
-       continue;
-
-      for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
-       gather_mem_refs_stmt (loop, gsi_stmt (bsi));
-    }
-
-  /* Propagate the information about accessed memory references up
-     the loop hierarchy.  */
   FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
     {
+      basic_block *bbs = get_loop_body (loop);
+      for (i = 0; i < loop->num_nodes; ++i)
+       {
+         bb = bbs[i];
+         if (bb->loop_father != loop)
+           continue;
+         for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+           gather_mem_refs_stmt (loop, gsi_stmt (bsi));
+       }
+      free (bbs);
+
+      /* Propagate the information about accessed memory references up
+        the loop hierarchy.  */
       lrefs = memory_accesses.refs_in_loop[loop->num];
       alrefs = memory_accesses.all_refs_in_loop[loop->num];
       bitmap_ior_into (alrefs, lrefs);

-      if (loop_outer (loop) == current_loops->tree_root)
-       continue;
-
-      alrefso = memory_accesses.all_refs_in_loop[loop_outer (loop)->num];
-      bitmap_ior_into (alrefso, alrefs);
+      if (loop_outer (loop) != current_loops->tree_root)
+       {
+         alrefso = memory_accesses.all_refs_in_loop[loop_outer (loop)->num];
+         bitmap_ior_into (alrefso, alrefs);
+       }
     }
 }