diff mbox series

Make loops_list support an optional loop_p root

Message ID 61ac669c-7293-f53a-20c7-158b5a813cee@linux.ibm.com
State New
Headers show
Series Make loops_list support an optional loop_p root | expand

Commit Message

Kewen.Lin July 23, 2021, 8:41 a.m. UTC
on 2021/7/22 下午8:56, Richard Biener wrote:
> On Tue, Jul 20, 2021 at 4:37
> PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> This v2 has addressed some review comments/suggestions:
>>
>>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
>>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
>>     to support loop hierarchy tree rather than just a function,
>>     and adjust to use loops* accordingly.
> 
> I actually meant struct loop *, not struct loops * ;)  At the point
> we pondered to make loop invariant motion work on single
> loop nests we gave up not only but also because it iterates
> over the loop nest but all the iterators only ever can process
> all loops, not say, all loops inside a specific 'loop' (and
> including that 'loop' if LI_INCLUDE_ROOT).  So the
> CTOR would take the 'root' of the loop tree as argument.
> 
> I see that doesn't trivially fit how loops_list works, at least
> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
> could be adjusted to do ONLY_INNERMOST as well?
> 


Thanks for the clarification!  I just realized that the previous
version with struct loops* is problematic, all traversal is
still bounded with outer_loop == NULL.  I think what you expect
is to respect the given loop_p root boundary.  Since we just
record the loops' nums, I think we still need the function* fn?
So I add one optional argument loop_p root and update the
visiting codes accordingly.  Before this change, the previous
visiting uses the outer_loop == NULL as the termination condition,
it perfectly includes the root itself, but with this given root,
we have to use it as the termination condition to avoid to iterate
onto its possible existing next.

For LI_ONLY_INNERMOST, I was thinking whether we can use the
code like:

    struct loops *fn_loops = loops_for_fn (fn)->larray;
    for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
        if (aloop != NULL
            && aloop->inner == NULL
            && flow_loop_nested_p (tree_root, aloop))
             this->to_visit.quick_push (aloop->num);

it has the stable bound, but if the given root only has several
child loops, it can be much worse if there are many loops in fn.
It seems impossible to predict the given root loop hierarchy size,
maybe we can still use the original linear searching for the case
loops_for_fn (fn) == root?  But since this visiting seems not so
performance critical, I chose to share the code originally used
for FROM_INNERMOST, hope it can have better readability and
maintainability.

Bootstrapped and regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped on ppc64le P9 with bootstrap-O3 config.

Does the attached patch meet what you expect?

BR,
Kewen
-----
gcc/ChangeLog:

	* cfgloop.h (loops_list::loops_list): Add one optional argument root
	and adjust accordingly.

Comments

Martin Sebor July 23, 2021, 4:26 p.m. UTC | #1
On 7/23/21 2:41 AM, Kewen.Lin wrote:
> on 2021/7/22 下午8:56, Richard Biener wrote:
>> On Tue, Jul 20, 2021 at 4:37
>> PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>
>>> Hi,
>>>
>>> This v2 has addressed some review comments/suggestions:
>>>
>>>    - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
>>>    - Add new CTOR loops_list (struct loops *loops, unsigned flags)
>>>      to support loop hierarchy tree rather than just a function,
>>>      and adjust to use loops* accordingly.
>>
>> I actually meant struct loop *, not struct loops * ;)  At the point
>> we pondered to make loop invariant motion work on single
>> loop nests we gave up not only but also because it iterates
>> over the loop nest but all the iterators only ever can process
>> all loops, not say, all loops inside a specific 'loop' (and
>> including that 'loop' if LI_INCLUDE_ROOT).  So the
>> CTOR would take the 'root' of the loop tree as argument.
>>
>> I see that doesn't trivially fit how loops_list works, at least
>> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
>> could be adjusted to do ONLY_INNERMOST as well?
>>
> 
> 
> Thanks for the clarification!  I just realized that the previous
> version with struct loops* is problematic, all traversal is
> still bounded with outer_loop == NULL.  I think what you expect
> is to respect the given loop_p root boundary.  Since we just
> record the loops' nums, I think we still need the function* fn?
> So I add one optional argument loop_p root and update the
> visiting codes accordingly.  Before this change, the previous
> visiting uses the outer_loop == NULL as the termination condition,
> it perfectly includes the root itself, but with this given root,
> we have to use it as the termination condition to avoid to iterate
> onto its possible existing next.
> 
> For LI_ONLY_INNERMOST, I was thinking whether we can use the
> code like:
> 
>      struct loops *fn_loops = loops_for_fn (fn)->larray;
>      for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
>          if (aloop != NULL
>              && aloop->inner == NULL
>              && flow_loop_nested_p (tree_root, aloop))
>               this->to_visit.quick_push (aloop->num);
> 
> it has the stable bound, but if the given root only has several
> child loops, it can be much worse if there are many loops in fn.
> It seems impossible to predict the given root loop hierarchy size,
> maybe we can still use the original linear searching for the case
> loops_for_fn (fn) == root?  But since this visiting seems not so
> performance critical, I chose to share the code originally used
> for FROM_INNERMOST, hope it can have better readability and
> maintainability.

I might be mixing up the two patches (they both seem to touch
the same functions), but in this one the loops_list ctor looks
like a sizeable function with at least one loop.  Since the ctor
is used in the initialization of each of the many range-for loops,
that could result in inlining of a lot of these calls and so quite
a bit code bloat.  Unless this is necessary for efficiency  (not
my area) I would recommend to consider defining the loops_list
ctor out-of-line in some .c or .cc file.

(Also, if you agree with the rationale, I'd replace loop_p with
loop * in the new code.)

Thanks
Martin

> 
> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, also
> bootstrapped on ppc64le P9 with bootstrap-O3 config.
> 
> Does the attached patch meet what you expect?
> 
> BR,
> Kewen
> -----
> gcc/ChangeLog:
> 
> 	* cfgloop.h (loops_list::loops_list): Add one optional argument root
> 	and adjust accordingly.
>
Kewen.Lin July 27, 2021, 2:25 a.m. UTC | #2
on 2021/7/24 上午12:26, Martin Sebor wrote:
> On 7/23/21 2:41 AM, Kewen.Lin wrote:
>> on 2021/7/22 下午8:56, Richard Biener wrote:
>>> On Tue, Jul 20, 2021 at 4:37
>>> PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> This v2 has addressed some review comments/suggestions:
>>>>
>>>>    - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
>>>>    - Add new CTOR loops_list (struct loops *loops, unsigned flags)
>>>>      to support loop hierarchy tree rather than just a function,
>>>>      and adjust to use loops* accordingly.
>>>
>>> I actually meant struct loop *, not struct loops * ;)  At the point
>>> we pondered to make loop invariant motion work on single
>>> loop nests we gave up not only but also because it iterates
>>> over the loop nest but all the iterators only ever can process
>>> all loops, not say, all loops inside a specific 'loop' (and
>>> including that 'loop' if LI_INCLUDE_ROOT).  So the
>>> CTOR would take the 'root' of the loop tree as argument.
>>>
>>> I see that doesn't trivially fit how loops_list works, at least
>>> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
>>> could be adjusted to do ONLY_INNERMOST as well?
>>>
>>
>>
>> Thanks for the clarification!  I just realized that the previous
>> version with struct loops* is problematic, all traversal is
>> still bounded with outer_loop == NULL.  I think what you expect
>> is to respect the given loop_p root boundary.  Since we just
>> record the loops' nums, I think we still need the function* fn?
>> So I add one optional argument loop_p root and update the
>> visiting codes accordingly.  Before this change, the previous
>> visiting uses the outer_loop == NULL as the termination condition,
>> it perfectly includes the root itself, but with this given root,
>> we have to use it as the termination condition to avoid to iterate
>> onto its possible existing next.
>>
>> For LI_ONLY_INNERMOST, I was thinking whether we can use the
>> code like:
>>
>>      struct loops *fn_loops = loops_for_fn (fn)->larray;
>>      for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
>>          if (aloop != NULL
>>              && aloop->inner == NULL
>>              && flow_loop_nested_p (tree_root, aloop))
>>               this->to_visit.quick_push (aloop->num);
>>
>> it has the stable bound, but if the given root only has several
>> child loops, it can be much worse if there are many loops in fn.
>> It seems impossible to predict the given root loop hierarchy size,
>> maybe we can still use the original linear searching for the case
>> loops_for_fn (fn) == root?  But since this visiting seems not so
>> performance critical, I chose to share the code originally used
>> for FROM_INNERMOST, hope it can have better readability and
>> maintainability.
> 
> I might be mixing up the two patches (they both seem to touch
> the same functions), but in this one the loops_list ctor looks
> like a sizeable function with at least one loop.  Since the ctor
> is used in the initialization of each of the many range-for loops,
> that could result in inlining of a lot of these calls and so quite
> a bit code bloat.  Unless this is necessary for efficiency  (not
> my area) I would recommend to consider defining the loops_list
> ctor out-of-line in some .c or .cc file.
> 

Yeah, they touch the same functions.  Good point on the code bloat,
I'm not sure the historical reason here, it needs Richi's input.  :)

> (Also, if you agree with the rationale, I'd replace loop_p with
> loop * in the new code.)
> 

Oh, thanks for the reminder, will update it.  

BR,
Kewen

> Thanks
> Martin
> 
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, also
>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>
>> Does the attached patch meet what you expect?
>>
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>     * cfgloop.h (loops_list::loops_list): Add one optional argument root
>>     and adjust accordingly.
>>
>
Richard Biener July 29, 2021, 8:01 a.m. UTC | #3
On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/7/22 下午8:56, Richard Biener wrote:
> > On Tue, Jul 20, 2021 at 4:37
> > PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> This v2 has addressed some review comments/suggestions:
> >>
> >>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
> >>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
> >>     to support loop hierarchy tree rather than just a function,
> >>     and adjust to use loops* accordingly.
> >
> > I actually meant struct loop *, not struct loops * ;)  At the point
> > we pondered to make loop invariant motion work on single
> > loop nests we gave up not only but also because it iterates
> > over the loop nest but all the iterators only ever can process
> > all loops, not say, all loops inside a specific 'loop' (and
> > including that 'loop' if LI_INCLUDE_ROOT).  So the
> > CTOR would take the 'root' of the loop tree as argument.
> >
> > I see that doesn't trivially fit how loops_list works, at least
> > not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
> > could be adjusted to do ONLY_INNERMOST as well?
> >
>
>
> Thanks for the clarification!  I just realized that the previous
> version with struct loops* is problematic, all traversal is
> still bounded with outer_loop == NULL.  I think what you expect
> is to respect the given loop_p root boundary.  Since we just
> record the loops' nums, I think we still need the function* fn?

Would it simplify things if we recorded the actual loop *?

There's still the to_visit reserve which needs a bound on
the number of loops for efficiency reasons.

> So I add one optional argument loop_p root and update the
> visiting codes accordingly.  Before this change, the previous
> visiting uses the outer_loop == NULL as the termination condition,
> it perfectly includes the root itself, but with this given root,
> we have to use it as the termination condition to avoid to iterate
> onto its possible existing next.
>
> For LI_ONLY_INNERMOST, I was thinking whether we can use the
> code like:
>
>     struct loops *fn_loops = loops_for_fn (fn)->larray;
>     for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
>         if (aloop != NULL
>             && aloop->inner == NULL
>             && flow_loop_nested_p (tree_root, aloop))
>              this->to_visit.quick_push (aloop->num);
>
> it has the stable bound, but if the given root only has several
> child loops, it can be much worse if there are many loops in fn.
> It seems impossible to predict the given root loop hierarchy size,
> maybe we can still use the original linear searching for the case
> loops_for_fn (fn) == root?  But since this visiting seems not so
> performance critical, I chose to share the code originally used
> for FROM_INNERMOST, hope it can have better readability and
> maintainability.

I was indeed looking for something that has execution/storage
bound on the subtree we're interested in.  If we pull the CTOR
out-of-line we can probably keep the linear search for
LI_ONLY_INNERMOST when looking at the whole loop tree.

It just seemed to me that we can eventually re-use a
single loop tree walker for all orders, just adjusting the
places we push.

>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, also
> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>
> Does the attached patch meet what you expect?

So yeah, it's probably close to what is sensible.  Not sure
whether optimizing the loops for the !only_push_innermost_p
case is important - if we manage to produce a single
walker with conditionals based on 'flags' then IPA-CP should
produce optimal clones as well I guess.

Richard.

>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * cfgloop.h (loops_list::loops_list): Add one optional argument root
>         and adjust accordingly.
diff mbox series

Patch

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 741df44ea51..f7148df1758 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -669,13 +669,15 @@  as_const (T &t)
 }
 
 /* A list for visiting loops, which contains the loop numbers instead of
-   the loop pointers.  The scope is restricted in function FN and the
-   visiting order is specified by FLAGS.  */
+   the loop pointers.  If the loop ROOT is offered (non-null), the visiting
+   will start from it, otherwise it would start from loops_for_fn (FN)
+   instead.  The scope is restricted in function FN and the visiting order
+   is specified by FLAGS.  */
 
 class loops_list
 {
 public:
-  loops_list (function *fn, unsigned flags);
+  loops_list (function *fn, unsigned flags, loop_p root = nullptr);
 
   template <typename T> class Iter
   {
@@ -782,71 +784,94 @@  loops_list::Iter<T>::fill_curr_loop ()
 }
 
 /* Set up the loops list to visit according to the specified
-   function scope FN and iterating order FLAGS.  */
+   function scope FN and iterating order FLAGS.  If ROOT is
+   not null, the visiting would start from it, otherwise it
+   will start from tree_root of loops_for_fn (FN).  */
 
-inline loops_list::loops_list (function *fn, unsigned flags)
+inline loops_list::loops_list (function *fn, unsigned flags, loop_p root)
 {
   class loop *aloop;
-  unsigned i;
   int mn;
 
+  struct loops *loops = loops_for_fn (fn);
+  gcc_assert (!root || loops);
+
   this->fn = fn;
-  if (!loops_for_fn (fn))
+  if (!loops)
     return;
 
+  loop_p tree_root = root ? root : loops->tree_root;
+
   this->to_visit.reserve_exact (number_of_loops (fn));
-  mn = (flags & LI_INCLUDE_ROOT) ? 0 : 1;
+  mn = (flags & LI_INCLUDE_ROOT) ? -1 : tree_root->num;
 
-  if (flags & LI_ONLY_INNERMOST)
-    {
-      for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); i++)
-	if (aloop != NULL
-	    && aloop->inner == NULL
-	    && aloop->num >= mn)
+  /* The helper function for LI_FROM_INNERMOST and LI_ONLY_INNERMOST
+     visiting, ONLY_PUSH_INNERMOST_P indicates whether only push
+     the innermost loop, it's true for LI_ONLY_INNERMOST visiting
+     while false for LI_FROM_INNERMOST visiting.  */
+  auto visit_from_innermost = [&] (bool only_push_innermost_p)
+  {
+    /* Push the loops to LI->TO_VISIT in postorder.  */
+
+    /* Early handle tree_root without any inner loops, make later
+       processing simpler, that is the while loop can only care
+       about loops which aren't possible to be tree_root.  */
+    if (!tree_root->inner)
+      {
+	if (tree_root->num != mn)
+	  this->to_visit.quick_push (tree_root->num);
+	return;
+      }
+
+    for (aloop = tree_root;
+	aloop->inner != NULL;
+	aloop = aloop->inner)
+      continue;
+
+    while (1)
+      {
+	gcc_assert (aloop != tree_root);
+	if (!only_push_innermost_p || aloop->inner == NULL)
 	  this->to_visit.quick_push (aloop->num);
-    }
-  else if (flags & LI_FROM_INNERMOST)
-    {
-      /* Push the loops to LI->TO_VISIT in postorder.  */
-      for (aloop = loops_for_fn (fn)->tree_root;
-	   aloop->inner != NULL;
-	   aloop = aloop->inner)
-	continue;
 
-      while (1)
-	{
-	  if (aloop->num >= mn)
-	    this->to_visit.quick_push (aloop->num);
+	if (aloop->next)
+	  {
+	    for (aloop = aloop->next;
+		 aloop->inner != NULL;
+		 aloop = aloop->inner)
+	      continue;
+	  }
+	else if (loop_outer (aloop) == tree_root)
+	  break;
+	else
+	  aloop = loop_outer (aloop);
+      }
+
+    /* Reconsider tree_root since the previous loop doesn't handle it.  */
+    if (!only_push_innermost_p && tree_root->num != mn)
+      this->to_visit.quick_push (tree_root->num);
+  };
 
-	  if (aloop->next)
-	    {
-	      for (aloop = aloop->next;
-		   aloop->inner != NULL;
-		   aloop = aloop->inner)
-		continue;
-	    }
-	  else if (!loop_outer (aloop))
-	    break;
-	  else
-	    aloop = loop_outer (aloop);
-	}
-    }
+  if (flags & LI_ONLY_INNERMOST)
+    visit_from_innermost (true);
+  else if (flags & LI_FROM_INNERMOST)
+    visit_from_innermost (false);
   else
     {
       /* Push the loops to LI->TO_VISIT in preorder.  */
-      aloop = loops_for_fn (fn)->tree_root;
+      aloop = tree_root;
       while (1)
 	{
-	  if (aloop->num >= mn)
+	  if (aloop->num != mn)
 	    this->to_visit.quick_push (aloop->num);
 
 	  if (aloop->inner != NULL)
 	    aloop = aloop->inner;
 	  else
 	    {
-	      while (aloop != NULL && aloop->next == NULL)
+	      while (aloop != tree_root && aloop->next == NULL)
 		aloop = loop_outer (aloop);
-	      if (aloop == NULL)
+	      if (aloop == tree_root)
 		break;
 	      aloop = aloop->next;
 	    }