diff mbox series

Fix incorrect computation in fill_always_executed_in_1

Message ID 20210816084612.7999-1-luoxhu@linux.ibm.com
State New
Headers show
Series Fix incorrect computation in fill_always_executed_in_1 | expand

Commit Message

Xionghu Luo Aug. 16, 2021, 8:46 a.m. UTC
It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops.  inn_loop is updated to inner loop, so it need be restored
when exiting from innermost loop. With this patch, the store instruction
in outer loop could also be moved out of outer loop by store motion.
Any comments?  Thanks.

gcc/ChangeLog:

	* tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
	inn_loop when exiting from innermost loop.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
 gcc/tree-ssa-loop-im.c                     |  6 +++++-
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c

Comments

Richard Biener Aug. 16, 2021, 11:46 a.m. UTC | #1
On Mon, 16 Aug 2021, Xiong Hu Luo wrote:

> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
> nested loops.  inn_loop is updated to inner loop, so it need be restored
> when exiting from innermost loop. With this patch, the store instruction
> in outer loop could also be moved out of outer loop by store motion.
> Any comments?  Thanks.

> gcc/ChangeLog:
> 
> 	* tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
> 	inn_loop when exiting from innermost loop.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
>  gcc/tree-ssa-loop-im.c                     |  6 +++++-
>  2 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> new file mode 100644
> index 00000000000..097a5ee4a4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> @@ -0,0 +1,24 @@
> +/* PR/101293 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +struct X { int i; int j; int k;};
> +
> +void foo(struct X *x, int n, int l)
> +{
> +  for (int j = 0; j < l; j++)
> +    {
> +      for (int i = 0; i < n; ++i)
> +	{
> +	  int *p = &x->j;
> +	  int tem = *p;
> +	  x->j += tem * i;
> +	}
> +      int *r = &x->k;
> +      int tem2 = *r;
> +      x->k += tem2 * j;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
> +
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index b24bc64f2a7..5ca4738b20e 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
>  	  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>  	    last = bb;
>  
> +	  if (inn_loop != loop
> +	      && flow_loop_nested_p (bb->loop_father, inn_loop))
> +	    inn_loop = bb->loop_father;
> +

The comment says

              /* In a loop that is always entered we may proceed anyway.
                 But record that we entered it and stop once we leave it.  
*/
              inn_loop = bb->loop_father;

and your change would defeat that early return, no?

>  	  if (bitmap_bit_p (contains_call, bb->index))
>  	    break;
>  
> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
>  
>  	  if (bb->loop_father->header == bb)
>  	    {
> -	      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> +	      if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb))
>  		break;

That's now a always false condition - a loops latch is always dominated
by its header.  The condition as written tries to verify whether the
loop is always entered - mind we visit all blocks, not only those
always executed.

In fact for your testcase the x->j ref is _not_ always executed
since the inner loop is conditional on n > 0.

Richard.
Xionghu Luo Aug. 17, 2021, 5:17 a.m. UTC | #2
Hi,

On 2021/8/16 19:46, Richard Biener wrote:
> On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
> 
>> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
>> nested loops.  inn_loop is updated to inner loop, so it need be restored
>> when exiting from innermost loop. With this patch, the store instruction
>> in outer loop could also be moved out of outer loop by store motion.
>> Any comments?  Thanks.
> 
>> gcc/ChangeLog:
>>
>> 	* tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
>> 	inn_loop when exiting from innermost loop.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>> ---
>>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
>>   gcc/tree-ssa-loop-im.c                     |  6 +++++-
>>   2 files changed, 29 insertions(+), 1 deletion(-)
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>> new file mode 100644
>> index 00000000000..097a5ee4a4b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>> @@ -0,0 +1,24 @@
>> +/* PR/101293 */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
>> +
>> +struct X { int i; int j; int k;};
>> +
>> +void foo(struct X *x, int n, int l)
>> +{
>> +  for (int j = 0; j < l; j++)
>> +    {
>> +      for (int i = 0; i < n; ++i)
>> +	{
>> +	  int *p = &x->j;
>> +	  int tem = *p;
>> +	  x->j += tem * i;
>> +	}
>> +      int *r = &x->k;
>> +      int tem2 = *r;
>> +      x->k += tem2 * j;
>> +    }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
>> +
>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
>> index b24bc64f2a7..5ca4738b20e 100644
>> --- a/gcc/tree-ssa-loop-im.c
>> +++ b/gcc/tree-ssa-loop-im.c
>> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
>>   	  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>   	    last = bb;
>>   
>> +	  if (inn_loop != loop
>> +	      && flow_loop_nested_p (bb->loop_father, inn_loop))
>> +	    inn_loop = bb->loop_father;
>> +
> 
> The comment says
> 
>                /* In a loop that is always entered we may proceed anyway.
>                   But record that we entered it and stop once we leave it.
> */
>                inn_loop = bb->loop_father;
> 
> and your change would defeat that early return, no?

The issue is the search method exits too early when iterating the outer
loop.  For example of a nested loop, loop 1 includes 5,8,3,10,4,9
and loop2 includes 3,10.  Currently, it breaks when bb is 3 as bb 3
doesn't dominate bb 9 of loop 1.  But actually, both bb 5 and bb 4 are
ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4
they won't be processed by store motion again.
 

     5<----
     |\   |
     8 \  9
     |  \ |
 --->3--->4
|    | 
10---|
 

SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this
patch, it will continue search when meet bb 3 until bb 4, then last is updated
to bb 4, it will break until exit edge is found at bb 4 by
"if (!flow_bb_inside_loop_p (loop, e->dest))".  Then the followed loop code will
set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5.


      while (1)
	{
	  SET_ALWAYS_EXECUTED_IN (last, loop);
	  if (last == loop->header)
	    break;
	  last = get_immediate_dominator (CDI_DOMINATORS, last);
	}

After further discussion with Kewen, we found that the inn_loop variable is
totally useless and could be removed.


> 
>>   	  if (bitmap_bit_p (contains_call, bb->index))
>>   	    break;
>>   
>> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
>>   
>>   	  if (bb->loop_father->header == bb)
>>   	    {
>> -	      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>> +	      if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb))
>>   		break;
> 
> That's now a always false condition - a loops latch is always dominated
> by its header.  The condition as written tries to verify whether the
> loop is always entered - mind we visit all blocks, not only those
> always executed.

Thanks for the catch!  I am afraid the piece of code should be removed since it stops
search of potential ALWAYS EXECUTED bb after inner loop...

> 
> In fact for your testcase the x->j ref is _not_ always executed
> since the inner loop is conditional on n > 0.

Yes.  But I want to move x->k (not x->j) out of loop 1 when l > 0 in store-motion.
Attached the diff file without and with my patch to show the extra optimization.

x->j is already moved out of loop 2 on master code.
If change n and l to constant numbers like 100, master code could also do 2 store
motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb 4, bb 3
and bb 5 are ALWAYS EXECUTED for loop 1.


struct X { int i; int j; int k;};

void foo(struct X *x, int n, int l)
{
  for (int j = 0; j < l; j++) // loop 1
    {
      for (int i = 0; i < n; ++i)  // loop 2
        {
          int *p = &x->j;
          int tem = *p;
          x->j += tem * i;
        }
      int *r = &x->k;
      int tem2 = *r;
      x->k += tem2 * j;
    }
}

> 
> Richard.
>
Xionghu Luo Aug. 17, 2021, 5:24 a.m. UTC | #3
On 2021/8/17 13:17, Xionghu Luo via Gcc-patches wrote:
> Hi,
> 
> On 2021/8/16 19:46, Richard Biener wrote:
>> On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
>>
>>> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
>>> nested loops.  inn_loop is updated to inner loop, so it need be restored
>>> when exiting from innermost loop. With this patch, the store instruction
>>> in outer loop could also be moved out of outer loop by store motion.
>>> Any comments?  Thanks.
>>
>>> gcc/ChangeLog:
>>>
>>>     * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
>>>     inn_loop when exiting from innermost loop.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>>> ---
>>>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
>>>   gcc/tree-ssa-loop-im.c                     |  6 +++++-
>>>   2 files changed, 29 insertions(+), 1 deletion(-)
>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>>
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c 
>>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>> new file mode 100644
>>> index 00000000000..097a5ee4a4b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>> @@ -0,0 +1,24 @@
>>> +/* PR/101293 */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
>>> +
>>> +struct X { int i; int j; int k;};
>>> +
>>> +void foo(struct X *x, int n, int l)
>>> +{
>>> +  for (int j = 0; j < l; j++)
>>> +    {
>>> +      for (int i = 0; i < n; ++i)
>>> +    {
>>> +      int *p = &x->j;
>>> +      int tem = *p;
>>> +      x->j += tem * i;
>>> +    }
>>> +      int *r = &x->k;
>>> +      int tem2 = *r;
>>> +      x->k += tem2 * j;
>>> +    }
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 
>>> "lim2" } } */
>>> +
>>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
>>> index b24bc64f2a7..5ca4738b20e 100644
>>> --- a/gcc/tree-ssa-loop-im.c
>>> +++ b/gcc/tree-ssa-loop-im.c
>>> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, 
>>> sbitmap contains_call)
>>>         if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>>           last = bb;
>>> +      if (inn_loop != loop
>>> +          && flow_loop_nested_p (bb->loop_father, inn_loop))
>>> +        inn_loop = bb->loop_father;
>>> +
>>
>> The comment says
>>
>>                /* In a loop that is always entered we may proceed anyway.
>>                   But record that we entered it and stop once we leave 
>> it.
>> */
>>                inn_loop = bb->loop_father;
>>
>> and your change would defeat that early return, no?
> 
> The issue is the search method exits too early when iterating the outer
> loop.  For example of a nested loop, loop 1 includes 5,8,3,10,4,9
> and loop2 includes 3,10.  Currently, it breaks when bb is 3 as bb 3
> doesn't dominate bb 9 of loop 1.  But actually, both bb 5 and bb 4 are
> ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4
> they won't be processed by store motion again.
> 
> 
>      5<----
>      |\   |
>      8 \  9
>      |  \ |
> --->3--->4
> |    | 10---|



Correct the graph display:

      5<----
      |\   |
      8 \  9
      |  \ |
  --->3--->4
|   |
  ---10


> 
> 
> SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this
> patch, it will continue search when meet bb 3 until bb 4, then last is 
> updated
> to bb 4, it will break until exit edge is found at bb 4 by
> "if (!flow_bb_inside_loop_p (loop, e->dest))".  Then the followed loop 
> code will
> set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5.
> 
> 
>       while (1)
>      {
>        SET_ALWAYS_EXECUTED_IN (last, loop);
>        if (last == loop->header)
>          break;
>        last = get_immediate_dominator (CDI_DOMINATORS, last);
>      }
> 
> After further discussion with Kewen, we found that the inn_loop variable is
> totally useless and could be removed.
> 
> 
>>
>>>         if (bitmap_bit_p (contains_call, bb->index))
>>>           break;
>>> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, 
>>> sbitmap contains_call)
>>>         if (bb->loop_father->header == bb)
>>>           {
>>> -          if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>> +          if (!dominated_by_p (CDI_DOMINATORS, 
>>> bb->loop_father->latch, bb))
>>>           break;
>>
>> That's now a always false condition - a loops latch is always dominated
>> by its header.  The condition as written tries to verify whether the
>> loop is always entered - mind we visit all blocks, not only those
>> always executed.
> 
> Thanks for the catch!  I am afraid the piece of code should be removed 
> since it stops
> search of potential ALWAYS EXECUTED bb after inner loop...
> 
>>
>> In fact for your testcase the x->j ref is _not_ always executed
>> since the inner loop is conditional on n > 0.
> 
> Yes.  But I want to move x->k (not x->j) out of loop 1 when l > 0 in 
> store-motion.
> Attached the diff file without and with my patch to show the extra 
> optimization.
> 
> x->j is already moved out of loop 2 on master code.
> If change n and l to constant numbers like 100, master code could also 
> do 2 store
> motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb 
> 4, bb 3
> and bb 5 are ALWAYS EXECUTED for loop 1.
> 
> 
> struct X { int i; int j; int k;};
> 
> void foo(struct X *x, int n, int l)
> {
>   for (int j = 0; j < l; j++) // loop 1
>     {
>       for (int i = 0; i < n; ++i)  // loop 2
>         {
>           int *p = &x->j;
>           int tem = *p;
>           x->j += tem * i;
>         }
>       int *r = &x->k;
>       int tem2 = *r;
>       x->k += tem2 * j;
>     }
> }
> 
>>
>> Richard.
>>
>
Richard Biener Aug. 17, 2021, 7:12 a.m. UTC | #4
On Tue, 17 Aug 2021, Xionghu Luo wrote:

> Hi,
> 
> On 2021/8/16 19:46, Richard Biener wrote:
> > On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
> > 
> >> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
> >> nested loops.  inn_loop is updated to inner loop, so it need be restored
> >> when exiting from innermost loop. With this patch, the store instruction
> >> in outer loop could also be moved out of outer loop by store motion.
> >> Any comments?  Thanks.
> > 
> >> gcc/ChangeLog:
> >>
> >>  * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
> >>  inn_loop when exiting from innermost loop.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 	* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> >> ---
> >>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
> >>   gcc/tree-ssa-loop-im.c                     |  6 +++++-
> >>   2 files changed, 29 insertions(+), 1 deletion(-)
> >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >> new file mode 100644
> >> index 00000000000..097a5ee4a4b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >> @@ -0,0 +1,24 @@
> >> +/* PR/101293 */
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> >> +
> >> +struct X { int i; int j; int k;};
> >> +
> >> +void foo(struct X *x, int n, int l)
> >> +{
> >> +  for (int j = 0; j < l; j++)
> >> +    {
> >> +      for (int i = 0; i < n; ++i)
> >> +	{
> >> +	  int *p = &x->j;
> >> +	  int tem = *p;
> >> +	  x->j += tem * i;
> >> +	}
> >> +      int *r = &x->k;
> >> +      int tem2 = *r;
> >> +      x->k += tem2 * j;
> >> +    }
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } }
> >> */
> >> +
> >> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> >> index b24bc64f2a7..5ca4738b20e 100644
> >> --- a/gcc/tree-ssa-loop-im.c
> >> +++ b/gcc/tree-ssa-loop-im.c
> >> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
> >> @@ contains_call)
> >>      if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >>        last = bb;
> >>   +	  if (inn_loop != loop
> >> +	      && flow_loop_nested_p (bb->loop_father, inn_loop))
> >> +	    inn_loop = bb->loop_father;
> >> +
> > 
> > The comment says
> > 
> >                /* In a loop that is always entered we may proceed anyway.
> >                   But record that we entered it and stop once we leave it.
> > */
> >                inn_loop = bb->loop_father;
> > 
> > and your change would defeat that early return, no?
> 
> The issue is the search method exits too early when iterating the outer
> loop.  For example of a nested loop, loop 1 includes 5,8,3,10,4,9
> and loop2 includes 3,10.  Currently, it breaks when bb is 3 as bb 3
> doesn't dominate bb 9 of loop 1.  But actually, both bb 5 and bb 4 are
> ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4
> they won't be processed by store motion again.
> 
> 
>     5<----
>     |\   |
>     8 \  9
>     |  \ |
> --->3--->4
> |    | 
> 10---|
> 
> 
> SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this
> patch, it will continue search when meet bb 3 until bb 4, then last is updated
> to bb 4, it will break until exit edge is found at bb 4 by
> "if (!flow_bb_inside_loop_p (loop, e->dest))".  Then the followed loop code
> will
> set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5.
> 
> 
>      while (1)
> 	{
> 	  SET_ALWAYS_EXECUTED_IN (last, loop);
> 	  if (last == loop->header)
> 	    break;
> 	  last = get_immediate_dominator (CDI_DOMINATORS, last);
> 	}
> 
> After further discussion with Kewen, we found that the inn_loop variable is
> totally useless and could be removed.
> 
> 
> > 
> >>      if (bitmap_bit_p (contains_call, bb->index))
> >>        break;
> >>   
> >> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
> >> @@ contains_call)
> >>   
> >>      if (bb->loop_father->header == bb)
> >>   	    {
> >> -	      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >> +	      if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch,
> >> bb))
> >>     break;
> > 
> > That's now a always false condition - a loops latch is always dominated
> > by its header.  The condition as written tries to verify whether the
> > loop is always entered - mind we visit all blocks, not only those
> > always executed.
> 
> Thanks for the catch!  I am afraid the piece of code should be removed since
> it stops
> search of potential ALWAYS EXECUTED bb after inner loop...

But the code says:

            /* In a loop that is always entered we may proceed anyway.
               But record that we entered it and stop once we leave it.
             */

and you do not remove this comment still it doesn't hold anymore
after your patch.  I don't say the current code is optimal - I just
say it does exactly what is documented.


> > 
> > In fact for your testcase the x->j ref is _not_ always executed
> > since the inner loop is conditional on n > 0.
> 
> Yes.  But I want to move x->k (not x->j) out of loop 1 when l > 0 in
> store-motion.

Sorry, that wasn't clear.  Yes, I agree the code fails to see always
executed blocks after inner loops.  But since the code simply walks
all blocks in a loop instead of greedily following edges it has
to do that since the inner loop might exit the outer loop as well,
sth your change wouldn't honor.  Consider

 while (--n)
  {
    do
      {
        if (do_exit)
          goto out;
      }
    while (1);
    p->x += 1;
  }
out:;

you'll see a CFG where the inner loop exits the outer loop as well.

So I'm saying to improve the code you'll likely have to do more.
And add a testcase like the following

void __attribute__((noipa))
foo (int n, int m, int f, int *p, int *q)
{
 while (--n)
  {
    do
      {
        *q += 1;
        if (f)
          goto out;
      }
    while (--m);
    *p += 1;
  }
out:;
}

int main()
{
  int i = 0;
  foo (10, 10, 1, (void *)0, &i);
  if (i != 1)
    __builtin_abort ();
  return 0;
}

Richard.

> Attached the diff file without and with my patch to show the extra
> optimization.
> 
> x->j is already moved out of loop 2 on master code.
> If change n and l to constant numbers like 100, master code could also do 2
> store
> motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb 4, bb
> 3
> and bb 5 are ALWAYS EXECUTED for loop 1.
> 
> 
> struct X { int i; int j; int k;};
> 
> void foo(struct X *x, int n, int l)
> {
>  for (int j = 0; j < l; j++) // loop 1
>    {
>      for (int i = 0; i < n; ++i)  // loop 2
>        {
>          int *p = &x->j;
>          int tem = *p;
>          x->j += tem * i;
>        }
>      int *r = &x->k;
>      int tem2 = *r;
>      x->k += tem2 * j;
>    }
> }
> 
> > 
> > Richard.
> > 
> 
>
Segher Boessenkool Aug. 17, 2021, 8:59 p.m. UTC | #5
Hi!

As an aside...

On Mon, Aug 16, 2021 at 03:46:12AM -0500, Xiong Hu Luo wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c

> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c

You can make a saner order for your diffs by putting the testsuite
changes after the rest.  A very mimimal example would use

[diff]
	orderfile = .gitorder

with that file containing something like

gcc/*.*

and nothing else even.  Yeah this is minimal ;-)


Segher
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
new file mode 100644
index 00000000000..097a5ee4a4b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
@@ -0,0 +1,24 @@ 
+/* PR/101293 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+struct X { int i; int j; int k;};
+
+void foo(struct X *x, int n, int l)
+{
+  for (int j = 0; j < l; j++)
+    {
+      for (int i = 0; i < n; ++i)
+	{
+	  int *p = &x->j;
+	  int tem = *p;
+	  x->j += tem * i;
+	}
+      int *r = &x->k;
+      int tem2 = *r;
+      x->k += tem2 * j;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
+
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index b24bc64f2a7..5ca4738b20e 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3211,6 +3211,10 @@  fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
 	  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
 	    last = bb;
 
+	  if (inn_loop != loop
+	      && flow_loop_nested_p (bb->loop_father, inn_loop))
+	    inn_loop = bb->loop_father;
+
 	  if (bitmap_bit_p (contains_call, bb->index))
 	    break;
 
@@ -3238,7 +3242,7 @@  fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
 
 	  if (bb->loop_father->header == bb)
 	    {
-	      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+	      if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb))
 		break;
 
 	      /* In a loop that is always entered we may proceed anyway.