diff mbox

[PR,tree-optimization/78856] Invalidate cached iteration information when threading across multiple loop headers

Message ID 0623b8a3-f772-1f81-f029-611a186aa302@redhat.com
State New
Headers show

Commit Message

Jeff Law Jan. 4, 2017, 5:31 a.m. UTC
So as noted in the BZ comments the jump threading code has code that 
detects when a jump threading path wants to cross multiple loop headers 
and truncates the jump threading path in that case.

What we should have done instead is invalidate the cached loop information.

Additionally, this BZ shows that just looking at loop headers is not 
sufficient -- we might cross from a reducible to an irreducible region 
which is equivalent to crossing into another loop in that we need to 
invalidate the cached loop iteration information.

What's so damn funny here is that eventually we take nested loops and 
irreducible regions, thread various edges and end up with a nice natural 
loop and no irreducible regions in the end :-)  But the cached iteration 
information is still bogus.

Anyway, this patch corrects both issues.  It treats moving between an 
reducible and irreducible region as crossing a loop header and it 
invalidates the cached iteration information rather than truncating the 
jump thread path.

Bootstrapped and regression tested on x86_64-linux-gnu.  That compiler 
was also used to build all the configurations in config-list.mk.

Installing on the trunk.  I could be convinced to install on the gcc-6 
branch as well since it's affected by the same problem.

Jeff
commit 93e3964a4664350446eefe786e3b73eb41d99036
Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Jan 4 05:31:23 2017 +0000

    	PR tree-optimizatin/78856
    	* tree-ssa-threadupdate.c: Include tree-vectorizer.h.
    	(mark_threaded_blocks): Remove code to truncate thread paths that
    	cross multiple loop headers.  Instead invalidate the cached loop
    	iteration information and handle case of a thread path walking
    	into an irreducible region.
    
    	PR tree-optimization/78856
    	* gcc.c-torture/execute/pr78856.c: New test.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244045 138bc75d-0d04-0410-961f-82ee72b054a4

Comments

Richard Biener Jan. 4, 2017, 12:25 p.m. UTC | #1
On Wed, Jan 4, 2017 at 6:31 AM, Jeff Law <law@redhat.com> wrote:
>
> So as noted in the BZ comments the jump threading code has code that detects
> when a jump threading path wants to cross multiple loop headers and
> truncates the jump threading path in that case.
>
> What we should have done instead is invalidate the cached loop information.
>
> Additionally, this BZ shows that just looking at loop headers is not
> sufficient -- we might cross from a reducible to an irreducible region which
> is equivalent to crossing into another loop in that we need to invalidate
> the cached loop iteration information.
>
> What's so damn funny here is that eventually we take nested loops and
> irreducible regions, thread various edges and end up with a nice natural
> loop and no irreducible regions in the end :-)  But the cached iteration
> information is still bogus.
>
> Anyway, this patch corrects both issues.  It treats moving between an
> reducible and irreducible region as crossing a loop header and it
> invalidates the cached iteration information rather than truncating the jump
> thread path.
>
> Bootstrapped and regression tested on x86_64-linux-gnu.  That compiler was
> also used to build all the configurations in config-list.mk.
>
> Installing on the trunk.  I could be convinced to install on the gcc-6
> branch as well since it's affected by the same problem.
>
> Jeff
>
>
> commit 93e3964a4664350446eefe786e3b73eb41d99036
> Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Wed Jan 4 05:31:23 2017 +0000
>
>         PR tree-optimizatin/78856
>         * tree-ssa-threadupdate.c: Include tree-vectorizer.h.
>         (mark_threaded_blocks): Remove code to truncate thread paths that
>         cross multiple loop headers.  Instead invalidate the cached loop
>         iteration information and handle case of a thread path walking
>         into an irreducible region.
>
>         PR tree-optimization/78856
>         * gcc.c-torture/execute/pr78856.c: New test.
>
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244045
> 138bc75d-0d04-0410-961f-82ee72b054a4
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3114e02..6b2888f 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,12 @@
> +2017-01-03  Jeff Law  <law@redhat.com>
> +
> +       PR tree-optimizatin/78856
> +       * tree-ssa-threadupdate.c: Include tree-vectorizer.h.
> +       (mark_threaded_blocks): Remove code to truncate thread paths that
> +       cross multiple loop headers.  Instead invalidate the cached loop
> +       iteration information and handle case of a thread path walking
> +       into an irreducible region.
> +
>  2016-12-30  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         PR target/78900
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index cd2a065..cadfbc9 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2017-01-03  Jeff Law  <law@redhat.com>
> +
> +       PR tree-optimization/78856
> +       * gcc.c-torture/execute/pr78856.c: New test.
> +
>  2017-01-03  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         PR target/78953
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78856.c
> b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
> new file mode 100644
> index 0000000..80f2317
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
> @@ -0,0 +1,25 @@
> +extern void exit (int);
> +
> +int a, b, c, d, e, f[3];
> +
> +int main()
> +{
> +  while (d)
> +    while (1)
> +      ;
> +  int g = 0, h, i = 0;
> +  for (; g < 21; g += 9)
> +    {
> +      int j = 1;
> +      for (h = 0; h < 3; h++)
> +       f[h] = 1;
> +      for (; j < 10; j++) {
> +       d = i && (b ? 0 : c);
> +       i = 1;
> +       if (g)
> +         a = e;
> +      }
> +  }
> +  exit (0);
> +}
> +
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index adbb6e0..2da93a8 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgloop.h"
>  #include "dbgcnt.h"
>  #include "tree-cfg.h"
> +#include "tree-vectorizer.h"
>
>  /* Given a block B, update the CFG and SSA graph to reflect redirecting
>     one or more in-edges to B to instead reach the destination of an
> @@ -2084,10 +2085,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
>    /* Look for jump threading paths which cross multiple loop headers.
>
>       The code to thread through loop headers will change the CFG in ways
> -     that break assumptions made by the loop optimization code.
> -
> -     We don't want to blindly cancel the requests.  We can instead do
> better
> -     by trimming off the end of the jump thread path.  */
> +     that invalidate the cached loop iteration information.  So we must
> +     detect that case and wipe the cached information.  */
>    EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
>      {
>        basic_block bb = BASIC_BLOCK_FOR_FN (cfun, i);
> @@ -2102,26 +2101,16 @@ mark_threaded_blocks (bitmap threaded_blocks)
>                    i++)
>                 {
>                   basic_block dest = (*path)[i]->e->dest;
> +                 basic_block src = (*path)[i]->e->src;
>                   crossed_headers += (dest == dest->loop_father->header);
> +                 /* If we step from a block outside an irreducible region
> +                    to a block inside an irreducible region, then we have
> +                    crossed into a loop.  */
> +                 crossed_headers += ((src->flags & BB_IRREDUCIBLE_LOOP)
> +                                     != (dest->flags &
> BB_IRREDUCIBLE_LOOP));

I'm not sure about this one -- if dest is inside an irreducible region then
dest->loop_father will be the containing loop which may be unrelated
(just crossed).  Either we miss handling of crossed loop headers or
whether dest is inside an irreducible region doesn't matter.

So, don't we want to check only whether we are src is inside a irreducible
region and dest is not?  And in this case we've crossed a loop header
anyway...

>                   if (crossed_headers > 1)
>                     {
> -                     /* Trim from entry I onwards.  */
> -                     for (unsigned int j = i; j < path->length (); j++)
> -                       delete (*path)[j];
> -                     path->truncate (i);
> -
> -                     /* Now that we've truncated the path, make sure
> -                        what's left is still valid.   We need at least
> -                        two edges on the path and the last edge can not
> -                        be a joiner.  This should never happen, but let's
> -                        be safe.  */
> -                     if (path->length () < 2
> -                         || (path->last ()->type
> -                             == EDGE_COPY_SRC_JOINER_BLOCK))
> -                       {
> -                         delete_jump_thread_path (path);
> -                         e->aux = NULL;
> -                       }
> +                     vect_free_loop_info_assumptions (dest->loop_father);
>                       break;
>                     }
>                 }
>
Jeff Law Jan. 10, 2017, 7:16 p.m. UTC | #2
On 01/04/2017 05:25 AM, Richard Biener wrote:
> On Wed, Jan 4, 2017 at 6:31 AM, Jeff Law <law@redhat.com> wrote:
>>
>> So as noted in the BZ comments the jump threading code has code that detects
>> when a jump threading path wants to cross multiple loop headers and
>> truncates the jump threading path in that case.
>>
>> What we should have done instead is invalidate the cached loop information.
>>
>> Additionally, this BZ shows that just looking at loop headers is not
>> sufficient -- we might cross from a reducible to an irreducible region which
>> is equivalent to crossing into another loop in that we need to invalidate
>> the cached loop iteration information.
>>
>> What's so damn funny here is that eventually we take nested loops and
>> irreducible regions, thread various edges and end up with a nice natural
>> loop and no irreducible regions in the end :-)  But the cached iteration
>> information is still bogus.
>>
>> Anyway, this patch corrects both issues.  It treats moving between an
>> reducible and irreducible region as crossing a loop header and it
>> invalidates the cached iteration information rather than truncating the jump
>> thread path.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu.  That compiler was
>> also used to build all the configurations in config-list.mk.
>>
>> Installing on the trunk.  I could be convinced to install on the gcc-6
>> branch as well since it's affected by the same problem.
>>
>> Jeff
>>
>>
>> commit 93e3964a4664350446eefe786e3b73eb41d99036
>> Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
>> Date:   Wed Jan 4 05:31:23 2017 +0000
>>
>>         PR tree-optimizatin/78856
>>         * tree-ssa-threadupdate.c: Include tree-vectorizer.h.
>>         (mark_threaded_blocks): Remove code to truncate thread paths that
>>         cross multiple loop headers.  Instead invalidate the cached loop
>>         iteration information and handle case of a thread path walking
>>         into an irreducible region.
>>
>>         PR tree-optimization/78856
>>         * gcc.c-torture/execute/pr78856.c: New test.
>>
>>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244045
>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 3114e02..6b2888f 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,12 @@
>> +2017-01-03  Jeff Law  <law@redhat.com>
>> +
>> +       PR tree-optimizatin/78856
>> +       * tree-ssa-threadupdate.c: Include tree-vectorizer.h.
>> +       (mark_threaded_blocks): Remove code to truncate thread paths that
>> +       cross multiple loop headers.  Instead invalidate the cached loop
>> +       iteration information and handle case of a thread path walking
>> +       into an irreducible region.
>> +
>>  2016-12-30  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>
>>         PR target/78900
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index cd2a065..cadfbc9 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2017-01-03  Jeff Law  <law@redhat.com>
>> +
>> +       PR tree-optimization/78856
>> +       * gcc.c-torture/execute/pr78856.c: New test.
>> +
>>  2017-01-03  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>
>>         PR target/78953
>> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78856.c
>> b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
>> new file mode 100644
>> index 0000000..80f2317
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
>> @@ -0,0 +1,25 @@
>> +extern void exit (int);
>> +
>> +int a, b, c, d, e, f[3];
>> +
>> +int main()
>> +{
>> +  while (d)
>> +    while (1)
>> +      ;
>> +  int g = 0, h, i = 0;
>> +  for (; g < 21; g += 9)
>> +    {
>> +      int j = 1;
>> +      for (h = 0; h < 3; h++)
>> +       f[h] = 1;
>> +      for (; j < 10; j++) {
>> +       d = i && (b ? 0 : c);
>> +       i = 1;
>> +       if (g)
>> +         a = e;
>> +      }
>> +  }
>> +  exit (0);
>> +}
>> +
>> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
>> index adbb6e0..2da93a8 100644
>> --- a/gcc/tree-ssa-threadupdate.c
>> +++ b/gcc/tree-ssa-threadupdate.c
>> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "cfgloop.h"
>>  #include "dbgcnt.h"
>>  #include "tree-cfg.h"
>> +#include "tree-vectorizer.h"
>>
>>  /* Given a block B, update the CFG and SSA graph to reflect redirecting
>>     one or more in-edges to B to instead reach the destination of an
>> @@ -2084,10 +2085,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
>>    /* Look for jump threading paths which cross multiple loop headers.
>>
>>       The code to thread through loop headers will change the CFG in ways
>> -     that break assumptions made by the loop optimization code.
>> -
>> -     We don't want to blindly cancel the requests.  We can instead do
>> better
>> -     by trimming off the end of the jump thread path.  */
>> +     that invalidate the cached loop iteration information.  So we must
>> +     detect that case and wipe the cached information.  */
>>    EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
>>      {
>>        basic_block bb = BASIC_BLOCK_FOR_FN (cfun, i);
>> @@ -2102,26 +2101,16 @@ mark_threaded_blocks (bitmap threaded_blocks)
>>                    i++)
>>                 {
>>                   basic_block dest = (*path)[i]->e->dest;
>> +                 basic_block src = (*path)[i]->e->src;
>>                   crossed_headers += (dest == dest->loop_father->header);
>> +                 /* If we step from a block outside an irreducible region
>> +                    to a block inside an irreducible region, then we have
>> +                    crossed into a loop.  */
>> +                 crossed_headers += ((src->flags & BB_IRREDUCIBLE_LOOP)
>> +                                     != (dest->flags &
>> BB_IRREDUCIBLE_LOOP));
>
> I'm not sure about this one -- if dest is inside an irreducible region then
> dest->loop_father will be the containing loop which may be unrelated
> (just crossed).  Either we miss handling of crossed loop headers or
> whether dest is inside an irreducible region doesn't matter.
So imagine where we have an irreducible region that lives within a 
natural loop.

We have a jump thread from outside the natural loop (and thus crosses 
the loop header) which targets a block inside the irreducible region.

As we walk the jump threading path, we first set crossed_headers to one 
as we walk from outside the loop to the loop header.  We then increment 
it again as we walk into the irreducible region.

It may be the case that this can be relaxed, but I'd rather be safe than 
sorry at this stage in the game.

Jeff
Richard Biener Jan. 11, 2017, 11 a.m. UTC | #3
On Tue, Jan 10, 2017 at 8:16 PM, Jeff Law <law@redhat.com> wrote:
> On 01/04/2017 05:25 AM, Richard Biener wrote:
>>
>> On Wed, Jan 4, 2017 at 6:31 AM, Jeff Law <law@redhat.com> wrote:
>>>
>>>
>>> So as noted in the BZ comments the jump threading code has code that
>>> detects
>>> when a jump threading path wants to cross multiple loop headers and
>>> truncates the jump threading path in that case.
>>>
>>> What we should have done instead is invalidate the cached loop
>>> information.
>>>
>>> Additionally, this BZ shows that just looking at loop headers is not
>>> sufficient -- we might cross from a reducible to an irreducible region
>>> which
>>> is equivalent to crossing into another loop in that we need to invalidate
>>> the cached loop iteration information.
>>>
>>> What's so damn funny here is that eventually we take nested loops and
>>> irreducible regions, thread various edges and end up with a nice natural
>>> loop and no irreducible regions in the end :-)  But the cached iteration
>>> information is still bogus.
>>>
>>> Anyway, this patch corrects both issues.  It treats moving between an
>>> reducible and irreducible region as crossing a loop header and it
>>> invalidates the cached iteration information rather than truncating the
>>> jump
>>> thread path.
>>>
>>> Bootstrapped and regression tested on x86_64-linux-gnu.  That compiler
>>> was
>>> also used to build all the configurations in config-list.mk.
>>>
>>> Installing on the trunk.  I could be convinced to install on the gcc-6
>>> branch as well since it's affected by the same problem.
>>>
>>> Jeff
>>>
>>>
>>> commit 93e3964a4664350446eefe786e3b73eb41d99036
>>> Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
>>> Date:   Wed Jan 4 05:31:23 2017 +0000
>>>
>>>         PR tree-optimizatin/78856
>>>         * tree-ssa-threadupdate.c: Include tree-vectorizer.h.
>>>         (mark_threaded_blocks): Remove code to truncate thread paths that
>>>         cross multiple loop headers.  Instead invalidate the cached loop
>>>         iteration information and handle case of a thread path walking
>>>         into an irreducible region.
>>>
>>>         PR tree-optimization/78856
>>>         * gcc.c-torture/execute/pr78856.c: New test.
>>>
>>>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244045
>>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>>
>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>> index 3114e02..6b2888f 100644
>>> --- a/gcc/ChangeLog
>>> +++ b/gcc/ChangeLog
>>> @@ -1,3 +1,12 @@
>>> +2017-01-03  Jeff Law  <law@redhat.com>
>>> +
>>> +       PR tree-optimizatin/78856
>>> +       * tree-ssa-threadupdate.c: Include tree-vectorizer.h.
>>> +       (mark_threaded_blocks): Remove code to truncate thread paths that
>>> +       cross multiple loop headers.  Instead invalidate the cached loop
>>> +       iteration information and handle case of a thread path walking
>>> +       into an irreducible region.
>>> +
>>>  2016-12-30  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>>
>>>         PR target/78900
>>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>>> index cd2a065..cadfbc9 100644
>>> --- a/gcc/testsuite/ChangeLog
>>> +++ b/gcc/testsuite/ChangeLog
>>> @@ -1,3 +1,8 @@
>>> +2017-01-03  Jeff Law  <law@redhat.com>
>>> +
>>> +       PR tree-optimization/78856
>>> +       * gcc.c-torture/execute/pr78856.c: New test.
>>> +
>>>  2017-01-03  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>>
>>>         PR target/78953
>>> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78856.c
>>> b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
>>> new file mode 100644
>>> index 0000000..80f2317
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
>>> @@ -0,0 +1,25 @@
>>> +extern void exit (int);
>>> +
>>> +int a, b, c, d, e, f[3];
>>> +
>>> +int main()
>>> +{
>>> +  while (d)
>>> +    while (1)
>>> +      ;
>>> +  int g = 0, h, i = 0;
>>> +  for (; g < 21; g += 9)
>>> +    {
>>> +      int j = 1;
>>> +      for (h = 0; h < 3; h++)
>>> +       f[h] = 1;
>>> +      for (; j < 10; j++) {
>>> +       d = i && (b ? 0 : c);
>>> +       i = 1;
>>> +       if (g)
>>> +         a = e;
>>> +      }
>>> +  }
>>> +  exit (0);
>>> +}
>>> +
>>> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
>>> index adbb6e0..2da93a8 100644
>>> --- a/gcc/tree-ssa-threadupdate.c
>>> +++ b/gcc/tree-ssa-threadupdate.c
>>> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "cfgloop.h"
>>>  #include "dbgcnt.h"
>>>  #include "tree-cfg.h"
>>> +#include "tree-vectorizer.h"
>>>
>>>  /* Given a block B, update the CFG and SSA graph to reflect redirecting
>>>     one or more in-edges to B to instead reach the destination of an
>>> @@ -2084,10 +2085,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
>>>    /* Look for jump threading paths which cross multiple loop headers.
>>>
>>>       The code to thread through loop headers will change the CFG in ways
>>> -     that break assumptions made by the loop optimization code.
>>> -
>>> -     We don't want to blindly cancel the requests.  We can instead do
>>> better
>>> -     by trimming off the end of the jump thread path.  */
>>> +     that invalidate the cached loop iteration information.  So we must
>>> +     detect that case and wipe the cached information.  */
>>>    EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
>>>      {
>>>        basic_block bb = BASIC_BLOCK_FOR_FN (cfun, i);
>>> @@ -2102,26 +2101,16 @@ mark_threaded_blocks (bitmap threaded_blocks)
>>>                    i++)
>>>                 {
>>>                   basic_block dest = (*path)[i]->e->dest;
>>> +                 basic_block src = (*path)[i]->e->src;
>>>                   crossed_headers += (dest == dest->loop_father->header);
>>> +                 /* If we step from a block outside an irreducible
>>> region
>>> +                    to a block inside an irreducible region, then we
>>> have
>>> +                    crossed into a loop.  */
>>> +                 crossed_headers += ((src->flags & BB_IRREDUCIBLE_LOOP)
>>> +                                     != (dest->flags &
>>> BB_IRREDUCIBLE_LOOP));
>>
>>
>> I'm not sure about this one -- if dest is inside an irreducible region
>> then
>> dest->loop_father will be the containing loop which may be unrelated
>> (just crossed).  Either we miss handling of crossed loop headers or
>> whether dest is inside an irreducible region doesn't matter.
>
> So imagine where we have an irreducible region that lives within a natural
> loop.
>
> We have a jump thread from outside the natural loop (and thus crosses the
> loop header) which targets a block inside the irreducible region.
>
> As we walk the jump threading path, we first set crossed_headers to one as
> we walk from outside the loop to the loop header.  We then increment it
> again as we walk into the irreducible region.
>
> It may be the case that this can be relaxed, but I'd rather be safe than
> sorry at this stage in the game.

Sure, I was pointing out inconsistencies.  Quoting the updated code:

              vec<jump_thread_edge *> *path = THREAD_PATH (e);

              for (unsigned int i = 0, crossed_headers = 0;
                   i < path->length ();
                   i++)
                {
                  basic_block dest = (*path)[i]->e->dest;
                  basic_block src = (*path)[i]->e->src;
                  crossed_headers += (dest == dest->loop_father->header);
                  /* If we step from a block outside an irreducible region
                     to a block inside an irreducible region, then we have
                     crossed into a loop.  */
                  crossed_headers += ((src->flags & BB_IRREDUCIBLE_LOOP)
                                      != (dest->flags & BB_IRREDUCIBLE_LOOP));
                  if (crossed_headers > 1)
                    {
                      vect_free_loop_info_assumptions (dest->loop_father);
                      break;
                    }
                }

assuming THREAD_PATH covers all edges in a path then assume a thread path
threads from outside a two-level nested loop into the innermost loop then the
above code will correctly detect we cross the outer loop header and reset its
loop assumptions.  But it will fail to reset assumptions for the 2nd
header crossing
into the inner loop.  And as we can just reset assumptions for loops
(not irreducible
regions) I would have expected sth like

              for (unsigned int i = 0, crossed_headers = 0;
                   i < path->length ();
                   i++)
                {
                  basic_block dest = (*path)[i]->e->dest;
                  if (dest == dest->loop_father->header)
                    vect_free_loop_info_assumptions (dest->loop_father);
                }

or even walking the path backwards as I think only the path ultimate destination
loop father info needs to be scrapped(?).  So

              for (unsigned int i = 0, crossed_headers = 0;
                   i < path->length ();
                   i++)
                {
                  basic_block dest = (*path)[i]->e->dest;
                  basic_block src = (*path)[i]->e->src;
                  crossed_headers += (dest == dest->loop_father->header);
                  /* If we step from a block outside an irreducible region
                     to a block inside an irreducible region, then we have
                     crossed into a loop.  */
                  crossed_headers += ((src->flags & BB_IRREDUCIBLE_LOOP)
                                      != (dest->flags & BB_IRREDUCIBLE_LOOP));
                  if (crossed_headers > 1)
                    {
                      vect_free_loop_info_assumptions
                        ((*path)[path->length () - 1]->e->dest->loop_father);
                      break;
                    }
                }

or walking the path backwards?

Thanks,
Richard.

> Jeff
diff mbox

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3114e02..6b2888f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@ 
+2017-01-03  Jeff Law  <law@redhat.com>
+
+	PR tree-optimizatin/78856
+	* tree-ssa-threadupdate.c: Include tree-vectorizer.h.
+	(mark_threaded_blocks): Remove code to truncate thread paths that
+	cross multiple loop headers.  Instead invalidate the cached loop
+	iteration information and handle case of a thread path walking
+	into an irreducible region.
+
 2016-12-30  Michael Meissner  <meissner@linux.vnet.ibm.com>
 
 	PR target/78900
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index cd2a065..cadfbc9 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@ 
+2017-01-03  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/78856
+	* gcc.c-torture/execute/pr78856.c: New test.
+
 2017-01-03  Michael Meissner  <meissner@linux.vnet.ibm.com>
 
 	PR target/78953
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78856.c b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
new file mode 100644
index 0000000..80f2317
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
@@ -0,0 +1,25 @@ 
+extern void exit (int);
+
+int a, b, c, d, e, f[3]; 
+
+int main() 
+{
+  while (d)
+    while (1)
+      ;
+  int g = 0, h, i = 0;
+  for (; g < 21; g += 9) 
+    {
+      int j = 1;
+      for (h = 0; h < 3; h++)
+	f[h] = 1;
+      for (; j < 10; j++) {
+	d = i && (b ? 0 : c); 
+	i = 1;
+	if (g)
+	  a = e;
+      }
+  }
+  exit (0);
+}
+
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index adbb6e0..2da93a8 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -34,6 +34,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "dbgcnt.h"
 #include "tree-cfg.h"
+#include "tree-vectorizer.h"
 
 /* Given a block B, update the CFG and SSA graph to reflect redirecting
    one or more in-edges to B to instead reach the destination of an
@@ -2084,10 +2085,8 @@  mark_threaded_blocks (bitmap threaded_blocks)
   /* Look for jump threading paths which cross multiple loop headers.
 
      The code to thread through loop headers will change the CFG in ways
-     that break assumptions made by the loop optimization code.
-
-     We don't want to blindly cancel the requests.  We can instead do better
-     by trimming off the end of the jump thread path.  */
+     that invalidate the cached loop iteration information.  So we must
+     detect that case and wipe the cached information.  */
   EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
     {
       basic_block bb = BASIC_BLOCK_FOR_FN (cfun, i);
@@ -2102,26 +2101,16 @@  mark_threaded_blocks (bitmap threaded_blocks)
 		   i++)
 		{
 		  basic_block dest = (*path)[i]->e->dest;
+		  basic_block src = (*path)[i]->e->src;
 		  crossed_headers += (dest == dest->loop_father->header);
+		  /* If we step from a block outside an irreducible region
+		     to a block inside an irreducible region, then we have
+		     crossed into a loop.  */
+		  crossed_headers += ((src->flags & BB_IRREDUCIBLE_LOOP)
+				      != (dest->flags & BB_IRREDUCIBLE_LOOP));
 		  if (crossed_headers > 1)
 		    {
-		      /* Trim from entry I onwards.  */
-		      for (unsigned int j = i; j < path->length (); j++)
-			delete (*path)[j];
-		      path->truncate (i);
-
-		      /* Now that we've truncated the path, make sure
-			 what's left is still valid.   We need at least
-			 two edges on the path and the last edge can not
-			 be a joiner.  This should never happen, but let's
-			 be safe.  */
-		      if (path->length () < 2
-			  || (path->last ()->type
-			      == EDGE_COPY_SRC_JOINER_BLOCK))
-			{
-			  delete_jump_thread_path (path);
-			  e->aux = NULL;
-			}
+		      vect_free_loop_info_assumptions (dest->loop_father);
 		      break;
 		    }
 		}