Allow cfgcleanup to remove forwarder loop preheaders and latches
diff mbox

Message ID CAFiYyc2A=6dW9aqAU9CG-sjYtyCj5X7D9HbnuKSby6kceajTYg@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Feb. 28, 2014, 9:09 a.m. UTC
On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng <bin.cheng@arm.com> wrote:
>> Hi,
>> This patch is to fix regression reported in PR60280 by removing forward loop
>> headers/latches in cfg cleanup if possible.  Several tests are broken by
>> this change since cfg cleanup is shared by all optimizers.  Some tests has
>> already been fixed by recent patches, I went through and fixed the others.
>> One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c".  When
>> GCC removing a basic block, it checks profile information by calling
>> check_bb_profile after redirecting incoming edges of the bb.  This certainly
>> results in warnings about invalid profile information and causes the case to
>> fail.  I will send a patch to skip checking profile information for a
>> removing basic block in stage 1 if it sounds reasonable.  For now I just
>> twisted the case itself.
>>
>> Bootstrap and tested on x86_64 and arm_a15.
>>
>> Is it OK?
>>
>>
>> 2014-02-25  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR target/60280
>>         * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>>         preheaders and latches only if requested.  Fix latch if it
>>         is removed.
>>         * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>>         LOOPS_HAVE_PREHEADERS.
>>
>
> This change:
>
>         if (dest->loop_father->header == dest)
> -  return false;
> +  {
> +    if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
> +        && bb->loop_father->header != dest)
> +      return false;
> +
> +    if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
> +        && bb->loop_father->header == dest)
> +      return false;
> +  }
>      }
>
> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>
> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
> -fuse-linker-plugin
>
> This patch changes loops without LOOPS_HAVE_PREHEADERS
> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
> true.  I don't have a small testcase.  But this patch:
>
> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
> index b5c384b..2ba673c 100644
> --- a/gcc/tree-cfgcleanup.c
> +++ b/gcc/tree-cfgcleanup.c
> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
>      if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>          && bb->loop_father->header == dest)
>        return false;
> +
> +    if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
> +        && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
> +      return false;
>    }
>      }
>
> fixes the regression.  Does it make any senses?

I think the preheader test isn't fully correct (bb may be in an inner loop
for example).  So a more conservative variant would be

that makes sure we can properly update loop information.  It's also
a more conservative change at this point which should still successfully
remove simple latches and preheaders created by loop discovery.

Does it fix 435.gromacs?

Thanks,
Richard.



>
> --
> H.J.

Comments

Richard Biener Feb. 28, 2014, 10:09 a.m. UTC | #1
On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng <bin.cheng@arm.com> wrote:
>>> Hi,
>>> This patch is to fix regression reported in PR60280 by removing forward loop
>>> headers/latches in cfg cleanup if possible.  Several tests are broken by
>>> this change since cfg cleanup is shared by all optimizers.  Some tests has
>>> already been fixed by recent patches, I went through and fixed the others.
>>> One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c".  When
>>> GCC removing a basic block, it checks profile information by calling
>>> check_bb_profile after redirecting incoming edges of the bb.  This certainly
>>> results in warnings about invalid profile information and causes the case to
>>> fail.  I will send a patch to skip checking profile information for a
>>> removing basic block in stage 1 if it sounds reasonable.  For now I just
>>> twisted the case itself.
>>>
>>> Bootstrap and tested on x86_64 and arm_a15.
>>>
>>> Is it OK?
>>>
>>>
>>> 2014-02-25  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         PR target/60280
>>>         * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>>>         preheaders and latches only if requested.  Fix latch if it
>>>         is removed.
>>>         * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>>>         LOOPS_HAVE_PREHEADERS.
>>>
>>
>> This change:
>>
>>         if (dest->loop_father->header == dest)
>> -  return false;
>> +  {
>> +    if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>> +        && bb->loop_father->header != dest)
>> +      return false;
>> +
>> +    if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>> +        && bb->loop_father->header == dest)
>> +      return false;
>> +  }
>>      }
>>
>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>>
>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
>> -fuse-linker-plugin
>>
>> This patch changes loops without LOOPS_HAVE_PREHEADERS
>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
>> true.  I don't have a small testcase.  But this patch:
>>
>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
>> index b5c384b..2ba673c 100644
>> --- a/gcc/tree-cfgcleanup.c
>> +++ b/gcc/tree-cfgcleanup.c
>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
>>      if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>          && bb->loop_father->header == dest)
>>        return false;
>> +
>> +    if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>> +        && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
>> +      return false;
>>    }
>>      }
>>
>> fixes the regression.  Does it make any senses?
>
> I think the preheader test isn't fully correct (bb may be in an inner loop
> for example).  So a more conservative variant would be
>
> Index: gcc/tree-cfgcleanup.c
> ===================================================================
> --- gcc/tree-cfgcleanup.c       (revision 208169)
> +++ gcc/tree-cfgcleanup.c       (working copy)
> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
>        /* Protect loop preheaders and latches if requested.  */
>        if (dest->loop_father->header == dest)
>         {
> -         if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
> -             && bb->loop_father->header != dest)
> -           return false;
> -
> -         if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
> -             && bb->loop_father->header == dest)
> -           return false;
> +         if (bb->loop_father == dest->loop_father)
> +           return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
> +         else if (bb->loop_father == loop_outer (dest->loop_father))
> +           return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
> +         /* Always preserve other edges into loop headers that are
> +            not simple latches or preheaders.  */
> +         return false;
>         }
>      }
>
> that makes sure we can properly update loop information.  It's also
> a more conservative change at this point which should still successfully
> remove simple latches and preheaders created by loop discovery.

I think the patch makes sense anyway and thus I'll install it once it
passed bootstrap / regtesting.

Another fix that may make sense is to restrict it to
!loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
itself can end up setting that ... which we eventually should fix if it
still happens.  That is, check if

Index: gcc/tree-cfgcleanup.c
===================================================================
--- gcc/tree-cfgcleanup.c       (revision 208169)
+++ gcc/tree-cfgcleanup.c       (working copy)

@@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)

   timevar_pop (TV_TREE_CLEANUP_CFG);

-  if (changed && current_loops)
-    loops_state_set (LOOPS_NEED_FIXUP);
+  if (changed && current_loops
+      && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+    verify_loop_structure ();

   return changed;
 }

trips anywhere (and apply fixes).  That's of course not appropriate at
this stage.

> Does it fix 435.gromacs?

I can't see the failure on our testers (x86_64, i?86, with/without LTO).  How
can I reproduce it?

Thanks,
Richard.
H.J. Lu Feb. 28, 2014, 4:11 p.m. UTC | #2
On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng <bin.cheng@arm.com> wrote:
>>>> Hi,
>>>> This patch is to fix regression reported in PR60280 by removing forward loop
>>>> headers/latches in cfg cleanup if possible.  Several tests are broken by
>>>> this change since cfg cleanup is shared by all optimizers.  Some tests has
>>>> already been fixed by recent patches, I went through and fixed the others.
>>>> One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c".  When
>>>> GCC removing a basic block, it checks profile information by calling
>>>> check_bb_profile after redirecting incoming edges of the bb.  This certainly
>>>> results in warnings about invalid profile information and causes the case to
>>>> fail.  I will send a patch to skip checking profile information for a
>>>> removing basic block in stage 1 if it sounds reasonable.  For now I just
>>>> twisted the case itself.
>>>>
>>>> Bootstrap and tested on x86_64 and arm_a15.
>>>>
>>>> Is it OK?
>>>>
>>>>
>>>> 2014-02-25  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>         PR target/60280
>>>>         * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>>>>         preheaders and latches only if requested.  Fix latch if it
>>>>         is removed.
>>>>         * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>>>>         LOOPS_HAVE_PREHEADERS.
>>>>
>>>
>>> This change:
>>>
>>>         if (dest->loop_father->header == dest)
>>> -  return false;
>>> +  {
>>> +    if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>> +        && bb->loop_father->header != dest)
>>> +      return false;
>>> +
>>> +    if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>> +        && bb->loop_father->header == dest)
>>> +      return false;
>>> +  }
>>>      }
>>>
>>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>>>
>>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
>>> -fuse-linker-plugin
>>>
>>> This patch changes loops without LOOPS_HAVE_PREHEADERS
>>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
>>> true.  I don't have a small testcase.  But this patch:
>>>
>>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
>>> index b5c384b..2ba673c 100644
>>> --- a/gcc/tree-cfgcleanup.c
>>> +++ b/gcc/tree-cfgcleanup.c
>>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
>>>      if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>          && bb->loop_father->header == dest)
>>>        return false;
>>> +
>>> +    if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>> +        && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
>>> +      return false;
>>>    }
>>>      }
>>>
>>> fixes the regression.  Does it make any senses?
>>
>> I think the preheader test isn't fully correct (bb may be in an inner loop
>> for example).  So a more conservative variant would be
>>
>> Index: gcc/tree-cfgcleanup.c
>> ===================================================================
>> --- gcc/tree-cfgcleanup.c       (revision 208169)
>> +++ gcc/tree-cfgcleanup.c       (working copy)
>> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
>>        /* Protect loop preheaders and latches if requested.  */
>>        if (dest->loop_father->header == dest)
>>         {
>> -         if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>> -             && bb->loop_father->header != dest)
>> -           return false;
>> -
>> -         if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>> -             && bb->loop_father->header == dest)
>> -           return false;
>> +         if (bb->loop_father == dest->loop_father)
>> +           return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
>> +         else if (bb->loop_father == loop_outer (dest->loop_father))
>> +           return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
>> +         /* Always preserve other edges into loop headers that are
>> +            not simple latches or preheaders.  */
>> +         return false;
>>         }
>>      }
>>
>> that makes sure we can properly update loop information.  It's also
>> a more conservative change at this point which should still successfully
>> remove simple latches and preheaders created by loop discovery.
>
> I think the patch makes sense anyway and thus I'll install it once it
> passed bootstrap / regtesting.
>
> Another fix that may make sense is to restrict it to
> !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
> itself can end up setting that ... which we eventually should fix if it
> still happens.  That is, check if
>
> Index: gcc/tree-cfgcleanup.c
> ===================================================================
> --- gcc/tree-cfgcleanup.c       (revision 208169)
> +++ gcc/tree-cfgcleanup.c       (working copy)
>
> @@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)
>
>    timevar_pop (TV_TREE_CLEANUP_CFG);
>
> -  if (changed && current_loops)
> -    loops_state_set (LOOPS_NEED_FIXUP);
> +  if (changed && current_loops
> +      && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
> +    verify_loop_structure ();
>
>    return changed;
>  }
>
> trips anywhere (and apply fixes).  That's of course not appropriate at
> this stage.
>
>> Does it fix 435.gromacs?

I tried revision 208222 and it doesn't fix 435.gromacs.

> I can't see the failure on our testers (x86_64, i?86, with/without LTO).  How
> can I reproduce it?
>

It only happens with

-mx32 -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
-fuse-linker-plugin

The failure is

  Running 435.gromacs ref peak lto default

*** Miscompare of gromacs.out; for details see
    /export/project/git/gcc-regression/spec/2006/spec/benchspec/CPU2006/435.grom
acs/run/run_peak_ref_lto.0000/gromacs.out.mis

cat /export/project/git/gcc-regression/spec/2006/spec/benchspec/CPU2006/435.gromacs/run/run_peak_ref_lto.0000/gromacs.out.mis
0002:  3.07684e+02
       3.03594e+02

It is very sensitive to loop optimization.
H.J. Lu Feb. 28, 2014, 5:25 p.m. UTC | #3
On Fri, Feb 28, 2014 at 8:11 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng <bin.cheng@arm.com> wrote:
>>>>> Hi,
>>>>> This patch is to fix regression reported in PR60280 by removing forward loop
>>>>> headers/latches in cfg cleanup if possible.  Several tests are broken by
>>>>> this change since cfg cleanup is shared by all optimizers.  Some tests has
>>>>> already been fixed by recent patches, I went through and fixed the others.
>>>>> One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c".  When
>>>>> GCC removing a basic block, it checks profile information by calling
>>>>> check_bb_profile after redirecting incoming edges of the bb.  This certainly
>>>>> results in warnings about invalid profile information and causes the case to
>>>>> fail.  I will send a patch to skip checking profile information for a
>>>>> removing basic block in stage 1 if it sounds reasonable.  For now I just
>>>>> twisted the case itself.
>>>>>
>>>>> Bootstrap and tested on x86_64 and arm_a15.
>>>>>
>>>>> Is it OK?
>>>>>
>>>>>
>>>>> 2014-02-25  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>         PR target/60280
>>>>>         * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>>>>>         preheaders and latches only if requested.  Fix latch if it
>>>>>         is removed.
>>>>>         * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>>>>>         LOOPS_HAVE_PREHEADERS.
>>>>>
>>>>
>>>> This change:
>>>>
>>>>         if (dest->loop_father->header == dest)
>>>> -  return false;
>>>> +  {
>>>> +    if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>>> +        && bb->loop_father->header != dest)
>>>> +      return false;
>>>> +
>>>> +    if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>> +        && bb->loop_father->header == dest)
>>>> +      return false;
>>>> +  }
>>>>      }
>>>>
>>>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>>>>
>>>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
>>>> -fuse-linker-plugin
>>>>
>>>> This patch changes loops without LOOPS_HAVE_PREHEADERS
>>>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
>>>> true.  I don't have a small testcase.  But this patch:
>>>>
>>>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
>>>> index b5c384b..2ba673c 100644
>>>> --- a/gcc/tree-cfgcleanup.c
>>>> +++ b/gcc/tree-cfgcleanup.c
>>>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
>>>>      if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>>          && bb->loop_father->header == dest)
>>>>        return false;
>>>> +
>>>> +    if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>>> +        && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
>>>> +      return false;
>>>>    }
>>>>      }
>>>>
>>>> fixes the regression.  Does it make any senses?
>>>
>>> I think the preheader test isn't fully correct (bb may be in an inner loop
>>> for example).  So a more conservative variant would be
>>>
>>> Index: gcc/tree-cfgcleanup.c
>>> ===================================================================
>>> --- gcc/tree-cfgcleanup.c       (revision 208169)
>>> +++ gcc/tree-cfgcleanup.c       (working copy)
>>> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
>>>        /* Protect loop preheaders and latches if requested.  */
>>>        if (dest->loop_father->header == dest)
>>>         {
>>> -         if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>> -             && bb->loop_father->header != dest)
>>> -           return false;
>>> -
>>> -         if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>> -             && bb->loop_father->header == dest)
>>> -           return false;
>>> +         if (bb->loop_father == dest->loop_father)
>>> +           return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
>>> +         else if (bb->loop_father == loop_outer (dest->loop_father))
>>> +           return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
>>> +         /* Always preserve other edges into loop headers that are
>>> +            not simple latches or preheaders.  */
>>> +         return false;
>>>         }
>>>      }
>>>
>>> that makes sure we can properly update loop information.  It's also
>>> a more conservative change at this point which should still successfully
>>> remove simple latches and preheaders created by loop discovery.
>>
>> I think the patch makes sense anyway and thus I'll install it once it
>> passed bootstrap / regtesting.
>>
>> Another fix that may make sense is to restrict it to
>> !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
>> itself can end up setting that ... which we eventually should fix if it
>> still happens.  That is, check if
>>
>> Index: gcc/tree-cfgcleanup.c
>> ===================================================================
>> --- gcc/tree-cfgcleanup.c       (revision 208169)
>> +++ gcc/tree-cfgcleanup.c       (working copy)
>>
>> @@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)
>>
>>    timevar_pop (TV_TREE_CLEANUP_CFG);
>>
>> -  if (changed && current_loops)
>> -    loops_state_set (LOOPS_NEED_FIXUP);
>> +  if (changed && current_loops
>> +      && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>> +    verify_loop_structure ();
>>
>>    return changed;
>>  }
>>
>> trips anywhere (and apply fixes).  That's of course not appropriate at
>> this stage.
>>
>>> Does it fix 435.gromacs?
>
> I tried revision 208222 and it doesn't fix 435.gromacs.

Remove

          else if (bb->loop_father == loop_outer (dest->loop_father))
            return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);

fixes 435.gromacs.
H.J. Lu Feb. 28, 2014, 5:42 p.m. UTC | #4
On Fri, Feb 28, 2014 at 9:25 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Feb 28, 2014 at 8:11 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng <bin.cheng@arm.com> wrote:
>>>>>> Hi,
>>>>>> This patch is to fix regression reported in PR60280 by removing forward loop
>>>>>> headers/latches in cfg cleanup if possible.  Several tests are broken by
>>>>>> this change since cfg cleanup is shared by all optimizers.  Some tests has
>>>>>> already been fixed by recent patches, I went through and fixed the others.
>>>>>> One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c".  When
>>>>>> GCC removing a basic block, it checks profile information by calling
>>>>>> check_bb_profile after redirecting incoming edges of the bb.  This certainly
>>>>>> results in warnings about invalid profile information and causes the case to
>>>>>> fail.  I will send a patch to skip checking profile information for a
>>>>>> removing basic block in stage 1 if it sounds reasonable.  For now I just
>>>>>> twisted the case itself.
>>>>>>
>>>>>> Bootstrap and tested on x86_64 and arm_a15.
>>>>>>
>>>>>> Is it OK?
>>>>>>
>>>>>>
>>>>>> 2014-02-25  Bin Cheng  <bin.cheng@arm.com>
>>>>>>
>>>>>>         PR target/60280
>>>>>>         * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>>>>>>         preheaders and latches only if requested.  Fix latch if it
>>>>>>         is removed.
>>>>>>         * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>>>>>>         LOOPS_HAVE_PREHEADERS.
>>>>>>
>>>>>
>>>>> This change:
>>>>>
>>>>>         if (dest->loop_father->header == dest)
>>>>> -  return false;
>>>>> +  {
>>>>> +    if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>>>> +        && bb->loop_father->header != dest)
>>>>> +      return false;
>>>>> +
>>>>> +    if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>>> +        && bb->loop_father->header == dest)
>>>>> +      return false;
>>>>> +  }
>>>>>      }
>>>>>
>>>>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>>>>>
>>>>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
>>>>> -fuse-linker-plugin
>>>>>
>>>>> This patch changes loops without LOOPS_HAVE_PREHEADERS
>>>>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
>>>>> true.  I don't have a small testcase.  But this patch:
>>>>>
>>>>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
>>>>> index b5c384b..2ba673c 100644
>>>>> --- a/gcc/tree-cfgcleanup.c
>>>>> +++ b/gcc/tree-cfgcleanup.c
>>>>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
>>>>>      if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>>>          && bb->loop_father->header == dest)
>>>>>        return false;
>>>>> +
>>>>> +    if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>>>> +        && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
>>>>> +      return false;
>>>>>    }
>>>>>      }
>>>>>
>>>>> fixes the regression.  Does it make any senses?
>>>>
>>>> I think the preheader test isn't fully correct (bb may be in an inner loop
>>>> for example).  So a more conservative variant would be
>>>>
>>>> Index: gcc/tree-cfgcleanup.c
>>>> ===================================================================
>>>> --- gcc/tree-cfgcleanup.c       (revision 208169)
>>>> +++ gcc/tree-cfgcleanup.c       (working copy)
>>>> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
>>>>        /* Protect loop preheaders and latches if requested.  */
>>>>        if (dest->loop_father->header == dest)
>>>>         {
>>>> -         if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>>> -             && bb->loop_father->header != dest)
>>>> -           return false;
>>>> -
>>>> -         if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>> -             && bb->loop_father->header == dest)
>>>> -           return false;
>>>> +         if (bb->loop_father == dest->loop_father)
>>>> +           return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
>>>> +         else if (bb->loop_father == loop_outer (dest->loop_father))
>>>> +           return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
>>>> +         /* Always preserve other edges into loop headers that are
>>>> +            not simple latches or preheaders.  */
>>>> +         return false;
>>>>         }
>>>>      }
>>>>
>>>> that makes sure we can properly update loop information.  It's also
>>>> a more conservative change at this point which should still successfully
>>>> remove simple latches and preheaders created by loop discovery.
>>>
>>> I think the patch makes sense anyway and thus I'll install it once it
>>> passed bootstrap / regtesting.
>>>
>>> Another fix that may make sense is to restrict it to
>>> !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
>>> itself can end up setting that ... which we eventually should fix if it
>>> still happens.  That is, check if
>>>
>>> Index: gcc/tree-cfgcleanup.c
>>> ===================================================================
>>> --- gcc/tree-cfgcleanup.c       (revision 208169)
>>> +++ gcc/tree-cfgcleanup.c       (working copy)
>>>
>>> @@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)
>>>
>>>    timevar_pop (TV_TREE_CLEANUP_CFG);
>>>
>>> -  if (changed && current_loops)
>>> -    loops_state_set (LOOPS_NEED_FIXUP);
>>> +  if (changed && current_loops
>>> +      && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>> +    verify_loop_structure ();
>>>
>>>    return changed;
>>>  }
>>>
>>> trips anywhere (and apply fixes).  That's of course not appropriate at
>>> this stage.
>>>
>>>> Does it fix 435.gromacs?
>>
>> I tried revision 208222 and it doesn't fix 435.gromacs.
>
> Remove
>
>           else if (bb->loop_father == loop_outer (dest->loop_father))
>             return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);

Should we also check other loop state, like LOOPS_HAVE_SIMPLE_LATCHES
or LOOPS_MAY_HAVE_MULTIPLE_LATCHES here?

> fixes 435.gromacs.

Patch
diff mbox

Index: gcc/tree-cfgcleanup.c
===================================================================
--- gcc/tree-cfgcleanup.c       (revision 208169)
+++ gcc/tree-cfgcleanup.c       (working copy)
@@ -316,13 +316,13 @@  tree_forwarder_block_p (basic_block bb,
       /* Protect loop preheaders and latches if requested.  */
       if (dest->loop_father->header == dest)
        {
-         if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
-             && bb->loop_father->header != dest)
-           return false;
-
-         if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
-             && bb->loop_father->header == dest)
-           return false;
+         if (bb->loop_father == dest->loop_father)
+           return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
+         else if (bb->loop_father == loop_outer (dest->loop_father))
+           return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
+         /* Always preserve other edges into loop headers that are
+            not simple latches or preheaders.  */
+         return false;
        }
     }