[PR,tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)
diff mbox

Message ID b8393005-a73c-26e4-b76e-04ee62608775@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez Jan. 3, 2017, 5:36 p.m. UTC
On 12/20/2016 09:16 AM, Richard Biener wrote:
> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> Hi folks.
>>
>> This is a follow-up on Jeff and Richi's interaction on the aforementioned PR
>> here:
>>
>>         https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>
>> I decided to explore the idea of analyzing may-undefness on-demand, which
>> actually looks rather cheap.
>>
>> BTW, I don't understand why we don't have auto_bitmap's, as we already have
>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code we
>> already have.
>>
>> The attached patch fixes the bug without introducing any regressions.
>>
>> I also tested the patch by compiling 242 .ii files with -O3.  These were
>> gathered from a stage1 build with -save-temps.  There is a slight time
>> degradation of 4 seconds within 27 minutes of user time:
>>
>>         tainted:        26:52
>>         orig:           26:48
>>
>> This was the average aggregate time of two runs compiling all 242 .ii files.
>> IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?
>
> +  while (!worklist.is_empty ())
> +    {
> +      name = worklist.pop ();
> +      gcc_assert (TREE_CODE (name) == SSA_NAME);
> +
> +      if (ssa_undefined_value_p (name, true))
> +       return true;
> +
> +      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>
> it should be already set as we use visited_ssa as "was it ever on the worklist",
> so maybe renaming it would be a good thing as well.

I don't understand what you would prefer here.

>
> +             if (TREE_CODE (name) == SSA_NAME)
> +               {
> +                 /* If an SSA has already been seen, this may be a loop.
> +                    Fail conservatively.  */
> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
> +                   return false;
>
> so to me "conservative" is returning true, not false.

OK

>
> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
> +                 worklist.safe_push (name);
>
> but for loops we can just continue and ignore this use.  And bitmap_set_bit
> returns whether it set a bit, thus
>
>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
>                   worklist.safe_push (name);
>
> should work?

Fixed.

>
> +      /* Check that any SSA names used to define NAME is also fully
> +        defined.  */
> +      use_operand_p use_p;
> +      ssa_op_iter iter;
> +      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
> +       {
> +         name = USE_FROM_PTR (use_p);
> +         if (TREE_CODE (name) == SSA_NAME)
>
> always true.
>
> +           {
> +             /* If an SSA has already been seen, this may be a loop.
> +                Fail conservatively.  */
> +             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
> +               return false;
> +             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
> +             worklist.safe_push (name);
>
> See above.
>
> In principle the thing is sound but I'd like to be able to pass in a bitmap of
> known maybe-undefined/must-defined SSA names to have a cache for
> multiple invocations from within a pass (like this unswitching case).

Done, though bitmaps are now calculated as part of the instantiation.

>
> Also once you hit defs that are in a post-dominated region of the loop entry
> you can treat them as not undefined (as their use invokes undefined
> behavior).  This is also how you treat function parameters (well,
> ssa_undefined_value_p does), where the call site invokes undefined behavior
> when passing in undefined values.  So we need an extra parameter specifying
> the post-dominance region.

Done.

>
> You do not handle memory or calls conservatively which means the existing
> testcase only needs some obfuscation to become a problem again.  To fix
> that before /* Check that any SSA names used to define NAME is also fully
> defined.  */ bail out conservatively, like
>
>    if (! is_gimple_assign (def)
>       || gimple_assign_single_p (def))
>     return true;

As I asked previously, I understand the !is_gimple_assign, which will 
skip over GIMPLE_CALLs setting a value, but the 
"gimple_assign_single_p(def) == true"??

Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't 
we want to follow up the def chain precisely on these?

The attached implementation uses a cache, and a pre-computed 
post-dominance region.  A previous incantation of this patch (sans the 
post-dominance stuff) had performance within the noise of the previous 
implementation.

I am testing again, and will do some performance comparisons in a bit, 
but for now-- are we on the same page here?  Is this what you wanted?

Aldy

p.s. I could turn the post-dominance region into a bitmap for faster 
access if preferred.
commit 47d0c1b3144d4d56405d72c3ad55183d8632d0a7
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Fri Dec 16 03:44:52 2016 -0500

            PR tree-optimization/71691
            * bitmap.h (class auto_bitmap): New.
            * tree-ssa-defined-or-undefined.c: New file.
            * tree-ssa-defined-or-undefined.h: New file.
            * Makefile.in (tree-ssa-defined-or-undefined.o): New.
            * tree-ssa-loop-unswitch.c (tree_ssa_unswitch_loops): Calculate
            post dominance info.
            (tree_may_unswitch_on): Add defined_or_undefined parameter.
            Use defined_or_undefined instead of ssa_undefined_value_p.
            (tree_unswitch_single_loop): Instantiate defined_or_undefined
            variable.

Comments

Richard Biener Jan. 4, 2017, 12:11 p.m. UTC | #1
On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>
>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>
>>> Hi folks.
>>>
>>> This is a follow-up on Jeff and Richi's interaction on the aforementioned
>>> PR
>>> here:
>>>
>>>         https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>
>>> I decided to explore the idea of analyzing may-undefness on-demand, which
>>> actually looks rather cheap.
>>>
>>> BTW, I don't understand why we don't have auto_bitmap's, as we already
>>> have
>>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code
>>> we
>>> already have.
>>>
>>> The attached patch fixes the bug without introducing any regressions.
>>>
>>> I also tested the patch by compiling 242 .ii files with -O3.  These were
>>> gathered from a stage1 build with -save-temps.  There is a slight time
>>> degradation of 4 seconds within 27 minutes of user time:
>>>
>>>         tainted:        26:52
>>>         orig:           26:48
>>>
>>> This was the average aggregate time of two runs compiling all 242 .ii
>>> files.
>>> IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?
>>
>>
>> +  while (!worklist.is_empty ())
>> +    {
>> +      name = worklist.pop ();
>> +      gcc_assert (TREE_CODE (name) == SSA_NAME);
>> +
>> +      if (ssa_undefined_value_p (name, true))
>> +       return true;
>> +
>> +      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>
>> it should be already set as we use visited_ssa as "was it ever on the
>> worklist",
>> so maybe renaming it would be a good thing as well.
>
>
> I don't understand what you would prefer here.

Set the bit when you put name on the worklist (and do not do that if the
bit is set).  Thus simply remove the above and add a bitmap_set_bit
for the initial name you put on the worklist.

>>
>> +             if (TREE_CODE (name) == SSA_NAME)
>> +               {
>> +                 /* If an SSA has already been seen, this may be a loop.
>> +                    Fail conservatively.  */
>> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>> +                   return false;
>>
>> so to me "conservative" is returning true, not false.
>
>
> OK
>
>>
>> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>> +                 worklist.safe_push (name);
>>
>> but for loops we can just continue and ignore this use.  And
>> bitmap_set_bit
>> returns whether it set a bit, thus
>>
>>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
>>                   worklist.safe_push (name);
>>
>> should work?
>
>
> Fixed.
>
>>
>> +      /* Check that any SSA names used to define NAME is also fully
>> +        defined.  */
>> +      use_operand_p use_p;
>> +      ssa_op_iter iter;
>> +      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>> +       {
>> +         name = USE_FROM_PTR (use_p);
>> +         if (TREE_CODE (name) == SSA_NAME)
>>
>> always true.
>>
>> +           {
>> +             /* If an SSA has already been seen, this may be a loop.
>> +                Fail conservatively.  */
>> +             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>> +               return false;
>> +             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>> +             worklist.safe_push (name);
>>
>> See above.
>>
>> In principle the thing is sound but I'd like to be able to pass in a
>> bitmap of
>> known maybe-undefined/must-defined SSA names to have a cache for
>> multiple invocations from within a pass (like this unswitching case).
>
>
> Done, though bitmaps are now calculated as part of the instantiation.
>
>>
>> Also once you hit defs that are in a post-dominated region of the loop
>> entry
>> you can treat them as not undefined (as their use invokes undefined
>> behavior).  This is also how you treat function parameters (well,
>> ssa_undefined_value_p does), where the call site invokes undefined
>> behavior
>> when passing in undefined values.  So we need an extra parameter
>> specifying
>> the post-dominance region.
>
>
> Done.
>
>>
>> You do not handle memory or calls conservatively which means the existing
>> testcase only needs some obfuscation to become a problem again.  To fix
>> that before /* Check that any SSA names used to define NAME is also fully
>> defined.  */ bail out conservatively, like
>>
>>    if (! is_gimple_assign (def)
>>       || gimple_assign_single_p (def))
>>     return true;
>
>
> As I asked previously, I understand the !is_gimple_assign, which will skip
> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) ==
> true"??
>
> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we
> want to follow up the def chain precisely on these?
>
> The attached implementation uses a cache, and a pre-computed post-dominance
> region.  A previous incantation of this patch (sans the post-dominance
> stuff) had performance within the noise of the previous implementation.
>
> I am testing again, and will do some performance comparisons in a bit, but
> for now-- are we on the same page here?  Is this what you wanted?

+      /* DEFs in the post-dominated region of the loop entry invoke
+        undefined behavior.  Adding any use won't make things any
+        worse.  */
+      for (unsigned i = 0; i < postdom.length (); ++i)
+       if (def->bb == postdom[i])

gimple_bb (def)

+         {
+           set_defined (t);
+           return false;
+         }

I think you can't really return false here but need to continue processing
the worklist.  I'd say rather than a linear walk you can as well use
dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
block instead?

Note that the way you compute post-dominators doesn't work -- nothing
keeps them up-to-date when unswitching does CFG manipulations :/
Fortunately we have a way to compute them per region, see how tree-if-conv.c
does that (build_region, calculate_dominance_info_for_region).

+      /* Handle memory and calls conservatively.  */
+      if (!is_gimple_assign (def)
+         /* ?? Do we really want this?  The code below will return
+            TRUE for something like "e.3_7 = e", which is probably
+            not what we want.
+         || gimple_assign_single_p (def)

as said for loads it _is_ what we want (the memory may be uninitialized).
You can make it

     || (gimple_assign_single_p (def)
         && gimple_vuse (def))

to better catch that though (stop at memory loads only).

--

The way you add things to the cache (or not) is somewhat odd probably
given that you don't really compute for anything but the arg to
is_maybe_undefined
whether it is maybe undefined or not?

I am missing a cache check in the worklist processing at least but I also don't
see how common "tails" in SSA use trees of different names are reliably
cached (and I'm not sure it's easily possible to populate the cache for all
visited SSA names with conservative values -- a recursive implementation
would have made that obvious but with a worklist it might be tricky).

Given the way it is used from unswitching and given (citing from it):

  /* Condition must be invariant.  */
  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
    {
      /* Unswitching on undefined values would introduce undefined
         behavior that the original program might never exercise.  */
      if (ssa_undefined_value_p (use, true))
        return NULL_TREE;
      def = SSA_NAME_DEF_STMT (use);
      def_bb = gimple_bb (def);
      if (def_bb
          && flow_bb_inside_loop_p (loop, def_bb))
^^^

we do not unswitch on exprs that are defined inside the loop at all, thus all
SSA use-def walking is pointless here (I think in your worklist processing
you miss that you can stop walking when you see a (non-PHI) def that
dominates the post-dom region entry).  The above also suggests to do that
flow_bb_inside_loop_p check first, eventually even for all stmt operands.


Thanks,
Richard.

> Aldy
>
> p.s. I could turn the post-dominance region into a bitmap for faster access
> if preferred.
Aldy Hernandez Jan. 7, 2017, 12:54 p.m. UTC | #2
On 01/04/2017 07:11 AM, Richard Biener wrote:
> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>>
>>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>>
>>>> Hi folks.
>>>>
>>>> This is a follow-up on Jeff and Richi's interaction on the aforementioned
>>>> PR
>>>> here:
>>>>
>>>>         https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>>
>>>> I decided to explore the idea of analyzing may-undefness on-demand, which
>>>> actually looks rather cheap.
>>>>
>>>> BTW, I don't understand why we don't have auto_bitmap's, as we already
>>>> have
>>>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code
>>>> we
>>>> already have.
>>>>
>>>> The attached patch fixes the bug without introducing any regressions.
>>>>
>>>> I also tested the patch by compiling 242 .ii files with -O3.  These were
>>>> gathered from a stage1 build with -save-temps.  There is a slight time
>>>> degradation of 4 seconds within 27 minutes of user time:
>>>>
>>>>         tainted:        26:52
>>>>         orig:           26:48
>>>>
>>>> This was the average aggregate time of two runs compiling all 242 .ii
>>>> files.
>>>> IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?
>>>
>>>
>>> +  while (!worklist.is_empty ())
>>> +    {
>>> +      name = worklist.pop ();
>>> +      gcc_assert (TREE_CODE (name) == SSA_NAME);
>>> +
>>> +      if (ssa_undefined_value_p (name, true))
>>> +       return true;
>>> +
>>> +      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>
>>> it should be already set as we use visited_ssa as "was it ever on the
>>> worklist",
>>> so maybe renaming it would be a good thing as well.
>>
>>
>> I don't understand what you would prefer here.
>
> Set the bit when you put name on the worklist (and do not do that if the
> bit is set).  Thus simply remove the above and add a bitmap_set_bit
> for the initial name you put on the worklist.
>
>>>
>>> +             if (TREE_CODE (name) == SSA_NAME)
>>> +               {
>>> +                 /* If an SSA has already been seen, this may be a loop.
>>> +                    Fail conservatively.  */
>>> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>>> +                   return false;
>>>
>>> so to me "conservative" is returning true, not false.
>>
>>
>> OK
>>
>>>
>>> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>> +                 worklist.safe_push (name);
>>>
>>> but for loops we can just continue and ignore this use.  And
>>> bitmap_set_bit
>>> returns whether it set a bit, thus
>>>
>>>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
>>>                   worklist.safe_push (name);
>>>
>>> should work?
>>
>>
>> Fixed.
>>
>>>
>>> +      /* Check that any SSA names used to define NAME is also fully
>>> +        defined.  */
>>> +      use_operand_p use_p;
>>> +      ssa_op_iter iter;
>>> +      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>>> +       {
>>> +         name = USE_FROM_PTR (use_p);
>>> +         if (TREE_CODE (name) == SSA_NAME)
>>>
>>> always true.
>>>
>>> +           {
>>> +             /* If an SSA has already been seen, this may be a loop.
>>> +                Fail conservatively.  */
>>> +             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>>> +               return false;
>>> +             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>> +             worklist.safe_push (name);
>>>
>>> See above.
>>>
>>> In principle the thing is sound but I'd like to be able to pass in a
>>> bitmap of
>>> known maybe-undefined/must-defined SSA names to have a cache for
>>> multiple invocations from within a pass (like this unswitching case).
>>
>>
>> Done, though bitmaps are now calculated as part of the instantiation.
>>
>>>
>>> Also once you hit defs that are in a post-dominated region of the loop
>>> entry
>>> you can treat them as not undefined (as their use invokes undefined
>>> behavior).  This is also how you treat function parameters (well,
>>> ssa_undefined_value_p does), where the call site invokes undefined
>>> behavior
>>> when passing in undefined values.  So we need an extra parameter
>>> specifying
>>> the post-dominance region.
>>
>>
>> Done.
>>
>>>
>>> You do not handle memory or calls conservatively which means the existing
>>> testcase only needs some obfuscation to become a problem again.  To fix
>>> that before /* Check that any SSA names used to define NAME is also fully
>>> defined.  */ bail out conservatively, like
>>>
>>>    if (! is_gimple_assign (def)
>>>       || gimple_assign_single_p (def))
>>>     return true;
>>
>>
>> As I asked previously, I understand the !is_gimple_assign, which will skip
>> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) ==
>> true"??
>>
>> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we
>> want to follow up the def chain precisely on these?
>>
>> The attached implementation uses a cache, and a pre-computed post-dominance
>> region.  A previous incantation of this patch (sans the post-dominance
>> stuff) had performance within the noise of the previous implementation.
>>
>> I am testing again, and will do some performance comparisons in a bit, but
>> for now-- are we on the same page here?  Is this what you wanted?
>
> +      /* DEFs in the post-dominated region of the loop entry invoke
> +        undefined behavior.  Adding any use won't make things any
> +        worse.  */
> +      for (unsigned i = 0; i < postdom.length (); ++i)
> +       if (def->bb == postdom[i])
>
> gimple_bb (def)
>
> +         {
> +           set_defined (t);
> +           return false;
> +         }
>
> I think you can't really return false here but need to continue processing
> the worklist.  I'd say rather than a linear walk you can as well use
> dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
> block instead?
>
> Note that the way you compute post-dominators doesn't work -- nothing
> keeps them up-to-date when unswitching does CFG manipulations :/
> Fortunately we have a way to compute them per region, see how tree-if-conv.c
> does that (build_region, calculate_dominance_info_for_region).

If I understand correctly, we could compute them per region in 
tree_unswitch_single_loop() for each individual loop with what 
tree-if-conv.c uses.

I could certainly abstract 
build_region/calculate_dominance_info_for_region into something new, say 
calculate_dominance_info_for_loop_region().  Then we could use 
dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you suggest.

Question... computing the post-dom tree per region as I've just 
described will only create post-dom information for the loop region, 
which means that any definition outside of the loop will have zero 
dominance info.

What if the definition in question post-dominates the loop entry but is 
outside/after of the loop?  We will have no post-dom information for 
this.  In this case, could we just ignore and continue evaluating the 
SSA_NAME with the rest of our heuristics, or did you have something else 
in mind?  Another option would be to calculate the post-dom information 
for the entire function on every loop 
(calculate_dominance_info(CDI_POST_DOMINATORS)), but that seems rather 
expensive.

As a side note, at this point I just want to fix/close the PR in an 
acceptable manner, not come up with the end-all catch-all most-efficient 
patch for an unlikely scenario ;-).

Thanks for taking the time to review and offer suggestions here.

Aldy
Richard Biener Jan. 9, 2017, 9:30 a.m. UTC | #3
On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 01/04/2017 07:11 AM, Richard Biener wrote:
>>
>> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>
>>> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>>>
>>>>
>>>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi folks.
>>>>>
>>>>> This is a follow-up on Jeff and Richi's interaction on the
>>>>> aforementioned
>>>>> PR
>>>>> here:
>>>>>
>>>>>         https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>>>
>>>>> I decided to explore the idea of analyzing may-undefness on-demand,
>>>>> which
>>>>> actually looks rather cheap.
>>>>>
>>>>> BTW, I don't understand why we don't have auto_bitmap's, as we already
>>>>> have
>>>>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's
>>>>> code
>>>>> we
>>>>> already have.
>>>>>
>>>>> The attached patch fixes the bug without introducing any regressions.
>>>>>
>>>>> I also tested the patch by compiling 242 .ii files with -O3.  These
>>>>> were
>>>>> gathered from a stage1 build with -save-temps.  There is a slight time
>>>>> degradation of 4 seconds within 27 minutes of user time:
>>>>>
>>>>>         tainted:        26:52
>>>>>         orig:           26:48
>>>>>
>>>>> This was the average aggregate time of two runs compiling all 242 .ii
>>>>> files.
>>>>> IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?
>>>>
>>>>
>>>>
>>>> +  while (!worklist.is_empty ())
>>>> +    {
>>>> +      name = worklist.pop ();
>>>> +      gcc_assert (TREE_CODE (name) == SSA_NAME);
>>>> +
>>>> +      if (ssa_undefined_value_p (name, true))
>>>> +       return true;
>>>> +
>>>> +      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>
>>>> it should be already set as we use visited_ssa as "was it ever on the
>>>> worklist",
>>>> so maybe renaming it would be a good thing as well.
>>>
>>>
>>>
>>> I don't understand what you would prefer here.
>>
>>
>> Set the bit when you put name on the worklist (and do not do that if the
>> bit is set).  Thus simply remove the above and add a bitmap_set_bit
>> for the initial name you put on the worklist.
>>
>>>>
>>>> +             if (TREE_CODE (name) == SSA_NAME)
>>>> +               {
>>>> +                 /* If an SSA has already been seen, this may be a
>>>> loop.
>>>> +                    Fail conservatively.  */
>>>> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
>>>> (name)))
>>>> +                   return false;
>>>>
>>>> so to me "conservative" is returning true, not false.
>>>
>>>
>>>
>>> OK
>>>
>>>>
>>>> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>> +                 worklist.safe_push (name);
>>>>
>>>> but for loops we can just continue and ignore this use.  And
>>>> bitmap_set_bit
>>>> returns whether it set a bit, thus
>>>>
>>>>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>>>> (name)))
>>>>                   worklist.safe_push (name);
>>>>
>>>> should work?
>>>
>>>
>>>
>>> Fixed.
>>>
>>>>
>>>> +      /* Check that any SSA names used to define NAME is also fully
>>>> +        defined.  */
>>>> +      use_operand_p use_p;
>>>> +      ssa_op_iter iter;
>>>> +      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>>>> +       {
>>>> +         name = USE_FROM_PTR (use_p);
>>>> +         if (TREE_CODE (name) == SSA_NAME)
>>>>
>>>> always true.
>>>>
>>>> +           {
>>>> +             /* If an SSA has already been seen, this may be a loop.
>>>> +                Fail conservatively.  */
>>>> +             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>>>> +               return false;
>>>> +             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>> +             worklist.safe_push (name);
>>>>
>>>> See above.
>>>>
>>>> In principle the thing is sound but I'd like to be able to pass in a
>>>> bitmap of
>>>> known maybe-undefined/must-defined SSA names to have a cache for
>>>> multiple invocations from within a pass (like this unswitching case).
>>>
>>>
>>>
>>> Done, though bitmaps are now calculated as part of the instantiation.
>>>
>>>>
>>>> Also once you hit defs that are in a post-dominated region of the loop
>>>> entry
>>>> you can treat them as not undefined (as their use invokes undefined
>>>> behavior).  This is also how you treat function parameters (well,
>>>> ssa_undefined_value_p does), where the call site invokes undefined
>>>> behavior
>>>> when passing in undefined values.  So we need an extra parameter
>>>> specifying
>>>> the post-dominance region.
>>>
>>>
>>>
>>> Done.
>>>
>>>>
>>>> You do not handle memory or calls conservatively which means the
>>>> existing
>>>> testcase only needs some obfuscation to become a problem again.  To fix
>>>> that before /* Check that any SSA names used to define NAME is also
>>>> fully
>>>> defined.  */ bail out conservatively, like
>>>>
>>>>    if (! is_gimple_assign (def)
>>>>       || gimple_assign_single_p (def))
>>>>     return true;
>>>
>>>
>>>
>>> As I asked previously, I understand the !is_gimple_assign, which will
>>> skip
>>> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def)
>>> ==
>>> true"??
>>>
>>> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't
>>> we
>>> want to follow up the def chain precisely on these?
>>>
>>> The attached implementation uses a cache, and a pre-computed
>>> post-dominance
>>> region.  A previous incantation of this patch (sans the post-dominance
>>> stuff) had performance within the noise of the previous implementation.
>>>
>>> I am testing again, and will do some performance comparisons in a bit,
>>> but
>>> for now-- are we on the same page here?  Is this what you wanted?
>>
>>
>> +      /* DEFs in the post-dominated region of the loop entry invoke
>> +        undefined behavior.  Adding any use won't make things any
>> +        worse.  */
>> +      for (unsigned i = 0; i < postdom.length (); ++i)
>> +       if (def->bb == postdom[i])
>>
>> gimple_bb (def)
>>
>> +         {
>> +           set_defined (t);
>> +           return false;
>> +         }
>>
>> I think you can't really return false here but need to continue processing
>> the worklist.  I'd say rather than a linear walk you can as well use
>> dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
>> block instead?
>>
>> Note that the way you compute post-dominators doesn't work -- nothing
>> keeps them up-to-date when unswitching does CFG manipulations :/
>> Fortunately we have a way to compute them per region, see how
>> tree-if-conv.c
>> does that (build_region, calculate_dominance_info_for_region).
>
>
> If I understand correctly, we could compute them per region in
> tree_unswitch_single_loop() for each individual loop with what
> tree-if-conv.c uses.
>
> I could certainly abstract build_region/calculate_dominance_info_for_region
> into something new, say calculate_dominance_info_for_loop_region().  Then we
> could use dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you
> suggest.
>
> Question... computing the post-dom tree per region as I've just described
> will only create post-dom information for the loop region, which means that
> any definition outside of the loop will have zero dominance info.

Yes.

> What if the definition in question post-dominates the loop entry but is
> outside/after of the loop?

How would we ever arrive at such def?  Once we reach a def before the
region/loop
we know it's fine to use (the missing "stop" point I pointed out).

>  We will have no post-dom information for this.
> In this case, could we just ignore and continue evaluating the SSA_NAME with
> the rest of our heuristics, or did you have something else in mind?  Another
> option would be to calculate the post-dom information for the entire
> function on every loop (calculate_dominance_info(CDI_POST_DOMINATORS)), but
> that seems rather expensive.
>
> As a side note, at this point I just want to fix/close the PR in an
> acceptable manner, not come up with the end-all catch-all most-efficient
> patch for an unlikely scenario ;-).

Yeah, which is why I wonder whether the caching is worth the trouble (in it's
current form) for the unswitching use (given it's other restrictions
on conditions
to unswitch).

Richard.

> Thanks for taking the time to review and offer suggestions here.
>
> Aldy
Aldy Hernandez Jan. 13, 2017, 6:48 p.m. UTC | #4
[Sorry for the delay, I was sick.]

On 01/09/2017 04:30 AM, Richard Biener wrote:
> On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> On 01/04/2017 07:11 AM, Richard Biener wrote:
>>>
>>> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>>
>>>> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi folks.
>>>>>>
>>>>>> This is a follow-up on Jeff and Richi's interaction on the
>>>>>> aforementioned
>>>>>> PR
>>>>>> here:
>>>>>>
>>>>>>         https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>>>>
>>>>>> I decided to explore the idea of analyzing may-undefness on-demand,
>>>>>> which
>>>>>> actually looks rather cheap.
>>>>>>
>>>>>> BTW, I don't understand why we don't have auto_bitmap's, as we already
>>>>>> have
>>>>>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's
>>>>>> code
>>>>>> we
>>>>>> already have.
>>>>>>
>>>>>> The attached patch fixes the bug without introducing any regressions.
>>>>>>
>>>>>> I also tested the patch by compiling 242 .ii files with -O3.  These
>>>>>> were
>>>>>> gathered from a stage1 build with -save-temps.  There is a slight time
>>>>>> degradation of 4 seconds within 27 minutes of user time:
>>>>>>
>>>>>>         tainted:        26:52
>>>>>>         orig:           26:48
>>>>>>
>>>>>> This was the average aggregate time of two runs compiling all 242 .ii
>>>>>> files.
>>>>>> IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?
>>>>>
>>>>>
>>>>>
>>>>> +  while (!worklist.is_empty ())
>>>>> +    {
>>>>> +      name = worklist.pop ();
>>>>> +      gcc_assert (TREE_CODE (name) == SSA_NAME);
>>>>> +
>>>>> +      if (ssa_undefined_value_p (name, true))
>>>>> +       return true;
>>>>> +
>>>>> +      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>>
>>>>> it should be already set as we use visited_ssa as "was it ever on the
>>>>> worklist",
>>>>> so maybe renaming it would be a good thing as well.
>>>>
>>>>
>>>>
>>>> I don't understand what you would prefer here.
>>>
>>>
>>> Set the bit when you put name on the worklist (and do not do that if the
>>> bit is set).  Thus simply remove the above and add a bitmap_set_bit
>>> for the initial name you put on the worklist.
>>>
>>>>>
>>>>> +             if (TREE_CODE (name) == SSA_NAME)
>>>>> +               {
>>>>> +                 /* If an SSA has already been seen, this may be a
>>>>> loop.
>>>>> +                    Fail conservatively.  */
>>>>> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
>>>>> (name)))
>>>>> +                   return false;
>>>>>
>>>>> so to me "conservative" is returning true, not false.
>>>>
>>>>
>>>>
>>>> OK
>>>>
>>>>>
>>>>> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>> +                 worklist.safe_push (name);
>>>>>
>>>>> but for loops we can just continue and ignore this use.  And
>>>>> bitmap_set_bit
>>>>> returns whether it set a bit, thus
>>>>>
>>>>>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>>>>> (name)))
>>>>>                   worklist.safe_push (name);
>>>>>
>>>>> should work?
>>>>
>>>>
>>>>
>>>> Fixed.
>>>>
>>>>>
>>>>> +      /* Check that any SSA names used to define NAME is also fully
>>>>> +        defined.  */
>>>>> +      use_operand_p use_p;
>>>>> +      ssa_op_iter iter;
>>>>> +      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>>>>> +       {
>>>>> +         name = USE_FROM_PTR (use_p);
>>>>> +         if (TREE_CODE (name) == SSA_NAME)
>>>>>
>>>>> always true.
>>>>>
>>>>> +           {
>>>>> +             /* If an SSA has already been seen, this may be a loop.
>>>>> +                Fail conservatively.  */
>>>>> +             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>>>>> +               return false;
>>>>> +             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>> +             worklist.safe_push (name);
>>>>>
>>>>> See above.
>>>>>
>>>>> In principle the thing is sound but I'd like to be able to pass in a
>>>>> bitmap of
>>>>> known maybe-undefined/must-defined SSA names to have a cache for
>>>>> multiple invocations from within a pass (like this unswitching case).
>>>>
>>>>
>>>>
>>>> Done, though bitmaps are now calculated as part of the instantiation.
>>>>
>>>>>
>>>>> Also once you hit defs that are in a post-dominated region of the loop
>>>>> entry
>>>>> you can treat them as not undefined (as their use invokes undefined
>>>>> behavior).  This is also how you treat function parameters (well,
>>>>> ssa_undefined_value_p does), where the call site invokes undefined
>>>>> behavior
>>>>> when passing in undefined values.  So we need an extra parameter
>>>>> specifying
>>>>> the post-dominance region.
>>>>
>>>>
>>>>
>>>> Done.
>>>>
>>>>>
>>>>> You do not handle memory or calls conservatively which means the
>>>>> existing
>>>>> testcase only needs some obfuscation to become a problem again.  To fix
>>>>> that before /* Check that any SSA names used to define NAME is also
>>>>> fully
>>>>> defined.  */ bail out conservatively, like
>>>>>
>>>>>    if (! is_gimple_assign (def)
>>>>>       || gimple_assign_single_p (def))
>>>>>     return true;
>>>>
>>>>
>>>>
>>>> As I asked previously, I understand the !is_gimple_assign, which will
>>>> skip
>>>> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def)
>>>> ==
>>>> true"??
>>>>
>>>> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't
>>>> we
>>>> want to follow up the def chain precisely on these?
>>>>
>>>> The attached implementation uses a cache, and a pre-computed
>>>> post-dominance
>>>> region.  A previous incantation of this patch (sans the post-dominance
>>>> stuff) had performance within the noise of the previous implementation.
>>>>
>>>> I am testing again, and will do some performance comparisons in a bit,
>>>> but
>>>> for now-- are we on the same page here?  Is this what you wanted?
>>>
>>>
>>> +      /* DEFs in the post-dominated region of the loop entry invoke
>>> +        undefined behavior.  Adding any use won't make things any
>>> +        worse.  */
>>> +      for (unsigned i = 0; i < postdom.length (); ++i)
>>> +       if (def->bb == postdom[i])
>>>
>>> gimple_bb (def)
>>>
>>> +         {
>>> +           set_defined (t);
>>> +           return false;
>>> +         }
>>>
>>> I think you can't really return false here but need to continue processing
>>> the worklist.  I'd say rather than a linear walk you can as well use
>>> dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
>>> block instead?
>>>
>>> Note that the way you compute post-dominators doesn't work -- nothing
>>> keeps them up-to-date when unswitching does CFG manipulations :/
>>> Fortunately we have a way to compute them per region, see how
>>> tree-if-conv.c
>>> does that (build_region, calculate_dominance_info_for_region).
>>
>>
>> If I understand correctly, we could compute them per region in
>> tree_unswitch_single_loop() for each individual loop with what
>> tree-if-conv.c uses.
>>
>> I could certainly abstract build_region/calculate_dominance_info_for_region
>> into something new, say calculate_dominance_info_for_loop_region().  Then we
>> could use dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you
>> suggest.
>>
>> Question... computing the post-dom tree per region as I've just described
>> will only create post-dom information for the loop region, which means that
>> any definition outside of the loop will have zero dominance info.
>
> Yes.
>
>> What if the definition in question post-dominates the loop entry but is
>> outside/after of the loop?
>
> How would we ever arrive at such def?  Once we reach a def before the
> region/loop
> we know it's fine to use (the missing "stop" point I pointed out).

Hmmm, it looks like we can't even build the post-dom region 
appropriately in the presence of infinite loops.

First, build_region() fails if it can't find a loop post-header:

   /* The last element is loop post-header.  */
   gcc_assert (exit_bb);

which we won't have in an infinite loop like:

bb2: //loop header
|
V
loop 1:
bb3:		<-------+
	bar();		|
|			|
V			|
bb4:			|
	goto bb3;   >---+


We could just build the region without the post-header, but then we fail 
while building the DFS dominance region:

dom_info::calc_dfs_tree_nonrec (basic_block bb)
....
	  gcc_assert (bn != m_start_block);

Presumably because we've looped back to the start block (bb4 in a post 
dominator world).  This doesn't happen calculating going forward because 
we always have a start block outside of the region (the function entry 
block).

I tried to fake edge my way out of it, but things get really messed up 
while putting/removing fake edges in the presence of loop info.  I'm 
assuming all this is by design.

Would you prefer me to continue down this path, trying to build a 
post-dom region with infinite loops and fixing build_region / 
calc_dfs_tree_nonrec, or could we do without this dominance 
optimization?  Pretty please?

>
>>  We will have no post-dom information for this.
>> In this case, could we just ignore and continue evaluating the SSA_NAME with
>> the rest of our heuristics, or did you have something else in mind?  Another
>> option would be to calculate the post-dom information for the entire
>> function on every loop (calculate_dominance_info(CDI_POST_DOMINATORS)), but
>> that seems rather expensive.
>>
>> As a side note, at this point I just want to fix/close the PR in an
>> acceptable manner, not come up with the end-all catch-all most-efficient
>> patch for an unlikely scenario ;-).
>
> Yeah, which is why I wonder whether the caching is worth the trouble (in it's
> current form) for the unswitching use (given it's other restrictions
> on conditions
> to unswitch).

We could go back to my original, no caching version (with the other 
suggestions).  That solves the problem quite simply, with no regressions.

Thanks again.
Aldy
Richard Biener Jan. 18, 2017, 3:10 p.m. UTC | #5
On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> [Sorry for the delay, I was sick.]
>
>
> On 01/09/2017 04:30 AM, Richard Biener wrote:
>>
>> On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>
>>> On 01/04/2017 07:11 AM, Richard Biener wrote:
>>>>
>>>>
>>>> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>>>
>>>>>
>>>>> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi folks.
>>>>>>>
>>>>>>> This is a follow-up on Jeff and Richi's interaction on the
>>>>>>> aforementioned
>>>>>>> PR
>>>>>>> here:
>>>>>>>
>>>>>>>         https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>>>>>
>>>>>>> I decided to explore the idea of analyzing may-undefness on-demand,
>>>>>>> which
>>>>>>> actually looks rather cheap.
>>>>>>>
>>>>>>> BTW, I don't understand why we don't have auto_bitmap's, as we
>>>>>>> already
>>>>>>> have
>>>>>>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's
>>>>>>> code
>>>>>>> we
>>>>>>> already have.
>>>>>>>
>>>>>>> The attached patch fixes the bug without introducing any regressions.
>>>>>>>
>>>>>>> I also tested the patch by compiling 242 .ii files with -O3.  These
>>>>>>> were
>>>>>>> gathered from a stage1 build with -save-temps.  There is a slight
>>>>>>> time
>>>>>>> degradation of 4 seconds within 27 minutes of user time:
>>>>>>>
>>>>>>>         tainted:        26:52
>>>>>>>         orig:           26:48
>>>>>>>
>>>>>>> This was the average aggregate time of two runs compiling all 242 .ii
>>>>>>> files.
>>>>>>> IMO, this looks reasonable.  It is after all, -O3.    Is it
>>>>>>> acceptable?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> +  while (!worklist.is_empty ())
>>>>>> +    {
>>>>>> +      name = worklist.pop ();
>>>>>> +      gcc_assert (TREE_CODE (name) == SSA_NAME);
>>>>>> +
>>>>>> +      if (ssa_undefined_value_p (name, true))
>>>>>> +       return true;
>>>>>> +
>>>>>> +      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>>>
>>>>>> it should be already set as we use visited_ssa as "was it ever on the
>>>>>> worklist",
>>>>>> so maybe renaming it would be a good thing as well.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I don't understand what you would prefer here.
>>>>
>>>>
>>>>
>>>> Set the bit when you put name on the worklist (and do not do that if the
>>>> bit is set).  Thus simply remove the above and add a bitmap_set_bit
>>>> for the initial name you put on the worklist.
>>>>
>>>>>>
>>>>>> +             if (TREE_CODE (name) == SSA_NAME)
>>>>>> +               {
>>>>>> +                 /* If an SSA has already been seen, this may be a
>>>>>> loop.
>>>>>> +                    Fail conservatively.  */
>>>>>> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
>>>>>> (name)))
>>>>>> +                   return false;
>>>>>>
>>>>>> so to me "conservative" is returning true, not false.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> OK
>>>>>
>>>>>>
>>>>>> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>>>>>> (name));
>>>>>> +                 worklist.safe_push (name);
>>>>>>
>>>>>> but for loops we can just continue and ignore this use.  And
>>>>>> bitmap_set_bit
>>>>>> returns whether it set a bit, thus
>>>>>>
>>>>>>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>>>>>> (name)))
>>>>>>                   worklist.safe_push (name);
>>>>>>
>>>>>> should work?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Fixed.
>>>>>
>>>>>>
>>>>>> +      /* Check that any SSA names used to define NAME is also fully
>>>>>> +        defined.  */
>>>>>> +      use_operand_p use_p;
>>>>>> +      ssa_op_iter iter;
>>>>>> +      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>>>>>> +       {
>>>>>> +         name = USE_FROM_PTR (use_p);
>>>>>> +         if (TREE_CODE (name) == SSA_NAME)
>>>>>>
>>>>>> always true.
>>>>>>
>>>>>> +           {
>>>>>> +             /* If an SSA has already been seen, this may be a loop.
>>>>>> +                Fail conservatively.  */
>>>>>> +             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>>>>>> +               return false;
>>>>>> +             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>>> +             worklist.safe_push (name);
>>>>>>
>>>>>> See above.
>>>>>>
>>>>>> In principle the thing is sound but I'd like to be able to pass in a
>>>>>> bitmap of
>>>>>> known maybe-undefined/must-defined SSA names to have a cache for
>>>>>> multiple invocations from within a pass (like this unswitching case).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Done, though bitmaps are now calculated as part of the instantiation.
>>>>>
>>>>>>
>>>>>> Also once you hit defs that are in a post-dominated region of the loop
>>>>>> entry
>>>>>> you can treat them as not undefined (as their use invokes undefined
>>>>>> behavior).  This is also how you treat function parameters (well,
>>>>>> ssa_undefined_value_p does), where the call site invokes undefined
>>>>>> behavior
>>>>>> when passing in undefined values.  So we need an extra parameter
>>>>>> specifying
>>>>>> the post-dominance region.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Done.
>>>>>
>>>>>>
>>>>>> You do not handle memory or calls conservatively which means the
>>>>>> existing
>>>>>> testcase only needs some obfuscation to become a problem again.  To
>>>>>> fix
>>>>>> that before /* Check that any SSA names used to define NAME is also
>>>>>> fully
>>>>>> defined.  */ bail out conservatively, like
>>>>>>
>>>>>>    if (! is_gimple_assign (def)
>>>>>>       || gimple_assign_single_p (def))
>>>>>>     return true;
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> As I asked previously, I understand the !is_gimple_assign, which will
>>>>> skip
>>>>> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def)
>>>>> ==
>>>>> true"??
>>>>>
>>>>> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88?
>>>>> Don't
>>>>> we
>>>>> want to follow up the def chain precisely on these?
>>>>>
>>>>> The attached implementation uses a cache, and a pre-computed
>>>>> post-dominance
>>>>> region.  A previous incantation of this patch (sans the post-dominance
>>>>> stuff) had performance within the noise of the previous implementation.
>>>>>
>>>>> I am testing again, and will do some performance comparisons in a bit,
>>>>> but
>>>>> for now-- are we on the same page here?  Is this what you wanted?
>>>>
>>>>
>>>>
>>>> +      /* DEFs in the post-dominated region of the loop entry invoke
>>>> +        undefined behavior.  Adding any use won't make things any
>>>> +        worse.  */
>>>> +      for (unsigned i = 0; i < postdom.length (); ++i)
>>>> +       if (def->bb == postdom[i])
>>>>
>>>> gimple_bb (def)
>>>>
>>>> +         {
>>>> +           set_defined (t);
>>>> +           return false;
>>>> +         }
>>>>
>>>> I think you can't really return false here but need to continue
>>>> processing
>>>> the worklist.  I'd say rather than a linear walk you can as well use
>>>> dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
>>>> block instead?
>>>>
>>>> Note that the way you compute post-dominators doesn't work -- nothing
>>>> keeps them up-to-date when unswitching does CFG manipulations :/
>>>> Fortunately we have a way to compute them per region, see how
>>>> tree-if-conv.c
>>>> does that (build_region, calculate_dominance_info_for_region).
>>>
>>>
>>>
>>> If I understand correctly, we could compute them per region in
>>> tree_unswitch_single_loop() for each individual loop with what
>>> tree-if-conv.c uses.
>>>
>>> I could certainly abstract
>>> build_region/calculate_dominance_info_for_region
>>> into something new, say calculate_dominance_info_for_loop_region().  Then
>>> we
>>> could use dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you
>>> suggest.
>>>
>>> Question... computing the post-dom tree per region as I've just described
>>> will only create post-dom information for the loop region, which means
>>> that
>>> any definition outside of the loop will have zero dominance info.
>>
>>
>> Yes.
>>
>>> What if the definition in question post-dominates the loop entry but is
>>> outside/after of the loop?
>>
>>
>> How would we ever arrive at such def?  Once we reach a def before the
>> region/loop
>> we know it's fine to use (the missing "stop" point I pointed out).
>
>
> Hmmm, it looks like we can't even build the post-dom region appropriately in
> the presence of infinite loops.
>
> First, build_region() fails if it can't find a loop post-header:
>
>   /* The last element is loop post-header.  */
>   gcc_assert (exit_bb);
>
> which we won't have in an infinite loop like:
>
> bb2: //loop header
> |
> V
> loop 1:
> bb3:            <-------+
>         bar();          |
> |                       |
> V                       |
> bb4:                    |
>         goto bb3;   >---+
>
>
> We could just build the region without the post-header, but then we fail
> while building the DFS dominance region:
>
> dom_info::calc_dfs_tree_nonrec (basic_block bb)
> ....
>           gcc_assert (bn != m_start_block);
>
> Presumably because we've looped back to the start block (bb4 in a post
> dominator world).  This doesn't happen calculating going forward because we
> always have a start block outside of the region (the function entry block).
>
> I tried to fake edge my way out of it, but things get really messed up while
> putting/removing fake edges in the presence of loop info.  I'm assuming all
> this is by design.

Heh.  Infinite loops are indeed fun, and yes, the fake edges confuse some
of the loop stuff.

> Would you prefer me to continue down this path, trying to build a post-dom
> region with infinite loops and fixing build_region / calc_dfs_tree_nonrec,
> or could we do without this dominance optimization?  Pretty please?

It will pessimize some cases though, but see below.

>>
>>>  We will have no post-dom information for this.
>>> In this case, could we just ignore and continue evaluating the SSA_NAME
>>> with
>>> the rest of our heuristics, or did you have something else in mind?
>>> Another
>>> option would be to calculate the post-dom information for the entire
>>> function on every loop (calculate_dominance_info(CDI_POST_DOMINATORS)),
>>> but
>>> that seems rather expensive.
>>>
>>> As a side note, at this point I just want to fix/close the PR in an
>>> acceptable manner, not come up with the end-all catch-all most-efficient
>>> patch for an unlikely scenario ;-).
>>
>>
>> Yeah, which is why I wonder whether the caching is worth the trouble (in
>> it's
>> current form) for the unswitching use (given it's other restrictions
>> on conditions
>> to unswitch).
>
>
> We could go back to my original, no caching version (with the other
> suggestions).  That solves the problem quite simply, with no regressions.

So let's go with a unswitching-local solution then.  Based on
tree_may_unswitch_on:

  /* Condition must be invariant.  */
  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
    {
      /* Unswitching on undefined values would introduce undefined
         behavior that the original program might never exercise.  */
      if (ssa_undefined_value_p (use, true))
        return NULL_TREE;
      def = SSA_NAME_DEF_STMT (use);
      def_bb = gimple_bb (def);
      if (def_bb
          && flow_bb_inside_loop_p (loop, def_bb))
        return NULL_TREE;

we only have to look for uses in blocks dominating the loop header block
(or in blocks post-dominating that loop header, but we can probably implement
that by simply including the loop header itself with a FIXME comment).

Note that we only need to know whether a BB will be always executed when
the loop is entered -- there's "infrastructure" elsewhere that computes this
w/o the need of post-dominance.  For example see fill_always_executed_in_1
tree-ssa-loop-im.c (we can't use that 1:1 I think because we already use ->aux
via the original copy tables, but we could simplify it as we're only
interested in
the loop which preheader we put the unswitch condition on so we can use
a bitmap to record whether a block of the loop is always executed or not).

Can you send a patch that does this maybe?

Thanks,
Richard.


> Thanks again.
> Aldy

Patch
diff mbox

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2aae684..beff302 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1511,6 +1511,7 @@  OBJS = \
 	tree-ssa-coalesce.o \
 	tree-ssa-copy.o \
 	tree-ssa-dce.o \
+	tree-ssa-defined-or-undefined.o \
 	tree-ssa-dom.o \
 	tree-ssa-dse.o \
 	tree-ssa-forwprop.o \
diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 1b2c8e0..bc32a21 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -802,4 +802,25 @@  bmp_iter_and_compl (bitmap_iterator *bi, unsigned *bit_no)
        bmp_iter_and_compl (&(ITER), &(BITNUM));				\
        bmp_iter_next (&(ITER), &(BITNUM)))
 
+/* A class that ties the lifetime of a bitmap to its scope.  */
+class auto_bitmap
+{
+ public:
+  auto_bitmap () { bits = BITMAP_ALLOC (NULL); }
+  ~auto_bitmap () { BITMAP_FREE (bits); }
+  // Allow calling bitmap functions on our bitmap.
+  operator bitmap () { return bits; }
+
+ private:
+  // Prevent making a copy that references our bitmap.
+  auto_bitmap (const auto_bitmap &);
+  auto_bitmap &operator = (const auto_bitmap &);
+#if __cplusplus >= 201103L
+  auto_bitmap (auto_bitmap &&);
+  auto_bitmap &operator = (auto_bitmap &&);
+#endif
+
+  bitmap bits;
+};
+
 #endif /* GCC_BITMAP_H */
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr71691.c b/gcc/testsuite/gcc.c-torture/execute/pr71691.c
new file mode 100644
index 0000000..1ffd1a2
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr71691.c
@@ -0,0 +1,47 @@ 
+/* { dg-options "-fno-tree-vrp" } */
+
+char b;
+short f;
+unsigned e;
+int g = 20;
+
+void
+foo ()
+{
+  int l, h;
+  for (l = 0; l <= 7; l++)
+    {
+      int j = 38;
+      if (g)
+	h = 0;
+      for (; h <= 7; h++)
+	{
+	  int i, k = b % (j % 4);
+	  g = f;
+	  for (;;)
+	    {
+	      j = 6 || b;
+	      if (e)
+		{
+		  for (; j; --j)
+		    if (k)
+		      __builtin_printf ("%d", 9);
+		  if (i)
+		    __builtin_printf ("%d", j);
+		}
+	      if (l)
+		continue;
+	      break;
+	    }
+	  i = f || b;
+	}
+    }
+}
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
+
diff --git a/gcc/tree-ssa-defined-or-undefined.c b/gcc/tree-ssa-defined-or-undefined.c
new file mode 100644
index 0000000..f845eba
--- /dev/null
+++ b/gcc/tree-ssa-defined-or-undefined.c
@@ -0,0 +1,134 @@ 
+/* Simple class for classifying SSA_NAMEs that are either fully
+   defined or possibly undefined.
+
+   This is meant to support conservative analysis for optimization
+   purposes, not for generating warnings.  The analysis for generating
+   warnings is deeper to avoid generation of false positives.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez <aldyh@redhat.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "ssa.h"
+#include "tree-ssa.h"
+#include "tree-ssa-defined-or-undefined.h"
+
+/* Return TRUE if an SSA_NAME maybe undefined.  */
+
+bool
+defined_or_undefined::is_maybe_undefined (const tree name)
+{
+  if (in_cache (name))
+    return bitmap_bit_p (b_maybe_undef, SSA_NAME_VERSION (name));
+
+  auto_bitmap visited_ssa;
+  auto_vec<tree> worklist;
+  worklist.safe_push (name);
+  while (!worklist.is_empty ())
+    {
+      tree t = worklist.pop ();
+
+      /* If it's obviously undefined, avoid further computations.  */
+      if (ssa_undefined_value_p (t, true))
+	{
+	  set_maybe_undefined (t);
+	  if (t != name)
+	    set_maybe_undefined (name);
+	  return true;
+	}
+
+      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (t));
+
+      /* A PARM_DECL will not have an SSA_NAME_DEF_STMT.  Parameters
+	 get their initial value from function entry.  */
+      if (SSA_NAME_VAR (t) && TREE_CODE (SSA_NAME_VAR (t)) == PARM_DECL)
+	{
+	  set_defined (t);
+	  continue;
+	}
+
+      gimple *def = SSA_NAME_DEF_STMT (t);
+
+      /* DEFs in the post-dominated region of the loop entry invoke
+	 undefined behavior.  Adding any use won't make things any
+	 worse.  */
+      for (unsigned i = 0; i < postdom.length (); ++i)
+	if (def->bb == postdom[i])
+	  {
+	    set_defined (t);
+	    return false;
+	  }
+
+      /* Check that all the PHI args are fully defined.  */
+      if (gphi *phi = dyn_cast <gphi *> (def))
+	{
+	  if (virtual_operand_p (PHI_RESULT (phi)))
+	    continue;
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	    {
+	      tree t = gimple_phi_arg_def (phi, i);
+	      /* If an SSA has already been seen, it may be a loop,
+		 but we can continue and ignore this use.  Otherwise,
+		 add the SSA_NAME to the queue and visit it later.  */
+	      if (TREE_CODE (t) == SSA_NAME
+		  && bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (t)))
+		worklist.safe_push (t);
+	    }
+	  continue;
+	}
+
+      /* Handle memory and calls conservatively.  */
+      if (!is_gimple_assign (def)
+	  /* ?? Do we really want this?  The code below will return
+	     TRUE for something like "e.3_7 = e", which is probably
+	     not what we want.
+	  || gimple_assign_single_p (def)
+	  */
+	  )
+	{
+	  set_maybe_undefined (name);
+	  return true;
+	}
+
+      /* Check that any SSA names used to define NAME is also fully
+	 defined.  */
+      use_operand_p use_p;
+      ssa_op_iter iter;
+      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+	{
+	  tree t = USE_FROM_PTR (use_p);
+	  /* If an SSA has already been seen, it may be a loop,
+	     but we can continue and ignore this use.  Otherwise,
+	     add the SSA_NAME to the queue and visit it later.  */
+	  if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (t)))
+	    worklist.safe_push (t);
+	}
+    }
+  set_defined (name);
+  return false;
+}
diff --git a/gcc/tree-ssa-defined-or-undefined.h b/gcc/tree-ssa-defined-or-undefined.h
new file mode 100644
index 0000000..23e306a
--- /dev/null
+++ b/gcc/tree-ssa-defined-or-undefined.h
@@ -0,0 +1,73 @@ 
+/* Simple class for identifying SSA_NAMEs that are either fully defined
+   or maybe undefined.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez <aldyh@redhat.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_TREE_SSA_DEFINED_OR_UNDEFINED_H
+#define GCC_TREE_SSA_DEFINED_OR_UNDEFINED_H
+
+/* Simple class for identifying SSA_NAMEs that are either fully
+   defined or maybe undefined.
+
+   This is meant to support conservative analysis for optimization
+   purposes, not for generating warnings.  The analysis for generating
+   warnings is deeper to avoid generation of false positives.
+
+   Instantiation of this class triggers analysis which is not kept
+   up-to-date as changes in the IL or CFG are made.
+
+   Queries will return the conservative result (maybe undefined) for
+   SSA_NAMEs that were not encountered during the initial
+   analysis.  */
+
+class defined_or_undefined
+{
+ private:
+  // Bitmap specifying which SSAs are in the cache.
+  auto_bitmap b_seen;
+  // Bit is set if a given SSA is maybe undefined, otherwise the given
+  // SSA is definitely defined.
+  auto_bitmap b_maybe_undef;
+  // The post-dominance region of the current loop's entry.
+  vec<basic_block> postdom;
+
+  // Return TRUE if SSA has been previously cached.
+  bool in_cache (tree ssa)
+  { return bitmap_bit_p (b_seen, SSA_NAME_VERSION (ssa)); }
+  // Cache SSA to be maybe undefined.
+  void set_maybe_undefined (tree ssa)
+  { bitmap_set_bit (b_seen, SSA_NAME_VERSION (ssa));
+    bitmap_set_bit (b_maybe_undef, SSA_NAME_VERSION (ssa)); }
+  // Cache SSA to be definitely defined.
+  void set_defined (tree ssa)
+  { bitmap_set_bit (b_seen, SSA_NAME_VERSION (ssa));
+    bitmap_clear_bit (b_maybe_undef, SSA_NAME_VERSION (ssa)); }
+ public:
+  defined_or_undefined ();
+  // ENTRY is the current loop's entry block.
+  defined_or_undefined (basic_block entry)
+    { postdom = get_dominated_by_region (CDI_POST_DOMINATORS, &entry, 1); }
+  // Return TRUE if SSA is maybe undefined.
+  bool is_maybe_undefined (tree ssa);
+  // Return TRUE if SSA is definitely defined.
+  bool is_defined (tree ssa) { return !is_maybe_undefined (ssa); }
+};
+
+#endif // GCC_TREE_SSA_DEFINED_OR_UNDEFINED_H
diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index 40905af..ea4b68d 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -38,6 +38,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "cfghooks.h"
 #include "tree-ssa-loop-manip.h"
+#include "tree-ssa-defined-or-undefined.h"
 
 /* This file implements the loop unswitching, i.e. transformation of loops like
 
@@ -77,7 +78,8 @@  along with GCC; see the file COPYING3.  If not see
 
 static struct loop *tree_unswitch_loop (struct loop *, basic_block, tree);
 static bool tree_unswitch_single_loop (struct loop *, int);
-static tree tree_may_unswitch_on (basic_block, struct loop *);
+static tree tree_may_unswitch_on (basic_block, struct loop *,
+				  defined_or_undefined *);
 static bool tree_unswitch_outer_loop (struct loop *);
 static edge find_loop_guard (struct loop *);
 static bool empty_bb_without_guard_p (struct loop *, basic_block);
@@ -94,6 +96,9 @@  tree_ssa_unswitch_loops (void)
   struct loop *loop;
   bool changed = false;
 
+  /* Used for defined_or_undefined use later.  */
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+
   /* Go through all loops starting from innermost.  */
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
     {
@@ -104,6 +109,8 @@  tree_ssa_unswitch_loops (void)
 	changed |= tree_unswitch_outer_loop (loop);
     }
 
+  free_dominance_info (CDI_POST_DOMINATORS);
+
   if (changed)
     return TODO_cleanup_cfg;
   return 0;
@@ -113,7 +120,8 @@  tree_ssa_unswitch_loops (void)
    basic blocks (for what it means see comments below).  */
 
 static tree
-tree_may_unswitch_on (basic_block bb, struct loop *loop)
+tree_may_unswitch_on (basic_block bb, struct loop *loop,
+		      defined_or_undefined *defined_or_undefined)
 {
   gimple *last, *def;
   gcond *stmt;
@@ -138,7 +146,7 @@  tree_may_unswitch_on (basic_block bb, struct loop *loop)
     {
       /* Unswitching on undefined values would introduce undefined
 	 behavior that the original program might never exercise.  */
-      if (ssa_undefined_value_p (use, true))
+      if (defined_or_undefined->is_maybe_undefined (use))
 	return NULL_TREE;
       def = SSA_NAME_DEF_STMT (use);
       def_bb = gimple_bb (def);
@@ -200,6 +208,7 @@  tree_unswitch_single_loop (struct loop *loop, int num)
   gimple *stmt;
   bool changed = false;
   HOST_WIDE_INT iterations;
+  defined_or_undefined defined_or_undefined (loop->header);
 
   /* Perform initial tests if unswitch is eligible.  */
   if (num == 0)
@@ -243,7 +252,7 @@  tree_unswitch_single_loop (struct loop *loop, int num)
     {
       /* Find a bb to unswitch on.  */
       for (; i < loop->num_nodes; i++)
-	if ((cond = tree_may_unswitch_on (bbs[i], loop)))
+	if ((cond = tree_may_unswitch_on (bbs[i], loop, &defined_or_undefined)))
 	  break;
 
       if (i == loop->num_nodes)
@@ -363,7 +372,8 @@  tree_unswitch_single_loop (struct loop *loop, int num)
       /* Find a bb to unswitch on.  */
       for (; found < loop->num_nodes; found++)
 	if ((bbs[found]->flags & BB_REACHABLE)
-	    && (cond = tree_may_unswitch_on (bbs[found], loop)))
+	    && (cond = tree_may_unswitch_on (bbs[found], loop,
+					     &defined_or_undefined)))
 	  break;
 
       if (found == loop->num_nodes)