[2/3] Extended if-conversion
diff mbox

Message ID CAFiYyc0fVbik5mcWGOarAomBsFJMva6nXRFhsU_Pf=KufzjUmA@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Dec. 10, 2014, 2:31 p.m. UTC
On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Sorry that I forgot to delete debug dump from my fix.
> I have few questions about your comments.
>
> 1. You wrote :
>> You also still have two functions for PHI predication.  And the
>> new extended variant doesn't commonize the 2-args and general
>> path
>  Did you mean that I must combine predicate_scalar_phi and
> predicate_extended scalar phi to one function?
> Please note that if additional flag was not set up (i.e.
> aggressive_if_conv is false) extended predication is required more
> compile time since it builds hash_map.

It's compile-time complexity is reasonable enough even for
non-aggressive if-conversion.

> 2. About critical edge splitting.
>
> Did you mean that we should perform it (1) under aggressive_if_conv
> option only; (2) should we split all critical edges.
> Note that this leads to recomputing of topological order.

Well, I don't mind splitting all critical edges unconditionally, thus
do something like


> It is worth noting that in current implementation bb's with 2
> predecessors and both are on critical edges are accepted without
> additional option.

Yes, I know.

tree-if-conv.c is a mess right now and if we can avoid adding more
to it and even fix the critical edge missed optimization with splitting
critical edges then I am all for that solution.

Richard.

> Thanks ahead.
> Yuri.
> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Here is updated patch2 with the following changes:
>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>> 2. Use only one function for extended predication -
>>> predicate_extended_scalar_phi.
>>> 3. Save gsi before insertion of predicate computations for basic
>>> blocks if it has 2 predecessors and
>>> both incoming edges are critical or it gas more than 2 predecessors
>>> and at least one incoming edge
>>> is critical. This saved iterator can be used by extended phi predication.
>>>
>>> Here is motivated test-case which explains this point.
>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>> -ftree-loop-vectorize -fopenmp options.
>>> The problem phi is in bb-7:
>>>
>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>   {
>>>     <bb 5>:
>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>     if (xmax_17 == xmax_27)
>>>       goto <bb 7>;
>>>     else
>>>       goto <bb 9>;
>>>
>>>   }
>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>   {
>>>     <bb 6>:
>>>     if (xmax_17 == xmax_27)
>>>       goto <bb 7>;
>>>     else
>>>       goto <bb 8>;
>>>
>>>   }
>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>   {
>>>     <bb 7>:
>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>     goto <bb 11>;
>>>
>>>   }
>>>
>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>> restoring gsi in predicate_all_scalar_phi:
>>> #if 0
>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>    gsi = bb_insert_point (bb);
>>>  else
>>> #endif
>>>    gsi = gsi_after_labels (bb);
>>>
>>> we will get ICE:
>>> t5.c: In function 'foo':
>>> t5.c:9:6: error: definition in block 4 follows the use
>>>  void foo (int n)
>>>       ^
>>> for SSA_NAME: _1 in statement:
>>> _52 = _1 & _3;
>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>
>>> smce predicate computations were inserted in bb_7.
>>
>> The issue is obviously that the predicates have already been emitted
>> in the target BB - that's of course the wrong place.  This is done
>> by insert_gimplified_predicates.
>>
>> This just shows how edge predicate handling is broken - we don't
>> seem to have a sequence of gimplified stmts for edge predicates
>> but push those to e->dest which makes this really messy.
>>
>> Rather than having a separate phase where we insert all
>> gimplified bb predicates we should do that on-demand when
>> predicating a PHI.
>>
>> Your patch writes to stderr - that's bad - use dump_file and guard
>> the printfs properly.
>>
>> You also still have two functions for PHI predication.  And the
>> new extended variant doesn't commonize the 2-args and general
>> paths.
>>
>> I'm not at all happy with this code.  It may be existing if-conv codes
>> fault but making it even worse is not an option.
>>
>> Again - what's wrong with simply splitting critical edges if
>> aggressive_if_conv?  I think that would very much simplify
>> things here.  Or alternatively use gsi_insert_on_edge and
>> commit edge insertions before merging the blocks.
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog is
>>>
>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c : Include hash-map.h.
>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>> statement iterator.
>>> (bb_insert_point): New function.
>>> (set_bb_insert_point): New function.
>>> (has_pred_critical_p): New function.
>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>> AGGRESSIVE_IF_CONV is true.
>>> (if_convertible_bb_p): Delete check that bb has at least one
>>> non-critical incoming edge.
>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>> Allow interchange PHI arguments if EXTENDED is false.
>>> Change check that block containing reduction statement candidate
>>> is predecessor of phi-block since phi may have more than two arguments.
>>> (predicate_scalar_phi): Add new arguments for call of
>>> is_cond_scalar_reduction.
>>> (get_predicate_for_edge): New function.
>>> (struct phi_args_hash_traits): New type.
>>> (phi_args_hash_traits::hash): New function.
>>> (phi_args_hash_traits::equal_keys): New function.
>>> (gen_phi_arg_condition): New function.
>>> (predicate_extended_scalar_phi): New function.
>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>> to true if BB containing phi has more than 2 predecessors or both
>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>> has 2 predecessors and both incoming edges are critical or it has more
>>> than 2 predecessors and atleast one incoming edge is critical.
>>> Use standard gsi_after_labels otherwise.
>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>> to save gsi before insertion of predicate computations. SEt-up it to
>>> true for BB with 2 predecessors and critical incoming edges either
>>>         number of predecessors is geater 2 and at least one incoming edge is
>>> critical.
>>> Add check that non-predicated block may have statements to insert.
>>> Insert predicate computation of BB just after label if
>>> EXTENDED_PREDICATION is true.
>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>> is copy of inner or outer loop force_vectorize field.
>>>
>>>
>>>
>>>
>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>
>>>>> typedef struct bb_predicate_s {
>>>>>
>>>>>   /* The condition under which this basic block is executed.  */
>>>>>   tree predicate;
>>>>>
>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>
>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>   gimple_stmt_iterator gsi;
>>>>> } *bb_predicate_p;
>>>>>
>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>> insertion code for predicate computation. I checked that this fix
>>>>> works.
>>>>
>>>> Huh?  I still wonder what the issue is with inserting everything
>>>> after the PHI we predicate.
>>>>
>>>> Well, your updated patch will come with testcases for the testsuite
>>>> that will hopefully fail if doing that.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> Now I am implementing merging of predicate_extended.. and
>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>
>>>>> Best regards.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Thanks Richard for your quick reply!
>>>>>>>
>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>> splitting is not a good decision.
>>>>>>
>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> Best regards.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Hi Richard,
>>>>>>>>>
>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>
>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>
>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>
>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>   {
>>>>>>>>>     float t = a[i];
>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>> res += 1;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>
>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>
>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>
>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>
>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>
>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>
>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>
>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>> predicates, e.g.
>>>>>>>>>
>>>>>>>>> <bb 7>:
>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>> _48 = _46 & _47;
>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>> _54 = ~_53;
>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>> _56 = _54 & _55;
>>>>>>>>> _57 = _48 | _56;
>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>> goto <bb 11>;
>>>>>>>>>
>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>> block end.
>>>>>>>>
>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>> for critical edges unless you split them).
>>>>>>>>
>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>> GENERIC expressions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>
>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>
>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>
>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>
>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>
>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>
>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>
>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

Comments

Yuri Rumyantsev Dec. 10, 2014, 3:22 p.m. UTC | #1
Richard,

Thanks for your reply!

I didn't understand your point:

Well, I don't mind splitting all critical edges unconditionally

but you do it unconditionally in proposed patch. Also I assume that
call of split_critical_edges() can break ssa. For example, we can
split headers of loops, loop exit blocks etc. I prefer to do something
more loop-specialized, e.g. call edge_split() for critical edges
outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
destination bb belongs to loop).


2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Sorry that I forgot to delete debug dump from my fix.
>> I have few questions about your comments.
>>
>> 1. You wrote :
>>> You also still have two functions for PHI predication.  And the
>>> new extended variant doesn't commonize the 2-args and general
>>> path
>>  Did you mean that I must combine predicate_scalar_phi and
>> predicate_extended scalar phi to one function?
>> Please note that if additional flag was not set up (i.e.
>> aggressive_if_conv is false) extended predication is required more
>> compile time since it builds hash_map.
>
> It's compile-time complexity is reasonable enough even for
> non-aggressive if-conversion.
>
>> 2. About critical edge splitting.
>>
>> Did you mean that we should perform it (1) under aggressive_if_conv
>> option only; (2) should we split all critical edges.
>> Note that this leads to recomputing of topological order.
>
> Well, I don't mind splitting all critical edges unconditionally, thus
> do something like
>
> Index: gcc/tree-if-conv.c
> ===================================================================
> --- gcc/tree-if-conv.c  (revision 218515)
> +++ gcc/tree-if-conv.c  (working copy)
> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>    if (number_of_loops (fun) <= 1)
>      return 0;
>
> +  bool critical_edges_split_p = false;
>    FOR_EACH_LOOP (loop, 0)
>      if (flag_tree_loop_if_convert == 1
>         || flag_tree_loop_if_convert_stores == 1
>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>             && !loop->dont_vectorize))
> -      todo |= tree_if_conversion (loop);
> +      {
> +       if (!critical_edges_split_p)
> +         {
> +           split_critical_edges ();
> +           critical_edges_split_p = true;
> +           todo |= TODO_cleanup_cfg;
> +         }
> +       todo |= tree_if_conversion (loop);
> +      }
>
>  #ifdef ENABLE_CHECKING
>    {
>
>> It is worth noting that in current implementation bb's with 2
>> predecessors and both are on critical edges are accepted without
>> additional option.
>
> Yes, I know.
>
> tree-if-conv.c is a mess right now and if we can avoid adding more
> to it and even fix the critical edge missed optimization with splitting
> critical edges then I am all for that solution.
>
> Richard.
>
>> Thanks ahead.
>> Yuri.
>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Here is updated patch2 with the following changes:
>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>> 2. Use only one function for extended predication -
>>>> predicate_extended_scalar_phi.
>>>> 3. Save gsi before insertion of predicate computations for basic
>>>> blocks if it has 2 predecessors and
>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>> and at least one incoming edge
>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>
>>>> Here is motivated test-case which explains this point.
>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>> -ftree-loop-vectorize -fopenmp options.
>>>> The problem phi is in bb-7:
>>>>
>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>   {
>>>>     <bb 5>:
>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>     if (xmax_17 == xmax_27)
>>>>       goto <bb 7>;
>>>>     else
>>>>       goto <bb 9>;
>>>>
>>>>   }
>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>   {
>>>>     <bb 6>:
>>>>     if (xmax_17 == xmax_27)
>>>>       goto <bb 7>;
>>>>     else
>>>>       goto <bb 8>;
>>>>
>>>>   }
>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>   {
>>>>     <bb 7>:
>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>     goto <bb 11>;
>>>>
>>>>   }
>>>>
>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>> restoring gsi in predicate_all_scalar_phi:
>>>> #if 0
>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>    gsi = bb_insert_point (bb);
>>>>  else
>>>> #endif
>>>>    gsi = gsi_after_labels (bb);
>>>>
>>>> we will get ICE:
>>>> t5.c: In function 'foo':
>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>  void foo (int n)
>>>>       ^
>>>> for SSA_NAME: _1 in statement:
>>>> _52 = _1 & _3;
>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>
>>>> smce predicate computations were inserted in bb_7.
>>>
>>> The issue is obviously that the predicates have already been emitted
>>> in the target BB - that's of course the wrong place.  This is done
>>> by insert_gimplified_predicates.
>>>
>>> This just shows how edge predicate handling is broken - we don't
>>> seem to have a sequence of gimplified stmts for edge predicates
>>> but push those to e->dest which makes this really messy.
>>>
>>> Rather than having a separate phase where we insert all
>>> gimplified bb predicates we should do that on-demand when
>>> predicating a PHI.
>>>
>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>> the printfs properly.
>>>
>>> You also still have two functions for PHI predication.  And the
>>> new extended variant doesn't commonize the 2-args and general
>>> paths.
>>>
>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>> fault but making it even worse is not an option.
>>>
>>> Again - what's wrong with simply splitting critical edges if
>>> aggressive_if_conv?  I think that would very much simplify
>>> things here.  Or alternatively use gsi_insert_on_edge and
>>> commit edge insertions before merging the blocks.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> ChangeLog is
>>>>
>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-if-conv.c : Include hash-map.h.
>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>> statement iterator.
>>>> (bb_insert_point): New function.
>>>> (set_bb_insert_point): New function.
>>>> (has_pred_critical_p): New function.
>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>> AGGRESSIVE_IF_CONV is true.
>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>> non-critical incoming edge.
>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>> Change check that block containing reduction statement candidate
>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>> (predicate_scalar_phi): Add new arguments for call of
>>>> is_cond_scalar_reduction.
>>>> (get_predicate_for_edge): New function.
>>>> (struct phi_args_hash_traits): New type.
>>>> (phi_args_hash_traits::hash): New function.
>>>> (phi_args_hash_traits::equal_keys): New function.
>>>> (gen_phi_arg_condition): New function.
>>>> (predicate_extended_scalar_phi): New function.
>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>> to true if BB containing phi has more than 2 predecessors or both
>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>> Use standard gsi_after_labels otherwise.
>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>> critical.
>>>> Add check that non-predicated block may have statements to insert.
>>>> Insert predicate computation of BB just after label if
>>>> EXTENDED_PREDICATION is true.
>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>> is copy of inner or outer loop force_vectorize field.
>>>>
>>>>
>>>>
>>>>
>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>
>>>>>> typedef struct bb_predicate_s {
>>>>>>
>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>   tree predicate;
>>>>>>
>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>
>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>   gimple_stmt_iterator gsi;
>>>>>> } *bb_predicate_p;
>>>>>>
>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>> works.
>>>>>
>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>> after the PHI we predicate.
>>>>>
>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>> that will hopefully fail if doing that.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>
>>>>>> Best regards.
>>>>>> Yuri.
>>>>>>
>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>
>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>> splitting is not a good decision.
>>>>>>>
>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Hi Richard,
>>>>>>>>>>
>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>
>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>
>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>
>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>   {
>>>>>>>>>>     float t = a[i];
>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>> res += 1;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>
>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>
>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>
>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>
>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>
>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>
>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>
>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>> predicates, e.g.
>>>>>>>>>>
>>>>>>>>>> <bb 7>:
>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>> _54 = ~_53;
>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>
>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>> block end.
>>>>>>>>>
>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>
>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>> GENERIC expressions.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>
>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>
>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>
>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>
>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>
>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
Richard Biener Dec. 11, 2014, 8:59 a.m. UTC | #2
On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Thanks for your reply!
>
> I didn't understand your point:
>
> Well, I don't mind splitting all critical edges unconditionally
>
> but you do it unconditionally in proposed patch.

I don't mind means I am fine with it.

> Also I assume that
> call of split_critical_edges() can break ssa. For example, we can
> split headers of loops, loop exit blocks etc.

How does that "break SSA"?  You mean loop-closed SSA?  I'd
be surprised if so but that may be possible.

> I prefer to do something
> more loop-specialized, e.g. call edge_split() for critical edges
> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
> destination bb belongs to loop).

That works for me as well but it is more complicated to implement.
Ideally you'd only split one edge if you find a block with only critical
predecessors (where we'd currently give up).  But note that this
requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
will change loop->num_nodes so we have to be more careful in
constructing the loop calling if_convertible_bb_p.

Richard.

>
> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Sorry that I forgot to delete debug dump from my fix.
>>> I have few questions about your comments.
>>>
>>> 1. You wrote :
>>>> You also still have two functions for PHI predication.  And the
>>>> new extended variant doesn't commonize the 2-args and general
>>>> path
>>>  Did you mean that I must combine predicate_scalar_phi and
>>> predicate_extended scalar phi to one function?
>>> Please note that if additional flag was not set up (i.e.
>>> aggressive_if_conv is false) extended predication is required more
>>> compile time since it builds hash_map.
>>
>> It's compile-time complexity is reasonable enough even for
>> non-aggressive if-conversion.
>>
>>> 2. About critical edge splitting.
>>>
>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>> option only; (2) should we split all critical edges.
>>> Note that this leads to recomputing of topological order.
>>
>> Well, I don't mind splitting all critical edges unconditionally, thus
>> do something like
>>
>> Index: gcc/tree-if-conv.c
>> ===================================================================
>> --- gcc/tree-if-conv.c  (revision 218515)
>> +++ gcc/tree-if-conv.c  (working copy)
>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>    if (number_of_loops (fun) <= 1)
>>      return 0;
>>
>> +  bool critical_edges_split_p = false;
>>    FOR_EACH_LOOP (loop, 0)
>>      if (flag_tree_loop_if_convert == 1
>>         || flag_tree_loop_if_convert_stores == 1
>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>             && !loop->dont_vectorize))
>> -      todo |= tree_if_conversion (loop);
>> +      {
>> +       if (!critical_edges_split_p)
>> +         {
>> +           split_critical_edges ();
>> +           critical_edges_split_p = true;
>> +           todo |= TODO_cleanup_cfg;
>> +         }
>> +       todo |= tree_if_conversion (loop);
>> +      }
>>
>>  #ifdef ENABLE_CHECKING
>>    {
>>
>>> It is worth noting that in current implementation bb's with 2
>>> predecessors and both are on critical edges are accepted without
>>> additional option.
>>
>> Yes, I know.
>>
>> tree-if-conv.c is a mess right now and if we can avoid adding more
>> to it and even fix the critical edge missed optimization with splitting
>> critical edges then I am all for that solution.
>>
>> Richard.
>>
>>> Thanks ahead.
>>> Yuri.
>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Here is updated patch2 with the following changes:
>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>> 2. Use only one function for extended predication -
>>>>> predicate_extended_scalar_phi.
>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>> blocks if it has 2 predecessors and
>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>> and at least one incoming edge
>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>
>>>>> Here is motivated test-case which explains this point.
>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>> The problem phi is in bb-7:
>>>>>
>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>   {
>>>>>     <bb 5>:
>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>     if (xmax_17 == xmax_27)
>>>>>       goto <bb 7>;
>>>>>     else
>>>>>       goto <bb 9>;
>>>>>
>>>>>   }
>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>   {
>>>>>     <bb 6>:
>>>>>     if (xmax_17 == xmax_27)
>>>>>       goto <bb 7>;
>>>>>     else
>>>>>       goto <bb 8>;
>>>>>
>>>>>   }
>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>   {
>>>>>     <bb 7>:
>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>     goto <bb 11>;
>>>>>
>>>>>   }
>>>>>
>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>> #if 0
>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>    gsi = bb_insert_point (bb);
>>>>>  else
>>>>> #endif
>>>>>    gsi = gsi_after_labels (bb);
>>>>>
>>>>> we will get ICE:
>>>>> t5.c: In function 'foo':
>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>  void foo (int n)
>>>>>       ^
>>>>> for SSA_NAME: _1 in statement:
>>>>> _52 = _1 & _3;
>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>
>>>>> smce predicate computations were inserted in bb_7.
>>>>
>>>> The issue is obviously that the predicates have already been emitted
>>>> in the target BB - that's of course the wrong place.  This is done
>>>> by insert_gimplified_predicates.
>>>>
>>>> This just shows how edge predicate handling is broken - we don't
>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>> but push those to e->dest which makes this really messy.
>>>>
>>>> Rather than having a separate phase where we insert all
>>>> gimplified bb predicates we should do that on-demand when
>>>> predicating a PHI.
>>>>
>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>> the printfs properly.
>>>>
>>>> You also still have two functions for PHI predication.  And the
>>>> new extended variant doesn't commonize the 2-args and general
>>>> paths.
>>>>
>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>> fault but making it even worse is not an option.
>>>>
>>>> Again - what's wrong with simply splitting critical edges if
>>>> aggressive_if_conv?  I think that would very much simplify
>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>> commit edge insertions before merging the blocks.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> ChangeLog is
>>>>>
>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>> statement iterator.
>>>>> (bb_insert_point): New function.
>>>>> (set_bb_insert_point): New function.
>>>>> (has_pred_critical_p): New function.
>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>> AGGRESSIVE_IF_CONV is true.
>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>> non-critical incoming edge.
>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>> Change check that block containing reduction statement candidate
>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>> is_cond_scalar_reduction.
>>>>> (get_predicate_for_edge): New function.
>>>>> (struct phi_args_hash_traits): New type.
>>>>> (phi_args_hash_traits::hash): New function.
>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>> (gen_phi_arg_condition): New function.
>>>>> (predicate_extended_scalar_phi): New function.
>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>> Use standard gsi_after_labels otherwise.
>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>> critical.
>>>>> Add check that non-predicated block may have statements to insert.
>>>>> Insert predicate computation of BB just after label if
>>>>> EXTENDED_PREDICATION is true.
>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>
>>>>>>> typedef struct bb_predicate_s {
>>>>>>>
>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>   tree predicate;
>>>>>>>
>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>
>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>> } *bb_predicate_p;
>>>>>>>
>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>> works.
>>>>>>
>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>> after the PHI we predicate.
>>>>>>
>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>> that will hopefully fail if doing that.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>
>>>>>>> Best regards.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>
>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>> splitting is not a good decision.
>>>>>>>>
>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>
>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>
>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>
>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>
>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>   {
>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>> res += 1;
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>
>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>
>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>
>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>
>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>
>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>
>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>
>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>
>>>>>>>>>>> <bb 7>:
>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>
>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>> block end.
>>>>>>>>>>
>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>
>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>
>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>
>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>
>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
Yuri Rumyantsev Dec. 16, 2014, 3:15 p.m. UTC | #3
Hi Richard,

Here is updated patch which includes
(1) split critical edges for aggressive if conversion.
(2) delete all stuff related to support of critical edge predication.
(3) only one function - predicate_scalar_phi performs predication.
(4) function find_phi_replacement_condition was deleted since it was
included in predicate_scalar_phi for phi with two arguments.

I checked that patch works in stress testing mode, i.e. with
aggressive if conversion by default.

What is your opinion?

Thanks.
Yuri.

2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Thanks for your reply!
>>
>> I didn't understand your point:
>>
>> Well, I don't mind splitting all critical edges unconditionally
>>
>> but you do it unconditionally in proposed patch.
>
> I don't mind means I am fine with it.
>
>> Also I assume that
>> call of split_critical_edges() can break ssa. For example, we can
>> split headers of loops, loop exit blocks etc.
>
> How does that "break SSA"?  You mean loop-closed SSA?  I'd
> be surprised if so but that may be possible.
>
>> I prefer to do something
>> more loop-specialized, e.g. call edge_split() for critical edges
>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>> destination bb belongs to loop).
>
> That works for me as well but it is more complicated to implement.
> Ideally you'd only split one edge if you find a block with only critical
> predecessors (where we'd currently give up).  But note that this
> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
> will change loop->num_nodes so we have to be more careful in
> constructing the loop calling if_convertible_bb_p.
>
> Richard.
>
>>
>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Sorry that I forgot to delete debug dump from my fix.
>>>> I have few questions about your comments.
>>>>
>>>> 1. You wrote :
>>>>> You also still have two functions for PHI predication.  And the
>>>>> new extended variant doesn't commonize the 2-args and general
>>>>> path
>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>> predicate_extended scalar phi to one function?
>>>> Please note that if additional flag was not set up (i.e.
>>>> aggressive_if_conv is false) extended predication is required more
>>>> compile time since it builds hash_map.
>>>
>>> It's compile-time complexity is reasonable enough even for
>>> non-aggressive if-conversion.
>>>
>>>> 2. About critical edge splitting.
>>>>
>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>> option only; (2) should we split all critical edges.
>>>> Note that this leads to recomputing of topological order.
>>>
>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>> do something like
>>>
>>> Index: gcc/tree-if-conv.c
>>> ===================================================================
>>> --- gcc/tree-if-conv.c  (revision 218515)
>>> +++ gcc/tree-if-conv.c  (working copy)
>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>    if (number_of_loops (fun) <= 1)
>>>      return 0;
>>>
>>> +  bool critical_edges_split_p = false;
>>>    FOR_EACH_LOOP (loop, 0)
>>>      if (flag_tree_loop_if_convert == 1
>>>         || flag_tree_loop_if_convert_stores == 1
>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>             && !loop->dont_vectorize))
>>> -      todo |= tree_if_conversion (loop);
>>> +      {
>>> +       if (!critical_edges_split_p)
>>> +         {
>>> +           split_critical_edges ();
>>> +           critical_edges_split_p = true;
>>> +           todo |= TODO_cleanup_cfg;
>>> +         }
>>> +       todo |= tree_if_conversion (loop);
>>> +      }
>>>
>>>  #ifdef ENABLE_CHECKING
>>>    {
>>>
>>>> It is worth noting that in current implementation bb's with 2
>>>> predecessors and both are on critical edges are accepted without
>>>> additional option.
>>>
>>> Yes, I know.
>>>
>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>> to it and even fix the critical edge missed optimization with splitting
>>> critical edges then I am all for that solution.
>>>
>>> Richard.
>>>
>>>> Thanks ahead.
>>>> Yuri.
>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Here is updated patch2 with the following changes:
>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>> 2. Use only one function for extended predication -
>>>>>> predicate_extended_scalar_phi.
>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>> blocks if it has 2 predecessors and
>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>> and at least one incoming edge
>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>
>>>>>> Here is motivated test-case which explains this point.
>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>> The problem phi is in bb-7:
>>>>>>
>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>   {
>>>>>>     <bb 5>:
>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>     if (xmax_17 == xmax_27)
>>>>>>       goto <bb 7>;
>>>>>>     else
>>>>>>       goto <bb 9>;
>>>>>>
>>>>>>   }
>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>   {
>>>>>>     <bb 6>:
>>>>>>     if (xmax_17 == xmax_27)
>>>>>>       goto <bb 7>;
>>>>>>     else
>>>>>>       goto <bb 8>;
>>>>>>
>>>>>>   }
>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>   {
>>>>>>     <bb 7>:
>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>     goto <bb 11>;
>>>>>>
>>>>>>   }
>>>>>>
>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>> #if 0
>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>    gsi = bb_insert_point (bb);
>>>>>>  else
>>>>>> #endif
>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>
>>>>>> we will get ICE:
>>>>>> t5.c: In function 'foo':
>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>  void foo (int n)
>>>>>>       ^
>>>>>> for SSA_NAME: _1 in statement:
>>>>>> _52 = _1 & _3;
>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>
>>>>>> smce predicate computations were inserted in bb_7.
>>>>>
>>>>> The issue is obviously that the predicates have already been emitted
>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>> by insert_gimplified_predicates.
>>>>>
>>>>> This just shows how edge predicate handling is broken - we don't
>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>> but push those to e->dest which makes this really messy.
>>>>>
>>>>> Rather than having a separate phase where we insert all
>>>>> gimplified bb predicates we should do that on-demand when
>>>>> predicating a PHI.
>>>>>
>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>> the printfs properly.
>>>>>
>>>>> You also still have two functions for PHI predication.  And the
>>>>> new extended variant doesn't commonize the 2-args and general
>>>>> paths.
>>>>>
>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>> fault but making it even worse is not an option.
>>>>>
>>>>> Again - what's wrong with simply splitting critical edges if
>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>> commit edge insertions before merging the blocks.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> ChangeLog is
>>>>>>
>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>> statement iterator.
>>>>>> (bb_insert_point): New function.
>>>>>> (set_bb_insert_point): New function.
>>>>>> (has_pred_critical_p): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>> non-critical incoming edge.
>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>> Change check that block containing reduction statement candidate
>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>> is_cond_scalar_reduction.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (struct phi_args_hash_traits): New type.
>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>> (gen_phi_arg_condition): New function.
>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>> Use standard gsi_after_labels otherwise.
>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>>> critical.
>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>> Insert predicate computation of BB just after label if
>>>>>> EXTENDED_PREDICATION is true.
>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>
>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>
>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>   tree predicate;
>>>>>>>>
>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>
>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>> } *bb_predicate_p;
>>>>>>>>
>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>> works.
>>>>>>>
>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>> after the PHI we predicate.
>>>>>>>
>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>> that will hopefully fail if doing that.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>
>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>
>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>
>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>
>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>   {
>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>
>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>
>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>
>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>
>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>
>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>
>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>
>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>
>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>
>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>> block end.
>>>>>>>>>>>
>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>
>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>
>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>
>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
Richard Biener Dec. 17, 2014, 3:41 p.m. UTC | #4
On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi Richard,
>
> Here is updated patch which includes
> (1) split critical edges for aggressive if conversion.
> (2) delete all stuff related to support of critical edge predication.
> (3) only one function - predicate_scalar_phi performs predication.
> (4) function find_phi_replacement_condition was deleted since it was
> included in predicate_scalar_phi for phi with two arguments.
>
> I checked that patch works in stress testing mode, i.e. with
> aggressive if conversion by default.
>
> What is your opinion?

Looks ok overall, but please simply do

  FOR_EACH_EDGE (e, ei, bb->succs)
    if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
      split_edge (e);

for all blocks apart from the latch.

Can you please send a combined patch up to this one?  Looking at
the incremental diff is somewhat hard.  Thus a patch including all
patches from patch1 to this one.

Thanks,
Richard.

>
> Thanks.
> Yuri.
>
> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Thanks for your reply!
>>>
>>> I didn't understand your point:
>>>
>>> Well, I don't mind splitting all critical edges unconditionally
>>>
>>> but you do it unconditionally in proposed patch.
>>
>> I don't mind means I am fine with it.
>>
>>> Also I assume that
>>> call of split_critical_edges() can break ssa. For example, we can
>>> split headers of loops, loop exit blocks etc.
>>
>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>> be surprised if so but that may be possible.
>>
>>> I prefer to do something
>>> more loop-specialized, e.g. call edge_split() for critical edges
>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>> destination bb belongs to loop).
>>
>> That works for me as well but it is more complicated to implement.
>> Ideally you'd only split one edge if you find a block with only critical
>> predecessors (where we'd currently give up).  But note that this
>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>> will change loop->num_nodes so we have to be more careful in
>> constructing the loop calling if_convertible_bb_p.
>>
>> Richard.
>>
>>>
>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>> I have few questions about your comments.
>>>>>
>>>>> 1. You wrote :
>>>>>> You also still have two functions for PHI predication.  And the
>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>> path
>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>> predicate_extended scalar phi to one function?
>>>>> Please note that if additional flag was not set up (i.e.
>>>>> aggressive_if_conv is false) extended predication is required more
>>>>> compile time since it builds hash_map.
>>>>
>>>> It's compile-time complexity is reasonable enough even for
>>>> non-aggressive if-conversion.
>>>>
>>>>> 2. About critical edge splitting.
>>>>>
>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>> option only; (2) should we split all critical edges.
>>>>> Note that this leads to recomputing of topological order.
>>>>
>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>> do something like
>>>>
>>>> Index: gcc/tree-if-conv.c
>>>> ===================================================================
>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>    if (number_of_loops (fun) <= 1)
>>>>      return 0;
>>>>
>>>> +  bool critical_edges_split_p = false;
>>>>    FOR_EACH_LOOP (loop, 0)
>>>>      if (flag_tree_loop_if_convert == 1
>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>             && !loop->dont_vectorize))
>>>> -      todo |= tree_if_conversion (loop);
>>>> +      {
>>>> +       if (!critical_edges_split_p)
>>>> +         {
>>>> +           split_critical_edges ();
>>>> +           critical_edges_split_p = true;
>>>> +           todo |= TODO_cleanup_cfg;
>>>> +         }
>>>> +       todo |= tree_if_conversion (loop);
>>>> +      }
>>>>
>>>>  #ifdef ENABLE_CHECKING
>>>>    {
>>>>
>>>>> It is worth noting that in current implementation bb's with 2
>>>>> predecessors and both are on critical edges are accepted without
>>>>> additional option.
>>>>
>>>> Yes, I know.
>>>>
>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>> to it and even fix the critical edge missed optimization with splitting
>>>> critical edges then I am all for that solution.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks ahead.
>>>>> Yuri.
>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Here is updated patch2 with the following changes:
>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>> 2. Use only one function for extended predication -
>>>>>>> predicate_extended_scalar_phi.
>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>> blocks if it has 2 predecessors and
>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>> and at least one incoming edge
>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>
>>>>>>> Here is motivated test-case which explains this point.
>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>> The problem phi is in bb-7:
>>>>>>>
>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>   {
>>>>>>>     <bb 5>:
>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>       goto <bb 7>;
>>>>>>>     else
>>>>>>>       goto <bb 9>;
>>>>>>>
>>>>>>>   }
>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>   {
>>>>>>>     <bb 6>:
>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>       goto <bb 7>;
>>>>>>>     else
>>>>>>>       goto <bb 8>;
>>>>>>>
>>>>>>>   }
>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>   {
>>>>>>>     <bb 7>:
>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>     goto <bb 11>;
>>>>>>>
>>>>>>>   }
>>>>>>>
>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>> #if 0
>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>  else
>>>>>>> #endif
>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>
>>>>>>> we will get ICE:
>>>>>>> t5.c: In function 'foo':
>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>  void foo (int n)
>>>>>>>       ^
>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>> _52 = _1 & _3;
>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>
>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>
>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>> by insert_gimplified_predicates.
>>>>>>
>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>> but push those to e->dest which makes this really messy.
>>>>>>
>>>>>> Rather than having a separate phase where we insert all
>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>> predicating a PHI.
>>>>>>
>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>> the printfs properly.
>>>>>>
>>>>>> You also still have two functions for PHI predication.  And the
>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>> paths.
>>>>>>
>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>> fault but making it even worse is not an option.
>>>>>>
>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>> commit edge insertions before merging the blocks.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> ChangeLog is
>>>>>>>
>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>> statement iterator.
>>>>>>> (bb_insert_point): New function.
>>>>>>> (set_bb_insert_point): New function.
>>>>>>> (has_pred_critical_p): New function.
>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>> non-critical incoming edge.
>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>> Change check that block containing reduction statement candidate
>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>> is_cond_scalar_reduction.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>>>> critical.
>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>> Insert predicate computation of BB just after label if
>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>
>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>
>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>   tree predicate;
>>>>>>>>>
>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>
>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>
>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>> works.
>>>>>>>>
>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>> after the PHI we predicate.
>>>>>>>>
>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>
>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>
>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>
>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>
>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>
>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>
>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>
>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>> block end.
>>>>>>>>>>>>
>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>
>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
Yuri Rumyantsev Dec. 18, 2014, 1:45 p.m. UTC | #5
Richard,

I am sending you full patch (~1000 lines) but if you need only patch.1
and patch.2 will let me know and i'll send you reduced patch.

Below are few comments regarding your remarks for patch.3.

1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
when dead code elimination is required to vectorize loop, i.e. dead
statement is marked as relevant.
2. You wrote:
> The "retry" code also looks odd - why do you walk the BB multiple
> times instead of just doing sth like
>
>  while (!has_single_use (lhs))
>    {
>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>      ifcvt_walk_pattern_tree (copy);
>    }
>
> thus returning the copy you create and re-process it (the copy should
> now have a single-use).

The problem is that not only top SSA_NAME (lhs) may have multiple uses
but some intermediate variables too. For example, for the following
test-case

float a[1000];
int c[1000];

int foo()
{
  int i, res = 0;
#pragma omp simd safelen(8)
  for (i=0; i<512; i++)
  {
    float t = a[i];
    if (t > 0.0f & t < 1.0e+17f)
      if (c[i] != 0)
res += 1;
  }
  return res;
}

After combine_blocks we have the following bb:

<bb 3>:
# res_15 = PHI <res_1(7), 0(15)>
# i_16 = PHI <i_11(7), 0(15)>
# ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
t_5 = a[i_16];
_6 = t_5 > 0.0;
_7 = t_5 < 9.9999998430674944e+16;
_8 = _6 & _7;
_10 = &c[i_16];
_ifc__32 = _8 ? 4294967295 : 0;
_9 = MASK_LOAD (_10, 0B, _ifc__32);
_28 = _8;
_29 = _9 != 0;
_30 = _28 & _29;
_ifc__31 = _30 ? 1 : 0;
res_1 = res_15 + _ifc__31;
i_11 = i_16 + 1;
ivtmp_13 = ivtmp_14 - 1;
if (ivtmp_13 != 0)
  goto <bb 7>;
else
  goto <bb 8>;

and we can see that _8 has multiple uses. Also note that after splitting of
_8 = _6 & _7
we also get multiple uses for definition of  _6 and _7. So I used this
iterative algorithm as the simplest one.

I think it would be nice to re-use some utility from tree-vect-patterns.c
for stmt_is_root_of_bool_pattern.

I assume that function stmt_is_root_of_bool_pattern can be simplified
to check on COND_EXPR only since PHI predication and memory access
predication produced only such statements,i.e. it can look like

static bool
stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
{
  enum tree_code code;
  tree lhs, rhs;

  code = gimple_assign_rhs_code (stmt);
  if (code == COND_EXPR)
    {
      rhs = gimple_assign_rhs1 (stmt);
      if (TREE_CODE (rhs) != SSA_NAME)
return false;
      *var = rhs;
      return true;
    }
  return false;
}

I also did few minor changes in patch.2.

3. You can also notice that I inserted code in tree_if_conversion to
do loop version if explicit option "-ftree-loop-if-convert" was not
passed to compiler, i.e. we perform if-conversion for loop
vectorization only and if it does not take place, we should delete
if-converted version of loop.
What is your opinion?

Thanks.
Yuri.

2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Richard,
>>
>> Here is updated patch which includes
>> (1) split critical edges for aggressive if conversion.
>> (2) delete all stuff related to support of critical edge predication.
>> (3) only one function - predicate_scalar_phi performs predication.
>> (4) function find_phi_replacement_condition was deleted since it was
>> included in predicate_scalar_phi for phi with two arguments.
>>
>> I checked that patch works in stress testing mode, i.e. with
>> aggressive if conversion by default.
>>
>> What is your opinion?
>
> Looks ok overall, but please simply do
>
>   FOR_EACH_EDGE (e, ei, bb->succs)
>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>       split_edge (e);
>
> for all blocks apart from the latch.
>
> Can you please send a combined patch up to this one?  Looking at
> the incremental diff is somewhat hard.  Thus a patch including all
> patches from patch1 to this one.
>
> Thanks,
> Richard.
>
>>
>> Thanks.
>> Yuri.
>>
>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Thanks for your reply!
>>>>
>>>> I didn't understand your point:
>>>>
>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>
>>>> but you do it unconditionally in proposed patch.
>>>
>>> I don't mind means I am fine with it.
>>>
>>>> Also I assume that
>>>> call of split_critical_edges() can break ssa. For example, we can
>>>> split headers of loops, loop exit blocks etc.
>>>
>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>> be surprised if so but that may be possible.
>>>
>>>> I prefer to do something
>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>> destination bb belongs to loop).
>>>
>>> That works for me as well but it is more complicated to implement.
>>> Ideally you'd only split one edge if you find a block with only critical
>>> predecessors (where we'd currently give up).  But note that this
>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>> will change loop->num_nodes so we have to be more careful in
>>> constructing the loop calling if_convertible_bb_p.
>>>
>>> Richard.
>>>
>>>>
>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>> I have few questions about your comments.
>>>>>>
>>>>>> 1. You wrote :
>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>> path
>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>> predicate_extended scalar phi to one function?
>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>> compile time since it builds hash_map.
>>>>>
>>>>> It's compile-time complexity is reasonable enough even for
>>>>> non-aggressive if-conversion.
>>>>>
>>>>>> 2. About critical edge splitting.
>>>>>>
>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>> option only; (2) should we split all critical edges.
>>>>>> Note that this leads to recomputing of topological order.
>>>>>
>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>> do something like
>>>>>
>>>>> Index: gcc/tree-if-conv.c
>>>>> ===================================================================
>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>    if (number_of_loops (fun) <= 1)
>>>>>      return 0;
>>>>>
>>>>> +  bool critical_edges_split_p = false;
>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>             && !loop->dont_vectorize))
>>>>> -      todo |= tree_if_conversion (loop);
>>>>> +      {
>>>>> +       if (!critical_edges_split_p)
>>>>> +         {
>>>>> +           split_critical_edges ();
>>>>> +           critical_edges_split_p = true;
>>>>> +           todo |= TODO_cleanup_cfg;
>>>>> +         }
>>>>> +       todo |= tree_if_conversion (loop);
>>>>> +      }
>>>>>
>>>>>  #ifdef ENABLE_CHECKING
>>>>>    {
>>>>>
>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>> predecessors and both are on critical edges are accepted without
>>>>>> additional option.
>>>>>
>>>>> Yes, I know.
>>>>>
>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>> critical edges then I am all for that solution.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks ahead.
>>>>>> Yuri.
>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>>> and at least one incoming edge
>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>
>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>> The problem phi is in bb-7:
>>>>>>>>
>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>   {
>>>>>>>>     <bb 5>:
>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>       goto <bb 7>;
>>>>>>>>     else
>>>>>>>>       goto <bb 9>;
>>>>>>>>
>>>>>>>>   }
>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>   {
>>>>>>>>     <bb 6>:
>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>       goto <bb 7>;
>>>>>>>>     else
>>>>>>>>       goto <bb 8>;
>>>>>>>>
>>>>>>>>   }
>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>   {
>>>>>>>>     <bb 7>:
>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>     goto <bb 11>;
>>>>>>>>
>>>>>>>>   }
>>>>>>>>
>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>> #if 0
>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>  else
>>>>>>>> #endif
>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>
>>>>>>>> we will get ICE:
>>>>>>>> t5.c: In function 'foo':
>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>  void foo (int n)
>>>>>>>>       ^
>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>> _52 = _1 & _3;
>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>
>>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>>
>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>> by insert_gimplified_predicates.
>>>>>>>
>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>
>>>>>>> Rather than having a separate phase where we insert all
>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>> predicating a PHI.
>>>>>>>
>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>> the printfs properly.
>>>>>>>
>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>> paths.
>>>>>>>
>>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>>> fault but making it even worse is not an option.
>>>>>>>
>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> ChangeLog is
>>>>>>>>
>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>> statement iterator.
>>>>>>>> (bb_insert_point): New function.
>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>> non-critical incoming edge.
>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>> is_cond_scalar_reduction.
>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>>>>> critical.
>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>
>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>
>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>   tree predicate;
>>>>>>>>>>
>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>
>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>
>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>> works.
>>>>>>>>>
>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>> after the PHI we predicate.
>>>>>>>>>
>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>
>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>
>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>
>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>
>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>
>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
Richard Biener Dec. 19, 2014, 11:45 a.m. UTC | #6
On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I am sending you full patch (~1000 lines) but if you need only patch.1
> and patch.2 will let me know and i'll send you reduced patch.
>
> Below are few comments regarding your remarks for patch.3.
>
> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
> when dead code elimination is required to vectorize loop, i.e. dead
> statement is marked as relevant.
> 2. You wrote:
>> The "retry" code also looks odd - why do you walk the BB multiple
>> times instead of just doing sth like
>>
>>  while (!has_single_use (lhs))
>>    {
>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>      ifcvt_walk_pattern_tree (copy);
>>    }
>>
>> thus returning the copy you create and re-process it (the copy should
>> now have a single-use).
>
> The problem is that not only top SSA_NAME (lhs) may have multiple uses
> but some intermediate variables too. For example, for the following
> test-case
>
> float a[1000];
> int c[1000];
>
> int foo()
> {
>   int i, res = 0;
> #pragma omp simd safelen(8)
>   for (i=0; i<512; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f & t < 1.0e+17f)
>       if (c[i] != 0)
> res += 1;
>   }
>   return res;
> }
>
> After combine_blocks we have the following bb:
>
> <bb 3>:
> # res_15 = PHI <res_1(7), 0(15)>
> # i_16 = PHI <i_11(7), 0(15)>
> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
> t_5 = a[i_16];
> _6 = t_5 > 0.0;
> _7 = t_5 < 9.9999998430674944e+16;
> _8 = _6 & _7;
> _10 = &c[i_16];
> _ifc__32 = _8 ? 4294967295 : 0;
> _9 = MASK_LOAD (_10, 0B, _ifc__32);
> _28 = _8;
> _29 = _9 != 0;
> _30 = _28 & _29;
> _ifc__31 = _30 ? 1 : 0;
> res_1 = res_15 + _ifc__31;
> i_11 = i_16 + 1;
> ivtmp_13 = ivtmp_14 - 1;
> if (ivtmp_13 != 0)
>   goto <bb 7>;
> else
>   goto <bb 8>;
>
> and we can see that _8 has multiple uses. Also note that after splitting of
> _8 = _6 & _7
> we also get multiple uses for definition of  _6 and _7. So I used this
> iterative algorithm as the simplest one.

But it walks the entire pattern again and again while you only need to
ensure you walk the pattern tree of the now single-use DEF again
(in fact, rather than replacing a random USE in ifcvt_split_def_stmt
you should pass down the user_operand_p that you need to make
single-use).

> I think it would be nice to re-use some utility from tree-vect-patterns.c
> for stmt_is_root_of_bool_pattern.
>
> I assume that function stmt_is_root_of_bool_pattern can be simplified
> to check on COND_EXPR only since PHI predication and memory access
> predication produced only such statements,i.e. it can look like
>
> static bool
> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
> {
>   enum tree_code code;
>   tree lhs, rhs;
>
>   code = gimple_assign_rhs_code (stmt);
>   if (code == COND_EXPR)
>     {
>       rhs = gimple_assign_rhs1 (stmt);
>       if (TREE_CODE (rhs) != SSA_NAME)
> return false;
>       *var = rhs;
>       return true;
>     }
>   return false;
> }
>
> I also did few minor changes in patch.2.
>
> 3. You can also notice that I inserted code in tree_if_conversion to
> do loop version if explicit option "-ftree-loop-if-convert" was not
> passed to compiler, i.e. we perform if-conversion for loop
> vectorization only and if it does not take place, we should delete
> if-converted version of loop.
> What is your opinion?

Overall part 1 and part 2 look good to me, predicate_scalar_phi
looks in need of some refactoring to avoid duplicate code.  We can
do that a followup.

Part 3 still needs the iteration to be resolved and make the use we
actually care about single-use, not a random one so we can avoid
iterating completely.

Richard.

> Thanks.
> Yuri.
>
> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi Richard,
>>>
>>> Here is updated patch which includes
>>> (1) split critical edges for aggressive if conversion.
>>> (2) delete all stuff related to support of critical edge predication.
>>> (3) only one function - predicate_scalar_phi performs predication.
>>> (4) function find_phi_replacement_condition was deleted since it was
>>> included in predicate_scalar_phi for phi with two arguments.
>>>
>>> I checked that patch works in stress testing mode, i.e. with
>>> aggressive if conversion by default.
>>>
>>> What is your opinion?
>>
>> Looks ok overall, but please simply do
>>
>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>       split_edge (e);
>>
>> for all blocks apart from the latch.
>>
>> Can you please send a combined patch up to this one?  Looking at
>> the incremental diff is somewhat hard.  Thus a patch including all
>> patches from patch1 to this one.
>>
>> Thanks,
>> Richard.
>>
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Thanks for your reply!
>>>>>
>>>>> I didn't understand your point:
>>>>>
>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>
>>>>> but you do it unconditionally in proposed patch.
>>>>
>>>> I don't mind means I am fine with it.
>>>>
>>>>> Also I assume that
>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>> split headers of loops, loop exit blocks etc.
>>>>
>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>> be surprised if so but that may be possible.
>>>>
>>>>> I prefer to do something
>>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>> destination bb belongs to loop).
>>>>
>>>> That works for me as well but it is more complicated to implement.
>>>> Ideally you'd only split one edge if you find a block with only critical
>>>> predecessors (where we'd currently give up).  But note that this
>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>> will change loop->num_nodes so we have to be more careful in
>>>> constructing the loop calling if_convertible_bb_p.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>> I have few questions about your comments.
>>>>>>>
>>>>>>> 1. You wrote :
>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>> path
>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>> predicate_extended scalar phi to one function?
>>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>>> compile time since it builds hash_map.
>>>>>>
>>>>>> It's compile-time complexity is reasonable enough even for
>>>>>> non-aggressive if-conversion.
>>>>>>
>>>>>>> 2. About critical edge splitting.
>>>>>>>
>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>> option only; (2) should we split all critical edges.
>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>
>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>> do something like
>>>>>>
>>>>>> Index: gcc/tree-if-conv.c
>>>>>> ===================================================================
>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>      return 0;
>>>>>>
>>>>>> +  bool critical_edges_split_p = false;
>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>             && !loop->dont_vectorize))
>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>> +      {
>>>>>> +       if (!critical_edges_split_p)
>>>>>> +         {
>>>>>> +           split_critical_edges ();
>>>>>> +           critical_edges_split_p = true;
>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>> +         }
>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>> +      }
>>>>>>
>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>    {
>>>>>>
>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>> additional option.
>>>>>>
>>>>>> Yes, I know.
>>>>>>
>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>> critical edges then I am all for that solution.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks ahead.
>>>>>>> Yuri.
>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>>>> and at least one incoming edge
>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>
>>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>
>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>   {
>>>>>>>>>     <bb 5>:
>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>       goto <bb 7>;
>>>>>>>>>     else
>>>>>>>>>       goto <bb 9>;
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>   {
>>>>>>>>>     <bb 6>:
>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>       goto <bb 7>;
>>>>>>>>>     else
>>>>>>>>>       goto <bb 8>;
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>   {
>>>>>>>>>     <bb 7>:
>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>     goto <bb 11>;
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>> #if 0
>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>  else
>>>>>>>>> #endif
>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>
>>>>>>>>> we will get ICE:
>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>  void foo (int n)
>>>>>>>>>       ^
>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>> _52 = _1 & _3;
>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>
>>>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>>>
>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>
>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>
>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>> predicating a PHI.
>>>>>>>>
>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>> the printfs properly.
>>>>>>>>
>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>> paths.
>>>>>>>>
>>>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>>>> fault but making it even worse is not an option.
>>>>>>>>
>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> ChangeLog is
>>>>>>>>>
>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>> statement iterator.
>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>> non-critical incoming edge.
>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>>>>>> critical.
>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>
>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>
>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>
>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>
>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>
>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>> works.
>>>>>>>>>>
>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>
>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>
>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
Yuri Rumyantsev Dec. 22, 2014, 2:39 p.m. UTC | #7
Richard,

I changed algorithm for bool pattern repair.
It turned out that ifcvt_local_dce phaase is required since for
test-case I sent you in previous mail vectorization is not performed
without dead code elimination:

For the loop
#pragma omp simd safelen(8)
  for (i=0; i<512; i++)
  {
    float t = a[i];
    if (t > 0.0f & t < 1.0e+17f)
      if (c[i] != 0)
res += 1;
  }

I've got the following message from vectorizer:

t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;

t3.c:10:11: note: bit-precision arithmetic not supported.
t3.c:10:11: note: not vectorized: relevant stmt not supported:
_ifc__39 = t_5 > 0.0;

It is caused by the following dead predicate computations after
critical edge splitting:

(after combine blocks):

<bb 3>:
# res_15 = PHI <res_1(7), 0(19)>
# i_16 = PHI <i_11(7), 0(19)>
# ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
t_5 = a[i_16];
_6 = t_5 > 0.0;
_7 = t_5 < 9.9999998430674944e+16;
_8 = _6 & _7;
_10 = &c[i_16];
_ifc__36 = _8 ? 4294967295 : 0;
_9 = MASK_LOAD (_10, 0B, _ifc__36);
_28 = _8;
_29 = _9 != 0;
_30 = _28 & _29;
// Statements below are dead!!
_31 = _8;
_32 = _9 != 0;
_33 = ~_32;
_34 = _31 & _33;
// End of dead statements.
_ifc__35 = _30 ? 1 : 0;
res_1 = res_15 + _ifc__35;
i_11 = i_16 + 1;
ivtmp_13 = ivtmp_14 - 1;
if (ivtmp_13 != 0)
  goto <bb 7>;
else
  goto <bb 8>;

But if we delete these statements loop will be vectorized.

New patch is attached.

Thanks.
Yuri.

2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I am sending you full patch (~1000 lines) but if you need only patch.1
>> and patch.2 will let me know and i'll send you reduced patch.
>>
>> Below are few comments regarding your remarks for patch.3.
>>
>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>> when dead code elimination is required to vectorize loop, i.e. dead
>> statement is marked as relevant.
>> 2. You wrote:
>>> The "retry" code also looks odd - why do you walk the BB multiple
>>> times instead of just doing sth like
>>>
>>>  while (!has_single_use (lhs))
>>>    {
>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>      ifcvt_walk_pattern_tree (copy);
>>>    }
>>>
>>> thus returning the copy you create and re-process it (the copy should
>>> now have a single-use).
>>
>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>> but some intermediate variables too. For example, for the following
>> test-case
>>
>> float a[1000];
>> int c[1000];
>>
>> int foo()
>> {
>>   int i, res = 0;
>> #pragma omp simd safelen(8)
>>   for (i=0; i<512; i++)
>>   {
>>     float t = a[i];
>>     if (t > 0.0f & t < 1.0e+17f)
>>       if (c[i] != 0)
>> res += 1;
>>   }
>>   return res;
>> }
>>
>> After combine_blocks we have the following bb:
>>
>> <bb 3>:
>> # res_15 = PHI <res_1(7), 0(15)>
>> # i_16 = PHI <i_11(7), 0(15)>
>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>> t_5 = a[i_16];
>> _6 = t_5 > 0.0;
>> _7 = t_5 < 9.9999998430674944e+16;
>> _8 = _6 & _7;
>> _10 = &c[i_16];
>> _ifc__32 = _8 ? 4294967295 : 0;
>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>> _28 = _8;
>> _29 = _9 != 0;
>> _30 = _28 & _29;
>> _ifc__31 = _30 ? 1 : 0;
>> res_1 = res_15 + _ifc__31;
>> i_11 = i_16 + 1;
>> ivtmp_13 = ivtmp_14 - 1;
>> if (ivtmp_13 != 0)
>>   goto <bb 7>;
>> else
>>   goto <bb 8>;
>>
>> and we can see that _8 has multiple uses. Also note that after splitting of
>> _8 = _6 & _7
>> we also get multiple uses for definition of  _6 and _7. So I used this
>> iterative algorithm as the simplest one.
>
> But it walks the entire pattern again and again while you only need to
> ensure you walk the pattern tree of the now single-use DEF again
> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
> you should pass down the user_operand_p that you need to make
> single-use).
>
>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>> for stmt_is_root_of_bool_pattern.
>>
>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>> to check on COND_EXPR only since PHI predication and memory access
>> predication produced only such statements,i.e. it can look like
>>
>> static bool
>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>> {
>>   enum tree_code code;
>>   tree lhs, rhs;
>>
>>   code = gimple_assign_rhs_code (stmt);
>>   if (code == COND_EXPR)
>>     {
>>       rhs = gimple_assign_rhs1 (stmt);
>>       if (TREE_CODE (rhs) != SSA_NAME)
>> return false;
>>       *var = rhs;
>>       return true;
>>     }
>>   return false;
>> }
>>
>> I also did few minor changes in patch.2.
>>
>> 3. You can also notice that I inserted code in tree_if_conversion to
>> do loop version if explicit option "-ftree-loop-if-convert" was not
>> passed to compiler, i.e. we perform if-conversion for loop
>> vectorization only and if it does not take place, we should delete
>> if-converted version of loop.
>> What is your opinion?
>
> Overall part 1 and part 2 look good to me, predicate_scalar_phi
> looks in need of some refactoring to avoid duplicate code.  We can
> do that a followup.
>
> Part 3 still needs the iteration to be resolved and make the use we
> actually care about single-use, not a random one so we can avoid
> iterating completely.
>
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> Here is updated patch which includes
>>>> (1) split critical edges for aggressive if conversion.
>>>> (2) delete all stuff related to support of critical edge predication.
>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>
>>>> I checked that patch works in stress testing mode, i.e. with
>>>> aggressive if conversion by default.
>>>>
>>>> What is your opinion?
>>>
>>> Looks ok overall, but please simply do
>>>
>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>       split_edge (e);
>>>
>>> for all blocks apart from the latch.
>>>
>>> Can you please send a combined patch up to this one?  Looking at
>>> the incremental diff is somewhat hard.  Thus a patch including all
>>> patches from patch1 to this one.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Thanks for your reply!
>>>>>>
>>>>>> I didn't understand your point:
>>>>>>
>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>
>>>>>> but you do it unconditionally in proposed patch.
>>>>>
>>>>> I don't mind means I am fine with it.
>>>>>
>>>>>> Also I assume that
>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>> split headers of loops, loop exit blocks etc.
>>>>>
>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>> be surprised if so but that may be possible.
>>>>>
>>>>>> I prefer to do something
>>>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>> destination bb belongs to loop).
>>>>>
>>>>> That works for me as well but it is more complicated to implement.
>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>> predecessors (where we'd currently give up).  But note that this
>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>> will change loop->num_nodes so we have to be more careful in
>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>>> I have few questions about your comments.
>>>>>>>>
>>>>>>>> 1. You wrote :
>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>> path
>>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>>> predicate_extended scalar phi to one function?
>>>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>>>> compile time since it builds hash_map.
>>>>>>>
>>>>>>> It's compile-time complexity is reasonable enough even for
>>>>>>> non-aggressive if-conversion.
>>>>>>>
>>>>>>>> 2. About critical edge splitting.
>>>>>>>>
>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>>> option only; (2) should we split all critical edges.
>>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>>
>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>> do something like
>>>>>>>
>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>      return 0;
>>>>>>>
>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>             && !loop->dont_vectorize))
>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>> +      {
>>>>>>> +       if (!critical_edges_split_p)
>>>>>>> +         {
>>>>>>> +           split_critical_edges ();
>>>>>>> +           critical_edges_split_p = true;
>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>> +         }
>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>> +      }
>>>>>>>
>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>    {
>>>>>>>
>>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>>> additional option.
>>>>>>>
>>>>>>> Yes, I know.
>>>>>>>
>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>> critical edges then I am all for that solution.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks ahead.
>>>>>>>> Yuri.
>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>>>>> and at least one incoming edge
>>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>>
>>>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>>
>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>   {
>>>>>>>>>>     <bb 5>:
>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>     else
>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>
>>>>>>>>>>   }
>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>   {
>>>>>>>>>>     <bb 6>:
>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>     else
>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>
>>>>>>>>>>   }
>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>   {
>>>>>>>>>>     <bb 7>:
>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>>> #if 0
>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>  else
>>>>>>>>>> #endif
>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>
>>>>>>>>>> we will get ICE:
>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>  void foo (int n)
>>>>>>>>>>       ^
>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>
>>>>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>>>>
>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>
>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>
>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>> predicating a PHI.
>>>>>>>>>
>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>> the printfs properly.
>>>>>>>>>
>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>> paths.
>>>>>>>>>
>>>>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>>>>> fault but making it even worse is not an option.
>>>>>>>>>
>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> ChangeLog is
>>>>>>>>>>
>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>> statement iterator.
>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>>>>>>> critical.
>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>
>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>
>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>
>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>
>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>
>>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>>> works.
>>>>>>>>>>>
>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>
>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
Richard Biener Jan. 9, 2015, 12:27 p.m. UTC | #8
On Mon, Dec 22, 2014 at 3:39 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I changed algorithm for bool pattern repair.
> It turned out that ifcvt_local_dce phaase is required since for
> test-case I sent you in previous mail vectorization is not performed
> without dead code elimination:
>
> For the loop
> #pragma omp simd safelen(8)
>   for (i=0; i<512; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f & t < 1.0e+17f)
>       if (c[i] != 0)
> res += 1;
>   }
>
> I've got the following message from vectorizer:
>
> t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;
>
> t3.c:10:11: note: bit-precision arithmetic not supported.
> t3.c:10:11: note: not vectorized: relevant stmt not supported:
> _ifc__39 = t_5 > 0.0;
>
> It is caused by the following dead predicate computations after
> critical edge splitting:
>
> (after combine blocks):
>
> <bb 3>:
> # res_15 = PHI <res_1(7), 0(19)>
> # i_16 = PHI <i_11(7), 0(19)>
> # ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
> t_5 = a[i_16];
> _6 = t_5 > 0.0;
> _7 = t_5 < 9.9999998430674944e+16;
> _8 = _6 & _7;
> _10 = &c[i_16];
> _ifc__36 = _8 ? 4294967295 : 0;
> _9 = MASK_LOAD (_10, 0B, _ifc__36);
> _28 = _8;
> _29 = _9 != 0;
> _30 = _28 & _29;
> // Statements below are dead!!
> _31 = _8;
> _32 = _9 != 0;
> _33 = ~_32;
> _34 = _31 & _33;
> // End of dead statements.
> _ifc__35 = _30 ? 1 : 0;
> res_1 = res_15 + _ifc__35;
> i_11 = i_16 + 1;
> ivtmp_13 = ivtmp_14 - 1;
> if (ivtmp_13 != 0)
>   goto <bb 7>;
> else
>   goto <bb 8>;
>
> But if we delete these statements loop will be vectorized.

Hm, ok.  We insert predicates too early obviously and not only when
needed.  But let's fix that later.

> New patch is attached.

 fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
 {
   tree rhs1, lhs1, cond_expr;
+
+  /* If COND is comparison r != 0 and r has boolean type, convert COND
+     to SSA_NAME to accept by vect bool pattern.  */
+  if (TREE_CODE (cond) == NE_EXPR)
+    {
+      tree op0 = TREE_OPERAND (cond, 0);
+      tree op1 = TREE_OPERAND (cond, 1);
+      if (TREE_CODE (op0) == SSA_NAME
+         && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
+         && (integer_zerop (op1)))
+       cond = op0;
+      else if (TREE_CODE (op1) == SSA_NAME
+              && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE
+              && (integer_zerop (op0)))
+       cond = op1;

The 2nd form, 0 != SSA_NAME doesn't happen due to operand
canonicalization.  Please remove its handling.

+      if (gimple_phi_num_args (phi) != 2)
+       {
+         if (!aggressive_if_conv)

&& !aggressive_if_conv

+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!aggressive_if_conv)

Likewise.

-      gimple reduc;
+         && (rhs = gimple_phi_arg_def (phi, 0)))) {

the { goes to the next line

 static void
 predicate_mem_writes (loop_p loop)
 {
-  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
+  unsigned int i, j, orig_loop_num_nodes = loop->num_nodes;
+  tree mask_vec[10];

an upper limit of 10?

+      for (j=0; j<10; j++)

spaces around '<' and '='

+       mask_vec[j] = NULL_TREE;
+

+           gcc_assert (exact_log2 (bitsize) != -1);
+           if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE)
+             {

this seems to be a completely separate "optimization"?  Note that
there are targets with non-power-of-two bitsize modes (PSImode),
so the assert will likely trigger.  I would prefer if you separate this
part of the patch.

+      if ( gimple_code (stmt) != GIMPLE_ASSIGN)
+       continue;

no space before gimple_code

+  imm_use_iterator imm_iter;
+
+
+  worklist.create (64);

excessive vertical space.

The patch misses the addition of new testcases - please add some,
otherwise the code will be totally untested.

I assume the patch passes bootstrap and regtest (you didn't say so).
Can you also do a bootstrap with aggressive_if_conv forced to
true and --with-build-config=bootstrap-O3 --disable-werror?

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I am sending you full patch (~1000 lines) but if you need only patch.1
>>> and patch.2 will let me know and i'll send you reduced patch.
>>>
>>> Below are few comments regarding your remarks for patch.3.
>>>
>>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>>> when dead code elimination is required to vectorize loop, i.e. dead
>>> statement is marked as relevant.
>>> 2. You wrote:
>>>> The "retry" code also looks odd - why do you walk the BB multiple
>>>> times instead of just doing sth like
>>>>
>>>>  while (!has_single_use (lhs))
>>>>    {
>>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>>      ifcvt_walk_pattern_tree (copy);
>>>>    }
>>>>
>>>> thus returning the copy you create and re-process it (the copy should
>>>> now have a single-use).
>>>
>>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>>> but some intermediate variables too. For example, for the following
>>> test-case
>>>
>>> float a[1000];
>>> int c[1000];
>>>
>>> int foo()
>>> {
>>>   int i, res = 0;
>>> #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0.0f & t < 1.0e+17f)
>>>       if (c[i] != 0)
>>> res += 1;
>>>   }
>>>   return res;
>>> }
>>>
>>> After combine_blocks we have the following bb:
>>>
>>> <bb 3>:
>>> # res_15 = PHI <res_1(7), 0(15)>
>>> # i_16 = PHI <i_11(7), 0(15)>
>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>>> t_5 = a[i_16];
>>> _6 = t_5 > 0.0;
>>> _7 = t_5 < 9.9999998430674944e+16;
>>> _8 = _6 & _7;
>>> _10 = &c[i_16];
>>> _ifc__32 = _8 ? 4294967295 : 0;
>>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>>> _28 = _8;
>>> _29 = _9 != 0;
>>> _30 = _28 & _29;
>>> _ifc__31 = _30 ? 1 : 0;
>>> res_1 = res_15 + _ifc__31;
>>> i_11 = i_16 + 1;
>>> ivtmp_13 = ivtmp_14 - 1;
>>> if (ivtmp_13 != 0)
>>>   goto <bb 7>;
>>> else
>>>   goto <bb 8>;
>>>
>>> and we can see that _8 has multiple uses. Also note that after splitting of
>>> _8 = _6 & _7
>>> we also get multiple uses for definition of  _6 and _7. So I used this
>>> iterative algorithm as the simplest one.
>>
>> But it walks the entire pattern again and again while you only need to
>> ensure you walk the pattern tree of the now single-use DEF again
>> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
>> you should pass down the user_operand_p that you need to make
>> single-use).
>>
>>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>>> for stmt_is_root_of_bool_pattern.
>>>
>>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>>> to check on COND_EXPR only since PHI predication and memory access
>>> predication produced only such statements,i.e. it can look like
>>>
>>> static bool
>>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>>> {
>>>   enum tree_code code;
>>>   tree lhs, rhs;
>>>
>>>   code = gimple_assign_rhs_code (stmt);
>>>   if (code == COND_EXPR)
>>>     {
>>>       rhs = gimple_assign_rhs1 (stmt);
>>>       if (TREE_CODE (rhs) != SSA_NAME)
>>> return false;
>>>       *var = rhs;
>>>       return true;
>>>     }
>>>   return false;
>>> }
>>>
>>> I also did few minor changes in patch.2.
>>>
>>> 3. You can also notice that I inserted code in tree_if_conversion to
>>> do loop version if explicit option "-ftree-loop-if-convert" was not
>>> passed to compiler, i.e. we perform if-conversion for loop
>>> vectorization only and if it does not take place, we should delete
>>> if-converted version of loop.
>>> What is your opinion?
>>
>> Overall part 1 and part 2 look good to me, predicate_scalar_phi
>> looks in need of some refactoring to avoid duplicate code.  We can
>> do that a followup.
>>
>> Part 3 still needs the iteration to be resolved and make the use we
>> actually care about single-use, not a random one so we can avoid
>> iterating completely.
>>
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> Here is updated patch which includes
>>>>> (1) split critical edges for aggressive if conversion.
>>>>> (2) delete all stuff related to support of critical edge predication.
>>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>>
>>>>> I checked that patch works in stress testing mode, i.e. with
>>>>> aggressive if conversion by default.
>>>>>
>>>>> What is your opinion?
>>>>
>>>> Looks ok overall, but please simply do
>>>>
>>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>>       split_edge (e);
>>>>
>>>> for all blocks apart from the latch.
>>>>
>>>> Can you please send a combined patch up to this one?  Looking at
>>>> the incremental diff is somewhat hard.  Thus a patch including all
>>>> patches from patch1 to this one.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Thanks for your reply!
>>>>>>>
>>>>>>> I didn't understand your point:
>>>>>>>
>>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>>
>>>>>>> but you do it unconditionally in proposed patch.
>>>>>>
>>>>>> I don't mind means I am fine with it.
>>>>>>
>>>>>>> Also I assume that
>>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>>> split headers of loops, loop exit blocks etc.
>>>>>>
>>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>>> be surprised if so but that may be possible.
>>>>>>
>>>>>>> I prefer to do something
>>>>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>>> destination bb belongs to loop).
>>>>>>
>>>>>> That works for me as well but it is more complicated to implement.
>>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>>> predecessors (where we'd currently give up).  But note that this
>>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>>> will change loop->num_nodes so we have to be more careful in
>>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>>>> I have few questions about your comments.
>>>>>>>>>
>>>>>>>>> 1. You wrote :
>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>> path
>>>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>>>> predicate_extended scalar phi to one function?
>>>>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>>>>> compile time since it builds hash_map.
>>>>>>>>
>>>>>>>> It's compile-time complexity is reasonable enough even for
>>>>>>>> non-aggressive if-conversion.
>>>>>>>>
>>>>>>>>> 2. About critical edge splitting.
>>>>>>>>>
>>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>>>> option only; (2) should we split all critical edges.
>>>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>>>
>>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>>> do something like
>>>>>>>>
>>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>>> ===================================================================
>>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>>      return 0;
>>>>>>>>
>>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>>             && !loop->dont_vectorize))
>>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>>> +      {
>>>>>>>> +       if (!critical_edges_split_p)
>>>>>>>> +         {
>>>>>>>> +           split_critical_edges ();
>>>>>>>> +           critical_edges_split_p = true;
>>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>>> +         }
>>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>>> +      }
>>>>>>>>
>>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>>    {
>>>>>>>>
>>>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>>>> additional option.
>>>>>>>>
>>>>>>>> Yes, I know.
>>>>>>>>
>>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>>> critical edges then I am all for that solution.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Thanks ahead.
>>>>>>>>> Yuri.
>>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>>>>>> and at least one incoming edge
>>>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>>>
>>>>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>>>
>>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>>   {
>>>>>>>>>>>     <bb 5>:
>>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>     else
>>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>>
>>>>>>>>>>>   }
>>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>>   {
>>>>>>>>>>>     <bb 6>:
>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>     else
>>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>>
>>>>>>>>>>>   }
>>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>>   {
>>>>>>>>>>>     <bb 7>:
>>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>>
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>>>> #if 0
>>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>>  else
>>>>>>>>>>> #endif
>>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>>
>>>>>>>>>>> we will get ICE:
>>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>>  void foo (int n)
>>>>>>>>>>>       ^
>>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>>
>>>>>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>>>>>
>>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>>
>>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>>
>>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>>> predicating a PHI.
>>>>>>>>>>
>>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>>> the printfs properly.
>>>>>>>>>>
>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>> paths.
>>>>>>>>>>
>>>>>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>>>>>> fault but making it even worse is not an option.
>>>>>>>>>>
>>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> ChangeLog is
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>>> statement iterator.
>>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>>>>>>>> critical.
>>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>>
>>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>>
>>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>>
>>>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>>>> works.
>>>>>>>>>>>>
>>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
Yuri Rumyantsev Jan. 14, 2015, 1:14 p.m. UTC | #9
Richard,

I did all changes proposed by you and add couple tests.
Bootstrap, including aggressive one proposed by you, and regression
testing did not show any new failures.

Is it OK for trunk?

ChangeLog:

2015-01-14  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c: Include hash-map.h.
(aggressive_if_conv): New variable.
(fold_build_cond_expr): Add simplification of non-zero condition.
(add_to_dst_predicate_list): Invoke add_to_predicate_list if edge
destination block is not always executed.
(if_convertible_phi_p): Fix commentary, allow phi nodes have more
than two predecessors if AGGRESSIVE_IF_CONV is true.
(if_convertible_stmt_p): Fix commentary.
(all_preds_critical_p): New function.
(has_pred_critical_p): New function.
(if_convertible_bb_p): Fix commentary, if AGGRESSIVE_IF_CONV is true
BB can have more than two predecessors and all incoming edges can be
critical.
(predicate_bbs): Skip predication for loop exit block, use build2_loc
to compute predicate for true edge.
(find_phi_replacement_condition): Delete this function.
(is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
Allow interchange PHI arguments if EXTENDED is false.
Change check that block containing reduction statement candidate
is predecessor of phi-block since phi may have more than two arguments.
(phi_args_hash_traits): New helper structure.
(struct phi_args_hash_traits): New type.
(phi_args_hash_traits::hash): New function.
(phi_args_hash_traits::equal_keys): New function.
(gen_phi_arg_condition): New function.
(predicate_scalar_phi): Add handling of phi nodes with more than two
arguments, delete COND and TRUE_BB arguments, insert body of
find_phi_replacement_condition to predicate ordinary phi nodes.
(predicate_all_scalar_phis): Skip blocks with the only predecessor,
delete call of find_phi_replacement_condition and invoke
predicate_scalar_phi with two arguments.
(insert_gimplified_predicates): Add assert that non-predicated block
don't have statements to insert.
(ifcvt_split_critical_edges): New function.
(ifcvt_split_def_stmt): Likewise.
(ifcvt_walk_pattern_tree): Likewise.
(stmt_is_root_of_bool_pattern): Likewise.
(ifcvt_repair_bool_pattern): Likewise.
(ifcvt_local_dce): Likewise.
(tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
is copy of inner or outer loop force_vectorize field, invoke
ifcvt_split_critical_edges, ifcvt_local_dce and
ifcvt_repair_bool_pattern for aggressive if-conversion.

gcc/testsuite/ChangeLog

* gcc.dg/vect/vect-aggressive-1.c: New test.
* gcc.target/i386/avx2-vect-aggressive.c: Likewise.

2015-01-09 15:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Dec 22, 2014 at 3:39 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I changed algorithm for bool pattern repair.
>> It turned out that ifcvt_local_dce phaase is required since for
>> test-case I sent you in previous mail vectorization is not performed
>> without dead code elimination:
>>
>> For the loop
>> #pragma omp simd safelen(8)
>>   for (i=0; i<512; i++)
>>   {
>>     float t = a[i];
>>     if (t > 0.0f & t < 1.0e+17f)
>>       if (c[i] != 0)
>> res += 1;
>>   }
>>
>> I've got the following message from vectorizer:
>>
>> t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;
>>
>> t3.c:10:11: note: bit-precision arithmetic not supported.
>> t3.c:10:11: note: not vectorized: relevant stmt not supported:
>> _ifc__39 = t_5 > 0.0;
>>
>> It is caused by the following dead predicate computations after
>> critical edge splitting:
>>
>> (after combine blocks):
>>
>> <bb 3>:
>> # res_15 = PHI <res_1(7), 0(19)>
>> # i_16 = PHI <i_11(7), 0(19)>
>> # ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
>> t_5 = a[i_16];
>> _6 = t_5 > 0.0;
>> _7 = t_5 < 9.9999998430674944e+16;
>> _8 = _6 & _7;
>> _10 = &c[i_16];
>> _ifc__36 = _8 ? 4294967295 : 0;
>> _9 = MASK_LOAD (_10, 0B, _ifc__36);
>> _28 = _8;
>> _29 = _9 != 0;
>> _30 = _28 & _29;
>> // Statements below are dead!!
>> _31 = _8;
>> _32 = _9 != 0;
>> _33 = ~_32;
>> _34 = _31 & _33;
>> // End of dead statements.
>> _ifc__35 = _30 ? 1 : 0;
>> res_1 = res_15 + _ifc__35;
>> i_11 = i_16 + 1;
>> ivtmp_13 = ivtmp_14 - 1;
>> if (ivtmp_13 != 0)
>>   goto <bb 7>;
>> else
>>   goto <bb 8>;
>>
>> But if we delete these statements loop will be vectorized.
>
> Hm, ok.  We insert predicates too early obviously and not only when
> needed.  But let's fix that later.
>
>> New patch is attached.
>
>  fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
>  {
>    tree rhs1, lhs1, cond_expr;
> +
> +  /* If COND is comparison r != 0 and r has boolean type, convert COND
> +     to SSA_NAME to accept by vect bool pattern.  */
> +  if (TREE_CODE (cond) == NE_EXPR)
> +    {
> +      tree op0 = TREE_OPERAND (cond, 0);
> +      tree op1 = TREE_OPERAND (cond, 1);
> +      if (TREE_CODE (op0) == SSA_NAME
> +         && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
> +         && (integer_zerop (op1)))
> +       cond = op0;
> +      else if (TREE_CODE (op1) == SSA_NAME
> +              && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE
> +              && (integer_zerop (op0)))
> +       cond = op1;
>
> The 2nd form, 0 != SSA_NAME doesn't happen due to operand
> canonicalization.  Please remove its handling.
>
> +      if (gimple_phi_num_args (phi) != 2)
> +       {
> +         if (!aggressive_if_conv)
>
> && !aggressive_if_conv
>
> +  if (EDGE_COUNT (bb->preds) > 2)
> +    {
> +      if (!aggressive_if_conv)
>
> Likewise.
>
> -      gimple reduc;
> +         && (rhs = gimple_phi_arg_def (phi, 0)))) {
>
> the { goes to the next line
>
>  static void
>  predicate_mem_writes (loop_p loop)
>  {
> -  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
> +  unsigned int i, j, orig_loop_num_nodes = loop->num_nodes;
> +  tree mask_vec[10];
>
> an upper limit of 10?
>
> +      for (j=0; j<10; j++)
>
> spaces around '<' and '='
>
> +       mask_vec[j] = NULL_TREE;
> +
>
> +           gcc_assert (exact_log2 (bitsize) != -1);
> +           if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE)
> +             {
>
> this seems to be a completely separate "optimization"?  Note that
> there are targets with non-power-of-two bitsize modes (PSImode),
> so the assert will likely trigger.  I would prefer if you separate this
> part of the patch.
>
> +      if ( gimple_code (stmt) != GIMPLE_ASSIGN)
> +       continue;
>
> no space before gimple_code
>
> +  imm_use_iterator imm_iter;
> +
> +
> +  worklist.create (64);
>
> excessive vertical space.
>
> The patch misses the addition of new testcases - please add some,
> otherwise the code will be totally untested.
>
> I assume the patch passes bootstrap and regtest (you didn't say so).
> Can you also do a bootstrap with aggressive_if_conv forced to
> true and --with-build-config=bootstrap-O3 --disable-werror?
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I am sending you full patch (~1000 lines) but if you need only patch.1
>>>> and patch.2 will let me know and i'll send you reduced patch.
>>>>
>>>> Below are few comments regarding your remarks for patch.3.
>>>>
>>>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>>>> when dead code elimination is required to vectorize loop, i.e. dead
>>>> statement is marked as relevant.
>>>> 2. You wrote:
>>>>> The "retry" code also looks odd - why do you walk the BB multiple
>>>>> times instead of just doing sth like
>>>>>
>>>>>  while (!has_single_use (lhs))
>>>>>    {
>>>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>>>      ifcvt_walk_pattern_tree (copy);
>>>>>    }
>>>>>
>>>>> thus returning the copy you create and re-process it (the copy should
>>>>> now have a single-use).
>>>>
>>>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>>>> but some intermediate variables too. For example, for the following
>>>> test-case
>>>>
>>>> float a[1000];
>>>> int c[1000];
>>>>
>>>> int foo()
>>>> {
>>>>   int i, res = 0;
>>>> #pragma omp simd safelen(8)
>>>>   for (i=0; i<512; i++)
>>>>   {
>>>>     float t = a[i];
>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>       if (c[i] != 0)
>>>> res += 1;
>>>>   }
>>>>   return res;
>>>> }
>>>>
>>>> After combine_blocks we have the following bb:
>>>>
>>>> <bb 3>:
>>>> # res_15 = PHI <res_1(7), 0(15)>
>>>> # i_16 = PHI <i_11(7), 0(15)>
>>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>>>> t_5 = a[i_16];
>>>> _6 = t_5 > 0.0;
>>>> _7 = t_5 < 9.9999998430674944e+16;
>>>> _8 = _6 & _7;
>>>> _10 = &c[i_16];
>>>> _ifc__32 = _8 ? 4294967295 : 0;
>>>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>>>> _28 = _8;
>>>> _29 = _9 != 0;
>>>> _30 = _28 & _29;
>>>> _ifc__31 = _30 ? 1 : 0;
>>>> res_1 = res_15 + _ifc__31;
>>>> i_11 = i_16 + 1;
>>>> ivtmp_13 = ivtmp_14 - 1;
>>>> if (ivtmp_13 != 0)
>>>>   goto <bb 7>;
>>>> else
>>>>   goto <bb 8>;
>>>>
>>>> and we can see that _8 has multiple uses. Also note that after splitting of
>>>> _8 = _6 & _7
>>>> we also get multiple uses for definition of  _6 and _7. So I used this
>>>> iterative algorithm as the simplest one.
>>>
>>> But it walks the entire pattern again and again while you only need to
>>> ensure you walk the pattern tree of the now single-use DEF again
>>> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
>>> you should pass down the user_operand_p that you need to make
>>> single-use).
>>>
>>>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>>>> for stmt_is_root_of_bool_pattern.
>>>>
>>>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>>>> to check on COND_EXPR only since PHI predication and memory access
>>>> predication produced only such statements,i.e. it can look like
>>>>
>>>> static bool
>>>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>>>> {
>>>>   enum tree_code code;
>>>>   tree lhs, rhs;
>>>>
>>>>   code = gimple_assign_rhs_code (stmt);
>>>>   if (code == COND_EXPR)
>>>>     {
>>>>       rhs = gimple_assign_rhs1 (stmt);
>>>>       if (TREE_CODE (rhs) != SSA_NAME)
>>>> return false;
>>>>       *var = rhs;
>>>>       return true;
>>>>     }
>>>>   return false;
>>>> }
>>>>
>>>> I also did few minor changes in patch.2.
>>>>
>>>> 3. You can also notice that I inserted code in tree_if_conversion to
>>>> do loop version if explicit option "-ftree-loop-if-convert" was not
>>>> passed to compiler, i.e. we perform if-conversion for loop
>>>> vectorization only and if it does not take place, we should delete
>>>> if-converted version of loop.
>>>> What is your opinion?
>>>
>>> Overall part 1 and part 2 look good to me, predicate_scalar_phi
>>> looks in need of some refactoring to avoid duplicate code.  We can
>>> do that a followup.
>>>
>>> Part 3 still needs the iteration to be resolved and make the use we
>>> actually care about single-use, not a random one so we can avoid
>>> iterating completely.
>>>
>>> Richard.
>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> Here is updated patch which includes
>>>>>> (1) split critical edges for aggressive if conversion.
>>>>>> (2) delete all stuff related to support of critical edge predication.
>>>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>>>
>>>>>> I checked that patch works in stress testing mode, i.e. with
>>>>>> aggressive if conversion by default.
>>>>>>
>>>>>> What is your opinion?
>>>>>
>>>>> Looks ok overall, but please simply do
>>>>>
>>>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>>>       split_edge (e);
>>>>>
>>>>> for all blocks apart from the latch.
>>>>>
>>>>> Can you please send a combined patch up to this one?  Looking at
>>>>> the incremental diff is somewhat hard.  Thus a patch including all
>>>>> patches from patch1 to this one.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>>
>>>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Thanks for your reply!
>>>>>>>>
>>>>>>>> I didn't understand your point:
>>>>>>>>
>>>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>>>
>>>>>>>> but you do it unconditionally in proposed patch.
>>>>>>>
>>>>>>> I don't mind means I am fine with it.
>>>>>>>
>>>>>>>> Also I assume that
>>>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>>>> split headers of loops, loop exit blocks etc.
>>>>>>>
>>>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>>>> be surprised if so but that may be possible.
>>>>>>>
>>>>>>>> I prefer to do something
>>>>>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>>>> destination bb belongs to loop).
>>>>>>>
>>>>>>> That works for me as well but it is more complicated to implement.
>>>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>>>> predecessors (where we'd currently give up).  But note that this
>>>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>>>> will change loop->num_nodes so we have to be more careful in
>>>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>>>>> I have few questions about your comments.
>>>>>>>>>>
>>>>>>>>>> 1. You wrote :
>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>> path
>>>>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>>>>> predicate_extended scalar phi to one function?
>>>>>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>>>>>> compile time since it builds hash_map.
>>>>>>>>>
>>>>>>>>> It's compile-time complexity is reasonable enough even for
>>>>>>>>> non-aggressive if-conversion.
>>>>>>>>>
>>>>>>>>>> 2. About critical edge splitting.
>>>>>>>>>>
>>>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>>>>> option only; (2) should we split all critical edges.
>>>>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>>>>
>>>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>>>> do something like
>>>>>>>>>
>>>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>>>> ===================================================================
>>>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>>>      return 0;
>>>>>>>>>
>>>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>>>             && !loop->dont_vectorize))
>>>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>>>> +      {
>>>>>>>>> +       if (!critical_edges_split_p)
>>>>>>>>> +         {
>>>>>>>>> +           split_critical_edges ();
>>>>>>>>> +           critical_edges_split_p = true;
>>>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>>>> +         }
>>>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>>>> +      }
>>>>>>>>>
>>>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>>>    {
>>>>>>>>>
>>>>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>>>>> additional option.
>>>>>>>>>
>>>>>>>>> Yes, I know.
>>>>>>>>>
>>>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>>>> critical edges then I am all for that solution.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Thanks ahead.
>>>>>>>>>> Yuri.
>>>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>>>>>>> and at least one incoming edge
>>>>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>>>>
>>>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>>>   {
>>>>>>>>>>>>     <bb 5>:
>>>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>     else
>>>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>>>
>>>>>>>>>>>>   }
>>>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>>>   {
>>>>>>>>>>>>     <bb 6>:
>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>     else
>>>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>>>
>>>>>>>>>>>>   }
>>>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>>>   {
>>>>>>>>>>>>     <bb 7>:
>>>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>>>
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>>>>> #if 0
>>>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>>>  else
>>>>>>>>>>>> #endif
>>>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>>>
>>>>>>>>>>>> we will get ICE:
>>>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>>>  void foo (int n)
>>>>>>>>>>>>       ^
>>>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>>>
>>>>>>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>>>>>>
>>>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>>>
>>>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>>>
>>>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>>>> predicating a PHI.
>>>>>>>>>>>
>>>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>>>> the printfs properly.
>>>>>>>>>>>
>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>> paths.
>>>>>>>>>>>
>>>>>>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>>>>>>> fault but making it even worse is not an option.
>>>>>>>>>>>
>>>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog is
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>>>> statement iterator.
>>>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>>>>>>>>> critical.
>>>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>>>>> works.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
Richard Biener Jan. 14, 2015, 2:40 p.m. UTC | #10
On Wed, Jan 14, 2015 at 2:14 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I did all changes proposed by you and add couple tests.
> Bootstrap, including aggressive one proposed by you, and regression
> testing did not show any new failures.
>
> Is it OK for trunk?

+++ b/gcc/testsuite/gcc.dg/vect/vect-aggressive-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target fopenmp } */
+/* { dg-options "-fopenmp" } */

please use { dg-additional-options "-fopenmp-simd" } instead and
vect_simd_clones target, not fopenmp target.

+/* { dg-options "-mavx2 -ffast-math -O3 -fopenmp -fdump-tree-vect-details" } */

likewise (for -ffast-math - don't use -mavx2, instead require a proper
vect target on the scan-tree-dump-times line

It would also be nice to have these runtime testcases so it can
be verified the code executes correctly instead of possibly producing
random crap ;)

The patch is ok with the above changes.

Thanks,
Richard.

> ChangeLog:
>
> 2015-01-14  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c: Include hash-map.h.
> (aggressive_if_conv): New variable.
> (fold_build_cond_expr): Add simplification of non-zero condition.
> (add_to_dst_predicate_list): Invoke add_to_predicate_list if edge
> destination block is not always executed.
> (if_convertible_phi_p): Fix commentary, allow phi nodes have more
> than two predecessors if AGGRESSIVE_IF_CONV is true.
> (if_convertible_stmt_p): Fix commentary.
> (all_preds_critical_p): New function.
> (has_pred_critical_p): New function.
> (if_convertible_bb_p): Fix commentary, if AGGRESSIVE_IF_CONV is true
> BB can have more than two predecessors and all incoming edges can be
> critical.
> (predicate_bbs): Skip predication for loop exit block, use build2_loc
> to compute predicate for true edge.
> (find_phi_replacement_condition): Delete this function.
> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
> Allow interchange PHI arguments if EXTENDED is false.
> Change check that block containing reduction statement candidate
> is predecessor of phi-block since phi may have more than two arguments.
> (phi_args_hash_traits): New helper structure.
> (struct phi_args_hash_traits): New type.
> (phi_args_hash_traits::hash): New function.
> (phi_args_hash_traits::equal_keys): New function.
> (gen_phi_arg_condition): New function.
> (predicate_scalar_phi): Add handling of phi nodes with more than two
> arguments, delete COND and TRUE_BB arguments, insert body of
> find_phi_replacement_condition to predicate ordinary phi nodes.
> (predicate_all_scalar_phis): Skip blocks with the only predecessor,
> delete call of find_phi_replacement_condition and invoke
> predicate_scalar_phi with two arguments.
> (insert_gimplified_predicates): Add assert that non-predicated block
> don't have statements to insert.
> (ifcvt_split_critical_edges): New function.
> (ifcvt_split_def_stmt): Likewise.
> (ifcvt_walk_pattern_tree): Likewise.
> (stmt_is_root_of_bool_pattern): Likewise.
> (ifcvt_repair_bool_pattern): Likewise.
> (ifcvt_local_dce): Likewise.
> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
> is copy of inner or outer loop force_vectorize field, invoke
> ifcvt_split_critical_edges, ifcvt_local_dce and
> ifcvt_repair_bool_pattern for aggressive if-conversion.
>
> gcc/testsuite/ChangeLog
>
> * gcc.dg/vect/vect-aggressive-1.c: New test.
> * gcc.target/i386/avx2-vect-aggressive.c: Likewise.
>
> 2015-01-09 15:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Dec 22, 2014 at 3:39 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I changed algorithm for bool pattern repair.
>>> It turned out that ifcvt_local_dce phaase is required since for
>>> test-case I sent you in previous mail vectorization is not performed
>>> without dead code elimination:
>>>
>>> For the loop
>>> #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0.0f & t < 1.0e+17f)
>>>       if (c[i] != 0)
>>> res += 1;
>>>   }
>>>
>>> I've got the following message from vectorizer:
>>>
>>> t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;
>>>
>>> t3.c:10:11: note: bit-precision arithmetic not supported.
>>> t3.c:10:11: note: not vectorized: relevant stmt not supported:
>>> _ifc__39 = t_5 > 0.0;
>>>
>>> It is caused by the following dead predicate computations after
>>> critical edge splitting:
>>>
>>> (after combine blocks):
>>>
>>> <bb 3>:
>>> # res_15 = PHI <res_1(7), 0(19)>
>>> # i_16 = PHI <i_11(7), 0(19)>
>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
>>> t_5 = a[i_16];
>>> _6 = t_5 > 0.0;
>>> _7 = t_5 < 9.9999998430674944e+16;
>>> _8 = _6 & _7;
>>> _10 = &c[i_16];
>>> _ifc__36 = _8 ? 4294967295 : 0;
>>> _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>> _28 = _8;
>>> _29 = _9 != 0;
>>> _30 = _28 & _29;
>>> // Statements below are dead!!
>>> _31 = _8;
>>> _32 = _9 != 0;
>>> _33 = ~_32;
>>> _34 = _31 & _33;
>>> // End of dead statements.
>>> _ifc__35 = _30 ? 1 : 0;
>>> res_1 = res_15 + _ifc__35;
>>> i_11 = i_16 + 1;
>>> ivtmp_13 = ivtmp_14 - 1;
>>> if (ivtmp_13 != 0)
>>>   goto <bb 7>;
>>> else
>>>   goto <bb 8>;
>>>
>>> But if we delete these statements loop will be vectorized.
>>
>> Hm, ok.  We insert predicates too early obviously and not only when
>> needed.  But let's fix that later.
>>
>>> New patch is attached.
>>
>>  fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
>>  {
>>    tree rhs1, lhs1, cond_expr;
>> +
>> +  /* If COND is comparison r != 0 and r has boolean type, convert COND
>> +     to SSA_NAME to accept by vect bool pattern.  */
>> +  if (TREE_CODE (cond) == NE_EXPR)
>> +    {
>> +      tree op0 = TREE_OPERAND (cond, 0);
>> +      tree op1 = TREE_OPERAND (cond, 1);
>> +      if (TREE_CODE (op0) == SSA_NAME
>> +         && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
>> +         && (integer_zerop (op1)))
>> +       cond = op0;
>> +      else if (TREE_CODE (op1) == SSA_NAME
>> +              && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE
>> +              && (integer_zerop (op0)))
>> +       cond = op1;
>>
>> The 2nd form, 0 != SSA_NAME doesn't happen due to operand
>> canonicalization.  Please remove its handling.
>>
>> +      if (gimple_phi_num_args (phi) != 2)
>> +       {
>> +         if (!aggressive_if_conv)
>>
>> && !aggressive_if_conv
>>
>> +  if (EDGE_COUNT (bb->preds) > 2)
>> +    {
>> +      if (!aggressive_if_conv)
>>
>> Likewise.
>>
>> -      gimple reduc;
>> +         && (rhs = gimple_phi_arg_def (phi, 0)))) {
>>
>> the { goes to the next line
>>
>>  static void
>>  predicate_mem_writes (loop_p loop)
>>  {
>> -  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>> +  unsigned int i, j, orig_loop_num_nodes = loop->num_nodes;
>> +  tree mask_vec[10];
>>
>> an upper limit of 10?
>>
>> +      for (j=0; j<10; j++)
>>
>> spaces around '<' and '='
>>
>> +       mask_vec[j] = NULL_TREE;
>> +
>>
>> +           gcc_assert (exact_log2 (bitsize) != -1);
>> +           if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE)
>> +             {
>>
>> this seems to be a completely separate "optimization"?  Note that
>> there are targets with non-power-of-two bitsize modes (PSImode),
>> so the assert will likely trigger.  I would prefer if you separate this
>> part of the patch.
>>
>> +      if ( gimple_code (stmt) != GIMPLE_ASSIGN)
>> +       continue;
>>
>> no space before gimple_code
>>
>> +  imm_use_iterator imm_iter;
>> +
>> +
>> +  worklist.create (64);
>>
>> excessive vertical space.
>>
>> The patch misses the addition of new testcases - please add some,
>> otherwise the code will be totally untested.
>>
>> I assume the patch passes bootstrap and regtest (you didn't say so).
>> Can you also do a bootstrap with aggressive_if_conv forced to
>> true and --with-build-config=bootstrap-O3 --disable-werror?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I am sending you full patch (~1000 lines) but if you need only patch.1
>>>>> and patch.2 will let me know and i'll send you reduced patch.
>>>>>
>>>>> Below are few comments regarding your remarks for patch.3.
>>>>>
>>>>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>>>>> when dead code elimination is required to vectorize loop, i.e. dead
>>>>> statement is marked as relevant.
>>>>> 2. You wrote:
>>>>>> The "retry" code also looks odd - why do you walk the BB multiple
>>>>>> times instead of just doing sth like
>>>>>>
>>>>>>  while (!has_single_use (lhs))
>>>>>>    {
>>>>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>>>>      ifcvt_walk_pattern_tree (copy);
>>>>>>    }
>>>>>>
>>>>>> thus returning the copy you create and re-process it (the copy should
>>>>>> now have a single-use).
>>>>>
>>>>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>>>>> but some intermediate variables too. For example, for the following
>>>>> test-case
>>>>>
>>>>> float a[1000];
>>>>> int c[1000];
>>>>>
>>>>> int foo()
>>>>> {
>>>>>   int i, res = 0;
>>>>> #pragma omp simd safelen(8)
>>>>>   for (i=0; i<512; i++)
>>>>>   {
>>>>>     float t = a[i];
>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>       if (c[i] != 0)
>>>>> res += 1;
>>>>>   }
>>>>>   return res;
>>>>> }
>>>>>
>>>>> After combine_blocks we have the following bb:
>>>>>
>>>>> <bb 3>:
>>>>> # res_15 = PHI <res_1(7), 0(15)>
>>>>> # i_16 = PHI <i_11(7), 0(15)>
>>>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>>>>> t_5 = a[i_16];
>>>>> _6 = t_5 > 0.0;
>>>>> _7 = t_5 < 9.9999998430674944e+16;
>>>>> _8 = _6 & _7;
>>>>> _10 = &c[i_16];
>>>>> _ifc__32 = _8 ? 4294967295 : 0;
>>>>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>>>>> _28 = _8;
>>>>> _29 = _9 != 0;
>>>>> _30 = _28 & _29;
>>>>> _ifc__31 = _30 ? 1 : 0;
>>>>> res_1 = res_15 + _ifc__31;
>>>>> i_11 = i_16 + 1;
>>>>> ivtmp_13 = ivtmp_14 - 1;
>>>>> if (ivtmp_13 != 0)
>>>>>   goto <bb 7>;
>>>>> else
>>>>>   goto <bb 8>;
>>>>>
>>>>> and we can see that _8 has multiple uses. Also note that after splitting of
>>>>> _8 = _6 & _7
>>>>> we also get multiple uses for definition of  _6 and _7. So I used this
>>>>> iterative algorithm as the simplest one.
>>>>
>>>> But it walks the entire pattern again and again while you only need to
>>>> ensure you walk the pattern tree of the now single-use DEF again
>>>> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
>>>> you should pass down the user_operand_p that you need to make
>>>> single-use).
>>>>
>>>>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>>>>> for stmt_is_root_of_bool_pattern.
>>>>>
>>>>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>>>>> to check on COND_EXPR only since PHI predication and memory access
>>>>> predication produced only such statements,i.e. it can look like
>>>>>
>>>>> static bool
>>>>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>>>>> {
>>>>>   enum tree_code code;
>>>>>   tree lhs, rhs;
>>>>>
>>>>>   code = gimple_assign_rhs_code (stmt);
>>>>>   if (code == COND_EXPR)
>>>>>     {
>>>>>       rhs = gimple_assign_rhs1 (stmt);
>>>>>       if (TREE_CODE (rhs) != SSA_NAME)
>>>>> return false;
>>>>>       *var = rhs;
>>>>>       return true;
>>>>>     }
>>>>>   return false;
>>>>> }
>>>>>
>>>>> I also did few minor changes in patch.2.
>>>>>
>>>>> 3. You can also notice that I inserted code in tree_if_conversion to
>>>>> do loop version if explicit option "-ftree-loop-if-convert" was not
>>>>> passed to compiler, i.e. we perform if-conversion for loop
>>>>> vectorization only and if it does not take place, we should delete
>>>>> if-converted version of loop.
>>>>> What is your opinion?
>>>>
>>>> Overall part 1 and part 2 look good to me, predicate_scalar_phi
>>>> looks in need of some refactoring to avoid duplicate code.  We can
>>>> do that a followup.
>>>>
>>>> Part 3 still needs the iteration to be resolved and make the use we
>>>> actually care about single-use, not a random one so we can avoid
>>>> iterating completely.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Here is updated patch which includes
>>>>>>> (1) split critical edges for aggressive if conversion.
>>>>>>> (2) delete all stuff related to support of critical edge predication.
>>>>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>>>>
>>>>>>> I checked that patch works in stress testing mode, i.e. with
>>>>>>> aggressive if conversion by default.
>>>>>>>
>>>>>>> What is your opinion?
>>>>>>
>>>>>> Looks ok overall, but please simply do
>>>>>>
>>>>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>>>>       split_edge (e);
>>>>>>
>>>>>> for all blocks apart from the latch.
>>>>>>
>>>>>> Can you please send a combined patch up to this one?  Looking at
>>>>>> the incremental diff is somewhat hard.  Thus a patch including all
>>>>>> patches from patch1 to this one.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Thanks for your reply!
>>>>>>>>>
>>>>>>>>> I didn't understand your point:
>>>>>>>>>
>>>>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>>>>
>>>>>>>>> but you do it unconditionally in proposed patch.
>>>>>>>>
>>>>>>>> I don't mind means I am fine with it.
>>>>>>>>
>>>>>>>>> Also I assume that
>>>>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>>>>> split headers of loops, loop exit blocks etc.
>>>>>>>>
>>>>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>>>>> be surprised if so but that may be possible.
>>>>>>>>
>>>>>>>>> I prefer to do something
>>>>>>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>>>>> destination bb belongs to loop).
>>>>>>>>
>>>>>>>> That works for me as well but it is more complicated to implement.
>>>>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>>>>> predecessors (where we'd currently give up).  But note that this
>>>>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>>>>> will change loop->num_nodes so we have to be more careful in
>>>>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>>>>>> I have few questions about your comments.
>>>>>>>>>>>
>>>>>>>>>>> 1. You wrote :
>>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>>> path
>>>>>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>>>>>> predicate_extended scalar phi to one function?
>>>>>>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>>>>>>> compile time since it builds hash_map.
>>>>>>>>>>
>>>>>>>>>> It's compile-time complexity is reasonable enough even for
>>>>>>>>>> non-aggressive if-conversion.
>>>>>>>>>>
>>>>>>>>>>> 2. About critical edge splitting.
>>>>>>>>>>>
>>>>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>>>>>> option only; (2) should we split all critical edges.
>>>>>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>>>>>
>>>>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>>>>> do something like
>>>>>>>>>>
>>>>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>>>>> ===================================================================
>>>>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>>>>      return 0;
>>>>>>>>>>
>>>>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>>>>             && !loop->dont_vectorize))
>>>>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>>>>> +      {
>>>>>>>>>> +       if (!critical_edges_split_p)
>>>>>>>>>> +         {
>>>>>>>>>> +           split_critical_edges ();
>>>>>>>>>> +           critical_edges_split_p = true;
>>>>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>>>>> +         }
>>>>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>>>>> +      }
>>>>>>>>>>
>>>>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>>>>    {
>>>>>>>>>>
>>>>>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>>>>>> additional option.
>>>>>>>>>>
>>>>>>>>>> Yes, I know.
>>>>>>>>>>
>>>>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>>>>> critical edges then I am all for that solution.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Thanks ahead.
>>>>>>>>>>> Yuri.
>>>>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>>>>>>>> and at least one incoming edge
>>>>>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     <bb 5>:
>>>>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>>     else
>>>>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     <bb 6>:
>>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>>     else
>>>>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     <bb 7>:
>>>>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>>>>>> #if 0
>>>>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>>>>  else
>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>>>>
>>>>>>>>>>>>> we will get ICE:
>>>>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>>>>  void foo (int n)
>>>>>>>>>>>>>       ^
>>>>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>>>>
>>>>>>>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>>>>>>>
>>>>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>>>>
>>>>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>>>>
>>>>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>>>>> predicating a PHI.
>>>>>>>>>>>>
>>>>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>>>>> the printfs properly.
>>>>>>>>>>>>
>>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>>> paths.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>>>>>>>> fault but making it even worse is not an option.
>>>>>>>>>>>>
>>>>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog is
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>>>>> statement iterator.
>>>>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>>>>>>>         number of predecessors is geater 2 and at least one incoming edge is
>>>>>>>>>>>>> critical.
>>>>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>>>>>> works.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

Patch
diff mbox

Index: gcc/tree-if-conv.c
===================================================================
--- gcc/tree-if-conv.c  (revision 218515)
+++ gcc/tree-if-conv.c  (working copy)
@@ -2235,12 +2235,21 @@  pass_if_conversion::execute (function *f
   if (number_of_loops (fun) <= 1)
     return 0;

+  bool critical_edges_split_p = false;
   FOR_EACH_LOOP (loop, 0)
     if (flag_tree_loop_if_convert == 1
        || flag_tree_loop_if_convert_stores == 1
        || ((flag_tree_loop_vectorize || loop->force_vectorize)
            && !loop->dont_vectorize))
-      todo |= tree_if_conversion (loop);
+      {
+       if (!critical_edges_split_p)
+         {
+           split_critical_edges ();
+           critical_edges_split_p = true;
+           todo |= TODO_cleanup_cfg;
+         }
+       todo |= tree_if_conversion (loop);
+      }

 #ifdef ENABLE_CHECKING
   {