[1/2] Extended if-conversion for loops marked with pragma omp simd.
diff mbox

Message ID CAFiYyc0EhSdbcqOpNTO_E43svxz=_=ECmV0TozE-nSkKTNUCvA@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Oct. 21, 2014, 2:19 p.m. UTC
On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I saw the sources of these functions, but I can't understand why I
>> should use something else? Note that all predicate computations are
>> located in basic blocks ( by design of if-conv) and there is special
>> function that put these computations in bb
>> (insert_gimplified_predicates). Edge contains only predicate not its
>> computations. New function - find_insertion_point() does very simple
>> search - it finds out the latest (in current bb) operand def-stmt of
>> predicates taken from all incoming edges.
>> In original algorithm the predicate of non-critical edge is taken to
>> perform phi-node predication since for critical edge it does not work
>> properly.
>>
>> My question is: does your comments mean that I should re-design my extensions?
>
> Well, we have infrastructure for inserting code on edges and you've
> made critical edges predicated correctly.  So why re-invent the wheel?
> I realize this is very similar to my initial suggestion to simply split
> critical edges in loops you want to if-convert but delays splitting
> until it turns out to be necessary (which might be good for the
> !force_vect case).
>
> For edge predicates you simply can emit their computation on the
> edge, no?
>
> Btw, I very originally suggested to rework if-conversion to only
> record edge predicates - having both block and edge predicates
> somewhat complicates the code and makes it harder to
> maintain (thus also the suggestion to simply split critical edges
> if necessary to make BB predicates work always).
>
> Your patches add a lot of code and to me it seems we can avoid
> doing so much special casing.

For example attacking the critical edge issue by a simple


it changes the number of blocks in the loop, so
get_loop_body_in_if_conv_order should probably be re-done with the
above eventually signalling that it created a new block.  Or the above
should populate a vector of edges to split and do that after the
loop calling if_convertible_bb_p.

Richard.

> Richard.
>
>> Thanks.
>> Yuri.
>>
>> BTW Jeff did initial review of my changes related to predicate
>> computation for join blocks. I presented him updated patch with
>> test-case and some minor changes in patch. But still did not get any
>> feedback on it. Could you please take a look also on it?
>>
>>
>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Yes, This patch does not make sense since phi node predication for bb
>>>> with critical incoming edges only performs another function which is
>>>> absent (predicate_extended_scalar_phi).
>>>>
>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>> only but you propose to use it for tree also.
>>>> Did I miss something?
>>>
>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>> if you want easy access to the newly created basic block to push
>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>
>>> Richard.
>>>
>>>> Thanks ahead.
>>>>
>>>>
>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>> in the future (more precise in patch.4).
>>>>>
>>>>> But the same reasoning applies to this version of the patch when
>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>
>>>>> Which means the patch doesn't make sense in isolation?
>>>>>
>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>> (pushing the edge predicate to the newly created block).
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Could you please review it.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_preds_critical_p): New function.
>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>> after adding support for extended predication.
>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>> Add zeroing of edge 'aux' field.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>> algorithm is used.
>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>
>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Thanks for your answer!
>>>>>>>
>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>> did you mean by:
>>>>>>>>>
>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>correctly.
>>>>>>>>>
>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>> nodes will be in next patch.
>>>>>>>>
>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>> for !flag_force_vectorize.
>>>>>>>>
>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>> is ok.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>
>>>>>>>>> I attached modified patch.
>>>>>>>>>
>>>>>>>>> ChangeLog:
>>>>>>>>>
>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>> for critical edge.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>> algorithm is used.
>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>
>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>> +             return false;
>>>>>>>>>> +           }
>>>>>>>>>> +
>>>>>>>>>> +        }
>>>>>>>>>>
>>>>>>>>>> Excess vertical space.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>
>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>
>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>> +
>>>>>>>>>> +static inline bool
>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>> +{
>>>>>>>>>>
>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>
>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>> +    {
>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>> +       return false;
>>>>>>>>>> +    }
>>>>>>>>>>
>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>
>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>> correctly.
>>>>>>>>>>
>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Yuri.
>>>>>>>>>>> ChangeLog
>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>> for critical edge.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>> more.
>>>>>>>>>>>>
>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>  {
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>> +       {
>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>
>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>
>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>> +
>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>
>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>
>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>
>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>
>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>
>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>
>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>> +
>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>
>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>
>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>
>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>
>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>>>>>>>>> Let's me also answer on your comments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In addition to that it changes function singatures without updating
>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.

Comments

Yuri Rumyantsev Oct. 21, 2014, 2:34 p.m. UTC | #1
Richard,

In my initial design I did such splitting but before start real
if-conversion but I decided to not perform it since code size for
if-converted loop is growing (number of phi nodes is increased). It is
worth noting also that for phi with #nodes > 2 we need to get all
predicates (except one) to do phi-predication and it means that block
containing such phi can have only 1 critical edge.

Thanks.
Yuri.

2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I saw the sources of these functions, but I can't understand why I
>>> should use something else? Note that all predicate computations are
>>> located in basic blocks ( by design of if-conv) and there is special
>>> function that put these computations in bb
>>> (insert_gimplified_predicates). Edge contains only predicate not its
>>> computations. New function - find_insertion_point() does very simple
>>> search - it finds out the latest (in current bb) operand def-stmt of
>>> predicates taken from all incoming edges.
>>> In original algorithm the predicate of non-critical edge is taken to
>>> perform phi-node predication since for critical edge it does not work
>>> properly.
>>>
>>> My question is: does your comments mean that I should re-design my extensions?
>>
>> Well, we have infrastructure for inserting code on edges and you've
>> made critical edges predicated correctly.  So why re-invent the wheel?
>> I realize this is very similar to my initial suggestion to simply split
>> critical edges in loops you want to if-convert but delays splitting
>> until it turns out to be necessary (which might be good for the
>> !force_vect case).
>>
>> For edge predicates you simply can emit their computation on the
>> edge, no?
>>
>> Btw, I very originally suggested to rework if-conversion to only
>> record edge predicates - having both block and edge predicates
>> somewhat complicates the code and makes it harder to
>> maintain (thus also the suggestion to simply split critical edges
>> if necessary to make BB predicates work always).
>>
>> Your patches add a lot of code and to me it seems we can avoid
>> doing so much special casing.
>
> For example attacking the critical edge issue by a simple
>
> Index: tree-if-conv.c
> ===================================================================
> --- tree-if-conv.c      (revision 216508)
> +++ tree-if-conv.c      (working copy)
> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>         if (EDGE_COUNT (e->src->succs) == 1)
>           found = true;
>        if (!found)
> -       {
> -         if (dump_file && (dump_flags & TDF_DETAILS))
> -           fprintf (dump_file, "only critical predecessors\n");
> -         return false;
> -       }
> +       split_edge (EDGE_PRED (bb, 0));
>      }
>
>    return true;
>
> it changes the number of blocks in the loop, so
> get_loop_body_in_if_conv_order should probably be re-done with the
> above eventually signalling that it created a new block.  Or the above
> should populate a vector of edges to split and do that after the
> loop calling if_convertible_bb_p.
>
> Richard.
>
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> BTW Jeff did initial review of my changes related to predicate
>>> computation for join blocks. I presented him updated patch with
>>> test-case and some minor changes in patch. But still did not get any
>>> feedback on it. Could you please take a look also on it?
>>>
>>>
>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Yes, This patch does not make sense since phi node predication for bb
>>>>> with critical incoming edges only performs another function which is
>>>>> absent (predicate_extended_scalar_phi).
>>>>>
>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>> only but you propose to use it for tree also.
>>>>> Did I miss something?
>>>>
>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>> if you want easy access to the newly created basic block to push
>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>
>>>> Richard.
>>>>
>>>>> Thanks ahead.
>>>>>
>>>>>
>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>>> in the future (more precise in patch.4).
>>>>>>
>>>>>> But the same reasoning applies to this version of the patch when
>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>
>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>
>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Could you please review it.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> ChangeLog:
>>>>>>>
>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>> for critical edge.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>> (all_preds_critical_p): New function.
>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>> after adding support for extended predication.
>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>> algorithm is used.
>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>
>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Thanks for your answer!
>>>>>>>>
>>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>> did you mean by:
>>>>>>>>>>
>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>correctly.
>>>>>>>>>>
>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>
>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>
>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>> is ok.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>
>>>>>>>>>> I attached modified patch.
>>>>>>>>>>
>>>>>>>>>> ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>> for critical edge.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>>
>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>> +             return false;
>>>>>>>>>>> +           }
>>>>>>>>>>> +
>>>>>>>>>>> +        }
>>>>>>>>>>>
>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>
>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>
>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>> +
>>>>>>>>>>> +static inline bool
>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>> +{
>>>>>>>>>>>
>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>
>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>> +    {
>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>> +       return false;
>>>>>>>>>>> +    }
>>>>>>>>>>>
>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>
>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>> correctly.
>>>>>>>>>>>
>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>> more.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>> +       {
>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>
>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>
>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>
>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>
>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>
>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>>>>>>>>>> Let's me also answer on your comments.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition to that it changes function singatures without updating
>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.
Richard Biener Oct. 24, 2014, 9:12 a.m. UTC | #2
On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> In my initial design I did such splitting but before start real
> if-conversion but I decided to not perform it since code size for
> if-converted loop is growing (number of phi nodes is increased). It is
> worth noting also that for phi with #nodes > 2 we need to get all
> predicates (except one) to do phi-predication and it means that block
> containing such phi can have only 1 critical edge.

Can you point me to the patch with the special insertion code then?
I definitely want to avoid the mess we ran into with the reassoc
code "clever" insertion code.

Richard.

> Thanks.
> Yuri.
>
> 2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I saw the sources of these functions, but I can't understand why I
>>>> should use something else? Note that all predicate computations are
>>>> located in basic blocks ( by design of if-conv) and there is special
>>>> function that put these computations in bb
>>>> (insert_gimplified_predicates). Edge contains only predicate not its
>>>> computations. New function - find_insertion_point() does very simple
>>>> search - it finds out the latest (in current bb) operand def-stmt of
>>>> predicates taken from all incoming edges.
>>>> In original algorithm the predicate of non-critical edge is taken to
>>>> perform phi-node predication since for critical edge it does not work
>>>> properly.
>>>>
>>>> My question is: does your comments mean that I should re-design my extensions?
>>>
>>> Well, we have infrastructure for inserting code on edges and you've
>>> made critical edges predicated correctly.  So why re-invent the wheel?
>>> I realize this is very similar to my initial suggestion to simply split
>>> critical edges in loops you want to if-convert but delays splitting
>>> until it turns out to be necessary (which might be good for the
>>> !force_vect case).
>>>
>>> For edge predicates you simply can emit their computation on the
>>> edge, no?
>>>
>>> Btw, I very originally suggested to rework if-conversion to only
>>> record edge predicates - having both block and edge predicates
>>> somewhat complicates the code and makes it harder to
>>> maintain (thus also the suggestion to simply split critical edges
>>> if necessary to make BB predicates work always).
>>>
>>> Your patches add a lot of code and to me it seems we can avoid
>>> doing so much special casing.
>>
>> For example attacking the critical edge issue by a simple
>>
>> Index: tree-if-conv.c
>> ===================================================================
>> --- tree-if-conv.c      (revision 216508)
>> +++ tree-if-conv.c      (working copy)
>> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>>         if (EDGE_COUNT (e->src->succs) == 1)
>>           found = true;
>>        if (!found)
>> -       {
>> -         if (dump_file && (dump_flags & TDF_DETAILS))
>> -           fprintf (dump_file, "only critical predecessors\n");
>> -         return false;
>> -       }
>> +       split_edge (EDGE_PRED (bb, 0));
>>      }
>>
>>    return true;
>>
>> it changes the number of blocks in the loop, so
>> get_loop_body_in_if_conv_order should probably be re-done with the
>> above eventually signalling that it created a new block.  Or the above
>> should populate a vector of edges to split and do that after the
>> loop calling if_convertible_bb_p.
>>
>> Richard.
>>
>>> Richard.
>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> BTW Jeff did initial review of my changes related to predicate
>>>> computation for join blocks. I presented him updated patch with
>>>> test-case and some minor changes in patch. But still did not get any
>>>> feedback on it. Could you please take a look also on it?
>>>>
>>>>
>>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Yes, This patch does not make sense since phi node predication for bb
>>>>>> with critical incoming edges only performs another function which is
>>>>>> absent (predicate_extended_scalar_phi).
>>>>>>
>>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>>> only but you propose to use it for tree also.
>>>>>> Did I miss something?
>>>>>
>>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>>> if you want easy access to the newly created basic block to push
>>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks ahead.
>>>>>>
>>>>>>
>>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>>>> in the future (more precise in patch.4).
>>>>>>>
>>>>>>> But the same reasoning applies to this version of the patch when
>>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>>
>>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>>
>>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Could you please review it.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> ChangeLog:
>>>>>>>>
>>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>> (edge_predicate): New function.
>>>>>>>> (set_edge_predicate): New function.
>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>> for critical edge.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>> (all_preds_critical_p): New function.
>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>>> after adding support for extended predication.
>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>> algorithm is used.
>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>
>>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Thanks for your answer!
>>>>>>>>>
>>>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>>> did you mean by:
>>>>>>>>>>>
>>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>>correctly.
>>>>>>>>>>>
>>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>>
>>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>>
>>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>>> is ok.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>>
>>>>>>>>>>> I attached modified patch.
>>>>>>>>>>>
>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>> for critical edge.
>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>> algorithm is used.
>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>>>
>>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>>> +             return false;
>>>>>>>>>>>> +           }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        }
>>>>>>>>>>>>
>>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>>
>>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>>
>>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>>> +
>>>>>>>>>>>> +static inline bool
>>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>>> +{
>>>>>>>>>>>>
>>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>>
>>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>>> +       return false;
>>>>>>>>>>>> +    }
>>>>>>>>>>>>
>>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>>
>>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>>> correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>>> +       {
>>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>>>>>>>>>>> Let's me also answer on your comments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In addition to that it changes function singatures without updating
>>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.
Yuri Rumyantsev Oct. 24, 2014, 10:21 a.m. UTC | #3
Richard,

Patch containing new core related to extended predication is attached.
Here is few comments which explain a main goal of design.

1. I don't want to insert any critical edge splitting since it may
lead to less efficient binaries (I remember some performance issue
when we designed lazy code motion algorithm in SPARC compiler).
2. One special case of extended PHI node predication was introduced
when #arguments is more than 2 but only two arguments are different
and one argument has the only occurrence. For such PHI conditional
scalar reduction is applied.
This is correspondent to the following:
    if (q1 && q2 && q3) var++
 New function phi_has_two_different_args was introduced to detect such phi.
3. Original algorithm for PHI predication used assumption that at
least one incoming edge for blocks containing PHI is not critical - it
guarantees that all computations related to predicate of normal edge
are already inserted above this block and
core related to PHI predication can be inserted at the beginning of
block. But this is not true for critical edges for which predicate
computations are  in the block where code for phi predication must be
inserted. So new function find_insertion_point is introduced which is
simply found out the last statement in block defining predicates
correspondent to all incoming edges and insert phi predication code
after it (with some minor exceptions).

If you need more comments or something unclear will let me know.

Thanks.
Yuri.

ChangeLog:

2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
FLAG_FORCE_VECTORIZE instead of loop flag.
(if_convertible_bb_p): Allow bb has more than 2 predecessors if
FLAG_FORCE_VECTORIZE is true.
(if_convertible_bb_p): Delete check that bb has at least one
non-critical incoming edge.
(phi_has_two_different_args): New function.
(is_cond_scalar_reduction): Add argument EXTENDED to choose access
to phi arguments. Invoke phi_has_two_different_args to get phi
arguments if EXTENDED is true. Change check that block
containing reduction statement candidate is predecessor
of phi-block since phi may have more than two arguments.
(convert_scalar_cond_reduction): Add argument BEFORE to insert
statement before/after gsi point.
(predicate_scalar_phi): Add argument false (which means non-extended
predication) to call of is_cond_scalar_reduction. Add argument
true (which correspondent to argument BEFORE) to call of
convert_scalar_cond_reduction.
(get_predicate_for_edge): New function.
(predicate_arbitrary_scalar_phi): New function.
(predicate_extended_scalar_phi): New function.
(find_insertion_point): New function.
(predicate_all_scalar_phis): Add two boolean variables EXTENDED and
BEFORE. Initialize EXTENDED to true if BB containing phi has more
than 2 predecessors or both incoming edges are critical. Invoke
find_phi_replacement_condition and predicate_scalar_phi or
find_insertion_point and predicate_extended_scalar_phi depending on
EXTENDED value.
(insert_gimplified_predicates): Add check that non-predicated block
may have statements to insert. Insert predicate of BB just after label
if FLAG_FORCE_VECTORIZE is true.
(tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
is copy of inner or outer loop field force_vectorize.


2014-10-24 13:12 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> In my initial design I did such splitting but before start real
>> if-conversion but I decided to not perform it since code size for
>> if-converted loop is growing (number of phi nodes is increased). It is
>> worth noting also that for phi with #nodes > 2 we need to get all
>> predicates (except one) to do phi-predication and it means that block
>> containing such phi can have only 1 critical edge.
>
> Can you point me to the patch with the special insertion code then?
> I definitely want to avoid the mess we ran into with the reassoc
> code "clever" insertion code.
>
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I saw the sources of these functions, but I can't understand why I
>>>>> should use something else? Note that all predicate computations are
>>>>> located in basic blocks ( by design of if-conv) and there is special
>>>>> function that put these computations in bb
>>>>> (insert_gimplified_predicates). Edge contains only predicate not its
>>>>> computations. New function - find_insertion_point() does very simple
>>>>> search - it finds out the latest (in current bb) operand def-stmt of
>>>>> predicates taken from all incoming edges.
>>>>> In original algorithm the predicate of non-critical edge is taken to
>>>>> perform phi-node predication since for critical edge it does not work
>>>>> properly.
>>>>>
>>>>> My question is: does your comments mean that I should re-design my extensions?
>>>>
>>>> Well, we have infrastructure for inserting code on edges and you've
>>>> made critical edges predicated correctly.  So why re-invent the wheel?
>>>> I realize this is very similar to my initial suggestion to simply split
>>>> critical edges in loops you want to if-convert but delays splitting
>>>> until it turns out to be necessary (which might be good for the
>>>> !force_vect case).
>>>>
>>>> For edge predicates you simply can emit their computation on the
>>>> edge, no?
>>>>
>>>> Btw, I very originally suggested to rework if-conversion to only
>>>> record edge predicates - having both block and edge predicates
>>>> somewhat complicates the code and makes it harder to
>>>> maintain (thus also the suggestion to simply split critical edges
>>>> if necessary to make BB predicates work always).
>>>>
>>>> Your patches add a lot of code and to me it seems we can avoid
>>>> doing so much special casing.
>>>
>>> For example attacking the critical edge issue by a simple
>>>
>>> Index: tree-if-conv.c
>>> ===================================================================
>>> --- tree-if-conv.c      (revision 216508)
>>> +++ tree-if-conv.c      (working copy)
>>> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>>>         if (EDGE_COUNT (e->src->succs) == 1)
>>>           found = true;
>>>        if (!found)
>>> -       {
>>> -         if (dump_file && (dump_flags & TDF_DETAILS))
>>> -           fprintf (dump_file, "only critical predecessors\n");
>>> -         return false;
>>> -       }
>>> +       split_edge (EDGE_PRED (bb, 0));
>>>      }
>>>
>>>    return true;
>>>
>>> it changes the number of blocks in the loop, so
>>> get_loop_body_in_if_conv_order should probably be re-done with the
>>> above eventually signalling that it created a new block.  Or the above
>>> should populate a vector of edges to split and do that after the
>>> loop calling if_convertible_bb_p.
>>>
>>> Richard.
>>>
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> BTW Jeff did initial review of my changes related to predicate
>>>>> computation for join blocks. I presented him updated patch with
>>>>> test-case and some minor changes in patch. But still did not get any
>>>>> feedback on it. Could you please take a look also on it?
>>>>>
>>>>>
>>>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Yes, This patch does not make sense since phi node predication for bb
>>>>>>> with critical incoming edges only performs another function which is
>>>>>>> absent (predicate_extended_scalar_phi).
>>>>>>>
>>>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>>>> only but you propose to use it for tree also.
>>>>>>> Did I miss something?
>>>>>>
>>>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>>>> if you want easy access to the newly created basic block to push
>>>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks ahead.
>>>>>>>
>>>>>>>
>>>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>>>>> in the future (more precise in patch.4).
>>>>>>>>
>>>>>>>> But the same reasoning applies to this version of the patch when
>>>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>>>
>>>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>>>
>>>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Could you please review it.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> ChangeLog:
>>>>>>>>>
>>>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>> (edge_predicate): New function.
>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>> for critical edge.
>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>> (all_preds_critical_p): New function.
>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>>>> after adding support for extended predication.
>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>> algorithm is used.
>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>
>>>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Thanks for your answer!
>>>>>>>>>>
>>>>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>>>> did you mean by:
>>>>>>>>>>>>
>>>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>>>correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>>>
>>>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>>>
>>>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>>>> is ok.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>>>
>>>>>>>>>>>> I attached modified patch.
>>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>>>>
>>>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>>>> +             return false;
>>>>>>>>>>>>> +           }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>>>
>>>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>>>
>>>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static inline bool
>>>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>>
>>>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>>>
>>>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>>>> +    {
>>>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>>>> +       return false;
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>
>>>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>>>
>>>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>>>> +       {
>>>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>>>>>>>>>>>> Let's me also answer on your comments.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In addition to that it changes function singatures without updating
>>>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.
Yuri Rumyantsev Nov. 7, 2014, 2:08 p.m. UTC | #4
Richard,

Did you have a chance to look at it?

Thanks.
Yuri.

2014-10-24 14:21 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard,
>
> Patch containing new core related to extended predication is attached.
> Here is few comments which explain a main goal of design.
>
> 1. I don't want to insert any critical edge splitting since it may
> lead to less efficient binaries (I remember some performance issue
> when we designed lazy code motion algorithm in SPARC compiler).
> 2. One special case of extended PHI node predication was introduced
> when #arguments is more than 2 but only two arguments are different
> and one argument has the only occurrence. For such PHI conditional
> scalar reduction is applied.
> This is correspondent to the following:
>     if (q1 && q2 && q3) var++
>  New function phi_has_two_different_args was introduced to detect such phi.
> 3. Original algorithm for PHI predication used assumption that at
> least one incoming edge for blocks containing PHI is not critical - it
> guarantees that all computations related to predicate of normal edge
> are already inserted above this block and
> core related to PHI predication can be inserted at the beginning of
> block. But this is not true for critical edges for which predicate
> computations are  in the block where code for phi predication must be
> inserted. So new function find_insertion_point is introduced which is
> simply found out the last statement in block defining predicates
> correspondent to all incoming edges and insert phi predication code
> after it (with some minor exceptions).
>
> If you need more comments or something unclear will let me know.
>
> Thanks.
> Yuri.
>
> ChangeLog:
>
> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
> FLAG_FORCE_VECTORIZE instead of loop flag.
> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
> FLAG_FORCE_VECTORIZE is true.
> (if_convertible_bb_p): Delete check that bb has at least one
> non-critical incoming edge.
> (phi_has_two_different_args): New function.
> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
> to phi arguments. Invoke phi_has_two_different_args to get phi
> arguments if EXTENDED is true. Change check that block
> containing reduction statement candidate is predecessor
> of phi-block since phi may have more than two arguments.
> (convert_scalar_cond_reduction): Add argument BEFORE to insert
> statement before/after gsi point.
> (predicate_scalar_phi): Add argument false (which means non-extended
> predication) to call of is_cond_scalar_reduction. Add argument
> true (which correspondent to argument BEFORE) to call of
> convert_scalar_cond_reduction.
> (get_predicate_for_edge): New function.
> (predicate_arbitrary_scalar_phi): New function.
> (predicate_extended_scalar_phi): New function.
> (find_insertion_point): New function.
> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
> BEFORE. Initialize EXTENDED to true if BB containing phi has more
> than 2 predecessors or both incoming edges are critical. Invoke
> find_phi_replacement_condition and predicate_scalar_phi or
> find_insertion_point and predicate_extended_scalar_phi depending on
> EXTENDED value.
> (insert_gimplified_predicates): Add check that non-predicated block
> may have statements to insert. Insert predicate of BB just after label
> if FLAG_FORCE_VECTORIZE is true.
> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
> is copy of inner or outer loop field force_vectorize.
>
>
> 2014-10-24 13:12 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> In my initial design I did such splitting but before start real
>>> if-conversion but I decided to not perform it since code size for
>>> if-converted loop is growing (number of phi nodes is increased). It is
>>> worth noting also that for phi with #nodes > 2 we need to get all
>>> predicates (except one) to do phi-predication and it means that block
>>> containing such phi can have only 1 critical edge.
>>
>> Can you point me to the patch with the special insertion code then?
>> I definitely want to avoid the mess we ran into with the reassoc
>> code "clever" insertion code.
>>
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I saw the sources of these functions, but I can't understand why I
>>>>>> should use something else? Note that all predicate computations are
>>>>>> located in basic blocks ( by design of if-conv) and there is special
>>>>>> function that put these computations in bb
>>>>>> (insert_gimplified_predicates). Edge contains only predicate not its
>>>>>> computations. New function - find_insertion_point() does very simple
>>>>>> search - it finds out the latest (in current bb) operand def-stmt of
>>>>>> predicates taken from all incoming edges.
>>>>>> In original algorithm the predicate of non-critical edge is taken to
>>>>>> perform phi-node predication since for critical edge it does not work
>>>>>> properly.
>>>>>>
>>>>>> My question is: does your comments mean that I should re-design my extensions?
>>>>>
>>>>> Well, we have infrastructure for inserting code on edges and you've
>>>>> made critical edges predicated correctly.  So why re-invent the wheel?
>>>>> I realize this is very similar to my initial suggestion to simply split
>>>>> critical edges in loops you want to if-convert but delays splitting
>>>>> until it turns out to be necessary (which might be good for the
>>>>> !force_vect case).
>>>>>
>>>>> For edge predicates you simply can emit their computation on the
>>>>> edge, no?
>>>>>
>>>>> Btw, I very originally suggested to rework if-conversion to only
>>>>> record edge predicates - having both block and edge predicates
>>>>> somewhat complicates the code and makes it harder to
>>>>> maintain (thus also the suggestion to simply split critical edges
>>>>> if necessary to make BB predicates work always).
>>>>>
>>>>> Your patches add a lot of code and to me it seems we can avoid
>>>>> doing so much special casing.
>>>>
>>>> For example attacking the critical edge issue by a simple
>>>>
>>>> Index: tree-if-conv.c
>>>> ===================================================================
>>>> --- tree-if-conv.c      (revision 216508)
>>>> +++ tree-if-conv.c      (working copy)
>>>> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>>>>         if (EDGE_COUNT (e->src->succs) == 1)
>>>>           found = true;
>>>>        if (!found)
>>>> -       {
>>>> -         if (dump_file && (dump_flags & TDF_DETAILS))
>>>> -           fprintf (dump_file, "only critical predecessors\n");
>>>> -         return false;
>>>> -       }
>>>> +       split_edge (EDGE_PRED (bb, 0));
>>>>      }
>>>>
>>>>    return true;
>>>>
>>>> it changes the number of blocks in the loop, so
>>>> get_loop_body_in_if_conv_order should probably be re-done with the
>>>> above eventually signalling that it created a new block.  Or the above
>>>> should populate a vector of edges to split and do that after the
>>>> loop calling if_convertible_bb_p.
>>>>
>>>> Richard.
>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>>
>>>>>> BTW Jeff did initial review of my changes related to predicate
>>>>>> computation for join blocks. I presented him updated patch with
>>>>>> test-case and some minor changes in patch. But still did not get any
>>>>>> feedback on it. Could you please take a look also on it?
>>>>>>
>>>>>>
>>>>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Yes, This patch does not make sense since phi node predication for bb
>>>>>>>> with critical incoming edges only performs another function which is
>>>>>>>> absent (predicate_extended_scalar_phi).
>>>>>>>>
>>>>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>>>>> only but you propose to use it for tree also.
>>>>>>>> Did I miss something?
>>>>>>>
>>>>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>>>>> if you want easy access to the newly created basic block to push
>>>>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks ahead.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>>>>>> in the future (more precise in patch.4).
>>>>>>>>>
>>>>>>>>> But the same reasoning applies to this version of the patch when
>>>>>>>>> flag_force_vectorize is true!?  (insertion point and invalid SSA form)
>>>>>>>>>
>>>>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>>>>
>>>>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Could you please review it.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2014-10-21  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>> for critical edge.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>> (all_preds_critical_p): New function.
>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>>>>> after adding support for extended predication.
>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>
>>>>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your answer!
>>>>>>>>>>>
>>>>>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>>>>> did you mean by:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>>>>correctly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes
>>>>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only
>>>>>>>>>>>>> critical incoming edges since support for extended predication of phi
>>>>>>>>>>>>> nodes will be in next patch.
>>>>>>>>>>>>
>>>>>>>>>>>> I mean that (2) should not be rejected dependent on flag_force_vectorize.
>>>>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before but with
>>>>>>>>>>>> this patch this is fixed.  I see no reason to still reject this then even
>>>>>>>>>>>> for !flag_force_vectorize.
>>>>>>>>>>>>
>>>>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize
>>>>>>>>>>>> is ok.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> Could you please clarify your statement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I attached modified patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-17  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>>>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with
>>>>>>>>>>>>>>> changes in add_to_predicate_list for review).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>>>>>>>> +               fprintf (dump_file, "More than two phi node args.\n");
>>>>>>>>>>>>>> +             return false;
>>>>>>>>>>>>>> +           }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Excess vertical space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> More than 1 predecessor?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +   Returns false if at least one successor is not on critical edge
>>>>>>>>>>>>>> +   and true otherwise.  */
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +static inline bool
>>>>>>>>>>>>>> +all_edges_are_critical (basic_block bb)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "all_preds_critical_p" would be a better name
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +  if (EDGE_COUNT (bb->preds) > 2)
>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>> +      if (!flag_force_vectorize)
>>>>>>>>>>>>>> +       return false;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> as I said in the last review I don't think we should restrict edge
>>>>>>>>>>>>>> predicates to flag_force_vectorize.  At least I can't see how
>>>>>>>>>>>>>> if-conversion is magically more expensive for that case?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So please rework the patch so critical edges are always handled
>>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok with that and the above suggested changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>>>> 2014-10-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>>>>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>>>>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>>>>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Second part of patch will be sent later.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  static inline void
>>>>>>>>>>>>>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>>>>>>>>>>>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>>>>>>>>>>>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>>>>>>>>>>>>>> +        p1 & p2 | p1 & !p2.  */
>>>>>>>>>>>>>>>> +      if (dom_bb != loop->header
>>>>>>>>>>>>>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>>>>>>>>>>>>>> +       {
>>>>>>>>>>>>>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>>>>>>>>>>>>>> +         bc = bb_predicate (dom_bb);
>>>>>>>>>>>>>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>>>>>>>>>>>>>> unconditionally.  Note that you should call free_dominance_info
>>>>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>>>>>>>>>>>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +  /* If edge E is critical save predicate on it.  */
>>>>>>>>>>>>>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>>>>>>>>>>>>>> +    set_edge_predicate (e, cond);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> how do we know the edge is critical by this simple check?  Why not
>>>>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit
>>>>>>>>>>>>>>>> the case where e->src dominates e->dest).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>>>>>>>>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>>>>>>>>>>>>>> for that).  So stuff like
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +         extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
>>>>>>>>>>>>>>>> +         if (flag_force_vectorize)
>>>>>>>>>>>>>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> shouldn't be necessary.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally
>>>>>>>>>>>>>>>> and not depend on flag_force_vectorize.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +      /* The loop latch and loop exit block are always executed and
>>>>>>>>>>>>>>>> +        have no extra conditions to be processed: skip them.  */
>>>>>>>>>>>>>>>> +      if (bb == loop->latch
>>>>>>>>>>>>>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset the
>>>>>>>>>>>>>>>> loop->latch bb predicate the change looks broken.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>>>>>>>>>>>>>> +             supported by vectorizer, so re-build it without folding.
>>>>>>>>>>>>>>>> +            For example, such conversion is generated for sequence:
>>>>>>>>>>>>>>>> +               _Bool _7, _8, _9;
>>>>>>>>>>>>>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>>>>>>>>>>>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +         if (CONVERT_EXPR_P (c)
>>>>>>>>>>>>>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>>>>>>>>>>>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>>>>>>>>>>>>>> -                                    unshare_expr (cond), c2);
>>>>>>>>>>>>>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>>>>>>>>>>>>>> +                                    unshare_expr (c2));
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> why is it necessary to unshare c2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>>>>>>>>>>>>>> that in detail).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Changelog.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-10-13  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>>>>>>>>>>>>>> Let's me also answer on your comments.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>>>>>>>>>>>>>> My previous code was not correct and now it looks like:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>>>>>>>>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>>>>>>>>>>>>>     c = bb_predicate (b);
>>>>>>>>>>>>>>>>>>   else
>>>>>>>>>>>>>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>>>>>>>>>>>>>     c = edge_predicate (e);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>>>>>>>>>>>>>> local-dce function in next patch.
>>>>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general
>>>>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>>>>>>>>>>>>>> scalar reduction can be applied also.
>>>>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>>>>>>>>>>>>>> omp simd only.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>>>>>>>>>>>>>> for join blocks if it exists.
>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>>>>>>>>>>>>>> for critical edge.
>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>>>>>>>>>>>>>> that extended predication must be applied).
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and
>>>>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>>>>>>>>>>>>>> for blocks with two successors.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Richard!
>>>>>>>>>>>>>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>>>>>>>>>>>>>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>>>>>>>>>>>>>> negate_predicate was deleted.
>>>>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>>>>>>>>>>>>>> be critical.
>>>>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>>>>>>>>>>>>>> blocks to simplify it.
>>>>>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>>>>>>>>>>>>>> of kind
>>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Updated patch is attached.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Any comments will be appreciated.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>>>>>>>>>>>>>> it hard to review.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In addition to that it changes function singatures without updating
>>>>>>>>>>>>>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>>>>>>>>>>>>>> all this added logic.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate computations
>>>>>>>>>>>>>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>>>>>>>>>>>>>> +   conditional expressions.  */
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>>>>>>>>>>>>>> split critical edges (of the respective loop body).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>> (edge_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (set_edge_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>>>>>>>>>>>>>> add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>>>>>>>>>>>>>> include:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>>>>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>>>>>>>>>>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>>>>>>>>>>>>>    predecessors.
>>>>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>>>>>>>>>>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>>>>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> No?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>>>>>>>>>>>>>>>>>>>> with some limitations:
>>>>>>>>>>>>>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>>>>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>>>>>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>>>>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>>>>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>>>>>>>>>>>>>    will be generated for it. In current design very simple check is used:
>>>>>>>>>>>>>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>>>>>>>>>>>>>> with next edge.
>>>>>>>>>>>>>>>>>>>>>>  These guarantee that phi predication will produce the correct result.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>>>>>>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   bb 5:
>>>>>>>>>>>>>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>>>>>>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>>>>>>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>   <bb 4>:
>>>>>>>>>>>>>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>>>>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>>>>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>>>>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>>>>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>>>>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>>>>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>>>>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>>>>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>>>>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>>>>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>>>>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>>>>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>>>>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>>>>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>>>>>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>>>>>>>>>>>>>>>>>>>> equal to false.
>>>>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>>>>>>>>>>>>>> algorithm is used.
>>>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>>>>>>>>>>>>>> predication to build mask.
>>>>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.

Patch
diff mbox

Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c      (revision 216508)
+++ tree-if-conv.c      (working copy)
@@ -980,11 +980,7 @@  if_convertible_bb_p (struct loop *loop,
        if (EDGE_COUNT (e->src->succs) == 1)
          found = true;
       if (!found)
-       {
-         if (dump_file && (dump_flags & TDF_DETAILS))
-           fprintf (dump_file, "only critical predecessors\n");
-         return false;
-       }
+       split_edge (EDGE_PRED (bb, 0));
     }

   return true;