Patchwork Vector Comparison patch

login
register
mail settings
Submitter Richard Guenther
Date Aug. 25, 2011, 11:39 a.m.
Message ID <CAFiYyc3xZHGART6iXscAJ1GumbeWC=MTB56mfhC-BQgTT6Nixg@mail.gmail.com>
Download mbox | patch
Permalink /patch/111552/
State New
Headers show

Comments

Richard Guenther - Aug. 25, 2011, 11:39 a.m.
On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>>> we discussed.
>>>
>>> So I think it is a right time to do something about vcond patterns,
>>> which would allow me to get rid of conversions that I need to put all
>>> over the code.
>>>
>>> Also at the moment the patch breaks lto frontend with a simple example:
>>> #define vector(elcount, type)  \
>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>
>>> int main (int argc, char *argv[]) {
>>>    vector (4, float) f0;
>>>    vector (4, float) f1;
>>>
>>>    f0 =  f1 != f0
>>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>>
>>>    return (int)f0[argc];
>>> }
>>>
>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>>
>>> I looked into the file, the conversion function is defined as
>>> gcc_unreachable (). I am not very familiar with lto, so I don't really
>>> know what is the right way to treat the conversions.
>>>
>>> And I seriously need help with backend patterns.
>>
>> On the patch.
>>
>> The documentation needs review by a native english speaker, but here
>> are some factual comments:
>>
>> +In C vector comparison is supported within standard comparison operators:
>>
>> it should read 'In GNU C' here and everywhere else as this is a GNU
>> extension.
>>
>>  The result of the
>> +comparison is a signed integer-type vector where the size of each
>> +element must be the same as the size of compared vectors element.
>>
>> The result type of the comparison is determined by the C frontend,
>> it isn't under control of the user.  What you are implying here is
>> restrictions on vector assignments, which are documented elsewhere.
>> I'd just say
>>
>> 'The result of the comparison is a vector of the same width and number
>> of elements as the comparison operands with a signed integral element
>> type.'
>>
>> +In addition to the vector comparison C supports conditional expressions
>>
>> See above.
>>
>> +For the convenience condition in the vector conditional can be just a
>> +vector of signed integer type.
>>
>> 'of integer type.'  I don't see a reason to disallow unsigned integers,
>> they can be equally well compared against zero.
>
> I'll have a final go on the documentation, it is untouched from the old patches.
>
>> Index: gcc/targhooks.h
>> ===================================================================
>> --- gcc/targhooks.h     (revision 177665)
>> +++ gcc/targhooks.h     (working copy)
>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>
>>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>> +
>>  extern bool
>>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>>                                             const_tree,
>>
>> spurious whitespace change.
>
> Yes, thanks.
>
>> Index: gcc/optabs.c
>> ===================================================================
>> --- gcc/optabs.c        (revision 177665)
>> +++ gcc/optabs.c        (working copy)
>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>> ...
>> +  else
>> +    {
>> +      rtx rtx_op0;
>> +      rtx vec;
>> +
>> +      rtx_op0 = expand_normal (op0);
>> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>> +      vec = CONST0_RTX (mode);
>> +
>> +      create_output_operand (&ops[0], target, mode);
>> +      create_input_operand (&ops[1], rtx_op1, mode);
>> +      create_input_operand (&ops[2], rtx_op2, mode);
>> +      create_input_operand (&ops[3], comparison, mode);
>> +      create_input_operand (&ops[4], rtx_op0, mode);
>> +      create_input_operand (&ops[5], vec, mode);
>>
>> this still builds the fake(?) != comparison, but as you said you need help
>> with the .md part if we want to use a machine specific pattern for this
>> case (which we eventually want, for the sake of using XOP vcond).
>
> Yes, I am waiting for it. This is the only way at the moment to make
> sure that in
> m = a > b;
> r = m ? c : d;
>
> m in the vcond is not transformed into the m != 0.
>
>> Index: gcc/target.h
>> ===================================================================
>> --- gcc/target.h        (revision 177665)
>> +++ gcc/target.h        (working copy)
>> @@ -51,6 +51,7 @@
>>  #define GCC_TARGET_H
>>
>>  #include "insn-modes.h"
>> +#include "gimple.h"
>>
>>  #ifdef ENABLE_CHECKING
>>
>> spurious change.
>
> Old stuff, fixed.
>
>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>      floating-point, we can only do some of these simplifications.)  */
>>   if (operand_equal_p (arg0, arg1, 0))
>>     {
>> +      tree arg0_type = TREE_TYPE (arg0);
>> +
>>       switch (code)
>>        {
>>        case EQ_EXPR:
>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>> +         if (! FLOAT_TYPE_P (arg0_type)
>> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>> ...
>
> Ok.
>
>>
>> Likewise.
>>
>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>     case UNGE_EXPR:
>>     case UNEQ_EXPR:
>>     case LTGT_EXPR:
>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>> +       {
>> +         enum tree_code code = ops->code;
>> +         tree arg0 = ops->op0;
>> +         tree arg1 = ops->op1;
>>
>> move this code to do_store_flag (we really store a flag value).  It should
>> also simply do what expand_vec_cond_expr does, probably simply
>> calling that with the {-1,...} {0,...} extra args should work.
>
> I started to do that, but the code in do_store_flag is completely
> different from what I am doing, and it looks confusing. I just call
> expand_vec_cond_expr and that is it. I can write a separate function,
> but the code is quite small.

Hm?  I see in your patch

that's not exactly "calling expand_vec_cond_expr".

>>
>> As for the still required conversions, you should be able to delay those
>> from the C frontend (and here) to expand_vec_cond_expr by, after
>> expanding op1 and op2, wrapping a subreg around it with a proper mode
>> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work),
>> and then convert the result back to the original mode.
>>
>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>
> Conversions are there until we fix the backend. When backend will be
> able to digest f0 > f1 ? int0 : int1, all the conversions will go
> away.
>
>> +static tree
>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>
>> is missing a function comment.
>
> fixed.
>
>> +static tree
>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>> +         tree bitpos, tree bitsize, enum tree_code code)
>> +{
>> +  tree cond;
>> +  tree comp_type;
>> +
>> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>> +
>> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>> +
>>
>> Use
>>
>>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>
>> instead.  But I think you don't want to use TYPE_PRECISION on
>> FP types.  Instead you want a signed integer type of the same (mode)
>> size as the vector element type, thus
>>
>>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE
>> (TYPE_MODE (inner_type)), 0);
>
> Hm, I thought that at this stage we don't wan to know anything about
> modes. I mean here I am really building the same integer type as the
> operands of the comparison have. But I can use MODE_BITSIZE as well, I
> don't think that it could happen that the size of the mode is
> different from the size of the type. Or could it?

The comparison could be on floating-point types where TYPE_PRECISION
can be, for example, 80 for x87 doubles.  You want an integer type
of the same width, so yes, GET_MODE_BITSIZE is the correct thing
to use here.

>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>
>> the result type of a comparison is boolean_type_node, not comp_type.
>>
>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>> +                    build_int_cst (comp_type, -1),
>> +                    build_int_cst (comp_type, 0));
>>
>> writing this as
>>
>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>>                     fold_build2 (code, boolean_type_node, a, b),
>> +                    build_int_cst (comp_type, -1),
>> +                    build_int_cst (comp_type, 0));
>>
>> will get the gimplifier a better chance at simplifcation.
>>
>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>
>> I think we are expecting the scalar type and the vector mode here
>> from looking at the single existing caller.  It probably doesn't make
>> a difference (we only check TYPE_UNSIGNED of it, which should
>> also work for vector types), but let's be consistent.  Thus,
>
> Ok.
>
>>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>
>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>> +    t = expand_vector_piecewise (gsi, do_compare, type,
>> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>> +  else
>> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>>
>> the else case looks odd.  Why re-build a stmt that already exists?
>> Simply return NULL_TREE instead?
>
> I can adjust. The reason it is written that way is that
> expand_vector_operations_1 is using the result of the function to
> update rhs.

Ok, so it should check whether there was any lowering done then.

>> +static tree
>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>> +{
>> ...
>> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
>> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>> +                                   TYPE_MODE (TREE_TYPE (cond))))
>> +       {
>> ...
>> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
>> +                                            TREE_TYPE (cond),
>> +                                            TREE_TYPE (TREE_TYPE (op1)),
>> +                                            op0, op1, TREE_CODE (cond));
>>
>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>
>> tem = { a[0] < b[0] ? -1 : 0, ... }
>> v0 & tem | v1 & ~tem;
>>
>> instead of
>>
>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>
>> even if the bitwise operations could be carried out using vectors.
>> It's definitely beneficial to do the first if the CPU can create the
>> bitmask.
>>
>
> o_O
>
> I thought you always wanted to do (m & v0) | (~m & v1).
> Do you want to have two cases of the expansion then -- when we have
> mask available and when we don't? But it is really unlikely that we
> can get the mask, but cannot get vcond. Because condition is actually
> vcond. So once again -- do we always expand to {a[0] > b[0]  ? v[0] :
> c[0], ...}?

Hm, yeah.  I suppose with the current setup it's hard to only
get the mask but not the full vcond ;)  So it probably makes
sense to always expand to {a[0] > b[0]  ? v[0] :c[0],...} as
fallback.  Sorry for the confusion ;)

>> +  /* Run vecower on the expresisons we have introduced.  */
>> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>> +    expand_vector_operations_1 (&gsi_tmp);
>>
>> do not use gsi.ptr directly, use gsi_stmt (gsi_tm) != gsi_stmt (gsi)
>>
>> +static bool
>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>> +{
>>
>> This function is lacking a comment.
>>
>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>> ...
>> +      /* Try to get rid from the useless vector comparison
>> +        x != {0,0,...} which is inserted by the typechecker.  */
>> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>
>> how and why?  You simply drop that comparison - that doesn't look
>> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
>> comparison - that wouldn't be valid gimple.  Please leave this
>> optimization to SSA based forward propagation (I can help you here
>> once the patch is in).
>
> No-no-no. This is the second part of avoiding
> m = a > b;
> r = m ? v0 : v1;
>
> to prevent m expansion to m != {0}.
>
> I do not _simply_ drop the comparison. I drop it only if
> is_vector_comparison returned true. It means that we can never get
> into the situation that we are dropping actually a comparison inserted
> by the user. But what I really want to achieve here is to drop the
> comparison that the frontend inserts every time when it sees an
> expression there.
>
> As I said earlier, tree forward propagation kicks only using -On, and
> I would really like to make sure that I can get rid of useless != {0}
> at any level.

Please don't.  If the language extension forces a != 0 then it should
appear at -O0.  The code is fishy anyway in the way it walks stmts
in is_vector_comparison.  At least I don't like to see this optimization
done here for the sake of -O0 in this initial patch - you could try
arguing about it as a followup improvement (well, probably with not
much luck).  -O0 is about compile-speed and debugging, doing
data-flow by walking stmts backward is slow.

>>
>> +      if (expand_vec_cond_expr_p (TREE_TYPE (exp),
>> +                                  TYPE_MODE (TREE_TYPE (exp))))
>> +        {
>> +         update_stmt (gsi_stmt (*gsi));
>> +         return;
>>
>> no need to update the stmt when you do nothing.
>>
>> +      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
>> +      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
>> +      update_stmt (gsi_stmt (*gsi));
>> +    }
>>
>> missing return;, just for clarity that you are done here.
>
> Ok.
>
>> You don't do anything for comparisons here, in case they are split
>> away from the VEC_COND_EXPR by the gimplifier.  But if the
>> target doesn't support VEC_COND_EXPRs we have to lower them.
>> I suggest checking your testcases on i?86-linux (or with -m32 -march=i486).
>>
>
> expand_vector_operations_1 take care about any vector comparison,
> considering it as a binary operation. See expand_vector_operation and
> do_compare for more details.

Ah, ok, I missed that piece.

>> -TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
>> +TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
>>
>> huh, no please ;)  I suppose that's no longer necessary anyway now.
>>
>
> Yeah, fixed. :)
>
>> I'll leave the i386.c pieces to the x86 target maintainers to review.
>> They probably will change once the .md file changes are sorted out.
>
> If they ever going to be sorted out...

Well, we can move the conversion stuff to the point of expansion
using convert_move.  That'll keep the middle-end and the C frontend
clean and move the "hack" towards the backends.

Richard.

>> Thanks,
>> Richard.
>>
>
>
> Thanks,
> Artem.
>
Artem Shinkarov - Aug. 25, 2011, 12:45 p.m.
On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>>>> we discussed.
>>>>
>>>> So I think it is a right time to do something about vcond patterns,
>>>> which would allow me to get rid of conversions that I need to put all
>>>> over the code.
>>>>
>>>> Also at the moment the patch breaks lto frontend with a simple example:
>>>> #define vector(elcount, type)  \
>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>
>>>> int main (int argc, char *argv[]) {
>>>>    vector (4, float) f0;
>>>>    vector (4, float) f1;
>>>>
>>>>    f0 =  f1 != f0
>>>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>>>
>>>>    return (int)f0[argc];
>>>> }
>>>>
>>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>>>
>>>> I looked into the file, the conversion function is defined as
>>>> gcc_unreachable (). I am not very familiar with lto, so I don't really
>>>> know what is the right way to treat the conversions.
>>>>
>>>> And I seriously need help with backend patterns.
>>>
>>> On the patch.
>>>
>>> The documentation needs review by a native english speaker, but here
>>> are some factual comments:
>>>
>>> +In C vector comparison is supported within standard comparison operators:
>>>
>>> it should read 'In GNU C' here and everywhere else as this is a GNU
>>> extension.
>>>
>>>  The result of the
>>> +comparison is a signed integer-type vector where the size of each
>>> +element must be the same as the size of compared vectors element.
>>>
>>> The result type of the comparison is determined by the C frontend,
>>> it isn't under control of the user.  What you are implying here is
>>> restrictions on vector assignments, which are documented elsewhere.
>>> I'd just say
>>>
>>> 'The result of the comparison is a vector of the same width and number
>>> of elements as the comparison operands with a signed integral element
>>> type.'
>>>
>>> +In addition to the vector comparison C supports conditional expressions
>>>
>>> See above.
>>>
>>> +For the convenience condition in the vector conditional can be just a
>>> +vector of signed integer type.
>>>
>>> 'of integer type.'  I don't see a reason to disallow unsigned integers,
>>> they can be equally well compared against zero.
>>
>> I'll have a final go on the documentation, it is untouched from the old patches.
>>
>>> Index: gcc/targhooks.h
>>> ===================================================================
>>> --- gcc/targhooks.h     (revision 177665)
>>> +++ gcc/targhooks.h     (working copy)
>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>>
>>>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>>> +
>>>  extern bool
>>>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>>>                                             const_tree,
>>>
>>> spurious whitespace change.
>>
>> Yes, thanks.
>>
>>> Index: gcc/optabs.c
>>> ===================================================================
>>> --- gcc/optabs.c        (revision 177665)
>>> +++ gcc/optabs.c        (working copy)
>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>>> ...
>>> +  else
>>> +    {
>>> +      rtx rtx_op0;
>>> +      rtx vec;
>>> +
>>> +      rtx_op0 = expand_normal (op0);
>>> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>>> +      vec = CONST0_RTX (mode);
>>> +
>>> +      create_output_operand (&ops[0], target, mode);
>>> +      create_input_operand (&ops[1], rtx_op1, mode);
>>> +      create_input_operand (&ops[2], rtx_op2, mode);
>>> +      create_input_operand (&ops[3], comparison, mode);
>>> +      create_input_operand (&ops[4], rtx_op0, mode);
>>> +      create_input_operand (&ops[5], vec, mode);
>>>
>>> this still builds the fake(?) != comparison, but as you said you need help
>>> with the .md part if we want to use a machine specific pattern for this
>>> case (which we eventually want, for the sake of using XOP vcond).
>>
>> Yes, I am waiting for it. This is the only way at the moment to make
>> sure that in
>> m = a > b;
>> r = m ? c : d;
>>
>> m in the vcond is not transformed into the m != 0.
>>
>>> Index: gcc/target.h
>>> ===================================================================
>>> --- gcc/target.h        (revision 177665)
>>> +++ gcc/target.h        (working copy)
>>> @@ -51,6 +51,7 @@
>>>  #define GCC_TARGET_H
>>>
>>>  #include "insn-modes.h"
>>> +#include "gimple.h"
>>>
>>>  #ifdef ENABLE_CHECKING
>>>
>>> spurious change.
>>
>> Old stuff, fixed.
>>
>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>>      floating-point, we can only do some of these simplifications.)  */
>>>   if (operand_equal_p (arg0, arg1, 0))
>>>     {
>>> +      tree arg0_type = TREE_TYPE (arg0);
>>> +
>>>       switch (code)
>>>        {
>>>        case EQ_EXPR:
>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>> +         if (! FLOAT_TYPE_P (arg0_type)
>>> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>> ...
>>
>> Ok.
>>
>>>
>>> Likewise.
>>>
>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>     case UNGE_EXPR:
>>>     case UNEQ_EXPR:
>>>     case LTGT_EXPR:
>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>> +       {
>>> +         enum tree_code code = ops->code;
>>> +         tree arg0 = ops->op0;
>>> +         tree arg1 = ops->op1;
>>>
>>> move this code to do_store_flag (we really store a flag value).  It should
>>> also simply do what expand_vec_cond_expr does, probably simply
>>> calling that with the {-1,...} {0,...} extra args should work.
>>
>> I started to do that, but the code in do_store_flag is completely
>> different from what I am doing, and it looks confusing. I just call
>> expand_vec_cond_expr and that is it. I can write a separate function,
>> but the code is quite small.
>
> Hm?  I see in your patch
>
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  (revision 177665)
> +++ gcc/expr.c  (working copy)
> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>     case UNGE_EXPR:
>     case UNEQ_EXPR:
>     case LTGT_EXPR:
> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
> +       {
> +         enum tree_code code = ops->code;
> +         tree arg0 = ops->op0;
> +         tree arg1 = ops->op1;
> +         tree arg_type = TREE_TYPE (arg0);
> +         tree el_type = TREE_TYPE (arg_type);
> +         tree t, ifexp, if_true, if_false;
> +
> +         el_type = lang_hooks.types.type_for_size (TYPE_PRECISION
> (el_type), 0);
> +
> +
> +         ifexp = build2 (code, type, arg0, arg1);
> +         if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
> +         if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
> +
> +         if (arg_type != type)
> +           {
> +             if_true = convert (arg_type, if_true);
> +             if_false = convert (arg_type, if_false);
> +             t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
> +             t = convert (type, t);
> +           }
> +         else
> +           t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
> +
> +         return expand_expr (t,
> +                             modifier != EXPAND_STACK_PARM ? target :
> NULL_RTX,
> +                             tmode != VOIDmode ? tmode : mode,
> +                             modifier);
> +       }
>
> that's not exactly "calling expand_vec_cond_expr".

Well, actually it is. Keep in mind that clean backend would imply
removing the conversions. But I'll make a function.

>>>
>>> As for the still required conversions, you should be able to delay those
>>> from the C frontend (and here) to expand_vec_cond_expr by, after
>>> expanding op1 and op2, wrapping a subreg around it with a proper mode
>>> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work),
>>> and then convert the result back to the original mode.
>>>
>>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>>
>> Conversions are there until we fix the backend. When backend will be
>> able to digest f0 > f1 ? int0 : int1, all the conversions will go
>> away.
>>
>>> +static tree
>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>>
>>> is missing a function comment.
>>
>> fixed.
>>
>>> +static tree
>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>>> +         tree bitpos, tree bitsize, enum tree_code code)
>>> +{
>>> +  tree cond;
>>> +  tree comp_type;
>>> +
>>> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>>> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>>> +
>>> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>>> +
>>>
>>> Use
>>>
>>>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>>
>>> instead.  But I think you don't want to use TYPE_PRECISION on
>>> FP types.  Instead you want a signed integer type of the same (mode)
>>> size as the vector element type, thus
>>>
>>>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE
>>> (TYPE_MODE (inner_type)), 0);
>>
>> Hm, I thought that at this stage we don't wan to know anything about
>> modes. I mean here I am really building the same integer type as the
>> operands of the comparison have. But I can use MODE_BITSIZE as well, I
>> don't think that it could happen that the size of the mode is
>> different from the size of the type. Or could it?
>
> The comparison could be on floating-point types where TYPE_PRECISION
> can be, for example, 80 for x87 doubles.  You want an integer type
> of the same width, so yes, GET_MODE_BITSIZE is the correct thing
> to use here.

Ok.

>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>
>>> the result type of a comparison is boolean_type_node, not comp_type.
>>>
>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>>> +                    build_int_cst (comp_type, -1),
>>> +                    build_int_cst (comp_type, 0));
>>>
>>> writing this as
>>>
>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>>>                     fold_build2 (code, boolean_type_node, a, b),
>>> +                    build_int_cst (comp_type, -1),
>>> +                    build_int_cst (comp_type, 0));
>>>
>>> will get the gimplifier a better chance at simplifcation.
>>>
>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>
>>> I think we are expecting the scalar type and the vector mode here
>>> from looking at the single existing caller.  It probably doesn't make
>>> a difference (we only check TYPE_UNSIGNED of it, which should
>>> also work for vector types), but let's be consistent.  Thus,
>>
>> Ok.
>>
>>>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>>
>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>> +    t = expand_vector_piecewise (gsi, do_compare, type,
>>> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>>> +  else
>>> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>>>
>>> the else case looks odd.  Why re-build a stmt that already exists?
>>> Simply return NULL_TREE instead?
>>
>> I can adjust. The reason it is written that way is that
>> expand_vector_operations_1 is using the result of the function to
>> update rhs.
>
> Ok, so it should check whether there was any lowering done then.
>
>>> +static tree
>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>>> +{
>>> ...
>>> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
>>> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>>> +                                   TYPE_MODE (TREE_TYPE (cond))))
>>> +       {
>>> ...
>>> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
>>> +                                            TREE_TYPE (cond),
>>> +                                            TREE_TYPE (TREE_TYPE (op1)),
>>> +                                            op0, op1, TREE_CODE (cond));
>>>
>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>>
>>> tem = { a[0] < b[0] ? -1 : 0, ... }
>>> v0 & tem | v1 & ~tem;
>>>
>>> instead of
>>>
>>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>>
>>> even if the bitwise operations could be carried out using vectors.
>>> It's definitely beneficial to do the first if the CPU can create the
>>> bitmask.
>>>
>>
>> o_O
>>
>> I thought you always wanted to do (m & v0) | (~m & v1).
>> Do you want to have two cases of the expansion then -- when we have
>> mask available and when we don't? But it is really unlikely that we
>> can get the mask, but cannot get vcond. Because condition is actually
>> vcond. So once again -- do we always expand to {a[0] > b[0]  ? v[0] :
>> c[0], ...}?
>
> Hm, yeah.  I suppose with the current setup it's hard to only
> get the mask but not the full vcond ;)  So it probably makes
> sense to always expand to {a[0] > b[0]  ? v[0] :c[0],...} as
> fallback.  Sorry for the confusion ;)

Ok.

>>> +  /* Run vecower on the expresisons we have introduced.  */
>>> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>>> +    expand_vector_operations_1 (&gsi_tmp);
>>>
>>> do not use gsi.ptr directly, use gsi_stmt (gsi_tm) != gsi_stmt (gsi)
>>>
>>> +static bool
>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>>> +{
>>>
>>> This function is lacking a comment.
>>>
>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>>> ...
>>> +      /* Try to get rid from the useless vector comparison
>>> +        x != {0,0,...} which is inserted by the typechecker.  */
>>> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>>
>>> how and why?  You simply drop that comparison - that doesn't look
>>> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
>>> comparison - that wouldn't be valid gimple.  Please leave this
>>> optimization to SSA based forward propagation (I can help you here
>>> once the patch is in).
>>
>> No-no-no. This is the second part of avoiding
>> m = a > b;
>> r = m ? v0 : v1;
>>
>> to prevent m expansion to m != {0}.
>>
>> I do not _simply_ drop the comparison. I drop it only if
>> is_vector_comparison returned true. It means that we can never get
>> into the situation that we are dropping actually a comparison inserted
>> by the user. But what I really want to achieve here is to drop the
>> comparison that the frontend inserts every time when it sees an
>> expression there.
>>
>> As I said earlier, tree forward propagation kicks only using -On, and
>> I would really like to make sure that I can get rid of useless != {0}
>> at any level.

> Please don't.  If the language extension forces a != 0 then it should
> appear at -O0.  The code is fishy anyway in the way it walks stmts
> in is_vector_comparison.  At least I don't like to see this optimization
> done here for the sake of -O0 in this initial patch - you could try
> arguing about it as a followup improvement (well, probably with not
> much luck).  -O0 is about compile-speed and debugging, doing
> data-flow by walking stmts backward is slow.

Ok, then I seriously don't see any motivation to support the
VEC_COND_EXPR. The following code:

m = a > b;
r = (m & v0) | (~m & v1)

gives me much more flexibility and  control. What the VEC_COND_EXPR is
good for? Syntactical sugar?

How about throwing away all the VEC_COND_EXPR parts supporting only
conditions (implicitly expressed using vconds)? If we would agree on
implicit conversions for real types, then this is a functionality that
perfectly satisfies my needs.

I don't see any interest from the backend people and I cannot wait
forever, so why don't we start with a simple thing?



Artem.

>>>
>>> +      if (expand_vec_cond_expr_p (TREE_TYPE (exp),
>>> +                                  TYPE_MODE (TREE_TYPE (exp))))
>>> +        {
>>> +         update_stmt (gsi_stmt (*gsi));
>>> +         return;
>>>
>>> no need to update the stmt when you do nothing.
>>>
>>> +      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
>>> +      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
>>> +      update_stmt (gsi_stmt (*gsi));
>>> +    }
>>>
>>> missing return;, just for clarity that you are done here.
>>
>> Ok.
>>
>>> You don't do anything for comparisons here, in case they are split
>>> away from the VEC_COND_EXPR by the gimplifier.  But if the
>>> target doesn't support VEC_COND_EXPRs we have to lower them.
>>> I suggest checking your testcases on i?86-linux (or with -m32 -march=i486).
>>>
>>
>> expand_vector_operations_1 take care about any vector comparison,
>> considering it as a binary operation. See expand_vector_operation and
>> do_compare for more details.
>
> Ah, ok, I missed that piece.
>
>>> -TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
>>> +TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
>>>
>>> huh, no please ;)  I suppose that's no longer necessary anyway now.
>>>
>>
>> Yeah, fixed. :)
>>
>>> I'll leave the i386.c pieces to the x86 target maintainers to review.
>>> They probably will change once the .md file changes are sorted out.
>>
>> If they ever going to be sorted out...
>
> Well, we can move the conversion stuff to the point of expansion
> using convert_move.  That'll keep the middle-end and the C frontend
> clean and move the "hack" towards the backends.
>
> Richard.
>
>>> Thanks,
>>> Richard.
>>>
>>
>>
>> Thanks,
>> Artem.
>>
>
Richard Guenther - Aug. 25, 2011, 1 p.m.
On Thu, Aug 25, 2011 at 2:45 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>>>>> we discussed.
>>>>>
>>>>> So I think it is a right time to do something about vcond patterns,
>>>>> which would allow me to get rid of conversions that I need to put all
>>>>> over the code.
>>>>>
>>>>> Also at the moment the patch breaks lto frontend with a simple example:
>>>>> #define vector(elcount, type)  \
>>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>>
>>>>> int main (int argc, char *argv[]) {
>>>>>    vector (4, float) f0;
>>>>>    vector (4, float) f1;
>>>>>
>>>>>    f0 =  f1 != f0
>>>>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>>>>
>>>>>    return (int)f0[argc];
>>>>> }
>>>>>
>>>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>>>>
>>>>> I looked into the file, the conversion function is defined as
>>>>> gcc_unreachable (). I am not very familiar with lto, so I don't really
>>>>> know what is the right way to treat the conversions.
>>>>>
>>>>> And I seriously need help with backend patterns.
>>>>
>>>> On the patch.
>>>>
>>>> The documentation needs review by a native english speaker, but here
>>>> are some factual comments:
>>>>
>>>> +In C vector comparison is supported within standard comparison operators:
>>>>
>>>> it should read 'In GNU C' here and everywhere else as this is a GNU
>>>> extension.
>>>>
>>>>  The result of the
>>>> +comparison is a signed integer-type vector where the size of each
>>>> +element must be the same as the size of compared vectors element.
>>>>
>>>> The result type of the comparison is determined by the C frontend,
>>>> it isn't under control of the user.  What you are implying here is
>>>> restrictions on vector assignments, which are documented elsewhere.
>>>> I'd just say
>>>>
>>>> 'The result of the comparison is a vector of the same width and number
>>>> of elements as the comparison operands with a signed integral element
>>>> type.'
>>>>
>>>> +In addition to the vector comparison C supports conditional expressions
>>>>
>>>> See above.
>>>>
>>>> +For the convenience condition in the vector conditional can be just a
>>>> +vector of signed integer type.
>>>>
>>>> 'of integer type.'  I don't see a reason to disallow unsigned integers,
>>>> they can be equally well compared against zero.
>>>
>>> I'll have a final go on the documentation, it is untouched from the old patches.
>>>
>>>> Index: gcc/targhooks.h
>>>> ===================================================================
>>>> --- gcc/targhooks.h     (revision 177665)
>>>> +++ gcc/targhooks.h     (working copy)
>>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>>>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>>>
>>>>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>>>> +
>>>>  extern bool
>>>>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>>>>                                             const_tree,
>>>>
>>>> spurious whitespace change.
>>>
>>> Yes, thanks.
>>>
>>>> Index: gcc/optabs.c
>>>> ===================================================================
>>>> --- gcc/optabs.c        (revision 177665)
>>>> +++ gcc/optabs.c        (working copy)
>>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>>>> ...
>>>> +  else
>>>> +    {
>>>> +      rtx rtx_op0;
>>>> +      rtx vec;
>>>> +
>>>> +      rtx_op0 = expand_normal (op0);
>>>> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>>>> +      vec = CONST0_RTX (mode);
>>>> +
>>>> +      create_output_operand (&ops[0], target, mode);
>>>> +      create_input_operand (&ops[1], rtx_op1, mode);
>>>> +      create_input_operand (&ops[2], rtx_op2, mode);
>>>> +      create_input_operand (&ops[3], comparison, mode);
>>>> +      create_input_operand (&ops[4], rtx_op0, mode);
>>>> +      create_input_operand (&ops[5], vec, mode);
>>>>
>>>> this still builds the fake(?) != comparison, but as you said you need help
>>>> with the .md part if we want to use a machine specific pattern for this
>>>> case (which we eventually want, for the sake of using XOP vcond).
>>>
>>> Yes, I am waiting for it. This is the only way at the moment to make
>>> sure that in
>>> m = a > b;
>>> r = m ? c : d;
>>>
>>> m in the vcond is not transformed into the m != 0.
>>>
>>>> Index: gcc/target.h
>>>> ===================================================================
>>>> --- gcc/target.h        (revision 177665)
>>>> +++ gcc/target.h        (working copy)
>>>> @@ -51,6 +51,7 @@
>>>>  #define GCC_TARGET_H
>>>>
>>>>  #include "insn-modes.h"
>>>> +#include "gimple.h"
>>>>
>>>>  #ifdef ENABLE_CHECKING
>>>>
>>>> spurious change.
>>>
>>> Old stuff, fixed.
>>>
>>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>>>      floating-point, we can only do some of these simplifications.)  */
>>>>   if (operand_equal_p (arg0, arg1, 0))
>>>>     {
>>>> +      tree arg0_type = TREE_TYPE (arg0);
>>>> +
>>>>       switch (code)
>>>>        {
>>>>        case EQ_EXPR:
>>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>>> +         if (! FLOAT_TYPE_P (arg0_type)
>>>> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>>> ...
>>>
>>> Ok.
>>>
>>>>
>>>> Likewise.
>>>>
>>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>>     case UNGE_EXPR:
>>>>     case UNEQ_EXPR:
>>>>     case LTGT_EXPR:
>>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>>> +       {
>>>> +         enum tree_code code = ops->code;
>>>> +         tree arg0 = ops->op0;
>>>> +         tree arg1 = ops->op1;
>>>>
>>>> move this code to do_store_flag (we really store a flag value).  It should
>>>> also simply do what expand_vec_cond_expr does, probably simply
>>>> calling that with the {-1,...} {0,...} extra args should work.
>>>
>>> I started to do that, but the code in do_store_flag is completely
>>> different from what I am doing, and it looks confusing. I just call
>>> expand_vec_cond_expr and that is it. I can write a separate function,
>>> but the code is quite small.
>>
>> Hm?  I see in your patch
>>
>> Index: gcc/expr.c
>> ===================================================================
>> --- gcc/expr.c  (revision 177665)
>> +++ gcc/expr.c  (working copy)
>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>     case UNGE_EXPR:
>>     case UNEQ_EXPR:
>>     case LTGT_EXPR:
>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>> +       {
>> +         enum tree_code code = ops->code;
>> +         tree arg0 = ops->op0;
>> +         tree arg1 = ops->op1;
>> +         tree arg_type = TREE_TYPE (arg0);
>> +         tree el_type = TREE_TYPE (arg_type);
>> +         tree t, ifexp, if_true, if_false;
>> +
>> +         el_type = lang_hooks.types.type_for_size (TYPE_PRECISION
>> (el_type), 0);
>> +
>> +
>> +         ifexp = build2 (code, type, arg0, arg1);
>> +         if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
>> +         if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
>> +
>> +         if (arg_type != type)
>> +           {
>> +             if_true = convert (arg_type, if_true);
>> +             if_false = convert (arg_type, if_false);
>> +             t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
>> +             t = convert (type, t);
>> +           }
>> +         else
>> +           t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
>> +
>> +         return expand_expr (t,
>> +                             modifier != EXPAND_STACK_PARM ? target :
>> NULL_RTX,
>> +                             tmode != VOIDmode ? tmode : mode,
>> +                             modifier);
>> +       }
>>
>> that's not exactly "calling expand_vec_cond_expr".
>
> Well, actually it is. Keep in mind that clean backend would imply
> removing the conversions. But I'll make a function.

Why does

  return expand_vec_cond_expr (build2 (ops->code, type, ops->op0, ops->op1),
                                               build_vector_from_val
(type, build_int_cst (el_type, -1)),
                                               build_vector_from_val
(type, build_int_cst (el_type, 0)));

not work?  If you push the conversions to expand_vec_cond_expr
by doing them on RTL you simplify things here and remove the requirement
from doing them in the C frontend for VEC_COND_EXPR as well.

>>>>
>>>> As for the still required conversions, you should be able to delay those
>>>> from the C frontend (and here) to expand_vec_cond_expr by, after
>>>> expanding op1 and op2, wrapping a subreg around it with a proper mode
>>>> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work),
>>>> and then convert the result back to the original mode.
>>>>
>>>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>>>
>>> Conversions are there until we fix the backend. When backend will be
>>> able to digest f0 > f1 ? int0 : int1, all the conversions will go
>>> away.
>>>
>>>> +static tree
>>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>>>
>>>> is missing a function comment.
>>>
>>> fixed.
>>>
>>>> +static tree
>>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>>>> +         tree bitpos, tree bitsize, enum tree_code code)
>>>> +{
>>>> +  tree cond;
>>>> +  tree comp_type;
>>>> +
>>>> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>>>> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>>>> +
>>>> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>>>> +
>>>>
>>>> Use
>>>>
>>>>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>>>
>>>> instead.  But I think you don't want to use TYPE_PRECISION on
>>>> FP types.  Instead you want a signed integer type of the same (mode)
>>>> size as the vector element type, thus
>>>>
>>>>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE
>>>> (TYPE_MODE (inner_type)), 0);
>>>
>>> Hm, I thought that at this stage we don't wan to know anything about
>>> modes. I mean here I am really building the same integer type as the
>>> operands of the comparison have. But I can use MODE_BITSIZE as well, I
>>> don't think that it could happen that the size of the mode is
>>> different from the size of the type. Or could it?
>>
>> The comparison could be on floating-point types where TYPE_PRECISION
>> can be, for example, 80 for x87 doubles.  You want an integer type
>> of the same width, so yes, GET_MODE_BITSIZE is the correct thing
>> to use here.
>
> Ok.
>
>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>
>>>> the result type of a comparison is boolean_type_node, not comp_type.
>>>>
>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>>>> +                    build_int_cst (comp_type, -1),
>>>> +                    build_int_cst (comp_type, 0));
>>>>
>>>> writing this as
>>>>
>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>>>>                     fold_build2 (code, boolean_type_node, a, b),
>>>> +                    build_int_cst (comp_type, -1),
>>>> +                    build_int_cst (comp_type, 0));
>>>>
>>>> will get the gimplifier a better chance at simplifcation.
>>>>
>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>
>>>> I think we are expecting the scalar type and the vector mode here
>>>> from looking at the single existing caller.  It probably doesn't make
>>>> a difference (we only check TYPE_UNSIGNED of it, which should
>>>> also work for vector types), but let's be consistent.  Thus,
>>>
>>> Ok.
>>>
>>>>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>>>
>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>> +    t = expand_vector_piecewise (gsi, do_compare, type,
>>>> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>>>> +  else
>>>> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>>>>
>>>> the else case looks odd.  Why re-build a stmt that already exists?
>>>> Simply return NULL_TREE instead?
>>>
>>> I can adjust. The reason it is written that way is that
>>> expand_vector_operations_1 is using the result of the function to
>>> update rhs.
>>
>> Ok, so it should check whether there was any lowering done then.
>>
>>>> +static tree
>>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>>>> +{
>>>> ...
>>>> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
>>>> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>>>> +                                   TYPE_MODE (TREE_TYPE (cond))))
>>>> +       {
>>>> ...
>>>> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
>>>> +                                            TREE_TYPE (cond),
>>>> +                                            TREE_TYPE (TREE_TYPE (op1)),
>>>> +                                            op0, op1, TREE_CODE (cond));
>>>>
>>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>>>
>>>> tem = { a[0] < b[0] ? -1 : 0, ... }
>>>> v0 & tem | v1 & ~tem;
>>>>
>>>> instead of
>>>>
>>>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>>>
>>>> even if the bitwise operations could be carried out using vectors.
>>>> It's definitely beneficial to do the first if the CPU can create the
>>>> bitmask.
>>>>
>>>
>>> o_O
>>>
>>> I thought you always wanted to do (m & v0) | (~m & v1).
>>> Do you want to have two cases of the expansion then -- when we have
>>> mask available and when we don't? But it is really unlikely that we
>>> can get the mask, but cannot get vcond. Because condition is actually
>>> vcond. So once again -- do we always expand to {a[0] > b[0]  ? v[0] :
>>> c[0], ...}?
>>
>> Hm, yeah.  I suppose with the current setup it's hard to only
>> get the mask but not the full vcond ;)  So it probably makes
>> sense to always expand to {a[0] > b[0]  ? v[0] :c[0],...} as
>> fallback.  Sorry for the confusion ;)
>
> Ok.
>
>>>> +  /* Run vecower on the expresisons we have introduced.  */
>>>> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>>>> +    expand_vector_operations_1 (&gsi_tmp);
>>>>
>>>> do not use gsi.ptr directly, use gsi_stmt (gsi_tm) != gsi_stmt (gsi)
>>>>
>>>> +static bool
>>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>>>> +{
>>>>
>>>> This function is lacking a comment.
>>>>
>>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>>>> ...
>>>> +      /* Try to get rid from the useless vector comparison
>>>> +        x != {0,0,...} which is inserted by the typechecker.  */
>>>> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>>>
>>>> how and why?  You simply drop that comparison - that doesn't look
>>>> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
>>>> comparison - that wouldn't be valid gimple.  Please leave this
>>>> optimization to SSA based forward propagation (I can help you here
>>>> once the patch is in).
>>>
>>> No-no-no. This is the second part of avoiding
>>> m = a > b;
>>> r = m ? v0 : v1;
>>>
>>> to prevent m expansion to m != {0}.
>>>
>>> I do not _simply_ drop the comparison. I drop it only if
>>> is_vector_comparison returned true. It means that we can never get
>>> into the situation that we are dropping actually a comparison inserted
>>> by the user. But what I really want to achieve here is to drop the
>>> comparison that the frontend inserts every time when it sees an
>>> expression there.
>>>
>>> As I said earlier, tree forward propagation kicks only using -On, and
>>> I would really like to make sure that I can get rid of useless != {0}
>>> at any level.
>
>> Please don't.  If the language extension forces a != 0 then it should
>> appear at -O0.  The code is fishy anyway in the way it walks stmts
>> in is_vector_comparison.  At least I don't like to see this optimization
>> done here for the sake of -O0 in this initial patch - you could try
>> arguing about it as a followup improvement (well, probably with not
>> much luck).  -O0 is about compile-speed and debugging, doing
>> data-flow by walking stmts backward is slow.
>
> Ok, then I seriously don't see any motivation to support the
> VEC_COND_EXPR. The following code:
>
> m = a > b;
> r = (m & v0) | (~m & v1)
>
> gives me much more flexibility and  control. What the VEC_COND_EXPR is
> good for? Syntactical sugar?
>
> How about throwing away all the VEC_COND_EXPR parts supporting only
> conditions (implicitly expressed using vconds)? If we would agree on
> implicit conversions for real types, then this is a functionality that
> perfectly satisfies my needs.
>
> I don't see any interest from the backend people and I cannot wait
> forever, so why don't we start with a simple thing?

But the simple thing is already what the backend supports.

Richard.
Artem Shinkarov - Aug. 25, 2011, 1:15 p.m.
On Thu, Aug 25, 2011 at 2:00 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 2:45 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>>>>>> we discussed.
>>>>>>
>>>>>> So I think it is a right time to do something about vcond patterns,
>>>>>> which would allow me to get rid of conversions that I need to put all
>>>>>> over the code.
>>>>>>
>>>>>> Also at the moment the patch breaks lto frontend with a simple example:
>>>>>> #define vector(elcount, type)  \
>>>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>>>
>>>>>> int main (int argc, char *argv[]) {
>>>>>>    vector (4, float) f0;
>>>>>>    vector (4, float) f1;
>>>>>>
>>>>>>    f0 =  f1 != f0
>>>>>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>>>>>
>>>>>>    return (int)f0[argc];
>>>>>> }
>>>>>>
>>>>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>>>>>
>>>>>> I looked into the file, the conversion function is defined as
>>>>>> gcc_unreachable (). I am not very familiar with lto, so I don't really
>>>>>> know what is the right way to treat the conversions.
>>>>>>
>>>>>> And I seriously need help with backend patterns.
>>>>>
>>>>> On the patch.
>>>>>
>>>>> The documentation needs review by a native english speaker, but here
>>>>> are some factual comments:
>>>>>
>>>>> +In C vector comparison is supported within standard comparison operators:
>>>>>
>>>>> it should read 'In GNU C' here and everywhere else as this is a GNU
>>>>> extension.
>>>>>
>>>>>  The result of the
>>>>> +comparison is a signed integer-type vector where the size of each
>>>>> +element must be the same as the size of compared vectors element.
>>>>>
>>>>> The result type of the comparison is determined by the C frontend,
>>>>> it isn't under control of the user.  What you are implying here is
>>>>> restrictions on vector assignments, which are documented elsewhere.
>>>>> I'd just say
>>>>>
>>>>> 'The result of the comparison is a vector of the same width and number
>>>>> of elements as the comparison operands with a signed integral element
>>>>> type.'
>>>>>
>>>>> +In addition to the vector comparison C supports conditional expressions
>>>>>
>>>>> See above.
>>>>>
>>>>> +For the convenience condition in the vector conditional can be just a
>>>>> +vector of signed integer type.
>>>>>
>>>>> 'of integer type.'  I don't see a reason to disallow unsigned integers,
>>>>> they can be equally well compared against zero.
>>>>
>>>> I'll have a final go on the documentation, it is untouched from the old patches.
>>>>
>>>>> Index: gcc/targhooks.h
>>>>> ===================================================================
>>>>> --- gcc/targhooks.h     (revision 177665)
>>>>> +++ gcc/targhooks.h     (working copy)
>>>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>>>>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>>>>
>>>>>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>>>>> +
>>>>>  extern bool
>>>>>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>>>>>                                             const_tree,
>>>>>
>>>>> spurious whitespace change.
>>>>
>>>> Yes, thanks.
>>>>
>>>>> Index: gcc/optabs.c
>>>>> ===================================================================
>>>>> --- gcc/optabs.c        (revision 177665)
>>>>> +++ gcc/optabs.c        (working copy)
>>>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>>>>> ...
>>>>> +  else
>>>>> +    {
>>>>> +      rtx rtx_op0;
>>>>> +      rtx vec;
>>>>> +
>>>>> +      rtx_op0 = expand_normal (op0);
>>>>> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>>>>> +      vec = CONST0_RTX (mode);
>>>>> +
>>>>> +      create_output_operand (&ops[0], target, mode);
>>>>> +      create_input_operand (&ops[1], rtx_op1, mode);
>>>>> +      create_input_operand (&ops[2], rtx_op2, mode);
>>>>> +      create_input_operand (&ops[3], comparison, mode);
>>>>> +      create_input_operand (&ops[4], rtx_op0, mode);
>>>>> +      create_input_operand (&ops[5], vec, mode);
>>>>>
>>>>> this still builds the fake(?) != comparison, but as you said you need help
>>>>> with the .md part if we want to use a machine specific pattern for this
>>>>> case (which we eventually want, for the sake of using XOP vcond).
>>>>
>>>> Yes, I am waiting for it. This is the only way at the moment to make
>>>> sure that in
>>>> m = a > b;
>>>> r = m ? c : d;
>>>>
>>>> m in the vcond is not transformed into the m != 0.
>>>>
>>>>> Index: gcc/target.h
>>>>> ===================================================================
>>>>> --- gcc/target.h        (revision 177665)
>>>>> +++ gcc/target.h        (working copy)
>>>>> @@ -51,6 +51,7 @@
>>>>>  #define GCC_TARGET_H
>>>>>
>>>>>  #include "insn-modes.h"
>>>>> +#include "gimple.h"
>>>>>
>>>>>  #ifdef ENABLE_CHECKING
>>>>>
>>>>> spurious change.
>>>>
>>>> Old stuff, fixed.
>>>>
>>>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>>>>      floating-point, we can only do some of these simplifications.)  */
>>>>>   if (operand_equal_p (arg0, arg1, 0))
>>>>>     {
>>>>> +      tree arg0_type = TREE_TYPE (arg0);
>>>>> +
>>>>>       switch (code)
>>>>>        {
>>>>>        case EQ_EXPR:
>>>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>>>> +         if (! FLOAT_TYPE_P (arg0_type)
>>>>> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>>>> ...
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>> Likewise.
>>>>>
>>>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>>>     case UNGE_EXPR:
>>>>>     case UNEQ_EXPR:
>>>>>     case LTGT_EXPR:
>>>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>>>> +       {
>>>>> +         enum tree_code code = ops->code;
>>>>> +         tree arg0 = ops->op0;
>>>>> +         tree arg1 = ops->op1;
>>>>>
>>>>> move this code to do_store_flag (we really store a flag value).  It should
>>>>> also simply do what expand_vec_cond_expr does, probably simply
>>>>> calling that with the {-1,...} {0,...} extra args should work.
>>>>
>>>> I started to do that, but the code in do_store_flag is completely
>>>> different from what I am doing, and it looks confusing. I just call
>>>> expand_vec_cond_expr and that is it. I can write a separate function,
>>>> but the code is quite small.
>>>
>>> Hm?  I see in your patch
>>>
>>> Index: gcc/expr.c
>>> ===================================================================
>>> --- gcc/expr.c  (revision 177665)
>>> +++ gcc/expr.c  (working copy)
>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>     case UNGE_EXPR:
>>>     case UNEQ_EXPR:
>>>     case LTGT_EXPR:
>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>> +       {
>>> +         enum tree_code code = ops->code;
>>> +         tree arg0 = ops->op0;
>>> +         tree arg1 = ops->op1;
>>> +         tree arg_type = TREE_TYPE (arg0);
>>> +         tree el_type = TREE_TYPE (arg_type);
>>> +         tree t, ifexp, if_true, if_false;
>>> +
>>> +         el_type = lang_hooks.types.type_for_size (TYPE_PRECISION
>>> (el_type), 0);
>>> +
>>> +
>>> +         ifexp = build2 (code, type, arg0, arg1);
>>> +         if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
>>> +         if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
>>> +
>>> +         if (arg_type != type)
>>> +           {
>>> +             if_true = convert (arg_type, if_true);
>>> +             if_false = convert (arg_type, if_false);
>>> +             t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
>>> +             t = convert (type, t);
>>> +           }
>>> +         else
>>> +           t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
>>> +
>>> +         return expand_expr (t,
>>> +                             modifier != EXPAND_STACK_PARM ? target :
>>> NULL_RTX,
>>> +                             tmode != VOIDmode ? tmode : mode,
>>> +                             modifier);
>>> +       }
>>>
>>> that's not exactly "calling expand_vec_cond_expr".
>>
>> Well, actually it is. Keep in mind that clean backend would imply
>> removing the conversions. But I'll make a function.
>
> Why does
>
>  return expand_vec_cond_expr (build2 (ops->code, type, ops->op0, ops->op1),
>                                               build_vector_from_val
> (type, build_int_cst (el_type, -1)),
>                                               build_vector_from_val
> (type, build_int_cst (el_type, 0)));
>
> not work?  If you push the conversions to expand_vec_cond_expr
> by doing them on RTL you simplify things here and remove the requirement
> from doing them in the C frontend for VEC_COND_EXPR as well.

It does not work because vcond <a > b, c, d> requires a,b,c,d to have
the same type. Now here we are dealing only with comparisons, so in
case of floats we have vcond < f0 > f1, {-1,...}, {0,...}> which we
have to transform into
(vsi)(vcond< f0 >f1, (vsf){-1,...}, (vsf){0,...}>).

Ok, so is it ok to do make this conversion here for the real types?

>>>>>
>>>>> As for the still required conversions, you should be able to delay those
>>>>> from the C frontend (and here) to expand_vec_cond_expr by, after
>>>>> expanding op1 and op2, wrapping a subreg around it with a proper mode
>>>>> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work),
>>>>> and then convert the result back to the original mode.
>>>>>
>>>>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>>>>
>>>> Conversions are there until we fix the backend. When backend will be
>>>> able to digest f0 > f1 ? int0 : int1, all the conversions will go
>>>> away.
>>>>
>>>>> +static tree
>>>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>>>>
>>>>> is missing a function comment.
>>>>
>>>> fixed.
>>>>
>>>>> +static tree
>>>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>>>>> +         tree bitpos, tree bitsize, enum tree_code code)
>>>>> +{
>>>>> +  tree cond;
>>>>> +  tree comp_type;
>>>>> +
>>>>> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>>>>> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>>>>> +
>>>>> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>>>>> +
>>>>>
>>>>> Use
>>>>>
>>>>>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>>>>
>>>>> instead.  But I think you don't want to use TYPE_PRECISION on
>>>>> FP types.  Instead you want a signed integer type of the same (mode)
>>>>> size as the vector element type, thus
>>>>>
>>>>>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE
>>>>> (TYPE_MODE (inner_type)), 0);
>>>>
>>>> Hm, I thought that at this stage we don't wan to know anything about
>>>> modes. I mean here I am really building the same integer type as the
>>>> operands of the comparison have. But I can use MODE_BITSIZE as well, I
>>>> don't think that it could happen that the size of the mode is
>>>> different from the size of the type. Or could it?
>>>
>>> The comparison could be on floating-point types where TYPE_PRECISION
>>> can be, for example, 80 for x87 doubles.  You want an integer type
>>> of the same width, so yes, GET_MODE_BITSIZE is the correct thing
>>> to use here.
>>
>> Ok.
>>
>>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>>
>>>>> the result type of a comparison is boolean_type_node, not comp_type.
>>>>>
>>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>>>>> +                    build_int_cst (comp_type, -1),
>>>>> +                    build_int_cst (comp_type, 0));
>>>>>
>>>>> writing this as
>>>>>
>>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>>>>>                     fold_build2 (code, boolean_type_node, a, b),
>>>>> +                    build_int_cst (comp_type, -1),
>>>>> +                    build_int_cst (comp_type, 0));
>>>>>
>>>>> will get the gimplifier a better chance at simplifcation.
>>>>>
>>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>>
>>>>> I think we are expecting the scalar type and the vector mode here
>>>>> from looking at the single existing caller.  It probably doesn't make
>>>>> a difference (we only check TYPE_UNSIGNED of it, which should
>>>>> also work for vector types), but let's be consistent.  Thus,
>>>>
>>>> Ok.
>>>>
>>>>>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>>>>
>>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>> +    t = expand_vector_piecewise (gsi, do_compare, type,
>>>>> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>>>>> +  else
>>>>> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>>>>>
>>>>> the else case looks odd.  Why re-build a stmt that already exists?
>>>>> Simply return NULL_TREE instead?
>>>>
>>>> I can adjust. The reason it is written that way is that
>>>> expand_vector_operations_1 is using the result of the function to
>>>> update rhs.
>>>
>>> Ok, so it should check whether there was any lowering done then.
>>>
>>>>> +static tree
>>>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>>>>> +{
>>>>> ...
>>>>> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
>>>>> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>>>>> +                                   TYPE_MODE (TREE_TYPE (cond))))
>>>>> +       {
>>>>> ...
>>>>> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
>>>>> +                                            TREE_TYPE (cond),
>>>>> +                                            TREE_TYPE (TREE_TYPE (op1)),
>>>>> +                                            op0, op1, TREE_CODE (cond));
>>>>>
>>>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>>>>
>>>>> tem = { a[0] < b[0] ? -1 : 0, ... }
>>>>> v0 & tem | v1 & ~tem;
>>>>>
>>>>> instead of
>>>>>
>>>>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>>>>
>>>>> even if the bitwise operations could be carried out using vectors.
>>>>> It's definitely beneficial to do the first if the CPU can create the
>>>>> bitmask.
>>>>>
>>>>
>>>> o_O
>>>>
>>>> I thought you always wanted to do (m & v0) | (~m & v1).
>>>> Do you want to have two cases of the expansion then -- when we have
>>>> mask available and when we don't? But it is really unlikely that we
>>>> can get the mask, but cannot get vcond. Because condition is actually
>>>> vcond. So once again -- do we always expand to {a[0] > b[0]  ? v[0] :
>>>> c[0], ...}?
>>>
>>> Hm, yeah.  I suppose with the current setup it's hard to only
>>> get the mask but not the full vcond ;)  So it probably makes
>>> sense to always expand to {a[0] > b[0]  ? v[0] :c[0],...} as
>>> fallback.  Sorry for the confusion ;)
>>
>> Ok.
>>
>>>>> +  /* Run vecower on the expresisons we have introduced.  */
>>>>> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>>>>> +    expand_vector_operations_1 (&gsi_tmp);
>>>>>
>>>>> do not use gsi.ptr directly, use gsi_stmt (gsi_tm) != gsi_stmt (gsi)
>>>>>
>>>>> +static bool
>>>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>>>>> +{
>>>>>
>>>>> This function is lacking a comment.
>>>>>
>>>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>>>>> ...
>>>>> +      /* Try to get rid from the useless vector comparison
>>>>> +        x != {0,0,...} which is inserted by the typechecker.  */
>>>>> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>>>>
>>>>> how and why?  You simply drop that comparison - that doesn't look
>>>>> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
>>>>> comparison - that wouldn't be valid gimple.  Please leave this
>>>>> optimization to SSA based forward propagation (I can help you here
>>>>> once the patch is in).
>>>>
>>>> No-no-no. This is the second part of avoiding
>>>> m = a > b;
>>>> r = m ? v0 : v1;
>>>>
>>>> to prevent m expansion to m != {0}.
>>>>
>>>> I do not _simply_ drop the comparison. I drop it only if
>>>> is_vector_comparison returned true. It means that we can never get
>>>> into the situation that we are dropping actually a comparison inserted
>>>> by the user. But what I really want to achieve here is to drop the
>>>> comparison that the frontend inserts every time when it sees an
>>>> expression there.
>>>>
>>>> As I said earlier, tree forward propagation kicks only using -On, and
>>>> I would really like to make sure that I can get rid of useless != {0}
>>>> at any level.
>>
>>> Please don't.  If the language extension forces a != 0 then it should
>>> appear at -O0.  The code is fishy anyway in the way it walks stmts
>>> in is_vector_comparison.  At least I don't like to see this optimization
>>> done here for the sake of -O0 in this initial patch - you could try
>>> arguing about it as a followup improvement (well, probably with not
>>> much luck).  -O0 is about compile-speed and debugging, doing
>>> data-flow by walking stmts backward is slow.
>>
>> Ok, then I seriously don't see any motivation to support the
>> VEC_COND_EXPR. The following code:
>>
>> m = a > b;
>> r = (m & v0) | (~m & v1)
>>
>> gives me much more flexibility and  control. What the VEC_COND_EXPR is
>> good for? Syntactical sugar?
>>
>> How about throwing away all the VEC_COND_EXPR parts supporting only
>> conditions (implicitly expressed using vconds)? If we would agree on
>> implicit conversions for real types, then this is a functionality that
>> perfectly satisfies my needs.
>>
>> I don't see any interest from the backend people and I cannot wait
>> forever, so why don't we start with a simple thing?
>
> But the simple thing is already what the backend supports.
>
> Richard.
>

Well, it is not "what" it is "how" -- that is what we are discussing
for three weeks already.

Ok, so the question now is, whether it is fine to have conversions
inside expand_expr_real_2? If we agree that it is ok to do, then I can
adjust the patch.


Artem.

Patch

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 177665)
+++ gcc/expr.c  (working copy)
@@ -8440,6 +8440,37 @@  expand_expr_real_2 (sepops ops, rtx targ
     case UNGE_EXPR:
     case UNEQ_EXPR:
     case LTGT_EXPR:
+      if (TREE_CODE (ops->type) == VECTOR_TYPE)
+       {
+         enum tree_code code = ops->code;
+         tree arg0 = ops->op0;
+         tree arg1 = ops->op1;
+         tree arg_type = TREE_TYPE (arg0);
+         tree el_type = TREE_TYPE (arg_type);
+         tree t, ifexp, if_true, if_false;
+
+         el_type = lang_hooks.types.type_for_size (TYPE_PRECISION
(el_type), 0);
+
+
+         ifexp = build2 (code, type, arg0, arg1);
+         if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
+         if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
+
+         if (arg_type != type)
+           {
+             if_true = convert (arg_type, if_true);
+             if_false = convert (arg_type, if_false);
+             t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
+             t = convert (type, t);
+           }
+         else
+           t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
+
+         return expand_expr (t,
+                             modifier != EXPAND_STACK_PARM ? target :
NULL_RTX,
+                             tmode != VOIDmode ? tmode : mode,
+                             modifier);
+       }