diff mbox

Vector Comparison patch

Message ID CAFiYyc3ZEka=HNO63yzkP0wdHLTLbchHpFDAur2SoFGGCHjOEQ@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Aug. 15, 2011, 2:24 p.m. UTC
On Fri, Aug 12, 2011 at 4:03 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Hi
>
> Here is a completed version of the vector comparison patch we
> discussed a long time ago here:
> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html
>
> The patch implements vector comparison according to the OpenCL
> standard, when the result of the comparison of two vectors is vector
> of signed integers, where -1 represents true and 0 false.
>
> The patch implements vector conditional res = VCOND<V1 ? V2 : V3>
> which is expanded into:
> foreach (i in length (V1)) res[i] = V1 == 0 ? V3[i] : V2[i].

Some comments on the patch below.  First, in general I don't see
why you need a new target hook to specify whether to "vectorize"
a comparison.  Why are the existing hooks used by the vectorizer
not enough?

+{
+  enum ix86_builtins fcode;

is there a reason we need this and cannot simply provide expanders
for the named patterns?  We'd need to give them semantics of
producing all-ones / all-zero masks of course.  Richard, do you think
that's sensible?  That way we'd avoid the new target hook and could
simply do optab queries.

Thanks,
Richard.

> ChangeLog
>
> 2011-08-12 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>
>       gcc/
>       * targhooks.c (default_builtin_vec_compare): New hook.
>       * targhooks.h (default_builtin_vec_compare): New definition.
>       * target.def (builtin_vec_compare): New hook.
>       * target.h: New include (gimple.h).
>       * fold-const.c
>       (fold_comparison): Adjust x <cmp> x vector operations.
>       * c-typeck.c (build_binary_op): Allow vector comparison.
>       (c_obj_common_truthvalue_conversion): Deny vector comparison
>       inside of if statement.
>       (build_conditional_expr): Adjust to build VEC_COND_EXPR.
>       * tree-vect-generic.c (do_compare): Helper function.
>       (expand_vector_comparison): Check if hardware comparison
>       is available, if not expand comparison piecewise.
>       (expand_vector_operation): Handle vector comparison
>       expressions separately.
>       (earlyexpand_vec_cond_expr): Expand vector comparison
>       piecewise.
>       * Makefile.in: New dependencies.
>       * tree-cfg.c (verify_gimple_comparison): Allow vector
>       comparison operations in gimple.
>       * c-parser.c (c_parser_conditional_expression): Adjust
>       to handle VEC_COND_EXPR.
>       * gimplify.c (gimplify_expr): Adjust to handle VEC_COND_EXPR.
>       * config/i386/i386.c (vector_fp_compare): Build hardware
>       specific code for floating point vector comparison.
>       (vector_int_compare): Build hardware specific code for
>       integer vector comparison.
>       (ix86_vectorize_builtin_vec_compare): Implementation of
>       builtin_vec_compare hook.
>
>       gcc/testsuite/
>       * gcc.c-torture/execute/vector-vcond-1.c: New test.
>       * gcc.c-torture/execute/vector-vcond-2.c: New test.
>       * gcc.c-torture/execute/vector-compare-1.c: New test.
>       * gcc.c-torture/execute/vector-compare-2.c: New test.
>       * gcc.dg/vector-compare-1.c: New test.
>       * gcc.dg/vector-compare-2.c: New test.
>
>       gcc/doc
>       * extend.texi: Adjust.
>       * tm.texi: Adjust.
>       * tm.texi.in: Adjust.
>
>
> bootstrapped and tested on x86_64_unknown-linux.
>
>
> Thanks,
> Artem Shinkarov.
>

Comments

Artem Shinkarov Aug. 15, 2011, 4:58 p.m. UTC | #1
On Mon, Aug 15, 2011 at 3:24 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Fri, Aug 12, 2011 at 4:03 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Hi
>>
>> Here is a completed version of the vector comparison patch we
>> discussed a long time ago here:
>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html
>>
>> The patch implements vector comparison according to the OpenCL
>> standard, when the result of the comparison of two vectors is vector
>> of signed integers, where -1 represents true and 0 false.
>>
>> The patch implements vector conditional res = VCOND<V1 ? V2 : V3>
>> which is expanded into:
>> foreach (i in length (V1)) res[i] = V1 == 0 ? V3[i] : V2[i].
>
> Some comments on the patch below.  First, in general I don't see
> why you need a new target hook to specify whether to "vectorize"
> a comparison.  Why are the existing hooks used by the vectorizer
> not enough?
>
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    (revision 177665)
> +++ gcc/fold-const.c    (working copy)
> @@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
>      floating-point, we can only do some of these simplifications.)  */
>   if (operand_equal_p (arg0, arg1, 0))
>     {
> -      switch (code)
> +      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
>        {
> -       case EQ_EXPR:
> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
> -           return constant_boolean_node (1, type);
>
> I think this change should go in a separate patch for improved
> constant folding.  It shouldn't be necessary for enabling vector compares, no?

Unfortunately no, this case must be covered here, otherwise x != x
condition fails.

>
> +      if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp))))
> +        {
> +          error_at (colon_loc, "vector comparison must be of signed "
> +                              "integer vector type");
> +          return error_mark_node;
> +        }
>
> why that?

Well, later on I rely on this fact. I mean OpenCL says that it should
return -1 in the sense that all bits set. I don't really know, I can
support unsigned masks as well, but wouldn't it just introduce a
source for possible errors. I mean that natural choice for true and
flase is 0 and 1, not 0 and -1. Anyway I don't have a strong opinion
there, and I could easily adjust it if we decide that we want it.

>
> +      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
> +          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
> +             != TYPE_VECTOR_SUBPARTS (type1))
> +        {
> +          error_at (colon_loc, "vectors of different length found in "
> +                               "vector comparison");
> +          return error_mark_node;
> +        }
>
> I miss verification that type1 and type2 are vector types, or is that done
> elsewhere?  I think type1 and type2 are already verified to be
> compatible (but you might double-check).  At least the above would be
> redundant with

Thanks, type1 and type2 both vectors comparison is missing, going to
be added in the new version of the patch.
>
> +      if (type1 != type2)
> +        {
> +          error_at (colon_loc, "vectors of different types involved in "
> +                               "vector comparison");
> +          return error_mark_node;
> +        }

You are right, what I meant here is TREE_TYPE (type1) != TREE_TYPE
(type2), because vector (4, int) have the same number of elements as
vector (4, float). This would be fixed in the new version.

>
> Joseph may have comments about the fully-fold stuff that follows.
>
> +      /* Currently the expansion of VEC_COND_EXPR does not allow
> +        expessions where the type of vectors you compare differs
> +        form the type of vectors you select from. For the time
> +        being we insert implicit conversions.  */
> +      if ((COMPARISON_CLASS_P (ifexp)
>
> Why only for comparison-class?
Not only, there is || involved:
(COMPARISON_CLASS_P (ifexp)  && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
|| TREE_TYPE (ifexp) != type1

So if this is a comparison class, we check the first operand, because
the result of the comparison fits, however the operands could not. In
case we have an expression of signed vector, we know that we would
transform it into exp != {0,0,...} in tree-vect-generic.c, but if the
types of operands do not match we convert them.

>
> +          && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
> +         || TREE_TYPE (ifexp) != type1)
> +       {
> +         tree comp_type = COMPARISON_CLASS_P (ifexp)
> +                          ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
> +                          : TREE_TYPE (ifexp);
> +         tree vcond;
> +
> +         op1 = convert (comp_type, op1);
> +         op2 = convert (comp_type, op2);
> +         vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
> +         vcond = convert (type1, vcond);
> +         return vcond;
> +       }
> +      else
> +       return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
>
> In the end we of course will try to fix the middle-end/backends to
> allow mixed types here as the current restriction doesn't really make sense.

Yes, that would be nice, but these conversions do not really affect
the code generation, so for the time being I think it is fine to have
them.

>
>     case EQ_EXPR:
>     case NE_EXPR:
> +      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
> +        {
> +          tree intt;
> +          if (TREE_TYPE (type0) != TREE_TYPE (type1))
> +            {
> +              error_at (location, "comparing vectors with different "
> +                                  "element types");
> +              return error_mark_node;
> +            }
> +
> +          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
> +            {
> +              error_at (location, "comparing vectors with different "
> +                                  "number of elements");
> +              return error_mark_node;
> +            }
>
> as above - compatibility should already be ensured, thus type0 == type1
> here?

Yeah, we know that they are both vector types, but that is about all
we know. Anyhow, all these errors are reachable. As an example see
vector-compare-1.c:
r4 = x > y;  /* { dg-error "comparing vectors with different element types" } */
r8 == r4; /* { dg-error "comparing vectors with different number of
elements"} */

>
> +/* Try a hardware hook for vector comparison or
> +   extract comparison piecewise.  */
> +static tree
> +expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
> +                          tree op1, enum tree_code code)
>
> comments should mention and describe all function arguments.

Ok, coming in the new version of the patch.

> +/* Expand vector condition EXP which should have the form
> +   VEC_COND_EXPR<cond, vec0, vec1> into the following
> +   vector:
> +     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
> +   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
> +static tree
> +earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp)
>
> that would be expand_vec_cond_expr_piecewise, no?

Adjusted.

>
> +  /* Ensure that we will be able to expand vector comparison
> +     in case it is not supported by the architecture.  */
> +  gcc_assert (COMPARISON_CLASS_P (cond));
>
> that looks dangerous to me - did you try
>
>  vec = v1 <= v2;
>  vec2 = vec ? v1 : v2;
>
> without optimization?

Sure, tests should cover this case.
I have this assertion there because only two cases are possible:
1) it is a comparison
2) function callee converted expr to expr != {0,0,...}
So we should be perfectly fine.

>
> +  /* Check if we need to expand vector condition inside of
> +     VEC_COND_EXPR.  */
> +  var = create_tmp_reg (TREE_TYPE (cond), "cond");
> +  new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
> +                                      TREE_OPERAND (cond, 0),
> +                                     TREE_OPERAND (cond, 1),
> +                                      TREE_CODE (cond));
>
> That unconditionally expands, so no need for "Check".

Ok.

>
> +  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
> +  v = VEC_alloc(constructor_elt, gc, nunits);
> +  for (i = 0; i < nunits;
> +       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
> +    {
> +      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
> +      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
> +      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
> +      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
> +                                    build_int_cst (inner_type ,0));
>
> I seriously doubt that when expanding this part piecewise expanding
> the mask first in either way is going to be beneficial.  Instead I would
> suggest to "inline" the comparison here.  Thus instead of

Well, the ting is that, if expand_vector_comparison, would insert
builtin there rather than expanding the code piecewise, I'll have to
do the comparison with 0 anyway, because true is expressed as -1
there.

Well, I would hope that in case we have:
c_0 = a_0 > b_0;
d_0 = c_0 != 0;

{d_0, d_1,...}

all the d_n should be constant-folded, or should I pull fold explicitly here?

1) I construct the mask
>
>  mask =
>         = { mask[0] != 0 ? ... }
>
> do
>
>          = { c0[0] < c1[0] ? ..., }
>
> or even expand the ? : using mask operations if we efficiently can
> create that mask.
>

I assume that if we cannot expand VEC_COND_EXPR, then masking the
elements is a problem for us. Otherwise VEC_COND_EXPE expansion has a
bug somewhere. Or I am wrong somewhere?

>
> +  /* Check if VEC_COND_EXPR is supported in hardware within the
> +     given types.  */
> +  if (code == VEC_COND_EXPR)
> +    {
> +      tree exp = gimple_assign_rhs1 (stmt);
> +      tree cond = TREE_OPERAND (exp, 0);
> +
> +      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
> +        change it to A != {0,0,...} ? V0 : V1  */
> +      if (!COMPARISON_CLASS_P (cond))
> +       TREE_OPERAND (exp, 0) =
> +           build2 (NE_EXPR, TREE_TYPE (cond), cond,
> +                   build_vector_from_val (TREE_TYPE (cond),
> +                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
>
> That looks inefficient as well.  Iff we know that the mask is always
> either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
> bitwise operations (see what the i?86 expander does, for example).

This is a requirement of VEC_COND_EXPR, I need to pass 4 parameters,
not 3, that is why I introduce this fake {0,0,..} here.

>
> @@ -471,6 +603,7 @@ expand_vector_operations_1 (gimple_stmt_
>
>   gcc_assert (code != CONVERT_EXPR);
>
> +
>   /* The signedness is determined from input argument.  */
>   if (code == VEC_UNPACK_FLOAT_HI_EXPR
>       || code == VEC_UNPACK_FLOAT_LO_EXPR)
>
> spurious whitespace change.

Fixed.
>
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      (revision 177665)
> +++ gcc/tree-cfg.c      (working copy)
> @@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
>       return true;
>     }
>
> +  if (TREE_CODE (op0_type) == VECTOR_TYPE
> +      && TREE_CODE (op1_type) == VECTOR_TYPE
> +      && TREE_CODE (type) == VECTOR_TYPE)
> +    {
>
> this should check TREE_CODE (type) == VECTOR_TYPE only
> and then verify the comparison operands are actually vectors.

Yes, you are right, adjusted.

>
> +      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
> +        {
> +          error ("invalid vector comparison, vector element type mismatch");
> +          debug_generic_expr (op0_type);
> +          debug_generic_expr (op1_type);
> +          return true;
> +        }
>
> this needs to use code similar to the scalar variant,
>
>          !useless_type_conversion_p (op0_type, op1_type)
>          && !useless_type_conversion_p (op1_type, op0_type)
>
> which also makes the first TYPE_VECTOR_SUBPARTS redundant.
>

Fixed.

> +      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
> +          && TYPE_PRECISION (TREE_TYPE (op0_type))
> +             != TYPE_PRECISION (TREE_TYPE (type)))
> +        {
> +          error ("invalid vector comparison resulting type");
> +          debug_generic_expr (type);
> +          return true;
> +        }
>
> I think you can drop the TYPE_PRECISION check.  We might want to
> assert that a vector element types precision always matches its
> mode precision (in make_vector_type).

I would leave it for a while. During the optimisation you can
construct some strange things, so I would better make verifier
resistant to the all kind of stuff.

>
> Index: gcc/c-parser.c
> ===================================================================
> --- gcc/c-parser.c      (revision 177665)
> +++ gcc/c-parser.c      (working copy)
> @@ -5337,8 +5337,17 @@ c_parser_conditional_expression (c_parse
>   if (c_parser_next_token_is (parser, CPP_COLON))
>     {
>       tree eptype = NULL_TREE;
> -
> +
>       middle_loc = c_parser_peek_token (parser)->location;
> +
> +      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
>
> watch out for whitespace changes - you add a trailing tab here.

Fixed.

>
> +/* Find target specific sequence for vector comparison of
> +   real-type vectors V0 and V1. Returns variable containing
> +   result of the comparison or NULL_TREE in other case.  */
> +static tree
> +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
> +                   enum machine_mode mode, tree v0, tree v1,
> +                   enum tree_code code)
> +{
> +  enum ix86_builtins fcode;
>
> is there a reason we need this and cannot simply provide expanders
> for the named patterns?  We'd need to give them semantics of
> producing all-ones / all-zero masks of course.  Richard, do you think
> that's sensible?  That way we'd avoid the new target hook and could
> simply do optab queries.

I think I don't really understand the idea. How we are going to
represent the fact that we need to convert a given node to the given
machine instruction? May be you could point where the similar
technique is already used.


The new patch will be tested and submitted here soon.


Thanks,
Artem.

>
> Thanks,
> Richard.
>
>> ChangeLog
>>
>> 2011-08-12 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>
>>       gcc/
>>       * targhooks.c (default_builtin_vec_compare): New hook.
>>       * targhooks.h (default_builtin_vec_compare): New definition.
>>       * target.def (builtin_vec_compare): New hook.
>>       * target.h: New include (gimple.h).
>>       * fold-const.c
>>       (fold_comparison): Adjust x <cmp> x vector operations.
>>       * c-typeck.c (build_binary_op): Allow vector comparison.
>>       (c_obj_common_truthvalue_conversion): Deny vector comparison
>>       inside of if statement.
>>       (build_conditional_expr): Adjust to build VEC_COND_EXPR.
>>       * tree-vect-generic.c (do_compare): Helper function.
>>       (expand_vector_comparison): Check if hardware comparison
>>       is available, if not expand comparison piecewise.
>>       (expand_vector_operation): Handle vector comparison
>>       expressions separately.
>>       (earlyexpand_vec_cond_expr): Expand vector comparison
>>       piecewise.
>>       * Makefile.in: New dependencies.
>>       * tree-cfg.c (verify_gimple_comparison): Allow vector
>>       comparison operations in gimple.
>>       * c-parser.c (c_parser_conditional_expression): Adjust
>>       to handle VEC_COND_EXPR.
>>       * gimplify.c (gimplify_expr): Adjust to handle VEC_COND_EXPR.
>>       * config/i386/i386.c (vector_fp_compare): Build hardware
>>       specific code for floating point vector comparison.
>>       (vector_int_compare): Build hardware specific code for
>>       integer vector comparison.
>>       (ix86_vectorize_builtin_vec_compare): Implementation of
>>       builtin_vec_compare hook.
>>
>>       gcc/testsuite/
>>       * gcc.c-torture/execute/vector-vcond-1.c: New test.
>>       * gcc.c-torture/execute/vector-vcond-2.c: New test.
>>       * gcc.c-torture/execute/vector-compare-1.c: New test.
>>       * gcc.c-torture/execute/vector-compare-2.c: New test.
>>       * gcc.dg/vector-compare-1.c: New test.
>>       * gcc.dg/vector-compare-2.c: New test.
>>
>>       gcc/doc
>>       * extend.texi: Adjust.
>>       * tm.texi: Adjust.
>>       * tm.texi.in: Adjust.
>>
>>
>> bootstrapped and tested on x86_64_unknown-linux.
>>
>>
>> Thanks,
>> Artem Shinkarov.
>>
>
Richard Biener Aug. 16, 2011, 3:28 p.m. UTC | #2
On Mon, Aug 15, 2011 at 6:58 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Mon, Aug 15, 2011 at 3:24 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Fri, Aug 12, 2011 at 4:03 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Hi
>>>
>>> Here is a completed version of the vector comparison patch we
>>> discussed a long time ago here:
>>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html
>>>
>>> The patch implements vector comparison according to the OpenCL
>>> standard, when the result of the comparison of two vectors is vector
>>> of signed integers, where -1 represents true and 0 false.
>>>
>>> The patch implements vector conditional res = VCOND<V1 ? V2 : V3>
>>> which is expanded into:
>>> foreach (i in length (V1)) res[i] = V1 == 0 ? V3[i] : V2[i].
>>
>> Some comments on the patch below.  First, in general I don't see
>> why you need a new target hook to specify whether to "vectorize"
>> a comparison.  Why are the existing hooks used by the vectorizer
>> not enough?
>>
>> Index: gcc/fold-const.c
>> ===================================================================
>> --- gcc/fold-const.c    (revision 177665)
>> +++ gcc/fold-const.c    (working copy)
>> @@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
>>      floating-point, we can only do some of these simplifications.)  */
>>   if (operand_equal_p (arg0, arg1, 0))
>>     {
>> -      switch (code)
>> +      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
>>        {
>> -       case EQ_EXPR:
>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>> -           return constant_boolean_node (1, type);
>>
>> I think this change should go in a separate patch for improved
>> constant folding.  It shouldn't be necessary for enabling vector compares, no?
>
> Unfortunately no, this case must be covered here, otherwise x != x
> condition fails.

How does it fail?

>>
>> +      if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp))))
>> +        {
>> +          error_at (colon_loc, "vector comparison must be of signed "
>> +                              "integer vector type");
>> +          return error_mark_node;
>> +        }
>>
>> why that?
>
> Well, later on I rely on this fact. I mean OpenCL says that it should
> return -1 in the sense that all bits set. I don't really know, I can
> support unsigned masks as well, but wouldn't it just introduce a
> source for possible errors. I mean that natural choice for true and
> flase is 0 and 1, not 0 and -1. Anyway I don't have a strong opinion
> there, and I could easily adjust it if we decide that we want it.

I think we want to allow both signed and unsigned masks.

>>
>> +      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
>> +          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
>> +             != TYPE_VECTOR_SUBPARTS (type1))
>> +        {
>> +          error_at (colon_loc, "vectors of different length found in "
>> +                               "vector comparison");
>> +          return error_mark_node;
>> +        }
>>
>> I miss verification that type1 and type2 are vector types, or is that done
>> elsewhere?  I think type1 and type2 are already verified to be
>> compatible (but you might double-check).  At least the above would be
>> redundant with
>
> Thanks, type1 and type2 both vectors comparison is missing, going to
> be added in the new version of the patch.
>>
>> +      if (type1 != type2)
>> +        {
>> +          error_at (colon_loc, "vectors of different types involved in "
>> +                               "vector comparison");
>> +          return error_mark_node;
>> +        }
>
> You are right, what I meant here is TREE_TYPE (type1) != TREE_TYPE
> (type2), because vector (4, int) have the same number of elements as
> vector (4, float). This would be fixed in the new version.
>
>>
>> Joseph may have comments about the fully-fold stuff that follows.
>>
>> +      /* Currently the expansion of VEC_COND_EXPR does not allow
>> +        expessions where the type of vectors you compare differs
>> +        form the type of vectors you select from. For the time
>> +        being we insert implicit conversions.  */
>> +      if ((COMPARISON_CLASS_P (ifexp)
>>
>> Why only for comparison-class?
> Not only, there is || involved:
> (COMPARISON_CLASS_P (ifexp)  && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
> || TREE_TYPE (ifexp) != type1
>
> So if this is a comparison class, we check the first operand, because
> the result of the comparison fits, however the operands could not. In
> case we have an expression of signed vector, we know that we would
> transform it into exp != {0,0,...} in tree-vect-generic.c, but if the
> types of operands do not match we convert them.

Hm, ok ... let's hope we can sort-out the backend issues before this
patch goes in so we can remove this converting stuff.

>>
>> +          && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
>> +         || TREE_TYPE (ifexp) != type1)
>> +       {
>> +         tree comp_type = COMPARISON_CLASS_P (ifexp)
>> +                          ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
>> +                          : TREE_TYPE (ifexp);
>> +         tree vcond;
>> +
>> +         op1 = convert (comp_type, op1);
>> +         op2 = convert (comp_type, op2);
>> +         vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
>> +         vcond = convert (type1, vcond);
>> +         return vcond;
>> +       }
>> +      else
>> +       return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
>>
>> In the end we of course will try to fix the middle-end/backends to
>> allow mixed types here as the current restriction doesn't really make sense.
>
> Yes, that would be nice, but these conversions do not really affect
> the code generation, so for the time being I think it is fine to have
> them.
>
>>
>>     case EQ_EXPR:
>>     case NE_EXPR:
>> +      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
>> +        {
>> +          tree intt;
>> +          if (TREE_TYPE (type0) != TREE_TYPE (type1))
>> +            {
>> +              error_at (location, "comparing vectors with different "
>> +                                  "element types");
>> +              return error_mark_node;
>> +            }
>> +
>> +          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
>> +            {
>> +              error_at (location, "comparing vectors with different "
>> +                                  "number of elements");
>> +              return error_mark_node;
>> +            }
>>
>> as above - compatibility should already be ensured, thus type0 == type1
>> here?
>
> Yeah, we know that they are both vector types, but that is about all
> we know. Anyhow, all these errors are reachable. As an example see
> vector-compare-1.c:
> r4 = x > y;  /* { dg-error "comparing vectors with different element types" } */
> r8 == r4; /* { dg-error "comparing vectors with different number of
> elements"} */

Ok, I see.

>>
>> +/* Try a hardware hook for vector comparison or
>> +   extract comparison piecewise.  */
>> +static tree
>> +expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
>> +                          tree op1, enum tree_code code)
>>
>> comments should mention and describe all function arguments.
>
> Ok, coming in the new version of the patch.
>
>> +/* Expand vector condition EXP which should have the form
>> +   VEC_COND_EXPR<cond, vec0, vec1> into the following
>> +   vector:
>> +     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
>> +   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
>> +static tree
>> +earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp)
>>
>> that would be expand_vec_cond_expr_piecewise, no?
>
> Adjusted.
>
>>
>> +  /* Ensure that we will be able to expand vector comparison
>> +     in case it is not supported by the architecture.  */
>> +  gcc_assert (COMPARISON_CLASS_P (cond));
>>
>> that looks dangerous to me - did you try
>>
>>  vec = v1 <= v2;
>>  vec2 = vec ? v1 : v2;
>>
>> without optimization?
>
> Sure, tests should cover this case.
> I have this assertion there because only two cases are possible:
> 1) it is a comparison
> 2) function callee converted expr to expr != {0,0,...}
> So we should be perfectly fine.
>
>>
>> +  /* Check if we need to expand vector condition inside of
>> +     VEC_COND_EXPR.  */
>> +  var = create_tmp_reg (TREE_TYPE (cond), "cond");
>> +  new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
>> +                                      TREE_OPERAND (cond, 0),
>> +                                     TREE_OPERAND (cond, 1),
>> +                                      TREE_CODE (cond));
>>
>> That unconditionally expands, so no need for "Check".
>
> Ok.
>
>>
>> +  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
>> +  v = VEC_alloc(constructor_elt, gc, nunits);
>> +  for (i = 0; i < nunits;
>> +       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
>> +    {
>> +      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
>> +      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
>> +      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
>> +      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
>> +                                    build_int_cst (inner_type ,0));
>>
>> I seriously doubt that when expanding this part piecewise expanding
>> the mask first in either way is going to be beneficial.  Instead I would
>> suggest to "inline" the comparison here.  Thus instead of
>
> Well, the ting is that, if expand_vector_comparison, would insert
> builtin there rather than expanding the code piecewise, I'll have to
> do the comparison with 0 anyway, because true is expressed as -1
> there.
>
> Well, I would hope that in case we have:
> c_0 = a_0 > b_0;
> d_0 = c_0 != 0;
>
> {d_0, d_1,...}
>
> all the d_n should be constant-folded, or should I pull fold explicitly here?
>
> 1) I construct the mask
>>
>>  mask =
>>         = { mask[0] != 0 ? ... }
>>
>> do
>>
>>          = { c0[0] < c1[0] ? ..., }
>>
>> or even expand the ? : using mask operations if we efficiently can
>> create that mask.
>>
>
> I assume that if we cannot expand VEC_COND_EXPR, then masking the
> elements is a problem for us. Otherwise VEC_COND_EXPE expansion has a
> bug somewhere. Or I am wrong somewhere?

I think we can always do bitwise operations, so if we can get at the
mask vector we are fine.

I was thinking about how the case of explicitly computing the value of
v1 < v2 into a vector vs. a condition inside a VEC_COND_EXPR should
be handled.  If the target produces a mask of condition codes for
a comparison then it might be able to efficiently expand a VEC_COND_EXPR.
It could as well generate a mask via expanding v1 < v2 ? -1 : 0 then.
A similar case is for AMD XOP which can expand mask ? v1 : v2
with a single instruction (so even without seeing a comparison).

Basically, if we can get at the mask we should use that to do the
vector selection in parallel via (v1 & mask) | (v2 & ~mask).

If we cannot even get at the mask then we can build the result
vector piecewise as { v1[0] < v2[0] ? v1[0] : v2[0], .... } etc.

>>
>> +  /* Check if VEC_COND_EXPR is supported in hardware within the
>> +     given types.  */
>> +  if (code == VEC_COND_EXPR)
>> +    {
>> +      tree exp = gimple_assign_rhs1 (stmt);
>> +      tree cond = TREE_OPERAND (exp, 0);
>> +
>> +      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
>> +        change it to A != {0,0,...} ? V0 : V1  */
>> +      if (!COMPARISON_CLASS_P (cond))
>> +       TREE_OPERAND (exp, 0) =
>> +           build2 (NE_EXPR, TREE_TYPE (cond), cond,
>> +                   build_vector_from_val (TREE_TYPE (cond),
>> +                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
>>
>> That looks inefficient as well.  Iff we know that the mask is always
>> either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
>> bitwise operations (see what the i?86 expander does, for example).
>
> This is a requirement of VEC_COND_EXPR, I need to pass 4 parameters,
> not 3, that is why I introduce this fake {0,0,..} here.

Sure, but if you look at expand_vec_cond_expr_p then you don't need
that, and this fake comparison should instead be produced by the
expander (or really avoided by maybe splitting up the named pattern
into two).

It's for sure not necessary for earlyexpand_vec_cond_expr (but instead
makes it less efficient - with just the mask it can do the bitwise
fallback easily).

>>
>> @@ -471,6 +603,7 @@ expand_vector_operations_1 (gimple_stmt_
>>
>>   gcc_assert (code != CONVERT_EXPR);
>>
>> +
>>   /* The signedness is determined from input argument.  */
>>   if (code == VEC_UNPACK_FLOAT_HI_EXPR
>>       || code == VEC_UNPACK_FLOAT_LO_EXPR)
>>
>> spurious whitespace change.
>
> Fixed.
>>
>> Index: gcc/tree-cfg.c
>> ===================================================================
>> --- gcc/tree-cfg.c      (revision 177665)
>> +++ gcc/tree-cfg.c      (working copy)
>> @@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
>>       return true;
>>     }
>>
>> +  if (TREE_CODE (op0_type) == VECTOR_TYPE
>> +      && TREE_CODE (op1_type) == VECTOR_TYPE
>> +      && TREE_CODE (type) == VECTOR_TYPE)
>> +    {
>>
>> this should check TREE_CODE (type) == VECTOR_TYPE only
>> and then verify the comparison operands are actually vectors.
>
> Yes, you are right, adjusted.
>
>>
>> +      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
>> +        {
>> +          error ("invalid vector comparison, vector element type mismatch");
>> +          debug_generic_expr (op0_type);
>> +          debug_generic_expr (op1_type);
>> +          return true;
>> +        }
>>
>> this needs to use code similar to the scalar variant,
>>
>>          !useless_type_conversion_p (op0_type, op1_type)
>>          && !useless_type_conversion_p (op1_type, op0_type)
>>
>> which also makes the first TYPE_VECTOR_SUBPARTS redundant.
>>
>
> Fixed.
>
>> +      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
>> +          && TYPE_PRECISION (TREE_TYPE (op0_type))
>> +             != TYPE_PRECISION (TREE_TYPE (type)))
>> +        {
>> +          error ("invalid vector comparison resulting type");
>> +          debug_generic_expr (type);
>> +          return true;
>> +        }
>>
>> I think you can drop the TYPE_PRECISION check.  We might want to
>> assert that a vector element types precision always matches its
>> mode precision (in make_vector_type).
>
> I would leave it for a while. During the optimisation you can
> construct some strange things, so I would better make verifier
> resistant to the all kind of stuff.

Ok.

>>
>> Index: gcc/c-parser.c
>> ===================================================================
>> --- gcc/c-parser.c      (revision 177665)
>> +++ gcc/c-parser.c      (working copy)
>> @@ -5337,8 +5337,17 @@ c_parser_conditional_expression (c_parse
>>   if (c_parser_next_token_is (parser, CPP_COLON))
>>     {
>>       tree eptype = NULL_TREE;
>> -
>> +
>>       middle_loc = c_parser_peek_token (parser)->location;
>> +
>> +      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
>>
>> watch out for whitespace changes - you add a trailing tab here.
>
> Fixed.
>
>>
>> +/* Find target specific sequence for vector comparison of
>> +   real-type vectors V0 and V1. Returns variable containing
>> +   result of the comparison or NULL_TREE in other case.  */
>> +static tree
>> +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
>> +                   enum machine_mode mode, tree v0, tree v1,
>> +                   enum tree_code code)
>> +{
>> +  enum ix86_builtins fcode;
>>
>> is there a reason we need this and cannot simply provide expanders
>> for the named patterns?  We'd need to give them semantics of
>> producing all-ones / all-zero masks of course.  Richard, do you think
>> that's sensible?  That way we'd avoid the new target hook and could
>> simply do optab queries.
>
> I think I don't really understand the idea. How we are going to
> represent the fact that we need to convert a given node to the given
> machine instruction? May be you could point where the similar
> technique is already used.

In all places we check optab_handler (op, mode) != CODE_FOR_nothing.
We have eq_optab for example, so optab_handler (eq_optab, V4SImode)
would get you the instruction sequence for a comparison of V4SImode
vectors.  That isn't yet properly defined what it should return.

Otherwise I'd say we should ask the target to expand
v1 < v2 as VEC_COND_EXPR (v1 < v2, -1, 0) instead.  That one could
as well special-case the -1 and 0 result vectors (and maybe it already
does).

Richard.
>
> The new patch will be tested and submitted here soon.
>
>
> Thanks,
> Artem.
>
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog
>>>
>>> 2011-08-12 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>>
>>>       gcc/
>>>       * targhooks.c (default_builtin_vec_compare): New hook.
>>>       * targhooks.h (default_builtin_vec_compare): New definition.
>>>       * target.def (builtin_vec_compare): New hook.
>>>       * target.h: New include (gimple.h).
>>>       * fold-const.c
>>>       (fold_comparison): Adjust x <cmp> x vector operations.
>>>       * c-typeck.c (build_binary_op): Allow vector comparison.
>>>       (c_obj_common_truthvalue_conversion): Deny vector comparison
>>>       inside of if statement.
>>>       (build_conditional_expr): Adjust to build VEC_COND_EXPR.
>>>       * tree-vect-generic.c (do_compare): Helper function.
>>>       (expand_vector_comparison): Check if hardware comparison
>>>       is available, if not expand comparison piecewise.
>>>       (expand_vector_operation): Handle vector comparison
>>>       expressions separately.
>>>       (earlyexpand_vec_cond_expr): Expand vector comparison
>>>       piecewise.
>>>       * Makefile.in: New dependencies.
>>>       * tree-cfg.c (verify_gimple_comparison): Allow vector
>>>       comparison operations in gimple.
>>>       * c-parser.c (c_parser_conditional_expression): Adjust
>>>       to handle VEC_COND_EXPR.
>>>       * gimplify.c (gimplify_expr): Adjust to handle VEC_COND_EXPR.
>>>       * config/i386/i386.c (vector_fp_compare): Build hardware
>>>       specific code for floating point vector comparison.
>>>       (vector_int_compare): Build hardware specific code for
>>>       integer vector comparison.
>>>       (ix86_vectorize_builtin_vec_compare): Implementation of
>>>       builtin_vec_compare hook.
>>>
>>>       gcc/testsuite/
>>>       * gcc.c-torture/execute/vector-vcond-1.c: New test.
>>>       * gcc.c-torture/execute/vector-vcond-2.c: New test.
>>>       * gcc.c-torture/execute/vector-compare-1.c: New test.
>>>       * gcc.c-torture/execute/vector-compare-2.c: New test.
>>>       * gcc.dg/vector-compare-1.c: New test.
>>>       * gcc.dg/vector-compare-2.c: New test.
>>>
>>>       gcc/doc
>>>       * extend.texi: Adjust.
>>>       * tm.texi: Adjust.
>>>       * tm.texi.in: Adjust.
>>>
>>>
>>> bootstrapped and tested on x86_64_unknown-linux.
>>>
>>>
>>> Thanks,
>>> Artem Shinkarov.
>>>
>>
>
Artem Shinkarov Aug. 16, 2011, 4:35 p.m. UTC | #3
On Tue, Aug 16, 2011 at 4:28 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 15, 2011 at 6:58 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Mon, Aug 15, 2011 at 3:24 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Aug 12, 2011 at 4:03 AM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> Hi
>>>>
>>>> Here is a completed version of the vector comparison patch we
>>>> discussed a long time ago here:
>>>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html
>>>>
>>>> The patch implements vector comparison according to the OpenCL
>>>> standard, when the result of the comparison of two vectors is vector
>>>> of signed integers, where -1 represents true and 0 false.
>>>>
>>>> The patch implements vector conditional res = VCOND<V1 ? V2 : V3>
>>>> which is expanded into:
>>>> foreach (i in length (V1)) res[i] = V1 == 0 ? V3[i] : V2[i].
>>>
>>> Some comments on the patch below.  First, in general I don't see
>>> why you need a new target hook to specify whether to "vectorize"
>>> a comparison.  Why are the existing hooks used by the vectorizer
>>> not enough?
>>>
>>> Index: gcc/fold-const.c
>>> ===================================================================
>>> --- gcc/fold-const.c    (revision 177665)
>>> +++ gcc/fold-const.c    (working copy)
>>> @@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
>>>      floating-point, we can only do some of these simplifications.)  */
>>>   if (operand_equal_p (arg0, arg1, 0))
>>>     {
>>> -      switch (code)
>>> +      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
>>>        {
>>> -       case EQ_EXPR:
>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>> -           return constant_boolean_node (1, type);
>>>
>>> I think this change should go in a separate patch for improved
>>> constant folding.  It shouldn't be necessary for enabling vector compares, no?
>>
>> Unfortunately no, this case must be covered here, otherwise x != x
>> condition fails.
>
> How does it fail?

When I have x > x, x == x, and so on, fold-const.c trigger
operand_equal_p (arg0, arg1, 0), which returns true, and then it calls
 constant_boolean_node (<val>, type). But the problem is that the
result of the comparison is a vector,  not a boolean. So we have an
assertion failure:
test.c: In function ‘foo’:
test.c:9:3: internal compiler error: in build_int_cst_wide, at tree.c:1222
Please submit a full bug report,
with preprocessed source if appropriate.

>
>>>
>>> +      if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp))))
>>> +        {
>>> +          error_at (colon_loc, "vector comparison must be of signed "
>>> +                              "integer vector type");
>>> +          return error_mark_node;
>>> +        }
>>>
>>> why that?
>>
>> Well, later on I rely on this fact. I mean OpenCL says that it should
>> return -1 in the sense that all bits set. I don't really know, I can
>> support unsigned masks as well, but wouldn't it just introduce a
>> source for possible errors. I mean that natural choice for true and
>> flase is 0 and 1, not 0 and -1. Anyway I don't have a strong opinion
>> there, and I could easily adjust it if we decide that we want it.
>
> I think we want to allow both signed and unsigned masks.

Ok, I'll adjust.

>
>>>
>>> +      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
>>> +          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
>>> +             != TYPE_VECTOR_SUBPARTS (type1))
>>> +        {
>>> +          error_at (colon_loc, "vectors of different length found in "
>>> +                               "vector comparison");
>>> +          return error_mark_node;
>>> +        }
>>>
>>> I miss verification that type1 and type2 are vector types, or is that done
>>> elsewhere?  I think type1 and type2 are already verified to be
>>> compatible (but you might double-check).  At least the above would be
>>> redundant with
>>
>> Thanks, type1 and type2 both vectors comparison is missing, going to
>> be added in the new version of the patch.
>>>
>>> +      if (type1 != type2)
>>> +        {
>>> +          error_at (colon_loc, "vectors of different types involved in "
>>> +                               "vector comparison");
>>> +          return error_mark_node;
>>> +        }
>>
>> You are right, what I meant here is TREE_TYPE (type1) != TREE_TYPE
>> (type2), because vector (4, int) have the same number of elements as
>> vector (4, float). This would be fixed in the new version.
>>
>>>
>>> Joseph may have comments about the fully-fold stuff that follows.
>>>
>>> +      /* Currently the expansion of VEC_COND_EXPR does not allow
>>> +        expessions where the type of vectors you compare differs
>>> +        form the type of vectors you select from. For the time
>>> +        being we insert implicit conversions.  */
>>> +      if ((COMPARISON_CLASS_P (ifexp)
>>>
>>> Why only for comparison-class?
>> Not only, there is || involved:
>> (COMPARISON_CLASS_P (ifexp)  && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
>> || TREE_TYPE (ifexp) != type1
>>
>> So if this is a comparison class, we check the first operand, because
>> the result of the comparison fits, however the operands could not. In
>> case we have an expression of signed vector, we know that we would
>> transform it into exp != {0,0,...} in tree-vect-generic.c, but if the
>> types of operands do not match we convert them.
>
> Hm, ok ... let's hope we can sort-out the backend issues before this
> patch goes in so we can remove this converting stuff.

Hm, I would hope that we could commit this patch even with this issue,
because my feeling is that this case would produce errors on all the
other architectures as well, as VEC_COND_EXPR is the feature heavily
used in auto-vectorizer. So it means that all the backends must be
fixed. And another argument, that this conversion is harmless.

So I really hope that someone could shed some light or help me with
this issue, but even if not I think that the current conversion is ok.
However, I don't have any architectures different from x86.

>
>>>
>>> +          && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
>>> +         || TREE_TYPE (ifexp) != type1)
>>> +       {
>>> +         tree comp_type = COMPARISON_CLASS_P (ifexp)
>>> +                          ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
>>> +                          : TREE_TYPE (ifexp);
>>> +         tree vcond;
>>> +
>>> +         op1 = convert (comp_type, op1);
>>> +         op2 = convert (comp_type, op2);
>>> +         vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
>>> +         vcond = convert (type1, vcond);
>>> +         return vcond;
>>> +       }
>>> +      else
>>> +       return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
>>>
>>> In the end we of course will try to fix the middle-end/backends to
>>> allow mixed types here as the current restriction doesn't really make sense.
>>
>> Yes, that would be nice, but these conversions do not really affect
>> the code generation, so for the time being I think it is fine to have
>> them.
>>
>>>
>>>     case EQ_EXPR:
>>>     case NE_EXPR:
>>> +      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
>>> +        {
>>> +          tree intt;
>>> +          if (TREE_TYPE (type0) != TREE_TYPE (type1))
>>> +            {
>>> +              error_at (location, "comparing vectors with different "
>>> +                                  "element types");
>>> +              return error_mark_node;
>>> +            }
>>> +
>>> +          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
>>> +            {
>>> +              error_at (location, "comparing vectors with different "
>>> +                                  "number of elements");
>>> +              return error_mark_node;
>>> +            }
>>>
>>> as above - compatibility should already be ensured, thus type0 == type1
>>> here?
>>
>> Yeah, we know that they are both vector types, but that is about all
>> we know. Anyhow, all these errors are reachable. As an example see
>> vector-compare-1.c:
>> r4 = x > y;  /* { dg-error "comparing vectors with different element types" } */
>> r8 == r4; /* { dg-error "comparing vectors with different number of
>> elements"} */
>
> Ok, I see.
>
>>>
>>> +/* Try a hardware hook for vector comparison or
>>> +   extract comparison piecewise.  */
>>> +static tree
>>> +expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
>>> +                          tree op1, enum tree_code code)
>>>
>>> comments should mention and describe all function arguments.
>>
>> Ok, coming in the new version of the patch.
>>
>>> +/* Expand vector condition EXP which should have the form
>>> +   VEC_COND_EXPR<cond, vec0, vec1> into the following
>>> +   vector:
>>> +     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
>>> +   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
>>> +static tree
>>> +earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp)
>>>
>>> that would be expand_vec_cond_expr_piecewise, no?
>>
>> Adjusted.
>>
>>>
>>> +  /* Ensure that we will be able to expand vector comparison
>>> +     in case it is not supported by the architecture.  */
>>> +  gcc_assert (COMPARISON_CLASS_P (cond));
>>>
>>> that looks dangerous to me - did you try
>>>
>>>  vec = v1 <= v2;
>>>  vec2 = vec ? v1 : v2;
>>>
>>> without optimization?
>>
>> Sure, tests should cover this case.
>> I have this assertion there because only two cases are possible:
>> 1) it is a comparison
>> 2) function callee converted expr to expr != {0,0,...}
>> So we should be perfectly fine.
>>
>>>
>>> +  /* Check if we need to expand vector condition inside of
>>> +     VEC_COND_EXPR.  */
>>> +  var = create_tmp_reg (TREE_TYPE (cond), "cond");
>>> +  new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
>>> +                                      TREE_OPERAND (cond, 0),
>>> +                                     TREE_OPERAND (cond, 1),
>>> +                                      TREE_CODE (cond));
>>>
>>> That unconditionally expands, so no need for "Check".
>>
>> Ok.
>>
>>>
>>> +  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
>>> +  v = VEC_alloc(constructor_elt, gc, nunits);
>>> +  for (i = 0; i < nunits;
>>> +       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
>>> +    {
>>> +      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
>>> +      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
>>> +      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
>>> +      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
>>> +                                    build_int_cst (inner_type ,0));
>>>
>>> I seriously doubt that when expanding this part piecewise expanding
>>> the mask first in either way is going to be beneficial.  Instead I would
>>> suggest to "inline" the comparison here.  Thus instead of
>>
>> Well, the ting is that, if expand_vector_comparison, would insert
>> builtin there rather than expanding the code piecewise, I'll have to
>> do the comparison with 0 anyway, because true is expressed as -1
>> there.
>>
>> Well, I would hope that in case we have:
>> c_0 = a_0 > b_0;
>> d_0 = c_0 != 0;
>>
>> {d_0, d_1,...}
>>
>> all the d_n should be constant-folded, or should I pull fold explicitly here?
>>
>> 1) I construct the mask
>>>
>>>  mask =
>>>         = { mask[0] != 0 ? ... }
>>>
>>> do
>>>
>>>          = { c0[0] < c1[0] ? ..., }
>>>
>>> or even expand the ? : using mask operations if we efficiently can
>>> create that mask.
>>>
>>
>> I assume that if we cannot expand VEC_COND_EXPR, then masking the
>> elements is a problem for us. Otherwise VEC_COND_EXPE expansion has a
>> bug somewhere. Or I am wrong somewhere?
>
> I think we can always do bitwise operations, so if we can get at the
> mask vector we are fine.
>
> I was thinking about how the case of explicitly computing the value of
> v1 < v2 into a vector vs. a condition inside a VEC_COND_EXPR should
> be handled.  If the target produces a mask of condition codes for
> a comparison then it might be able to efficiently expand a VEC_COND_EXPR.
> It could as well generate a mask via expanding v1 < v2 ? -1 : 0 then.
> A similar case is for AMD XOP which can expand mask ? v1 : v2
> with a single instruction (so even without seeing a comparison).
>
> Basically, if we can get at the mask we should use that to do the
> vector selection in parallel via (v1 & mask) | (v2 & ~mask).
>
> If we cannot even get at the mask then we can build the result
> vector piecewise as { v1[0] < v2[0] ? v1[0] : v2[0], .... } etc.

Ok, I am perfectly fine to construct (v1 & mask) | (v2 & ~mask), the
question is do  I need to check (v1 & mask) and (v2 & mask) or I can
just blindly insert it? The problem is that we have a single veclower
pass, so if I insert something that needs expansion, we would not have
the second chance to expand it again.

I'll adjust the patch.

>>>
>>> +  /* Check if VEC_COND_EXPR is supported in hardware within the
>>> +     given types.  */
>>> +  if (code == VEC_COND_EXPR)
>>> +    {
>>> +      tree exp = gimple_assign_rhs1 (stmt);
>>> +      tree cond = TREE_OPERAND (exp, 0);
>>> +
>>> +      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
>>> +        change it to A != {0,0,...} ? V0 : V1  */
>>> +      if (!COMPARISON_CLASS_P (cond))
>>> +       TREE_OPERAND (exp, 0) =
>>> +           build2 (NE_EXPR, TREE_TYPE (cond), cond,
>>> +                   build_vector_from_val (TREE_TYPE (cond),
>>> +                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
>>>
>>> That looks inefficient as well.  Iff we know that the mask is always
>>> either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
>>> bitwise operations (see what the i?86 expander does, for example).
>>
>> This is a requirement of VEC_COND_EXPR, I need to pass 4 parameters,
>> not 3, that is why I introduce this fake {0,0,..} here.
>
> Sure, but if you look at expand_vec_cond_expr_p then you don't need
> that, and this fake comparison should instead be produced by the
> expander (or really avoided by maybe splitting up the named pattern
> into two).
>
> It's for sure not necessary for earlyexpand_vec_cond_expr (but instead
> makes it less efficient - with just the mask it can do the bitwise
> fallback easily).

Richard, let me give you an example:
#define vector(elcount, type)  \
__attribute__((vector_size((elcount)*sizeof(type)))) type

int
foo (vector (4, int) i0, vector (4, int) i1, int x)
{
  i0 = i0 ? i1 : i0;
  return i0[x];
}

when we optimize i0 ? i1 : i0, expand_vec_cond_expr_p  happily accepts
that and says that it can expand this expression. Now after the
veclowering is done, expand_vec_cond_expr calls vector_compare_rtx
(op0, unsignedp, icode), which has an assertion:
gcc_assert (COMPARISON_CLASS_P (cond));
and of course it fails.

So someone needs to insert != {0,0...} expression. I do it in the
tree-vect-geneic, but it could be done in expand_vec_cond_expr. The
question is where?

I can agree with you that I don't need to put this mask in case I
expand vcond piecewise, I will adjust that, but actually it does not
make much of a difference, in case expansion will use (v0 & mask) |
(v1 & ~mask).

Am I wrong somewhere?
>
>>>
>>> @@ -471,6 +603,7 @@ expand_vector_operations_1 (gimple_stmt_
>>>
>>>   gcc_assert (code != CONVERT_EXPR);
>>>
>>> +
>>>   /* The signedness is determined from input argument.  */
>>>   if (code == VEC_UNPACK_FLOAT_HI_EXPR
>>>       || code == VEC_UNPACK_FLOAT_LO_EXPR)
>>>
>>> spurious whitespace change.
>>
>> Fixed.
>>>
>>> Index: gcc/tree-cfg.c
>>> ===================================================================
>>> --- gcc/tree-cfg.c      (revision 177665)
>>> +++ gcc/tree-cfg.c      (working copy)
>>> @@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
>>>       return true;
>>>     }
>>>
>>> +  if (TREE_CODE (op0_type) == VECTOR_TYPE
>>> +      && TREE_CODE (op1_type) == VECTOR_TYPE
>>> +      && TREE_CODE (type) == VECTOR_TYPE)
>>> +    {
>>>
>>> this should check TREE_CODE (type) == VECTOR_TYPE only
>>> and then verify the comparison operands are actually vectors.
>>
>> Yes, you are right, adjusted.
>>
>>>
>>> +      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
>>> +        {
>>> +          error ("invalid vector comparison, vector element type mismatch");
>>> +          debug_generic_expr (op0_type);
>>> +          debug_generic_expr (op1_type);
>>> +          return true;
>>> +        }
>>>
>>> this needs to use code similar to the scalar variant,
>>>
>>>          !useless_type_conversion_p (op0_type, op1_type)
>>>          && !useless_type_conversion_p (op1_type, op0_type)
>>>
>>> which also makes the first TYPE_VECTOR_SUBPARTS redundant.
>>>
>>
>> Fixed.
>>
>>> +      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
>>> +          && TYPE_PRECISION (TREE_TYPE (op0_type))
>>> +             != TYPE_PRECISION (TREE_TYPE (type)))
>>> +        {
>>> +          error ("invalid vector comparison resulting type");
>>> +          debug_generic_expr (type);
>>> +          return true;
>>> +        }
>>>
>>> I think you can drop the TYPE_PRECISION check.  We might want to
>>> assert that a vector element types precision always matches its
>>> mode precision (in make_vector_type).
>>
>> I would leave it for a while. During the optimisation you can
>> construct some strange things, so I would better make verifier
>> resistant to the all kind of stuff.
>
> Ok.
>
>>>
>>> Index: gcc/c-parser.c
>>> ===================================================================
>>> --- gcc/c-parser.c      (revision 177665)
>>> +++ gcc/c-parser.c      (working copy)
>>> @@ -5337,8 +5337,17 @@ c_parser_conditional_expression (c_parse
>>>   if (c_parser_next_token_is (parser, CPP_COLON))
>>>     {
>>>       tree eptype = NULL_TREE;
>>> -
>>> +
>>>       middle_loc = c_parser_peek_token (parser)->location;
>>> +
>>> +      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
>>>
>>> watch out for whitespace changes - you add a trailing tab here.
>>
>> Fixed.
>>
>>>
>>> +/* Find target specific sequence for vector comparison of
>>> +   real-type vectors V0 and V1. Returns variable containing
>>> +   result of the comparison or NULL_TREE in other case.  */
>>> +static tree
>>> +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
>>> +                   enum machine_mode mode, tree v0, tree v1,
>>> +                   enum tree_code code)
>>> +{
>>> +  enum ix86_builtins fcode;
>>>
>>> is there a reason we need this and cannot simply provide expanders
>>> for the named patterns?  We'd need to give them semantics of
>>> producing all-ones / all-zero masks of course.  Richard, do you think
>>> that's sensible?  That way we'd avoid the new target hook and could
>>> simply do optab queries.
>>
>> I think I don't really understand the idea. How we are going to
>> represent the fact that we need to convert a given node to the given
>> machine instruction? May be you could point where the similar
>> technique is already used.
>
> In all places we check optab_handler (op, mode) != CODE_FOR_nothing.
> We have eq_optab for example, so optab_handler (eq_optab, V4SImode)
> would get you the instruction sequence for a comparison of V4SImode
> vectors.  That isn't yet properly defined what it should return.
>
> Otherwise I'd say we should ask the target to expand
> v1 < v2 as VEC_COND_EXPR (v1 < v2, -1, 0) instead.  That one could
> as well special-case the -1 and 0 result vectors (and maybe it already
> does).

Ok, I can adjust the optab  checking for the mode, but I recall that
we introduced the hook exactly because optabs did not return anything
sensible. It was your idea :)

Also, I don't like the idea to expand any comparison  to VEC_COND_EXPR
(v1 < v2, -1, 0). Look at expand_vec_cond_expr, it would do the job
only if there is an instruction vcond in the architecture, it checks
for direct_optab_handler (vcond_optab, mode). But it is not
necessarily the case that using vcond is as efficient as using
comparison instructions. Also, we could run into the situation when
vcond is not supported, but comparison is, or can't we?

Anyhow, I would think that we want to keep vcond and comparison separately.


Artem.

>
> Richard.
>>
>> The new patch will be tested and submitted here soon.
>>
>>
>> Thanks,
>> Artem.
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> ChangeLog
>>>>
>>>> 2011-08-12 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>>>
>>>>       gcc/
>>>>       * targhooks.c (default_builtin_vec_compare): New hook.
>>>>       * targhooks.h (default_builtin_vec_compare): New definition.
>>>>       * target.def (builtin_vec_compare): New hook.
>>>>       * target.h: New include (gimple.h).
>>>>       * fold-const.c
>>>>       (fold_comparison): Adjust x <cmp> x vector operations.
>>>>       * c-typeck.c (build_binary_op): Allow vector comparison.
>>>>       (c_obj_common_truthvalue_conversion): Deny vector comparison
>>>>       inside of if statement.
>>>>       (build_conditional_expr): Adjust to build VEC_COND_EXPR.
>>>>       * tree-vect-generic.c (do_compare): Helper function.
>>>>       (expand_vector_comparison): Check if hardware comparison
>>>>       is available, if not expand comparison piecewise.
>>>>       (expand_vector_operation): Handle vector comparison
>>>>       expressions separately.
>>>>       (earlyexpand_vec_cond_expr): Expand vector comparison
>>>>       piecewise.
>>>>       * Makefile.in: New dependencies.
>>>>       * tree-cfg.c (verify_gimple_comparison): Allow vector
>>>>       comparison operations in gimple.
>>>>       * c-parser.c (c_parser_conditional_expression): Adjust
>>>>       to handle VEC_COND_EXPR.
>>>>       * gimplify.c (gimplify_expr): Adjust to handle VEC_COND_EXPR.
>>>>       * config/i386/i386.c (vector_fp_compare): Build hardware
>>>>       specific code for floating point vector comparison.
>>>>       (vector_int_compare): Build hardware specific code for
>>>>       integer vector comparison.
>>>>       (ix86_vectorize_builtin_vec_compare): Implementation of
>>>>       builtin_vec_compare hook.
>>>>
>>>>       gcc/testsuite/
>>>>       * gcc.c-torture/execute/vector-vcond-1.c: New test.
>>>>       * gcc.c-torture/execute/vector-vcond-2.c: New test.
>>>>       * gcc.c-torture/execute/vector-compare-1.c: New test.
>>>>       * gcc.c-torture/execute/vector-compare-2.c: New test.
>>>>       * gcc.dg/vector-compare-1.c: New test.
>>>>       * gcc.dg/vector-compare-2.c: New test.
>>>>
>>>>       gcc/doc
>>>>       * extend.texi: Adjust.
>>>>       * tm.texi: Adjust.
>>>>       * tm.texi.in: Adjust.
>>>>
>>>>
>>>> bootstrapped and tested on x86_64_unknown-linux.
>>>>
>>>>
>>>> Thanks,
>>>> Artem Shinkarov.
>>>>
>>>
>>
>
Richard Biener Aug. 17, 2011, 9:49 a.m. UTC | #4
On Tue, Aug 16, 2011 at 6:35 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 16, 2011 at 4:28 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>>>> Index: gcc/fold-const.c
>>>> ===================================================================
>>>> --- gcc/fold-const.c    (revision 177665)
>>>> +++ gcc/fold-const.c    (working copy)
>>>> @@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
>>>>      floating-point, we can only do some of these simplifications.)  */
>>>>   if (operand_equal_p (arg0, arg1, 0))
>>>>     {
>>>> -      switch (code)
>>>> +      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
>>>>        {
>>>> -       case EQ_EXPR:
>>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>>> -           return constant_boolean_node (1, type);
>>>>
>>>> I think this change should go in a separate patch for improved
>>>> constant folding.  It shouldn't be necessary for enabling vector compares, no?
>>>
>>> Unfortunately no, this case must be covered here, otherwise x != x
>>> condition fails.
>>
>> How does it fail?
>
> When I have x > x, x == x, and so on, fold-const.c trigger
> operand_equal_p (arg0, arg1, 0), which returns true, and then it calls
>  constant_boolean_node (<val>, type). But the problem is that the
> result of the comparison is a vector,  not a boolean. So we have an
> assertion failure:
> test.c: In function ‘foo’:
> test.c:9:3: internal compiler error: in build_int_cst_wide, at tree.c:1222
> Please submit a full bug report,
> with preprocessed source if appropriate.

Ok, so we have to either avoid folding it (which would be a shame), or
define how true / false look like for vector typed comparison results.

The documentation above the tree code defintions for comparisons in
tree.def needs updating then, with something like

  and the value is either the type used by the language for booleans
  or an integer vector type of the same size and with the same number
  of elements as the comparison operands.  True for a vector of
  comparison results has all bits set while false is equal to zero.

or some better wording.

Then changing constant_boolean_node to return a proper true/false
vector would be the fix for your problem.

>>>> +      /* Currently the expansion of VEC_COND_EXPR does not allow
>>>> +        expessions where the type of vectors you compare differs
>>>> +        form the type of vectors you select from. For the time
>>>> +        being we insert implicit conversions.  */
>>>> +      if ((COMPARISON_CLASS_P (ifexp)
>>>>
>>>> Why only for comparison-class?
>>> Not only, there is || involved:
>>> (COMPARISON_CLASS_P (ifexp)  && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
>>> || TREE_TYPE (ifexp) != type1
>>>
>>> So if this is a comparison class, we check the first operand, because
>>> the result of the comparison fits, however the operands could not. In
>>> case we have an expression of signed vector, we know that we would
>>> transform it into exp != {0,0,...} in tree-vect-generic.c, but if the
>>> types of operands do not match we convert them.
>>
>> Hm, ok ... let's hope we can sort-out the backend issues before this
>> patch goes in so we can remove this converting stuff.
>
> Hm, I would hope that we could commit this patch even with this issue,
> because my feeling is that this case would produce errors on all the
> other architectures as well, as VEC_COND_EXPR is the feature heavily
> used in auto-vectorizer. So it means that all the backends must be
> fixed. And another argument, that this conversion is harmless.

It shouldn't be hard to fix all the backends.  And if we don't do it now
it will never happen.  I would expect that the codegen part of the
backends doesn't need any adjustments, just the patterns that
match what is supported.

Uros, can you convert x86 as an example?  Thus, for

(define_expand "vcond<mode>"
  [(set (match_operand:VF 0 "register_operand" "")
        (if_then_else:VF
          (match_operator 3 ""
            [(match_operand:VF 4 "nonimmediate_operand" "")
             (match_operand:VF 5 "nonimmediate_operand" "")])
          (match_operand:VF 1 "general_operand" "")
          (match_operand:VF 2 "general_operand" "")))]
  "TARGET_SSE"
{
  bool ok = ix86_expand_fp_vcond (operands);
  gcc_assert (ok);

allow any vector mode of the same size (and same number of elements?)
for the vcond mode and operand 1 and 2?  Thus, only restrict the
embedded comparison to VF?

> So I really hope that someone could shed some light or help me with
> this issue, but even if not I think that the current conversion is ok.
> However, I don't have any architectures different from x86.
[...]
>>>>
>>>> +  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
>>>> +  v = VEC_alloc(constructor_elt, gc, nunits);
>>>> +  for (i = 0; i < nunits;
>>>> +       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
>>>> +    {
>>>> +      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
>>>> +      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
>>>> +      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
>>>> +      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
>>>> +                                    build_int_cst (inner_type ,0));
>>>>
>>>> I seriously doubt that when expanding this part piecewise expanding
>>>> the mask first in either way is going to be beneficial.  Instead I would
>>>> suggest to "inline" the comparison here.  Thus instead of
>>>
>>> Well, the ting is that, if expand_vector_comparison, would insert
>>> builtin there rather than expanding the code piecewise, I'll have to
>>> do the comparison with 0 anyway, because true is expressed as -1
>>> there.
>>>
>>> Well, I would hope that in case we have:
>>> c_0 = a_0 > b_0;
>>> d_0 = c_0 != 0;
>>>
>>> {d_0, d_1,...}
>>>
>>> all the d_n should be constant-folded, or should I pull fold explicitly here?
>>>
>>> 1) I construct the mask
>>>>
>>>>  mask =
>>>>         = { mask[0] != 0 ? ... }
>>>>
>>>> do
>>>>
>>>>          = { c0[0] < c1[0] ? ..., }
>>>>
>>>> or even expand the ? : using mask operations if we efficiently can
>>>> create that mask.
>>>>
>>>
>>> I assume that if we cannot expand VEC_COND_EXPR, then masking the
>>> elements is a problem for us. Otherwise VEC_COND_EXPE expansion has a
>>> bug somewhere. Or I am wrong somewhere?
>>
>> I think we can always do bitwise operations, so if we can get at the
>> mask vector we are fine.
>>
>> I was thinking about how the case of explicitly computing the value of
>> v1 < v2 into a vector vs. a condition inside a VEC_COND_EXPR should
>> be handled.  If the target produces a mask of condition codes for
>> a comparison then it might be able to efficiently expand a VEC_COND_EXPR.
>> It could as well generate a mask via expanding v1 < v2 ? -1 : 0 then.
>> A similar case is for AMD XOP which can expand mask ? v1 : v2
>> with a single instruction (so even without seeing a comparison).
>>
>> Basically, if we can get at the mask we should use that to do the
>> vector selection in parallel via (v1 & mask) | (v2 & ~mask).
>>
>> If we cannot even get at the mask then we can build the result
>> vector piecewise as { v1[0] < v2[0] ? v1[0] : v2[0], .... } etc.
>
> Ok, I am perfectly fine to construct (v1 & mask) | (v2 & ~mask), the
> question is do  I need to check (v1 & mask) and (v2 & mask) or I can
> just blindly insert it? The problem is that we have a single veclower
> pass, so if I insert something that needs expansion, we would not have
> the second chance to expand it again.

I think you need to re-lower them, thus, insert a stmt and then immediately
lower it.

> I'll adjust the patch.
>
>>>>
>>>> +  /* Check if VEC_COND_EXPR is supported in hardware within the
>>>> +     given types.  */
>>>> +  if (code == VEC_COND_EXPR)
>>>> +    {
>>>> +      tree exp = gimple_assign_rhs1 (stmt);
>>>> +      tree cond = TREE_OPERAND (exp, 0);
>>>> +
>>>> +      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
>>>> +        change it to A != {0,0,...} ? V0 : V1  */
>>>> +      if (!COMPARISON_CLASS_P (cond))
>>>> +       TREE_OPERAND (exp, 0) =
>>>> +           build2 (NE_EXPR, TREE_TYPE (cond), cond,
>>>> +                   build_vector_from_val (TREE_TYPE (cond),
>>>> +                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
>>>>
>>>> That looks inefficient as well.  Iff we know that the mask is always
>>>> either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
>>>> bitwise operations (see what the i?86 expander does, for example).
>>>
>>> This is a requirement of VEC_COND_EXPR, I need to pass 4 parameters,
>>> not 3, that is why I introduce this fake {0,0,..} here.
>>
>> Sure, but if you look at expand_vec_cond_expr_p then you don't need
>> that, and this fake comparison should instead be produced by the
>> expander (or really avoided by maybe splitting up the named pattern
>> into two).
>>
>> It's for sure not necessary for earlyexpand_vec_cond_expr (but instead
>> makes it less efficient - with just the mask it can do the bitwise
>> fallback easily).
>
> Richard, let me give you an example:
> #define vector(elcount, type)  \
> __attribute__((vector_size((elcount)*sizeof(type)))) type
>
> int
> foo (vector (4, int) i0, vector (4, int) i1, int x)
> {
>  i0 = i0 ? i1 : i0;
>  return i0[x];
> }
>
> when we optimize i0 ? i1 : i0, expand_vec_cond_expr_p  happily accepts
> that and says that it can expand this expression. Now after the
> veclowering is done, expand_vec_cond_expr calls vector_compare_rtx
> (op0, unsignedp, icode), which has an assertion:
> gcc_assert (COMPARISON_CLASS_P (cond));
> and of course it fails.

Yes, it's totally non-robust ;)  Specifically tailored for the
vectorizer so far.

> So someone needs to insert != {0,0...} expression. I do it in the
> tree-vect-geneic, but it could be done in expand_vec_cond_expr. The
> question is where?

For this obvious case in expand_vec_cond_expr.  It should handle
what expand_vec_cond_expr_p claims to handle ;)

> I can agree with you that I don't need to put this mask in case I
> expand vcond piecewise, I will adjust that, but actually it does not
> make much of a difference, in case expansion will use (v0 & mask) |
> (v1 & ~mask).
>
> Am I wrong somewhere?

Just in the place that should need fixing.

>>>> +/* Find target specific sequence for vector comparison of
>>>> +   real-type vectors V0 and V1. Returns variable containing
>>>> +   result of the comparison or NULL_TREE in other case.  */
>>>> +static tree
>>>> +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
>>>> +                   enum machine_mode mode, tree v0, tree v1,
>>>> +                   enum tree_code code)
>>>> +{
>>>> +  enum ix86_builtins fcode;
>>>>
>>>> is there a reason we need this and cannot simply provide expanders
>>>> for the named patterns?  We'd need to give them semantics of
>>>> producing all-ones / all-zero masks of course.  Richard, do you think
>>>> that's sensible?  That way we'd avoid the new target hook and could
>>>> simply do optab queries.
>>>
>>> I think I don't really understand the idea. How we are going to
>>> represent the fact that we need to convert a given node to the given
>>> machine instruction? May be you could point where the similar
>>> technique is already used.
>>
>> In all places we check optab_handler (op, mode) != CODE_FOR_nothing.
>> We have eq_optab for example, so optab_handler (eq_optab, V4SImode)
>> would get you the instruction sequence for a comparison of V4SImode
>> vectors.  That isn't yet properly defined what it should return.
>>
>> Otherwise I'd say we should ask the target to expand
>> v1 < v2 as VEC_COND_EXPR (v1 < v2, -1, 0) instead.  That one could
>> as well special-case the -1 and 0 result vectors (and maybe it already
>> does).
>
> Ok, I can adjust the optab  checking for the mode, but I recall that
> we introduced the hook exactly because optabs did not return anything
> sensible. It was your idea :)

Heh, I don't remember ...

Still it's probably easiest (for now) to handle

 vec = v1 < v2;

the same as we would handle

 vec = v1 < v2 ? {-1,...} : {0,...};

during lowering (and even for expanding).  I tried to convince the vectorizer
to create a vectorized stand-alone comparison but failed, so it's probably
un-tested territory anyway.

At least the above would reduce the patch size considerably.

> Also, I don't like the idea to expand any comparison  to VEC_COND_EXPR
> (v1 < v2, -1, 0). Look at expand_vec_cond_expr, it would do the job
> only if there is an instruction vcond in the architecture, it checks
> for direct_optab_handler (vcond_optab, mode). But it is not
> necessarily the case that using vcond is as efficient as using
> comparison instructions. Also, we could run into the situation when
> vcond is not supported, but comparison is, or can't we?

That's unlikely as the vcond pattern is required by the vectorizer and
all targets implement it if they can handle the comparison part.  The
rest is just emulated by the target using the bitwise operation trick.

> Anyhow, I would think that we want to keep vcond and comparison separately.

Sure, but I think as the vcond case is in place already we can optimize
the comparison case separately if needed.  I would expect that the
code generated with your patch for

v = v1 < v2;
v = v1 < v2 ? {-1,...} : {0,...};

should be the same?

Thanks,
Richard.
Uros Bizjak Aug. 20, 2011, 9:08 a.m. UTC | #5
On Wed, Aug 17, 2011 at 11:49 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:

>>> Hm, ok ... let's hope we can sort-out the backend issues before this
>>> patch goes in so we can remove this converting stuff.
>>
>> Hm, I would hope that we could commit this patch even with this issue,
>> because my feeling is that this case would produce errors on all the
>> other architectures as well, as VEC_COND_EXPR is the feature heavily
>> used in auto-vectorizer. So it means that all the backends must be
>> fixed. And another argument, that this conversion is harmless.
>
> It shouldn't be hard to fix all the backends.  And if we don't do it now
> it will never happen.  I would expect that the codegen part of the
> backends doesn't need any adjustments, just the patterns that
> match what is supported.
>
> Uros, can you convert x86 as an example?  Thus, for
>
> (define_expand "vcond<mode>"
>  [(set (match_operand:VF 0 "register_operand" "")
>        (if_then_else:VF
>          (match_operator 3 ""
>            [(match_operand:VF 4 "nonimmediate_operand" "")
>             (match_operand:VF 5 "nonimmediate_operand" "")])
>          (match_operand:VF 1 "general_operand" "")
>          (match_operand:VF 2 "general_operand" "")))]
>  "TARGET_SSE"
> {
>  bool ok = ix86_expand_fp_vcond (operands);
>  gcc_assert (ok);

> allow any vector mode of the same size (and same number of elements?)
> for the vcond mode and operand 1 and 2?  Thus, only restrict the
> embedded comparison to VF?

I am a bit late to this discussion, but I see no problem for the
backend to relax this restriction. I will look into it.

Uros.
diff mbox

Patch

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c    (revision 177665)
+++ gcc/fold-const.c    (working copy)
@@ -9073,34 +9073,61 @@  fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
-      switch (code)
+      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
        {
-       case EQ_EXPR:
-         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-           return constant_boolean_node (1, type);

I think this change should go in a separate patch for improved
constant folding.  It shouldn't be necessary for enabling vector compares, no?

+      if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp))))
+        {
+          error_at (colon_loc, "vector comparison must be of signed "
+                              "integer vector type");
+          return error_mark_node;
+        }

why that?

+      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
+          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
+             != TYPE_VECTOR_SUBPARTS (type1))
+        {
+          error_at (colon_loc, "vectors of different length found in "
+                               "vector comparison");
+          return error_mark_node;
+        }

I miss verification that type1 and type2 are vector types, or is that done
elsewhere?  I think type1 and type2 are already verified to be
compatible (but you might double-check).  At least the above would be
redundant with

+      if (type1 != type2)
+        {
+          error_at (colon_loc, "vectors of different types involved in "
+                               "vector comparison");
+          return error_mark_node;
+        }

Joseph may have comments about the fully-fold stuff that follows.

+      /* Currently the expansion of VEC_COND_EXPR does not allow
+        expessions where the type of vectors you compare differs
+        form the type of vectors you select from. For the time
+        being we insert implicit conversions.  */
+      if ((COMPARISON_CLASS_P (ifexp)

Why only for comparison-class?

+          && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
+         || TREE_TYPE (ifexp) != type1)
+       {
+         tree comp_type = COMPARISON_CLASS_P (ifexp)
+                          ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+                          : TREE_TYPE (ifexp);
+         tree vcond;
+
+         op1 = convert (comp_type, op1);
+         op2 = convert (comp_type, op2);
+         vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+         vcond = convert (type1, vcond);
+         return vcond;
+       }
+      else
+       return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);

In the end we of course will try to fix the middle-end/backends to
allow mixed types here as the current restriction doesn't really make sense.

     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }

as above - compatibility should already be ensured, thus type0 == type1
here?

+/* Try a hardware hook for vector comparison or
+   extract comparison piecewise.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)

comments should mention and describe all function arguments.

+/* Expand vector condition EXP which should have the form
+   VEC_COND_EXPR<cond, vec0, vec1> into the following
+   vector:
+     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
+   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
+static tree
+earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp)

that would be expand_vec_cond_expr_piecewise, no?

+  /* Ensure that we will be able to expand vector comparison
+     in case it is not supported by the architecture.  */
+  gcc_assert (COMPARISON_CLASS_P (cond));

that looks dangerous to me - did you try

 vec = v1 <= v2;
 vec2 = vec ? v1 : v2;

without optimization?

+  /* Check if we need to expand vector condition inside of
+     VEC_COND_EXPR.  */
+  var = create_tmp_reg (TREE_TYPE (cond), "cond");
+  new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
+                                      TREE_OPERAND (cond, 0),
+                                     TREE_OPERAND (cond, 1),
+                                      TREE_CODE (cond));

That unconditionally expands, so no need for "Check".

+  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
+  v = VEC_alloc(constructor_elt, gc, nunits);
+  for (i = 0; i < nunits;
+       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
+    {
+      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
+      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
+      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
+      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
+                                    build_int_cst (inner_type ,0));

I seriously doubt that when expanding this part piecewise expanding
the mask first in either way is going to be beneficial.  Instead I would
suggest to "inline" the comparison here.  Thus instead of

 mask =
         = { mask[0] != 0 ? ... }

do

          = { c0[0] < c1[0] ? ..., }

or even expand the ? : using mask operations if we efficiently can
create that mask.


+  /* Check if VEC_COND_EXPR is supported in hardware within the
+     given types.  */
+  if (code == VEC_COND_EXPR)
+    {
+      tree exp = gimple_assign_rhs1 (stmt);
+      tree cond = TREE_OPERAND (exp, 0);
+
+      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
+        change it to A != {0,0,...} ? V0 : V1  */
+      if (!COMPARISON_CLASS_P (cond))
+       TREE_OPERAND (exp, 0) =
+           build2 (NE_EXPR, TREE_TYPE (cond), cond,
+                   build_vector_from_val (TREE_TYPE (cond),
+                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));

That looks inefficient as well.  Iff we know that the mask is always
either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
bitwise operations (see what the i?86 expander does, for example).

@@ -471,6 +603,7 @@  expand_vector_operations_1 (gimple_stmt_

   gcc_assert (code != CONVERT_EXPR);

+
   /* The signedness is determined from input argument.  */
   if (code == VEC_UNPACK_FLOAT_HI_EXPR
       || code == VEC_UNPACK_FLOAT_LO_EXPR)

spurious whitespace change.

Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c      (revision 177665)
+++ gcc/tree-cfg.c      (working copy)
@@ -3191,6 +3191,38 @@  verify_gimple_comparison (tree type, tre
       return true;
     }

+  if (TREE_CODE (op0_type) == VECTOR_TYPE
+      && TREE_CODE (op1_type) == VECTOR_TYPE
+      && TREE_CODE (type) == VECTOR_TYPE)
+    {

this should check TREE_CODE (type) == VECTOR_TYPE only
and then verify the comparison operands are actually vectors.

+      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
+        {
+          error ("invalid vector comparison, vector element type mismatch");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }

this needs to use code similar to the scalar variant,

          !useless_type_conversion_p (op0_type, op1_type)
          && !useless_type_conversion_p (op1_type, op0_type)

which also makes the first TYPE_VECTOR_SUBPARTS redundant.

+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          && TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }

I think you can drop the TYPE_PRECISION check.  We might want to
assert that a vector element types precision always matches its
mode precision (in make_vector_type).

Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c      (revision 177665)
+++ gcc/c-parser.c      (working copy)
@@ -5337,8 +5337,17 @@  c_parser_conditional_expression (c_parse
   if (c_parser_next_token_is (parser, CPP_COLON))
     {
       tree eptype = NULL_TREE;
-
+
       middle_loc = c_parser_peek_token (parser)->location;
+
+      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)

watch out for whitespace changes - you add a trailing tab here.

+/* Find target specific sequence for vector comparison of
+   real-type vectors V0 and V1. Returns variable containing
+   result of the comparison or NULL_TREE in other case.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)