Patchwork Vector shuffling

login
register
mail settings
Submitter Richard Guenther
Date Aug. 30, 2011, 1:03 p.m.
Message ID <CAFiYyc075Bxczhy606-AGj_m414fCv6GHhWwuoA87+BA7e57mQ@mail.gmail.com>
Download mbox | patch
Permalink /patch/112282/
State New
Headers show

Comments

Richard Guenther - Aug. 30, 2011, 1:03 p.m.
On Tue, Aug 30, 2011 at 4:31 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Hi
>
> This is a patch for the explicit vector shuffling we have discussed a
> long time ago here:
> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01092.html
>
> The new patch introduces the new tree code, as we agreed, and expands
> this code by checking the vshuffle pattern in the backend.
>
> The patch at the moment lacks of some examples, but mainly it works
> fine for me. It would be nice if i386 gurus could look into the way I
> am doing the expansion.
>
> Middle-end parts seems to be more or less fine, they have not changed
> much from the previous time.

+@code{__builtin_shuffle (vec, mask)} and
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct

the latter would be __builtin_shuffle2.


+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0,
+			   tree v1, tree mask)
+{
+#define inner_type_size(vec) \
+  GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (vec))))

missing comment.  No #defines like this please, just initialize
two temporary variables.

+
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{

comment.

+vshuffle:
+  gcc_assert (v1 == v0);
+
+  icode = direct_optab_handler (vshuffle_optab, mode);

hmm, so we don't have a vshuffle2 optab but always go via the
builtin function, but only for constant masks there?  I wonder
if we should arrange for targets to only support a vshuffle
optab (thus, transition away from the builtin) and so
unconditionally have a vshuffle2 optab only (with possibly
equivalent v1 and v0?)

I suppose Richard might remember what he had in mind back
when we discussed this.

not as builtin but like __builtin_complex and we'd just deal with
VEC_SHUFFLE_EXPR throughout the middle-end, eventually
lowering it in tree-vect-generic.c.  So I didn't look at the lowering
code in detail because that would obviously change then.

Defering to Joseph for a decision here and to x86 maintainers for
the target specific bits.

Thanks,
Richard.

> ChangeLog:
> 2011-08-30 Artjoms Sinkarovs <artyom.shinkaroff@gmailc.com>
>
>        gcc/
>        * optabs.c (expand_vec_shuffle_expr_p): New function. Checks
>        if given expression can be expanded by the target.
>        (expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR
>        using target vector instructions.
>        * optabs.h: New optab vshuffle.
>        (expand_vec_shuffle_expr_p): New prototype.
>        (expand_vec_shuffle_expr): New prototype.
>        * genopinit.c: Adjust to support vecshuffle.
>        * builtins.def: New builtin __builtin_shuffle.
>        * c-typeck.c (build_function_call_vec): Typecheck
>        __builtin_shuffle, allowing only two or three arguments.
>        Change the type of builtin depending on the arguments.
>        (digest_init): Warn when constructor has less elements than
>        vector type.
>        * gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR.
>        * tree.def: New tree code VEC_SHUFFLE_EXPR.
>        * tree-vect-generic.c (vector_element): New function. Returns an
>        element of the vector at the given position.
>        (lower_builtin_shuffle): Change builtin_shuffle with VEC_SHUFLLE_EXPR
>        or expand an expression piecewise.
>        (expand_vector_operations_1): Adjusted.
>        (gate_expand_vector_operations_noop): New gate function.
>        * gimple.c (get_gimple_rhs_num_ops): Adjust.
>        * passes.c: Move veclower down.
>        * tree-pretty-print.c (dump_generic_node): Recognize
>        VEC_SHUFFLE_EXPR as valid expression.
>        * tree-ssa-operands: Adjust.
>
>        gcc/config/i386
>        * sse.md: (sseshuffint) New mode_attr. Correspondence between the
>        vector and the type of the mask when shuffling.
>        (vecshuffle<mode>): New expansion.
>        * i386-protos.h (ix86_expand_vshuffle): New prototype.
>        * i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb.
>        (ix86_vectorize_builtin_vec_perm_ok): Adjust.
>
>        gcc/doc
>        * extend.texi: Adjust.
>
>        gcc/testsuite
>        * gcc.c-torture/execute/vect-shuffle-2.c: New test.
>        * gcc.c-torture/execute/vect-shuffle-4.c: New test.
>        * gcc.c-torture/execute/vect-shuffle-1.c: New test.
>        * gcc.c-torture/execute/vect-shuffle-3.c: New test.
>
> bootstrapped on x86_64-unknown-linux-gnu. The AVX parts are not
> tested, because I don't have actual hardware. It works with -mavx, the
> assembler code looks fine to me. I'll test it on a real hardware in
> couple of days.
>
>
>
> Thanks,
> Artem Shinkarov.
>
Joseph S. Myers - Aug. 30, 2011, 4:53 p.m.
On Tue, 30 Aug 2011, Richard Guenther wrote:

> oh, hum - now I remember ;)  Eventually the C frontend should handle
> this not via the function call mechanism but similar to how Joseph
> added __builtin_complex support with
> 
> 2011-08-19  Joseph Myers  <joseph@codesourcery.com>
> 
>         * c-parser.c (c_parser_postfix_expression): Handle RID_BUILTIN_COMPLEX.
>         * doc/extend.texi (__builtin_complex): Document.
> 
> and then emit VEC_SHUFFLE_EXPRs directly from the frontend.  Joseph?

It's probably time to refactor the parsing code before adding yet another 
pseudo-builtin.  Considering just those all of whose operands are 
expressions (there are more where types are involved), we have 
__builtin_complex (two operands) and __builtin_choose_expr (three 
operands).  How about a helper that parses a parenthesized list of 
expressions (using c_parser_expr_list, disabling all folding and 
conversions), gives an error if the number of expressions is wrong, then 
returns an error status and the list?  Pass the keyword to this function 
and it can give a "wrong number of arguments" error that says which 
pseudo-builtin is involved, rather than less friendly parse errors - so 
these things would act a bit more like built-in functions while still 
being purely front-end syntax for GENERIC and GIMPLE operations.  Then 
c_parser_postfix_expression would only have the code that deals with 
semantics, without duplicating the generic code for parsing lists.
Artem Shinkarov - Aug. 30, 2011, 5:01 p.m.
On Tue, Aug 30, 2011 at 2:03 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 30, 2011 at 4:31 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Hi
>>
>> This is a patch for the explicit vector shuffling we have discussed a
>> long time ago here:
>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01092.html
>>
>> The new patch introduces the new tree code, as we agreed, and expands
>> this code by checking the vshuffle pattern in the backend.
>>
>> The patch at the moment lacks of some examples, but mainly it works
>> fine for me. It would be nice if i386 gurus could look into the way I
>> am doing the expansion.
>>
>> Middle-end parts seems to be more or less fine, they have not changed
>> much from the previous time.
>
> +@code{__builtin_shuffle (vec, mask)} and
> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>
> the latter would be __builtin_shuffle2.

Why??
That was the syntax we agreed on that elegantly handles both cases in one place.

> +bool
> +expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0,
> +                          tree v1, tree mask)
> +{
> +#define inner_type_size(vec) \
> +  GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (vec))))
>
> missing comment.  No #defines like this please, just initialize
> two temporary variables.
>
> +
> +rtx
> +expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
> +{
>
> comment.
>
> +vshuffle:
> +  gcc_assert (v1 == v0);
> +
> +  icode = direct_optab_handler (vshuffle_optab, mode);
>
> hmm, so we don't have a vshuffle2 optab but always go via the
> builtin function, but only for constant masks there?  I wonder
> if we should arrange for targets to only support a vshuffle
> optab (thus, transition away from the builtin) and so
> unconditionally have a vshuffle2 optab only (with possibly
> equivalent v1 and v0?)

I have only implemented the case with non-constant mask that supports
only one argument. I think that it would be enough for the first
version. Later we can introduce vshuffle2 pattern and reuse the code
that expands vshuffle at the moment.

> I suppose Richard might remember what he had in mind back
> when we discussed this.
>
> Index: gcc/c-typeck.c
> ===================================================================
> --- gcc/c-typeck.c      (revision 177758)
> +++ gcc/c-typeck.c      (working copy)
> @@ -2815,6 +2815,68 @@ build_function_call_vec (location_t loc,
>       && !check_builtin_function_arguments (fundecl, nargs, argarray))
>     return error_mark_node;
>
> +  /* Typecheck a builtin function which is declared with variable
> +     argument list.  */
> +  if (fundecl && DECL_BUILT_IN (fundecl)
> +      && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL)
>
> just add to check_builtin_function_arguments which is called right
> in front of your added code.
>
> +          /* Here we change the return type of the builtin function
> +             from int f(...) --> t f(...) where t is a type of the
> +             first argument.  */
> +          fundecl = copy_node (fundecl);
> +          TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg),
> +                                        TYPE_ARG_TYPES (TREE_TYPE (fundecl)));
> +          function = build_fold_addr_expr (fundecl);
>
> oh, hum - now I remember ;)  Eventually the C frontend should handle
> this not via the function call mechanism but similar to how Joseph
> added __builtin_complex support with
>
> 2011-08-19  Joseph Myers  <joseph@codesourcery.com>
>
>        * c-parser.c (c_parser_postfix_expression): Handle RID_BUILTIN_COMPLEX.
>        * doc/extend.texi (__builtin_complex): Document.
>
> and then emit VEC_SHUFFLE_EXPRs directly from the frontend.  Joseph?
>
>          FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
> -           if (!CONSTANT_CLASS_P (value))
> +         if (!CONSTANT_CLASS_P (value))
>
> watch out for spurious whitespace changes.
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c      (revision 177758)
> +++ gcc/gimplify.c      (working copy)
> @@ -7050,6 +7050,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>          break;
>
>        case BIT_FIELD_REF:
> +       case VEC_SHUFFLE_EXPR:
>
> I don't think that's quite the right place given the is_gimple_lvalue
> predicate on the first operand.  More like
>
>        case VEC_SHUFFLE_EXPR:
>           goto expr_3;
>
> +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
>
> typo, mask
>
> +   means
> +
> +   freach i in length (mask):
> +     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
> +*/
> +DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
>
> what is the (is there any?) constraint on the operand types, especially
> the mask type?
>
> Index: gcc/gimple.c
> ===================================================================
> --- gcc/gimple.c        (revision 177758)
> +++ gcc/gimple.c        (working copy)
> @@ -2623,6 +2623,7 @@ get_gimple_rhs_num_ops (enum tree_code c
>       || (SYM) == ADDR_EXPR                                                \
>       || (SYM) == WITH_SIZE_EXPR                                           \
>       || (SYM) == SSA_NAME                                                 \
> +      || (SYM) == VEC_SHUFFLE_EXPR                                         \
>       || (SYM) == VEC_COND_EXPR) ? GIMPLE_SINGLE_RHS                       \
>    : GIMPLE_INVALID_RHS),
>  #define END_OF_BASE_TREE_CODES (unsigned char) GIMPLE_INVALID_RHS,
>
> please make it GIMPLE_TERNARY_RHS instead.
>
> which requires adjustment at least here:
>
> Index: gcc/tree-ssa-operands.c
> ===================================================================
> --- gcc/tree-ssa-operands.c     (revision 177758)
> +++ gcc/tree-ssa-operands.c     (working copy)
> @@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
>
>     case COND_EXPR:
>     case VEC_COND_EXPR:
> +    case VEC_SHUFFLE_EXPR:
>
>
> I think it would be nicer if the builtin would be handled by the frontend
> not as builtin but like __builtin_complex and we'd just deal with
> VEC_SHUFFLE_EXPR throughout the middle-end, eventually
> lowering it in tree-vect-generic.c.  So I didn't look at the lowering
> code in detail because that would obviously change then.
>
> Defering to Joseph for a decision here and to x86 maintainers for
> the target specific bits.
>
> Thanks,
> Richard.

I'll go and see how the __builtin_complex are treated, and try to
adjust the patch.


Thanks,
Artem.

>> ChangeLog:
>> 2011-08-30 Artjoms Sinkarovs <artyom.shinkaroff@gmailc.com>
>>
>>        gcc/
>>        * optabs.c (expand_vec_shuffle_expr_p): New function. Checks
>>        if given expression can be expanded by the target.
>>        (expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR
>>        using target vector instructions.
>>        * optabs.h: New optab vshuffle.
>>        (expand_vec_shuffle_expr_p): New prototype.
>>        (expand_vec_shuffle_expr): New prototype.
>>        * genopinit.c: Adjust to support vecshuffle.
>>        * builtins.def: New builtin __builtin_shuffle.
>>        * c-typeck.c (build_function_call_vec): Typecheck
>>        __builtin_shuffle, allowing only two or three arguments.
>>        Change the type of builtin depending on the arguments.
>>        (digest_init): Warn when constructor has less elements than
>>        vector type.
>>        * gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR.
>>        * tree.def: New tree code VEC_SHUFFLE_EXPR.
>>        * tree-vect-generic.c (vector_element): New function. Returns an
>>        element of the vector at the given position.
>>        (lower_builtin_shuffle): Change builtin_shuffle with VEC_SHUFLLE_EXPR
>>        or expand an expression piecewise.
>>        (expand_vector_operations_1): Adjusted.
>>        (gate_expand_vector_operations_noop): New gate function.
>>        * gimple.c (get_gimple_rhs_num_ops): Adjust.
>>        * passes.c: Move veclower down.
>>        * tree-pretty-print.c (dump_generic_node): Recognize
>>        VEC_SHUFFLE_EXPR as valid expression.
>>        * tree-ssa-operands: Adjust.
>>
>>        gcc/config/i386
>>        * sse.md: (sseshuffint) New mode_attr. Correspondence between the
>>        vector and the type of the mask when shuffling.
>>        (vecshuffle<mode>): New expansion.
>>        * i386-protos.h (ix86_expand_vshuffle): New prototype.
>>        * i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb.
>>        (ix86_vectorize_builtin_vec_perm_ok): Adjust.
>>
>>        gcc/doc
>>        * extend.texi: Adjust.
>>
>>        gcc/testsuite
>>        * gcc.c-torture/execute/vect-shuffle-2.c: New test.
>>        * gcc.c-torture/execute/vect-shuffle-4.c: New test.
>>        * gcc.c-torture/execute/vect-shuffle-1.c: New test.
>>        * gcc.c-torture/execute/vect-shuffle-3.c: New test.
>>
>> bootstrapped on x86_64-unknown-linux-gnu. The AVX parts are not
>> tested, because I don't have actual hardware. It works with -mavx, the
>> assembler code looks fine to me. I'll test it on a real hardware in
>> couple of days.
>>
>>
>>
>> Thanks,
>> Artem Shinkarov.
>>
>
Chris Lattner - Aug. 30, 2011, 11:51 p.m.
On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>> The patch at the moment lacks of some examples, but mainly it works
>>> fine for me. It would be nice if i386 gurus could look into the way I
>>> am doing the expansion.
>>> 
>>> Middle-end parts seems to be more or less fine, they have not changed
>>> much from the previous time.
>> 
>> +@code{__builtin_shuffle (vec, mask)} and
>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>> 
>> the latter would be __builtin_shuffle2.
> 
> Why??
> That was the syntax we agreed on that elegantly handles both cases in one place.

If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector

It should be straight-forward to map it into the same IR.

-Chris
Richard Guenther - Aug. 31, 2011, 7:58 a.m.
On Tue, Aug 30, 2011 at 7:01 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 30, 2011 at 2:03 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, Aug 30, 2011 at 4:31 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Hi
>>>
>>> This is a patch for the explicit vector shuffling we have discussed a
>>> long time ago here:
>>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01092.html
>>>
>>> The new patch introduces the new tree code, as we agreed, and expands
>>> this code by checking the vshuffle pattern in the backend.
>>>
>>> The patch at the moment lacks of some examples, but mainly it works
>>> fine for me. It would be nice if i386 gurus could look into the way I
>>> am doing the expansion.
>>>
>>> Middle-end parts seems to be more or less fine, they have not changed
>>> much from the previous time.
>>
>> +@code{__builtin_shuffle (vec, mask)} and
>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>
>> the latter would be __builtin_shuffle2.
>
> Why??
> That was the syntax we agreed on that elegantly handles both cases in one place.

Ah, then there was a case below that mentions __builtin_shuffle2 that
needs adjusting then.

>> +bool
>> +expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0,
>> +                          tree v1, tree mask)
>> +{
>> +#define inner_type_size(vec) \
>> +  GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (vec))))
>>
>> missing comment.  No #defines like this please, just initialize
>> two temporary variables.
>>
>> +
>> +rtx
>> +expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
>> +{
>>
>> comment.
>>
>> +vshuffle:
>> +  gcc_assert (v1 == v0);
>> +
>> +  icode = direct_optab_handler (vshuffle_optab, mode);
>>
>> hmm, so we don't have a vshuffle2 optab but always go via the
>> builtin function, but only for constant masks there?  I wonder
>> if we should arrange for targets to only support a vshuffle
>> optab (thus, transition away from the builtin) and so
>> unconditionally have a vshuffle2 optab only (with possibly
>> equivalent v1 and v0?)
>
> I have only implemented the case with non-constant mask that supports
> only one argument. I think that it would be enough for the first
> version. Later we can introduce vshuffle2 pattern and reuse the code
> that expands vshuffle at the moment.

Ok.

>> I suppose Richard might remember what he had in mind back
>> when we discussed this.
>>
>> Index: gcc/c-typeck.c
>> ===================================================================
>> --- gcc/c-typeck.c      (revision 177758)
>> +++ gcc/c-typeck.c      (working copy)
>> @@ -2815,6 +2815,68 @@ build_function_call_vec (location_t loc,
>>       && !check_builtin_function_arguments (fundecl, nargs, argarray))
>>     return error_mark_node;
>>
>> +  /* Typecheck a builtin function which is declared with variable
>> +     argument list.  */
>> +  if (fundecl && DECL_BUILT_IN (fundecl)
>> +      && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL)
>>
>> just add to check_builtin_function_arguments which is called right
>> in front of your added code.
>>
>> +          /* Here we change the return type of the builtin function
>> +             from int f(...) --> t f(...) where t is a type of the
>> +             first argument.  */
>> +          fundecl = copy_node (fundecl);
>> +          TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg),
>> +                                        TYPE_ARG_TYPES (TREE_TYPE (fundecl)));
>> +          function = build_fold_addr_expr (fundecl);
>>
>> oh, hum - now I remember ;)  Eventually the C frontend should handle
>> this not via the function call mechanism but similar to how Joseph
>> added __builtin_complex support with
>>
>> 2011-08-19  Joseph Myers  <joseph@codesourcery.com>
>>
>>        * c-parser.c (c_parser_postfix_expression): Handle RID_BUILTIN_COMPLEX.
>>        * doc/extend.texi (__builtin_complex): Document.
>>
>> and then emit VEC_SHUFFLE_EXPRs directly from the frontend.  Joseph?
>>
>>          FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
>> -           if (!CONSTANT_CLASS_P (value))
>> +         if (!CONSTANT_CLASS_P (value))
>>
>> watch out for spurious whitespace changes.
>>
>> Index: gcc/gimplify.c
>> ===================================================================
>> --- gcc/gimplify.c      (revision 177758)
>> +++ gcc/gimplify.c      (working copy)
>> @@ -7050,6 +7050,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>>          break;
>>
>>        case BIT_FIELD_REF:
>> +       case VEC_SHUFFLE_EXPR:
>>
>> I don't think that's quite the right place given the is_gimple_lvalue
>> predicate on the first operand.  More like
>>
>>        case VEC_SHUFFLE_EXPR:
>>           goto expr_3;
>>
>> +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
>>
>> typo, mask
>>
>> +   means
>> +
>> +   freach i in length (mask):
>> +     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
>> +*/
>> +DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
>>
>> what is the (is there any?) constraint on the operand types, especially
>> the mask type?
>>
>> Index: gcc/gimple.c
>> ===================================================================
>> --- gcc/gimple.c        (revision 177758)
>> +++ gcc/gimple.c        (working copy)
>> @@ -2623,6 +2623,7 @@ get_gimple_rhs_num_ops (enum tree_code c
>>       || (SYM) == ADDR_EXPR                                                \
>>       || (SYM) == WITH_SIZE_EXPR                                           \
>>       || (SYM) == SSA_NAME                                                 \
>> +      || (SYM) == VEC_SHUFFLE_EXPR                                         \
>>       || (SYM) == VEC_COND_EXPR) ? GIMPLE_SINGLE_RHS                       \
>>    : GIMPLE_INVALID_RHS),
>>  #define END_OF_BASE_TREE_CODES (unsigned char) GIMPLE_INVALID_RHS,
>>
>> please make it GIMPLE_TERNARY_RHS instead.
>>
>> which requires adjustment at least here:
>>
>> Index: gcc/tree-ssa-operands.c
>> ===================================================================
>> --- gcc/tree-ssa-operands.c     (revision 177758)
>> +++ gcc/tree-ssa-operands.c     (working copy)
>> @@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
>>
>>     case COND_EXPR:
>>     case VEC_COND_EXPR:
>> +    case VEC_SHUFFLE_EXPR:
>>
>>
>> I think it would be nicer if the builtin would be handled by the frontend
>> not as builtin but like __builtin_complex and we'd just deal with
>> VEC_SHUFFLE_EXPR throughout the middle-end, eventually
>> lowering it in tree-vect-generic.c.  So I didn't look at the lowering
>> code in detail because that would obviously change then.
>>
>> Defering to Joseph for a decision here and to x86 maintainers for
>> the target specific bits.
>>
>> Thanks,
>> Richard.
>
> I'll go and see how the __builtin_complex are treated, and try to
> adjust the patch.

Thanks, also see Josephs comments on this.

Richard.

>
> Thanks,
> Artem.
>
>>> ChangeLog:
>>> 2011-08-30 Artjoms Sinkarovs <artyom.shinkaroff@gmailc.com>
>>>
>>>        gcc/
>>>        * optabs.c (expand_vec_shuffle_expr_p): New function. Checks
>>>        if given expression can be expanded by the target.
>>>        (expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR
>>>        using target vector instructions.
>>>        * optabs.h: New optab vshuffle.
>>>        (expand_vec_shuffle_expr_p): New prototype.
>>>        (expand_vec_shuffle_expr): New prototype.
>>>        * genopinit.c: Adjust to support vecshuffle.
>>>        * builtins.def: New builtin __builtin_shuffle.
>>>        * c-typeck.c (build_function_call_vec): Typecheck
>>>        __builtin_shuffle, allowing only two or three arguments.
>>>        Change the type of builtin depending on the arguments.
>>>        (digest_init): Warn when constructor has less elements than
>>>        vector type.
>>>        * gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR.
>>>        * tree.def: New tree code VEC_SHUFFLE_EXPR.
>>>        * tree-vect-generic.c (vector_element): New function. Returns an
>>>        element of the vector at the given position.
>>>        (lower_builtin_shuffle): Change builtin_shuffle with VEC_SHUFLLE_EXPR
>>>        or expand an expression piecewise.
>>>        (expand_vector_operations_1): Adjusted.
>>>        (gate_expand_vector_operations_noop): New gate function.
>>>        * gimple.c (get_gimple_rhs_num_ops): Adjust.
>>>        * passes.c: Move veclower down.
>>>        * tree-pretty-print.c (dump_generic_node): Recognize
>>>        VEC_SHUFFLE_EXPR as valid expression.
>>>        * tree-ssa-operands: Adjust.
>>>
>>>        gcc/config/i386
>>>        * sse.md: (sseshuffint) New mode_attr. Correspondence between the
>>>        vector and the type of the mask when shuffling.
>>>        (vecshuffle<mode>): New expansion.
>>>        * i386-protos.h (ix86_expand_vshuffle): New prototype.
>>>        * i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb.
>>>        (ix86_vectorize_builtin_vec_perm_ok): Adjust.
>>>
>>>        gcc/doc
>>>        * extend.texi: Adjust.
>>>
>>>        gcc/testsuite
>>>        * gcc.c-torture/execute/vect-shuffle-2.c: New test.
>>>        * gcc.c-torture/execute/vect-shuffle-4.c: New test.
>>>        * gcc.c-torture/execute/vect-shuffle-1.c: New test.
>>>        * gcc.c-torture/execute/vect-shuffle-3.c: New test.
>>>
>>> bootstrapped on x86_64-unknown-linux-gnu. The AVX parts are not
>>> tested, because I don't have actual hardware. It works with -mavx, the
>>> assembler code looks fine to me. I'll test it on a real hardware in
>>> couple of days.
>>>
>>>
>>>
>>> Thanks,
>>> Artem Shinkarov.
>>>
>>
>
Richard Guenther - Aug. 31, 2011, 8:03 a.m.
On Wed, Aug 31, 2011 at 1:51 AM, Chris Lattner <clattner@apple.com> wrote:
> On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>>> The patch at the moment lacks of some examples, but mainly it works
>>>> fine for me. It would be nice if i386 gurus could look into the way I
>>>> am doing the expansion.
>>>>
>>>> Middle-end parts seems to be more or less fine, they have not changed
>>>> much from the previous time.
>>>
>>> +@code{__builtin_shuffle (vec, mask)} and
>>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>>
>>> the latter would be __builtin_shuffle2.
>>
>> Why??
>> That was the syntax we agreed on that elegantly handles both cases in one place.
>
> If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>
> It should be straight-forward to map it into the same IR.

Sure.  It doesn't support a vector argument for element selection though,
which I think is required for a mapping to OpenCL shuffle/shuffle2.  That's odd.

Richard.

> -Chris
>
Artem Shinkarov - Aug. 31, 2011, 8:27 a.m.
On Wed, Aug 31, 2011 at 12:51 AM, Chris Lattner <clattner@apple.com> wrote:
> On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>>> The patch at the moment lacks of some examples, but mainly it works
>>>> fine for me. It would be nice if i386 gurus could look into the way I
>>>> am doing the expansion.
>>>>
>>>> Middle-end parts seems to be more or less fine, they have not changed
>>>> much from the previous time.
>>>
>>> +@code{__builtin_shuffle (vec, mask)} and
>>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>>
>>> the latter would be __builtin_shuffle2.
>>
>> Why??
>> That was the syntax we agreed on that elegantly handles both cases in one place.
>
> If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>
> It should be straight-forward to map it into the same IR.
>
> -Chris
>

Chris

I am trying to use OpenCL syntax here which says that the mask for
shuffling is a vector. Also I didn't really get from the clang
description if the indexes could be non-constnants? If not, then I
have a problem here, because I want to support this.



Artem.
Duncan Sands - Aug. 31, 2011, 8:35 a.m.
Hi Artem,

On 31/08/11 10:27, Artem Shinkarov wrote:
> On Wed, Aug 31, 2011 at 12:51 AM, Chris Lattner<clattner@apple.com>  wrote:
>> On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>>>> The patch at the moment lacks of some examples, but mainly it works
>>>>> fine for me. It would be nice if i386 gurus could look into the way I
>>>>> am doing the expansion.
>>>>>
>>>>> Middle-end parts seems to be more or less fine, they have not changed
>>>>> much from the previous time.
>>>>
>>>> +@code{__builtin_shuffle (vec, mask)} and
>>>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>>>
>>>> the latter would be __builtin_shuffle2.
>>>
>>> Why??
>>> That was the syntax we agreed on that elegantly handles both cases in one place.
>>
>> If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
>> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>>
>> It should be straight-forward to map it into the same IR.
>>
>> -Chris
>>
>
> Chris
>
> I am trying to use OpenCL syntax here which says that the mask for
> shuffling is a vector. Also I didn't really get from the clang
> description if the indexes could be non-constnants? If not, then I
> have a problem here, because I want to support this.

probably it maps directly to the LLVM shufflevector instruction, see
   http://llvm.org/docs/LangRef.html#i_shufflevector
That requires the shuffle mask to be constant.

Ciao, Duncan.
Richard Guenther - Aug. 31, 2011, 9:11 a.m.
On Wed, Aug 31, 2011 at 10:35 AM, Duncan Sands <baldrick@free.fr> wrote:
> Hi Artem,
>
> On 31/08/11 10:27, Artem Shinkarov wrote:
>>
>> On Wed, Aug 31, 2011 at 12:51 AM, Chris Lattner<clattner@apple.com>
>>  wrote:
>>>
>>> On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>>>>>
>>>>>> The patch at the moment lacks of some examples, but mainly it works
>>>>>> fine for me. It would be nice if i386 gurus could look into the way I
>>>>>> am doing the expansion.
>>>>>>
>>>>>> Middle-end parts seems to be more or less fine, they have not changed
>>>>>> much from the previous time.
>>>>>
>>>>> +@code{__builtin_shuffle (vec, mask)} and
>>>>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>>>>
>>>>> the latter would be __builtin_shuffle2.
>>>>
>>>> Why??
>>>> That was the syntax we agreed on that elegantly handles both cases in
>>>> one place.
>>>
>>> If you're going to add vector shuffling builtins, you might consider
>>> adding the same builtin that clang has for compatibility:
>>>
>>> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>>>
>>> It should be straight-forward to map it into the same IR.
>>>
>>> -Chris
>>>
>>
>> Chris
>>
>> I am trying to use OpenCL syntax here which says that the mask for
>> shuffling is a vector. Also I didn't really get from the clang
>> description if the indexes could be non-constnants? If not, then I
>> have a problem here, because I want to support this.
>
> probably it maps directly to the LLVM shufflevector instruction, see
>  http://llvm.org/docs/LangRef.html#i_shufflevector
> That requires the shuffle mask to be constant.

I see.  I think it's not worth copying LLVM builtins that merely map
its internal IL.

Richard.

> Ciao, Duncan.
>
Chris Lattner - Aug. 31, 2011, 5:39 p.m.
On Aug 31, 2011, at 1:27 AM, Artem Shinkarov wrote:

>> If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
>> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>> 
>> It should be straight-forward to map it into the same IR.
>> 
>> -Chris
>> 
> 
> Chris
> 
> I am trying to use OpenCL syntax here which says that the mask for
> shuffling is a vector. Also I didn't really get from the clang
> description if the indexes could be non-constnants? If not, then I
> have a problem here, because I want to support this.

Yes, constant elements are required for this builtin.  It is an implementation detail, but Clang doesn't implement the OpenCL shuffle operations with builtins.

-Chris

Patch

Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177758)
+++ gcc/c-typeck.c	(working copy)
@@ -2815,6 +2815,68 @@  build_function_call_vec (location_t loc,
       && !check_builtin_function_arguments (fundecl, nargs, argarray))
     return error_mark_node;

+  /* Typecheck a builtin function which is declared with variable
+     argument list.  */
+  if (fundecl && DECL_BUILT_IN (fundecl)
+      && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL)

just add to check_builtin_function_arguments which is called right
in front of your added code.

+          /* Here we change the return type of the builtin function
+             from int f(...) --> t f(...) where t is a type of the
+             first argument.  */
+          fundecl = copy_node (fundecl);
+          TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg),
+                                        TYPE_ARG_TYPES (TREE_TYPE (fundecl)));
+          function = build_fold_addr_expr (fundecl);

oh, hum - now I remember ;)  Eventually the C frontend should handle
this not via the function call mechanism but similar to how Joseph
added __builtin_complex support with

2011-08-19  Joseph Myers  <joseph@codesourcery.com>

        * c-parser.c (c_parser_postfix_expression): Handle RID_BUILTIN_COMPLEX.
        * doc/extend.texi (__builtin_complex): Document.

and then emit VEC_SHUFFLE_EXPRs directly from the frontend.  Joseph?

 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
-	    if (!CONSTANT_CLASS_P (value))
+	  if (!CONSTANT_CLASS_P (value))

watch out for spurious whitespace changes.

Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177758)
+++ gcc/gimplify.c	(working copy)
@@ -7050,6 +7050,7 @@  gimplify_expr (tree *expr_p, gimple_seq
 	  break;

 	case BIT_FIELD_REF:
+	case VEC_SHUFFLE_EXPR:

I don't think that's quite the right place given the is_gimple_lvalue
predicate on the first operand.  More like

        case VEC_SHUFFLE_EXPR:
           goto expr_3;

+/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>

typo, mask

+   means
+
+   freach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)

what is the (is there any?) constraint on the operand types, especially
the mask type?

Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 177758)
+++ gcc/gimple.c	(working copy)
@@ -2623,6 +2623,7 @@  get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == ADDR_EXPR						    \
       || (SYM) == WITH_SIZE_EXPR					    \
       || (SYM) == SSA_NAME						    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == VEC_COND_EXPR) ? GIMPLE_SINGLE_RHS			    \
    : GIMPLE_INVALID_RHS),
 #define END_OF_BASE_TREE_CODES (unsigned char) GIMPLE_INVALID_RHS,

please make it GIMPLE_TERNARY_RHS instead.

which requires adjustment at least here:

Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 177758)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@  get_expr_operands (gimple stmt, tree *ex

     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:


I think it would be nicer if the builtin would be handled by the frontend