Message ID | CAFiYyc3xZHGART6iXscAJ1GumbeWC=MTB56mfhC-BQgTT6Nixg@mail.gmail.com
State | New
On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov > <artyom.shinkaroff@gmail.com> wrote: >> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov >>> <artyom.shinkaroff@gmail.com> wrote: >>>> Here is a cleaned-up patch without the hook. Mostly it works in a way >>>> we discussed. >>>> >>>> So I think it is a right time to do something about vcond patterns, >>>> which would allow me to get rid of conversions that I need to put all >>>> over the code. >>>> >>>> Also at the moment the patch breaks lto frontend with a simple example: >>>> #define vector(elcount, type) \ >>>> __attribute__((vector_size((elcount)*sizeof(type)))) type >>>> >>>> int main (int argc, char *argv[]) { >>>> vector (4, float) f0; >>>> vector (4, float) f1; >>>> >>>> f0 = f1 != f0 >>>> ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0}; >>>> >>>> return (int)f0[argc]; >>>> } >>>> >>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244 >>>> >>>> I looked into the file, the conversion function is defined as >>>> gcc_unreachable (). I am not very familiar with lto, so I don't really >>>> know what is the right way to treat the conversions. >>>> >>>> And I seriously need help with backend patterns. >>> >>> On the patch. >>> >>> The documentation needs review by a native english speaker, but here >>> are some factual comments: >>> >>> +In C vector comparison is supported within standard comparison operators: >>> >>> it should read 'In GNU C' here and everywhere else as this is a GNU >>> extension. >>> >>> The result of the >>> +comparison is a signed integer-type vector where the size of each >>> +element must be the same as the size of compared vectors element. >>> >>> The result type of the comparison is determined by the C frontend, >>> it isn't under control of the user. 
What you are implying here is >>> restrictions on vector assignments, which are documented elsewhere. >>> I'd just say >>> >>> 'The result of the comparison is a vector of the same width and number >>> of elements as the comparison operands with a signed integral element >>> type.' >>> >>> +In addition to the vector comparison C supports conditional expressions >>> >>> See above. >>> >>> +For the convenience condition in the vector conditional can be just a >>> +vector of signed integer type. >>> >>> 'of integer type.' I don't see a reason to disallow unsigned integers, >>> they can be equally well compared against zero. >> >> I'll have a final go on the documentation, it is untouched from the old patches. >> >>> Index: gcc/targhooks.h >>> =================================================================== >>> --- gcc/targhooks.h (revision 177665) >>> +++ gcc/targhooks.h (working copy) >>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization >>> extern tree default_builtin_reciprocal (unsigned int, bool, bool); >>> >>> extern bool default_builtin_vector_alignment_reachable (const_tree, bool); >>> + >>> extern bool >>> default_builtin_support_vector_misalignment (enum machine_mode mode, >>> const_tree, >>> >>> spurious whitespace change. >> >> Yes, thanks. >> >>> Index: gcc/optabs.c >>> =================================================================== >>> --- gcc/optabs.c (revision 177665) >>> +++ gcc/optabs.c (working copy) >>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type >>> ... 
>>> + else >>> + { >>> + rtx rtx_op0; >>> + rtx vec; >>> + >>> + rtx_op0 = expand_normal (op0); >>> + comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); >>> + vec = CONST0_RTX (mode); >>> + >>> + create_output_operand (&ops[0], target, mode); >>> + create_input_operand (&ops[1], rtx_op1, mode); >>> + create_input_operand (&ops[2], rtx_op2, mode); >>> + create_input_operand (&ops[3], comparison, mode); >>> + create_input_operand (&ops[4], rtx_op0, mode); >>> + create_input_operand (&ops[5], vec, mode); >>> >>> this still builds the fake(?) != comparison, but as you said you need help >>> with the .md part if we want to use a machine specific pattern for this >>> case (which we eventually want, for the sake of using XOP vcond). >> >> Yes, I am waiting for it. This is the only way at the moment to make >> sure that in >> m = a > b; >> r = m ? c : d; >> >> m in the vcond is not transformed into the m != 0. >> >>> Index: gcc/target.h >>> =================================================================== >>> --- gcc/target.h (revision 177665) >>> +++ gcc/target.h (working copy) >>> @@ -51,6 +51,7 @@ >>> #define GCC_TARGET_H >>> >>> #include "insn-modes.h" >>> +#include "gimple.h" >>> >>> #ifdef ENABLE_CHECKING >>> >>> spurious change. >> >> Old stuff, fixed. >> >>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr >>> floating-point, we can only do some of these simplifications.) */ >>> if (operand_equal_p (arg0, arg1, 0)) >>> { >>> + tree arg0_type = TREE_TYPE (arg0); >>> + >>> switch (code) >>> { >>> case EQ_EXPR: >>> - if (! FLOAT_TYPE_P (TREE_TYPE (arg0)) >>> - || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) >>> + if (! FLOAT_TYPE_P (arg0_type) >>> + || ! HONOR_NANS (TYPE_MODE (arg0_type))) >>> ... >> >> Ok. >> >>> >>> Likewise. 
>>> >>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ >>> case UNGE_EXPR: >>> case UNEQ_EXPR: >>> case LTGT_EXPR: >>> + if (TREE_CODE (ops->type) == VECTOR_TYPE) >>> + { >>> + enum tree_code code = ops->code; >>> + tree arg0 = ops->op0; >>> + tree arg1 = ops->op1; >>> >>> move this code to do_store_flag (we really store a flag value). It should >>> also simply do what expand_vec_cond_expr does, probably simply >>> calling that with the {-1,...} {0,...} extra args should work. >> >> I started to do that, but the code in do_store_flag is completely >> different from what I am doing, and it looks confusing. I just call >> expand_vec_cond_expr and that is it. I can write a separate function, >> but the code is quite small. > > Hm? I see in your patch > > Index: gcc/expr.c > =================================================================== > --- gcc/expr.c (revision 177665) > +++ gcc/expr.c (working copy) > @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ > case UNGE_EXPR: > case UNEQ_EXPR: > case LTGT_EXPR: > + if (TREE_CODE (ops->type) == VECTOR_TYPE) > + { > + enum tree_code code = ops->code; > + tree arg0 = ops->op0; > + tree arg1 = ops->op1; > + tree arg_type = TREE_TYPE (arg0); > + tree el_type = TREE_TYPE (arg_type); > + tree t, ifexp, if_true, if_false; > + > + el_type = lang_hooks.types.type_for_size (TYPE_PRECISION > (el_type), 0); > + > + > + ifexp = build2 (code, type, arg0, arg1); > + if_true = build_vector_from_val (type, build_int_cst (el_type, -1)); > + if_false = build_vector_from_val (type, build_int_cst (el_type, 0)); > + > + if (arg_type != type) > + { > + if_true = convert (arg_type, if_true); > + if_false = convert (arg_type, if_false); > + t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false); > + t = convert (type, t); > + } > + else > + t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false); > + > + return expand_expr (t, > + modifier != EXPAND_STACK_PARM ? 
target : > NULL_RTX, > + tmode != VOIDmode ? tmode : mode, > + modifier); > + } > > that's not exactly "calling expand_vec_cond_expr". Well, actually it is. Keep in mind that clean backend would imply removing the conversions. But I'll make a function. >>> >>> As for the still required conversions, you should be able to delay those >>> from the C frontend (and here) to expand_vec_cond_expr by, after >>> expanding op1 and op2, wrapping a subreg around it with a proper mode >>> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work), >>> and then convert the result back to the original mode. >>> >>> I'll leave the C frontend pieces of the patch for review by Joseph, but >> >> Conversions are there until we fix the backend. When backend will be >> able to digest f0 > f1 ? int0 : int1, all the conversions will go >> away. >> >>> +static tree >>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2) >>> >>> is missing a function comment. >> >> fixed. >> >>> +static tree >>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b, >>> + tree bitpos, tree bitsize, enum tree_code code) >>> +{ >>> + tree cond; >>> + tree comp_type; >>> + >>> + a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos); >>> + b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos); >>> + >>> + comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0); >>> + >>> >>> Use >>> >>> comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0); >>> >>> instead. But I think you don't want to use TYPE_PRECISION on >>> FP types. Instead you want a signed integer type of the same (mode) >>> size as the vector element type, thus >>> >>> comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE >>> (TYPE_MODE (inner_type)), 0); >> >> Hm, I thought that at this stage we don't wan to know anything about >> modes. I mean here I am really building the same integer type as the >> operands of the comparison have. 
But I can use MODE_BITSIZE as well, I >> don't think that it could happen that the size of the mode is >> different from the size of the type. Or could it? > > The comparison could be on floating-point types where TYPE_PRECISION > can be, for example, 80 for x87 doubles. You want an integer type > of the same width, so yes, GET_MODE_BITSIZE is the correct thing > to use here. Ok. >>> + cond = gimplify_build2 (gsi, code, comp_type, a, b); >>> >>> the result type of a comparison is boolean_type_node, not comp_type. >>> >>> + cond = gimplify_build2 (gsi, code, comp_type, a, b); >>> + return gimplify_build3 (gsi, COND_EXPR, comp_type, cond, >>> + build_int_cst (comp_type, -1), >>> + build_int_cst (comp_type, 0)); >>> >>> writing this as >>> >>> + return gimplify_build3 (gsi, COND_EXPR, comp_type, >>> fold_build2 (code, boolean_type_node, a, b), >>> + build_int_cst (comp_type, -1), >>> + build_int_cst (comp_type, 0)); >>> >>> will get the gimplifier a better chance at simplifcation. >>> >>> + if (! expand_vec_cond_expr_p (type, TYPE_MODE (type))) >>> >>> I think we are expecting the scalar type and the vector mode here >>> from looking at the single existing caller. It probably doesn't make >>> a difference (we only check TYPE_UNSIGNED of it, which should >>> also work for vector types), but let's be consistent. Thus, >> >> Ok. >> >>> if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type))) >>> >>> + if (! expand_vec_cond_expr_p (type, TYPE_MODE (type))) >>> + t = expand_vector_piecewise (gsi, do_compare, type, >>> + TREE_TYPE (TREE_TYPE (op0)), op0, op1, code); >>> + else >>> + t = gimplify_build2 (gsi, code, type, op0, op1); >>> >>> the else case looks odd. Why re-build a stmt that already exists? >>> Simply return NULL_TREE instead? >> >> I can adjust. The reason it is written that way is that >> expand_vector_operations_1 is using the result of the function to >> update rhs. > > Ok, so it should check whether there was any lowering done then. 
> >>> +static tree >>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp) >>> +{ >>> ... >>> + /* Expand vector condition inside of VEC_COND_EXPR. */ >>> + if (! expand_vec_cond_expr_p (TREE_TYPE (cond), >>> + TYPE_MODE (TREE_TYPE (cond)))) >>> + { >>> ... >>> + new_rhs = expand_vector_piecewise (gsi, do_compare, >>> + TREE_TYPE (cond), >>> + TREE_TYPE (TREE_TYPE (op1)), >>> + op0, op1, TREE_CODE (cond)); >>> >>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to >>> >>> tem = { a[0] < b[0] ? -1 : 0, ... } >>> v0 & tem | v1 & ~tem; >>> >>> instead of >>> >>> { a[0] < b[0] ? v0[0] : v1[0], ... } >>> >>> even if the bitwise operations could be carried out using vectors. >>> It's definitely beneficial to do the first if the CPU can create the >>> bitmask. >>> >> >> o_O >> >> I thought you always wanted to do (m & v0) | (~m & v1). >> Do you want to have two cases of the expansion then -- when we have >> mask available and when we don't? But it is really unlikely that we >> can get the mask, but cannot get vcond. Because condition is actually >> vcond. So once again -- do we always expand to {a[0] > b[0] ? v[0] : >> c[0], ...}? > > Hm, yeah. I suppose with the current setup it's hard to only > get the mask but not the full vcond ;) So it probably makes > sense to always expand to {a[0] > b[0] ? v[0] :c[0],...} as > fallback. Sorry for the confusion ;) Ok. >>> + /* Run vecower on the expresisons we have introduced. */ >>> + for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp)) >>> + expand_vector_operations_1 (&gsi_tmp); >>> >>> do not use gsi.ptr directly, use gsi_stmt (gsi_tm) != gsi_stmt (gsi) >>> >>> +static bool >>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr) >>> +{ >>> >>> This function is lacking a comment. >>> >>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_ >>> ... >>> + /* Try to get rid from the useless vector comparison >>> + x != {0,0,...} which is inserted by the typechecker. 
*/ >>> + if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR) >>> >>> how and why? You simply drop that comparison - that doesn't look >>> correct. And in fact TREE_OPERAND (cond, 0) will never be a >>> comparison - that wouldn't be valid gimple. Please leave this >>> optimization to SSA based forward propagation (I can help you here >>> once the patch is in). >> >> No-no-no. This is the second part of avoiding >> m = a > b; >> r = m ? v0 : v1; >> >> to prevent m expansion to m != {0}. >> >> I do not _simply_ drop the comparison. I drop it only if >> is_vector_comparison returned true. It means that we can never get >> into the situation that we are dropping actually a comparison inserted >> by the user. But what I really want to achieve here is to drop the >> comparison that the frontend inserts every time when it sees an >> expression there. >> >> As I said earlier, tree forward propagation kicks only using -On, and >> I would really like to make sure that I can get rid of useless != {0} >> at any level. > Please don't. If the language extension forces a != 0 then it should > appear at -O0. The code is fishy anyway in the way it walks stmts > in is_vector_comparison. At least I don't like to see this optimization > done here for the sake of -O0 in this initial patch - you could try > arguing about it as a followup improvement (well, probably with not > much luck). -O0 is about compile-speed and debugging, doing > data-flow by walking stmts backward is slow. Ok, then I seriously don't see any motivation to support the VEC_COND_EXPR. The following code: m = a > b; r = (m & v0) | (~m & v1) gives me much more flexibility and control. What the VEC_COND_EXPR is good for? Syntactical sugar? How about throwing away all the VEC_COND_EXPR parts supporting only conditions (implicitly expressed using vconds)? If we would agree on implicit conversions for real types, then this is a functionality that perfectly satisfies my needs. 
I don't see any interest from the backend people and I cannot wait forever, so why don't we start with a simple thing? Artem. >>> >>> + if (expand_vec_cond_expr_p (TREE_TYPE (exp), >>> + TYPE_MODE (TREE_TYPE (exp)))) >>> + { >>> + update_stmt (gsi_stmt (*gsi)); >>> + return; >>> >>> no need to update the stmt when you do nothing. >>> >>> + new_rhs = expand_vec_cond_expr_piecewise (gsi, exp); >>> + gimple_assign_set_rhs_from_tree (gsi, new_rhs); >>> + update_stmt (gsi_stmt (*gsi)); >>> + } >>> >>> missing return;, just for clarity that you are done here. >> >> Ok. >> >>> You don't do anything for comparisons here, in case they are split >>> away from the VEC_COND_EXPR by the gimplifier. But if the >>> target doesn't support VEC_COND_EXPRs we have to lower them. >>> I suggest checking your testcases on i?86-linux (or with -m32 -march=i486). >>> >> >> expand_vector_operations_1 take care about any vector comparison, >> considering it as a binary operation. See expand_vector_operation and >> do_compare for more details. > > Ah, ok, I missed that piece. > >>> -TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h >>> +TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h >>> >>> huh, no please ;) I suppose that's no longer necessary anyway now. >>> >> >> Yeah, fixed. :) >> >>> I'll leave the i386.c pieces to the x86 target maintainers to review. >>> They probably will change once the .md file changes are sorted out. >> >> If they ever going to be sorted out... > > Well, we can move the conversion stuff to the point of expansion > using convert_move. That'll keep the middle-end and the C frontend > clean and move the "hack" towards the backends. > > Richard. > >>> Thanks, >>> Richard. >>> >> >> >> Thanks, >> Artem. >> >
On Thu, Aug 25, 2011 at 2:45 PM, Artem Shinkarov <artyom.shinkaroff@gmail.com> wrote: > On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov >> <artyom.shinkaroff@gmail.com> wrote: >>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov >>>> <artyom.shinkaroff@gmail.com> wrote: >>>>> Here is a cleaned-up patch without the hook. Mostly it works in a way >>>>> we discussed. >>>>> >>>>> So I think it is a right time to do something about vcond patterns, >>>>> which would allow me to get rid of conversions that I need to put all >>>>> over the code. >>>>> >>>>> Also at the moment the patch breaks lto frontend with a simple example: >>>>> #define vector(elcount, type) \ >>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type >>>>> >>>>> int main (int argc, char *argv[]) { >>>>> vector (4, float) f0; >>>>> vector (4, float) f1; >>>>> >>>>> f0 = f1 != f0 >>>>> ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0}; >>>>> >>>>> return (int)f0[argc]; >>>>> } >>>>> >>>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244 >>>>> >>>>> I looked into the file, the conversion function is defined as >>>>> gcc_unreachable (). I am not very familiar with lto, so I don't really >>>>> know what is the right way to treat the conversions. >>>>> >>>>> And I seriously need help with backend patterns. >>>> >>>> On the patch. >>>> >>>> The documentation needs review by a native english speaker, but here >>>> are some factual comments: >>>> >>>> +In C vector comparison is supported within standard comparison operators: >>>> >>>> it should read 'In GNU C' here and everywhere else as this is a GNU >>>> extension. 
>>>> >>>> The result of the >>>> +comparison is a signed integer-type vector where the size of each >>>> +element must be the same as the size of compared vectors element. >>>> >>>> The result type of the comparison is determined by the C frontend, >>>> it isn't under control of the user. What you are implying here is >>>> restrictions on vector assignments, which are documented elsewhere. >>>> I'd just say >>>> >>>> 'The result of the comparison is a vector of the same width and number >>>> of elements as the comparison operands with a signed integral element >>>> type.' >>>> >>>> +In addition to the vector comparison C supports conditional expressions >>>> >>>> See above. >>>> >>>> +For the convenience condition in the vector conditional can be just a >>>> +vector of signed integer type. >>>> >>>> 'of integer type.' I don't see a reason to disallow unsigned integers, >>>> they can be equally well compared against zero. >>> >>> I'll have a final go on the documentation, it is untouched from the old patches. >>> >>>> Index: gcc/targhooks.h >>>> =================================================================== >>>> --- gcc/targhooks.h (revision 177665) >>>> +++ gcc/targhooks.h (working copy) >>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization >>>> extern tree default_builtin_reciprocal (unsigned int, bool, bool); >>>> >>>> extern bool default_builtin_vector_alignment_reachable (const_tree, bool); >>>> + >>>> extern bool >>>> default_builtin_support_vector_misalignment (enum machine_mode mode, >>>> const_tree, >>>> >>>> spurious whitespace change. >>> >>> Yes, thanks. >>> >>>> Index: gcc/optabs.c >>>> =================================================================== >>>> --- gcc/optabs.c (revision 177665) >>>> +++ gcc/optabs.c (working copy) >>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type >>>> ... 
>>>> + else >>>> + { >>>> + rtx rtx_op0; >>>> + rtx vec; >>>> + >>>> + rtx_op0 = expand_normal (op0); >>>> + comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); >>>> + vec = CONST0_RTX (mode); >>>> + >>>> + create_output_operand (&ops[0], target, mode); >>>> + create_input_operand (&ops[1], rtx_op1, mode); >>>> + create_input_operand (&ops[2], rtx_op2, mode); >>>> + create_input_operand (&ops[3], comparison, mode); >>>> + create_input_operand (&ops[4], rtx_op0, mode); >>>> + create_input_operand (&ops[5], vec, mode); >>>> >>>> this still builds the fake(?) != comparison, but as you said you need help >>>> with the .md part if we want to use a machine specific pattern for this >>>> case (which we eventually want, for the sake of using XOP vcond). >>> >>> Yes, I am waiting for it. This is the only way at the moment to make >>> sure that in >>> m = a > b; >>> r = m ? c : d; >>> >>> m in the vcond is not transformed into the m != 0. >>> >>>> Index: gcc/target.h >>>> =================================================================== >>>> --- gcc/target.h (revision 177665) >>>> +++ gcc/target.h (working copy) >>>> @@ -51,6 +51,7 @@ >>>> #define GCC_TARGET_H >>>> >>>> #include "insn-modes.h" >>>> +#include "gimple.h" >>>> >>>> #ifdef ENABLE_CHECKING >>>> >>>> spurious change. >>> >>> Old stuff, fixed. >>> >>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr >>>> floating-point, we can only do some of these simplifications.) */ >>>> if (operand_equal_p (arg0, arg1, 0)) >>>> { >>>> + tree arg0_type = TREE_TYPE (arg0); >>>> + >>>> switch (code) >>>> { >>>> case EQ_EXPR: >>>> - if (! FLOAT_TYPE_P (TREE_TYPE (arg0)) >>>> - || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) >>>> + if (! FLOAT_TYPE_P (arg0_type) >>>> + || ! HONOR_NANS (TYPE_MODE (arg0_type))) >>>> ... >>> >>> Ok. >>> >>>> >>>> Likewise. 
>>>> >>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ >>>> case UNGE_EXPR: >>>> case UNEQ_EXPR: >>>> case LTGT_EXPR: >>>> + if (TREE_CODE (ops->type) == VECTOR_TYPE) >>>> + { >>>> + enum tree_code code = ops->code; >>>> + tree arg0 = ops->op0; >>>> + tree arg1 = ops->op1; >>>> >>>> move this code to do_store_flag (we really store a flag value). It should >>>> also simply do what expand_vec_cond_expr does, probably simply >>>> calling that with the {-1,...} {0,...} extra args should work. >>> >>> I started to do that, but the code in do_store_flag is completely >>> different from what I am doing, and it looks confusing. I just call >>> expand_vec_cond_expr and that is it. I can write a separate function, >>> but the code is quite small. >> >> Hm? I see in your patch >> >> Index: gcc/expr.c >> =================================================================== >> --- gcc/expr.c (revision 177665) >> +++ gcc/expr.c (working copy) >> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ >> case UNGE_EXPR: >> case UNEQ_EXPR: >> case LTGT_EXPR: >> + if (TREE_CODE (ops->type) == VECTOR_TYPE) >> + { >> + enum tree_code code = ops->code; >> + tree arg0 = ops->op0; >> + tree arg1 = ops->op1; >> + tree arg_type = TREE_TYPE (arg0); >> + tree el_type = TREE_TYPE (arg_type); >> + tree t, ifexp, if_true, if_false; >> + >> + el_type = lang_hooks.types.type_for_size (TYPE_PRECISION >> (el_type), 0); >> + >> + >> + ifexp = build2 (code, type, arg0, arg1); >> + if_true = build_vector_from_val (type, build_int_cst (el_type, -1)); >> + if_false = build_vector_from_val (type, build_int_cst (el_type, 0)); >> + >> + if (arg_type != type) >> + { >> + if_true = convert (arg_type, if_true); >> + if_false = convert (arg_type, if_false); >> + t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false); >> + t = convert (type, t); >> + } >> + else >> + t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false); >> + >> + return expand_expr (t, >> + modifier != 
EXPAND_STACK_PARM ? target : >> NULL_RTX, >> + tmode != VOIDmode ? tmode : mode, >> + modifier); >> + } >> >> that's not exactly "calling expand_vec_cond_expr". > > Well, actually it is. Keep in mind that clean backend would imply > removing the conversions. But I'll make a function. Why does return expand_vec_cond_expr (build2 (ops->code, type, ops->op0, ops->op1), build_vector_from_val (type, build_int_cst (el_type, -1)), build_vector_from_val (type, build_int_cst (el_type, 0))); not work? If you push the conversions to expand_vec_cond_expr by doing them on RTL you simplify things here and remove the requirement from doing them in the C frontend for VEC_COND_EXPR as well. >>>> >>>> As for the still required conversions, you should be able to delay those >>>> from the C frontend (and here) to expand_vec_cond_expr by, after >>>> expanding op1 and op2, wrapping a subreg around it with a proper mode >>>> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work), >>>> and then convert the result back to the original mode. >>>> >>>> I'll leave the C frontend pieces of the patch for review by Joseph, but >>> >>> Conversions are there until we fix the backend. When backend will be >>> able to digest f0 > f1 ? int0 : int1, all the conversions will go >>> away. >>> >>>> +static tree >>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2) >>>> >>>> is missing a function comment. >>> >>> fixed. >>> >>>> +static tree >>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b, >>>> + tree bitpos, tree bitsize, enum tree_code code) >>>> +{ >>>> + tree cond; >>>> + tree comp_type; >>>> + >>>> + a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos); >>>> + b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos); >>>> + >>>> + comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0); >>>> + >>>> >>>> Use >>>> >>>> comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0); >>>> >>>> instead. 
But I think you don't want to use TYPE_PRECISION on >>>> FP types. Instead you want a signed integer type of the same (mode) >>>> size as the vector element type, thus >>>> >>>> comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE >>>> (TYPE_MODE (inner_type)), 0); >>> >>> Hm, I thought that at this stage we don't wan to know anything about >>> modes. I mean here I am really building the same integer type as the >>> operands of the comparison have. But I can use MODE_BITSIZE as well, I >>> don't think that it could happen that the size of the mode is >>> different from the size of the type. Or could it? >> >> The comparison could be on floating-point types where TYPE_PRECISION >> can be, for example, 80 for x87 doubles. You want an integer type >> of the same width, so yes, GET_MODE_BITSIZE is the correct thing >> to use here. > > Ok. > >>>> + cond = gimplify_build2 (gsi, code, comp_type, a, b); >>>> >>>> the result type of a comparison is boolean_type_node, not comp_type. >>>> >>>> + cond = gimplify_build2 (gsi, code, comp_type, a, b); >>>> + return gimplify_build3 (gsi, COND_EXPR, comp_type, cond, >>>> + build_int_cst (comp_type, -1), >>>> + build_int_cst (comp_type, 0)); >>>> >>>> writing this as >>>> >>>> + return gimplify_build3 (gsi, COND_EXPR, comp_type, >>>> fold_build2 (code, boolean_type_node, a, b), >>>> + build_int_cst (comp_type, -1), >>>> + build_int_cst (comp_type, 0)); >>>> >>>> will get the gimplifier a better chance at simplifcation. >>>> >>>> + if (! expand_vec_cond_expr_p (type, TYPE_MODE (type))) >>>> >>>> I think we are expecting the scalar type and the vector mode here >>>> from looking at the single existing caller. It probably doesn't make >>>> a difference (we only check TYPE_UNSIGNED of it, which should >>>> also work for vector types), but let's be consistent. Thus, >>> >>> Ok. >>> >>>> if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type))) >>>> >>>> + if (! 
expand_vec_cond_expr_p (type, TYPE_MODE (type))) >>>> + t = expand_vector_piecewise (gsi, do_compare, type, >>>> + TREE_TYPE (TREE_TYPE (op0)), op0, op1, code); >>>> + else >>>> + t = gimplify_build2 (gsi, code, type, op0, op1); >>>> >>>> the else case looks odd. Why re-build a stmt that already exists? >>>> Simply return NULL_TREE instead? >>> >>> I can adjust. The reason it is written that way is that >>> expand_vector_operations_1 is using the result of the function to >>> update rhs. >> >> Ok, so it should check whether there was any lowering done then. >> >>>> +static tree >>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp) >>>> +{ >>>> ... >>>> + /* Expand vector condition inside of VEC_COND_EXPR. */ >>>> + if (! expand_vec_cond_expr_p (TREE_TYPE (cond), >>>> + TYPE_MODE (TREE_TYPE (cond)))) >>>> + { >>>> ... >>>> + new_rhs = expand_vector_piecewise (gsi, do_compare, >>>> + TREE_TYPE (cond), >>>> + TREE_TYPE (TREE_TYPE (op1)), >>>> + op0, op1, TREE_CODE (cond)); >>>> >>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to >>>> >>>> tem = { a[0] < b[0] ? -1 : 0, ... } >>>> v0 & tem | v1 & ~tem; >>>> >>>> instead of >>>> >>>> { a[0] < b[0] ? v0[0] : v1[0], ... } >>>> >>>> even if the bitwise operations could be carried out using vectors. >>>> It's definitely beneficial to do the first if the CPU can create the >>>> bitmask. >>>> >>> >>> o_O >>> >>> I thought you always wanted to do (m & v0) | (~m & v1). >>> Do you want to have two cases of the expansion then -- when we have >>> mask available and when we don't? But it is really unlikely that we >>> can get the mask, but cannot get vcond. Because condition is actually >>> vcond. So once again -- do we always expand to {a[0] > b[0] ? v[0] : >>> c[0], ...}? >> >> Hm, yeah. I suppose with the current setup it's hard to only >> get the mask but not the full vcond ;) So it probably makes >> sense to always expand to {a[0] > b[0] ? v[0] :c[0],...} as >> fallback. 
Sorry for the confusion ;) > > Ok. > >>>> + /* Run vecower on the expresisons we have introduced. */ >>>> + for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp)) >>>> + expand_vector_operations_1 (&gsi_tmp); >>>> >>>> do not use gsi.ptr directly, use gsi_stmt (gsi_tm) != gsi_stmt (gsi) >>>> >>>> +static bool >>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr) >>>> +{ >>>> >>>> This function is lacking a comment. >>>> >>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_ >>>> ... >>>> + /* Try to get rid from the useless vector comparison >>>> + x != {0,0,...} which is inserted by the typechecker. */ >>>> + if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR) >>>> >>>> how and why? You simply drop that comparison - that doesn't look >>>> correct. And in fact TREE_OPERAND (cond, 0) will never be a >>>> comparison - that wouldn't be valid gimple. Please leave this >>>> optimization to SSA based forward propagation (I can help you here >>>> once the patch is in). >>> >>> No-no-no. This is the second part of avoiding >>> m = a > b; >>> r = m ? v0 : v1; >>> >>> to prevent m expansion to m != {0}. >>> >>> I do not _simply_ drop the comparison. I drop it only if >>> is_vector_comparison returned true. It means that we can never get >>> into the situation that we are dropping actually a comparison inserted >>> by the user. But what I really want to achieve here is to drop the >>> comparison that the frontend inserts every time when it sees an >>> expression there. >>> >>> As I said earlier, tree forward propagation kicks only using -On, and >>> I would really like to make sure that I can get rid of useless != {0} >>> at any level. > >> Please don't. If the language extension forces a != 0 then it should >> appear at -O0. The code is fishy anyway in the way it walks stmts >> in is_vector_comparison. 
At least I don't like to see this optimization
>> done here for the sake of -O0 in this initial patch - you could try
>> arguing about it as a followup improvement (well, probably with not
>> much luck). -O0 is about compile speed and debugging, and doing
>> data-flow by walking stmts backward is slow.
>
> Ok, then I seriously don't see any motivation to support
> VEC_COND_EXPR. The following code:
>
> m = a > b;
> r = (m & v0) | (~m & v1);
>
> gives me much more flexibility and control. What is VEC_COND_EXPR
> good for? Syntactic sugar?
>
> How about throwing away all the VEC_COND_EXPR parts and supporting only
> conditions (implicitly expressed using vconds)? If we agreed on
> implicit conversions for real types, then this is functionality that
> perfectly satisfies my needs.
>
> I don't see any interest from the backend people and I cannot wait
> forever, so why don't we start with a simple thing?

But the simple thing is already what the backend supports.

Richard.
On Thu, Aug 25, 2011 at 2:00 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Thu, Aug 25, 2011 at 2:45 PM, Artem Shinkarov > <artyom.shinkaroff@gmail.com> wrote: >> On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov >>> <artyom.shinkaroff@gmail.com> wrote: >>>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther >>>> <richard.guenther@gmail.com> wrote: >>>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov >>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>> Here is a cleaned-up patch without the hook. Mostly it works in a way >>>>>> we discussed. >>>>>> >>>>>> So I think it is a right time to do something about vcond patterns, >>>>>> which would allow me to get rid of conversions that I need to put all >>>>>> over the code. >>>>>> >>>>>> Also at the moment the patch breaks lto frontend with a simple example: >>>>>> #define vector(elcount, type) \ >>>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type >>>>>> >>>>>> int main (int argc, char *argv[]) { >>>>>> vector (4, float) f0; >>>>>> vector (4, float) f1; >>>>>> >>>>>> f0 = f1 != f0 >>>>>> ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0}; >>>>>> >>>>>> return (int)f0[argc]; >>>>>> } >>>>>> >>>>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244 >>>>>> >>>>>> I looked into the file, the conversion function is defined as >>>>>> gcc_unreachable (). I am not very familiar with lto, so I don't really >>>>>> know what is the right way to treat the conversions. >>>>>> >>>>>> And I seriously need help with backend patterns. >>>>> >>>>> On the patch. >>>>> >>>>> The documentation needs review by a native english speaker, but here >>>>> are some factual comments: >>>>> >>>>> +In C vector comparison is supported within standard comparison operators: >>>>> >>>>> it should read 'In GNU C' here and everywhere else as this is a GNU >>>>> extension. 
>>>>> >>>>> The result of the >>>>> +comparison is a signed integer-type vector where the size of each >>>>> +element must be the same as the size of compared vectors element. >>>>> >>>>> The result type of the comparison is determined by the C frontend, >>>>> it isn't under control of the user. What you are implying here is >>>>> restrictions on vector assignments, which are documented elsewhere. >>>>> I'd just say >>>>> >>>>> 'The result of the comparison is a vector of the same width and number >>>>> of elements as the comparison operands with a signed integral element >>>>> type.' >>>>> >>>>> +In addition to the vector comparison C supports conditional expressions >>>>> >>>>> See above. >>>>> >>>>> +For the convenience condition in the vector conditional can be just a >>>>> +vector of signed integer type. >>>>> >>>>> 'of integer type.' I don't see a reason to disallow unsigned integers, >>>>> they can be equally well compared against zero. >>>> >>>> I'll have a final go on the documentation, it is untouched from the old patches. >>>> >>>>> Index: gcc/targhooks.h >>>>> =================================================================== >>>>> --- gcc/targhooks.h (revision 177665) >>>>> +++ gcc/targhooks.h (working copy) >>>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization >>>>> extern tree default_builtin_reciprocal (unsigned int, bool, bool); >>>>> >>>>> extern bool default_builtin_vector_alignment_reachable (const_tree, bool); >>>>> + >>>>> extern bool >>>>> default_builtin_support_vector_misalignment (enum machine_mode mode, >>>>> const_tree, >>>>> >>>>> spurious whitespace change. >>>> >>>> Yes, thanks. >>>> >>>>> Index: gcc/optabs.c >>>>> =================================================================== >>>>> --- gcc/optabs.c (revision 177665) >>>>> +++ gcc/optabs.c (working copy) >>>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type >>>>> ... 
>>>>> + else >>>>> + { >>>>> + rtx rtx_op0; >>>>> + rtx vec; >>>>> + >>>>> + rtx_op0 = expand_normal (op0); >>>>> + comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); >>>>> + vec = CONST0_RTX (mode); >>>>> + >>>>> + create_output_operand (&ops[0], target, mode); >>>>> + create_input_operand (&ops[1], rtx_op1, mode); >>>>> + create_input_operand (&ops[2], rtx_op2, mode); >>>>> + create_input_operand (&ops[3], comparison, mode); >>>>> + create_input_operand (&ops[4], rtx_op0, mode); >>>>> + create_input_operand (&ops[5], vec, mode); >>>>> >>>>> this still builds the fake(?) != comparison, but as you said you need help >>>>> with the .md part if we want to use a machine specific pattern for this >>>>> case (which we eventually want, for the sake of using XOP vcond). >>>> >>>> Yes, I am waiting for it. This is the only way at the moment to make >>>> sure that in >>>> m = a > b; >>>> r = m ? c : d; >>>> >>>> m in the vcond is not transformed into the m != 0. >>>> >>>>> Index: gcc/target.h >>>>> =================================================================== >>>>> --- gcc/target.h (revision 177665) >>>>> +++ gcc/target.h (working copy) >>>>> @@ -51,6 +51,7 @@ >>>>> #define GCC_TARGET_H >>>>> >>>>> #include "insn-modes.h" >>>>> +#include "gimple.h" >>>>> >>>>> #ifdef ENABLE_CHECKING >>>>> >>>>> spurious change. >>>> >>>> Old stuff, fixed. >>>> >>>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr >>>>> floating-point, we can only do some of these simplifications.) */ >>>>> if (operand_equal_p (arg0, arg1, 0)) >>>>> { >>>>> + tree arg0_type = TREE_TYPE (arg0); >>>>> + >>>>> switch (code) >>>>> { >>>>> case EQ_EXPR: >>>>> - if (! FLOAT_TYPE_P (TREE_TYPE (arg0)) >>>>> - || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) >>>>> + if (! FLOAT_TYPE_P (arg0_type) >>>>> + || ! HONOR_NANS (TYPE_MODE (arg0_type))) >>>>> ... >>>> >>>> Ok. >>>> >>>>> >>>>> Likewise. 
>>>>> >>>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ >>>>> case UNGE_EXPR: >>>>> case UNEQ_EXPR: >>>>> case LTGT_EXPR: >>>>> + if (TREE_CODE (ops->type) == VECTOR_TYPE) >>>>> + { >>>>> + enum tree_code code = ops->code; >>>>> + tree arg0 = ops->op0; >>>>> + tree arg1 = ops->op1; >>>>> >>>>> move this code to do_store_flag (we really store a flag value). It should >>>>> also simply do what expand_vec_cond_expr does, probably simply >>>>> calling that with the {-1,...} {0,...} extra args should work. >>>> >>>> I started to do that, but the code in do_store_flag is completely >>>> different from what I am doing, and it looks confusing. I just call >>>> expand_vec_cond_expr and that is it. I can write a separate function, >>>> but the code is quite small. >>> >>> Hm? I see in your patch >>> >>> Index: gcc/expr.c >>> =================================================================== >>> --- gcc/expr.c (revision 177665) >>> +++ gcc/expr.c (working copy) >>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ >>> case UNGE_EXPR: >>> case UNEQ_EXPR: >>> case LTGT_EXPR: >>> + if (TREE_CODE (ops->type) == VECTOR_TYPE) >>> + { >>> + enum tree_code code = ops->code; >>> + tree arg0 = ops->op0; >>> + tree arg1 = ops->op1; >>> + tree arg_type = TREE_TYPE (arg0); >>> + tree el_type = TREE_TYPE (arg_type); >>> + tree t, ifexp, if_true, if_false; >>> + >>> + el_type = lang_hooks.types.type_for_size (TYPE_PRECISION >>> (el_type), 0); >>> + >>> + >>> + ifexp = build2 (code, type, arg0, arg1); >>> + if_true = build_vector_from_val (type, build_int_cst (el_type, -1)); >>> + if_false = build_vector_from_val (type, build_int_cst (el_type, 0)); >>> + >>> + if (arg_type != type) >>> + { >>> + if_true = convert (arg_type, if_true); >>> + if_false = convert (arg_type, if_false); >>> + t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false); >>> + t = convert (type, t); >>> + } >>> + else >>> + t = build3 (VEC_COND_EXPR, type, ifexp, if_true, 
if_false); >>> + >>> + return expand_expr (t, >>> + modifier != EXPAND_STACK_PARM ? target : >>> NULL_RTX, >>> + tmode != VOIDmode ? tmode : mode, >>> + modifier); >>> + } >>> >>> that's not exactly "calling expand_vec_cond_expr". >> >> Well, actually it is. Keep in mind that clean backend would imply >> removing the conversions. But I'll make a function. > > Why does > > return expand_vec_cond_expr (build2 (ops->code, type, ops->op0, ops->op1), > build_vector_from_val > (type, build_int_cst (el_type, -1)), > build_vector_from_val > (type, build_int_cst (el_type, 0))); > > not work? If you push the conversions to expand_vec_cond_expr > by doing them on RTL you simplify things here and remove the requirement > from doing them in the C frontend for VEC_COND_EXPR as well. It does not work because vcond <a > b, c, d> requires a,b,c,d to have the same type. Now here we are dealing only with comparisons, so in case of floats we have vcond < f0 > f1, {-1,...}, {0,...}> which we have to transform into (vsi)(vcond< f0 >f1, (vsf){-1,...}, (vsf){0,...}>). Ok, so is it ok to do make this conversion here for the real types? >>>>> >>>>> As for the still required conversions, you should be able to delay those >>>>> from the C frontend (and here) to expand_vec_cond_expr by, after >>>>> expanding op1 and op2, wrapping a subreg around it with a proper mode >>>>> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work), >>>>> and then convert the result back to the original mode. >>>>> >>>>> I'll leave the C frontend pieces of the patch for review by Joseph, but >>>> >>>> Conversions are there until we fix the backend. When backend will be >>>> able to digest f0 > f1 ? int0 : int1, all the conversions will go >>>> away. >>>> >>>>> +static tree >>>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2) >>>>> >>>>> is missing a function comment. >>>> >>>> fixed. 
>>>> >>>>> +static tree >>>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b, >>>>> + tree bitpos, tree bitsize, enum tree_code code) >>>>> +{ >>>>> + tree cond; >>>>> + tree comp_type; >>>>> + >>>>> + a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos); >>>>> + b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos); >>>>> + >>>>> + comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0); >>>>> + >>>>> >>>>> Use >>>>> >>>>> comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0); >>>>> >>>>> instead. But I think you don't want to use TYPE_PRECISION on >>>>> FP types. Instead you want a signed integer type of the same (mode) >>>>> size as the vector element type, thus >>>>> >>>>> comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE >>>>> (TYPE_MODE (inner_type)), 0); >>>> >>>> Hm, I thought that at this stage we don't wan to know anything about >>>> modes. I mean here I am really building the same integer type as the >>>> operands of the comparison have. But I can use MODE_BITSIZE as well, I >>>> don't think that it could happen that the size of the mode is >>>> different from the size of the type. Or could it? >>> >>> The comparison could be on floating-point types where TYPE_PRECISION >>> can be, for example, 80 for x87 doubles. You want an integer type >>> of the same width, so yes, GET_MODE_BITSIZE is the correct thing >>> to use here. >> >> Ok. >> >>>>> + cond = gimplify_build2 (gsi, code, comp_type, a, b); >>>>> >>>>> the result type of a comparison is boolean_type_node, not comp_type. 
>>>>> >>>>> + cond = gimplify_build2 (gsi, code, comp_type, a, b); >>>>> + return gimplify_build3 (gsi, COND_EXPR, comp_type, cond, >>>>> + build_int_cst (comp_type, -1), >>>>> + build_int_cst (comp_type, 0)); >>>>> >>>>> writing this as >>>>> >>>>> + return gimplify_build3 (gsi, COND_EXPR, comp_type, >>>>> fold_build2 (code, boolean_type_node, a, b), >>>>> + build_int_cst (comp_type, -1), >>>>> + build_int_cst (comp_type, 0)); >>>>> >>>>> will get the gimplifier a better chance at simplifcation. >>>>> >>>>> + if (! expand_vec_cond_expr_p (type, TYPE_MODE (type))) >>>>> >>>>> I think we are expecting the scalar type and the vector mode here >>>>> from looking at the single existing caller. It probably doesn't make >>>>> a difference (we only check TYPE_UNSIGNED of it, which should >>>>> also work for vector types), but let's be consistent. Thus, >>>> >>>> Ok. >>>> >>>>> if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type))) >>>>> >>>>> + if (! expand_vec_cond_expr_p (type, TYPE_MODE (type))) >>>>> + t = expand_vector_piecewise (gsi, do_compare, type, >>>>> + TREE_TYPE (TREE_TYPE (op0)), op0, op1, code); >>>>> + else >>>>> + t = gimplify_build2 (gsi, code, type, op0, op1); >>>>> >>>>> the else case looks odd. Why re-build a stmt that already exists? >>>>> Simply return NULL_TREE instead? >>>> >>>> I can adjust. The reason it is written that way is that >>>> expand_vector_operations_1 is using the result of the function to >>>> update rhs. >>> >>> Ok, so it should check whether there was any lowering done then. >>> >>>>> +static tree >>>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp) >>>>> +{ >>>>> ... >>>>> + /* Expand vector condition inside of VEC_COND_EXPR. */ >>>>> + if (! expand_vec_cond_expr_p (TREE_TYPE (cond), >>>>> + TYPE_MODE (TREE_TYPE (cond)))) >>>>> + { >>>>> ... 
>>>>> + new_rhs = expand_vector_piecewise (gsi, do_compare, >>>>> + TREE_TYPE (cond), >>>>> + TREE_TYPE (TREE_TYPE (op1)), >>>>> + op0, op1, TREE_CODE (cond)); >>>>> >>>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to >>>>> >>>>> tem = { a[0] < b[0] ? -1 : 0, ... } >>>>> v0 & tem | v1 & ~tem; >>>>> >>>>> instead of >>>>> >>>>> { a[0] < b[0] ? v0[0] : v1[0], ... } >>>>> >>>>> even if the bitwise operations could be carried out using vectors. >>>>> It's definitely beneficial to do the first if the CPU can create the >>>>> bitmask. >>>>> >>>> >>>> o_O >>>> >>>> I thought you always wanted to do (m & v0) | (~m & v1). >>>> Do you want to have two cases of the expansion then -- when we have >>>> mask available and when we don't? But it is really unlikely that we >>>> can get the mask, but cannot get vcond. Because condition is actually >>>> vcond. So once again -- do we always expand to {a[0] > b[0] ? v[0] : >>>> c[0], ...}? >>> >>> Hm, yeah. I suppose with the current setup it's hard to only >>> get the mask but not the full vcond ;) So it probably makes >>> sense to always expand to {a[0] > b[0] ? v[0] :c[0],...} as >>> fallback. Sorry for the confusion ;) >> >> Ok. >> >>>>> + /* Run vecower on the expresisons we have introduced. */ >>>>> + for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp)) >>>>> + expand_vector_operations_1 (&gsi_tmp); >>>>> >>>>> do not use gsi.ptr directly, use gsi_stmt (gsi_tm) != gsi_stmt (gsi) >>>>> >>>>> +static bool >>>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr) >>>>> +{ >>>>> >>>>> This function is lacking a comment. >>>>> >>>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_ >>>>> ... >>>>> + /* Try to get rid from the useless vector comparison >>>>> + x != {0,0,...} which is inserted by the typechecker. */ >>>>> + if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR) >>>>> >>>>> how and why? You simply drop that comparison - that doesn't look >>>>> correct. 
And in fact TREE_OPERAND (cond, 0) will never be a >>>>> comparison - that wouldn't be valid gimple. Please leave this >>>>> optimization to SSA based forward propagation (I can help you here >>>>> once the patch is in). >>>> >>>> No-no-no. This is the second part of avoiding >>>> m = a > b; >>>> r = m ? v0 : v1; >>>> >>>> to prevent m expansion to m != {0}. >>>> >>>> I do not _simply_ drop the comparison. I drop it only if >>>> is_vector_comparison returned true. It means that we can never get >>>> into the situation that we are dropping actually a comparison inserted >>>> by the user. But what I really want to achieve here is to drop the >>>> comparison that the frontend inserts every time when it sees an >>>> expression there. >>>> >>>> As I said earlier, tree forward propagation kicks only using -On, and >>>> I would really like to make sure that I can get rid of useless != {0} >>>> at any level. >> >>> Please don't. If the language extension forces a != 0 then it should >>> appear at -O0. The code is fishy anyway in the way it walks stmts >>> in is_vector_comparison. At least I don't like to see this optimization >>> done here for the sake of -O0 in this initial patch - you could try >>> arguing about it as a followup improvement (well, probably with not >>> much luck). -O0 is about compile-speed and debugging, doing >>> data-flow by walking stmts backward is slow. >> >> Ok, then I seriously don't see any motivation to support the >> VEC_COND_EXPR. The following code: >> >> m = a > b; >> r = (m & v0) | (~m & v1) >> >> gives me much more flexibility and control. What the VEC_COND_EXPR is >> good for? Syntactical sugar? >> >> How about throwing away all the VEC_COND_EXPR parts supporting only >> conditions (implicitly expressed using vconds)? If we would agree on >> implicit conversions for real types, then this is a functionality that >> perfectly satisfies my needs. 
>>
>> I don't see any interest from the backend people and I cannot wait
>> forever, so why don't we start with a simple thing?
>
> But the simple thing is already what the backend supports.
>
> Richard.
>

Well, it is not "what", it is "how" -- that is what we have been
discussing for three weeks already.

Ok, so the question now is whether it is fine to have conversions
inside expand_expr_real_2. If we agree that it is ok, then I can
adjust the patch.

Artem.
Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 177665)
+++ gcc/expr.c  (working copy)
@@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
     case UNGE_EXPR:
     case UNEQ_EXPR:
     case LTGT_EXPR:
+      if (TREE_CODE (ops->type) == VECTOR_TYPE)
+        {
+          enum tree_code code = ops->code;
+          tree arg0 = ops->op0;
+          tree arg1 = ops->op1;
+          tree arg_type = TREE_TYPE (arg0);
+          tree el_type = TREE_TYPE (arg_type);
+          tree t, ifexp, if_true, if_false;
+
+          el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
+
+          ifexp = build2 (code, type, arg0, arg1);
+          if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
+          if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
+
+          if (arg_type != type)
+            {
+              if_true = convert (arg_type, if_true);
+              if_false = convert (arg_type, if_false);
+              t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
+              t = convert (type, t);
+            }
+          else
+            t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
+
+          return expand_expr (t,
+                              modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
+                              tmode != VOIDmode ? tmode : mode,
+                              modifier);
+        }