From patchwork Thu Aug 25 11:39:50 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Biener X-Patchwork-Id: 111552 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 55EEDB6F7E for ; Thu, 25 Aug 2011 21:40:26 +1000 (EST) Received: (qmail 11791 invoked by alias); 25 Aug 2011 11:40:22 -0000 Received: (qmail 11769 invoked by uid 22791); 25 Aug 2011 11:40:18 -0000 X-SWARE-Spam-Status: No, hits=0.4 required=5.0 tests=AWL, BAYES_50, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SINGLE_HEADER_2K, TW_TM X-Spam-Check-By: sourceware.org Received: from mail-yi0-f47.google.com (HELO mail-yi0-f47.google.com) (209.85.218.47) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 25 Aug 2011 11:39:51 +0000 Received: by yia28 with SMTP id 28so1709336yia.20 for ; Thu, 25 Aug 2011 04:39:50 -0700 (PDT) MIME-Version: 1.0 Received: by 10.151.6.12 with SMTP id j12mr161182ybi.421.1314272390195; Thu, 25 Aug 2011 04:39:50 -0700 (PDT) Received: by 10.150.57.5 with HTTP; Thu, 25 Aug 2011 04:39:50 -0700 (PDT) In-Reply-To: References: <4E4D224F.1020206@redhat.com> Date: Thu, 25 Aug 2011 13:39:50 +0200 Message-ID: Subject: Re: Vector Comparison patch From: Richard Guenther To: Artem Shinkarov Cc: Richard Henderson , gcc-patches@gcc.gnu.org, "Joseph S. Myers" , Uros Bizjak X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov wrote: > On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther > wrote: >> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov >> wrote: >>> Here is a cleaned-up patch without the hook. Mostly it works in a way >>> we discussed. >>> >>> So I think it is a right time to do something about vcond patterns, >>> which would allow me to get rid of conversions that I need to put all >>> over the code. >>> >>> Also at the moment the patch breaks lto frontend with a simple example: >>> #define vector(elcount, type)  \ >>> __attribute__((vector_size((elcount)*sizeof(type)))) type >>> >>> int main (int argc, char *argv[]) { >>>    vector (4, float) f0; >>>    vector (4, float) f1; >>> >>>    f0 =  f1 != f0 >>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0}; >>> >>>    return (int)f0[argc]; >>> } >>> >>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244 >>> >>> I looked into the file, the conversion function is defined as >>> gcc_unreachable (). I am not very familiar with lto, so I don't really >>> know what is the right way to treat the conversions. >>> >>> And I seriously need help with backend patterns. >> >> On the patch. >> >> The documentation needs review by a native english speaker, but here >> are some factual comments: >> >> +In C vector comparison is supported within standard comparison operators: >> >> it should read 'In GNU C' here and everywhere else as this is a GNU >> extension. >> >>  The result of the >> +comparison is a signed integer-type vector where the size of each >> +element must be the same as the size of compared vectors element. >> >> The result type of the comparison is determined by the C frontend, >> it isn't under control of the user.  What you are implying here is >> restrictions on vector assignments, which are documented elsewhere. >> I'd just say >> >> 'The result of the comparison is a vector of the same width and number >> of elements as the comparison operands with a signed integral element >> type.' >> >> +In addition to the vector comparison C supports conditional expressions >> >> See above. >> >> +For the convenience condition in the vector conditional can be just a >> +vector of signed integer type. >> >> 'of integer type.'  I don't see a reason to disallow unsigned integers, >> they can be equally well compared against zero. > > I'll have a final go on the documentation, it is untouched from the old patches. > >> Index: gcc/targhooks.h >> =================================================================== >> --- gcc/targhooks.h     (revision 177665) >> +++ gcc/targhooks.h     (working copy) >> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization >>  extern tree default_builtin_reciprocal (unsigned int, bool, bool); >> >>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool); >> + >>  extern bool >>  default_builtin_support_vector_misalignment (enum machine_mode mode, >>                                             const_tree, >> >> spurious whitespace change. > > Yes, thanks. > >> Index: gcc/optabs.c >> =================================================================== >> --- gcc/optabs.c        (revision 177665) >> +++ gcc/optabs.c        (working copy) >> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type >> ... >> +  else >> +    { >> +      rtx rtx_op0; >> +      rtx vec; >> + >> +      rtx_op0 = expand_normal (op0); >> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); >> +      vec = CONST0_RTX (mode); >> + >> +      create_output_operand (&ops[0], target, mode); >> +      create_input_operand (&ops[1], rtx_op1, mode); >> +      create_input_operand (&ops[2], rtx_op2, mode); >> +      create_input_operand (&ops[3], comparison, mode); >> +      create_input_operand (&ops[4], rtx_op0, mode); >> +      create_input_operand (&ops[5], vec, mode); >> >> this still builds the fake(?) != comparison, but as you said you need help >> with the .md part if we want to use a machine specific pattern for this >> case (which we eventually want, for the sake of using XOP vcond). > > Yes, I am waiting for it. This is the only way at the moment to make > sure that in > m = a > b; > r = m ? c : d; > > m in the vcond is not transformed into the m != 0. > >> Index: gcc/target.h >> =================================================================== >> --- gcc/target.h        (revision 177665) >> +++ gcc/target.h        (working copy) >> @@ -51,6 +51,7 @@ >>  #define GCC_TARGET_H >> >>  #include "insn-modes.h" >> +#include "gimple.h" >> >>  #ifdef ENABLE_CHECKING >> >> spurious change. > > Old stuff, fixed. > >> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr >>      floating-point, we can only do some of these simplifications.)  */ >>   if (operand_equal_p (arg0, arg1, 0)) >>     { >> +      tree arg0_type = TREE_TYPE (arg0); >> + >>       switch (code) >>        { >>        case EQ_EXPR: >> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0)) >> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) >> +         if (! FLOAT_TYPE_P (arg0_type) >> +             || ! HONOR_NANS (TYPE_MODE (arg0_type))) >> ... > > Ok. > >> >> Likewise. >> >> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ >>     case UNGE_EXPR: >>     case UNEQ_EXPR: >>     case LTGT_EXPR: >> +      if (TREE_CODE (ops->type) == VECTOR_TYPE) >> +       { >> +         enum tree_code code = ops->code; >> +         tree arg0 = ops->op0; >> +         tree arg1 = ops->op1; >> >> move this code to do_store_flag (we really store a flag value).  It should >> also simply do what expand_vec_cond_expr does, probably simply >> calling that with the {-1,...} {0,...} extra args should work. > > I started to do that, but the code in do_store_flag is completely > different from what I am doing, and it looks confusing. I just call > expand_vec_cond_expr and that is it. I can write a separate function, > but the code is quite small. Hm? I see in your patch that's not exactly "calling expand_vec_cond_expr". >> >> As for the still required conversions, you should be able to delay those >> from the C frontend (and here) to expand_vec_cond_expr by, after >> expanding op1 and op2, wrapping a subreg around it with a proper mode >> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work), >> and then convert the result back to the original mode. >> >> I'll leave the C frontend pieces of the patch for review by Joseph, but > > Conversions are there until we fix the backend. When backend will be > able to digest f0 > f1 ? int0 : int1, all the conversions will go > away. > >> +static tree >> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2) >> >> is missing a function comment. > > fixed. > >> +static tree >> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b, >> +         tree bitpos, tree bitsize, enum tree_code code) >> +{ >> +  tree cond; >> +  tree comp_type; >> + >> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos); >> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos); >> + >> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0); >> + >> >> Use >> >>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0); >> >> instead.  But I think you don't want to use TYPE_PRECISION on >> FP types.  Instead you want a signed integer type of the same (mode) >> size as the vector element type, thus >> >>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE >> (TYPE_MODE (inner_type)), 0); > > Hm, I thought that at this stage we don't wan to know anything about > modes. I mean here I am really building the same integer type as the > operands of the comparison have. But I can use MODE_BITSIZE as well, I > don't think that it could happen that the size of the mode is > different from the size of the type. Or could it? The comparison could be on floating-point types where TYPE_PRECISION can be, for example, 80 for x87 doubles. You want an integer type of the same width, so yes, GET_MODE_BITSIZE is the correct thing to use here. >> +  cond = gimplify_build2 (gsi, code, comp_type, a, b); >> >> the result type of a comparison is boolean_type_node, not comp_type. >> >> +  cond = gimplify_build2 (gsi, code, comp_type, a, b); >> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond, >> +                    build_int_cst (comp_type, -1), >> +                    build_int_cst (comp_type, 0)); >> >> writing this as >> >> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, >>                     fold_build2 (code, boolean_type_node, a, b), >> +                    build_int_cst (comp_type, -1), >> +                    build_int_cst (comp_type, 0)); >> >> will get the gimplifier a better chance at simplifcation. >> >> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type))) >> >> I think we are expecting the scalar type and the vector mode here >> from looking at the single existing caller.  It probably doesn't make >> a difference (we only check TYPE_UNSIGNED of it, which should >> also work for vector types), but let's be consistent.  Thus, > > Ok. > >>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type))) >> >> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type))) >> +    t = expand_vector_piecewise (gsi, do_compare, type, >> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code); >> +  else >> +    t = gimplify_build2  (gsi, code, type, op0, op1); >> >> the else case looks odd.  Why re-build a stmt that already exists? >> Simply return NULL_TREE instead? > > I can adjust. The reason it is written that way is that > expand_vector_operations_1 is using the result of the function to > update rhs. Ok, so it should check whether there was any lowering done then. >> +static tree >> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp) >> +{ >> ... >> +      /* Expand vector condition inside of VEC_COND_EXPR.  */ >> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond), >> +                                   TYPE_MODE (TREE_TYPE (cond)))) >> +       { >> ... >> +         new_rhs = expand_vector_piecewise (gsi, do_compare, >> +                                            TREE_TYPE (cond), >> +                                            TREE_TYPE (TREE_TYPE (op1)), >> +                                            op0, op1, TREE_CODE (cond)); >> >> I'm not sure it is beneficial to expand a < b ? v0 : v1 to >> >> tem = { a[0] < b[0] ? -1 : 0, ... } >> v0 & tem | v1 & ~tem; >> >> instead of >> >> { a[0] < b[0] ? v0[0] : v1[0], ... } >> >> even if the bitwise operations could be carried out using vectors. >> It's definitely beneficial to do the first if the CPU can create the >> bitmask. >> > > o_O > > I thought you always wanted to do (m & v0) | (~m & v1). > Do you want to have two cases of the expansion then -- when we have > mask available and when we don't? But it is really unlikely that we > can get the mask, but cannot get vcond. Because condition is actually > vcond. So once again -- do we always expand to {a[0] > b[0]  ? v[0] : > c[0], ...}? Hm, yeah. I suppose with the current setup it's hard to only get the mask but not the full vcond ;) So it probably makes sense to always expand to {a[0] > b[0] ? v[0] :c[0],...} as fallback. Sorry for the confusion ;) >> +  /* Run vecower on the expresisons we have introduced.  */ >> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp)) >> +    expand_vector_operations_1 (&gsi_tmp); >> >> do not use gsi.ptr directly, use gsi_stmt (gsi_tm) != gsi_stmt (gsi) >> >> +static bool >> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr) >> +{ >> >> This function is lacking a comment. >> >> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_ >> ... >> +      /* Try to get rid from the useless vector comparison >> +        x != {0,0,...} which is inserted by the typechecker.  */ >> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR) >> >> how and why?  You simply drop that comparison - that doesn't look >> correct.  And in fact TREE_OPERAND (cond, 0) will never be a >> comparison - that wouldn't be valid gimple.  Please leave this >> optimization to SSA based forward propagation (I can help you here >> once the patch is in). > > No-no-no. This is the second part of avoiding > m = a > b; > r = m ? v0 : v1; > > to prevent m expansion to m != {0}. > > I do not _simply_ drop the comparison. I drop it only if > is_vector_comparison returned true. It means that we can never get > into the situation that we are dropping actually a comparison inserted > by the user. But what I really want to achieve here is to drop the > comparison that the frontend inserts every time when it sees an > expression there. > > As I said earlier, tree forward propagation kicks only using -On, and > I would really like to make sure that I can get rid of useless != {0} > at any level. Please don't. If the language extension forces a != 0 then it should appear at -O0. The code is fishy anyway in the way it walks stmts in is_vector_comparison. At least I don't like to see this optimization done here for the sake of -O0 in this initial patch - you could try arguing about it as a followup improvement (well, probably with not much luck). -O0 is about compile-speed and debugging, doing data-flow by walking stmts backward is slow. >> >> +      if (expand_vec_cond_expr_p (TREE_TYPE (exp), >> +                                  TYPE_MODE (TREE_TYPE (exp)))) >> +        { >> +         update_stmt (gsi_stmt (*gsi)); >> +         return; >> >> no need to update the stmt when you do nothing. >> >> +      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp); >> +      gimple_assign_set_rhs_from_tree (gsi, new_rhs); >> +      update_stmt (gsi_stmt (*gsi)); >> +    } >> >> missing return;, just for clarity that you are done here. > > Ok. > >> You don't do anything for comparisons here, in case they are split >> away from the VEC_COND_EXPR by the gimplifier.  But if the >> target doesn't support VEC_COND_EXPRs we have to lower them. >> I suggest checking your testcases on i?86-linux (or with -m32 -march=i486). >> > > expand_vector_operations_1 take care about any vector comparison, > considering it as a binary operation. See expand_vector_operation and > do_compare for more details. Ah, ok, I missed that piece. >> -TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h >> +TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h >> >> huh, no please ;)  I suppose that's no longer necessary anyway now. >> > > Yeah, fixed. :) > >> I'll leave the i386.c pieces to the x86 target maintainers to review. >> They probably will change once the .md file changes are sorted out. > > If they ever going to be sorted out... Well, we can move the conversion stuff to the point of expansion using convert_move. That'll keep the middle-end and the C frontend clean and move the "hack" towards the backends. Richard. >> Thanks, >> Richard. >> > > > Thanks, > Artem. > Index: gcc/expr.c =================================================================== --- gcc/expr.c (revision 177665) +++ gcc/expr.c (working copy) @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ case UNGE_EXPR: case UNEQ_EXPR: case LTGT_EXPR: + if (TREE_CODE (ops->type) == VECTOR_TYPE) + { + enum tree_code code = ops->code; + tree arg0 = ops->op0; + tree arg1 = ops->op1; + tree arg_type = TREE_TYPE (arg0); + tree el_type = TREE_TYPE (arg_type); + tree t, ifexp, if_true, if_false; + + el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0); + + + ifexp = build2 (code, type, arg0, arg1); + if_true = build_vector_from_val (type, build_int_cst (el_type, -1)); + if_false = build_vector_from_val (type, build_int_cst (el_type, 0)); + + if (arg_type != type) + { + if_true = convert (arg_type, if_true); + if_false = convert (arg_type, if_false); + t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false); + t = convert (type, t); + } + else + t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false); + + return expand_expr (t, + modifier != EXPAND_STACK_PARM ? target : NULL_RTX, + tmode != VOIDmode ? tmode : mode, + modifier); + }