From patchwork Tue Sep 6 15:36:04 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Biener X-Patchwork-Id: 113579 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 41C76B6F70 for ; Wed, 7 Sep 2011 01:36:32 +1000 (EST) Received: (qmail 22784 invoked by alias); 6 Sep 2011 15:36:27 -0000 Received: (qmail 22542 invoked by uid 22791); 6 Sep 2011 15:36:24 -0000 X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 tests=AWL, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, TW_TM X-Spam-Check-By: sourceware.org Received: from mail-yi0-f47.google.com (HELO mail-yi0-f47.google.com) (209.85.218.47) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 06 Sep 2011 15:36:05 +0000 Received: by yia28 with SMTP id 28so4066075yia.20 for ; Tue, 06 Sep 2011 08:36:04 -0700 (PDT) MIME-Version: 1.0 Received: by 10.150.243.9 with SMTP id q9mr4116060ybh.52.1315323364210; Tue, 06 Sep 2011 08:36:04 -0700 (PDT) Received: by 10.151.103.7 with HTTP; Tue, 6 Sep 2011 08:36:04 -0700 (PDT) In-Reply-To: References: <4E5DF23A.1080309@free.fr> Date: Tue, 6 Sep 2011 17:36:04 +0200 Message-ID: Subject: Re: Vector shuffling From: Richard Guenther To: Artem Shinkarov Cc: "Joseph S. Myers" , Duncan Sands , gcc-patches@gcc.gnu.org, Jan Hubicka X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On Sat, Sep 3, 2011 at 5:52 PM, Artem Shinkarov wrote: > On Fri, Sep 2, 2011 at 8:52 PM, Joseph S. Myers wrote: >> On Fri, 2 Sep 2011, Artem Shinkarov wrote: >> >>> Joseph, I don't understand this comment. I have 2 or 3 arguments in >>> the VEC_SHUFFLE_EXPR and any of them can be C_MAYBE_CONST_EXPR, >> >> Yes. >> >>> so I >>> need to wrap mask (the last argument) to avoid the following failure: >> >> No.  You need to fold it (c_fully_fold) to eliminate any >> C_MAYBE_CONST_EXPR it contains, but you shouldn't need to wrap the result >> of folding in a SAVE_EXPR. > > Ok, Now I get it, thanks. > > In the attachment there is a new version of the patch that removes save-exprs. You are missing a ChangeLog entry + +res = __builtin_shuffle (a, mask1); /* res is @{1,2,2,4@} */ +res = __builtin_shuffle2 (a, b, mask2); /* res is @{1,5,3,6@} */ should be __builtin_shuffle (a, b, mask2); +bool +expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, + tree v1, tree mask) +{ + int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0)))); + int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask)))); + + if (TREE_CODE (mask) == VECTOR_CST + && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask)) + return true; + + if (v0 != v1 || v0_mode_s != mask_mode_s) + return false; the mask size check constrains the size of the vector elements but not their count. It looks like you should instead verify the vector modes are equal? At least that would match the fact that you have an expander that distinguishes one operand mode only. At some point we definitely want to merge the builtin_vec_perm_* target hooks with the optab. + fn = copy_node (fn); You shouldn't need to copy fn + call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn); + call = build_call_nary (type /* ? */, call, 3, v0, v1, mask); + + t = expand_normal (call); + target = gen_reg_rtx (mode); + emit_insn (gen_rtx_SET (VOIDmode, target, t)); why can't you simply use return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL); here? + case VEC_SHUFFLE_EXPR: + { + enum gimplify_status r0, r1, r2; + + if (TREE_OPERAND (*expr_p, 0) == TREE_OPERAND (*expr_p, 1)) + { + r0 = r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, + post_p, is_gimple_val, fb_rvalue); + TREE_OPERAND (*expr_p, 1) = TREE_OPERAND (*expr_p, 0); + } + else + { + r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, + post_p, is_gimple_val, fb_rvalue); + r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p, + post_p, is_gimple_val, fb_rvalue); + } please avoid the above tree sharing (I realize it's probably for constants, but we don't share those). Thus, unconditionally gimplify both operands. The equality check in expanding should probably use operand_equal_p (..., ..., 0) instead of a pointer comparison. Thus, the above should simply use the expr_3: code, like FMA_EXPR. +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR + means + + freach i in length (mask): foreach + number of elements in V0 and V1. The size of the inner type + of the MASK and of the V0 and V1 must be the same. I leave the C frontend and x86 backend changes to the respective maintainers. Honza, can you have a look at the x86 changes? Thanks, Richard. > > Artem. > Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c (revision 178354) +++ gcc/tree-cfg.c (working copy) @@ -3713,6 +3713,7 @@ verify_gimple_assign_ternary (gimple stm case DOT_PROD_EXPR: case REALIGN_LOAD_EXPR: + case VEC_SHUFFLE_EXPR: /* FIXME. */ return false; can you do some basic verification here? At least what you document in tree.def should be verified here. Index: gcc/tree-vect-generic.c =================================================================== --- gcc/tree-vect-generic.c (revision 178354) +++ gcc/tree-vect-generic.c (working copy) @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3. #include "tree-pass.h" #include "flags.h" #include "ggc.h" +#include "diagnostic.h" I don't see where you need this, nor a Makefile.in change. +/* Build a reference to the element of the vector VECT. Function two spaces after a '.' + if (TREE_CODE (vect) == VECTOR_CST) + { + unsigned i; + tree vals = TREE_VECTOR_CST_ELTS (vect); + for (i = 0; vals; vals = TREE_CHAIN (vals), ++i) + if (i == index) operand_equal_p + else if (TREE_CODE (vect) == SSA_NAME) + { + tree el; + gimple vectdef = SSA_NAME_DEF_STMT (vect); + if (gimple_assign_single_p (vectdef) + && (el = vector_element (gsi, gimple_assign_rhs1 (vectdef), + idx, ptmpvec)) + != error_mark_node) + return el; I think it's a premature optimization to look up the defining statement here. + if (need_asgn) + { + TREE_ADDRESSABLE (tmpvec) = 1; this should be at the point we call create_tmp_var. + asgn = gimple_build_assign (tmpvec, vect); and I think this needs gimplification for the case of a non-constant non-SSA name vect. But maybe that never happens (consider (v4si){a, b, c, d}[i]). + indextype = build_index_type (size_int (maxval)); + arraytype = build_array_type (TREE_TYPE (type), indextype); there is now build_array_type_nelts conveniently available. + idx, NULL_TREE, NULL_TREE); + + +} excess vertical space. +static void +lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc) +{ +#define TRAP_RETURN(new_stmt, stmt, gsi, vec0) \ +do { \ + new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0); \ + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); \ + split_block (gimple_bb (new_stmt), new_stmt); \ + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), vec0); \ + gsi_replace (gsi, new_stmt, false); \ + return; \ +} while (0) I don't like such defines - this instead asks for factoring the lowering to a function that returns a failure state and the caller doing the TRAP_RETURN in case of failure. + } + + excess vertical space (please double-check your patches for these simple stylistic issues). + if (vec0 == vec1) + { operand_equal_p + if (idxval == error_mark_node) + { + warning_at (loc, 0, "Invalid shuffling mask index %i", i); + TRAP_RETURN (new_stmt, stmt, gsi, vec0); ah, here is the diagnostic. I think you should change that to if (warning_at (loc, 0, ...)) inform (loc, "if this code is reached, the program will abort"); as we say elsewhere when inserting traps. + if (code == VEC_SHUFFLE_EXPR) + { + lower_vec_shuffle (gsi, gimple_location (stmt)); + gimple_set_modified (gsi_stmt (*gsi), true); + update_stmt (gsi_stmt (*gsi)); I don't think you need the gimple_set_modified call. /* Use this to lower vector operations introduced by the vectorizer, if it may need the bit-twiddling tricks implemented in this file. */ + spurious white-space change. static bool -gate_expand_vector_operations (void) +gate_expand_vector_operations_noop (void) { - return flag_tree_vectorize != 0; + return optimize == 0; } but I think we already have this change in-tree. Your patch needs updating (we have _ssa, not _noop). Index: gcc/passes.c =================================================================== --- gcc/passes.c (revision 178354) +++ gcc/passes.c (working copy) @@ -1354,7 +1354,6 @@ init_optimization_passes (void) NEXT_PASS (pass_vectorize); { struct opt_pass **p = &pass_vectorize.pass.sub; - NEXT_PASS (pass_lower_vector_ssa); NEXT_PASS (pass_dce_loop); } NEXT_PASS (pass_predcom); @@ -1366,6 +1365,7 @@ init_optimization_passes (void) NEXT_PASS (pass_lim); NEXT_PASS (pass_tree_loop_done); } + NEXT_PASS (pass_lower_vector_ssa); This change should not be neccesary.