
Vector shuffling

Message ID CAFiYyc27tS48_JHfY2YwaCS8Sst5yskz4zGfk8jRdqKqYc5_QQ@mail.gmail.com

Commit Message

Richard Biener Sept. 6, 2011, 3:36 p.m. UTC
On Sat, Sep 3, 2011 at 5:52 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Fri, Sep 2, 2011 at 8:52 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
>> On Fri, 2 Sep 2011, Artem Shinkarov wrote:
>>
>>> Joseph, I don't understand this comment. I have 2 or 3 arguments in
>>> the VEC_SHUFFLE_EXPR and any of them can be C_MAYBE_CONST_EXPR,
>>
>> Yes.
>>
>>> so I
>>> need to wrap mask (the last argument) to avoid the following failure:
>>
>> No.  You need to fold it (c_fully_fold) to eliminate any
>> C_MAYBE_CONST_EXPR it contains, but you shouldn't need to wrap the result
>> of folding in a SAVE_EXPR.
>
> Ok, Now I get it, thanks.
>
> In the attachment there is a new version of the patch that removes save-exprs.

You are missing a ChangeLog entry.

+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle2 (a, b, mask2);   /* res is @{1,5,3,6@}  */

should be __builtin_shuffle (a, b, mask2);

+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0,
+                          tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+
+  if (v0 != v1 || v0_mode_s != mask_mode_s)
+    return false;

the mask size check constrains the size of the vector elements but not
their count.  It looks like you should instead verify the vector modes
are equal?  At least that would match the fact that you have an expander
that distinguishes one operand mode only.

At some point we definitely want to merge the builtin_vec_perm_*
target hooks with the optab.

+      fn = copy_node (fn);

You shouldn't need to copy fn.

+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
+
+      t = expand_normal (call);
+      target = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (VOIDmode, target, t));

why can't you simply use

  return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL);

here?

+       case VEC_SHUFFLE_EXPR:
+         {
+           enum gimplify_status r0, r1, r2;
+
+           if (TREE_OPERAND (*expr_p, 0) == TREE_OPERAND (*expr_p, 1))
+             {
+               r0 = r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+                                        post_p, is_gimple_val, fb_rvalue);
+               TREE_OPERAND (*expr_p, 1) = TREE_OPERAND (*expr_p, 0);
+             }
+           else
+             {
+                r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+                                    post_p, is_gimple_val, fb_rvalue);
+                r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+                                    post_p, is_gimple_val, fb_rvalue);
+             }

please avoid the above tree sharing (I realize it's probably for constants,
but we don't share those).  Thus, unconditionally gimplify both
operands.  The equality check in expanding should probably use
operand_equal_p (..., ..., 0) instead of a pointer comparison.

Thus, the above should simply use the expr_3: code, like FMA_EXPR.

+/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
+   means
+
+   freach i in length (mask):

foreach

+   number of elements in V0 and V1. The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.

I leave the C frontend and x86 backend changes to the respective
maintainers.  Honza, can you have a look at the x86 changes?

Thanks,
Richard.

>
> Artem.
>

Patch

Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c      (revision 178354)
+++ gcc/tree-cfg.c      (working copy)
@@ -3713,6 +3713,7 @@  verify_gimple_assign_ternary (gimple stm

     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
+    case VEC_SHUFFLE_EXPR:
       /* FIXME.  */
       return false;

can you do some basic verification here?  At least what you document
in tree.def should be verified.

Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c     (revision 178354)
+++ gcc/tree-vect-generic.c     (working copy)
@@ -30,6 +30,7 @@  along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"

I don't see where you need this, nor a Makefile.in change.

+/* Build a reference to the element of the vector VECT. Function

two spaces after a '.'

+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+            unsigned i;
+            tree vals = TREE_VECTOR_CST_ELTS (vect);
+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+              if (i == index)

operand_equal_p

+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+          tree el;
+          gimple vectdef = SSA_NAME_DEF_STMT (vect);
+          if (gimple_assign_single_p (vectdef)
+              && (el = vector_element (gsi, gimple_assign_rhs1 (vectdef),
+                                       idx, ptmpvec))
+                 != error_mark_node)
+            return el;

I think it's a premature optimization to look up the defining statement here.

+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;

this should be at the point we call create_tmp_var.

+      asgn = gimple_build_assign (tmpvec, vect);

and I think this needs gimplification for the case of a non-constant
non-SSA name vect.  But maybe that never happens
(consider (v4si){a, b, c, d}[i]).

+  indextype = build_index_type (size_int (maxval));
+  arraytype = build_array_type (TREE_TYPE (type), indextype);

there is now build_array_type_nelts conveniently available.

+                 idx, NULL_TREE, NULL_TREE);
+
+
+}

excess vertical space.

+static void
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+#define TRAP_RETURN(new_stmt, stmt, gsi, vec0) \
+do { \
+  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0); \
+  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT); \
+  split_block (gimple_bb (new_stmt), new_stmt); \
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), vec0); \
+  gsi_replace (gsi, new_stmt, false); \
+  return; \
+} while (0)

I don't like such defines - this instead asks for factoring the lowering
to a function that returns a failure state and the caller doing the
TRAP_RETURN in case of failure.

+    }
+
+

excess vertical space (please double-check your patches for these simple
stylistic issues).

+  if (vec0 == vec1)
+    {

operand_equal_p

+          if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);

ah, here is the diagnostic.  I think you should change that to

  if (warning_at (loc, 0, ...))
    inform (loc, "if this code is reached, the program will abort");

as we say elsewhere when inserting traps.

+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      lower_vec_shuffle (gsi, gimple_location (stmt));
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));

I don't think you need the gimple_set_modified call.

 /* Use this to lower vector operations introduced by the vectorizer,
    if it may need the bit-twiddling tricks implemented in this file.  */

+

spurious white-space change.

 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_noop (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }

but I think we already have this change in-tree.  Your patch needs updating
(we have _ssa, not _noop).

Index: gcc/passes.c
===================================================================
--- gcc/passes.c        (revision 178354)
+++ gcc/passes.c        (working copy)
@@ -1354,7 +1354,6 @@  init_optimization_passes (void)
          NEXT_PASS (pass_vectorize);
            {
              struct opt_pass **p = &pass_vectorize.pass.sub;
-             NEXT_PASS (pass_lower_vector_ssa);
              NEXT_PASS (pass_dce_loop);
            }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@  init_optimization_passes (void)
          NEXT_PASS (pass_lim);
          NEXT_PASS (pass_tree_loop_done);
        }
+      NEXT_PASS (pass_lower_vector_ssa);

This change should not be necessary.