Simple optimization for MASK_STORE.
diff mbox

Message ID CAFiYyc3Q+y7+FOypZNeXEM6=uwEcNpmZLk1-BVDGLcjY4F4+EA@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Nov. 12, 2015, 1:58 p.m. UTC
On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> What we should do to cope with this problem (structure size increasing)?
> Should we return to vector comparison version?

Ok, given this constraint I think the cleanest approach is to allow
integer(!) vector equality(!) compares with scalar result.  This should then
expand via cmp_optab and not via vec_cmp_optab.

On gimple you can then have

 if (mask_vec_1 != {0, 0, .... })
...

Note that a fallback expansion (for optabs.c to try) would be
the suggested view-conversion (aka, subreg) variant using
a same-sized integer mode.

Target maintainers can then choose what is a better fit for
their target (and instruction set as register set constraints may apply).

The patch you posted seems to do this but not restrict the compares
to integer ones (please do that).

       if (TREE_CODE (op0_type) == VECTOR_TYPE
          || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+         /* Allow vector comparison returning boolean if operand types
+            are equal and CODE is EQ/NE.  */
+         if ((code != EQ_EXPR && code != NE_EXPR)
+             || TREE_CODE (op0_type) != TREE_CODE (op1_type)
+             || TYPE_VECTOR_SUBPARTS (op0_type)
+                != TYPE_VECTOR_SUBPARTS (op1_type)
+             || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type)))
+                != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type))))

These are all checked with the useless_type_conversion_p checks done earlier.

As said I'd like to see a

                || ! VECTOR_INTEGER_TYPE_P (op0_type)

check added so we and targets do not need to worry about using EQ/NE vs. CMP
and worry about signed zeros and friends.

+           {
+             error ("type mismatch for vector comparison returning a boolean");
+             debug_generic_expr (op0_type);
+             debug_generic_expr (op1_type);
+             return true;
+           }



does not expect this (valid) GENERIC tree.

+  if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
+    return;

not sure if I like a param more than a target hook ... :/

+      /* Create vector comparison with boolean result.  */
+      vectype = TREE_TYPE (mask);
+      zero = build_zero_cst (TREE_TYPE (vectype));
+      zero = build_vector_from_val (vectype, zero);

build_zero_cst (vectype);

+      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);

you can omit the NULL_TREE operands.

+      gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);

please omit the assert.

+      gimple_set_vdef (last, new_vdef);

do this before you create the PHI.

+         /* Put definition statement of stored value in STORE_BB
+            if possible.  */
+         arg3 = gimple_call_arg (last, 3);
+         if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
+           {
...

is this really necessary?  It looks incomplete to me anyway.  I'd rather have
a late sink pass if this shows necessary.  Btw,...

+                it is legal.  */
+             if (gimple_bb (def_stmt) == bb
+                 && is_valid_sink (def_stmt, last_store))

with the implementation of is_valid_sink this is effectively

   && (!gimple_vuse (def_stmt)
          || gimple_vuse (def_stmt) == gimple_vdef (last_store))


I still think this "pass" is quite a hack, esp. as it appears as generic
code in a GIMPLE pass.  And esp. as this hack seems to be needed
for Haswell only, not Boradwell or Skylake.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2015-11-11 12:18 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2015-11-10 17:46 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>> 2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I tried it but 256-bit precision integer type is not yet supported.
>>>>>>
>>>>>> What's the symptom?  The compare cannot be expanded?  Just add a pattern then.
>>>>>> After all we have modes up to XImode.
>>>>>
>>>>> I suppose problem may be in:
>>>>>
>>>>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)
>>>>>
>>>>> which doesn't allow to create constants of bigger size.  Changing it
>>>>> to maximum vector size (512) would mean we increase wide_int structure
>>>>> size significantly. New patterns are probably also needed.
>>>>
>>>> Yes, new patterns are needed but wide-int should be fine (we only need to create
>>>> a literal zero AFACS).  The "new pattern" would be equality/inequality
>>>> against zero
>>>> compares only.
>>>
>>> Currently 256bit integer creation fails because wide_int for max and
>>> min values cannot be created.
>>
>> Hmm, indeed:
>>
>> #1  0x000000000072dab5 in wi::extended_tree<192>::extended_tree (
>>     this=0x7fffffffd950, t=0x7ffff6a000b0)
>>     at /space/rguenther/src/svn/trunk/gcc/tree.h:5125
>> 5125      gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>>
>> but that's not that the constants fail to be created but
>>
>> #5  0x00000000010d8828 in build_nonstandard_integer_type (precision=512,
>>     unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051
>> 8051      if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
>> (gdb) l
>> 8046        fixup_unsigned_type (itype);
>> 8047      else
>> 8048        fixup_signed_type (itype);
>> 8049
>> 8050      ret = itype;
>> 8051      if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
>> 8052        ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE
>> (itype)), itype);
>>
>> thus the integer type hashing being "interesting".  tree_fits_uhwi_p
>> fails because
>> it does
>>
>> 7289    bool
>> 7290    tree_fits_uhwi_p (const_tree t)
>> 7291    {
>> 7292      return (t != NULL_TREE
>> 7293              && TREE_CODE (t) == INTEGER_CST
>> 7294              && wi::fits_uhwi_p (wi::to_widest (t)));
>> 7295    }
>>
>> and wi::to_widest () fails with doing
>>
>> 5121    template <int N>
>> 5122    inline wi::extended_tree <N>::extended_tree (const_tree t)
>> 5123      : m_t (t)
>> 5124    {
>> 5125      gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>> 5126    }
>>
>> fixing the hashing then runs into type_cache_hasher::equal doing
>> tree_int_cst_equal
>> which again uses to_widest (it should be easier and cheaper to do the compare on
>> the actual tree representation, but well, seems to be just the first
>> of various issues
>> we'd run into).
>>
>> We eventually could fix the assert above (but then need to hope we assert
>> when a computation overflows the narrower precision of widest_int) or use
>> a special really_widest_int (ugh).
>>
>>> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases
>>> WIDE_INT_MAX_ELTS
>>> and thus increases wide_int structure. If we use 512 for
>>> MAX_BITSIZE_MODE_ANY_INT then
>>> wide_int structure would grow by 48 bytes (16 bytes if use 256 for
>>> MAX_BITSIZE_MODE_ANY_INT).
>>> Is it OK for such narrow usage?
>>
>> widest_int is used in some long-living structures (which is the reason for
>> MAX_BITSIZE_MODE_ANY_INT in the first place).  So I don't think so.
>>
>> Richard.
>>
>>> Ilya
>>>
>>>>
>>>> Richard.
>>>>
>>>>> Ilya
>>>>>
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Yuri.
>>>>>>>
>>>>>>>

Comments

Yuri Rumyantsev Nov. 19, 2015, 3:20 p.m. UTC | #1
Hi Richard,

I send you updated version of patch which contains fixes you mentioned
and additional early exit in
register_edge_assert_for() for gcond with vector comparison - it tries
to produce assert for
  if (vect != {0,0,0,0}) but can't create such constant. This is not
essential since this is applied to very specialized context.

My answers are below.

2015-11-12 16:58 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> What we should do to cope with this problem (structure size increasing)?
>> Should we return to vector comparison version?
>
> Ok, given this constraint I think the cleanest approach is to allow
> integer(!) vector equality(!) compares with scalar result.  This should then
> expand via cmp_optab and not via vec_cmp_optab.

  In fact it is expanded through cbranch_optab since the only use of
such comparison is for masked
store motion
>
> On gimple you can then have
>
>  if (mask_vec_1 != {0, 0, .... })
> ...
>
> Note that a fallback expansion (for optabs.c to try) would be
> the suggested view-conversion (aka, subreg) variant using
> a same-sized integer mode.
>
> Target maintainers can then choose what is a better fit for
> their target (and instruction set as register set constraints may apply).
>
> The patch you posted seems to do this but not restrict the compares
> to integer ones (please do that).
>
>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>          {
> -          error ("vector comparison returning a boolean");
> -          debug_generic_expr (op0_type);
> -          debug_generic_expr (op1_type);
> -          return true;
> +         /* Allow vector comparison returning boolean if operand types
> +            are equal and CODE is EQ/NE.  */
> +         if ((code != EQ_EXPR && code != NE_EXPR)
> +             || TREE_CODE (op0_type) != TREE_CODE (op1_type)
> +             || TYPE_VECTOR_SUBPARTS (op0_type)
> +                != TYPE_VECTOR_SUBPARTS (op1_type)
> +             || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type)))
> +                != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type))))
>
> These are all checked with the useless_type_conversion_p checks done earlier.
>
> As said I'd like to see a
>
>                 || ! VECTOR_INTEGER_TYPE_P (op0_type)

  I added check on VECTOR_BOOLEAN_TYPE_P (op0_type) instead since type
of mask was changed.
>
> check added so we and targets do not need to worry about using EQ/NE vs. CMP
> and worry about signed zeros and friends.
>
> +           {
> +             error ("type mismatch for vector comparison returning a boolean");
> +             debug_generic_expr (op0_type);
> +             debug_generic_expr (op1_type);
> +             return true;
> +           }
>
>
>
> --- a/gcc/tree-ssa-forwprop.c
> +++ b/gcc/tree-ssa-forwprop.c
> @@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt,
>           enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
>           bool invariant_only_p = !single_use0_p;
>
> +         /* Can't combine vector comparison with scalar boolean type of
> +            the result and VEC_COND_EXPR having vector type of comparison.  */
> +         if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
> +             && INTEGRAL_TYPE_P (type)
> +             && (TREE_CODE (type) == BOOLEAN_TYPE
> +                 || TYPE_PRECISION (type) == 1)
> +             && def_code == VEC_COND_EXPR)
> +           return NULL_TREE;
>
> this hints at larger fallout you paper over here.  So this effectively
> means we're trying combining (vec1 != vec2) != 0 for example
> and that fails miserably?  If so then the solution is to fix whatever
> does not expect this (valid) GENERIC tree.

  I changed it to the following check in combine_cond_expr_cond:
  /* Do not perform combining it types are not compatible.  */
  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
    return NULL_TREE;
>
> +  if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
> +    return;
>
> not sure if I like a param more than a target hook ... :/
  I introduced the param instead of a target hook to make this
transformation user visible, i.e. to switch it on/off
for different targets.
>
> +      /* Create vector comparison with boolean result.  */
> +      vectype = TREE_TYPE (mask);
> +      zero = build_zero_cst (TREE_TYPE (vectype));
> +      zero = build_vector_from_val (vectype, zero);
>
> build_zero_cst (vectype);

Done.
>
> +      stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);
>
> you can omit the NULL_TREE operands.

  I did not find such definition for it.
>
> +      gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);
>
> please omit the assert.

  Done.
>
> +      gimple_set_vdef (last, new_vdef);
>
> do this before you create the PHI.
>
  Done.
> +         /* Put definition statement of stored value in STORE_BB
> +            if possible.  */
> +         arg3 = gimple_call_arg (last, 3);
> +         if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
> +           {
> ...
>
> is this really necessary?  It looks incomplete to me anyway.  I'd rather have
> a late sink pass if this shows necessary.  Btw,..

 I tried to avoid creation of multiple ne basic blocks for the same
mask and also I don't want to put all semi-hammock guarded by this
mask to separate block to keep it small enough since x86 chips prefer
short branches. Note also that icc does almost the same.
>
> +                it is legal.  */
> +             if (gimple_bb (def_stmt) == bb
> +                 && is_valid_sink (def_stmt, last_store))
>
> with the implementation of is_valid_sink this is effectively
>
>    && (!gimple_vuse (def_stmt)
>           || gimple_vuse (def_stmt) == gimple_vdef (last_store))
I did inlining of correspondent part of "is_valif_sink" to this place.
>
> I still think this "pass" is quite a hack, esp. as it appears as generic
> code in a GIMPLE pass.  And esp. as this hack seems to be needed
> for Haswell only, not Boradwell or Skylake.

This is not truth since for all them this transformation is performed
for skylake and broadwell since both them
belong to HASWELL family.

>
> Thanks,
> Richard.
>

ChangeLog:
2015-11-19  Yuri Rumyantsev  <ysrumyan@gmail.com>

* config/i386/i386.c: Add conditional initialization of
PARAM_ZERO_TEST_FOR_MASK_STORE.
(ix86_expand_branch): Implement vector comparison with boolean result.
* config/i386/i386.h: New macros TARGET_OPTIMIZE_MASK_STORE.
* config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
for vector comparion with eq/ne only.
* config/i386/x86-tune.def: New macros X86_TUNE_OPTIMIZE_MASK_STORE.
* fold-const.c (fold_relational_const): Add handling of vector
comparison with boolean result.
* params.def (PARAM_ZERO_TEST_FOR_MASK_STORE): New DEFPARAM.
* params.h (ENABLE_ZERO_TEST_FOR_MASK_STORE): New macros.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
combining for non-compatible vector types.
* tree-vect-loop.c (is_valid_sink): New function.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
vectorized loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
correspondent macros.
(optimize_mask_stores): Add prototype.
* tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
type.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
* gcc.target/i386/avx2-vect-mask-store-move2.c: Likewise.

Patch
diff mbox

--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -422,6 +422,15 @@  forward_propagate_into_comparison_1 (gimple *stmt,
          enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
          bool invariant_only_p = !single_use0_p;

+         /* Can't combine vector comparison with scalar boolean type of
+            the result and VEC_COND_EXPR having vector type of comparison.  */
+         if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+             && INTEGRAL_TYPE_P (type)
+             && (TREE_CODE (type) == BOOLEAN_TYPE
+                 || TYPE_PRECISION (type) == 1)
+             && def_code == VEC_COND_EXPR)
+           return NULL_TREE;

this hints at larger fallout you paper over here.  So this effectively
means we're trying combining (vec1 != vec2) != 0 for example
and that fails miserably?  If so then the solution is to fix whatever