[RFC] Try vector<bool> as a new representation for vector masks

On 27 Aug 09:55, Richard Biener wrote:
> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> >
> > Yes, I want to try it. But getting rid of bool patterns would mean
> > support for all targets currently supporting vec_cond. Would it be OK
> > to have vector<bool> mask co-exist with bool patterns for some time?
> 
> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able
> to figure out the correct vector type (or mask type) from the uses.  Currently
> it simply looks at the stmts LHS type but as all stmt operands already have
> vector types it can as well compute the result type from those.  We'd want to
> have a helper function that does this result type computation as I figure it
> will be needed in multiple places.
> 
> This is now on my personal TODO list (but that's already quite long for GCC 6),
> so if you manage to get to that...  see
> tree-vect-loop.c:vect_determine_vectorization_factor
> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
> vector type set from data-ref analysis already - there 'bool' loads
> correctly get
> VNQImode).  There is a basic-block / SLP part as well that would need to use
> the helper function (eventually with some SLP discovery order issue).
> 
> > Thus first step would be to require vector<bool> for MASK_LOAD and
> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
> > MASK_STORE).
> 
> You can certainly try that first, but as soon as you hit complications with
> needing to adjust bool patterns then I'd rather get rid of them.
> 
> >
> > I can directly build a vector type with specified mode to avoid it. Smth. like:
> >
> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
> 
> Hmm, indeed, that might be a (good) solution.  Btw, in this case
> target attribute
> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
> on the active target).  There would also be no way for the user to
> declare vector<bool>
> in source (which is good because of that target attribute issue...).
> 
> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
> and adjusting
> truth_type_for is the way to go.
> 
> I suggest you try modifying those parts first according to this scheme
> that will most
> likely uncover issues we missed.
> 
> Thanks,
> Richard.
> 

I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  There were no major issues (for now).

build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type.  It is always a vector of bools but its mode is determined by a target using number of units and currently used vector length.

As previously I fixed if-conversion to apply boolean masks for loads and stores which automatically disables bool patterns for them and flow goes by a mask path.  Vectorization factor computation is fixed to have a separate computation for mask types.  Comparison is now handled separately by vectorizer and is vectorized into vector comparison.

Optabs for masked loads and stores were transformed into convert optabs.  Now it is checked using both value and mask modes.

Optabs for comparison were added.  These are also convert optabs checking value and result type.

I had to introduce significant number of new patterns in i386 target to support new optabs.  The reason was vector compare was never expanded separately and always was a part of a vec_cond expansion.

As a result it's possible to use the sage GIMPLE representation for both vector and scalar masks target type.  Here is an example I used as a simple test:

  for (i=0; i<N; i++)
  {
    float t = a[i];
    if (t > 0.0f && t < 1.0e+2f)
      if (c[i] != 0)
        c[i] = 1;
  }

Produced vector GIMPLE (before expand):

  vect_t_5.22_105 = MEM[base: _256, offset: 0B];
  mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
  mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
  mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
  vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
  mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
  mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
  MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 });

Produced assembler on AVX-512:

	vmovups	(%rdi), %zmm0
	vcmpps	$25, %zmm5, %zmm0, %k1
	vcmpps	$22, %zmm3, %zmm0, %k1{%k1}
	vmovdqa32	-64(%rdx), %zmm2{%k1}
	vpcmpd	$4, %zmm1, %zmm2, %k1{%k1}
	vmovdqa32	%zmm4, (%rcx){%k1}

Produced assembler on AVX-2:

	vmovups	(%rdx), %xmm1
	vinsertf128	$0x1, -16(%rdx), %ymm1, %ymm1
	vcmpltps	%ymm1, %ymm3, %ymm0
	vcmpltps	%ymm5, %ymm1, %ymm1
	vpand	%ymm0, %ymm1, %ymm0
	vpmaskmovd	-32(%rcx), %ymm0, %ymm1
	vpcmpeqd	%ymm2, %ymm1, %ymm1
	vpcmpeqd	%ymm2, %ymm1, %ymm1
	vpand	%ymm0, %ymm1, %ymm0
	vpmaskmovd	%ymm4, %ymm0, (%rax)

BTW AVX-2 code produced by trunk compiler is 4 insns longer:

	vmovups	(%rdx), %xmm0
	vinsertf128	$0x1, -16(%rdx), %ymm0, %ymm0
	vcmpltps	%ymm0, %ymm6, %ymm1
	vcmpltps	%ymm7, %ymm0, %ymm0
	vpand	%ymm1, %ymm5, %ymm2
	vpand	%ymm0, %ymm2, %ymm1
	vpcmpeqd	%ymm3, %ymm1, %ymm0
	vpandn	%ymm4, %ymm0, %ymm0
	vpmaskmovd	-32(%rcx), %ymm0, %ymm0
	vpcmpeqd	%ymm3, %ymm0, %ymm0
	vpandn	%ymm1, %ymm0, %ymm0
	vpcmpeqd	%ymm3, %ymm0, %ymm0
	vpandn	%ymm4, %ymm0, %ymm0
	vpmaskmovd	%ymm5, %ymm0, (%rax)

For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only.  Patch is also not tested and tried on several small tests only.  Could you please look at what I currently have and say if it's in sync with your view on vector masking?

Thanks,
Ilya
--
gcc/

2015-09-01  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
	(ix86_expand_int_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
	op_true and op_false.
	(ix86_int_cmp_code_to_pcmp_immediate): New.
	(ix86_fp_cmp_code_to_pcmp_immediate): New.
	(ix86_cmp_code_to_pcmp_immediate): New.
	(ix86_expand_mask_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	(ix86_expand_int_sse_cmp): New.
	(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
	(ix86_expand_int_vec_cmp): New.
	(ix86_get_mask_mode): New.
	(TARGET_VECTORIZE_GET_MASK_MODE): New.
	* config/i386/sse.md (avx512fmaskmodelower): New.
	(vec_cmp<mode><avx512fmaskmodelower>): New.
	(vec_cmp<mode><sseintvecmodelower>): New.
	(vec_cmpv2div2di): New.
	(vec_cmpu<mode><avx512fmaskmodelower>): New.
	(vec_cmpu<mode><sseintvecmodelower>): New.
	(vec_cmpuv2div2di): New.
	(maskload<mode>): Rename to ...
	(maskload<mode><sseintvecmodelower>): ... this.
	(maskstore<mode>): Rename to ...
	(maskstore<mode><sseintvecmodelower>): ... this.
	(maskload<mode><avx512fmaskmodelower>): New.
	(maskstore<mode><avx512fmaskmodelower>): New.
	* doc/tm.texi: Regenerated.
	* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
	* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
	* internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
	(expand_MASK_STORE): Likewise.
	* optabs.c (vector_compare_rtx): Add OPNO arg.
	(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
	(get_vec_cmp_icode): New.
	(expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* optabs.def (vec_cmp_optab): New.
	(vec_cmpu_optab): New.
	(maskload_optab): Transform into convert optab.
	(maskstore_optab): Likewise.
	* optabs.h (expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* target.def (get_mask_mode): New.
	* targhooks.c (default_vector_alignment): Use mode alignment
	for vector masks.
	(default_get_mask_mode): New.
	* targhooks.h (default_get_mask_mode): New.
	* tree-cfg.c (verify_gimple_comparison): Support vector mask.
	* tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
	can_vec_mask_load_store_p signature change.
	(predicate_mem_writes): Use boolean mask.
	* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
	(vect_create_destination_var): Likewise.
	* tree-vect-generic.c (expand_vector_comparison): Use
	expand_vec_cmp_expr_p for comparison availability.
	(expand_vector_operations_1): Ignore statements with scalar mode.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
	operations for VF.  Add mask type computation.
	* tree-vect-stmts.c (vect_get_vec_def_for_operand): Support mask
	constant.
	(vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
	signature change.
	(vectorizable_comparison): New.
	(vect_analyze_stmt): Add vectorizable_comparison.
	(vect_transform_stmt): Likewise.
	(get_mask_type_for_scalar_type): New.
	* tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
	(enum stmt_vec_info_type): Add comparison_vec_info_type.
	(get_mask_type_for_scalar_type): New.
	* tree.c (build_truth_vector_type): New.
	(truth_type_for): Use build_truth_vector_type for vectors.
	* tree.h (build_truth_vector_type): New.

[RFC] Try vector<bool> as a new representation for vector masks

Commit Message

Comments

Patch