Message ID | 20150901130800.GA55610@msticlxl57.ims.intel.com
---|---
State | New
On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > On 27 Aug 09:55, Richard Biener wrote: >> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> > >> > Yes, I want to try it. But getting rid of bool patterns would mean >> > support for all targets currently supporting vec_cond. Would it be OK >> > to have vector<bool> mask co-exist with bool patterns for some time? >> >> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able >> to figure out the correct vector type (or mask type) from the uses. Currently >> it simply looks at the stmts LHS type but as all stmt operands already have >> vector types it can as well compute the result type from those. We'd want to >> have a helper function that does this result type computation as I figure it >> will be needed in multiple places. >> >> This is now on my personal TODO list (but that's already quite long for GCC 6), >> so if you manage to get to that... see >> tree-vect-loop.c:vect_determine_vectorization_factor >> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their >> vector type set from data-ref analysis already - there 'bool' loads >> correctly get >> VNQImode). There is a basic-block / SLP part as well that would need to use >> the helper function (eventually with some SLP discovery order issue). >> >> > Thus first step would be to require vector<bool> for MASK_LOAD and >> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and >> > MASK_STORE). >> >> You can certainly try that first, but as soon as you hit complications with >> needing to adjust bool patterns then I'd rather get rid of them. >> >> > >> > I can directly build a vector type with specified mode to avoid it. Smth. like: >> > >> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size); >> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode); >> >> Hmm, indeed, that might be a (good) solution. 
Btw, in this case >> target attribute >> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending >> on the active target). There would also be no way for the user to >> declare vector<bool> >> in source (which is good because of that target attribute issue...). >> >> So yeah. Adding a tree.c:build_truth_vector_type (unsigned nunits) >> and adjusting >> truth_type_for is the way to go. >> >> I suggest you try modifying those parts first according to this scheme >> that will most >> likely uncover issues we missed. >> >> Thanks, >> Richard. >> > > I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE. There were no major issues (for now). > > build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type. It is always a vector of bools, but its mode is determined by the target from the number of units and the currently used vector length. > > As before, I fixed if-conversion to apply boolean masks for loads and stores, which automatically disables bool patterns for them so the flow goes down the mask path. Vectorization factor computation now has a separate computation for mask types. Comparison is now handled separately by the vectorizer and is vectorized into a vector comparison. > > Optabs for masked loads and stores were transformed into convert optabs. Availability is now checked using both value and mask modes. > > Optabs for comparison were added. These are also convert optabs checking value and result type. > > I had to introduce a significant number of new patterns in the i386 target to support the new optabs. The reason is that vector compare was never expanded separately and was always part of a vec_cond expansion. Indeed. > As a result it's possible to use the same GIMPLE representation for both vector and scalar target mask types. 
Here is an example I used as a simple test: > > for (i=0; i<N; i++) > { > float t = a[i]; > if (t > 0.0f && t < 1.0e+2f) > if (c[i] != 0) > c[i] = 1; > } > > Produced vector GIMPLE (before expand): > > vect_t_5.22_105 = MEM[base: _256, offset: 0B]; > mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }; > mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 }; > mask__8.27_110 = mask__6.23_107 & mask__7.25_109; > vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110); > mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 }; > mask__37.35_120 = mask__8.27_110 & mask__36.33_119; > MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 }); Looks good to me. > Produced assembler on AVX-512: > > vmovups (%rdi), %zmm0 > vcmpps $25, %zmm5, %zmm0, %k1 > vcmpps $22, %zmm3, %zmm0, %k1{%k1} > vmovdqa32 -64(%rdx), %zmm2{%k1} > vpcmpd $4, %zmm1, %zmm2, %k1{%k1} > vmovdqa32 %zmm4, (%rcx){%k1} > > Produced assembler on AVX-2: > > vmovups (%rdx), %xmm1 > vinsertf128 $0x1, -16(%rdx), %ymm1, %ymm1 > vcmpltps %ymm1, %ymm3, %ymm0 > vcmpltps %ymm5, %ymm1, %ymm1 > vpand %ymm0, %ymm1, %ymm0 > vpmaskmovd -32(%rcx), %ymm0, %ymm1 > vpcmpeqd %ymm2, %ymm1, %ymm1 > vpcmpeqd %ymm2, %ymm1, %ymm1 > vpand %ymm0, %ymm1, %ymm0 > vpmaskmovd %ymm4, %ymm0, (%rax) > > BTW AVX-2 code produced by trunk compiler is 4 insns longer: > > vmovups (%rdx), %xmm0 > vinsertf128 $0x1, -16(%rdx), %ymm0, %ymm0 > vcmpltps %ymm0, %ymm6, %ymm1 > vcmpltps %ymm7, %ymm0, %ymm0 > vpand %ymm1, %ymm5, %ymm2 > vpand %ymm0, %ymm2, %ymm1 > vpcmpeqd %ymm3, %ymm1, %ymm0 > vpandn %ymm4, %ymm0, %ymm0 > vpmaskmovd -32(%rcx), %ymm0, %ymm0 > vpcmpeqd %ymm3, %ymm0, %ymm0 > vpandn %ymm1, %ymm0, %ymm0 > vpcmpeqd %ymm3, %ymm0, %ymm0 > vpandn %ymm4, %ymm0, %ymm0 > vpmaskmovd %ymm5, %ymm0, (%rax) > > > For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only. 
The patch is also not fully tested; it was tried on several small tests only. Could you please look at what I currently have and say if it's in sync with your view on vector masking? So apart from bool patterns and maybe implementation details (didn't look too closely at the patch yet, maybe tomorrow), there is + /* Or a boolean vector type with the same element count + as the comparison operand types. */ + else if (TREE_CODE (type) == VECTOR_TYPE + && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE) + { so we now allow both integer-typed and boolean-typed comparison results? I was hoping that on GIMPLE we can canonicalize to a single form, the boolean one, and for the "old" style force the use of VEC_COND exprs (which we did anyway, AFAIK). The comparison in the VEC_COND would still have vector bool result type. I expect the vectorization factor changes to "vanish" if we remove bool patterns and re-org vector type deduction. Richard. > Thanks, > Ilya > -- > gcc/ > > 2015-09-01 Ilya Enkovich <enkovich.gnu@gmail.com> > > * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New. > (ix86_expand_int_vec_cmp): New. > (ix86_expand_fp_vec_cmp): New. > * config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for > op_true and op_false. > (ix86_int_cmp_code_to_pcmp_immediate): New. > (ix86_fp_cmp_code_to_pcmp_immediate): New. > (ix86_cmp_code_to_pcmp_immediate): New. > (ix86_expand_mask_vec_cmp): New. > (ix86_expand_fp_vec_cmp): New. > (ix86_expand_int_sse_cmp): New. > (ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp. > (ix86_expand_int_vec_cmp): New. > (ix86_get_mask_mode): New. > (TARGET_VECTORIZE_GET_MASK_MODE): New. > * config/i386/sse.md (avx512fmaskmodelower): New. > (vec_cmp<mode><avx512fmaskmodelower>): New. > (vec_cmp<mode><sseintvecmodelower>): New. > (vec_cmpv2div2di): New. > (vec_cmpu<mode><avx512fmaskmodelower>): New. > (vec_cmpu<mode><sseintvecmodelower>): New. > (vec_cmpuv2div2di): New. > (maskload<mode>): Rename to ... > (maskload<mode><sseintvecmodelower>): ... this. 
> (maskstore<mode>): Rename to ... > (maskstore<mode><sseintvecmodelower>): ... this. > (maskload<mode><avx512fmaskmodelower>): New. > (maskstore<mode><avx512fmaskmodelower>): New. > * doc/tm.texi: Regenerated. > * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New. > * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results. > * internal-fn.c (expand_MASK_LOAD): Adjust to optab changes. > (expand_MASK_STORE): Likewise. > * optabs.c (vector_compare_rtx): Add OPNO arg. > (expand_vec_cond_expr): Adjust to vector_compare_rtx change. > (get_vec_cmp_icode): New. > (expand_vec_cmp_expr_p): New. > (expand_vec_cmp_expr): New. > (can_vec_mask_load_store_p): Add MASK_MODE arg. > * optabs.def (vec_cmp_optab): New. > (vec_cmpu_optab): New. > (maskload_optab): Transform into convert optab. > (maskstore_optab): Likewise. > * optabs.h (expand_vec_cmp_expr_p): New. > (expand_vec_cmp_expr): New. > (can_vec_mask_load_store_p): Add MASK_MODE arg. > * target.def (get_mask_mode): New. > * targhooks.c (default_vector_alignment): Use mode alignment > for vector masks. > (default_get_mask_mode): New. > * targhooks.h (default_get_mask_mode): New. > * tree-cfg.c (verify_gimple_comparison): Support vector mask. > * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to > can_vec_mask_load_store_p signature change. > (predicate_mem_writes): Use boolean mask. > * tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var. > (vect_create_destination_var): Likewise. > * tree-vect-generic.c (expand_vector_comparison): Use > expand_vec_cmp_expr_p for comparison availability. > (expand_vector_operations_1): Ignore statements with scalar mode. > * tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask > operations for VF. Add mask type computation. > * tree-vect-stmts.c (vect_get_vec_def_for_operand): Support mask > constant. > (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p > signature change. > (vectorizable_comparison): New. 
> (vect_analyze_stmt): Add vectorizable_comparison. > (vect_transform_stmt): Likewise. > (get_mask_type_for_scalar_type): New. > * tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var > (enum stmt_vec_info_type): Add comparison_vec_info_type. > (get_mask_type_for_scalar_type): New. > * tree.c (build_truth_vector_type): New. > (truth_type_for): Use build_truth_vector_type for vectors. > * tree.h (build_truth_vector_type): New.
Adding CCs. 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>: > 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>> On 27 Aug 09:55, Richard Biener wrote: >>>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>> > >>>> > Yes, I want to try it. But getting rid of bool patterns would mean >>>> > support for all targets currently supporting vec_cond. Would it be OK >>>> > to have vector<bool> mask co-exist with bool patterns for some time? >>>> >>>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able >>>> to figure out the correct vector type (or mask type) from the uses. Currently >>>> it simply looks at the stmts LHS type but as all stmt operands already have >>>> vector types it can as well compute the result type from those. We'd want to >>>> have a helper function that does this result type computation as I figure it >>>> will be needed in multiple places. >>>> >>>> This is now on my personal TODO list (but that's already quite long for GCC 6), >>>> so if you manage to get to that... see >>>> tree-vect-loop.c:vect_determine_vectorization_factor >>>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their >>>> vector type set from data-ref analysis already - there 'bool' loads >>>> correctly get >>>> VNQImode). There is a basic-block / SLP part as well that would need to use >>>> the helper function (eventually with some SLP discovery order issue). >>>> >>>> > Thus first step would be to require vector<bool> for MASK_LOAD and >>>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and >>>> > MASK_STORE). >>>> >>>> You can certainly try that first, but as soon as you hit complications with >>>> needing to adjust bool patterns then I'd rather get rid of them. >>>> >>>> > >>>> > I can directly build a vector type with specified mode to avoid it. Smth. 
like: >>>> > >>>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size); >>>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode); >>>> >>>> Hmm, indeed, that might be a (good) solution. Btw, in this case >>>> target attribute >>>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending >>>> on the active target). There would also be no way for the user to >>>> declare vector<bool> >>>> in source (which is good because of that target attribute issue...). >>>> >>>> So yeah. Adding a tree.c:build_truth_vector_type (unsigned nunits) >>>> and adjusting >>>> truth_type_for is the way to go. >>>> >>>> I suggest you try modifying those parts first according to this scheme >>>> that will most >>>> likely uncover issues we missed. >>>> >>>> Thanks, >>>> Richard. >>>> >>> >>> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE. There were no major issues (for now). >>> >>> build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type. It is always a vector of bools but its mode is determined by a target using number of units and currently used vector length. >>> >>> As previously I fixed if-conversion to apply boolean masks for loads and stores which automatically disables bool patterns for them and flow goes by a mask path. Vectorization factor computation is fixed to have a separate computation for mask types. Comparison is now handled separately by vectorizer and is vectorized into vector comparison. >>> >>> Optabs for masked loads and stores were transformed into convert optabs. Now it is checked using both value and mask modes. >>> >>> Optabs for comparison were added. These are also convert optabs checking value and result type. >>> >>> I had to introduce significant number of new patterns in i386 target to support new optabs. The reason was vector compare was never expanded separately and always was a part of a vec_cond expansion. >> >> Indeed. 
>> >>> As a result it's possible to use the sage GIMPLE representation for both vector and scalar masks target type. Here is an example I used as a simple test: >>> >>> for (i=0; i<N; i++) >>> { >>> float t = a[i]; >>> if (t > 0.0f && t < 1.0e+2f) >>> if (c[i] != 0) >>> c[i] = 1; >>> } >>> >>> Produced vector GIMPLE (before expand): >>> >>> vect_t_5.22_105 = MEM[base: _256, offset: 0B]; >>> mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }; >>> mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 }; >>> mask__8.27_110 = mask__6.23_107 & mask__7.25_109; >>> vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110); >>> mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 }; >>> mask__37.35_120 = mask__8.27_110 & mask__36.33_119; >>> MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 }); >> >> Looks good to me. >> >>> Produced assembler on AVX-512: >>> >>> vmovups (%rdi), %zmm0 >>> vcmpps $25, %zmm5, %zmm0, %k1 >>> vcmpps $22, %zmm3, %zmm0, %k1{%k1} >>> vmovdqa32 -64(%rdx), %zmm2{%k1} >>> vpcmpd $4, %zmm1, %zmm2, %k1{%k1} >>> vmovdqa32 %zmm4, (%rcx){%k1} >>> >>> Produced assembler on AVX-2: >>> >>> vmovups (%rdx), %xmm1 >>> vinsertf128 $0x1, -16(%rdx), %ymm1, %ymm1 >>> vcmpltps %ymm1, %ymm3, %ymm0 >>> vcmpltps %ymm5, %ymm1, %ymm1 >>> vpand %ymm0, %ymm1, %ymm0 >>> vpmaskmovd -32(%rcx), %ymm0, %ymm1 >>> vpcmpeqd %ymm2, %ymm1, %ymm1 >>> vpcmpeqd %ymm2, %ymm1, %ymm1 >>> vpand %ymm0, %ymm1, %ymm0 >>> vpmaskmovd %ymm4, %ymm0, (%rax) >>> >>> BTW AVX-2 code produced by trunk compiler is 4 insns longer: >>> >>> vmovups (%rdx), %xmm0 >>> vinsertf128 $0x1, -16(%rdx), %ymm0, %ymm0 >>> vcmpltps %ymm0, %ymm6, %ymm1 >>> vcmpltps %ymm7, %ymm0, %ymm0 >>> vpand %ymm1, %ymm5, %ymm2 >>> vpand %ymm0, %ymm2, %ymm1 >>> vpcmpeqd %ymm3, %ymm1, %ymm0 >>> vpandn %ymm4, %ymm0, %ymm0 >>> vpmaskmovd -32(%rcx), %ymm0, %ymm0 >>> vpcmpeqd %ymm3, %ymm0, %ymm0 >>> vpandn %ymm1, %ymm0, %ymm0 
>>> vpcmpeqd %ymm3, %ymm0, %ymm0 >>> vpandn %ymm4, %ymm0, %ymm0 >>> vpmaskmovd %ymm5, %ymm0, (%rax) >>> >>> >>> For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only. The patch is also not fully tested; it was tried on several small tests only. Could you please look at what I currently have and say if it's in sync with your view on vector masking? >> >> So apart from bool patterns and maybe implementation details (didn't >> look too closely at the patch yet, maybe tomorrow), there is >> >> + /* Or a boolean vector type with the same element count >> + as the comparison operand types. */ >> + else if (TREE_CODE (type) == VECTOR_TYPE >> + && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE) >> + { >> >> so we now allow both integer-typed and boolean-typed comparison >> results? I was hoping that on GIMPLE >> we can canonicalize to a single form, the boolean one, and for the >> "old" style force the use of VEC_COND exprs >> (which we did anyway, AFAIK). The comparison in the VEC_COND would >> still have vector bool result type. >> >> I expect the vectorization factor changes to "vanish" if we remove >> bool patterns and re-org vector type deduction. >> >> Richard. >> > > Totally disabling old style vector comparison and bool patterns is a > goal, but doing that would mean a lot of regressions for many targets. > Do you want it to be tried, to estimate the amount of changes required > and reveal possible issues? What would be the integration plan for these > changes? Do you want to just introduce the new vector<bool> in GIMPLE, > disabling bool patterns, and then resolve vectorization regressions on > all targets, or let them live together, with targets then switching from > bool patterns one by one until they are finally removed? Not all > targets are likely to adopt it quickly, I suppose. > > Ilya
On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > Adding CCs. > > 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>: >> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>> On 27 Aug 09:55, Richard Biener wrote: >>>>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>>> > >>>>> > Yes, I want to try it. But getting rid of bool patterns would mean >>>>> > support for all targets currently supporting vec_cond. Would it be OK >>>>> > to have vector<bool> mask co-exist with bool patterns for some time? >>>>> >>>>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able >>>>> to figure out the correct vector type (or mask type) from the uses. Currently >>>>> it simply looks at the stmts LHS type but as all stmt operands already have >>>>> vector types it can as well compute the result type from those. We'd want to >>>>> have a helper function that does this result type computation as I figure it >>>>> will be needed in multiple places. >>>>> >>>>> This is now on my personal TODO list (but that's already quite long for GCC 6), >>>>> so if you manage to get to that... see >>>>> tree-vect-loop.c:vect_determine_vectorization_factor >>>>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their >>>>> vector type set from data-ref analysis already - there 'bool' loads >>>>> correctly get >>>>> VNQImode). There is a basic-block / SLP part as well that would need to use >>>>> the helper function (eventually with some SLP discovery order issue). >>>>> >>>>> > Thus first step would be to require vector<bool> for MASK_LOAD and >>>>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and >>>>> > MASK_STORE). 
>>>>> >>>>> You can certainly try that first, but as soon as you hit complications with >>>>> needing to adjust bool patterns then I'd rather get rid of them. >>>>> >>>>> > >>>>> > I can directly build a vector type with specified mode to avoid it. Smth. like: >>>>> > >>>>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size); >>>>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode); >>>>> >>>>> Hmm, indeed, that might be a (good) solution. Btw, in this case >>>>> target attribute >>>>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending >>>>> on the active target). There would also be no way for the user to >>>>> declare vector<bool> >>>>> in source (which is good because of that target attribute issue...). >>>>> >>>>> So yeah. Adding a tree.c:build_truth_vector_type (unsigned nunits) >>>>> and adjusting >>>>> truth_type_for is the way to go. >>>>> >>>>> I suggest you try modifying those parts first according to this scheme >>>>> that will most >>>>> likely uncover issues we missed. >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>> >>>> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE. There were no major issues (for now). >>>> >>>> build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type. It is always a vector of bools but its mode is determined by a target using number of units and currently used vector length. >>>> >>>> As previously I fixed if-conversion to apply boolean masks for loads and stores which automatically disables bool patterns for them and flow goes by a mask path. Vectorization factor computation is fixed to have a separate computation for mask types. Comparison is now handled separately by vectorizer and is vectorized into vector comparison. >>>> >>>> Optabs for masked loads and stores were transformed into convert optabs. Now it is checked using both value and mask modes. >>>> >>>> Optabs for comparison were added. 
These are also convert optabs checking value and result type. >>>> >>>> I had to introduce significant number of new patterns in i386 target to support new optabs. The reason was vector compare was never expanded separately and always was a part of a vec_cond expansion. >>> >>> Indeed. >>> >>>> As a result it's possible to use the sage GIMPLE representation for both vector and scalar masks target type. Here is an example I used as a simple test: >>>> >>>> for (i=0; i<N; i++) >>>> { >>>> float t = a[i]; >>>> if (t > 0.0f && t < 1.0e+2f) >>>> if (c[i] != 0) >>>> c[i] = 1; >>>> } >>>> >>>> Produced vector GIMPLE (before expand): >>>> >>>> vect_t_5.22_105 = MEM[base: _256, offset: 0B]; >>>> mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }; >>>> mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 }; >>>> mask__8.27_110 = mask__6.23_107 & mask__7.25_109; >>>> vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110); >>>> mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 }; >>>> mask__37.35_120 = mask__8.27_110 & mask__36.33_119; >>>> MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 }); >>> >>> Looks good to me. 
>>> >>>> Produced assembler on AVX-512: >>>> >>>> vmovups (%rdi), %zmm0 >>>> vcmpps $25, %zmm5, %zmm0, %k1 >>>> vcmpps $22, %zmm3, %zmm0, %k1{%k1} >>>> vmovdqa32 -64(%rdx), %zmm2{%k1} >>>> vpcmpd $4, %zmm1, %zmm2, %k1{%k1} >>>> vmovdqa32 %zmm4, (%rcx){%k1} >>>> >>>> Produced assembler on AVX-2: >>>> >>>> vmovups (%rdx), %xmm1 >>>> vinsertf128 $0x1, -16(%rdx), %ymm1, %ymm1 >>>> vcmpltps %ymm1, %ymm3, %ymm0 >>>> vcmpltps %ymm5, %ymm1, %ymm1 >>>> vpand %ymm0, %ymm1, %ymm0 >>>> vpmaskmovd -32(%rcx), %ymm0, %ymm1 >>>> vpcmpeqd %ymm2, %ymm1, %ymm1 >>>> vpcmpeqd %ymm2, %ymm1, %ymm1 >>>> vpand %ymm0, %ymm1, %ymm0 >>>> vpmaskmovd %ymm4, %ymm0, (%rax) >>>> >>>> BTW AVX-2 code produced by trunk compiler is 4 insns longer: >>>> >>>> vmovups (%rdx), %xmm0 >>>> vinsertf128 $0x1, -16(%rdx), %ymm0, %ymm0 >>>> vcmpltps %ymm0, %ymm6, %ymm1 >>>> vcmpltps %ymm7, %ymm0, %ymm0 >>>> vpand %ymm1, %ymm5, %ymm2 >>>> vpand %ymm0, %ymm2, %ymm1 >>>> vpcmpeqd %ymm3, %ymm1, %ymm0 >>>> vpandn %ymm4, %ymm0, %ymm0 >>>> vpmaskmovd -32(%rcx), %ymm0, %ymm0 >>>> vpcmpeqd %ymm3, %ymm0, %ymm0 >>>> vpandn %ymm1, %ymm0, %ymm0 >>>> vpcmpeqd %ymm3, %ymm0, %ymm0 >>>> vpandn %ymm4, %ymm0, %ymm0 >>>> vpmaskmovd %ymm5, %ymm0, (%rax) >>>> >>>> >>>> For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only. Patch is also not tested and tried on several small tests only. Could you please look at what I currently have and say if it's in sync with your view on vector masking? >>> >>> So apart from bool patterns and maybe implementation details (didn't >>> look too closely at the patch yet, maybe tomorrow), there is >>> >>> + /* Or a boolean vector type with the same element count >>> + as the comparison operand types. */ >>> + else if (TREE_CODE (type) == VECTOR_TYPE >>> + && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE) >>> + { >>> >>> so we now allow both, integer typed and boolean typed comparison >>> results? 
I was hoping that on GIMPLE >>> we can canonicalize to a single form, the boolean one, and for the >>> "old" style force the use of VEC_COND exprs >>> (which we did anyway, AFAIK). The comparison in the VEC_COND would >>> still have vector bool result type. >>> >>> I expect the vectorization factor changes to "vanish" if we remove >>> bool patterns and re-org vector type deduction. >>> >>> Richard. >>> >> >> Totally disabling old style vector comparison and bool patterns is a >> goal, but doing that would mean a lot of regressions for many targets. >> Do you want it to be tried, to estimate the amount of changes required >> and reveal possible issues? What would be the integration plan for these >> changes? Do you want to just introduce the new vector<bool> in GIMPLE, >> disabling bool patterns, and then resolve vectorization regressions on >> all targets, or let them live together, with targets then switching from >> bool patterns one by one until they are finally removed? Not all >> targets are likely to adopt it quickly, I suppose. Well, the frontends already create vec_cond exprs I believe. So for bool patterns the vectorizer would have to do the same, but the comparison result in there would still use vec<bool>. Thus the scalar _Bool a = b < c; _Bool c = a || d; if (c) would become vec<int> a = VEC_COND <a < b ? -1 : 0>; vec<int> c = a | d; when the target does not have vec<bool>s directly, and otherwise vec<bool> directly (dropping the VEC_COND). Just the vector comparison inside the VEC_COND would always have vec<bool> type. And the "bool patterns" I am talking about are those in tree-vect-patterns.c, not any target's instruction patterns. Richard. >> >> Ilya
2015-09-03 15:11 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: > On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> Adding CCs. >> >> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>: >>> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >>> >>> Totally disabling old style vector comparison and bool patterns is a >>> goal, but doing that would mean a lot of regressions for many targets. >>> Do you want it to be tried, to estimate the amount of changes required >>> and reveal possible issues? What would be the integration plan for these >>> changes? Do you want to just introduce the new vector<bool> in GIMPLE, >>> disabling bool patterns, and then resolve vectorization regressions on >>> all targets, or let them live together, with targets then switching from >>> bool patterns one by one until they are finally removed? Not all >>> targets are likely to adopt it quickly, I suppose. > > Well, the frontends already create vec_cond exprs I believe. So for > bool patterns the vectorizer would have to do the same, but the > comparison result in there would still use vec<bool>. Thus the scalar > > _Bool a = b < c; > _Bool c = a || d; > if (c) > > would become > > vec<int> a = VEC_COND <a < b ? -1 : 0>; > vec<int> c = a | d; This should be identical to vec<_Bool> a = a < b; vec<_Bool> c = a | d; where vec<_Bool> has VxSI mode. And we should prefer it in case the target supports vector comparison into vec<bool>, right? > > when the target does not have vec<bool>s directly and otherwise > vec<bool> directly (dropping the VEC_COND). > > Just the vector comparison inside the VEC_COND would always > have vec<bool> type. I don't really understand what you mean by 'doesn't have vec<bool>s directly' here. Currently I have a hook to ask for a vec<bool> mode and assume the target doesn't support it if it returns VOIDmode. But in such a case I have no mode to use for vec<bool> inside VEC_COND either. 
In the default implementation of the new target hook I always return an integer vector mode (to keep the default behavior similar to the current one). It should allow me to use vec<bool> for conditions in all vec_cond. But we'd need some other trigger for bool patterns to apply. Probably check the vec_cmp optab in check_bool_pattern and don't convert in case the comparison is supported by the target? Or control it via an additional hook. > > And the "bool patterns" I am talking about are those in > tree-vect-patterns.c, not any targets instruction patterns. I refer to them also. BTW bool patterns also pull the comparison into vec_cond, so we cannot have an SSA_NAME as a vec_cond condition. I think with vector comparisons in place we should allow an SSA_NAME as the condition in VEC_COND for better CSE. That would require new vcond optabs though. Ilya > > Richard. > >>> >>> Ilya
On 09/01/2015 07:08 AM, Ilya Enkovich wrote: > On 27 Aug 09:55, Richard Biener wrote: >> I suggest you try modifying those parts first according to this >> scheme that will most likely uncover issues we missed. >> >> Thanks, Richard. >> > > I tried to implement this scheme and apply it for MASK_LOAD and > MASK_STORE. There were no major issues (for now). So do we have enough confidence in this representation that we want to go ahead and commit to it? > > I had to introduce significant number of new patterns in i386 target > to support new optabs. The reason was vector compare was never > expanded separately and always was a part of a vec_cond expansion. One could argue we should have fixed this already, so I don't see the new patterns as a bad thing, but instead they're addressing a long term mis-design. > > > For now I still don't disable bool patterns, thus new masks apply to > masked loads and stores only. Patch is also not tested and tried on > several small tests only. Could you please look at what I currently > have and say if it's in sync with your view on vector masking? I'm going to let Richi run with this for the most part -- but I will chime in with a thank you for being willing to bounce this around a bit while we figure out the representational issues. jeff
2015-09-04 23:42 GMT+03:00 Jeff Law <law@redhat.com>: > On 09/01/2015 07:08 AM, Ilya Enkovich wrote: >> >> On 27 Aug 09:55, Richard Biener wrote: >>> >>> I suggest you try modifying those parts first according to this >>> scheme that will most likely uncover issues we missed. >>> >>> Thanks, Richard. >>> >> >> I tried to implement this scheme and apply it for MASK_LOAD and >> MASK_STORE. There were no major issues (for now). > > So do we have enough confidence in this representation that we want to go > ahead and commit to it? I think the new representation fits nicely for the most part. There are some places where I have to make exceptions for vectors of bools to make it work. This is mostly to avoid target modifications; I'd like to avoid the necessity of changing all targets currently supporting vec_cond. It makes me add some special handling of vec<bool> in GIMPLE, e.g. I add special code in vect_init_vector to build vec<bool> invariants with proper casting to int. Otherwise I'd need to do it on the target side. I made several fixes, and the current patch (still allowing an integer vector result for vector comparison and applying bool patterns) passes bootstrap and regression testing on x86_64. Now I'll try to fully switch to vec<bool> and see how it goes. Thanks, Ilya > >> >> I had to introduce significant number of new patterns in i386 target >> to support new optabs. The reason was vector compare was never >> expanded separately and always was a part of a vec_cond expansion. > > One could argue we should have fixed this already, so I don't see the new > patterns as a bad thing, but instead they're addressing a long term > mis-design. > >> >> >> For now I still don't disable bool patterns, thus new masks apply to >> masked loads and stores only. Patch is also not tested and tried on >> several small tests only. Could you please look at what I currently >> have and say if it's in sync with your view on vector masking?
> > I'm going to let Richi run with this for the most part -- but I will chime > in with a thank you for being willing to bounce this around a bit while we > figure out the representational issues. > > > jeff gcc/ 2015-09-08 Ilya Enkovich <enkovich.gnu@gmail.com> * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New. (ix86_expand_int_vec_cmp): New. (ix86_expand_fp_vec_cmp): New. * config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for op_true and op_false. (ix86_int_cmp_code_to_pcmp_immediate): New. (ix86_fp_cmp_code_to_pcmp_immediate): New. (ix86_cmp_code_to_pcmp_immediate): New. (ix86_expand_mask_vec_cmp): New. (ix86_expand_fp_vec_cmp): New. (ix86_expand_int_sse_cmp): New. (ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp. (ix86_expand_int_vec_cmp): New. (ix86_get_mask_mode): New. (TARGET_VECTORIZE_GET_MASK_MODE): New. * config/i386/sse.md (avx512fmaskmodelower): New. (vec_cmp<mode><avx512fmaskmodelower>): New. (vec_cmp<mode><sseintvecmodelower>): New. (vec_cmpv2div2di): New. (vec_cmpu<mode><avx512fmaskmodelower>): New. (vec_cmpu<mode><sseintvecmodelower>): New. (vec_cmpuv2div2di): New. (maskload<mode>): Rename to ... (maskload<mode><sseintvecmodelower>): ... this. (maskstore<mode>): Rename to ... (maskstore<mode><sseintvecmodelower>): ... this. (maskload<mode><avx512fmaskmodelower>): New. (maskstore<mode><avx512fmaskmodelower>): New. * doc/tm.texi: Regenerated. * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New. * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results. * internal-fn.c (expand_MASK_LOAD): Adjust to optab changes. (expand_MASK_STORE): Likewise. * optabs.c (vector_compare_rtx): Add OPNO arg. (expand_vec_cond_expr): Adjust to vector_compare_rtx change. (get_vec_cmp_icode): New. (expand_vec_cmp_expr_p): New. (expand_vec_cmp_expr): New. (can_vec_mask_load_store_p): Add MASK_MODE arg. * optabs.def (vec_cmp_optab): New. (vec_cmpu_optab): New. (maskload_optab): Transform into convert optab. (maskstore_optab): Likewise. 
* optabs.h (expand_vec_cmp_expr_p): New. (expand_vec_cmp_expr): New. (can_vec_mask_load_store_p): Add MASK_MODE arg. * target.def (get_mask_mode): New. * targhooks.c (default_vector_alignment): Use mode alignment for vector masks. (default_get_mask_mode): New. * targhooks.h (default_get_mask_mode): New. * tree-cfg.c (verify_gimple_comparison): Support vector mask. * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to can_vec_mask_load_store_p signature change. (predicate_mem_writes): Use boolean mask. * tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var. (vect_create_destination_var): Likewise. * tree-vect-generic.c (expand_vector_comparison): Use expand_vec_cmp_expr_p for comparison availability. (expand_vector_operations_1): Ignore mask statements with scalar mode. * tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask operations for VF. Add mask type computation. * tree-vect-stmts.c (vect_init_vector): Support mask invariants. (vect_get_vec_def_for_operand): Support mask constant. (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p signature change. (vectorizable_comparison): New. (vect_analyze_stmt): Add vectorizable_comparison. (vect_transform_stmt): Likewise. (get_mask_type_for_scalar_type): New. * tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var (enum stmt_vec_info_type): Add comparison_vec_info_type. (get_mask_type_for_scalar_type): New. * tree.c (build_truth_vector_type): New. (truth_type_for): Use build_truth_vector_type for vectors. * tree.h (build_truth_vector_type): New.
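[Editorial note: for readers less familiar with the internal functions the
patch touches, MASK_LOAD and MASK_STORE represent per-element predicated
memory accesses. A minimal scalar sketch of the semantics being vectorized
(the function names here are illustrative, not GCC internals):]

```c
#include <stddef.h>

/* Scalar semantics of MASK_STORE: store b[i] into a[i] only where
   mask[i] is true.  The vectorizer turns such loops into masked vector
   stores so that inactive lanes never touch memory.  */
static void
masked_store (int *a, const int *b, const unsigned char *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      a[i] = b[i];
}

/* Scalar semantics of MASK_LOAD: load b[i] where mask[i] is true,
   otherwise produce 0 (inactive lanes must not fault).  */
static void
masked_load (int *dst, const int *b, const unsigned char *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? b[i] : 0;
}
```

The representational question in this thread is what type the `mask`
operand of these internal functions should have in GIMPLE.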
On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-09-03 15:11 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> Adding CCs.
>>>
>>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>>>> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>
>>>> Totally disabling old-style vector comparison and bool patterns is a
>>>> goal, but doing that would mean a lot of regressions for many targets.
>>>> Do you want it to be tried, to estimate the amount of changes required
>>>> and reveal possible issues? What would be the integration plan for
>>>> these changes? Do you want to just introduce the new vector<bool> in
>>>> GIMPLE, disabling bool patterns and then resolving vectorization
>>>> regressions on all targets, or let them live together, with targets
>>>> switching one by one from bool patterns before finally removing them?
>>>> Not all targets are likely to be adapted fast, I suppose.
>>
>> Well, the frontends already create vec_cond exprs I believe. So for
>> bool patterns the vectorizer would have to do the same, but the
>> comparison result in there would still use vec<bool>. Thus the scalar
>>
>>   _Bool a = b < c;
>>   _Bool c = a || d;
>>   if (c)
>>
>> would become
>>
>>   vec<int> a = VEC_COND <a < b ? -1 : 0>;
>>   vec<int> c = a | d;
>
> This should be identical to
>
>   vec<_Bool> a = a < b;
>   vec<_Bool> c = a | d;
>
> where vec<_Bool> has VxSI mode. And we should prefer it in case the
> target supports vector comparison into vec<bool>, right?
>
>> when the target does not have vec<bool>s directly and otherwise
>> vec<bool> directly (dropping the VEC_COND).
>>
>> Just the vector comparison inside the VEC_COND would always
>> have vec<bool> type.
>
> I don't really understand what you mean by 'doesn't have vec<bool>s
> directly' here. Currently I have a hook to ask for a vec<bool> mode
> and assume the target doesn't support it in case it returns VOIDmode.
> But in that case I have no mode to use for vec<bool> inside VEC_COND
> either.

I was thinking about targets not supporting generating vec<bool>
(of whatever mode) from a comparison directly but only via
a COND_EXPR.

> In the default implementation of the new target hook I always return an
> integer vector mode (to have default behavior similar to the current
> one). It should allow me to use vec<bool> for conditions in all
> vec_cond. But we'd need some other trigger for bool patterns to apply.
> Probably check the vec_cmp optab in check_bool_pattern and don't convert
> in case the comparison is supported by the target? Or control it via an
> additional hook.

Not sure if we are always talking about the same thing for
"bool patterns". I'd remove bool patterns completely, IMHO
they are not necessary at all.

>> And the "bool patterns" I am talking about are those in
>> tree-vect-patterns.c, not any target's instruction patterns.
>
> I refer to them also. BTW, bool patterns also pull the comparison into
> the vec_cond. Thus we cannot have an SSA_NAME as a condition in
> vec_cond. I think with vector comparisons in place we should allow an
> SSA_NAME as a condition in VEC_COND for better CSE. That should require
> new vcond optabs though.

I think we do allow this, just the vectorizer doesn't expect it. In the
long run I want to get rid of the GENERIC exprs in both COND_EXPR and
VEC_COND_EXPR. Just didn't have the time to do this...

Richard.

> Ilya
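[Editorial note: to make the two representations discussed above concrete,
here is a lane-by-lane emulation in plain C with illustrative names. The
"old-style" comparison result is an integer vector of all-ones/all-zeros
lanes (what VEC_COND <a < b ? -1 : 0> produces), while an AVX-512-style
vec<bool> is a bitmask with one bit per lane. Both encode the same
predicate:]

```c
#include <stdint.h>

#define LANES 8

/* Old-style result: each lane of the "vector" is -1 (all ones) where
   a[i] < b[i], else 0.  */
static void
cmp_as_int_vector (const int32_t *a, const int32_t *b, int32_t *res)
{
  for (int i = 0; i < LANES; i++)
    res[i] = a[i] < b[i] ? -1 : 0;
}

/* vec<bool>-style result: one bit per lane, as in an AVX-512 mask
   register.  */
static uint8_t
cmp_as_mask (const int32_t *a, const int32_t *b)
{
  uint8_t mask = 0;
  for (int i = 0; i < LANES; i++)
    if (a[i] < b[i])
      mask |= (uint8_t) (1u << i);
  return mask;
}
```

The question of which mode backs vec<bool> is then the question of which
of these two encodings (a full-width vector or a scalar bitmask) the
target's compare instructions produce.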
2015-09-18 15:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>
> I was thinking about targets not supporting generating vec<bool>
> (of whatever mode) from a comparison directly but only via
> a COND_EXPR.

Where may these direct comparisons come from? The vectorizer never
generates unsupported statements. Does that mean we get them from the
gimplifier? So touch optabs in the gimplifier to avoid direct
comparisons? Actually, vector lowering checks whether we are able to do
the comparison, and expand also uses vec_cond to expand vector
comparisons, so probably we can live with them.

> Not sure if we are always talking about the same thing for
> "bool patterns". I'd remove bool patterns completely, IMHO
> they are not necessary at all.

I refer to the transformations made by vect_recog_bool_pattern. I don't
see how to remove them completely for targets not supporting comparison
vectorization.

> I think we do allow this, just the vectorizer doesn't expect it. In the
> long run I want to get rid of the GENERIC exprs in both COND_EXPR and
> VEC_COND_EXPR. Just didn't have the time to do this...

That would be nice. As a first step I'd like to support optabs for
VEC_COND_EXPR directly using vec<bool>.

Thanks,
Ilya

> Richard.
2015-09-18 16:40 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
> 2015-09-18 15:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>
> That would be nice. As a first step I'd like to support optabs for
> VEC_COND_EXPR directly using vec<bool>.

Hi Richard,

Do you think we have enough confidence that the approach is working and
that we may start integrating it into trunk? What would the integration
plan be then?

Thanks,
Ilya
On Fri, Sep 18, 2015 at 3:40 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2015-09-18 15:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> I was thinking about targets not supporting generating vec<bool>
>> (of whatever mode) from a comparison directly but only via
>> a COND_EXPR.
>
> Where may these direct comparisons come from? The vectorizer never
> generates unsupported statements. Does that mean we get them from the
> gimplifier?

That's what I say - the vectorizer wouldn't generate them.

> So touch optabs in the gimplifier to avoid direct comparisons?
> Actually, vector lowering checks whether we are able to do the
> comparison, and expand also uses vec_cond to expand vector
> comparisons, so probably we can live with them.
>
>> Not sure if we are always talking about the same thing for
>> "bool patterns". I'd remove bool patterns completely, IMHO
>> they are not necessary at all.
>
> I refer to the transformations made by vect_recog_bool_pattern. I don't
> see how to remove them completely for targets not supporting comparison
> vectorization.

The vectorizer can vectorize comparisons by emitting a VEC_COND_EXPR
(the bool pattern would turn the comparison into a COND_EXPR). I don't
see how the pattern intermediate step is necessary. The important part
is to get the desired vector type of the comparison determined.

Richard.
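[Editorial note: the equivalence Richard appeals to here can be spelled
out in a scalar emulation (illustrative code, not vectorizer output). The
bool pattern materializes the comparison as a -1/0 value via a COND_EXPR
and then selects; emitting the select directly on the comparison computes
the same thing, which is why the intermediate step is semantically
unnecessary:]

```c
/* Bool-pattern style: first COND_EXPR (a < b, -1, 0), then a select on
   that materialized mask value.  */
static int
select_via_pattern (int a, int b, int c, int d)
{
  int tem = (a < b) ? -1 : 0;
  return tem ? c : d;
}

/* Direct style: one select keyed on the comparison itself, i.e.
   VEC_COND_EXPR (a < b, c, d) per lane, with no -1/0 intermediate.  */
static int
select_direct (int a, int b, int c, int d)
{
  return (a < b) ? c : d;
}
```

What remains target-specific is only which machine instructions the
single select lowers to, not the GIMPLE representation.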
On Wed, Sep 23, 2015 at 3:41 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Hi Richard,
>
> Do you think we have enough confidence that the approach is working and
> that we may start integrating it into trunk? What would the integration
> plan be then?

I'm still worried about the vec<bool> vector size vs. element size
issue (well, somewhat).

Otherwise the integration plan would be

1) put in the vector<bool> GIMPLE type support and change the vector
   comparison type IL requirement to be vector<bool>, fixing all fallout

2) get support for directly expanding vector comparisons to
   vector<bool> and make use of that from the x86 backend

3) make the vectorizer generate the above if supported

I think independent improvements are

1) remove (most) of the bool patterns from the vectorizer

2) make VEC_COND_EXPR not have a GENERIC comparison embedded

(same for COND_EXPR?)

Richard.
On 09/23/2015 06:53 AM, Richard Biener wrote:
> I think independent improvements are
>
> 1) remove (most) of the bool patterns from the vectorizer
>
> 2) make VEC_COND_EXPR not have a GENERIC comparison embedded
>
> (same for COND_EXPR?)

Careful.

The reason that COND_EXPRs have embedded comparisons is to handle flags
registers. You can't separate the setting of the flags from the use of
the flags on most targets, because there's only one flags register.

The same is true for VEC_COND_EXPR with respect to MIPS. The base
architecture has 8 floating-point comparison result flags, and the vector
compare instructions are fixed to set fcc[0:width-1]. So again there's
only one possible output location for the result of the compare.

MIPS is going to present a problem if we attempt to generalize logical
combinations of these vector<bool>, since one has to use several
instructions (or one insn and pre-loaded constants in two registers) to
get the fcc bits out into a form we can manipulate.

r~
On Wed, Sep 23, 2015 at 8:44 PM, Richard Henderson <rth@redhat.com> wrote:
> The reason that COND_EXPRs have embedded comparisons is to handle flags
> registers. You can't separate the setting of the flags from the use of
> the flags on most targets, because there's only one flags register.
>
> The same is true for VEC_COND_EXPR with respect to MIPS. The base
> architecture has 8 floating-point comparison result flags, and the
> vector compare instructions are fixed to set fcc[0:width-1]. So again
> there's only one possible output location for the result of the compare.
>
> MIPS is going to present a problem if we attempt to generalize logical
> combinations of these vector<bool>, since one has to use several
> instructions (or one insn and pre-loaded constants in two registers) to
> get the fcc bits out into a form we can manipulate.

Both are basically a (target) restriction on how we should expand a
conditional move (and its condition). It's technically convenient to tie
both together by having them in the same statement, but it's also
technically very inconvenient in other places. I'd say for targets where

  tem_1 = a_2 < b_3;
  res_4 = tem_1 ? c_5 : d_6;
  res_7 = tem_1 ? x_8 : z_9;

presents a serious issue ("re-using" the flags register), out-of-SSA
should duplicate the conditionals so that TER can do its job (and RTL
expansion should use TER to get at the flags setter). I imagine that if
we expand the above to adjacent statements the CPUs can re-use the
condition code.

To me, where the condition is in GIMPLE is an implementation detail and
the inconveniences outweigh the benefits.

Maybe we should make the effects of TER on the statement schedule
explicitly visible, to make debugging that easier, and remove the
implicit scheduling from the SSA name expansion code (basically requiring
SSA names to have expanded defs). That way we have the chance to perform
pre-expansion "scheduling" in a more predictable way, leaving only the
parts of the expansion using TER that want to see a bigger expression
(like [VEC_]COND_EXPR expansion eventually).

Richard.

> r~
On 09/24/2015 01:09 AM, Richard Biener wrote:
> Both are basically a (target) restriction on how we should expand a
> conditional move (and its condition). It's technically convenient to tie
> both together by having them in the same statement, but it's also
> technically very inconvenient in other places. I'd say for targets where
>
>   tem_1 = a_2 < b_3;
>   res_4 = tem_1 ? c_5 : d_6;
>   res_7 = tem_1 ? x_8 : z_9;
>
> presents a serious issue ("re-using" the flags register), out-of-SSA
> should duplicate the conditionals so that TER can do its job (and RTL
> expansion should use TER to get at the flags setter).

Sure it's a target restriction, but it's an extremely common one.
Essentially all of our production platforms have it. What do we gain by
adding some sort of target hook for this?

> I imagine that if we expand the above to adjacent statements the CPUs
> can re-use the condition code.

Sure, but IMO it should be the job of RTL CSE to make that decision,
after all of the uses (and clobbers) of the flags register have been
exposed.

> To me, where the condition is in GIMPLE is an implementation detail and
> the inconveniences outweigh the benefits.

Why is a 3-operand gimple statement fine, but a 4-operand gimple
statement inconvenient?

r~
On Thu, Sep 24, 2015 at 6:37 PM, Richard Henderson <rth@redhat.com> wrote:
> Sure it's a target restriction, but it's an extremely common one.
> Essentially all of our production platforms have it. What do we gain by
> adding some sort of target hook for this?

A cleaner IL, no GENERIC expression tree building in GIMPLE (I guess
that's something Andrew needs for his GIMPLE types project as well), and
less awkward special-casing of comparisons based on context in code like
genmatch.c or in value numbering.

> Why is a 3-operand gimple statement fine, but a 4-operand gimple
> statement inconvenient?

The inconvenience is not the number of operands but that we have two
operation codes, and that we compute two values but only have an SSA name
def for one of them. Oh, and did I mention that the second operation is
GENERIC?

So one way to clean things up would be to no longer use a GIMPLE_ASSIGN
for x = a < b ? c : d but instead use a GIMPLE_COND and give that an SSA
def for the result, using the true/false label operand places for 'c'
and 'd'. That still wouldn't get the compare an SSA def, but at least it
would get rid of the 2nd operator code and the GENERIC expression
operand.

From the GIMPLE side, forcing the comparison out to a separate stmt looks
more obvious, and if we're considering doing a different thing then we
may as well think of how to represent predicating arbitrary stmts or how
to explicitly model condition codes in GIMPLE. It kind of looks like we
want a GIMPLE PARALLEL ... (we already have a GIMPLE stmt with multiple
defs - GIMPLE_ASM)

Richard.

> r~
2015-09-23 16:53 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Sep 23, 2015 at 3:41 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> Do you think we have enough confidence that the approach is working and
>> that we may start integrating it into trunk? What would the integration
>> plan be then?
>
> I'm still worried about the vec<bool> vector size vs. element size
> issue (well, somewhat).

Yeah, I hit another problem related to element size in vector lowering:
it uses inner type sizes in expand_vector_piecewise, and bool vector
expansion goes the wrong way. There were other places with similar
problems, and therefore I want to try using bools of different sizes and
see how it goes. Also, having differently sized bools may be useful to
represent mask pack/unpack in scalar code.

> Otherwise the integration plan would be
>
> 1) put in the vector<bool> GIMPLE type support and change the vector
>    comparison type IL requirement to be vector<bool>, fixing all fallout
>
> 2) get support for directly expanding vector comparisons to
>    vector<bool> and make use of that from the x86 backend
>
> 3) make the vectorizer generate the above if supported
>
> I think independent improvements are
>
> 1) remove (most) of the bool patterns from the vectorizer
>
> 2) make VEC_COND_EXPR not have a GENERIC comparison embedded
>
> (same for COND_EXPR?)

Sounds great!

Ilya
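[Editorial note: a sketch of the mask pack/unpack mentioned above, in
scalar emulation with illustrative names. When a loop mixes element
widths (say v4si compares masking v2di operations), a 4-lane mask has to
be unpacked into two 2-lane masks with wider elements, which is where
differently sized bool elements become useful. Lanes use the -1/0
integer-vector representation:]

```c
#include <stdint.h>

/* Unpack a 4-lane mask with 32-bit elements (as from a v4si compare)
   into two 2-lane masks with 64-bit elements (as needed to mask v2di
   operations): low half and high half.  */
static void
mask_unpack (const int32_t m32[4], int64_t lo[2], int64_t hi[2])
{
  lo[0] = m32[0] ? -1 : 0;
  lo[1] = m32[1] ? -1 : 0;
  hi[0] = m32[2] ? -1 : 0;
  hi[1] = m32[3] ? -1 : 0;
}

/* The inverse: pack two 2-lane 64-bit masks into one 4-lane 32-bit
   mask.  */
static void
mask_pack (const int64_t lo[2], const int64_t hi[2], int32_t m32[4])
{
  m32[0] = lo[0] ? -1 : 0;
  m32[1] = lo[1] ? -1 : 0;
  m32[2] = hi[0] ? -1 : 0;
  m32[3] = hi[1] ? -1 : 0;
}
```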
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 6a17ef4..e22aa57 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -129,6 +129,9 @@ extern bool ix86_expand_fp_vcond (rtx[]); extern bool ix86_expand_int_vcond (rtx[]); extern void ix86_expand_vec_perm (rtx[]); extern bool ix86_expand_vec_perm_const (rtx[]); +extern bool ix86_expand_mask_vec_cmp (rtx[]); +extern bool ix86_expand_int_vec_cmp (rtx[]); +extern bool ix86_expand_fp_vec_cmp (rtx[]); extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool); extern bool ix86_expand_int_addcc (rtx[]); extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 070605f..e44cdb5 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -21440,8 +21440,8 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1, cmp_op1 = force_reg (cmp_ops_mode, cmp_op1); if (optimize - || reg_overlap_mentioned_p (dest, op_true) - || reg_overlap_mentioned_p (dest, op_false)) + || (op_true && reg_overlap_mentioned_p (dest, op_true)) + || (op_false && reg_overlap_mentioned_p (dest, op_false))) dest = gen_reg_rtx (maskcmp ? cmp_mode : mode); /* Compare patterns for int modes are unspec in AVX512F only. */ @@ -21713,34 +21713,127 @@ ix86_expand_fp_movcc (rtx operands[]) return true; } -/* Expand a floating-point vector conditional move; a vcond operation - rather than a movcc operation. */ +/* Helper for ix86_cmp_code_to_pcmp_immediate for int modes. */ + +static int +ix86_int_cmp_code_to_pcmp_immediate (enum rtx_code code) +{ + switch (code) + { + case EQ: + return 0; + case LT: + case LTU: + return 1; + case LE: + case LEU: + return 2; + case NE: + return 4; + case GE: + case GEU: + return 5; + case GT: + case GTU: + return 6; + default: + gcc_unreachable (); + } +} + +/* Helper for ix86_cmp_code_to_pcmp_immediate for fp modes. 
*/ + +static int +ix86_fp_cmp_code_to_pcmp_immediate (enum rtx_code code) +{ + switch (code) + { + case EQ: + return 0x08; + case NE: + return 0x04; + case GT: + return 0x16; + case LE: + return 0x1a; + case GE: + return 0x15; + case LT: + return 0x19; + default: + gcc_unreachable (); + } +} + +/* Return immediate value to be used in UNSPEC_PCMP + for comparison CODE in MODE. */ + +static int +ix86_cmp_code_to_pcmp_immediate (enum rtx_code code, machine_mode mode) +{ + if (FLOAT_MODE_P (mode)) + return ix86_fp_cmp_code_to_pcmp_immediate (code); + return ix86_int_cmp_code_to_pcmp_immediate (code); +} + +/* Expand AVX-512 vector comparison. */ bool -ix86_expand_fp_vcond (rtx operands[]) +ix86_expand_mask_vec_cmp (rtx operands[]) { - enum rtx_code code = GET_CODE (operands[3]); + machine_mode mask_mode = GET_MODE (operands[0]); + machine_mode cmp_mode = GET_MODE (operands[2]); + enum rtx_code code = GET_CODE (operands[1]); + rtx imm = GEN_INT (ix86_cmp_code_to_pcmp_immediate (code, cmp_mode)); + int unspec_code; + rtx unspec; + + switch (code) + { + case LEU: + case GTU: + case GEU: + case LTU: + unspec_code = UNSPEC_UNSIGNED_PCMP; + default: + unspec_code = UNSPEC_PCMP; + } + + unspec = gen_rtx_UNSPEC (mask_mode, gen_rtvec (3, operands[2], + operands[3], imm), + unspec_code); + emit_insn (gen_rtx_SET (operands[0], unspec)); + + return true; +} + +/* Expand fp vector comparison. 
*/ + +bool +ix86_expand_fp_vec_cmp (rtx operands[]) +{ + enum rtx_code code = GET_CODE (operands[1]); rtx cmp; code = ix86_prepare_sse_fp_compare_args (operands[0], code, - &operands[4], &operands[5]); + &operands[2], &operands[3]); if (code == UNKNOWN) { rtx temp; - switch (GET_CODE (operands[3])) + switch (GET_CODE (operands[1])) { case LTGT: - temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4], - operands[5], operands[0], operands[0]); - cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4], - operands[5], operands[1], operands[2]); + temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[2], + operands[3], NULL, NULL); + cmp = ix86_expand_sse_cmp (operands[0], NE, operands[2], + operands[3], NULL, NULL); code = AND; break; case UNEQ: - temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4], - operands[5], operands[0], operands[0]); - cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4], - operands[5], operands[1], operands[2]); + temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[2], + operands[3], NULL, NULL); + cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[2], + operands[3], NULL, NULL); code = IOR; break; default: @@ -21748,72 +21841,26 @@ ix86_expand_fp_vcond (rtx operands[]) } cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1, OPTAB_DIRECT); - ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]); - return true; } + else + cmp = ix86_expand_sse_cmp (operands[0], code, operands[2], operands[3], + NULL, NULL); - if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4], - operands[5], operands[1], operands[2])) - return true; + if (operands[0] != cmp) + emit_move_insn (operands[0], cmp); - cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5], - operands[1], operands[2]); - ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]); return true; } -/* Expand a signed/unsigned integral vector conditional move.
*/ - -bool -ix86_expand_int_vcond (rtx operands[]) +static rtx +ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1, + rtx op_true, rtx op_false, bool *negate) { - machine_mode data_mode = GET_MODE (operands[0]); - machine_mode mode = GET_MODE (operands[4]); - enum rtx_code code = GET_CODE (operands[3]); - bool negate = false; - rtx x, cop0, cop1; - - cop0 = operands[4]; - cop1 = operands[5]; + machine_mode data_mode = GET_MODE (dest); + machine_mode mode = GET_MODE (cop0); + rtx x; - /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31 - and x < 0 ? 1 : 0 into (unsigned) x >> 31. */ - if ((code == LT || code == GE) - && data_mode == mode - && cop1 == CONST0_RTX (mode) - && operands[1 + (code == LT)] == CONST0_RTX (data_mode) - && GET_MODE_UNIT_SIZE (data_mode) > 1 - && GET_MODE_UNIT_SIZE (data_mode) <= 8 - && (GET_MODE_SIZE (data_mode) == 16 - || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32))) - { - rtx negop = operands[2 - (code == LT)]; - int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1; - if (negop == CONST1_RTX (data_mode)) - { - rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift), - operands[0], 1, OPTAB_DIRECT); - if (res != operands[0]) - emit_move_insn (operands[0], res); - return true; - } - else if (GET_MODE_INNER (data_mode) != DImode - && vector_all_ones_operand (negop, data_mode)) - { - rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift), - operands[0], 0, OPTAB_DIRECT); - if (res != operands[0]) - emit_move_insn (operands[0], res); - return true; - } - } - - if (!nonimmediate_operand (cop1, mode)) - cop1 = force_reg (mode, cop1); - if (!general_operand (operands[1], data_mode)) - operands[1] = force_reg (data_mode, operands[1]); - if (!general_operand (operands[2], data_mode)) - operands[2] = force_reg (data_mode, operands[2]); + *negate = false; /* XOP supports all of the comparisons on all 128-bit vector int types. 
*/ if (TARGET_XOP @@ -21834,13 +21881,13 @@ ix86_expand_int_vcond (rtx operands[]) case LE: case LEU: code = reverse_condition (code); - negate = true; + *negate = true; break; case GE: case GEU: code = reverse_condition (code); - negate = true; + *negate = true; /* FALLTHRU */ case LT: @@ -21861,14 +21908,14 @@ ix86_expand_int_vcond (rtx operands[]) case EQ: /* SSE4.1 supports EQ. */ if (!TARGET_SSE4_1) - return false; + return NULL; break; case GT: case GTU: /* SSE4.2 supports GT/GTU. */ if (!TARGET_SSE4_2) - return false; + return NULL; break; default: @@ -21929,12 +21976,13 @@ ix86_expand_int_vcond (rtx operands[]) case V8HImode: /* Perform a parallel unsigned saturating subtraction. */ x = gen_reg_rtx (mode); - emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0, cop1))); + emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0, + cop1))); cop0 = x; cop1 = CONST0_RTX (mode); code = EQ; - negate = !negate; + *negate = !*negate; break; default: @@ -21943,22 +21991,162 @@ ix86_expand_int_vcond (rtx operands[]) } } + if (*negate) + std::swap (op_true, op_false); + /* Allow the comparison to be done in one mode, but the movcc to happen in another mode. */ if (data_mode == mode) { - x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1, - operands[1+negate], operands[2-negate]); + x = ix86_expand_sse_cmp (dest, code, cop0, cop1, + op_true, op_false); } else { gcc_assert (GET_MODE_SIZE (data_mode) == GET_MODE_SIZE (mode)); x = ix86_expand_sse_cmp (gen_reg_rtx (mode), code, cop0, cop1, - operands[1+negate], operands[2-negate]); + op_true, op_false); if (GET_MODE (x) == mode) x = gen_lowpart (data_mode, x); } + return x; +} + +/* Expand integer vector comparison. 
*/ + +bool +ix86_expand_int_vec_cmp (rtx operands[]) +{ + rtx_code code = GET_CODE (operands[1]); + bool negate = false; + rtx cmp = ix86_expand_int_sse_cmp (operands[0], code, operands[2], + operands[3], NULL, NULL, &negate); + + if (!cmp) + return false; + + if (negate) + cmp = ix86_expand_int_sse_cmp (operands[0], EQ, cmp, + CONST0_RTX (GET_MODE (cmp)), + NULL, NULL, &negate); + + gcc_assert (!negate); + + if (operands[0] != cmp) + emit_move_insn (operands[0], cmp); + + return true; +} + +/* Expand a floating-point vector conditional move; a vcond operation + rather than a movcc operation. */ + +bool +ix86_expand_fp_vcond (rtx operands[]) +{ + enum rtx_code code = GET_CODE (operands[3]); + rtx cmp; + + code = ix86_prepare_sse_fp_compare_args (operands[0], code, + &operands[4], &operands[5]); + if (code == UNKNOWN) + { + rtx temp; + switch (GET_CODE (operands[3])) + { + case LTGT: + temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4], + operands[5], operands[0], operands[0]); + cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4], + operands[5], operands[1], operands[2]); + code = AND; + break; + case UNEQ: + temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4], + operands[5], operands[0], operands[0]); + cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4], + operands[5], operands[1], operands[2]); + code = IOR; + break; + default: + gcc_unreachable (); + } + cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1, + OPTAB_DIRECT); + ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]); + return true; + } + + if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4], + operands[5], operands[1], operands[2])) + return true; + + cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5], + operands[1], operands[2]); + ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]); + return true; +} + +/* Expand a signed/unsigned integral vector conditional move. 
*/ + +bool +ix86_expand_int_vcond (rtx operands[]) +{ + machine_mode data_mode = GET_MODE (operands[0]); + machine_mode mode = GET_MODE (operands[4]); + enum rtx_code code = GET_CODE (operands[3]); + bool negate = false; + rtx x, cop0, cop1; + + cop0 = operands[4]; + cop1 = operands[5]; + + /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31 + and x < 0 ? 1 : 0 into (unsigned) x >> 31. */ + if ((code == LT || code == GE) + && data_mode == mode + && cop1 == CONST0_RTX (mode) + && operands[1 + (code == LT)] == CONST0_RTX (data_mode) + && GET_MODE_UNIT_SIZE (data_mode) > 1 + && GET_MODE_UNIT_SIZE (data_mode) <= 8 + && (GET_MODE_SIZE (data_mode) == 16 + || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32))) + { + rtx negop = operands[2 - (code == LT)]; + int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1; + if (negop == CONST1_RTX (data_mode)) + { + rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift), + operands[0], 1, OPTAB_DIRECT); + if (res != operands[0]) + emit_move_insn (operands[0], res); + return true; + } + else if (GET_MODE_INNER (data_mode) != DImode + && vector_all_ones_operand (negop, data_mode)) + { + rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift), + operands[0], 0, OPTAB_DIRECT); + if (res != operands[0]) + emit_move_insn (operands[0], res); + return true; + } + } + + if (!nonimmediate_operand (cop1, mode)) + cop1 = force_reg (mode, cop1); + if (!general_operand (operands[1], data_mode)) + operands[1] = force_reg (data_mode, operands[1]); + if (!general_operand (operands[2], data_mode)) + operands[2] = force_reg (data_mode, operands[2]); + + x = ix86_expand_int_sse_cmp (operands[0], code, cop0, cop1, + operands[1], operands[2], &negate); + + if (!x) + return false; + ix86_expand_sse_movcc (operands[0], x, operands[1+negate], operands[2-negate]); return true; @@ -51678,6 +51866,25 @@ ix86_autovectorize_vector_sizes (void) (TARGET_AVX && !TARGET_PREFER_AVX128) ? 
32 | 16 : 0; } +/* Implementation of targetm.vectorize.get_mask_mode. */ + +static machine_mode +ix86_get_mask_mode (unsigned nunits, unsigned vector_size) +{ + /* Scalar mask case. */ + if ((TARGET_AVX512F && vector_size == 64) + || TARGET_AVX512VL) + return smallest_mode_for_size (nunits, MODE_INT); + + unsigned elem_size = vector_size / nunits; + machine_mode elem_mode + = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT); + + gcc_assert (elem_size * nunits == vector_size); + + return mode_for_vector (elem_mode, nunits); +} + /* Return class of registers which could be used for pseudo of MODE @@ -52612,6 +52819,8 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load, #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \ ix86_autovectorize_vector_sizes +#undef TARGET_VECTORIZE_GET_MASK_MODE +#define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode #undef TARGET_VECTORIZE_INIT_COST #define TARGET_VECTORIZE_INIT_COST ix86_init_cost #undef TARGET_VECTORIZE_ADD_STMT_COST diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 4535570..a8d55cc 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -605,6 +605,15 @@ (V16SF "HI") (V8SF "QI") (V4SF "QI") (V8DF "QI") (V4DF "QI") (V2DF "QI")]) +;; Mapping of vector modes to corresponding mask size +(define_mode_attr avx512fmaskmodelower + [(V64QI "di") (V32QI "si") (V16QI "hi") + (V32HI "si") (V16HI "hi") (V8HI "qi") (V4HI "qi") + (V16SI "hi") (V8SI "qi") (V4SI "qi") + (V8DI "qi") (V4DI "qi") (V2DI "qi") + (V16SF "hi") (V8SF "qi") (V4SF "qi") + (V8DF "qi") (V4DF "qi") (V2DF "qi")]) + ;; Mapping of vector float modes to an integer mode of the same size (define_mode_attr sseintvecmode [(V16SF "V16SI") (V8DF "V8DI") @@ -2803,6 +2812,150 @@ (const_string "0"))) (set_attr "mode" "<MODE>")]) +(define_expand "vec_cmp<mode><avx512fmaskmodelower>" + [(set (match_operand:<avx512fmaskmode> 0 "register_operand") + 
(match_operator:<avx512fmaskmode> 1 "" + [(match_operand:V48_AVX512VL 2 "register_operand") + (match_operand:V48_AVX512VL 3 "nonimmediate_operand")]))] + "TARGET_AVX512F" +{ + bool ok = ix86_expand_mask_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmp<mode><avx512fmaskmodelower>" + [(set (match_operand:<avx512fmaskmode> 0 "register_operand") + (match_operator:<avx512fmaskmode> 1 "" + [(match_operand:VI12_AVX512VL 2 "register_operand") + (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))] + "TARGET_AVX512BW" +{ + bool ok = ix86_expand_mask_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmp<mode><sseintvecmodelower>" + [(set (match_operand:<sseintvecmode> 0 "register_operand") + (match_operator:<sseintvecmode> 1 "" + [(match_operand:VI_256 2 "register_operand") + (match_operand:VI_256 3 "nonimmediate_operand")]))] + "TARGET_AVX2" +{ + bool ok = ix86_expand_int_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmp<mode><sseintvecmodelower>" + [(set (match_operand:<sseintvecmode> 0 "register_operand") + (match_operator:<sseintvecmode> 1 "" + [(match_operand:VI124_128 2 "register_operand") + (match_operand:VI124_128 3 "nonimmediate_operand")]))] + "TARGET_SSE2" +{ + bool ok = ix86_expand_int_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmpv2div2di" + [(set (match_operand:V2DI 0 "register_operand") + (match_operator:V2DI 1 "" + [(match_operand:V2DI 2 "register_operand") + (match_operand:V2DI 3 "nonimmediate_operand")]))] + "TARGET_SSE4_2" +{ + bool ok = ix86_expand_int_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmp<mode><sseintvecmodelower>" + [(set (match_operand:<sseintvecmode> 0 "register_operand") + (match_operator:<sseintvecmode> 1 "" + [(match_operand:VF_256 2 "register_operand") + (match_operand:VF_256 3 "nonimmediate_operand")]))] + "TARGET_AVX" +{ + bool ok = ix86_expand_fp_vec_cmp (operands); + gcc_assert (ok); + 
DONE; +}) + +(define_expand "vec_cmp<mode><sseintvecmodelower>" + [(set (match_operand:<sseintvecmode> 0 "register_operand") + (match_operator:<sseintvecmode> 1 "" + [(match_operand:VF_128 2 "register_operand") + (match_operand:VF_128 3 "nonimmediate_operand")]))] + "TARGET_SSE" +{ + bool ok = ix86_expand_fp_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmpu<mode><avx512fmaskmodelower>" + [(set (match_operand:<avx512fmaskmode> 0 "register_operand") + (match_operator:<avx512fmaskmode> 1 "" + [(match_operand:VI48_AVX512VL 2 "register_operand") + (match_operand:VI48_AVX512VL 3 "nonimmediate_operand")]))] + "TARGET_AVX512F" +{ + bool ok = ix86_expand_mask_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmpu<mode><avx512fmaskmodelower>" + [(set (match_operand:<avx512fmaskmode> 0 "register_operand") + (match_operator:<avx512fmaskmode> 1 "" + [(match_operand:VI12_AVX512VL 2 "register_operand") + (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))] + "TARGET_AVX512BW" +{ + bool ok = ix86_expand_mask_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmpu<mode><sseintvecmodelower>" + [(set (match_operand:<sseintvecmode> 0 "register_operand") + (match_operator:<sseintvecmode> 1 "" + [(match_operand:VI_256 2 "register_operand") + (match_operand:VI_256 3 "nonimmediate_operand")]))] + "TARGET_AVX2" +{ + bool ok = ix86_expand_int_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmpu<mode><sseintvecmodelower>" + [(set (match_operand:<sseintvecmode> 0 "register_operand") + (match_operator:<sseintvecmode> 1 "" + [(match_operand:VI124_128 2 "register_operand") + (match_operand:VI124_128 3 "nonimmediate_operand")]))] + "TARGET_SSE2" +{ + bool ok = ix86_expand_int_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + +(define_expand "vec_cmpuv2div2di" + [(set (match_operand:V2DI 0 "register_operand") + (match_operator:V2DI 1 "" + [(match_operand:V2DI 2 "register_operand") + 
(match_operand:V2DI 3 "nonimmediate_operand")]))] + "TARGET_SSE4_2" +{ + bool ok = ix86_expand_int_vec_cmp (operands); + gcc_assert (ok); + DONE; +}) + (define_expand "vcond<V_512:mode><VF_512:mode>" [(set (match_operand:V_512 0 "register_operand") (if_then_else:V_512 @@ -17895,7 +18048,7 @@ (set_attr "btver2_decode" "vector") (set_attr "mode" "<sseinsnmode>")]) -(define_expand "maskload<mode>" +(define_expand "maskload<mode><sseintvecmodelower>" [(set (match_operand:V48_AVX2 0 "register_operand") (unspec:V48_AVX2 [(match_operand:<sseintvecmode> 2 "register_operand") @@ -17903,7 +18056,23 @@ UNSPEC_MASKMOV))] "TARGET_AVX") -(define_expand "maskstore<mode>" +(define_expand "maskload<mode><avx512fmaskmodelower>" + [(set (match_operand:V48_AVX512VL 0 "register_operand") + (vec_merge:V48_AVX512VL + (match_operand:V48_AVX512VL 1 "memory_operand") + (match_dup 0) + (match_operand:<avx512fmaskmode> 2 "register_operand")))] + "TARGET_AVX512F") + +(define_expand "maskload<mode><avx512fmaskmodelower>" + [(set (match_operand:VI12_AVX512VL 0 "register_operand") + (vec_merge:VI12_AVX512VL + (match_operand:VI12_AVX512VL 1 "memory_operand") + (match_dup 0) + (match_operand:<avx512fmaskmode> 2 "register_operand")))] + "TARGET_AVX512BW") + +(define_expand "maskstore<mode><sseintvecmodelower>" [(set (match_operand:V48_AVX2 0 "memory_operand") (unspec:V48_AVX2 [(match_operand:<sseintvecmode> 2 "register_operand") @@ -17912,6 +18081,22 @@ UNSPEC_MASKMOV))] "TARGET_AVX") +(define_expand "maskstore<mode><avx512fmaskmodelower>" + [(set (match_operand:V48_AVX512VL 0 "memory_operand") + (vec_merge:V48_AVX512VL + (match_operand:V48_AVX512VL 1 "register_operand") + (match_dup 0) + (match_operand:<avx512fmaskmode> 2 "register_operand")))] + "TARGET_AVX512F") + +(define_expand "maskstore<mode><avx512fmaskmodelower>" + [(set (match_operand:VI12_AVX512VL 0 "memory_operand") + (vec_merge:VI12_AVX512VL + (match_operand:VI12_AVX512VL 1 "register_operand") + (match_dup 0) + 
(match_operand:<avx512fmaskmode> 2 "register_operand")))] + "TARGET_AVX512BW") + (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") (unspec:AVX256MODE2P diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index f5a1f84..acdfcd5 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5688,6 +5688,11 @@ mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}. The default is zero which means to not iterate over other vector sizes. @end deftypefn +@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length}) +This hook returns the mode to be used for a mask for a vector +of specified @var{length} with @var{nunits} elements. +@end deftypefn + @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info}) This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block. The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block. If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized. @end deftypefn diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 9d5ac0a..52e912a 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4225,6 +4225,8 @@ address; but often a machine-dependent strategy can generate better code.
@hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES +@hook TARGET_VECTORIZE_GET_MASK_MODE + @hook TARGET_VECTORIZE_INIT_COST @hook TARGET_VECTORIZE_ADD_STMT_COST diff --git a/gcc/expr.c b/gcc/expr.c index 1e820b4..fa48484 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -11000,9 +11000,15 @@ do_store_flag (sepops ops, rtx target, machine_mode mode) if (TREE_CODE (ops->type) == VECTOR_TYPE) { tree ifexp = build2 (ops->code, ops->type, arg0, arg1); - tree if_true = constant_boolean_node (true, ops->type); - tree if_false = constant_boolean_node (false, ops->type); - return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target); + if (TREE_TYPE (ops->type) == boolean_type_node) + return expand_vec_cmp_expr (ops->type, ifexp, target); + else + { + tree if_true = constant_boolean_node (true, ops->type); + tree if_false = constant_boolean_node (false, ops->type); + return expand_vec_cond_expr (ops->type, ifexp, if_true, + if_false, target); + } } /* Get the rtx comparison code to use. We know that EXP is a comparison diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index e785946..4ca0a40 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt) create_output_operand (&ops[0], target, TYPE_MODE (type)); create_fixed_operand (&ops[1], mem); create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); - expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops); + expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type), + TYPE_MODE (TREE_TYPE (maskt))), + 3, ops); } static void @@ -1908,7 +1910,9 @@ expand_MASK_STORE (gcall *stmt) create_fixed_operand (&ops[0], mem); create_input_operand (&ops[1], reg, TYPE_MODE (type)); create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); - expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops); + expand_insn (convert_optab_handler (maskstore_optab, TYPE_MODE (type), + TYPE_MODE (TREE_TYPE (maskt))), + 3, ops); } 
static void diff --git a/gcc/optabs.c b/gcc/optabs.c index e533e6e..48f7914 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6490,11 +6490,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp) } /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or - unsigned operators. Do not generate compare instruction. */ + unsigned operators. OPNO holds the index of the first comparison + operand in insn with code ICODE. Do not generate compare instruction. */ static rtx vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1, - bool unsignedp, enum insn_code icode) + bool unsignedp, enum insn_code icode, + unsigned int opno) { struct expand_operand ops[2]; rtx rtx_op0, rtx_op1; @@ -6520,7 +6522,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1, create_input_operand (&ops[0], rtx_op0, m0); create_input_operand (&ops[1], rtx_op1, m1); - if (!maybe_legitimize_operands (icode, 4, 2, ops)) + if (!maybe_legitimize_operands (icode, opno, 2, ops)) gcc_unreachable (); return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value); } @@ -6863,7 +6865,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2, if (icode == CODE_FOR_nothing) return 0; - comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode); + comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4); rtx_op1 = expand_normal (op1); rtx_op2 = expand_normal (op2); @@ -6877,6 +6879,63 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2, return ops[0].value; } +/* Return insn code for a comparison operator with VMODE + resulting in MASK_MODE, unsigned if UNS is true. */ + +static inline enum insn_code +get_vec_cmp_icode (machine_mode vmode, machine_mode mask_mode, bool uns) +{ + optab tab = uns ?
vec_cmpu_optab : vec_cmp_optab; + return convert_optab_handler (tab, vmode, mask_mode); +} + +/* Return TRUE if appropriate vector insn is available + for vector comparison expr with vector type VALUE_TYPE + and resulting mask with MASK_TYPE. */ + +bool +expand_vec_cmp_expr_p (tree value_type, tree mask_type) +{ + enum insn_code icode = get_vec_cmp_icode (TYPE_MODE (value_type), + TYPE_MODE (mask_type), + TYPE_UNSIGNED (value_type)); + return (icode != CODE_FOR_nothing); +} + +/* Generate insns for a vector comparison into a mask. */ + +rtx +expand_vec_cmp_expr (tree type, tree exp, rtx target) +{ + struct expand_operand ops[4]; + enum insn_code icode; + rtx comparison; + machine_mode mask_mode = TYPE_MODE (type); + machine_mode vmode; + bool unsignedp; + tree op0a, op0b; + enum tree_code tcode; + + op0a = TREE_OPERAND (exp, 0); + op0b = TREE_OPERAND (exp, 1); + tcode = TREE_CODE (exp); + + unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a)); + vmode = TYPE_MODE (TREE_TYPE (op0a)); + + icode = get_vec_cmp_icode (vmode, mask_mode, unsignedp); + if (icode == CODE_FOR_nothing) + return 0; + + comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2); + create_output_operand (&ops[0], target, mask_mode); + create_fixed_operand (&ops[1], comparison); + create_fixed_operand (&ops[2], XEXP (comparison, 0)); + create_fixed_operand (&ops[3], XEXP (comparison, 1)); + expand_insn (icode, 4, ops); + return ops[0].value; +} + /* Return non-zero if a highpart multiply is supported of can be synthisized. For the benefit of expand_mult_highpart, the return value is 1 for direct, 2 for even/odd widening, and 3 for hi/lo widening. */ @@ -7002,26 +7061,32 @@ expand_mult_highpart (machine_mode mode, rtx op0, rtx op1, /* Return true if target supports vector masked load/store for mode. */ bool -can_vec_mask_load_store_p (machine_mode mode, bool is_load) +can_vec_mask_load_store_p (machine_mode mode, + machine_mode mask_mode, + bool is_load) { optab op = is_load ? 
maskload_optab : maskstore_optab; - machine_mode vmode; unsigned int vector_sizes; /* If mode is vector mode, check it directly. */ if (VECTOR_MODE_P (mode)) - return optab_handler (op, mode) != CODE_FOR_nothing; + return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing; /* Otherwise, return true if there is some vector mode with the mask load/store supported. */ /* See if there is any chance the mask load or store might be vectorized. If not, punt. */ - vmode = targetm.vectorize.preferred_simd_mode (mode); - if (!VECTOR_MODE_P (vmode)) + mode = targetm.vectorize.preferred_simd_mode (mode); + if (!VECTOR_MODE_P (mode)) + return false; + + mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode), + GET_MODE_SIZE (mode)); + if (mask_mode == VOIDmode) return false; - if (optab_handler (op, vmode) != CODE_FOR_nothing) + if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing) return true; vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); @@ -7031,9 +7096,12 @@ can_vec_mask_load_store_p (machine_mode mode, bool is_load) vector_sizes &= ~cur; if (cur <= GET_MODE_SIZE (mode)) continue; - vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode)); - if (VECTOR_MODE_P (vmode) - && optab_handler (op, vmode) != CODE_FOR_nothing) + mode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode)); + mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode), + cur); + if (VECTOR_MODE_P (mode) + && mask_mode != VOIDmode + && convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing) return true; } return false; diff --git a/gcc/optabs.def b/gcc/optabs.def index 888b21c..9804378 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -61,6 +61,10 @@ OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b") OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b") OPTAB_CD(vcond_optab, "vcond$a$b") OPTAB_CD(vcondu_optab, "vcondu$a$b") +OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b") +OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b") 
+OPTAB_CD(maskload_optab, "maskload$a$b") +OPTAB_CD(maskstore_optab, "maskstore$a$b") OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc) OPTAB_NX(add_optab, "add$F$a3") @@ -264,8 +268,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a") OPTAB_D (usum_widen_optab, "widen_usum$I$a3") OPTAB_D (usad_optab, "usad$I$a") OPTAB_D (ssad_optab, "ssad$I$a") -OPTAB_D (maskload_optab, "maskload$a") -OPTAB_D (maskstore_optab, "maskstore$a") OPTAB_D (vec_extract_optab, "vec_extract$a") OPTAB_D (vec_init_optab, "vec_init$a") OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") diff --git a/gcc/optabs.h b/gcc/optabs.h index 95f5cbc..dfe9ebf 100644 --- a/gcc/optabs.h +++ b/gcc/optabs.h @@ -496,6 +496,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *); extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx); /* Return tree if target supports vector operations for COND_EXPR. */ +bool expand_vec_cmp_expr_p (tree, tree); + +/* Generate code for vector comparison. */ +extern rtx expand_vec_cmp_expr (tree, tree, rtx); + +/* Return true if target supports VEC_COND_EXPR. */ bool expand_vec_cond_expr_p (tree, tree); /* Generate code for VEC_COND_EXPR. */ @@ -508,7 +514,7 @@ extern int can_mult_highpart_p (machine_mode, bool); extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool); /* Return true if target supports vector masked load/store for mode. */ -extern bool can_vec_mask_load_store_p (machine_mode, bool); +extern bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool); /* Return true if there is an inline compare and swap pattern. */ extern bool can_compare_and_swap_p (machine_mode, bool); diff --git a/gcc/target.def b/gcc/target.def index 4edc209..c5b8ed9 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1789,6 +1789,15 @@ The default is zero which means to not iterate over other vector sizes.", (void), default_autovectorize_vector_sizes) +/* Function to get a target mode for a vector mask.
*/ +DEFHOOK +(get_mask_mode, + "This hook returns the mode to be used for a mask for a vector\n\ +of specified @var{length} with @var{nunits} elements.", + machine_mode, + (unsigned nunits, unsigned length), + default_get_mask_mode) + /* Target builtin that implements vector gather operation. */ DEFHOOK (builtin_gather, diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 7238c8f..61fb97d 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1033,6 +1033,8 @@ tree default_mangle_decl_assembler_name (tree decl ATTRIBUTE_UNUSED, HOST_WIDE_INT default_vector_alignment (const_tree type) { + if (TREE_TYPE (type) == boolean_type_node) + return GET_MODE_ALIGNMENT (TYPE_MODE (type)); return tree_to_shwi (TYPE_SIZE (type)); } @@ -1087,6 +1089,20 @@ default_autovectorize_vector_sizes (void) return 0; } +/* By default a vector of integers is used as a mask. */ + +machine_mode +default_get_mask_mode (unsigned nunits, unsigned vector_size) +{ + unsigned elem_size = vector_size / nunits; + machine_mode elem_mode + = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT); + + gcc_assert (elem_size * nunits == vector_size); + + return mode_for_vector (elem_mode, nunits); +} + /* By default, the cost model accumulates three separate costs (prologue, loop body, and epilogue) for a vectorized loop or block. So allocate an array of three unsigned ints, set it to zero, and return its address.
*/

diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5ae991d..cc7263f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -100,6 +100,7 @@ default_builtin_support_vector_misalignment (machine_mode mode,
                                              int, bool);
 extern machine_mode default_preferred_simd_mode (machine_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
+extern machine_mode default_get_mask_mode (unsigned, unsigned);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
                                        struct _stmt_vec_info *, int,
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5ac73b3..1ee8f93 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3490,6 +3490,27 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
           return true;
         }
     }
+  /* Or a boolean vector type with the same element count
+     as the comparison operand types.  */
+  else if (TREE_CODE (type) == VECTOR_TYPE
+           && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+          || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+    }
   else
     {
       error ("bogus comparison result type");
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 291e602..1c9242a 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -811,7 +811,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
       || VECTOR_MODE_P (mode))
     return false;
 
-  if (can_vec_mask_load_store_p (mode, is_load))
+  if (can_vec_mask_load_store_p (mode, VOIDmode, is_load))
     return true;
 
   return false;
@@ -2082,15 +2082,14 @@ predicate_mem_writes (loop_p loop)
               mask = vect_masks[index];
             else
               {
-                masktype = build_nonstandard_integer_type (bitsize, 1);
-                mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
-                mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
-                cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-                                                   is_gimple_condexpr,
-                                                   NULL_TREE,
-                                                   true, GSI_SAME_STMT);
-                mask = fold_build_cond_expr (masktype, unshare_expr (cond),
-                                             mask_op0, mask_op1);
+                masktype = boolean_type_node;
+                if (TREE_CODE (cond) == NE_EXPR
+                    && integer_zerop (TREE_OPERAND (cond, 1))
+                    && TREE_CODE (TREE_TYPE (TREE_OPERAND (cond, 0)))
+                       == BOOLEAN_TYPE)
+                  mask = TREE_OPERAND (cond, 0);
+                else
+                  mask = unshare_expr (cond);
                 mask = ifc_temp_var (masktype, mask, &gsi);
                 /* Save mask and its size for further use.  */
                 vect_sizes.safe_push (bitsize);
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index f1eaef4..b3b970a 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3849,6 +3849,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind var_kind, const char *name)
   case vect_scalar_var:
     prefix = "stmp";
     break;
+  case vect_mask_var:
+    prefix = "mask";
+    break;
   case vect_pointer_var:
     prefix = "vectp";
     break;
@@ -4403,7 +4406,11 @@ vect_create_destination_var (tree scalar_dest, tree vectype)
   tree type;
   enum vect_var_kind kind;
 
-  kind = vectype ? vect_simple_var : vect_scalar_var;
+  kind = vectype
+         ? TREE_TYPE (vectype) == boolean_type_node
+           ? vect_mask_var
+           : vect_simple_var
+         : vect_scalar_var;
   type = vectype ? vectype : TREE_TYPE (scalar_dest);
 
   gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index be3d27f..3e4fabc 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -350,7 +350,8 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
                           tree op1, enum tree_code code)
 {
   tree t;
-  if (!expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
+  if (!expand_vec_cmp_expr_p (TREE_TYPE (op0), type)
+      && !expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
     t = expand_vector_piecewise (gsi, do_compare, type,
                                  TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
   else
@@ -1506,6 +1507,11 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != VECTOR_TYPE)
     return;
 
+  /* A scalar operation pretending to be a vector one.  */
+  if (!VECTOR_MODE_P (TYPE_MODE (type))
+      && TYPE_MODE (type) != BLKmode)
+    return;
+
   if (CONVERT_EXPR_CODE_P (code)
       || code == FLOAT_EXPR
       || code == FIX_TRUNC_EXPR
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 59c75af..f2dbc4e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -193,19 +193,21 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes;
+  unsigned nbbs = loop->num_nodes;
   unsigned int vectorization_factor = 0;
   tree scalar_type;
   gphi *phi;
   tree vectype;
   unsigned int nunits;
   stmt_vec_info stmt_info;
-  int i;
+  unsigned i;
   HOST_WIDE_INT dummy;
   gimple stmt, pattern_stmt = NULL;
   gimple_seq pattern_def_seq = NULL;
   gimple_stmt_iterator pattern_def_si = gsi_none ();
   bool analyze_pattern_stmt = false;
+  bool bool_result;
+  auto_vec<stmt_vec_info> mask_producers;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -424,6 +426,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
               return false;
             }
 
+          bool_result = false;
+
           if (STMT_VINFO_VECTYPE (stmt_info))
             {
               /* The only case when a vectype had been already set is for stmts
@@ -444,6 +448,30 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
                 scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
               else
                 scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+
+              /* Bool ops don't participate in vectorization factor
+                 computation.  For comparison use compared types to
+                 compute a factor.  */
+              if (scalar_type == boolean_type_node)
+                {
+                  mask_producers.safe_push (stmt_info);
+                  bool_result = true;
+
+                  if (gimple_code (stmt) == GIMPLE_ASSIGN
+                      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+                         == tcc_comparison
+                      && TREE_TYPE (gimple_assign_rhs1 (stmt))
+                         != boolean_type_node)
+                    scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+                  else
+                    {
+                      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
+                        {
+                          pattern_def_seq = NULL;
+                          gsi_next (&si);
+                        }
+                      continue;
+                    }
+                }
 
               if (dump_enabled_p ())
                 {
                   dump_printf_loc (MSG_NOTE, vect_location,
@@ -466,7 +494,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
                   return false;
                 }
 
-              STMT_VINFO_VECTYPE (stmt_info) = vectype;
+              if (!bool_result)
+                STMT_VINFO_VECTYPE (stmt_info) = vectype;
 
               if (dump_enabled_p ())
                 {
@@ -479,8 +508,9 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
           /* The vectorization factor is according to the smallest
              scalar type (or the largest vector size, but we only
              support one vector size per loop).  */
-          scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
-                                                       &dummy);
+          if (!bool_result)
+            scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
+                                                         &dummy);
           if (dump_enabled_p ())
             {
               dump_printf_loc (MSG_NOTE, vect_location,
@@ -555,6 +585,100 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
     }
 
   LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
+
+  for (i = 0; i < mask_producers.length (); i++)
+    {
+      tree mask_type = NULL;
+      bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (mask_producers[i]);
+
+      stmt = STMT_VINFO_STMT (mask_producers[i]);
+
+      if (gimple_code (stmt) == GIMPLE_ASSIGN
+          && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+          && TREE_TYPE (gimple_assign_rhs1 (stmt)) != boolean_type_node)
+        {
+          scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+          mask_type = get_mask_type_for_scalar_type (scalar_type);
+
+          if (!mask_type)
+            {
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                 "not vectorized: unsupported mask\n");
+              return false;
+            }
+        }
+      else
+        {
+          tree rhs, def;
+          ssa_op_iter iter;
+          gimple def_stmt;
+          enum vect_def_type dt;
+
+          FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
+            {
+              if (!vect_is_simple_use_1 (rhs, stmt, loop_vinfo, bb_vinfo,
+                                         &def_stmt, &def, &dt, &vectype))
+                {
+                  if (dump_enabled_p ())
+                    {
+                      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                       "not vectorized: can't compute mask type "
+                                       "for statement, ");
+                      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
+                                        0);
+                      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+                    }
+                  return false;
+                }
+
+              /* No vectype probably means external definition.
+                 Allow it in case there is another operand which
+                 allows to determine mask type.  */
+              if (!vectype)
+                continue;
+
+              if (!mask_type)
+                mask_type = vectype;
+              else if (TYPE_VECTOR_SUBPARTS (mask_type)
+                       != TYPE_VECTOR_SUBPARTS (vectype))
+                {
+                  if (dump_enabled_p ())
+                    {
+                      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                       "not vectorized: different sized masks "
+                                       "types in statement, ");
+                      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+                                         mask_type);
+                      dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+                      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+                                         vectype);
+                      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+                    }
+                  return false;
+                }
+            }
+        }
+
+      /* No mask_type should mean loop invariant predicate.
+         This is probably a subject for optimization in
+         if-conversion.  */
+      if (!mask_type)
+        {
+          if (dump_enabled_p ())
+            {
+              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                               "not vectorized: can't compute mask type "
+                               "for statement, ");
+              dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
+                                0);
+              dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+            }
+          return false;
+        }
+
+      STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;
+    }
 
   return true;
 }
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f87c066..41fb401 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1411,7 +1411,15 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
     /* Case 1: operand is a constant.  */
     case vect_constant_def:
       {
-        vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+        if (TREE_TYPE (op) == boolean_type_node)
+          {
+            vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
+            nunits = TYPE_VECTOR_SUBPARTS (vector_type);
+            vector_type = build_truth_vector_type (nunits, current_vector_size);
+          }
+        else
+          vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+
        gcc_assert (vector_type);
        nunits = TYPE_VECTOR_SUBPARTS (vector_type);
@@ -1758,6 +1766,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree mask_vectype;
   tree elem_type;
   gimple new_stmt;
   tree dummy;
@@ -1785,8 +1794,8 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE;
   mask = gimple_call_arg (stmt, 2);
-  if (TYPE_PRECISION (TREE_TYPE (mask))
-      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))))
+
+  if (TREE_TYPE (mask) != boolean_type_node)
     return false;
 
   /* FORNOW. This restriction should be relaxed.  */
@@ -1815,6 +1824,19 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   if (STMT_VINFO_STRIDED_P (stmt_info))
     return false;
 
+  if (TREE_CODE (mask) != SSA_NAME)
+    return false;
+
+  if (!vect_is_simple_use_1 (mask, stmt, loop_vinfo, NULL,
+                             &def_stmt, &def, &dt, &mask_vectype))
+    return false;
+
+  if (!mask_vectype)
+    mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
+
+  if (!mask_vectype)
+    return false;
+
   if (STMT_VINFO_GATHER_P (stmt_info))
     {
       gimple def_stmt;
@@ -1848,14 +1870,9 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
                            : DR_STEP (dr), size_zero_node) <= 0)
     return false;
   else if (!VECTOR_MODE_P (TYPE_MODE (vectype))
-           || !can_vec_mask_load_store_p (TYPE_MODE (vectype), !is_store))
-    return false;
-
-  if (TREE_CODE (mask) != SSA_NAME)
-    return false;
-
-  if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL,
-                           &def_stmt, &def, &dt))
+           || !can_vec_mask_load_store_p (TYPE_MODE (vectype),
+                                          TYPE_MODE (mask_vectype),
+                                          !is_store))
     return false;
 
   if (is_store)
@@ -7373,6 +7390,201 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
+
+bool
+vectorizable_comparison (gimple stmt, gimple_stmt_iterator *gsi,
+                         gimple *vec_stmt, tree reduc_def,
+                         slp_tree slp_node)
+{
+  tree lhs, rhs1, rhs2;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype1, vectype2;
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
+  tree vec_compare;
+  tree new_temp;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  tree def;
+  enum vect_def_type dt, dts[4];
+  unsigned nunits;
+  int ncopies;
+  enum tree_code code;
+  stmt_vec_info prev_stmt_info = NULL;
+  int i, j;
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  vec<tree> vec_oprnds0 = vNULL;
+  vec<tree> vec_oprnds1 = vNULL;
+  tree mask_type;
+  tree mask;
+
+  if (TREE_TYPE (vectype) != boolean_type_node)
+    return false;
+
+  mask_type = vectype;
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+    ncopies = 1;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+           && reduc_def))
+    return false;
+
+  if (STMT_VINFO_LIVE_P (stmt_info))
+    {
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                         "value used after loop.\n");
+      return false;
+    }
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  code = gimple_assign_rhs_code (stmt);
+
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (rhs1) == SSA_NAME)
+    {
+      gimple rhs1_def_stmt = SSA_NAME_DEF_STMT (rhs1);
+      if (!vect_is_simple_use_1 (rhs1, stmt, loop_vinfo, bb_vinfo,
+                                 &rhs1_def_stmt, &def, &dt, &vectype1))
+        return false;
+    }
+  else if (TREE_CODE (rhs1) != INTEGER_CST && TREE_CODE (rhs1) != REAL_CST
+           && TREE_CODE (rhs1) != FIXED_CST)
+    return false;
+
+  if (TREE_CODE (rhs2) == SSA_NAME)
+    {
+      gimple rhs2_def_stmt = SSA_NAME_DEF_STMT (rhs2);
+      if (!vect_is_simple_use_1 (rhs2, stmt, loop_vinfo, bb_vinfo,
+                                 &rhs2_def_stmt, &def, &dt, &vectype2))
+        return false;
+    }
+  else if (TREE_CODE (rhs2) != INTEGER_CST && TREE_CODE (rhs2) != REAL_CST
+           && TREE_CODE (rhs2) != FIXED_CST)
+    return false;
+
+  vectype = vectype1 ? vectype1 : vectype2;
+
+  if (!vectype
+      || nunits != TYPE_VECTOR_SUBPARTS (vectype))
+    return false;
+
+  if (!vec_stmt)
+    {
+      STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+      return expand_vec_cmp_expr_p (vectype, mask_type);
+    }
+
+  /* Transform.  */
+  if (!slp_node)
+    {
+      vec_oprnds0.create (1);
+      vec_oprnds1.create (1);
+    }
+
+  /* Handle def.  */
+  lhs = gimple_assign_lhs (stmt);
+  mask = vect_create_destination_var (lhs, mask_type);
+
+  /* Handle cmp expr.  */
+  for (j = 0; j < ncopies; j++)
+    {
+      gassign *new_stmt = NULL;
+      if (j == 0)
+        {
+          if (slp_node)
+            {
+              auto_vec<tree, 2> ops;
+              auto_vec<vec<tree>, 2> vec_defs;
+
+              ops.safe_push (rhs1);
+              ops.safe_push (rhs2);
+              vect_get_slp_defs (ops, slp_node, &vec_defs, -1);
+              vec_oprnds1 = vec_defs.pop ();
+              vec_oprnds0 = vec_defs.pop ();
+
+              ops.release ();
+              vec_defs.release ();
+            }
+          else
+            {
+              gimple gtemp;
+              vec_rhs1
+                = vect_get_vec_def_for_operand (rhs1, stmt, NULL);
+              vect_is_simple_use (rhs1, stmt, loop_vinfo, NULL,
+                                  &gtemp, &def, &dts[0]);
+              vec_rhs2 =
+                vect_get_vec_def_for_operand (rhs2, stmt, NULL);
+              vect_is_simple_use (rhs2, stmt, loop_vinfo, NULL,
+                                  &gtemp, &def, &dts[1]);
+            }
+        }
+      else
+        {
+          vec_rhs1 = vect_get_vec_def_for_stmt_copy (dts[0],
+                                                     vec_oprnds0.pop ());
+          vec_rhs2 = vect_get_vec_def_for_stmt_copy (dts[1],
+                                                     vec_oprnds1.pop ());
+        }
+
+      if (!slp_node)
+        {
+          vec_oprnds0.quick_push (vec_rhs1);
+          vec_oprnds1.quick_push (vec_rhs2);
+        }
+
+      /* Arguments are ready.  Create the new vector stmt.  */
+      FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_rhs1)
+        {
+          vec_rhs2 = vec_oprnds1[i];
+
+          vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2);
+          new_stmt = gimple_build_assign (mask, vec_compare);
+          new_temp = make_ssa_name (mask, new_stmt);
+          gimple_assign_set_lhs (new_stmt, new_temp);
+          vect_finish_stmt_generation (stmt, new_stmt, gsi);
+          if (slp_node)
+            SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+        }
+
+      if (slp_node)
+        continue;
+
+      if (j == 0)
+        STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+      else
+        STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+      prev_stmt_info = vinfo_for_stmt (new_stmt);
+    }
+
+  vec_oprnds0.release ();
+  vec_oprnds1.release ();
+
+  return true;
+}
 
 /* Make sure the statement is vectorizable.  */
 
@@ -7576,7 +7788,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
           || vectorizable_call (stmt, NULL, NULL, node)
           || vectorizable_store (stmt, NULL, NULL, node)
           || vectorizable_reduction (stmt, NULL, NULL, node)
-          || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+          || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+          || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
   else
     {
       if (bb_vinfo)
@@ -7588,7 +7801,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
               || vectorizable_load (stmt, NULL, NULL, node, NULL)
               || vectorizable_call (stmt, NULL, NULL, node)
               || vectorizable_store (stmt, NULL, NULL, node)
-              || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+              || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+              || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
     }
 
   if (!ok)
@@ -7704,6 +7918,11 @@ vect_transform_stmt (gimple stmt, gimple_stmt_iterator *gsi,
       gcc_assert (done);
       break;
 
+    case comparison_vec_info_type:
+      done = vectorizable_comparison (stmt, gsi, &vec_stmt, NULL, slp_node);
+      gcc_assert (done);
+      break;
+
     case call_vec_info_type:
       done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node);
       stmt = gsi_stmt (*gsi);
@@ -8038,6 +8257,23 @@ get_vectype_for_scalar_type (tree scalar_type)
   return vectype;
 }
 
+/* Function get_mask_type_for_scalar_type.
+
+   Returns the mask type corresponding to a result of comparison
+   of vectors of specified SCALAR_TYPE as supported by target.  */
+
+tree
+get_mask_type_for_scalar_type (tree scalar_type)
+{
+  tree vectype = get_vectype_for_scalar_type (scalar_type);
+
+  if (!vectype)
+    return NULL;
+
+  return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype),
+                                  current_vector_size);
+}
+
 /* Function get_same_sized_vectype
 
    Returns a vector type corresponding to SCALAR_TYPE of size
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 58e8f10..94aea1a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -28,7 +28,8 @@ along with GCC; see the file COPYING3.  If not see
 enum vect_var_kind {
   vect_simple_var,
   vect_pointer_var,
-  vect_scalar_var
+  vect_scalar_var,
+  vect_mask_var
 };
 
 /* Defines type of operation.  */
@@ -482,6 +483,7 @@ enum stmt_vec_info_type {
   call_simd_clone_vec_info_type,
   assignment_vec_info_type,
   condition_vec_info_type,
+  comparison_vec_info_type,
   reduc_vec_info_type,
   induc_vec_info_type,
   type_promotion_vec_info_type,
@@ -995,6 +997,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 /* In tree-vect-stmts.c.  */
 extern unsigned int current_vector_size;
 extern tree get_vectype_for_scalar_type (tree);
+extern tree get_mask_type_for_scalar_type (tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_is_simple_use (tree, gimple, loop_vec_info,
                                 bb_vec_info, gimple *,
diff --git a/gcc/tree.c b/gcc/tree.c
index af3a6a3..30398e5 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -10568,6 +10568,20 @@ build_vector_type (tree innertype, int nunits)
   return make_vector_type (innertype, nunits, VOIDmode);
 }
 
+/* Build truth vector with specified length and number of units.  */
+
+tree
+build_truth_vector_type (unsigned nunits, unsigned vector_size)
+{
+  machine_mode mask_mode = targetm.vectorize.get_mask_mode (nunits,
+                                                            vector_size);
+
+  if (mask_mode == VOIDmode)
+    return NULL;
+
+  return make_vector_type (boolean_type_node, nunits, mask_mode);
+}
+
 /* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set.  */
 
 tree
@@ -11054,9 +11068,10 @@ truth_type_for (tree type)
 {
   if (TREE_CODE (type) == VECTOR_TYPE)
     {
-      tree elem = lang_hooks.types.type_for_size
-        (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (type))), 0);
-      return build_opaque_vector_type (elem, TYPE_VECTOR_SUBPARTS (type));
+      if (TREE_TYPE (type) == boolean_type_node)
+        return type;
+      return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (type),
+                                      GET_MODE_SIZE (TYPE_MODE (type)));
     }
   else
     return boolean_type_node;
diff --git a/gcc/tree.h b/gcc/tree.h
index 2cd6ec4..1657e06 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3820,6 +3820,7 @@ extern tree build_reference_type_for_mode (tree, machine_mode, bool);
 extern tree build_reference_type (tree);
 extern tree build_vector_type_for_mode (tree, machine_mode);
 extern tree build_vector_type (tree innertype, int nunits);
+extern tree build_truth_vector_type (unsigned, unsigned);
 extern tree build_opaque_vector_type (tree innertype, int nunits);
 extern tree build_index_type (tree);
 extern tree build_array_type (tree, tree);