[v2] VECT: Remove the type size restriction of vectorizer

Message ID 20231026021816.1633907-1-pan2.li@intel.com
State New

Commit Message

Li, Pan2 Oct. 26, 2023, 2:18 a.m. UTC
From: Pan Li <pan2.li@intel.com>

Update in v2:

* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.

Original log:

The vectorizable_call has one restriction on the size of the data type.
That is, DF to DI is allowed but SF to DI isn't. You may see the message
below when trying to vectorize a function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

As a result, a standard name pattern like lrintmn2 cannot work when the
data type sizes differ, like SF => DI. This patch removes this data
type size check and unblocks standard names like lrintmn2.

The tests below pass with this patch.

* The x86 bootstrap and regression test.
* The aarch64 regression test.
* The risc-v regression tests.

gcc/ChangeLog:

	* internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
	* tree-vect-stmts.cc (vectorizable_call): Remove size check.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
	* gcc.target/aarch64/sve/clz_1.c: Ditto.
	* gcc.target/aarch64/sve/popcount_1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/internal-fn.cc                                  |  3 ++-
 gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
 gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
 gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
 .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
 gcc/tree-vect-stmts.cc                              | 13 -------------
 6 files changed, 6 insertions(+), 21 deletions(-)
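
To make the size relationship in the log concrete, here is a minimal sketch in C. It is not GCC code: struct vec_type and both helpers are hypothetical stand-ins, and it assumes an LP64 target where long (DI) is 64 bits. It only models the comparison that the removed check performed when input and output vectors have the same number of lanes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a vector type: element width and lane count.
   This is not GCC's tree representation.  */
struct vec_type
{
  unsigned elem_bits;
  unsigned nunits;
};

/* Total size in bits, loosely analogous to TYPE_SIZE of a vector type.  */
unsigned
vec_size_bits (struct vec_type t)
{
  return t.elem_bits * t.nunits;
}

/* Models the removed test: reject the call when the input and output
   vectors differ in total size.  */
bool
old_check_rejects (struct vec_type in, struct vec_type out)
{
  return vec_size_bits (in) != vec_size_bits (out);
}
```

With two lanes each, a DF input (64-bit elements) and a DI output have the same total size, so the modeled check accepts them. An SF input (32-bit elements) with a DI output differs in total size, so it is rejected, which matches the lrintf case in the log.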

Comments

Richard Biener Oct. 26, 2023, 8:37 a.m. UTC | #1
On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> Update in v2:
>
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> The vectorizable_call has one restriction on the size of the data type.
> That is, DF to DI is allowed but SF to DI isn't. You may see the message
> below when trying to vectorize a function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> As a result, a standard name pattern like lrintmn2 cannot work when the
> data type sizes differ, like SF => DI. This patch removes this data
> type size check and unblocks standard names like lrintmn2.
>
> The tests below pass with this patch.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression tests.
>
> gcc/ChangeLog:
>
>         * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
>         * tree-vect-stmts.cc (vectorizable_call): Remove size check.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
>         * gcc.target/aarch64/sve/clz_1.c: Ditto.
>         * gcc.target/aarch64/sve/popcount_1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/internal-fn.cc                                  |  3 ++-
>  gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
>  gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
>  gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
>  .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
>  gcc/tree-vect-stmts.cc                              | 13 -------------
>  6 files changed, 6 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 61d5a9e4772..17c0f4c3805 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>         emit_move_insn (lhs_rtx, ops[0].value);
>        else
>         {
> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));

Can you explain why this is necessary?  In particular what is lhs_rtx
mode vs ops[0].value mode?

>           convert_move (lhs_rtx, ops[0].value, 0);

I'm not sure convert_move handles vector modes correctly.  Richard
probably added this code, CCed.

Richard.

>         }
>      }
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> index bdc9856faaf..940d08bbc7b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>  }
>
>  /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> index 0c7a4e6d768..58b8ff406d2 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>  }
>
>  /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> index dfb6f4ac7a5..0eba898307c 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>  }
>
>  /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> index 585a522aa81..e6e3c70f927 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> @@ -1461,4 +1461,4 @@ main ()
>    RUN_ALL ()
>  }
>
> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..fa4ca0634e8 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>
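
The FORNOW comment removed in the last hunk describes two shapes for a DI->SI call. A scalar sketch in C may help compare them. This is an illustration only, not vectorizer output: LANES and both function names are hypothetical, inner loops stand in for vector instructions, and tail elements are ignored, so n is assumed to be a multiple of 2 * LANES.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 2 /* Hypothetical number of 64-bit lanes per "vector".  */

/* Traditional shape: two 64-bit-lane popcounts, then a pack of the two
   result vectors into one vector of 32-bit lanes.  */
void
popcount_pack (uint32_t *dst, const uint64_t *src, size_t n)
{
  for (size_t i = 0; i + 2 * LANES <= n; i += 2 * LANES)
    {
      uint64_t lo[LANES], hi[LANES];
      for (size_t j = 0; j < LANES; j++)
        {
          lo[j] = (uint64_t) __builtin_popcountll (src[i + j]);
          hi[j] = (uint64_t) __builtin_popcountll (src[i + LANES + j]);
        }
      /* Pack step: narrow both 64-bit result vectors to 32-bit lanes.  */
      for (size_t j = 0; j < LANES; j++)
        {
          dst[i + j] = (uint32_t) lo[j];
          dst[i + LANES + j] = (uint32_t) hi[j];
        }
    }
}

/* Direct shape: one popcount per input vector whose result already has
   32-bit elements, with no separate pack step.  */
void
popcount_direct (uint32_t *dst, const uint64_t *src, size_t n)
{
  for (size_t i = 0; i + LANES <= n; i += LANES)
    for (size_t j = 0; j < LANES; j++)
      dst[i + j] = (uint32_t) __builtin_popcountll (src[i + j]);
}
```

Both functions compute the same values. The difference is only in how many wide operations and pack steps are needed, which is what the adjusted scan-assembler-times counts in the aarch64 tests above are checking.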
Li, Pan2 Oct. 26, 2023, 11:59 a.m. UTC | #2
Thanks Richard for comments.

> Can you explain why this is necessary?  In particular what is lhs_rtx
> mode vs ops[0].value mode?

For testcase gcc.target/aarch64/sve/popcount_1.c, the RTL is listed below.

The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
The ops[0].value is (reg:VNx2DI 104).

Removing the restriction lets the vector RTL reach expand_fn_using_insn, where it hits the INTEGRAL_TYPE_P assertion.

Pan
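
The situation described above can be modeled with a toy sketch in C. Everything here is hypothetical (the toy_* names, the enums, and the allow_vector_int flag), and it assumes that the branch is selected by comparing the two modes, which the quoted hunk does not show. It says nothing about whether convert_move is valid for vector modes; that is the open question in this thread.

```c
#include <stdbool.h>

/* Toy machine modes, named after the ones mentioned in this thread.  */
enum toy_mode { TOY_SI, TOY_DI, TOY_VNx2SI, TOY_VNx2DI };

/* What the modeled tail of the expander does with the result.  */
enum toy_action { TOY_MOVE, TOY_CONVERT, TOY_ASSERT_FAIL };

bool
toy_is_vector (enum toy_mode m)
{
  return m == TOY_VNx2SI || m == TOY_VNx2DI;
}

/* Models storing an instruction result into the lhs for integer types.
   allow_vector_int corresponds to the assertion as widened by the patch.  */
enum toy_action
toy_store_result (enum toy_mode lhs_mode, enum toy_mode result_mode,
                  bool allow_vector_int)
{
  if (lhs_mode == result_mode)
    return TOY_MOVE;            /* Same mode: plain move.  */
  if (toy_is_vector (lhs_mode) && !allow_vector_int)
    return TOY_ASSERT_FAIL;     /* Scalar-only assertion fires.  */
  return TOY_CONVERT;           /* Mode mismatch: conversion path.  */
}
```

In this model, an lhs of VNx2SI with a result of VNx2DI trips the scalar-only assertion, and with the widened assertion it falls through to the conversion path instead.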

Richard Biener Oct. 26, 2023, 1:59 p.m. UTC | #3
> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>:
> 
> Thanks Richard for comments.
> 
>> Can you explain why this is necessary?  In particular what is lhs_rtx
>> mode vs ops[0].value mode?
> 
> For testcase gcc.target/aarch64/sve/popcount_1.c, the RTL is listed below.
> 
> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
> The ops[0].value is (reg:VNx2DI 104).
> 
> Removing the restriction lets the vector RTL reach expand_fn_using_insn, where it hits the INTEGRAL_TYPE_P assertion.

But I think this shows we mis-selected the optab; a convert_move is certainly not correct unconditionally here (the target might not support that).

> Pan
> 
Li, Pan2 Oct. 26, 2023, 2:42 p.m. UTC | #4
> But I think this shows we mis-selected the optab; a convert_move is certainly not correct unconditionally here (the target might not support that).

Makes sense, we can wait a while for confirmation from Richard S.

If convert_move is not designed for vectors (looks like mostly up to a point), I am not sure whether we can fix the assertion as below.

...
else if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)))
  return;
else
  {
    gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
    convert_move (lhs_rtx, ops[0].value, 0);
  }

That is, bypass the vector case here, but I am afraid this change may stop llrintf (SF => DI) from working via the standard name.
Let me have a try and keep you posted.

Pan
  

Richard Sandiford Oct. 26, 2023, 5:46 p.m. UTC | #5
Richard Biener <richard.guenther@gmail.com> writes:
>> On 26.10.2023 at 13:59, Li, Pan2 <pan2.li@intel.com> wrote:
>> 
>> Thanks Richard for comments.
>> 
>>> Can you explain why this is necessary?  In particular what is lhs_rtx
>>> mode vs ops[0].value mode?
>> 
>> For the testcase gcc.target/aarch64/sve/popcount_1.c, the RTL is as listed below.
>> 
>> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
>> The ops[0].value is (reg:VNx2DI 104).
>> 
>> Removing the restriction lets the vector RTL reach expand_fn_using_insn, where it hits the INTEGRAL_TYPE_P assertion.
>
> But I think this shows we mis-selected the optab; a convert_move is certainly not correct unconditionally here (the target might not support that).

Agreed.  Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)
makes sense if the called function allows the input and output modes
to vary.  That's true for internal functions that eventually map to
two-mode optabs.  But we can't remove the condition for calls to
other functions, at least not without some fix-ups.

ISTM that the problem being hit is the one described by the removed
comment.

In other words, I don't think simply removing the test from the vectoriser
is correct.  It needs to be replaced by something more selective.
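
To make the removed comment concrete, here is a scalar C++ sketch of the traditional strategy for a popcount with 64-bit inputs and 32-bit outputs: two same-size operations on 64-bit lanes followed by a pack into 32-bit lanes.  This is only a model, not GCC code: fixed-size arrays stand in for vector registers, and the names `Vec2xU64`, `Vec4xU32`, `popcount_lanes`, `pack_low_halves` and `popcount_u64_to_u32` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy stand-ins for vector registers of equal total size (128 bits):
// two 64-bit lanes versus four 32-bit lanes.  Hypothetical names.
using Vec2xU64 = std::array<std::uint64_t, 2>;
using Vec4xU32 = std::array<std::uint32_t, 4>;

// Same-size operation: each 64-bit lane yields a 64-bit count
// (the analogue of a VnDI->VnDI popcount).
Vec2xU64 popcount_lanes(const Vec2xU64 &in) {
  Vec2xU64 out{};
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint64_t v = in[i], n = 0;
    for (; v != 0; v &= v - 1)  // clear the lowest set bit each round
      ++n;
    assert(n <= 64);  // so the later narrowing to 32 bits is lossless
    out[i] = n;
  }
  return out;
}

// Pack step: narrow two 64-bit-lane vectors into one 32-bit-lane vector.
Vec4xU32 pack_low_halves(const Vec2xU64 &lo, const Vec2xU64 &hi) {
  return {static_cast<std::uint32_t>(lo[0]), static_cast<std::uint32_t>(lo[1]),
          static_cast<std::uint32_t>(hi[0]), static_cast<std::uint32_t>(hi[1])};
}

// Four 64-bit inputs -> four 32-bit outputs: two same-size ops plus a pack.
Vec4xU32 popcount_u64_to_u32(const std::array<std::uint64_t, 4> &src) {
  Vec2xU64 lo = popcount_lanes({src[0], src[1]});
  Vec2xU64 hi = popcount_lanes({src[2], src[3]});
  return pack_low_halves(lo, hi);
}
```

In this model the pack is a separate, explicit step.  A direct 64-bit-to-32-bit operation would have to fold that narrowing into the operation itself, which is the "separate code" the removed comment says would be needed.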

Thanks,
Richard

>> Pan
>> 
>> -----Original Message-----
>> From: Richard Biener <richard.guenther@gmail.com> 
>> Sent: Thursday, October 26, 2023 4:38 PM
>> To: Li, Pan2 <pan2.li@intel.com>
>> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
>> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
>> 
>>> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote:
>>> 
>>> From: Pan Li <pan2.li@intel.com>
>>> 
>>> Update in v2:
>>> 
>>> * Fix one ICE of type assertion.
>>> * Adjust some test cases for aarch64 sve and riscv vector.
>>> 
>>> Original log:
>>> 
>>> The vectorizable_call has one restriction on the size of the data type:
>>> DF to DI is allowed but SF to DI isn't. You may see the message below
>>> when trying to vectorize a function call like lrintf.
>>> 
>>> void
>>> test_lrintf (long *out, float *in, unsigned count)
>>> {
>>>  for (unsigned i = 0; i < count; i++)
>>>    out[i] = __builtin_lrintf (in[i]);
>>> }
>>> 
>>> lrintf.c:5:26: missed: couldn't vectorize loop
>>> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>>> 
>>> So a standard name pattern like lrintmn2 cannot work for differing
>>> data type sizes such as SF => DI. This patch removes the data
>>> type size check and unblocks standard names like lrintmn2.
>>> 
>>> The tests below passed with this patch.
>>> 
>>> * The x86 bootstrap and regression test.
>>> * The aarch64 regression test.
>>> * The risc-v regression tests.
>>> 
>>> gcc/ChangeLog:
>>> 
>>>        * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
>>>        * tree-vect-stmts.cc (vectorizable_call): Remove size check.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>        * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
>>>        * gcc.target/aarch64/sve/clz_1.c: Ditto.
>>>        * gcc.target/aarch64/sve/popcount_1.c: Ditto.
>>>        * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
>>> 
>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>> ---
>>> gcc/internal-fn.cc                                  |  3 ++-
>>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
>>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
>>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
>>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
>>> gcc/tree-vect-stmts.cc                              | 13 -------------
>>> 6 files changed, 6 insertions(+), 21 deletions(-)
>>> 
>>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>>> index 61d5a9e4772..17c0f4c3805 100644
>>> --- a/gcc/internal-fn.cc
>>> +++ b/gcc/internal-fn.cc
>>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>>>        emit_move_insn (lhs_rtx, ops[0].value);
>>>       else
>>>        {
>>> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
>>> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>>> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));
>> 
>> Can you explain why this is necessary?  In particular what is lhs_rtx
>> mode vs ops[0].value mode?
>> 
>>>          convert_move (lhs_rtx, ops[0].value, 0);
>> 
>> I'm not sure convert_move handles vector modes correctly.  Richard
>> probably added this code, CCed.
>> 
>> Richard.
>> 
>>>        }
>>>     }
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>>> index bdc9856faaf..940d08bbc7b 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>>> }
>>> 
>>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>>> index 0c7a4e6d768..58b8ff406d2 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>>> }
>>> 
>>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>>> index dfb6f4ac7a5..0eba898307c 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>>> }
>>> 
>>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>>> index 585a522aa81..e6e3c70f927 100644
>>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>>> @@ -1461,4 +1461,4 @@ main ()
>>>   RUN_ALL ()
>>> }
>>> 
>>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
>>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>>> index a9200767f67..fa4ca0634e8 100644
>>> --- a/gcc/tree-vect-stmts.cc
>>> +++ b/gcc/tree-vect-stmts.cc
>>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
>>> 
>>>       return false;
>>>     }
>>> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
>>> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
>>> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
>>> -     by a pack of the two vectors into an SI vector.  We would need
>>> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
>>> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
>>> -    {
>>> -      if (dump_enabled_p ())
>>> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> -                        "mismatched vector sizes %T and %T\n",
>>> -                        vectype_in, vectype_out);
>>> -      return false;
>>> -    }
>>> 
>>>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>>>       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
>>> --
>>> 2.34.1
>>>
Li, Pan2 Oct. 27, 2023, 2:17 a.m. UTC | #6
Thanks Richard S for the comments.

> In other words, I don't think simply removing the test from the vectoriser
> is correct.  It needs to be replaced by something more selective.

Does that mean we need to check here whether the internal function allows different modes/sizes?

For example, the standard name lrintmn2 (modes m and n) would be allowed here, while rintm2 (mode m only) wouldn't.

Pan

Richard Biener Oct. 27, 2023, 1:31 p.m. UTC | #7
On Fri, Oct 27, 2023 at 4:17 AM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Thanks Richard S for comments.
>
> > In other words, I don't think simply removing the test from the vectoriser
> > is correct.  It needs to be replaced by something more selective.
>
> Does that mean we need to check here whether the internal function allows different modes/sizes?
>
> For example, the standard name lrintmn2 (modes m and n) would be allowed here, while rintm2 (mode m only) wouldn't.

We need to check whether the "size" of the LHS somehow participates in the
optab query.  I think the direct_internal_fn_info type0/type1 members say
that when they are -1.  If neither is -1 (nor -2) then the LHS has to match
one of the arguments (if there are two different ones I'm not sure which
we'd pick).

So patch-wise the existing check can probably be skipped when
vectorizable_internal_function returns an IFN, but that function should
then perform the very same check when vectype_out isn't participating in
the optab selection.
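
The rule suggested above can be modelled in a few lines of standalone C++.  This is a sketch under stated assumptions, not GCC code: `FnTypeInfo` loosely mirrors the type0/type1 members of direct_internal_fn_info, assuming that a non-negative value is an argument index, -1 means "use the return type" and -2 means "not directly mapped to an optab".  The names `FnTypeInfo`, `kReturnType`, `kNotDirect` and `size_check_needed` are hypothetical.

```cpp
#include <cassert>

// Toy model of the type selectors of a direct internal function.
// Assumed convention: >= 0 is an argument index, -1 is the return type,
// -2 means the function is not directly mapped to an optab.
struct FnTypeInfo {
  int type0;
  int type1;
};

const int kReturnType = -1;
const int kNotDirect = -2;

// Returns true when the vectorizer should still insist that the input and
// output vector types have the same size.
bool size_check_needed(FnTypeInfo info) {
  assert(info.type0 >= kNotDirect && info.type1 >= kNotDirect);
  // Not mapped directly to an optab: keep the old restriction.
  if (info.type0 == kNotDirect || info.type1 == kNotDirect)
    return true;
  // The return type takes part in the optab query, so the optab is chosen
  // with knowledge of the output mode and differing sizes can be expressed.
  if (info.type0 == kReturnType || info.type1 == kReturnType)
    return false;
  // The optab is keyed on argument types only, so the output must match.
  return true;
}
```

Under these assumptions, a two-mode function in the style of lrintmn2 would be described by something like `{-1, 0}` and skip the size check, while a one-mode function in the style of rintm2 would be `{0, 0}` and keep it.  The concrete selector values are illustrative.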

Richard.

Patch

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..17c0f4c3805 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -281,7 +281,8 @@  expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
 	emit_move_insn (lhs_rtx, ops[0].value);
       else
 	{
-	  gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
+	  gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+			       || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));
 	  convert_move (lhs_rtx, ops[0].value, 0);
 	}
     }
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
index bdc9856faaf..940d08bbc7b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
@@ -18,5 +18,4 @@  clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
 }
 
 /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
index 0c7a4e6d768..58b8ff406d2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
@@ -18,5 +18,4 @@  clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
 }
 
 /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
index dfb6f4ac7a5..0eba898307c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
@@ -18,5 +18,4 @@  popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
 }
 
 /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
index 585a522aa81..e6e3c70f927 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
@@ -1461,4 +1461,4 @@  main ()
   RUN_ALL ()
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..fa4ca0634e8 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3361,19 +3361,6 @@  vectorizable_call (vec_info *vinfo,
 
       return false;
     }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
-     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
-     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
-     by a pack of the two vectors into an SI vector.  We would need
-     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "mismatched vector sizes %T and %T\n",
-			 vectype_in, vectype_out);
-      return false;
-    }
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
       != VECTOR_BOOLEAN_TYPE_P (vectype_in))