diff mbox series

[v1] RISC-V: Remove the type size restriction of vectorizer

Message ID 20231018012009.849697-1-pan2.li@intel.com
State New
Headers show
Series [v1] RISC-V: Remove the type size restriction of vectorizer | expand

Commit Message

Li, Pan2 Oct. 18, 2023, 1:20 a.m. UTC
From: Pan Li <pan2.li@intel.com>

The vectoriable_call has one restriction of the size of data type.
Aka DF to DI is allowed but SF to DI isn't. You may see below message
when try to vectorize function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

Then the standard name pattern like lrintmn2 cannot work for different
data type size like SF => DI. This patch would like to remove this data
type size check and unblock the standard name like lrintmn2.

Passed the x86 bootstrap and regression test already.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_call): Remove data size
	check.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/tree-vect-stmts.cc | 13 -------------
 1 file changed, 13 deletions(-)

Comments

Richard Biener Oct. 18, 2023, 6:34 a.m. UTC | #1
On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> Passed the x86 bootstrap and regression test already.

OK.

On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP
vectorization we could expect to see the following vectorized after
the patch (with loop vectorization you'll see us pre-select same sized
vector types)

long int x[4];
float y[4];

void foo ()
{
  x[0] = __builtin_lrintf (y[0]);
  x[1] = __builtin_lrintf (y[1]);
  x[2] = __builtin_lrintf (y[2]);
  x[3] = __builtin_lrintf (y[3]);
}


> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_call): Remove data size
>         check.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a56498595..326e000a71d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>
Li, Pan2 Oct. 18, 2023, 11:50 a.m. UTC | #2
Thanks Richard, let's wait for a while incase there are comments from others due to not familiar with these parts.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Wednesday, October 18, 2023 2:34 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer

On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> Passed the x86 bootstrap and regression test already.

OK.

On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP
vectorization we could expect to see the following vectorized after
the patch (with loop vectorization you'll see us pre-select same sized
vector types)

long int x[4];
float y[4];

void foo ()
{
  x[0] = __builtin_lrintf (y[0]);
  x[1] = __builtin_lrintf (y[1]);
  x[2] = __builtin_lrintf (y[2]);
  x[3] = __builtin_lrintf (y[3]);
}


> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_call): Remove data size
>         check.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a56498595..326e000a71d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>
Li, Pan2 Oct. 20, 2023, 8:43 a.m. UTC | #3
Hi Richard Biener,

The CI of linaro-toolchain@lists.linaro.org reports some aarch64 regression of this change, I will double check about it soon.

FAIL: 12 regressions

regressions.sum:
		=== gcc tests ===

Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
FAIL: gcc.target/aarch64/sve/clrsb_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284)
FAIL: gcc.target/aarch64/sve/clrsb_1.c (test for excess errors)
FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tcls\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2
FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
FAIL: gcc.target/aarch64/sve/clz_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284)
FAIL: gcc.target/aarch64/sve/clz_1.c (test for excess errors)
FAIL: gcc.target/aarch64/sve/clz_1.c scan-assembler-times \\tclz\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2
... and 7 more entries

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Wednesday, October 18, 2023 2:34 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer

On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> Passed the x86 bootstrap and regression test already.

OK.

On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP
vectorization we could expect to see the following vectorized after
the patch (with loop vectorization you'll see us pre-select same sized
vector types)

long int x[4];
float y[4];

void foo ()
{
  x[0] = __builtin_lrintf (y[0]);
  x[1] = __builtin_lrintf (y[1]);
  x[2] = __builtin_lrintf (y[2]);
  x[3] = __builtin_lrintf (y[3]);
}


> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_call): Remove data size
>         check.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a56498595..326e000a71d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>
juzhe.zhong@rivai.ai Oct. 20, 2023, 8:49 a.m. UTC | #4
/* We only enable VLS modes for VLA vectorization since fixed length VLMAX mode
-   is the highest priority choice and should not conflict with VLS modes.  */
-#define TARGET_VECTOR_VLS                                                      \
-  (TARGET_VECTOR && riscv_autovec_preference == RVV_SCALABLE)
+   is the highest priority choice and should not conflict with VLS modes.
+
+   We also enable VLS modes for some cases in fixed-vlmax, aka the bitsize of
+   the VLS mode are smaller than the minimal vla.
+   */
+#define TARGET_VECTOR_VLS(mode)                                                \
+  (TARGET_VECTOR                                                               \
+    && (riscv_autovec_preference == RVV_SCALABLE                               \
+      || (riscv_autovec_preference == RVV_FIXED_VLMAX                          \
+	&& riscv_vector::legal_vls_mode_in_fixed_vlmax_p (mode))))

Remove this macro. Create a function called "vls_mode_valid_p"
and use riscv_vector::vls_mode_valid_p in all places.

Indicate this PR in commit title: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111857

-> RISC-V: Support partial VLS mode when preference fixed-vlmax[PR111857]

+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vls-mode-6.c
Change  all tests name into pr111857-1.c...etc.

juzhe.zhong@rivai.ai
 
From: Li, Pan2
Date: 2023-10-20 16:43
To: Richard Biener
CC: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang; kito.cheng@gmail.com; Liu, Hongtao
Subject: RE: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer
Hi Richard Biener,
 
The CI of linaro-toolchain@lists.linaro.org reports some aarch64 regression of this change, I will double check about it soon.
 
FAIL: 12 regressions
 
regressions.sum:
=== gcc tests ===
 
Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
FAIL: gcc.target/aarch64/sve/clrsb_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284)
FAIL: gcc.target/aarch64/sve/clrsb_1.c (test for excess errors)
FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tcls\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2
FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
FAIL: gcc.target/aarch64/sve/clz_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284)
FAIL: gcc.target/aarch64/sve/clz_1.c (test for excess errors)
FAIL: gcc.target/aarch64/sve/clz_1.c scan-assembler-times \\tclz\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2
... and 7 more entries
 
Pan
 
-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com>
Sent: Wednesday, October 18, 2023 2:34 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer
 
On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> Passed the x86 bootstrap and regression test already.
 
OK.
 
On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP
vectorization we could expect to see the following vectorized after
the patch (with loop vectorization you'll see us pre-select same sized
vector types)
 
long int x[4];
float y[4];
 
void foo ()
{
  x[0] = __builtin_lrintf (y[0]);
  x[1] = __builtin_lrintf (y[1]);
  x[2] = __builtin_lrintf (y[2]);
  x[3] = __builtin_lrintf (y[3]);
}
 
 
> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_call): Remove data size
>         check.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a56498595..326e000a71d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>
diff mbox series

Patch

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b3a56498595..326e000a71d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3529,19 +3529,6 @@  vectorizable_call (vec_info *vinfo,
 
       return false;
     }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
-     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
-     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
-     by a pack of the two vectors into an SI vector.  We would need
-     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "mismatched vector sizes %T and %T\n",
-			 vectype_in, vectype_out);
-      return false;
-    }
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
       != VECTOR_BOOLEAN_TYPE_P (vectype_in))