diff mbox series

vect: Tighten get_related_vectype_for_scalar_type

Message ID mptbks8jx6v.fsf@arm.com
State New
Headers show
Series vect: Tighten get_related_vectype_for_scalar_type | expand

Commit Message

Richard Sandiford Aug. 25, 2022, 12:59 p.m. UTC
Builds of glibc with SVE enabled have been failing since V1DI was added
to the aarch64 port.  The problem is that BB SLP starts the (hopeless)
attempt to use variable-length modes to vectorise a single-element
vector, and that now gets further than it did before.

Initially we tried getting a vector mode with 1 + 1X DI elements
(i.e. 1 DI per 128-bit vector chunk).  We don't provide such a mode --
it would be VNx1DI -- because it isn't a native SVE format.  We then
try just 1 DI, which previously failed but now succeeds.

There are numerous ways we could fix this.  Perhaps the most obvious
would be to skip variable-length modes for BB SLP.  However, I think
that'd just be kicking the can down the road, since eventually we want
to support BB SLP and VLA vectors using predication.

However, if we do use VLA vectors for BB SLP, the vector modes
we use should actually be variable length.  We don't want to use
variable-length vectors for some element types/group sizes and
fixed-length vectors for others, since it would be difficult
to handle the seams.

The same principle applies during loop vectorisation.  We can't
use a mixture of variable-length and fixed-length vectors for
the same loop because the relative unroll/vectorisation factors
would not be constant (compile-time) multiples of each other.

This patch therefore makes get_related_vectype_for_scalar_type
check that the provided number of units is interoperable with
the provided prevailing mode.  The function is generally quite
forgiving -- it does basic things like checking for scalarness
itself rather than expecting callers to do them -- so the new
check feels in keeping with that.

This seems to subsume the fix for PR96974.  I'm not sure it's
worth reverting that code to an assert though, so the patch just
drops the scan for the associated message.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
	* tree-vect-stmts.cc (get_related_vectype_for_scalar_type): Check
	that the requested number of units is interoperable with the requested
	prevailing mode.

gcc/testsuite/
	* gcc.target/aarch64/sve/slp_15.c: New test.
	* g++.target/aarch64/sve/pr96974.C: Remove scan test.
---
 gcc/testsuite/g++.target/aarch64/sve/pr96974.C |  4 +---
 gcc/testsuite/gcc.target/aarch64/sve/slp_15.c  | 17 +++++++++++++++++
 gcc/tree-vect-stmts.cc                         | 10 ++++++++++
 3 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp_15.c

Comments

Richard Biener Aug. 26, 2022, 8:36 a.m. UTC | #1
On Thu, 25 Aug 2022, Richard Sandiford wrote:

> Builds of glibc with SVE enabled have been failing since V1DI was added
> to the aarch64 port.  The problem is that BB SLP starts the (hopeless)
> attempt to use variable-length modes to vectorise a single-element
> vector, and that now gets further than it did before.
> 
> Initially we tried getting a vector mode with 1 + 1X DI elements
> (i.e. 1 DI per 128-bit vector chunk).  We don't provide such a mode --
> it would be VNx1DI -- because it isn't a native SVE format.  We then
> try just 1 DI, which previously failed but now succeeds.
> 
> There are numerous ways we could fix this.  Perhaps the most obvious
> would be to skip variable-length modes for BB SLP.  However, I think
> that'd just be kicking the can down the road, since eventually we want
> to support BB SLP and VLA vectors using predication.
> 
> However, if we do use VLA vectors for BB SLP, the vector modes
> we use should actually be variable length.  We don't want to use
> variable-length vectors for some element types/group sizes and
> fixed-length vectors for others, since it would be difficult
> to handle the seams.
> 
> The same principle applies during loop vectorisation.  We can't
> use a mixture of variable-length and fixed-length vectors for
> the same loop because the relative unroll/vectorisation factors
> would not be constant (compile-time) multiples of each other.
> 
> This patch therefore makes get_related_vectype_for_scalar_type
> check that the provided number of units is interoperable with
> the provided prevailing mode.  The function is generally quite
> forgiving -- it does basic things like checking for scalarness
> itself rather than expecting callers to do them -- so the new
> check feels in keeping with that.
> 
> This seems to subsume the fix for PR96974.  I'm not sure it's
> worth reverting that code to an assert though, so the patch just
> drops the scan for the associated message.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
> 
> 
> gcc/
> 	* tree-vect-stmts.cc (get_related_vectype_for_scalar_type): Check
> 	that the requested number of units is interoperable with the requested
> 	prevailing mode.
> 
> gcc/testsuite/
> 	* gcc.target/aarch64/sve/slp_15.c: New test.
> 	* g++.target/aarch64/sve/pr96974.C: Remove scan test.
> ---
>  gcc/testsuite/g++.target/aarch64/sve/pr96974.C |  4 +---
>  gcc/testsuite/gcc.target/aarch64/sve/slp_15.c  | 17 +++++++++++++++++
>  gcc/tree-vect-stmts.cc                         | 10 ++++++++++
>  3 files changed, 28 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp_15.c
> 
> diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> index 54000f568ab..2f6ebd6ce3d 100644
> --- a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> +++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4 -fdump-tree-slp-details" } */
> +/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" } */
>  
>  float a;
>  int
> @@ -14,5 +14,3 @@ struct c {
>      }
>      int coeffs[10];
>  } f;
> -
> -/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of vector subparts between" "slp1" { target lp64 } } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_15.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_15.c
> new file mode 100644
> index 00000000000..23f6d567cc5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_15.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +
> +struct foo
> +{
> +  void *handle;
> +  void *arg;
> +};
> +
> +void
> +dlinfo_doit (struct foo *args)
> +{
> +  __UINTPTR_TYPE__ **l = args->handle;
> +
> +  *(__UINTPTR_TYPE__ *) args->arg = 0;
> +  *(__UINTPTR_TYPE__ *) args->arg = **l;
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c9dab217f05..7748c42c70f 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11486,6 +11486,16 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
>  
>    unsigned int nbytes = GET_MODE_SIZE (inner_mode);
>  
> +  /* Interoperability between modes requires one to be a constant multiple
> +     of the other, so that the number of vectors required for each operation
> +     is a compile-time constant.  */
> +  if (prevailing_mode != VOIDmode
> +      && !constant_multiple_p (nunits * nbytes,
> +			       GET_MODE_SIZE (prevailing_mode))
> +      && !constant_multiple_p (GET_MODE_SIZE (prevailing_mode),
> +			       nunits * nbytes))
> +    return NULL_TREE;
> +
>    /* For vector types of elements whose mode precision doesn't
>       match their types precision we use a element type of mode
>       precision.  The vectorization routines will have to make sure
>
diff mbox series

Patch

diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
index 54000f568ab..2f6ebd6ce3d 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
+++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4 -fdump-tree-slp-details" } */
+/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" } */
 
 float a;
 int
@@ -14,5 +14,3 @@  struct c {
     }
     int coeffs[10];
 } f;
-
-/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of vector subparts between" "slp1" { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_15.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_15.c
new file mode 100644
index 00000000000..23f6d567cc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_15.c
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+struct foo
+{
+  void *handle;
+  void *arg;
+};
+
+void
+dlinfo_doit (struct foo *args)
+{
+  __UINTPTR_TYPE__ **l = args->handle;
+
+  *(__UINTPTR_TYPE__ *) args->arg = 0;
+  *(__UINTPTR_TYPE__ *) args->arg = **l;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c9dab217f05..7748c42c70f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11486,6 +11486,16 @@  get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
 
   unsigned int nbytes = GET_MODE_SIZE (inner_mode);
 
+  /* Interoperability between modes requires one to be a constant multiple
+     of the other, so that the number of vectors required for each operation
+     is a compile-time constant.  */
+  if (prevailing_mode != VOIDmode
+      && !constant_multiple_p (nunits * nbytes,
+			       GET_MODE_SIZE (prevailing_mode))
+      && !constant_multiple_p (GET_MODE_SIZE (prevailing_mode),
+			       nunits * nbytes))
+    return NULL_TREE;
+
   /* For vector types of elements whose mode precision doesn't
      match their types precision we use a element type of mode
      precision.  The vectorization routines will have to make sure