s390x: Optimize vector permute with constant indexes

Message ID	20240402075601.7733-1-jchrist@linux.ibm.com
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CCF863858D28 From: Juergen Christ <jchrist@linux.ibm.com> To: gcc-patches@gcc.gnu.org Cc: krebbel@linux.ibm.com Subject: [PATCH] s390x: Optimize vector permute with constant indexes Date: Tue, 2 Apr 2024 09:56:01 +0200 Message-Id: <20240402075601.7733-1-jchrist@linux.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	s390x: Optimize vector permute with constant indexes \| expand s390x: Optimize vector permute with constant indexes

Message ID

20240402075601.7733-1-jchrist@linux.ibm.com

State

New

Headers

DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CCF863858D28
From: Juergen Christ <jchrist@linux.ibm.com>
To: gcc-patches@gcc.gnu.org
Cc: krebbel@linux.ibm.com
Subject: [PATCH] s390x: Optimize vector permute with constant indexes
Date: Tue,  2 Apr 2024 09:56:01 +0200
Message-Id: <20240402075601.7733-1-jchrist@linux.ibm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

Series

s390x: Optimize vector permute with constant indexes | expand

Commit Message

Juergen Christ April 2, 2024, 7:56 a.m. UTC

Loop vectorizer can generate vector permutes with constant indexes
where all indexes are equal.  Optimize this case to use vector
replicate instead of vector permute.

gcc/ChangeLog:

	* config/s390/s390.cc (expand_perm_as_replicate): Implement.
	(vectorize_vec_perm_const_1): Call new function.
	* config/s390/vx-builtins.md (vec_splat<mode>): Change to...
	(@vec_splat<mode>): ...this.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/vec-expand-replicate.c: New test.

Bootstrapped and regtested on s390x.  Ok for trunk?

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
---
 gcc/config/s390/s390.cc                       | 32 +++++++++++++++++++
 gcc/config/s390/vx-builtins.md                |  2 +-
 .../s390/vector/vec-expand-replicate.c        | 30 +++++++++++++++++
 3 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c

Comments

Stefan Schulze Frielinghaus April 9, 2024, 9:51 a.m. UTC | #1

On Tue, Apr 02, 2024 at 09:56:01AM +0200, Juergen Christ wrote:
> Loop vectorizer can generate vector permutes with constant indexes
> where all indexes are equal.  Optimize this case to use vector
> replicate instead of vector permute.
> 
> gcc/ChangeLog:
> 
> 	* config/s390/s390.cc (expand_perm_as_replicate): Implement.
> 	(vectorize_vec_perm_const_1): Call new function.
> 	* config/s390/vx-builtins.md (vec_splat<mode>): Change to...
> 	(@vec_splat<mode>): ...this.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/s390/vector/vec-expand-replicate.c: New test.
> 
> Bootstrapped and regtested on s390x.  Ok for trunk?
> 
> Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
> ---
>  gcc/config/s390/s390.cc                       | 32 +++++++++++++++++++
>  gcc/config/s390/vx-builtins.md                |  2 +-
>  .../s390/vector/vec-expand-replicate.c        | 30 +++++++++++++++++
>  3 files changed, 63 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> 
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index 372a23244032..4b4014ebe444 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -17923,6 +17923,35 @@ expand_perm_as_a_vlbr_vstbr_candidate (const struct expand_vec_perm_d &d)
>    return false;
>  }
>  
> +static bool expand_perm_as_replicate (const struct expand_vec_perm_d &d)
               ^~~~~~~~~~~~~~~~~~~~~~~~
Function names start on a new line.

> +{
> +  unsigned char i;
> +  unsigned char elem;
> +  rtx base = d.op0;
> +  rtx insn;
> +  /* Needed to silence maybe-uninitialized warning.  */
> +  gcc_assert(d.nelt > 0);
     ~~~~~~~~~~^~~~~~~~~~~~
Between function name and open bracket whitespace is missing.

Curiously enough, the error is about d which is a reference and cannot
be null.  If you are eager you could reduce this and open a PR.

s390.cc:17935:8: warning: ‘d’ may be used uninitialized [-Wmaybe-uninitialized]
17935 |   elem = d.perm[0];
      |   ~~~~~^~~~~~~~~~~

> +  elem = d.perm[0];
> +  for (i = 1; i < d.nelt; ++i)
> +    if (d.perm[i] != elem)
> +      return false;
> +  if (!d.testing_p)
> +    {
> +      if (elem >= d.nelt)
> +	{
> +	  base = d.op1;
> +	  elem -= d.nelt;
> +	}
> +      insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
> +      if (insn == NULL_RTX)
> +	return false;
> +      emit_insn (insn);
> +      return true;
> +    }
> +  else
> +    return maybe_code_for_vec_splat (d.vmode) != CODE_FOR_nothing;
> +}
> +
>  /* Try to find the best sequence for the vector permute operation
>     described by D.  Return true if the operation could be
>     expanded.  */
> @@ -17941,6 +17970,9 @@ vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
>    if (expand_perm_as_a_vlbr_vstbr_candidate (d))
>      return true;
>  
> +  if (expand_perm_as_replicate(d))
         ~~~~~~~~~~~~~~~~~~~~~~~~^~~
Between function name and open bracket whitespace is missing.

> +    return true;
> +
>    return false;
>  }
>  
> diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
> index 432d81a719fc..93c0d408a43e 100644
> --- a/gcc/config/s390/vx-builtins.md
> +++ b/gcc/config/s390/vx-builtins.md
> @@ -424,7 +424,7 @@
>  
>  
>  ; Replicate from vector element
> -(define_expand "vec_splat<mode>"
> +(define_expand "@vec_splat<mode>"
>    [(set (match_operand:V_HW                      0 "register_operand"  "")
>  	(vec_duplicate:V_HW (vec_select:<non_vec>
>  			     (match_operand:V_HW 1 "register_operand"  "")
> diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> new file mode 100644
> index 000000000000..27563a00f22b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> @@ -0,0 +1,30 @@
> +/* Check that the vectorize_vec_perm_const expander correctly deals with
> +   replication.  Extracted from spec "nab".  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mzarch -march=z13 -fvect-cost-model=unlimited" } */
> +
> +
> +#define REAL_T  double
> +typedef REAL_T  MATRIX_T[ 4 ][ 4 ];
> +
> +int concat_mat_i, concat_mat_j;
> +static void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3);
> +MATRIX_T *rot4p() {
> +  MATRIX_T mat3, mat4;
> +  static MATRIX_T mat5;
> +  concat_mat(mat4, mat3, mat5);
> +}
> +void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3) {
> +  int k;
> +  for (;; concat_mat_i++) {
> +    concat_mat_j = 0;
> +    for (; 4; concat_mat_j++) {
> +      k = 0;
> +      for (; k < 4; k++)
> +        m3[concat_mat_i][concat_mat_j] += m1[concat_mat_i][k];
> +    }

Just nitpicking, if we could come up with a test case which does not
involve integer overflows due to non-terminating loops, I would prefer
that.

Cheers,
Stefan

> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "vperm" } } */
> -- 
> 2.39.3
>

Juergen Christ April 9, 2024, 2:29 p.m. UTC | #2

Am Tue, Apr 09, 2024 at 11:51:00AM +0200 schrieb Stefan Schulze Frielinghaus:
> > +static bool expand_perm_as_replicate (const struct expand_vec_perm_d &d)
>                ^~~~~~~~~~~~~~~~~~~~~~~~
> Function names start on a new line.

Fixed

> > +{
> > +  unsigned char i;
> > +  unsigned char elem;
> > +  rtx base = d.op0;
> > +  rtx insn;
> > +  /* Needed to silence maybe-uninitialized warning.  */
> > +  gcc_assert(d.nelt > 0);
>      ~~~~~~~~~~^~~~~~~~~~~~
> Between function name and open bracket whitespace is missing.

Fixed.

> Curiously enough, the error is about d which is a reference and cannot
> be null.  If you are eager you could reduce this and open a PR.
> 
> s390.cc:17935:8: warning: ‘d’ may be used uninitialized [-Wmaybe-uninitialized]
> 17935 |   elem = d.perm[0];
>       |   ~~~~~^~~~~~~~~~~

Weirdly enough it is not `d`, but `d.perm[0]` that seems to be the
problem.  But I did not reduce this.  As the assertion suggests, it is
known that all elements in d.perm in the range [0,d.nelts) are
initialized.  I would like to defer that to a time when I (hopefully)
have some more spare time.

> > +  if (expand_perm_as_replicate(d))
>          ~~~~~~~~~~~~~~~~~~~~~~~~^~~
> Between function name and open bracket whitespace is missing.

Fixed

> > diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> > new file mode 100644
> > index 000000000000..27563a00f22b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> > @@ -0,0 +1,30 @@
> > +/* Check that the vectorize_vec_perm_const expander correctly deals with
> > +   replication.  Extracted from spec "nab".  */
> > +
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -mzarch -march=z13 -fvect-cost-model=unlimited" } */
> > +
> > +
> > +#define REAL_T  double
> > +typedef REAL_T  MATRIX_T[ 4 ][ 4 ];
> > +
> > +int concat_mat_i, concat_mat_j;
> > +static void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3);
> > +MATRIX_T *rot4p() {
> > +  MATRIX_T mat3, mat4;
> > +  static MATRIX_T mat5;
> > +  concat_mat(mat4, mat3, mat5);
> > +}
> > +void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3) {
> > +  int k;
> > +  for (;; concat_mat_i++) {
> > +    concat_mat_j = 0;
> > +    for (; 4; concat_mat_j++) {
> > +      k = 0;
> > +      for (; k < 4; k++)
> > +        m3[concat_mat_i][concat_mat_j] += m1[concat_mat_i][k];
> > +    }
> 
> Just nitpicking, if we could come up with a test case which does not
> involve integer overflows due to non-terminating loops, I would prefer
> that.

Well, I have a version without integer overflows, but it still has
non-terminating loops...

Will send a v2,

Juergen

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 372a23244032..4b4014ebe444 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17923,6 +17923,35 @@  expand_perm_as_a_vlbr_vstbr_candidate (const struct expand_vec_perm_d &d)
   return false;
 }
 
+static bool expand_perm_as_replicate (const struct expand_vec_perm_d &d)
+{
+  unsigned char i;
+  unsigned char elem;
+  rtx base = d.op0;
+  rtx insn;
+  /* Needed to silence maybe-uninitialized warning.  */
+  gcc_assert(d.nelt > 0);
+  elem = d.perm[0];
+  for (i = 1; i < d.nelt; ++i)
+    if (d.perm[i] != elem)
+      return false;
+  if (!d.testing_p)
+    {
+      if (elem >= d.nelt)
+	{
+	  base = d.op1;
+	  elem -= d.nelt;
+	}
+      insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
+      if (insn == NULL_RTX)
+	return false;
+      emit_insn (insn);
+      return true;
+    }
+  else
+    return maybe_code_for_vec_splat (d.vmode) != CODE_FOR_nothing;
+}
+
 /* Try to find the best sequence for the vector permute operation
    described by D.  Return true if the operation could be
    expanded.  */
@@ -17941,6 +17970,9 @@  vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
   if (expand_perm_as_a_vlbr_vstbr_candidate (d))
     return true;
 
+  if (expand_perm_as_replicate(d))
+    return true;
+
   return false;
 }
 
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 432d81a719fc..93c0d408a43e 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -424,7 +424,7 @@ 
 
 
 ; Replicate from vector element
-(define_expand "vec_splat<mode>"
+(define_expand "@vec_splat<mode>"
   [(set (match_operand:V_HW                      0 "register_operand"  "")
 	(vec_duplicate:V_HW (vec_select:<non_vec>
 			     (match_operand:V_HW 1 "register_operand"  "")
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
new file mode 100644
index 000000000000..27563a00f22b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
@@ -0,0 +1,30 @@ 
+/* Check that the vectorize_vec_perm_const expander correctly deals with
+   replication.  Extracted from spec "nab".  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=z13 -fvect-cost-model=unlimited" } */
+
+
+#define REAL_T  double
+typedef REAL_T  MATRIX_T[ 4 ][ 4 ];
+
+int concat_mat_i, concat_mat_j;
+static void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3);
+MATRIX_T *rot4p() {
+  MATRIX_T mat3, mat4;
+  static MATRIX_T mat5;
+  concat_mat(mat4, mat3, mat5);
+}
+void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3) {
+  int k;
+  for (;; concat_mat_i++) {
+    concat_mat_j = 0;
+    for (; 4; concat_mat_j++) {
+      k = 0;
+      for (; k < 4; k++)
+        m3[concat_mat_i][concat_mat_j] += m1[concat_mat_i][k];
+    }
+  }
+}
+
+/* { dg-final { scan-assembler-not "vperm" } } */

s390x: Optimize vector permute with constant indexes

Commit Message

Comments

Patch