diff mbox series

[rtl-optimization] Simplify vector shift/rotate with const_vec_duplicate to vector shift/rotate with const_int element.

Message ID 20210806070450.1168329-1-hongtao.liu@intel.com
State New
Headers show
Series [rtl-optimization] Simplify vector shift/rotate with const_vec_duplicate to vector shift/rotate with const_int element. | expand

Commit Message

liuhongt Aug. 6, 2021, 7:04 a.m. UTC
Hi:
  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
  Ok for trunk?

gcc/ChangeLog:

	PR rtl-optimization/101796
	* simplify-rtx.c
	(simplify_context::simplify_binary_operation_1): Simplify
	vector shift/rotate with const_vec_duplicate to vector
	shift/rotate with const_int element.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/101796
	* gcc.target/i386/pr101796.c: New test.
---
 gcc/simplify-rtx.c                       | 15 ++++++
 gcc/testsuite/gcc.target/i386/pr101796.c | 65 ++++++++++++++++++++++++
 2 files changed, 80 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101796.c

Comments

Richard Sandiford Aug. 6, 2021, 10:55 a.m. UTC | #1
liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi:
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
>   Ok for trunk?

I think if anything the canonicalisation should be the other way:
if the shift amount is an in-range constant, we know that it fits
within a vector element, and so the vector form should be preferred.

IMO it's a wart that we support (say):

  (ashift:V4SI (reg:V4SI R) (const_int 1))

even though:

  (plus:V4SI (reg:V4SI R) (const_int 1))
  (minus:V4SI (const_int 1) (reg:V4SI R))

are not well-formed (at least AFAIK).

Thanks,
Richard

>
> gcc/ChangeLog:
>
> 	PR rtl-optimization/101796
> 	* simplify-rtx.c
> 	(simplify_context::simplify_binary_operation_1): Simplify
> 	vector shift/rotate with const_vec_duplicate to vector
> 	shift/rotate with const_int element.
>
> gcc/testsuite/ChangeLog:
>
> 	PR rtl-optimization/101796
> 	* gcc.target/i386/pr101796.c: New test.
> ---
>  gcc/simplify-rtx.c                       | 15 ++++++
>  gcc/testsuite/gcc.target/i386/pr101796.c | 65 ++++++++++++++++++++++++
>  2 files changed, 80 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101796.c
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index a719f57870f..75f3e455562 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -3970,6 +3970,21 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>  	    return simplify_gen_binary (code, mode, op0,
>  					gen_int_shift_amount (mode, val));
>  	}
> +
> +      /* Optimize vector shift/rotate with const_vec_duplicate
> +	 to vector shift/rotate with const_int element.
> +      /* TODO: vec_duplicate with variable can also be simplified,
> +	 but GCC only require operand 2 of shift/rotate to be a scalar type
> +	 which can have different modes in different backends, it makes
> +	 simplication difficult to decide which mode should be choosed
> +	 for shift/rotate count.  */
> +      if ((code == ASHIFTRT || code == LSHIFTRT
> +	   || code == ASHIFT || code == ROTATERT
> +	   || code == ROTATE)
> +	  && const_vec_duplicate_p (op1))
> +	return simplify_gen_binary (code, mode, op0,
> +				    unwrap_const_vec_duplicate (op1));
> +
>        break;
>  
>      case ASHIFT:
> diff --git a/gcc/testsuite/gcc.target/i386/pr101796.c b/gcc/testsuite/gcc.target/i386/pr101796.c
> new file mode 100644
> index 00000000000..c22d6267fe5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101796.c
> @@ -0,0 +1,65 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -O2 " } */
> +/* { dg-final { scan-assembler-not "vpbroadcast" } }  */
> +/* { dg-final { scan-assembler-not "vpsrlv\[dwq\]" } }  */
> +/* { dg-final { scan-assembler-not "vpsllv\[dwq\]" } }  */
> +/* { dg-final { scan-assembler-not "vpsrav\[dwq\]" } }  */
> +/* { dg-final { scan-assembler-times "vpsrl\[dwq\]" 3 } }  */
> +/* { dg-final { scan-assembler-times "vpsll\[dwq\]" 3 } }  */
> +/* { dg-final { scan-assembler-times "vpsra\[dwq\]" 3 } }  */
> +
> +#include <immintrin.h>
> +
> +__m512i
> +foo (__m512i a)
> +{
> +  return _mm512_srlv_epi16 (a, _mm512_set1_epi16 (3));
> +}
> +
> +__m512i
> +foo1 (__m512i a)
> +{
> +  return _mm512_srlv_epi32 (a, _mm512_set1_epi32 (3));
> +}
> +
> +__m512i
> +foo2 (__m512i a, long long b)
> +{
> +  return _mm512_srlv_epi64 (a, _mm512_set1_epi64 (3));
> +}
> +
> +__m512i
> +foo3 (__m512i a)
> +{
> +  return _mm512_srav_epi16 (a, _mm512_set1_epi16 (3));
> +}
> +
> +__m512i
> +foo4 (__m512i a)
> +{
> +  return _mm512_srav_epi32 (a, _mm512_set1_epi32 (3));
> +}
> +
> +__m512i
> +foo5 (__m512i a, long long b)
> +{
> +  return _mm512_srav_epi64 (a, _mm512_set1_epi64 (3));
> +}
> +
> +__m512i
> +foo6 (__m512i a)
> +{
> +  return _mm512_sllv_epi16 (a, _mm512_set1_epi16 (3));
> +}
> +
> +__m512i
> +foo7 (__m512i a)
> +{
> +  return _mm512_sllv_epi32 (a, _mm512_set1_epi32 (3));
> +}
> +
> +__m512i
> +foo8 (__m512i a, long long b)
> +{
> +  return _mm512_sllv_epi64 (a, _mm512_set1_epi64 (3));
> +}
Segher Boessenkool Aug. 6, 2021, 2:08 p.m. UTC | #2
On Fri, Aug 06, 2021 at 11:55:52AM +0100, Richard Sandiford wrote:
> liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > Hi:
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
> >   Ok for trunk?
> 
> I think if anything the canonicalisation should be the other way:
> if the shift amount is an in-range constant, we know that it fits
> within a vector element, and so the vector form should be preferred.

Yeah.  And the canonicalisation needs to be documented *first*, i.e. we
have to agree on it first, *before* patches doing this to simplify-rtx
are acceptable.  We don't do design-by-fait-accompli.

Any canonicalisation also has to fit in well with other canonicalisations,
or we will be better off not having canonical forms.

If it turns out there is no good canonical form, we will simply have to
handle both forms (or more than two perhaps).  This isn't the end of the
world, we have to do that already.  If we can simplify things with a
canonical form, that is great; if that causes too much extra work
instead, it is not so great.  These things have to be thought about.


Segher
diff mbox series

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index a719f57870f..75f3e455562 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -3970,6 +3970,21 @@  simplify_context::simplify_binary_operation_1 (rtx_code code,
 	    return simplify_gen_binary (code, mode, op0,
 					gen_int_shift_amount (mode, val));
 	}
+
+      /* Optimize vector shift/rotate with const_vec_duplicate
+	 to vector shift/rotate with const_int element.
+      /* TODO: vec_duplicate with variable can also be simplified,
+	 but GCC only require operand 2 of shift/rotate to be a scalar type
+	 which can have different modes in different backends, it makes
+	 simplication difficult to decide which mode should be choosed
+	 for shift/rotate count.  */
+      if ((code == ASHIFTRT || code == LSHIFTRT
+	   || code == ASHIFT || code == ROTATERT
+	   || code == ROTATE)
+	  && const_vec_duplicate_p (op1))
+	return simplify_gen_binary (code, mode, op0,
+				    unwrap_const_vec_duplicate (op1));
+
       break;
 
     case ASHIFT:
diff --git a/gcc/testsuite/gcc.target/i386/pr101796.c b/gcc/testsuite/gcc.target/i386/pr101796.c
new file mode 100644
index 00000000000..c22d6267fe5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101796.c
@@ -0,0 +1,65 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2 " } */
+/* { dg-final { scan-assembler-not "vpbroadcast" } }  */
+/* { dg-final { scan-assembler-not "vpsrlv\[dwq\]" } }  */
+/* { dg-final { scan-assembler-not "vpsllv\[dwq\]" } }  */
+/* { dg-final { scan-assembler-not "vpsrav\[dwq\]" } }  */
+/* { dg-final { scan-assembler-times "vpsrl\[dwq\]" 3 } }  */
+/* { dg-final { scan-assembler-times "vpsll\[dwq\]" 3 } }  */
+/* { dg-final { scan-assembler-times "vpsra\[dwq\]" 3 } }  */
+
+#include <immintrin.h>
+
+__m512i
+foo (__m512i a)
+{
+  return _mm512_srlv_epi16 (a, _mm512_set1_epi16 (3));
+}
+
+__m512i
+foo1 (__m512i a)
+{
+  return _mm512_srlv_epi32 (a, _mm512_set1_epi32 (3));
+}
+
+__m512i
+foo2 (__m512i a, long long b)
+{
+  return _mm512_srlv_epi64 (a, _mm512_set1_epi64 (3));
+}
+
+__m512i
+foo3 (__m512i a)
+{
+  return _mm512_srav_epi16 (a, _mm512_set1_epi16 (3));
+}
+
+__m512i
+foo4 (__m512i a)
+{
+  return _mm512_srav_epi32 (a, _mm512_set1_epi32 (3));
+}
+
+__m512i
+foo5 (__m512i a, long long b)
+{
+  return _mm512_srav_epi64 (a, _mm512_set1_epi64 (3));
+}
+
+__m512i
+foo6 (__m512i a)
+{
+  return _mm512_sllv_epi16 (a, _mm512_set1_epi16 (3));
+}
+
+__m512i
+foo7 (__m512i a)
+{
+  return _mm512_sllv_epi32 (a, _mm512_set1_epi32 (3));
+}
+
+__m512i
+foo8 (__m512i a, long long b)
+{
+  return _mm512_sllv_epi64 (a, _mm512_set1_epi64 (3));
+}