Check rrotate optab first when transforming lrotate
diff mbox series

Message ID 232a38b1-76c2-476d-1be0-a1958e5624bb@linux.ibm.com
State New
Headers show
Series
  • Check rrotate optab first when transforming lrotate
Related show

Commit Message

Kewen.Lin July 15, 2019, 8:50 a.m. UTC
Hi all,

In match.pd and expmed.c, we have some codes to transform lrotate to 
rrotate if rotation count is const.  But they don't consider the target
whether supports the rrotate.  It leads to some suboptimal generated
code since some optimization can't get expected result by querying
target optab.  One typical case is that we miss some rotation 
vectorization opportunity on Power.

This patch is to teach them to check target optab availability first.

Regression testing just launched, is it OK for trunk if it passed and
bootstrapped?

Thanks,
Kewen

-----------------------------------------------

gcc/ChangeLog

2019-07-15  Kewen Lin  <linkw@gcc.gnu.org>

	* expmed.c (expand_shift_1): Only transform to opposite rotate
	when it's supported on the target.
	* match.pd (lrot N -> rrot N'): Check target support or not.

gcc/testsuite/ChangeLog

2019-07-15  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.target/powerpc/vec_rotate.c: New test.

Comments

Jakub Jelinek July 15, 2019, 8:59 a.m. UTC | #1
On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
> In match.pd and expmed.c, we have some codes to transform lrotate to 
> rrotate if rotation count is const.  But they don't consider the target
> whether supports the rrotate.  It leads to some suboptimal generated
> code since some optimization can't get expected result by querying
> target optab.  One typical case is that we miss some rotation 
> vectorization opportunity on Power.

This will not do the right thing if neither lrotate nor rrotate is
supported, you want to canonicalize in that case IMHO.
The code formatting is wrong (|| at the end of line, overly long lines).

Finally, what is the reason why Power doesn't support one of the rotates?
Especially for rotates by constant, you can transform those in the
define_expand.

Canonicalizing code is highly desirable during GIMPLE, it means if you have
say left rotate by 23 in one spot and right rotate by bitsize - 23 in
another spot, e.g. SCCVN can merge that code.

So, IMNSHO, just improve the backend.

	Jakub
Richard Biener July 15, 2019, 9:15 a.m. UTC | #2
On Mon, Jul 15, 2019 at 10:59 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
> > In match.pd and expmed.c, we have some codes to transform lrotate to
> > rrotate if rotation count is const.  But they don't consider the target
> > whether supports the rrotate.  It leads to some suboptimal generated
> > code since some optimization can't get expected result by querying
> > target optab.  One typical case is that we miss some rotation
> > vectorization opportunity on Power.
>
> This will not do the right thing if neither lrotate nor rrotate is
> supported, you want to canonicalize in that case IMHO.
> The code formatting is wrong (|| at the end of line, overly long lines).
>
> Finally, what is the reason why Power doesn't support one of the rotates?
> Especially for rotates by constant, you can transform those in the
> define_expand.
>
> Canonicalizing code is highly desirable during GIMPLE, it means if you have
> say left rotate by 23 in one spot and right rotate by bitsize - 23 in
> another spot, e.g. SCCVN can merge that code.
>
> So, IMNSHO, just improve the backend.

Agreed.  Or alternatively improve the generic expander.

Richard.

>         Jakub
Richard Sandiford July 15, 2019, 9:19 a.m. UTC | #3
Jakub Jelinek <jakub@redhat.com> writes:
> On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
>> In match.pd and expmed.c, we have some codes to transform lrotate to 
>> rrotate if rotation count is const.  But they don't consider the target
>> whether supports the rrotate.  It leads to some suboptimal generated
>> code since some optimization can't get expected result by querying
>> target optab.  One typical case is that we miss some rotation 
>> vectorization opportunity on Power.
>
> This will not do the right thing if neither lrotate nor rrotate is
> supported, you want to canonicalize in that case IMHO.
> The code formatting is wrong (|| at the end of line, overly long lines).
>
> Finally, what is the reason why Power doesn't support one of the rotates?
> Especially for rotates by constant, you can transform those in the
> define_expand.
>
> Canonicalizing code is highly desirable during GIMPLE, it means if you have
> say left rotate by 23 in one spot and right rotate by bitsize - 23 in
> another spot, e.g. SCCVN can merge that code.
>
> So, IMNSHO, just improve the backend.

Agreed FWIW, if the target supports rotates by a constant.  If it
only supports rotates by a variable then I think expand should try
to use the rrotate optab for lrotates.

The current code looks odd though.  The comment above the expmed.c
hunk is:

  /* Canonicalize rotates by constant amount.  If op1 is bitsize / 2,
     prefer left rotation, if op1 is from bitsize / 2 + 1 to
     bitsize - 1, use other direction of rotate with 1 .. bitsize / 2 - 1
     amount instead.  */

whereas rtl.texi says:

@item (rotate:@var{m} @var{x} @var{c})
@itemx (rotatert:@var{m} @var{x} @var{c})
Similar but represent left and right rotate.  If @var{c} is a constant,
use @code{rotate}.

simplify-rtx.c seems to stick to the rtl.texi version and use ROTATE
rather than ROTATERT for constant rotates.

If a target prefers the expmed.c version then I think it should handle
that in the asm template.

Thanks,
Richard
Kewen.Lin July 15, 2019, 10:39 a.m. UTC | #4
Hi Jakub,

on 2019/7/15 下午4:59, Jakub Jelinek wrote:
> On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
>> In match.pd and expmed.c, we have some codes to transform lrotate to 
>> rrotate if rotation count is const.  But they don't consider the target
>> whether supports the rrotate.  It leads to some suboptimal generated
>> code since some optimization can't get expected result by querying
>> target optab.  One typical case is that we miss some rotation 
>> vectorization opportunity on Power.
> 
> This will not do the right thing if neither lrotate nor rrotate is
> supported, you want to canonicalize in that case IMHO.
> The code formatting is wrong (|| at the end of line, overly long lines).
> 
> Finally, what is the reason why Power doesn't support one of the rotates?
> Especially for rotates by constant, you can transform those in the
> define_expand.
> 
> Canonicalizing code is highly desirable during GIMPLE, it means if you have
> say left rotate by 23 in one spot and right rotate by bitsize - 23 in
> another spot, e.g. SCCVN can merge that code.
> 
> So, IMNSHO, just improve the backend.
> 

OK, I see.  Thanks for the explanation.  I'll try to fix it in the backend.


Thanks,
Kewen
Segher Boessenkool July 15, 2019, 2:44 p.m. UTC | #5
On Mon, Jul 15, 2019 at 10:59:29AM +0200, Jakub Jelinek wrote:
> On Mon, Jul 15, 2019 at 04:50:13PM +0800, Kewen.Lin wrote:
> > In match.pd and expmed.c, we have some codes to transform lrotate to 
> > rrotate if rotation count is const.  But they don't consider the target
> > whether supports the rrotate.  It leads to some suboptimal generated
> > code since some optimization can't get expected result by querying
> > target optab.  One typical case is that we miss some rotation 
> > vectorization opportunity on Power.
> 
> This will not do the right thing if neither lrotate nor rrotate is
> supported, you want to canonicalize in that case IMHO.
> The code formatting is wrong (|| at the end of line, overly long lines).
> 
> Finally, what is the reason why Power doesn't support one of the rotates?

Because it is pointless duplication, and we have some dozen rotate
patterns already.

Canonicalising this representation is highly desirable.  Everything in
RTL already does work fine, fwiw (and did since before rotatert
existed).


Segher

Patch
diff mbox series

diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d76..6cd3341a0e4 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -2491,7 +2491,9 @@  expand_shift_1 (enum tree_code code, machine_mode mode, rtx shifted,
   if (rotate
       && CONST_INT_P (op1)
       && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
-                  GET_MODE_BITSIZE (scalar_mode) - 1))
+                  GET_MODE_BITSIZE (scalar_mode) - 1)
+      && ((left && optab_handler (rrotate_optab, mode) != CODE_FOR_nothing) ||
+          (!left && optab_handler (lrotate_optab, mode) != CODE_FOR_nothing)))
     {
       op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
                                         - INTVAL (op1)));
diff --git a/gcc/match.pd b/gcc/match.pd
index 88dae4231d8..a63cd15e129 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2418,11 +2418,14 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

 /* Rewrite an LROTATE_EXPR by a constant into an
    RROTATE_EXPR by a new constant.  */
+
 (simplify
  (lrotate @0 INTEGER_CST@1)
+ (if ((VECTOR_TYPE_P (type) && target_supports_op_p (type, RROTATE_EXPR, optab_vector))
+  || (!VECTOR_TYPE_P (type) && target_supports_op_p (type, RROTATE_EXPR, optab_scalar)))
  (rrotate @0 { const_binop (MINUS_EXPR, TREE_TYPE (@1),
                            build_int_cst (TREE_TYPE (@1),
-                                          element_precision (type)), @1); }))
+                                          element_precision (type)), @1); })))

 /* Turn (a OP c1) OP c2 into a OP (c1+c2).  */
 (for op (lrotate rrotate rshift lshift)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate.c
new file mode 100644
index 00000000000..16d1f297d2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate.c
@@ -0,0 +1,47 @@ 
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+/* Check LROTATE to RROTATE transformation on const rotation count is disabled
+   on Power by checking optab, it helps vectorizer to exploit vector rotation
+   instructions.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+            | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+            | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */