Don't optimize through mode class changing subregs in simplify_shift_const_1 (PR rtl-optimization/45400)

Submitted by Jakub Jelinek on Aug. 25, 2010, 5:01 p.m.

Details

Message ID 20100825170157.GW702@tyan-ft48-01.lab.bos.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Aug. 25, 2010, 5:01 p.m.
Hi!

On this testcase, simplify_shift_const_1 is called on LSHIFTRT 24
(subreg:SI (ashift:V4HI (reg:V4HI ...) (const_int 16))
varop and happily first optimizes this into AND 255 on the
reg:V4HI and then crashes soon afterwards.
IMHO the only case when we should look through low part subregs
is when both modes are MODE_INT, when crossing mode class boundary
it will be wrong (e.g. vector mode shift is something completely

	Jakub

Comments

Richard Guenther Aug. 25, 2010, 5:30 p.m.
On Wed, Aug 25, 2010 at 7:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> On this testcase, simplify_shift_const_1 is called on LSHIFTRT 24
> (subreg:SI (ashift:V4HI (reg:V4HI ...) (const_int 16))
> varop and happily first optimizes this into AND 255 on the
> reg:V4HI and then crashes soon afterwards.
> IMHO the only case when we should look through low part subregs
> is when both modes are MODE_INT, when crossing mode class boundary
> it will be wrong (e.g. vector mode shift is something completely
> different from integral shift) and even when changing vector mode size,
> or doing subreg between different floating point modes etc. isn't going
> to work very well.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.
> Ok for trunk/4.5?

Ok.

Thanks,
Richard.

> 2010-08-25  Jakub Jelinek  <jakub@redhat.com>
>
>        PR rtl-optimization/45400
>        * combine.c (simplify_shift_const_1) <case SUBREG>: Only use
>        SUBREG_REG if both modes are of MODE_INT class.
>
>        * g++.dg/other/i386-8.C: New test.
>
> --- gcc/combine.c.jj    2010-08-12 11:12:13.000000000 +0200
> +++ gcc/combine.c       2010-08-25 11:10:01.694939344 +0200
> @@ -9505,7 +9505,9 @@ simplify_shift_const_1 (enum rtx_code co
>                  > GET_MODE_SIZE (GET_MODE (varop)))
>              && (unsigned int) ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (varop)))
>                                  + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
> -                == mode_words)
> +                == mode_words
> +             && GET_MODE_CLASS (GET_MODE (varop)) == MODE_INT
> +             && GET_MODE_CLASS (GET_MODE (SUBREG_REG (varop))) == MODE_INT)
>            {
>              varop = SUBREG_REG (varop);
>              if (GET_MODE_SIZE (GET_MODE (varop)) > GET_MODE_SIZE (mode))
> --- gcc/testsuite/g++.dg/other/i386-8.C.jj      2010-08-25 11:19:54.799157615 +0200
> +++ gcc/testsuite/g++.dg/other/i386-8.C 2010-08-25 11:21:20.688108083 +0200
> @@ -0,0 +1,23 @@
> +// PR rtl-optimization/45400
> +// { dg-do compile { target i?86-*-* x86_64-*-* } }
> +// { dg-options "-O2 -msse2" }
> +// { dg-options "-O2 -msse2 -fpic" { target fpic } }
> +// { dg-require-effective-target sse2 }
> +
> +#include <xmmintrin.h>
> +
> +static inline unsigned short
> +bar (unsigned short x)
> +{
> +  return ((x << 8) | (x >> 8));
> +}
> +
> +unsigned int
> +foo (float *x, short *y)
> +{
> +  __m128 a = _mm_set_ps1 (32767.5f);
> +  __m128 b = _mm_mul_ps (_mm_load_ps (x), a);
> +  __m64 c = _mm_cvtps_pi16 (b);
> +  __builtin_memcpy (y, &c, sizeof (short) * 4);
> +  y[0] = bar (y[0]);
> +}
>
>        Jakub
>

Patch hide | download patch | download mbox

different from integral shift) and even when changing vector mode size,
or doing subreg between different floating point modes etc. isn't going
to work very well.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.
Ok for trunk/4.5?

2010-08-25  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/45400
	* combine.c (simplify_shift_const_1) <case SUBREG>: Only use
	SUBREG_REG if both modes are of MODE_INT class.

	* g++.dg/other/i386-8.C: New test.

--- gcc/combine.c.jj	2010-08-12 11:12:13.000000000 +0200
+++ gcc/combine.c	2010-08-25 11:10:01.694939344 +0200
@@ -9505,7 +9505,9 @@  simplify_shift_const_1 (enum rtx_code co
 		  > GET_MODE_SIZE (GET_MODE (varop)))
 	      && (unsigned int) ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (varop)))
 				  + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
-		 == mode_words)
+		 == mode_words
+	      && GET_MODE_CLASS (GET_MODE (varop)) == MODE_INT
+	      && GET_MODE_CLASS (GET_MODE (SUBREG_REG (varop))) == MODE_INT)
 	    {
 	      varop = SUBREG_REG (varop);
 	      if (GET_MODE_SIZE (GET_MODE (varop)) > GET_MODE_SIZE (mode))
--- gcc/testsuite/g++.dg/other/i386-8.C.jj	2010-08-25 11:19:54.799157615 +0200
+++ gcc/testsuite/g++.dg/other/i386-8.C	2010-08-25 11:21:20.688108083 +0200
@@ -0,0 +1,23 @@ 
+// PR rtl-optimization/45400
+// { dg-do compile { target i?86-*-* x86_64-*-* } }
+// { dg-options "-O2 -msse2" }
+// { dg-options "-O2 -msse2 -fpic" { target fpic } }
+// { dg-require-effective-target sse2 }
+
+#include <xmmintrin.h>
+
+static inline unsigned short
+bar (unsigned short x)
+{
+  return ((x << 8) | (x >> 8));
+}
+
+unsigned int
+foo (float *x, short *y)
+{
+  __m128 a = _mm_set_ps1 (32767.5f);
+  __m128 b = _mm_mul_ps (_mm_load_ps (x), a);
+  __m64 c = _mm_cvtps_pi16 (b);
+  __builtin_memcpy (y, &c, sizeof (short) * 4);
+  y[0] = bar (y[0]);
+}