Patchwork Don't optimize through mode class changing subregs in simplify_shift_const_1 (PR rtl-optimization/45400)

login
register
mail settings
Submitter Jakub Jelinek
Date Aug. 25, 2010, 5:01 p.m.
Message ID <20100825170157.GW702@tyan-ft48-01.lab.bos.redhat.com>
Download mbox | patch
Permalink /patch/62725/
State New
Headers show

Comments

Jakub Jelinek - Aug. 25, 2010, 5:01 p.m.
Hi!

On this testcase, simplify_shift_const_1 is called on LSHIFTRT 24
(subreg:SI (ashift:V4HI (reg:V4HI ...) (const_int 16))
varop and happily first optimizes this into AND 255 on the
reg:V4HI and then crashes soon afterwards.
IMHO the only case when we should look through low part subregs
is when both modes are MODE_INT, when crossing mode class boundary
it will be wrong (e.g. vector mode shift is something completely

	Jakub
Richard Guenther - Aug. 25, 2010, 5:30 p.m.
On Wed, Aug 25, 2010 at 7:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> On this testcase, simplify_shift_const_1 is called on LSHIFTRT 24
> (subreg:SI (ashift:V4HI (reg:V4HI ...) (const_int 16))
> varop and happily first optimizes this into AND 255 on the
> reg:V4HI and then crashes soon afterwards.
> IMHO the only case when we should look through low part subregs
> is when both modes are MODE_INT, when crossing mode class boundary
> it will be wrong (e.g. vector mode shift is something completely
> different from integral shift) and even when changing vector mode size,
> or doing subreg between different floating point modes etc. isn't going
> to work very well.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.
> Ok for trunk/4.5?

Ok.

Thanks,
Richard.

> 2010-08-25  Jakub Jelinek  <jakub@redhat.com>
>
>        PR rtl-optimization/45400
>        * combine.c (simplify_shift_const_1) <case SUBREG>: Only use
>        SUBREG_REG if both modes are of MODE_INT class.
>
>        * g++.dg/other/i386-8.C: New test.
>
> --- gcc/combine.c.jj    2010-08-12 11:12:13.000000000 +0200
> +++ gcc/combine.c       2010-08-25 11:10:01.694939344 +0200
> @@ -9505,7 +9505,9 @@ simplify_shift_const_1 (enum rtx_code co
>                  > GET_MODE_SIZE (GET_MODE (varop)))
>              && (unsigned int) ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (varop)))
>                                  + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
> -                == mode_words)
> +                == mode_words
> +             && GET_MODE_CLASS (GET_MODE (varop)) == MODE_INT
> +             && GET_MODE_CLASS (GET_MODE (SUBREG_REG (varop))) == MODE_INT)
>            {
>              varop = SUBREG_REG (varop);
>              if (GET_MODE_SIZE (GET_MODE (varop)) > GET_MODE_SIZE (mode))
> --- gcc/testsuite/g++.dg/other/i386-8.C.jj      2010-08-25 11:19:54.799157615 +0200
> +++ gcc/testsuite/g++.dg/other/i386-8.C 2010-08-25 11:21:20.688108083 +0200
> @@ -0,0 +1,23 @@
> +// PR rtl-optimization/45400
> +// { dg-do compile { target i?86-*-* x86_64-*-* } }
> +// { dg-options "-O2 -msse2" }
> +// { dg-options "-O2 -msse2 -fpic" { target fpic } }
> +// { dg-require-effective-target sse2 }
> +
> +#include <xmmintrin.h>
> +
> +static inline unsigned short
> +bar (unsigned short x)
> +{
> +  return ((x << 8) | (x >> 8));
> +}
> +
> +unsigned int
> +foo (float *x, short *y)
> +{
> +  __m128 a = _mm_set_ps1 (32767.5f);
> +  __m128 b = _mm_mul_ps (_mm_load_ps (x), a);
> +  __m64 c = _mm_cvtps_pi16 (b);
> +  __builtin_memcpy (y, &c, sizeof (short) * 4);
> +  y[0] = bar (y[0]);
> +}
>
>        Jakub
>

Patch

different from integral shift) and even when changing vector mode size,
or doing subreg between different floating point modes etc. isn't going
to work very well.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.
Ok for trunk/4.5?

2010-08-25  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/45400
	* combine.c (simplify_shift_const_1) <case SUBREG>: Only use
	SUBREG_REG if both modes are of MODE_INT class.

	* g++.dg/other/i386-8.C: New test.

--- gcc/combine.c.jj	2010-08-12 11:12:13.000000000 +0200
+++ gcc/combine.c	2010-08-25 11:10:01.694939344 +0200
@@ -9505,7 +9505,9 @@  simplify_shift_const_1 (enum rtx_code co
 		  > GET_MODE_SIZE (GET_MODE (varop)))
 	      && (unsigned int) ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (varop)))
 				  + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
-		 == mode_words)
+		 == mode_words
+	      && GET_MODE_CLASS (GET_MODE (varop)) == MODE_INT
+	      && GET_MODE_CLASS (GET_MODE (SUBREG_REG (varop))) == MODE_INT)
 	    {
 	      varop = SUBREG_REG (varop);
 	      if (GET_MODE_SIZE (GET_MODE (varop)) > GET_MODE_SIZE (mode))
--- gcc/testsuite/g++.dg/other/i386-8.C.jj	2010-08-25 11:19:54.799157615 +0200
+++ gcc/testsuite/g++.dg/other/i386-8.C	2010-08-25 11:21:20.688108083 +0200
@@ -0,0 +1,23 @@ 
+// PR rtl-optimization/45400
+// { dg-do compile { target i?86-*-* x86_64-*-* } }
+// { dg-options "-O2 -msse2" }
+// { dg-options "-O2 -msse2 -fpic" { target fpic } }
+// { dg-require-effective-target sse2 }
+
+#include <xmmintrin.h>
+
+static inline unsigned short
+bar (unsigned short x)
+{
+  return ((x << 8) | (x >> 8));
+}
+
+unsigned int
+foo (float *x, short *y)
+{
+  __m128 a = _mm_set_ps1 (32767.5f);
+  __m128 b = _mm_mul_ps (_mm_load_ps (x), a);
+  __m64 c = _mm_cvtps_pi16 (b);
+  __builtin_memcpy (y, &c, sizeof (short) * 4);
+  y[0] = bar (y[0]);
+}