diff mbox

Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

Message ID 20141118215235.GP1745@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Nov. 18, 2014, 9:52 p.m. UTC
Hi!

OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because
they are never used for integer arithmetics (and there is no way
to represent all their values in RTL if not using CONST_WIDE_INT).
As the following testcase shows, simplify_immed_subreg can be called
with such modes though, e.g. trying to forward propagate a CONST_VECTOR
(i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
in the IL directly) into a SUBREG_REG.
The following patch instead of ICE handles the most common cases (all 0
and all 1 CONST_VECTORs) and returns NULL otherwise.

Before wide-int got merged, the testcase worked, though the code didn't
bother checking anything, just created 0 or constm1_rtx for the two cases
that could happen and if something else appeared, could just return what
matched low TImode (or DImode for -m32).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-11-18  Jakub Jelinek  <jakub@redhat.com>

	PR target/63910
	* simplify-rtx.c (simplify_immed_subreg): For integer modes
	wider than MAX_BITSIZE_MODE_ANY_INT, handle all zeros and all ones
	and for other values return NULL_RTX.

	* gcc.target/i386/pr63910.c: New test.


	Jakub

Comments

Mike Stump Nov. 18, 2014, 11:30 p.m. UTC | #1
On Nov 18, 2014, at 1:52 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because
> they are never used for integer arithmetics (and there is no way
> to represent all their values in RTL if not using CONST_WIDE_INT).
> As the following testcase shows, simplify_immed_subreg can be called
> with such modes though, e.g. trying to forward propagate a CONST_VECTOR
> (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
> in the IL directly) into a SUBREG_REG.
> The following patch instead of ICE handles the most common cases (all 0
> and all 1 CONST_VECTORs) and returns NULL otherwise.
> 
> Before wide-int got merged, the testcase worked, though the code didn't
> bother checking anything, just created 0 or constm1_rtx for the two cases
> that could happen and if something else appeared, could just return what
> matched low TImode (or DImode for -m32).

tmp is sized for MAX_BITSIZE_MODE_ANY_INT, but, you remove the limiter for units that keeps u in bounds.  Doesn’t this access 32 bytes of OImode values in a 16 byte data structure?

Next, from_arrary uses a wide_int, and this from the documentation applies:

   All three flavors of wide_int are represented as a vector of                                                                                      
   HOST_WIDE_INTs.  The default and widest_int vectors contain enough elements                                                                       
   to hold a value of MAX_BITSIZE_MODE_ANY_INT bits.  offset_int contains only                                                                      
   enough elements to hold ADDR_MAX_PRECISION bits.  The values are stored                                                                          
   in the vector with the least significant HOST_BITS_PER_WIDE_INT bits                                                                              
   in element 0.                                                                                                                                     

If you look at the code to from_arrary:

wide_int_storage::from_array (const HOST_WIDE_INT *val, unsigned int len,
                              unsigned int precision, bool need_canon_p)
{
  wide_int result = wide_int::create (precision);
  result.set_len (wi::from_array (result.write_val (), val, len, precision,
                                  need_canon_p));


unsigned int
wi::from_array (HOST_WIDE_INT *val, const HOST_WIDE_INT *xval,
                unsigned int xlen, unsigned int precision, bool need_canon)
{
  for (unsigned i = 0; i < xlen; i++)
    val[i] = xval[i];

it just does a blind copy of all the xlen hunks which forms from units, and units is:

            int units
              = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1)
              / HOST_BITS_PER_WIDE_INT;

and GET_MODE_BITSIZE (outer_submode) is > MAX_BITSIZE_MODE_ANY_INT, right?

You can’t copy more bytes than the size of the destination has?
Jakub Jelinek Nov. 18, 2014, 11:42 p.m. UTC | #2
On Tue, Nov 18, 2014 at 03:30:56PM -0800, Mike Stump wrote:
> > Before wide-int got merged, the testcase worked, though the code didn't
> > bother checking anything, just created 0 or constm1_rtx for the two cases
> > that could happen and if something else appeared, could just return what
> > matched low TImode (or DImode for -m32).
> 
> tmp is sized for MAX_BITSIZE_MODE_ANY_INT, but, you remove the limiter for
> units that keeps u in bounds.  Doesn’t this access 32 bytes of OImode
> values in a 16 byte data structure?

No, I'm not touching tmp array at all in that case, only look at vp
individual bytes.  Either they are all 0, or all 0xff, or I return NULL.

> and GET_MODE_BITSIZE (outer_submode) is > MAX_BITSIZE_MODE_ANY_INT, right?
> 
> You can’t copy more bytes than the size of the destination has?

On the testcase we have
(subreg:OI (reg:V8SI 1234) 0)
and try to propagate
(const_vector:V8SI [8 x (const_int 0)])
into it.  All zeros even in OImode (or XImode) is still
(const_int 0), all ones even in those modes is (const_int -1),
which are the only constants I'm using for those ultra-wide modes,
anything else will return NULL (but not ICE).

	Jakub
Mike Stump Nov. 19, 2014, 12:34 a.m. UTC | #3
On Nov 18, 2014, at 3:42 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> No, I'm not touching tmp array at all in that case, only look at vp
> individual bytes.  Either they are all 0, or all 0xff, or I return NULL.

Oh, sorry, I misread where the break; in your code was going.  I might have been misled by:

> -	    gcc_assert (GET_MODE_PRECISION (outer_submode)
> -			<= MAX_BITSIZE_MODE_ANY_INT);

in your patch.  You don’t need that anymore, do you?  If not, can you remove this part.

The rest looks like normal rtl/vector code, I don’t see anything wrong with it.
Jakub Jelinek Nov. 19, 2014, 8:08 a.m. UTC | #4
On Tue, Nov 18, 2014 at 04:34:12PM -0800, Mike Stump wrote:
> On Nov 18, 2014, at 3:42 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > No, I'm not touching tmp array at all in that case, only look at vp
> > individual bytes.  Either they are all 0, or all 0xff, or I return NULL.
> 
> Oh, sorry, I misread where the break; in your code was going.  I might have been misled by:
> 
> > -	    gcc_assert (GET_MODE_PRECISION (outer_submode)
> > -			<= MAX_BITSIZE_MODE_ANY_INT);
> 
> in your patch.  You don’t need that anymore, do you?  If not, can you remove this part.

I thought the assert is unnecessary given the condition just a few lines
above it.  But can keep it, perhaps gcc_checking_assert would be enough, and
hopefully compiler optimizes it away completely.

> 
> The rest looks like normal rtl/vector code, I don’t see anything wrong with it.

	Jakub
Richard Biener Nov. 19, 2014, 8:59 a.m. UTC | #5
On Tue, 18 Nov 2014, Jakub Jelinek wrote:

> Hi!
> 
> OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because
> they are never used for integer arithmetics (and there is no way
> to represent all their values in RTL if not using CONST_WIDE_INT).
> As the following testcase shows, simplify_immed_subreg can be called
> with such modes though, e.g. trying to forward propagate a CONST_VECTOR
> (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
> in the IL directly) into a SUBREG_REG.

So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
end doing with that OI mode subreg?  (why can't we simply use the
V4DF one?)

> The following patch instead of ICE handles the most common cases (all 0
> and all 1 CONST_VECTORs) and returns NULL otherwise.
> 
> Before wide-int got merged, the testcase worked, though the code didn't
> bother checking anything, just created 0 or constm1_rtx for the two cases
> that could happen and if something else appeared, could just return what
> matched low TImode (or DImode for -m32).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Looks ok to me.  Not sure if the zero/all-ones case really happens that
much to be worth special-casing - I think you could use
fixed_wide_int<proper-size> and simply see if the result is representable
in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
the patch may be smaller for that.

Thanks,
Richard.

> 2014-11-18  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/63910
> 	* simplify-rtx.c (simplify_immed_subreg): For integer modes
> 	wider than MAX_BITSIZE_MODE_ANY_INT, handle all zeros and all ones
> 	and for other values return NULL_RTX.
> 
> 	* gcc.target/i386/pr63910.c: New test.
> 
> --- gcc/simplify-rtx.c.jj	2014-11-11 00:06:19.000000000 +0100
> +++ gcc/simplify-rtx.c	2014-11-18 10:17:01.198668555 +0100
> @@ -5505,6 +5505,21 @@ simplify_immed_subreg (machine_mode oute
>  	    HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
>  	    wide_int r;
>  
> +	    if (GET_MODE_PRECISION (outer_submode) > MAX_BITSIZE_MODE_ANY_INT)
> +	      {
> +		/* Handle just all zeros and all ones CONST_VECTORs in
> +		   this case.  */
> +		if ((vp[0] & value_mask) == 0)
> +		  elems[elem] = const0_rtx;
> +		else if ((vp[0] & value_mask) == value_mask)
> +		  elems[elem] = constm1_rtx;
> +		else
> +		  return NULL_RTX;
> +		for (i = value_bit; i < elem_bitsize; i += value_bit)
> +		  if ((vp[i / value_bit] & value_mask) != (vp[0] & value_mask))
> +		    return NULL_RTX;
> +		break;
> +	      }
>  	    for (u = 0; u < units; u++)
>  	      {
>  		unsigned HOST_WIDE_INT buf = 0;
> @@ -5516,8 +5531,6 @@ simplify_immed_subreg (machine_mode oute
>  		tmp[u] = buf;
>  		base += HOST_BITS_PER_WIDE_INT;
>  	      }
> -	    gcc_assert (GET_MODE_PRECISION (outer_submode)
> -			<= MAX_BITSIZE_MODE_ANY_INT);
>  	    r = wide_int::from_array (tmp, units,
>  				      GET_MODE_PRECISION (outer_submode));
>  	    elems[elem] = immed_wide_int_const (r, outer_submode);
> --- gcc/testsuite/gcc.target/i386/pr63910.c.jj	2014-11-18 10:12:24.282659318 +0100
> +++ gcc/testsuite/gcc.target/i386/pr63910.c	2014-11-18 10:12:00.000000000 +0100
> @@ -0,0 +1,12 @@
> +/* PR target/63910 */
> +/* { dg-do compile } */
> +/* { dg-options "-O -mstringop-strategy=vector_loop -mavx512f" } */
> +
> +extern void bar (float *c);
> +
> +void
> +foo (void)
> +{
> +  float c[1024] = { };
> +  bar (c);
> +}
> 
> 	Jakub
> 
>
diff mbox

Patch

--- gcc/simplify-rtx.c.jj	2014-11-11 00:06:19.000000000 +0100
+++ gcc/simplify-rtx.c	2014-11-18 10:17:01.198668555 +0100
@@ -5505,6 +5505,21 @@  simplify_immed_subreg (machine_mode oute
 	    HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
 	    wide_int r;
 
+	    if (GET_MODE_PRECISION (outer_submode) > MAX_BITSIZE_MODE_ANY_INT)
+	      {
+		/* Handle just all zeros and all ones CONST_VECTORs in
+		   this case.  */
+		if ((vp[0] & value_mask) == 0)
+		  elems[elem] = const0_rtx;
+		else if ((vp[0] & value_mask) == value_mask)
+		  elems[elem] = constm1_rtx;
+		else
+		  return NULL_RTX;
+		for (i = value_bit; i < elem_bitsize; i += value_bit)
+		  if ((vp[i / value_bit] & value_mask) != (vp[0] & value_mask))
+		    return NULL_RTX;
+		break;
+	      }
 	    for (u = 0; u < units; u++)
 	      {
 		unsigned HOST_WIDE_INT buf = 0;
@@ -5516,8 +5531,6 @@  simplify_immed_subreg (machine_mode oute
 		tmp[u] = buf;
 		base += HOST_BITS_PER_WIDE_INT;
 	      }
-	    gcc_assert (GET_MODE_PRECISION (outer_submode)
-			<= MAX_BITSIZE_MODE_ANY_INT);
 	    r = wide_int::from_array (tmp, units,
 				      GET_MODE_PRECISION (outer_submode));
 	    elems[elem] = immed_wide_int_const (r, outer_submode);
--- gcc/testsuite/gcc.target/i386/pr63910.c.jj	2014-11-18 10:12:24.282659318 +0100
+++ gcc/testsuite/gcc.target/i386/pr63910.c	2014-11-18 10:12:00.000000000 +0100
@@ -0,0 +1,12 @@ 
+/* PR target/63910 */
+/* { dg-do compile } */
+/* { dg-options "-O -mstringop-strategy=vector_loop -mavx512f" } */
+
+extern void bar (float *c);
+
+void
+foo (void)
+{
+  float c[1024] = { };
+  bar (c);
+}