diff mbox

Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

Message ID 20141119114947.GD1745@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Nov. 19, 2014, 11:49 a.m. UTC
On Wed, Nov 19, 2014 at 09:59:45AM +0100, Richard Biener wrote:
> > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because
> > they are never used for integer arithmetics (and there is no way
> > to represent all their values in RTL if not using CONST_WIDE_INT).
> > As the following testcase shows, simplify_immed_subreg can be called
> > with such modes though, e.g. trying to forward propagate a CONST_VECTOR
> > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
> > in the IL directly) into a SUBREG_REG.
> 
> So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
> end doing with that OI mode subreg?  (why can't we simply use the
> V4DF one?)

propagate_rtx_1 is called on
(subreg:OI (reg:V8DI 89) 0)
with old_rtx
(reg:V8DI 89)
and new_rtx
(const_vector:V8DI [
        (const_int 0 [0])
        (const_int 0 [0])
        (const_int 0 [0])
        (const_int 0 [0])
        (const_int 0 [0])
        (const_int 0 [0])
        (const_int 0 [0])
        (const_int 0 [0])
    ])

Seems we throw away the result in that case though, because
gen_lowpart_common doesn't like to return low parts of VOIDmode constants
larger if the mode is larger than double int:
1353	  innermode = GET_MODE (x);
1354	  if (CONST_INT_P (x)
1355	      && msize * BITS_PER_UNIT <= HOST_BITS_PER_WIDE_INT)
1356	    innermode = mode_for_size (HOST_BITS_PER_WIDE_INT, MODE_INT, 0);
1357	  else if (innermode == VOIDmode)
1358	    innermode = mode_for_size (HOST_BITS_PER_DOUBLE_INT, MODE_INT, 0);
we hit the last if and as mode is wider than innermode, we return NULL later
on.

> > The following patch instead of ICE handles the most common cases (all 0
> > and all 1 CONST_VECTORs) and returns NULL otherwise.
> > 
> > Before wide-int got merged, the testcase worked, though the code didn't
> > bother checking anything, just created 0 or constm1_rtx for the two cases
> > that could happen and if something else appeared, could just return what
> > matched low TImode (or DImode for -m32).
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Looks ok to me.  Not sure if the zero/all-ones case really happens that
> much to be worth special-casing - I think you could use
> fixed_wide_int<proper-size> and simply see if the result is representable
> in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
> the patch may be smaller for that.

So perhaps something like this?  Don't know how much more inefficient it is
compared to what it used to do before.

Or just keep the existing code and just remove the assert and instead return
NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
least during propagation that will make zero change.
Though, in that case I have still doubts about the current code handling right
modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
== 0, we still silently throw away the upper bits, don't we?

BTW, to Mike, the assert has been misplaced, there was first buffer overflow
and only after that the assert.

2014-11-19  Jakub Jelinek  <jakub@redhat.com>

	PR target/63910
	* simplify-rtx.c (simplify_immed_subreg): Handle integer modes wider
	than MAX_BITSIZE_MODE_ANY_INT.

	* gcc.target/i386/pr63910.c: New test.



	Jakub

Comments

Richard Biener Nov. 19, 2014, 11:59 a.m. UTC | #1
On Wed, 19 Nov 2014, Jakub Jelinek wrote:

> On Wed, Nov 19, 2014 at 09:59:45AM +0100, Richard Biener wrote:
> > > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because
> > > they are never used for integer arithmetics (and there is no way
> > > to represent all their values in RTL if not using CONST_WIDE_INT).
> > > As the following testcase shows, simplify_immed_subreg can be called
> > > with such modes though, e.g. trying to forward propagate a CONST_VECTOR
> > > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
> > > in the IL directly) into a SUBREG_REG.
> > 
> > So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
> > end doing with that OI mode subreg?  (why can't we simply use the
> > V4DF one?)
> 
> propagate_rtx_1 is called on
> (subreg:OI (reg:V8DI 89) 0)
> with old_rtx
> (reg:V8DI 89)
> and new_rtx
> (const_vector:V8DI [
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>     ])
> 
> Seems we throw away the result in that case though, because
> gen_lowpart_common doesn't like to return low parts of VOIDmode constants
> larger if the mode is larger than double int:
> 1353	  innermode = GET_MODE (x);
> 1354	  if (CONST_INT_P (x)
> 1355	      && msize * BITS_PER_UNIT <= HOST_BITS_PER_WIDE_INT)
> 1356	    innermode = mode_for_size (HOST_BITS_PER_WIDE_INT, MODE_INT, 0);
> 1357	  else if (innermode == VOIDmode)
> 1358	    innermode = mode_for_size (HOST_BITS_PER_DOUBLE_INT, MODE_INT, 0);
> we hit the last if and as mode is wider than innermode, we return NULL later
> on.
> 
> > > The following patch instead of ICE handles the most common cases (all 0
> > > and all 1 CONST_VECTORs) and returns NULL otherwise.
> > > 
> > > Before wide-int got merged, the testcase worked, though the code didn't
> > > bother checking anything, just created 0 or constm1_rtx for the two cases
> > > that could happen and if something else appeared, could just return what
> > > matched low TImode (or DImode for -m32).
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Looks ok to me.  Not sure if the zero/all-ones case really happens that
> > much to be worth special-casing - I think you could use
> > fixed_wide_int<proper-size> and simply see if the result is representable
> > in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
> > the patch may be smaller for that.
> 
> So perhaps something like this?  Don't know how much more inefficient it is
> compared to what it used to do before.

Yes, that looks good.

> Or just keep the existing code and just remove the assert and instead return
> NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
> least during propagation that will make zero change.
> Though, in that case I have still doubts about the current code handling right
> modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
> MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
> == 0, we still silently throw away the upper bits, don't we?

Well - not with your added check, no?

I'd say the patch is ok.

Thanks,
Richard.

> BTW, to Mike, the assert has been misplaced, there was first buffer overflow
> and only after that the assert.
> 
> 2014-11-19  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/63910
> 	* simplify-rtx.c (simplify_immed_subreg): Handle integer modes wider
> 	than MAX_BITSIZE_MODE_ANY_INT.
> 
> 	* gcc.target/i386/pr63910.c: New test.
> 
> --- gcc/simplify-rtx.c.jj	2014-11-19 09:17:15.491327992 +0100
> +++ gcc/simplify-rtx.c	2014-11-19 12:28:16.223808178 +0100
> @@ -5501,8 +5501,12 @@ simplify_immed_subreg (machine_mode oute
>  	    int units
>  	      = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1)
>  	      / HOST_BITS_PER_WIDE_INT;
> -	    HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
> -	    wide_int r;
> +	    const int tmpsize = (MAX_BITSIZE_MODE_ANY_MODE 
> +				 + HOST_BITS_PER_WIDE_INT - 1)
> +				/ HOST_BITS_PER_WIDE_INT;
> +	    HOST_WIDE_INT tmp[tmpsize];
> +	    typedef FIXED_WIDE_INT (tmpsize * HOST_BITS_PER_WIDE_INT) imm_int;
> +	    imm_int r;
>  
>  	    for (u = 0; u < units; u++)
>  	      {
> @@ -5515,10 +5519,12 @@ simplify_immed_subreg (machine_mode oute
>  		tmp[u] = buf;
>  		base += HOST_BITS_PER_WIDE_INT;
>  	      }
> -	    gcc_assert (GET_MODE_PRECISION (outer_submode)
> -			<= MAX_BITSIZE_MODE_ANY_INT);
> -	    r = wide_int::from_array (tmp, units,
> -				      GET_MODE_PRECISION (outer_submode));
> +	    r = imm_int::from_array (tmp, units,
> +				     GET_MODE_PRECISION (outer_submode));
> +#if TARGET_SUPPORTS_WIDE_INT == 0
> +	    if (wi::min_precision (r, SIGNED) > HOST_BITS_PER_DOUBLE_INT)
> +	      return NULL_RTX;
> +#endif
>  	    elems[elem] = immed_wide_int_const (r, outer_submode);
>  	  }
>  	  break;
> --- gcc/testsuite/gcc.target/i386/pr63910.c.jj	2014-11-19 12:04:23.490489130 +0100
> +++ gcc/testsuite/gcc.target/i386/pr63910.c	2014-11-19 12:04:23.490489130 +0100
> @@ -0,0 +1,12 @@
> +/* PR target/63910 */
> +/* { dg-do compile } */
> +/* { dg-options "-O -mstringop-strategy=vector_loop -mavx512f" } */
> +
> +extern void bar (float *c);
> +
> +void
> +foo (void)
> +{
> +  float c[1024] = { };
> +  bar (c);
> +}
> 
> 
> 	Jakub
> 
>
Jakub Jelinek Nov. 19, 2014, 12:07 p.m. UTC | #2
On Wed, Nov 19, 2014 at 12:59:06PM +0100, Richard Biener wrote:
> > So perhaps something like this?  Don't know how much more inefficient it is
> > compared to what it used to do before.
> 
> Yes, that looks good.
> 
> > Or just keep the existing code and just remove the assert and instead return
> > NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
> > least during propagation that will make zero change.
> > Though, in that case I have still doubts about the current code handling right
> > modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
> > MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
> > == 0, we still silently throw away the upper bits, don't we?
> 
> Well - not with your added check, no?

For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok.  Not sure
about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or
is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up)
precision?  Mike?

	Jakub
Richard Biener Nov. 19, 2014, 12:24 p.m. UTC | #3
On Wed, 19 Nov 2014, Jakub Jelinek wrote:

> On Wed, Nov 19, 2014 at 12:59:06PM +0100, Richard Biener wrote:
> > > So perhaps something like this?  Don't know how much more inefficient it is
> > > compared to what it used to do before.
> > 
> > Yes, that looks good.
> > 
> > > Or just keep the existing code and just remove the assert and instead return
> > > NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
> > > least during propagation that will make zero change.
> > > Though, in that case I have still doubts about the current code handling right
> > > modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
> > > MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
> > > == 0, we still silently throw away the upper bits, don't we?
> > 
> > Well - not with your added check, no?
> 
> For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok.  Not sure
> about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or
> is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up)
> precision?  Mike?

It can represent any - well, the RTX at least.  Code then using
"simple" wide_int may wreck then though.

Richard.
Mike Stump Nov. 19, 2014, 6:38 p.m. UTC | #4
On Nov 19, 2014, at 4:24 AM, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 19 Nov 2014, Jakub Jelinek wrote:
>> For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok.  Not sure
>> about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or
>> is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up)
>> precision?  Mike?
> 
> It can represent any - well, the RTX at least.  Code then using
> "simple" wide_int may wreck then though.

So, my worry is this…  once you start in on adding support here or there for int modes wider than the largest supported int mode, you create a never ending maintenance nightmare with complex rules that one will never be able to keep straight.  There needs to be a single line or two, that explains the rules that we all agree to, then it will always be clear what the rule is.  The largest supported int mode is: x, has a nice, simple, easy to explain aspect to it.

If one looks at the rtl optimization code, and all the code that cracks rtl integer constants and plays with them, I don’t think _all_ that code is larger than max safe.  I have reason to believe it is all reasonable safe up to the max however; that was our contribution in wide-int.

Originally I was going to say, this patch needs a int larger than MAX_BITSIZE_MODE_ANY_INT person to review it, meaning, that isn’t me.  It still isn’t me; so, I defer that to someone that wants to try and keep those rules straight and memorize them and ensure the totality of the compiler actually works for theses cases by design.  Luckily, that only affects x86, as it is the only port to try and slice and dice like this.  So, I’ll defer to others on that.
Mike Stump Nov. 19, 2014, 6:53 p.m. UTC | #5
On Nov 19, 2014, at 4:07 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok.  Not sure
> about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or
> is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up)
> precision?  Mike?

So, you can generate any finite size const_int you want.  That is safe.  The question, what does the entire rest of the compiler do with these bigger than max things, well that’s the part that I’ll defer to others on.  Generally when Kenny and I did the code, I think we had a tendency to treat wide_int as a maximal size.
Richard Sandiford Nov. 19, 2014, 9:45 p.m. UTC | #6
Jakub Jelinek <jakub@redhat.com> writes:
> On Wed, Nov 19, 2014 at 09:59:45AM +0100, Richard Biener wrote:
>> > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because
>> > they are never used for integer arithmetics (and there is no way
>> > to represent all their values in RTL if not using CONST_WIDE_INT).
>> > As the following testcase shows, simplify_immed_subreg can be called
>> > with such modes though, e.g. trying to forward propagate a CONST_VECTOR
>> > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
>> > in the IL directly) into a SUBREG_REG.
>> 
>> So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
>> end doing with that OI mode subreg?  (why can't we simply use the
>> V4DF one?)
>
> propagate_rtx_1 is called on
> (subreg:OI (reg:V8DI 89) 0)
> with old_rtx
> (reg:V8DI 89)
> and new_rtx
> (const_vector:V8DI [
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>         (const_int 0 [0])
>     ])
>
> Seems we throw away the result in that case though, because
> gen_lowpart_common doesn't like to return low parts of VOIDmode constants
> larger if the mode is larger than double int:
> 1353	  innermode = GET_MODE (x);
> 1354	  if (CONST_INT_P (x)
> 1355	      && msize * BITS_PER_UNIT <= HOST_BITS_PER_WIDE_INT)
> 1356	    innermode = mode_for_size (HOST_BITS_PER_WIDE_INT, MODE_INT, 0);
> 1357	  else if (innermode == VOIDmode)
> 1358	    innermode = mode_for_size (HOST_BITS_PER_DOUBLE_INT, MODE_INT, 0);
> we hit the last if and as mode is wider than innermode, we return NULL later
> on.
>
>> > The following patch instead of ICE handles the most common cases (all 0
>> > and all 1 CONST_VECTORs) and returns NULL otherwise.
>> > 
>> > Before wide-int got merged, the testcase worked, though the code didn't
>> > bother checking anything, just created 0 or constm1_rtx for the two cases
>> > that could happen and if something else appeared, could just return what
>> > matched low TImode (or DImode for -m32).
>> > 
>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> 
>> Looks ok to me.  Not sure if the zero/all-ones case really happens that
>> much to be worth special-casing - I think you could use
>> fixed_wide_int<proper-size> and simply see if the result is representable
>> in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
>> the patch may be smaller for that.
>
> So perhaps something like this?  Don't know how much more inefficient it is
> compared to what it used to do before.

I think the effect is that OImode integers are interesting on x86 if the
value fits in a sign-extended TImode, but not otherwise.  Is that right?
In some ways it seems like an arbitrary distinction.  If any nonzero
OImode values are potentially interesting then aren't all of them potentially
interesting?

> Or just keep the existing code and just remove the assert and instead return
> NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
> least during propagation that will make zero change.

Sounds good to me FWIW, especially if there are no known cases where we
want a nonnull return for modes wider than MAX_BITSIZE_MODE_ANY_INT.
(I assume the assert would have triggered for them too.)

Thanks,
Richard
diff mbox

Patch

--- gcc/simplify-rtx.c.jj	2014-11-19 09:17:15.491327992 +0100
+++ gcc/simplify-rtx.c	2014-11-19 12:28:16.223808178 +0100
@@ -5501,8 +5501,12 @@  simplify_immed_subreg (machine_mode oute
 	    int units
 	      = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1)
 	      / HOST_BITS_PER_WIDE_INT;
-	    HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
-	    wide_int r;
+	    const int tmpsize = (MAX_BITSIZE_MODE_ANY_MODE 
+				 + HOST_BITS_PER_WIDE_INT - 1)
+				/ HOST_BITS_PER_WIDE_INT;
+	    HOST_WIDE_INT tmp[tmpsize];
+	    typedef FIXED_WIDE_INT (tmpsize * HOST_BITS_PER_WIDE_INT) imm_int;
+	    imm_int r;
 
 	    for (u = 0; u < units; u++)
 	      {
@@ -5515,10 +5519,12 @@  simplify_immed_subreg (machine_mode oute
 		tmp[u] = buf;
 		base += HOST_BITS_PER_WIDE_INT;
 	      }
-	    gcc_assert (GET_MODE_PRECISION (outer_submode)
-			<= MAX_BITSIZE_MODE_ANY_INT);
-	    r = wide_int::from_array (tmp, units,
-				      GET_MODE_PRECISION (outer_submode));
+	    r = imm_int::from_array (tmp, units,
+				     GET_MODE_PRECISION (outer_submode));
+#if TARGET_SUPPORTS_WIDE_INT == 0
+	    if (wi::min_precision (r, SIGNED) > HOST_BITS_PER_DOUBLE_INT)
+	      return NULL_RTX;
+#endif
 	    elems[elem] = immed_wide_int_const (r, outer_submode);
 	  }
 	  break;
--- gcc/testsuite/gcc.target/i386/pr63910.c.jj	2014-11-19 12:04:23.490489130 +0100
+++ gcc/testsuite/gcc.target/i386/pr63910.c	2014-11-19 12:04:23.490489130 +0100
@@ -0,0 +1,12 @@ 
+/* PR target/63910 */
+/* { dg-do compile } */
+/* { dg-options "-O -mstringop-strategy=vector_loop -mavx512f" } */
+
+extern void bar (float *c);
+
+void
+foo (void)
+{
+  float c[1024] = { };
+  bar (c);
+}