Message ID | 20141118215235.GP1745@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
On Nov 18, 2014, at 1:52 PM, Jakub Jelinek <jakub@redhat.com> wrote: > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because > they are never used for integer arithmetics (and there is no way > to represent all their values in RTL if not using CONST_WIDE_INT). > As the following testcase shows, simplify_immed_subreg can be called > with such modes though, e.g. trying to forward propagate a CONST_VECTOR > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear > in the IL directly) into a SUBREG_REG. > The following patch instead of ICE handles the most common cases (all 0 > and all 1 CONST_VECTORs) and returns NULL otherwise. > > Before wide-int got merged, the testcase worked, though the code didn't > bother checking anything, just created 0 or constm1_rtx for the two cases > that could happen and if something else appeared, could just return what > matched low TImode (or DImode for -m32). tmp is sized for MAX_BITSIZE_MODE_ANY_INT, but, you remove the limiter for units that keeps u in bounds. Doesn’t this access 32 bytes of OImode values in a 16 byte data structure? Next, from_arrary uses a wide_int, and this from the documentation applies: All three flavors of wide_int are represented as a vector of HOST_WIDE_INTs. The default and widest_int vectors contain enough elements to hold a value of MAX_BITSIZE_MODE_ANY_INT bits. offset_int contains only enough elements to hold ADDR_MAX_PRECISION bits. The values are stored in the vector with the least significant HOST_BITS_PER_WIDE_INT bits in element 0. If you look at the code to from_arrary: wide_int_storage::from_array (const HOST_WIDE_INT *val, unsigned int len, unsigned int precision, bool need_canon_p) { wide_int result = wide_int::create (precision); result.set_len (wi::from_array (result.write_val (), val, len, precision, need_canon_p)); unsigned int wi::from_array (HOST_WIDE_INT *val, const HOST_WIDE_INT *xval, unsigned int xlen, unsigned int precision, bool need_canon) { for (unsigned i = 0; i < xlen; i++) val[i] = xval[i]; it just does a blind copy of all the xlen hunks which forms from units, and units is: int units = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT; and GET_MODE_BITSIZE (outer_submode) is > MAX_BITSIZE_MODE_ANY_INT, right? You can’t copy more bytes than the size of the destination has?
On Tue, Nov 18, 2014 at 03:30:56PM -0800, Mike Stump wrote: > > Before wide-int got merged, the testcase worked, though the code didn't > > bother checking anything, just created 0 or constm1_rtx for the two cases > > that could happen and if something else appeared, could just return what > > matched low TImode (or DImode for -m32). > > tmp is sized for MAX_BITSIZE_MODE_ANY_INT, but, you remove the limiter for > units that keeps u in bounds. Doesn’t this access 32 bytes of OImode > values in a 16 byte data structure? No, I'm not touching tmp array at all in that case, only look at vp individual bytes. Either they are all 0, or all 0xff, or I return NULL. > and GET_MODE_BITSIZE (outer_submode) is > MAX_BITSIZE_MODE_ANY_INT, right? > > You can’t copy more bytes than the size of the destination has? On the testcase we have (subreg:OI (reg:V8SI 1234) 0) and try to propagate (const_vector:V8SI [8 x (const_int 0)]) into it. All zeros even in OImode (or XImode) is still (const_int 0), all ones even in those modes is (const_int -1), which are the only constants I'm using for those ultra-wide modes, anything else will return NULL (but not ICE). Jakub
On Nov 18, 2014, at 3:42 PM, Jakub Jelinek <jakub@redhat.com> wrote: > No, I'm not touching tmp array at all in that case, only look at vp > individual bytes. Either they are all 0, or all 0xff, or I return NULL. Oh, sorry, I misread where the break; in your code was going. I might have been misled by: > - gcc_assert (GET_MODE_PRECISION (outer_submode) > - <= MAX_BITSIZE_MODE_ANY_INT); in your patch. You don’t need that anymore, do you? If not, can you remove this part. The rest looks like normal rtl/vector code, I don’t see anything wrong with it.
On Tue, Nov 18, 2014 at 04:34:12PM -0800, Mike Stump wrote: > On Nov 18, 2014, at 3:42 PM, Jakub Jelinek <jakub@redhat.com> wrote: > > No, I'm not touching tmp array at all in that case, only look at vp > > individual bytes. Either they are all 0, or all 0xff, or I return NULL. > > Oh, sorry, I misread where the break; in your code was going. I might have been misled by: > > > - gcc_assert (GET_MODE_PRECISION (outer_submode) > > - <= MAX_BITSIZE_MODE_ANY_INT); > > in your patch. You don’t need that anymore, do you? If not, can you remove this part. I thought the assert is unnecessary given the condition just a few lines above it. But can keep it, perhaps gcc_checking_assert would be enough, and hopefully compiler optimizes it away completely. > > The rest looks like normal rtl/vector code, I don’t see anything wrong with it. Jakub
On Tue, 18 Nov 2014, Jakub Jelinek wrote: > Hi! > > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because > they are never used for integer arithmetics (and there is no way > to represent all their values in RTL if not using CONST_WIDE_INT). > As the following testcase shows, simplify_immed_subreg can be called > with such modes though, e.g. trying to forward propagate a CONST_VECTOR > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear > in the IL directly) into a SUBREG_REG. So we have (subreg:OI (reg:V4DF ...) ...) for example? What do we end doing with that OI mode subreg? (why can't we simply use the V4DF one?) > The following patch instead of ICE handles the most common cases (all 0 > and all 1 CONST_VECTORs) and returns NULL otherwise. > > Before wide-int got merged, the testcase worked, though the code didn't > bother checking anything, just created 0 or constm1_rtx for the two cases > that could happen and if something else appeared, could just return what > matched low TImode (or DImode for -m32). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Looks ok to me. Not sure if the zero/all-ones case really happens that much to be worth special-casing - I think you could use fixed_wide_int<proper-size> and simply see if the result is representable in a CONST_INT/CONST_DOUBLE? Can you try if that works? It looks like the patch may be smaller for that. Thanks, Richard. > 2014-11-18 Jakub Jelinek <jakub@redhat.com> > > PR target/63910 > * simplify-rtx.c (simplify_immed_subreg): For integer modes > wider than MAX_BITSIZE_MODE_ANY_INT, handle all zeros and all ones > and for other values return NULL_RTX. > > * gcc.target/i386/pr63910.c: New test. > > --- gcc/simplify-rtx.c.jj 2014-11-11 00:06:19.000000000 +0100 > +++ gcc/simplify-rtx.c 2014-11-18 10:17:01.198668555 +0100 > @@ -5505,6 +5505,21 @@ simplify_immed_subreg (machine_mode oute > HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; > wide_int r; > > + if (GET_MODE_PRECISION (outer_submode) > MAX_BITSIZE_MODE_ANY_INT) > + { > + /* Handle just all zeros and all ones CONST_VECTORs in > + this case. */ > + if ((vp[0] & value_mask) == 0) > + elems[elem] = const0_rtx; > + else if ((vp[0] & value_mask) == value_mask) > + elems[elem] = constm1_rtx; > + else > + return NULL_RTX; > + for (i = value_bit; i < elem_bitsize; i += value_bit) > + if ((vp[i / value_bit] & value_mask) != (vp[0] & value_mask)) > + return NULL_RTX; > + break; > + } > for (u = 0; u < units; u++) > { > unsigned HOST_WIDE_INT buf = 0; > @@ -5516,8 +5531,6 @@ simplify_immed_subreg (machine_mode oute > tmp[u] = buf; > base += HOST_BITS_PER_WIDE_INT; > } > - gcc_assert (GET_MODE_PRECISION (outer_submode) > - <= MAX_BITSIZE_MODE_ANY_INT); > r = wide_int::from_array (tmp, units, > GET_MODE_PRECISION (outer_submode)); > elems[elem] = immed_wide_int_const (r, outer_submode); > --- gcc/testsuite/gcc.target/i386/pr63910.c.jj 2014-11-18 10:12:24.282659318 +0100 > +++ gcc/testsuite/gcc.target/i386/pr63910.c 2014-11-18 10:12:00.000000000 +0100 > @@ -0,0 +1,12 @@ > +/* PR target/63910 */ > +/* { dg-do compile } */ > +/* { dg-options "-O -mstringop-strategy=vector_loop -mavx512f" } */ > + > +extern void bar (float *c); > + > +void > +foo (void) > +{ > + float c[1024] = { }; > + bar (c); > +} > > Jakub > >
--- gcc/simplify-rtx.c.jj 2014-11-11 00:06:19.000000000 +0100 +++ gcc/simplify-rtx.c 2014-11-18 10:17:01.198668555 +0100 @@ -5505,6 +5505,21 @@ simplify_immed_subreg (machine_mode oute HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT]; wide_int r; + if (GET_MODE_PRECISION (outer_submode) > MAX_BITSIZE_MODE_ANY_INT) + { + /* Handle just all zeros and all ones CONST_VECTORs in + this case. */ + if ((vp[0] & value_mask) == 0) + elems[elem] = const0_rtx; + else if ((vp[0] & value_mask) == value_mask) + elems[elem] = constm1_rtx; + else + return NULL_RTX; + for (i = value_bit; i < elem_bitsize; i += value_bit) + if ((vp[i / value_bit] & value_mask) != (vp[0] & value_mask)) + return NULL_RTX; + break; + } for (u = 0; u < units; u++) { unsigned HOST_WIDE_INT buf = 0; @@ -5516,8 +5531,6 @@ simplify_immed_subreg (machine_mode oute tmp[u] = buf; base += HOST_BITS_PER_WIDE_INT; } - gcc_assert (GET_MODE_PRECISION (outer_submode) - <= MAX_BITSIZE_MODE_ANY_INT); r = wide_int::from_array (tmp, units, GET_MODE_PRECISION (outer_submode)); elems[elem] = immed_wide_int_const (r, outer_submode); --- gcc/testsuite/gcc.target/i386/pr63910.c.jj 2014-11-18 10:12:24.282659318 +0100 +++ gcc/testsuite/gcc.target/i386/pr63910.c 2014-11-18 10:12:00.000000000 +0100 @@ -0,0 +1,12 @@ +/* PR target/63910 */ +/* { dg-do compile } */ +/* { dg-options "-O -mstringop-strategy=vector_loop -mavx512f" } */ + +extern void bar (float *c); + +void +foo (void) +{ + float c[1024] = { }; + bar (c); +}