Message ID | 20161125173502.GM3541@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
Jakub Jelinek <jakub@redhat.com> writes: > Hi! > > If we try to simplify a paradoxical subreg of a CONST_WIDE_INT, > we can ICE, because wi::extract_uhwi doesn't like accessing > bytes beyond the precision. > for (i = 0; i < elem_bitsize; i += value_bit) > *vp++ = wi::extract_uhwi (val, i, value_bit); > for (; i < elem_bitsize; i += value_bit) > *vp++ = extend; > looks wrong, because the second loop is useless. But, we actually > do want the second loop, to handle the paradoxical bytes. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2016-11-25 Jakub Jelinek <jakub@redhat.com> > > PR rtl-optimization/78526 > * simplify-rtx.c (simplify_immed_subreg): Don't use wi::extract_uhwi > beyond val's precision. > > * gcc.dg/pr78526.c: New test. OK, thanks. I think this is still wrong for modes that are wide enough to use CONST_WIDE_INT but aren't a multiple of a HWI in size. In that case we'll still try to access the integer in a wider precision for the final iteration of the first loop. I'm not sure we have any targets like that though, and either way, I agree this is a strict improvement. Perhaps we should have extract_uhwi that works for arbitrary bitpos and width and zero-extends beyond the precision, and an extract_shwi that does the same but sign-extending beyond the precision? Then code like this could use a single loop and could safely access any wide-int representation (regardless of whether it's stored in sign-extended form). The extract_*hwi routines would become a bit more expensive though. Does that sound like an improvement? If so I'll file a PR. I might have time before GCC 7 if it qualifies. Thanks, Richard
On Fri, Nov 25, 2016 at 06:50:17PM +0000, Richard Sandiford wrote: > I think this is still wrong for modes that are wide enough to use > CONST_WIDE_INT but aren't a multiple of a HWI in size. In that The loop iterates by value_bit, which is 8 bits. Thus, indeed, it will misbehave if there is a CONST_WIDE_INT with a precision not multiple of 8. > Does that sound like an improvement? If so I'll file a PR. > I might have time before GCC 7 if it qualifies. Yeah, it might be an improvement. Jakub
--- gcc/simplify-rtx.c.jj 2016-11-18 20:04:28.000000000 +0100 +++ gcc/simplify-rtx.c 2016-11-25 15:07:13.872322389 +0100 @@ -5740,8 +5740,9 @@ simplify_immed_subreg (machine_mode oute { rtx_mode_t val = rtx_mode_t (el, innermode); unsigned char extend = wi::sign_mask (val); + int prec = wi::get_precision (val); - for (i = 0; i < elem_bitsize; i += value_bit) + for (i = 0; i < prec && i < elem_bitsize; i += value_bit) *vp++ = wi::extract_uhwi (val, i, value_bit); for (; i < elem_bitsize; i += value_bit) *vp++ = extend; --- gcc/testsuite/gcc.dg/pr78526.c.jj 2016-11-25 15:15:53.266799450 +0100 +++ gcc/testsuite/gcc.dg/pr78526.c 2016-11-25 15:16:12.004564146 +0100 @@ -0,0 +1,21 @@ +/* PR rtl-optimization/78526 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O -fno-tree-ccp -fno-tree-sra -g -w" } */ +/* { dg-additional-options "-mavx512bw" { target i?86-*-* x86_64-*-* } } */ + +typedef unsigned U __attribute__ ((vector_size (64))); +typedef unsigned __int128 V __attribute__ ((vector_size (64))); + +static inline V +bar (U u, U x, V v) +{ + v = (V)(U) { 0, ~0 }; + v[x[0]] <<= u[-63]; + return v; +} + +V +foo (U u) +{ + return bar (u, (U) {}, (V) {}); +}