Message ID | 20130108200057.GM7269@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
On Tue, Jan 8, 2013 at 9:00 PM, Jakub Jelinek <jakub@redhat.com> wrote: > No matter whether LRA (if it is a bug in there) is fixed or not, > *vec_concatv2df could handle for !avx sse3 x <- x, 1 alternative the same > as it handles x <- m, 1 alternative (using movddup). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2013-01-08 Jakub Jelinek <jakub@redhat.com> > > PR rtl-optimization/55829 > * config/i386/sse.md (*vec_concatv2df): Add x <- x, 1 alternative > for sse3 but not avx. > > * gcc.target/i386/pr55829.c: New test. > > --- gcc/config/i386/sse.md.jj 2012-11-26 10:14:26.000000000 +0100 > +++ gcc/config/i386/sse.md 2013-01-08 10:28:42.496819712 +0100 > @@ -5183,10 +5183,10 @@ (define_insn "vec_dupv2df" > (set_attr "mode" "V2DF")]) > > (define_insn "*vec_concatv2df" > - [(set (match_operand:V2DF 0 "register_operand" "=x,x,x,x,x,x,x,x") > + [(set (match_operand:V2DF 0 "register_operand" "=x,x,x, x,x,x,x,x") > (vec_concat:V2DF > - (match_operand:DF 1 "nonimmediate_operand" " 0,x,m,0,x,m,0,0") > - (match_operand:DF 2 "vector_move_operand" " x,x,1,m,m,C,x,m")))] > + (match_operand:DF 1 "nonimmediate_operand" " 0,x,xm,0,x,m,0,0") > + (match_operand:DF 2 "vector_move_operand" " x,x,1, m,m,C,x,m")))] This was done on purpose, since reload had some problems with similar pattern (please see PR 50875 [1] and [2]). If we are sure that LRA fixes this problem, then the patch is OK for mainline. Also, please revert "hack" that fixed PR 50875 in this case. [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50875 [2] http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02477.html Uros.
On Wed, Jan 9, 2013 at 10:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >> No matter whether LRA (if it is a bug in there) is fixed or not, >> *vec_concatv2df could handle for !avx sse3 x <- x, 1 alternative the same >> as it handles x <- m, 1 alternative (using movddup). >> >> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? >> >> 2013-01-08 Jakub Jelinek <jakub@redhat.com> >> >> PR rtl-optimization/55829 >> * config/i386/sse.md (*vec_concatv2df): Add x <- x, 1 alternative >> for sse3 but not avx. >> >> * gcc.target/i386/pr55829.c: New test. >> >> --- gcc/config/i386/sse.md.jj 2012-11-26 10:14:26.000000000 +0100 >> +++ gcc/config/i386/sse.md 2013-01-08 10:28:42.496819712 +0100 >> @@ -5183,10 +5183,10 @@ (define_insn "vec_dupv2df" >> (set_attr "mode" "V2DF")]) >> >> (define_insn "*vec_concatv2df" >> - [(set (match_operand:V2DF 0 "register_operand" "=x,x,x,x,x,x,x,x") >> + [(set (match_operand:V2DF 0 "register_operand" "=x,x,x, x,x,x,x,x") >> (vec_concat:V2DF >> - (match_operand:DF 1 "nonimmediate_operand" " 0,x,m,0,x,m,0,0") >> - (match_operand:DF 2 "vector_move_operand" " x,x,1,m,m,C,x,m")))] >> + (match_operand:DF 1 "nonimmediate_operand" " 0,x,xm,0,x,m,0,0") >> + (match_operand:DF 2 "vector_move_operand" " x,x,1, m,m,C,x,m")))] > > This was done on purpose, since reload had some problems with similar > pattern (please see PR 50875 [1] and [2]). If we are sure that LRA > fixes this problem, then the patch is OK for mainline. > > Also, please revert "hack" that fixed PR 50875 in this case. Looking into this problem a bit more: After Vladimir's LRA patch went in, we generate for gcc.target/i386/pr55829.c: movq p1(%rip), %r12 # 56 *movdi_internal_rex64/2 [length = 7] movq %r12, (%rsp) # 57 *movdi_internal_rex64/4 [length = 4] movddup (%rsp), %xmm1 # 23 *vec_concatv2df/3 [length = 5] Combined with your proposed patch: movq p1(%rip), %r12 # 60 *movdi_internal_rex64/2 [length = 7] movq %r12, (%rsp) # 61 *movdi_internal_rex64/4 [length = 4] movsd (%rsp), %xmm1 # 56 *movdf_internal_rex64/10 [length = 5] unpcklpd %xmm1, %xmm1 # 23 *vec_concatv2df/1 [length = 4] That is, one more move to use unpcklpd. Based on this evidence, I think that the proposed patch should be rejected, the generic LRA fix alone results in better code. Thanks, Uros.
--- gcc/config/i386/sse.md.jj 2012-11-26 10:14:26.000000000 +0100 +++ gcc/config/i386/sse.md 2013-01-08 10:28:42.496819712 +0100 @@ -5183,10 +5183,10 @@ (define_insn "vec_dupv2df" (set_attr "mode" "V2DF")]) (define_insn "*vec_concatv2df" - [(set (match_operand:V2DF 0 "register_operand" "=x,x,x,x,x,x,x,x") + [(set (match_operand:V2DF 0 "register_operand" "=x,x,x, x,x,x,x,x") (vec_concat:V2DF - (match_operand:DF 1 "nonimmediate_operand" " 0,x,m,0,x,m,0,0") - (match_operand:DF 2 "vector_move_operand" " x,x,1,m,m,C,x,m")))] + (match_operand:DF 1 "nonimmediate_operand" " 0,x,xm,0,x,m,0,0") + (match_operand:DF 2 "vector_move_operand" " x,x,1, m,m,C,x,m")))] "TARGET_SSE" "@ unpcklpd\t{%2, %0|%0, %2} --- gcc/testsuite/gcc.target/i386/pr55829.c.jj 2013-01-08 10:34:41.490778873 +0100 +++ gcc/testsuite/gcc.target/i386/pr55829.c 2013-01-08 10:35:24.591532010 +0100 @@ -0,0 +1,23 @@ +/* PR rtl-optimization/55829 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse3 -fno-expensive-optimizations" } */ + +typedef double V __attribute__ ((__vector_size__ (16))); +extern double v[], w[]; +int foo (void); + +int +bar (void) +{ + int i, f = 0; + V t1 = (V) { *v, 0 }; + V t2 = __builtin_ia32_shufpd (t1, t1, 0); + double p10 = v[0]; + for (i = 0; i < 80; i++) + { + w[0] = p10; + __builtin_ia32_storeupd (v, t2); + f += foo (); + } + return f; +}