Patchwork Allow x <- x, 1 in *vec_concatv2df (PR rtl-optimization/55829)

login
register
mail settings
Submitter Jakub Jelinek
Date Jan. 8, 2013, 8 p.m.
Message ID <20130108200057.GM7269@tucnak.redhat.com>
Download mbox | patch
Permalink /patch/210491/
State New
Headers show

Comments

Jakub Jelinek - Jan. 8, 2013, 8 p.m.
Hi!

No matter whether LRA (if it is a bug in there) is fixed or not,
*vec_concatv2df could handle for !avx sse3 x <- x, 1 alternative the same
as it handles x <- m, 1 alternative (using movddup).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-01-08  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/55829
	* config/i386/sse.md (*vec_concatv2df): Add x <- x, 1 alternative
	for sse3 but not avx.

	* gcc.target/i386/pr55829.c: New test.


	Jakub
Uros Bizjak - Jan. 9, 2013, 9:23 a.m.
On Tue, Jan 8, 2013 at 9:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:

> No matter whether LRA (if it is a bug in there) is fixed or not,
> *vec_concatv2df could handle for !avx sse3 x <- x, 1 alternative the same
> as it handles x <- m, 1 alternative (using movddup).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2013-01-08  Jakub Jelinek  <jakub@redhat.com>
>
>         PR rtl-optimization/55829
>         * config/i386/sse.md (*vec_concatv2df): Add x <- x, 1 alternative
>         for sse3 but not avx.
>
>         * gcc.target/i386/pr55829.c: New test.
>
> --- gcc/config/i386/sse.md.jj   2012-11-26 10:14:26.000000000 +0100
> +++ gcc/config/i386/sse.md      2013-01-08 10:28:42.496819712 +0100
> @@ -5183,10 +5183,10 @@ (define_insn "vec_dupv2df"
>     (set_attr "mode" "V2DF")])
>
>  (define_insn "*vec_concatv2df"
> -  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,x,x,x,x,x,x")
> +  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,x, x,x,x,x,x")
>         (vec_concat:V2DF
> -         (match_operand:DF 1 "nonimmediate_operand" " 0,x,m,0,x,m,0,0")
> -         (match_operand:DF 2 "vector_move_operand"  " x,x,1,m,m,C,x,m")))]
> +         (match_operand:DF 1 "nonimmediate_operand" " 0,x,xm,0,x,m,0,0")
> +         (match_operand:DF 2 "vector_move_operand"  " x,x,1, m,m,C,x,m")))]

This was done on purpose, since reload had some problems with similar
pattern (please see PR 50875 [1] and [2]). If we are sure that LRA
fixes this problem, then the patch is OK for mainline.

Also, please revert "hack" that fixed PR 50875 in this case.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50875
[2] http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02477.html

Uros.
Uros Bizjak - Jan. 9, 2013, 5:40 p.m.
On Wed, Jan 9, 2013 at 10:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote:

>> No matter whether LRA (if it is a bug in there) is fixed or not,
>> *vec_concatv2df could handle for !avx sse3 x <- x, 1 alternative the same
>> as it handles x <- m, 1 alternative (using movddup).
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>
>> 2013-01-08  Jakub Jelinek  <jakub@redhat.com>
>>
>>         PR rtl-optimization/55829
>>         * config/i386/sse.md (*vec_concatv2df): Add x <- x, 1 alternative
>>         for sse3 but not avx.
>>
>>         * gcc.target/i386/pr55829.c: New test.
>>
>> --- gcc/config/i386/sse.md.jj   2012-11-26 10:14:26.000000000 +0100
>> +++ gcc/config/i386/sse.md      2013-01-08 10:28:42.496819712 +0100
>> @@ -5183,10 +5183,10 @@ (define_insn "vec_dupv2df"
>>     (set_attr "mode" "V2DF")])
>>
>>  (define_insn "*vec_concatv2df"
>> -  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,x,x,x,x,x,x")
>> +  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,x, x,x,x,x,x")
>>         (vec_concat:V2DF
>> -         (match_operand:DF 1 "nonimmediate_operand" " 0,x,m,0,x,m,0,0")
>> -         (match_operand:DF 2 "vector_move_operand"  " x,x,1,m,m,C,x,m")))]
>> +         (match_operand:DF 1 "nonimmediate_operand" " 0,x,xm,0,x,m,0,0")
>> +         (match_operand:DF 2 "vector_move_operand"  " x,x,1, m,m,C,x,m")))]
>
> This was done on purpose, since reload had some problems with similar
> pattern (please see PR 50875 [1] and [2]). If we are sure that LRA
> fixes this problem, then the patch is OK for mainline.
>
> Also, please revert "hack" that fixed PR 50875 in this case.

Looking into this problem a bit more: After Vladimir's LRA patch went
in, we generate for gcc.target/i386/pr55829.c:

        movq    p1(%rip), %r12  # 56    *movdi_internal_rex64/2 [length = 7]
        movq    %r12, (%rsp)    # 57    *movdi_internal_rex64/4 [length = 4]
        movddup (%rsp), %xmm1   # 23    *vec_concatv2df/3       [length = 5]

Combined with your proposed patch:

        movq    p1(%rip), %r12  # 60    *movdi_internal_rex64/2 [length = 7]
        movq    %r12, (%rsp)    # 61    *movdi_internal_rex64/4 [length = 4]
        movsd   (%rsp), %xmm1   # 56    *movdf_internal_rex64/10
 [length = 5]
        unpcklpd        %xmm1, %xmm1    # 23    *vec_concatv2df/1
 [length = 4]

That is, one more move to use unpcklpd.

Based on this evidence, I think that the proposed patch should be
rejected, the generic LRA fix alone results in better code.

Thanks,
Uros.

Patch

--- gcc/config/i386/sse.md.jj	2012-11-26 10:14:26.000000000 +0100
+++ gcc/config/i386/sse.md	2013-01-08 10:28:42.496819712 +0100
@@ -5183,10 +5183,10 @@  (define_insn "vec_dupv2df"
    (set_attr "mode" "V2DF")])
 
 (define_insn "*vec_concatv2df"
-  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,x,x,x,x,x,x")
+  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,x, x,x,x,x,x")
 	(vec_concat:V2DF
-	  (match_operand:DF 1 "nonimmediate_operand" " 0,x,m,0,x,m,0,0")
-	  (match_operand:DF 2 "vector_move_operand"  " x,x,1,m,m,C,x,m")))]
+	  (match_operand:DF 1 "nonimmediate_operand" " 0,x,xm,0,x,m,0,0")
+	  (match_operand:DF 2 "vector_move_operand"  " x,x,1, m,m,C,x,m")))]
   "TARGET_SSE"
   "@
    unpcklpd\t{%2, %0|%0, %2}
--- gcc/testsuite/gcc.target/i386/pr55829.c.jj	2013-01-08 10:34:41.490778873 +0100
+++ gcc/testsuite/gcc.target/i386/pr55829.c	2013-01-08 10:35:24.591532010 +0100
@@ -0,0 +1,23 @@ 
+/* PR rtl-optimization/55829 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse3 -fno-expensive-optimizations" } */
+
+typedef double V __attribute__ ((__vector_size__ (16)));
+extern double v[], w[];
+int foo (void);
+
+int
+bar (void)
+{
+  int i, f = 0;
+  V t1 = (V) { *v, 0 };
+  V t2 = __builtin_ia32_shufpd (t1, t1, 0);
+  double p10 = v[0];
+  for (i = 0; i < 80; i++)
+    {
+      w[0] = p10;
+      __builtin_ia32_storeupd (v, t2);
+      f += foo ();
+    }
+  return f;
+}