[i386] access subvectors

Message ID alpine.DEB.2.02.1205022247440.2504@laptop-mg.saclay.inria.fr
State New

Commit Message

Marc Glisse May 2, 2012, 9:46 p.m. UTC
Hello,

I definitely don't expect the attached patch to be accepted, but I would
like some advice on the direction to go, and a patch that passes the
testsuite and does the optimization I want on a couple of testcases seems
like it may help start the conversation. This is the first time I have
even looked at .md files...

The goal is to optimize: v8sf x; v4sf y=*(v4sf*)&x; so the compiler 
doesn't copy x to memory (yes, I know there is an intrinsic to do that).
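
For reference, the idiom can be written out with GCC's generic vector extensions. This is only a sketch: the function names low_half_cast and low_half_memcpy are made up for illustration, and the memcpy variant is shown as another way to spell the same access, not as something proposed in the PR.

```c
#include <string.h>

typedef float v4sf __attribute__ ((vector_size (16)));
typedef float v8sf __attribute__ ((vector_size (32)));

/* The idiom from the message: read the first 16 bytes of x as a v4sf.  */
static v4sf
low_half_cast (v8sf x)
{
  return *(v4sf *) &x;
}

/* Hypothetical alternative spelling: copy the first 16 bytes of x.  */
static v4sf
low_half_memcpy (v8sf x)
{
  v4sf y;
  memcpy (&y, &x, sizeof y);
  return y;
}
```

Both functions return elements 0 to 3 of x. Whether the compiler keeps x in a register or spills it to memory first is exactly the code-generation question raised here.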

If I understood Richard Guenther's comment in the PR, it can be optimized
in the back-end. The only place I found for this kind of transformation
is define_peephole2. I couldn't figure out how to test whether 2 memory
operands correspond to the same address when they have different types (so
match_dup is unhappy). For some reason the XEXP(*,0) comparison said
yes on my test and no when using an unrelated piece of memory, but it
looks like a nonsense test that just gets lucky on a couple of trivial
examples.

Any help?


2012-05-02  Marc Glisse  <marc.glisse@inria.fr>
 	PR target/53101

gcc/
 	* config/i386/sse.md: New peephole2 for subvectors.

gcc/testsuite/
 	* gcc.target/i386/pr53101.c: New test.

Comments

Richard Biener May 3, 2012, 8:25 a.m. UTC | #1
On Wed, May 2, 2012 at 11:46 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> Hello,
>
> I definitely don't expect the attached patch to be accepted, but I would
> like some advice on the direction to go, and a patch that passes the
> testsuite and does the optimization I want on a couple of testcases seems
> like it may help start the conversation. This is the first time I have
> even looked at .md files...
>
> The goal is to optimize: v8sf x; v4sf y=*(v4sf*)&x; so the compiler doesn't
> copy x to memory (yes, I know there is an intrinsic to do that).
>
> If I understood Richard Guenther's comment in the PR, it can be optimized in
> the back-end.

Expand simply uses a subreg to access the lower half.  I suppose we miss
some simplify_rtx/combine logic that ends up using a vec_select (can
that even be used to select the lower/upper half?).  Uros?  We want
movq %ymm0, %xmm1 (or rather simply use xmm0 in consumers) here.

Richard.
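
On the question of whether vec_select can pick either half: in the RTL documentation, the parallel gives, for each element of the result, the index of the source element to copy. A small C model of that semantics follows. The names model_vec_select, lower_half and upper_half are hypothetical and are not GCC functions.

```c
typedef float v4sf __attribute__ ((vector_size (16)));
typedef float v8sf __attribute__ ((vector_size (32)));

/* Model of (vec_select:V4SF x (parallel [s0 s1 s2 s3])):
   result element k is source element sel[k].  */
static v4sf
model_vec_select (v8sf x, const int sel[4])
{
  v4sf r = { 0 };
  for (int k = 0; k < 4; k++)
    r[k] = x[sel[k]];
  return r;
}

/* Lower half: indices 0..3.  */
static v4sf
lower_half (v8sf x)
{
  static const int sel[4] = { 0, 1, 2, 3 };
  return model_vec_select (x, sel);
}

/* Upper half: indices 4..7.  */
static v4sf
upper_half (v8sf x)
{
  static const int sel[4] = { 4, 5, 6, 7 };
  return model_vec_select (x, sel);
}
```

Under this model, the lower half is just the index list 0..3 and the upper half is 4..7. Whether combine or simplify_rtx actually produces such a form from the subreg is the open question in this thread.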

> The only place I found for this kind of transformation is
> define_peephole2. I couldn't figure out how to test whether 2 memory
> operands correspond to the same address when they have different types (so
> match_dup is unhappy). For some reason the XEXP(*,0) comparison said yes
> on my test and no when using an unrelated piece of memory, but it looks
> like a nonsense test that just gets lucky on a couple of trivial examples.
>
> Any help?
>
>
> 2012-05-02  Marc Glisse  <marc.glisse@inria.fr>
>        PR target/53101
>
> gcc/
>        * config/i386/sse.md: New peephole2 for subvectors.
>
> gcc/testsuite/
>        * gcc.target/i386/pr53101.c: New test.
>
>
> --
> Marc Glisse

Patch

Index: gcc/testsuite/gcc.target/i386/pr53101.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr53101.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pr53101.c	(revision 0)
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+
+v4si
+avx_extract_v4si (v8si x)
+{
+  return *(v4si*)&x;
+}
+
+v2df
+avx_extract_v2df (v4df x __attribute((unused)), v4df y)
+{
+  return *(v2df*)&y;
+}
+
+/* { dg-final { scan-assembler-not "movdq" } } */
+/* { dg-final { scan-assembler-times "movapd" 1 } } */

Property changes on: gcc/testsuite/gcc.target/i386/pr53101.c
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 187012)
+++ gcc/config/i386/sse.md	(working copy)
@@ -4104,10 +4104,34 @@ 
 
   emit_move_insn (operands[0], adjust_address (operands[1], SFmode, i*4));
   DONE;
 })
 
+;; This is how we receive accesses to the first half of a vector.
+(define_peephole2
+  [(set (match_operand:VI8F_256 3 "memory_operand")
+        (match_operand:VI8F_256 1 "register_operand"))
+   (set (match_operand:<ssehalfvecmode> 0 "register_operand")
+        (match_operand:<ssehalfvecmode> 2 "memory_operand"))]
+  "TARGET_AVX && rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))"
+  [(set (match_dup 0)
+        (vec_select:<ssehalfvecmode> (match_dup 1)
+                                     (parallel [(const_int 0) (const_int 1)])))]
+)
+
+(define_peephole2
+  [(set (match_operand:VI4F_256 3 "memory_operand")
+        (match_operand:VI4F_256 1 "register_operand"))
+   (set (match_operand:<ssehalfvecmode> 0 "register_operand")
+        (match_operand:<ssehalfvecmode> 2 "memory_operand"))]
+  "TARGET_AVX && rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0))"
+  [(set (match_dup 0)
+        (vec_select:<ssehalfvecmode> (match_dup 1)
+                                     (parallel [(const_int 0) (const_int 1)
+			                        (const_int 2) (const_int 3)])))]
+)
+
 (define_expand "avx_vextractf128<mode>"
   [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
    (match_operand:V_256 1 "register_operand")
    (match_operand:SI 2 "const_0_to_1_operand")]
   "TARGET_AVX"