Patchwork [RTL,i386] Use subreg instead of UNSPEC_CAST

login
register
mail settings
Submitter Marc Glisse
Date March 19, 2013, 3:47 p.m.
Message ID <alpine.DEB.2.02.1303191627200.4515@stedding.saclay.inria.fr>
Download mbox | patch
Permalink /patch/229099/
State New
Headers show

Comments

Marc Glisse - March 19, 2013, 3:47 p.m.
Hello,

the following patch passes bootstrap+testsuite on x86_64-linux-gnu. I 
don't see any particular reason to forbid vector subregs of vectors, since 
we can already do it through a scalar. And not using unspecs helps avoid 
unnecessary copies.

2013-01-03  Marc Glisse  <marc.glisse@inria.fr>

 	PR target/50829
gcc/
 	* config/i386/sse.md (enum unspec): Remove UNSPEC_CAST.
 	(avx_<castmode><avxsizesuffix>_<castmode>): Use subreg.
 	* emit-rtl.c (validate_subreg): Allow vector-vector subregs.

gcc/testsuite/
 	* gcc.target/i386/pr50829.c: New file.
Richard Henderson - March 19, 2013, 9 p.m.
On 03/19/2013 08:47 AM, Marc Glisse wrote:
>  (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
>    [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
> -	(unspec:AVX256MODE2P
> -	  [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")]
> -	  UNSPEC_CAST))]
> +	(subreg:AVX256MODE2P
> +	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))]
>    "TARGET_AVX"
>    "#"
>    "&& reload_completed"
>    [(const_int 0)]

I'm not fond of this, primarily because I believe the pattern should
not exist at all.

One of the following is true:

  (1) reload needs working around (thus all the reload_completed nonsense)
or
  (2) the entire pattern is useless and would be subsumed by mov<mode>
or
  (3) the entire pattern is useless and is *already* subsumed by
      mov<mode>, since mov is earlier in the md file, making this
      pattern dead code.



r~
Marc Glisse - March 20, 2013, 3 p.m.
On Tue, 19 Mar 2013, Richard Henderson wrote:

> On 03/19/2013 08:47 AM, Marc Glisse wrote:
>>  (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
>>    [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
>> -	(unspec:AVX256MODE2P
>> -	  [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")]
>> -	  UNSPEC_CAST))]
>> +	(subreg:AVX256MODE2P
>> +	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))]
>>    "TARGET_AVX"
>>    "#"
>>    "&& reload_completed"
>>    [(const_int 0)]
>
> I'm not fond of this, primarily because I believe the pattern should
> not exist at all.

Sure, removing it would be even better.

> One of the following is true:
>
>  (1) reload needs working around (thus all the reload_completed nonsense)
> or
>  (2) the entire pattern is useless and would be subsumed by mov<mode>
> or
>  (3) the entire pattern is useless and is *already* subsumed by
>      mov<mode>, since mov is earlier in the md file, making this
>      pattern dead code.

We need something to expand _mm256_castpd128_pd256 to. I tried making it a 
define_expand (with the subreg pattern, and keeping the {} part intact), 
but that gives check_rtl errors in lra. I then tried to remove the REG_P 
condition and use simplify_gen_subreg or gen_lowpart, but the first one 
gives unrecognizable insn at -O0 (same as removing the {} part completely) 
(it seems happier at -O1), while the second ICEs (gen_lowpart_common 
returns 0) for any -Ox except -O0. As must be obvious from this paragraph, 
I just tried a few random bad ideas... and when none worked I posted the 
minimal patch that worked.

Do you at least agree that vector-vector subregs make sense, or is that 
part wrong as well?
Richard Henderson - March 20, 2013, 3:13 p.m.
On 03/20/2013 08:00 AM, Marc Glisse wrote:
> Do you at least agree that vector-vector subregs make sense, or is that part
> wrong as well?

You mean a V4SImode subreg of a V8SImode register, not just same-size casting?
It makes logical sense, but I'm fairly sure you'll need a lot more surgery
throughout the compiler to make that happen.

I'm curious how a define_expand can fail in LRA, but your define_insn succeeds?
Is the failure because of ix86_cannot_change_mode_class?  Because that hook
fairly well defines what subregs are valid.  And if that says it isn't valid,
then even having a define_insn that uses such is wrong.


r~
Richard Guenther - March 20, 2013, 3:17 p.m.
On Wed, Mar 20, 2013 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>> Do you at least agree that vector-vector subregs make sense, or is that part
>> wrong as well?
>
> You mean a V4SImode subreg of a V8SImode register, not just same-size casting?
> It makes logical sense, but I'm fairly sure you'll need a lot more surgery
> throughout the compiler to make that happen.
>
> I'm curious how a define_expand can fail in LRA, but your define_insn succeeds?
> Is the failure because of ix86_cannot_change_mode_class?  Because that hook
> fairly well defines what subregs are valid.  And if that says it isn't valid,
> then even having a define_insn that uses such is wrong.

Don't we have vec_select to get a V4SImode out of a V8SImode?  So you
only need a define_insn that special-cases the subreg-like ones?

Richard.

>
> r~
Marc Glisse - March 20, 2013, 3:29 p.m.
On Wed, 20 Mar 2013, Richard Henderson wrote:

> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>> Do you at least agree that vector-vector subregs make sense, or is that part
>> wrong as well?
>
> You mean a V4SImode subreg of a V8SImode register, not just same-size casting?

I am mostly interested in the reverse, a paradoxical subreg, since 
vec_select can only model one direction (and only rvalues, but that's a 
different question).

> It makes logical sense, but I'm fairly sure you'll need a lot more surgery
> throughout the compiler to make that happen.
>
> I'm curious how a define_expand can fail in LRA, but your define_insn succeeds?

Total guesswork:

I think it is related to that REG_P protected code, and the 
reload_complete test. With the define_insn_and_split, we keep the insn 
until after reload and only do the subreg magic then. With a 
define_expand, we end up writing to reg 60 as a V2DF and reading it as a 
V4DF, and since it isn't a hard register, that causes a problem.

> Is the failure because of ix86_cannot_change_mode_class?  Because that hook
> fairly well defines what subregs are valid.  And if that says it isn't valid,
> then even having a define_insn that uses such is wrong.

A quick look at ix86_cannot_change_mode_class seems to indicate that it 
does not mind such paradoxical subregs.
Richard Guenther - March 20, 2013, 3:44 p.m.
On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Wed, 20 Mar 2013, Richard Henderson wrote:
>
>> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>>>
>>> Do you at least agree that vector-vector subregs make sense, or is that
>>> part
>>> wrong as well?
>>
>>
>> You mean a V4SImode subreg of a V8SImode register, not just same-size
>> casting?
>
>
> I am mostly interested in the reverse, a paradoxical subreg, since
> vec_select can only model one direction (and only rvalues, but that's a
> different question).

vec_duplicate?

Honestly, what semantics should  _mm256_castpd128_pd256 have if
it is supposed to cast a v2df to a v4df?  Or what use?

Richard.
Marc Glisse - March 20, 2013, 3:54 p.m.
On Wed, 20 Mar 2013, Richard Biener wrote:

> On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> On Wed, 20 Mar 2013, Richard Henderson wrote:
>>
>>> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>>>>
>>>> Do you at least agree that vector-vector subregs make sense, or is that
>>>> part
>>>> wrong as well?
>>>
>>>
>>> You mean a V4SImode subreg of a V8SImode register, not just same-size
>>> casting?
>>
>>
>> I am mostly interested in the reverse, a paradoxical subreg, since
>> vec_select can only model one direction (and only rvalues, but that's a
>> different question).
>
> vec_duplicate?

There is already some of that in various places, and there may be even 
more vec_merge+vec_duplicate patterns soon, but you want to make sure you 
don't actually do the duplication.

> Honestly, what semantics should  _mm256_castpd128_pd256 have if
> it is supposed to cast a v2df to a v4df?

NOP. We don't care what is in the high part of the vector.

> Or what use?

Many vector operations are defined as taking 2 vectors and merging them 
somehow. I didn't check if this case works, but for instance if you want 
to copy a V2DF to the bottom part of a V4DF using Intel's intrinsics, you 
will probably have to cast the V2DF to a V4DF and then use an intrinsic 
that takes 2 V4DF. (there are many issues with those intrinsics, but we 
don't control them)
Richard Guenther - March 21, 2013, 9:24 a.m.
On Wed, Mar 20, 2013 at 4:54 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Wed, 20 Mar 2013, Richard Biener wrote:
>
>> On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> On Wed, 20 Mar 2013, Richard Henderson wrote:
>>>
>>>> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>>>>>
>>>>>
>>>>> Do you at least agree that vector-vector subregs make sense, or is that
>>>>> part
>>>>> wrong as well?
>>>>
>>>>
>>>>
>>>> You mean a V4SImode subreg of a V8SImode register, not just same-size
>>>> casting?
>>>
>>>
>>>
>>> I am mostly interested in the reverse, a paradoxical subreg, since
>>> vec_select can only model one direction (and only rvalues, but that's a
>>> different question).
>>
>>
>> vec_duplicate?
>
>
> There is already some of that in various places, and there may be even more
> vec_merge+vec_duplicate patterns soon, but you want to make sure you don't
> actually do the duplication.
>
>
>> Honestly, what semantics should  _mm256_castpd128_pd256 have if
>> it is supposed to cast a v2df to a v4df?
>
>
> NOP. We don't care what is in the high part of the vector.
>
>> Or what use?
>
>
> Many vector operations are defined as taking 2 vectors and merging them
> somehow. I didn't check if this case works, but for instance if you want to
> copy a V2DF to the bottom part of a V4DF using Intel's intrinsics, you will
> probably have to cast the V2DF to a V4DF and then use an intrinsic that
> takes 2 V4DF. (there are many issues with those intrinsics, but we don't
> control them)

Hmm, I see.  I still think that we should expose most of the intrinsics
and builtins implementation details earlier, at the GIMPLE level.  This one
would be an awkward one there, too.  You'd need sth like

  v4df_3 = CONSTRUCTOR { v2df_2, v2df_1(D) };

thus, make that "uninitialized" explicit by using a default def.  I
think we don't
support generating the above from C/C++ source with GNU extensions as
vector type casts are quite restricted at the moment, so there you'd have
to write sth like

  double uninit;
  v4df res = { v2dfv[0], v2dfv[1], uninit, uninit };

which would get you

  D.1723 = BIT_FIELD_REF <x, 64, 0>;
  D.1724 = BIT_FIELD_REF <x, 64, 64>;
  D.1725 = {D.1723, D.1724, uninit, uninit};

at the moment.  And of course awkward code in the end ;)

Which leaves the other option of folding the __builtin_ia32_ps256_ps
in the target (and most other builtins).

Just side-tracking from the RTL issue of course ...

Richard.

> --
> Marc Glisse

Patch

Index: gcc/testsuite/gcc.target/i386/pr50829.c

===================================================================
--- gcc/testsuite/gcc.target/i386/pr50829.c	(revision 0)

+++ gcc/testsuite/gcc.target/i386/pr50829.c	(revision 0)

@@ -0,0 +1,13 @@ 

+/* { dg-do compile } */

+/* { dg-options "-O1 -mavx" } */

+

+#include <x86intrin.h>

+

+__m256d

+concat (__m128d x)

+{

+  __m256d z = _mm256_castpd128_pd256 (x);

+  return _mm256_insertf128_pd (z, x, 1);

+}

+

+/* { dg-final { scan-assembler-not "vmov" } } */


Property changes on: gcc/testsuite/gcc.target/i386/pr50829.c
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: gcc/config/i386/sse.md

===================================================================
--- gcc/config/i386/sse.md	(revision 196633)

+++ gcc/config/i386/sse.md	(working copy)

@@ -66,21 +66,20 @@ 

   UNSPEC_AESKEYGENASSIST
 
   ;; For PCLMUL support
   UNSPEC_PCLMUL
 
   ;; For AVX support
   UNSPEC_PCMP
   UNSPEC_VPERMIL
   UNSPEC_VPERMIL2
   UNSPEC_VPERMIL2F128
-  UNSPEC_CAST

   UNSPEC_VTESTP
   UNSPEC_VCVTPH2PS
   UNSPEC_VCVTPS2PH
 
   ;; For AVX2 support
   UNSPEC_VPERMVAR
   UNSPEC_VPERMTI
   UNSPEC_GATHER
   UNSPEC_VSIBADDR
 ])
@@ -11089,23 +11088,22 @@ 

   "TARGET_AVX"
   "v<sseintprefix>maskmov<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "btver2_decode" "vector") 
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
-	(unspec:AVX256MODE2P

-	  [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")]

-	  UNSPEC_CAST))]

+	(subreg:AVX256MODE2P

+	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))]

   "TARGET_AVX"
   "#"
   "&& reload_completed"
   [(const_int 0)]
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   if (REG_P (op0))
     op0 = gen_rtx_REG (<ssehalfvecmode>mode, REGNO (op0));
   else
Index: gcc/emit-rtl.c

===================================================================
--- gcc/emit-rtl.c	(revision 196633)

+++ gcc/emit-rtl.c	(working copy)

@@ -707,20 +707,23 @@  validate_subreg (enum machine_mode omode

   else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
 	   && GET_MODE_INNER (imode) == omode)
     ;
   /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
      i.e. (subreg:V4SF (reg:SF) 0).  This surely isn't the cleanest way to
      represent this.  It's questionable if this ought to be represented at
      all -- why can't this all be hidden in post-reload splitters that make
      arbitrarily mode changes to the registers themselves.  */
   else if (VECTOR_MODE_P (omode) && GET_MODE_INNER (omode) == imode)
     ;
+  else if (VECTOR_MODE_P (omode) && VECTOR_MODE_P (imode)

+	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))

+    ;

   /* Subregs involving floating point modes are not allowed to
      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
      (subreg:SI (reg:DF) 0) isn't.  */
   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
     {
       if (! (isize == osize
 	     /* LRA can use subreg to store a floating point value in
 		an integer mode.  Although the floating point and the
 		integer modes need the same number of hard registers,
 		the size of floating point mode can be less than the