diff mbox

[RFC] Slightly fix up vgather* patterns

Message ID 20111008154322.GQ19412@tyan-ft48-01.lab.bos.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Oct. 8, 2011, 3:43 p.m. UTC
Hi!

The AVX2 docs say that the insns will #UD if any of the mask, src and index
registers are the same, but e.g. on
#include <x86intrin.h>

__m256 m;
float f[1024];

__m256
foo (void)
{
  __m256i mi = (__m256i) m;
  return _mm256_mask_i32gather_ps (m, f, mi, m, 4);
}

which is IMHO valid and should for m being zero vector just return a
zero vector and clear mask (in this case it was already cleared) we compile
it as
        vmovdqa m(%rip), %ymm1
        vmovaps %ymm1, %ymm0
        vgatherdps      %ymm1, (%rax, %ymm1, 4), %ymm0
and thus IMHO it will #UD.  Also, the insns should make it clear that
the mask register is modified too (the patch clobbers it, perhaps
we could instead say that it zeros the register (which is true if
it doesn't segfault), but then what if a segfault handler chooses to
continue with the next insn and doesn't clear the mask register?).
Still, the insn description is imprecise, saying that it loads from mem
at the address register is wrong and perhaps some DCE might delete
what shouldn't be deleted.  So, either it should (use (mem (scratch)))
or something similar, or in the unspec list all the memory locations
that are being read
(mem:<scalarssemode> (plus:SI (reg:SI) (vec_select:SI (match_operand:V4SI)
(parallel [(const_int N)]))))
for N 0 through something (but it is complicated by Pmode size vs.
the need to do nothing/truncate/sign_extend the vec_select to the right
mode).

What do you think?

2011-10-08  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/sse.md (avx2_gathersi<mode>, avx2_gatherdi<mode>,
	avx2_gatherdi<mode>256): Add clobber of operand 4.
	(*avx2_gathersi<mode>, *avx2_gatherdi<mode>,
	*avx2_gatherdi<mode>256): Add clobber of the mask register,
	add earlyclobber to both output operands.


	Jakub

Comments

Uros Bizjak Oct. 9, 2011, 10:55 a.m. UTC | #1
On Sat, Oct 8, 2011 at 5:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:

> The AVX2 docs say that the insns will #UD if any of the mask, src and index
> registers are the same, but e.g. on
> #include <x86intrin.h>
>
> __m256 m;
> float f[1024];
>
> __m256
> foo (void)
> {
>  __m256i mi = (__m256i) m;
>  return _mm256_mask_i32gather_ps (m, f, mi, m, 4);
> }
>
> which is IMHO valid and should for m being zero vector just return a
> zero vector and clear mask (in this case it was already cleared) we compile
> it as
>        vmovdqa m(%rip), %ymm1
>        vmovaps %ymm1, %ymm0
>        vgatherdps      %ymm1, (%rax, %ymm1, 4), %ymm0
> and thus IMHO it will #UD.  Also, the insns should make it clear that
> the mask register is modified too (the patch clobbers it, perhaps
> we could instead say that it zeros the register (which is true if
> it doesn't segfault), but then what if a segfault handler chooses to
> continue with the next insn and doesn't clear the mask register?).
> Still, the insn description is imprecise, saying that it loads from mem
> at the address register is wrong and perhaps some DCE might delete
> what shouldn't be deleted.  So, either it should (use (mem (scratch)))
> or something similar, or in the unspec list all the memory locations
> that are being read
> (mem:<scalarssemode> (plus:SI (reg:SI) (vec_select:SI (match_operand:V4SI)
> (parallel [(const_int N)]))))
> for N 0 through something (but it is complicated by Pmode size vs.
> the need to do nothing/truncate/sign_extend the vec_select to the right
> mode).
>
> What do you think?

Regarding the clear of mask operand: I agree that this should be
modelled as a clobber. Zeroing can't be guaranteed due to the fact you
described above.

About memory - can't we use (mem:BLK (match_operand:P
"register_operand" "r")) here?

BTW: No need to use %c modifier:

/* Meaning of CODE:
   L,W,B,Q,S,T -- print the opcode suffix for specified size of operand.
   C -- print opcode suffix for set/cmov insn.
   c -- like C, but print reversed condition
   ...
*/

Uros.
Jakub Jelinek Oct. 10, 2011, 7:09 a.m. UTC | #2
On Sun, Oct 09, 2011 at 12:55:40PM +0200, Uros Bizjak wrote:
> About memory - can't we use (mem:BLK (match_operand:P
> "register_operand" "r")) here?

I don't think it is sufficient.
Consider e.g. _mm_i32gather_pd (NULL, index, 1); where index
is initialized from loading consecutive (32-bit) double * pointers from an
array.  Then it loads for elt 0 through 1 *(double *)(0 + index[elt]).
Describing this as mem:BLK (register initialized to 0) is wrong.
But even with non-zero base, say if base is a pointer pointing into
a middle of some array and some offsets are positive and some negative
using mem:BLK of the base would just mean non-negative offsets from it.

OT, seems avx2intrin.h is weird for many of the gather patterns:
E.g. the _mm_i32gather_pd inline uses:
  __v2df src = _mm_setzero_pd ();
  __v2df mask = _mm_cmpeq_pd (src, src);
which will work and set mask to all ones floating point vector, but
e.g. _mm256_i32gather_pd uses
  __v4df src = _mm256_setzero_pd ();
  __v4df mask = _mm256_set1_pd((double)(long long int) -1);
which I believe will create a { -1.0, -1.0, -1.0, -1.0 }; vector.
Either it could be
  __v4df src = _mm256_setzero_pd ();
  __v4df mask = _mm256_cmp_pd (src, src, _CMP_EQ_OQ);
or it would need to be something like
#define __MM_ALL_ONES_DOUBLE \
  (__extension__ ((union { long long int __l; double __d; }) { __l: -1 }).__d)
  __v4df src = _mm256_setzero_pd ();
  __v4df mask = _mm256_set1_pd (__MM_ALL_ONES_DOUBLE);

Though, only the most significant bit of the mask is used by the instruction
and thus perhaps -1.0 is useful too.  Though, it is certainly more
expensive than the _mm256_cmp_pd alternative (needs to be loaded from
memory).  BTW, the expander probably needs some help to emit code for
the second case for the third case, it loads it from memory too.

> BTW: No need to use %c modifier:
> 
> /* Meaning of CODE:
>    L,W,B,Q,S,T -- print the opcode suffix for specified size of operand.
>    C -- print opcode suffix for set/cmov insn.
>    c -- like C, but print reversed condition
>    ...
> */

Ok.

	Jakub
Richard Henderson Oct. 10, 2011, 8:47 p.m. UTC | #3
On 10/08/2011 08:43 AM, Jakub Jelinek wrote:
>  (define_expand "avx2_gathersi<mode>"
> -  [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "")
> -	(unspec:VEC_GATHER_MODE
> -	  [(match_operand:VEC_GATHER_MODE 1 "register_operand" "")
> -	   (match_operand:<ssescalarmode> 2 "memory_operand" "")
> -	   (match_operand:<VEC_GATHER_MODE> 3 "register_operand" "")
> -	   (match_operand:VEC_GATHER_MODE 4 "register_operand" "")
> -	   (match_operand:SI 5 "const1248_operand " "")]
> -	  UNSPEC_GATHER))]
> +  [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "")
> +		   (unspec:VEC_GATHER_MODE
> +		     [(match_operand:VEC_GATHER_MODE 1 "register_operand" "")
> +		      (match_operand:<ssescalarmode> 2 "memory_operand" "")
> +		      (match_operand:<VEC_GATHER_MODE> 3 "register_operand" "")
> +		      (match_operand:VEC_GATHER_MODE 4 "register_operand" "")
> +		      (match_operand:SI 5 "const1248_operand " "")]
> +		     UNSPEC_GATHER))
> +	      (clobber (match_dup 4))])]
>    "TARGET_AVX2")

The use of match_dup in the clobber is wrong.  We should not be
clobbering the user-visible copy of the operand.  That does not
make sense when dealing with the user-visible builtin.


>  
>  (define_insn "*avx2_gathersi<mode>"
> -  [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=x")
> +  [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x")
>  	(unspec:VEC_GATHER_MODE
> -	  [(match_operand:VEC_GATHER_MODE 1 "register_operand" "0")
> +	  [(match_operand:VEC_GATHER_MODE 2 "register_operand" "0")
>  	   (mem:<ssescalarmode>
> -	     (match_operand:P 2 "register_operand" "r"))
> -	   (match_operand:<VEC_GATHER_MODE> 3 "register_operand" "x")
> -	   (match_operand:VEC_GATHER_MODE 4 "register_operand" "x")
> -	   (match_operand:SI 5 "const1248_operand" "n")]
> -	  UNSPEC_GATHER))]
> +	     (match_operand:P 3 "register_operand" "r"))
> +	   (match_operand:<VEC_GATHER_MODE> 4 "register_operand" "x")
> +	   (match_operand:VEC_GATHER_MODE 5 "register_operand" "1")
> +	   (match_operand:SI 6 "const1248_operand" "n")]
> +	  UNSPEC_GATHER))
> +   (clobber (match_operand:VEC_GATHER_MODE 1 "register_operand" "=&x"))]
>    "TARGET_AVX2"
> -  "v<gthrfirstp>gatherd<gthrlastp>\t{%4, (%2, %3, %c5), %0|%0, (%2, %3, %c5), %4}"
> +  "v<gthrfirstp>gatherd<gthrlastp>\t{%1, (%3, %4, %c6), %0|%0, (%3, %4, %c6), %1}"
>    [(set_attr "type" "ssemov")
>     (set_attr "prefix" "vex")
>     (set_attr "mode" "<sseinsnmode>")])

Instead, use (clobber (match_scratch)) and matching constraints with operand 4.


> Still, the insn description is imprecise, saying that it loads from mem
> at the address register is wrong and perhaps some DCE might delete
> what shouldn't be deleted.  So, either it should (use (mem (scratch)))
> or something similar, or in the unspec list all the memory locations
> that are being read
> (mem:<scalarssemode> (plus:SI (reg:SI) (vec_select:SI (match_operand:V4SI)
> (parallel [(const_int N)]))))
> for N 0 through something (but it is complicated by Pmode size vs.
> the need to do nothing/truncate/sign_extend the vec_select to the right
> mode).

I think that a (mem (scratch)) as input to the unspec is probably best.
The exact memory usage is almost certainly too complex to describe
in a useful way.


r~
diff mbox

Patch

--- gcc/config/i386/sse.md.jj	2011-10-07 10:03:27.000000000 +0200
+++ gcc/config/i386/sse.md	2011-10-08 17:14:50.000000000 +0200
@@ -12521,55 +12521,59 @@  (define_mode_attr VEC_GATHER_MODE
 		       (V8SI "V8SI") (V8SF "V8SI")])
 
 (define_expand "avx2_gathersi<mode>"
-  [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "")
-	(unspec:VEC_GATHER_MODE
-	  [(match_operand:VEC_GATHER_MODE 1 "register_operand" "")
-	   (match_operand:<ssescalarmode> 2 "memory_operand" "")
-	   (match_operand:<VEC_GATHER_MODE> 3 "register_operand" "")
-	   (match_operand:VEC_GATHER_MODE 4 "register_operand" "")
-	   (match_operand:SI 5 "const1248_operand " "")]
-	  UNSPEC_GATHER))]
+  [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "")
+		   (unspec:VEC_GATHER_MODE
+		     [(match_operand:VEC_GATHER_MODE 1 "register_operand" "")
+		      (match_operand:<ssescalarmode> 2 "memory_operand" "")
+		      (match_operand:<VEC_GATHER_MODE> 3 "register_operand" "")
+		      (match_operand:VEC_GATHER_MODE 4 "register_operand" "")
+		      (match_operand:SI 5 "const1248_operand " "")]
+		     UNSPEC_GATHER))
+	      (clobber (match_dup 4))])]
   "TARGET_AVX2")
 
 (define_insn "*avx2_gathersi<mode>"
-  [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=x")
+  [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x")
 	(unspec:VEC_GATHER_MODE
-	  [(match_operand:VEC_GATHER_MODE 1 "register_operand" "0")
+	  [(match_operand:VEC_GATHER_MODE 2 "register_operand" "0")
 	   (mem:<ssescalarmode>
-	     (match_operand:P 2 "register_operand" "r"))
-	   (match_operand:<VEC_GATHER_MODE> 3 "register_operand" "x")
-	   (match_operand:VEC_GATHER_MODE 4 "register_operand" "x")
-	   (match_operand:SI 5 "const1248_operand" "n")]
-	  UNSPEC_GATHER))]
+	     (match_operand:P 3 "register_operand" "r"))
+	   (match_operand:<VEC_GATHER_MODE> 4 "register_operand" "x")
+	   (match_operand:VEC_GATHER_MODE 5 "register_operand" "1")
+	   (match_operand:SI 6 "const1248_operand" "n")]
+	  UNSPEC_GATHER))
+   (clobber (match_operand:VEC_GATHER_MODE 1 "register_operand" "=&x"))]
   "TARGET_AVX2"
-  "v<gthrfirstp>gatherd<gthrlastp>\t{%4, (%2, %3, %c5), %0|%0, (%2, %3, %c5), %4}"
+  "v<gthrfirstp>gatherd<gthrlastp>\t{%1, (%3, %4, %c6), %0|%0, (%3, %4, %c6), %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "avx2_gatherdi<mode>"
-  [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "")
-	(unspec:VEC_GATHER_MODE
-	  [(match_operand:VEC_GATHER_MODE 1 "register_operand" "")
-	   (match_operand:<ssescalarmode> 2 "memory_operand" "")
-	   (match_operand:<AVXMODE48P_DI> 3 "register_operand" "")
-	   (match_operand:VEC_GATHER_MODE 4 "register_operand" "")
-	   (match_operand:SI 5 "const1248_operand " "")]
-	  UNSPEC_GATHER))]
+  [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "")
+		   (unspec:VEC_GATHER_MODE
+		     [(match_operand:VEC_GATHER_MODE 1 "register_operand" "")
+		      (match_operand:<ssescalarmode> 2 "memory_operand" "")
+		      (match_operand:<AVXMODE48P_DI> 3 "register_operand" "")
+		      (match_operand:VEC_GATHER_MODE 4 "register_operand" "")
+		      (match_operand:SI 5 "const1248_operand " "")]
+		     UNSPEC_GATHER))
+	      (clobber (match_dup 4))])]
   "TARGET_AVX2")
 
 (define_insn "*avx2_gatherdi<mode>"
-  [(set (match_operand:AVXMODE48P_DI 0 "register_operand" "=x")
+  [(set (match_operand:AVXMODE48P_DI 0 "register_operand" "=&x")
 	(unspec:AVXMODE48P_DI
-	  [(match_operand:AVXMODE48P_DI 1 "register_operand" "0")
+	  [(match_operand:AVXMODE48P_DI 2 "register_operand" "0")
 	   (mem:<ssescalarmode>
-	     (match_operand:P 2 "register_operand" "r"))
-	   (match_operand:<AVXMODE48P_DI> 3 "register_operand" "x")
-	   (match_operand:AVXMODE48P_DI 4 "register_operand" "x")
-	   (match_operand:SI 5 "const1248_operand" "n")]
-	  UNSPEC_GATHER))]
+	     (match_operand:P 3 "register_operand" "r"))
+	   (match_operand:<AVXMODE48P_DI> 4 "register_operand" "x")
+	   (match_operand:AVXMODE48P_DI 5 "register_operand" "1")
+	   (match_operand:SI 6 "const1248_operand" "n")]
+	  UNSPEC_GATHER))
+   (clobber (match_operand:AVXMODE48P_DI 1 "register_operand" "=&x"))]
   "TARGET_AVX2"
-  "v<gthrfirstp>gatherq<gthrlastp>\t{%4, (%2, %3, %c5), %0|%0, (%2, %3, %c5), %4}"
+  "v<gthrfirstp>gatherq<gthrlastp>\t{%1, (%3, %4, %c6), %0|%0, (%3, %4, %c6), %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -12577,28 +12581,30 @@  (define_insn "*avx2_gatherdi<mode>"
 ;; Special handling for VEX.256 with float arguments
 ;; since there're still xmms as operands
 (define_expand "avx2_gatherdi<mode>256"
-  [(set (match_operand:VI4F_128 0 "register_operand" "")
-	(unspec:VI4F_128
-	  [(match_operand:VI4F_128 1 "register_operand" "")
-	   (match_operand:<ssescalarmode> 2 "memory_operand" "")
-	   (match_operand:V4DI 3 "register_operand" "")
-	   (match_operand:VI4F_128 4 "register_operand" "")
-	   (match_operand:SI 5 "const1248_operand " "")]
-	  UNSPEC_GATHER))]
+  [(parallel [(set (match_operand:VI4F_128 0 "register_operand" "")
+		   (unspec:VI4F_128
+		     [(match_operand:VI4F_128 1 "register_operand" "")
+		      (match_operand:<ssescalarmode> 2 "memory_operand" "")
+		      (match_operand:V4DI 3 "register_operand" "")
+		      (match_operand:VI4F_128 4 "register_operand" "")
+		      (match_operand:SI 5 "const1248_operand " "")]
+		     UNSPEC_GATHER))
+	      (clobber (match_dup 4))])]
   "TARGET_AVX2")
 
 (define_insn "*avx2_gatherdi<mode>256"
   [(set (match_operand:VI4F_128 0 "register_operand" "=x")
 	(unspec:VI4F_128
-	  [(match_operand:VI4F_128 1 "register_operand" "0")
+	  [(match_operand:VI4F_128 2 "register_operand" "0")
 	   (mem:<ssescalarmode>
-	     (match_operand:P 2 "register_operand" "r"))
-	   (match_operand:V4DI 3 "register_operand" "x")
-	   (match_operand:VI4F_128 4 "register_operand" "x")
-	   (match_operand:SI 5 "const1248_operand" "n")]
-	  UNSPEC_GATHER))]
+	     (match_operand:P 3 "register_operand" "r"))
+	   (match_operand:V4DI 4 "register_operand" "x")
+	   (match_operand:VI4F_128 5 "register_operand" "1")
+	   (match_operand:SI 6 "const1248_operand" "n")]
+	  UNSPEC_GATHER)) 
+   (clobber (match_operand:VI4F_128 1 "register_operand" "=&x"))]
   "TARGET_AVX2"
-  "v<gthrfirstp>gatherq<gthrlastp>\t{%4, (%2, %3, %c5), %0|%0, (%2, %3, %c5), %4}"
+  "v<gthrfirstp>gatherq<gthrlastp>\t{%1, (%3, %4, %c6), %0|%0, (%3, %4, %c6), %1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])