Fix non-AVX512VL handling of lo extraction from AVX512F xmm16+ (PR target/85328)

Message ID 20180411132728.GS8577@tucnak
State New
Headers show
Series
  • Fix non-AVX512VL handling of lo extraction from AVX512F xmm16+ (PR target/85328)
Related show

Commit Message

Jakub Jelinek April 11, 2018, 1:27 p.m.
Hi!

In lots of patterns we assume that we never see xmm16+ hard registers
with 128-bit and 256-bit vector modes when not -mavx512vl, because
HARD_REGNO_MODE_OK refuses those.
Unfortunately, as this testcase and patch shows, the vec_extract_lo*
splitters work as a loophole around this, we happily create instructions
like (set (reg:V32QI xmm5) (reg:V32QI xmm16)) and then hard register
propagation can propagate the V32QI xmm16 into other insns like vpand.

The following patch fixes it by making sure we never create such registers,
just emit (set (reg:V64QI xmm5) (reg:V64QI xmm16)) instead, which by copying
all the 512 bits also copies the low bits, and as the destination is
originally V32QI which is not HARD_REGNO_MODE_OK in xmm16+, this should be
fine.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-04-11  Jakub Jelinek  <jakub@redhat.com>

	PR target/85328
	* config/i386/sse.md
	(<mask_codefor>avx512dq_vextract<shuffletype>64x2_1<mask_name> split,
	<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name> split,
	vec_extract_lo_<mode><mask_name> split, vec_extract_lo_v32hi,
	vec_extract_lo_v64qi): For non-AVX512VL if input is xmm16+ reg
	and output is a reg, avoid creating invalid lowpart subreg, but
	instead split into a 512-bit move.

	* gcc.target/i386/pr85328.c: New test.


	Jakub

Comments

Kirill Yukhin April 12, 2018, 10:46 a.m. | #1
Hello Jakub!

> On 11 Apr 2018, at 16:27, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> Hi!
> 
> In lots of patterns we assume that we never see xmm16+ hard registers
> with 128-bit and 256-bit vector modes when not -mavx512vl, because
> HARD_REGNO_MODE_OK refuses those.
> Unfortunately, as this testcase and patch shows, the vec_extract_lo*
> splitters work as a loophole around this, we happily create instructions
> like (set (reg:V32QI xmm5) (reg:V32QI xmm16)) and then hard register
> propagation can propagate the V32QI xmm16 into other insns like vpand.
> 
> The following patch fixes it by making sure we never create such registers,
> just emit (set (reg:V64QI xmm5) (reg:V64QI xmm16)) instead, which by copying
> all the 512 bits also copies the low bits, and as the destination is
> originally V32QI which is not HARD_REGNO_MODE_OK in xmm16+, this should be
> fine.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Patch is OK for trunk.

—
Thanks, K
Jakub Jelinek April 12, 2018, 10:53 a.m. | #2
On Thu, Apr 12, 2018 at 01:46:40PM +0300, Kirill Yukhin wrote:
> 
> Hello Jakub!
> 
> > On 11 Apr 2018, at 16:27, Jakub Jelinek <jakub@redhat.com> wrote:
> > In lots of patterns we assume that we never see xmm16+ hard registers
> > with 128-bit and 256-bit vector modes when not -mavx512vl, because
> > HARD_REGNO_MODE_OK refuses those.
> > Unfortunately, as this testcase and patch shows, the vec_extract_lo*
> > splitters work as a loophole around this, we happily create instructions
> > like (set (reg:V32QI xmm5) (reg:V32QI xmm16)) and then hard register
> > propagation can propagate the V32QI xmm16 into other insns like vpand.
> > 
> > The following patch fixes it by making sure we never create such registers,
> > just emit (set (reg:V64QI xmm5) (reg:V64QI xmm16)) instead, which by copying
> > all the 512 bits also copies the low bits, and as the destination is
> > originally V32QI which is not HARD_REGNO_MODE_OK in xmm16+, this should be
> > fine.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Patch is OK for trunk.

I've posted an updated version of this patch later on in
https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00563.html
Is that one ok for trunk instead?

And sorry for not getting it right the first time.

	Jakub
Kirill Yukhin April 12, 2018, 11:11 a.m. | #3
> On 12 Apr 2018, at 13:53, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Thu, Apr 12, 2018 at 01:46:40PM +0300, Kirill Yukhin wrote:
>> 
>> Hello Jakub!
>> 
>>> On 11 Apr 2018, at 16:27, Jakub Jelinek <jakub@redhat.com> wrote:
>>> In lots of patterns we assume that we never see xmm16+ hard registers
>>> with 128-bit and 256-bit vector modes when not -mavx512vl, because
>>> HARD_REGNO_MODE_OK refuses those.
>>> Unfortunately, as this testcase and patch shows, the vec_extract_lo*
>>> splitters work as a loophole around this, we happily create instructions
>>> like (set (reg:V32QI xmm5) (reg:V32QI xmm16)) and then hard register
>>> propagation can propagate the V32QI xmm16 into other insns like vpand.
>>> 
>>> The following patch fixes it by making sure we never create such registers,
>>> just emit (set (reg:V64QI xmm5) (reg:V64QI xmm16)) instead, which by copying
>>> all the 512 bits also copies the low bits, and as the destination is
>>> originally V32QI which is not HARD_REGNO_MODE_OK in xmm16+, this should be
>>> fine.
>>> 
>>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> Patch is OK for trunk.
> 
> I've posted an updated version of this patch later on in
> https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00563.html
> Is that one ok for trunk instead?
Yes.

—
Thanks, K
> 
> And sorry for not getting it right the first time.
> 
> 	Jakub

Patch

--- gcc/config/i386/sse.md.jj	2018-04-10 14:37:02.092801344 +0200
+++ gcc/config/i386/sse.md	2018-04-11 12:00:44.296840287 +0200
@@ -7362,7 +7362,15 @@  (define_split
 	  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_AVX512DQ && reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (<ssequartermode>mode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+      && REG_P (operands[0])
+      && EXT_REX_SSE_REG_P (operands[1]))
+    operands[0]
+      = lowpart_subreg (<MODE>mode, operands[0], <ssequartermode>mode);
+  else
+    operands[1] = gen_lowpart (<ssequartermode>mode, operands[1]);
+})
 
 (define_insn "<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name>"
   [(set (match_operand:<ssequartermode> 0 "<store_mask_predicate>" "=<store_mask_constraint>")
@@ -7395,7 +7403,15 @@  (define_split
 		     (const_int 2) (const_int 3)])))]
   "TARGET_AVX512F && reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (<ssequartermode>mode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+      && REG_P (operands[0])
+      && EXT_REX_SSE_REG_P (operands[1]))
+    operands[0]
+      = lowpart_subreg (<MODE>mode, operands[0], <ssequartermode>mode);
+  else
+    operands[1] = gen_lowpart (<ssequartermode>mode, operands[1]);
+})
 
 (define_mode_attr extract_type_2
   [(V16SF "avx512dq") (V16SI "avx512dq") (V8DF "avx512f") (V8DI "avx512f")])
@@ -7655,7 +7671,15 @@  (define_split
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))
    && reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+      && REG_P (operands[0])
+      && EXT_REX_SSE_REG_P (operands[1]))
+    operands[0]
+      = lowpart_subreg (<MODE>mode, operands[0], <ssehalfvecmode>mode);
+  else
+    operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);
+})
 
 (define_insn "vec_extract_lo_<mode><mask_name>"
   [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=v,m")
@@ -7830,7 +7854,14 @@  (define_insn_and_split "vec_extract_lo_v
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (V16HImode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+      && REG_P (operands[0])
+      && EXT_REX_SSE_REG_P (operands[1]))
+    operands[0] = lowpart_subreg (V32HImode, operands[0], V16HImode);
+  else
+    operands[1] = gen_lowpart (V16HImode, operands[1]);
+})
 
 (define_insn "vec_extract_hi_v32hi"
   [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,m")
@@ -7915,7 +7946,14 @@  (define_insn_and_split "vec_extract_lo_v
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
-  "operands[1] = gen_lowpart (V32QImode, operands[1]);")
+{
+  if (!TARGET_AVX512VL
+      && REG_P (operands[0])
+      && EXT_REX_SSE_REG_P (operands[1]))
+    operands[0] = lowpart_subreg (V64QImode, operands[0], V32QImode);
+  else
+    operands[1] = gen_lowpart (V32QImode, operands[1]);
+})
 
 (define_insn "vec_extract_hi_v64qi"
   [(set (match_operand:V32QI 0 "nonimmediate_operand" "=v,m")
--- gcc/testsuite/gcc.target/i386/pr85328.c.jj	2018-04-11 12:07:15.044933408 +0200
+++ gcc/testsuite/gcc.target/i386/pr85328.c	2018-04-11 10:45:17.269733600 +0200
@@ -0,0 +1,18 @@ 
+/* PR target/85328 */
+/* { dg-do assemble { target avx512f } } */
+/* { dg-options "-O3 -fno-caller-saves -mavx512f" } */
+
+typedef char U __attribute__((vector_size (64)));
+typedef int V __attribute__((vector_size (64)));
+U a, b;
+
+extern void bar (void);
+
+V
+foo (V f)
+{
+  b <<= (U){(V){}[63]} & 7;
+  bar ();
+  a = (U)f & 7;
+  return (V)b;
+}