Message ID | CAMe9rOqVpLJbs_Re28bNnQ0Jdt2oy2zheFWfckLSX-VEnk56eQ@mail.gmail.com |
---|---|
State | New |
On Tue, May 31, 2016 at 06:54:14AM -0700, H.J. Lu wrote:
> On Mon, May 23, 2016 at 10:15 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > Hi!
> >
> > The vbroadcastss and vpermilps insns are already in AVX512F & AVX512VL,
> > so can be used with v instead of x, the splitter case where we for AVX
> > emit vpermilps plus vpermf128 is more problematic, because the latter
> > insn isn't available in EVEX.  But, we can get the same effect with
> > vshuff32x4 when both source operands are the same.
> > Alternatively, we could replace the vpermilps and vshuff32x4 insns
> > with the AVX512VL arbitrary permutations I think, the question is
> > what is faster, because we'd need to load the mask from memory.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2016-05-23  Jakub Jelinek  <jakub@redhat.com>
> >
> >         * config/i386/sse.md
> >         (<mask_codefor>avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Rename
> >         to ...
> >         (avx512vl_shuf_<shuffletype>32x4_1<mask_name>): ... this.
> >         (*avx_vperm_broadcast_v4sf): Use v constraint instead of x.  Use
> >         maybe_evex prefix instead of vex.
> >         (*avx_vperm_broadcast_<mode>): Use v constraint instead of x.  Handle
> >         EXT_REX_SSE_REG_P (op0) case in the splitter.
> >
> >         * gcc.target/i386/avx512vl-vbroadcast-3.c: New test.
> >
>
> The new test fails on x32 due to 32-bit register in address.  This
> patch fixes it.  Tested on x86-64.  OK for trunk?

Ok, thanks.

> 2016-05-31  H.J. Lu  <hongjiu.lu@intel.com>
>
>         * gcc.target/i386/avx512vl-vbroadcast-3.c: Scan %\[re\]di
>         instead of %rdi.
>         * gcc.target/i386/avx512vl-vcvtps2ph-3.c: Likewise.

        Jakub
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
index d981fe4..7233398 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
@@ -150,9 +150,9 @@ f16 (V2 *x)
   asm volatile ("" : "+v" (a));
 }
 
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%rdi\[^\n\r]*%xmm16" 4 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%xmm16" 4 } } */
 /* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 3 } } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%rdi\[^\n\r]*%ymm16" 3 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 3 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$170\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
index 2fd2215..c2e3f01 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
@@ -38,4 +38,4 @@ f3 (__m256 x, __v8hi *y)
   *y = (__v8hi) _mm256_cvtps_ph (a, 1);
 }
 
-/* { dg-final { scan-assembler "vcvtps2ph\[^\n\r]*\\\$1\[^\n\r]*%ymm16\[^\n\r]*%rdi" } } */
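
For readers who want to see the effect of the pattern change, here is a rough sketch. It uses Python's re module as a stand-in for the Tcl regexps that scan-assembler-times applies, which is an assumption made for illustration. The two assembly lines are written by hand and are hypothetical, not real compiler output; they only model the point from the thread that x32 puts a 32-bit register (%edi) in the address where LP64 has %rdi.

```python
import re

# Patterns as the regex engine sees them once the Tcl-level backslash
# escapes in the dg-final directives are removed.
OLD = re.compile(r"vbroadcastss[^\n\r]*%rdi[^\n\r]*%xmm16")
NEW = re.compile(r"vbroadcastss[^\n\r]*%[re]di[^\n\r]*%xmm16")

# Hypothetical assembly lines, hand-written for illustration only.
LP64_LINE = "\tvbroadcastss\t(%rdi), %xmm16"
X32_LINE = "\tvbroadcastss\t(%edi), %xmm16"


def matches(pattern, line):
    """Return True if the pattern is found anywhere in the line."""
    return pattern.search(line) is not None


if __name__ == "__main__":
    for name, line in (("lp64", LP64_LINE), ("x32", X32_LINE)):
        print(name, "old:", matches(OLD, line), "new:", matches(NEW, line))
```

Under these assumptions the old pattern only accepts the %rdi form, while the character class [re] accepts either register name, so the same scan count can hold on both ABIs.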