diff mbox

[x86] Enable v64qi permutations.

Message ID 20141204094959.GA67582@msticlxl7.ims.intel.com
State New

Commit Message

Ilya Tocar Dec. 4, 2014, 9:49 a.m. UTC
Hi,

As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html,
this patch enables V64QI permutations.
I've checked the vshuf* tests from dg-torture.exp
with avx512* options on SDE, and the generated permutations are correct.

OK for trunk?

---
 gcc/config/i386/i386.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/config/i386/sse.md |  4 +--
 2 files changed, 87 insertions(+), 2 deletions(-)

Comments

H.J. Lu Dec. 4, 2014, 11:54 a.m. UTC | #1
On Thu, Dec 4, 2014 at 1:49 AM, Ilya Tocar <tocarip.intel@gmail.com> wrote:
> Hi,
>
> As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html,
> this patch enables V64QI permutations.
> I've checked the vshuf* tests from dg-torture.exp
> with avx512* options on SDE, and the generated permutations are correct.
>
> OK for trunk?
>

Can you add a few testcases?
Jakub Jelinek Dec. 4, 2014, 11:57 a.m. UTC | #2
On Thu, Dec 04, 2014 at 03:54:25AM -0800, H.J. Lu wrote:
> On Thu, Dec 4, 2014 at 1:49 AM, Ilya Tocar <tocarip.intel@gmail.com> wrote:
> > Hi,
> >
> > As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html,
> > this patch enables V64QI permutations.
> > I've checked the vshuf* tests from dg-torture.exp
> > with avx512* options on SDE, and the generated permutations are correct.
> >
> > OK for trunk?
> >
> 
> Can you add a few testcases?

Isn't it already covered by gcc.dg/torture/vshuf* ?

	Jakub
H.J. Lu Dec. 4, 2014, noon UTC | #3
On Thu, Dec 4, 2014 at 3:57 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Dec 04, 2014 at 03:54:25AM -0800, H.J. Lu wrote:
>> On Thu, Dec 4, 2014 at 1:49 AM, Ilya Tocar <tocarip.intel@gmail.com> wrote:
>> > Hi,
>> >
>> > As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html,
>> > this patch enables V64QI permutations.
>> > I've checked the vshuf* tests from dg-torture.exp
>> > with avx512* options on SDE, and the generated permutations are correct.
>> >
>> > OK for trunk?
>> >
>>
>> Can you add a few testcases?
>
> Isn't it already covered by gcc.dg/torture/vshuf* ?
>

I didn't see them fail on my machines today.
Jakub Jelinek Dec. 4, 2014, 12:04 p.m. UTC | #4
On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
> >> Can you add a few testcases?
> >
> > Isn't it already covered by gcc.dg/torture/vshuf* ?
> >
> 
> I didn't see them fail on my machines today.

Those are executable testcases; they had better not fail.
The patch just improves code generation, and the testcases check
whether the improved code generation works correctly.
Did you mean a scan-assembler test that verifies the better code
generation?  I guess it is possible, though fragile.

	Jakub
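For reference, the sort of scan-assembler test under discussion might look like the fragment below. Everything here is an assumption, not part of the patch: the function, the options, and especially the scanned mnemonic, since (as noted above) the exact instruction selection can change and make such a test fragile.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mavx512bw" } */

typedef unsigned char v64qi __attribute__ ((vector_size (64)));

v64qi
reverse_concat (v64qi a, v64qi b)
{
  /* Constant byte-granularity permutation reversing concat (a, b).  */
  const v64qi m = { 127, 126, 125, 124, 123, 122, 121, 120,
		    119, 118, 117, 116, 115, 114, 113, 112,
		    111, 110, 109, 108, 107, 106, 105, 104,
		    103, 102, 101, 100, 99, 98, 97, 96,
		    95, 94, 93, 92, 91, 90, 89, 88,
		    87, 86, 85, 84, 83, 82, 81, 80,
		    79, 78, 77, 76, 75, 74, 73, 72,
		    71, 70, 69, 68, 67, 66, 65, 64 };
  return __builtin_shuffle (a, b, m);
}

/* { dg-final { scan-assembler "vpermi2w" } } */
```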
Uros Bizjak Dec. 4, 2014, 12:45 p.m. UTC | #5
On Thu, Dec 4, 2014 at 1:04 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
>> >> Can you add a few testcases?
>> >
>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
>> >
>>
>> I didn't see them fail on my machines today.
>
> Those are executable testcases; they had better not fail.
> The patch just improves code generation, and the testcases check
> whether the improved code generation works correctly.
> Did you mean a scan-assembler test that verifies the better code
> generation?  I guess it is possible, though fragile.

I think the existing executable testcases adequately cover the
functionality of the patch.

The patch is OK.

Thanks,
Uros.
H.J. Lu Dec. 4, 2014, 12:57 p.m. UTC | #6
On Thu, Dec 4, 2014 at 4:04 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
>> >> Can you add a few testcases?
>> >
>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
>> >
>>
>> I didn't see them fail on my machines today.
>
> Those are executable testcases; they had better not fail.
> The patch just improves code generation, and the testcases check
> whether the improved code generation works correctly.
> Did you mean a scan-assembler test that verifies the better code
> generation?  I guess it is possible, though fragile.

Well, we will never be sure that the better code is really generated
unless we visually examine the assembly code.  Any future changes
may disable the better code generation, and we won't notice
until much later.
Richard Henderson Dec. 10, 2014, 4:49 p.m. UTC | #7
On 12/04/2014 01:49 AM, Ilya Tocar wrote:
> +  if (!TARGET_AVX512BW || !(d->vmode == V64QImode))

Please don't over-complicate the expression.
Use x != y instead of !(x == y).


r~
Robert Dewar Dec. 10, 2014, 4:52 p.m. UTC | #8
On 12/10/2014 11:49 AM, Richard Henderson wrote:
> On 12/04/2014 01:49 AM, Ilya Tocar wrote:
>> +  if (!TARGET_AVX512BW || !(d->vmode == V64QImode))
>
> Please don't over-complicate the expression.
> Use x != y instead of !(x == y).

To me the original reads more clearly, since it
is of the parallel form !X || !Y; I don't see it
as somehow more complicated.
>
>
> r~
>

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..f29f8ce 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21831,6 +21831,10 @@  ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1,
       if (TARGET_AVX512VL && TARGET_AVX512BW)
 	gen = gen_avx512vl_vpermi2varv16hi3;
       break;
+    case V64QImode:
+      if (TARGET_AVX512VBMI)
+	gen = gen_avx512bw_vpermi2varv64qi3;
+      break;
     case V32HImode:
       if (TARGET_AVX512BW)
 	gen = gen_avx512bw_vpermi2varv32hi3;
@@ -48872,6 +48876,7 @@  expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d)
 	emit_move_insn (d->target, gen_lowpart (d->vmode, dest));
       return true;
 
+    case V64QImode:
     case V32QImode:
     case V16HImode:
     case V8SImode:
@@ -48905,6 +48910,78 @@  expand_vec_perm_broadcast (struct expand_vec_perm_d *d)
   return expand_vec_perm_broadcast_1 (d);
 }
 
+/* Implement arbitrary permutations of two V64QImode operands
+   with 2 vpermi2w, 2 vpshufb and one vpor instruction.  */
+static bool
+expand_vec_perm_vpermi2_vpshub2 (struct expand_vec_perm_d *d)
+{
+  if (!TARGET_AVX512BW || !(d->vmode == V64QImode))
+    return false;
+
+  if (d->testing_p)
+    return true;
+
+  struct expand_vec_perm_d ds[2];
+  rtx rperm[128], vperm, target0, target1;
+  unsigned int i, nelt;
+  machine_mode vmode;
+
+  nelt = d->nelt;
+  vmode = V64QImode;
+
+  for (i = 0; i < 2; i++)
+    {
+      ds[i] = *d;
+      ds[i].vmode = V32HImode;
+      ds[i].nelt = 32;
+      ds[i].target = gen_reg_rtx (V32HImode);
+      ds[i].op0 = gen_lowpart (V32HImode, d->op0);
+      ds[i].op1 = gen_lowpart (V32HImode, d->op1);
+    }
+
+  /* Prepare permutations such that the first one takes care of
+     putting the even bytes into the right positions or one higher
+     positions (ds[0]) and the second one takes care of
+     putting the odd bytes into the right positions or one below
+     (ds[1]).  */
+
+  for (i = 0; i < nelt; i++)
+    {
+      ds[i & 1].perm[i / 2] = d->perm[i] / 2;
+      if (i & 1)
+	{
+	  rperm[i] = constm1_rtx;
+	  rperm[i + 64] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+	}
+      else
+	{
+	  rperm[i] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+	  rperm[i + 64] = constm1_rtx;
+	}
+    }
+
+  bool ok = expand_vec_perm_1 (&ds[0]);
+  gcc_assert (ok);
+  ds[0].target = gen_lowpart (V64QImode, ds[0].target);
+
+  ok = expand_vec_perm_1 (&ds[1]);
+  gcc_assert (ok);
+  ds[1].target = gen_lowpart (V64QImode, ds[1].target);
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm));
+  vperm = force_reg (vmode, vperm);
+  target0 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target0, ds[0].target, vperm));
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm + 64));
+  vperm = force_reg (vmode, vperm);
+  target1 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target1, ds[1].target, vperm));
+
+  emit_insn (gen_iorv64qi3 (d->target, target0, target1));
+  return true;
+}
+
 /* Implement arbitrary permutation of two V32QImode and V16QImode operands
    with 4 vpshufb insns, 2 vpermq and 3 vpor.  We should have already failed
    all the shorter instruction sequences.  */
@@ -49079,6 +49156,9 @@  ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_vpshufb2_vpermq_even_odd (d))
     return true;
 
+  if (expand_vec_perm_vpermi2_vpshub2 (d))
+    return true;
+
   /* ??? Look for narrow permutations whose element orderings would
      allow the promotion to a wider mode.  */
 
@@ -49223,6 +49303,11 @@  ix86_vectorize_vec_perm_const_ok (machine_mode vmode,
 	/* All implementable with a single vpermi2 insn.  */
 	return true;
       break;
+    case V64QImode:
+      if (TARGET_AVX512BW)
+	/* Implementable with 2 vpermi2, 2 vpshufb and 1 or insn.  */
+	return true;
+      break;
     case V8SImode:
     case V8SFmode:
     case V4DFmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ca5d720..6252e7e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10678,7 +10678,7 @@ 
    (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
    (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
    (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
 
 (define_expand "vec_perm<mode>"
   [(match_operand:VEC_PERM_AVX2 0 "register_operand")
@@ -10700,7 +10700,7 @@ 
    (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
    (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
    (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
 
 (define_expand "vec_perm_const<mode>"
   [(match_operand:VEC_PERM_CONST 0 "register_operand")