diff mbox

[x86] Improves x86 permutation expand

Message ID CAOvf_xwh02FHmgQDe+7Qj0Dec0c8LuqBjP5UaMkc4TwrdP11nA@mail.gmail.com
State New
Headers show

Commit Message

Evgeny Stupachenko June 5, 2014, 3:29 p.m. UTC
Hi,

The patch passed bootstrap and make check. No new fails.
The patch gives ~10% to test in pr52252 and potentially in pr61403.
Is it ok?

Thanks,
Evgeny

ChangeLog:

2014-06-05  Evgeny Stupachenko  <evstupac@gmail.com>

         * config/i386/i386.c (expand_vec_perm_pblendv): New.
         Permutation expand using pblendv.
         * config/i386/i386.c (ix86_expand_vec_perm_const_1): New
         scheme for permutation expand.

Patch:

expand_vec_perm_d *d)
   if (expand_vec_perm_vperm2f128 (d))
     return true;

+  if (expand_vec_perm_pblendv (d))
+    return true;
+
   /* Try sequences of three instructions.  */

   if (expand_vec_perm_2vperm2f128_vshuf (d))

Comments

Richard Henderson June 5, 2014, 4:49 p.m. UTC | #1
On 06/05/2014 08:29 AM, Evgeny Stupachenko wrote:
> +  /* Figure out where permutation elements stay not in their
> +     respective lanes.  */
> +  for (i = 0, which = 0; i < nelt; ++i)
> +    {
> +      unsigned e = d->perm[i];
> +      if (e != i)
> +       which |= (e < nelt ? 1 : 2);
> +    }
> +  /* We can pblend the part where elements stay not in their
> +     respective lanes only when these elements are all in one
> +     half of a permutation.
> +     {0 1 8 3 4 5 9 7} is ok as 8, 9 are not at their respective
> +     lanes, but both 8 and 9 >= 8
> +     {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their
> +     respective lanes and 8 >= 8, but 2 not.  */
> +  if (which != 1 && which != 2)
> +    return false;

I was about to suggest that you'd get more success by putting the blend first,
and do the shuffle second.  But I suppose it does cover a few cases that the
other way would miss, e.g.

  { 0 4 7 3 }

because we can't blend 0 and 4 (or 3 and 7) into the same vector.  Whereas the
direction you're trying can't handle

  { 0 6 6 1 }

But that can be implemented with

  { 0 1 2 3 }
  { 4 5 6 7 }
  -----------
  { 0 1 6 3 } (pblend)
  -----------
  { 0 6 6 1 } (pshufb)

So I guess we should cover these two cases in successive patches.

> +  if (!expand_vec_perm_blend (&dcopy1))
> +    return false;
> +
> +  return true;

You should avoid doing any work in this function if the ISA isn't enabled.
Don't wait until the last test for blend to fail.  Separate that out from the
start of expand_vec_perm_blend as a subroutine, perhaps.

We should be able to prove that we've got a valid blend as input here, so I'd
be more inclined to write

  ok = expand_vec_perm_blend (&dcopy1);
  gcc_assert (ok);
  return true;

> +  if (!expand_vec_perm_1 (&dcopy))
> +    return false;

If we know we have pblend, then we know we have pshufb, so again I don't see
how expand_vec_perm_1 can fail.  Another assert would be good.

There is a point, earlier in the function, where we know whether we're going to
succeed or not.  I believe just after

> +  if (which != 1 && which != 2)
> +    return false;

You should add a

  if (d->testing_p)
    return true;

at that point.


r~
diff mbox

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8827256..e1c8126 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43185,6 +43185,76 @@  expand_vec_perm_palignr (struct expand_vec_perm_d *d)
   return ok;
 }

+/* A subroutine of ix86_expand_vec_perm_const_1.  Try to simplify
+   the permutation using the SSE4_1 pblendv instruction.  Potentially
+   reduces permutaion from 2 pshufb and or to 1 pshufb and pblendv.  */
+
+static bool
+expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
+{
+  unsigned i, which, nelt = d->nelt;
+  struct expand_vec_perm_d dcopy, dcopy1;
+
+  /* Figure out where permutation elements stay not in their
+     respective lanes.  */
+  for (i = 0, which = 0; i < nelt; ++i)
+    {
+      unsigned e = d->perm[i];
+      if (e != i)
+       which |= (e < nelt ? 1 : 2);
+    }
+  /* We can pblend the part where elements stay not in their
+     respective lanes only when these elements are all in one
+     half of a permutation.
+     {0 1 8 3 4 5 9 7} is ok as 8, 9 are not at their respective
+     lanes, but both 8 and 9 >= 8
+     {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their
+     respective lanes and 8 >= 8, but 2 not.  */
+  if (which != 1 && which != 2)
+    return false;
+
+  /* First we apply one operand permutation to the part where
+     elements stay not in their respective lanes.  */
+  dcopy = *d;
+  if (which == 2)
+    dcopy.op0 = dcopy.op1 = d->op1;
+  else
+    dcopy.op0 = dcopy.op1 = d->op0;
+  dcopy.one_operand_p = true;
+
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned e = d->perm[i];
+      if (which == 2)
+       dcopy.perm[i] = ((e >= nelt) ? (e - nelt) : e);
+    }
+
+  if (!expand_vec_perm_1 (&dcopy))
+    return false;
+
+  /* Next we put permuted elements into thier positions.  */
+  dcopy1 = *d;
+  if (which == 2)
+    dcopy1.op1 = dcopy.target;
+  else
+    dcopy1.op0 = dcopy.target;
+
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned e = d->perm[i];
+      if (which == 2)
+       dcopy1.perm[i] = ((e >= nelt) ? (nelt + i) : e);
+      else
+       dcopy1.perm[i] = ((e < nelt) ? i : e);
+    }
+
+  if (!expand_vec_perm_blend (&dcopy1))
+    return false;
+
+  return true;
+}
+
 static bool expand_vec_perm_interleave3 (struct expand_vec_perm_d *d);

 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
@@ -44557,6 +44627,9 @@  ix86_expand_vec_perm_const_1 (struct