diff mbox

[2/2,x86] Add palignr support for AVX2.

Message ID CAOvf_xyiA5uaZGHd+86Z6X_6=02pRQ7Nc48nbMrHRuyj+kj_kQ@mail.gmail.com
State New
Headers show

Commit Message

Evgeny Stupachenko April 29, 2014, 1:50 p.m. UTC
Hi,

The patch adds use of palignr instruction, when we have one operand
permutation like:
{5 6 7 0 1 2 3 4}:

Treating this as {5 6 7 8 9 a b c} on 2 operands, and therefore palignr on 5.

Bootstrap and make check passed.

Is it ok?

Evgeny

2014-04-29  Evgeny Stupachenko  <evstupac@gmail.com>

        * config/i386/i386.c (expand_vec_perm_palignr_one_operand): New.
        Enables PALIGNR on one operand permutation.
        * config/i386/i386.c (expand_vec_perm_1): Try PALIGNR on one operand.

Comments

Richard Henderson April 29, 2014, 2:50 p.m. UTC | #1
On 04/29/2014 06:50 AM, Evgeny Stupachenko wrote:
> +  if (d->one_operand_p != true)
> +    return false;

This looks odd.  Better as !d->one_operand_p.

> +
> +  /* For an in order permutation with one operand like: {5 6 7 0 1 2 3 4}
> +     PALIGNR is better than PSHUFB.  Check for an order in permutation.  */

FWIW, "in order permutation" sounds like a contradiction in terms.
Better to describe this as a rotate.

> +  in_order_length = 0;
> +  in_order_length_max = 0;
> +  if (d->one_operand_p == true)

You've just tested one_operand above.

> +    for (i = 0; i < 2 * nelt; ++i)

Which means that 2*nelt is doing twice as much work as needed.

> +      {
> +       if ((d->perm[(i + 1) & (nelt - 1)] -
> +            d->perm[i & (nelt - 1)]) != 1)

Surely we can avoid re-reading the comparison value...

  next = (d->perm[0] + 1) & (nelt - 1);
  for (i = 1; i < nelt; ++i)
    {
      if (d->perm[i] != next)
        return false;
      next = (next + 1) & (nelt - 1);
    }

> +         {
> +           if (in_order_length > in_order_length_max)
> +               in_order_length_max = in_order_length;
> +             in_order_length = 0;
> +         }
> +       else
> +         in_order_length++;
> +      }
> +
> +  /* If not an ordered permutation then try something else.  */
> +  if (in_order_length_max != nelt - 1)
> +    return false;

I don't understand what this length and max stuff is trying to accomplish.

> +
> +  min = d->perm[0];
> +
> +  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> +  shift1 = GEN_INT ((min - nelt / 2) *
> +          GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
> +
> +  if (GET_MODE_SIZE (d->vmode) != 32)

Positive tests are almost always clearer: == 16.


r~
H.J. Lu April 29, 2014, 2:58 p.m. UTC | #2
On Tue, Apr 29, 2014 at 6:50 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Hi,
>
> The patch adds use of palignr instruction, when we have one operand
> permutation like:
> {5 6 7 0 1 2 3 4}:
>
> Treating this as {5 6 7 8 9 a b c} on 2 operands, and therefore palignr on 5.
>
> Bootstrap and make check passed.
>
> Is it ok?
>
> Evgeny
>
> 2014-04-29  Evgeny Stupachenko  <evstupac@gmail.com>
>
>         * config/i386/i386.c (expand_vec_perm_palignr_one_operand): New.
>         Enables PALIGNR on one operand permutation.
>         * config/i386/i386.c (expand_vec_perm_1): Try PALIGNR on one operand.
>
>

I think it is better to include some testcases in this
patch so that the backend change and its testcases
are self-contained.

Thanks.
Evgeny Stupachenko April 29, 2014, 3:54 p.m. UTC | #3
On Tue, Apr 29, 2014 at 6:58 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Apr 29, 2014 at 6:50 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Hi,
>>
>> The patch adds use of palignr instruction, when we have one operand
>> permutation like:
>> {5 6 7 0 1 2 3 4}:
>>
>> Treating this as {5 6 7 8 9 a b c} on 2 operands, and therefore palignr on 5.
>>
>> Bootstrap and make check passed.
>>
>> Is it ok?
>>
>> Evgeny
>>
>> 2014-04-29  Evgeny Stupachenko  <evstupac@gmail.com>
>>
>>         * config/i386/i386.c (expand_vec_perm_palignr_one_operand): New.
>>         Enables PALIGNR on one operand permutation.
>>         * config/i386/i386.c (expand_vec_perm_1): Try PALIGNR on one operand.
>>
>>
>
> I think it is better to include some testcases in this
> patch so that the backend change and its testcases
> are self-contained.
>
I have no idea how to generate permutation for palignr right now, but
I have a patch fixing pr52252 generating such (first part of it is
currently under review with corresponding tests, next part will
generate palignr instructions on the tests).
Initial implementation of palirnr support was committed without tests.
Current patches are extension of this initial implementation.

> Thanks.
>
> --
> H.J.

Thanks,
Evgeny
Jakub Jelinek April 29, 2014, 5:25 p.m. UTC | #4
On Tue, Apr 29, 2014 at 07:58:39AM -0700, H.J. Lu wrote:
> On Tue, Apr 29, 2014 at 6:50 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> > The patch adds use of palignr instruction, when we have one operand
> > permutation like:
> > {5 6 7 0 1 2 3 4}:
> >
> > Treating this as {5 6 7 8 9 a b c} on 2 operands, and therefore palignr on 5.
> >
> > Bootstrap and make check passed.
> >
> > Is it ok?
> >
> > Evgeny
> >
> > 2014-04-29  Evgeny Stupachenko  <evstupac@gmail.com>
> >
> >         * config/i386/i386.c (expand_vec_perm_palignr_one_operand): New.
> >         Enables PALIGNR on one operand permutation.
> >         * config/i386/i386.c (expand_vec_perm_1): Try PALIGNR on one operand.
> >
> >
> 
> I think it is better to include some testcases in this
> patch so that the backend change and its testcases
> are self-contained.

Note that most likely it should be already covered by
gcc.dg/torture/vshuf*.c, especially with GCC_TEST_RUN_EXPENSIVE=1.
Though, apparently we need to add new tests for AVX-512, because
v64qi, v32hi, v16si and v8di aren't covered (the last 3 are trivial,
for v64qi supposedly we'd need to write new vshuf-64.inc.

	Jakub
diff mbox

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 002d295..8950cf7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42807,6 +42807,97 @@  expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   return true;
 }

+/* A subroutine of ix86_expand_vec_perm_1.  Try to use just palignr
+   instruction for one operand permutation.  This is better than pshufb
+   as does not require to pass big constant and faster on some x86
+   architectures.  */
+
+static bool
+expand_vec_perm_palignr_one_operand (struct expand_vec_perm_d *d)
+{
+  unsigned i, nelt = d->nelt;
+  unsigned min;
+  unsigned in_order_length, in_order_length_max;
+  rtx shift, shift1, target, tmp;
+
+  /* PALIGNR of 2 128-bits registers takes only 1 instrucion.
+     Requires SSSE3.  */
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      if(!TARGET_SSSE3)
+       return false;
+    }
+  /* PALIGNR of 2 256-bits registers on AVX2 costs only 2 instructions:
+     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.  */
+  else if (GET_MODE_SIZE (d->vmode) == 32)
+    {
+      if(!TARGET_AVX2)
+       return false;
+    }
+  else
+    return false;
+
+  if (d->one_operand_p != true)
+    return false;
+
+  /* For an in order permutation with one operand like: {5 6 7 0 1 2 3 4}
+     PALIGNR is better than PSHUFB.  Check for an order in permutation.  */
+  in_order_length = 0;
+  in_order_length_max = 0;
+  if (d->one_operand_p == true)
+    for (i = 0; i < 2 * nelt; ++i)
+      {
+       if ((d->perm[(i + 1) & (nelt - 1)] -
+            d->perm[i & (nelt - 1)]) != 1)
+         {
+           if (in_order_length > in_order_length_max)
+               in_order_length_max = in_order_length;
+             in_order_length = 0;
+         }
+       else
+         in_order_length++;
+      }
+
+  /* If not an ordered permutation then try something else.  */
+  if (in_order_length_max != nelt - 1)
+    return false;
+
+  min = d->perm[0];
+
+  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+  shift1 = GEN_INT ((min - nelt / 2) *
+          GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+
+  if (GET_MODE_SIZE (d->vmode) != 32)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+                                     gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      tmp = gen_reg_rtx (V4DImode);
+      emit_insn (gen_avx2_permv2ti (tmp,
+                                   gen_lowpart (V4DImode, d->op0),
+                                   gen_lowpart (V4DImode, d->op1),
+                                   GEN_INT (33)));
+      if (min < nelt / 2)
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, tmp),
+                                        gen_lowpart (V2TImode, d->op0),
+                                        shift));
+      else
+       emit_insn (gen_avx2_palignrv2ti (target,
+                                        gen_lowpart (V2TImode, d->op1),
+                                        gen_lowpart (V2TImode, tmp),
+                                        shift1));
+    }
+  emit_move_insn (d->target, gen_lowpart (d->vmode, target));
+
+  return true;
+}
+
 static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);

 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
@@ -42943,6 +43034,10 @@  expand_vec_perm_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_vpermil (d))
     return true;

+  /* Try palignr on one operand.  */
+  if (expand_vec_perm_palignr_one_operand (d))
+    return true;
+
   /* Try the SSSE3 pshufb or XOP vpperm or AVX2 vperm2i128,
      vpshufb, vpermd, vpermps or vpermq variable permutation.  */
   if (expand_vec_perm_pshufb (d))