diff mbox

[1/2,x86] Add palignr support for AVX2.

Message ID CAOvf_xwh+0j-9YAgJCxKQSoNNJfq9KYr_iSLw=43JhFxAuOfPg@mail.gmail.com
State New
Headers show

Commit Message

Evgeny Stupachenko Aug. 26, 2014, 12:59 p.m. UTC
That is covered by a separate part of the patch:
(make check and bootstrap passed: 2 new passes for core-avx2)
is it ok?


The test case covering this is "gcc.target/i386/pr52252-atom.c".
It will pass for "-march=core-avx2" when the patch committed.

On Thu, Aug 14, 2014 at 6:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Aug 14, 2014 at 1:08 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Ping.
>>
>> On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <rth@redhat.com> wrote:
>>>> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>>>>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>>>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>>>>
>>>> insn_num might as well be "bool avx2", since it's only ever set to two values.
>>>
>>> Agree. However:
>>>  after the alignment, one operand permutation could be just move and
>>> take 2 instructions for AVX2 as well
>>>  for AVX2 there could be other cases when the scheme takes 4 or 5 instructions
>>>  we can leave it for potential avx512 extension
>>>
>>>>
>>>>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>>>>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>>>>> +  /* SSSE3 is required to apply PALIGNR on 16 bytes operands.  */
>>>>> +  if (GET_MODE_SIZE (d->vmode) == 16)
>>>>> +    {
>>>>> +      if (!TARGET_SSSE3)
>>>>> +       return false;
>>>>> +    }
>>>>> +  /* AVX2 is required to apply PALIGNR on 32 bytes operands.  */
>>>>> +  else if (GET_MODE_SIZE (d->vmode) == 32)
>>>>> +    {
>>>>> +      if (!TARGET_AVX2)
>>>>> +       return false;
>>>>> +    }
>>>>> +  /* Other sizes are not supported.  */
>>>>> +  else
>>>>>      return false;
>>>>
>>>> And you'd better check it up here because...
>>>>
>>>
>>> Correct. The following should resolve the issue:
>>>   /* For AVX2 we need more than 2 instructions when the alignment
>>>      by itself does not produce the desired permutation.  */
>>>   if (TARGET_AVX2 && insn_num <= 2)
>>>     return false;
>>>
>>>>> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>>>>> +     operand permutaoin.  */
>>>>> +  if (insn_num == 2)
>>>>> +    {
>>>>> +      ok = expand_vec_perm_1 (&dcopy);
>>>>> +      gcc_assert (ok);
>>>>> +    }
>>>>> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
>>>>> +     vperm plus 4 instructions for one operand permutation.  */
>>>>> +  else if (insn_num == 6)
>>>>> +    {
>>>>> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>>>>> +      gcc_assert (ok);
>>>>> +    }
>>>>> +  else
>>>>> +    ok = false;
>>>>>    return ok;
>>>>
>>>> ... down here you'll simply ICE from the gcc_assert.
>>>
>
> Can you modify your patch to fix
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128
>
> with a testcase?
>
>
> --
> H.J.

Comments

Richard Henderson Aug. 26, 2014, 4:29 p.m. UTC | #1
On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
> +(define_insn_and_split "avx2_rotate<mode>_perm"
> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
> +      (vec_select:V_256
> +       (match_operand:V_256 1 "register_operand" "x")
> +       (match_parallel 2 "palignr_operand"
> +         [(match_operand 3 "const_int_operand" "n")])))]
> +  "TARGET_AVX2"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]

Why are you waiting until after reload to expand this?  It's only the
vec_select parallel that determines which direction the palignr should be done.

This seems like something you could do during permutation expansion.


r~
Evgeny Stupachenko Aug. 27, 2014, 3:50 p.m. UTC | #2
The rotate insn appeared right after expand.
I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
I don't see any potential losses on splitting that after reload.

On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>> +      (vec_select:V_256
>> +       (match_operand:V_256 1 "register_operand" "x")
>> +       (match_parallel 2 "palignr_operand"
>> +         [(match_operand 3 "const_int_operand" "n")])))]
>> +  "TARGET_AVX2"
>> +  "#"
>> +  "&& reload_completed"
>> +  [(const_int 0)]
>
> Why are you waiting until after reload to expand this?  It's only the
> vec_select parallel that determines which direction the palignr should be done.
>
> This seems like something you could do during permutation expansion.
>
>
> r~
>
>
Evgeny Stupachenko Sept. 8, 2014, 10:03 a.m. UTC | #3
PING

On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The rotate insn appeared right after expand.
> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
> I don't see any potential losses on splitting that after reload.
>
> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>> +      (vec_select:V_256
>>> +       (match_operand:V_256 1 "register_operand" "x")
>>> +       (match_parallel 2 "palignr_operand"
>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>> +  "TARGET_AVX2"
>>> +  "#"
>>> +  "&& reload_completed"
>>> +  [(const_int 0)]
>>
>> Why are you waiting until after reload to expand this?  It's only the
>> vec_select parallel that determines which direction the palignr should be done.
>>
>> This seems like something you could do during permutation expansion.
>>
>>
>> r~
>>
>>
Evgeny Stupachenko Sept. 16, 2014, 1:15 p.m. UTC | #4
PING 2

On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> PING
>
> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> The rotate insn appeared right after expand.
>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
>> I don't see any potential losses on splitting that after reload.
>>
>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>>> +      (vec_select:V_256
>>>> +       (match_operand:V_256 1 "register_operand" "x")
>>>> +       (match_parallel 2 "palignr_operand"
>>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>>> +  "TARGET_AVX2"
>>>> +  "#"
>>>> +  "&& reload_completed"
>>>> +  [(const_int 0)]
>>>
>>> Why are you waiting until after reload to expand this?  It's only the
>>> vec_select parallel that determines which direction the palignr should be done.
>>>
>>> This seems like something you could do during permutation expansion.
>>>
>>>
>>> r~
>>>
>>>
H.J. Lu Sept. 16, 2014, 2:51 p.m. UTC | #5
On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> PING 2
>
> On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> PING
>>
>> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> The rotate insn appeared right after expand.
>>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
>>> I don't see any potential losses on splitting that after reload.
>>>
>>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>>>> +      (vec_select:V_256
>>>>> +       (match_operand:V_256 1 "register_operand" "x")
>>>>> +       (match_parallel 2 "palignr_operand"
>>>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>>>> +  "TARGET_AVX2"
>>>>> +  "#"
>>>>> +  "&& reload_completed"
>>>>> +  [(const_int 0)]
>>>>
>>>> Why are you waiting until after reload to expand this?  It's only the
>>>> vec_select parallel that determines which direction the palignr should be done.
>>>>
>>>> This seems like something you could do during permutation expansion.
>>>>
>>>>

Assuming your change is triggered today without any additional changes
 you should include some testcases.  For now, it doesn't show if it does
anything useful.
Evgeny Stupachenko Sept. 17, 2014, 1:01 p.m. UTC | #6
It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.

On Tue, Sep 16, 2014 at 6:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> PING 2
>>
>> On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> PING
>>>
>>> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>>> The rotate insn appeared right after expand.
>>>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
>>>> I don't see any potential losses on splitting that after reload.
>>>>
>>>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>>>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>>>>> +      (vec_select:V_256
>>>>>> +       (match_operand:V_256 1 "register_operand" "x")
>>>>>> +       (match_parallel 2 "palignr_operand"
>>>>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>>>>> +  "TARGET_AVX2"
>>>>>> +  "#"
>>>>>> +  "&& reload_completed"
>>>>>> +  [(const_int 0)]
>>>>>
>>>>> Why are you waiting until after reload to expand this?  It's only the
>>>>> vec_select parallel that determines which direction the palignr should be done.
>>>>>
>>>>> This seems like something you could do during permutation expansion.
>>>>>
>>>>>
>
> Assuming your change is triggered today without any additional changes
>  you should include some testcases.  For now, it doesn't show if it does
> anything useful.
>
> --
> H.J.
H.J. Lu Sept. 17, 2014, 2:49 p.m. UTC | #7
On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>

I suggest you resubmit the patch as a bug fix for pr62128 with
testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
Evgeny Stupachenko Sept. 17, 2014, 5:26 p.m. UTC | #8
The test in pr62128 is exactly TEST 22 from
gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
or not.
Resubmitting patch looks good as current mail thread is already too complicated.

On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>>
>
> I suggest you resubmit the patch as a bug fix for pr62128 with
> testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
>
>
> --
> H.J.
Evgeny Stupachenko Oct. 1, 2014, 10:16 a.m. UTC | #9
Getting back to initial patch, is it ok?
It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check.
X86 bootstrap is also ok.

2014-10-01  Evgeny Stupachenko  <evstupac@gmail.com>

        * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
        PALINGR instruction.
        * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try
        for AVX2.

On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The test in pr62128 is exactly TEST 22 from
> gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
> or not.
> Resubmitting patch looks good as current mail thread is already too complicated.
>
> On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>>>
>>
>> I suggest you resubmit the patch as a bug fix for pr62128 with
>> testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
>>
>>
>> --
>> H.J.
Uros Bizjak Oct. 1, 2014, 10:28 a.m. UTC | #10
On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Getting back to initial patch, is it ok?

IMO, we should start with Jakub's proposed patch [1]

[1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html

Uros.

> It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check.
> X86 bootstrap is also ok.
>
> 2014-10-01  Evgeny Stupachenko  <evstupac@gmail.com>
>
>         * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
>         PALINGR instruction.
>         * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try
>         for AVX2.
>
> On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> The test in pr62128 is exactly TEST 22 from
>> gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
>> or not.
>> Resubmitting patch looks good as current mail thread is already too complicated.
>>
>> On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>>>>
>>>
>>> I suggest you resubmit the patch as a bug fix for pr62128 with
>>> testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
>>>
>>>
>>> --
>>> H.J.
Jakub Jelinek Oct. 1, 2014, 10:35 a.m. UTC | #11
On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> > Getting back to initial patch, is it ok?
> 
> IMO, we should start with Jakub's proposed patch [1]
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html

That doesn't compile, will post a new version; got interrupted when
I found that in
GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
one test is miscompiled even with unpatched compiler, debugging that now.

That said, my patch will not do anything about the
case Mark mentioned { 1, 2, 3, ..., 31, 0 } permutation,
for that we can't do vpalignr followed by vpshufb or similar,
but need to do some permutation first and then vpalignr on
the result.  So it would need a new routine.  It is still a 2
insn permutation, not 6, and needs different algorithm, so sharing
the same routine for that is undesirable.

	Jakub
Evgeny Stupachenko Oct. 1, 2014, 10:50 a.m. UTC | #12
This is not a fix for the case { 1, 2, 3, ..., 31, 0 }. This patch is
an extension of expand_vec_perm_palignr on AVX2 case.
For the case { 1, 2, 3, ..., 31, 0 } we should use separate
function/pattern. I like split as it is similar to already handled SSE
byte rotate {1,2,3,.....,15, 0}: ssse3_palignr<mode>_perm and AVX2
split: *avx_vperm_broadcast_<mode>.

On Wed, Oct 1, 2014 at 2:35 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
>> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> > Getting back to initial patch, is it ok?
>>
>> IMO, we should start with Jakub's proposed patch [1]
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html
>
> That doesn't compile, will post a new version; got interrupted when
> I found that in
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> one test is miscompiled even with unpatched compiler, debugging that now.
>
> That said, my patch will not do anything about the
> case Mark mentioned { 1, 2, 3, ..., 31, 0 } permutation,
> for that we can't do vpalignr followed by vpshufb or similar,
> but need to do some permutation first and then vpalignr on
> the result.  So it would need a new routine.  It is still a 2
> insn permutation, not 6, and needs different algorithm, so sharing
> the same routine for that is undesirable.
>
>         Jakub
diff mbox

Patch

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d6155cf..68ee65a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -81,6 +81,7 @@ 
   ;; For AVX2 support
   UNSPEC_VPERMVAR
   UNSPEC_VPERMTI
+  UNSPEC_VPALIGNRDI
   UNSPEC_GATHER
   UNSPEC_VSIBADDR

@@ -14167,6 +14168,19 @@ 
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])

+(define_insn "avx2_palignrv4di"
+  [(set (match_operand:V4DI 0 "register_operand" "=x")
+       (unspec:V4DI
+         [(match_operand:V4DI 1 "register_operand" "x")
+          (match_operand:V4DI 2 "nonimmediate_operand" "xm")
+          (match_operand:SI 3 "const_0_to_255_operand" "n")]
+         UNSPEC_VPALIGNRDI))]
+  "TARGET_AVX2"
+  "vpalignr\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "OI")])
+
 (define_insn "avx2_vec_dupv4df"
   [(set (match_operand:V4DF 0 "register_operand" "=x")
        (vec_duplicate:V4DF
@@ -14658,6 +14672,49 @@ 
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")])

+(define_insn_and_split "avx2_rotate<mode>_perm"
+  [(set (match_operand:V_256 0 "register_operand" "=&x")
+      (vec_select:V_256
+       (match_operand:V_256 1 "register_operand" "x")
+       (match_parallel 2 "palignr_operand"
+         [(match_operand 3 "const_int_operand" "n")])))]
+  "TARGET_AVX2"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  enum machine_mode imode = GET_MODE_INNER (<MODE>mode);
+  int shift = INTVAL (operands[3]) * GET_MODE_SIZE (imode);
+  rtx op0 = gen_rtx_REG (V4DImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V4DImode, REGNO (operands[1]));
+
+  if (shift == 0)
+    emit_move_insn (operands[0], operands[1]);
+  else
+    {
+      emit_insn (gen_avx2_permv2ti (op0,
+                                   op1,
+                                   op1,
+                                   GEN_INT (33)));
+      if (shift < 16)
+       emit_insn (gen_avx2_palignrv4di (op0,
+                                        op0,
+                                        op1,
+                                        GEN_INT (shift)));
+      else if (shift > 16)
+       emit_insn (gen_avx2_palignrv4di (op0,
+                                        op1,
+                                        op0,
+                                        GEN_INT (shift - 16)));
+    }
+  DONE;
+}
+  [(set_attr "type" "sseishft")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "vex")])
+
+
 (define_expand "avx_vinsertf128<mode>"
   [(match_operand:V_256 0 "register_operand")
    (match_operand:V_256 1 "register_operand")