[1/2,x86] Add palignr support for AVX2.

Message ID	CAOvf_xwh+0j-9YAgJCxKQSoNNJfq9KYr_iSLw=43JhFxAuOfPg@mail.gmail.com
State	New
Headers	show Return-Path: <gcc-patches-return-376187-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=oqroD934mkDqPue01+ F9spGkSC3rjP5ZkFmvN0VTKEHIEvoF2eA7x7EBrWJA6XJE0Y27Sy609BmJ+ULFrP TEBc3tlrlhUltebvsGF1iB+YVQRcgfX3N+V6HtUna7/p70TCljRRuE0jXoOrrrf+ 4ljoSUiPsdb3+VMMfyGxxHtBY= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org MIME-Version: 1.0 In-Reply-To: <CAMe9rOrVkSgMCjCNmU7+SSyT-BLLQ6myWfq+K-WjirdzaZtYMw@mail.gmail.com> References: <CAOvf_xx3-VpgN8YDxJBPvzzNGNykUPoLdU6xThW_QBN7byy5rw@mail.gmail.com> <535E909A.7040205@redhat.com> <CAOvf_xygddE6yxkp=+SZtVz+bvVciH5YfH1Cwy1WxdPHCgrbtg@mail.gmail.com> <535EC233.7000500@redhat.com> <CAOvf_xxq6PNN2KLpA8yoB2jjgmy-UAOvij_ghgsKPkZnsu2Rkg@mail.gmail.com> <537A23F7.2060601@redhat.com> <CAOvf_xwYJpQ0H9=MrLzUd40JDksZ8Tfsks2=8ZW4MfR_wCjU1A@mail.gmail.com> <53BAB164.7000006@redhat.com> <CAOvf_xy20vC4h_kzyDAE0D_Pgm-2aoi3C9yxesoT=n4TVcANcg@mail.gmail.com> <CAOvf_xzsTEtRy+Sz+JaeEZDyd_A7yTqh-kQrpSMNssWdp3hQ3g@mail.gmail.com> <CAMe9rOrVkSgMCjCNmU7+SSyT-BLLQ6myWfq+K-WjirdzaZtYMw@mail.gmail.com> Date: Tue, 26 Aug 2014 16:59:14 +0400 Message-ID: <CAOvf_xwh+0j-9YAgJCxKQSoNNJfq9KYr_iSLw=43JhFxAuOfPg@mail.gmail.com> Subject: Re: [PATCH 1/2, x86] Add palignr support for AVX2. From: Evgeny Stupachenko <evstupac@gmail.com> To: "H.J. Lu" <hjl.tools@gmail.com> Cc: Richard Henderson <rth@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>, Richard Biener <rguenther@suse.de>, Uros Bizjak <ubizjak@gmail.com> Content-Type: text/plain; charset=UTF-8

Evgeny Stupachenko Aug. 26, 2014, 12:59 p.m. UTC

That is covered by a separate part of the patch:
(make check and bootstrap passed: 2 new passes for core-avx2)
is it ok?


The test case covering this is "gcc.target/i386/pr52252-atom.c".
It will pass for "-march=core-avx2" when the patch committed.

On Thu, Aug 14, 2014 at 6:55 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Aug 14, 2014 at 1:08 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Ping.
>>
>> On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson <rth@redhat.com> wrote:
>>>> On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
>>>>> -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
>>>>> +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)
>>>>
>>>> insn_num might as well be "bool avx2", since it's only ever set to two values.
>>>
>>> Agree. However:
>>>  after the alignment, one operand permutation could be just move and
>>> take 2 instructions for AVX2 as well
>>>  for AVX2 there could be other cases when the scheme takes 4 or 5 instructions
>>>  we can leave it for potential avx512 extension
>>>
>>>>
>>>>> -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
>>>>> -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
>>>>> +  /* SSSE3 is required to apply PALIGNR on 16 bytes operands.  */
>>>>> +  if (GET_MODE_SIZE (d->vmode) == 16)
>>>>> +    {
>>>>> +      if (!TARGET_SSSE3)
>>>>> +       return false;
>>>>> +    }
>>>>> +  /* AVX2 is required to apply PALIGNR on 32 bytes operands.  */
>>>>> +  else if (GET_MODE_SIZE (d->vmode) == 32)
>>>>> +    {
>>>>> +      if (!TARGET_AVX2)
>>>>> +       return false;
>>>>> +    }
>>>>> +  /* Other sizes are not supported.  */
>>>>> +  else
>>>>>      return false;
>>>>
>>>> And you'd better check it up here because...
>>>>
>>>
>>> Correct. The following should resolve the issue:
>>>   /* For AVX2 we need more than 2 instructions when the alignment
>>>      by itself does not produce the desired permutation.  */
>>>   if (TARGET_AVX2 && insn_num <= 2)
>>>     return false;
>>>
>>>>> +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
>>>>> +     operand permutaoin.  */
>>>>> +  if (insn_num == 2)
>>>>> +    {
>>>>> +      ok = expand_vec_perm_1 (&dcopy);
>>>>> +      gcc_assert (ok);
>>>>> +    }
>>>>> +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
>>>>> +     vperm plus 4 instructions for one operand permutation.  */
>>>>> +  else if (insn_num == 6)
>>>>> +    {
>>>>> +      ok = expand_vec_perm_vpshufb2_vpermq (&dcopy);
>>>>> +      gcc_assert (ok);
>>>>> +    }
>>>>> +  else
>>>>> +    ok = false;
>>>>>    return ok;
>>>>
>>>> ... down here you'll simply ICE from the gcc_assert.
>>>
>
> Can you modify your patch to fix
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128
>
> with a testcase?
>
>
> --
> H.J.

Richard Henderson Aug. 26, 2014, 4:29 p.m. UTC | #1

On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
> +(define_insn_and_split "avx2_rotate<mode>_perm"
> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
> +      (vec_select:V_256
> +       (match_operand:V_256 1 "register_operand" "x")
> +       (match_parallel 2 "palignr_operand"
> +         [(match_operand 3 "const_int_operand" "n")])))]
> +  "TARGET_AVX2"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]

Why are you waiting until after reload to expand this?  It's only the
vec_select parallel that determines which direction the palignr should be done.

This seems like something you could do during permutation expansion.

r~

Evgeny Stupachenko Aug. 27, 2014, 3:50 p.m. UTC | #2

The rotate insn appeared right after expand.
I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
I don't see any potential losses on splitting that after reload.

On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>> +      (vec_select:V_256
>> +       (match_operand:V_256 1 "register_operand" "x")
>> +       (match_parallel 2 "palignr_operand"
>> +         [(match_operand 3 "const_int_operand" "n")])))]
>> +  "TARGET_AVX2"
>> +  "#"
>> +  "&& reload_completed"
>> +  [(const_int 0)]
>
> Why are you waiting until after reload to expand this?  It's only the
> vec_select parallel that determines which direction the palignr should be done.
>
> This seems like something you could do during permutation expansion.
>
>
> r~
>
>

Evgeny Stupachenko Sept. 8, 2014, 10:03 a.m. UTC | #3

PING

On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The rotate insn appeared right after expand.
> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
> I don't see any potential losses on splitting that after reload.
>
> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>> +      (vec_select:V_256
>>> +       (match_operand:V_256 1 "register_operand" "x")
>>> +       (match_parallel 2 "palignr_operand"
>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>> +  "TARGET_AVX2"
>>> +  "#"
>>> +  "&& reload_completed"
>>> +  [(const_int 0)]
>>
>> Why are you waiting until after reload to expand this?  It's only the
>> vec_select parallel that determines which direction the palignr should be done.
>>
>> This seems like something you could do during permutation expansion.
>>
>>
>> r~
>>
>>

Evgeny Stupachenko Sept. 16, 2014, 1:15 p.m. UTC | #4

PING 2

On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> PING
>
> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> The rotate insn appeared right after expand.
>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
>> I don't see any potential losses on splitting that after reload.
>>
>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>>> +      (vec_select:V_256
>>>> +       (match_operand:V_256 1 "register_operand" "x")
>>>> +       (match_parallel 2 "palignr_operand"
>>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>>> +  "TARGET_AVX2"
>>>> +  "#"
>>>> +  "&& reload_completed"
>>>> +  [(const_int 0)]
>>>
>>> Why are you waiting until after reload to expand this?  It's only the
>>> vec_select parallel that determines which direction the palignr should be done.
>>>
>>> This seems like something you could do during permutation expansion.
>>>
>>>
>>> r~
>>>
>>>

H.J. Lu Sept. 16, 2014, 2:51 p.m. UTC | #5

On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> PING 2
>
> On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> PING
>>
>> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> The rotate insn appeared right after expand.
>>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
>>> I don't see any potential losses on splitting that after reload.
>>>
>>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>>>> +      (vec_select:V_256
>>>>> +       (match_operand:V_256 1 "register_operand" "x")
>>>>> +       (match_parallel 2 "palignr_operand"
>>>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>>>> +  "TARGET_AVX2"
>>>>> +  "#"
>>>>> +  "&& reload_completed"
>>>>> +  [(const_int 0)]
>>>>
>>>> Why are you waiting until after reload to expand this?  It's only the
>>>> vec_select parallel that determines which direction the palignr should be done.
>>>>
>>>> This seems like something you could do during permutation expansion.
>>>>
>>>>

Assuming your change is triggered today without any additional changes
 you should include some testcases.  For now, it doesn't show if it does
anything useful.

Evgeny Stupachenko Sept. 17, 2014, 1:01 p.m. UTC | #6

It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.

On Tue, Sep 16, 2014 at 6:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 6:15 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> PING 2
>>
>> On Mon, Sep 8, 2014 at 2:03 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> PING
>>>
>>> On Wed, Aug 27, 2014 at 7:50 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>>> The rotate insn appeared right after expand.
>>>> I've done it similar to define_insn_and_split "*avx_vperm_broadcast_<mode>".
>>>> I don't see any potential losses on splitting that after reload.
>>>>
>>>> On Tue, Aug 26, 2014 at 8:29 PM, Richard Henderson <rth@redhat.com> wrote:
>>>>> On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote:
>>>>>> +(define_insn_and_split "avx2_rotate<mode>_perm"
>>>>>> +  [(set (match_operand:V_256 0 "register_operand" "=&x")
>>>>>> +      (vec_select:V_256
>>>>>> +       (match_operand:V_256 1 "register_operand" "x")
>>>>>> +       (match_parallel 2 "palignr_operand"
>>>>>> +         [(match_operand 3 "const_int_operand" "n")])))]
>>>>>> +  "TARGET_AVX2"
>>>>>> +  "#"
>>>>>> +  "&& reload_completed"
>>>>>> +  [(const_int 0)]
>>>>>
>>>>> Why are you waiting until after reload to expand this?  It's only the
>>>>> vec_select parallel that determines which direction the palignr should be done.
>>>>>
>>>>> This seems like something you could do during permutation expansion.
>>>>>
>>>>>
>
> Assuming your change is triggered today without any additional changes
>  you should include some testcases.  For now, it doesn't show if it does
> anything useful.
>
> --
> H.J.

H.J. Lu Sept. 17, 2014, 2:49 p.m. UTC | #7

On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>

I suggest you resubmit the patch as a bug fix for pr62128 with
testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.

Evgeny Stupachenko Sept. 17, 2014, 5:26 p.m. UTC | #8

The test in pr62128 is exactly TEST 22 from
gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
or not.
Resubmitting patch looks good as current mail thread is already too complicated.

On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>>
>
> I suggest you resubmit the patch as a bug fix for pr62128 with
> testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
>
>
> --
> H.J.

Evgeny Stupachenko Oct. 1, 2014, 10:16 a.m. UTC | #9

Getting back to initial patch, is it ok?
It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check.
X86 bootstrap is also ok.

2014-10-01  Evgeny Stupachenko  <evstupac@gmail.com>

        * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
        PALINGR instruction.
        * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try
        for AVX2.

On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> The test in pr62128 is exactly TEST 22 from
> gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
> or not.
> Resubmitting patch looks good as current mail thread is already too complicated.
>
> On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>>>
>>
>> I suggest you resubmit the patch as a bug fix for pr62128 with
>> testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
>>
>>
>> --
>> H.J.

Uros Bizjak Oct. 1, 2014, 10:28 a.m. UTC | #10

On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Getting back to initial patch, is it ok?

IMO, we should start with Jakub's proposed patch [1]

[1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html

Uros.

> It fixes gcc.target/i386/pr52252-atom.c for AVX2 make check.
> X86 bootstrap is also ok.
>
> 2014-10-01  Evgeny Stupachenko  <evstupac@gmail.com>
>
>         * config/i386/i386.c (expand_vec_perm_palignr): Extend to use AVX2
>         PALINGR instruction.
>         * config/i386/i386.c (ix86_expand_vec_perm_const_1): Add palignr try
>         for AVX2.
>
> On Wed, Sep 17, 2014 at 9:26 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> The test in pr62128 is exactly TEST 22 from
>> gcc.dg/torture/vshuf-v32qi.c. It will check if the pattern is correct
>> or not.
>> Resubmitting patch looks good as current mail thread is already too complicated.
>>
>> On Wed, Sep 17, 2014 at 6:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Wed, Sep 17, 2014 at 6:01 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>>> It fix "gcc.target/i386/pr52252-atom.c" in core-avx2 make check and pr62128.
>>>>
>>>
>>> I suggest you resubmit the patch as a bug fix for pr62128 with
>>> testcases from  pr62128 as well as gcc.target/i386/pr52252-atom.c.
>>>
>>>
>>> --
>>> H.J.

Jakub Jelinek Oct. 1, 2014, 10:35 a.m. UTC | #11

On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> > Getting back to initial patch, is it ok?
> 
> IMO, we should start with Jakub's proposed patch [1]
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html

That doesn't compile, will post a new version; got interrupted when
I found that in
GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
one test is miscompiled even with unpatched compiler, debugging that now.

That said, my patch will not do anything about the
case Mark mentioned { 1, 2, 3, ..., 31, 0 } permutation,
for that we can't do vpalignr followed by vpshufb or similar,
but need to do some permutation first and then vpalignr on
the result.  So it would need a new routine.  It is still a 2
insn permutation, not 6, and needs different algorithm, so sharing
the same routine for that is undesirable.

	Jakub

Evgeny Stupachenko Oct. 1, 2014, 10:50 a.m. UTC | #12

This is not a fix for the case { 1, 2, 3, ..., 31, 0 }. This patch is
an extension of expand_vec_perm_palignr on AVX2 case.
For the case { 1, 2, 3, ..., 31, 0 } we should use separate
function/pattern. I like split as it is similar to already handled SSE
byte rotate {1,2,3,.....,15, 0}: ssse3_palignr<mode>_perm and AVX2
split: *avx_vperm_broadcast_<mode>.

On Wed, Oct 1, 2014 at 2:35 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Oct 01, 2014 at 12:28:51PM +0200, Uros Bizjak wrote:
>> On Wed, Oct 1, 2014 at 12:16 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> > Getting back to initial patch, is it ok?
>>
>> IMO, we should start with Jakub's proposed patch [1]
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00010.html
>
> That doesn't compile, will post a new version; got interrupted when
> I found that in
> GCC_TEST_RUN_EXPENSIVE=1 make check-gcc RUNTESTFLAGS='--target_board=unix/-mavx2 dg-torture.exp=vshuf*.c'
> one test is miscompiled even with unpatched compiler, debugging that now.
>
> That said, my patch will not do anything about the
> case Mark mentioned { 1, 2, 3, ..., 31, 0 } permutation,
> for that we can't do vpalignr followed by vpshufb or similar,
> but need to do some permutation first and then vpalignr on
> the result.  So it would need a new routine.  It is still a 2
> insn permutation, not 6, and needs different algorithm, so sharing
> the same routine for that is undesirable.
>
>         Jakub

[1/2,x86] Add palignr support for AVX2.

Commit Message

Comments

Patch