ARM/NEON: vld1q_dup_s64 builtin


Message ID

4FAA445A.8080605@st.com

State

New

Headers

Comment: DKIM? See http://www.dkim.org
Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org;
	h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Received:Received:Message-ID:Date:From:User-Agent:MIME-Version:To:Subject:Content-Type:Content-Transfer-Encoding:X-IsSubscribed:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To;
	b=ADZR5WjqNZR7ctdNm8nesjzgaxZTUzDc1a2geaxOHrlCrbaL9ByB2NMyb7p6gM
	l6DCPmlQ1EQI9lfzjszPBY1+MGzTA+M/CQnoSejX7/Ha/HZgGSnTqhnROfoqFGYo
	fpZylGIP16q0Jppgngz1rIskFf7A0s2s/Q+r+Q7Y/HebU=;
Message-ID: <4FAA445A.8080605@st.com>
Date: Wed, 9 May 2012 12:18:02 +0200
From: Christophe Lyon <christophe.lyon@st.com>
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64;
	rv:12.0) Gecko/20120420 Thunderbird/12.0
MIME-Version: 1.0
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org

Commit Message

Christophe Lyon May 9, 2012, 10:18 a.m. UTC

Hello,

On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64() builtins currently fails to load the second vector element.
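
For reference, the intended lane semantics can be sketched in portable C. This is only a model, not GCC code: the type `model_int64x2` and both function names are made up for illustration. The thread does not say what the second lane holds in the failing case, so the buggy variant takes that value as a parameter.

```c
#include <stdint.h>

/* Hypothetical stand-in for int64x2_t: lane[0] is the low D register,
   lane[1] the high D register of the Q register.  */
typedef struct { int64_t lane[2]; } model_int64x2;

/* Expected behaviour: load one 64-bit value and replicate it to both lanes.  */
static model_int64x2 model_vld1q_dup_s64 (const int64_t *p)
{
  model_int64x2 r;
  r.lane[0] = *p;
  r.lane[1] = *p;
  return r;
}

/* Model of the reported failure: lane 0 is loaded correctly, lane 1 ends up
   with some other value (assumption: supplied by the caller as 'other').  */
static model_int64x2 model_buggy_expansion (const int64_t *p, int64_t other)
{
  model_int64x2 r;
  r.lane[0] = *p;
  r.lane[1] = other;
  return r;
}
```

A test only has to compare both lanes against the value in memory to tell the two behaviours apart.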

Here is a small patch to address this problem:

2012-05-07  Christophe Lyon <christophe.lyon@st.com>

     * gcc/config/arm/neon.md (neon_vld1_dup): Fix vld1q_dup_s64.


OK?

Thanks,

Christophe.

Comments

Ramana Radhakrishnan May 10, 2012, 11:41 a.m. UTC | #1

On 9 May 2012 11:18, Christophe Lyon <christophe.lyon@st.com> wrote:
> Hello,
>
> On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64() builtins
> currently fails to load the second vector element.

Thanks for the patch, but this is not acceptable as it stands today.
At the very least, you need to set the length attribute to 8 for the
appropriate alternative in this case. You also don't mention how
this patch was tested. Alternatively, it might be worth splitting the
vld1q_*64 case into a 64-bit load into a (subreg:DI (V2DI reg) 0)
followed by a subreg-to-subreg move, which should end up having the
same effect. That splitting would allow for better instruction
scheduling. In addition, it would be nice to have a testcase in
gcc.target/arm.

As a follow up patch I'd like these patterns merged with the vdup_n
patterns in neon.md (allowing them to grow a memory operand variant)
which should then allow merging of (I think)

scalarval = scalar_load ()
vreg = vdup ( scalarval)

into

vreg = vld1_dup_n ( scalar_address).
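
At the intrinsics level, the two forms above can be written roughly as follows, using the 64-bit Q-register variant from this thread as the example. This is a sketch assuming a NEON-enabled compiler; the wrapper names `dup_via_scalar_load`, `dup_via_vld1` and `dup_forms_agree` are hypothetical, and the non-NEON fallback exists only so the file compiles elsewhere. Whether the compiler really turns the first form into a single load-and-duplicate depends on the pattern merge proposed here.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* scalarval = scalar_load (); vreg = vdup (scalarval)  */
static int64x2_t dup_via_scalar_load (const int64_t *p)
{
  int64_t scalarval = *p;
  return vdupq_n_s64 (scalarval);
}

/* vreg = vld1_dup (scalar_address)  */
static int64x2_t dup_via_vld1 (const int64_t *p)
{
  return vld1q_dup_s64 (p);
}

/* Returns 1 when both forms produce the same two lanes.  */
static int dup_forms_agree (const int64_t *p)
{
  int64x2_t a = dup_via_scalar_load (p);
  int64x2_t b = dup_via_vld1 (p);
  return vgetq_lane_s64 (a, 0) == vgetq_lane_s64 (b, 0)
         && vgetq_lane_s64 (a, 1) == vgetq_lane_s64 (b, 1);
}
#else
/* No NEON available: nothing to compare, so report agreement.  */
static int dup_forms_agree (const int64_t *p)
{
  (void) p;
  return 1;
}
#endif
```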

Thanks,
Ramana

Christophe Lyon May 10, 2012, 3:31 p.m. UTC | #2

On 10.05.2012 13:41, Ramana Radhakrishnan wrote:
> On 9 May 2012 11:18, Christophe Lyon<christophe.lyon@st.com>  wrote:
>> Hello,
>>
>> On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64() builtins
>> currently fails to load the second vector element.
> Thanks for the patch but this is not acceptable as it stands today.
> You need to set the length attributes in this case to 8 for the
> appropriate alternative at the very least.
OK I'll look at this.

> You also don't mention how this patch was tested.
I used the testsuite I developed some time ago to test all the Neon builtins, which I posted last year on the qemu mailing-list. With the current GCCs, this bug is the only remaining one I could detect.

>   Alternatively it might be worth splitting the
> vld1q_*64 case into a 64 bit load into a (subreg:DI (V2DI reg)  0 )
> followed by a subreg to subreg move which should end up having the
> same effect . That splitting would allow for better instruction
> scheduling.
Are you aware of examples of similar cases I could use as a model?

>   In addition it would be nice to have a testcase in
> gcc.target/arm .
Well. Prior to sending my patch I did look at that directory, but I supposed that such a test ought to belong to the neon/ subdir where the tests are described as autogenerated. Any doc on how to do that?

Thanks,

Christophe.

Julian Brown May 10, 2012, 3:52 p.m. UTC | #3

On Thu, 10 May 2012 17:31:43 +0200
Christophe Lyon <christophe.lyon@st.com> wrote:

> On 10.05.2012 13:41, Ramana Radhakrishnan wrote:
> > On 9 May 2012 11:18, Christophe Lyon<christophe.lyon@st.com>  wrote:
> >> Hello,
> >>
> >> On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64()
> >> builtins currently fails to load the second vector element.
> > Thanks for the patch but this is not acceptable as it stands today.
> > You need to set the length attributes in this case to 8 for the
> > appropriate alternative at the very least.
> OK I'll look at this.
> 
> > You also don't mention how this patch was tested.
> I used the testsuite I developed some time ago to test all the Neon
> builtins, which I posted last year on the qemu mailing-list. With the
> current GCCs, this bug is the only remaining one I could detect.
> 
> >   Alternatively it might be worth splitting the
> > vld1q_*64 case into a 64 bit load into a (subreg:DI (V2DI reg)  0 )
> > followed by a subreg to subreg move which should end up having the
> > same effect . That splitting would allow for better instruction
> > scheduling.
> Are you aware of examples of similar cases I could use as a model?
> 
> >   In addition it would be nice to have a testcase in
> > gcc.target/arm .
> Well. Prior to sending my patch I did look at that directory, but I
> supposed that such a test ought to belong to the neon/ subdir where
> the tests are described as autogenerated. Any doc on how to do that?

I'd recommend not to autogenerate such a test, FWIW -- the
autogenerated neon tests aren't very good. I think a manually-written
execute test would be better in this case.

If you do try autogenerating tests, look at "Disassembles_as" in
neon.ml, and neon-testgen.ml.

Julian

Ramana Radhakrishnan May 11, 2012, 2:48 p.m. UTC | #4

>
>
>> You also don't mention how this patch was tested.
>
> I used the testsuite I developed some time ago to test all the Neon
> builtins, which I posted last year on the qemu mailing-list. With the
> current GCCs, this bug is the only remaining one I could detect.
>

Fair enough.


>
>>  Alternatively it might be worth splitting the
>> vld1q_*64 case into a 64 bit load into a (subreg:DI (V2DI reg)  0 )
>> followed by a subreg to subreg move which should end up having the
>> same effect . That splitting would allow for better instruction
>> scheduling.
>
> Are you aware of examples of similar cases I could use as a model?

I would change the iterator from VQX to VQ in the pattern above and
define a separate pattern for the V2DI case, expressed as a
define_insn_and_split as below. With that change you can also simplify
the setting of neon_type, change the pattern to be a vec_duplicate as
below, and get rid of any lingering definitions of UNSPEC_VLD1_DUP if
they exist.

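 ;; V2DI load-and-duplicate: emitted as "#" and split after reload into
 ;; a 64-bit load of the low half plus a move of that half to the high half.
 (define_insn_and_split "neon_vld1_dupv2di"
   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
     (vec_duplicate:V2DI (match_operand:DI 1 "neon_struct_operand" "Um")))]
   "TARGET_NEON"
   "#"
   "&& reload_completed"
   [(const_int 0)]
   {
    /* Low D register of the destination Q register.  */
    rtx tmprtx = gen_lowpart (DImode, operands[0]);
    emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1]));
    /* Copy the loaded value into the high D register.  */
    emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx);
    DONE;
   }
   [(set_attr "length" "8")
    (set_attr "neon_type" "<fromearlierpattern>")]
 )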

Do you want to try this and see what you get ?
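
The effect of the split can be modelled in plain C as two steps, which is also why the length attribute is 8: two 4-byte instructions. This is an illustrative model only; `model_qreg` and the function names are hypothetical and do not correspond to anything in GCC.

```c
#include <stdint.h>

/* Hypothetical model of a Q register as its two D halves.  */
typedef struct { int64_t d_low; int64_t d_high; } model_qreg;

/* Step 1: the 64-bit load into the low half, i.e. the
   (subreg:DI (V2DI reg) 0) destination.  */
static void model_load_low (model_qreg *q, const int64_t *mem)
{
  q->d_low = *mem;
}

/* Step 2: the subreg-to-subreg move from the low half to the high half.  */
static void model_move_low_to_high (model_qreg *q)
{
  q->d_high = q->d_low;
}

/* Both steps in sequence give the vec_duplicate result.  */
static model_qreg model_split_vld1_dup (const int64_t *mem)
{
  model_qreg q = { 0, 0 };
  model_load_low (&q, mem);
  model_move_low_to_high (&q);
  return q;
}
```

Keeping the two steps as separate instructions after the split is what gives the scheduler room to place other work between the load and the move.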

>
>
>>  In addition it would be nice to have a testcase in
>> gcc.target/arm .
>
> Well. Prior to sending my patch I did look at that directory, but I supposed
> that such a test ought to belong to the neon/ subdir where the tests are
> described as autogenerated. Any doc on how to do that?

I'd rather have an extra regression test in gcc.target/arm that is a
run-time test; for example, take a look at gcc.target/arm/neon-vadds64.c.
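
A hand-written execute test along these lines might look like the following sketch. The dg- directives are an assumption modelled on existing run-time NEON tests and should be checked against gcc.target/arm/neon-vadds64.c; the function name and test values are made up. The `main` that would call the check and `abort ()` on a non-zero result is left out, and a real test would also need to make sure the loads are not constant-folded at -O2.

```c
/* { dg-do run } */
/* { dg-require-effective-target arm_neon_hw } */
/* { dg-options "-O2" } */
/* { dg-add-options arm_neon } */

#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Returns the number of lanes that do not match the value in memory.  */
static int count_bad_lanes (void)
{
  int64_t s = -0x0123456789abcdefLL;
  uint64_t u = 0xfedcba9876543210ULL;
  int bad = 0;

  int64x2_t vs = vld1q_dup_s64 (&s);
  uint64x2_t vu = vld1q_dup_u64 (&u);

  /* Both lanes of each vector must hold the loaded scalar.  */
  bad += vgetq_lane_s64 (vs, 0) != s;
  bad += vgetq_lane_s64 (vs, 1) != s;
  bad += vgetq_lane_u64 (vu, 0) != u;
  bad += vgetq_lane_u64 (vu, 1) != u;
  return bad;
}
#else
/* Not a NEON target: nothing to check.  */
static int count_bad_lanes (void)
{
  return 0;
}
#endif
```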

Ramana

>
> Thanks,
>
> Christophe.
>

Patch

Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md    (revision 2659)
+++ gcc/config/arm/neon.md    (revision 2660)
@@ -4203,7 +4203,7 @@ 
    if (GET_MODE_NUNITS (<MODE>mode) > 2)
      return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, %A1";
    else
-    return "vld1.<V_sz_elem>\t%h0, %A1";
+    return "vld1.<V_sz_elem>\t%e0, %A1 \;vmov\t%f0, %e0";
  }
    [(set (attr "neon_type")
        (if_then_else (gt (const_string "<V_mode_nunits>") (const_string "1"))
