diff mbox

[AArch64] Replace insn to zero up DF register

Message ID 56257F53.2000905@samsung.com
State New
Headers show

Commit Message

Evandro Menezes Oct. 19, 2015, 11:40 p.m. UTC
In the existing targets, it seems that it's always faster to zero up a 
DF register with "movi %d0, #0" instead of "fmov %d0, xzr".

This patch modifies the respective pattern.

Please, commit if it's alright.

Thank you,

Comments

Andrew Pinski Oct. 19, 2015, 11:51 p.m. UTC | #1
On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes <e.menezes@samsung.com> wrote:
> In the existing targets, it seems that it's always faster to zero up a DF
> register with "movi %d0, #0" instead of "fmov %d0, xzr".

I think for ThunderX 1, this change will not make a difference.  So I
am neutral on this change.

Thanks,
Andrew

>
> This patch modifies the respective pattern.
>
> Please, commit if it's alright.
>
> Thank you,
>
> --
> Evandro Menezes
>
Andrew Pinski Oct. 19, 2015, 11:59 p.m. UTC | #2
On Tue, Oct 20, 2015 at 7:51 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes <e.menezes@samsung.com> wrote:
>> In the existing targets, it seems that it's always faster to zero up a DF
>> register with "movi %d0, #0" instead of "fmov %d0, xzr".
>
> I think for ThunderX 1, this change will not make a difference.  So I
> am neutral on this change.

Actually depending on fmov is decoded in our pipeline, this change
might actually be worse.  Currently fmov with an immediate is 1 cycle
while movi is two cycles.  Let me double check how internally on how
it is decoded and if it is 1 cycle or two.

Thanks,
Andrew

>
> Thanks,
> Andrew
>
>>
>> This patch modifies the respective pattern.
>>
>> Please, commit if it's alright.
>>
>> Thank you,
>>
>> --
>> Evandro Menezes
>>
Andrew Pinski Oct. 20, 2015, 2:27 p.m. UTC | #3
On Tue, Oct 20, 2015 at 7:59 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Oct 20, 2015 at 7:51 AM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes <e.menezes@samsung.com> wrote:
>>> In the existing targets, it seems that it's always faster to zero up a DF
>>> register with "movi %d0, #0" instead of "fmov %d0, xzr".
>>
>> I think for ThunderX 1, this change will not make a difference.  So I
>> am neutral on this change.
>
> Actually depending on fmov is decoded in our pipeline, this change
> might actually be worse.  Currently fmov with an immediate is 1 cycle
> while movi is two cycles.  Let me double check how internally on how
> it is decoded and if it is 1 cycle or two.

Ok, my objections are removed as I talked with the architectures here
at Cavium and using movi is better in this case.

Thanks,
Andrew

>
> Thanks,
> Andrew
>
>>
>> Thanks,
>> Andrew
>>
>>>
>>> This patch modifies the respective pattern.
>>>
>>> Please, commit if it's alright.
>>>
>>> Thank you,
>>>
>>> --
>>> Evandro Menezes
>>>
Evandro Menezes Oct. 28, 2015, 6:41 p.m. UTC | #4
Ping.
Marcus Shawcroft Oct. 30, 2015, 10:24 a.m. UTC | #5
On 20 October 2015 at 00:40, Evandro Menezes <e.menezes@samsung.com> wrote:
> In the existing targets, it seems that it's always faster to zero up a DF
> register with "movi %d0, #0" instead of "fmov %d0, xzr".
>
> This patch modifies the respective pattern.


Hi Evandro,

This patch changes the generic, u architecture independent instruction
selection. The ARM ARM (C3.5.3) makes a specific recommendation about
the choice of instruction in this situation and the current
implementation in GCC follows that recommendation.  Wilco has also
picked up on this issue he has the same patch internal to ARM along
with an ongoing discussion with ARM architecture folk regarding this
recommendation.  I'm reluctant to take this patch right now on the
basis that it runs contrary to ARM ARM recommendation pending the
conclusion of Wilco's discussion with ARM architecture folk.

Cheers
/Marcus
Evandro Menezes Nov. 9, 2015, 10:59 p.m. UTC | #6
Hi, Marcus.

Have you an update from the architecture folks about this?

Thank you,
Evandro Menezes Nov. 19, 2015, 10:01 p.m. UTC | #7
On 10/30/2015 05:24 AM, Marcus Shawcroft wrote:
> On 20 October 2015 at 00:40, Evandro Menezes <e.menezes@samsung.com> wrote:
>> In the existing targets, it seems that it's always faster to zero up a DF
>> register with "movi %d0, #0" instead of "fmov %d0, xzr".
>>
>> This patch modifies the respective pattern.
>
> Hi Evandro,
>
> This patch changes the generic, u architecture independent instruction
> selection. The ARM ARM (C3.5.3) makes a specific recommendation about
> the choice of instruction in this situation and the current
> implementation in GCC follows that recommendation.  Wilco has also
> picked up on this issue he has the same patch internal to ARM along
> with an ongoing discussion with ARM architecture folk regarding this
> recommendation.  I'm reluctant to take this patch right now on the
> basis that it runs contrary to ARM ARM recommendation pending the
> conclusion of Wilco's discussion with ARM architecture folk.

Ping.
Evandro Menezes Dec. 3, 2015, 9:01 p.m. UTC | #8
On 11/09/2015 04:59 PM, Evandro Menezes wrote:
> Hi, Marcus.
>
> Have you an update from the architecture folks about this?
>
> Thank you,

Marcus?
Evandro Menezes Dec. 16, 2015, 9:30 p.m. UTC | #9
On 10/30/2015 05:24 AM, Marcus Shawcroft wrote:
> On 20 October 2015 at 00:40, Evandro Menezes <e.menezes@samsung.com> wrote:
>> In the existing targets, it seems that it's always faster to zero up a DF
>> register with "movi %d0, #0" instead of "fmov %d0, xzr".
>>
>> This patch modifies the respective pattern.
>
> Hi Evandro,
>
> This patch changes the generic, u architecture independent instruction
> selection. The ARM ARM (C3.5.3) makes a specific recommendation about
> the choice of instruction in this situation and the current
> implementation in GCC follows that recommendation.  Wilco has also
> picked up on this issue he has the same patch internal to ARM along
> with an ongoing discussion with ARM architecture folk regarding this
> recommendation.  I'm reluctant to take this patch right now on the
> basis that it runs contrary to ARM ARM recommendation pending the
> conclusion of Wilco's discussion with ARM architecture folk.
>

Marcus,

Have you had a chance to discuss this internally further?

Thank you,
Evandro Menezes Jan. 13, 2016, 12:06 a.m. UTC | #10
On 12/16/2015 03:30 PM, Evandro Menezes wrote:
> On 10/30/2015 05:24 AM, Marcus Shawcroft wrote:
>> On 20 October 2015 at 00:40, Evandro Menezes <e.menezes@samsung.com> 
>> wrote:
>>> In the existing targets, it seems that it's always faster to zero up 
>>> a DF
>>> register with "movi %d0, #0" instead of "fmov %d0, xzr".
>>>
>>> This patch modifies the respective pattern.
>>
>> Hi Evandro,
>>
>> This patch changes the generic, u architecture independent instruction
>> selection. The ARM ARM (C3.5.3) makes a specific recommendation about
>> the choice of instruction in this situation and the current
>> implementation in GCC follows that recommendation.  Wilco has also
>> picked up on this issue he has the same patch internal to ARM along
>> with an ongoing discussion with ARM architecture folk regarding this
>> recommendation.  I'm reluctant to take this patch right now on the
>> basis that it runs contrary to ARM ARM recommendation pending the
>> conclusion of Wilco's discussion with ARM architecture folk.
>
> Have you had a chance to discuss this internally further?
>

Ping.
diff mbox

Patch

From 429b1d70a7eca76c96250fec6ec5269a7a661a4c Mon Sep 17 00:00:00 2001
From: Evandro Menezes <e.menezes@samsung.com>
Date: Mon, 19 Oct 2015 18:31:48 -0500
Subject: [PATCH] [AArch64] Replace insn to zero up DF register

gcc/
	* config/aarch64/aarch64.md
	(*movdf_aarch64): Add "movi %d0, #0" to zero up DF register.
---
 gcc/config/aarch64/aarch64.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5b7f2fd..5f00686 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1156,21 +1156,22 @@ 
 )
 
 (define_insn "*movdf_aarch64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
-	(match_operand:DF 1 "general_operand"      "?rY, w,w,Ufc,m,w,m,rY,r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=w,?r,w,w,w  ,w,m,r,m ,r")
+	(match_operand:DF 1 "general_operand"      "?r, w,w,Y,Ufc,m,w,m,rY,r"))]
   "TARGET_FLOAT && (register_operand (operands[0], DFmode)
     || aarch64_reg_or_fp_zero (operands[1], DFmode))"
   "@
    fmov\\t%d0, %x1
    fmov\\t%x0, %d1
    fmov\\t%d0, %d1
+   movi\\t%d0, #0
    fmov\\t%d0, %1
    ldr\\t%d0, %1
    str\\t%d1, %0
    ldr\\t%x0, %1
    str\\t%x1, %0
    mov\\t%x0, %x1"
-  [(set_attr "type" "f_mcr,f_mrc,fmov,fconstd,\
+  [(set_attr "type" "f_mcr,f_mrc,fmov,neon_move,fconstd,\
                      f_loadd,f_stored,load1,store1,mov_reg")]
 )
 
-- 
2.1.0.243.g30d45f7