Message ID | 56257F53.2000905@samsung.com |
---|---|
State | New |
Headers | show |
On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes <e.menezes@samsung.com> wrote: > In the existing targets, it seems that it's always faster to zero up a DF > register with "movi %d0, #0" instead of "fmov %d0, xzr". I think for ThunderX 1, this change will not make a difference. So I am neutral on this change. Thanks, Andrew > > This patch modifies the respective pattern. > > Please, commit if it's alright. > > Thank you, > > -- > Evandro Menezes >
On Tue, Oct 20, 2015 at 7:51 AM, Andrew Pinski <pinskia@gmail.com> wrote: > On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes <e.menezes@samsung.com> wrote: >> In the existing targets, it seems that it's always faster to zero up a DF >> register with "movi %d0, #0" instead of "fmov %d0, xzr". > > I think for ThunderX 1, this change will not make a difference. So I > am neutral on this change. Actually depending on fmov is decoded in our pipeline, this change might actually be worse. Currently fmov with an immediate is 1 cycle while movi is two cycles. Let me double check how internally on how it is decoded and if it is 1 cycle or two. Thanks, Andrew > > Thanks, > Andrew > >> >> This patch modifies the respective pattern. >> >> Please, commit if it's alright. >> >> Thank you, >> >> -- >> Evandro Menezes >>
On Tue, Oct 20, 2015 at 7:59 AM, Andrew Pinski <pinskia@gmail.com> wrote: > On Tue, Oct 20, 2015 at 7:51 AM, Andrew Pinski <pinskia@gmail.com> wrote: >> On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes <e.menezes@samsung.com> wrote: >>> In the existing targets, it seems that it's always faster to zero up a DF >>> register with "movi %d0, #0" instead of "fmov %d0, xzr". >> >> I think for ThunderX 1, this change will not make a difference. So I >> am neutral on this change. > > Actually depending on fmov is decoded in our pipeline, this change > might actually be worse. Currently fmov with an immediate is 1 cycle > while movi is two cycles. Let me double check how internally on how > it is decoded and if it is 1 cycle or two. Ok, my objections are removed as I talked with the architectures here at Cavium and using movi is better in this case. Thanks, Andrew > > Thanks, > Andrew > >> >> Thanks, >> Andrew >> >>> >>> This patch modifies the respective pattern. >>> >>> Please, commit if it's alright. >>> >>> Thank you, >>> >>> -- >>> Evandro Menezes >>>
Ping.
On 20 October 2015 at 00:40, Evandro Menezes <e.menezes@samsung.com> wrote: > In the existing targets, it seems that it's always faster to zero up a DF > register with "movi %d0, #0" instead of "fmov %d0, xzr". > > This patch modifies the respective pattern. Hi Evandro, This patch changes the generic, u architecture independent instruction selection. The ARM ARM (C3.5.3) makes a specific recommendation about the choice of instruction in this situation and the current implementation in GCC follows that recommendation. Wilco has also picked up on this issue he has the same patch internal to ARM along with an ongoing discussion with ARM architecture folk regarding this recommendation. I'm reluctant to take this patch right now on the basis that it runs contrary to ARM ARM recommendation pending the conclusion of Wilco's discussion with ARM architecture folk. Cheers /Marcus
Hi, Marcus. Have you an update from the architecture folks about this? Thank you,
On 10/30/2015 05:24 AM, Marcus Shawcroft wrote: > On 20 October 2015 at 00:40, Evandro Menezes <e.menezes@samsung.com> wrote: >> In the existing targets, it seems that it's always faster to zero up a DF >> register with "movi %d0, #0" instead of "fmov %d0, xzr". >> >> This patch modifies the respective pattern. > > Hi Evandro, > > This patch changes the generic, u architecture independent instruction > selection. The ARM ARM (C3.5.3) makes a specific recommendation about > the choice of instruction in this situation and the current > implementation in GCC follows that recommendation. Wilco has also > picked up on this issue he has the same patch internal to ARM along > with an ongoing discussion with ARM architecture folk regarding this > recommendation. I'm reluctant to take this patch right now on the > basis that it runs contrary to ARM ARM recommendation pending the > conclusion of Wilco's discussion with ARM architecture folk. Ping.
On 11/09/2015 04:59 PM, Evandro Menezes wrote: > Hi, Marcus. > > Have you an update from the architecture folks about this? > > Thank you, Marcus?
On 10/30/2015 05:24 AM, Marcus Shawcroft wrote: > On 20 October 2015 at 00:40, Evandro Menezes <e.menezes@samsung.com> wrote: >> In the existing targets, it seems that it's always faster to zero up a DF >> register with "movi %d0, #0" instead of "fmov %d0, xzr". >> >> This patch modifies the respective pattern. > > Hi Evandro, > > This patch changes the generic, u architecture independent instruction > selection. The ARM ARM (C3.5.3) makes a specific recommendation about > the choice of instruction in this situation and the current > implementation in GCC follows that recommendation. Wilco has also > picked up on this issue he has the same patch internal to ARM along > with an ongoing discussion with ARM architecture folk regarding this > recommendation. I'm reluctant to take this patch right now on the > basis that it runs contrary to ARM ARM recommendation pending the > conclusion of Wilco's discussion with ARM architecture folk. > Marcus, Have you had a chance to discuss this internally further? Thank you,
On 12/16/2015 03:30 PM, Evandro Menezes wrote: > On 10/30/2015 05:24 AM, Marcus Shawcroft wrote: >> On 20 October 2015 at 00:40, Evandro Menezes <e.menezes@samsung.com> >> wrote: >>> In the existing targets, it seems that it's always faster to zero up >>> a DF >>> register with "movi %d0, #0" instead of "fmov %d0, xzr". >>> >>> This patch modifies the respective pattern. >> >> Hi Evandro, >> >> This patch changes the generic, u architecture independent instruction >> selection. The ARM ARM (C3.5.3) makes a specific recommendation about >> the choice of instruction in this situation and the current >> implementation in GCC follows that recommendation. Wilco has also >> picked up on this issue he has the same patch internal to ARM along >> with an ongoing discussion with ARM architecture folk regarding this >> recommendation. I'm reluctant to take this patch right now on the >> basis that it runs contrary to ARM ARM recommendation pending the >> conclusion of Wilco's discussion with ARM architecture folk. > > Have you had a chance to discuss this internally further? > Ping.
From 429b1d70a7eca76c96250fec6ec5269a7a661a4c Mon Sep 17 00:00:00 2001 From: Evandro Menezes <e.menezes@samsung.com> Date: Mon, 19 Oct 2015 18:31:48 -0500 Subject: [PATCH] [AArch64] Replace insn to zero up DF register gcc/ * config/aarch64/aarch64.md (*movdf_aarch64): Add "movi %d0, #0" to zero up DF register. --- gcc/config/aarch64/aarch64.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 5b7f2fd..5f00686 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1156,21 +1156,22 @@ ) (define_insn "*movdf_aarch64" - [(set (match_operand:DF 0 "nonimmediate_operand" "=w, ?r,w,w ,w,m,r,m ,r") - (match_operand:DF 1 "general_operand" "?rY, w,w,Ufc,m,w,m,rY,r"))] + [(set (match_operand:DF 0 "nonimmediate_operand" "=w,?r,w,w,w ,w,m,r,m ,r") + (match_operand:DF 1 "general_operand" "?r, w,w,Y,Ufc,m,w,m,rY,r"))] "TARGET_FLOAT && (register_operand (operands[0], DFmode) || aarch64_reg_or_fp_zero (operands[1], DFmode))" "@ fmov\\t%d0, %x1 fmov\\t%x0, %d1 fmov\\t%d0, %d1 + movi\\t%d0, #0 fmov\\t%d0, %1 ldr\\t%d0, %1 str\\t%d1, %0 ldr\\t%x0, %1 str\\t%x1, %0 mov\\t%x0, %x1" - [(set_attr "type" "f_mcr,f_mrc,fmov,fconstd,\ + [(set_attr "type" "f_mcr,f_mrc,fmov,neon_move,fconstd,\ f_loadd,f_stored,load1,store1,mov_reg")] ) -- 2.1.0.243.g30d45f7