diff mbox

[AArch64] Emit square root using the Newton series

Message ID 56685C62.9070705@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov Dec. 9, 2015, 4:52 p.m. UTC
Hi Evandro,

On 08/12/15 21:35, Evandro Menezes wrote:
> Emit square root using the Newton series
>
>    2015-12-03  Evandro Menezes  <e.menezes@samsung.com>
>
>    gcc/
>             * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
>    Declare new
>             function.
>             * config/aarch64/aarch64-simd.md (sqrt<mode>2): New
>    expansion and
>             insn definitions.
>             * config/aarch64/aarch64-tuning-flags.def
>             (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
>             * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
>    new function.
>             * config/aarch64/aarch64.md (sqrt<mode>2): New expansion
>    and insn
>             definitions.
>             * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
>    Expand option
>             description.
>             * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
>
> This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well.
>
> Is it OK at this point of stage 3?
>

A comment on the patch itself from me...


That seems like a misleading name to me.
If we're doing this, that means that the sqrt instruction is not faster
than doing the inverse sqrt estimation followed by a multiply.
I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines
is more appropriate.

Thanks,
Kyrill

Comments

Evandro Menezes Dec. 9, 2015, 4:59 p.m. UTC | #1
On 12/09/2015 10:52 AM, Kyrill Tkachov wrote:
> Hi Evandro,
>
> On 08/12/15 21:35, Evandro Menezes wrote:
>> Emit square root using the Newton series
>>
>>    2015-12-03  Evandro Menezes  <e.menezes@samsung.com>
>>
>>    gcc/
>>             * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
>>    Declare new
>>             function.
>>             * config/aarch64/aarch64-simd.md (sqrt<mode>2): New
>>    expansion and
>>             insn definitions.
>>             * config/aarch64/aarch64-tuning-flags.def
>>             (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
>>             * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
>>    new function.
>>             * config/aarch64/aarch64.md (sqrt<mode>2): New expansion
>>    and insn
>>             definitions.
>>             * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
>>    Expand option
>>             description.
>>             * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
>>
>> This patch extends the patch that added support for implementing 
>> x^-1/2 using the Newton series by adding support for x^1/2 as well.
>>
>> Is it OK at this point of stage 3?
>>
>
> A comment on the patch itself from me...
>
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
> b/gcc/config/aarch64/aarch64-tuning-flags.def
> index 6f7dbce..11c6c9a 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -30,4 +30,4 @@
>
>  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
>  AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
> -
> +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT)
>
> That seems like a misleading name to me.
> If we're doing this, that means that the sqrt instruction is not faster
> than doing the inverse sqrt estimation followed by a multiply.
> I think a name like "synth_sqrt" or "estimate_sqrt" or something along 
> those lines
> is more appropriate.

Unfortunately, this is the case on Exynos M1: the series is faster than 
the instruction. :-(  So, other targets when this is also true, using 
the "fast_sqrt" option might make sense.

Thank you,
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 6f7dbce..11c6c9a 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,4 @@ 
  
  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
  AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT)