diff mbox

[AArch64] Emit square root using the Newton series

Message ID 56CBACE4.9070309@samsung.com
State New
Headers show

Commit Message

Evandro Menezes Feb. 23, 2016, 12:50 a.m. UTC
On 12/10/15 04:30, Kyrill Tkachov wrote:
>
> On 09/12/15 18:50, Evandro Menezes wrote:
>> On 12/09/2015 11:16 AM, Kyrill Tkachov wrote:
>>>
>>> On 09/12/15 17:02, Kyrill Tkachov wrote:
>>>>
>>>> On 09/12/15 16:59, Evandro Menezes wrote:
>>>>> On 12/09/2015 10:52 AM, Kyrill Tkachov wrote:
>>>>>> Hi Evandro,
>>>>>>
>>>>>> On 08/12/15 21:35, Evandro Menezes wrote:
>>>>>>> Emit square root using the Newton series
>>>>>>>
>>>>>>>    2015-12-03  Evandro Menezes <e.menezes@samsung.com>
>>>>>>>
>>>>>>>    gcc/
>>>>>>>             * config/aarch64/aarch64-protos.h 
>>>>>>> (aarch64_emit_swsqrt):
>>>>>>>    Declare new
>>>>>>>             function.
>>>>>>>             * config/aarch64/aarch64-simd.md (sqrt<mode>2): New
>>>>>>>    expansion and
>>>>>>>             insn definitions.
>>>>>>>             * config/aarch64/aarch64-tuning-flags.def
>>>>>>>             (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
>>>>>>>             * config/aarch64/aarch64.c (aarch64_emit_swsqrt): 
>>>>>>> Define
>>>>>>>    new function.
>>>>>>>             * config/aarch64/aarch64.md (sqrt<mode>2): New 
>>>>>>> expansion
>>>>>>>    and insn
>>>>>>>             definitions.
>>>>>>>             * config/aarch64/aarch64.opt 
>>>>>>> (mlow-precision-recip-sqrt):
>>>>>>>    Expand option
>>>>>>>             description.
>>>>>>>             * doc/invoke.texi (mlow-precision-recip-sqrt): 
>>>>>>> Likewise.
>>>>>>>
>>>>>>> This patch extends the patch that added support for implementing 
>>>>>>> x^-1/2 using the Newton series by adding support for x^1/2 as well.
>>>>>>>
>>>>>>> Is it OK at this point of stage 3?
>>>>>>>
>>>>>>
>>>>>> A comment on the patch itself from me...
>>>>>>
>>>>>> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
>>>>>> b/gcc/config/aarch64/aarch64-tuning-flags.def
>>>>>> index 6f7dbce..11c6c9a 100644
>>>>>> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
>>>>>> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
>>>>>> @@ -30,4 +30,4 @@
>>>>>>
>>>>>>  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
>>>>>>  AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
>>>>>> -
>>>>>> +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT)
>>>>>>
>>>>>> That seems like a misleading name to me.
>>>>>> If we're doing this, that means that the sqrt instruction is not 
>>>>>> faster
>>>>>> than doing the inverse sqrt estimation followed by a multiply.
>>>>>> I think a name like "synth_sqrt" or "estimate_sqrt" or something 
>>>>>> along those lines
>>>>>> is more appropriate.
>>>>>
>>>>> Unfortunately, this is the case on Exynos M1: the series is faster 
>>>>> than the instruction. :-(  So, other targets when this is also 
>>>>> true, using the "fast_sqrt" option might make sense.
>>>>>
>>>>
>>>> Sure, but the way your patch is written, we will emit the series 
>>>> when "fast_sqrt" is set, rather
>>>> than the other way around, unless I'm misreading the logic in:
>>>>
>>>
>>> Sorry, what I meant to say is it would be clearer, IMO, to describe 
>>> the compiler action that is being taken
>>> (e.g. the rename_fma_regs tuning flag), in this case it's estimating 
>>> sqrt using a series.
>>>
>>> Kyrill
>>>
>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>>>> b/gcc/config/aarch64/aarch64-simd.md
>>>> index 030a101..f6d2da4 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>> @@ -4280,7 +4280,23 @@
>>>>
>>>>  ;; sqrt
>>>>
>>>> -(define_insn "sqrt<mode>2"
>>>> +(define_expand "sqrt<mode>2"
>>>> +  [(set (match_operand:VDQF 0 "register_operand")
>>>> +    (sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
>>>> +  "TARGET_SIMD"
>>>> +{
>>>> +  if ((AARCH64_EXTRA_TUNE_FAST_SQRT & 
>>>> aarch64_tune_params.extra_tuning_flags)
>>>> +      && !optimize_function_for_size_p (cfun)
>>>> +      && flag_finite_math_only
>>>> +      && !flag_trapping_math
>>>> +      && flag_unsafe_math_optimizations)
>>>> +    {
>>>> +      aarch64_emit_swsqrt (operands[0], operands[1]);
>>>> +      DONE;
>>>> +    }
>>>> +})
>>>>
>>
>> Kyrill,
>>
>> How about "approx_sqrt" for, you guessed it, approximate square 
>> root?  The same adjective would perhaps describe "recip_sqrt" better 
>> too.
>>
>
> Sounds good to me.
> Sorry for the bikeshedding.

         Rename the reciprocal square root tuning flag

         Rename the tuning option to enable the Newton series for the
    reciprocal square
         root to reflect its approximative characteristic.

         2016-02-22  Evandro Menezes  <e.menezes@samsung.com>

         gcc/
             * config/aarch64/aarch64-tuning-flags.def: Rename tuning
    flag to
             AARCH64_EXTRA_TUNE_APPROX_RSQRT.
             * config/aarch64/aarch64.c (xgene1_tunings): Use new name.
             (use_rsqrt_p): Likewise.
             * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
    Reword the
             text explaining this option.
             * doc/invoke.texi (-mlow-precision-recip-sqrt): Likewise.


In preparation for the patch adding the Newton series also for square 
root, I'd like to propose this patch changing the name of the existing 
tuning flag for the reciprocal square root.

Thank you,

Comments

James Greenhalgh Feb. 26, 2016, 2:59 p.m. UTC | #1
On Mon, Feb 22, 2016 at 06:50:44PM -0600, Evandro Menezes wrote:
> In preparation for the patch adding the Newton series also for
> square root, I'd like to propose this patch changing the name of the
> existing tuning flag for the reciprocal square root.

This is fine, other names like sw_rsqrt, expand_rsqrt, nr_rsqrt would also
be OK. Pick your favourite!

One comment on the replacement invoke.texi text below, otherwise this is
OK to apply now.

> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 5cbd4cd..155d2bd 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -151,5 +151,5 @@ PC relative literal loads.
>  
>  mlow-precision-recip-sqrt
>  Common Var(flag_mrecip_low_precision_sqrt) Optimization
> -When calculating a sqrt approximation, run fewer steps.
> +Calculate the reciprocal square-root approximation in fewer steps.
>  This reduces precision, but can result in faster computation.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 490df93..eeff24d 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12879,12 +12879,10 @@ corresponding flag to the linker.
>  @item -mno-low-precision-recip-sqrt
>  @opindex -mlow-precision-recip-sqrt
>  @opindex -mno-low-precision-recip-sqrt
> -The square root estimate uses two steps instead of three for double-precision,
> -and one step instead of two for single-precision.
> -Thus reducing latency and precision.
> -This is only relevant if @option{-ffast-math} activates
> -reciprocal square root estimate instructions.
> -Which in turn depends on the target processor.
> +The reciprocal square root approximation uses one step less than otherwise,
> +thus reducing latency and precision.

When calculating the reciprocal square root approximation, use one less
step than otherwise, thus reducing latency and precision.

Thanks,
James
Evandro Menezes Feb. 26, 2016, 11:42 p.m. UTC | #2
On 02/26/16 08:59, James Greenhalgh wrote:
> On Mon, Feb 22, 2016 at 06:50:44PM -0600, Evandro Menezes wrote:
>> In preparation for the patch adding the Newton series also for
>> square root, I'd like to propose this patch changing the name of the
>> existing tuning flag for the reciprocal square root.
> This is fine, other names like sw_rsqrt, expand_rsqrt, nr_rsqrt would also
> be OK. Pick your favourite!
>
> One comment on the replacement invoke.texi text below, otherwise this is
> OK to apply now.
>
>> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
>> index 5cbd4cd..155d2bd 100644
>> --- a/gcc/config/aarch64/aarch64.opt
>> +++ b/gcc/config/aarch64/aarch64.opt
>> @@ -151,5 +151,5 @@ PC relative literal loads.
>>   
>>   mlow-precision-recip-sqrt
>>   Common Var(flag_mrecip_low_precision_sqrt) Optimization
>> -When calculating a sqrt approximation, run fewer steps.
>> +Calculate the reciprocal square-root approximation in fewer steps.
>>   This reduces precision, but can result in faster computation.
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 490df93..eeff24d 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -12879,12 +12879,10 @@ corresponding flag to the linker.
>>   @item -mno-low-precision-recip-sqrt
>>   @opindex -mlow-precision-recip-sqrt
>>   @opindex -mno-low-precision-recip-sqrt
>> -The square root estimate uses two steps instead of three for double-precision,
>> -and one step instead of two for single-precision.
>> -Thus reducing latency and precision.
>> -This is only relevant if @option{-ffast-math} activates
>> -reciprocal square root estimate instructions.
>> -Which in turn depends on the target processor.
>> +The reciprocal square root approximation uses one step less than otherwise,
>> +thus reducing latency and precision.
> When calculating the reciprocal square root approximation, use one less
> step than otherwise, thus reducing latency and precision.
>

Checked in as r233772.

Thank you,
Evandro Menezes Feb. 26, 2016, 11:46 p.m. UTC | #3
On 02/26/16 17:42, Evandro Menezes wrote:
> On 02/26/16 08:59, James Greenhalgh wrote:
>> On Mon, Feb 22, 2016 at 06:50:44PM -0600, Evandro Menezes wrote:
>>> In preparation for the patch adding the Newton series also for
>>> square root, I'd like to propose this patch changing the name of the
>>> existing tuning flag for the reciprocal square root.
>> This is fine, other names like sw_rsqrt, expand_rsqrt, nr_rsqrt would 
>> also
>> be OK. Pick your favourite!
>>
>> One comment on the replacement invoke.texi text below, otherwise this is
>> OK to apply now.
>>
>>> diff --git a/gcc/config/aarch64/aarch64.opt 
>>> b/gcc/config/aarch64/aarch64.opt
>>> index 5cbd4cd..155d2bd 100644
>>> --- a/gcc/config/aarch64/aarch64.opt
>>> +++ b/gcc/config/aarch64/aarch64.opt
>>> @@ -151,5 +151,5 @@ PC relative literal loads.
>>>     mlow-precision-recip-sqrt
>>>   Common Var(flag_mrecip_low_precision_sqrt) Optimization
>>> -When calculating a sqrt approximation, run fewer steps.
>>> +Calculate the reciprocal square-root approximation in fewer steps.
>>>   This reduces precision, but can result in faster computation.
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 490df93..eeff24d 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -12879,12 +12879,10 @@ corresponding flag to the linker.
>>>   @item -mno-low-precision-recip-sqrt
>>>   @opindex -mlow-precision-recip-sqrt
>>>   @opindex -mno-low-precision-recip-sqrt
>>> -The square root estimate uses two steps instead of three for 
>>> double-precision,
>>> -and one step instead of two for single-precision.
>>> -Thus reducing latency and precision.
>>> -This is only relevant if @option{-ffast-math} activates
>>> -reciprocal square root estimate instructions.
>>> -Which in turn depends on the target processor.
>>> +The reciprocal square root approximation uses one step less than 
>>> otherwise,
>>> +thus reducing latency and precision.
>> When calculating the reciprocal square root approximation, use one less
>> step than otherwise, thus reducing latency and precision.
>>
>
> Checked in as r233772.

But not without some log hiccups, sorry...
diff mbox

Patch

From 7043444f83c12de0ab50627a8b386e3070050591 Mon Sep 17 00:00:00 2001
From: Evandro Menezes <e.menezes@samsung.com>
Date: Mon, 22 Feb 2016 17:49:09 -0600
Subject: [PATCH] Rename the reciprocal square root tuning flag

Rename the tuning option to enable the Newton series for the reciprocal square
root to reflect its approximative characteristic.

2016-02-22  Evandro Menezes  <e.menezes@samsung.com>

gcc/
	* config/aarch64/aarch64-tuning-flags.def: Rename tuning flag to
	AARCH64_EXTRA_TUNE_APPROX_RSQRT.
	* config/aarch64/aarch64.c (xgene1_tunings): Use new name.
	(use_rsqrt_p): Likewise.
	* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Reword the
	text explaining this option.
	* doc/invoke.texi (-mlow-precision-recip-sqrt): Likewise.
---
 gcc/config/aarch64/aarch64-tuning-flags.def |  2 +-
 gcc/config/aarch64/aarch64.c                |  4 ++--
 gcc/config/aarch64/aarch64.opt              |  2 +-
 gcc/doc/invoke.texi                         | 10 ++++------
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 8036cfe..7e45a0c 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,5 @@ 
      AARCH64_TUNE_ to give an enum name. */
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
+AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 923a4b3..ebf47da 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -586,7 +586,7 @@  static const struct tune_params xgene1_tunings =
   0,	/* max_case_values.  */
   0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_RECIP_SQRT)	/* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_APPROX_RSQRT)	/* tune_flags.  */
 };
 
 /* Support for fine-grained override of the tuning structures.  */
@@ -7469,7 +7469,7 @@  use_rsqrt_p (void)
   return (!flag_trapping_math
 	  && flag_unsafe_math_optimizations
 	  && ((aarch64_tune_params.extra_tuning_flags
-	       & AARCH64_EXTRA_TUNE_RECIP_SQRT)
+	       & AARCH64_EXTRA_TUNE_APPROX_RSQRT)
 	      || flag_mrecip_low_precision_sqrt));
 }
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 5cbd4cd..155d2bd 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -151,5 +151,5 @@  PC relative literal loads.
 
 mlow-precision-recip-sqrt
 Common Var(flag_mrecip_low_precision_sqrt) Optimization
-When calculating a sqrt approximation, run fewer steps.
+Calculate the reciprocal square-root approximation in fewer steps.
 This reduces precision, but can result in faster computation.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 490df93..eeff24d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12879,12 +12879,10 @@  corresponding flag to the linker.
 @item -mno-low-precision-recip-sqrt
 @opindex -mlow-precision-recip-sqrt
 @opindex -mno-low-precision-recip-sqrt
-The square root estimate uses two steps instead of three for double-precision,
-and one step instead of two for single-precision.
-Thus reducing latency and precision.
-This is only relevant if @option{-ffast-math} activates
-reciprocal square root estimate instructions.
-Which in turn depends on the target processor.
+The reciprocal square root approximation uses one step less than otherwise,
+thus reducing latency and precision.
+This is only relevant if @option{-ffast-math} enables the reciprocal square root
+approximation, which in turn depends on the target processor.
 
 @item -march=@var{name}
 @opindex march
-- 
2.6.3