diff mbox series

Speedup tanf range reduction

Message ID HE1PR08MB10357845B62EA7A4DFCD927783300@HE1PR08MB1035.eurprd08.prod.outlook.com
State New
Headers show
Series Speedup tanf range reduction | expand

Commit Message

Wilco Dijkstra Aug. 22, 2018, 12:29 p.m. UTC
Speedup tanf range reduction by using the new sincosf range
reduction algorithm.  Overall code quality is improved due to
inlining, so there is a speedup even if no range reduction is
required.

Passes GLIBC testsuite on AArch64.  Some files are no longer
required - they are removed in the next patch.

tanf througput gains on Cortex-A72:
* |x| < M_PI_4 : 1.1x
* |x| < M_PI_2  : 1.2x
* |x| < 2 * M_PI: 1.5x
* |x| < 120.0   : 1.6x
* |x| < Inf     : 12.1x

ChangeLog:
2018-08-22  Wilco Dijkstra  <wdijkstr@arm.com>

	* sysdeps/ieee754/flt-32/s_tanf.c (__tanf): Use fast range reduction.

--

Comments

Joseph Myers Aug. 22, 2018, 12:34 p.m. UTC | #1
On Wed, 22 Aug 2018, Wilco Dijkstra wrote:

> +
> +static inline int32_t
> +rem_pio2f (float x, float *y)

Please put a comment on this function documenting its semantics.
Adhemerval Zanella Netto Aug. 22, 2018, 1:05 p.m. UTC | #2
On 22/08/2018 09:29, Wilco Dijkstra wrote:
> Speedup tanf range reduction by using the new sincosf range
> reduction algorithm.  Overall code quality is improved due to
> inlining, so there is a speedup even if no range reduction is
> required.
> 
> Passes GLIBC testsuite on AArch64.  Some files are no longer
> required - they are removed in the next patch.
> 
> tanf througput gains on Cortex-A72:
> * |x| < M_PI_4 : 1.1x
> * |x| < M_PI_2  : 1.2x
> * |x| < 2 * M_PI: 1.5x
> * |x| < 120.0   : 1.6x
> * |x| < Inf     : 12.1x
> 
> ChangeLog:
> 2018-08-22  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* sysdeps/ieee754/flt-32/s_tanf.c (__tanf): Use fast range reduction.

tanf seems to be the only place where __ieee754_rem_pio2f is used, so we
can remove sysdeps/ieee754/flt-32/e_rem_pio2f.c and
sysdeps/powerpc/fpu/e_rem_pio2f.c.  Also, I would like to certify that
it is faster than powerpc version (I would expect so, but at least it
would be good to actually get some numbers).

> 
> --
> diff --git a/sysdeps/ieee754/flt-32/s_tanf.c b/sysdeps/ieee754/flt-32/s_tanf.c
> index ba3af54913669e4abdfd864307856ec44138f9b9..a397665c4bab7785049935ef526472afedf82e34 100644
> --- a/sysdeps/ieee754/flt-32/s_tanf.c
> +++ b/sysdeps/ieee754/flt-32/s_tanf.c
> @@ -21,6 +21,30 @@ static char rcsid[] = "$NetBSD: s_tanf.c,v 1.4 1995/05/10 20:48:20 jtc Exp $";
>  #include <math.h>
>  #include <math_private.h>
>  #include <libm-alias-float.h>
> +#include "s_sincosf.h"
> +
> +static inline int32_t
> +rem_pio2f (float x, float *y)
> +{
> +  double dx = x;
> +  int n;
> +  const sincos_t *p = &__sincosf_table[0];
> +
> +  if (__glibc_likely (abstop12 (x) < abstop12 (120.0f)))
> +    dx = reduce_fast (dx, p, &n);
> +  else
> +    {
> +      uint32_t xi = asuint (x);
> +      int sign = xi >> 31;
> +
> +      dx = reduce_large (xi, &n);
> +      dx = sign ? -dx : dx;
> +    }
> +
> +  y[0] = dx;
> +  y[1] = dx - y[0];
> +  return n;
> +}
>  
>  float __tanf(float x)
>  {
> @@ -42,7 +66,7 @@ float __tanf(float x)
>  
>      /* argument reduction needed */
>  	else {
> -	    n = __ieee754_rem_pio2f(x,y);
> +	    n = rem_pio2f(x,y);
>  	    return __kernel_tanf(y[0],y[1],1-((n&1)<<1)); /*   1 -- n even
>  							      -1 -- n odd */
>  	}
>
Wilco Dijkstra Aug. 22, 2018, 2:27 p.m. UTC | #3
Joseph Myers wrote:

> +
> +static inline int32_t
> +rem_pio2f (float x, float *y)

> Please put a comment on this function documenting its semantics.

Done, see below.


Speedup tanf range reduction by using the new sincosf range
reduction algorithm.  Overall code quality is improved due to
inlining, so there is a speedup even if no range reduction is
required.

Passes GLIBC testsuite on AArch64.  Some files are no longer
required which are removed in the next patch.

tanf througput gains on Cortex-A72:
* |x| < M_PI_4  : 1.1x
* |x| < M_PI_2  : 1.2x
* |x| < 2 * M_PI: 1.5x
* |x| < 120.0   : 1.6x
* |x| < Inf     : 12.1x

ChangeLog:
2018-08-22  Wilco Dijkstra  <wdijkstr@arm.com>

	* sysdeps/ieee754/flt-32/s_tanf.c (__tanf): Use fast range reduction.
--

diff --git a/sysdeps/ieee754/flt-32/s_tanf.c b/sysdeps/ieee754/flt-32/s_tanf.c
index ba3af54913669e4abdfd864307856ec44138f9b9..fd104103ad026a8c87ea7b571f13e868561a2998 100644
--- a/sysdeps/ieee754/flt-32/s_tanf.c
+++ b/sysdeps/ieee754/flt-32/s_tanf.c
@@ -21,6 +21,33 @@ static char rcsid[] = "$NetBSD: s_tanf.c,v 1.4 1995/05/10 20:48:20 jtc Exp $";
 #include <math.h>
 #include <math_private.h>
 #include <libm-alias-float.h>
+#include "s_sincosf.h"
+
+/* Reduce range of X to a multiple of PI/2.  The modulo result is between
+   -PI/4 and PI/4 and returned as a high part y[0] and a low part y[1].
+   The low bit in the return value indicates the first or 2nd half of tanf.  */
+static inline int32_t
+rem_pio2f (float x, float *y)
+{
+  double dx = x;
+  int n;
+  const sincos_t *p = &__sincosf_table[0];
+
+  if (__glibc_likely (abstop12 (x) < abstop12 (120.0f)))
+    dx = reduce_fast (dx, p, &n);
+  else
+    {
+      uint32_t xi = asuint (x);
+      int sign = xi >> 31;
+
+      dx = reduce_large (xi, &n);
+      dx = sign ? -dx : dx;
+    }
+
+  y[0] = dx;
+  y[1] = dx - y[0];
+  return n;
+}
 
 float __tanf(float x)
 {
@@ -42,7 +69,7 @@ float __tanf(float x)
 
     /* argument reduction needed */
 	else {
-	    n = __ieee754_rem_pio2f(x,y);
+	    n = rem_pio2f(x,y);
 	    return __kernel_tanf(y[0],y[1],1-((n&1)<<1)); /*   1 -- n even
 							      -1 -- n odd */
 	}
Joseph Myers Aug. 22, 2018, 2:53 p.m. UTC | #4
On Wed, 22 Aug 2018, Wilco Dijkstra wrote:

> Joseph Myers wrote:
> 
> > +
> > +static inline int32_t
> > +rem_pio2f (float x, float *y)
> 
> > Please put a comment on this function documenting its semantics.
> 
> Done, see below.

This version is OK.
Tulio Magno Quites Machado Filho Aug. 22, 2018, 11:53 p.m. UTC | #5
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:

> Joseph Myers wrote:
>
>> +
>> +static inline int32_t
>> +rem_pio2f (float x, float *y)
>
>> Please put a comment on this function documenting its semantics.
>
> Done, see below.
>
>
> Speedup tanf range reduction by using the new sincosf range
> reduction algorithm.  Overall code quality is improved due to
> inlining, so there is a speedup even if no range reduction is
> required.
>
> Passes GLIBC testsuite on AArch64.  Some files are no longer
> required which are removed in the next patch.
>
> tanf througput gains on Cortex-A72:
> * |x| < M_PI_4  : 1.1x
> * |x| < M_PI_2  : 1.2x
> * |x| < 2 * M_PI: 1.5x
> * |x| < 120.0   : 1.6x
> * |x| < Inf     : 12.1x

LGTM too.

If we were to have a benchtest for tanf with drand48 inputs, should we group
the entries according to __kernel_tanf() ? e.g.

 * |x|>=0.6744 - fast path for __kernel_tanf
 * |x|<=0.6744
Wilco Dijkstra Aug. 23, 2018, 11:46 a.m. UTC | #6
Tulio Magno Quites Machado Filho wrote:

> LGTM too.

Thanks, I've committed it.

> If we were to have a benchtest for tanf with drand48 inputs, should we group
> the entries according to __kernel_tanf() ? e.g.
>
>  * |x|>=0.6744 - fast path for __kernel_tanf
>  * |x|<=0.6744

Having a real trace would be best, and randomized inputs for commonly used
input ranges are useful. I'm not sure whether a better algorithm would have the
same slow/fast paths as the current code (sinf/cosf are now 4x faster...), but
if you're planning to post a benchtest then testing those ranges would be fine.

Cheers,
Wilco
diff mbox series

Patch

diff --git a/sysdeps/ieee754/flt-32/s_tanf.c b/sysdeps/ieee754/flt-32/s_tanf.c
index ba3af54913669e4abdfd864307856ec44138f9b9..a397665c4bab7785049935ef526472afedf82e34 100644
--- a/sysdeps/ieee754/flt-32/s_tanf.c
+++ b/sysdeps/ieee754/flt-32/s_tanf.c
@@ -21,6 +21,30 @@  static char rcsid[] = "$NetBSD: s_tanf.c,v 1.4 1995/05/10 20:48:20 jtc Exp $";
 #include <math.h>
 #include <math_private.h>
 #include <libm-alias-float.h>
+#include "s_sincosf.h"
+
+static inline int32_t
+rem_pio2f (float x, float *y)
+{
+  double dx = x;
+  int n;
+  const sincos_t *p = &__sincosf_table[0];
+
+  if (__glibc_likely (abstop12 (x) < abstop12 (120.0f)))
+    dx = reduce_fast (dx, p, &n);
+  else
+    {
+      uint32_t xi = asuint (x);
+      int sign = xi >> 31;
+
+      dx = reduce_large (xi, &n);
+      dx = sign ? -dx : dx;
+    }
+
+  y[0] = dx;
+  y[1] = dx - y[0];
+  return n;
+}
 
 float __tanf(float x)
 {
@@ -42,7 +66,7 @@  float __tanf(float x)
 
     /* argument reduction needed */
 	else {
-	    n = __ieee754_rem_pio2f(x,y);
+	    n = rem_pio2f(x,y);
 	    return __kernel_tanf(y[0],y[1],1-((n&1)<<1)); /*   1 -- n even
 							      -1 -- n odd */
 	}