Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)

Message ID 20180209233738.GQ5867@tucnak
State New
Headers show
Series
  • Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)
Related show

Commit Message

Jakub Jelinek Feb. 9, 2018, 11:37 p.m.
Hi!

Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
breaks some package (Marek should know which), as it has 7ulp error.
Generally one should be prepared for some errors with -ffast-math.

Though, in this case, if the target has c99 runtime and C is
a positive 0x1pNN it seems much better to use exp2 over exp, for
C being 2 pow (2, x) is optimized into exp2 (x) and even for other
values log2(C) will still be some positive or negative integer, so
in many cases there won't be any rounding errors in the multiplication.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Perhaps we should do something similar if C is a power of 10 (use exp10
and log10).

2018-02-10  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/84309
	* match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
	exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.

	* gcc.dg/pr84309.c: New test.
 

	Jakub

Comments

Richard Biener Feb. 10, 2018, 7 a.m. | #1
On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
>breaks some package (Marek should know which), as it has 7ulp error.
>Generally one should be prepared for some errors with -ffast-math.
>
>Though, in this case, if the target has c99 runtime and C is
>a positive 0x1pNN it seems much better to use exp2 over exp, for
>C being 2 pow (2, x) is optimized into exp2 (x) and even for other
>values log2(C) will still be some positive or negative integer, so
>in many cases there won't be any rounding errors in the multiplication.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. I wonder whether there are vectorized variants in libmvec? 

Richard. 

>Perhaps we should do something similar if C is a power of 10 (use exp10
>and log10).
>
>2018-02-10  Jakub Jelinek  <jakub@redhat.com>
>
>	PR middle-end/84309
>	* match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
>	exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.
>
>	* gcc.dg/pr84309.c: New test.
> 
>--- gcc/match.pd.jj	2018-01-26 12:43:23.208922420 +0100
>+++ gcc/match.pd	2018-02-09 18:48:26.412021408 +0100
>@@ -3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (logs (pows @0 @1))
>    (mult @1 (logs @0))))
> 
>- /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
>+ /* pow(C,x) -> exp(log(C)*x) if C > 0,
>+    or if C is a positive power of 2,
>+    pow(C,x) -> exp2(log2(C)*x).  */
>  (for pows (POW)
>       exps (EXP)
>       logs (LOG)
>+      exp2s (EXP2)
>+      log2s (LOG2)
>   (simplify
>    (pows REAL_CST@0 @1)
>-    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>-	 && real_isfinite (TREE_REAL_CST_PTR (@0)))
>-     (exps (mult (logs @0) @1)))))
>+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>+	&& real_isfinite (TREE_REAL_CST_PTR (@0)))
>+    (with {
>+       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
>+       bool use_exp2 = false;
>+       if (targetm.libc_has_function (function_c99_misc)
>+	   && value->cl == rvc_normal)
>+	 {
>+	   REAL_VALUE_TYPE frac_rvt = *value;
>+	   SET_REAL_EXP (&frac_rvt, 1);
>+	   if (real_equal (&frac_rvt, &dconst1))
>+	     use_exp2 = true;
>+	 }
>+     }
>+     (if (use_exp2)
>+       (exp2s (mult (log2s @0) @1))
>+       (exps (mult (logs @0) @1)))))))
> 
>  (for sqrts (SQRT)
>       cbrts (CBRT)
>--- gcc/testsuite/gcc.dg/pr84309.c.jj	2018-02-09 18:54:52.254787678
>+0100
>+++ gcc/testsuite/gcc.dg/pr84309.c	2018-02-09 18:59:02.343636178 +0100
>@@ -0,0 +1,14 @@
>+/* PR middle-end/84309 */
>+/* { dg-do run { target c99_runtime } } */
>+/* { dg-options "-O2 -ffast-math" } */
>+
>+int
>+main ()
>+{
>+  unsigned long a = 1024;
>+  unsigned long b = 16 * 1024;
>+  unsigned long c = __builtin_pow (2, (__builtin_log2 (a) +
>__builtin_log2 (b)) / 2);
>+  if (c != 4096)
>+    __builtin_abort ();
>+  return 0;
>+}
>
>	Jakub
Jakub Jelinek Feb. 10, 2018, 9:44 a.m. | #2
On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote:
> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
> >Hi!
> >
> >Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
> >breaks some package (Marek should know which), as it has 7ulp error.
> >Generally one should be prepared for some errors with -ffast-math.
> >
> >Though, in this case, if the target has c99 runtime and C is
> >a positive 0x1pNN it seems much better to use exp2 over exp, for
> >C being 2 pow (2, x) is optimized into exp2 (x) and even for other
> >values log2(C) will still be some positive or negative integer, so
> >in many cases there won't be any rounding errors in the multiplication.
> >
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> OK. I wonder whether there are vectorized variants in libmvec? 

Unfortunately libmvec only provides pow and exp, not exp2 nor exp10.
Wonder how much work it would be to provide also that.

Joseph, is exp2 in glibc .5ulp accurate like exp for double, or not?
Anything known about their relative performance?

> >Perhaps we should do something similar if C is a power of 10 (use exp10
> >and log10).
> >
> >2018-02-10  Jakub Jelinek  <jakub@redhat.com>
> >
> >	PR middle-end/84309
> >	* match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
> >	exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.
> >
> >	* gcc.dg/pr84309.c: New test.
> > 
> >--- gcc/match.pd.jj	2018-01-26 12:43:23.208922420 +0100
> >+++ gcc/match.pd	2018-02-09 18:48:26.412021408 +0100
> >@@ -3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >    (logs (pows @0 @1))
> >    (mult @1 (logs @0))))
> > 
> >- /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
> >+ /* pow(C,x) -> exp(log(C)*x) if C > 0,
> >+    or if C is a positive power of 2,
> >+    pow(C,x) -> exp2(log2(C)*x).  */
> >  (for pows (POW)
> >       exps (EXP)
> >       logs (LOG)
> >+      exp2s (EXP2)
> >+      log2s (LOG2)
> >   (simplify
> >    (pows REAL_CST@0 @1)
> >-    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
> >-	 && real_isfinite (TREE_REAL_CST_PTR (@0)))
> >-     (exps (mult (logs @0) @1)))))
> >+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
> >+	&& real_isfinite (TREE_REAL_CST_PTR (@0)))
> >+    (with {
> >+       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
> >+       bool use_exp2 = false;
> >+       if (targetm.libc_has_function (function_c99_misc)
> >+	   && value->cl == rvc_normal)
> >+	 {
> >+	   REAL_VALUE_TYPE frac_rvt = *value;
> >+	   SET_REAL_EXP (&frac_rvt, 1);
> >+	   if (real_equal (&frac_rvt, &dconst1))
> >+	     use_exp2 = true;
> >+	 }
> >+     }
> >+     (if (use_exp2)
> >+       (exp2s (mult (log2s @0) @1))
> >+       (exps (mult (logs @0) @1)))))))
> > 
> >  (for sqrts (SQRT)
> >       cbrts (CBRT)
> >--- gcc/testsuite/gcc.dg/pr84309.c.jj	2018-02-09 18:54:52.254787678
> >+0100
> >+++ gcc/testsuite/gcc.dg/pr84309.c	2018-02-09 18:59:02.343636178 +0100
> >@@ -0,0 +1,14 @@
> >+/* PR middle-end/84309 */
> >+/* { dg-do run { target c99_runtime } } */
> >+/* { dg-options "-O2 -ffast-math" } */
> >+
> >+int
> >+main ()
> >+{
> >+  unsigned long a = 1024;
> >+  unsigned long b = 16 * 1024;
> >+  unsigned long c = __builtin_pow (2, (__builtin_log2 (a) +
> >__builtin_log2 (b)) / 2);
> >+  if (c != 4096)
> >+    __builtin_abort ();
> >+  return 0;
> >+}

	Jakub
Richard Biener Feb. 10, 2018, 11:29 a.m. | #3
On February 10, 2018 10:44:37 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote:
>> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek
><jakub@redhat.com> wrote:
>> >Hi!
>> >
>> >Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
>> >breaks some package (Marek should know which), as it has 7ulp error.
>> >Generally one should be prepared for some errors with -ffast-math.
>> >
>> >Though, in this case, if the target has c99 runtime and C is
>> >a positive 0x1pNN it seems much better to use exp2 over exp, for
>> >C being 2 pow (2, x) is optimized into exp2 (x) and even for other
>> >values log2(C) will still be some positive or negative integer, so
>> >in many cases there won't be any rounding errors in the
>multiplication.
>> >
>> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> 
>> OK. I wonder whether there are vectorized variants in libmvec? 
>
>Unfortunately libmvec only provides pow and exp, not exp2 nor exp10.

So maybe delay this folding then, there's already two phases we do for math functions. Not sure if they conveniently align with vectorization... 

>Wonder how much work it would be to provide also that.
>
>Joseph, is exp2 in glibc .5ulp accurate like exp for double, or not?
>Anything known about their relative performance?
>
>> >Perhaps we should do something similar if C is a power of 10 (use
>exp10
>> >and log10).
>> >
>> >2018-02-10  Jakub Jelinek  <jakub@redhat.com>
>> >
>> >	PR middle-end/84309
>> >	* match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
>> >	exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.
>> >
>> >	* gcc.dg/pr84309.c: New test.
>> > 
>> >--- gcc/match.pd.jj	2018-01-26 12:43:23.208922420 +0100
>> >+++ gcc/match.pd	2018-02-09 18:48:26.412021408 +0100
>> >@@ -3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >    (logs (pows @0 @1))
>> >    (mult @1 (logs @0))))
>> > 
>> >- /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
>> >+ /* pow(C,x) -> exp(log(C)*x) if C > 0,
>> >+    or if C is a positive power of 2,
>> >+    pow(C,x) -> exp2(log2(C)*x).  */
>> >  (for pows (POW)
>> >       exps (EXP)
>> >       logs (LOG)
>> >+      exp2s (EXP2)
>> >+      log2s (LOG2)
>> >   (simplify
>> >    (pows REAL_CST@0 @1)
>> >-    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>> >-	 && real_isfinite (TREE_REAL_CST_PTR (@0)))
>> >-     (exps (mult (logs @0) @1)))))
>> >+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>> >+	&& real_isfinite (TREE_REAL_CST_PTR (@0)))
>> >+    (with {
>> >+       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
>> >+       bool use_exp2 = false;
>> >+       if (targetm.libc_has_function (function_c99_misc)
>> >+	   && value->cl == rvc_normal)
>> >+	 {
>> >+	   REAL_VALUE_TYPE frac_rvt = *value;
>> >+	   SET_REAL_EXP (&frac_rvt, 1);
>> >+	   if (real_equal (&frac_rvt, &dconst1))
>> >+	     use_exp2 = true;
>> >+	 }
>> >+     }
>> >+     (if (use_exp2)
>> >+       (exp2s (mult (log2s @0) @1))
>> >+       (exps (mult (logs @0) @1)))))))
>> > 
>> >  (for sqrts (SQRT)
>> >       cbrts (CBRT)
>> >--- gcc/testsuite/gcc.dg/pr84309.c.jj	2018-02-09 18:54:52.254787678
>> >+0100
>> >+++ gcc/testsuite/gcc.dg/pr84309.c	2018-02-09 18:59:02.343636178
>+0100
>> >@@ -0,0 +1,14 @@
>> >+/* PR middle-end/84309 */
>> >+/* { dg-do run { target c99_runtime } } */
>> >+/* { dg-options "-O2 -ffast-math" } */
>> >+
>> >+int
>> >+main ()
>> >+{
>> >+  unsigned long a = 1024;
>> >+  unsigned long b = 16 * 1024;
>> >+  unsigned long c = __builtin_pow (2, (__builtin_log2 (a) +
>> >__builtin_log2 (b)) / 2);
>> >+  if (c != 4096)
>> >+    __builtin_abort ();
>> >+  return 0;
>> >+}
>
>	Jakub
Marek Polacek Feb. 10, 2018, 12:29 p.m. | #4
On Sat, Feb 10, 2018 at 12:37:38AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
> breaks some package (Marek should know which), as it has 7ulp error.
> Generally one should be prepared for some errors with -ffast-math.

I reduced it from "test-cachunker" in package casync.

	Marek
Jakub Jelinek Feb. 10, 2018, 2:26 p.m. | #5
On Sat, Feb 10, 2018 at 12:29:42PM +0100, Richard Biener wrote:
> On February 10, 2018 10:44:37 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
> >On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote:
> >> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek
> ><jakub@redhat.com> wrote:
> >> >Hi!
> >> >
> >> >Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
> >> >breaks some package (Marek should know which), as it has 7ulp error.
> >> >Generally one should be prepared for some errors with -ffast-math.
> >> >
> >> >Though, in this case, if the target has c99 runtime and C is
> >> >a positive 0x1pNN it seems much better to use exp2 over exp, for
> >> >C being 2 pow (2, x) is optimized into exp2 (x) and even for other
> >> >values log2(C) will still be some positive or negative integer, so
> >> >in many cases there won't be any rounding errors in the
> >multiplication.
> >> >
> >> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >> 
> >> OK. I wonder whether there are vectorized variants in libmvec? 
> >
> >Unfortunately libmvec only provides pow and exp, not exp2 nor exp10.
> 
> So maybe delay this folding then, there's already two phases we do for
> math functions.  Not sure if they conveniently align with vectorization...

How would that delay look like?
If use_exp2 is true and (cfun->curr_properties & PROP_gimple_lvec) == 0,
don't fold it?  Then I guess if we vectorize or slp vectorize the pow
as vector pow, we'd need to match.pd it into the exp (log (vec_cst) * x).

	Jakub
Richard Biener Feb. 10, 2018, 4:04 p.m. | #6
On February 10, 2018 3:26:46 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Sat, Feb 10, 2018 at 12:29:42PM +0100, Richard Biener wrote:
>> On February 10, 2018 10:44:37 AM GMT+01:00, Jakub Jelinek
><jakub@redhat.com> wrote:
>> >On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote:
>> >> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek
>> ><jakub@redhat.com> wrote:
>> >> >Hi!
>> >> >
>> >> >Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0
>optimization
>> >> >breaks some package (Marek should know which), as it has 7ulp
>error.
>> >> >Generally one should be prepared for some errors with
>-ffast-math.
>> >> >
>> >> >Though, in this case, if the target has c99 runtime and C is
>> >> >a positive 0x1pNN it seems much better to use exp2 over exp, for
>> >> >C being 2 pow (2, x) is optimized into exp2 (x) and even for
>other
>> >> >values log2(C) will still be some positive or negative integer,
>so
>> >> >in many cases there won't be any rounding errors in the
>> >multiplication.
>> >> >
>> >> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>trunk?
>> >> 
>> >> OK. I wonder whether there are vectorized variants in libmvec? 
>> >
>> >Unfortunately libmvec only provides pow and exp, not exp2 nor exp10.
>> 
>> So maybe delay this folding then, there's already two phases we do
>for
>> math functions.  Not sure if they conveniently align with
>vectorization...
>
>How would that delay look like?
>If use_exp2 is true and (cfun->curr_properties & PROP_gimple_lvec) ==
>0,
>don't fold it?  

I think we have a canonicalize_math phase and an optimization one. But I'm not sure this transform matches either case. 

Then I guess if we vectorize or slp vectorize the pow
>as vector pow, we'd need to match.pd it into the exp (log (vec_cst) *
>x).

Yes.  Of course extending libmvec would be much preferred... 

Richard. 

>
>	Jakub
Joseph Myers Feb. 12, 2018, 10:01 p.m. | #7
On Sat, 10 Feb 2018, Wilco Dijkstra wrote:

> For floats exp2f is ~10% faster than expf, powf is 2.2 times slower, and 
> exp10f is 3.2 times slower (slower than powf due to using double pow).

I expect it would be reasonably straightforward to adapt Szabolcs's 
optimized expf to produce an optimized exp10f (and likewise for log10f).

Patch

--- gcc/match.pd.jj	2018-01-26 12:43:23.208922420 +0100
+++ gcc/match.pd	2018-02-09 18:48:26.412021408 +0100
@@ -3992,15 +3992,33 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (logs (pows @0 @1))
    (mult @1 (logs @0))))
 
- /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
+ /* pow(C,x) -> exp(log(C)*x) if C > 0,
+    or if C is a positive power of 2,
+    pow(C,x) -> exp2(log2(C)*x).  */
  (for pows (POW)
       exps (EXP)
       logs (LOG)
+      exp2s (EXP2)
+      log2s (LOG2)
   (simplify
    (pows REAL_CST@0 @1)
-    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
-	 && real_isfinite (TREE_REAL_CST_PTR (@0)))
-     (exps (mult (logs @0) @1)))))
+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
+	&& real_isfinite (TREE_REAL_CST_PTR (@0)))
+    (with {
+       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
+       bool use_exp2 = false;
+       if (targetm.libc_has_function (function_c99_misc)
+	   && value->cl == rvc_normal)
+	 {
+	   REAL_VALUE_TYPE frac_rvt = *value;
+	   SET_REAL_EXP (&frac_rvt, 1);
+	   if (real_equal (&frac_rvt, &dconst1))
+	     use_exp2 = true;
+	 }
+     }
+     (if (use_exp2)
+       (exp2s (mult (log2s @0) @1))
+       (exps (mult (logs @0) @1)))))))
 
  (for sqrts (SQRT)
       cbrts (CBRT)
--- gcc/testsuite/gcc.dg/pr84309.c.jj	2018-02-09 18:54:52.254787678 +0100
+++ gcc/testsuite/gcc.dg/pr84309.c	2018-02-09 18:59:02.343636178 +0100
@@ -0,0 +1,14 @@ 
+/* PR middle-end/84309 */
+/* { dg-do run { target c99_runtime } } */
+/* { dg-options "-O2 -ffast-math" } */
+
+int
+main ()
+{
+  unsigned long a = 1024;
+  unsigned long b = 16 * 1024;
+  unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + __builtin_log2 (b)) / 2);
+  if (c != 4096)
+    __builtin_abort ();
+  return 0;
+}