Message ID  20180209233738.GQ5867@tucnak 

State  New 
Headers  show 
Series 

Related  show 
On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote: >Hi! > >Apparently the new pow(C,x) > exp(log(C)*x) if C > 0 optimization >breaks some package (Marek should know which), as it has 7ulp error. >Generally one should be prepared for some errors with ffastmath. > >Though, in this case, if the target has c99 runtime and C is >a positive 0x1pNN it seems much better to use exp2 over exp, for >C being 2 pow (2, x) is optimized into exp2 (x) and even for other >values log2(C) will still be some positive or negative integer, so >in many cases there won't be any rounding errors in the multiplication. > >Bootstrapped/regtested on x86_64linux and i686linux, ok for trunk? OK. I wonder whether there are vectorized variants in libmvec? Richard. >Perhaps we should do something similar if C is a power of 10 (use exp10 >and log10). > >20180210 Jakub Jelinek <jakub@redhat.com> > > PR middleend/84309 > * match.pd (pow(C,x) > exp(log(C)*x)): Optimize instead into > exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available. > > * gcc.dg/pr84309.c: New test. > > gcc/match.pd.jj 20180126 12:43:23.208922420 +0100 >+++ gcc/match.pd 20180209 18:48:26.412021408 +0100 >@@ 3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (logs (pows @0 @1)) > (mult @1 (logs @0)))) > > /* pow(C,x) > exp(log(C)*x) if C > 0. */ >+ /* pow(C,x) > exp(log(C)*x) if C > 0, >+ or if C is a positive power of 2, >+ pow(C,x) > exp2(log2(C)*x). */ > (for pows (POW) > exps (EXP) > logs (LOG) >+ exp2s (EXP2) >+ log2s (LOG2) > (simplify > (pows REAL_CST@0 @1) > (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) > && real_isfinite (TREE_REAL_CST_PTR (@0))) > (exps (mult (logs @0) @1))))) >+ (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) >+ && real_isfinite (TREE_REAL_CST_PTR (@0))) >+ (with { >+ const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0); >+ bool use_exp2 = false; >+ if (targetm.libc_has_function (function_c99_misc) >+ && value>cl == rvc_normal) >+ { >+ REAL_VALUE_TYPE frac_rvt = *value; >+ SET_REAL_EXP (&frac_rvt, 1); >+ if (real_equal (&frac_rvt, &dconst1)) >+ use_exp2 = true; >+ } >+ } >+ (if (use_exp2) >+ (exp2s (mult (log2s @0) @1)) >+ (exps (mult (logs @0) @1))))))) > > (for sqrts (SQRT) > cbrts (CBRT) > gcc/testsuite/gcc.dg/pr84309.c.jj 20180209 18:54:52.254787678 >+0100 >+++ gcc/testsuite/gcc.dg/pr84309.c 20180209 18:59:02.343636178 +0100 >@@ 0,0 +1,14 @@ >+/* PR middleend/84309 */ >+/* { dgdo run { target c99_runtime } } */ >+/* { dgoptions "O2 ffastmath" } */ >+ >+int >+main () >+{ >+ unsigned long a = 1024; >+ unsigned long b = 16 * 1024; >+ unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + >__builtin_log2 (b)) / 2); >+ if (c != 4096) >+ __builtin_abort (); >+ return 0; >+} > > Jakub
On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote: > On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote: > >Hi! > > > >Apparently the new pow(C,x) > exp(log(C)*x) if C > 0 optimization > >breaks some package (Marek should know which), as it has 7ulp error. > >Generally one should be prepared for some errors with ffastmath. > > > >Though, in this case, if the target has c99 runtime and C is > >a positive 0x1pNN it seems much better to use exp2 over exp, for > >C being 2 pow (2, x) is optimized into exp2 (x) and even for other > >values log2(C) will still be some positive or negative integer, so > >in many cases there won't be any rounding errors in the multiplication. > > > >Bootstrapped/regtested on x86_64linux and i686linux, ok for trunk? > > OK. I wonder whether there are vectorized variants in libmvec? Unfortunately libmvec only provides pow and exp, not exp2 nor exp10. Wonder how much work it would be to provide also that. Joseph, is exp2 in glibc .5ulp accurate like exp for double, or not? Anything known about their relative performance? > >Perhaps we should do something similar if C is a power of 10 (use exp10 > >and log10). > > > >20180210 Jakub Jelinek <jakub@redhat.com> > > > > PR middleend/84309 > > * match.pd (pow(C,x) > exp(log(C)*x)): Optimize instead into > > exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available. > > > > * gcc.dg/pr84309.c: New test. > > > > gcc/match.pd.jj 20180126 12:43:23.208922420 +0100 > >+++ gcc/match.pd 20180209 18:48:26.412021408 +0100 > >@@ 3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > (logs (pows @0 @1)) > > (mult @1 (logs @0)))) > > > > /* pow(C,x) > exp(log(C)*x) if C > 0. */ > >+ /* pow(C,x) > exp(log(C)*x) if C > 0, > >+ or if C is a positive power of 2, > >+ pow(C,x) > exp2(log2(C)*x). */ > > (for pows (POW) > > exps (EXP) > > logs (LOG) > >+ exp2s (EXP2) > >+ log2s (LOG2) > > (simplify > > (pows REAL_CST@0 @1) > > (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) > > && real_isfinite (TREE_REAL_CST_PTR (@0))) > > (exps (mult (logs @0) @1))))) > >+ (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) > >+ && real_isfinite (TREE_REAL_CST_PTR (@0))) > >+ (with { > >+ const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0); > >+ bool use_exp2 = false; > >+ if (targetm.libc_has_function (function_c99_misc) > >+ && value>cl == rvc_normal) > >+ { > >+ REAL_VALUE_TYPE frac_rvt = *value; > >+ SET_REAL_EXP (&frac_rvt, 1); > >+ if (real_equal (&frac_rvt, &dconst1)) > >+ use_exp2 = true; > >+ } > >+ } > >+ (if (use_exp2) > >+ (exp2s (mult (log2s @0) @1)) > >+ (exps (mult (logs @0) @1))))))) > > > > (for sqrts (SQRT) > > cbrts (CBRT) > > gcc/testsuite/gcc.dg/pr84309.c.jj 20180209 18:54:52.254787678 > >+0100 > >+++ gcc/testsuite/gcc.dg/pr84309.c 20180209 18:59:02.343636178 +0100 > >@@ 0,0 +1,14 @@ > >+/* PR middleend/84309 */ > >+/* { dgdo run { target c99_runtime } } */ > >+/* { dgoptions "O2 ffastmath" } */ > >+ > >+int > >+main () > >+{ > >+ unsigned long a = 1024; > >+ unsigned long b = 16 * 1024; > >+ unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + > >__builtin_log2 (b)) / 2); > >+ if (c != 4096) > >+ __builtin_abort (); > >+ return 0; > >+} Jakub
On February 10, 2018 10:44:37 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote: >On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote: >> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek ><jakub@redhat.com> wrote: >> >Hi! >> > >> >Apparently the new pow(C,x) > exp(log(C)*x) if C > 0 optimization >> >breaks some package (Marek should know which), as it has 7ulp error. >> >Generally one should be prepared for some errors with ffastmath. >> > >> >Though, in this case, if the target has c99 runtime and C is >> >a positive 0x1pNN it seems much better to use exp2 over exp, for >> >C being 2 pow (2, x) is optimized into exp2 (x) and even for other >> >values log2(C) will still be some positive or negative integer, so >> >in many cases there won't be any rounding errors in the >multiplication. >> > >> >Bootstrapped/regtested on x86_64linux and i686linux, ok for trunk? >> >> OK. I wonder whether there are vectorized variants in libmvec? > >Unfortunately libmvec only provides pow and exp, not exp2 nor exp10. So maybe delay this folding then, there's already two phases we do for math functions. Not sure if they conveniently align with vectorization... >Wonder how much work it would be to provide also that. > >Joseph, is exp2 in glibc .5ulp accurate like exp for double, or not? >Anything known about their relative performance? > >> >Perhaps we should do something similar if C is a power of 10 (use >exp10 >> >and log10). >> > >> >20180210 Jakub Jelinek <jakub@redhat.com> >> > >> > PR middleend/84309 >> > * match.pd (pow(C,x) > exp(log(C)*x)): Optimize instead into >> > exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available. >> > >> > * gcc.dg/pr84309.c: New test. >> > >> > gcc/match.pd.jj 20180126 12:43:23.208922420 +0100 >> >+++ gcc/match.pd 20180209 18:48:26.412021408 +0100 >> >@@ 3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> > (logs (pows @0 @1)) >> > (mult @1 (logs @0)))) >> > >> > /* pow(C,x) > exp(log(C)*x) if C > 0. */ >> >+ /* pow(C,x) > exp(log(C)*x) if C > 0, >> >+ or if C is a positive power of 2, >> >+ pow(C,x) > exp2(log2(C)*x). */ >> > (for pows (POW) >> > exps (EXP) >> > logs (LOG) >> >+ exp2s (EXP2) >> >+ log2s (LOG2) >> > (simplify >> > (pows REAL_CST@0 @1) >> > (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) >> > && real_isfinite (TREE_REAL_CST_PTR (@0))) >> > (exps (mult (logs @0) @1))))) >> >+ (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) >> >+ && real_isfinite (TREE_REAL_CST_PTR (@0))) >> >+ (with { >> >+ const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0); >> >+ bool use_exp2 = false; >> >+ if (targetm.libc_has_function (function_c99_misc) >> >+ && value>cl == rvc_normal) >> >+ { >> >+ REAL_VALUE_TYPE frac_rvt = *value; >> >+ SET_REAL_EXP (&frac_rvt, 1); >> >+ if (real_equal (&frac_rvt, &dconst1)) >> >+ use_exp2 = true; >> >+ } >> >+ } >> >+ (if (use_exp2) >> >+ (exp2s (mult (log2s @0) @1)) >> >+ (exps (mult (logs @0) @1))))))) >> > >> > (for sqrts (SQRT) >> > cbrts (CBRT) >> > gcc/testsuite/gcc.dg/pr84309.c.jj 20180209 18:54:52.254787678 >> >+0100 >> >+++ gcc/testsuite/gcc.dg/pr84309.c 20180209 18:59:02.343636178 >+0100 >> >@@ 0,0 +1,14 @@ >> >+/* PR middleend/84309 */ >> >+/* { dgdo run { target c99_runtime } } */ >> >+/* { dgoptions "O2 ffastmath" } */ >> >+ >> >+int >> >+main () >> >+{ >> >+ unsigned long a = 1024; >> >+ unsigned long b = 16 * 1024; >> >+ unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + >> >__builtin_log2 (b)) / 2); >> >+ if (c != 4096) >> >+ __builtin_abort (); >> >+ return 0; >> >+} > > Jakub
On Sat, Feb 10, 2018 at 12:37:38AM +0100, Jakub Jelinek wrote: > Hi! > > Apparently the new pow(C,x) > exp(log(C)*x) if C > 0 optimization > breaks some package (Marek should know which), as it has 7ulp error. > Generally one should be prepared for some errors with ffastmath. I reduced it from "testcachunker" in package casync. Marek
On Sat, Feb 10, 2018 at 12:29:42PM +0100, Richard Biener wrote: > On February 10, 2018 10:44:37 AM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote: > >On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote: > >> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek > ><jakub@redhat.com> wrote: > >> >Hi! > >> > > >> >Apparently the new pow(C,x) > exp(log(C)*x) if C > 0 optimization > >> >breaks some package (Marek should know which), as it has 7ulp error. > >> >Generally one should be prepared for some errors with ffastmath. > >> > > >> >Though, in this case, if the target has c99 runtime and C is > >> >a positive 0x1pNN it seems much better to use exp2 over exp, for > >> >C being 2 pow (2, x) is optimized into exp2 (x) and even for other > >> >values log2(C) will still be some positive or negative integer, so > >> >in many cases there won't be any rounding errors in the > >multiplication. > >> > > >> >Bootstrapped/regtested on x86_64linux and i686linux, ok for trunk? > >> > >> OK. I wonder whether there are vectorized variants in libmvec? > > > >Unfortunately libmvec only provides pow and exp, not exp2 nor exp10. > > So maybe delay this folding then, there's already two phases we do for > math functions. Not sure if they conveniently align with vectorization... How would that delay look like? If use_exp2 is true and (cfun>curr_properties & PROP_gimple_lvec) == 0, don't fold it? Then I guess if we vectorize or slp vectorize the pow as vector pow, we'd need to match.pd it into the exp (log (vec_cst) * x). Jakub
On February 10, 2018 3:26:46 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote: >On Sat, Feb 10, 2018 at 12:29:42PM +0100, Richard Biener wrote: >> On February 10, 2018 10:44:37 AM GMT+01:00, Jakub Jelinek ><jakub@redhat.com> wrote: >> >On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote: >> >> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek >> ><jakub@redhat.com> wrote: >> >> >Hi! >> >> > >> >> >Apparently the new pow(C,x) > exp(log(C)*x) if C > 0 >optimization >> >> >breaks some package (Marek should know which), as it has 7ulp >error. >> >> >Generally one should be prepared for some errors with >ffastmath. >> >> > >> >> >Though, in this case, if the target has c99 runtime and C is >> >> >a positive 0x1pNN it seems much better to use exp2 over exp, for >> >> >C being 2 pow (2, x) is optimized into exp2 (x) and even for >other >> >> >values log2(C) will still be some positive or negative integer, >so >> >> >in many cases there won't be any rounding errors in the >> >multiplication. >> >> > >> >> >Bootstrapped/regtested on x86_64linux and i686linux, ok for >trunk? >> >> >> >> OK. I wonder whether there are vectorized variants in libmvec? >> > >> >Unfortunately libmvec only provides pow and exp, not exp2 nor exp10. >> >> So maybe delay this folding then, there's already two phases we do >for >> math functions. Not sure if they conveniently align with >vectorization... > >How would that delay look like? >If use_exp2 is true and (cfun>curr_properties & PROP_gimple_lvec) == >0, >don't fold it? I think we have a canonicalize_math phase and an optimization one. But I'm not sure this transform matches either case. Then I guess if we vectorize or slp vectorize the pow >as vector pow, we'd need to match.pd it into the exp (log (vec_cst) * >x). Yes. Of course extending libmvec would be much preferred... Richard. > > Jakub
On Sat, 10 Feb 2018, Wilco Dijkstra wrote: > For floats exp2f is ~10% faster than expf, powf is 2.2 times slower, and > exp10f is 3.2 times slower (slower than powf due to using double pow). I expect it would be reasonably straightforward to adapt Szabolcs's optimized expf to produce an optimized exp10f (and likewise for log10f).
 gcc/match.pd.jj 20180126 12:43:23.208922420 +0100 +++ gcc/match.pd 20180209 18:48:26.412021408 +0100 @@ 3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (logs (pows @0 @1)) (mult @1 (logs @0))))  /* pow(C,x) > exp(log(C)*x) if C > 0. */ + /* pow(C,x) > exp(log(C)*x) if C > 0, + or if C is a positive power of 2, + pow(C,x) > exp2(log2(C)*x). */ (for pows (POW) exps (EXP) logs (LOG) + exp2s (EXP2) + log2s (LOG2) (simplify (pows REAL_CST@0 @1)  (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)  && real_isfinite (TREE_REAL_CST_PTR (@0)))  (exps (mult (logs @0) @1))))) + (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0) + && real_isfinite (TREE_REAL_CST_PTR (@0))) + (with { + const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0); + bool use_exp2 = false; + if (targetm.libc_has_function (function_c99_misc) + && value>cl == rvc_normal) + { + REAL_VALUE_TYPE frac_rvt = *value; + SET_REAL_EXP (&frac_rvt, 1); + if (real_equal (&frac_rvt, &dconst1)) + use_exp2 = true; + } + } + (if (use_exp2) + (exp2s (mult (log2s @0) @1)) + (exps (mult (logs @0) @1))))))) (for sqrts (SQRT) cbrts (CBRT)  gcc/testsuite/gcc.dg/pr84309.c.jj 20180209 18:54:52.254787678 +0100 +++ gcc/testsuite/gcc.dg/pr84309.c 20180209 18:59:02.343636178 +0100 @@ 0,0 +1,14 @@ +/* PR middleend/84309 */ +/* { dgdo run { target c99_runtime } } */ +/* { dgoptions "O2 ffastmath" } */ + +int +main () +{ + unsigned long a = 1024; + unsigned long b = 16 * 1024; + unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + __builtin_log2 (b)) / 2); + if (c != 4096) + __builtin_abort (); + return 0; +}