Message ID | 20211203000103.737833-7-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | Improve hypot | expand |
On 02/12/2021 21:00, Adhemerval Zanella wrote: > The generic implementation is shows only slight worse performance: > > POWER9 reciprocal-throughput latency > master 13.4024 14.0967 > new hypot 14.8479 15.8061 > > POWER8 reciprocal-throughput latency > master 15.5767 16.8885 > new hypot 16.5371 18.4057 > > One way to improve might to make gcc generate xsmaxdp/xsmindp for > fmax/fmin (it onl does for -ffast-math, clang does for default > options). > > Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu > (power9). Hi Tulio/Paul, This the only missing patch in this set and I would like to check with you, powerpc maintainers, that if it would be ok to push it. The resulting performance difference, including the latest one that removes the wrappers, is slight better: POWER9 reciprocal-throughput latency master 13.4024 14.0967 new hypot 11.9206 13.9871 POWER8 reciprocal-throughput latency master 15.5767 16.8885 new hypot 15.3541 18.0856 The POWER8 çatency difference seems to be due branch misprediction in the max/min selection. In fact, if I use xsmaxdp/xsmindp on the USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8: POWER8 reciprocal-throughput latency xsmaxdp/xsmindp 12.8959 16.2082 POWER9 is not affected (I don't see any performance difference by using xsmaxdp/xsmindp). The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math for some reason, clang use them on default -O2 option. > --- > sysdeps/powerpc/fpu/e_hypot.c | 87 ------------------- > sysdeps/powerpc/fpu/e_hypotf.c | 78 ----------------- > .../powerpc32/power4/fpu/multiarch/Makefile | 5 +- > .../power4/fpu/multiarch/e_hypot-power7.c | 23 ----- > .../power4/fpu/multiarch/e_hypot-ppc32.c | 23 ----- > .../powerpc32/power4/fpu/multiarch/e_hypot.c | 33 ------- > .../power4/fpu/multiarch/e_hypotf-power7.c | 23 ----- > .../power4/fpu/multiarch/e_hypotf-ppc32.c | 23 ----- > .../powerpc32/power4/fpu/multiarch/e_hypotf.c | 33 ------- > 9 files changed, 1 insertion(+), 327 deletions(-) > delete mode 100644 sysdeps/powerpc/fpu/e_hypot.c > delete mode 100644 sysdeps/powerpc/fpu/e_hypotf.c > delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c > delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c > delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c > delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c > delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c > delete mode 100644 sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c > > diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c > deleted file mode 100644 > index f96c589bbd..0000000000 > --- a/sysdeps/powerpc/fpu/e_hypot.c > +++ /dev/null > @@ -1,87 +0,0 @@ > -/* Pythagorean addition using doubles > - Copyright (C) 2011-2021 Free Software Foundation, Inc. > - This file is part of the GNU C Library > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Library General Public License as > - published by the Free Software Foundation; either version 2 of the > - License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Library General Public License for more details. > - > - You should have received a copy of the GNU Library General Public > - License along with the GNU C Library; see the file COPYING.LIB. If > - not, see <https://www.gnu.org/licenses/>. */ > - > -#include <math.h> > -#include <math_private.h> > -#include <math-underflow.h> > -#include <stdint.h> > -#include <libm-alias-finite.h> > - > -/* __ieee754_hypot(x,y) > - * > - * This a FP only version without any FP->INT conversion. > - * It is similar to default C version, making appropriates > - * overflow and underflows checks as well scaling when it > - * is needed. > - */ > - > -double > -__ieee754_hypot (double x, double y) > -{ > - if ((isinf (x) || isinf (y)) > - && !issignaling (x) && !issignaling (y)) > - return INFINITY; > - if (isnan (x) || isnan (y)) > - return x + y; > - > - x = fabs (x); > - y = fabs (y); > - > - if (y > x) > - { > - double t = x; > - x = y; > - y = t; > - } > - if (y == 0.0) > - return x; > - > - /* if y is higher enough, y * 2^60 might overflow. The tests if > - y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the > - appropriate check to avoid the overflow exception generation. */ > - if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60)) > - return x + y; > - > - if (x > 0x1p+500) > - { > - x *= 0x1p-600; > - y *= 0x1p-600; > - return sqrt (x * x + y * y) / 0x1p-600; > - } > - if (y < 0x1p-500) > - { > - if (y <= 0x0.fffffffffffffp-1022) > - { > - x *= 0x1p+1022; > - y *= 0x1p+1022; > - double ret = sqrt (x * x + y * y) / 0x1p+1022; > - math_check_force_underflow_nonneg (ret); > - return ret; > - } > - else > - { > - x *= 0x1p+600; > - y *= 0x1p+600; > - return sqrt (x * x + y * y) / 0x1p+600; > - } > - } > - return sqrt (x * x + y * y); > -} > -#ifndef __ieee754_hypot > -libm_alias_finite (__ieee754_hypot, __hypot) > -#endif > diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c > deleted file mode 100644 > index fa201dda51..0000000000 > --- a/sysdeps/powerpc/fpu/e_hypotf.c > +++ /dev/null > @@ -1,78 +0,0 @@ > -/* Pythagorean addition using floats > - Copyright (C) 2011-2021 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Library General Public License as > - published by the Free Software Foundation; either version 2 of the > - License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Library General Public License for more details. > - > - You should have received a copy of the GNU Library General Public > - License along with the GNU C Library; see the file COPYING.LIB. If > - not, see <https://www.gnu.org/licenses/>. */ > - > -#include <math.h> > -#include <math_private.h> > -#include <stdint.h> > -#include <libm-alias-finite.h> > - > -/* __ieee754_hypotf(x,y) > - > - This a FP only version without any FP->INT conversion. > - It is similar to default C version, making appropriates > - overflow and underflows checks as using double precision > - instead of scaling. */ > - > -#ifdef _ARCH_PWR7 > -/* POWER7 isinf and isnan optimizations are fast. */ > -# define TEST_INF_NAN(x, y) \ > - if ((isinff(x) || isinff(y)) \ > - && !issignaling (x) && !issignaling (y)) \ > - return INFINITY; \ > - if (isnanf(x) || isnanf(y)) \ > - return x + y; > -# else > -/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are > - * costly (especially for POWER6). */ > -# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2) \ > - do { \ > - ieee_float_shape_type gf_u1; \ > - ieee_float_shape_type gf_u2; \ > - gf_u1.value = (f1); \ > - gf_u2.value = (f2); \ > - (i1) = gf_u1.word & 0x7fffffff; \ > - (i2) = gf_u2.word & 0x7fffffff; \ > - } while (0) > - > -# define TEST_INF_NAN(x, y) \ > - do { \ > - uint32_t hx, hy; \ > - GET_TWO_FLOAT_WORD(x, y, hx, hy); \ > - if (hy > hx) { \ > - uint32_t ht = hx; hx = hy; hy = ht; \ > - } \ > - if (hx >= 0x7f800000) { \ > - if ((hx == 0x7f800000 || hy == 0x7f800000) \ > - && !issignaling (x) && !issignaling (y)) \ > - return INFINITY; \ > - return x + y; \ > - } \ > - } while (0) > -#endif > - > - > -float > -__ieee754_hypotf (float x, float y) > -{ > - TEST_INF_NAN (x, y); > - > - return sqrt ((double) x * x + (double) y * y); > -} > -#ifndef __ieee754_hypotf > -libm_alias_finite (__ieee754_hypotf, __hypotf) > -#endif > diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile > index 60f2c95532..1de0f9b350 100644 > --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile > +++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile > @@ -15,8 +15,7 @@ libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \ > s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \ > s_modff-power5+ s_modff-ppc32 s_logbl-power7 \ > s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \ > - s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \ > - e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32 > + s_logbf-power7 s_logbf-ppc32 > > CFLAGS-s_llrintf-power6.c += -mcpu=power6 > CFLAGS-s_llrintf-ppc32.c += -mcpu=power4 > @@ -35,8 +34,6 @@ CFLAGS-s_modff-power5+.c = -mcpu=power5+ > CFLAGS-s_logbl-power7.c = -mcpu=power7 > CFLAGS-s_logb-power7.c = -mcpu=power7 > CFLAGS-s_logbf-power7.c = -mcpu=power7 > -CFLAGS-e_hypot-power7.c = -mcpu=power7 > -CFLAGS-e_hypotf-power7.c = -mcpu=power7 > > # These files quiet sNaNs in a way that is optimized away without > # -fsignaling-nans. > diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c > deleted file mode 100644 > index 382b4a0b27..0000000000 > --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c > +++ /dev/null > @@ -1,23 +0,0 @@ > -/* __ieee_hypot() POWER7 version. > - Copyright (C) 2013-2021 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <math.h> > - > -#define __ieee754_hypot __ieee754_hypot_power7 > - > -#include <sysdeps/powerpc/fpu/e_hypot.c> > diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c > deleted file mode 100644 > index abb14d5469..0000000000 > --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c > +++ /dev/null > @@ -1,23 +0,0 @@ > -/* __ieee_hypot() PowerPC32 version. > - Copyright (C) 2013-2021 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <math.h> > - > -#define __ieee754_hypot __ieee754_hypot_ppc32 > - > -#include <sysdeps/powerpc/fpu/e_hypot.c> > diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c > deleted file mode 100644 > index a16efa350c..0000000000 > --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c > +++ /dev/null > @@ -1,33 +0,0 @@ > -/* Multiple versions of ieee754_hypot. > - Copyright (C) 2013-2021 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <math.h> > -#include <math_private.h> > -#include <math_ldbl_opt.h> > -#include <libm-alias-finite.h> > -#include "init-arch.h" > - > -extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden; > -extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden; > - > -libc_ifunc (__ieee754_hypot, > - (hwcap & PPC_FEATURE_ARCH_2_06) > - ? __ieee754_hypot_power7 > - : __ieee754_hypot_ppc32); > - > -libm_alias_finite (__ieee754_hypot, __hypot) > diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c > deleted file mode 100644 > index f8a26ff22f..0000000000 > --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c > +++ /dev/null > @@ -1,23 +0,0 @@ > -/* __ieee754_hypot POWER7 version. > - Copyright (C) 2013-2021 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <math.h> > - > -#define __ieee754_hypotf __ieee754_hypotf_power7 > - > -#include <sysdeps/powerpc/fpu/e_hypotf.c> > diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c > deleted file mode 100644 > index b13f8c9db2..0000000000 > --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c > +++ /dev/null > @@ -1,23 +0,0 @@ > -/* __ieee_hypot() PowerPC32 version. > - Copyright (C) 2013-2021 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <math.h> > - > -#define __ieee754_hypotf __ieee754_hypotf_ppc32 > - > -#include <sysdeps/ieee754/flt-32/e_hypotf.c> > diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c > deleted file mode 100644 > index 1e72605db8..0000000000 > --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c > +++ /dev/null > @@ -1,33 +0,0 @@ > -/* Multiple versions of ieee754_hypotf. > - Copyright (C) 2013-2021 Free Software Foundation, Inc. > - This file is part of the GNU C Library. > - > - The GNU C Library is free software; you can redistribute it and/or > - modify it under the terms of the GNU Lesser General Public > - License as published by the Free Software Foundation; either > - version 2.1 of the License, or (at your option) any later version. > - > - The GNU C Library is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - Lesser General Public License for more details. > - > - You should have received a copy of the GNU Lesser General Public > - License along with the GNU C Library; if not, see > - <https://www.gnu.org/licenses/>. */ > - > -#include <math.h> > -#include <math_private.h> > -#include <math_ldbl_opt.h> > -#include <libm-alias-finite.h> > -#include "init-arch.h" > - > -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden; > -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden; > - > -libc_ifunc (__ieee754_hypotf, > - (hwcap & PPC_FEATURE_ARCH_2_06) > - ? __ieee754_hypotf_power7 > - : __ieee754_hypotf_ppc32); > - > -libm_alias_finite (__ieee754_hypotf, __hypotf) >
On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote: > On 02/12/2021 21:00, Adhemerval Zanella wrote: > > The generic implementation is shows only slight worse performance: > > > > POWER9 reciprocal-throughput latency > > master 13.4024 14.0967 > > new hypot 14.8479 15.8061 > > > > POWER8 reciprocal-throughput latency > > master 15.5767 16.8885 > > new hypot 16.5371 18.4057 > > > > One way to improve might to make gcc generate xsmaxdp/xsmindp for > > fmax/fmin (it onl does for -ffast-math, clang does for default > > options). > > > > Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu > > (power9). > > Hi Tulio/Paul, > > This the only missing patch in this set and I would like to check with you, > powerpc maintainers, that if it would be ok to push it. The resulting > performance difference, including the latest one that removes the wrappers, > is slight better: > > > POWER9 reciprocal-throughput latency > master 13.4024 14.0967 > new hypot 11.9206 13.9871 > > POWER8 reciprocal-throughput latency > master 15.5767 16.8885 > new hypot 15.3541 18.0856 For Power10 / master: "hypot": { "workload-random": { "duration": 5.28242e+08, "iterations": 4.8e+07, "reciprocal-throughput": 8.28478, "latency": 13.7253, "max-throughput": 1.20703e+08, "min-throughput": 7.2858e+07 } } For Power10 / new hypot: "hypot": { "workload-random": { "duration": 5.30731e+08, "iterations": 5.2e+07, "reciprocal-throughput": 7.21945, "latency": 13.1933, "max-throughput": 1.38515e+08, "min-throughput": 7.57963e+07 } } > The POWER8 çatency difference seems to be due branch misprediction > in the max/min selection. In fact, if I use xsmaxdp/xsmindp on the > USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8: > > POWER8 reciprocal-throughput latency > xsmaxdp/xsmindp 12.8959 16.2082 > > POWER9 is not affected (I don't see any performance difference by > using xsmaxdp/xsmindp). > > The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math > for some reason, clang use them on default -O2 option. I believe clang may be wrong here, in that the instructions do not properly handle NaN for the fmin/fmax semantics without -ffast-math. PC
On 06/12/2021 18:29, Paul A. Clarke wrote: > On Mon, Dec 06, 2021 at 02:12:27PM -0300, Adhemerval Zanella wrote: >> On 02/12/2021 21:00, Adhemerval Zanella wrote: >>> The generic implementation is shows only slight worse performance: >>> >>> POWER9 reciprocal-throughput latency >>> master 13.4024 14.0967 >>> new hypot 14.8479 15.8061 >>> >>> POWER8 reciprocal-throughput latency >>> master 15.5767 16.8885 >>> new hypot 16.5371 18.4057 >>> >>> One way to improve might to make gcc generate xsmaxdp/xsmindp for >>> fmax/fmin (it onl does for -ffast-math, clang does for default >>> options). >>> >>> Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu >>> (power9). >> >> Hi Tulio/Paul, >> >> This the only missing patch in this set and I would like to check with you, >> powerpc maintainers, that if it would be ok to push it. The resulting >> performance difference, including the latest one that removes the wrappers, >> is slight better: >> >> >> POWER9 reciprocal-throughput latency >> master 13.4024 14.0967 >> new hypot 11.9206 13.9871 >> >> POWER8 reciprocal-throughput latency >> master 15.5767 16.8885 >> new hypot 15.3541 18.0856 > > For Power10 / master: > > "hypot": { > "workload-random": { > "duration": 5.28242e+08, > "iterations": 4.8e+07, > "reciprocal-throughput": 8.28478, > "latency": 13.7253, > "max-throughput": 1.20703e+08, > "min-throughput": 7.2858e+07 > } > } > > For Power10 / new hypot: > > "hypot": { > "workload-random": { > "duration": 5.30731e+08, > "iterations": 5.2e+07, > "reciprocal-throughput": 7.21945, > "latency": 13.1933, > "max-throughput": 1.38515e+08, > "min-throughput": 7.57963e+07 > } > } So I assume it would be ok to push this patch based on the power10 performance numbers, right? > >> The POWER8 çatency difference seems to be due branch misprediction >> in the max/min selection. In fact, if I use xsmaxdp/xsmindp on the >> USE_FMAX_BUILTIN/USE_FMIN_BUILTIN, I see way better results on POWER8: >> >> POWER8 reciprocal-throughput latency >> xsmaxdp/xsmindp 12.8959 16.2082 >> >> POWER9 is not affected (I don't see any performance difference by >> using xsmaxdp/xsmindp). >> >> The xsmaxdp/xsmindp unfortunately are only emitted with -ffast-math >> for some reason, clang use them on default -O2 option. > > I believe clang may be wrong here, in that the instructions do not > properly handle NaN for the fmin/fmax semantics without -ffast-math. At least on power8/power9 I don't see any test-double-{fmax,fmin} regressions if I use xsmaxdp/xsmindp instead of default implementation. And libm-test-{fmax,fmin}.inc does extensively tests both default and signaling NaNs.
diff --git a/sysdeps/powerpc/fpu/e_hypot.c b/sysdeps/powerpc/fpu/e_hypot.c deleted file mode 100644 index f96c589bbd..0000000000 --- a/sysdeps/powerpc/fpu/e_hypot.c +++ /dev/null @@ -1,87 +0,0 @@ -/* Pythagorean addition using doubles - Copyright (C) 2011-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Library General Public License as - published by the Free Software Foundation; either version 2 of the - License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Library General Public License for more details. - - You should have received a copy of the GNU Library General Public - License along with the GNU C Library; see the file COPYING.LIB. If - not, see <https://www.gnu.org/licenses/>. */ - -#include <math.h> -#include <math_private.h> -#include <math-underflow.h> -#include <stdint.h> -#include <libm-alias-finite.h> - -/* __ieee754_hypot(x,y) - * - * This a FP only version without any FP->INT conversion. - * It is similar to default C version, making appropriates - * overflow and underflows checks as well scaling when it - * is needed. - */ - -double -__ieee754_hypot (double x, double y) -{ - if ((isinf (x) || isinf (y)) - && !issignaling (x) && !issignaling (y)) - return INFINITY; - if (isnan (x) || isnan (y)) - return x + y; - - x = fabs (x); - y = fabs (y); - - if (y > x) - { - double t = x; - x = y; - y = t; - } - if (y == 0.0) - return x; - - /* if y is higher enough, y * 2^60 might overflow. The tests if - y >= 1.7976931348623157e+308/2^60 (two60factor) and uses the - appropriate check to avoid the overflow exception generation. */ - if (y <= 0x1.fffffffffffffp+963 && x > (y * 0x1p+60)) - return x + y; - - if (x > 0x1p+500) - { - x *= 0x1p-600; - y *= 0x1p-600; - return sqrt (x * x + y * y) / 0x1p-600; - } - if (y < 0x1p-500) - { - if (y <= 0x0.fffffffffffffp-1022) - { - x *= 0x1p+1022; - y *= 0x1p+1022; - double ret = sqrt (x * x + y * y) / 0x1p+1022; - math_check_force_underflow_nonneg (ret); - return ret; - } - else - { - x *= 0x1p+600; - y *= 0x1p+600; - return sqrt (x * x + y * y) / 0x1p+600; - } - } - return sqrt (x * x + y * y); -} -#ifndef __ieee754_hypot -libm_alias_finite (__ieee754_hypot, __hypot) -#endif diff --git a/sysdeps/powerpc/fpu/e_hypotf.c b/sysdeps/powerpc/fpu/e_hypotf.c deleted file mode 100644 index fa201dda51..0000000000 --- a/sysdeps/powerpc/fpu/e_hypotf.c +++ /dev/null @@ -1,78 +0,0 @@ -/* Pythagorean addition using floats - Copyright (C) 2011-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Library General Public License as - published by the Free Software Foundation; either version 2 of the - License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Library General Public License for more details. - - You should have received a copy of the GNU Library General Public - License along with the GNU C Library; see the file COPYING.LIB. If - not, see <https://www.gnu.org/licenses/>. */ - -#include <math.h> -#include <math_private.h> -#include <stdint.h> -#include <libm-alias-finite.h> - -/* __ieee754_hypotf(x,y) - - This a FP only version without any FP->INT conversion. - It is similar to default C version, making appropriates - overflow and underflows checks as using double precision - instead of scaling. */ - -#ifdef _ARCH_PWR7 -/* POWER7 isinf and isnan optimizations are fast. */ -# define TEST_INF_NAN(x, y) \ - if ((isinff(x) || isinff(y)) \ - && !issignaling (x) && !issignaling (y)) \ - return INFINITY; \ - if (isnanf(x) || isnanf(y)) \ - return x + y; -# else -/* For POWER6 and below isinf/isnan triggers LHS and PLT calls are - * costly (especially for POWER6). */ -# define GET_TWO_FLOAT_WORD(f1,f2,i1,i2) \ - do { \ - ieee_float_shape_type gf_u1; \ - ieee_float_shape_type gf_u2; \ - gf_u1.value = (f1); \ - gf_u2.value = (f2); \ - (i1) = gf_u1.word & 0x7fffffff; \ - (i2) = gf_u2.word & 0x7fffffff; \ - } while (0) - -# define TEST_INF_NAN(x, y) \ - do { \ - uint32_t hx, hy; \ - GET_TWO_FLOAT_WORD(x, y, hx, hy); \ - if (hy > hx) { \ - uint32_t ht = hx; hx = hy; hy = ht; \ - } \ - if (hx >= 0x7f800000) { \ - if ((hx == 0x7f800000 || hy == 0x7f800000) \ - && !issignaling (x) && !issignaling (y)) \ - return INFINITY; \ - return x + y; \ - } \ - } while (0) -#endif - - -float -__ieee754_hypotf (float x, float y) -{ - TEST_INF_NAN (x, y); - - return sqrt ((double) x * x + (double) y * y); -} -#ifndef __ieee754_hypotf -libm_alias_finite (__ieee754_hypotf, __hypotf) -#endif diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile index 60f2c95532..1de0f9b350 100644 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile +++ b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile @@ -15,8 +15,7 @@ libm-sysdep_routines += s_llrintf-power6 s_llrintf-ppc32 s_llrint-power6 \ s_lrint-ppc32 s_modf-power5+ s_modf-ppc32 \ s_modff-power5+ s_modff-ppc32 s_logbl-power7 \ s_logbl-ppc32 s_logb-power7 s_logb-ppc32 \ - s_logbf-power7 s_logbf-ppc32 e_hypot-power7 \ - e_hypot-ppc32 e_hypotf-power7 e_hypotf-ppc32 + s_logbf-power7 s_logbf-ppc32 CFLAGS-s_llrintf-power6.c += -mcpu=power6 CFLAGS-s_llrintf-ppc32.c += -mcpu=power4 @@ -35,8 +34,6 @@ CFLAGS-s_modff-power5+.c = -mcpu=power5+ CFLAGS-s_logbl-power7.c = -mcpu=power7 CFLAGS-s_logb-power7.c = -mcpu=power7 CFLAGS-s_logbf-power7.c = -mcpu=power7 -CFLAGS-e_hypot-power7.c = -mcpu=power7 -CFLAGS-e_hypotf-power7.c = -mcpu=power7 # These files quiet sNaNs in a way that is optimized away without # -fsignaling-nans. diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c deleted file mode 100644 index 382b4a0b27..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-power7.c +++ /dev/null @@ -1,23 +0,0 @@ -/* __ieee_hypot() POWER7 version. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <math.h> - -#define __ieee754_hypot __ieee754_hypot_power7 - -#include <sysdeps/powerpc/fpu/e_hypot.c> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c deleted file mode 100644 index abb14d5469..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot-ppc32.c +++ /dev/null @@ -1,23 +0,0 @@ -/* __ieee_hypot() PowerPC32 version. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <math.h> - -#define __ieee754_hypot __ieee754_hypot_ppc32 - -#include <sysdeps/powerpc/fpu/e_hypot.c> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c deleted file mode 100644 index a16efa350c..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c +++ /dev/null @@ -1,33 +0,0 @@ -/* Multiple versions of ieee754_hypot. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <math.h> -#include <math_private.h> -#include <math_ldbl_opt.h> -#include <libm-alias-finite.h> -#include "init-arch.h" - -extern __typeof (__ieee754_hypot) __ieee754_hypot_ppc32 attribute_hidden; -extern __typeof (__ieee754_hypot) __ieee754_hypot_power7 attribute_hidden; - -libc_ifunc (__ieee754_hypot, - (hwcap & PPC_FEATURE_ARCH_2_06) - ? __ieee754_hypot_power7 - : __ieee754_hypot_ppc32); - -libm_alias_finite (__ieee754_hypot, __hypot) diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c deleted file mode 100644 index f8a26ff22f..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-power7.c +++ /dev/null @@ -1,23 +0,0 @@ -/* __ieee754_hypot POWER7 version. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <math.h> - -#define __ieee754_hypotf __ieee754_hypotf_power7 - -#include <sysdeps/powerpc/fpu/e_hypotf.c> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c deleted file mode 100644 index b13f8c9db2..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf-ppc32.c +++ /dev/null @@ -1,23 +0,0 @@ -/* __ieee_hypot() PowerPC32 version. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <math.h> - -#define __ieee754_hypotf __ieee754_hypotf_ppc32 - -#include <sysdeps/ieee754/flt-32/e_hypotf.c> diff --git a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c b/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c deleted file mode 100644 index 1e72605db8..0000000000 --- a/sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c +++ /dev/null @@ -1,33 +0,0 @@ -/* Multiple versions of ieee754_hypotf. - Copyright (C) 2013-2021 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - <https://www.gnu.org/licenses/>. */ - -#include <math.h> -#include <math_private.h> -#include <math_ldbl_opt.h> -#include <libm-alias-finite.h> -#include "init-arch.h" - -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_ppc32 attribute_hidden; -extern __typeof (__ieee754_hypotf) __ieee754_hypotf_power7 attribute_hidden; - -libc_ifunc (__ieee754_hypotf, - (hwcap & PPC_FEATURE_ARCH_2_06) - ? __ieee754_hypotf_power7 - : __ieee754_hypotf_ppc32); - -libm_alias_finite (__ieee754_hypotf, __hypotf)