Message ID | 20200502150243.1347705-2-ibmibmibm.tw@gmail.com |
---|---|
State | New |
Series | [v4,1/2] math: redirect roundeven function |
On Sat, May 2, 2020 at 8:06 AM Shen-Ta Hsieh via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> This patch adds support for the sse4.1 hardware floating point
> roundeven.

Do you have FSF paper on file?

> Here is a benchmark result on my AMD Ryzen 9 3900X system:

Since we don't know or may not care SSE4 machines without AVX,
should we make it to AVX only?

> * benchmark result before this commit
> |            | roundeven     | roundevenf   |
> |------------|---------------|--------------|
> | duration   | 3.77783e+09   | 3.77792e+09  |
> | iterations | 3.75706e+08   | 3.80448e+08  |
> | max        | 158.498       | 88.539       |
> | min        | 6.802         | 7.676        |
> | mean       | 10.0553       | 9.93018      |
>
> * benchmark result after this commit
> |            | roundeven     | roundevenf    |
> |------------|---------------|---------------|
> | duration   | 3.77242e+09   | 3.77238e+09   |
> | iterations | 5.18681e+08   | 5.2425e+08    |
> | max        | 127.338       | 172.102       |
> | min        | 7.03          | 7.03          |
> | mean       | 7.27311       | 7.19577       |
> ---
>  sysdeps/x86_64/fpu/multiarch/Makefile         |  5 +--
>  sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c  |  2 ++
>  .../x86_64/fpu/multiarch/s_roundeven-sse4_1.S | 26 ++++++++++++++++
>  sysdeps/x86_64/fpu/multiarch/s_roundeven.c    | 31 +++++++++++++++++++
>  sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c |  3 ++
>  .../fpu/multiarch/s_roundevenf-sse4_1.S       | 26 ++++++++++++++++
>  sysdeps/x86_64/fpu/multiarch/s_roundevenf.c   | 31 +++++++++++++++++++
>  7 files changed, 122 insertions(+), 2 deletions(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundeven.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
>
> diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
> index 3836574f48..7e3a3f78cb 100644
> --- a/sysdeps/x86_64/fpu/multiarch/Makefile
> +++ b/sysdeps/x86_64/fpu/multiarch/Makefile
> @@ -1,11 +1,12 @@
>  ifeq ($(subdir),math)
>  libm-sysdep_routines += s_floor-c s_ceil-c s_floorf-c s_ceilf-c \
>                          s_rint-c s_rintf-c s_nearbyint-c s_nearbyintf-c \
> -                        s_trunc-c s_truncf-c
> +                        s_roundeven-c s_roundevenf-c s_trunc-c s_truncf-c
>
>  libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
>                          s_floorf-sse4_1 s_nearbyint-sse4_1 \
> -                        s_nearbyintf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
> +                        s_nearbyintf-sse4_1 s_roundeven-sse4_1 \
> +                        s_roundevenf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
>                          s_trunc-sse4_1 s_truncf-sse4_1
>
>  libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
> diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> new file mode 100644
> index 0000000000..c7be43cb22
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
> @@ -0,0 +1,2 @@
> +#define __roundeven __roundeven_c
> +#include <sysdeps/ieee754/dbl-64/s_roundeven.c>
> diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> new file mode 100644
> index 0000000000..6db88a1649
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
> @@ -0,0 +1,26 @@
> +/* Round to nearest integer value, rounding halfway cases to even.
> +   double version.
> +   Copyright (C) 2019 Free Software Foundation, Inc.

Please replace all 2019 with 2020.
* H. J. Lu via Libc-alpha:

>> Here is a benchmark result on my AMD Ryzen 9 3900X system:
>
> Since we don't know or may not care SSE4 machines without AVX,
> should we make it to AVX only?

What about Goldmont/Tremont? Those are current CPUs which do not
support AVX, but I think they have sufficient SSE4 support levels for
this change.

Thanks,
Florian
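To make the SSE4.1-versus-AVX question concrete: the CPU reports the two features independently, so a selector keyed on SSE4.1 alone would still take the fast path on AVX-less cores such as the Goldmont/Tremont parts mentioned above. The following is only a sketch and not glibc's ifunc selector. The names `cpu_has_sse41`, `pick_round_impl`, and `impl_name` are hypothetical, and it assumes GCC or Clang, whose `__builtin_cpu_supports` builtin is available on x86 targets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tags for the two variants discussed in the thread. */
enum round_impl { ROUND_IMPL_C, ROUND_IMPL_SSE41 };

/* Ask the compiler runtime whether the CPU reports SSE4.1.
   Assumption: GCC/Clang on x86; other targets simply report "no". */
static bool cpu_has_sse41(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") != 0;
#else
    return false;
#endif
}

/* Key the choice on SSE4.1 only; AVX is deliberately not consulted,
   so AVX-less SSE4.1 CPUs still get the hardware variant. */
static enum round_impl pick_round_impl(bool has_sse41)
{
    return has_sse41 ? ROUND_IMPL_SSE41 : ROUND_IMPL_C;
}

/* Human-readable name, useful when logging which path was taken. */
static const char *impl_name(enum round_impl impl)
{
    assert(impl == ROUND_IMPL_C || impl == ROUND_IMPL_SSE41);
    return impl == ROUND_IMPL_SSE41 ? "sse4.1" : "generic C";
}
```

Calling `impl_name(pick_round_impl(cpu_has_sse41()))` reports which variant this sketch would choose on the current machine. Requiring AVX instead would amount to adding a second feature test to `pick_round_impl`, which is exactly what would exclude the Atom-class cores.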
On Thu, May 28, 2020 at 5:22 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> >> Here is a benchmark result on my AMD Ryzen 9 3900X system:
> >
> > Since we don't know or may not care SSE4 machines without AVX,
> > should we make it to AVX only?
>
> What about Goldmont/Tremont? Those are current CPUs which do not
> support AVX, but I think they have sufficient SSE4 support levels for
> this change.
>

Good point. Lili, please collect glibc micro benchmark
roundeven/roundevenf data before and after:

https://sourceware.org/pipermail/libc-alpha/2020-May/113533.html

on Tremont.
> -----Original Message-----
> From: H.J. Lu <hjl.tools@gmail.com>
> Sent: Thursday, May 28, 2020 8:32 PM
> To: Florian Weimer <fweimer@redhat.com>; Cui, Lili <lili.cui@intel.com>
> Cc: H.J. Lu via Libc-alpha <libc-alpha@sourceware.org>; Shen-Ta Hsieh
> <ibmibmibm.tw@gmail.com>
> Subject: Re: [PATCH v4 2/2] x86_64: roundeven with sse4.1 support
>
> On Thu, May 28, 2020 at 5:22 AM Florian Weimer <fweimer@redhat.com>
> wrote:
> >
> > * H. J. Lu via Libc-alpha:
> >
> > >> Here is a benchmark result on my AMD Ryzen 9 3900X system:
> > >
> > > Since we don't know or may not care SSE4 machines without AVX,
> > > should we make it to AVX only?
> >
> > What about Goldmont/Tremont? Those are current CPUs which do not
> > support AVX, but I think they have sufficient SSE4 support levels for
> > this change.
> >
>
> Good point. Lili, please collect glibc micro benchmark roundeven/roundevenf
> data before and after:
>
> https://sourceware.org/pipermail/libc-alpha/2020-May/113533.html
>
> on Tremont.
>
> --
> H.J.

Hi H.J,

Result is here.

benchmark result before this commit on Tremont

benchmark result after this commit on Tremont
On Fri, May 29, 2020 at 1:48 AM Cui, Lili <lili.cui@intel.com> wrote:
>
> > -----Original Message-----
> > From: H.J. Lu <hjl.tools@gmail.com>
> > Sent: Thursday, May 28, 2020 8:32 PM
> > To: Florian Weimer <fweimer@redhat.com>; Cui, Lili <lili.cui@intel.com>
> > Cc: H.J. Lu via Libc-alpha <libc-alpha@sourceware.org>; Shen-Ta Hsieh
> > <ibmibmibm.tw@gmail.com>
> > Subject: Re: [PATCH v4 2/2] x86_64: roundeven with sse4.1 support
> >
> > On Thu, May 28, 2020 at 5:22 AM Florian Weimer <fweimer@redhat.com>
> > wrote:
> > >
> > > * H. J. Lu via Libc-alpha:
> > >
> > > >> Here is a benchmark result on my AMD Ryzen 9 3900X system:
> > > >
> > > > Since we don't know or may not care SSE4 machines without AVX,
> > > > should we make it to AVX only?
> > >
> > > What about Goldmont/Tremont? Those are current CPUs which do not
> > > support AVX, but I think they have sufficient SSE4 support levels for
> > > this change.
> > >
> >
> > Good point. Lili, please collect glibc micro benchmark roundeven/roundevenf
> > data before and after:
> >
> > https://sourceware.org/pipermail/libc-alpha/2020-May/113533.html
> >
> > on Tremont.
> >
> > --
> > H.J.
>
> Hi H.J,
>
> Result is here.
> benchmark result before this commit on Tremont
>
> benchmark result after this commit on Tremont
>

Hi Lili,

The results are empty.
From: H.J. Lu <hjl.tools@gmail.com>
Sent: Friday, May 29, 2020 7:30 PM
To: Cui, Lili <lili.cui@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>; H.J. Lu via Libc-alpha <libc-alpha@sourceware.org>; Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Subject: Re: [PATCH v4 2/2] x86_64: roundeven with sse4.1 support

On Fri, May 29, 2020 at 1:48 AM Cui, Lili <lili.cui@intel.com> wrote:

> > -----Original Message-----
> > From: H.J. Lu <hjl.tools@gmail.com>
> > Sent: Thursday, May 28, 2020 8:32 PM
> > To: Florian Weimer <fweimer@redhat.com>; Cui, Lili <lili.cui@intel.com>
> > Cc: H.J. Lu via Libc-alpha <libc-alpha@sourceware.org>; Shen-Ta Hsieh
> > <ibmibmibm.tw@gmail.com>
> > Subject: Re: [PATCH v4 2/2] x86_64: roundeven with sse4.1 support
> >
> > On Thu, May 28, 2020 at 5:22 AM Florian Weimer <fweimer@redhat.com>
> > wrote:
> > >
> > > * H. J. Lu via Libc-alpha:
> > >
> > > >> Here is a benchmark result on my AMD Ryzen 9 3900X system:
> > > >
> > > > Since we don't know or may not care SSE4 machines without AVX,
> > > > should we make it to AVX only?
> > >
> > > What about Goldmont/Tremont? Those are current CPUs which do not
> > > support AVX, but I think they have sufficient SSE4 support levels for
> > > this change.
> > >
> >
> > Good point. Lili, please collect glibc micro benchmark roundeven/roundevenf
> > data before and after:
> >
> > https://sourceware.org/pipermail/libc-alpha/2020-May/113533.html
> >
> > on Tremont.
> >
> > --
> > H.J.
>
> Hi H.J,
>
> Result is here.
> benchmark result before this commit on Tremont
>
> benchmark result after this commit on Tremont

Hi Lili,

The results are empty.

--
H.J.

Hi H.J,

Sorry for that my format has some problems, data is here.

benchmark result before this commit on Tremont

"roundeven":                   "roundevenf":
"duration": 2.19422e+09,       "duration": 2.19402e+09,
"iterations": 1.44514e+08,     "iterations": 1.4184e+08,
"max": 43.258,                 "max": 53.07,
"min": 11.052,                 "min": 12.052,
"mean": 15.1835                "mean": 15.4683

benchmark result after this commit on Tremont

"roundeven":                   "roundevenf":
"duration": 2.19144e+09,       "duration": 2.19218e+09,
"iterations": 2.17075e+08,     "iterations": 1.97982e+08,
"max": 395.428,                "max": 34.928,
"min": 10.044,                 "min": 11.02,
"mean": 10.0953                "mean": 11.0726

Thanks,
Lili.
On Sun, May 31, 2020 at 6:28 PM Cui, Lili <lili.cui@intel.com> wrote:
>
> From: H.J. Lu <hjl.tools@gmail.com>
> Sent: Friday, May 29, 2020 7:30 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Florian Weimer <fweimer@redhat.com>; H.J. Lu via Libc-alpha <libc-alpha@sourceware.org>; Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
> Subject: Re: [PATCH v4 2/2] x86_64: roundeven with sse4.1 support
>
> On Fri, May 29, 2020 at 1:48 AM Cui, Lili <lili.cui@intel.com> wrote:
>
> > > -----Original Message-----
> > > From: H.J. Lu <hjl.tools@gmail.com>
> > > Sent: Thursday, May 28, 2020 8:32 PM
> > > To: Florian Weimer <fweimer@redhat.com>; Cui, Lili <lili.cui@intel.com>
> > > Cc: H.J. Lu via Libc-alpha <libc-alpha@sourceware.org>; Shen-Ta Hsieh
> > > <ibmibmibm.tw@gmail.com>
> > > Subject: Re: [PATCH v4 2/2] x86_64: roundeven with sse4.1 support
> > >
> > > On Thu, May 28, 2020 at 5:22 AM Florian Weimer <fweimer@redhat.com>
> > > wrote:
> > > >
> > > > * H. J. Lu via Libc-alpha:
> > > >
> > > > >> Here is a benchmark result on my AMD Ryzen 9 3900X system:
> > > > >
> > > > > Since we don't know or may not care SSE4 machines without AVX,
> > > > > should we make it to AVX only?
> > > >
> > > > What about Goldmont/Tremont? Those are current CPUs which do not
> > > > support AVX, but I think they have sufficient SSE4 support levels for
> > > > this change.
> > > >
> > >
> > > Good point. Lili, please collect glibc micro benchmark roundeven/roundevenf
> > > data before and after:
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2020-May/113533.html
> > >
> > > on Tremont.
> > >
> > > --
> > > H.J.
> >
> > Hi H.J,
> >
> > Result is here.
> > benchmark result before this commit on Tremont
> >
> > benchmark result after this commit on Tremont
>
> Hi Lili,
>
> The results are empty.
>
> --
> H.J.
>
> Hi H.J,
>
> Sorry for that my format has some problems, data is here.
>
> benchmark result before this commit on Tremont
>
> "roundeven":                   "roundevenf":
> "duration": 2.19422e+09,       "duration": 2.19402e+09,
> "iterations": 1.44514e+08,     "iterations": 1.4184e+08,
> "max": 43.258,                 "max": 53.07,
> "min": 11.052,                 "min": 12.052,
> "mean": 15.1835                "mean": 15.4683
>
> benchmark result after this commit on Tremont
>
> "roundeven":                   "roundevenf":
> "duration": 2.19144e+09,       "duration": 2.19218e+09,
> "iterations": 2.17075e+08,     "iterations": 1.97982e+08,
> "max": 395.428,                "max": 34.928,
> "min": 10.044,                 "min": 11.02,
> "mean": 10.0953                "mean": 11.0726
>

Looks good.

Thanks.
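To put the reported means side by side, the relative improvement can be computed as (before - after) / before. The helper below is only an illustration. `percent_reduction` is a hypothetical name, and the inputs are the "mean" values quoted in this thread, in whatever time unit the benchmark reports.

```c
#include <assert.h>

/* Percentage by which the mean per-call time dropped.
   Positive means the "after" run was faster. */
static double percent_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

/* Mean values copied from the thread: before and after the commit. */
static const double tremont_roundeven[2]  = { 15.1835, 10.0953 };
static const double tremont_roundevenf[2] = { 15.4683, 11.0726 };
static const double ryzen_roundeven[2]    = { 10.0553, 7.27311 };
static const double ryzen_roundevenf[2]   = { 9.93018, 7.19577 };
```

By this arithmetic the mean drops by roughly 33.5% (roundeven) and 28.4% (roundevenf) on Tremont, and by roughly 27.7% and 27.5% on the Ryzen 9 3900X. The "max" column is much noisier (for example 395.428 after the commit on Tremont), so the mean is the more useful figure here.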
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index 3836574f48..7e3a3f78cb 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -1,11 +1,12 @@
 ifeq ($(subdir),math)
 libm-sysdep_routines += s_floor-c s_ceil-c s_floorf-c s_ceilf-c \
                         s_rint-c s_rintf-c s_nearbyint-c s_nearbyintf-c \
-                        s_trunc-c s_truncf-c
+                        s_roundeven-c s_roundevenf-c s_trunc-c s_truncf-c
 
 libm-sysdep_routines += s_ceil-sse4_1 s_ceilf-sse4_1 s_floor-sse4_1 \
                         s_floorf-sse4_1 s_nearbyint-sse4_1 \
-                        s_nearbyintf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
+                        s_nearbyintf-sse4_1 s_roundeven-sse4_1 \
+                        s_roundevenf-sse4_1 s_rint-sse4_1 s_rintf-sse4_1 \
                         s_trunc-sse4_1 s_truncf-sse4_1
 
 libm-sysdep_routines += e_exp-fma e_log-fma e_pow-fma s_atan-fma \
diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
new file mode 100644
index 0000000000..c7be43cb22
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-c.c
@@ -0,0 +1,2 @@
+#define __roundeven __roundeven_c
+#include <sysdeps/ieee754/dbl-64/s_roundeven.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
new file mode 100644
index 0000000000..6db88a1649
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven-sse4_1.S
@@ -0,0 +1,26 @@
+/* Round to nearest integer value, rounding halfway cases to even.
+   double version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.section .text.sse4.1,"ax",@progbits
+ENTRY(__roundeven_sse41)
+	roundsd	$8, %xmm0, %xmm0
+	ret
+END(__roundeven_sse41)
diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundeven.c b/sysdeps/x86_64/fpu/multiarch/s_roundeven.c
new file mode 100644
index 0000000000..bd777b0ca7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/s_roundeven.c
@@ -0,0 +1,31 @@
+/* Multiple versions of __roundeven.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <libm-alias-double.h>
+
+#define roundeven __redirect_roundeven
+#define __roundeven __redirect___roundeven
+#include <math.h>
+#undef roundeven
+#undef __roundeven
+
+#define SYMBOL_NAME roundeven
+#include "ifunc-sse4_1.h"
+
+libc_ifunc_redirected (__redirect_roundeven, __roundeven, IFUNC_SELECTOR ());
+libm_alias_double (__roundeven, roundeven)
diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
new file mode 100644
index 0000000000..72a6e7d1fb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-c.c
@@ -0,0 +1,3 @@
+#undef __roundevenf
+#define __roundevenf __roundevenf_c
+#include <sysdeps/ieee754/flt-32/s_roundevenf.c>
diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
new file mode 100644
index 0000000000..74102bac0d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf-sse4_1.S
@@ -0,0 +1,26 @@
+/* Round to nearest integer value, rounding halfway cases to even.
+   float version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.section .text.sse4.1,"ax",@progbits
+ENTRY(__roundevenf_sse41)
+	roundss	$8, %xmm0, %xmm0
+	ret
+END(__roundevenf_sse41)
diff --git a/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c b/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
new file mode 100644
index 0000000000..8ae1944d2b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/s_roundevenf.c
@@ -0,0 +1,31 @@
+/* Multiple versions of __roundevenf.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <libm-alias-float.h>
+
+#define roundevenf __redirect_roundevenf
+#define __roundevenf __redirect___roundevenf
+#include <math.h>
+#undef roundevenf
+#undef __roundevenf
+
+#define SYMBOL_NAME roundevenf
+#include "ifunc-sse4_1.h"
+
+libc_ifunc_redirected (__redirect_roundevenf, __roundevenf, IFUNC_SELECTOR ());
+libm_alias_float (__roundeven, roundeven)
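For readers unfamiliar with the instruction used in the two `.S` files: in `roundsd`/`roundss`, immediate 8 selects round-to-nearest with ties to even (rounding-control bits 00) and sets bit 3, which suppresses the inexact exception. The portable sketch below illustrates only those rounding semantics. It is not the glibc C fallback and not part of the patch; `roundeven_sketch` and `roundeven_sketch_selfcheck` are hypothetical names. It assumes IEEE 754 binary64 `double`, the default round-to-nearest mode, and arithmetic carried out in double precision (as on x86_64 with SSE2).

```c
#include <assert.h>

/* Round x to the nearest integer, halfway cases to even.
   Illustration only: relies on the default rounding mode and on
   adding 2^52, above which doubles have no fractional bits. */
static double roundeven_sketch(double x)
{
    const double two52 = 4503599627370496.0;   /* 2^52 */

    if (x != x || x == 0.0)                    /* NaN or +/-0: unchanged */
        return x;

    double ax = x < 0.0 ? -x : x;
    if (ax >= two52)                           /* already integral, or inf */
        return x;

    /* volatile keeps the compiler from simplifying (ax + 2^52) - 2^52. */
    volatile double t = ax + two52;            /* rounds ties to even */
    double r = t - two52;
    return x < 0.0 ? -r : r;
}

/* Tie cases go to the even neighbour; other values round normally. */
static void roundeven_sketch_selfcheck(void)
{
    assert(roundeven_sketch(0.5) == 0.0);
    assert(roundeven_sketch(1.5) == 2.0);
    assert(roundeven_sketch(2.5) == 2.0);
    assert(roundeven_sketch(-2.5) == -2.0);
    assert(roundeven_sketch(2.6) == 3.0);
}
```

The hardware instruction does all of this in a single operation, which is why the SSE4.1 variants in the patch consist of one rounding instruction followed by `ret`.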