Message ID | 20201222153039.17722-1-rzinsly@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | powerpc: Add optimized ilogbf128 for POWER9 | expand |
Benchtests results with and without this patch on a POWER9: without: "ilogbf128": { "subnormal": { "duration": 5.09834e+08, "iterations": 2.8146e+07, "max": 38.979, "min": 2.939, "mean": 18.1139 }, "normal": { "duration": 4.99378e+08, "iterations": 1.6151e+08, "max": 16.698, "min": 2.942, "mean": 3.09193 } } with: "ilogbf128": { "subnormal": { "duration": 5.09989e+08, "iterations": 2.5978e+07, "max": 41.027, "min": 4.674, "mean": 19.6316 }, "normal": { "duration": 4.98105e+08, "iterations": 1.77912e+08, "max": 12.663, "min": 2.792, "mean": 2.79972 } } Best Regards,
On 12/22/20 9:30 AM, Raphael Moreira Zinsly via Libc-alpha wrote: > The instruction xsxexpqp introduced on POWER9 extracts the exponent > from a quad-precision floating-point, thus it can be used to improve > ilogbf128 and llogbf128. > --- > .../powerpc/powerpc64/le/fpu/e_ilogbf128.c | 22 +++++++++++++++++++ > 1 file changed, 22 insertions(+) > create mode 100644 sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c > > diff --git a/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c > new file mode 100644 > index 0000000000..47558bbadc > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c > @@ -0,0 +1,22 @@ > +#ifdef _ARCH_PWR9 > +int _ilogbf128 (_Float128 __x); This should be a locally (static) scoped function. > + > +int > +#if defined(_F128_ENABLE_IFUNC) > +__ieee754_ilogbf128_power9 (_Float128 __x) > +#else > +__ieee754_ilogbf128 (_Float128 __x) > +#endif > +{ > + /* Check for exceptional cases. */ > + if (!__builtin_vsx_scalar_test_data_class_qp (__x, 0x7f)) > + return __builtin_vsx_scalar_extract_expq (__x) - 0x3fff; > + else > + /* Fallback to the generic ilogb if __x is NaN, Inf or subnormal. */ > + return _ilogbf128(__x); > +} > + > +#define __ieee754_ilogbf128 _ilogbf128 > +#endif > + > +#include<sysdeps/ieee754/float128/e_ilogbf128.c> A space seems to be missing between include and <. Otherwise, LGTM. As a side note, I think the benchtests are not too impressive. I am surprised normal values don't show better results.
On 1/4/21 5:20 PM, Paul E Murphy via Libc-alpha wrote: > > > On 12/22/20 9:30 AM, Raphael Moreira Zinsly via Libc-alpha wrote: >> The instruction xsxexpqp introduced on POWER9 extracts the exponent >> from a quad-precision floating-point, thus it can be used to improve >> ilogbf128 and llogbf128. >> --- >> .../powerpc/powerpc64/le/fpu/e_ilogbf128.c | 22 +++++++++++++++++++ >> 1 file changed, 22 insertions(+) >> create mode 100644 sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c >> >> diff --git a/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c >> b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c >> new file mode 100644 >> index 0000000000..47558bbadc >> --- /dev/null >> +++ b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c >> @@ -0,0 +1,22 @@ >> +#ifdef _ARCH_PWR9 >> +int _ilogbf128 (_Float128 __x); > > This should be a locally (static) scoped function. > >> + >> +int >> +#if defined(_F128_ENABLE_IFUNC) >> +__ieee754_ilogbf128_power9 (_Float128 __x) >> +#else >> +__ieee754_ilogbf128 (_Float128 __x) >> +#endif >> +{ >> + /* Check for exceptional cases. */ >> + if (!__builtin_vsx_scalar_test_data_class_qp (__x, 0x7f)) >> + return __builtin_vsx_scalar_extract_expq (__x) - 0x3fff; >> + else >> + /* Fallback to the generic ilogb if __x is NaN, Inf or >> subnormal. */ >> + return _ilogbf128(__x); >> +} >> + >> +#define __ieee754_ilogbf128 _ilogbf128 >> +#endif >> + >> +#include<sysdeps/ieee754/float128/e_ilogbf128.c> > > A space seems to be missing between include and <. > > Otherwise, LGTM. > > As a side note, I think the benchtests are not too impressive. I am > surprised normal values don't show better results. After spending a little time looking at this, the call overhead of the wrapper is hiding most of the improvement. Similarly, power9 adds similar instructions for float32/float64. I would recommend refactoring this patch to provide an override to w_ilogb_template.c so all three formats can use these new instructions without the call overhead for normal numbers.
diff --git a/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c new file mode 100644 index 0000000000..47558bbadc --- /dev/null +++ b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c @@ -0,0 +1,22 @@ +#ifdef _ARCH_PWR9 +int _ilogbf128 (_Float128 __x); + +int +#if defined(_F128_ENABLE_IFUNC) +__ieee754_ilogbf128_power9 (_Float128 __x) +#else +__ieee754_ilogbf128 (_Float128 __x) +#endif +{ + /* Check for exceptional cases. */ + if (!__builtin_vsx_scalar_test_data_class_qp (__x, 0x7f)) + return __builtin_vsx_scalar_extract_expq (__x) - 0x3fff; + else + /* Fallback to the generic ilogb if __x is NaN, Inf or subnormal. */ + return _ilogbf128(__x); +} + +#define __ieee754_ilogbf128 _ilogbf128 +#endif + +#include<sysdeps/ieee754/float128/e_ilogbf128.c>