Message ID | 20200303182038.6873-1-msc@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | powerpc: Fix feraiseexcept and feclearexcept macros | expand |
On 03/03/2020 15:20, Matheus Castanho wrote: > A recent change to fenvinline.h modified the check if __e is a > a power of 2 inside feraiseexcept and feclearexcept macros. It > introduced the use of the powerof2 macro but also removed the > if statement checking whether __e != 0 before issuing an mtfsb* > instruction. This is problematic because powerof2 (0) evaluates > to 1 and without the removed if __e is allowed to be 0 when > __builtin_clz is called. In that case the value 32 is passed > to __MTFSB*, which is invalid. > > This commit uses __builtin_popcount instead of powerof2 to fix this > issue and avoid the extra check for __e != 0. This was the approach > used by the initial versions of that previous patch. This code is becoming convoluted and I think these micro-optimization are hardly wildly used and even more being a possible hotspot in realword cases (non-default rounding are used only on specific cases and excepting handling are done most likely only on exceptions cases). So, I think we should just remove such optimization as we have done for string{2,3}.h and leverage the compiler with some builtin if it were the case. It would allow also remove one extra unused installed header file, since it seems that only powerpc push for this kind of optimization. For the patch itself, I don't have a strong opinion. Either adding the '__e == 0' check back or using your suggestion would be fine. > --- > sysdeps/powerpc/bits/fenvinline.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/sysdeps/powerpc/bits/fenvinline.h b/sysdeps/powerpc/bits/fenvinline.h > index 89ea38a4f8..f2d095a72f 100644 > --- a/sysdeps/powerpc/bits/fenvinline.h > +++ b/sysdeps/powerpc/bits/fenvinline.h > @@ -75,7 +75,7 @@ > int __e = __excepts; \ > int __ret = 0; \ > if (__builtin_constant_p (__e) \ > - && powerof2 (__e) \ > + && __builtin_popcount (__e) == 1 \ > && __e != FE_INVALID) \ > { \ > __MTFSB1 ((__builtin_clz (__e))); \ > @@ -91,7 +91,7 @@ > int __e = __excepts; \ > int __ret = 0; \ > if (__builtin_constant_p (__e) \ > - && powerof2 (__e) \ > + && __builtin_popcount (__e) == 1 \ > && __e != FE_INVALID) \ > { \ > __MTFSB0 ((__builtin_clz (__e))); \ >
* Adhemerval Zanella: > On 03/03/2020 15:20, Matheus Castanho wrote: >> A recent change to fenvinline.h modified the check if __e is a >> a power of 2 inside feraiseexcept and feclearexcept macros. It >> introduced the use of the powerof2 macro but also removed the >> if statement checking whether __e != 0 before issuing an mtfsb* >> instruction. This is problematic because powerof2 (0) evaluates >> to 1 and without the removed if __e is allowed to be 0 when >> __builtin_clz is called. In that case the value 32 is passed >> to __MTFSB*, which is invalid. >> >> This commit uses __builtin_popcount instead of powerof2 to fix this >> issue and avoid the extra check for __e != 0. This was the approach >> used by the initial versions of that previous patch. Sorry about that. I convinced myself that the change was valid, even after remembering that there was something weird with the powerof2 macro. > This code is becoming convoluted and I think these micro-optimization > are hardly wildly used and even more being a possible hotspot in > realword cases (non-default rounding are used only on specific cases > and excepting handling are done most likely only on exceptions cases). It's not an optimization because it's a compile-time-evaluated expression. The question is how to express this in a succinct way, now that the powerof2 macro is out. __builtin_popcount (e) == 1 does not seem to be unreasonable to express this condition. Or do you object to the existence of the feclearexcept macro as a whole? Thanks, Florian
On 04/03/2020 06:24, Florian Weimer wrote: > * Adhemerval Zanella: > >> On 03/03/2020 15:20, Matheus Castanho wrote: >>> A recent change to fenvinline.h modified the check if __e is a >>> a power of 2 inside feraiseexcept and feclearexcept macros. It >>> introduced the use of the powerof2 macro but also removed the >>> if statement checking whether __e != 0 before issuing an mtfsb* >>> instruction. This is problematic because powerof2 (0) evaluates >>> to 1 and without the removed if __e is allowed to be 0 when >>> __builtin_clz is called. In that case the value 32 is passed >>> to __MTFSB*, which is invalid. >>> >>> This commit uses __builtin_popcount instead of powerof2 to fix this >>> issue and avoid the extra check for __e != 0. This was the approach >>> used by the initial versions of that previous patch. > > Sorry about that. I convinced myself that the change was valid, even > after remembering that there was something weird with the powerof2 > macro. > >> This code is becoming convoluted and I think these micro-optimization >> are hardly wildly used and even more being a possible hotspot in >> realword cases (non-default rounding are used only on specific cases >> and excepting handling are done most likely only on exceptions cases). > > It's not an optimization because it's a compile-time-evaluated > expression. The question is how to express this in a succinct way, now > that the powerof2 macro is out. __builtin_popcount (e) == 1 does not > seem to be unreasonable to express this condition. > > Or do you object to the existence of the feclearexcept macro as a whole? I mean the fenvinline.h altogether and the macro dance to avoid function calls. It is similar to the ones on string{2,3}.h macros to optimize some function calls and I think it would be worth to avoid such optimization on glibc side and try to leverage them on compiler (as it does for various math functions with builtins).
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: > I mean the fenvinline.h altogether and the macro dance to avoid function > calls. It is similar to the ones on string{2,3}.h macros to optimize > some function calls and I think it would be worth to avoid such optimization > on glibc side and try to leverage them on compiler (as it does for various > math functions with builtins). That's a nice proposal in the long term, but it's better to involve the GCC folks (Segher, Mike and Bill) before taking any decision. The proposal here is to provide an inline expansion code in GCC for fegetround(), feraiseexcept() and feclearexcept().
> __builtin_popcount (e) == 1 does not > seem to be unreasonable to express this condition. > Adhemerval and Florian, apart from the discussion to remove or not fenvinline.h altogether, could we push this commit? This header is currently in a broken state and I'd like to fix it for the time being. If you have any other suggestions / reservations about using __builtin_popcount (__e) == 1, I'd be happy to consider and provide an alternative patch if needed. Thanks, Matheus Castanho
* Matheus Castanho: >> __builtin_popcount (e) == 1 does not >> seem to be unreasonable to express this condition. >> > > Adhemerval and Florian, apart from the discussion to remove or not > fenvinline.h altogether, could we push this commit? This header is > currently in a broken state and I'd like to fix it for the time being. I think the patch is okay, but please also wait for Adhemerval's comments. I'd also be okay with powerof2 and a zero check. I agree that we should fix this soon. Thanks, Florian
On 04/03/2020 14:47, Florian Weimer wrote: > * Matheus Castanho: > >>> __builtin_popcount (e) == 1 does not >>> seem to be unreasonable to express this condition. >>> >> >> Adhemerval and Florian, apart from the discussion to remove or not >> fenvinline.h altogether, could we push this commit? This header is >> currently in a broken state and I'd like to fix it for the time being. > > I think the patch is okay, but please also wait for Adhemerval's > comments. > > I'd also be okay with powerof2 and a zero check. I agree that we should > fix this soon. I am fine with this patch as is, since either solution is fine. I will probably send a patch to just remove the fenvinline to start the discussion whether it is worth to keep continuing provide such optimization.
On 3/4/20 3:30 PM, Adhemerval Zanella wrote: > > > On 04/03/2020 14:47, Florian Weimer wrote: >> * Matheus Castanho: >> >>>> __builtin_popcount (e) == 1 does not >>>> seem to be unreasonable to express this condition. >>>> >>> >>> Adhemerval and Florian, apart from the discussion to remove or not >>> fenvinline.h altogether, could we push this commit? This header is >>> currently in a broken state and I'd like to fix it for the time being. >> >> I think the patch is okay, but please also wait for Adhemerval's >> comments. >> >> I'd also be okay with powerof2 and a zero check. I agree that we should >> fix this soon. > > I am fine with this patch as is, since either solution is fine. I will > probably send a patch to just remove the fenvinline to start the > discussion whether it is worth to keep continuing provide such > optimization. Sounds good to me. Thanks, Matheus Castanho
On Wed, Mar 04, 2020 at 01:06:18PM -0300, Tulio Magno Quites Machado Filho wrote: > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: > > > I mean the fenvinline.h altogether and the macro dance to avoid function > > calls. It is similar to the ones on string{2,3}.h macros to optimize > > some function calls and I think it would be worth to avoid such optimization > > on glibc side and try to leverage them on compiler (as it does for various > > math functions with builtins). > > That's a nice proposal in the long term, but it's better to involve the GCC > folks (Segher, Mike and Bill) before taking any decision. > > The proposal here is to provide an inline expansion code in GCC for > fegetround(), feraiseexcept() and feclearexcept(). Yes, please implement these as compiler builtins! Segher
On Wed, 4 Mar 2020, Tulio Magno Quites Machado Filho wrote: > The proposal here is to provide an inline expansion code in GCC for > fegetround(), feraiseexcept() and feclearexcept(). Generically in GCC, providing such inlines can run into different OSes making different choices for the FE_* values. But it appears from the existing TARGET_ATOMIC_ASSIGN_EXPAND_FENV implementation that this wasn't an issue on powerpc for the exception flags at least (it *was* an issue on SPARC, see SPARC_LOW_FE_EXCEPT_VALUES in GCC).
Matheus Castanho <msc@linux.ibm.com> writes: > A recent change to fenvinline.h modified the check if __e is a > a power of 2 inside feraiseexcept and feclearexcept macros. It > introduced the use of the powerof2 macro but also removed the > if statement checking whether __e != 0 before issuing an mtfsb* > instruction. This is problematic because powerof2 (0) evaluates > to 1 and without the removed if __e is allowed to be 0 when > __builtin_clz is called. In that case the value 32 is passed > to __MTFSB*, which is invalid. > > This commit uses __builtin_popcount instead of powerof2 to fix this > issue and avoid the extra check for __e != 0. This was the approach > used by the initial versions of that previous patch. Pushed as 1c252f0e7e5d78695f19450aa7c470bab445aa8e. Thanks!
diff --git a/sysdeps/powerpc/bits/fenvinline.h b/sysdeps/powerpc/bits/fenvinline.h index 89ea38a4f8..f2d095a72f 100644 --- a/sysdeps/powerpc/bits/fenvinline.h +++ b/sysdeps/powerpc/bits/fenvinline.h @@ -75,7 +75,7 @@ int __e = __excepts; \ int __ret = 0; \ if (__builtin_constant_p (__e) \ - && powerof2 (__e) \ + && __builtin_popcount (__e) == 1 \ && __e != FE_INVALID) \ { \ __MTFSB1 ((__builtin_clz (__e))); \ @@ -91,7 +91,7 @@ int __e = __excepts; \ int __ret = 0; \ if (__builtin_constant_p (__e) \ - && powerof2 (__e) \ + && __builtin_popcount (__e) == 1 \ && __e != FE_INVALID) \ { \ __MTFSB0 ((__builtin_clz (__e))); \