diff mbox series

powerpc: Fix feraiseexcept and feclearexcept macros

Message ID 20200303182038.6873-1-msc@linux.ibm.com
State New
Headers show
Series powerpc: Fix feraiseexcept and feclearexcept macros | expand

Commit Message

Matheus Castanho March 3, 2020, 6:20 p.m. UTC
A recent change to fenvinline.h modified the check if __e is a
a power of 2 inside feraiseexcept and feclearexcept macros.  It
introduced the use of the powerof2 macro but also removed the
if statement checking whether __e != 0 before issuing an mtfsb*
instruction.  This is problematic because powerof2 (0) evaluates
to 1 and without the removed if __e is allowed to be 0 when
__builtin_clz is called.  In that case the value 32 is passed
to __MTFSB*, which is invalid.

This commit uses __builtin_popcount instead of powerof2 to fix this
issue and avoid the extra check for __e != 0.  This was the approach
used by the initial versions of that previous patch.
---
 sysdeps/powerpc/bits/fenvinline.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Adhemerval Zanella March 3, 2020, 6:53 p.m. UTC | #1
On 03/03/2020 15:20, Matheus Castanho wrote:
> A recent change to fenvinline.h modified the check if __e is a
> a power of 2 inside feraiseexcept and feclearexcept macros.  It
> introduced the use of the powerof2 macro but also removed the
> if statement checking whether __e != 0 before issuing an mtfsb*
> instruction.  This is problematic because powerof2 (0) evaluates
> to 1 and without the removed if __e is allowed to be 0 when
> __builtin_clz is called.  In that case the value 32 is passed
> to __MTFSB*, which is invalid.
> 
> This commit uses __builtin_popcount instead of powerof2 to fix this
> issue and avoid the extra check for __e != 0.  This was the approach
> used by the initial versions of that previous patch.

This code is becoming convoluted and I think these micro-optimization
are hardly wildly used and even more being a possible hotspot in 
realword cases (non-default rounding are used only on specific cases 
and excepting handling are done most likely only on exceptions cases).

So, I think we should just remove such optimization as we have done for
string{2,3}.h and leverage the compiler with some builtin if it were 
the case. It would allow also remove one extra unused installed header
file, since it seems that only powerpc push for this kind of 
optimization.

For the patch itself, I don't have a strong opinion. Either
adding the '__e == 0' check back or using your suggestion would
be fine.

> ---
>  sysdeps/powerpc/bits/fenvinline.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/sysdeps/powerpc/bits/fenvinline.h b/sysdeps/powerpc/bits/fenvinline.h
> index 89ea38a4f8..f2d095a72f 100644
> --- a/sysdeps/powerpc/bits/fenvinline.h
> +++ b/sysdeps/powerpc/bits/fenvinline.h
> @@ -75,7 +75,7 @@
>      int __e = __excepts;						      \
>      int __ret = 0;							      \
>      if (__builtin_constant_p (__e)					      \
> -        && powerof2 (__e)						      \
> +        && __builtin_popcount (__e) == 1				      \
>          && __e != FE_INVALID)						      \
>        {									      \
>  	__MTFSB1 ((__builtin_clz (__e)));				      \
> @@ -91,7 +91,7 @@
>      int __e = __excepts;						      \
>      int __ret = 0;							      \
>      if (__builtin_constant_p (__e)					      \
> -        && powerof2 (__e)						      \
> +        && __builtin_popcount (__e) == 1				      \
>          && __e != FE_INVALID)						      \
>        {									      \
>  	__MTFSB0 ((__builtin_clz (__e)));				      \
>
Florian Weimer March 4, 2020, 9:24 a.m. UTC | #2
* Adhemerval Zanella:

> On 03/03/2020 15:20, Matheus Castanho wrote:
>> A recent change to fenvinline.h modified the check if __e is a
>> a power of 2 inside feraiseexcept and feclearexcept macros.  It
>> introduced the use of the powerof2 macro but also removed the
>> if statement checking whether __e != 0 before issuing an mtfsb*
>> instruction.  This is problematic because powerof2 (0) evaluates
>> to 1 and without the removed if __e is allowed to be 0 when
>> __builtin_clz is called.  In that case the value 32 is passed
>> to __MTFSB*, which is invalid.
>> 
>> This commit uses __builtin_popcount instead of powerof2 to fix this
>> issue and avoid the extra check for __e != 0.  This was the approach
>> used by the initial versions of that previous patch.

Sorry about that.  I convinced myself that the change was valid, even
after remembering that there was something weird with the powerof2
macro.

> This code is becoming convoluted and I think these micro-optimization
> are hardly wildly used and even more being a possible hotspot in 
> realword cases (non-default rounding are used only on specific cases 
> and excepting handling are done most likely only on exceptions cases).

It's not an optimization because it's a compile-time-evaluated
expression.  The question is how to express this in a succinct way, now
that the powerof2 macro is out.  __builtin_popcount (e) == 1 does not
seem to be unreasonable to express this condition.

Or do you object to the existence of the feclearexcept macro as a whole?

Thanks,
Florian
Adhemerval Zanella March 4, 2020, 11:57 a.m. UTC | #3
On 04/03/2020 06:24, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> On 03/03/2020 15:20, Matheus Castanho wrote:
>>> A recent change to fenvinline.h modified the check if __e is a
>>> a power of 2 inside feraiseexcept and feclearexcept macros.  It
>>> introduced the use of the powerof2 macro but also removed the
>>> if statement checking whether __e != 0 before issuing an mtfsb*
>>> instruction.  This is problematic because powerof2 (0) evaluates
>>> to 1 and without the removed if __e is allowed to be 0 when
>>> __builtin_clz is called.  In that case the value 32 is passed
>>> to __MTFSB*, which is invalid.
>>>
>>> This commit uses __builtin_popcount instead of powerof2 to fix this
>>> issue and avoid the extra check for __e != 0.  This was the approach
>>> used by the initial versions of that previous patch.
> 
> Sorry about that.  I convinced myself that the change was valid, even
> after remembering that there was something weird with the powerof2
> macro.
> 
>> This code is becoming convoluted and I think these micro-optimization
>> are hardly wildly used and even more being a possible hotspot in 
>> realword cases (non-default rounding are used only on specific cases 
>> and excepting handling are done most likely only on exceptions cases).
> 
> It's not an optimization because it's a compile-time-evaluated
> expression.  The question is how to express this in a succinct way, now
> that the powerof2 macro is out.  __builtin_popcount (e) == 1 does not
> seem to be unreasonable to express this condition.
> 
> Or do you object to the existence of the feclearexcept macro as a whole?

I mean the fenvinline.h altogether and the macro dance to avoid function
calls.  It is similar to the ones on string{2,3}.h macros to optimize
some function calls and I think it would be worth to avoid such optimization
on glibc side and try to leverage them on compiler (as it does for various
math functions with builtins).
Tulio Magno Quites Machado Filho March 4, 2020, 4:06 p.m. UTC | #4
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> I mean the fenvinline.h altogether and the macro dance to avoid function
> calls.  It is similar to the ones on string{2,3}.h macros to optimize
> some function calls and I think it would be worth to avoid such optimization
> on glibc side and try to leverage them on compiler (as it does for various
> math functions with builtins).

That's a nice proposal in the long term, but it's better to involve the GCC
folks (Segher, Mike and Bill) before taking any decision.

The proposal here is to provide an inline expansion code in GCC for
fegetround(), feraiseexcept() and feclearexcept().
Matheus Castanho March 4, 2020, 5:25 p.m. UTC | #5
> __builtin_popcount (e) == 1 does not
> seem to be unreasonable to express this condition.
> 

Adhemerval and Florian, apart from the discussion to remove or not
fenvinline.h altogether, could we push this commit? This header is
currently in a broken state and I'd like to fix it for the time being.

If you have any other suggestions / reservations about using
__builtin_popcount (__e) == 1, I'd be happy to consider and provide an
alternative patch if needed.

Thanks,
Matheus Castanho
Florian Weimer March 4, 2020, 5:47 p.m. UTC | #6
* Matheus Castanho:

>> __builtin_popcount (e) == 1 does not
>> seem to be unreasonable to express this condition.
>> 
>
> Adhemerval and Florian, apart from the discussion to remove or not
> fenvinline.h altogether, could we push this commit? This header is
> currently in a broken state and I'd like to fix it for the time being.

I think the patch is okay, but please also wait for Adhemerval's
comments.

I'd also be okay with powerof2 and a zero check.  I agree that we should
fix this soon.

Thanks,
Florian
Adhemerval Zanella March 4, 2020, 6:30 p.m. UTC | #7
On 04/03/2020 14:47, Florian Weimer wrote:
> * Matheus Castanho:
> 
>>> __builtin_popcount (e) == 1 does not
>>> seem to be unreasonable to express this condition.
>>>
>>
>> Adhemerval and Florian, apart from the discussion to remove or not
>> fenvinline.h altogether, could we push this commit? This header is
>> currently in a broken state and I'd like to fix it for the time being.
> 
> I think the patch is okay, but please also wait for Adhemerval's
> comments.
> 
> I'd also be okay with powerof2 and a zero check.  I agree that we should
> fix this soon.

I am fine with this patch as is, since either solution is fine. I will
probably send a patch to just remove the fenvinline to start the
discussion whether it is worth to keep continuing provide such
optimization.
Matheus Castanho March 4, 2020, 6:44 p.m. UTC | #8
On 3/4/20 3:30 PM, Adhemerval Zanella wrote:
> 
> 
> On 04/03/2020 14:47, Florian Weimer wrote:
>> * Matheus Castanho:
>>
>>>> __builtin_popcount (e) == 1 does not
>>>> seem to be unreasonable to express this condition.
>>>>
>>>
>>> Adhemerval and Florian, apart from the discussion to remove or not
>>> fenvinline.h altogether, could we push this commit? This header is
>>> currently in a broken state and I'd like to fix it for the time being.
>>
>> I think the patch is okay, but please also wait for Adhemerval's
>> comments.
>>
>> I'd also be okay with powerof2 and a zero check.  I agree that we should
>> fix this soon.
> 
> I am fine with this patch as is, since either solution is fine. I will
> probably send a patch to just remove the fenvinline to start the
> discussion whether it is worth to keep continuing provide such
> optimization.

Sounds good to me.

Thanks,
Matheus Castanho
Segher Boessenkool March 4, 2020, 9:06 p.m. UTC | #9
On Wed, Mar 04, 2020 at 01:06:18PM -0300, Tulio Magno Quites Machado Filho wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> 
> > I mean the fenvinline.h altogether and the macro dance to avoid function
> > calls.  It is similar to the ones on string{2,3}.h macros to optimize
> > some function calls and I think it would be worth to avoid such optimization
> > on glibc side and try to leverage them on compiler (as it does for various
> > math functions with builtins).
> 
> That's a nice proposal in the long term, but it's better to involve the GCC
> folks (Segher, Mike and Bill) before taking any decision.
> 
> The proposal here is to provide an inline expansion code in GCC for
> fegetround(), feraiseexcept() and feclearexcept().

Yes, please implement these as compiler builtins!


Segher
Joseph Myers March 5, 2020, 12:59 a.m. UTC | #10
On Wed, 4 Mar 2020, Tulio Magno Quites Machado Filho wrote:

> The proposal here is to provide an inline expansion code in GCC for
> fegetround(), feraiseexcept() and feclearexcept().

Generically in GCC, providing such inlines can run into different OSes 
making different choices for the FE_* values.  But it appears from the 
existing TARGET_ATOMIC_ASSIGN_EXPAND_FENV implementation that this wasn't 
an issue on powerpc for the exception flags at least (it *was* an issue on 
SPARC, see SPARC_LOW_FE_EXCEPT_VALUES in GCC).
Tulio Magno Quites Machado Filho March 6, 2020, 2:13 p.m. UTC | #11
Matheus Castanho <msc@linux.ibm.com> writes:

> A recent change to fenvinline.h modified the check if __e is a
> a power of 2 inside feraiseexcept and feclearexcept macros.  It
> introduced the use of the powerof2 macro but also removed the
> if statement checking whether __e != 0 before issuing an mtfsb*
> instruction.  This is problematic because powerof2 (0) evaluates
> to 1 and without the removed if __e is allowed to be 0 when
> __builtin_clz is called.  In that case the value 32 is passed
> to __MTFSB*, which is invalid.
>
> This commit uses __builtin_popcount instead of powerof2 to fix this
> issue and avoid the extra check for __e != 0.  This was the approach
> used by the initial versions of that previous patch.

Pushed as 1c252f0e7e5d78695f19450aa7c470bab445aa8e.

Thanks!
diff mbox series

Patch

diff --git a/sysdeps/powerpc/bits/fenvinline.h b/sysdeps/powerpc/bits/fenvinline.h
index 89ea38a4f8..f2d095a72f 100644
--- a/sysdeps/powerpc/bits/fenvinline.h
+++ b/sysdeps/powerpc/bits/fenvinline.h
@@ -75,7 +75,7 @@ 
     int __e = __excepts;						      \
     int __ret = 0;							      \
     if (__builtin_constant_p (__e)					      \
-        && powerof2 (__e)						      \
+        && __builtin_popcount (__e) == 1				      \
         && __e != FE_INVALID)						      \
       {									      \
 	__MTFSB1 ((__builtin_clz (__e)));				      \
@@ -91,7 +91,7 @@ 
     int __e = __excepts;						      \
     int __ret = 0;							      \
     if (__builtin_constant_p (__e)					      \
-        && powerof2 (__e)						      \
+        && __builtin_popcount (__e) == 1				      \
         && __e != FE_INVALID)						      \
       {									      \
 	__MTFSB0 ((__builtin_clz (__e)));				      \