Message ID | 1566335985-14601-4-git-send-email-pc@us.ibm.com |
---|---|
State | New |
Headers | show |
Series | various FPSCR optimizations | expand |
On 8/20/19 4:19 PM, Paul A. Clarke wrote: > From: "Paul A. Clarke" <pc@us.ibm.com> > > fesetenv() reads the current value of the Floating-Point Status and Control > Register (FPSCR) to determine the difference between the current state of > exception enables and the newly requested state. All of these bits are also > returned by the lighter weight 'mffsl' instruction used by fegetenv_status(). > Use that instead. > > Also, remove a local macro _FPU_MASK_ALL in favor of a common macro, > FPU_ENABLES_MASK from fenv_libc.h. > > Finally, use a local variable ('new') in favor of a pointer dereference > ('*envp'). > > 2019-08-20 Paul A. Clarke <pc@us.ibm.com> > > * sysdeps/powerpc/fpu/fesetenv.c (__fesetenv): Utilize lightweight > FPSCR read. > (_FPU_MASK_ALL): Delete. > --- > sysdeps/powerpc/fpu/fesetenv.c | 12 ++++-------- > 1 file changed, 4 insertions(+), 8 deletions(-) > > diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c > index 009a4f0..5ca15c7 100644 > --- a/sysdeps/powerpc/fpu/fesetenv.c > +++ b/sysdeps/powerpc/fpu/fesetenv.c > @@ -19,8 +19,6 @@ > #include <fenv_libc.h> > #include <fpu_control.h> > > -#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM | _FPU_MASK_XM | _FPU_MASK_IM) > - > int > __fesetenv (const fenv_t *envp) > { > @@ -28,25 +26,23 @@ __fesetenv (const fenv_t *envp) > > /* get the currently set exceptions. */ > new.fenv = *envp; > - old.fenv = fegetenv_register (); > - if (old.l == new.l) > - return 0; > + old.fenv = fegetenv_status (); > > /* If the old env has no enabled exceptions and the new env has any enabled > exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits. This will put the > hardware into "precise mode" and may cause the FPU to run slower on some > hardware. */ > - if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0) > + if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0) > (void) __fe_nomask_env_priv (); > > /* If the old env had any enabled exceptions and the new env has no enabled > exceptions, then mask SIGFPE in the MSR FE0/FE1 bits. This may allow the > FPU to run faster because it always takes the default action and can not > generate SIGFPE. */ > - if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0) > + if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0) > (void)__fe_mask_env (); If you need to make another similar change, I might recommend consolidating the code to toggle the MSR bits. > > - fesetenv_register (*envp); > + fesetenv_register (new.fenv); > > /* Success. */ > return 0; > LGTM, thanks.
On 8/21/19 5:13 PM, Paul E Murphy wrote: > On 8/20/19 4:19 PM, Paul A. Clarke wrote: >> diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c >> index 009a4f0..5ca15c7 100644 >> --- a/sysdeps/powerpc/fpu/fesetenv.c >> +++ b/sysdeps/powerpc/fpu/fesetenv.c >> @@ -28,25 +26,23 @@ __fesetenv (const fenv_t *envp) >> /* If the old env has no enabled exceptions and the new env has any enabled >> exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits. This will put the >> hardware into "precise mode" and may cause the FPU to run slower on some >> hardware. */ >> - if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0) >> + if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0) >> (void) __fe_nomask_env_priv (); > >> /* If the old env had any enabled exceptions and the new env has no enabled >> exceptions, then mask SIGFPE in the MSR FE0/FE1 bits. This may allow the >> FPU to run faster because it always takes the default action and can not >> generate SIGFPE. */ >> - if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0) >> + if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0) >> (void)__fe_mask_env (); > If you need to make another similar change, I might recommend consolidating the code to toggle the MSR bits. Could you elaborate on what consolidation you are anticipating? PC
On 8/22/19 11:55 AM, Paul Clarke wrote: > On 8/21/19 5:13 PM, Paul E Murphy wrote: >> On 8/20/19 4:19 PM, Paul A. Clarke wrote: > >>> diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c >>> index 009a4f0..5ca15c7 100644 >>> --- a/sysdeps/powerpc/fpu/fesetenv.c >>> +++ b/sysdeps/powerpc/fpu/fesetenv.c >>> @@ -28,25 +26,23 @@ __fesetenv (const fenv_t *envp) > >>> /* If the old env has no enabled exceptions and the new env has any enabled >>> exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits. This will put the >>> hardware into "precise mode" and may cause the FPU to run slower on some >>> hardware. */ >>> - if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0) >>> + if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0) >>> (void) __fe_nomask_env_priv (); > >>> /* If the old env had any enabled exceptions and the new env has no enabled >>> exceptions, then mask SIGFPE in the MSR FE0/FE1 bits. This may allow the >>> FPU to run faster because it always takes the default action and can not >>> generate SIGFPE. */ >>> - if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0) >>> + if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0) >>> (void)__fe_mask_env (); > >> If you need to make another similar change, I might recommend consolidating the code to toggle the MSR bits. > > Could you elaborate on what consolidation you are anticipating? Code similar to the above exists in at least four places (as of f615e3fced) to toggle the MSR bits if the number of enabled exceptions changes from 0. git grep "([a-z]*[.]l & _FPU_[A-Z_]*) .= . && ([a-z]*.l & _FPU_[A-Z_]*)" Maybe those can be moved into a single macro/inline function in fenv_private.h if more fixups need to be made? > > PC >
diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c index 009a4f0..5ca15c7 100644 --- a/sysdeps/powerpc/fpu/fesetenv.c +++ b/sysdeps/powerpc/fpu/fesetenv.c @@ -19,8 +19,6 @@ #include <fenv_libc.h> #include <fpu_control.h> -#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM | _FPU_MASK_XM | _FPU_MASK_IM) - int __fesetenv (const fenv_t *envp) { @@ -28,25 +26,23 @@ __fesetenv (const fenv_t *envp) /* get the currently set exceptions. */ new.fenv = *envp; - old.fenv = fegetenv_register (); - if (old.l == new.l) - return 0; + old.fenv = fegetenv_status (); /* If the old env has no enabled exceptions and the new env has any enabled exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits. This will put the hardware into "precise mode" and may cause the FPU to run slower on some hardware. */ - if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0) + if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0) (void) __fe_nomask_env_priv (); /* If the old env had any enabled exceptions and the new env has no enabled exceptions, then mask SIGFPE in the MSR FE0/FE1 bits. This may allow the FPU to run faster because it always takes the default action and can not generate SIGFPE. */ - if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0) + if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0) (void)__fe_mask_env (); - fesetenv_register (*envp); + fesetenv_register (new.fenv); /* Success. */ return 0;
From: "Paul A. Clarke" <pc@us.ibm.com> fesetenv() reads the current value of the Floating-Point Status and Control Register (FPSCR) to determine the difference between the current state of exception enables and the newly requested state. All of these bits are also returned by the lighter weight 'mffsl' instruction used by fegetenv_status(). Use that instead. Also, remove a local macro _FPU_MASK_ALL in favor of a common macro, FPU_ENABLES_MASK from fenv_libc.h. Finally, use a local variable ('new') in favor of a pointer dereference ('*envp'). 2019-08-20 Paul A. Clarke <pc@us.ibm.com> * sysdeps/powerpc/fpu/fesetenv.c (__fesetenv): Utilize lightweight FPSCR read. (_FPU_MASK_ALL): Delete. --- sysdeps/powerpc/fpu/fesetenv.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)