Enable SSE math on i386 with -Ofast

Message ID 20131004105656.GA25297@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka Oct. 4, 2013, 10:56 a.m. UTC
Hi,
this patch makes -Ofast also imply -mfpmath=sse.  It is an important win on
SPECfP (2000 and 2006), even though, for example, the following
float a(float b)
{
   return b+10;
}

results in the somewhat ridiculous
a:
.LFB0:  
        .cfi_startproc
        subl    $4, %esp
        .cfi_def_cfa_offset 8
        movss   .LC0, %xmm0
        addss   8(%esp), %xmm0
        movss   %xmm0, (%esp)
        flds    (%esp)
        addl    $4, %esp
        .cfi_def_cfa_offset 4
        ret

I wonder if we can at least get rid of the redundant stack alignment on ESP...

Bootstrapped/regtested on x86_64-linux; I will commit it on the weekend if there
are no complaints.  I wonder if -ffast-math should do the same - it is documented
as enabling an explicit set of options, but that can be changed I guess.

	* invoke.texi (Ofast): Update documentation.
	* i386.h (TARGET_FPMATH_DEFAULT): Enable SSE math with -Ofast.
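
For anyone who wants to check which unit a given build ends up using, here is a
minimal sketch, assuming a C99 compiler whose <float.h> defines FLT_EVAL_METHOD.
GCC typically reports 2 when arithmetic goes through the 387 and 0 with
-mfpmath=sse, but treat that as an expectation to verify rather than a guarantee.
The helper names describe_eval_method and current_eval_method are made up for
illustration; only a() comes from the message above.

```c
#include <float.h>

/* Fallback for pre-C99 modes where <float.h> may not define the macro. */
#ifndef FLT_EVAL_METHOD
#define FLT_EVAL_METHOD (-1)
#endif

/* Hypothetical helper: map a FLT_EVAL_METHOD value to a description. */
const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "each type evaluated in its own precision (SSE-style)";
    case 1:  return "float evaluated as double";
    case 2:  return "everything evaluated as long double (387-style)";
    default: return "indeterminate or implementation-defined";
    }
}

/* Hypothetical helper: the evaluation method this translation unit was built with. */
int current_eval_method(void)
{
    return (int) FLT_EVAL_METHOD;
}

/* The function from the message above, unchanged. */
float a(float b)
{
    return b + 10;
}
```

Printing describe_eval_method(current_eval_method()) from a small main() built
once with -m32 -O2 and once with -m32 -Ofast -msse2 should show whether the
default changed on a given compiler.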

Comments

Richard Biener Oct. 7, 2013, 8:50 a.m. UTC | #1
I wonder if we can restrict -mfpmath=sse to local functions where we can
change the ABI ... (do we change the local functions ABI with 
-mfpmath=sse?)

Richard.

Jan Hubicka Oct. 7, 2013, 9:22 a.m. UTC | #2
> I wonder if we can restrict -mfpmath=sse to local functions where we can

We can, but why? Parameters are passed in memory, which is equally bad for 387
and SSE.  Only return values are passed in registers, and one extra reload per
function is not that expensive, except for functions containing almost nothing,
which should be inlined anyway if they are local.  In the meantime I (partially,
since megrez stopped producing 32-bit spec2k6 results) benchmarked
-mfpmath=sse,387 and it does not seem to be a loss anymore.  So perhaps we can
give it a try?

> change the ABI ... (do we change the local functions ABI with 
> -mfpmath=sse?)

We don't.  It is probably quite easy to default to sseregparm and change the
return value type.  I will look into it.
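
For illustration only, a minimal sketch of the manual equivalent available
today, assuming GCC's existing sseregparm function attribute on 32-bit x86 with
SSE enabled.  This is not the proposed default change.  The macro name
LOCAL_SSE_ABI and the helper add_ten are hypothetical, and on any other target
the macro expands to nothing so the file still builds.

```c
/* sseregparm is only meaningful on 32-bit x86 with SSE enabled;
   everywhere else the macro is empty and the default ABI is used. */
#if defined(__GNUC__) && defined(__i386__) && defined(__SSE2__)
#define LOCAL_SSE_ABI __attribute__((sseregparm))
#else
#define LOCAL_SSE_ABI
#endif

/* Local (static) function: its calling convention is invisible outside
   this file, so it can take its float argument in an SSE register. */
static LOCAL_SSE_ABI float add_ten(float b)
{
    return b + 10;
}

/* Externally visible entry point keeps the standard i386 ABI, so the
   result still goes back to callers through the 387 stack. */
float a(float b)
{
    return add_ten(b);
}
```

Since add_ten is tiny it would normally be inlined anyway, which is the point
made above about local functions containing almost nothing.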

Honza
Richard Biener Oct. 7, 2013, 9:27 a.m. UTC | #3
On Mon, 7 Oct 2013, Jan Hubicka wrote:

> > I wonder if we can restrict -mfpmath=sse to local functions where we can
> 
> We can, but why? Parameters are passed in memory, which is equally bad for 387
> and SSE.  Only return values are passed in registers, and one extra reload per
> function is not that expensive, except for functions containing almost nothing,
> which should be inlined anyway if they are local.

Ah, I forgot that detail.  Still, going through the FP stack for return
values is bad.

>  In the meantime I (partially,
> since megrez stopped producing 32-bit spec2k6 results) benchmarked
> -mfpmath=sse,387 and it does not seem to be a loss anymore.  So perhaps we can
> give it a try?

Not sure ... I would guess that it's not a win on any recent architecture
(and LRA is probably not well-prepared here either).

> > change the ABI ... (do we change the local functions ABI with 
> > -mfpmath=sse?)
> 
> We don't.  It is probably quite easy to default to sseregparm and change the return value type.
> I will look into it.

Thanks.  That's independent of enabling -mfpmath=sse at -Ofast of course.

Richard.
Jan Hubicka Oct. 7, 2013, 9:32 a.m. UTC | #4
> >  In the meantime I (partially,
> > since megrez stopped producing 32-bit spec2k6 results) benchmarked
> > -mfpmath=sse,387 and it does not seem to be a loss anymore.  So perhaps we can
> > give it a try?
> 
> Not sure ... I would guess that it's not a win on any recent architecture
> (and LRA is probably not well-prepared here either).

I think it has a chance to win when the input/output registers are forced to be
in the 387 (because of the return value ABI), and perhaps under register
pressure in cases where two independent computations are going on and LRA can
home one in SSE and the other in 387 registers.  I don't really know.

The main advantage of the 387 is that its code is significantly more compact
than SSE code.  The last hardware that really favoured the 387 was probably the
original Pentium 4 (where additions were better pipelined on the 387 path, if I
recall correctly).  I wonder how AVX changed this.  I was thus thinking about
adding a mode where we choose 387 or SSE based on whether the function is
optimized for size.  The size difference is quite high - around 5% on specfp.
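
The size-based mode could look roughly like the sketch below.  This is not GCC
code: the enum, the function name choose_fpmath and both parameters are
hypothetical, and it only captures the decision described above (387 when the
function is optimized for size, SSE otherwise, 387 when SSE is unavailable).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the compiler's FP unit choices. */
enum fpmath_unit { UNIT_387, UNIT_SSE };

/* Hypothetical per-function policy: prefer the more compact 387 code
   when optimizing for size, SSE otherwise. */
enum fpmath_unit choose_fpmath(bool have_sse, bool optimize_for_size)
{
    if (!have_sse)
        return UNIT_387;            /* no choice without SSE */
    return optimize_for_size ? UNIT_387 : UNIT_SSE;
}
```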
> 
> > > change the ABI ... (do we change the local functions ABI with 
> > > -mfpmath=sse?)
> > 
> > We don't.  It is probably quite easy to default to sseregparm and change the return value type.
> > I will look into it.
> 
> Thanks.  That's independent of enabling -mfpmath=sse at -Ofast of course.
Yep, my plan is to enable -mfpmath=sse with -Ofast today and look into those two items incrementally.

Honza
> 
> Richard.
Patch

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi     (revision 203161)
+++ doc/invoke.texi     (working copy)
@@ -6796,6 +6796,7 @@  Disregard strict standards compliance.
 valid for all standard-compliant programs.
 It turns on @option{-ffast-math} and the Fortran-specific
 @option{-fno-protect-parens} and @option{-fstack-arrays}.
+On i386 targets it also enables @option{-mfpmath=sse}.
 
 @item -Og
 @opindex Og
Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 203161)
+++ config/i386/i386.h	(working copy)
@@ -209,7 +209,8 @@  extern const struct processor_costs ix86
 
 #ifndef TARGET_FPMATH_DEFAULT
 #define TARGET_FPMATH_DEFAULT \
-  (TARGET_64BIT && TARGET_SSE ? FPMATH_SSE : FPMATH_387)
+  ((TARGET_64BIT && TARGET_SSE) \
+   || (TARGET_SSE && optimize_fast) ? FPMATH_SSE : FPMATH_387)
 #endif
 
 #define TARGET_FLOAT_RETURNS_IN_80387 TARGET_FLOAT_RETURNS
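
To make the effect of the i386.h hunk easier to see, here is a standalone sketch
of the old and new default as plain C functions.  The function names and the
boolean parameters are stand-ins for the TARGET_64BIT, TARGET_SSE and
optimize_fast conditions used in the macro; nothing here is GCC source.

```c
#include <stdbool.h>

/* Local stand-ins for the FPMATH_387 / FPMATH_SSE values in the patch. */
enum fpmath_default { DEFAULT_387, DEFAULT_SSE };

/* Default before the patch: SSE math only on 64-bit targets with SSE. */
enum fpmath_default fpmath_default_old(bool target_64bit, bool target_sse,
                                       bool optimize_fast)
{
    (void) optimize_fast;           /* not consulted before the patch */
    return (target_64bit && target_sse) ? DEFAULT_SSE : DEFAULT_387;
}

/* Default after the patch: also SSE math on 32-bit when -Ofast is given
   and SSE is available. */
enum fpmath_default fpmath_default_new(bool target_64bit, bool target_sse,
                                       bool optimize_fast)
{
    return ((target_64bit && target_sse) || (target_sse && optimize_fast))
           ? DEFAULT_SSE : DEFAULT_387;
}
```

The only input for which the two functions differ is 32-bit, SSE available,
-Ofast given; without SSE the default stays 387 even with -Ofast.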