Patchwork [i386] Turn on -fschedule-insns and -fsched-pressure by default for x86-64

login
register
mail settings
Submitter Fang, Changpeng
Date Dec. 6, 2010, 8:04 p.m.
Message ID <D4C76825A6780047854A11E93CDE84D004C17683E8@SAUSEXMBP01.amd.com>
Download mbox | patch
Permalink /patch/74461/
State New
Headers show

Comments

Fang, Changpeng - Dec. 6, 2010, 8:04 p.m.
Hi,

Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at 
-O2 or higher (for speed-runs only) for x86-64 systems.  Enablement of these two flags could improve
the performance of floating point programs (For CFP2006, 2.29% under -O2, and  1.34% under -O3).
There is no apparent performance  impact on integer programs (as expected).

We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy
the performance gains.

The patch passed bootstrappings under -O2 and -O3 on  x86_64-unknown-linux-gnu, and passed
the gcc regression tests.

Is it OK to commit the patch?

Thanks,

Changpeng
Percentage improvement by turning on -fschedule-insns and
-fsched-pressure on an x86-64 system

                -O2	-O3
============================
410.bwaves	1.63	2.45
416.gamess	1.79	2.38
433.milc	1.39	2.52
434.zeusmp	1.79	0.8
435.gromacs	3.48	4.02
436.cactusADM	0.49	11.26
437.leslie3d	0	1.02
444.namd	3.73	3.6
447.dealII	1.09	-0.5
450.soplex	0	1.26
453.povray	0	0
454.calculix	23.81	0.24
459.GemsFDTD	2.16	-1.97
465.tonto	0	-0.75
470.lbm		1.49	-3.28
481.wrf		0.94	0.69
482.sphinx3	0.5	0
400.perlbench	1.07	0
401.bzip2	-1.48	-2.27
403.gcc		0.68	0.67
429.mcf		2.01	1.86
445.gobmk	1.19	-0.6
456.hmmer	3.45	0
458.sjeng	1.37	0.64
462.libquantum	0	-0.32
464.h264ref	0.92	-2.15
471.omnetpp	0.86	0.84
473.astar	-0.97	-0.91
483.xalancbmk	1.82	3.55
=============================
CFP GEOMEAN	2.29	1.34
CINT GEOMEAN	0.65	0
H.J. Lu - Dec. 6, 2010, 9:18 p.m.
On Mon, Dec 6, 2010 at 12:04 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Hi,
>
> Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at
> -O2 or higher (for speed-runs only) for x86-64 systems.  Enablement of these two flags could improve
> the performance of floating point programs (For CFP2006, 2.29% under -O2, and  1.34% under -O3).
> There is no apparent performance  impact on integer programs (as expected).
>
> We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy
> the performance gains.
>
> The patch passed bootstrappings under -O2 and -O3 on  x86_64-unknown-linux-gnu, and passed
> the gcc regression tests.
>
> Is it OK to commit the patch?

"-fschedule-insns -fsched-pressure" isn't well tested:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829

Have you run "-fschedule-insns -fsched-pressure" on gcc testsuite?
Fang, Changpeng - Dec. 6, 2010, 9:37 p.m.
> "-fschedule-insns -fsched-pressure" isn't well tested:
We have tested the gcc testsuite and SPEC cpu benchmarks.


> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829
Bug 46829 does not appear in 4.6 trunk, and the patched compiler does not
not has this bug either.

>Have you run "-fschedule-insns -fsched-pressure" on gcc testsuite?
Yes, gcc testsuite testing passes for both -O2 and -O3 bootstrapped compilers.

Thanks,

Changpeng

>--
>H.J.
H.J. Lu - Dec. 6, 2010, 9:54 p.m.
On Mon, Dec 6, 2010 at 1:37 PM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>
>> "-fschedule-insns -fsched-pressure" isn't well tested:
> We have tested the gcc testsuite and SPEC cpu benchmarks.
>
>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829
> Bug 46829 does not appear in 4.6 trunk, and the patched compiler does not
> not has this bug either.

Which revision are are you using? I verified it with trunk revision 167517.
Fang, Changpeng - Dec. 6, 2010, 10:07 p.m.
Sorry, the bug exists in gcc 4.6 trunk.

Thanks,

Changpeng
Jack Howarth - Dec. 7, 2010, 6:10 a.m.
On Mon, Dec 06, 2010 at 02:04:14PM -0600, Fang, Changpeng wrote:
> Hi,
> 
> Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at 
> -O2 or higher (for speed-runs only) for x86-64 systems.  Enablement of these two flags could improve
> the performance of floating point programs (For CFP2006, 2.29% under -O2, and  1.34% under -O3).
> There is no apparent performance  impact on integer programs (as expected).
> 
> We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy
> the performance gains.
> 
> The patch passed bootstrappings under -O2 and -O3 on  x86_64-unknown-linux-gnu, and passed
> the gcc regression tests.
> 
> Is it OK to commit the patch?
> 
> Thanks,
> 
> Changpeng

According to the Polyhedron 2005 benchmarks, -fschedule-insns and -fsched-pressure
result in a net performance loss (at least for -mtune=core2) at both -m32 and -m64
on x86_64-apple-darwin10. The induct benchmark in particular suffers a severe
regression with those options.

x86_64-apple-darwin10
gfortran -m64 -mtune=core2 -ffast-math -funroll-loops -O3 %n.f90 -o %n

benchmark           stock  -fschedule-insns  %change
                           -fsched-pressure
ac                   8.81    8.75             -0.7
aermod              17.21   17.21              0.0
air                  5.53    5.53              0.0
capacita            32.39   32.89              1.5
channel              1.84    1.84              0.0
doduc               26.40   26.55              0.6
fatigue              8.14    8.22              1.0
gas_dyn              4.31    4.29             -0.5
induct              12.85   15.89             19.1
linpk               15.47   15.47              0.0
mdbx                11.21   11.19             -0.2
nf                  29.98   29.96             -0.1
protein             34.03   33.44             -1.8
rnflow              23.13   22.91             -1.0
test_fpu             8.03    8.11              1.0
tfft                 1.87    1.87              0.0

Geometric Mean      10.84   10.99              1.4


gfortran -m32 -mtune=core2 -ffast-math -funroll-loops -O3 %n.f90 -o %n

benchmark           stock  -fschedule-insns  %change
                           -fsched-pressure
ac                  10.76   11.14              3.4
aermod              19.55   19.25             -1.6
air                  6.08    6.54              7.0
capacita            46.13   44.82             -2.9
channel              1.98    1.98              0.0
doduc               31.23   31.27              0.1
fatigue              9.83   10.35              5.0
gas_dyn              4.71    4.74              0.6
induct              13.93   18.74             25.7
linpk               15.49   15.54              0.3
mdbx                11.28   11.51              2.0
nf                  27.51   27.53              0.1
protein             38.55   38.90              0.9
rnflow              24.69   24.55             -0.6
test_fpu            10.12   10.88              7.0
tfft                 1.92    1.95              1.5

Geometric Mean      12.09   12.50              3.3


Content-Description: 0001-Turn-on-fschedule-insns-and-fsched-pressure-by-defau.patch
> From 60859804422c1e0093769eaa47b4cc2d95657f86 Mon Sep 17 00:00:00 2001
> From: Changpeng Fang <chfang@houghton.(none)>
> Date: Mon, 6 Dec 2010 11:03:07 -0800
> Subject: [PATCH] Turn on -fschedule-insns and -fsched-pressure by default at -O2 or higher for x86-64.
> 
> 	* common.opt (fschedule-insns): Add Init(0).
> 
> 	* opts.c (default_options_table): Set default of -fschedule-insns to
> 	2 to differentiate it with the user setting of 1 (if any).
> 
> 	* config/i386/i386.c (ix86_option_optimization_table): Turn on
> 	-fsched-pressure by default at -O2 or higher (for speed run only).
> 	(ix86_option_override_internal): Turn off -fschedule-insns by default
>  	for 32-bit compilation.  Also, turn off -fsched-pressure by default if
> 	-fschedule-insns was not turned on.
> ---
>  gcc/common.opt         |    2 +-
>  gcc/config/i386/i386.c |   20 +++++++++++++++++---
>  gcc/opts.c             |    2 +-
>  3 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 57f5b0a..72be118 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1570,7 +1570,7 @@ Common Ignore
>  Does nothing.  Preserved for backward compatibility.
>  
>  fschedule-insns
> -Common Report Var(flag_schedule_insns) Optimization
> +Common Report Var(flag_schedule_insns) Init(0) Optimization
>  Reschedule instructions before register allocation
>  
>  fschedule-insns2
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index a66a0c4..226e4fc 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -4079,6 +4079,19 @@ ix86_option_override_internal (bool main_args_p)
>    if (!TARGET_SCHEDULE)
>      flag_schedule_insns_after_reload = flag_schedule_insns = 0;
>  
> +#ifdef INSN_SCHEDULING
> +  /* For 32-bit compilation, turn off -fschedule-insns by default to hide
> +     the problem with not enough registers.  */
> +  if (!TARGET_64BIT)
> +    flag_schedule_insns = (flag_schedule_insns == 1);
> +
> +  /* If -fschedule-insns was not turned on, the default of -fsched-pressure
> +     should be off.  On the other hand, if user turned on -fschedule-insns
> +     for 32-bit compilation, the default of -fsched-presssure is kept on.  */
> +  if (!flag_schedule_insns)
> +    flag_sched_pressure = (flag_sched_pressure == 1);
> +#endif
> +
>    maybe_set_param_value (PARAM_SIMULTANEOUS_PREFETCHES,
>  			 ix86_cost->simultaneous_prefetches,
>  			 global_options.x_param_values,
> @@ -5031,10 +5044,11 @@ x86_output_aligned_bss (FILE *file, tree decl ATTRIBUTE_UNUSED,
>  
>  static const struct default_options ix86_option_optimization_table[] =
>    {
> -    /* Turn off -fschedule-insns by default.  It tends to make the
> -       problem with not enough registers even worse.  */
> +    /* Turn on -fsched-pressure by default at -O2 or higher.  This could
> +       alleviate the problem with not enough registers caused by turning
> +       on -fschedule-pressure (in opts.c).  */
>  #ifdef INSN_SCHEDULING
> -    { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fsched_pressure, NULL, 2 },
>  #endif
>  
>  #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
> diff --git a/gcc/opts.c b/gcc/opts.c
> index cd41c2a..7700f0a 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -483,7 +483,7 @@ static const struct default_options default_options_table[] =
>      { OPT_LEVELS_2_PLUS, OPT_fpeephole2, NULL, 1 },
>  #ifdef INSN_SCHEDULING
>    /* Only run the pre-regalloc scheduling pass if optimizing for speed.  */
> -    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 2 },
>      { OPT_LEVELS_2_PLUS, OPT_fschedule_insns2, NULL, 1 },
>  #endif
>      { OPT_LEVELS_2_PLUS, OPT_fregmove, NULL, 1 },
> -- 
> 1.6.3.3
> 

Content-Description: schedule-improvement.txt
> Percentage improvement by turning on -fschedule-insns and
> -fsched-pressure on an x86-64 system
> 
>                 -O2	-O3
> ============================
> 410.bwaves	1.63	2.45
> 416.gamess	1.79	2.38
> 433.milc	1.39	2.52
> 434.zeusmp	1.79	0.8
> 435.gromacs	3.48	4.02
> 436.cactusADM	0.49	11.26
> 437.leslie3d	0	1.02
> 444.namd	3.73	3.6
> 447.dealII	1.09	-0.5
> 450.soplex	0	1.26
> 453.povray	0	0
> 454.calculix	23.81	0.24
> 459.GemsFDTD	2.16	-1.97
> 465.tonto	0	-0.75
> 470.lbm		1.49	-3.28
> 481.wrf		0.94	0.69
> 482.sphinx3	0.5	0
> 400.perlbench	1.07	0
> 401.bzip2	-1.48	-2.27
> 403.gcc		0.68	0.67
> 429.mcf		2.01	1.86
> 445.gobmk	1.19	-0.6
> 456.hmmer	3.45	0
> 458.sjeng	1.37	0.64
> 462.libquantum	0	-0.32
> 464.h264ref	0.92	-2.15
> 471.omnetpp	0.86	0.84
> 473.astar	-0.97	-0.91
> 483.xalancbmk	1.82	3.55
> =============================
> CFP GEOMEAN	2.29	1.34
> CINT GEOMEAN	0.65	0
>
Fang, Changpeng - Dec. 7, 2010, 6:42 p.m.

Paolo Bonzini - Dec. 7, 2010, 7:23 p.m.
On 12/06/2010 09:04 PM, Fang, Changpeng wrote:
> Hi,
>
> Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at
> -O2 or higher (for speed-runs only) for x86-64 systems.  Enablement of these two flags could improve
> the performance of floating point programs (For CFP2006, 2.29% under -O2, and  1.34% under -O3).
> There is no apparent performance  impact on integer programs (as expected).
>
> We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy
> the performance gains.

I think enabling rename registers whenever schedule_insns2 is enabled 
would achieve almost the same freedom without risking bugs like 46829. 
There are just too many singleton register classes in i386 for 
schedule-insns to work. :(

Paolo
Jack Howarth - Dec. 7, 2010, 7:55 p.m.
On Tue, Dec 07, 2010 at 08:23:22PM +0100, Paolo Bonzini wrote:
> On 12/06/2010 09:04 PM, Fang, Changpeng wrote:
>> Hi,
>>
>> Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at
>> -O2 or higher (for speed-runs only) for x86-64 systems.  Enablement of these two flags could improve
>> the performance of floating point programs (For CFP2006, 2.29% under -O2, and  1.34% under -O3).
>> There is no apparent performance  impact on integer programs (as expected).
>>
>> We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy
>> the performance gains.
>
> I think enabling rename registers whenever schedule_insns2 is enabled  
> would achieve almost the same freedom without risking bugs like 46829.  
> There are just too many singleton register classes in i386 for  
> schedule-insns to work. :(

Paolo,
   I noticed that the manpage claims this option can degrade debugging on some targets.
Which debug formats suffer from this problem and wouldn't we want to avoid enabling
-frename-registers on those targets for -O2, -O3 and -Os?
                Jack


>
> Paolo
Paolo Bonzini - Dec. 7, 2010, 7:56 p.m.
On 12/07/2010 08:55 PM, Jack Howarth wrote:
>     I noticed that the manpage claims this option can degrade debugging on some targets.
> Which debug formats suffer from this problem and wouldn't we want to avoid enabling
> -frename-registers on those targets for -O2, -O3 and -Os?

I think it is false since -fvariable-tracking was committed (3.4 maybe?)

Paolo
Jack Howarth - Dec. 8, 2010, 12:12 a.m.
On Tue, Dec 07, 2010 at 08:23:22PM +0100, Paolo Bonzini wrote:
> On 12/06/2010 09:04 PM, Fang, Changpeng wrote:
>> Hi,
>>
>> Attached is the patch that proposes to turn on -fschedule-insns and -fsched-pressure by default at
>> -O2 or higher (for speed-runs only) for x86-64 systems.  Enablement of these two flags could improve
>> the performance of floating point programs (For CFP2006, 2.29% under -O2, and  1.34% under -O3).
>> There is no apparent performance  impact on integer programs (as expected).
>>
>> We have investigated this optimization since 4.5, and it is the time to turn on these two flags to enjoy
>> the performance gains.
>
> I think enabling rename registers whenever schedule_insns2 is enabled  
> would achieve almost the same freedom without risking bugs like 46829.  
> There are just too many singleton register classes in i386 for  
> schedule-insns to work. :(
>
> Paolo

Paolo,
    The improvements in the Polyhedron benchmark from -frename-registers on x86_64-apple-darwin10
appear marginal.

gfortran -mtune=core2 -ffast-math -funroll-loops -O3 %n.f90 -o %n

benchmark        stock   -frename-registers %change
ac               8.81    8.80               -0.1
aermod          17.21   17.21                0.0
air              5.53    5.56                0.5
capacita        32.39   32.37               -0.1
channel          1.84    1.84                0.0
doduc           26.40   26.39                0.0
fatigue          8.14    8.11               -0.4
gas_dyn          4.31    4.32                0.2
induct          12.85   12.84               -0.1
linpk           15.47   15.48                0.1
mdbx            11.21   11.22                0.1
nf              29.98   29.94               -0.1
protein         34.03   32.04               -6.2
rnflow          23.13   23.12               -0.0
test_fpu         8.03    8.03                0.0
tfft             1.87    1.86               -0.5

Geometric Mean  10.84   10.80               -0.4

        Jack
Jan Hubicka - Dec. 11, 2010, 1:40 p.m.
> On 12/07/2010 08:55 PM, Jack Howarth wrote:
>>     I noticed that the manpage claims this option can degrade debugging on some targets.
>> Which debug formats suffer from this problem and wouldn't we want to avoid enabling
>> -frename-registers on those targets for -O2, -O3 and -Os?
>
> I think it is false since -fvariable-tracking was committed (3.4 maybe?)

Yes, variable-tracking was introduced to solve this problem.  The manpage
probably reffers to non-dwarf2 formats (only dwarf2out use location notes) that
are much less of concern now than back then.
>
> Paolo
Jan Hubicka - Dec. 11, 2010, 1:41 p.m.
Hi,
note that -frename-registers was enabled by default for a while and then disabled again.

Honza
Paolo Bonzini - Dec. 12, 2010, 10:11 a.m.
On 12/11/2010 02:41 PM, Jan Hubicka wrote:
> Hi,
> note that -frename-registers was enabled by default for a while and then disabled again.

It was enabled by me in fact :) but a lot of changes have been made in 
the meanwhile.

Paolo

Patch

From 60859804422c1e0093769eaa47b4cc2d95657f86 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@houghton.(none)>
Date: Mon, 6 Dec 2010 11:03:07 -0800
Subject: [PATCH] Turn on -fschedule-insns and -fsched-pressure by default at -O2 or higher for x86-64.

	* common.opt (fschedule-insns): Add Init(0).

	* opts.c (default_options_table): Set default of -fschedule-insns to
	2 to differentiate it with the user setting of 1 (if any).

	* config/i386/i386.c (ix86_option_optimization_table): Turn on
	-fsched-pressure by default at -O2 or higher (for speed run only).
	(ix86_option_override_internal): Turn off -fschedule-insns by default
 	for 32-bit compilation.  Also, turn off -fsched-pressure by default if
	-fschedule-insns was not turned on.
---
 gcc/common.opt         |    2 +-
 gcc/config/i386/i386.c |   20 +++++++++++++++++---
 gcc/opts.c             |    2 +-
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 57f5b0a..72be118 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1570,7 +1570,7 @@  Common Ignore
 Does nothing.  Preserved for backward compatibility.
 
 fschedule-insns
-Common Report Var(flag_schedule_insns) Optimization
+Common Report Var(flag_schedule_insns) Init(0) Optimization
 Reschedule instructions before register allocation
 
 fschedule-insns2
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a66a0c4..226e4fc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4079,6 +4079,19 @@  ix86_option_override_internal (bool main_args_p)
   if (!TARGET_SCHEDULE)
     flag_schedule_insns_after_reload = flag_schedule_insns = 0;
 
+#ifdef INSN_SCHEDULING
+  /* For 32-bit compilation, turn off -fschedule-insns by default to hide
+     the problem with not enough registers.  */
+  if (!TARGET_64BIT)
+    flag_schedule_insns = (flag_schedule_insns == 1);
+
+  /* If -fschedule-insns was not turned on, the default of -fsched-pressure
+     should be off.  On the other hand, if user turned on -fschedule-insns
+     for 32-bit compilation, the default of -fsched-presssure is kept on.  */
+  if (!flag_schedule_insns)
+    flag_sched_pressure = (flag_sched_pressure == 1);
+#endif
+
   maybe_set_param_value (PARAM_SIMULTANEOUS_PREFETCHES,
 			 ix86_cost->simultaneous_prefetches,
 			 global_options.x_param_values,
@@ -5031,10 +5044,11 @@  x86_output_aligned_bss (FILE *file, tree decl ATTRIBUTE_UNUSED,
 
 static const struct default_options ix86_option_optimization_table[] =
   {
-    /* Turn off -fschedule-insns by default.  It tends to make the
-       problem with not enough registers even worse.  */
+    /* Turn on -fsched-pressure by default at -O2 or higher.  This could
+       alleviate the problem with not enough registers caused by turning
+       on -fschedule-pressure (in opts.c).  */
 #ifdef INSN_SCHEDULING
-    { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fsched_pressure, NULL, 2 },
 #endif
 
 #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
diff --git a/gcc/opts.c b/gcc/opts.c
index cd41c2a..7700f0a 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -483,7 +483,7 @@  static const struct default_options default_options_table[] =
     { OPT_LEVELS_2_PLUS, OPT_fpeephole2, NULL, 1 },
 #ifdef INSN_SCHEDULING
   /* Only run the pre-regalloc scheduling pass if optimizing for speed.  */
-    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 2 },
     { OPT_LEVELS_2_PLUS, OPT_fschedule_insns2, NULL, 1 },
 #endif
     { OPT_LEVELS_2_PLUS, OPT_fregmove, NULL, 1 },
-- 
1.6.3.3