diff mbox

[i386] Enable -freorder-blocks-and-partition

Message ID CAAe5K+XeggoZGd8OgdGhh49RF-=5JbBdOEg+j2ZyYBCno0GfdA@mail.gmail.com
State New
Headers show

Commit Message

Teresa Johnson Nov. 19, 2013, 2:50 p.m. UTC
This patch enables -freorder-blocks-and-partition by default for x86
at -O2 and up. It is showing some modest gains in cpu2006 performance
with profile feedback and -O2 on an Intel Westmere system. Specifically,
I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
(1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.

Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal
bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were
configured with --enable-languages=all,obj-c++ and tested for both
32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}".

It would be good to enable this for additional targets as a follow on,
but it needs more testing for both correctness and performance on those
other targets (i.e for correctness because I see a number of places
in other config/*/*.c files that do some special handling under this
option for different targets or simply disable it, so I am not sure
how well-tested it is under different architectural constraints).

Ok for trunk?

Thanks,
Teresa

2013-11-19  Teresa Johnson  <tejohnson@google.com>

        * common/config/i386/i386-common.c: Enable
        -freorder-blocks-and-partition at -O2 and up for x86.
        * opts.c (finish_options): Only warn if -freorder-blocks-and-
        partition was set on command line.

Comments

Jan Hubicka Nov. 19, 2013, 3:44 p.m. UTC | #1
Martin,
can you, please, generate the updated systemtap with
-freorder-blocks-and-partition enabled?

I am in favour of enabling this - it is usefull pass and it is pointless ot
have passes that are not enabled by default.
Is there reason why this would not work on other ELF target? Is it working
with Darwin and Windows?

> This patch enables -freorder-blocks-and-partition by default for x86
> at -O2 and up. It is showing some modest gains in cpu2006 performance
> with profile feedback and -O2 on an Intel Westmere system. Specifically,
> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.

This actually sounds very good ;)

Lets see how the systemtap graphs goes.  If we will end up with problem
of too many accesses to cold section, I would suggest making cold section
subdivided into .unlikely and .unlikely.part (we could have better name)
with the second consisting only of unlikely parts of hot&normal functions.

This should reduce the problems we are seeing with mistakely identifying
code to be cold because of roundoff errors (and it probably makes sense
in general, too).
We will however need to update gold and ld for that.
> 
> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal
> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were
> configured with --enable-languages=all,obj-c++ and tested for both
> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}".
> 
> It would be good to enable this for additional targets as a follow on,
> but it needs more testing for both correctness and performance on those
> other targets (i.e for correctness because I see a number of places
> in other config/*/*.c files that do some special handling under this
> option for different targets or simply disable it, so I am not sure
> how well-tested it is under different architectural constraints).
> 
> Ok for trunk?
> 
> Thanks,
> Teresa
> 
> 2013-11-19  Teresa Johnson  <tejohnson@google.com>
> 
>         * common/config/i386/i386-common.c: Enable
>         -freorder-blocks-and-partition at -O2 and up for x86.
>         * opts.c (finish_options): Only warn if -freorder-blocks-and-
>         partition was set on command line.

You probably mis doc/invoke.texi update.
Thank you for working on this!

Honza
> 
> Index: common/config/i386/i386-common.c
> ===================================================================
> --- common/config/i386/i386-common.c    (revision 205001)
> +++ common/config/i386/i386-common.c    (working copy)
> @@ -789,6 +789,8 @@ static const struct default_options ix86_option_op
>    {
>      /* Enable redundant extension instructions removal at -O2 and higher.  */
>      { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> +    /* Enable function splitting at -O2 and higher.  */
> +    { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_and_partition, NULL, 1 },
>      /* Turn off -fschedule-insns by default.  It tends to make the
>         problem with not enough registers even worse.  */
>      { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> Index: opts.c
> ===================================================================
> --- opts.c      (revision 205001)
> +++ opts.c      (working copy)
> @@ -737,9 +737,10 @@ finish_options (struct gcc_options *opts, struct g
>        && opts->x_flag_reorder_blocks_and_partition
>        && (ui_except == UI_SJLJ || ui_except >= UI_TARGET))
>      {
> -      inform (loc,
> -             "-freorder-blocks-and-partition does not work "
> -             "with exceptions on this architecture");
> +      if (opts_set->x_flag_reorder_blocks_and_partition)
> +        inform (loc,
> +                "-freorder-blocks-and-partition does not work "
> +                "with exceptions on this architecture");
>        opts->x_flag_reorder_blocks_and_partition = 0;
>        opts->x_flag_reorder_blocks = 1;
>      }
> @@ -752,9 +753,10 @@ finish_options (struct gcc_options *opts, struct g
>        && opts->x_flag_reorder_blocks_and_partition
>        && (ui_except == UI_SJLJ || ui_except >= UI_TARGET))
>      {
> -      inform (loc,
> -             "-freorder-blocks-and-partition does not support "
> -             "unwind info on this architecture");
> +      if (opts_set->x_flag_reorder_blocks_and_partition)
> +        inform (loc,
> +                "-freorder-blocks-and-partition does not support "
> +                "unwind info on this architecture");
>        opts->x_flag_reorder_blocks_and_partition = 0;
>        opts->x_flag_reorder_blocks = 1;
>      }
> @@ -769,9 +771,10 @@ finish_options (struct gcc_options *opts, struct g
>               && targetm_common.unwind_tables_default
>               && (ui_except == UI_SJLJ || ui_except >= UI_TARGET))))
>      {
> -      inform (loc,
> -             "-freorder-blocks-and-partition does not work "
> -             "on this architecture");
> +      if (opts_set->x_flag_reorder_blocks_and_partition)
> +        inform (loc,
> +                "-freorder-blocks-and-partition does not work "
> +                "on this architecture");
>        opts->x_flag_reorder_blocks_and_partition = 0;
>        opts->x_flag_reorder_blocks = 1;
>      }
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Andi Kleen Nov. 19, 2013, 9 p.m. UTC | #2
Teresa Johnson <tejohnson@google.com> writes:

> This patch enables -freorder-blocks-and-partition by default for x86
> at -O2 and up. It is showing some modest gains in cpu2006 performance
> with profile feedback and -O2 on an Intel Westmere system. Specifically,
> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.

One thing that worries me is what this will do to profilers.

I had to hack some assembler code using out of line sections
to able to handle the libunwind based perf dwarf unwinder.

My understanding is that these out of line sections can be 
only described properly in dwarf3, and there's still some
dwarf2 based unwinder code around.

So this may cause problems with profiling and debugging.

It's probably still a good idea, just may need some extra
care.

-Andi
Teresa Johnson Nov. 19, 2013, 9:07 p.m. UTC | #3
On Tue, Nov 19, 2013 at 1:00 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Teresa Johnson <tejohnson@google.com> writes:
>
>> This patch enables -freorder-blocks-and-partition by default for x86
>> at -O2 and up. It is showing some modest gains in cpu2006 performance
>> with profile feedback and -O2 on an Intel Westmere system. Specifically,
>> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
>> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.
>
> One thing that worries me is what this will do to profilers.
>
> I had to hack some assembler code using out of line sections
> to able to handle the libunwind based perf dwarf unwinder.
>
> My understanding is that these out of line sections can be
> only described properly in dwarf3, and there's still some
> dwarf2 based unwinder code around.
>
> So this may cause problems with profiling and debugging.
>
> It's probably still a good idea, just may need some extra
> care.
>
> -Andi

Sri has approval for a patch that should address this by giving the
split cold sections a label. It should go in today as well:

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02143.html

Thanks,
Teresa
>
> --
> ak@linux.intel.com -- Speaking for myself only
diff mbox

Patch

Index: common/config/i386/i386-common.c
===================================================================
--- common/config/i386/i386-common.c    (revision 205001)
+++ common/config/i386/i386-common.c    (working copy)
@@ -789,6 +789,8 @@  static const struct default_options ix86_option_op
   {
     /* Enable redundant extension instructions removal at -O2 and higher.  */
     { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
+    /* Enable function splitting at -O2 and higher.  */
+    { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_and_partition, NULL, 1 },
     /* Turn off -fschedule-insns by default.  It tends to make the
        problem with not enough registers even worse.  */
     { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
Index: opts.c
===================================================================
--- opts.c      (revision 205001)
+++ opts.c      (working copy)
@@ -737,9 +737,10 @@  finish_options (struct gcc_options *opts, struct g
       && opts->x_flag_reorder_blocks_and_partition
       && (ui_except == UI_SJLJ || ui_except >= UI_TARGET))
     {
-      inform (loc,
-             "-freorder-blocks-and-partition does not work "
-             "with exceptions on this architecture");
+      if (opts_set->x_flag_reorder_blocks_and_partition)
+        inform (loc,
+                "-freorder-blocks-and-partition does not work "
+                "with exceptions on this architecture");
       opts->x_flag_reorder_blocks_and_partition = 0;
       opts->x_flag_reorder_blocks = 1;
     }
@@ -752,9 +753,10 @@  finish_options (struct gcc_options *opts, struct g
       && opts->x_flag_reorder_blocks_and_partition
       && (ui_except == UI_SJLJ || ui_except >= UI_TARGET))
     {
-      inform (loc,
-             "-freorder-blocks-and-partition does not support "
-             "unwind info on this architecture");
+      if (opts_set->x_flag_reorder_blocks_and_partition)
+        inform (loc,
+                "-freorder-blocks-and-partition does not support "
+                "unwind info on this architecture");
       opts->x_flag_reorder_blocks_and_partition = 0;
       opts->x_flag_reorder_blocks = 1;
     }
@@ -769,9 +771,10 @@  finish_options (struct gcc_options *opts, struct g
              && targetm_common.unwind_tables_default
              && (ui_except == UI_SJLJ || ui_except >= UI_TARGET))))
     {
-      inform (loc,
-             "-freorder-blocks-and-partition does not work "
-             "on this architecture");
+      if (opts_set->x_flag_reorder_blocks_and_partition)
+        inform (loc,
+                "-freorder-blocks-and-partition does not work "
+                "on this architecture");
       opts->x_flag_reorder_blocks_and_partition = 0;
       opts->x_flag_reorder_blocks = 1;
     }