Patchwork Type inheritance graph analysis & speculative devirtualization, part 7/7 (speculative devirtualizatoin)

login
register
mail settings
Submitter Jan Hubicka
Date Sept. 1, 2013, 1:57 p.m.
Message ID <20130901135714.GA23527@kam.mff.cuni.cz>
Download mbox | patch
Permalink /patch/271591/
State New
Headers show

Comments

Jan Hubicka - Sept. 1, 2013, 1:57 p.m.
Hi,
this patch implement speculative devirtualization.  It is a trivial pass that
asks for targets of every polymorphic call in a program and if the list
contains one likely target, it produces an speculative call. No context
sensitive analysis is done at the moment.  This call may or may not survive
into final program depending if we do somehting useful about the direct call.

The pass is currently disabled for LTO because
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01007.html is needed to properly
build type inheritance graph.

With LTO it is supposed to be effective on a premise that most types are not
escaping LTO unit and thus we see all possible targets.  Without LTO it makes
stronger assumption that you usually call only targets defined in current unit,
if any.

Path is suprisingly effective on Firefox:
105306 polymorphic calls, 0 devirtualized, 34258 speculatively devirtualized, 4056 cold
66875 have multiple targets, 0 overwritable, 0 already speculated (0 agree, 0 disagree), 0 not defined

So about 32% of calls are devirutalized.  By random checking, these can be
tracked to real occurences of code where virtual is used in a silly way.
I plan to introduce warning for that (I have code for that already since it
makes it easier to analyze what changes are really made and why).

Martin Liska rebuilt with FDO based indirect call resolution.  Here we get:
23928 indirect calls trained.
12837 (53.65%) have common target.
342 (1.43%) targets was not found.
8378 (35.01%) speculations seems useless.
4117 (17.21%) speculations produced.

I compared the overlap that is devirtualized by both techniques.  There is
almost 100% match, except that FDO code is not dealing well with thunks and
those 342 calls that seem to go out of libxul into plugins.  I will fix the
thunk issue later.

I also tested QT, where the numbers are smaller - only about 20% of devirtualized
calls, but largery things seems similar.

For non-LTO build, the devirtualization also seems sane, there seems to be
about 8% of miss rate on GCC bootstrap that seems acceptable. I tracked most of
those down into randomly included headers that do define derived types of a
given class that are unused in the current unit.  I think we can track this by
computing reachable functions in current unit and looking for vtables actually
used by construction.  One of such actually triggers undefined reference in
build of libstdc++ and therefore I added the check disabling devirtualization
to DECL_EXTERNAL for now.  It is because the libstdc++ header seems to have
explicit instantiation of a template that is never linked with.

I currently enabled the pass by default at -O2.  Based on the experience about
missrate, we may want to disable it for -O2 non-LTO if it shows to be too risky
on some codebases.

Bootstrapped/regtested x86_64-linux and ppc64-linux, also tested with lto bootstrap
with the LTO ODR code and tested on Firefox and QT builds. Will commit the patch
later today.

Comments are welcome.
Honza

	* common.opt (fdevirtualize-speculatively): New function.
	* invoke.texi (fdevirtualize-speculatively): Document.
	* ipa-devirt.c: Include ipa-inline.h
	(likely_target_p): New function.
	(ipa_devirt): New function.
	(gate_ipa_devirt): New function.
	(pass_data_ipa_devirt): New static var.
	(pass_ipa_devirt): Likewise.
	(make_pass_ipa_devirt): New function.
	* opts.c (default_options): Add OPT_fdevirtualize_speculatively.
	(common_handle_option): Disable devirtualization when
	value range profiling is available.
	* passes.def (pass_ipa_devirt): Add.
	* timever.def (TV_IPA_DEVIRT): New timevar.
	* tree-pass.h (make_pass_ipa_devirt):
Xinliang David Li - Sept. 1, 2013, 4:04 p.m.
Missing test cases?

Have you tested the optimization with SPEC2k and SPEC06? There are a
couple of benchmarks benefit greatly from devirtualization, such as
eon, povray etc.   I believe astar will probably improve with this
optimization at O2 (it has hot virtual functions that are not
overridden at all). For eon, the assumption at O2 for speculative
devirt may not work well.

thanks,

David

On Sun, Sep 1, 2013 at 6:57 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> this patch implement speculative devirtualization.  It is a trivial pass that
> asks for targets of every polymorphic call in a program and if the list
> contains one likely target, it produces an speculative call. No context
> sensitive analysis is done at the moment.  This call may or may not survive
> into final program depending if we do somehting useful about the direct call.
>
> The pass is currently disabled for LTO because
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01007.html is needed to properly
> build type inheritance graph.
>
> With LTO it is supposed to be effective on a premise that most types are not
> escaping LTO unit and thus we see all possible targets.  Without LTO it makes
> stronger assumption that you usually call only targets defined in current unit,
> if any.
>
> Path is suprisingly effective on Firefox:
> 105306 polymorphic calls, 0 devirtualized, 34258 speculatively devirtualized, 4056 cold
> 66875 have multiple targets, 0 overwritable, 0 already speculated (0 agree, 0 disagree), 0 not defined
>
> So about 32% of calls are devirutalized.  By random checking, these can be
> tracked to real occurences of code where virtual is used in a silly way.
> I plan to introduce warning for that (I have code for that already since it
> makes it easier to analyze what changes are really made and why).
>
> Martin Liska rebuilt with FDO based indirect call resolution.  Here we get:
> 23928 indirect calls trained.
> 12837 (53.65%) have common target.
> 342 (1.43%) targets was not found.
> 8378 (35.01%) speculations seems useless.
> 4117 (17.21%) speculations produced.
>
> I compared the overlap that is devirtualized by both techniques.  There is
> almost 100% match, except that FDO code is not dealing well with thunks and
> those 342 calls that seem to go out of libxul into plugins.  I will fix the
> thunk issue later.
>
> I also tested QT, where the numbers are smaller - only about 20% of devirtualized
> calls, but largery things seems similar.
>
> For non-LTO build, the devirtualization also seems sane, there seems to be
> about 8% of miss rate on GCC bootstrap that seems acceptable. I tracked most of
> those down into randomly included headers that do define derived types of a
> given class that are unused in the current unit.  I think we can track this by
> computing reachable functions in current unit and looking for vtables actually
> used by construction.  One of such actually triggers undefined reference in
> build of libstdc++ and therefore I added the check disabling devirtualization
> to DECL_EXTERNAL for now.  It is because the libstdc++ header seems to have
> explicit instantiation of a template that is never linked with.
>
> I currently enabled the pass by default at -O2.  Based on the experience about
> missrate, we may want to disable it for -O2 non-LTO if it shows to be too risky
> on some codebases.
>
> Bootstrapped/regtested x86_64-linux and ppc64-linux, also tested with lto bootstrap
> with the LTO ODR code and tested on Firefox and QT builds. Will commit the patch
> later today.
>
> Comments are welcome.
> Honza
>
>         * common.opt (fdevirtualize-speculatively): New function.
>         * invoke.texi (fdevirtualize-speculatively): Document.
>         * ipa-devirt.c: Include ipa-inline.h
>         (likely_target_p): New function.
>         (ipa_devirt): New function.
>         (gate_ipa_devirt): New function.
>         (pass_data_ipa_devirt): New static var.
>         (pass_ipa_devirt): Likewise.
>         (make_pass_ipa_devirt): New function.
>         * opts.c (default_options): Add OPT_fdevirtualize_speculatively.
>         (common_handle_option): Disable devirtualization when
>         value range profiling is available.
>         * passes.def (pass_ipa_devirt): Add.
>         * timever.def (TV_IPA_DEVIRT): New timevar.
>         * tree-pass.h (make_pass_ipa_devirt):
>
> Index: common.opt
> ===================================================================
> --- common.opt  (revision 202136)
> +++ common.opt  (working copy)
> @@ -1007,6 +1007,10 @@ fdevirtualize
>  Common Report Var(flag_devirtualize) Optimization
>  Try to convert virtual calls to direct ones.
>
> +fdevirtualize-speculatively
> +Common Report Var(flag_devirtualize_speculatively) Optimization
> +Perform speculative devirtualization
> +
>  fdiagnostics-show-location=
>  Common Joined RejectNegative Enum(diagnostic_prefixing_rule)
>  -fdiagnostics-show-location=[once|every-line]  How often to emit source location at the beginning of line-wrapped diagnostics
> @@ -1366,7 +1370,7 @@ Common RejectNegative Joined
>
>  fipa-cp
>  Common Report Var(flag_ipa_cp) Optimization
> -Perform Interprocedural constant propagation
> +Perform interprocedural constant propagation
>
>  fipa-cp-clone
>  Common Report Var(flag_ipa_cp_clone) Optimization
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi     (revision 202136)
> +++ doc/invoke.texi     (working copy)
> @@ -365,7 +365,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
>  -fcx-limited-range @gol
>  -fdata-sections -fdce -fdelayed-branch @gol
> --fdelete-null-pointer-checks -fdevirtualize -fdse @gol
> +-fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively -fdse @gol
>  -fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
>  -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
>  -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
> @@ -6712,7 +6712,7 @@ also turns on the following optimization
>  -fcrossjumping @gol
>  -fcse-follow-jumps  -fcse-skip-blocks @gol
>  -fdelete-null-pointer-checks @gol
> --fdevirtualize @gol
> +-fdevirtualize -fdevirtualize-speculatively @gol
>  -fexpensive-optimizations @gol
>  -fgcse  -fgcse-lm  @gol
>  -fhoist-adjacent-loads @gol
> @@ -7257,6 +7257,15 @@ indirect inlining (@code{-findirect-inli
>  propagation (@option{-fipa-cp}).
>  Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
>
> +@item -fdevirtualize-speculatively
> +@opindex fdevirtualize-speculatively
> +Attempt to convert calls to virtual functions to speculative direct calls.
> +Based on the analysis of the type inheritance graph, determine for a given call
> +the set of likely targets. If the set is small, preferably of size 1, change
> +the call into an conditional deciding on direct and indirect call.  The
> +speculative calls enable more optimizations, such as inlining.  When they seem
> +useless after further optimization, they are converted back into original form.
> +
>  @item -fexpensive-optimizations
>  @opindex fexpensive-optimizations
>  Perform a number of minor optimizations that are relatively expensive.
> Index: ipa-devirt.c
> ===================================================================
> --- ipa-devirt.c        (revision 202136)
> +++ ipa-devirt.c        (working copy)
> @@ -101,6 +101,8 @@ along with GCC; see the file COPYING3.
>
>    possible_polymorphic_call_targets returns, given an parameters found in
>    indirect polymorphic edge all possible polymorphic call targets of the call.
> +
> +  pass_ipa_devirt performs simple speculative devirtualization.
>  */
>
>  #include "config.h"
> @@ -116,6 +118,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-pretty-print.h"
>  #include "ipa-utils.h"
>  #include "gimple.h"
> +#include "ipa-inline.h"
>
>  /* Pointer set of all call targets appearing in the cache.  */
>  static pointer_set_t *cached_polymorphic_call_targets;
> @@ -728,4 +731,267 @@ update_type_inheritance_graph (void)
>        get_odr_type (method_class_type (TREE_TYPE (n->symbol.decl)), true);
>    timevar_pop (TV_IPA_INHERITANCE);
>  }
> +
> +
> +/* Return true if N looks like likely target of a polymorphic call.
> +   Rule out cxa_pure_virtual, noreturns, function declared cold and
> +   other obvious cases.  */
> +
> +bool
> +likely_target_p (struct cgraph_node *n)
> +{
> +  int flags;
> +  /* cxa_pure_virtual and similar things are not likely.  */
> +  if (TREE_CODE (TREE_TYPE (n->symbol.decl)) != METHOD_TYPE)
> +    return false;
> +  flags = flags_from_decl_or_type (n->symbol.decl);
> +  if (flags & ECF_NORETURN)
> +    return false;
> +  if (lookup_attribute ("cold",
> +                       DECL_ATTRIBUTES (n->symbol.decl)))
> +    return false;
> +  if (n->frequency < NODE_FREQUENCY_NORMAL)
> +    return false;
> +  return true;
> +}
> +
> +/* The ipa-devirt pass.
> +   This performs very trivial devirtualization:
> +     1) when polymorphic call is known to have precisely one target,
> +        turn it into direct call
> +     2) when polymorphic call has only one likely target in the unit,
> +        turn it into speculative call.  */
> +
> +static unsigned int
> +ipa_devirt (void)
> +{
> +  struct cgraph_node *n;
> +  struct pointer_set_t *bad_call_targets = pointer_set_create ();
> +  struct cgraph_edge *e;
> +
> +  int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
> +  int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
> +  int nwrong = 0, nok = 0, nexternal = 0;;
> +
> +  FOR_EACH_DEFINED_FUNCTION (n)
> +    {
> +      bool update = false;
> +      if (dump_file && n->indirect_calls)
> +       fprintf (dump_file, "\n\nProcesing function %s/%i\n",
> +                cgraph_node_name (n), n->symbol.order);
> +      for (e = n->indirect_calls; e; e = e->next_callee)
> +       if (e->indirect_info->polymorphic)
> +         {
> +           struct cgraph_node *likely_target = NULL;
> +           void *cache_token;
> +           bool final;
> +           vec <cgraph_node *>targets
> +              = possible_polymorphic_call_targets
> +                   (e, &final, &cache_token);
> +           unsigned int i;
> +
> +           if (dump_file)
> +             dump_possible_polymorphic_call_targets
> +               (dump_file, e);
> +           npolymorphic++;
> +
> +           if (final)
> +             {
> +               gcc_assert (targets.length());
> +               if (targets.length() == 1)
> +                 {
> +                   if (dump_file)
> +                     fprintf (dump_file,
> +                              "Devirtualizing call in %s/%i to %s/%i\n",
> +                              cgraph_node_name (n), n->symbol.order,
> +                              cgraph_node_name (targets[0]), targets[0]->symbol.order);
> +                   cgraph_make_edge_direct (e, targets[0]);
> +                   ndevirtualized++;
> +                   update = true;
> +                   continue;
> +                 }
> +             }
> +           if (!flag_devirtualize_speculatively)
> +             continue;
> +           if (!cgraph_maybe_hot_edge_p (e))
> +             {
> +               if (dump_file)
> +                 fprintf (dump_file, "Call is cold\n");
> +               ncold++;
> +               continue;
> +             }
> +           if (e->speculative)
> +             {
> +               if (dump_file)
> +                 fprintf (dump_file, "Call is aready speculated\n");
> +               nspeculated++;
> +
> +               /* When dumping see if we agree with speculation.  */
> +               if (!dump_file)
> +                 continue;
> +             }
> +           if (pointer_set_contains (bad_call_targets,
> +                                     cache_token))
> +             {
> +               if (dump_file)
> +                 fprintf (dump_file, "Target list is known to be useless\n");
> +               nmultiple++;
> +               continue;
> +             }
> +           for (i = 0; i < targets.length(); i++)
> +             if (likely_target_p (targets[i]))
> +               {
> +                 if (likely_target)
> +                   {
> +                     likely_target = NULL;
> +                     if (dump_file)
> +                       fprintf (dump_file, "More than one likely target\n");
> +                     nmultiple++;
> +                     break;
> +                   }
> +                 likely_target = targets[i];
> +               }
> +           if (!likely_target)
> +             {
> +               pointer_set_insert (bad_call_targets, cache_token);
> +               continue;
> +             }
> +           /* This is reached only when dumping; check if we agree or disagree
> +              with the speculation.  */
> +           if (e->speculative)
> +             {
> +               struct cgraph_edge *e2;
> +               struct ipa_ref *ref;
> +               cgraph_speculative_call_info (e, e2, e, ref);
> +               if (cgraph_function_or_thunk_node (e2->callee, NULL)
> +                   == cgraph_function_or_thunk_node (likely_target, NULL))
> +                 {
> +                   fprintf (dump_file, "We agree with speculation\n");
> +                   nok++;
> +                 }
> +               else
> +                 {
> +                   fprintf (dump_file, "We disagree with speculation\n");
> +                   nwrong++;
> +                 }
> +               continue;
> +             }
> +           if (!likely_target->symbol.definition)
> +             {
> +               if (dump_file)
> +                 fprintf (dump_file, "Target is not an definition\n");
> +               nnotdefined++;
> +               continue;
> +             }
> +           /* Do not introduce new references to external symbols.  While we
> +              can handle these just well, it is common for programs to
> +              incorrectly with headers defining methods they are linked
> +              with.  */
> +           if (DECL_EXTERNAL (likely_target->symbol.decl))
> +             {
> +               if (dump_file)
> +                 fprintf (dump_file, "Target is external\n");
> +               nexternal++;
> +               continue;
> +             }
> +           if (cgraph_function_body_availability (likely_target)
> +               <= AVAIL_OVERWRITABLE
> +               && symtab_can_be_discarded ((symtab_node) likely_target))
> +             {
> +               if (dump_file)
> +                 fprintf (dump_file, "Target is overwritable\n");
> +               noverwritable++;
> +               continue;
> +             }
> +           else
> +             {
> +               if (dump_file)
> +                 fprintf (dump_file,
> +                          "Speculatively devirtualizing call in %s/%i to %s/%i\n",
> +                          cgraph_node_name (n), n->symbol.order,
> +                          cgraph_node_name (likely_target),
> +                          likely_target->symbol.order);
> +               if (!symtab_can_be_discarded ((symtab_node) likely_target))
> +                 likely_target = cgraph (symtab_nonoverwritable_alias ((symtab_node)likely_target));
> +               nconverted++;
> +               update = true;
> +               cgraph_turn_edge_to_speculative
> +                 (e, likely_target, e->count * 8 / 10, e->frequency * 8 / 10);
> +             }
> +         }
> +      if (update)
> +       inline_update_overall_summary (n);
> +    }
> +  pointer_set_destroy (bad_call_targets);
> +
> +  if (dump_file)
> +    fprintf (dump_file,
> +            "%i polymorphic calls, %i devirtualized,"
> +            " %i speculatively devirtualized, %i cold\n"
> +            "%i have multiple targets, %i overwritable,"
> +            " %i already speculated (%i agree, %i disagree),"
> +            " %i external, %i not defined\n",
> +            npolymorphic, ndevirtualized, nconverted, ncold,
> +            nmultiple, noverwritable, nspeculated, nok, nwrong,
> +            nexternal, nnotdefined);
> +  return ndevirtualized ? TODO_remove_functions : 0;
> +}
> +
> +/* Gate for IPCP optimization.  */
> +
> +static bool
> +gate_ipa_devirt (void)
> +{
> +  /* FIXME: We should remove the optimize check after we ensure we never run
> +     IPA passes when not optimizing.  */
> +  return (flag_devirtualize || flag_devirtualize_speculatively) && !in_lto_p;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_ipa_devirt =
> +{
> +  IPA_PASS, /* type */
> +  "devirt", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  true, /* has_gate */
> +  true, /* has_execute */
> +  TV_IPA_DEVIRT, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  ( TODO_dump_symtab ), /* todo_flags_finish */
> +};
> +
> +class pass_ipa_devirt : public ipa_opt_pass_d
> +{
> +public:
> +  pass_ipa_devirt(gcc::context *ctxt)
> +    : ipa_opt_pass_d(pass_data_ipa_devirt, ctxt,
> +                    NULL, /* generate_summary */
> +                    NULL, /* write_summary */
> +                    NULL, /* read_summary */
> +                    NULL, /* write_optimization_summary */
> +                    NULL, /* read_optimization_summary */
> +                    NULL, /* stmt_fixup */
> +                    0, /* function_transform_todo_flags_start */
> +                    NULL, /* function_transform */
> +                    NULL) /* variable_transform */
> +  {}
> +
> +  /* opt_pass methods: */
> +  bool gate () { return gate_ipa_devirt (); }
> +  unsigned int execute () { return ipa_devirt (); }
> +
> +}; // class pass_ipa_devirt
> +
> +} // anon namespace
> +
> +ipa_opt_pass_d *
> +make_pass_ipa_devirt (gcc::context *ctxt)
> +{
> +  return new pass_ipa_devirt (ctxt);
> +}
> +
>  #include "gt-ipa-devirt.h"
> Index: opts.c
> ===================================================================
> --- opts.c      (revision 202136)
> +++ opts.c      (working copy)
> @@ -479,6 +479,7 @@ static const struct default_options defa
>      { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_fdevirtualize, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS, OPT_fdevirtualize_speculatively, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_fipa_sra, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_falign_loops, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_falign_jumps, NULL, 1 },
> @@ -1665,6 +1666,11 @@ common_handle_option (struct gcc_options
>         opts->x_flag_vect_cost_model = value;
>        if (!opts_set->x_flag_tree_loop_distribute_patterns)
>         opts->x_flag_tree_loop_distribute_patterns = value;
> +      /* Indirect call profiling should do all useful transformations
> +        speculative devirutalization does.  */
> +      if (!opts_set->x_flag_devirtualize_speculatively
> +         && opts->x_flag_value_profile_transformations)
> +       opts->x_flag_devirtualize_speculatively = false;
>        break;
>
>      case OPT_fprofile_generate_:
> Index: passes.def
> ===================================================================
> --- passes.def  (revision 202136)
> +++ passes.def  (working copy)
> @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.
>    INSERT_PASSES_AFTER (all_regular_ipa_passes)
>    NEXT_PASS (pass_ipa_whole_program_visibility);
>    NEXT_PASS (pass_ipa_profile);
> +  NEXT_PASS (pass_ipa_devirt);
>    NEXT_PASS (pass_ipa_cp);
>    NEXT_PASS (pass_ipa_cdtor_merge);
>    NEXT_PASS (pass_ipa_inline);
> Index: timevar.def
> ===================================================================
> --- timevar.def (revision 202136)
> +++ timevar.def (working copy)
> @@ -66,6 +66,7 @@ DEFTIMEVAR (TV_CGRAPH                , "
>  DEFTIMEVAR (TV_CGRAPHOPT             , "callgraph optimization")
>  DEFTIMEVAR (TV_IPA_INHERITANCE       , "ipa inheritance graph")
>  DEFTIMEVAR (TV_IPA_VIRTUAL_CALL      , "ipa virtual call target")
> +DEFTIMEVAR (TV_IPA_DEVIRT           , "ipa devirtualization")
>  DEFTIMEVAR (TV_IPA_CONSTANT_PROP     , "ipa cp")
>  DEFTIMEVAR (TV_IPA_INLINING          , "ipa inlining heuristics")
>  DEFTIMEVAR (TV_IPA_FNSPLIT           , "ipa function splitting")
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h (revision 202136)
> +++ tree-pass.h (working copy)
> @@ -468,6 +468,7 @@ extern simple_ipa_opt_pass *make_pass_ip
>  extern simple_ipa_opt_pass *make_pass_ipa_free_inline_summary (gcc::context
>                                                                *ctxt);
>  extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
> +extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
>  extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
>  extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
>  extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
Jan Hubicka - Sept. 3, 2013, 4:24 p.m.
> What is the footprint impact of speculative devirtualization?

It is less than 2% of text section and once we solve problems with ipa-prop tracking,
I hope it will be less.

I hope we now understand better how to devirtualize and I think we can improve non-speculative
devirt noticeably.

> 
> Firefox is sensitive to footprint increases, especially on Android startup.

That is what Martin's patch is expected to solve.  With his ordering only
fraction of the text and rodata section needs to be preloaded.  We are still
chasing some issues there, but it seems that "only" about 10MB is needed
(out of 60MB binary).  Not accounting dynamic linking of data.rel.ro that
adds around another 5MB.  It depends on FDO though.

Honza
H.J. Lu - Nov. 14, 2014, 6:42 p.m.
On Sun, Sep 1, 2013 at 6:57 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> this patch implement speculative devirtualization.  It is a trivial pass that
> asks for targets of every polymorphic call in a program and if the list
> contains one likely target, it produces an speculative call. No context
> sensitive analysis is done at the moment.  This call may or may not survive
> into final program depending if we do somehting useful about the direct call.
>
> The pass is currently disabled for LTO because
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01007.html is needed to properly
> build type inheritance graph.
>
> With LTO it is supposed to be effective on a premise that most types are not
> escaping LTO unit and thus we see all possible targets.  Without LTO it makes
> stronger assumption that you usually call only targets defined in current unit,
> if any.
>
> Path is suprisingly effective on Firefox:
> 105306 polymorphic calls, 0 devirtualized, 34258 speculatively devirtualized, 4056 cold
> 66875 have multiple targets, 0 overwritable, 0 already speculated (0 agree, 0 disagree), 0 not defined
>
> So about 32% of calls are devirutalized.  By random checking, these can be
> tracked to real occurences of code where virtual is used in a silly way.
> I plan to introduce warning for that (I have code for that already since it
> makes it easier to analyze what changes are really made and why).
>
> Martin Liska rebuilt with FDO based indirect call resolution.  Here we get:
> 23928 indirect calls trained.
> 12837 (53.65%) have common target.
> 342 (1.43%) targets was not found.
> 8378 (35.01%) speculations seems useless.
> 4117 (17.21%) speculations produced.
>
> I compared the overlap that is devirtualized by both techniques.  There is
> almost 100% match, except that FDO code is not dealing well with thunks and
> those 342 calls that seem to go out of libxul into plugins.  I will fix the
> thunk issue later.
>
> I also tested QT, where the numbers are smaller - only about 20% of devirtualized
> calls, but largery things seems similar.
>
> For non-LTO build, the devirtualization also seems sane, there seems to be
> about 8% of miss rate on GCC bootstrap that seems acceptable. I tracked most of
> those down into randomly included headers that do define derived types of a
> given class that are unused in the current unit.  I think we can track this by
> computing reachable functions in current unit and looking for vtables actually
> used by construction.  One of such actually triggers undefined reference in
> build of libstdc++ and therefore I added the check disabling devirtualization
> to DECL_EXTERNAL for now.  It is because the libstdc++ header seems to have
> explicit instantiation of a template that is never linked with.
>
> I currently enabled the pass by default at -O2.  Based on the experience about
> missrate, we may want to disable it for -O2 non-LTO if it shows to be too risky
> on some codebases.
>
> Bootstrapped/regtested x86_64-linux and ppc64-linux, also tested with lto bootstrap
> with the LTO ODR code and tested on Firefox and QT builds. Will commit the patch
> later today.
>
> Comments are welcome.
> Honza
>
>         * common.opt (fdevirtualize-speculatively): New function.
>         * invoke.texi (fdevirtualize-speculatively): Document.
>         * ipa-devirt.c: Include ipa-inline.h
>         (likely_target_p): New function.
>         (ipa_devirt): New function.
>         (gate_ipa_devirt): New function.
>         (pass_data_ipa_devirt): New static var.
>         (pass_ipa_devirt): Likewise.
>         (make_pass_ipa_devirt): New function.
>         * opts.c (default_options): Add OPT_fdevirtualize_speculatively.
>         (common_handle_option): Disable devirtualization when
>         value range profiling is available.
>         * passes.def (pass_ipa_devirt): Add.
>         * timever.def (TV_IPA_DEVIRT): New timevar.
>         * tree-pass.h (make_pass_ipa_devirt):
>

This triggered:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63814

H.J.

Patch

Index: common.opt
===================================================================
--- common.opt	(revision 202136)
+++ common.opt	(working copy)
@@ -1007,6 +1007,10 @@  fdevirtualize
 Common Report Var(flag_devirtualize) Optimization
 Try to convert virtual calls to direct ones.
 
+fdevirtualize-speculatively
+Common Report Var(flag_devirtualize_speculatively) Optimization
+Perform speculative devirtualization
+
 fdiagnostics-show-location=
 Common Joined RejectNegative Enum(diagnostic_prefixing_rule)
 -fdiagnostics-show-location=[once|every-line]	How often to emit source location at the beginning of line-wrapped diagnostics
@@ -1366,7 +1370,7 @@  Common RejectNegative Joined
 
 fipa-cp
 Common Report Var(flag_ipa_cp) Optimization
-Perform Interprocedural constant propagation
+Perform interprocedural constant propagation
 
 fipa-cp-clone
 Common Report Var(flag_ipa_cp_clone) Optimization
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 202136)
+++ doc/invoke.texi	(working copy)
@@ -365,7 +365,7 @@  Objective-C and Objective-C++ Dialects}.
 -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
 -fcx-limited-range @gol
 -fdata-sections -fdce -fdelayed-branch @gol
--fdelete-null-pointer-checks -fdevirtualize -fdse @gol
+-fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively -fdse @gol
 -fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol
 -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
 -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
@@ -6712,7 +6712,7 @@  also turns on the following optimization
 -fcrossjumping @gol
 -fcse-follow-jumps  -fcse-skip-blocks @gol
 -fdelete-null-pointer-checks @gol
--fdevirtualize @gol
+-fdevirtualize -fdevirtualize-speculatively @gol
 -fexpensive-optimizations @gol
 -fgcse  -fgcse-lm  @gol
 -fhoist-adjacent-loads @gol
@@ -7257,6 +7257,15 @@  indirect inlining (@code{-findirect-inli
 propagation (@option{-fipa-cp}).
 Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
 
+@item -fdevirtualize-speculatively
+@opindex fdevirtualize-speculatively
+Attempt to convert calls to virtual functions to speculative direct calls.
+Based on the analysis of the type inheritance graph, determine for a given call
+the set of likely targets. If the set is small, preferably of size 1, change
+the call into an conditional deciding on direct and indirect call.  The
+speculative calls enable more optimizations, such as inlining.  When they seem
+useless after further optimization, they are converted back into original form.
+
 @item -fexpensive-optimizations
 @opindex fexpensive-optimizations
 Perform a number of minor optimizations that are relatively expensive.
Index: ipa-devirt.c
===================================================================
--- ipa-devirt.c	(revision 202136)
+++ ipa-devirt.c	(working copy)
@@ -101,6 +101,8 @@  along with GCC; see the file COPYING3.
 
   possible_polymorphic_call_targets returns, given an parameters found in
   indirect polymorphic edge all possible polymorphic call targets of the call.
+
+  pass_ipa_devirt performs simple speculative devirtualization.
 */
 
 #include "config.h"
@@ -116,6 +118,7 @@  along with GCC; see the file COPYING3.
 #include "tree-pretty-print.h"
 #include "ipa-utils.h"
 #include "gimple.h"
+#include "ipa-inline.h"
 
 /* Pointer set of all call targets appearing in the cache.  */
 static pointer_set_t *cached_polymorphic_call_targets;
@@ -728,4 +731,267 @@  update_type_inheritance_graph (void)
       get_odr_type (method_class_type (TREE_TYPE (n->symbol.decl)), true);
   timevar_pop (TV_IPA_INHERITANCE);
 }
+
+
+/* Return true if N looks like likely target of a polymorphic call.
+   Rule out cxa_pure_virtual, noreturns, function declared cold and
+   other obvious cases.  */
+
+bool
+likely_target_p (struct cgraph_node *n)
+{
+  int flags;
+  /* cxa_pure_virtual and similar things are not likely.  */
+  if (TREE_CODE (TREE_TYPE (n->symbol.decl)) != METHOD_TYPE)
+    return false;
+  flags = flags_from_decl_or_type (n->symbol.decl);
+  if (flags & ECF_NORETURN)
+    return false;
+  if (lookup_attribute ("cold",
+			DECL_ATTRIBUTES (n->symbol.decl)))
+    return false;
+  if (n->frequency < NODE_FREQUENCY_NORMAL)
+    return false;
+  return true;
+}
+
+/* The ipa-devirt pass.
+   This performs very trivial devirtualization:
+     1) when polymorphic call is known to have precisely one target,
+        turn it into direct call
+     2) when polymorphic call has only one likely target in the unit,
+        turn it into speculative call.  */
+
+static unsigned int
+ipa_devirt (void)
+{
+  struct cgraph_node *n;
+  struct pointer_set_t *bad_call_targets = pointer_set_create ();
+  struct cgraph_edge *e;
+
+  int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
+  int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
+  int nwrong = 0, nok = 0, nexternal = 0;;
+
+  FOR_EACH_DEFINED_FUNCTION (n)
+    {	
+      bool update = false;
+      if (dump_file && n->indirect_calls)
+	fprintf (dump_file, "\n\nProcesing function %s/%i\n",
+		 cgraph_node_name (n), n->symbol.order);
+      for (e = n->indirect_calls; e; e = e->next_callee)
+	if (e->indirect_info->polymorphic)
+	  {
+	    struct cgraph_node *likely_target = NULL;
+	    void *cache_token;
+	    bool final;
+	    vec <cgraph_node *>targets
+	       = possible_polymorphic_call_targets
+		    (e, &final, &cache_token);
+	    unsigned int i;
+
+	    if (dump_file)
+	      dump_possible_polymorphic_call_targets 
+		(dump_file, e);
+	    npolymorphic++;
+
+	    if (final)
+	      {
+		gcc_assert (targets.length());
+		if (targets.length() == 1)
+		  {
+		    if (dump_file)
+		      fprintf (dump_file,
+			       "Devirtualizing call in %s/%i to %s/%i\n",
+			       cgraph_node_name (n), n->symbol.order,
+			       cgraph_node_name (targets[0]), targets[0]->symbol.order);
+		    cgraph_make_edge_direct (e, targets[0]);
+		    ndevirtualized++;
+		    update = true;
+		    continue;
+		  }
+	      }
+	    if (!flag_devirtualize_speculatively)
+	      continue;
+	    if (!cgraph_maybe_hot_edge_p (e))
+	      {
+		if (dump_file)
+		  fprintf (dump_file, "Call is cold\n");
+		ncold++;
+		continue;
+	      }
+	    if (e->speculative)
+	      {
+		if (dump_file)
+		  fprintf (dump_file, "Call is aready speculated\n");
+		nspeculated++;
+
+		/* When dumping see if we agree with speculation.  */
+		if (!dump_file)
+		  continue;
+	      }
+	    if (pointer_set_contains (bad_call_targets,
+				      cache_token))
+	      {
+		if (dump_file)
+		  fprintf (dump_file, "Target list is known to be useless\n");
+		nmultiple++;
+		continue;
+	      }
+	    for (i = 0; i < targets.length(); i++)
+	      if (likely_target_p (targets[i]))
+		{
+		  if (likely_target)
+		    {
+		      likely_target = NULL;
+		      if (dump_file)
+			fprintf (dump_file, "More than one likely target\n");
+		      nmultiple++;
+		      break;
+		    }
+		  likely_target = targets[i];
+		}
+	    if (!likely_target)
+	      {
+		pointer_set_insert (bad_call_targets, cache_token);
+	        continue;
+	      }
+	    /* This is reached only when dumping; check if we agree or disagree
+ 	       with the speculation.  */
+	    if (e->speculative)
+	      {
+		struct cgraph_edge *e2;
+		struct ipa_ref *ref;
+		cgraph_speculative_call_info (e, e2, e, ref);
+		if (cgraph_function_or_thunk_node (e2->callee, NULL)
+		    == cgraph_function_or_thunk_node (likely_target, NULL))
+		  {
+		    fprintf (dump_file, "We agree with speculation\n");
+		    nok++;
+		  }
+		else
+		  {
+		    fprintf (dump_file, "We disagree with speculation\n");
+		    nwrong++;
+		  }
+		continue;
+	      }
+	    if (!likely_target->symbol.definition)
+	      {
+		if (dump_file)
+		  fprintf (dump_file, "Target is not an definition\n");
+		nnotdefined++;
+		continue;
+	      }
+	    /* Do not introduce new references to external symbols.  While we
+	       can handle these just well, it is common for programs to
+	       incorrectly with headers defining methods they are linked
+	       with.  */
+	    if (DECL_EXTERNAL (likely_target->symbol.decl))
+	      {
+		if (dump_file)
+		  fprintf (dump_file, "Target is external\n");
+		nexternal++;
+		continue;
+	      }
+	    if (cgraph_function_body_availability (likely_target)
+		<= AVAIL_OVERWRITABLE
+		&& symtab_can_be_discarded ((symtab_node) likely_target))
+	      {
+		if (dump_file)
+		  fprintf (dump_file, "Target is overwritable\n");
+		noverwritable++;
+		continue;
+	      }
+	    else
+	      {
+		if (dump_file)
+		  fprintf (dump_file,
+			   "Speculatively devirtualizing call in %s/%i to %s/%i\n",
+			   cgraph_node_name (n), n->symbol.order,
+			   cgraph_node_name (likely_target),
+			   likely_target->symbol.order);
+		if (!symtab_can_be_discarded ((symtab_node) likely_target))
+		  likely_target = cgraph (symtab_nonoverwritable_alias ((symtab_node)likely_target));
+		nconverted++;
+		update = true;
+		cgraph_turn_edge_to_speculative
+		  (e, likely_target, e->count * 8 / 10, e->frequency * 8 / 10);
+	      }
+	  }
+      if (update)
+	inline_update_overall_summary (n);
+    }
+  pointer_set_destroy (bad_call_targets);
+
+  if (dump_file)
+    fprintf (dump_file,
+	     "%i polymorphic calls, %i devirtualized,"
+	     " %i speculatively devirtualized, %i cold\n"
+	     "%i have multiple targets, %i overwritable,"
+	     " %i already speculated (%i agree, %i disagree),"
+	     " %i external, %i not defined\n",
+	     npolymorphic, ndevirtualized, nconverted, ncold,
+	     nmultiple, noverwritable, nspeculated, nok, nwrong,
+	     nexternal, nnotdefined);
+  return ndevirtualized ? TODO_remove_functions : 0;
+}
+
+/* Gate for IPCP optimization.  */
+
+static bool
+gate_ipa_devirt (void)
+{
+  /* FIXME: We should remove the optimize check after we ensure we never run
+     IPA passes when not optimizing.  */
+  return (flag_devirtualize || flag_devirtualize_speculatively) && !in_lto_p;
+}
+
+namespace {
+
+const pass_data pass_data_ipa_devirt =
+{
+  IPA_PASS, /* type */
+  "devirt", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  true, /* has_gate */
+  true, /* has_execute */
+  TV_IPA_DEVIRT, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  ( TODO_dump_symtab ), /* todo_flags_finish */
+};
+
+class pass_ipa_devirt : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_devirt(gcc::context *ctxt)
+    : ipa_opt_pass_d(pass_data_ipa_devirt, ctxt,
+		     NULL, /* generate_summary */
+		     NULL, /* write_summary */
+		     NULL, /* read_summary */
+		     NULL, /* write_optimization_summary */
+		     NULL, /* read_optimization_summary */
+		     NULL, /* stmt_fixup */
+		     0, /* function_transform_todo_flags_start */
+		     NULL, /* function_transform */
+		     NULL) /* variable_transform */
+  {}
+
+  /* opt_pass methods: */
+  bool gate () { return gate_ipa_devirt (); }
+  unsigned int execute () { return ipa_devirt (); }
+
+}; // class pass_ipa_devirt
+
+} // anon namespace
+
+ipa_opt_pass_d *
+make_pass_ipa_devirt (gcc::context *ctxt)
+{
+  return new pass_ipa_devirt (ctxt);
+}
+
 #include "gt-ipa-devirt.h"
Index: opts.c
===================================================================
--- opts.c	(revision 202136)
+++ opts.c	(working copy)
@@ -479,6 +479,7 @@  static const struct default_options defa
     { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fdevirtualize, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_fdevirtualize_speculatively, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fipa_sra, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_falign_loops, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_falign_jumps, NULL, 1 },
@@ -1665,6 +1666,11 @@  common_handle_option (struct gcc_options
 	opts->x_flag_vect_cost_model = value;
       if (!opts_set->x_flag_tree_loop_distribute_patterns)
 	opts->x_flag_tree_loop_distribute_patterns = value;
+      /* Indirect call profiling should do all useful transformations
+ 	 speculative devirutalization does.  */
+      if (!opts_set->x_flag_devirtualize_speculatively
+	  && opts->x_flag_value_profile_transformations)
+	opts->x_flag_devirtualize_speculatively = false;
       break;
 
     case OPT_fprofile_generate_:
Index: passes.def
===================================================================
--- passes.def	(revision 202136)
+++ passes.def	(working copy)
@@ -104,6 +104,7 @@  along with GCC; see the file COPYING3.
   INSERT_PASSES_AFTER (all_regular_ipa_passes)
   NEXT_PASS (pass_ipa_whole_program_visibility);
   NEXT_PASS (pass_ipa_profile);
+  NEXT_PASS (pass_ipa_devirt);
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_ipa_inline);
Index: timevar.def
===================================================================
--- timevar.def	(revision 202136)
+++ timevar.def	(working copy)
@@ -66,6 +66,7 @@  DEFTIMEVAR (TV_CGRAPH                , "
 DEFTIMEVAR (TV_CGRAPHOPT             , "callgraph optimization")
 DEFTIMEVAR (TV_IPA_INHERITANCE       , "ipa inheritance graph")
 DEFTIMEVAR (TV_IPA_VIRTUAL_CALL      , "ipa virtual call target")
+DEFTIMEVAR (TV_IPA_DEVIRT	     , "ipa devirtualization")
 DEFTIMEVAR (TV_IPA_CONSTANT_PROP     , "ipa cp")
 DEFTIMEVAR (TV_IPA_INLINING          , "ipa inlining heuristics")
 DEFTIMEVAR (TV_IPA_FNSPLIT           , "ipa function splitting")
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 202136)
+++ tree-pass.h	(working copy)
@@ -468,6 +468,7 @@  extern simple_ipa_opt_pass *make_pass_ip
 extern simple_ipa_opt_pass *make_pass_ipa_free_inline_summary (gcc::context
 							       *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);