From patchwork Sun Sep 1 13:57:14 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 271591 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "www.sourceware.org", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by ozlabs.org (Postfix) with ESMTPS id 1591B2C0089 for ; Sun, 1 Sep 2013 23:57:36 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:subject:message-id:mime-version:content-type; q=dns; s= default; b=I/jmG4g8L3A7O0GgtTOr+Tr5fIEU1NHpFOvxixVtFQSFI/iyDV+Hp UcXT7tozZd5LB4phVs6YgMy5/zu87O+dVDBeirIHVOmiN9B9ZMYGH4l1HsmNq450 2nhcCmdzreG9PpK3flBQyhhzqJ3gpjZ9PtBOHDD4mdNI5HrpiVLnpA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:subject:message-id:mime-version:content-type; s= default; bh=dwQ8UTJaOfgX/Mx56BvbQ3AWqNw=; b=ofopLPI0rj+wbfojfHso ZMtrPQNSapyYQp0hF5rRnjOHU4FlVp1tMsAc6ytmycY+7laAWhyliklWV9hyUW9L jwUVwhot6YOEOi9ED6sX+jl0Vn6/UN+66OnfO97ccr4Rmq/wSA+E8eSzpMUw90bW B1aLMzFrF9AIu3hlzvOpqdM= Received: (qmail 6217 invoked by alias); 1 Sep 2013 13:57:29 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 6203 invoked by uid 89); 1 Sep 2013 13:57:28 -0000 Received: from nikam.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Sun, 01 Sep 2013 13:57:28 +0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.8 required=5.0 tests=AWL, BAYES_40, NO_RELAYS autolearn=ham version=3.3.2 X-HELO: nikam.ms.mff.cuni.cz Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id EAF6B541D90; Sun, 1 Sep 2013 15:57:14 +0200 (CEST) Date: Sun, 1 Sep 2013 15:57:14 +0200 From: Jan Hubicka To: gcc-patches@gcc.gnu.org, jason@redhat.com, Nathan Froyd , tglek@mozilla.com, glandium@mozilla.com, marxin.liska@gmail.com Subject: Type inheritance graph analysis & speculative devirtualization, part 7/7 (speculative devirtualizatoin) Message-ID: <20130901135714.GA23527@kam.mff.cuni.cz> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-06-14) Hi, this patch implement speculative devirtualization. It is a trivial pass that asks for targets of every polymorphic call in a program and if the list contains one likely target, it produces an speculative call. No context sensitive analysis is done at the moment. This call may or may not survive into final program depending if we do somehting useful about the direct call. The pass is currently disabled for LTO because http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01007.html is needed to properly build type inheritance graph. With LTO it is supposed to be effective on a premise that most types are not escaping LTO unit and thus we see all possible targets. Without LTO it makes stronger assumption that you usually call only targets defined in current unit, if any. Path is suprisingly effective on Firefox: 105306 polymorphic calls, 0 devirtualized, 34258 speculatively devirtualized, 4056 cold 66875 have multiple targets, 0 overwritable, 0 already speculated (0 agree, 0 disagree), 0 not defined So about 32% of calls are devirutalized. By random checking, these can be tracked to real occurences of code where virtual is used in a silly way. I plan to introduce warning for that (I have code for that already since it makes it easier to analyze what changes are really made and why). Martin Liska rebuilt with FDO based indirect call resolution. Here we get: 23928 indirect calls trained. 12837 (53.65%) have common target. 342 (1.43%) targets was not found. 8378 (35.01%) speculations seems useless. 4117 (17.21%) speculations produced. I compared the overlap that is devirtualized by both techniques. There is almost 100% match, except that FDO code is not dealing well with thunks and those 342 calls that seem to go out of libxul into plugins. I will fix the thunk issue later. I also tested QT, where the numbers are smaller - only about 20% of devirtualized calls, but largery things seems similar. For non-LTO build, the devirtualization also seems sane, there seems to be about 8% of miss rate on GCC bootstrap that seems acceptable. I tracked most of those down into randomly included headers that do define derived types of a given class that are unused in the current unit. I think we can track this by computing reachable functions in current unit and looking for vtables actually used by construction. One of such actually triggers undefined reference in build of libstdc++ and therefore I added the check disabling devirtualization to DECL_EXTERNAL for now. It is because the libstdc++ header seems to have explicit instantiation of a template that is never linked with. I currently enabled the pass by default at -O2. Based on the experience about missrate, we may want to disable it for -O2 non-LTO if it shows to be too risky on some codebases. Bootstrapped/regtested x86_64-linux and ppc64-linux, also tested with lto bootstrap with the LTO ODR code and tested on Firefox and QT builds. Will commit the patch later today. Comments are welcome. Honza * common.opt (fdevirtualize-speculatively): New function. * invoke.texi (fdevirtualize-speculatively): Document. * ipa-devirt.c: Include ipa-inline.h (likely_target_p): New function. (ipa_devirt): New function. (gate_ipa_devirt): New function. (pass_data_ipa_devirt): New static var. (pass_ipa_devirt): Likewise. (make_pass_ipa_devirt): New function. * opts.c (default_options): Add OPT_fdevirtualize_speculatively. (common_handle_option): Disable devirtualization when value range profiling is available. * passes.def (pass_ipa_devirt): Add. * timever.def (TV_IPA_DEVIRT): New timevar. * tree-pass.h (make_pass_ipa_devirt): Index: common.opt =================================================================== --- common.opt (revision 202136) +++ common.opt (working copy) @@ -1007,6 +1007,10 @@ fdevirtualize Common Report Var(flag_devirtualize) Optimization Try to convert virtual calls to direct ones. +fdevirtualize-speculatively +Common Report Var(flag_devirtualize_speculatively) Optimization +Perform speculative devirtualization + fdiagnostics-show-location= Common Joined RejectNegative Enum(diagnostic_prefixing_rule) -fdiagnostics-show-location=[once|every-line] How often to emit source location at the beginning of line-wrapped diagnostics @@ -1366,7 +1370,7 @@ Common RejectNegative Joined fipa-cp Common Report Var(flag_ipa_cp) Optimization -Perform Interprocedural constant propagation +Perform interprocedural constant propagation fipa-cp-clone Common Report Var(flag_ipa_cp_clone) Optimization Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 202136) +++ doc/invoke.texi (working copy) @@ -365,7 +365,7 @@ Objective-C and Objective-C++ Dialects}. -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol --fdelete-null-pointer-checks -fdevirtualize -fdse @gol +-fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively -fdse @gol -fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol @@ -6712,7 +6712,7 @@ also turns on the following optimization -fcrossjumping @gol -fcse-follow-jumps -fcse-skip-blocks @gol -fdelete-null-pointer-checks @gol --fdevirtualize @gol +-fdevirtualize -fdevirtualize-speculatively @gol -fexpensive-optimizations @gol -fgcse -fgcse-lm @gol -fhoist-adjacent-loads @gol @@ -7257,6 +7257,15 @@ indirect inlining (@code{-findirect-inli propagation (@option{-fipa-cp}). Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}. +@item -fdevirtualize-speculatively +@opindex fdevirtualize-speculatively +Attempt to convert calls to virtual functions to speculative direct calls. +Based on the analysis of the type inheritance graph, determine for a given call +the set of likely targets. If the set is small, preferably of size 1, change +the call into an conditional deciding on direct and indirect call. The +speculative calls enable more optimizations, such as inlining. When they seem +useless after further optimization, they are converted back into original form. + @item -fexpensive-optimizations @opindex fexpensive-optimizations Perform a number of minor optimizations that are relatively expensive. Index: ipa-devirt.c =================================================================== --- ipa-devirt.c (revision 202136) +++ ipa-devirt.c (working copy) @@ -101,6 +101,8 @@ along with GCC; see the file COPYING3. possible_polymorphic_call_targets returns, given an parameters found in indirect polymorphic edge all possible polymorphic call targets of the call. + + pass_ipa_devirt performs simple speculative devirtualization. */ #include "config.h" @@ -116,6 +118,7 @@ along with GCC; see the file COPYING3. #include "tree-pretty-print.h" #include "ipa-utils.h" #include "gimple.h" +#include "ipa-inline.h" /* Pointer set of all call targets appearing in the cache. */ static pointer_set_t *cached_polymorphic_call_targets; @@ -728,4 +731,267 @@ update_type_inheritance_graph (void) get_odr_type (method_class_type (TREE_TYPE (n->symbol.decl)), true); timevar_pop (TV_IPA_INHERITANCE); } + + +/* Return true if N looks like likely target of a polymorphic call. + Rule out cxa_pure_virtual, noreturns, function declared cold and + other obvious cases. */ + +bool +likely_target_p (struct cgraph_node *n) +{ + int flags; + /* cxa_pure_virtual and similar things are not likely. */ + if (TREE_CODE (TREE_TYPE (n->symbol.decl)) != METHOD_TYPE) + return false; + flags = flags_from_decl_or_type (n->symbol.decl); + if (flags & ECF_NORETURN) + return false; + if (lookup_attribute ("cold", + DECL_ATTRIBUTES (n->symbol.decl))) + return false; + if (n->frequency < NODE_FREQUENCY_NORMAL) + return false; + return true; +} + +/* The ipa-devirt pass. + This performs very trivial devirtualization: + 1) when polymorphic call is known to have precisely one target, + turn it into direct call + 2) when polymorphic call has only one likely target in the unit, + turn it into speculative call. */ + +static unsigned int +ipa_devirt (void) +{ + struct cgraph_node *n; + struct pointer_set_t *bad_call_targets = pointer_set_create (); + struct cgraph_edge *e; + + int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0; + int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0; + int nwrong = 0, nok = 0, nexternal = 0;; + + FOR_EACH_DEFINED_FUNCTION (n) + { + bool update = false; + if (dump_file && n->indirect_calls) + fprintf (dump_file, "\n\nProcesing function %s/%i\n", + cgraph_node_name (n), n->symbol.order); + for (e = n->indirect_calls; e; e = e->next_callee) + if (e->indirect_info->polymorphic) + { + struct cgraph_node *likely_target = NULL; + void *cache_token; + bool final; + vec targets + = possible_polymorphic_call_targets + (e, &final, &cache_token); + unsigned int i; + + if (dump_file) + dump_possible_polymorphic_call_targets + (dump_file, e); + npolymorphic++; + + if (final) + { + gcc_assert (targets.length()); + if (targets.length() == 1) + { + if (dump_file) + fprintf (dump_file, + "Devirtualizing call in %s/%i to %s/%i\n", + cgraph_node_name (n), n->symbol.order, + cgraph_node_name (targets[0]), targets[0]->symbol.order); + cgraph_make_edge_direct (e, targets[0]); + ndevirtualized++; + update = true; + continue; + } + } + if (!flag_devirtualize_speculatively) + continue; + if (!cgraph_maybe_hot_edge_p (e)) + { + if (dump_file) + fprintf (dump_file, "Call is cold\n"); + ncold++; + continue; + } + if (e->speculative) + { + if (dump_file) + fprintf (dump_file, "Call is aready speculated\n"); + nspeculated++; + + /* When dumping see if we agree with speculation. */ + if (!dump_file) + continue; + } + if (pointer_set_contains (bad_call_targets, + cache_token)) + { + if (dump_file) + fprintf (dump_file, "Target list is known to be useless\n"); + nmultiple++; + continue; + } + for (i = 0; i < targets.length(); i++) + if (likely_target_p (targets[i])) + { + if (likely_target) + { + likely_target = NULL; + if (dump_file) + fprintf (dump_file, "More than one likely target\n"); + nmultiple++; + break; + } + likely_target = targets[i]; + } + if (!likely_target) + { + pointer_set_insert (bad_call_targets, cache_token); + continue; + } + /* This is reached only when dumping; check if we agree or disagree + with the speculation. */ + if (e->speculative) + { + struct cgraph_edge *e2; + struct ipa_ref *ref; + cgraph_speculative_call_info (e, e2, e, ref); + if (cgraph_function_or_thunk_node (e2->callee, NULL) + == cgraph_function_or_thunk_node (likely_target, NULL)) + { + fprintf (dump_file, "We agree with speculation\n"); + nok++; + } + else + { + fprintf (dump_file, "We disagree with speculation\n"); + nwrong++; + } + continue; + } + if (!likely_target->symbol.definition) + { + if (dump_file) + fprintf (dump_file, "Target is not an definition\n"); + nnotdefined++; + continue; + } + /* Do not introduce new references to external symbols. While we + can handle these just well, it is common for programs to + incorrectly with headers defining methods they are linked + with. */ + if (DECL_EXTERNAL (likely_target->symbol.decl)) + { + if (dump_file) + fprintf (dump_file, "Target is external\n"); + nexternal++; + continue; + } + if (cgraph_function_body_availability (likely_target) + <= AVAIL_OVERWRITABLE + && symtab_can_be_discarded ((symtab_node) likely_target)) + { + if (dump_file) + fprintf (dump_file, "Target is overwritable\n"); + noverwritable++; + continue; + } + else + { + if (dump_file) + fprintf (dump_file, + "Speculatively devirtualizing call in %s/%i to %s/%i\n", + cgraph_node_name (n), n->symbol.order, + cgraph_node_name (likely_target), + likely_target->symbol.order); + if (!symtab_can_be_discarded ((symtab_node) likely_target)) + likely_target = cgraph (symtab_nonoverwritable_alias ((symtab_node)likely_target)); + nconverted++; + update = true; + cgraph_turn_edge_to_speculative + (e, likely_target, e->count * 8 / 10, e->frequency * 8 / 10); + } + } + if (update) + inline_update_overall_summary (n); + } + pointer_set_destroy (bad_call_targets); + + if (dump_file) + fprintf (dump_file, + "%i polymorphic calls, %i devirtualized," + " %i speculatively devirtualized, %i cold\n" + "%i have multiple targets, %i overwritable," + " %i already speculated (%i agree, %i disagree)," + " %i external, %i not defined\n", + npolymorphic, ndevirtualized, nconverted, ncold, + nmultiple, noverwritable, nspeculated, nok, nwrong, + nexternal, nnotdefined); + return ndevirtualized ? TODO_remove_functions : 0; +} + +/* Gate for IPCP optimization. */ + +static bool +gate_ipa_devirt (void) +{ + /* FIXME: We should remove the optimize check after we ensure we never run + IPA passes when not optimizing. */ + return (flag_devirtualize || flag_devirtualize_speculatively) && !in_lto_p; +} + +namespace { + +const pass_data pass_data_ipa_devirt = +{ + IPA_PASS, /* type */ + "devirt", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + true, /* has_gate */ + true, /* has_execute */ + TV_IPA_DEVIRT, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + ( TODO_dump_symtab ), /* todo_flags_finish */ +}; + +class pass_ipa_devirt : public ipa_opt_pass_d +{ +public: + pass_ipa_devirt(gcc::context *ctxt) + : ipa_opt_pass_d(pass_data_ipa_devirt, ctxt, + NULL, /* generate_summary */ + NULL, /* write_summary */ + NULL, /* read_summary */ + NULL, /* write_optimization_summary */ + NULL, /* read_optimization_summary */ + NULL, /* stmt_fixup */ + 0, /* function_transform_todo_flags_start */ + NULL, /* function_transform */ + NULL) /* variable_transform */ + {} + + /* opt_pass methods: */ + bool gate () { return gate_ipa_devirt (); } + unsigned int execute () { return ipa_devirt (); } + +}; // class pass_ipa_devirt + +} // anon namespace + +ipa_opt_pass_d * +make_pass_ipa_devirt (gcc::context *ctxt) +{ + return new pass_ipa_devirt (ctxt); +} + #include "gt-ipa-devirt.h" Index: opts.c =================================================================== --- opts.c (revision 202136) +++ opts.c (working copy) @@ -479,6 +479,7 @@ static const struct default_options defa { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fdevirtualize, NULL, 1 }, + { OPT_LEVELS_2_PLUS, OPT_fdevirtualize_speculatively, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fipa_sra, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_falign_loops, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_falign_jumps, NULL, 1 }, @@ -1665,6 +1666,11 @@ common_handle_option (struct gcc_options opts->x_flag_vect_cost_model = value; if (!opts_set->x_flag_tree_loop_distribute_patterns) opts->x_flag_tree_loop_distribute_patterns = value; + /* Indirect call profiling should do all useful transformations + speculative devirutalization does. */ + if (!opts_set->x_flag_devirtualize_speculatively + && opts->x_flag_value_profile_transformations) + opts->x_flag_devirtualize_speculatively = false; break; case OPT_fprofile_generate_: Index: passes.def =================================================================== --- passes.def (revision 202136) +++ passes.def (working copy) @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3. INSERT_PASSES_AFTER (all_regular_ipa_passes) NEXT_PASS (pass_ipa_whole_program_visibility); NEXT_PASS (pass_ipa_profile); + NEXT_PASS (pass_ipa_devirt); NEXT_PASS (pass_ipa_cp); NEXT_PASS (pass_ipa_cdtor_merge); NEXT_PASS (pass_ipa_inline); Index: timevar.def =================================================================== --- timevar.def (revision 202136) +++ timevar.def (working copy) @@ -66,6 +66,7 @@ DEFTIMEVAR (TV_CGRAPH , " DEFTIMEVAR (TV_CGRAPHOPT , "callgraph optimization") DEFTIMEVAR (TV_IPA_INHERITANCE , "ipa inheritance graph") DEFTIMEVAR (TV_IPA_VIRTUAL_CALL , "ipa virtual call target") +DEFTIMEVAR (TV_IPA_DEVIRT , "ipa devirtualization") DEFTIMEVAR (TV_IPA_CONSTANT_PROP , "ipa cp") DEFTIMEVAR (TV_IPA_INLINING , "ipa inlining heuristics") DEFTIMEVAR (TV_IPA_FNSPLIT , "ipa function splitting") Index: tree-pass.h =================================================================== --- tree-pass.h (revision 202136) +++ tree-pass.h (working copy) @@ -468,6 +468,7 @@ extern simple_ipa_opt_pass *make_pass_ip extern simple_ipa_opt_pass *make_pass_ipa_free_inline_summary (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt); +extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);