From patchwork Tue Nov 1 22:06:12 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin Jambor X-Patchwork-Id: 123146 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 4E7F3B6F88 for ; Wed, 2 Nov 2011 09:06:34 +1100 (EST) Received: (qmail 16655 invoked by alias); 1 Nov 2011 22:06:32 -0000 Received: (qmail 16644 invoked by uid 22791); 1 Nov 2011 22:06:30 -0000 X-SWARE-Spam-Status: No, hits=-3.1 required=5.0 tests=AWL, BAYES_00, TW_CF, TW_FN, TW_JF, TW_TM X-Spam-Check-By: sourceware.org Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 01 Nov 2011 22:06:15 +0000 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.221.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id 6AFE98B013; Tue, 1 Nov 2011 23:06:13 +0100 (CET) Date: Tue, 1 Nov 2011 23:06:12 +0100 From: Martin Jambor To: GCC Patches Cc: Jan Hubicka , Maxim Kuvyrkov Subject: [PATCH, devirtualization] Intraprocedural devirtualization pass Message-ID: <20111101220611.GD13544@virgil.arch.suse.de> Mail-Followup-To: GCC Patches , Jan Hubicka , Maxim Kuvyrkov MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, the patch below is the second (and last) revived type-based devirtualization patch that did not make it to 4.6. It deals with virtual calls from the function in which the there is also the object declaration: void foo() { A a; a.foo (); } Normally, the front-end would do the devirtualization on its own, however, early-inlining can create these situations too. Since there is nothing interprocedural going on, the current inlining and IPA-CP devirtualization bits are of no help. We do not do type-based devirtualization in OBJ_TYPE_REF folding either, because the necessary type-changing checks might make it quite expensive. Thus, this patch introduces a new pass to do that. The patch basically piggy-tails on the intraprocedural devirtualization mechanism, trying to construct a known-type jump function for all objects in OBJ_TYPE_REF calls and then immediately using it like we do in IPA-CP. The original patch was doing this as a part of pass_rebuild_cgraph_edges. Honza did not like this idea so I made it a separate pass. First, I scheduled it after pass_rebuild_cgraph_edges and was only traversing indirect edges, avoiding a sweep over all of the IL. Unfortunately, this does not work in one scenario. If the newly-known destination of a virtual call is known not to throw, we may have to purge some EH CFG edges and potentially basic blocks. If these basic blocks contain calls (typically calls to object destructors), we may end up having stale call edges in the call graph... and our current approach to that problem is to call pass_rebuild_cgraph_edges. I think that I was not running into this problem when the mechanism was a part of that pass just because of pure luck. Anyway, this is why I eventually opted for a sweep over the statements. My best guess is that it is probably worth it, but only because the overhead should be still fairly low. The pass triggers quite a number of times when building libstdc++ and it can speed up an artificial testcase which I will attach from over 20 seconds to 7s on my older desktop - it is very similar to the one I posted with the previous patch but this time the object constructors must not get early inlined but the function multiply_matrices has to. Currently I have problems compiling Firefox even without LTO so I don't have any numbers from it either. IIRC, Honza did not see this patch trigger there when he tried the ancient version almost a year go. On the other hand, Maxim claimed that the impact can be noticeable on some code base he is concerned about. I have successfully bootstrapped and tested the patch on x86_64-linux. What do you think, should we include this in trunk? Thanks, Martin 2011-10-31 Martin Jambor * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to... * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and updated all callers. (intraprocedural_devirtualization): New function. (gate_intra_devirtualization): Likewise. (pass_intra_devirt): New pass. * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared. * passes.c (init_optimization_passes): Schedule pass_intra_devirt. * tree-pass.h (pass_intra_devirt): Declare. * testsuite/g++.dg/ipa/imm-devirt-1.C: New test. * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise. Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C =================================================================== --- /dev/null +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C @@ -0,0 +1,62 @@ +/* Verify that virtual calls are folded even when a typecast to an + ancestor is involved along the way. */ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-devirt" } */ + +extern "C" void abort (void); + +class A +{ +public: + int data; + virtual int foo (int i); +}; + + +class B : public A +{ +public: + __attribute__ ((noinline)) B(); + virtual int foo (int i); +}; + +int __attribute__ ((noinline)) A::foo (int i) +{ + return i + 1; +} + +int __attribute__ ((noinline)) B::foo (int i) +{ + return i + 2; +} + +int __attribute__ ((noinline,noclone)) get_input(void) +{ + return 1; +} + +__attribute__ ((noinline)) B::B() +{ +} + +static inline int middleman_1 (class A *obj, int i) +{ + return obj->foo (i); +} + +static inline int middleman_2 (class B *obj, int i) +{ + return middleman_1 (obj, i); +} + +int main (int argc, char *argv[]) +{ + class B b; + + if (middleman_2 (&b, get_input ()) != 3) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump "Immediately devirtualizing call.*into.*B::foo" "devirt" } } */ +/* { dg-final { cleanup-tree-dump "devirt" } } */ Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C =================================================================== --- /dev/null +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C @@ -0,0 +1,95 @@ +/* Verify that virtual calls are folded even when a typecast to an + ancestor is involved along the way. */ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-devirt-slim" } */ + +extern "C" void abort (void); + +class Distraction +{ +public: + float f; + double d; + Distraction () + { + f = 8.3; + d = 10.2; + } + virtual float bar (float z); +}; + +class A +{ +public: + int data; + virtual int foo (int i); +}; + +class B : public A +{ +public: + int data_2; + virtual int foo (int i); + virtual int baz (int i); +}; + + +class C : public Distraction, public B +{ +public: + __attribute__ ((noinline)) C(); + virtual int foo (int i); +}; + +float __attribute__ ((noinline)) Distraction::bar (float z) +{ + f += z; + return f/2; +} + +int __attribute__ ((noinline)) A::foo (int i) +{ + return i + 1; +} + +int __attribute__ ((noinline)) B::foo (int i) +{ + return i + 2; +} + +int __attribute__ ((noinline)) B::baz (int i) +{ + return i * 15; +} + +int __attribute__ ((noinline)) C::foo (int i) +{ + return i + 3; +} + +int __attribute__ ((noinline,noclone)) get_input(void) +{ + return 1; +} + +static inline int middleman (class A *obj, int i) +{ + return obj->foo (i); +} + +__attribute__ ((noinline)) C::C() +{ +} + +int main (int argc, char *argv[]) +{ + class C c; + + if (middleman (&c, get_input ()) != 4) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump "Immediately devirtualizing call.*into.*C::.*foo" "devirt" } } */ +/* { dg-final { cleanup-tree-dump "devirt" } } */ Index: src/gcc/ipa-cp.c =================================================================== --- src.orig/gcc/ipa-cp.c +++ src/gcc/ipa-cp.c @@ -674,20 +674,6 @@ ipa_get_jf_ancestor_result (struct ipa_j return NULL_TREE; } -/* Extract the acual BINFO being described by JFUNC which must be a known type - jump function. */ - -static tree -ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc) -{ - tree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type); - if (!base_binfo) - return NULL_TREE; - return get_binfo_at_offset (base_binfo, - jfunc->value.known_type.offset, - jfunc->value.known_type.component_type); -} - /* Determine whether JFUNC evaluates to a known value (that is either a constant or a binfo) and if so, return it. Otherwise return NULL. INFO describes the caller node so that pass-through jump functions can be @@ -699,7 +685,7 @@ ipa_value_from_jfunc (struct ipa_node_pa if (jfunc->type == IPA_JF_CONST) return jfunc->value.constant; else if (jfunc->type == IPA_JF_KNOWN_TYPE) - return ipa_value_from_known_type_jfunc (jfunc); + return ipa_binfo_from_known_type_jfunc (jfunc); else if (jfunc->type == IPA_JF_PASS_THROUGH || jfunc->type == IPA_JF_ANCESTOR) { @@ -1006,7 +992,7 @@ propagate_accross_jump_function (struct if (jfunc->type == IPA_JF_KNOWN_TYPE) { - val = ipa_value_from_known_type_jfunc (jfunc); + val = ipa_binfo_from_known_type_jfunc (jfunc); if (!val) return set_lattice_contains_variable (dest_lat); } Index: src/gcc/ipa-prop.c =================================================================== --- src.orig/gcc/ipa-prop.c +++ src/gcc/ipa-prop.c @@ -3069,3 +3069,107 @@ ipa_update_after_lto_read (void) if (node->analyzed) ipa_initialize_node_params (node); } + +/* Extract the acual BINFO being described by JFUNC which must be a known type + jump function. */ + +tree +ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *jfunc) +{ + tree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type); + if (!base_binfo) + return NULL_TREE; + return get_binfo_at_offset (base_binfo, + jfunc->value.known_type.offset, + jfunc->value.known_type.component_type); +} + +/* Intraprocedural (early) type-based devirtualization pass. Looks at all call + statements and attempts type-base devirtualization on those calling an + OBJ_TYPE_REF. */ + +static unsigned int +intraprocedural_devirtualization (void) +{ + basic_block bb; + gimple_stmt_iterator gsi; + bool cfg_changed = false; + + FOR_EACH_BB (bb) + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + if (is_gimple_call (gsi_stmt (gsi))) + { + tree binfo, token, fndecl; + gimple call = gsi_stmt (gsi); + tree target = gimple_call_fn (call); + struct ipa_jump_func jfunc; + + if (TREE_CODE (target) != OBJ_TYPE_REF) + continue; + + jfunc.type = IPA_JF_UNKNOWN; + compute_known_type_jump_func (OBJ_TYPE_REF_OBJECT (target), &jfunc, + call); + if (jfunc.type != IPA_JF_KNOWN_TYPE) + continue; + binfo = ipa_binfo_from_known_type_jfunc (&jfunc); + if (!binfo) + continue; + token = OBJ_TYPE_REF_TOKEN (target); + fndecl = gimple_get_virt_method_for_binfo (tree_low_cst (token, 1), + binfo); + if (!fndecl) + continue; + + if (dump_file) + { + fprintf (dump_file, "ipa-prop: Immediately devirtualizing call "); + print_gimple_expr (dump_file, call, 0, 0); + } + + gimple_call_set_fndecl (call, fndecl); + update_stmt (call); + if (maybe_clean_eh_stmt (call) + && gimple_purge_dead_eh_edges (bb)) + cfg_changed = true; + + if (dump_file) + { + fprintf (dump_file, " into "); + print_gimple_expr (dump_file, call, 0, 0); + fprintf (dump_file, "\n"); + } + } + + + if (cfg_changed) + return TODO_cleanup_cfg; + else + return 0; +} + +static bool +gate_intra_devirtualization (void) +{ + return flag_devirtualize != 0; +} + + +struct gimple_opt_pass pass_intra_devirt = + { + { + GIMPLE_PASS, + "devirt", /* name */ + gate_intra_devirtualization, /* gate */ + intraprocedural_devirtualization, /* execute */ + NULL, /* sub */ + NULL, /* next */ + 0, /* static_pass_number */ + TV_CGRAPH, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ + } + }; Index: src/gcc/ipa-prop.h =================================================================== --- src.orig/gcc/ipa-prop.h +++ src/gcc/ipa-prop.h @@ -347,6 +347,7 @@ bool ipa_propagate_indirect_call_infos ( /* Indirect edge and binfo processing. */ struct cgraph_edge *ipa_make_edge_direct_to_target (struct cgraph_edge *, tree); +tree ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *); /* Functions related to both. */ void ipa_analyze_node (struct cgraph_node *); Index: src/gcc/passes.c =================================================================== --- src.orig/gcc/passes.c +++ src/gcc/passes.c @@ -1225,6 +1225,7 @@ init_optimization_passes (void) NEXT_PASS (pass_cleanup_eh); NEXT_PASS (pass_profile); NEXT_PASS (pass_local_pure_const); + NEXT_PASS (pass_intra_devirt); /* Split functions creates parts that are not run through early optimizations again. It is thus good idea to do this late. */ Index: src/gcc/tree-pass.h =================================================================== --- src.orig/gcc/tree-pass.h +++ src/gcc/tree-pass.h @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_trace extern struct gimple_opt_pass pass_warn_unused_result; extern struct gimple_opt_pass pass_split_functions; extern struct gimple_opt_pass pass_feedback_split_functions; +extern struct gimple_opt_pass pass_intra_devirt; /* IPA Passes */ extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;