From patchwork Sat Aug 21 12:11:10 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 62345 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id E6FABB70DC for ; Sat, 21 Aug 2010 22:11:34 +1000 (EST) Received: (qmail 18345 invoked by alias); 21 Aug 2010 12:11:32 -0000 Received: (qmail 18297 invoked by uid 22791); 21 Aug 2010 12:11:20 -0000 X-SWARE-Spam-Status: No, hits=-1.7 required=5.0 tests=AWL, BAYES_00, TW_TM, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from nikam-dmz.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 21 Aug 2010 12:11:13 +0000 Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id 3306E9ACCA4; Sat, 21 Aug 2010 14:11:10 +0200 (CEST) Date: Sat, 21 Aug 2010 14:11:10 +0200 From: Jan Hubicka To: gcc-patches@gcc.gnu.org, tglek@mozilla.com Subject: Merge constructors and destructors Message-ID: <20100821121110.GQ630@kam.mff.cuni.cz> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.18 (2008-05-17) Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, this patch implements merging of multiple constructors and destructors in LTO units into single function. I believe this is actually needed for correctness when linking with libraries since constructors of libraries are required to be executed first, while we currently execute them in random order depending on layout of the final LTO file. The main motivation is however to improve code locality by inlining all the functions into single. Mozilla has about 400 constructors, chrome has couple thousdand and they are causing problems because at startup they bring into memory random parts of the executable. See also http://blog.mozilla.com/tglek/2010/05/27/startup-backward-constructors/ I've taken Richard Henderson's code to produce ctors/dtors for collect2 and extended it to work on targets with ctors and dtors too. I also put it into separate micro-IPA pass rather then having it part of cgraphunit driver (things become more twisted there with LTO otherwise). Bootstrapped/regtested x86_64-linux, will commit it today after further testing at Mozilla if there are no complains. Honza * tree-pass.h (pass_ipa_cdtor_merge): New function. * cgraphunit.c (static_ctors, static_dtors): Move to ipa.c; make heap allocated. (record_cdtor_fn): Move to ipa.c; do not test for have_ctors_dtors. (build_cdtor): Move to ipa.c; add code avoiding construction when target have ctors/dtors and there is only one ctor/dtor at given priority. (compare_ctor, compare_dtor): Move to ipa.c; use DECL_UID to stabilize sort; reverse order of constructors. (cgraph_build_cdtor_fns):Move to ipa.c; rename to build_cdtor_fns. (cgraph_finalize_function): Do not call record_cdtor_fn. (cgraph_finalize_compilation_unit): Do not call cgraph_build_cdtor_fns. (cgraph_build_static_cdtor): Move to ipa.c. * ipa.c: Include target.h and tree-iterator.h. (cgraph_build_static_cdtor, static_ctors, static_dtors, record_cdtor_fn, build_cdtor, compare_ctor, compare_dtor, build_cdtor_fns, ipa_cdtor_merge, gate_ipa_cdtor_merge, pass_ipa_cdtor_merge): New. * passes.c (init_optimization_passes): Enqueue pass_ipa_cdtor_merge. * ipa-prop.c (update_indirect_edges_after_inlining): Avoid out of bounds access. Index: tree-pass.h =================================================================== --- tree-pass.h (revision 163440) +++ tree-pass.h (working copy) @@ -468,6 +468,7 @@ extern struct simple_ipa_opt_pass pass_i extern struct ipa_opt_pass_d pass_ipa_lto_wpa_fixup; extern struct ipa_opt_pass_d pass_ipa_lto_finish_out; extern struct ipa_opt_pass_d pass_ipa_profile; +extern struct ipa_opt_pass_d pass_ipa_cdtor_merge; extern struct gimple_opt_pass pass_all_optimizations; extern struct gimple_opt_pass pass_cleanup_cfg_post_optimizing; Index: cgraphunit.c =================================================================== --- cgraphunit.c (revision 163440) +++ cgraphunit.c (working copy) @@ -147,174 +147,9 @@ static void cgraph_analyze_function (str FILE *cgraph_dump_file; -/* A vector of FUNCTION_DECLs declared as static constructors. */ -static GTY (()) VEC(tree, gc) *static_ctors; -/* A vector of FUNCTION_DECLs declared as static destructors. */ -static GTY (()) VEC(tree, gc) *static_dtors; - /* Used for vtable lookup in thunk adjusting. */ static GTY (()) tree vtable_entry_type; -/* When target does not have ctors and dtors, we call all constructor - and destructor by special initialization/destruction function - recognized by collect2. - - When we are going to build this function, collect all constructors and - destructors and turn them into normal functions. */ - -static void -record_cdtor_fn (tree fndecl) -{ - struct cgraph_node *node; - if (targetm.have_ctors_dtors - || (!DECL_STATIC_CONSTRUCTOR (fndecl) - && !DECL_STATIC_DESTRUCTOR (fndecl))) - return; - - if (DECL_STATIC_CONSTRUCTOR (fndecl)) - { - VEC_safe_push (tree, gc, static_ctors, fndecl); - DECL_STATIC_CONSTRUCTOR (fndecl) = 0; - } - if (DECL_STATIC_DESTRUCTOR (fndecl)) - { - VEC_safe_push (tree, gc, static_dtors, fndecl); - DECL_STATIC_DESTRUCTOR (fndecl) = 0; - } - node = cgraph_node (fndecl); - node->local.disregard_inline_limits = 1; - cgraph_mark_reachable_node (node); -} - -/* Define global constructors/destructor functions for the CDTORS, of - which they are LEN. The CDTORS are sorted by initialization - priority. If CTOR_P is true, these are constructors; otherwise, - they are destructors. */ - -static void -build_cdtor (bool ctor_p, tree *cdtors, size_t len) -{ - size_t i; - - i = 0; - while (i < len) - { - tree body; - tree fn; - priority_type priority; - - priority = 0; - body = NULL_TREE; - /* Find the next batch of constructors/destructors with the same - initialization priority. */ - do - { - priority_type p; - tree call; - fn = cdtors[i]; - p = ctor_p ? DECL_INIT_PRIORITY (fn) : DECL_FINI_PRIORITY (fn); - if (!body) - priority = p; - else if (p != priority) - break; - call = build_call_expr (fn, 0); - append_to_statement_list (call, &body); - ++i; - } - while (i < len); - gcc_assert (body != NULL_TREE); - /* Generate a function to call all the function of like - priority. */ - cgraph_build_static_cdtor (ctor_p ? 'I' : 'D', body, priority); - } -} - -/* Comparison function for qsort. P1 and P2 are actually of type - "tree *" and point to static constructors. DECL_INIT_PRIORITY is - used to determine the sort order. */ - -static int -compare_ctor (const void *p1, const void *p2) -{ - tree f1; - tree f2; - int priority1; - int priority2; - - f1 = *(const tree *)p1; - f2 = *(const tree *)p2; - priority1 = DECL_INIT_PRIORITY (f1); - priority2 = DECL_INIT_PRIORITY (f2); - - if (priority1 < priority2) - return -1; - else if (priority1 > priority2) - return 1; - else - /* Ensure a stable sort. */ - return (const tree *)p1 - (const tree *)p2; -} - -/* Comparison function for qsort. P1 and P2 are actually of type - "tree *" and point to static destructors. DECL_FINI_PRIORITY is - used to determine the sort order. */ - -static int -compare_dtor (const void *p1, const void *p2) -{ - tree f1; - tree f2; - int priority1; - int priority2; - - f1 = *(const tree *)p1; - f2 = *(const tree *)p2; - priority1 = DECL_FINI_PRIORITY (f1); - priority2 = DECL_FINI_PRIORITY (f2); - - if (priority1 < priority2) - return -1; - else if (priority1 > priority2) - return 1; - else - /* Ensure a stable sort. */ - return (const tree *)p1 - (const tree *)p2; -} - -/* Generate functions to call static constructors and destructors - for targets that do not support .ctors/.dtors sections. These - functions have magic names which are detected by collect2. */ - -static void -cgraph_build_cdtor_fns (void) -{ - if (!VEC_empty (tree, static_ctors)) - { - gcc_assert (!targetm.have_ctors_dtors); - qsort (VEC_address (tree, static_ctors), - VEC_length (tree, static_ctors), - sizeof (tree), - compare_ctor); - build_cdtor (/*ctor_p=*/true, - VEC_address (tree, static_ctors), - VEC_length (tree, static_ctors)); - VEC_truncate (tree, static_ctors, 0); - } - - if (!VEC_empty (tree, static_dtors)) - { - gcc_assert (!targetm.have_ctors_dtors); - qsort (VEC_address (tree, static_dtors), - VEC_length (tree, static_dtors), - sizeof (tree), - compare_dtor); - build_cdtor (/*ctor_p=*/false, - VEC_address (tree, static_dtors), - VEC_length (tree, static_dtors)); - VEC_truncate (tree, static_dtors, 0); - } -} - /* Determine if function DECL is needed. That is, visible to something either outside this translation unit, something magic in the system configury. */ @@ -519,7 +354,6 @@ cgraph_finalize_function (tree decl, boo node->local.finalized = true; node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL; node->finalized_by_frontend = true; - record_cdtor_fn (node->decl); if (cgraph_decide_is_function_needed (node, decl)) cgraph_mark_needed_node (node); @@ -1155,10 +989,6 @@ cgraph_finalize_compilation_unit (void) /* Emit size functions we didn't inline. */ finalize_size_functions (); - /* Call functions declared with the "constructor" or "destructor" - attribute. */ - cgraph_build_cdtor_fns (); - /* Mark alias targets necessary and emit diagnostics. */ finish_aliases_1 (); @@ -2007,78 +1837,6 @@ cgraph_optimize (void) #endif } - -/* Generate and emit a static constructor or destructor. WHICH must - be one of 'I' (for a constructor) or 'D' (for a destructor). BODY - is a STATEMENT_LIST containing GENERIC statements. PRIORITY is the - initialization priority for this constructor or destructor. */ - -void -cgraph_build_static_cdtor (char which, tree body, int priority) -{ - static int counter = 0; - char which_buf[16]; - tree decl, name, resdecl; - - /* The priority is encoded in the constructor or destructor name. - collect2 will sort the names and arrange that they are called at - program startup. */ - sprintf (which_buf, "%c_%.5d_%d", which, priority, counter++); - name = get_file_function_name (which_buf); - - decl = build_decl (input_location, FUNCTION_DECL, name, - build_function_type_list (void_type_node, NULL_TREE)); - current_function_decl = decl; - - resdecl = build_decl (input_location, - RESULT_DECL, NULL_TREE, void_type_node); - DECL_ARTIFICIAL (resdecl) = 1; - DECL_RESULT (decl) = resdecl; - DECL_CONTEXT (resdecl) = decl; - - allocate_struct_function (decl, false); - - TREE_STATIC (decl) = 1; - TREE_USED (decl) = 1; - DECL_ARTIFICIAL (decl) = 1; - DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl) = 1; - DECL_SAVED_TREE (decl) = body; - if (!targetm.have_ctors_dtors) - { - TREE_PUBLIC (decl) = 1; - DECL_PRESERVE_P (decl) = 1; - } - DECL_UNINLINABLE (decl) = 1; - - DECL_INITIAL (decl) = make_node (BLOCK); - TREE_USED (DECL_INITIAL (decl)) = 1; - - DECL_SOURCE_LOCATION (decl) = input_location; - cfun->function_end_locus = input_location; - - switch (which) - { - case 'I': - DECL_STATIC_CONSTRUCTOR (decl) = 1; - decl_init_priority_insert (decl, priority); - break; - case 'D': - DECL_STATIC_DESTRUCTOR (decl) = 1; - decl_fini_priority_insert (decl, priority); - break; - default: - gcc_unreachable (); - } - - gimplify_function_tree (decl); - - cgraph_add_new_function (decl, false); - cgraph_mark_needed_node (cgraph_node (decl)); - - set_cfun (NULL); - current_function_decl = NULL; -} - void init_cgraph (void) { Index: ipa.c =================================================================== --- ipa.c (revision 163440) +++ ipa.c (working copy) @@ -29,6 +29,8 @@ along with GCC; see the file COPYING3. #include "ggc.h" #include "flags.h" #include "pointer-set.h" +#include "target.h" +#include "tree-iterator.h" /* Fill array order with all nodes with output flag set in the reverse topological order. */ @@ -1317,3 +1319,310 @@ struct ipa_opt_pass_d pass_ipa_profile = NULL, /* function_transform */ NULL /* variable_transform */ }; + +/* Generate and emit a static constructor or destructor. WHICH must + be one of 'I' (for a constructor) or 'D' (for a destructor). BODY + is a STATEMENT_LIST containing GENERIC statements. PRIORITY is the + initialization priority for this constructor or destructor. */ + +void +cgraph_build_static_cdtor (char which, tree body, int priority) +{ + static int counter = 0; + char which_buf[16]; + tree decl, name, resdecl; + + /* The priority is encoded in the constructor or destructor name. + collect2 will sort the names and arrange that they are called at + program startup. */ + sprintf (which_buf, "%c_%.5d_%d", which, priority, counter++); + name = get_file_function_name (which_buf); + + decl = build_decl (input_location, FUNCTION_DECL, name, + build_function_type_list (void_type_node, NULL_TREE)); + current_function_decl = decl; + + resdecl = build_decl (input_location, + RESULT_DECL, NULL_TREE, void_type_node); + DECL_ARTIFICIAL (resdecl) = 1; + DECL_RESULT (decl) = resdecl; + DECL_CONTEXT (resdecl) = decl; + + allocate_struct_function (decl, false); + + TREE_STATIC (decl) = 1; + TREE_USED (decl) = 1; + DECL_ARTIFICIAL (decl) = 1; + DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl) = 1; + DECL_SAVED_TREE (decl) = body; + if (!targetm.have_ctors_dtors) + { + TREE_PUBLIC (decl) = 1; + DECL_PRESERVE_P (decl) = 1; + } + DECL_UNINLINABLE (decl) = 1; + + DECL_INITIAL (decl) = make_node (BLOCK); + TREE_USED (DECL_INITIAL (decl)) = 1; + + DECL_SOURCE_LOCATION (decl) = input_location; + cfun->function_end_locus = input_location; + + switch (which) + { + case 'I': + DECL_STATIC_CONSTRUCTOR (decl) = 1; + decl_init_priority_insert (decl, priority); + break; + case 'D': + DECL_STATIC_DESTRUCTOR (decl) = 1; + decl_fini_priority_insert (decl, priority); + break; + default: + gcc_unreachable (); + } + + gimplify_function_tree (decl); + + cgraph_add_new_function (decl, false); + + set_cfun (NULL); + current_function_decl = NULL; +} + + +/* A vector of FUNCTION_DECLs declared as static constructors. */ +static VEC(tree, heap) *static_ctors; +/* A vector of FUNCTION_DECLs declared as static destructors. */ +static VEC(tree, heap) *static_dtors; + +/* When target does not have ctors and dtors, we call all constructor + and destructor by special initialization/destruction function + recognized by collect2. + + When we are going to build this function, collect all constructors and + destructors and turn them into normal functions. */ + +static void +record_cdtor_fn (struct cgraph_node *node) +{ + if (DECL_STATIC_CONSTRUCTOR (node->decl)) + VEC_safe_push (tree, heap, static_ctors, node->decl); + if (DECL_STATIC_DESTRUCTOR (node->decl)) + VEC_safe_push (tree, heap, static_dtors, node->decl); + node = cgraph_node (node->decl); + node->local.disregard_inline_limits = 1; +} + +/* Define global constructors/destructor functions for the CDTORS, of + which they are LEN. The CDTORS are sorted by initialization + priority. If CTOR_P is true, these are constructors; otherwise, + they are destructors. */ + +static void +build_cdtor (bool ctor_p, tree *cdtors, size_t len) +{ + size_t i,j; + + i = 0; + while (i < len) + { + tree body; + tree fn; + priority_type priority; + + priority = 0; + body = NULL_TREE; + j = i; + do + { + priority_type p; + fn = cdtors[i]; + p = ctor_p ? DECL_INIT_PRIORITY (fn) : DECL_FINI_PRIORITY (fn); + if (j==i) + priority = p; + else if (p != priority) + break; + j++; + } + while (j < len); + + /* When there is only once constructor and target supports them, do nothing. */ + if (j == i + 1 + && targetm.have_ctors_dtors) + continue; + /* Find the next batch of constructors/destructors with the same + initialization priority. */ + do + { + priority_type p; + tree call; + fn = cdtors[i]; + p = ctor_p ? DECL_INIT_PRIORITY (fn) : DECL_FINI_PRIORITY (fn); + if (p != priority) + break; + call = build_call_expr (fn, 0); + if (ctor_p) + DECL_STATIC_CONSTRUCTOR (fn) = 0; + else + DECL_STATIC_DESTRUCTOR (fn) = 0; + /* We do not want to optimize away pure/const calls here. + When optimizing, these should be already removed, when not + optimizing, we want user to be able to breakpoint in them. */ + TREE_SIDE_EFFECTS (call) = 1; + append_to_statement_list (call, &body); + ++i; + } + while (i < len); + gcc_assert (body != NULL_TREE); + /* Generate a function to call all the function of like + priority. */ + cgraph_build_static_cdtor (ctor_p ? 'I' : 'D', body, priority); + } +} + +/* Comparison function for qsort. P1 and P2 are actually of type + "tree *" and point to static constructors. DECL_INIT_PRIORITY is + used to determine the sort order. */ + +static int +compare_ctor (const void *p1, const void *p2) +{ + tree f1; + tree f2; + int priority1; + int priority2; + + f1 = *(const tree *)p1; + f2 = *(const tree *)p2; + priority1 = DECL_INIT_PRIORITY (f1); + priority2 = DECL_INIT_PRIORITY (f2); + + if (priority1 < priority2) + return -1; + else if (priority1 > priority2) + return 1; + else + /* Ensure a stable sort. Constructors are executed in backwarding + order to make LTO initialize braries first. */ + return DECL_UID (f2) - DECL_UID (f1); +} + +/* Comparison function for qsort. P1 and P2 are actually of type + "tree *" and point to static destructors. DECL_FINI_PRIORITY is + used to determine the sort order. */ + +static int +compare_dtor (const void *p1, const void *p2) +{ + tree f1; + tree f2; + int priority1; + int priority2; + + f1 = *(const tree *)p1; + f2 = *(const tree *)p2; + priority1 = DECL_FINI_PRIORITY (f1); + priority2 = DECL_FINI_PRIORITY (f2); + + if (priority1 < priority2) + return -1; + else if (priority1 > priority2) + return 1; + else + /* Ensure a stable sort. */ + return DECL_UID (f1) - DECL_UID (f2); +} + +/* Generate functions to call static constructors and destructors + for targets that do not support .ctors/.dtors sections. These + functions have magic names which are detected by collect2. */ + +static void +build_cdtor_fns (void) +{ + if (!VEC_empty (tree, static_ctors)) + { + gcc_assert (!targetm.have_ctors_dtors || in_lto_p); + qsort (VEC_address (tree, static_ctors), + VEC_length (tree, static_ctors), + sizeof (tree), + compare_ctor); + build_cdtor (/*ctor_p=*/true, + VEC_address (tree, static_ctors), + VEC_length (tree, static_ctors)); + VEC_truncate (tree, static_ctors, 0); + } + + if (!VEC_empty (tree, static_dtors)) + { + gcc_assert (!targetm.have_ctors_dtors || in_lto_p); + qsort (VEC_address (tree, static_dtors), + VEC_length (tree, static_dtors), + sizeof (tree), + compare_dtor); + build_cdtor (/*ctor_p=*/false, + VEC_address (tree, static_dtors), + VEC_length (tree, static_dtors)); + VEC_truncate (tree, static_dtors, 0); + } +} + +/* Look for constructors and destructors and produce function calling them. + This is needed for targets not supporting ctors or dtors, but we perform the + transformation also at linktime to merge possibly numberous + constructors/destructors into single function to improve code locality and + reduce size. */ + +static unsigned int +ipa_cdtor_merge (void) +{ + struct cgraph_node *node; + for (node = cgraph_nodes; node; node = node->next) + if (node->analyzed + && (DECL_STATIC_CONSTRUCTOR (node->decl) + || DECL_STATIC_DESTRUCTOR (node->decl))) + record_cdtor_fn (node); + build_cdtor_fns (); + VEC_free (tree, heap, static_ctors); + VEC_free (tree, heap, static_dtors); + return 0; +} + +/* Perform the pass when we have no ctors/dtors support + or at LTO time to merge multiple constructors into single + function. */ + +static bool +gate_ipa_cdtor_merge (void) +{ + return !targetm.have_ctors_dtors || (optimize && in_lto_p); +} + +struct ipa_opt_pass_d pass_ipa_cdtor_merge = +{ + { + IPA_PASS, + "cdtor", /* name */ + gate_ipa_cdtor_merge, /* gate */ + ipa_cdtor_merge, /* execute */ + NULL, /* sub */ + NULL, /* next */ + 0, /* static_pass_number */ + TV_CGRAPHOPT, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0 /* todo_flags_finish */ + }, + NULL, /* generate_summary */ + NULL, /* write_summary */ + NULL, /* read_summary */ + NULL, /* write_optimization_summary */ + NULL, /* read_optimization_summary */ + NULL, /* stmt_fixup */ + 0, /* TODOs */ + NULL, /* function_transform */ + NULL /* variable_transform */ +}; Index: passes.c =================================================================== --- passes.c (revision 163440) +++ passes.c (working copy) @@ -811,6 +811,7 @@ init_optimization_passes (void) NEXT_PASS (pass_ipa_whole_program_visibility); NEXT_PASS (pass_ipa_profile); NEXT_PASS (pass_ipa_cp); + NEXT_PASS (pass_ipa_cdtor_merge); NEXT_PASS (pass_ipa_inline); NEXT_PASS (pass_ipa_pure_const); NEXT_PASS (pass_ipa_reference); Index: ipa-prop.c =================================================================== --- ipa-prop.c (revision 163440) +++ ipa-prop.c (working copy) @@ -1541,11 +1541,12 @@ update_indirect_edges_after_inlining (st struct cgraph_node *node, VEC (cgraph_edge_p, heap) **new_edges) { - struct ipa_edge_args *top = IPA_EDGE_REF (cs); + struct ipa_edge_args *top; struct cgraph_edge *ie, *next_ie, *new_direct_edge; bool res = false; ipa_check_create_edge_args (); + top = IPA_EDGE_REF (cs); for (ie = node->indirect_calls; ie; ie = next_ie) {