From patchwork Thu Sep 19 08:33:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Martin_Li=C5=A1ka?= X-Patchwork-Id: 1164401 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-509251-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="FGKEtnwM"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46Yqrf5mPnz9s7T for ; Thu, 19 Sep 2019 18:34:10 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=QEVG8MNThZVh4Maj3rITHhOQDuxj07SszutRv50/5sesuj4tC5 ioKkgQn4ppysiioihyMw5Cvks9ar/Q5K2KSxGCCOSec6bvhjYVw+s6APuYidtHdA WMVBmi4pYGId1cR+UnF+5aB2atFNn2G0K1oGZdusenVDMk9AkRLRqUJbY= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:message-id:date:mime-version:content-type; s= default; bh=MQIONoehS4jqCUpB524gW5Ph8sg=; b=FGKEtnwM3GGWF0v/O/5f U2LiLuUzatkB4uOR8+OXy2WXPQ1yiIXyevZjmvpfRY4dvoqm3p11kaSw/RA6C3zO TkD1ykrH9tidy2waa2yXjbbaeviCUPZjRvXypFls/uv+uFqEBzMo51m/EdLskJ/i ShBk8EaOMbHikEvR88l07C8= Received: (qmail 80546 invoked by alias); 19 Sep 2019 08:34:02 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 80485 invoked by uid 89); 19 Sep 2019 08:34:02 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-18.0 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, SPF_PASS autolearn=ham version=3.3.1 spammy=Very, scheduled, 04, oriented X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 19 Sep 2019 08:33:57 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 91132AD3E; Thu, 19 Sep 2019 08:33:55 +0000 (UTC) To: GCC Patches Cc: Jan Hubicka , Nathan Sidwell , v.barinov@samsung.com From: =?utf-8?q?Martin_Li=C5=A1ka?= Subject: [PATCH][RFC] Add new ipa-reorder pass Message-ID: <12f761ce-012d-f9be-ceef-6ad8de6324e8@suse.cz> Date: Thu, 19 Sep 2019 10:33:55 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 X-IsSubscribed: yes Hi. Function reordering has been around for quite some time and a naive implementation was also part of my diploma thesis some time ago. Currently, the GCC can reorder function based on first execution, which happens with PGO and LTO of course. Known limitation is that the order is preserved only partially as various symbols go into different LTRANS partitions. There has been some research in the area and I would point out the Facebook paper ([1]) and Sony presentation ([2]). Based on that, I decided to make a new implementation in the GCC that does the same (in a proper way). First part of the enablement are patches to ld.bfd and ld.gold that come up with a new section .text.sorted, that is always sorted. There's a snippet from the modified default linker script: .text : { *(.text.unlikely .text.*_unlikely .text.unlikely.*) *(.text.exit .text.exit.*) *(.text.startup .text.startup.*) *(.text.hot .text.hot.*) *(SORT(.text.sorted.*)) *(.text .stub .text.* .gnu.linkonce.t.*) /* .gnu.warning sections are handled specially by elf.em. */ *(.gnu.warning) } Second part is the reordering pass where I implemented both approaches: the existing one (-freorder-functions-algorithm=first-run) and the new one based on the Facebook paper (-freorder-functions-algorithm=call-chain-clustering). The implementation is straightforward and utilizes the newly introduced text subsection. Results: I made a PGO LTO bootstrap of the GCC with the patch and -freorder-functions-algorithm=call-chain-clustering. The pass makes sorting order for about 7500 functions and from these about 7300 are really put in the desired order. The remaining are the functions that are later observed as cold and are eventually put into .text.unlikely section. I verified the order with nm (sorted by address) and my -fdump-ipa-reorder dump. The aforementioned sorted function occupy 5.0MB (out of 18.3MB) of the final binary of cc1plus. I made a simple benchmark of: ./cc1plus /home/marxin/Programming/tramp3d/tramp3d-v4.ii -O2 -quiet and run it 20x before and after my patch: run #020: old:15.033 s, new: 14.975 s, cmp: 99.61% The numbers are stable and present a speed up of about 0.4%. To verify that I also measured iTLB misses before: 18,475,504 iTLB-load-misses:u and after: 13,779,347 iTLB-load-misses:u Both these numbers are very stable based on my measurements. That said, I believe the patch is correct and can help in huge binaries that have a flat profile. Notes and limitations: - The call-chain-clustering algorithm requires to fit as many as possible functions into page size (4K). Based on my measurements that should correspond to ~1000 GIMPLE statements (IPA inliner size). I can make it a param in the future. - One needs modified binutils and I that would probably require a configure detection. The only way which I see is based on ld --version. I'm planning to make the binutils submission soon. - Right now, I prefer to put functions into .text.sorted.* rather than .text.hot. That's required by the algorithm to put hot function calls next to each other. - Observation: Very hot calls are likely inlined and so that the clustering reached quite big functions. Thoughts? I would definitely welcome any interesting measurement on a bigger load. Martin [1] https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf [2] https://llvm.org/devmtg/2018-10/slides/Spencer-Profile%20Guided%20Function%20Layout%20in%20LLVM%20and%20LLD.pdf From d9c81529ae2a66bd51c49d83e19edb58b19cbf97 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Thu, 5 Sep 2019 13:32:41 +0200 Subject: [PATCH] Add new ipa-reorder pass. --- gcc/Makefile.in | 1 + gcc/cgraph.c | 2 + gcc/cgraph.h | 2 + gcc/cgraphclones.c | 1 + gcc/common.opt | 14 ++ gcc/flag-types.h | 8 + gcc/ipa-reorder.c | 333 ++++++++++++++++++++++++++++++++++++++++ gcc/lto-cgraph.c | 2 + gcc/lto/lto-partition.c | 18 --- gcc/passes.def | 1 + gcc/timevar.def | 1 + gcc/tree-pass.h | 1 + gcc/varasm.c | 9 ++ 13 files changed, 375 insertions(+), 18 deletions(-) create mode 100644 gcc/ipa-reorder.c diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 152df9fa9b3..c0efef626c5 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1368,6 +1368,7 @@ OBJS = \ init-regs.o \ internal-fn.o \ ipa-cp.o \ + ipa-reorder.o \ ipa-devirt.o \ ipa-fnsummary.o \ ipa-polymorphic-call.o \ diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 843891e9e56..a0f350c26c0 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -2041,6 +2041,8 @@ cgraph_node::dump (FILE *f) } if (tp_first_run > 0) fprintf (f, " first_run:%i", tp_first_run); + if (text_sorted_order > 0) + fprintf (f, " text_sorted_order:%i", text_sorted_order); if (origin) fprintf (f, " nested in:%s", origin->asm_name ()); if (gimple_has_body_p (decl)) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 4c54210123a..24c0b2be321 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1446,6 +1446,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node unsigned int profile_id; /* Time profiler: first run of function. */ int tp_first_run; + /* Order in .text.sorted.* section. */ + int text_sorted_order; /* Set when decl is an abstract function pointed to by the ABSTRACT_DECL_ORIGIN of a reachable function. */ diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index fa753697c78..4d47b3e7574 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -462,6 +462,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, new_node->rtl = rtl; new_node->frequency = frequency; new_node->tp_first_run = tp_first_run; + new_node->text_sorted_order = text_sorted_order; new_node->tm_clone = tm_clone; new_node->icf_merged = icf_merged; new_node->merged_comdat = merged_comdat; diff --git a/gcc/common.opt b/gcc/common.opt index 1b9e0f3c802..e68f28f6401 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2252,6 +2252,20 @@ freorder-functions Common Report Var(flag_reorder_functions) Optimization Reorder functions to improve code placement. +freorder-functions-algorithm= +Common Joined RejectNegative Enum(reorder_functions_algorithm) Var(flag_reorder_functions_algorithm) Init(REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING) Optimization +-freorder-functions-algorithm=[first-run|call-chain-clustering] Set the used function reordering algorithm. + +Enum +Name(reorder_functions_algorithm) Type(enum reorder_functions_algorithm) UnknownError(unknown function reordering algorithm %qs) + +EnumValue +Enum(reorder_functions_algorithm) String(first-run) Value(REORDER_FUNCTIONS_ALGORITHM_FIRST_RUN) + +EnumValue +Enum(reorder_functions_algorithm) String(call-chain-clustering) Value(REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING) + + frerun-cse-after-loop Common Report Var(flag_rerun_cse_after_loop) Optimization Add a common subexpression elimination pass after loop optimizations. diff --git a/gcc/flag-types.h b/gcc/flag-types.h index a2103282d46..9658aa99ea5 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -138,6 +138,14 @@ enum reorder_blocks_algorithm REORDER_BLOCKS_ALGORITHM_STC }; +/* The algorithm used for function reordering. */ +enum reorder_functions_algorithm +{ + REORDER_FUNCTIONS_ALGORITHM_FIRST_RUN, + REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING +}; + + /* The algorithm used for the integrated register allocator (IRA). */ enum ira_algorithm { diff --git a/gcc/ipa-reorder.c b/gcc/ipa-reorder.c new file mode 100644 index 00000000000..d9ddc468968 --- /dev/null +++ b/gcc/ipa-reorder.c @@ -0,0 +1,333 @@ +/* Reorder functions based on profile. + Copyright (C) 2019 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "tree.h" +#include "tree-pass.h" +#include "cgraph.h" +#include "symbol-summary.h" +#include "tree-vrp.h" +#include "alloc-pool.h" +#include "ipa-prop.h" +#include "ipa-fnsummary.h" +#include "fibonacci_heap.h" +#include + +using namespace std; + +namespace { + +#define C3_CLUSTER_THRESHOLD 1024 + +struct cluster_edge; + +/* Cluster is set of functions considered in C3 algorithm. */ + +struct cluster +{ + cluster (cgraph_node *node, int size, sreal time): + m_functions (), m_callers (), m_size (size), m_time (time) + { + m_functions.safe_push (node); + } + + vec m_functions; + hash_map m_callers; + int m_size; + sreal m_time; +}; + +/* Cluster edge is an oriented edge in between two clusters. */ + +struct cluster_edge +{ + cluster_edge (cluster *caller, cluster *callee, uint32_t count): + m_caller (caller), m_callee (callee), m_count (count), m_heap_node (NULL) + {} + + + uint32_t inverted_count () + { + return numeric_limits::max () - m_count; + } + + cluster *m_caller; + cluster *m_callee; + uint32_t m_count; + fibonacci_node *m_heap_node; +}; + +/* Sort functions based of first execution of the function. */ + +static void +sort_functions_by_first_run (void) +{ + cgraph_node *node; + + FOR_EACH_DEFINED_FUNCTION (node) + if (node->tp_first_run && !node->alias) + node->text_sorted_order = node->tp_first_run; +} + +/* Compare clusters by density after that are established. */ + +static int +cluster_cmp (const void *a_p, const void *b_p) +{ + const cluster *a = *(cluster * const *)a_p; + const cluster *b = *(cluster * const *)b_p; + + unsigned fncounta = a->m_functions.length (); + unsigned fncountb = b->m_functions.length (); + if (fncounta <= 1 || fncountb <= 1) + return fncountb - fncounta; + + sreal r = b->m_time * a->m_size - a->m_time * b->m_size; + return (r < 0) ? -1 : ((r > 0) ? 1 : 0); +} + +/* Sort functions based on call chain clustering, which is an algorithm + mentioned in the following article: + https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf + . */ + +static void +sort_functions_by_c3 (void) +{ + cgraph_node *node; + auto_vec clusters; + + /* Create a cluster for each function. */ + FOR_EACH_DEFINED_FUNCTION (node) + if (!node->alias + && !node->global.inlined_to) + { + ipa_fn_summary *summary = ipa_fn_summaries->get (node); + cluster *c = new cluster (node, summary->size, summary->time); + node->aux = c; + clusters.safe_push (c); + } + + auto_vec edges; + + /* Insert edges between clusters that have a profile. */ + for (unsigned i = 0; i < clusters.length (); i++) + { + cgraph_node *node = clusters[i]->m_functions[0]; + for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller) + if (cs->count.reliable_p () + && cs->count.to_gcov_type () > 0) + { + cluster *caller = (cluster *)cs->caller->aux; + cluster *callee = (cluster *)cs->callee->aux; + gcov_type count = cs->count.to_gcov_type (); + + cluster_edge **cedge = callee->m_callers.get (caller); + if (cedge != NULL) + (*cedge)->m_count += count; + else + { + cluster_edge *cedge = new cluster_edge (caller, callee, count); + edges.safe_push (cedge); + callee->m_callers.put (caller, cedge); + } + } + } + + /* Now insert all created edges into a heap. */ + fibonacci_heap heap (0); + + for (unsigned i = 0; i < clusters.length (); i++) + { + cluster *c = clusters[i]; + for (hash_map::iterator it + = c->m_callers.begin (); it != c->m_callers.end (); ++it) + { + cluster_edge *cedge = (*it).second; + cedge->m_heap_node = heap.insert (cedge->inverted_count (), cedge); + } + } + + while (!heap.empty ()) + { + cluster_edge *cedge = heap.extract_min (); + cluster *caller = cedge->m_caller; + cluster *callee = cedge->m_callee; + cedge->m_heap_node = NULL; + + if (caller == callee) + continue; + if (caller->m_size + callee->m_size <= C3_CLUSTER_THRESHOLD) + { + caller->m_size += callee->m_size; + caller->m_time += callee->m_time; + + /* Append all cgraph_nodes from callee to caller. */ + for (unsigned i = 0; i < callee->m_functions.length (); i++) + caller->m_functions.safe_push (callee->m_functions[i]); + + callee->m_functions.truncate (0); + + /* Iterate all cluster_edges of callee and add them to the caller. */ + for (hash_map::iterator it + = callee->m_callers.begin (); it != callee->m_callers.end (); + ++it) + { + (*it).second->m_callee = caller; + cluster_edge **ce = caller->m_callers.get ((*it).first); + + if (ce != NULL) + { + (*ce)->m_count += (*it).second->m_count; + if ((*ce)->m_heap_node != NULL) + heap.decrease_key ((*ce)->m_heap_node, + (*ce)->inverted_count ()); + } + else + caller->m_callers.put ((*it).first, (*it).second); + } + } + } + + /* Sort the candidate clusters. */ + clusters.qsort (cluster_cmp); + + /* Dump clusters. */ + if (dump_file) + { + for (unsigned i = 0; i < clusters.length (); i++) + { + cluster *c = clusters[i]; + if (c->m_functions.length () <= 1) + continue; + + fprintf (dump_file, "Cluster %d with functions: %d, size: %d," + " density: %f\n", i, c->m_functions.length (), c->m_size, + (c->m_time / c->m_size).to_double ()); + fprintf (dump_file, " functions: "); + for (unsigned j = 0; j < c->m_functions.length (); j++) + fprintf (dump_file, "%s ", c->m_functions[j]->dump_name ()); + fprintf (dump_file, "\n"); + } + fprintf (dump_file, "\n"); + } + + /* Assign .text.sorted.* section names. */ + int counter = 1; + for (unsigned i = 0; i < clusters.length (); i++) + { + cluster *c = clusters[i]; + if (c->m_functions.length () <= 1) + continue; + + for (unsigned j = 0; j < c->m_functions.length (); j++) + { + cgraph_node *node = c->m_functions[j]; + + if (dump_file) + fprintf (dump_file, "setting: %d for %s with size:%d\n", + counter, node->dump_asm_name (), + ipa_fn_summaries->get (node)->size); + node->text_sorted_order = counter++; + } + } + + /* Release memory. */ + FOR_EACH_DEFINED_FUNCTION (node) + if (!node->alias) + node->aux = NULL; + + for (unsigned i = 0; i < clusters.length (); i++) + delete clusters[i]; + + for (unsigned i = 0; i < edges.length (); i++) + delete edges[i]; +} + +/* The main function for function sorting. */ + +static unsigned int +ipa_reorder (void) +{ + switch (flag_reorder_functions_algorithm) + { + case REORDER_FUNCTIONS_ALGORITHM_CALL_CHAIN_CLUSTERING: + sort_functions_by_c3 (); + break; + case REORDER_FUNCTIONS_ALGORITHM_FIRST_RUN: + sort_functions_by_first_run (); + break; + default: + gcc_unreachable (); + } + + return 0; +} + +const pass_data pass_data_ipa_reorder = +{ + IPA_PASS, /* type */ + "reorder", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_IPA_REORDER, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_ipa_reorder : public ipa_opt_pass_d +{ +public: + pass_ipa_reorder (gcc::context *ctxt) + : ipa_opt_pass_d (pass_data_ipa_reorder, ctxt, + NULL, /* generate_summary */ + NULL, /* write_summary */ + NULL, /* read_summary */ + NULL, /* write_optimization_summary */ + NULL, /* read_optimization_summary */ + NULL, /* stmt_fixup */ + 0, /* function_transform_todo_flags_start */ + NULL, /* function_transform */ + NULL) /* variable_transform */ + {} + + /* opt_pass methods: */ + virtual bool gate (function *); + virtual unsigned int execute (function *) { return ipa_reorder (); } + +}; // class pass_ipa_reorder + +bool +pass_ipa_reorder::gate (function *) +{ + return flag_profile_reorder_functions && flag_profile_use && flag_wpa; +} + +} // anon namespace + +ipa_opt_pass_d * +make_pass_ipa_reorder (gcc::context *ctxt) +{ + return new pass_ipa_reorder (ctxt); +} diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index bc0f0107333..0a424a2ad12 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -505,6 +505,7 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node, section = ""; streamer_write_hwi_stream (ob->main_stream, node->tp_first_run); + streamer_write_hwi_stream (ob->main_stream, node->text_sorted_order); bp = bitpack_create (ob->main_stream); bp_pack_value (&bp, node->local.local, 1); @@ -1277,6 +1278,7 @@ input_node (struct lto_file_decl_data *file_data, "node with uid %d", node->get_uid ()); node->tp_first_run = streamer_read_uhwi (ib); + node->text_sorted_order = streamer_read_uhwi (ib); bp = streamer_read_bitpack (ib); diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index 6972e6e9ef3..8937e64ef72 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-partition.c @@ -381,24 +381,6 @@ node_cmp (const void *pa, const void *pb) const struct cgraph_node *a = *(const struct cgraph_node * const *) pa; const struct cgraph_node *b = *(const struct cgraph_node * const *) pb; - /* Profile reorder flag enables function reordering based on first execution - of a function. All functions with profile are placed in ascending - order at the beginning. */ - - if (flag_profile_reorder_functions) - { - /* Functions with time profile are sorted in ascending order. */ - if (a->tp_first_run && b->tp_first_run) - return a->tp_first_run != b->tp_first_run - ? a->tp_first_run - b->tp_first_run - : a->order - b->order; - - /* Functions with time profile are sorted before the functions - that do not have the profile. */ - if (a->tp_first_run || b->tp_first_run) - return b->tp_first_run - a->tp_first_run; - } - return b->order - a->order; } diff --git a/gcc/passes.def b/gcc/passes.def index 93879223a8b..6caf2ac5cda 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -153,6 +153,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_ipa_fn_summary); NEXT_PASS (pass_ipa_inline); NEXT_PASS (pass_ipa_pure_const); + NEXT_PASS (pass_ipa_reorder); NEXT_PASS (pass_ipa_free_fn_summary, false /* small_p */); NEXT_PASS (pass_ipa_reference); /* This pass needs to be scheduled after any IP code duplication. */ diff --git a/gcc/timevar.def b/gcc/timevar.def index 3551a433b1f..068c902424f 100644 --- a/gcc/timevar.def +++ b/gcc/timevar.def @@ -76,6 +76,7 @@ DEFTIMEVAR (TV_IPA_INHERITANCE , "ipa inheritance graph") DEFTIMEVAR (TV_IPA_VIRTUAL_CALL , "ipa virtual call target") DEFTIMEVAR (TV_IPA_DEVIRT , "ipa devirtualization") DEFTIMEVAR (TV_IPA_CONSTANT_PROP , "ipa cp") +DEFTIMEVAR (TV_IPA_REORDER , "ipa reorder") DEFTIMEVAR (TV_IPA_INLINING , "ipa inlining heuristics") DEFTIMEVAR (TV_IPA_FNSPLIT , "ipa function splitting") DEFTIMEVAR (TV_IPA_COMDATS , "ipa comdats") diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 7106eba54af..10bc4cc125d 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -500,6 +500,7 @@ extern ipa_opt_pass_d *make_pass_ipa_inline (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_free_lang_data (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_free_fn_summary (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt); +extern ipa_opt_pass_d *make_pass_ipa_reorder (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt); diff --git a/gcc/varasm.c b/gcc/varasm.c index a7c22523f9f..498b9ed59fd 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -601,6 +601,15 @@ default_function_section (tree decl, enum node_frequency freq, if (exit && freq != NODE_FREQUENCY_UNLIKELY_EXECUTED) return get_named_text_section (decl, ".text.exit", NULL); + cgraph_node *node = cgraph_node::get (decl); + if (node->text_sorted_order > 0 && freq != NODE_FREQUENCY_UNLIKELY_EXECUTED) + { + char section_name[64]; + sprintf (section_name, ".text.sorted.%010d", + node->text_sorted_order); + return get_named_text_section (decl, section_name, NULL); + } + /* Group cold functions together, similarly for hot code. */ switch (freq) { -- 2.23.0