From patchwork Sun Dec 8 17:05:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 1205716 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-515459-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ucw.cz Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="ZiCRCzCq"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47WCQ36Fbjz9sP6 for ; Mon, 9 Dec 2019 04:05:45 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:subject:message-id:mime-version:content-type; q=dns; s= default; b=q7oo8oNMSukr0ZOkkyZJKqv+1BWuQAlJS0Hm6dC76DTekrbNWEU8m TDcyBSe3G14tM+oZOlyPtGssE81g6k9Oz51a2LNz6H/Rx2TjAt0bIYqH6XV2JAb6 1tjMcURIJ6af0LHNSLNRnuhe1/gdiYGB2lUfWrpBNxI1UIs57y+HhA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:subject:message-id:mime-version:content-type; s= default; bh=g9yqzsAyZvpKUwMl4FBW4iiG/cs=; b=ZiCRCzCq5wYY61g0VSNG SRDBUzyJ7yrRThdQjGKX197CHwYh/zhVga3LCudXuAr6Af4LqdP2JPD5VkgVhDAK t4N9fDXTOqsZeJLmJCbHolMX2sLTdrxvo3FTX5GO0r8qWCpRsmYc8jrjsXuXOPWi o3sadm0RSFtCiD4/i8ZKmuU= Received: (qmail 96374 invoked by alias); 8 Dec 2019 17:05:38 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 96363 invoked by uid 89); 8 Dec 2019 17:05:38 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-10.1 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS autolearn=ham version=3.3.1 spammy=Watch, balanced, Turn, reordered X-HELO: nikam.ms.mff.cuni.cz Received: from nikam.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sun, 08 Dec 2019 17:05:36 +0000 Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id A34B62805EF; Sun, 8 Dec 2019 18:05:33 +0100 (CET) Date: Sun, 8 Dec 2019 18:05:33 +0100 From: Jan Hubicka To: gcc-patches@gcc.gnu.org Subject: Fix overflows in -fprofile-reorder-functions Message-ID: <20191208170533.tqjym4jbeljzsnwo@kam.mff.cuni.cz> MIME-Version: 1.0 Content-Disposition: inline User-Agent: NeoMutt/20170113 (1.7.2) Hi, this patch fixes three sissues with -fprofile-reorder-functions: 1) First is that tp_first_run is stored as 32bit integer while it can easily overflow (and does so during Firefox profiling). 2) Second problem is that flag_profile_functions can not be tested w/o function context. The changes to expand_all_functions makes it to work on mixed units by first outputting all functions w/o -fprofile-reorder-function (or with no profile info) and then outputting in first_run order 3) LTO partitioner was mixing up order by tp_first_run and by order. for no_reorder we definitly want to order via first, while for everything else we want to roder by second. I have also merged duplicated comparators since they are bit fragile into tp_first_run_node_cmp. I originaly started to look into this because of undefined symbols with Firefox PGO builds. These symbols went away with fixing these bug but I am not quite sure how. it is possible that there is another problem in lto_blanced_map but even after reading the noreorder code few times carefuly I did not find it. Other explanation would be that our new qsort with broken comparator due to overflow can actualy remove some entries in the array, but that sounds bit crazy. Bootstrapped/regested x86_64-linux. Comitted. * cgraph.c (cgraph_node::dump): Make tp_first_run 64bit. * cgraph.h (cgrpah_node): Likewise. (tp_first_run_node_cmp): Deeclare. * cgraphunit.c (node_cmp): Rename to ... (tp_first_run_node_cmp): ... this; export; watch for 64bit overflows; clear tp_first_run for no_reorder and !flag_profile_reorder_functions. (expand_all_functions): Collect tp_first_run and normal functions to two vectors so the other functions remain sorted. Do not check for flag_profile_reorder_functions it is function local flag. * profile.c (compute_value_histograms): Update tp_first_run printing. * lto-partition.c (node_cmp): Turn into simple order comparsions. (varpool_node_cmp): Remove. (add_sorted_nodes): Use node_cmp. (lto_balanced_map): Use tp_first_run_node_cmp. Index: cgraph.c =================================================================== --- cgraph.c (revision 279076) +++ cgraph.c (working copy) @@ -1954,7 +1954,7 @@ cgraph_node::dump (FILE *f) count.dump (f); } if (tp_first_run > 0) - fprintf (f, " first_run:%i", tp_first_run); + fprintf (f, " first_run:%" PRId64, (int64_t) tp_first_run); if (origin) fprintf (f, " nested in:%s", origin->asm_name ()); if (gimple_has_body_p (decl)) Index: cgraph.h =================================================================== --- cgraph.h (revision 279076) +++ cgraph.h (working copy) @@ -1430,6 +1430,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg /* Expected number of executions: calculated in profile.c. */ profile_count count; + /* Time profiler: first run of function. */ + gcov_type tp_first_run; /* How to scale counts at materialization time; used to merge LTO units with different number of profile runs. */ int count_materialization_scale; @@ -1437,8 +1439,6 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg unsigned int profile_id; /* ID of the translation unit. */ int unit_id; - /* Time profiler: first run of function. */ - int tp_first_run; /* Set when decl is an abstract function pointed to by the ABSTRACT_DECL_ORIGIN of a reachable function. */ @@ -2463,6 +2463,7 @@ cgraph_inline_failed_type_t cgraph_inlin /* In cgraphunit.c */ void cgraphunit_c_finalize (void); +int tp_first_run_node_cmp (const void *pa, const void *pb); /* Initialize datastructures so DECL is a function in lowered gimple form. IN_SSA is true if the gimple is in SSA. */ Index: cgraphunit.c =================================================================== --- cgraphunit.c (revision 279076) +++ cgraphunit.c (working copy) @@ -2359,19 +2359,33 @@ cgraph_node::expand (void) /* Node comparator that is responsible for the order that corresponds to time when a function was launched for the first time. */ -static int -node_cmp (const void *pa, const void *pb) +int +tp_first_run_node_cmp (const void *pa, const void *pb) { const cgraph_node *a = *(const cgraph_node * const *) pa; const cgraph_node *b = *(const cgraph_node * const *) pb; + gcov_type tp_first_run_a = a->tp_first_run; + gcov_type tp_first_run_b = b->tp_first_run; + + if (!opt_for_fn (a->decl, flag_profile_reorder_functions) + || a->no_reorder) + tp_first_run_a = 0; + if (!opt_for_fn (b->decl, flag_profile_reorder_functions) + || b->no_reorder) + tp_first_run_b = 0; + + if (tp_first_run_a == tp_first_run_b) + return b->order - a->order; /* Functions with time profile must be before these without profile. */ - if (!a->tp_first_run || !b->tp_first_run) - return a->tp_first_run - b->tp_first_run; + if (!tp_first_run_a || !tp_first_run_b) + return tp_first_run_a ? 1 : -1; - return a->tp_first_run != b->tp_first_run - ? b->tp_first_run - a->tp_first_run - : b->order - a->order; + /* Watch for overlflow - tp_first_run is 64bit. */ + if (tp_first_run_a > tp_first_run_b) + return -1; + else + return 1; } /* Expand all functions that must be output. @@ -2390,8 +2404,10 @@ expand_all_functions (void) cgraph_node *node; cgraph_node **order = XCNEWVEC (cgraph_node *, symtab->cgraph_count); + cgraph_node **tp_first_run_order = XCNEWVEC (cgraph_node *, + symtab->cgraph_count); unsigned int expanded_func_count = 0, profiled_func_count = 0; - int order_pos, new_order_pos = 0; + int order_pos, tp_first_run_order_pos = 0, new_order_pos = 0; int i; order_pos = ipa_reverse_postorder (order); @@ -2401,11 +2417,17 @@ expand_all_functions (void) optimization. So we must be sure to not reference them. */ for (i = 0; i < order_pos; i++) if (order[i]->process) - order[new_order_pos++] = order[i]; - - if (flag_profile_reorder_functions) - qsort (order, new_order_pos, sizeof (cgraph_node *), node_cmp); - + { + if (order[i]->tp_first_run + && opt_for_fn (order[i]->decl, flag_profile_reorder_functions)) + tp_first_run_order[tp_first_run_order_pos++] = order[i]; + else + order[new_order_pos++] = order[i]; + } + + /* Output functions in RPO so callers get optimized before callees. This + makes ipa-ra and other propagators to work. + FIXME: This is far from optimal code layout. */ for (i = new_order_pos - 1; i >= 0; i--) { node = order[i]; @@ -2413,13 +2435,25 @@ expand_all_functions (void) if (node->process) { expanded_func_count++; - if(node->tp_first_run) - profiled_func_count++; + node->process = 0; + node->expand (); + } + } + qsort (tp_first_run_order, tp_first_run_order_pos, + sizeof (cgraph_node *), tp_first_run_node_cmp); + for (i = 0; i < tp_first_run_order_pos; i++) + { + node = tp_first_run_order[i]; + + if (node->process) + { + expanded_func_count++; + profiled_func_count++; if (symtab->dump_file) fprintf (symtab->dump_file, - "Time profile order in expand_all_functions:%s:%d\n", - node->asm_name (), node->tp_first_run); + "Time profile order in expand_all_functions:%s:%" PRId64 + "\n", node->asm_name (), (int64_t) node->tp_first_run); node->process = 0; node->expand (); } @@ -2429,7 +2463,7 @@ expand_all_functions (void) fprintf (dump_file, "Expanded functions with time profile (%s):%u/%u\n", main_input_filename, profiled_func_count, expanded_func_count); - if (symtab->dump_file && flag_profile_reorder_functions) + if (symtab->dump_file && tp_first_run_order_pos) fprintf (symtab->dump_file, "Expanded functions with time profile:%u/%u\n", profiled_func_count, expanded_func_count); Index: lto/lto-partition.c =================================================================== --- lto/lto-partition.c (revision 279076) +++ lto/lto-partition.c (working copy) @@ -372,38 +372,9 @@ lto_max_map (void) new_partition ("empty"); } -/* Helper function for qsort; sort nodes by order. noreorder functions must have - been removed earlier. */ -static int -node_cmp (const void *pa, const void *pb) -{ - const struct cgraph_node *a = *(const struct cgraph_node * const *) pa; - const struct cgraph_node *b = *(const struct cgraph_node * const *) pb; - - /* Profile reorder flag enables function reordering based on first execution - of a function. All functions with profile are placed in ascending - order at the beginning. */ - - if (flag_profile_reorder_functions) - { - /* Functions with time profile are sorted in ascending order. */ - if (a->tp_first_run && b->tp_first_run) - return a->tp_first_run != b->tp_first_run - ? a->tp_first_run - b->tp_first_run - : a->order - b->order; - - /* Functions with time profile are sorted before the functions - that do not have the profile. */ - if (a->tp_first_run || b->tp_first_run) - return b->tp_first_run - a->tp_first_run; - } - - return b->order - a->order; -} - /* Helper function for qsort; sort nodes by order. */ static int -varpool_node_cmp (const void *pa, const void *pb) +node_cmp (const void *pa, const void *pb) { const symtab_node *a = *static_cast (pa); const symtab_node *b = *static_cast (pb); @@ -418,7 +389,7 @@ add_sorted_nodes (vec &ne unsigned i; symtab_node *node; - next_nodes.qsort (varpool_node_cmp); + next_nodes.qsort (node_cmp); FOR_EACH_VEC_ELT (next_nodes, i, node) if (!symbol_partitioned_p (node)) add_symbol_to_partition (partition, node); @@ -537,17 +508,17 @@ lto_balanced_map (int n_lto_partitions, unit tends to import a lot of global trees defined there. We should get better about minimizing the function bounday, but until that things works smoother if we order in source order. */ - order.qsort (node_cmp); + order.qsort (tp_first_run_node_cmp); noreorder.qsort (node_cmp); if (dump_file) { for (unsigned i = 0; i < order.length (); i++) - fprintf (dump_file, "Balanced map symbol order:%s:%u\n", - order[i]->name (), order[i]->tp_first_run); + fprintf (dump_file, "Balanced map symbol order:%s:%" PRId64 "\n", + order[i]->name (), (int64_t) order[i]->tp_first_run); for (unsigned i = 0; i < noreorder.length (); i++) - fprintf (dump_file, "Balanced map symbol no_reorder:%s:%u\n", - noreorder[i]->name (), noreorder[i]->tp_first_run); + fprintf (dump_file, "Balanced map symbol no_reorder:%s:%" PRId64 "\n", + noreorder[i]->name (), (int64_t) noreorder[i]->tp_first_run); } /* Collect all variables that should not be reordered. */ @@ -556,7 +527,7 @@ lto_balanced_map (int n_lto_partitions, && vnode->no_reorder) varpool_order.safe_push (vnode); n_varpool_nodes = varpool_order.length (); - varpool_order.qsort (varpool_node_cmp); + varpool_order.qsort (node_cmp); /* Compute partition size and create the first partition. */ if (param_min_partition_size > max_partition_size) Index: profile.c =================================================================== --- profile.c (revision 279076) +++ profile.c (working copy) @@ -874,7 +874,8 @@ compute_value_histograms (histogram_valu node->tp_first_run = hist->hvalue.counters[0]; if (dump_file) - fprintf (dump_file, "Read tp_first_run: %d\n", node->tp_first_run); + fprintf (dump_file, "Read tp_first_run: %" PRId64 "\n", + (int64_t) node->tp_first_run); } }