Message ID | 3167e521-aa5b-e47c-6d9b-956a1af2a886@oracle.com |
---|---|
State | New |
Headers | show |
Series | [PING] Make function clone name numbering independent. | expand |
On 08/13/2018 05:58 PM, Michael Ploujnikov wrote: > Ping and I've updated the patch since last time as follows: > > - unittest scans assembly rather than the constprop dump because its > forward changed > - unittests should handle different hosts where any of > NO_DOT_IN_LABEL, NO_DOLLAR_IN_LABEL or __USER_LABEL_PREFIX__ may > be defined > - not 100% it's safe to change DECL_NAME to DECL_ASSEMBLER_NAME in > cgraph_node::create_virtual_clone, but I've attempted to reduce > some code duplication > - lto-partition.c: privatize_symbol_name_1 *does* need numbered > names > - but cold sections don't > - Expecting an IDENTIFIER_NODE in clone_function_name_1 avoids > unreliable string pointer use as pointed out in the first review > - renamed clone_function_name_1 and clone_function_name to > numbered_clone_function_name_1 and numbered_clone_function_name to > clarify purpose and discourage future unintended uses Richi has more state here than I do, so I'm going to let him own it. I know he's just returning from PTO, so it's going to take him a bit of time to catch up. jeff
On 2018-08-13 07:58 PM, Michael Ploujnikov wrote: > Ping and I've updated the patch since last time as follows: > > - unittest scans assembly rather than the constprop dump because its > forward changed > - unittests should handle different hosts where any of > NO_DOT_IN_LABEL, NO_DOLLAR_IN_LABEL or __USER_LABEL_PREFIX__ may > be defined > - not 100% it's safe to change DECL_NAME to DECL_ASSEMBLER_NAME in > cgraph_node::create_virtual_clone, but I've attempted to reduce > some code duplication > - lto-partition.c: privatize_symbol_name_1 *does* need numbered > names > - but cold sections don't > - Expecting an IDENTIFIER_NODE in clone_function_name_1 avoids > unreliable string pointer use as pointed out in the first review > - renamed clone_function_name_1 and clone_function_name to > numbered_clone_function_name_1 and numbered_clone_function_name to > clarify purpose and discourage future unintended uses > > Also bootstrapped and regtested. Ping. I've done some more digging into the current uses of numbered_clone_function_name and checked if any tests fail if I change it to suffixed_function_name: - gcc/cgraphclones.c: DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk"); - no new tests fail, inconclusive - and despite the comment on redirect_callee_duplicating_thunks about "one or more" duplicates it doesn't seem like duplicate_thunk_for_node would be called more than once for each node, assuming each node is named uniquely, but I'm far from an expert in this area - gcc/cgraphclones.c: SET_DECL_ASSEMBLER_NAME (new_decl, numbered_clone_function_name (old_decl, suffix)); - called by cgraph_node::create_virtual_clone, definitely needs it - this is where clones like .constprop come from - gcc/cgraphclones.c: DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, suffix); - more tests fail - this is where clones like .simdclone come from - called by cgraph_node::create_version_clone_with_body, most likely needs it - gcc/config/rs6000/rs6000.c: tree decl_name = numbered_clone_function_name (default_decl, "resolver"); - doesn't look like this needs the numbering as there should only be one resolver per multi-version function, but need someone with an rs/6000 to confirm - gcc/lto/lto-partition.c: numbered_clone_function_name_1 (identifier, - definitely needed for disambiguation, shown by unit tests failing - gcc/multiple_target.c: numbered_clone_function_name (node->decl, - create_dispatcher_calls says it only creates the dispatcher once - no new tests fail, inconclusive - gcc/multiple_target.c: numbered_clone_function_name (node->decl, - I have a feeling this isn't necessary - no new tests fail, inconclusive - gcc/omp-expand.c: DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel"); - no new tests fail, inconclusive - I didn't see (and couldn't figure out a way to get) any of the existing omp/acc tests actually exercise this codeptah - gcc/omp-low.c: return numbered_clone_function_name (current_function_decl, - definitely needed based on gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c and others - gcc/omp-simd-clone.c: DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, "simdclone"); - no tests fail, inconclusive - can definitely have more than one simdclone per function as above, but maybe not these external types - some tests, like declare-simd.c actually exercise this codepath, but I couldn't find the resulting .simdclone decl the the simdclone pass dump nor in any of the other dumps to verify - gcc/symtab.c: DECL_NAME (new_decl) = numbered_clone_function_name (node->decl, "localalias"); - no tests fail, inconclusive - my understanding of function_and_variable_visibility says that there can only be one per function so maybe this isn't; hubicka? I'll add patches to switch these once someone with expertise in these areas can confirm that the numbering isn't needed in the respective decl names. - Michael From adde0632266d3f1b0540e09b9db931df4302d2bc Mon Sep 17 00:00:00 2001 From: Michael Ploujnikov <michael.ploujnikov@oracle.com> Date: Mon, 16 Jul 2018 12:55:49 -0400 Subject: [PATCH 1/4] Make function assembly more independent. This changes clone_function_name such that clone names are based on a per-function counter rather than a global one. gcc: 2018-08-02 Michael Ploujnikov <michael.ploujnikov@oracle.com> Make function clone name numbering independent. * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids. (clone_function_name_1): Take an IDENTIFIER_NODE instead of a string. Use clone_fn_id_num. lto: 2018-08-02 Michael Ploujnikov <michael.ploujnikov@oracle.com> * lto-partition.c (privatize_symbol_name_1): Pass a persistent identifier node to clone_function_name_1. testsuite: 2018-08-02 Michael Ploujnikov <michael.ploujnikov@oracle.com> Clone id counters should be completely independent from one another. * gcc.dg/independent-cloneids-1.c: New test. --- gcc/cgraph.h | 2 +- gcc/cgraphclones.c | 19 +++++++++----- gcc/lto/lto-partition.c | 5 ++-- gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++ 4 files changed, 54 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c diff --git gcc/cgraph.h gcc/cgraph.h index a8b1b4c..fadc107 100644 --- gcc/cgraph.h +++ gcc/cgraph.h @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, profile_count); tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree); /* In cgraphclones.c */ -tree clone_function_name_1 (const char *, const char *); +tree clone_function_name_1 (tree identifier, const char *); tree clone_function_name (tree decl, const char *); void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *, diff --git gcc/cgraphclones.c gcc/cgraphclones.c index 6e84a31..fb81fbd 100644 --- gcc/cgraphclones.c +++ gcc/cgraphclones.c @@ -512,14 +512,15 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, return new_node; } -static GTY(()) unsigned int clone_fn_id_num; +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; -/* Return a new assembler name for a clone with SUFFIX of a decl named - NAME. */ +/* Return a new assembler name for a numbered clone with SUFFIX of a + decl named IDENTIFIER. */ tree -clone_function_name_1 (const char *name, const char *suffix) +clone_function_name_1 (tree identifier, const char *suffix) { + const char *name = IDENTIFIER_POINTER (identifier); size_t len = strlen (name); char *tmp_name, *prefix; @@ -527,7 +528,12 @@ clone_function_name_1 (const char *name, const char *suffix) memcpy (prefix, name, len); strcpy (prefix + len + 1, suffix); prefix[len] = symbol_table::symbol_suffix_separator (); - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); + /* Initialize the per-function counter hash table on first call */ + if (!clone_fn_ids) + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); + unsigned int &suffix_counter = clone_fn_ids->get_or_insert(name); + ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, suffix_counter); + suffix_counter++; return get_identifier (tmp_name); } @@ -536,8 +542,7 @@ clone_function_name_1 (const char *name, const char *suffix) tree clone_function_name (tree decl, const char *suffix) { - tree name = DECL_ASSEMBLER_NAME (decl); - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); + return clone_function_name_1 (DECL_ASSEMBLER_NAME (decl), suffix); } diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c index c7a5710..e3efa71 100644 --- gcc/lto/lto-partition.c +++ gcc/lto/lto-partition.c @@ -962,11 +962,12 @@ privatize_symbol_name_1 (symtab_node *node, tree decl) if (must_not_rename (node, name)) return false; - name = maybe_rewrite_identifier (name); + tree identifier = get_identifier (maybe_rewrite_identifier (name)); symtab->change_decl_assembler_name (decl, - clone_function_name_1 (name, + clone_function_name_1 (identifier, "lto_priv")); + name = IDENTIFIER_POINTER (identifier); if (node->lto_file_data) lto_record_renamed_decl (node->lto_file_data, name, IDENTIFIER_POINTER diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c new file mode 100644 index 0000000..3203895 --- /dev/null +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone" } */ + +extern int printf (const char *, ...); + +static int __attribute__ ((noinline)) +foo (int arg) +{ + return 7 * arg; +} + +static int __attribute__ ((noinline)) +bar (int arg) +{ + return arg * arg; +} + +int +baz (int arg) +{ + printf("%d\n", bar (3)); + printf("%d\n", bar (4)); + printf("%d\n", foo (5)); + printf("%d\n", foo (6)); + /* adding or removing the following call should not affect foo + function's clone numbering */ + printf("%d\n", bar (7)); + return foo (8); +} + +/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]0:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]1:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]2:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]0:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]1:} 1 } } */ +/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]2:} 1 } } */ +/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]3:} } } */ +/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]4:} } } */
Hi, On Fri, Aug 31 2018, Michael Ploujnikov wrote: > I've done some more digging into the current uses of > numbered_clone_function_name and checked if any tests fail if I change > it to suffixed_function_name: > > - gcc/cgraphclones.c: DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk"); > - no new tests fail, inconclusive > - and despite the comment on redirect_callee_duplicating_thunks > about "one or more" duplicates it doesn't seem like > duplicate_thunk_for_node would be called more than once for each > node, assuming each node is named uniquely, but I'm far from an > expert in this area The comment means that if there is a chain of thunks, the method clones all of them. Nevertheless, you need name numbering here for the same reason why you need them for constprop. > - gcc/omp-expand.c: DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel"); > - no new tests fail, inconclusive > - I didn't see (and couldn't figure out a way to get) any of the > existing omp/acc tests actually exercise this codeptah I guess this one should not need it. Build with --enable-offload-targets=hsa and run gomp.exp to try yourself. I can run run-time HSA tests for you if you want. Martin
On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor <mjambor@suse.cz> wrote: > > Hi, > > On Fri, Aug 31 2018, Michael Ploujnikov wrote: > > I've done some more digging into the current uses of > > numbered_clone_function_name and checked if any tests fail if I change > > it to suffixed_function_name: > > > > - gcc/cgraphclones.c: DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk"); > > - no new tests fail, inconclusive > > - and despite the comment on redirect_callee_duplicating_thunks > > about "one or more" duplicates it doesn't seem like > > duplicate_thunk_for_node would be called more than once for each > > node, assuming each node is named uniquely, but I'm far from an > > expert in this area > > The comment means that if there is a chain of thunks, the method clones > all of them. Nevertheless, you need name numbering here for the same > reason why you need them for constprop. The remaining question I had with the patch was if maybe all callers can handle assigning the numbering themselves, thus do sth like for-each-clone-of (i, fn) DECL_NAME (...) = numbered_clone_function_name (..., "artificial_thunk", i); which would make the map of name -> number unnecessary. > > > - gcc/omp-expand.c: DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel"); > > - no new tests fail, inconclusive > > - I didn't see (and couldn't figure out a way to get) any of the > > existing omp/acc tests actually exercise this codeptah > > I guess this one should not need it. Build with > --enable-offload-targets=hsa and run gomp.exp to try yourself. I can > run run-time HSA tests for you if you want. > > Martin
Hi, On Mon, Sep 03 2018, Richard Biener wrote: > On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor <mjambor@suse.cz> wrote: >> >> Hi, >> >> On Fri, Aug 31 2018, Michael Ploujnikov wrote: >> > I've done some more digging into the current uses of >> > numbered_clone_function_name and checked if any tests fail if I change >> > it to suffixed_function_name: >> > >> > - gcc/cgraphclones.c: DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk"); >> > - no new tests fail, inconclusive >> > - and despite the comment on redirect_callee_duplicating_thunks >> > about "one or more" duplicates it doesn't seem like >> > duplicate_thunk_for_node would be called more than once for each >> > node, assuming each node is named uniquely, but I'm far from an >> > expert in this area >> >> The comment means that if there is a chain of thunks, the method clones >> all of them. Nevertheless, you need name numbering here for the same >> reason why you need them for constprop. > > The remaining question I had with the patch was if maybe all callers > can handle assigning > the numbering themselves, thus do sth like > > for-each-clone-of (i, fn) > DECL_NAME (...) = numbered_clone_function_name (..., > "artificial_thunk", i); > > which would make the map of name -> number unnecessary. > That is what I would prefer too but it also has downsides. Callers would have to do their book keeping and the various cloning interfaces would get another parameter when already they have quite a few (introducing some cloning context classes seems like an overkill to me :-). If we want to get rid of .constprop discrepancies, something like the following (only very mildly tested) patch should be enough (note that IPA-CP currently does not really need the new has because it creates clones in only one pass through the call graph, but that is something likely to change). We could then adjust other cloners that we care about. However, if clones of thunks are one of them, then the optional parameter will also proliferate to cgraph_node::create_clone(), cgraph_edge::redirect_callee_duplicating_thunks() and duplicate_thunk_for_node(). create_version_clone_with_body would almost certainly need it because of IPA-SRA too (IPA-SRA can of course always pass zero). Martin diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 2b00f0165fa..703c3cfea7b 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -964,11 +964,15 @@ public: cgraph_node *new_inlined_to, bitmap args_to_skip, const char *suffix = NULL); - /* Create callgraph node clone with new declaration. The actual body will - be copied later at compilation stage. */ + /* Create callgraph node clone with new declaration. The actual body will be + copied later at compilation stage. The name of the new clone will be + constructed from the name of the original name, SUFFIX and a number which + can either be NUM_SUFFIX if non-negative or a unique incremented integer + otherwise. */ cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers, vec<ipa_replace_map *, va_gc> *tree_map, - bitmap args_to_skip, const char * suffix); + bitmap args_to_skip, const char * suffix, + int num_suffix = -1); /* cgraph node being removed from symbol table; see if its entry can be replaced by other inline clone. */ @@ -2376,8 +2380,9 @@ basic_block init_lowered_empty_function (tree, bool, profile_count); tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree); /* In cgraphclones.c */ -tree clone_function_name_1 (const char *, const char *); -tree clone_function_name (tree decl, const char *); +tree clone_function_name_1 (const char *name, const char *suffix, + int num_suffix = -1); +tree clone_function_name (tree decl, const char *suffix, int num_suffix = -1); void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *, bool, bitmap, bool, bitmap, basic_block); diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index 0c0a94b04a3..865c3fa1ad0 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -513,11 +513,13 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, static GTY(()) unsigned int clone_fn_id_num; -/* Return a new assembler name for a clone with SUFFIX of a decl named - NAME. */ +/* Return a new assembler name for a clone with SUFFIX of a decl named NAME + Apart from, string SUFFIX, the new name will end with either NUM_SIFFIX, if + non-negative, or a unique unspecified number otherwise. . */ tree -clone_function_name_1 (const char *name, const char *suffix) +clone_function_name_1 (const char *name, const char *suffix, + int num_suffix) { size_t len = strlen (name); char *tmp_name, *prefix; @@ -526,22 +528,32 @@ clone_function_name_1 (const char *name, const char *suffix) memcpy (prefix, name, len); strcpy (prefix + len + 1, suffix); prefix[len] = symbol_table::symbol_suffix_separator (); - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); + unsigned int num; + if (num_suffix < 0) + num = clone_fn_id_num++; + else + num = (unsigned) num_suffix; + ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, num); return get_identifier (tmp_name); } -/* Return a new assembler name for a clone of DECL with SUFFIX. */ +/* Return a new assembler name for a clone of DECL with SUFFIX. Apart from, + string SUFFIX, the new name will end with either NUM_SIFFIX, if + non-negative, or a unique unspecified number otherwise. */ tree -clone_function_name (tree decl, const char *suffix) +clone_function_name (tree decl, const char *suffix, int num_suffix) { tree name = DECL_ASSEMBLER_NAME (decl); - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); + return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix, num_suffix); } -/* Create callgraph node clone with new declaration. The actual body will - be copied later at compilation stage. +/* Create callgraph node clone with new declaration. The actual body will be + copied later at compilation stage. The name of the new clone will be + constructed from the name of the original name, SUFFIX and a number which + can either be NUM_SUFFIX if non-negative or a unique incremented integer + otherwise. TODO: after merging in ipa-sra use function call notes instead of args_to_skip bitmap interface. @@ -549,7 +561,8 @@ clone_function_name (tree decl, const char *suffix) cgraph_node * cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers, vec<ipa_replace_map *, va_gc> *tree_map, - bitmap args_to_skip, const char * suffix) + bitmap args_to_skip, const char * suffix, + int num_suffix) { tree old_decl = decl; cgraph_node *new_node = NULL; @@ -584,7 +597,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers, strcpy (name + len + 1, suffix); name[len] = '.'; DECL_NAME (new_decl) = get_identifier (name); - SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix)); + SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix, + num_suffix)); SET_DECL_RTL (new_decl, NULL); new_node = create_clone (new_decl, count, false, diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index afc45969558..52690996419 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -376,6 +376,9 @@ static profile_count max_count; static long overall_size, max_new_size; +/* Hash to give different UID suffixes */ +static hash_map<cgraph_node *, int> *clone_num_suffixes; + /* Return the param lattices structure corresponding to the Ith formal parameter of the function described by INFO. */ static inline struct ipcp_param_lattices * @@ -3850,8 +3853,11 @@ create_specialized_node (struct cgraph_node *node, } } + int &suffix_counter = clone_num_suffixes->get_or_insert(node); new_node = node->create_virtual_clone (callers, replace_trees, - args_to_skip, "constprop"); + args_to_skip, "constprop", + suffix_counter); + suffix_counter++; bool have_self_recursive_calls = !self_recursive_calls.is_empty (); for (unsigned j = 0; j < self_recursive_calls.length (); j++) @@ -5065,6 +5071,7 @@ ipcp_driver (void) ipa_check_create_node_params (); ipa_check_create_edge_args (); + clone_num_suffixes = new hash_map<cgraph_node *, int>; if (dump_file) { @@ -5086,6 +5093,7 @@ ipcp_driver (void) ipcp_store_vr_results (); /* Free all IPCP structures. */ + delete clone_num_suffixes; free_toporder_info (&topo); delete edge_clone_summaries; edge_clone_summaries = NULL;
Hi Martin, Richard, Thanks for your responses. On 2018-09-03 09:15 AM, Martin Jambor wrote: > Hi, > > On Mon, Sep 03 2018, Richard Biener wrote: >> On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor <mjambor@suse.cz> wrote: >>> >>> Hi, >>> >>> On Fri, Aug 31 2018, Michael Ploujnikov wrote: >>>> I've done some more digging into the current uses of >>>> numbered_clone_function_name and checked if any tests fail if I change >>>> it to suffixed_function_name: >>>> >>>> - gcc/cgraphclones.c: DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk"); >>>> - no new tests fail, inconclusive >>>> - and despite the comment on redirect_callee_duplicating_thunks >>>> about "one or more" duplicates it doesn't seem like >>>> duplicate_thunk_for_node would be called more than once for each >>>> node, assuming each node is named uniquely, but I'm far from an >>>> expert in this area >>> >>> The comment means that if there is a chain of thunks, the method clones >>> all of them. Nevertheless, you need name numbering here for the same >>> reason why you need them for constprop. >> >> The remaining question I had with the patch was if maybe all callers >> can handle assigning >> the numbering themselves, thus do sth like >> >> for-each-clone-of (i, fn) >> DECL_NAME (...) = numbered_clone_function_name (..., >> "artificial_thunk", i); >> >> which would make the map of name -> number unnecessary. Please see my comment below. >> > > That is what I would prefer too but it also has downsides. Callers > would have to do their book keeping and the various cloning interfaces > would get another parameter when already they have quite a few > (introducing some cloning context classes seems like an overkill to me > :-). I'm fine with doing it the way you guys are suggesting, but just to explain my original thinking: The way I see it is that there will always be some kind of name -> number mapping, even if it's managed by each individual user and even if it's actually cgraph_node -> number. Needless to say I'm far from an expert in all of the involved areas (this is my first contribution to GCC) so the most effective approach I could come up is doing it all in one place. Now with your expert advice and as I get a better understanding of how things like sra and other clones are created, I'm starting to see that a more tailored approach is possible and probably the way it should have been done in the first place. > > If we want to get rid of .constprop discrepancies, Could you please clarify to me what exactly you mean by the ".constprop discrepancies"? > something like the > following (only very mildly tested) patch should be enough (note that > IPA-CP currently does not really need the new has because it creates What do you mean "it doesn't need the new hash"? My independent-cloneids-1.c test shows that a function can have more than one .constprop clone, but I'm probably just not understanding something. > clones in only one pass through the call graph, but that is something > likely to change). We could then adjust other cloners that we care > about. > > However, if clones of thunks are one of them, then the optional > parameter will also proliferate to cgraph_node::create_clone(), > cgraph_edge::redirect_callee_duplicating_thunks() and > duplicate_thunk_for_node(). create_version_clone_with_body would almost > certainly need it because of IPA-SRA too (IPA-SRA can of course always > pass zero). I'll try to do this and to convert the other users in the next version of the patch, but I might need some help with understanding how some of those passes work! > > Martin > > > > diff --git a/gcc/cgraph.h b/gcc/cgraph.h > index 2b00f0165fa..703c3cfea7b 100644 > --- a/gcc/cgraph.h > +++ b/gcc/cgraph.h > @@ -964,11 +964,15 @@ public: > cgraph_node *new_inlined_to, > bitmap args_to_skip, const char *suffix = NULL); > > - /* Create callgraph node clone with new declaration. The actual body will > - be copied later at compilation stage. */ > + /* Create callgraph node clone with new declaration. The actual body will be > + copied later at compilation stage. The name of the new clone will be > + constructed from the name of the original name, SUFFIX and a number which > + can either be NUM_SUFFIX if non-negative or a unique incremented integer > + otherwise. */ > cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers, > vec<ipa_replace_map *, va_gc> *tree_map, > - bitmap args_to_skip, const char * suffix); > + bitmap args_to_skip, const char * suffix, > + int num_suffix = -1); > > /* cgraph node being removed from symbol table; see if its entry can be > replaced by other inline clone. */ > @@ -2376,8 +2380,9 @@ basic_block init_lowered_empty_function (tree, bool, profile_count); > tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree); > /* In cgraphclones.c */ > > -tree clone_function_name_1 (const char *, const char *); > -tree clone_function_name (tree decl, const char *); > +tree clone_function_name_1 (const char *name, const char *suffix, > + int num_suffix = -1); > +tree clone_function_name (tree decl, const char *suffix, int num_suffix = -1); I can see how using the default -1 helped you keep this patch very small and simple, but for my version of the patch I would actually prefer to avoid the implicit fallback to the original global counter and instead make the patch as big as necessary to convert all the users who need it. The motivation is that implicit behaviour can sometimes be missed, which I believe is related to why some existing calls to clone_function_name are unnecessary. On this note, what do you think about my idea of renaming the function to numbered_clone_function_name and adding a suffixed_function_name to make a distinction for users who don't care about numbering? > > void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *, > bool, bitmap, bool, bitmap, basic_block); > diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c > index 0c0a94b04a3..865c3fa1ad0 100644 > --- a/gcc/cgraphclones.c > +++ b/gcc/cgraphclones.c > @@ -513,11 +513,13 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, > > static GTY(()) unsigned int clone_fn_id_num; > > -/* Return a new assembler name for a clone with SUFFIX of a decl named > - NAME. */ > +/* Return a new assembler name for a clone with SUFFIX of a decl named NAME > + Apart from, string SUFFIX, the new name will end with either NUM_SIFFIX, if > + non-negative, or a unique unspecified number otherwise. . */ > > tree > -clone_function_name_1 (const char *name, const char *suffix) > +clone_function_name_1 (const char *name, const char *suffix, > + int num_suffix) > { > size_t len = strlen (name); > char *tmp_name, *prefix; > @@ -526,22 +528,32 @@ clone_function_name_1 (const char *name, const char *suffix) > memcpy (prefix, name, len); > strcpy (prefix + len + 1, suffix); > prefix[len] = symbol_table::symbol_suffix_separator (); > - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); > + unsigned int num; > + if (num_suffix < 0) > + num = clone_fn_id_num++; > + else > + num = (unsigned) num_suffix; > + ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, num); > return get_identifier (tmp_name); > } > > -/* Return a new assembler name for a clone of DECL with SUFFIX. */ > +/* Return a new assembler name for a clone of DECL with SUFFIX. Apart from, > + string SUFFIX, the new name will end with either NUM_SIFFIX, if > + non-negative, or a unique unspecified number otherwise. */ > > tree > -clone_function_name (tree decl, const char *suffix) > +clone_function_name (tree decl, const char *suffix, int num_suffix) > { > tree name = DECL_ASSEMBLER_NAME (decl); > - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); > + return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix, num_suffix); > } > > > -/* Create callgraph node clone with new declaration. The actual body will > - be copied later at compilation stage. > +/* Create callgraph node clone with new declaration. The actual body will be > + copied later at compilation stage. The name of the new clone will be > + constructed from the name of the original name, SUFFIX and a number which > + can either be NUM_SUFFIX if non-negative or a unique incremented integer > + otherwise. > > TODO: after merging in ipa-sra use function call notes instead of args_to_skip > bitmap interface. > @@ -549,7 +561,8 @@ clone_function_name (tree decl, const char *suffix) > cgraph_node * > cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers, > vec<ipa_replace_map *, va_gc> *tree_map, > - bitmap args_to_skip, const char * suffix) > + bitmap args_to_skip, const char * suffix, > + int num_suffix) > { > tree old_decl = decl; > cgraph_node *new_node = NULL; > @@ -584,7 +597,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers, > strcpy (name + len + 1, suffix); > name[len] = '.'; > DECL_NAME (new_decl) = get_identifier (name); > - SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix)); > + SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix, > + num_suffix)); > SET_DECL_RTL (new_decl, NULL); > > new_node = create_clone (new_decl, count, false, > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c > index afc45969558..52690996419 100644 > --- a/gcc/ipa-cp.c > +++ b/gcc/ipa-cp.c > @@ -376,6 +376,9 @@ static profile_count max_count; > > static long overall_size, max_new_size; > > +/* Hash to give different UID suffixes */ > +static hash_map<cgraph_node *, int> *clone_num_suffixes; > + > /* Return the param lattices structure corresponding to the Ith formal > parameter of the function described by INFO. */ > static inline struct ipcp_param_lattices * > @@ -3850,8 +3853,11 @@ create_specialized_node (struct cgraph_node *node, > } > } > > + int &suffix_counter = clone_num_suffixes->get_or_insert(node); > new_node = node->create_virtual_clone (callers, replace_trees, > - args_to_skip, "constprop"); > + args_to_skip, "constprop", > + suffix_counter); > + suffix_counter++; > > bool have_self_recursive_calls = !self_recursive_calls.is_empty (); > for (unsigned j = 0; j < self_recursive_calls.length (); j++) > @@ -5065,6 +5071,7 @@ ipcp_driver (void) > > ipa_check_create_node_params (); > ipa_check_create_edge_args (); > + clone_num_suffixes = new hash_map<cgraph_node *, int>; > > if (dump_file) > { > @@ -5086,6 +5093,7 @@ ipcp_driver (void) > ipcp_store_vr_results (); > > /* Free all IPCP structures. */ > + delete clone_num_suffixes; I like how this allows the reduction of the lifetime of the mapping! Richard: that helps with your concerns about the extra memory usage, right? > free_toporder_info (&topo); > delete edge_clone_summaries; > edge_clone_summaries = NULL; > Thanks again, - Michael
Hi Martin, On 2018-09-03 06:01 AM, Martin Jambor wrote: > Hi, > > On Fri, Aug 31 2018, Michael Ploujnikov wrote: >> I've done some more digging into the current uses of >> numbered_clone_function_name and checked if any tests fail if I change >> it to suffixed_function_name: >> >> - gcc/cgraphclones.c: DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk"); >> - no new tests fail, inconclusive >> - and despite the comment on redirect_callee_duplicating_thunks >> about "one or more" duplicates it doesn't seem like >> duplicate_thunk_for_node would be called more than once for each >> node, assuming each node is named uniquely, but I'm far from an >> expert in this area > > The comment means that if there is a chain of thunks, the method clones > all of them. Nevertheless, you need name numbering here for the same > reason why you need them for constprop. > > >> - gcc/omp-expand.c: DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel"); >> - no new tests fail, inconclusive >> - I didn't see (and couldn't figure out a way to get) any of the >> existing omp/acc tests actually exercise this codeptah > > I guess this one should not need it. Build with > --enable-offload-targets=hsa and run gomp.exp to try yourself. I can > run run-time HSA tests for you if you want. > > Martin > I've tried building with numbered_clone_function_name replaced by suffixed_function_name and with --enable-offload-targets=hsa and didn't see any errors in gomp.exp. I don't have a readily available HSA setup so if you could do a quick test, I would really appreciate it! - Michael
Hi, On Tue, Sep 04 2018, Michael Ploujnikov wrote: > > I've tried building with numbered_clone_function_name replaced by > suffixed_function_name and with --enable-offload-targets=hsa and > didn't see any errors in gomp.exp. I don't have a readily available > HSA setup so if you could do a quick test, I would really appreciate > it! Sorry it took so long, I have run regular tests on trunk today and your patch does not appear to have caused any issues. Martin
On 2018-12-04 7:48 a.m., Martin Jambor wrote: > Hi, > > On Tue, Sep 04 2018, Michael Ploujnikov wrote: >> >> I've tried building with numbered_clone_function_name replaced by >> suffixed_function_name and with --enable-offload-targets=hsa and >> didn't see any errors in gomp.exp. I don't have a readily available >> HSA setup so if you could do a quick test, I would really appreciate >> it! > > Sorry it took so long, I have run regular tests on trunk today and your > patch does not appear to have caused any issues. > > Martin > Thank you for checking. - Michael
From d96bedcf51c289f3c80a12a681a7831d9e287874 Mon Sep 17 00:00:00 2001 From: Michael Ploujnikov <michael.ploujnikov@oracle.com> Date: Mon, 13 Aug 2018 13:59:46 -0400 Subject: [PATCH 4/4] Use suffixed_function_name in cgraph_node::create_virtual_clone. gcc: 2018-08-02 Michael Ploujnikov <michael.ploujnikov@oracle.com> * cgraphclones.c (cgraph_node::create_virtual_clone): Use suffixed_function_name. --- gcc/cgraphclones.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git gcc/cgraphclones.c gcc/cgraphclones.c index 784f1df..876b7fa 100644 --- gcc/cgraphclones.c +++ gcc/cgraphclones.c @@ -576,9 +576,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers, tree old_decl = decl; cgraph_node *new_node = NULL; tree new_decl; - size_t len, i; + size_t i; ipa_replace_map *map; - char *name; gcc_checking_assert (local.versionable); gcc_assert (local.can_change_signature || !args_to_skip); @@ -600,12 +599,7 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers, sometimes storing only clone decl instead of original. */ /* Generate a new name for the new version. */ - len = IDENTIFIER_LENGTH (DECL_NAME (old_decl)); - name = XALLOCAVEC (char, len + strlen (suffix) + 2); - memcpy (name, IDENTIFIER_POINTER (DECL_NAME (old_decl)), len); - strcpy (name + len + 1, suffix); - name[len] = '.'; - DECL_NAME (new_decl) = get_identifier (name); + DECL_NAME (new_decl) = suffixed_function_name (old_decl, suffix); SET_DECL_ASSEMBLER_NAME (new_decl, numbered_clone_function_name (old_decl, suffix)); SET_DECL_RTL (new_decl, NULL); -- 2.7.4