Message ID | 069cb2a6-2e37-58fb-2009-35e0300270f1@oracle.com |
---|---|
State | New |
Headers | show |
Series | Make function clone name numbering independent. | expand |
On 2018-07-17 04:25 PM, Michael Ploujnikov wrote: > On 2018-07-17 06:02 AM, Richard Biener wrote: >> On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer >> <rep.dot.nop@gmail.com> wrote: >>> >>> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote: >>>> Hi, >>>> >>> >>>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc >>>> (1000); >>> >>> Isn't 1000 a bit excessive? What about 64 or thereabouts? >> >> I'm not sure we should throw memory at this "problem" in this way. >> What specific issue >> does this address? > > This goes along with the general theme of preventing changes to one function affecting codegen of others. What I'm seeing in this case is when a function bar is modified codegen decides to create more clones of it (eg: during the constprop pass). These extra clones cause the global counter to increment so the clones of the unchanged function foo are named differently only because of a source change to bar. I was hoping that the testcase would be a good illustration, but perhaps not; is there anything else I can do to make this clearer? > > >> >> Iff then I belive forgoing the automatic counter addidion is the way >> to go and hand >> control of that to the callers (for example the caller in >> lto/lto-partition.c doesn't >> even seem to need that counter. > > How can you tell that privatize_symbol_name_1 doesn't need the counter? I'm assuming it has a good reason to call clone_function_name_1 rather than appending ".lto_priv" itself. > >> You also assume the string you key on persists - luckily the >> lto-partition.c caller >> leaks it but this makes your approach quite fragile in my eye (put the logic >> into clone_function_name instead, where you at least know you are dealing >> with a string from an indentifier which are never collected). >> >> Richard. >> > > Is this what you had in mind?: > > diff --git gcc/cgraphclones.c gcc/cgraphclones.c > index 6e84a31..f000420 100644 > --- gcc/cgraphclones.c > +++ gcc/cgraphclones.c > @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, > return new_node; > } > > -static GTY(()) unsigned int clone_fn_id_num; > +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; > > /* Return a new assembler name for a clone with SUFFIX of a decl named > NAME. */ > @@ -521,14 +521,13 @@ tree > clone_function_name_1 (const char *name, const char *suffix) > { > size_t len = strlen (name); > - char *tmp_name, *prefix; > + char *prefix; > > prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); > memcpy (prefix, name, len); > strcpy (prefix + len + 1, suffix); > prefix[len] = symbol_table::symbol_suffix_separator (); > - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); > - return get_identifier (tmp_name); > + return get_identifier (prefix); > } > > /* Return a new assembler name for a clone of DECL with SUFFIX. */ > @@ -537,7 +536,17 @@ tree > clone_function_name (tree decl, const char *suffix) > { > tree name = DECL_ASSEMBLER_NAME (decl); > - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); > + char *decl_name = IDENTIFIER_POINTER (name); > + char *numbered_name; > + unsigned int *suffix_counter; > + if (!clone_fn_ids) { > + /* Initialize the per-function counter hash table if this is the first call */ > + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); > + } > + suffix_counter = &clone_fn_ids->get_or_insert(name); > + ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter); > + *suffix_counter = *suffix_counter + 1; > + return clone_function_name_1 (numbered_name, suffix); > } > > - Michael > > > Ping, and below is the updated version of the full patch with changelogs: gcc: 2018-07-16 Michael Ploujnikov <michael.ploujnikov@oracle.com> Make function clone name numbering independent. * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids. (clone_function_name_1): Move suffixing to clone_function_name and change it to use per-function clone_fn_ids. testsuite: 2018-07-16 Michael Ploujnikov <michael.ploujnikov@oracle.com> Clone id counters should be completely independent from one another. * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test. --- gcc/cgraphclones.c | 19 ++++++++++---- gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++ 2 files changed, 52 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c diff --git gcc/cgraphclones.c gcc/cgraphclones.c index 6e84a31..e1a77a2 100644 --- gcc/cgraphclones.c +++ gcc/cgraphclones.c @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, return new_node; } -static GTY(()) unsigned int clone_fn_id_num; +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; /* Return a new assembler name for a clone with SUFFIX of a decl named NAME. */ @@ -521,14 +521,13 @@ tree clone_function_name_1 (const char *name, const char *suffix) { size_t len = strlen (name); - char *tmp_name, *prefix; + char *prefix; prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); memcpy (prefix, name, len); strcpy (prefix + len + 1, suffix); prefix[len] = symbol_table::symbol_suffix_separator (); - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); - return get_identifier (tmp_name); + return get_identifier (prefix); } /* Return a new assembler name for a clone of DECL with SUFFIX. */ @@ -537,7 +536,17 @@ tree clone_function_name (tree decl, const char *suffix) { tree name = DECL_ASSEMBLER_NAME (decl); - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); + const char *decl_name = IDENTIFIER_POINTER (name); + char *numbered_name; + unsigned int *suffix_counter; + if (!clone_fn_ids) { + /* Initialize the per-function counter hash table if this is the first call */ + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); + } + suffix_counter = &clone_fn_ids->get_or_insert(decl_name); + ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter); + *suffix_counter = *suffix_counter + 1; + return clone_function_name_1 (numbered_name, suffix); } diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c new file mode 100644 index 0000000..d723e20 --- /dev/null +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp" } */ + +extern int printf (const char *, ...); + +static int __attribute__ ((noinline)) +foo (int arg) +{ + return 7 * arg; +} + +static int __attribute__ ((noinline)) +bar (int arg) +{ + return arg * arg; +} + +int +baz (int arg) +{ + printf("%d\n", bar (3)); + printf("%d\n", bar (4)); + printf("%d\n", foo (5)); + printf("%d\n", foo (6)); + /* adding or removing the following call should not affect foo + function's clone numbering */ + printf("%d\n", bar (7)); + return foo (8); +} + +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */ +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */ +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
On Fri, Jul 20, 2018 at 4:48 AM Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote: > > On 2018-07-17 04:25 PM, Michael Ploujnikov wrote: > > On 2018-07-17 06:02 AM, Richard Biener wrote: > >> On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer > >> <rep.dot.nop@gmail.com> wrote: > >>> > >>> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote: > >>>> Hi, > >>>> > >>> > >>>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc > >>>> (1000); > >>> > >>> Isn't 1000 a bit excessive? What about 64 or thereabouts? > >> > >> I'm not sure we should throw memory at this "problem" in this way. > >> What specific issue > >> does this address? > > > > This goes along with the general theme of preventing changes to one function affecting codegen of others. What I'm seeing in this case is when a function bar is modified codegen decides to create more clones of it (eg: during the constprop pass). These extra clones cause the global counter to increment so the clones of the unchanged function foo are named differently only because of a source change to bar. I was hoping that the testcase would be a good illustration, but perhaps not; is there anything else I can do to make this clearer? > > > > > >> > >> Iff then I belive forgoing the automatic counter addidion is the way > >> to go and hand > >> control of that to the callers (for example the caller in > >> lto/lto-partition.c doesn't > >> even seem to need that counter. > > > > How can you tell that privatize_symbol_name_1 doesn't need the counter? I'm assuming it has a good reason to call clone_function_name_1 rather than appending ".lto_priv" itself. > > > >> You also assume the string you key on persists - luckily the > >> lto-partition.c caller > >> leaks it but this makes your approach quite fragile in my eye (put the logic > >> into clone_function_name instead, where you at least know you are dealing > >> with a string from an indentifier which are never collected). > >> > >> Richard. > >> > > > > Is this what you had in mind?: > > > > diff --git gcc/cgraphclones.c gcc/cgraphclones.c > > index 6e84a31..f000420 100644 > > --- gcc/cgraphclones.c > > +++ gcc/cgraphclones.c > > @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, > > return new_node; > > } > > > > -static GTY(()) unsigned int clone_fn_id_num; > > +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; > > > > /* Return a new assembler name for a clone with SUFFIX of a decl named > > NAME. */ > > @@ -521,14 +521,13 @@ tree > > clone_function_name_1 (const char *name, const char *suffix) > > { > > size_t len = strlen (name); > > - char *tmp_name, *prefix; > > + char *prefix; > > > > prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); > > memcpy (prefix, name, len); > > strcpy (prefix + len + 1, suffix); > > prefix[len] = symbol_table::symbol_suffix_separator (); > > - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); > > - return get_identifier (tmp_name); > > + return get_identifier (prefix); > > } > > > > /* Return a new assembler name for a clone of DECL with SUFFIX. */ > > @@ -537,7 +536,17 @@ tree > > clone_function_name (tree decl, const char *suffix) > > { > > tree name = DECL_ASSEMBLER_NAME (decl); > > - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); > > + char *decl_name = IDENTIFIER_POINTER (name); > > + char *numbered_name; > > + unsigned int *suffix_counter; > > + if (!clone_fn_ids) { > > + /* Initialize the per-function counter hash table if this is the first call */ > > + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); > > + } > > + suffix_counter = &clone_fn_ids->get_or_insert(name); > > + ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter); > > + *suffix_counter = *suffix_counter + 1; > > + return clone_function_name_1 (numbered_name, suffix); > > } > > > > - Michael > > > > > > > > Ping, and below is the updated version of the full patch with changelogs: > > > gcc: > 2018-07-16 Michael Ploujnikov <michael.ploujnikov@oracle.com> > > Make function clone name numbering independent. > * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids. > (clone_function_name_1): Move suffixing to clone_function_name > and change it to use per-function clone_fn_ids. > > testsuite: > 2018-07-16 Michael Ploujnikov <michael.ploujnikov@oracle.com> > > Clone id counters should be completely independent from one another. > * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test. > > --- > gcc/cgraphclones.c | 19 ++++++++++---- > gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++ > 2 files changed, 52 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c > > diff --git gcc/cgraphclones.c gcc/cgraphclones.c > index 6e84a31..e1a77a2 100644 > --- gcc/cgraphclones.c > +++ gcc/cgraphclones.c > @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, > return new_node; > } > > -static GTY(()) unsigned int clone_fn_id_num; > +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; > > /* Return a new assembler name for a clone with SUFFIX of a decl named > NAME. */ > @@ -521,14 +521,13 @@ tree > clone_function_name_1 (const char *name, const char *suffix) pass this function the counter to use.... > { > size_t len = strlen (name); > - char *tmp_name, *prefix; > + char *prefix; > > prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); > memcpy (prefix, name, len); > strcpy (prefix + len + 1, suffix); > prefix[len] = symbol_table::symbol_suffix_separator (); > - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); and keep using ASM_FORMAT_PRIVATE_NAME here. You need to change the lto/lto-partition.c caller (just use zero as counter). > - return get_identifier (tmp_name); > + return get_identifier (prefix); > } > > /* Return a new assembler name for a clone of DECL with SUFFIX. */ > @@ -537,7 +536,17 @@ tree > clone_function_name (tree decl, const char *suffix) > { > tree name = DECL_ASSEMBLER_NAME (decl); > - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); > + const char *decl_name = IDENTIFIER_POINTER (name); > + char *numbered_name; > + unsigned int *suffix_counter; > + if (!clone_fn_ids) { > + /* Initialize the per-function counter hash table if this is the first call */ > + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); > + } I still do not like throwing memory at the problem in this way for the little benefit this change provides :/ So no approval from me at this point... Richard. > + suffix_counter = &clone_fn_ids->get_or_insert(decl_name); > + ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter); > + *suffix_counter = *suffix_counter + 1; > + return clone_function_name_1 (numbered_name, suffix); > } > > > diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c > new file mode 100644 > index 0000000..d723e20 > --- /dev/null > +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c > @@ -0,0 +1,38 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp" } */ > + > +extern int printf (const char *, ...); > + > +static int __attribute__ ((noinline)) > +foo (int arg) > +{ > + return 7 * arg; > +} > + > +static int __attribute__ ((noinline)) > +bar (int arg) > +{ > + return arg * arg; > +} > + > +int > +baz (int arg) > +{ > + printf("%d\n", bar (3)); > + printf("%d\n", bar (4)); > + printf("%d\n", foo (5)); > + printf("%d\n", foo (6)); > + /* adding or removing the following call should not affect foo > + function's clone numbering */ > + printf("%d\n", bar (7)); > + return foo (8); > +} > + > +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */ > +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */ > +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */ > -- > 2.7.4 >
On 2018-07-20 06:05 AM, Richard Biener wrote: > On Fri, Jul 20, 2018 at 4:48 AM Michael Ploujnikov > <michael.ploujnikov@oracle.com> wrote: >> >> On 2018-07-17 04:25 PM, Michael Ploujnikov wrote: >>> On 2018-07-17 06:02 AM, Richard Biener wrote: >>>> On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer >>>> <rep.dot.nop@gmail.com> wrote: >>>>> >>>>> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote: >>>>>> Hi, >>>>>> >>>>> >>>>>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc >>>>>> (1000); >>>>> >>>>> Isn't 1000 a bit excessive? What about 64 or thereabouts? >>>> >>>> I'm not sure we should throw memory at this "problem" in this way. >>>> What specific issue >>>> does this address? >>> >>> This goes along with the general theme of preventing changes to one function affecting codegen of others. What I'm seeing in this case is when a function bar is modified codegen decides to create more clones of it (eg: during the constprop pass). These extra clones cause the global counter to increment so the clones of the unchanged function foo are named differently only because of a source change to bar. I was hoping that the testcase would be a good illustration, but perhaps not; is there anything else I can do to make this clearer? >>> >>> >>>> >>>> Iff then I belive forgoing the automatic counter addidion is the way >>>> to go and hand >>>> control of that to the callers (for example the caller in >>>> lto/lto-partition.c doesn't >>>> even seem to need that counter. >>> >>> How can you tell that privatize_symbol_name_1 doesn't need the counter? I'm assuming it has a good reason to call clone_function_name_1 rather than appending ".lto_priv" itself. >>> >>>> You also assume the string you key on persists - luckily the >>>> lto-partition.c caller >>>> leaks it but this makes your approach quite fragile in my eye (put the logic >>>> into clone_function_name instead, where you at least know you are dealing >>>> with a string from an indentifier which are never collected). >>>> >>>> Richard. >>>> >>> >>> Is this what you had in mind?: >>> >>> diff --git gcc/cgraphclones.c gcc/cgraphclones.c >>> index 6e84a31..f000420 100644 >>> --- gcc/cgraphclones.c >>> +++ gcc/cgraphclones.c >>> @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, >>> return new_node; >>> } >>> >>> -static GTY(()) unsigned int clone_fn_id_num; >>> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; >>> >>> /* Return a new assembler name for a clone with SUFFIX of a decl named >>> NAME. */ >>> @@ -521,14 +521,13 @@ tree >>> clone_function_name_1 (const char *name, const char *suffix) >>> { >>> size_t len = strlen (name); >>> - char *tmp_name, *prefix; >>> + char *prefix; >>> >>> prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); >>> memcpy (prefix, name, len); >>> strcpy (prefix + len + 1, suffix); >>> prefix[len] = symbol_table::symbol_suffix_separator (); >>> - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); >>> - return get_identifier (tmp_name); >>> + return get_identifier (prefix); >>> } >>> >>> /* Return a new assembler name for a clone of DECL with SUFFIX. */ >>> @@ -537,7 +536,17 @@ tree >>> clone_function_name (tree decl, const char *suffix) >>> { >>> tree name = DECL_ASSEMBLER_NAME (decl); >>> - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); >>> + char *decl_name = IDENTIFIER_POINTER (name); >>> + char *numbered_name; >>> + unsigned int *suffix_counter; >>> + if (!clone_fn_ids) { >>> + /* Initialize the per-function counter hash table if this is the first call */ >>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); >>> + } >>> + suffix_counter = &clone_fn_ids->get_or_insert(name); >>> + ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter); >>> + *suffix_counter = *suffix_counter + 1; >>> + return clone_function_name_1 (numbered_name, suffix); >>> } >>> >>> - Michael >>> >>> >>> >> >> Ping, and below is the updated version of the full patch with changelogs: >> >> >> gcc: >> 2018-07-16 Michael Ploujnikov <michael.ploujnikov@oracle.com> >> >> Make function clone name numbering independent. >> * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids. >> (clone_function_name_1): Move suffixing to clone_function_name >> and change it to use per-function clone_fn_ids. >> >> testsuite: >> 2018-07-16 Michael Ploujnikov <michael.ploujnikov@oracle.com> >> >> Clone id counters should be completely independent from one another. >> * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test. >> >> --- >> gcc/cgraphclones.c | 19 ++++++++++---- >> gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++ >> 2 files changed, 52 insertions(+), 5 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c >> >> diff --git gcc/cgraphclones.c gcc/cgraphclones.c >> index 6e84a31..e1a77a2 100644 >> --- gcc/cgraphclones.c >> +++ gcc/cgraphclones.c >> @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, >> return new_node; >> } >> >> -static GTY(()) unsigned int clone_fn_id_num; >> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; >> >> /* Return a new assembler name for a clone with SUFFIX of a decl named >> NAME. */ >> @@ -521,14 +521,13 @@ tree >> clone_function_name_1 (const char *name, const char *suffix) > > pass this function the counter to use.... > >> { >> size_t len = strlen (name); >> - char *tmp_name, *prefix; >> + char *prefix; >> >> prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); >> memcpy (prefix, name, len); >> strcpy (prefix + len + 1, suffix); >> prefix[len] = symbol_table::symbol_suffix_separator (); >> - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); > > and keep using ASM_FORMAT_PRIVATE_NAME here. You need to change > the lto/lto-partition.c caller (just use zero as counter). > >> - return get_identifier (tmp_name); >> + return get_identifier (prefix); >> } >> >> /* Return a new assembler name for a clone of DECL with SUFFIX. */ >> @@ -537,7 +536,17 @@ tree >> clone_function_name (tree decl, const char *suffix) >> { >> tree name = DECL_ASSEMBLER_NAME (decl); >> - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); >> + const char *decl_name = IDENTIFIER_POINTER (name); >> + char *numbered_name; >> + unsigned int *suffix_counter; >> + if (!clone_fn_ids) { >> + /* Initialize the per-function counter hash table if this is the first call */ >> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); >> + } > > I still do not like throwing memory at the problem in this way for the > little benefit > this change provides :/ > > So no approval from me at this point... > > Richard. Can you give me an idea of the memory constraints that are involved? The highest memory usage increase that I could find was when compiling a source file (from Linux) with roughly 10,000 functions. It showed a 2kB increase over the before-patch use of 6936kB which is barely 0.03%. Using a single counter can result in more confusing namespacing when you have .bar.clone.4 despite there only being 3 clones of .bar. From a practical point of view this change is helpful to anyone diffing binary output such as forensic analysts, Debian Reproducible Builds or even someone validating compiler output (before and after an input source patch). The extra changes that this patch alleviates are a distraction and could even be misleading. For example, applying a source patch to the same Linux source produces the following binary diff before my change: --- /tmp/output.o.objdump +++ /tmp/patched-output.o.objdump @@ -1,5 +1,5 @@ -/tmp/uverbs_cmd/output.o: file format elf32-i386 +/tmp/uverbs_cmd/patched-output.o: file format elf32-i386 Disassembly of section .text.get_order: @@ -265,12 +265,12 @@ 3: e9 fc ff ff ff jmp 4 <put_cq_read+0x4> 4: R_386_PC32 .text.put_uobj_read -Disassembly of section .text.trace_kmalloc.constprop.3: +Disassembly of section .text.trace_kmalloc.constprop.4: -00000000 <trace_kmalloc.constprop.3>: +00000000 <trace_kmalloc.constprop.4>: 0: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 2: R_386_32 __tracepoint_kmalloc - 7: 74 34 je 3d <trace_kmalloc.constprop.3+0x3d> + 7: 74 34 je 3d <trace_kmalloc.constprop.4+0x3d> 9: 55 push %ebp a: 89 cd mov %ecx,%ebp c: 57 push %edi @@ -281,7 +281,7 @@ 13: 8b 1d 10 00 00 00 mov 0x10,%ebx 15: R_386_32 __tracepoint_kmalloc 19: 85 db test %ebx,%ebx - 1b: 74 1b je 38 <trace_kmalloc.constprop.3+0x38> + 1b: 74 1b je 38 <trace_kmalloc.constprop.4+0x38> 1d: 68 d0 00 00 00 push $0xd0 22: 89 fa mov %edi,%edx 24: 89 f0 mov %esi,%eax @@ -292,7 +292,7 @@ 31: 58 pop %eax 32: 83 3b 00 cmpl $0x0,(%ebx) 35: 5a pop %edx - 36: eb e3 jmp 1b <trace_kmalloc.constprop.3+0x1b> + 36: eb e3 jmp 1b <trace_kmalloc.constprop.4+0x1b> 38: 5b pop %ebx 39: 5e pop %esi 3a: 5f pop %edi @@ -846,7 +846,7 @@ 78: b8 5f 00 00 00 mov $0x5f,%eax 79: R_386_32 .text.ib_uverbs_alloc_pd 7d: e8 fc ff ff ff call 7e <ib_uverbs_alloc_pd+0x7e> - 7e: R_386_PC32 .text.trace_kmalloc.constprop.3 + 7e: R_386_PC32 .text.trace_kmalloc.constprop.4 82: c7 45 d4 f4 ff ff ff movl $0xfffffff4,-0x2c(%ebp) 89: 59 pop %ecx 8a: 85 db test %ebx,%ebx @@ -1068,7 +1068,7 @@ 9c: b8 83 00 00 00 mov $0x83,%eax 9d: R_386_32 .text.ib_uverbs_reg_mr a1: e8 fc ff ff ff call a2 <ib_uverbs_reg_mr+0xa2> - a2: R_386_PC32 .text.trace_kmalloc.constprop.3 + a2: R_386_PC32 .text.trace_kmalloc.constprop.4 a6: ba f4 ff ff ff mov $0xfffffff4,%edx ab: 58 pop %eax ac: 85 db test %ebx,%ebx @@ -1385,7 +1385,7 @@ 99: b8 7b 00 00 00 mov $0x7b,%eax 9a: R_386_32 .text.ib_uverbs_create_cq 9e: e8 fc ff ff ff call 9f <ib_uverbs_create_cq+0x9f> - 9f: R_386_PC32 .text.trace_kmalloc.constprop.3 + 9f: R_386_PC32 .text.trace_kmalloc.constprop.4 a3: 58 pop %eax a4: 85 db test %ebx,%ebx a6: 75 0a jne b2 <ib_uverbs_create_cq+0xb2> @@ -1607,129 +1607,107 @@ 00000000 <ib_uverbs_poll_cq>: 0: 55 push %ebp 1: 57 push %edi - 2: 89 c7 mov %eax,%edi - 4: 56 push %esi - 5: 89 ce mov %ecx,%esi - 7: b9 10 00 00 00 mov $0x10,%ecx - c: 53 push %ebx - d: 83 ec 18 sub $0x18,%esp - 10: 8d 44 24 08 lea 0x8(%esp),%eax - 14: e8 fc ff ff ff call 15 <ib_uverbs_poll_cq+0x15> - 15: R_386_PC32 copy_from_user - 19: 85 c0 test %eax,%eax - 1b: 0f 85 34 01 00 00 jne 155 <ib_uverbs_poll_cq+0x155> - 21: 6b 44 24 14 34 imul $0x34,0x14(%esp),%eax - 26: ba d0 00 00 00 mov $0xd0,%edx - 2b: e8 fc ff ff ff call 2c <ib_uverbs_poll_cq+0x2c> - 2c: R_386_PC32 __kmalloc - 30: 89 c5 mov %eax,%ebp - 32: 85 c0 test %eax,%eax - 34: 0f 84 22 01 00 00 je 15c <ib_uverbs_poll_cq+0x15c> - 3a: 6b 44 24 14 30 imul $0x30,0x14(%esp),%eax - 3f: ba d0 00 00 00 mov $0xd0,%edx - 44: 83 c0 08 add $0x8,%eax - 47: 89 44 24 04 mov %eax,0x4(%esp) - 4b: e8 fc ff ff ff call 4c <ib_uverbs_poll_cq+0x4c> - 4c: R_386_PC32 __kmalloc - 50: ba f4 ff ff ff mov $0xfffffff4,%edx - 55: 89 04 24 mov %eax,(%esp) - 58: 85 c0 test %eax,%eax - 5a: 0f 84 e1 00 00 00 je 141 <ib_uverbs_poll_cq+0x141> - 60: 8b 4f 58 mov 0x58(%edi),%ecx - 63: 6a 00 push $0x0 - 65: b8 00 00 00 00 mov $0x0,%eax - 66: R_386_32 ib_uverbs_cq_idr - 6a: 8b 54 24 14 mov 0x14(%esp),%edx - 6e: e8 fc ff ff ff call 6f <ib_uverbs_poll_cq+0x6f> - 6f: R_386_PC32 .text.idr_read_obj - 73: ba ea ff ff ff mov $0xffffffea,%edx - 78: 89 c7 mov %eax,%edi - 7a: 58 pop %eax - 7b: 85 ff test %edi,%edi - 7d: 0f 84 ae 00 00 00 je 131 <ib_uverbs_poll_cq+0x131> - 83: 8b 1f mov (%edi),%ebx - 85: 8b 54 24 14 mov 0x14(%esp),%edx - 89: 89 e9 mov %ebp,%ecx - 8b: 89 f8 mov %edi,%eax - 8d: ff 93 70 01 00 00 call *0x170(%ebx) - 93: 8b 1c 24 mov (%esp),%ebx - 96: 89 03 mov %eax,(%ebx) - 98: 89 f8 mov %edi,%eax - 9a: e8 fc ff ff ff call 9b <ib_uverbs_poll_cq+0x9b> - 9b: R_386_PC32 .text.put_cq_read - 9f: 8b 1c 24 mov (%esp),%ebx - a2: 89 e8 mov %ebp,%eax - a4: 6b 3b 34 imul $0x34,(%ebx),%edi - a7: 8d 53 08 lea 0x8(%ebx),%edx - aa: 01 ef add %ebp,%edi - ac: 39 f8 cmp %edi,%eax - ae: 74 67 je 117 <ib_uverbs_poll_cq+0x117> - b0: 8b 08 mov (%eax),%ecx - b2: 8b 58 04 mov 0x4(%eax),%ebx - b5: 83 c2 30 add $0x30,%edx - b8: 83 c0 34 add $0x34,%eax - bb: 89 4a d0 mov %ecx,-0x30(%edx) - be: 89 5a d4 mov %ebx,-0x2c(%edx) - c1: 8b 48 d4 mov -0x2c(%eax),%ecx - c4: 89 4a d8 mov %ecx,-0x28(%edx) - c7: 8b 48 d8 mov -0x28(%eax),%ecx - ca: 89 4a dc mov %ecx,-0x24(%edx) - cd: 8b 48 dc mov -0x24(%eax),%ecx - d0: 89 4a e0 mov %ecx,-0x20(%edx) - d3: 8b 48 e0 mov -0x20(%eax),%ecx - d6: 89 4a e4 mov %ecx,-0x1c(%edx) - d9: 8b 48 e8 mov -0x18(%eax),%ecx - dc: 89 4a e8 mov %ecx,-0x18(%edx) - df: 8b 48 e4 mov -0x1c(%eax),%ecx - e2: 8b 49 20 mov 0x20(%ecx),%ecx - e5: 89 4a ec mov %ecx,-0x14(%edx) - e8: 8b 48 ec mov -0x14(%eax),%ecx - eb: 89 4a f0 mov %ecx,-0x10(%edx) - ee: 8b 48 f0 mov -0x10(%eax),%ecx - f1: 89 4a f4 mov %ecx,-0xc(%edx) - f4: 8b 48 f4 mov -0xc(%eax),%ecx - f7: 66 89 4a f8 mov %cx,-0x8(%edx) - fb: 66 8b 48 f6 mov -0xa(%eax),%cx - ff: 66 89 4a fa mov %cx,-0x6(%edx) - 103: 8a 48 f8 mov -0x8(%eax),%cl - 106: 88 4a fc mov %cl,-0x4(%edx) - 109: 8a 48 f9 mov -0x7(%eax),%cl - 10c: 88 4a fd mov %cl,-0x3(%edx) - 10f: 8a 48 fa mov -0x6(%eax),%cl - 112: 88 4a fe mov %cl,-0x2(%edx) - 115: eb 95 jmp ac <ib_uverbs_poll_cq+0xac> - 117: 8b 14 24 mov (%esp),%edx - 11a: 8b 4c 24 04 mov 0x4(%esp),%ecx - 11e: 8b 44 24 08 mov 0x8(%esp),%eax - 122: e8 fc ff ff ff call 123 <ib_uverbs_poll_cq+0x123> - 123: R_386_PC32 copy_to_user - 127: 83 f8 01 cmp $0x1,%eax - 12a: 19 d2 sbb %edx,%edx - 12c: f7 d2 not %edx - 12e: 83 e2 f2 and $0xfffffff2,%edx - 131: 8b 04 24 mov (%esp),%eax - 134: 89 54 24 04 mov %edx,0x4(%esp) - 138: e8 fc ff ff ff call 139 <ib_uverbs_poll_cq+0x139> - 139: R_386_PC32 kfree - 13d: 8b 54 24 04 mov 0x4(%esp),%edx - 141: 89 e8 mov %ebp,%eax - 143: 89 14 24 mov %edx,(%esp) - 146: e8 fc ff ff ff call 147 <ib_uverbs_poll_cq+0x147> - 147: R_386_PC32 kfree - 14b: 8b 14 24 mov (%esp),%edx - 14e: 85 d2 test %edx,%edx - 150: 0f 45 f2 cmovne %edx,%esi - 153: eb 0c jmp 161 <ib_uverbs_poll_cq+0x161> - 155: be f2 ff ff ff mov $0xfffffff2,%esi - 15a: eb 05 jmp 161 <ib_uverbs_poll_cq+0x161> - 15c: be f4 ff ff ff mov $0xfffffff4,%esi - 161: 83 c4 18 add $0x18,%esp - 164: 89 f0 mov %esi,%eax - 166: 5b pop %ebx - 167: 5e pop %esi - 168: 5f pop %edi - 169: 5d pop %ebp - 16a: c3 ret + 2: bf f2 ff ff ff mov $0xfffffff2,%edi + 7: 56 push %esi + 8: 53 push %ebx + 9: 89 c3 mov %eax,%ebx + b: 81 ec 84 00 00 00 sub $0x84,%esp + 11: 89 4c 24 04 mov %ecx,0x4(%esp) + 15: 8d 44 24 10 lea 0x10(%esp),%eax + 19: b9 10 00 00 00 mov $0x10,%ecx + 1e: e8 fc ff ff ff call 1f <ib_uverbs_poll_cq+0x1f> + 1f: R_386_PC32 copy_from_user + 23: 89 04 24 mov %eax,(%esp) + 26: 85 c0 test %eax,%eax + 28: 0f 85 1f 01 00 00 jne 14d <ib_uverbs_poll_cq+0x14d> + 2e: 8b 4b 58 mov 0x58(%ebx),%ecx + 31: 6a 00 push $0x0 + 33: b8 00 00 00 00 mov $0x0,%eax + 34: R_386_32 ib_uverbs_cq_idr + 38: bf ea ff ff ff mov $0xffffffea,%edi + 3d: 8b 54 24 1c mov 0x1c(%esp),%edx + 41: e8 fc ff ff ff call 42 <ib_uverbs_poll_cq+0x42> + 42: R_386_PC32 .text.idr_read_obj + 46: 89 c3 mov %eax,%ebx + 48: 58 pop %eax + 49: 85 db test %ebx,%ebx + 4b: 0f 84 fc 00 00 00 je 14d <ib_uverbs_poll_cq+0x14d> + 51: 8b 6c 24 10 mov 0x10(%esp),%ebp + 55: b9 02 00 00 00 mov $0x2,%ecx + 5a: 8d 7c 24 08 lea 0x8(%esp),%edi + 5e: 8b 04 24 mov (%esp),%eax + 61: 8d 75 08 lea 0x8(%ebp),%esi + 64: f3 ab rep stos %eax,%es:(%edi) + 66: 8b 44 24 1c mov 0x1c(%esp),%eax + 6a: 39 44 24 08 cmp %eax,0x8(%esp) + 6e: 73 1f jae 8f <ib_uverbs_poll_cq+0x8f> + 70: 8b 3b mov (%ebx),%edi + 72: 8d 4c 24 50 lea 0x50(%esp),%ecx + 76: ba 01 00 00 00 mov $0x1,%edx + 7b: 89 d8 mov %ebx,%eax + 7d: ff 97 70 01 00 00 call *0x170(%edi) + 83: 89 c7 mov %eax,%edi + 85: 85 c0 test %eax,%eax + 87: 0f 88 b9 00 00 00 js 146 <ib_uverbs_poll_cq+0x146> + 8d: 75 21 jne b0 <ib_uverbs_poll_cq+0xb0> + 8f: b9 08 00 00 00 mov $0x8,%ecx + 94: 8d 54 24 08 lea 0x8(%esp),%edx + 98: 89 e8 mov %ebp,%eax + 9a: bf f2 ff ff ff mov $0xfffffff2,%edi + 9f: e8 fc ff ff ff call a0 <ib_uverbs_poll_cq+0xa0> + a0: R_386_PC32 copy_to_user + a4: 85 c0 test %eax,%eax + a6: 0f 44 7c 24 04 cmove 0x4(%esp),%edi + ab: e9 96 00 00 00 jmp 146 <ib_uverbs_poll_cq+0x146> + b0: 8b 44 24 50 mov 0x50(%esp),%eax + b4: 8b 54 24 54 mov 0x54(%esp),%edx + b8: b9 30 00 00 00 mov $0x30,%ecx + bd: 89 44 24 20 mov %eax,0x20(%esp) + c1: 8b 44 24 58 mov 0x58(%esp),%eax + c5: 89 54 24 24 mov %edx,0x24(%esp) + c9: 8b 54 24 74 mov 0x74(%esp),%edx + cd: 89 44 24 28 mov %eax,0x28(%esp) + d1: 8b 44 24 5c mov 0x5c(%esp),%eax + d5: 89 44 24 2c mov %eax,0x2c(%esp) + d9: 8b 44 24 60 mov 0x60(%esp),%eax + dd: 89 44 24 30 mov %eax,0x30(%esp) + e1: 8b 44 24 64 mov 0x64(%esp),%eax + e5: 89 44 24 34 mov %eax,0x34(%esp) + e9: 8b 44 24 6c mov 0x6c(%esp),%eax + ed: 89 44 24 38 mov %eax,0x38(%esp) + f1: 8b 44 24 68 mov 0x68(%esp),%eax + f5: 8b 40 20 mov 0x20(%eax),%eax + f8: 89 54 24 44 mov %edx,0x44(%esp) + fc: 8d 54 24 20 lea 0x20(%esp),%edx + 100: c6 44 24 4f 00 movb $0x0,0x4f(%esp) + 105: 89 44 24 3c mov %eax,0x3c(%esp) + 109: 8b 44 24 70 mov 0x70(%esp),%eax + 10d: 89 44 24 40 mov %eax,0x40(%esp) + 111: 8b 44 24 78 mov 0x78(%esp),%eax + 115: 89 44 24 48 mov %eax,0x48(%esp) + 119: 8b 44 24 7c mov 0x7c(%esp),%eax + 11d: 66 89 44 24 4c mov %ax,0x4c(%esp) + 122: 8a 44 24 7e mov 0x7e(%esp),%al + 126: 88 44 24 4e mov %al,0x4e(%esp) + 12a: 89 f0 mov %esi,%eax + 12c: e8 fc ff ff ff call 12d <ib_uverbs_poll_cq+0x12d> + 12d: R_386_PC32 copy_to_user + 131: 85 c0 test %eax,%eax + 133: 75 0c jne 141 <ib_uverbs_poll_cq+0x141> + 135: 83 c6 30 add $0x30,%esi + 138: ff 44 24 08 incl 0x8(%esp) + 13c: e9 25 ff ff ff jmp 66 <ib_uverbs_poll_cq+0x66> + 141: bf f2 ff ff ff mov $0xfffffff2,%edi + 146: 89 d8 mov %ebx,%eax + 148: e8 fc ff ff ff call 149 <ib_uverbs_poll_cq+0x149> + 149: R_386_PC32 .text.put_cq_read + 14d: 81 c4 84 00 00 00 add $0x84,%esp + 153: 89 f8 mov %edi,%eax + 155: 5b pop %ebx + 156: 5e pop %esi + 157: 5f pop %edi + 158: 5d pop %ebp + 159: c3 ret Disassembly of section .text.ib_uverbs_req_notify_cq: @@ -1915,7 +1893,7 @@ 94: b8 7b 00 00 00 mov $0x7b,%eax 95: R_386_32 .text.ib_uverbs_create_qp 99: e8 fc ff ff ff call 9a <ib_uverbs_create_qp+0x9a> - 9a: R_386_PC32 .text.trace_kmalloc.constprop.3 + 9a: R_386_PC32 .text.trace_kmalloc.constprop.4 9e: 59 pop %ecx 9f: c7 85 50 ff ff ff f4 movl $0xfffffff4,-0xb0(%ebp) a6: ff ff ff @@ -2241,7 +2219,7 @@ 68: b8 4f 00 00 00 mov $0x4f,%eax 69: R_386_32 .text.ib_uverbs_query_qp 6d: e8 fc ff ff ff call 6e <ib_uverbs_query_qp+0x6e> - 6e: R_386_PC32 .text.trace_kmalloc.constprop.3 + 6e: R_386_PC32 .text.trace_kmalloc.constprop.4 72: 59 pop %ecx 73: c7 85 5c ff ff ff 10 movl $0x10,-0xa4(%ebp) 7a: 00 00 00 @@ -2260,7 +2238,7 @@ a3: b8 86 00 00 00 mov $0x86,%eax a4: R_386_32 .text.ib_uverbs_query_qp a8: e8 fc ff ff ff call a9 <ib_uverbs_query_qp+0xa9> - a9: R_386_PC32 .text.trace_kmalloc.constprop.3 + a9: R_386_PC32 .text.trace_kmalloc.constprop.4 ad: 5a pop %edx ae: 85 db test %ebx,%ebx b0: 0f 84 c1 01 00 00 je 277 <ib_uverbs_query_qp+0x277> @@ -2462,7 +2440,7 @@ 88: b8 6f 00 00 00 mov $0x6f,%eax 89: R_386_32 .text.ib_uverbs_modify_qp 8d: e8 fc ff ff ff call 8e <ib_uverbs_modify_qp+0x8e> - 8e: R_386_PC32 .text.trace_kmalloc.constprop.3 + 8e: R_386_PC32 .text.trace_kmalloc.constprop.4 92: 5a pop %edx 93: ba f4 ff ff ff mov $0xfffffff4,%edx 98: 85 db test %ebx,%ebx @@ -3129,7 +3107,7 @@ 6a: b8 4c 00 00 00 mov $0x4c,%eax 6b: R_386_32 .text.ib_uverbs_create_ah 6f: e8 fc ff ff ff call 70 <ib_uverbs_create_ah+0x70> - 70: R_386_PC32 .text.trace_kmalloc.constprop.3 + 70: R_386_PC32 .text.trace_kmalloc.constprop.4 74: 58 pop %eax 75: 85 db test %ebx,%ebx 77: 75 0a jne 83 <ib_uverbs_create_ah+0x83> @@ -3396,7 +3374,7 @@ af: b8 91 00 00 00 mov $0x91,%eax b0: R_386_32 .text.ib_uverbs_attach_mcast b4: e8 fc ff ff ff call b5 <ib_uverbs_attach_mcast+0xb5> - b5: R_386_PC32 .text.trace_kmalloc.constprop.3 + b5: R_386_PC32 .text.trace_kmalloc.constprop.4 b9: 58 pop %eax ba: 85 db test %ebx,%ebx bc: 75 07 jne c5 <ib_uverbs_attach_mcast+0xc5> @@ -3572,7 +3550,7 @@ 77: b8 5e 00 00 00 mov $0x5e,%eax 78: R_386_32 .text.ib_uverbs_create_srq 7c: e8 fc ff ff ff call 7d <ib_uverbs_create_srq+0x7d> - 7d: R_386_PC32 .text.trace_kmalloc.constprop.3 + 7d: R_386_PC32 .text.trace_kmalloc.constprop.4 81: ba f4 ff ff ff mov $0xfffffff4,%edx 86: 58 pop %eax 87: 85 db test %ebx,%ebx Needless to say, after my change the diff only shows the truly changed functions. - Michael > >> + suffix_counter = &clone_fn_ids->get_or_insert(decl_name); >> + ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter); >> + *suffix_counter = *suffix_counter + 1; >> + return clone_function_name_1 (numbered_name, suffix); >> } >> >> >> diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c >> new file mode 100644 >> index 0000000..d723e20 >> --- /dev/null >> +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c >> @@ -0,0 +1,38 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp" } */ >> + >> +extern int printf (const char *, ...); >> + >> +static int __attribute__ ((noinline)) >> +foo (int arg) >> +{ >> + return 7 * arg; >> +} >> + >> +static int __attribute__ ((noinline)) >> +bar (int arg) >> +{ >> + return arg * arg; >> +} >> + >> +int >> +baz (int arg) >> +{ >> + printf("%d\n", bar (3)); >> + printf("%d\n", bar (4)); >> + printf("%d\n", foo (5)); >> + printf("%d\n", foo (6)); >> + /* adding or removing the following call should not affect foo >> + function's clone numbering */ >> + printf("%d\n", bar (7)); >> + return foo (8); >> +} >> + >> +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */ >> +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */ >> +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */ >> +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */ >> +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */ >> +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */ >> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */ >> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */ >> -- >> 2.7.4 >>
On 2018-07-24 09:57 AM, Michael Ploujnikov wrote: > On 2018-07-20 06:05 AM, Richard Biener wrote: >>> /* Return a new assembler name for a clone with SUFFIX of a decl named >>> NAME. */ >>> @@ -521,14 +521,13 @@ tree >>> clone_function_name_1 (const char *name, const char *suffix) >> >> pass this function the counter to use.... >> >>> { >>> size_t len = strlen (name); >>> - char *tmp_name, *prefix; >>> + char *prefix; >>> >>> prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); >>> memcpy (prefix, name, len); >>> strcpy (prefix + len + 1, suffix); >>> prefix[len] = symbol_table::symbol_suffix_separator (); >>> - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); >> >> and keep using ASM_FORMAT_PRIVATE_NAME here. You need to change >> the lto/lto-partition.c caller (just use zero as counter). >> >>> - return get_identifier (tmp_name); >>> + return get_identifier (prefix); >>> } >>> >>> /* Return a new assembler name for a clone of DECL with SUFFIX. */ >>> @@ -537,7 +536,17 @@ tree >>> clone_function_name (tree decl, const char *suffix) >>> { >>> tree name = DECL_ASSEMBLER_NAME (decl); >>> - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); >>> + const char *decl_name = IDENTIFIER_POINTER (name); >>> + char *numbered_name; >>> + unsigned int *suffix_counter; >>> + if (!clone_fn_ids) { >>> + /* Initialize the per-function counter hash table if this is the first call */ >>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); >>> + } >> >> I still do not like throwing memory at the problem in this way for the >> little benefit >> this change provides :/ >> >> So no approval from me at this point... >> >> Richard. > > Can you give me an idea of the memory constraints that are involved? > > The highest memory usage increase that I could find was when compiling > a source file (from Linux) with roughly 10,000 functions. It showed a 2kB > increase over the before-patch use of 6936kB which is barely 0.03%. > > Using a single counter can result in more confusing namespacing when > you have .bar.clone.4 despite there only being 3 clones of .bar. > > From a practical point of view this change is helpful to anyone > diffing binary output such as forensic analysts, Debian Reproducible > Builds or even someone validating compiler output (before and after an input > source patch). The extra changes that this patch alleviates are a > distraction and could even be misleading. For example, applying a > source patch to the same Linux source produces the following binary > diff before my change: > > --- /tmp/output.o.objdump > +++ /tmp/patched-output.o.objdump > @@ -1,5 +1,5 @@ > > -/tmp/uverbs_cmd/output.o: file format elf32-i386 > +/tmp/uverbs_cmd/patched-output.o: file format elf32-i386 > > > Disassembly of section .text.get_order: > @@ -265,12 +265,12 @@ > 3: e9 fc ff ff ff jmp 4 <put_cq_read+0x4> > 4: R_386_PC32 .text.put_uobj_read > > -Disassembly of section .text.trace_kmalloc.constprop.3: > +Disassembly of section .text.trace_kmalloc.constprop.4: > > -00000000 <trace_kmalloc.constprop.3>: > +00000000 <trace_kmalloc.constprop.4>: > 0: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 > 2: R_386_32 __tracepoint_kmalloc > - 7: 74 34 je 3d <trace_kmalloc.constprop.3+0x3d> > + 7: 74 34 je 3d <trace_kmalloc.constprop.4+0x3d> > 9: 55 push %ebp > a: 89 cd mov %ecx,%ebp > c: 57 push %edi > @@ -281,7 +281,7 @@ > 13: 8b 1d 10 00 00 00 mov 0x10,%ebx > 15: R_386_32 __tracepoint_kmalloc > 19: 85 db test %ebx,%ebx > - 1b: 74 1b je 38 <trace_kmalloc.constprop.3+0x38> > + 1b: 74 1b je 38 <trace_kmalloc.constprop.4+0x38> > 1d: 68 d0 00 00 00 push $0xd0 > 22: 89 fa mov %edi,%edx > 24: 89 f0 mov %esi,%eax > @@ -292,7 +292,7 @@ > 31: 58 pop %eax > 32: 83 3b 00 cmpl $0x0,(%ebx) > 35: 5a pop %edx > - 36: eb e3 jmp 1b <trace_kmalloc.constprop.3+0x1b> > + 36: eb e3 jmp 1b <trace_kmalloc.constprop.4+0x1b> > 38: 5b pop %ebx > 39: 5e pop %esi > 3a: 5f pop %edi > @@ -846,7 +846,7 @@ > 78: b8 5f 00 00 00 mov $0x5f,%eax > 79: R_386_32 .text.ib_uverbs_alloc_pd > 7d: e8 fc ff ff ff call 7e <ib_uverbs_alloc_pd+0x7e> > - 7e: R_386_PC32 .text.trace_kmalloc.constprop.3 > + 7e: R_386_PC32 .text.trace_kmalloc.constprop.4 > 82: c7 45 d4 f4 ff ff ff movl $0xfffffff4,-0x2c(%ebp) > 89: 59 pop %ecx > 8a: 85 db test %ebx,%ebx > @@ -1068,7 +1068,7 @@ > 9c: b8 83 00 00 00 mov $0x83,%eax > 9d: R_386_32 .text.ib_uverbs_reg_mr > a1: e8 fc ff ff ff call a2 <ib_uverbs_reg_mr+0xa2> > - a2: R_386_PC32 .text.trace_kmalloc.constprop.3 > + a2: R_386_PC32 .text.trace_kmalloc.constprop.4 > a6: ba f4 ff ff ff mov $0xfffffff4,%edx > ab: 58 pop %eax > ac: 85 db test %ebx,%ebx > @@ -1385,7 +1385,7 @@ > 99: b8 7b 00 00 00 mov $0x7b,%eax > 9a: R_386_32 .text.ib_uverbs_create_cq > 9e: e8 fc ff ff ff call 9f <ib_uverbs_create_cq+0x9f> > - 9f: R_386_PC32 .text.trace_kmalloc.constprop.3 > + 9f: R_386_PC32 .text.trace_kmalloc.constprop.4 > a3: 58 pop %eax > a4: 85 db test %ebx,%ebx > a6: 75 0a jne b2 <ib_uverbs_create_cq+0xb2> > @@ -1607,129 +1607,107 @@ > 00000000 <ib_uverbs_poll_cq>: > 0: 55 push %ebp > 1: 57 push %edi > - 2: 89 c7 mov %eax,%edi > - 4: 56 push %esi > - 5: 89 ce mov %ecx,%esi > - 7: b9 10 00 00 00 mov $0x10,%ecx > - c: 53 push %ebx > - d: 83 ec 18 sub $0x18,%esp > - 10: 8d 44 24 08 lea 0x8(%esp),%eax > - 14: e8 fc ff ff ff call 15 <ib_uverbs_poll_cq+0x15> > - 15: R_386_PC32 copy_from_user > - 19: 85 c0 test %eax,%eax > - 1b: 0f 85 34 01 00 00 jne 155 <ib_uverbs_poll_cq+0x155> > - 21: 6b 44 24 14 34 imul $0x34,0x14(%esp),%eax > - 26: ba d0 00 00 00 mov $0xd0,%edx > - 2b: e8 fc ff ff ff call 2c <ib_uverbs_poll_cq+0x2c> > - 2c: R_386_PC32 __kmalloc > - 30: 89 c5 mov %eax,%ebp > - 32: 85 c0 test %eax,%eax > - 34: 0f 84 22 01 00 00 je 15c <ib_uverbs_poll_cq+0x15c> > - 3a: 6b 44 24 14 30 imul $0x30,0x14(%esp),%eax > - 3f: ba d0 00 00 00 mov $0xd0,%edx > - 44: 83 c0 08 add $0x8,%eax > - 47: 89 44 24 04 mov %eax,0x4(%esp) > - 4b: e8 fc ff ff ff call 4c <ib_uverbs_poll_cq+0x4c> > - 4c: R_386_PC32 __kmalloc > - 50: ba f4 ff ff ff mov $0xfffffff4,%edx > - 55: 89 04 24 mov %eax,(%esp) > - 58: 85 c0 test %eax,%eax > - 5a: 0f 84 e1 00 00 00 je 141 <ib_uverbs_poll_cq+0x141> > - 60: 8b 4f 58 mov 0x58(%edi),%ecx > - 63: 6a 00 push $0x0 > - 65: b8 00 00 00 00 mov $0x0,%eax > - 66: R_386_32 ib_uverbs_cq_idr > - 6a: 8b 54 24 14 mov 0x14(%esp),%edx > - 6e: e8 fc ff ff ff call 6f <ib_uverbs_poll_cq+0x6f> > - 6f: R_386_PC32 .text.idr_read_obj > - 73: ba ea ff ff ff mov $0xffffffea,%edx > - 78: 89 c7 mov %eax,%edi > - 7a: 58 pop %eax > - 7b: 85 ff test %edi,%edi > - 7d: 0f 84 ae 00 00 00 je 131 <ib_uverbs_poll_cq+0x131> > - 83: 8b 1f mov (%edi),%ebx > - 85: 8b 54 24 14 mov 0x14(%esp),%edx > - 89: 89 e9 mov %ebp,%ecx > - 8b: 89 f8 mov %edi,%eax > - 8d: ff 93 70 01 00 00 call *0x170(%ebx) > - 93: 8b 1c 24 mov (%esp),%ebx > - 96: 89 03 mov %eax,(%ebx) > - 98: 89 f8 mov %edi,%eax > - 9a: e8 fc ff ff ff call 9b <ib_uverbs_poll_cq+0x9b> > - 9b: R_386_PC32 .text.put_cq_read > - 9f: 8b 1c 24 mov (%esp),%ebx > - a2: 89 e8 mov %ebp,%eax > - a4: 6b 3b 34 imul $0x34,(%ebx),%edi > - a7: 8d 53 08 lea 0x8(%ebx),%edx > - aa: 01 ef add %ebp,%edi > - ac: 39 f8 cmp %edi,%eax > - ae: 74 67 je 117 <ib_uverbs_poll_cq+0x117> > - b0: 8b 08 mov (%eax),%ecx > - b2: 8b 58 04 mov 0x4(%eax),%ebx > - b5: 83 c2 30 add $0x30,%edx > - b8: 83 c0 34 add $0x34,%eax > - bb: 89 4a d0 mov %ecx,-0x30(%edx) > - be: 89 5a d4 mov %ebx,-0x2c(%edx) > - c1: 8b 48 d4 mov -0x2c(%eax),%ecx > - c4: 89 4a d8 mov %ecx,-0x28(%edx) > - c7: 8b 48 d8 mov -0x28(%eax),%ecx > - ca: 89 4a dc mov %ecx,-0x24(%edx) > - cd: 8b 48 dc mov -0x24(%eax),%ecx > - d0: 89 4a e0 mov %ecx,-0x20(%edx) > - d3: 8b 48 e0 mov -0x20(%eax),%ecx > - d6: 89 4a e4 mov %ecx,-0x1c(%edx) > - d9: 8b 48 e8 mov -0x18(%eax),%ecx > - dc: 89 4a e8 mov %ecx,-0x18(%edx) > - df: 8b 48 e4 mov -0x1c(%eax),%ecx > - e2: 8b 49 20 mov 0x20(%ecx),%ecx > - e5: 89 4a ec mov %ecx,-0x14(%edx) > - e8: 8b 48 ec mov -0x14(%eax),%ecx > - eb: 89 4a f0 mov %ecx,-0x10(%edx) > - ee: 8b 48 f0 mov -0x10(%eax),%ecx > - f1: 89 4a f4 mov %ecx,-0xc(%edx) > - f4: 8b 48 f4 mov -0xc(%eax),%ecx > - f7: 66 89 4a f8 mov %cx,-0x8(%edx) > - fb: 66 8b 48 f6 mov -0xa(%eax),%cx > - ff: 66 89 4a fa mov %cx,-0x6(%edx) > - 103: 8a 48 f8 mov -0x8(%eax),%cl > - 106: 88 4a fc mov %cl,-0x4(%edx) > - 109: 8a 48 f9 mov -0x7(%eax),%cl > - 10c: 88 4a fd mov %cl,-0x3(%edx) > - 10f: 8a 48 fa mov -0x6(%eax),%cl > - 112: 88 4a fe mov %cl,-0x2(%edx) > - 115: eb 95 jmp ac <ib_uverbs_poll_cq+0xac> > - 117: 8b 14 24 mov (%esp),%edx > - 11a: 8b 4c 24 04 mov 0x4(%esp),%ecx > - 11e: 8b 44 24 08 mov 0x8(%esp),%eax > - 122: e8 fc ff ff ff call 123 <ib_uverbs_poll_cq+0x123> > - 123: R_386_PC32 copy_to_user > - 127: 83 f8 01 cmp $0x1,%eax > - 12a: 19 d2 sbb %edx,%edx > - 12c: f7 d2 not %edx > - 12e: 83 e2 f2 and $0xfffffff2,%edx > - 131: 8b 04 24 mov (%esp),%eax > - 134: 89 54 24 04 mov %edx,0x4(%esp) > - 138: e8 fc ff ff ff call 139 <ib_uverbs_poll_cq+0x139> > - 139: R_386_PC32 kfree > - 13d: 8b 54 24 04 mov 0x4(%esp),%edx > - 141: 89 e8 mov %ebp,%eax > - 143: 89 14 24 mov %edx,(%esp) > - 146: e8 fc ff ff ff call 147 <ib_uverbs_poll_cq+0x147> > - 147: R_386_PC32 kfree > - 14b: 8b 14 24 mov (%esp),%edx > - 14e: 85 d2 test %edx,%edx > - 150: 0f 45 f2 cmovne %edx,%esi > - 153: eb 0c jmp 161 <ib_uverbs_poll_cq+0x161> > - 155: be f2 ff ff ff mov $0xfffffff2,%esi > - 15a: eb 05 jmp 161 <ib_uverbs_poll_cq+0x161> > - 15c: be f4 ff ff ff mov $0xfffffff4,%esi > - 161: 83 c4 18 add $0x18,%esp > - 164: 89 f0 mov %esi,%eax > - 166: 5b pop %ebx > - 167: 5e pop %esi > - 168: 5f pop %edi > - 169: 5d pop %ebp > - 16a: c3 ret > + 2: bf f2 ff ff ff mov $0xfffffff2,%edi > + 7: 56 push %esi > + 8: 53 push %ebx > + 9: 89 c3 mov %eax,%ebx > + b: 81 ec 84 00 00 00 sub $0x84,%esp > + 11: 89 4c 24 04 mov %ecx,0x4(%esp) > + 15: 8d 44 24 10 lea 0x10(%esp),%eax > + 19: b9 10 00 00 00 mov $0x10,%ecx > + 1e: e8 fc ff ff ff call 1f <ib_uverbs_poll_cq+0x1f> > + 1f: R_386_PC32 copy_from_user > + 23: 89 04 24 mov %eax,(%esp) > + 26: 85 c0 test %eax,%eax > + 28: 0f 85 1f 01 00 00 jne 14d <ib_uverbs_poll_cq+0x14d> > + 2e: 8b 4b 58 mov 0x58(%ebx),%ecx > + 31: 6a 00 push $0x0 > + 33: b8 00 00 00 00 mov $0x0,%eax > + 34: R_386_32 ib_uverbs_cq_idr > + 38: bf ea ff ff ff mov $0xffffffea,%edi > + 3d: 8b 54 24 1c mov 0x1c(%esp),%edx > + 41: e8 fc ff ff ff call 42 <ib_uverbs_poll_cq+0x42> > + 42: R_386_PC32 .text.idr_read_obj > + 46: 89 c3 mov %eax,%ebx > + 48: 58 pop %eax > + 49: 85 db test %ebx,%ebx > + 4b: 0f 84 fc 00 00 00 je 14d <ib_uverbs_poll_cq+0x14d> > + 51: 8b 6c 24 10 mov 0x10(%esp),%ebp > + 55: b9 02 00 00 00 mov $0x2,%ecx > + 5a: 8d 7c 24 08 lea 0x8(%esp),%edi > + 5e: 8b 04 24 mov (%esp),%eax > + 61: 8d 75 08 lea 0x8(%ebp),%esi > + 64: f3 ab rep stos %eax,%es:(%edi) > + 66: 8b 44 24 1c mov 0x1c(%esp),%eax > + 6a: 39 44 24 08 cmp %eax,0x8(%esp) > + 6e: 73 1f jae 8f <ib_uverbs_poll_cq+0x8f> > + 70: 8b 3b mov (%ebx),%edi > + 72: 8d 4c 24 50 lea 0x50(%esp),%ecx > + 76: ba 01 00 00 00 mov $0x1,%edx > + 7b: 89 d8 mov %ebx,%eax > + 7d: ff 97 70 01 00 00 call *0x170(%edi) > + 83: 89 c7 mov %eax,%edi > + 85: 85 c0 test %eax,%eax > + 87: 0f 88 b9 00 00 00 js 146 <ib_uverbs_poll_cq+0x146> > + 8d: 75 21 jne b0 <ib_uverbs_poll_cq+0xb0> > + 8f: b9 08 00 00 00 mov $0x8,%ecx > + 94: 8d 54 24 08 lea 0x8(%esp),%edx > + 98: 89 e8 mov %ebp,%eax > + 9a: bf f2 ff ff ff mov $0xfffffff2,%edi > + 9f: e8 fc ff ff ff call a0 <ib_uverbs_poll_cq+0xa0> > + a0: R_386_PC32 copy_to_user > + a4: 85 c0 test %eax,%eax > + a6: 0f 44 7c 24 04 cmove 0x4(%esp),%edi > + ab: e9 96 00 00 00 jmp 146 <ib_uverbs_poll_cq+0x146> > + b0: 8b 44 24 50 mov 0x50(%esp),%eax > + b4: 8b 54 24 54 mov 0x54(%esp),%edx > + b8: b9 30 00 00 00 mov $0x30,%ecx > + bd: 89 44 24 20 mov %eax,0x20(%esp) > + c1: 8b 44 24 58 mov 0x58(%esp),%eax > + c5: 89 54 24 24 mov %edx,0x24(%esp) > + c9: 8b 54 24 74 mov 0x74(%esp),%edx > + cd: 89 44 24 28 mov %eax,0x28(%esp) > + d1: 8b 44 24 5c mov 0x5c(%esp),%eax > + d5: 89 44 24 2c mov %eax,0x2c(%esp) > + d9: 8b 44 24 60 mov 0x60(%esp),%eax > + dd: 89 44 24 30 mov %eax,0x30(%esp) > + e1: 8b 44 24 64 mov 0x64(%esp),%eax > + e5: 89 44 24 34 mov %eax,0x34(%esp) > + e9: 8b 44 24 6c mov 0x6c(%esp),%eax > + ed: 89 44 24 38 mov %eax,0x38(%esp) > + f1: 8b 44 24 68 mov 0x68(%esp),%eax > + f5: 8b 40 20 mov 0x20(%eax),%eax > + f8: 89 54 24 44 mov %edx,0x44(%esp) > + fc: 8d 54 24 20 lea 0x20(%esp),%edx > + 100: c6 44 24 4f 00 movb $0x0,0x4f(%esp) > + 105: 89 44 24 3c mov %eax,0x3c(%esp) > + 109: 8b 44 24 70 mov 0x70(%esp),%eax > + 10d: 89 44 24 40 mov %eax,0x40(%esp) > + 111: 8b 44 24 78 mov 0x78(%esp),%eax > + 115: 89 44 24 48 mov %eax,0x48(%esp) > + 119: 8b 44 24 7c mov 0x7c(%esp),%eax > + 11d: 66 89 44 24 4c mov %ax,0x4c(%esp) > + 122: 8a 44 24 7e mov 0x7e(%esp),%al > + 126: 88 44 24 4e mov %al,0x4e(%esp) > + 12a: 89 f0 mov %esi,%eax > + 12c: e8 fc ff ff ff call 12d <ib_uverbs_poll_cq+0x12d> > + 12d: R_386_PC32 copy_to_user > + 131: 85 c0 test %eax,%eax > + 133: 75 0c jne 141 <ib_uverbs_poll_cq+0x141> > + 135: 83 c6 30 add $0x30,%esi > + 138: ff 44 24 08 incl 0x8(%esp) > + 13c: e9 25 ff ff ff jmp 66 <ib_uverbs_poll_cq+0x66> > + 141: bf f2 ff ff ff mov $0xfffffff2,%edi > + 146: 89 d8 mov %ebx,%eax > + 148: e8 fc ff ff ff call 149 <ib_uverbs_poll_cq+0x149> > + 149: R_386_PC32 .text.put_cq_read > + 14d: 81 c4 84 00 00 00 add $0x84,%esp > + 153: 89 f8 mov %edi,%eax > + 155: 5b pop %ebx > + 156: 5e pop %esi > + 157: 5f pop %edi > + 158: 5d pop %ebp > + 159: c3 ret > > Disassembly of section .text.ib_uverbs_req_notify_cq: > > @@ -1915,7 +1893,7 @@ > 94: b8 7b 00 00 00 mov $0x7b,%eax > 95: R_386_32 .text.ib_uverbs_create_qp > 99: e8 fc ff ff ff call 9a <ib_uverbs_create_qp+0x9a> > - 9a: R_386_PC32 .text.trace_kmalloc.constprop.3 > + 9a: R_386_PC32 .text.trace_kmalloc.constprop.4 > 9e: 59 pop %ecx > 9f: c7 85 50 ff ff ff f4 movl $0xfffffff4,-0xb0(%ebp) > a6: ff ff ff > @@ -2241,7 +2219,7 @@ > 68: b8 4f 00 00 00 mov $0x4f,%eax > 69: R_386_32 .text.ib_uverbs_query_qp > 6d: e8 fc ff ff ff call 6e <ib_uverbs_query_qp+0x6e> > - 6e: R_386_PC32 .text.trace_kmalloc.constprop.3 > + 6e: R_386_PC32 .text.trace_kmalloc.constprop.4 > 72: 59 pop %ecx > 73: c7 85 5c ff ff ff 10 movl $0x10,-0xa4(%ebp) > 7a: 00 00 00 > @@ -2260,7 +2238,7 @@ > a3: b8 86 00 00 00 mov $0x86,%eax > a4: R_386_32 .text.ib_uverbs_query_qp > a8: e8 fc ff ff ff call a9 <ib_uverbs_query_qp+0xa9> > - a9: R_386_PC32 .text.trace_kmalloc.constprop.3 > + a9: R_386_PC32 .text.trace_kmalloc.constprop.4 > ad: 5a pop %edx > ae: 85 db test %ebx,%ebx > b0: 0f 84 c1 01 00 00 je 277 <ib_uverbs_query_qp+0x277> > @@ -2462,7 +2440,7 @@ > 88: b8 6f 00 00 00 mov $0x6f,%eax > 89: R_386_32 .text.ib_uverbs_modify_qp > 8d: e8 fc ff ff ff call 8e <ib_uverbs_modify_qp+0x8e> > - 8e: R_386_PC32 .text.trace_kmalloc.constprop.3 > + 8e: R_386_PC32 .text.trace_kmalloc.constprop.4 > 92: 5a pop %edx > 93: ba f4 ff ff ff mov $0xfffffff4,%edx > 98: 85 db test %ebx,%ebx > @@ -3129,7 +3107,7 @@ > 6a: b8 4c 00 00 00 mov $0x4c,%eax > 6b: R_386_32 .text.ib_uverbs_create_ah > 6f: e8 fc ff ff ff call 70 <ib_uverbs_create_ah+0x70> > - 70: R_386_PC32 .text.trace_kmalloc.constprop.3 > + 70: R_386_PC32 .text.trace_kmalloc.constprop.4 > 74: 58 pop %eax > 75: 85 db test %ebx,%ebx > 77: 75 0a jne 83 <ib_uverbs_create_ah+0x83> > @@ -3396,7 +3374,7 @@ > af: b8 91 00 00 00 mov $0x91,%eax > b0: R_386_32 .text.ib_uverbs_attach_mcast > b4: e8 fc ff ff ff call b5 <ib_uverbs_attach_mcast+0xb5> > - b5: R_386_PC32 .text.trace_kmalloc.constprop.3 > + b5: R_386_PC32 .text.trace_kmalloc.constprop.4 > b9: 58 pop %eax > ba: 85 db test %ebx,%ebx > bc: 75 07 jne c5 <ib_uverbs_attach_mcast+0xc5> > @@ -3572,7 +3550,7 @@ > 77: b8 5e 00 00 00 mov $0x5e,%eax > 78: R_386_32 .text.ib_uverbs_create_srq > 7c: e8 fc ff ff ff call 7d <ib_uverbs_create_srq+0x7d> > - 7d: R_386_PC32 .text.trace_kmalloc.constprop.3 > + 7d: R_386_PC32 .text.trace_kmalloc.constprop.4 > 81: ba f4 ff ff ff mov $0xfffffff4,%edx > 86: 58 pop %eax > 87: 85 db test %ebx,%ebx > > Needless to say, after my change the diff only shows the truly changed > functions. > > - Michael > Updated patch gcc: 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> Make function clone name numbering independent. * cgraph.h (clone_function_name_1): Add clone_num argument * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids. (clone_function_name): Use counters from the hashtable. lto: 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> * lto-partition.c (privatize_symbol_name_1): Pass 0 to clone_function_name_1. testsuite: 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> Clone numbering should be independent for each function. * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test. --- gcc/cgraph.h | 2 +- gcc/cgraphclones.c | 16 ++++++++--- gcc/lto/lto-partition.c | 2 +- gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++ 4 files changed, 52 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c diff --git gcc/cgraph.h gcc/cgraph.h index a8b1b4c..fe44bd0 100644 --- gcc/cgraph.h +++ gcc/cgraph.h @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, profile_count); tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree); /* In cgraphclones.c */ -tree clone_function_name_1 (const char *, const char *); +tree clone_function_name_1 (const char *, const char *, unsigned int); tree clone_function_name (tree decl, const char *); void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *, diff --git gcc/cgraphclones.c gcc/cgraphclones.c index 6e84a31..7de7a65 100644 --- gcc/cgraphclones.c +++ gcc/cgraphclones.c @@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, return new_node; } -static GTY(()) unsigned int clone_fn_id_num; +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; /* Return a new assembler name for a clone with SUFFIX of a decl named NAME. */ tree -clone_function_name_1 (const char *name, const char *suffix) +clone_function_name_1 (const char *name, const char *suffix, unsigned int clone_num) { size_t len = strlen (name); char *tmp_name, *prefix; @@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix) memcpy (prefix, name, len); strcpy (prefix + len + 1, suffix); prefix[len] = symbol_table::symbol_suffix_separator (); - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); + ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num); return get_identifier (tmp_name); } @@ -537,7 +537,15 @@ tree clone_function_name (tree decl, const char *suffix) { tree name = DECL_ASSEMBLER_NAME (decl); - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); + const char *decl_name = IDENTIFIER_POINTER (name); + unsigned int *suffix_counter; + if (!clone_fn_ids) { + /* Initialize the per-function counter hash table on first call */ + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); + } + suffix_counter = &clone_fn_ids->get_or_insert(decl_name); + *suffix_counter = *suffix_counter + 1; + return clone_function_name_1 (decl_name, suffix, *suffix_counter); } diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c index c7a5710..4b15232 100644 --- gcc/lto/lto-partition.c +++ gcc/lto/lto-partition.c @@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl) name = maybe_rewrite_identifier (name); symtab->change_decl_assembler_name (decl, clone_function_name_1 (name, - "lto_priv")); + "lto_priv", 0)); if (node->lto_file_data) lto_record_renamed_decl (node->lto_file_data, name, diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c new file mode 100644 index 0000000..d723e20 --- /dev/null +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp" } */ + +extern int printf (const char *, ...); + +static int __attribute__ ((noinline)) +foo (int arg) +{ + return 7 * arg; +} + +static int __attribute__ ((noinline)) +bar (int arg) +{ + return arg * arg; +} + +int +baz (int arg) +{ + printf("%d\n", bar (3)); + printf("%d\n", bar (4)); + printf("%d\n", foo (5)); + printf("%d\n", foo (6)); + /* adding or removing the following call should not affect foo + function's clone numbering */ + printf("%d\n", bar (7)); + return foo (8); +} + +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */ +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */ +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
On 2018-07-26 01:27 PM, Michael Ploujnikov wrote: > On 2018-07-24 09:57 AM, Michael Ploujnikov wrote: >> On 2018-07-20 06:05 AM, Richard Biener wrote: >>>> /* Return a new assembler name for a clone with SUFFIX of a decl named >>>> NAME. */ >>>> @@ -521,14 +521,13 @@ tree >>>> clone_function_name_1 (const char *name, const char *suffix) >>> >>> pass this function the counter to use.... >>> >>>> { >>>> size_t len = strlen (name); >>>> - char *tmp_name, *prefix; >>>> + char *prefix; >>>> >>>> prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); >>>> memcpy (prefix, name, len); >>>> strcpy (prefix + len + 1, suffix); >>>> prefix[len] = symbol_table::symbol_suffix_separator (); >>>> - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); >>> >>> and keep using ASM_FORMAT_PRIVATE_NAME here. You need to change >>> the lto/lto-partition.c caller (just use zero as counter). >>> >>>> - return get_identifier (tmp_name); >>>> + return get_identifier (prefix); >>>> } >>>> >>>> /* Return a new assembler name for a clone of DECL with SUFFIX. */ >>>> @@ -537,7 +536,17 @@ tree >>>> clone_function_name (tree decl, const char *suffix) >>>> { >>>> tree name = DECL_ASSEMBLER_NAME (decl); >>>> - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); >>>> + const char *decl_name = IDENTIFIER_POINTER (name); >>>> + char *numbered_name; >>>> + unsigned int *suffix_counter; >>>> + if (!clone_fn_ids) { >>>> + /* Initialize the per-function counter hash table if this is the first call */ >>>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); >>>> + } >>> >>> I still do not like throwing memory at the problem in this way for the >>> little benefit >>> this change provides :/ >>> >>> So no approval from me at this point... >>> >>> Richard. >> >> Can you give me an idea of the memory constraints that are involved? >> >> The highest memory usage increase that I could find was when compiling >> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB >> increase over the before-patch use of 6936kB which is barely 0.03%. >> >> Using a single counter can result in more confusing namespacing when >> you have .bar.clone.4 despite there only being 3 clones of .bar. >> >> From a practical point of view this change is helpful to anyone >> diffing binary output such as forensic analysts, Debian Reproducible >> Builds or even someone validating compiler output (before and after an input >> source patch). The extra changes that this patch alleviates are a >> distraction and could even be misleading. For example, applying a >> source patch to the same Linux source produces the following binary >> diff before my change: >> >> --- /tmp/output.o.objdump >> +++ /tmp/patched-output.o.objdump >> @@ -1,5 +1,5 @@ >> >> -/tmp/uverbs_cmd/output.o: file format elf32-i386 >> +/tmp/uverbs_cmd/patched-output.o: file format elf32-i386 >> >> >> Disassembly of section .text.get_order: >> @@ -265,12 +265,12 @@ >> 3: e9 fc ff ff ff jmp 4 <put_cq_read+0x4> >> 4: R_386_PC32 .text.put_uobj_read >> >> -Disassembly of section .text.trace_kmalloc.constprop.3: >> +Disassembly of section .text.trace_kmalloc.constprop.4: >> >> -00000000 <trace_kmalloc.constprop.3>: >> +00000000 <trace_kmalloc.constprop.4>: >> 0: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 >> 2: R_386_32 __tracepoint_kmalloc >> - 7: 74 34 je 3d <trace_kmalloc.constprop.3+0x3d> >> + 7: 74 34 je 3d <trace_kmalloc.constprop.4+0x3d> >> 9: 55 push %ebp >> a: 89 cd mov %ecx,%ebp >> c: 57 push %edi >> @@ -281,7 +281,7 @@ >> 13: 8b 1d 10 00 00 00 mov 0x10,%ebx >> 15: R_386_32 __tracepoint_kmalloc >> 19: 85 db test %ebx,%ebx >> - 1b: 74 1b je 38 <trace_kmalloc.constprop.3+0x38> >> + 1b: 74 1b je 38 <trace_kmalloc.constprop.4+0x38> >> 1d: 68 d0 00 00 00 push $0xd0 >> 22: 89 fa mov %edi,%edx >> 24: 89 f0 mov %esi,%eax >> @@ -292,7 +292,7 @@ >> 31: 58 pop %eax >> 32: 83 3b 00 cmpl $0x0,(%ebx) >> 35: 5a pop %edx >> - 36: eb e3 jmp 1b <trace_kmalloc.constprop.3+0x1b> >> + 36: eb e3 jmp 1b <trace_kmalloc.constprop.4+0x1b> >> 38: 5b pop %ebx >> 39: 5e pop %esi >> 3a: 5f pop %edi >> @@ -846,7 +846,7 @@ >> 78: b8 5f 00 00 00 mov $0x5f,%eax >> 79: R_386_32 .text.ib_uverbs_alloc_pd >> 7d: e8 fc ff ff ff call 7e <ib_uverbs_alloc_pd+0x7e> >> - 7e: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + 7e: R_386_PC32 .text.trace_kmalloc.constprop.4 >> 82: c7 45 d4 f4 ff ff ff movl $0xfffffff4,-0x2c(%ebp) >> 89: 59 pop %ecx >> 8a: 85 db test %ebx,%ebx >> @@ -1068,7 +1068,7 @@ >> 9c: b8 83 00 00 00 mov $0x83,%eax >> 9d: R_386_32 .text.ib_uverbs_reg_mr >> a1: e8 fc ff ff ff call a2 <ib_uverbs_reg_mr+0xa2> >> - a2: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + a2: R_386_PC32 .text.trace_kmalloc.constprop.4 >> a6: ba f4 ff ff ff mov $0xfffffff4,%edx >> ab: 58 pop %eax >> ac: 85 db test %ebx,%ebx >> @@ -1385,7 +1385,7 @@ >> 99: b8 7b 00 00 00 mov $0x7b,%eax >> 9a: R_386_32 .text.ib_uverbs_create_cq >> 9e: e8 fc ff ff ff call 9f <ib_uverbs_create_cq+0x9f> >> - 9f: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + 9f: R_386_PC32 .text.trace_kmalloc.constprop.4 >> a3: 58 pop %eax >> a4: 85 db test %ebx,%ebx >> a6: 75 0a jne b2 <ib_uverbs_create_cq+0xb2> >> @@ -1607,129 +1607,107 @@ >> 00000000 <ib_uverbs_poll_cq>: >> 0: 55 push %ebp >> 1: 57 push %edi >> - 2: 89 c7 mov %eax,%edi >> - 4: 56 push %esi >> - 5: 89 ce mov %ecx,%esi >> - 7: b9 10 00 00 00 mov $0x10,%ecx >> - c: 53 push %ebx >> - d: 83 ec 18 sub $0x18,%esp >> - 10: 8d 44 24 08 lea 0x8(%esp),%eax >> - 14: e8 fc ff ff ff call 15 <ib_uverbs_poll_cq+0x15> >> - 15: R_386_PC32 copy_from_user >> - 19: 85 c0 test %eax,%eax >> - 1b: 0f 85 34 01 00 00 jne 155 <ib_uverbs_poll_cq+0x155> >> - 21: 6b 44 24 14 34 imul $0x34,0x14(%esp),%eax >> - 26: ba d0 00 00 00 mov $0xd0,%edx >> - 2b: e8 fc ff ff ff call 2c <ib_uverbs_poll_cq+0x2c> >> - 2c: R_386_PC32 __kmalloc >> - 30: 89 c5 mov %eax,%ebp >> - 32: 85 c0 test %eax,%eax >> - 34: 0f 84 22 01 00 00 je 15c <ib_uverbs_poll_cq+0x15c> >> - 3a: 6b 44 24 14 30 imul $0x30,0x14(%esp),%eax >> - 3f: ba d0 00 00 00 mov $0xd0,%edx >> - 44: 83 c0 08 add $0x8,%eax >> - 47: 89 44 24 04 mov %eax,0x4(%esp) >> - 4b: e8 fc ff ff ff call 4c <ib_uverbs_poll_cq+0x4c> >> - 4c: R_386_PC32 __kmalloc >> - 50: ba f4 ff ff ff mov $0xfffffff4,%edx >> - 55: 89 04 24 mov %eax,(%esp) >> - 58: 85 c0 test %eax,%eax >> - 5a: 0f 84 e1 00 00 00 je 141 <ib_uverbs_poll_cq+0x141> >> - 60: 8b 4f 58 mov 0x58(%edi),%ecx >> - 63: 6a 00 push $0x0 >> - 65: b8 00 00 00 00 mov $0x0,%eax >> - 66: R_386_32 ib_uverbs_cq_idr >> - 6a: 8b 54 24 14 mov 0x14(%esp),%edx >> - 6e: e8 fc ff ff ff call 6f <ib_uverbs_poll_cq+0x6f> >> - 6f: R_386_PC32 .text.idr_read_obj >> - 73: ba ea ff ff ff mov $0xffffffea,%edx >> - 78: 89 c7 mov %eax,%edi >> - 7a: 58 pop %eax >> - 7b: 85 ff test %edi,%edi >> - 7d: 0f 84 ae 00 00 00 je 131 <ib_uverbs_poll_cq+0x131> >> - 83: 8b 1f mov (%edi),%ebx >> - 85: 8b 54 24 14 mov 0x14(%esp),%edx >> - 89: 89 e9 mov %ebp,%ecx >> - 8b: 89 f8 mov %edi,%eax >> - 8d: ff 93 70 01 00 00 call *0x170(%ebx) >> - 93: 8b 1c 24 mov (%esp),%ebx >> - 96: 89 03 mov %eax,(%ebx) >> - 98: 89 f8 mov %edi,%eax >> - 9a: e8 fc ff ff ff call 9b <ib_uverbs_poll_cq+0x9b> >> - 9b: R_386_PC32 .text.put_cq_read >> - 9f: 8b 1c 24 mov (%esp),%ebx >> - a2: 89 e8 mov %ebp,%eax >> - a4: 6b 3b 34 imul $0x34,(%ebx),%edi >> - a7: 8d 53 08 lea 0x8(%ebx),%edx >> - aa: 01 ef add %ebp,%edi >> - ac: 39 f8 cmp %edi,%eax >> - ae: 74 67 je 117 <ib_uverbs_poll_cq+0x117> >> - b0: 8b 08 mov (%eax),%ecx >> - b2: 8b 58 04 mov 0x4(%eax),%ebx >> - b5: 83 c2 30 add $0x30,%edx >> - b8: 83 c0 34 add $0x34,%eax >> - bb: 89 4a d0 mov %ecx,-0x30(%edx) >> - be: 89 5a d4 mov %ebx,-0x2c(%edx) >> - c1: 8b 48 d4 mov -0x2c(%eax),%ecx >> - c4: 89 4a d8 mov %ecx,-0x28(%edx) >> - c7: 8b 48 d8 mov -0x28(%eax),%ecx >> - ca: 89 4a dc mov %ecx,-0x24(%edx) >> - cd: 8b 48 dc mov -0x24(%eax),%ecx >> - d0: 89 4a e0 mov %ecx,-0x20(%edx) >> - d3: 8b 48 e0 mov -0x20(%eax),%ecx >> - d6: 89 4a e4 mov %ecx,-0x1c(%edx) >> - d9: 8b 48 e8 mov -0x18(%eax),%ecx >> - dc: 89 4a e8 mov %ecx,-0x18(%edx) >> - df: 8b 48 e4 mov -0x1c(%eax),%ecx >> - e2: 8b 49 20 mov 0x20(%ecx),%ecx >> - e5: 89 4a ec mov %ecx,-0x14(%edx) >> - e8: 8b 48 ec mov -0x14(%eax),%ecx >> - eb: 89 4a f0 mov %ecx,-0x10(%edx) >> - ee: 8b 48 f0 mov -0x10(%eax),%ecx >> - f1: 89 4a f4 mov %ecx,-0xc(%edx) >> - f4: 8b 48 f4 mov -0xc(%eax),%ecx >> - f7: 66 89 4a f8 mov %cx,-0x8(%edx) >> - fb: 66 8b 48 f6 mov -0xa(%eax),%cx >> - ff: 66 89 4a fa mov %cx,-0x6(%edx) >> - 103: 8a 48 f8 mov -0x8(%eax),%cl >> - 106: 88 4a fc mov %cl,-0x4(%edx) >> - 109: 8a 48 f9 mov -0x7(%eax),%cl >> - 10c: 88 4a fd mov %cl,-0x3(%edx) >> - 10f: 8a 48 fa mov -0x6(%eax),%cl >> - 112: 88 4a fe mov %cl,-0x2(%edx) >> - 115: eb 95 jmp ac <ib_uverbs_poll_cq+0xac> >> - 117: 8b 14 24 mov (%esp),%edx >> - 11a: 8b 4c 24 04 mov 0x4(%esp),%ecx >> - 11e: 8b 44 24 08 mov 0x8(%esp),%eax >> - 122: e8 fc ff ff ff call 123 <ib_uverbs_poll_cq+0x123> >> - 123: R_386_PC32 copy_to_user >> - 127: 83 f8 01 cmp $0x1,%eax >> - 12a: 19 d2 sbb %edx,%edx >> - 12c: f7 d2 not %edx >> - 12e: 83 e2 f2 and $0xfffffff2,%edx >> - 131: 8b 04 24 mov (%esp),%eax >> - 134: 89 54 24 04 mov %edx,0x4(%esp) >> - 138: e8 fc ff ff ff call 139 <ib_uverbs_poll_cq+0x139> >> - 139: R_386_PC32 kfree >> - 13d: 8b 54 24 04 mov 0x4(%esp),%edx >> - 141: 89 e8 mov %ebp,%eax >> - 143: 89 14 24 mov %edx,(%esp) >> - 146: e8 fc ff ff ff call 147 <ib_uverbs_poll_cq+0x147> >> - 147: R_386_PC32 kfree >> - 14b: 8b 14 24 mov (%esp),%edx >> - 14e: 85 d2 test %edx,%edx >> - 150: 0f 45 f2 cmovne %edx,%esi >> - 153: eb 0c jmp 161 <ib_uverbs_poll_cq+0x161> >> - 155: be f2 ff ff ff mov $0xfffffff2,%esi >> - 15a: eb 05 jmp 161 <ib_uverbs_poll_cq+0x161> >> - 15c: be f4 ff ff ff mov $0xfffffff4,%esi >> - 161: 83 c4 18 add $0x18,%esp >> - 164: 89 f0 mov %esi,%eax >> - 166: 5b pop %ebx >> - 167: 5e pop %esi >> - 168: 5f pop %edi >> - 169: 5d pop %ebp >> - 16a: c3 ret >> + 2: bf f2 ff ff ff mov $0xfffffff2,%edi >> + 7: 56 push %esi >> + 8: 53 push %ebx >> + 9: 89 c3 mov %eax,%ebx >> + b: 81 ec 84 00 00 00 sub $0x84,%esp >> + 11: 89 4c 24 04 mov %ecx,0x4(%esp) >> + 15: 8d 44 24 10 lea 0x10(%esp),%eax >> + 19: b9 10 00 00 00 mov $0x10,%ecx >> + 1e: e8 fc ff ff ff call 1f <ib_uverbs_poll_cq+0x1f> >> + 1f: R_386_PC32 copy_from_user >> + 23: 89 04 24 mov %eax,(%esp) >> + 26: 85 c0 test %eax,%eax >> + 28: 0f 85 1f 01 00 00 jne 14d <ib_uverbs_poll_cq+0x14d> >> + 2e: 8b 4b 58 mov 0x58(%ebx),%ecx >> + 31: 6a 00 push $0x0 >> + 33: b8 00 00 00 00 mov $0x0,%eax >> + 34: R_386_32 ib_uverbs_cq_idr >> + 38: bf ea ff ff ff mov $0xffffffea,%edi >> + 3d: 8b 54 24 1c mov 0x1c(%esp),%edx >> + 41: e8 fc ff ff ff call 42 <ib_uverbs_poll_cq+0x42> >> + 42: R_386_PC32 .text.idr_read_obj >> + 46: 89 c3 mov %eax,%ebx >> + 48: 58 pop %eax >> + 49: 85 db test %ebx,%ebx >> + 4b: 0f 84 fc 00 00 00 je 14d <ib_uverbs_poll_cq+0x14d> >> + 51: 8b 6c 24 10 mov 0x10(%esp),%ebp >> + 55: b9 02 00 00 00 mov $0x2,%ecx >> + 5a: 8d 7c 24 08 lea 0x8(%esp),%edi >> + 5e: 8b 04 24 mov (%esp),%eax >> + 61: 8d 75 08 lea 0x8(%ebp),%esi >> + 64: f3 ab rep stos %eax,%es:(%edi) >> + 66: 8b 44 24 1c mov 0x1c(%esp),%eax >> + 6a: 39 44 24 08 cmp %eax,0x8(%esp) >> + 6e: 73 1f jae 8f <ib_uverbs_poll_cq+0x8f> >> + 70: 8b 3b mov (%ebx),%edi >> + 72: 8d 4c 24 50 lea 0x50(%esp),%ecx >> + 76: ba 01 00 00 00 mov $0x1,%edx >> + 7b: 89 d8 mov %ebx,%eax >> + 7d: ff 97 70 01 00 00 call *0x170(%edi) >> + 83: 89 c7 mov %eax,%edi >> + 85: 85 c0 test %eax,%eax >> + 87: 0f 88 b9 00 00 00 js 146 <ib_uverbs_poll_cq+0x146> >> + 8d: 75 21 jne b0 <ib_uverbs_poll_cq+0xb0> >> + 8f: b9 08 00 00 00 mov $0x8,%ecx >> + 94: 8d 54 24 08 lea 0x8(%esp),%edx >> + 98: 89 e8 mov %ebp,%eax >> + 9a: bf f2 ff ff ff mov $0xfffffff2,%edi >> + 9f: e8 fc ff ff ff call a0 <ib_uverbs_poll_cq+0xa0> >> + a0: R_386_PC32 copy_to_user >> + a4: 85 c0 test %eax,%eax >> + a6: 0f 44 7c 24 04 cmove 0x4(%esp),%edi >> + ab: e9 96 00 00 00 jmp 146 <ib_uverbs_poll_cq+0x146> >> + b0: 8b 44 24 50 mov 0x50(%esp),%eax >> + b4: 8b 54 24 54 mov 0x54(%esp),%edx >> + b8: b9 30 00 00 00 mov $0x30,%ecx >> + bd: 89 44 24 20 mov %eax,0x20(%esp) >> + c1: 8b 44 24 58 mov 0x58(%esp),%eax >> + c5: 89 54 24 24 mov %edx,0x24(%esp) >> + c9: 8b 54 24 74 mov 0x74(%esp),%edx >> + cd: 89 44 24 28 mov %eax,0x28(%esp) >> + d1: 8b 44 24 5c mov 0x5c(%esp),%eax >> + d5: 89 44 24 2c mov %eax,0x2c(%esp) >> + d9: 8b 44 24 60 mov 0x60(%esp),%eax >> + dd: 89 44 24 30 mov %eax,0x30(%esp) >> + e1: 8b 44 24 64 mov 0x64(%esp),%eax >> + e5: 89 44 24 34 mov %eax,0x34(%esp) >> + e9: 8b 44 24 6c mov 0x6c(%esp),%eax >> + ed: 89 44 24 38 mov %eax,0x38(%esp) >> + f1: 8b 44 24 68 mov 0x68(%esp),%eax >> + f5: 8b 40 20 mov 0x20(%eax),%eax >> + f8: 89 54 24 44 mov %edx,0x44(%esp) >> + fc: 8d 54 24 20 lea 0x20(%esp),%edx >> + 100: c6 44 24 4f 00 movb $0x0,0x4f(%esp) >> + 105: 89 44 24 3c mov %eax,0x3c(%esp) >> + 109: 8b 44 24 70 mov 0x70(%esp),%eax >> + 10d: 89 44 24 40 mov %eax,0x40(%esp) >> + 111: 8b 44 24 78 mov 0x78(%esp),%eax >> + 115: 89 44 24 48 mov %eax,0x48(%esp) >> + 119: 8b 44 24 7c mov 0x7c(%esp),%eax >> + 11d: 66 89 44 24 4c mov %ax,0x4c(%esp) >> + 122: 8a 44 24 7e mov 0x7e(%esp),%al >> + 126: 88 44 24 4e mov %al,0x4e(%esp) >> + 12a: 89 f0 mov %esi,%eax >> + 12c: e8 fc ff ff ff call 12d <ib_uverbs_poll_cq+0x12d> >> + 12d: R_386_PC32 copy_to_user >> + 131: 85 c0 test %eax,%eax >> + 133: 75 0c jne 141 <ib_uverbs_poll_cq+0x141> >> + 135: 83 c6 30 add $0x30,%esi >> + 138: ff 44 24 08 incl 0x8(%esp) >> + 13c: e9 25 ff ff ff jmp 66 <ib_uverbs_poll_cq+0x66> >> + 141: bf f2 ff ff ff mov $0xfffffff2,%edi >> + 146: 89 d8 mov %ebx,%eax >> + 148: e8 fc ff ff ff call 149 <ib_uverbs_poll_cq+0x149> >> + 149: R_386_PC32 .text.put_cq_read >> + 14d: 81 c4 84 00 00 00 add $0x84,%esp >> + 153: 89 f8 mov %edi,%eax >> + 155: 5b pop %ebx >> + 156: 5e pop %esi >> + 157: 5f pop %edi >> + 158: 5d pop %ebp >> + 159: c3 ret >> >> Disassembly of section .text.ib_uverbs_req_notify_cq: >> >> @@ -1915,7 +1893,7 @@ >> 94: b8 7b 00 00 00 mov $0x7b,%eax >> 95: R_386_32 .text.ib_uverbs_create_qp >> 99: e8 fc ff ff ff call 9a <ib_uverbs_create_qp+0x9a> >> - 9a: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + 9a: R_386_PC32 .text.trace_kmalloc.constprop.4 >> 9e: 59 pop %ecx >> 9f: c7 85 50 ff ff ff f4 movl $0xfffffff4,-0xb0(%ebp) >> a6: ff ff ff >> @@ -2241,7 +2219,7 @@ >> 68: b8 4f 00 00 00 mov $0x4f,%eax >> 69: R_386_32 .text.ib_uverbs_query_qp >> 6d: e8 fc ff ff ff call 6e <ib_uverbs_query_qp+0x6e> >> - 6e: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + 6e: R_386_PC32 .text.trace_kmalloc.constprop.4 >> 72: 59 pop %ecx >> 73: c7 85 5c ff ff ff 10 movl $0x10,-0xa4(%ebp) >> 7a: 00 00 00 >> @@ -2260,7 +2238,7 @@ >> a3: b8 86 00 00 00 mov $0x86,%eax >> a4: R_386_32 .text.ib_uverbs_query_qp >> a8: e8 fc ff ff ff call a9 <ib_uverbs_query_qp+0xa9> >> - a9: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + a9: R_386_PC32 .text.trace_kmalloc.constprop.4 >> ad: 5a pop %edx >> ae: 85 db test %ebx,%ebx >> b0: 0f 84 c1 01 00 00 je 277 <ib_uverbs_query_qp+0x277> >> @@ -2462,7 +2440,7 @@ >> 88: b8 6f 00 00 00 mov $0x6f,%eax >> 89: R_386_32 .text.ib_uverbs_modify_qp >> 8d: e8 fc ff ff ff call 8e <ib_uverbs_modify_qp+0x8e> >> - 8e: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + 8e: R_386_PC32 .text.trace_kmalloc.constprop.4 >> 92: 5a pop %edx >> 93: ba f4 ff ff ff mov $0xfffffff4,%edx >> 98: 85 db test %ebx,%ebx >> @@ -3129,7 +3107,7 @@ >> 6a: b8 4c 00 00 00 mov $0x4c,%eax >> 6b: R_386_32 .text.ib_uverbs_create_ah >> 6f: e8 fc ff ff ff call 70 <ib_uverbs_create_ah+0x70> >> - 70: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + 70: R_386_PC32 .text.trace_kmalloc.constprop.4 >> 74: 58 pop %eax >> 75: 85 db test %ebx,%ebx >> 77: 75 0a jne 83 <ib_uverbs_create_ah+0x83> >> @@ -3396,7 +3374,7 @@ >> af: b8 91 00 00 00 mov $0x91,%eax >> b0: R_386_32 .text.ib_uverbs_attach_mcast >> b4: e8 fc ff ff ff call b5 <ib_uverbs_attach_mcast+0xb5> >> - b5: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + b5: R_386_PC32 .text.trace_kmalloc.constprop.4 >> b9: 58 pop %eax >> ba: 85 db test %ebx,%ebx >> bc: 75 07 jne c5 <ib_uverbs_attach_mcast+0xc5> >> @@ -3572,7 +3550,7 @@ >> 77: b8 5e 00 00 00 mov $0x5e,%eax >> 78: R_386_32 .text.ib_uverbs_create_srq >> 7c: e8 fc ff ff ff call 7d <ib_uverbs_create_srq+0x7d> >> - 7d: R_386_PC32 .text.trace_kmalloc.constprop.3 >> + 7d: R_386_PC32 .text.trace_kmalloc.constprop.4 >> 81: ba f4 ff ff ff mov $0xfffffff4,%edx >> 86: 58 pop %eax >> 87: 85 db test %ebx,%ebx >> >> Needless to say, after my change the diff only shows the truly changed >> functions. >> >> - Michael >> > > Updated patch > > > gcc: > 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> > > Make function clone name numbering independent. > * cgraph.h (clone_function_name_1): Add clone_num argument > * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids. > (clone_function_name): Use counters from the hashtable. > > lto: > 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> > > * lto-partition.c (privatize_symbol_name_1): Pass 0 to > clone_function_name_1. > > > testsuite: > 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> > > Clone numbering should be independent for each function. > * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test. > > --- > gcc/cgraph.h | 2 +- > gcc/cgraphclones.c | 16 ++++++++--- > gcc/lto/lto-partition.c | 2 +- > gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++ > 4 files changed, 52 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c > > diff --git gcc/cgraph.h gcc/cgraph.h > index a8b1b4c..fe44bd0 100644 > --- gcc/cgraph.h > +++ gcc/cgraph.h > @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, > profile_count); > tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree); > /* In cgraphclones.c */ > > -tree clone_function_name_1 (const char *, const char *); > +tree clone_function_name_1 (const char *, const char *, unsigned int); > tree clone_function_name (tree decl, const char *); > > void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *, > diff --git gcc/cgraphclones.c gcc/cgraphclones.c > index 6e84a31..7de7a65 100644 > --- gcc/cgraphclones.c > +++ gcc/cgraphclones.c > @@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count > prof_count, > return new_node; > } > > -static GTY(()) unsigned int clone_fn_id_num; > +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; > > /* Return a new assembler name for a clone with SUFFIX of a decl named > NAME. */ > > tree > -clone_function_name_1 (const char *name, const char *suffix) > +clone_function_name_1 (const char *name, const char *suffix, unsigned int > clone_num) > { > size_t len = strlen (name); > char *tmp_name, *prefix; > @@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix) > memcpy (prefix, name, len); > strcpy (prefix + len + 1, suffix); > prefix[len] = symbol_table::symbol_suffix_separator (); > - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); > + ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num); > return get_identifier (tmp_name); > } > > @@ -537,7 +537,15 @@ tree > clone_function_name (tree decl, const char *suffix) > { > tree name = DECL_ASSEMBLER_NAME (decl); > - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); > + const char *decl_name = IDENTIFIER_POINTER (name); > + unsigned int *suffix_counter; > + if (!clone_fn_ids) { > + /* Initialize the per-function counter hash table on first call */ > + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); > + } > + suffix_counter = &clone_fn_ids->get_or_insert(decl_name); > + *suffix_counter = *suffix_counter + 1; > + return clone_function_name_1 (decl_name, suffix, *suffix_counter); > } > > > diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c > index c7a5710..4b15232 100644 > --- gcc/lto/lto-partition.c > +++ gcc/lto/lto-partition.c > @@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl) > name = maybe_rewrite_identifier (name); > symtab->change_decl_assembler_name (decl, > clone_function_name_1 (name, > - "lto_priv")); > + "lto_priv", 0)); > > if (node->lto_file_data) > lto_record_renamed_decl (node->lto_file_data, name, > diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c > gcc/testsuite/gcc.dg/independent-cloneids-1.c > new file mode 100644 > index 0000000..d723e20 > --- /dev/null > +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c > @@ -0,0 +1,38 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp" } */ > + > +extern int printf (const char *, ...); > + > +static int __attribute__ ((noinline)) > +foo (int arg) > +{ > + return 7 * arg; > +} > + > +static int __attribute__ ((noinline)) > +bar (int arg) > +{ > + return arg * arg; > +} > + > +int > +baz (int arg) > +{ > + printf("%d\n", bar (3)); > + printf("%d\n", bar (4)); > + printf("%d\n", foo (5)); > + printf("%d\n", foo (6)); > + /* adding or removing the following call should not affect foo > + function's clone numbering */ > + printf("%d\n", bar (7)); > + return foo (8); > +} > + > +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */ > +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */ > +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */ > +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */ > Ping? Could someone else please take a look at this since Richard appears to be overloaded at the moment? Regards, - Michael
On Tue, Jul 31, 2018 at 7:40 PM Michael Ploujnikov <michael.ploujnikov@oracle.com> wrote: > > On 2018-07-26 01:27 PM, Michael Ploujnikov wrote: > > On 2018-07-24 09:57 AM, Michael Ploujnikov wrote: > >> On 2018-07-20 06:05 AM, Richard Biener wrote: > >>>> /* Return a new assembler name for a clone with SUFFIX of a decl named > >>>> NAME. */ > >>>> @@ -521,14 +521,13 @@ tree > >>>> clone_function_name_1 (const char *name, const char *suffix) > >>> > >>> pass this function the counter to use.... > >>> > >>>> { > >>>> size_t len = strlen (name); > >>>> - char *tmp_name, *prefix; > >>>> + char *prefix; > >>>> > >>>> prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); > >>>> memcpy (prefix, name, len); > >>>> strcpy (prefix + len + 1, suffix); > >>>> prefix[len] = symbol_table::symbol_suffix_separator (); > >>>> - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); > >>> > >>> and keep using ASM_FORMAT_PRIVATE_NAME here. You need to change > >>> the lto/lto-partition.c caller (just use zero as counter). > >>> > >>>> - return get_identifier (tmp_name); > >>>> + return get_identifier (prefix); > >>>> } > >>>> > >>>> /* Return a new assembler name for a clone of DECL with SUFFIX. */ > >>>> @@ -537,7 +536,17 @@ tree > >>>> clone_function_name (tree decl, const char *suffix) > >>>> { > >>>> tree name = DECL_ASSEMBLER_NAME (decl); > >>>> - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); > >>>> + const char *decl_name = IDENTIFIER_POINTER (name); > >>>> + char *numbered_name; > >>>> + unsigned int *suffix_counter; > >>>> + if (!clone_fn_ids) { > >>>> + /* Initialize the per-function counter hash table if this is the first call */ > >>>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); > >>>> + } > >>> > >>> I still do not like throwing memory at the problem in this way for the > >>> little benefit > >>> this change provides :/ > >>> > >>> So no approval from me at this point... > >>> > >>> Richard. > >> > >> Can you give me an idea of the memory constraints that are involved? > >> > >> The highest memory usage increase that I could find was when compiling > >> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB > >> increase over the before-patch use of 6936kB which is barely 0.03%. > >> > >> Using a single counter can result in more confusing namespacing when > >> you have .bar.clone.4 despite there only being 3 clones of .bar. > >> > >> From a practical point of view this change is helpful to anyone > >> diffing binary output such as forensic analysts, Debian Reproducible > >> Builds or even someone validating compiler output (before and after an input > >> source patch). The extra changes that this patch alleviates are a > >> distraction and could even be misleading. For example, applying a > >> source patch to the same Linux source produces the following binary > >> diff before my change: > >> > >> --- /tmp/output.o.objdump > >> +++ /tmp/patched-output.o.objdump > >> @@ -1,5 +1,5 @@ > >> > >> -/tmp/uverbs_cmd/output.o: file format elf32-i386 > >> +/tmp/uverbs_cmd/patched-output.o: file format elf32-i386 > >> > >> > >> Disassembly of section .text.get_order: > >> @@ -265,12 +265,12 @@ > >> 3: e9 fc ff ff ff jmp 4 <put_cq_read+0x4> > >> 4: R_386_PC32 .text.put_uobj_read > >> > >> -Disassembly of section .text.trace_kmalloc.constprop.3: > >> +Disassembly of section .text.trace_kmalloc.constprop.4: > >> > >> -00000000 <trace_kmalloc.constprop.3>: > >> +00000000 <trace_kmalloc.constprop.4>: > >> 0: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 > >> 2: R_386_32 __tracepoint_kmalloc > >> - 7: 74 34 je 3d <trace_kmalloc.constprop.3+0x3d> > >> + 7: 74 34 je 3d <trace_kmalloc.constprop.4+0x3d> > >> 9: 55 push %ebp > >> a: 89 cd mov %ecx,%ebp > >> c: 57 push %edi > >> @@ -281,7 +281,7 @@ > >> 13: 8b 1d 10 00 00 00 mov 0x10,%ebx > >> 15: R_386_32 __tracepoint_kmalloc > >> 19: 85 db test %ebx,%ebx > >> - 1b: 74 1b je 38 <trace_kmalloc.constprop.3+0x38> > >> + 1b: 74 1b je 38 <trace_kmalloc.constprop.4+0x38> > >> 1d: 68 d0 00 00 00 push $0xd0 > >> 22: 89 fa mov %edi,%edx > >> 24: 89 f0 mov %esi,%eax > >> @@ -292,7 +292,7 @@ > >> 31: 58 pop %eax > >> 32: 83 3b 00 cmpl $0x0,(%ebx) > >> 35: 5a pop %edx > >> - 36: eb e3 jmp 1b <trace_kmalloc.constprop.3+0x1b> > >> + 36: eb e3 jmp 1b <trace_kmalloc.constprop.4+0x1b> > >> 38: 5b pop %ebx > >> 39: 5e pop %esi > >> 3a: 5f pop %edi > >> @@ -846,7 +846,7 @@ > >> 78: b8 5f 00 00 00 mov $0x5f,%eax > >> 79: R_386_32 .text.ib_uverbs_alloc_pd > >> 7d: e8 fc ff ff ff call 7e <ib_uverbs_alloc_pd+0x7e> > >> - 7e: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + 7e: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> 82: c7 45 d4 f4 ff ff ff movl $0xfffffff4,-0x2c(%ebp) > >> 89: 59 pop %ecx > >> 8a: 85 db test %ebx,%ebx > >> @@ -1068,7 +1068,7 @@ > >> 9c: b8 83 00 00 00 mov $0x83,%eax > >> 9d: R_386_32 .text.ib_uverbs_reg_mr > >> a1: e8 fc ff ff ff call a2 <ib_uverbs_reg_mr+0xa2> > >> - a2: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + a2: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> a6: ba f4 ff ff ff mov $0xfffffff4,%edx > >> ab: 58 pop %eax > >> ac: 85 db test %ebx,%ebx > >> @@ -1385,7 +1385,7 @@ > >> 99: b8 7b 00 00 00 mov $0x7b,%eax > >> 9a: R_386_32 .text.ib_uverbs_create_cq > >> 9e: e8 fc ff ff ff call 9f <ib_uverbs_create_cq+0x9f> > >> - 9f: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + 9f: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> a3: 58 pop %eax > >> a4: 85 db test %ebx,%ebx > >> a6: 75 0a jne b2 <ib_uverbs_create_cq+0xb2> > >> @@ -1607,129 +1607,107 @@ > >> 00000000 <ib_uverbs_poll_cq>: > >> 0: 55 push %ebp > >> 1: 57 push %edi > >> - 2: 89 c7 mov %eax,%edi > >> - 4: 56 push %esi > >> - 5: 89 ce mov %ecx,%esi > >> - 7: b9 10 00 00 00 mov $0x10,%ecx > >> - c: 53 push %ebx > >> - d: 83 ec 18 sub $0x18,%esp > >> - 10: 8d 44 24 08 lea 0x8(%esp),%eax > >> - 14: e8 fc ff ff ff call 15 <ib_uverbs_poll_cq+0x15> > >> - 15: R_386_PC32 copy_from_user > >> - 19: 85 c0 test %eax,%eax > >> - 1b: 0f 85 34 01 00 00 jne 155 <ib_uverbs_poll_cq+0x155> > >> - 21: 6b 44 24 14 34 imul $0x34,0x14(%esp),%eax > >> - 26: ba d0 00 00 00 mov $0xd0,%edx > >> - 2b: e8 fc ff ff ff call 2c <ib_uverbs_poll_cq+0x2c> > >> - 2c: R_386_PC32 __kmalloc > >> - 30: 89 c5 mov %eax,%ebp > >> - 32: 85 c0 test %eax,%eax > >> - 34: 0f 84 22 01 00 00 je 15c <ib_uverbs_poll_cq+0x15c> > >> - 3a: 6b 44 24 14 30 imul $0x30,0x14(%esp),%eax > >> - 3f: ba d0 00 00 00 mov $0xd0,%edx > >> - 44: 83 c0 08 add $0x8,%eax > >> - 47: 89 44 24 04 mov %eax,0x4(%esp) > >> - 4b: e8 fc ff ff ff call 4c <ib_uverbs_poll_cq+0x4c> > >> - 4c: R_386_PC32 __kmalloc > >> - 50: ba f4 ff ff ff mov $0xfffffff4,%edx > >> - 55: 89 04 24 mov %eax,(%esp) > >> - 58: 85 c0 test %eax,%eax > >> - 5a: 0f 84 e1 00 00 00 je 141 <ib_uverbs_poll_cq+0x141> > >> - 60: 8b 4f 58 mov 0x58(%edi),%ecx > >> - 63: 6a 00 push $0x0 > >> - 65: b8 00 00 00 00 mov $0x0,%eax > >> - 66: R_386_32 ib_uverbs_cq_idr > >> - 6a: 8b 54 24 14 mov 0x14(%esp),%edx > >> - 6e: e8 fc ff ff ff call 6f <ib_uverbs_poll_cq+0x6f> > >> - 6f: R_386_PC32 .text.idr_read_obj > >> - 73: ba ea ff ff ff mov $0xffffffea,%edx > >> - 78: 89 c7 mov %eax,%edi > >> - 7a: 58 pop %eax > >> - 7b: 85 ff test %edi,%edi > >> - 7d: 0f 84 ae 00 00 00 je 131 <ib_uverbs_poll_cq+0x131> > >> - 83: 8b 1f mov (%edi),%ebx > >> - 85: 8b 54 24 14 mov 0x14(%esp),%edx > >> - 89: 89 e9 mov %ebp,%ecx > >> - 8b: 89 f8 mov %edi,%eax > >> - 8d: ff 93 70 01 00 00 call *0x170(%ebx) > >> - 93: 8b 1c 24 mov (%esp),%ebx > >> - 96: 89 03 mov %eax,(%ebx) > >> - 98: 89 f8 mov %edi,%eax > >> - 9a: e8 fc ff ff ff call 9b <ib_uverbs_poll_cq+0x9b> > >> - 9b: R_386_PC32 .text.put_cq_read > >> - 9f: 8b 1c 24 mov (%esp),%ebx > >> - a2: 89 e8 mov %ebp,%eax > >> - a4: 6b 3b 34 imul $0x34,(%ebx),%edi > >> - a7: 8d 53 08 lea 0x8(%ebx),%edx > >> - aa: 01 ef add %ebp,%edi > >> - ac: 39 f8 cmp %edi,%eax > >> - ae: 74 67 je 117 <ib_uverbs_poll_cq+0x117> > >> - b0: 8b 08 mov (%eax),%ecx > >> - b2: 8b 58 04 mov 0x4(%eax),%ebx > >> - b5: 83 c2 30 add $0x30,%edx > >> - b8: 83 c0 34 add $0x34,%eax > >> - bb: 89 4a d0 mov %ecx,-0x30(%edx) > >> - be: 89 5a d4 mov %ebx,-0x2c(%edx) > >> - c1: 8b 48 d4 mov -0x2c(%eax),%ecx > >> - c4: 89 4a d8 mov %ecx,-0x28(%edx) > >> - c7: 8b 48 d8 mov -0x28(%eax),%ecx > >> - ca: 89 4a dc mov %ecx,-0x24(%edx) > >> - cd: 8b 48 dc mov -0x24(%eax),%ecx > >> - d0: 89 4a e0 mov %ecx,-0x20(%edx) > >> - d3: 8b 48 e0 mov -0x20(%eax),%ecx > >> - d6: 89 4a e4 mov %ecx,-0x1c(%edx) > >> - d9: 8b 48 e8 mov -0x18(%eax),%ecx > >> - dc: 89 4a e8 mov %ecx,-0x18(%edx) > >> - df: 8b 48 e4 mov -0x1c(%eax),%ecx > >> - e2: 8b 49 20 mov 0x20(%ecx),%ecx > >> - e5: 89 4a ec mov %ecx,-0x14(%edx) > >> - e8: 8b 48 ec mov -0x14(%eax),%ecx > >> - eb: 89 4a f0 mov %ecx,-0x10(%edx) > >> - ee: 8b 48 f0 mov -0x10(%eax),%ecx > >> - f1: 89 4a f4 mov %ecx,-0xc(%edx) > >> - f4: 8b 48 f4 mov -0xc(%eax),%ecx > >> - f7: 66 89 4a f8 mov %cx,-0x8(%edx) > >> - fb: 66 8b 48 f6 mov -0xa(%eax),%cx > >> - ff: 66 89 4a fa mov %cx,-0x6(%edx) > >> - 103: 8a 48 f8 mov -0x8(%eax),%cl > >> - 106: 88 4a fc mov %cl,-0x4(%edx) > >> - 109: 8a 48 f9 mov -0x7(%eax),%cl > >> - 10c: 88 4a fd mov %cl,-0x3(%edx) > >> - 10f: 8a 48 fa mov -0x6(%eax),%cl > >> - 112: 88 4a fe mov %cl,-0x2(%edx) > >> - 115: eb 95 jmp ac <ib_uverbs_poll_cq+0xac> > >> - 117: 8b 14 24 mov (%esp),%edx > >> - 11a: 8b 4c 24 04 mov 0x4(%esp),%ecx > >> - 11e: 8b 44 24 08 mov 0x8(%esp),%eax > >> - 122: e8 fc ff ff ff call 123 <ib_uverbs_poll_cq+0x123> > >> - 123: R_386_PC32 copy_to_user > >> - 127: 83 f8 01 cmp $0x1,%eax > >> - 12a: 19 d2 sbb %edx,%edx > >> - 12c: f7 d2 not %edx > >> - 12e: 83 e2 f2 and $0xfffffff2,%edx > >> - 131: 8b 04 24 mov (%esp),%eax > >> - 134: 89 54 24 04 mov %edx,0x4(%esp) > >> - 138: e8 fc ff ff ff call 139 <ib_uverbs_poll_cq+0x139> > >> - 139: R_386_PC32 kfree > >> - 13d: 8b 54 24 04 mov 0x4(%esp),%edx > >> - 141: 89 e8 mov %ebp,%eax > >> - 143: 89 14 24 mov %edx,(%esp) > >> - 146: e8 fc ff ff ff call 147 <ib_uverbs_poll_cq+0x147> > >> - 147: R_386_PC32 kfree > >> - 14b: 8b 14 24 mov (%esp),%edx > >> - 14e: 85 d2 test %edx,%edx > >> - 150: 0f 45 f2 cmovne %edx,%esi > >> - 153: eb 0c jmp 161 <ib_uverbs_poll_cq+0x161> > >> - 155: be f2 ff ff ff mov $0xfffffff2,%esi > >> - 15a: eb 05 jmp 161 <ib_uverbs_poll_cq+0x161> > >> - 15c: be f4 ff ff ff mov $0xfffffff4,%esi > >> - 161: 83 c4 18 add $0x18,%esp > >> - 164: 89 f0 mov %esi,%eax > >> - 166: 5b pop %ebx > >> - 167: 5e pop %esi > >> - 168: 5f pop %edi > >> - 169: 5d pop %ebp > >> - 16a: c3 ret > >> + 2: bf f2 ff ff ff mov $0xfffffff2,%edi > >> + 7: 56 push %esi > >> + 8: 53 push %ebx > >> + 9: 89 c3 mov %eax,%ebx > >> + b: 81 ec 84 00 00 00 sub $0x84,%esp > >> + 11: 89 4c 24 04 mov %ecx,0x4(%esp) > >> + 15: 8d 44 24 10 lea 0x10(%esp),%eax > >> + 19: b9 10 00 00 00 mov $0x10,%ecx > >> + 1e: e8 fc ff ff ff call 1f <ib_uverbs_poll_cq+0x1f> > >> + 1f: R_386_PC32 copy_from_user > >> + 23: 89 04 24 mov %eax,(%esp) > >> + 26: 85 c0 test %eax,%eax > >> + 28: 0f 85 1f 01 00 00 jne 14d <ib_uverbs_poll_cq+0x14d> > >> + 2e: 8b 4b 58 mov 0x58(%ebx),%ecx > >> + 31: 6a 00 push $0x0 > >> + 33: b8 00 00 00 00 mov $0x0,%eax > >> + 34: R_386_32 ib_uverbs_cq_idr > >> + 38: bf ea ff ff ff mov $0xffffffea,%edi > >> + 3d: 8b 54 24 1c mov 0x1c(%esp),%edx > >> + 41: e8 fc ff ff ff call 42 <ib_uverbs_poll_cq+0x42> > >> + 42: R_386_PC32 .text.idr_read_obj > >> + 46: 89 c3 mov %eax,%ebx > >> + 48: 58 pop %eax > >> + 49: 85 db test %ebx,%ebx > >> + 4b: 0f 84 fc 00 00 00 je 14d <ib_uverbs_poll_cq+0x14d> > >> + 51: 8b 6c 24 10 mov 0x10(%esp),%ebp > >> + 55: b9 02 00 00 00 mov $0x2,%ecx > >> + 5a: 8d 7c 24 08 lea 0x8(%esp),%edi > >> + 5e: 8b 04 24 mov (%esp),%eax > >> + 61: 8d 75 08 lea 0x8(%ebp),%esi > >> + 64: f3 ab rep stos %eax,%es:(%edi) > >> + 66: 8b 44 24 1c mov 0x1c(%esp),%eax > >> + 6a: 39 44 24 08 cmp %eax,0x8(%esp) > >> + 6e: 73 1f jae 8f <ib_uverbs_poll_cq+0x8f> > >> + 70: 8b 3b mov (%ebx),%edi > >> + 72: 8d 4c 24 50 lea 0x50(%esp),%ecx > >> + 76: ba 01 00 00 00 mov $0x1,%edx > >> + 7b: 89 d8 mov %ebx,%eax > >> + 7d: ff 97 70 01 00 00 call *0x170(%edi) > >> + 83: 89 c7 mov %eax,%edi > >> + 85: 85 c0 test %eax,%eax > >> + 87: 0f 88 b9 00 00 00 js 146 <ib_uverbs_poll_cq+0x146> > >> + 8d: 75 21 jne b0 <ib_uverbs_poll_cq+0xb0> > >> + 8f: b9 08 00 00 00 mov $0x8,%ecx > >> + 94: 8d 54 24 08 lea 0x8(%esp),%edx > >> + 98: 89 e8 mov %ebp,%eax > >> + 9a: bf f2 ff ff ff mov $0xfffffff2,%edi > >> + 9f: e8 fc ff ff ff call a0 <ib_uverbs_poll_cq+0xa0> > >> + a0: R_386_PC32 copy_to_user > >> + a4: 85 c0 test %eax,%eax > >> + a6: 0f 44 7c 24 04 cmove 0x4(%esp),%edi > >> + ab: e9 96 00 00 00 jmp 146 <ib_uverbs_poll_cq+0x146> > >> + b0: 8b 44 24 50 mov 0x50(%esp),%eax > >> + b4: 8b 54 24 54 mov 0x54(%esp),%edx > >> + b8: b9 30 00 00 00 mov $0x30,%ecx > >> + bd: 89 44 24 20 mov %eax,0x20(%esp) > >> + c1: 8b 44 24 58 mov 0x58(%esp),%eax > >> + c5: 89 54 24 24 mov %edx,0x24(%esp) > >> + c9: 8b 54 24 74 mov 0x74(%esp),%edx > >> + cd: 89 44 24 28 mov %eax,0x28(%esp) > >> + d1: 8b 44 24 5c mov 0x5c(%esp),%eax > >> + d5: 89 44 24 2c mov %eax,0x2c(%esp) > >> + d9: 8b 44 24 60 mov 0x60(%esp),%eax > >> + dd: 89 44 24 30 mov %eax,0x30(%esp) > >> + e1: 8b 44 24 64 mov 0x64(%esp),%eax > >> + e5: 89 44 24 34 mov %eax,0x34(%esp) > >> + e9: 8b 44 24 6c mov 0x6c(%esp),%eax > >> + ed: 89 44 24 38 mov %eax,0x38(%esp) > >> + f1: 8b 44 24 68 mov 0x68(%esp),%eax > >> + f5: 8b 40 20 mov 0x20(%eax),%eax > >> + f8: 89 54 24 44 mov %edx,0x44(%esp) > >> + fc: 8d 54 24 20 lea 0x20(%esp),%edx > >> + 100: c6 44 24 4f 00 movb $0x0,0x4f(%esp) > >> + 105: 89 44 24 3c mov %eax,0x3c(%esp) > >> + 109: 8b 44 24 70 mov 0x70(%esp),%eax > >> + 10d: 89 44 24 40 mov %eax,0x40(%esp) > >> + 111: 8b 44 24 78 mov 0x78(%esp),%eax > >> + 115: 89 44 24 48 mov %eax,0x48(%esp) > >> + 119: 8b 44 24 7c mov 0x7c(%esp),%eax > >> + 11d: 66 89 44 24 4c mov %ax,0x4c(%esp) > >> + 122: 8a 44 24 7e mov 0x7e(%esp),%al > >> + 126: 88 44 24 4e mov %al,0x4e(%esp) > >> + 12a: 89 f0 mov %esi,%eax > >> + 12c: e8 fc ff ff ff call 12d <ib_uverbs_poll_cq+0x12d> > >> + 12d: R_386_PC32 copy_to_user > >> + 131: 85 c0 test %eax,%eax > >> + 133: 75 0c jne 141 <ib_uverbs_poll_cq+0x141> > >> + 135: 83 c6 30 add $0x30,%esi > >> + 138: ff 44 24 08 incl 0x8(%esp) > >> + 13c: e9 25 ff ff ff jmp 66 <ib_uverbs_poll_cq+0x66> > >> + 141: bf f2 ff ff ff mov $0xfffffff2,%edi > >> + 146: 89 d8 mov %ebx,%eax > >> + 148: e8 fc ff ff ff call 149 <ib_uverbs_poll_cq+0x149> > >> + 149: R_386_PC32 .text.put_cq_read > >> + 14d: 81 c4 84 00 00 00 add $0x84,%esp > >> + 153: 89 f8 mov %edi,%eax > >> + 155: 5b pop %ebx > >> + 156: 5e pop %esi > >> + 157: 5f pop %edi > >> + 158: 5d pop %ebp > >> + 159: c3 ret > >> > >> Disassembly of section .text.ib_uverbs_req_notify_cq: > >> > >> @@ -1915,7 +1893,7 @@ > >> 94: b8 7b 00 00 00 mov $0x7b,%eax > >> 95: R_386_32 .text.ib_uverbs_create_qp > >> 99: e8 fc ff ff ff call 9a <ib_uverbs_create_qp+0x9a> > >> - 9a: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + 9a: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> 9e: 59 pop %ecx > >> 9f: c7 85 50 ff ff ff f4 movl $0xfffffff4,-0xb0(%ebp) > >> a6: ff ff ff > >> @@ -2241,7 +2219,7 @@ > >> 68: b8 4f 00 00 00 mov $0x4f,%eax > >> 69: R_386_32 .text.ib_uverbs_query_qp > >> 6d: e8 fc ff ff ff call 6e <ib_uverbs_query_qp+0x6e> > >> - 6e: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + 6e: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> 72: 59 pop %ecx > >> 73: c7 85 5c ff ff ff 10 movl $0x10,-0xa4(%ebp) > >> 7a: 00 00 00 > >> @@ -2260,7 +2238,7 @@ > >> a3: b8 86 00 00 00 mov $0x86,%eax > >> a4: R_386_32 .text.ib_uverbs_query_qp > >> a8: e8 fc ff ff ff call a9 <ib_uverbs_query_qp+0xa9> > >> - a9: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + a9: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> ad: 5a pop %edx > >> ae: 85 db test %ebx,%ebx > >> b0: 0f 84 c1 01 00 00 je 277 <ib_uverbs_query_qp+0x277> > >> @@ -2462,7 +2440,7 @@ > >> 88: b8 6f 00 00 00 mov $0x6f,%eax > >> 89: R_386_32 .text.ib_uverbs_modify_qp > >> 8d: e8 fc ff ff ff call 8e <ib_uverbs_modify_qp+0x8e> > >> - 8e: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + 8e: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> 92: 5a pop %edx > >> 93: ba f4 ff ff ff mov $0xfffffff4,%edx > >> 98: 85 db test %ebx,%ebx > >> @@ -3129,7 +3107,7 @@ > >> 6a: b8 4c 00 00 00 mov $0x4c,%eax > >> 6b: R_386_32 .text.ib_uverbs_create_ah > >> 6f: e8 fc ff ff ff call 70 <ib_uverbs_create_ah+0x70> > >> - 70: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + 70: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> 74: 58 pop %eax > >> 75: 85 db test %ebx,%ebx > >> 77: 75 0a jne 83 <ib_uverbs_create_ah+0x83> > >> @@ -3396,7 +3374,7 @@ > >> af: b8 91 00 00 00 mov $0x91,%eax > >> b0: R_386_32 .text.ib_uverbs_attach_mcast > >> b4: e8 fc ff ff ff call b5 <ib_uverbs_attach_mcast+0xb5> > >> - b5: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + b5: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> b9: 58 pop %eax > >> ba: 85 db test %ebx,%ebx > >> bc: 75 07 jne c5 <ib_uverbs_attach_mcast+0xc5> > >> @@ -3572,7 +3550,7 @@ > >> 77: b8 5e 00 00 00 mov $0x5e,%eax > >> 78: R_386_32 .text.ib_uverbs_create_srq > >> 7c: e8 fc ff ff ff call 7d <ib_uverbs_create_srq+0x7d> > >> - 7d: R_386_PC32 .text.trace_kmalloc.constprop.3 > >> + 7d: R_386_PC32 .text.trace_kmalloc.constprop.4 > >> 81: ba f4 ff ff ff mov $0xfffffff4,%edx > >> 86: 58 pop %eax > >> 87: 85 db test %ebx,%ebx > >> > >> Needless to say, after my change the diff only shows the truly changed > >> functions. > >> > >> - Michael > >> > > > > Updated patch > > > > > > gcc: > > 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> > > > > Make function clone name numbering independent. > > * cgraph.h (clone_function_name_1): Add clone_num argument > > * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids. > > (clone_function_name): Use counters from the hashtable. > > > > lto: > > 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> > > > > * lto-partition.c (privatize_symbol_name_1): Pass 0 to > > clone_function_name_1. > > > > > > testsuite: > > 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> > > > > Clone numbering should be independent for each function. > > * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test. > > > > --- > > gcc/cgraph.h | 2 +- > > gcc/cgraphclones.c | 16 ++++++++--- > > gcc/lto/lto-partition.c | 2 +- > > gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++ > > 4 files changed, 52 insertions(+), 6 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c > > > > diff --git gcc/cgraph.h gcc/cgraph.h > > index a8b1b4c..fe44bd0 100644 > > --- gcc/cgraph.h > > +++ gcc/cgraph.h > > @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, > > profile_count); > > tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree); > > /* In cgraphclones.c */ > > > > -tree clone_function_name_1 (const char *, const char *); > > +tree clone_function_name_1 (const char *, const char *, unsigned int); > > tree clone_function_name (tree decl, const char *); > > > > void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *, > > diff --git gcc/cgraphclones.c gcc/cgraphclones.c > > index 6e84a31..7de7a65 100644 > > --- gcc/cgraphclones.c > > +++ gcc/cgraphclones.c > > @@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count > > prof_count, > > return new_node; > > } > > > > -static GTY(()) unsigned int clone_fn_id_num; > > +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; > > > > /* Return a new assembler name for a clone with SUFFIX of a decl named > > NAME. */ > > > > tree > > -clone_function_name_1 (const char *name, const char *suffix) > > +clone_function_name_1 (const char *name, const char *suffix, unsigned int > > clone_num) > > { > > size_t len = strlen (name); > > char *tmp_name, *prefix; > > @@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix) > > memcpy (prefix, name, len); > > strcpy (prefix + len + 1, suffix); > > prefix[len] = symbol_table::symbol_suffix_separator (); > > - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); > > + ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num); > > return get_identifier (tmp_name); > > } > > > > @@ -537,7 +537,15 @@ tree > > clone_function_name (tree decl, const char *suffix) > > { > > tree name = DECL_ASSEMBLER_NAME (decl); > > - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); > > + const char *decl_name = IDENTIFIER_POINTER (name); > > + unsigned int *suffix_counter; > > + if (!clone_fn_ids) { > > + /* Initialize the per-function counter hash table on first call */ > > + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); > > + } > > + suffix_counter = &clone_fn_ids->get_or_insert(decl_name); > > + *suffix_counter = *suffix_counter + 1; > > + return clone_function_name_1 (decl_name, suffix, *suffix_counter); > > } > > > > > > diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c > > index c7a5710..4b15232 100644 > > --- gcc/lto/lto-partition.c > > +++ gcc/lto/lto-partition.c > > @@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl) > > name = maybe_rewrite_identifier (name); > > symtab->change_decl_assembler_name (decl, > > clone_function_name_1 (name, > > - "lto_priv")); > > + "lto_priv", 0)); > > > > if (node->lto_file_data) > > lto_record_renamed_decl (node->lto_file_data, name, > > diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c > > gcc/testsuite/gcc.dg/independent-cloneids-1.c > > new file mode 100644 > > index 0000000..d723e20 > > --- /dev/null > > +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c > > @@ -0,0 +1,38 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp" } */ > > + > > +extern int printf (const char *, ...); > > + > > +static int __attribute__ ((noinline)) > > +foo (int arg) > > +{ > > + return 7 * arg; > > +} > > + > > +static int __attribute__ ((noinline)) > > +bar (int arg) > > +{ > > + return arg * arg; > > +} > > + > > +int > > +baz (int arg) > > +{ > > + printf("%d\n", bar (3)); > > + printf("%d\n", bar (4)); > > + printf("%d\n", foo (5)); > > + printf("%d\n", foo (6)); > > + /* adding or removing the following call should not affect foo > > + function's clone numbering */ > > + printf("%d\n", bar (7)); > > + return foo (8); > > +} > > + > > +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */ > > +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */ > > +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */ > > +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */ > > +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */ > > +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */ > > +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */ > > +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */ > > > > Ping? Could someone else please take a look at this since Richard appears to be > overloaded at the moment? The implementation is now how I suggested. I'm still not 100% agreeing to the solution of using a hash-map here as I hoped that most callers have a good idea themselves what "number" of clone they are creating. Thus most of them should be explicit in passing the number. But I didn't investigate any of them (but the LTO one which I think doesn't need the suffix at all). So if anybody besides me is fine with this approach feel free to approve. One minor nit: > > + if (!clone_fn_ids) { > > + /* Initialize the per-function counter hash table on first call */ > > + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); > > + } omit {} around single stmts, the comment doesn't add much information but if you want to keep it move it before the if(). Thanks and sorry for the delay (slowly wading through older patches...), Richard. > > > Regards, > - Michael >
On 2018-08-01 06:37 AM, Richard Biener wrote: > On Tue, Jul 31, 2018 at 7:40 PM Michael Ploujnikov > <michael.ploujnikov@oracle.com> wrote: >> >> On 2018-07-26 01:27 PM, Michael Ploujnikov wrote: >>> On 2018-07-24 09:57 AM, Michael Ploujnikov wrote: >>>> On 2018-07-20 06:05 AM, Richard Biener wrote: >>>>>> /* Return a new assembler name for a clone with SUFFIX of a decl named >>>>>> NAME. */ >>>>>> @@ -521,14 +521,13 @@ tree >>>>>> clone_function_name_1 (const char *name, const char *suffix) >>>>> >>>>> pass this function the counter to use.... >>>>> >>>>>> { >>>>>> size_t len = strlen (name); >>>>>> - char *tmp_name, *prefix; >>>>>> + char *prefix; >>>>>> >>>>>> prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); >>>>>> memcpy (prefix, name, len); >>>>>> strcpy (prefix + len + 1, suffix); >>>>>> prefix[len] = symbol_table::symbol_suffix_separator (); >>>>>> - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); >>>>> >>>>> and keep using ASM_FORMAT_PRIVATE_NAME here. You need to change >>>>> the lto/lto-partition.c caller (just use zero as counter). >>>>> >>>>>> - return get_identifier (tmp_name); >>>>>> + return get_identifier (prefix); >>>>>> } >>>>>> >>>>>> /* Return a new assembler name for a clone of DECL with SUFFIX. */ >>>>>> @@ -537,7 +536,17 @@ tree >>>>>> clone_function_name (tree decl, const char *suffix) >>>>>> { >>>>>> tree name = DECL_ASSEMBLER_NAME (decl); >>>>>> - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); >>>>>> + const char *decl_name = IDENTIFIER_POINTER (name); >>>>>> + char *numbered_name; >>>>>> + unsigned int *suffix_counter; >>>>>> + if (!clone_fn_ids) { >>>>>> + /* Initialize the per-function counter hash table if this is the first call */ >>>>>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); >>>>>> + } >>>>> >>>>> I still do not like throwing memory at the problem in this way for the >>>>> little benefit >>>>> this change provides :/ >>>>> >>>>> So no approval from me at this point... >>>>> >>>>> Richard. >>>> >>>> Can you give me an idea of the memory constraints that are involved? >>>> >>>> The highest memory usage increase that I could find was when compiling >>>> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB >>>> increase over the before-patch use of 6936kB which is barely 0.03%. >>>> >>>> Using a single counter can result in more confusing namespacing when >>>> you have .bar.clone.4 despite there only being 3 clones of .bar. >>>> >>>> From a practical point of view this change is helpful to anyone >>>> diffing binary output such as forensic analysts, Debian Reproducible >>>> Builds or even someone validating compiler output (before and after an input >>>> source patch). The extra changes that this patch alleviates are a >>>> distraction and could even be misleading. For example, applying a >>>> source patch to the same Linux source produces the following binary >>>> diff before my change: >>>> >>>> --- /tmp/output.o.objdump >>>> +++ /tmp/patched-output.o.objdump >>>> @@ -1,5 +1,5 @@ >>>> >>>> -/tmp/uverbs_cmd/output.o: file format elf32-i386 >>>> +/tmp/uverbs_cmd/patched-output.o: file format elf32-i386 >>>> >>>> >>>> Disassembly of section .text.get_order: >>>> @@ -265,12 +265,12 @@ >>>> 3: e9 fc ff ff ff jmp 4 <put_cq_read+0x4> >>>> 4: R_386_PC32 .text.put_uobj_read >>>> >>>> -Disassembly of section .text.trace_kmalloc.constprop.3: >>>> +Disassembly of section .text.trace_kmalloc.constprop.4: >>>> >>>> -00000000 <trace_kmalloc.constprop.3>: >>>> +00000000 <trace_kmalloc.constprop.4>: >>>> 0: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 >>>> 2: R_386_32 __tracepoint_kmalloc >>>> - 7: 74 34 je 3d <trace_kmalloc.constprop.3+0x3d> >>>> + 7: 74 34 je 3d <trace_kmalloc.constprop.4+0x3d> >>>> 9: 55 push %ebp >>>> a: 89 cd mov %ecx,%ebp >>>> c: 57 push %edi >>>> @@ -281,7 +281,7 @@ >>>> 13: 8b 1d 10 00 00 00 mov 0x10,%ebx >>>> 15: R_386_32 __tracepoint_kmalloc >>>> 19: 85 db test %ebx,%ebx >>>> - 1b: 74 1b je 38 <trace_kmalloc.constprop.3+0x38> >>>> + 1b: 74 1b je 38 <trace_kmalloc.constprop.4+0x38> >>>> 1d: 68 d0 00 00 00 push $0xd0 >>>> 22: 89 fa mov %edi,%edx >>>> 24: 89 f0 mov %esi,%eax >>>> @@ -292,7 +292,7 @@ >>>> 31: 58 pop %eax >>>> 32: 83 3b 00 cmpl $0x0,(%ebx) >>>> 35: 5a pop %edx >>>> - 36: eb e3 jmp 1b <trace_kmalloc.constprop.3+0x1b> >>>> + 36: eb e3 jmp 1b <trace_kmalloc.constprop.4+0x1b> >>>> 38: 5b pop %ebx >>>> 39: 5e pop %esi >>>> 3a: 5f pop %edi >>>> @@ -846,7 +846,7 @@ >>>> 78: b8 5f 00 00 00 mov $0x5f,%eax >>>> 79: R_386_32 .text.ib_uverbs_alloc_pd >>>> 7d: e8 fc ff ff ff call 7e <ib_uverbs_alloc_pd+0x7e> >>>> - 7e: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + 7e: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> 82: c7 45 d4 f4 ff ff ff movl $0xfffffff4,-0x2c(%ebp) >>>> 89: 59 pop %ecx >>>> 8a: 85 db test %ebx,%ebx >>>> @@ -1068,7 +1068,7 @@ >>>> 9c: b8 83 00 00 00 mov $0x83,%eax >>>> 9d: R_386_32 .text.ib_uverbs_reg_mr >>>> a1: e8 fc ff ff ff call a2 <ib_uverbs_reg_mr+0xa2> >>>> - a2: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + a2: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> a6: ba f4 ff ff ff mov $0xfffffff4,%edx >>>> ab: 58 pop %eax >>>> ac: 85 db test %ebx,%ebx >>>> @@ -1385,7 +1385,7 @@ >>>> 99: b8 7b 00 00 00 mov $0x7b,%eax >>>> 9a: R_386_32 .text.ib_uverbs_create_cq >>>> 9e: e8 fc ff ff ff call 9f <ib_uverbs_create_cq+0x9f> >>>> - 9f: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + 9f: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> a3: 58 pop %eax >>>> a4: 85 db test %ebx,%ebx >>>> a6: 75 0a jne b2 <ib_uverbs_create_cq+0xb2> >>>> @@ -1607,129 +1607,107 @@ >>>> 00000000 <ib_uverbs_poll_cq>: >>>> 0: 55 push %ebp >>>> 1: 57 push %edi >>>> - 2: 89 c7 mov %eax,%edi >>>> - 4: 56 push %esi >>>> - 5: 89 ce mov %ecx,%esi >>>> - 7: b9 10 00 00 00 mov $0x10,%ecx >>>> - c: 53 push %ebx >>>> - d: 83 ec 18 sub $0x18,%esp >>>> - 10: 8d 44 24 08 lea 0x8(%esp),%eax >>>> - 14: e8 fc ff ff ff call 15 <ib_uverbs_poll_cq+0x15> >>>> - 15: R_386_PC32 copy_from_user >>>> - 19: 85 c0 test %eax,%eax >>>> - 1b: 0f 85 34 01 00 00 jne 155 <ib_uverbs_poll_cq+0x155> >>>> - 21: 6b 44 24 14 34 imul $0x34,0x14(%esp),%eax >>>> - 26: ba d0 00 00 00 mov $0xd0,%edx >>>> - 2b: e8 fc ff ff ff call 2c <ib_uverbs_poll_cq+0x2c> >>>> - 2c: R_386_PC32 __kmalloc >>>> - 30: 89 c5 mov %eax,%ebp >>>> - 32: 85 c0 test %eax,%eax >>>> - 34: 0f 84 22 01 00 00 je 15c <ib_uverbs_poll_cq+0x15c> >>>> - 3a: 6b 44 24 14 30 imul $0x30,0x14(%esp),%eax >>>> - 3f: ba d0 00 00 00 mov $0xd0,%edx >>>> - 44: 83 c0 08 add $0x8,%eax >>>> - 47: 89 44 24 04 mov %eax,0x4(%esp) >>>> - 4b: e8 fc ff ff ff call 4c <ib_uverbs_poll_cq+0x4c> >>>> - 4c: R_386_PC32 __kmalloc >>>> - 50: ba f4 ff ff ff mov $0xfffffff4,%edx >>>> - 55: 89 04 24 mov %eax,(%esp) >>>> - 58: 85 c0 test %eax,%eax >>>> - 5a: 0f 84 e1 00 00 00 je 141 <ib_uverbs_poll_cq+0x141> >>>> - 60: 8b 4f 58 mov 0x58(%edi),%ecx >>>> - 63: 6a 00 push $0x0 >>>> - 65: b8 00 00 00 00 mov $0x0,%eax >>>> - 66: R_386_32 ib_uverbs_cq_idr >>>> - 6a: 8b 54 24 14 mov 0x14(%esp),%edx >>>> - 6e: e8 fc ff ff ff call 6f <ib_uverbs_poll_cq+0x6f> >>>> - 6f: R_386_PC32 .text.idr_read_obj >>>> - 73: ba ea ff ff ff mov $0xffffffea,%edx >>>> - 78: 89 c7 mov %eax,%edi >>>> - 7a: 58 pop %eax >>>> - 7b: 85 ff test %edi,%edi >>>> - 7d: 0f 84 ae 00 00 00 je 131 <ib_uverbs_poll_cq+0x131> >>>> - 83: 8b 1f mov (%edi),%ebx >>>> - 85: 8b 54 24 14 mov 0x14(%esp),%edx >>>> - 89: 89 e9 mov %ebp,%ecx >>>> - 8b: 89 f8 mov %edi,%eax >>>> - 8d: ff 93 70 01 00 00 call *0x170(%ebx) >>>> - 93: 8b 1c 24 mov (%esp),%ebx >>>> - 96: 89 03 mov %eax,(%ebx) >>>> - 98: 89 f8 mov %edi,%eax >>>> - 9a: e8 fc ff ff ff call 9b <ib_uverbs_poll_cq+0x9b> >>>> - 9b: R_386_PC32 .text.put_cq_read >>>> - 9f: 8b 1c 24 mov (%esp),%ebx >>>> - a2: 89 e8 mov %ebp,%eax >>>> - a4: 6b 3b 34 imul $0x34,(%ebx),%edi >>>> - a7: 8d 53 08 lea 0x8(%ebx),%edx >>>> - aa: 01 ef add %ebp,%edi >>>> - ac: 39 f8 cmp %edi,%eax >>>> - ae: 74 67 je 117 <ib_uverbs_poll_cq+0x117> >>>> - b0: 8b 08 mov (%eax),%ecx >>>> - b2: 8b 58 04 mov 0x4(%eax),%ebx >>>> - b5: 83 c2 30 add $0x30,%edx >>>> - b8: 83 c0 34 add $0x34,%eax >>>> - bb: 89 4a d0 mov %ecx,-0x30(%edx) >>>> - be: 89 5a d4 mov %ebx,-0x2c(%edx) >>>> - c1: 8b 48 d4 mov -0x2c(%eax),%ecx >>>> - c4: 89 4a d8 mov %ecx,-0x28(%edx) >>>> - c7: 8b 48 d8 mov -0x28(%eax),%ecx >>>> - ca: 89 4a dc mov %ecx,-0x24(%edx) >>>> - cd: 8b 48 dc mov -0x24(%eax),%ecx >>>> - d0: 89 4a e0 mov %ecx,-0x20(%edx) >>>> - d3: 8b 48 e0 mov -0x20(%eax),%ecx >>>> - d6: 89 4a e4 mov %ecx,-0x1c(%edx) >>>> - d9: 8b 48 e8 mov -0x18(%eax),%ecx >>>> - dc: 89 4a e8 mov %ecx,-0x18(%edx) >>>> - df: 8b 48 e4 mov -0x1c(%eax),%ecx >>>> - e2: 8b 49 20 mov 0x20(%ecx),%ecx >>>> - e5: 89 4a ec mov %ecx,-0x14(%edx) >>>> - e8: 8b 48 ec mov -0x14(%eax),%ecx >>>> - eb: 89 4a f0 mov %ecx,-0x10(%edx) >>>> - ee: 8b 48 f0 mov -0x10(%eax),%ecx >>>> - f1: 89 4a f4 mov %ecx,-0xc(%edx) >>>> - f4: 8b 48 f4 mov -0xc(%eax),%ecx >>>> - f7: 66 89 4a f8 mov %cx,-0x8(%edx) >>>> - fb: 66 8b 48 f6 mov -0xa(%eax),%cx >>>> - ff: 66 89 4a fa mov %cx,-0x6(%edx) >>>> - 103: 8a 48 f8 mov -0x8(%eax),%cl >>>> - 106: 88 4a fc mov %cl,-0x4(%edx) >>>> - 109: 8a 48 f9 mov -0x7(%eax),%cl >>>> - 10c: 88 4a fd mov %cl,-0x3(%edx) >>>> - 10f: 8a 48 fa mov -0x6(%eax),%cl >>>> - 112: 88 4a fe mov %cl,-0x2(%edx) >>>> - 115: eb 95 jmp ac <ib_uverbs_poll_cq+0xac> >>>> - 117: 8b 14 24 mov (%esp),%edx >>>> - 11a: 8b 4c 24 04 mov 0x4(%esp),%ecx >>>> - 11e: 8b 44 24 08 mov 0x8(%esp),%eax >>>> - 122: e8 fc ff ff ff call 123 <ib_uverbs_poll_cq+0x123> >>>> - 123: R_386_PC32 copy_to_user >>>> - 127: 83 f8 01 cmp $0x1,%eax >>>> - 12a: 19 d2 sbb %edx,%edx >>>> - 12c: f7 d2 not %edx >>>> - 12e: 83 e2 f2 and $0xfffffff2,%edx >>>> - 131: 8b 04 24 mov (%esp),%eax >>>> - 134: 89 54 24 04 mov %edx,0x4(%esp) >>>> - 138: e8 fc ff ff ff call 139 <ib_uverbs_poll_cq+0x139> >>>> - 139: R_386_PC32 kfree >>>> - 13d: 8b 54 24 04 mov 0x4(%esp),%edx >>>> - 141: 89 e8 mov %ebp,%eax >>>> - 143: 89 14 24 mov %edx,(%esp) >>>> - 146: e8 fc ff ff ff call 147 <ib_uverbs_poll_cq+0x147> >>>> - 147: R_386_PC32 kfree >>>> - 14b: 8b 14 24 mov (%esp),%edx >>>> - 14e: 85 d2 test %edx,%edx >>>> - 150: 0f 45 f2 cmovne %edx,%esi >>>> - 153: eb 0c jmp 161 <ib_uverbs_poll_cq+0x161> >>>> - 155: be f2 ff ff ff mov $0xfffffff2,%esi >>>> - 15a: eb 05 jmp 161 <ib_uverbs_poll_cq+0x161> >>>> - 15c: be f4 ff ff ff mov $0xfffffff4,%esi >>>> - 161: 83 c4 18 add $0x18,%esp >>>> - 164: 89 f0 mov %esi,%eax >>>> - 166: 5b pop %ebx >>>> - 167: 5e pop %esi >>>> - 168: 5f pop %edi >>>> - 169: 5d pop %ebp >>>> - 16a: c3 ret >>>> + 2: bf f2 ff ff ff mov $0xfffffff2,%edi >>>> + 7: 56 push %esi >>>> + 8: 53 push %ebx >>>> + 9: 89 c3 mov %eax,%ebx >>>> + b: 81 ec 84 00 00 00 sub $0x84,%esp >>>> + 11: 89 4c 24 04 mov %ecx,0x4(%esp) >>>> + 15: 8d 44 24 10 lea 0x10(%esp),%eax >>>> + 19: b9 10 00 00 00 mov $0x10,%ecx >>>> + 1e: e8 fc ff ff ff call 1f <ib_uverbs_poll_cq+0x1f> >>>> + 1f: R_386_PC32 copy_from_user >>>> + 23: 89 04 24 mov %eax,(%esp) >>>> + 26: 85 c0 test %eax,%eax >>>> + 28: 0f 85 1f 01 00 00 jne 14d <ib_uverbs_poll_cq+0x14d> >>>> + 2e: 8b 4b 58 mov 0x58(%ebx),%ecx >>>> + 31: 6a 00 push $0x0 >>>> + 33: b8 00 00 00 00 mov $0x0,%eax >>>> + 34: R_386_32 ib_uverbs_cq_idr >>>> + 38: bf ea ff ff ff mov $0xffffffea,%edi >>>> + 3d: 8b 54 24 1c mov 0x1c(%esp),%edx >>>> + 41: e8 fc ff ff ff call 42 <ib_uverbs_poll_cq+0x42> >>>> + 42: R_386_PC32 .text.idr_read_obj >>>> + 46: 89 c3 mov %eax,%ebx >>>> + 48: 58 pop %eax >>>> + 49: 85 db test %ebx,%ebx >>>> + 4b: 0f 84 fc 00 00 00 je 14d <ib_uverbs_poll_cq+0x14d> >>>> + 51: 8b 6c 24 10 mov 0x10(%esp),%ebp >>>> + 55: b9 02 00 00 00 mov $0x2,%ecx >>>> + 5a: 8d 7c 24 08 lea 0x8(%esp),%edi >>>> + 5e: 8b 04 24 mov (%esp),%eax >>>> + 61: 8d 75 08 lea 0x8(%ebp),%esi >>>> + 64: f3 ab rep stos %eax,%es:(%edi) >>>> + 66: 8b 44 24 1c mov 0x1c(%esp),%eax >>>> + 6a: 39 44 24 08 cmp %eax,0x8(%esp) >>>> + 6e: 73 1f jae 8f <ib_uverbs_poll_cq+0x8f> >>>> + 70: 8b 3b mov (%ebx),%edi >>>> + 72: 8d 4c 24 50 lea 0x50(%esp),%ecx >>>> + 76: ba 01 00 00 00 mov $0x1,%edx >>>> + 7b: 89 d8 mov %ebx,%eax >>>> + 7d: ff 97 70 01 00 00 call *0x170(%edi) >>>> + 83: 89 c7 mov %eax,%edi >>>> + 85: 85 c0 test %eax,%eax >>>> + 87: 0f 88 b9 00 00 00 js 146 <ib_uverbs_poll_cq+0x146> >>>> + 8d: 75 21 jne b0 <ib_uverbs_poll_cq+0xb0> >>>> + 8f: b9 08 00 00 00 mov $0x8,%ecx >>>> + 94: 8d 54 24 08 lea 0x8(%esp),%edx >>>> + 98: 89 e8 mov %ebp,%eax >>>> + 9a: bf f2 ff ff ff mov $0xfffffff2,%edi >>>> + 9f: e8 fc ff ff ff call a0 <ib_uverbs_poll_cq+0xa0> >>>> + a0: R_386_PC32 copy_to_user >>>> + a4: 85 c0 test %eax,%eax >>>> + a6: 0f 44 7c 24 04 cmove 0x4(%esp),%edi >>>> + ab: e9 96 00 00 00 jmp 146 <ib_uverbs_poll_cq+0x146> >>>> + b0: 8b 44 24 50 mov 0x50(%esp),%eax >>>> + b4: 8b 54 24 54 mov 0x54(%esp),%edx >>>> + b8: b9 30 00 00 00 mov $0x30,%ecx >>>> + bd: 89 44 24 20 mov %eax,0x20(%esp) >>>> + c1: 8b 44 24 58 mov 0x58(%esp),%eax >>>> + c5: 89 54 24 24 mov %edx,0x24(%esp) >>>> + c9: 8b 54 24 74 mov 0x74(%esp),%edx >>>> + cd: 89 44 24 28 mov %eax,0x28(%esp) >>>> + d1: 8b 44 24 5c mov 0x5c(%esp),%eax >>>> + d5: 89 44 24 2c mov %eax,0x2c(%esp) >>>> + d9: 8b 44 24 60 mov 0x60(%esp),%eax >>>> + dd: 89 44 24 30 mov %eax,0x30(%esp) >>>> + e1: 8b 44 24 64 mov 0x64(%esp),%eax >>>> + e5: 89 44 24 34 mov %eax,0x34(%esp) >>>> + e9: 8b 44 24 6c mov 0x6c(%esp),%eax >>>> + ed: 89 44 24 38 mov %eax,0x38(%esp) >>>> + f1: 8b 44 24 68 mov 0x68(%esp),%eax >>>> + f5: 8b 40 20 mov 0x20(%eax),%eax >>>> + f8: 89 54 24 44 mov %edx,0x44(%esp) >>>> + fc: 8d 54 24 20 lea 0x20(%esp),%edx >>>> + 100: c6 44 24 4f 00 movb $0x0,0x4f(%esp) >>>> + 105: 89 44 24 3c mov %eax,0x3c(%esp) >>>> + 109: 8b 44 24 70 mov 0x70(%esp),%eax >>>> + 10d: 89 44 24 40 mov %eax,0x40(%esp) >>>> + 111: 8b 44 24 78 mov 0x78(%esp),%eax >>>> + 115: 89 44 24 48 mov %eax,0x48(%esp) >>>> + 119: 8b 44 24 7c mov 0x7c(%esp),%eax >>>> + 11d: 66 89 44 24 4c mov %ax,0x4c(%esp) >>>> + 122: 8a 44 24 7e mov 0x7e(%esp),%al >>>> + 126: 88 44 24 4e mov %al,0x4e(%esp) >>>> + 12a: 89 f0 mov %esi,%eax >>>> + 12c: e8 fc ff ff ff call 12d <ib_uverbs_poll_cq+0x12d> >>>> + 12d: R_386_PC32 copy_to_user >>>> + 131: 85 c0 test %eax,%eax >>>> + 133: 75 0c jne 141 <ib_uverbs_poll_cq+0x141> >>>> + 135: 83 c6 30 add $0x30,%esi >>>> + 138: ff 44 24 08 incl 0x8(%esp) >>>> + 13c: e9 25 ff ff ff jmp 66 <ib_uverbs_poll_cq+0x66> >>>> + 141: bf f2 ff ff ff mov $0xfffffff2,%edi >>>> + 146: 89 d8 mov %ebx,%eax >>>> + 148: e8 fc ff ff ff call 149 <ib_uverbs_poll_cq+0x149> >>>> + 149: R_386_PC32 .text.put_cq_read >>>> + 14d: 81 c4 84 00 00 00 add $0x84,%esp >>>> + 153: 89 f8 mov %edi,%eax >>>> + 155: 5b pop %ebx >>>> + 156: 5e pop %esi >>>> + 157: 5f pop %edi >>>> + 158: 5d pop %ebp >>>> + 159: c3 ret >>>> >>>> Disassembly of section .text.ib_uverbs_req_notify_cq: >>>> >>>> @@ -1915,7 +1893,7 @@ >>>> 94: b8 7b 00 00 00 mov $0x7b,%eax >>>> 95: R_386_32 .text.ib_uverbs_create_qp >>>> 99: e8 fc ff ff ff call 9a <ib_uverbs_create_qp+0x9a> >>>> - 9a: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + 9a: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> 9e: 59 pop %ecx >>>> 9f: c7 85 50 ff ff ff f4 movl $0xfffffff4,-0xb0(%ebp) >>>> a6: ff ff ff >>>> @@ -2241,7 +2219,7 @@ >>>> 68: b8 4f 00 00 00 mov $0x4f,%eax >>>> 69: R_386_32 .text.ib_uverbs_query_qp >>>> 6d: e8 fc ff ff ff call 6e <ib_uverbs_query_qp+0x6e> >>>> - 6e: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + 6e: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> 72: 59 pop %ecx >>>> 73: c7 85 5c ff ff ff 10 movl $0x10,-0xa4(%ebp) >>>> 7a: 00 00 00 >>>> @@ -2260,7 +2238,7 @@ >>>> a3: b8 86 00 00 00 mov $0x86,%eax >>>> a4: R_386_32 .text.ib_uverbs_query_qp >>>> a8: e8 fc ff ff ff call a9 <ib_uverbs_query_qp+0xa9> >>>> - a9: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + a9: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> ad: 5a pop %edx >>>> ae: 85 db test %ebx,%ebx >>>> b0: 0f 84 c1 01 00 00 je 277 <ib_uverbs_query_qp+0x277> >>>> @@ -2462,7 +2440,7 @@ >>>> 88: b8 6f 00 00 00 mov $0x6f,%eax >>>> 89: R_386_32 .text.ib_uverbs_modify_qp >>>> 8d: e8 fc ff ff ff call 8e <ib_uverbs_modify_qp+0x8e> >>>> - 8e: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + 8e: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> 92: 5a pop %edx >>>> 93: ba f4 ff ff ff mov $0xfffffff4,%edx >>>> 98: 85 db test %ebx,%ebx >>>> @@ -3129,7 +3107,7 @@ >>>> 6a: b8 4c 00 00 00 mov $0x4c,%eax >>>> 6b: R_386_32 .text.ib_uverbs_create_ah >>>> 6f: e8 fc ff ff ff call 70 <ib_uverbs_create_ah+0x70> >>>> - 70: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + 70: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> 74: 58 pop %eax >>>> 75: 85 db test %ebx,%ebx >>>> 77: 75 0a jne 83 <ib_uverbs_create_ah+0x83> >>>> @@ -3396,7 +3374,7 @@ >>>> af: b8 91 00 00 00 mov $0x91,%eax >>>> b0: R_386_32 .text.ib_uverbs_attach_mcast >>>> b4: e8 fc ff ff ff call b5 <ib_uverbs_attach_mcast+0xb5> >>>> - b5: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + b5: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> b9: 58 pop %eax >>>> ba: 85 db test %ebx,%ebx >>>> bc: 75 07 jne c5 <ib_uverbs_attach_mcast+0xc5> >>>> @@ -3572,7 +3550,7 @@ >>>> 77: b8 5e 00 00 00 mov $0x5e,%eax >>>> 78: R_386_32 .text.ib_uverbs_create_srq >>>> 7c: e8 fc ff ff ff call 7d <ib_uverbs_create_srq+0x7d> >>>> - 7d: R_386_PC32 .text.trace_kmalloc.constprop.3 >>>> + 7d: R_386_PC32 .text.trace_kmalloc.constprop.4 >>>> 81: ba f4 ff ff ff mov $0xfffffff4,%edx >>>> 86: 58 pop %eax >>>> 87: 85 db test %ebx,%ebx >>>> >>>> Needless to say, after my change the diff only shows the truly changed >>>> functions. >>>> >>>> - Michael >>>> >>> >>> Updated patch >>> >>> >>> gcc: >>> 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> >>> >>> Make function clone name numbering independent. >>> * cgraph.h (clone_function_name_1): Add clone_num argument >>> * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids. >>> (clone_function_name): Use counters from the hashtable. >>> >>> lto: >>> 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> >>> >>> * lto-partition.c (privatize_symbol_name_1): Pass 0 to >>> clone_function_name_1. >>> >>> >>> testsuite: >>> 2018-07-26 Michael Ploujnikov <michael.ploujnikov@oracle.com> >>> >>> Clone numbering should be independent for each function. >>> * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test. >>> >>> --- >>> gcc/cgraph.h | 2 +- >>> gcc/cgraphclones.c | 16 ++++++++--- >>> gcc/lto/lto-partition.c | 2 +- >>> gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++ >>> 4 files changed, 52 insertions(+), 6 deletions(-) >>> create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c >>> >>> diff --git gcc/cgraph.h gcc/cgraph.h >>> index a8b1b4c..fe44bd0 100644 >>> --- gcc/cgraph.h >>> +++ gcc/cgraph.h >>> @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, >>> profile_count); >>> tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree); >>> /* In cgraphclones.c */ >>> >>> -tree clone_function_name_1 (const char *, const char *); >>> +tree clone_function_name_1 (const char *, const char *, unsigned int); >>> tree clone_function_name (tree decl, const char *); >>> >>> void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *, >>> diff --git gcc/cgraphclones.c gcc/cgraphclones.c >>> index 6e84a31..7de7a65 100644 >>> --- gcc/cgraphclones.c >>> +++ gcc/cgraphclones.c >>> @@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count >>> prof_count, >>> return new_node; >>> } >>> >>> -static GTY(()) unsigned int clone_fn_id_num; >>> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; >>> >>> /* Return a new assembler name for a clone with SUFFIX of a decl named >>> NAME. */ >>> >>> tree >>> -clone_function_name_1 (const char *name, const char *suffix) >>> +clone_function_name_1 (const char *name, const char *suffix, unsigned int >>> clone_num) >>> { >>> size_t len = strlen (name); >>> char *tmp_name, *prefix; >>> @@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix) >>> memcpy (prefix, name, len); >>> strcpy (prefix + len + 1, suffix); >>> prefix[len] = symbol_table::symbol_suffix_separator (); >>> - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); >>> + ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num); >>> return get_identifier (tmp_name); >>> } >>> >>> @@ -537,7 +537,15 @@ tree >>> clone_function_name (tree decl, const char *suffix) >>> { >>> tree name = DECL_ASSEMBLER_NAME (decl); >>> - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); >>> + const char *decl_name = IDENTIFIER_POINTER (name); >>> + unsigned int *suffix_counter; >>> + if (!clone_fn_ids) { >>> + /* Initialize the per-function counter hash table on first call */ >>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); >>> + } >>> + suffix_counter = &clone_fn_ids->get_or_insert(decl_name); >>> + *suffix_counter = *suffix_counter + 1; >>> + return clone_function_name_1 (decl_name, suffix, *suffix_counter); >>> } >>> >>> >>> diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c >>> index c7a5710..4b15232 100644 >>> --- gcc/lto/lto-partition.c >>> +++ gcc/lto/lto-partition.c >>> @@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl) >>> name = maybe_rewrite_identifier (name); >>> symtab->change_decl_assembler_name (decl, >>> clone_function_name_1 (name, >>> - "lto_priv")); >>> + "lto_priv", 0)); >>> >>> if (node->lto_file_data) >>> lto_record_renamed_decl (node->lto_file_data, name, >>> diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c >>> gcc/testsuite/gcc.dg/independent-cloneids-1.c >>> new file mode 100644 >>> index 0000000..d723e20 >>> --- /dev/null >>> +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c >>> @@ -0,0 +1,38 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp" } */ >>> + >>> +extern int printf (const char *, ...); >>> + >>> +static int __attribute__ ((noinline)) >>> +foo (int arg) >>> +{ >>> + return 7 * arg; >>> +} >>> + >>> +static int __attribute__ ((noinline)) >>> +bar (int arg) >>> +{ >>> + return arg * arg; >>> +} >>> + >>> +int >>> +baz (int arg) >>> +{ >>> + printf("%d\n", bar (3)); >>> + printf("%d\n", bar (4)); >>> + printf("%d\n", foo (5)); >>> + printf("%d\n", foo (6)); >>> + /* adding or removing the following call should not affect foo >>> + function's clone numbering */ >>> + printf("%d\n", bar (7)); >>> + return foo (8); >>> +} >>> + >>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */ >>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */ >>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */ >>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */ >>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */ >>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */ >>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */ >>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */ >>> >> >> Ping? Could someone else please take a look at this since Richard appears to be >> overloaded at the moment? > > The implementation is now how I suggested. I'm still not 100% agreeing to > the solution of using a hash-map here as I hoped that most callers have > a good idea themselves what "number" of clone they are creating. Thus > most of them should be explicit in passing the number. But I didn't investigate > any of them (but the LTO one which I think doesn't need the suffix at all). > > So if anybody besides me is fine with this approach feel free to approve. > > One minor nit: > >>> + if (!clone_fn_ids) { >>> + /* Initialize the per-function counter hash table on first call */ >>> + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); >>> + } > > omit {} around single stmts, the comment doesn't add much information but > if you want to keep it move it before the if(). > > Thanks and sorry for the delay (slowly wading through older patches...), > Richard. No worries, and thank you for the review. I dug a little deeper and found that LTO does need the numbered suffixes. Otherwise I get errors in a lot of tests that look like: ... spawn -ignore SIGHUP /gcc/build/gcc/xgcc -B/gcc/build/gcc/ c_lto_pr60820_0.o c_lto_pr60820_1.o -fno-diagnostics-show-caret -fdiagnostics-color=never -flto -r -nostdlib -O2 -flinker-output=nolto-rel -o gcc-dg-lto-pr60820-01.exe pid is 52067 -52067 lto1: error: two or more sections for .gnu.lto_local_in6addr_any.lto_priv.0.lto_priv.0.3f1a2975b4946e93 lto1: internal compiler error: cannot read LTO decls from /tmp/ccwo9PaB.ltrans0.o ... which makes sense because lto-partition.c:rename_statics calls privatize_symbol_name in a loop over symbols with the same asm name. So I'm planning to go back to the version where clone_function_name_1 just takes two strings and I'm wondering if I should do: name = IDENTIFIER_POINTER (get_identifier (maybe_rewrite_identifier (name))); in privatize_symbol_name_1 to guarantee that name always persists. - Michael From 03e9191e18e3935171f0281b588b0c6191e46253 Mon Sep 17 00:00:00 2001 From: Michael Ploujnikov <michael.ploujnikov@oracle.com> Date: Mon, 16 Jul 2018 12:55:49 -0400 Subject: [PATCH] Make function assembly more independent. gcc: 2018-08-02 Michael Ploujnikov <michael.ploujnikov@oracle.com> Make function clone name numbering independent. * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids. (clone_function_name_1): Use it. lto: 2018-08-02 Michael Ploujnikov <michael.ploujnikov@oracle.com> * lto-partition.c (privatize_symbol_name_1): Pass a persistent identifier string to clone_function_name_1. testsuite: 2018-08-02 Michael Ploujnikov <michael.ploujnikov@oracle.com> Clone id counters should be completely independent from one another. * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test. --- gcc/cgraphclones.c | 10 +++++-- gcc/lto/lto-partition.c | 2 +- gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++ 3 files changed, 47 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c diff --git gcc/cgraphclones.c gcc/cgraphclones.c index 6e84a31..dfed9e5 100644 --- gcc/cgraphclones.c +++ gcc/cgraphclones.c @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, return new_node; } -static GTY(()) unsigned int clone_fn_id_num; +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; /* Return a new assembler name for a clone with SUFFIX of a decl named NAME. */ @@ -522,12 +522,18 @@ clone_function_name_1 (const char *name, const char *suffix) { size_t len = strlen (name); char *tmp_name, *prefix; + unsigned int *suffix_counter; prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); memcpy (prefix, name, len); strcpy (prefix + len + 1, suffix); prefix[len] = symbol_table::symbol_suffix_separator (); - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); + /* Initialize the per-function counter hash table on first call */ + if (!clone_fn_ids) + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); + suffix_counter = &clone_fn_ids->get_or_insert(name); + *suffix_counter = *suffix_counter + 1; + ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, *suffix_counter); return get_identifier (tmp_name); } diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c index c7a5710..fc8514a 100644 --- gcc/lto/lto-partition.c +++ gcc/lto/lto-partition.c @@ -962,7 +962,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl) if (must_not_rename (node, name)) return false; - name = maybe_rewrite_identifier (name); + name = IDENTIFIER_POINTER (get_identifier (maybe_rewrite_identifier (name))); symtab->change_decl_assembler_name (decl, clone_function_name_1 (name, "lto_priv")); diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c new file mode 100644 index 0000000..d723e20 --- /dev/null +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp" } */ + +extern int printf (const char *, ...); + +static int __attribute__ ((noinline)) +foo (int arg) +{ + return 7 * arg; +} + +static int __attribute__ ((noinline)) +bar (int arg) +{ + return arg * arg; +} + +int +baz (int arg) +{ + printf("%d\n", bar (3)); + printf("%d\n", bar (4)); + printf("%d\n", foo (5)); + printf("%d\n", foo (6)); + /* adding or removing the following call should not affect foo + function's clone numbering */ + printf("%d\n", bar (7)); + return foo (8); +} + +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */ +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */ +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */ +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
diff --git gcc/cgraphclones.c gcc/cgraphclones.c index 6e84a31..f000420 100644 --- gcc/cgraphclones.c +++ gcc/cgraphclones.c @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count, return new_node; } -static GTY(()) unsigned int clone_fn_id_num; +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids; /* Return a new assembler name for a clone with SUFFIX of a decl named NAME. */ @@ -521,14 +521,13 @@ tree clone_function_name_1 (const char *name, const char *suffix) { size_t len = strlen (name); - char *tmp_name, *prefix; + char *prefix; prefix = XALLOCAVEC (char, len + strlen (suffix) + 2); memcpy (prefix, name, len); strcpy (prefix + len + 1, suffix); prefix[len] = symbol_table::symbol_suffix_separator (); - ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++); - return get_identifier (tmp_name); + return get_identifier (prefix); } /* Return a new assembler name for a clone of DECL with SUFFIX. */ @@ -537,7 +536,17 @@ tree clone_function_name (tree decl, const char *suffix) { tree name = DECL_ASSEMBLER_NAME (decl); - return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix); + char *decl_name = IDENTIFIER_POINTER (name); + char *numbered_name; + unsigned int *suffix_counter; + if (!clone_fn_ids) { + /* Initialize the per-function counter hash table if this is the first call */ + clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64); + } + suffix_counter = &clone_fn_ids->get_or_insert(name); + ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter); + *suffix_counter = *suffix_counter + 1; + return clone_function_name_1 (numbered_name, suffix); } - Michael