diff mbox series

[PING] Make function clone name numbering independent.

Message ID 3167e521-aa5b-e47c-6d9b-956a1af2a886@oracle.com
State New
Headers show
Series [PING] Make function clone name numbering independent. | expand

Commit Message

Michael Ploujnikov Aug. 13, 2018, 11:58 p.m. UTC
Ping and I've updated the patch since last time as follows:

  - unittest scans assembly rather than the constprop dump because its
    forward changed
  - unittests should handle different hosts where any of
    NO_DOT_IN_LABEL, NO_DOLLAR_IN_LABEL or __USER_LABEL_PREFIX__ may
    be defined
  - not 100% it's safe to change DECL_NAME to DECL_ASSEMBLER_NAME in
    cgraph_node::create_virtual_clone, but I've attempted to reduce
    some code duplication
  - lto-partition.c: privatize_symbol_name_1 *does* need numbered
    names
  - but cold sections don't
  - Expecting an IDENTIFIER_NODE in clone_function_name_1 avoids
    unreliable string pointer use as pointed out in the first review
  - renamed clone_function_name_1 and clone_function_name to
    numbered_clone_function_name_1 and numbered_clone_function_name to
    clarify purpose and discourage future unintended uses

Also bootstrapped and regtested.

- Michael

On 2018-08-02 03:05 PM, Michael Ploujnikov wrote:
> On 2018-08-01 06:37 AM, Richard Biener wrote:
>> On Tue, Jul 31, 2018 at 7:40 PM Michael Ploujnikov
>> <michael.ploujnikov@oracle.com> wrote:
>>>
>>> On 2018-07-26 01:27 PM, Michael Ploujnikov wrote:
>>>> On 2018-07-24 09:57 AM, Michael Ploujnikov wrote:
>>>>> On 2018-07-20 06:05 AM, Richard Biener wrote:
>>>>>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>>>>>>     NAME.  */
>>>>>>> @@ -521,14 +521,13 @@ tree
>>>>>>>  clone_function_name_1 (const char *name, const char *suffix)
>>>>>>
>>>>>> pass this function the counter to use....
>>>>>>
>>>>>>>  {
>>>>>>>    size_t len = strlen (name);
>>>>>>> -  char *tmp_name, *prefix;
>>>>>>> +  char *prefix;
>>>>>>>
>>>>>>>    prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
>>>>>>>    memcpy (prefix, name, len);
>>>>>>>    strcpy (prefix + len + 1, suffix);
>>>>>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>>>>>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>>>>>
>>>>>> and keep using ASM_FORMAT_PRIVATE_NAME here.  You need to change
>>>>>> the lto/lto-partition.c caller (just use zero as counter).
>>>>>>
>>>>>>> -  return get_identifier (tmp_name);
>>>>>>> +  return get_identifier (prefix);
>>>>>>>  }
>>>>>>>
>>>>>>>  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
>>>>>>> @@ -537,7 +536,17 @@ tree
>>>>>>>  clone_function_name (tree decl, const char *suffix)
>>>>>>>  {
>>>>>>>    tree name = DECL_ASSEMBLER_NAME (decl);
>>>>>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>>>>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>>>>>>> +  char *numbered_name;
>>>>>>> +  unsigned int *suffix_counter;
>>>>>>> +  if (!clone_fn_ids) {
>>>>>>> +    /* Initialize the per-function counter hash table if this is the first call */
>>>>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>>>>>> +  }
>>>>>>
>>>>>> I still do not like throwing memory at the problem in this way for the
>>>>>> little benefit
>>>>>> this change provides :/
>>>>>>
>>>>>> So no approval from me at this point...
>>>>>>
>>>>>> Richard.
>>>>>
>>>>> Can you give me an idea of the memory constraints that are involved?
>>>>>
>>>>> The highest memory usage increase that I could find was when compiling
>>>>> a source file (from Linux) with roughly 10,000 functions. It showed a 2kB
>>>>> increase over the before-patch use of 6936kB which is barely 0.03%.
>>>>>
>>>>> Using a single counter can result in more confusing namespacing when
>>>>> you have .bar.clone.4 despite there only being 3 clones of .bar.
>>>>>
>>>>> From a practical point of view this change is helpful to anyone
>>>>> diffing binary output such as forensic analysts, Debian Reproducible
>>>>> Builds or even someone validating compiler output (before and after an input
>>>>> source patch). The extra changes that this patch alleviates are a
>>>>> distraction and could even be misleading. For example, applying a
>>>>> source patch to the same Linux source produces the following binary
>>>>> diff before my change:
>>>>>
>>>>> --- /tmp/output.o.objdump
>>>>> +++ /tmp/patched-output.o.objdump
>>>>> @@ -1,5 +1,5 @@
>>>>>
>>>>> -/tmp/uverbs_cmd/output.o:     file format elf32-i386
>>>>> +/tmp/uverbs_cmd/patched-output.o:     file format elf32-i386
>>>>>
>>>>>
>>>>>  Disassembly of section .text.get_order:
>>>>> @@ -265,12 +265,12 @@
>>>>>     3:       e9 fc ff ff ff          jmp    4 <put_cq_read+0x4>
>>>>>                      4: R_386_PC32   .text.put_uobj_read
>>>>>
>>>>> -Disassembly of section .text.trace_kmalloc.constprop.3:
>>>>> +Disassembly of section .text.trace_kmalloc.constprop.4:
>>>>>
>>>>> -00000000 <trace_kmalloc.constprop.3>:
>>>>> +00000000 <trace_kmalloc.constprop.4>:
>>>>>     0:       83 3d 04 00 00 00 00    cmpl   $0x0,0x4
>>>>>                      2: R_386_32     __tracepoint_kmalloc
>>>>> -   7:       74 34                   je     3d <trace_kmalloc.constprop.3+0x3d>
>>>>> +   7:       74 34                   je     3d <trace_kmalloc.constprop.4+0x3d>
>>>>>     9:       55                      push   %ebp
>>>>>     a:       89 cd                   mov    %ecx,%ebp
>>>>>     c:       57                      push   %edi
>>>>> @@ -281,7 +281,7 @@
>>>>>    13:       8b 1d 10 00 00 00       mov    0x10,%ebx
>>>>>                      15: R_386_32    __tracepoint_kmalloc
>>>>>    19:       85 db                   test   %ebx,%ebx
>>>>> -  1b:       74 1b                   je     38 <trace_kmalloc.constprop.3+0x38>
>>>>> +  1b:       74 1b                   je     38 <trace_kmalloc.constprop.4+0x38>
>>>>>    1d:       68 d0 00 00 00          push   $0xd0
>>>>>    22:       89 fa                   mov    %edi,%edx
>>>>>    24:       89 f0                   mov    %esi,%eax
>>>>> @@ -292,7 +292,7 @@
>>>>>    31:       58                      pop    %eax
>>>>>    32:       83 3b 00                cmpl   $0x0,(%ebx)
>>>>>    35:       5a                      pop    %edx
>>>>> -  36:       eb e3                   jmp    1b <trace_kmalloc.constprop.3+0x1b>
>>>>> +  36:       eb e3                   jmp    1b <trace_kmalloc.constprop.4+0x1b>
>>>>>    38:       5b                      pop    %ebx
>>>>>    39:       5e                      pop    %esi
>>>>>    3a:       5f                      pop    %edi
>>>>> @@ -846,7 +846,7 @@
>>>>>    78:       b8 5f 00 00 00          mov    $0x5f,%eax
>>>>>                      79: R_386_32    .text.ib_uverbs_alloc_pd
>>>>>    7d:       e8 fc ff ff ff          call   7e <ib_uverbs_alloc_pd+0x7e>
>>>>> -                    7e: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    7e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    82:       c7 45 d4 f4 ff ff ff    movl   $0xfffffff4,-0x2c(%ebp)
>>>>>    89:       59                      pop    %ecx
>>>>>    8a:       85 db                   test   %ebx,%ebx
>>>>> @@ -1068,7 +1068,7 @@
>>>>>    9c:       b8 83 00 00 00          mov    $0x83,%eax
>>>>>                      9d: R_386_32    .text.ib_uverbs_reg_mr
>>>>>    a1:       e8 fc ff ff ff          call   a2 <ib_uverbs_reg_mr+0xa2>
>>>>> -                    a2: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    a2: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    a6:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>>    ab:       58                      pop    %eax
>>>>>    ac:       85 db                   test   %ebx,%ebx
>>>>> @@ -1385,7 +1385,7 @@
>>>>>    99:       b8 7b 00 00 00          mov    $0x7b,%eax
>>>>>                      9a: R_386_32    .text.ib_uverbs_create_cq
>>>>>    9e:       e8 fc ff ff ff          call   9f <ib_uverbs_create_cq+0x9f>
>>>>> -                    9f: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    9f: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    a3:       58                      pop    %eax
>>>>>    a4:       85 db                   test   %ebx,%ebx
>>>>>    a6:       75 0a                   jne    b2 <ib_uverbs_create_cq+0xb2>
>>>>> @@ -1607,129 +1607,107 @@
>>>>>  00000000 <ib_uverbs_poll_cq>:
>>>>>     0:       55                      push   %ebp
>>>>>     1:       57                      push   %edi
>>>>> -   2:       89 c7                   mov    %eax,%edi
>>>>> -   4:       56                      push   %esi
>>>>> -   5:       89 ce                   mov    %ecx,%esi
>>>>> -   7:       b9 10 00 00 00          mov    $0x10,%ecx
>>>>> -   c:       53                      push   %ebx
>>>>> -   d:       83 ec 18                sub    $0x18,%esp
>>>>> -  10:       8d 44 24 08             lea    0x8(%esp),%eax
>>>>> -  14:       e8 fc ff ff ff          call   15 <ib_uverbs_poll_cq+0x15>
>>>>> -                    15: R_386_PC32  copy_from_user
>>>>> -  19:       85 c0                   test   %eax,%eax
>>>>> -  1b:       0f 85 34 01 00 00       jne    155 <ib_uverbs_poll_cq+0x155>
>>>>> -  21:       6b 44 24 14 34          imul   $0x34,0x14(%esp),%eax
>>>>> -  26:       ba d0 00 00 00          mov    $0xd0,%edx
>>>>> -  2b:       e8 fc ff ff ff          call   2c <ib_uverbs_poll_cq+0x2c>
>>>>> -                    2c: R_386_PC32  __kmalloc
>>>>> -  30:       89 c5                   mov    %eax,%ebp
>>>>> -  32:       85 c0                   test   %eax,%eax
>>>>> -  34:       0f 84 22 01 00 00       je     15c <ib_uverbs_poll_cq+0x15c>
>>>>> -  3a:       6b 44 24 14 30          imul   $0x30,0x14(%esp),%eax
>>>>> -  3f:       ba d0 00 00 00          mov    $0xd0,%edx
>>>>> -  44:       83 c0 08                add    $0x8,%eax
>>>>> -  47:       89 44 24 04             mov    %eax,0x4(%esp)
>>>>> -  4b:       e8 fc ff ff ff          call   4c <ib_uverbs_poll_cq+0x4c>
>>>>> -                    4c: R_386_PC32  __kmalloc
>>>>> -  50:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>> -  55:       89 04 24                mov    %eax,(%esp)
>>>>> -  58:       85 c0                   test   %eax,%eax
>>>>> -  5a:       0f 84 e1 00 00 00       je     141 <ib_uverbs_poll_cq+0x141>
>>>>> -  60:       8b 4f 58                mov    0x58(%edi),%ecx
>>>>> -  63:       6a 00                   push   $0x0
>>>>> -  65:       b8 00 00 00 00          mov    $0x0,%eax
>>>>> -                    66: R_386_32    ib_uverbs_cq_idr
>>>>> -  6a:       8b 54 24 14             mov    0x14(%esp),%edx
>>>>> -  6e:       e8 fc ff ff ff          call   6f <ib_uverbs_poll_cq+0x6f>
>>>>> -                    6f: R_386_PC32  .text.idr_read_obj
>>>>> -  73:       ba ea ff ff ff          mov    $0xffffffea,%edx
>>>>> -  78:       89 c7                   mov    %eax,%edi
>>>>> -  7a:       58                      pop    %eax
>>>>> -  7b:       85 ff                   test   %edi,%edi
>>>>> -  7d:       0f 84 ae 00 00 00       je     131 <ib_uverbs_poll_cq+0x131>
>>>>> -  83:       8b 1f                   mov    (%edi),%ebx
>>>>> -  85:       8b 54 24 14             mov    0x14(%esp),%edx
>>>>> -  89:       89 e9                   mov    %ebp,%ecx
>>>>> -  8b:       89 f8                   mov    %edi,%eax
>>>>> -  8d:       ff 93 70 01 00 00       call   *0x170(%ebx)
>>>>> -  93:       8b 1c 24                mov    (%esp),%ebx
>>>>> -  96:       89 03                   mov    %eax,(%ebx)
>>>>> -  98:       89 f8                   mov    %edi,%eax
>>>>> -  9a:       e8 fc ff ff ff          call   9b <ib_uverbs_poll_cq+0x9b>
>>>>> -                    9b: R_386_PC32  .text.put_cq_read
>>>>> -  9f:       8b 1c 24                mov    (%esp),%ebx
>>>>> -  a2:       89 e8                   mov    %ebp,%eax
>>>>> -  a4:       6b 3b 34                imul   $0x34,(%ebx),%edi
>>>>> -  a7:       8d 53 08                lea    0x8(%ebx),%edx
>>>>> -  aa:       01 ef                   add    %ebp,%edi
>>>>> -  ac:       39 f8                   cmp    %edi,%eax
>>>>> -  ae:       74 67                   je     117 <ib_uverbs_poll_cq+0x117>
>>>>> -  b0:       8b 08                   mov    (%eax),%ecx
>>>>> -  b2:       8b 58 04                mov    0x4(%eax),%ebx
>>>>> -  b5:       83 c2 30                add    $0x30,%edx
>>>>> -  b8:       83 c0 34                add    $0x34,%eax
>>>>> -  bb:       89 4a d0                mov    %ecx,-0x30(%edx)
>>>>> -  be:       89 5a d4                mov    %ebx,-0x2c(%edx)
>>>>> -  c1:       8b 48 d4                mov    -0x2c(%eax),%ecx
>>>>> -  c4:       89 4a d8                mov    %ecx,-0x28(%edx)
>>>>> -  c7:       8b 48 d8                mov    -0x28(%eax),%ecx
>>>>> -  ca:       89 4a dc                mov    %ecx,-0x24(%edx)
>>>>> -  cd:       8b 48 dc                mov    -0x24(%eax),%ecx
>>>>> -  d0:       89 4a e0                mov    %ecx,-0x20(%edx)
>>>>> -  d3:       8b 48 e0                mov    -0x20(%eax),%ecx
>>>>> -  d6:       89 4a e4                mov    %ecx,-0x1c(%edx)
>>>>> -  d9:       8b 48 e8                mov    -0x18(%eax),%ecx
>>>>> -  dc:       89 4a e8                mov    %ecx,-0x18(%edx)
>>>>> -  df:       8b 48 e4                mov    -0x1c(%eax),%ecx
>>>>> -  e2:       8b 49 20                mov    0x20(%ecx),%ecx
>>>>> -  e5:       89 4a ec                mov    %ecx,-0x14(%edx)
>>>>> -  e8:       8b 48 ec                mov    -0x14(%eax),%ecx
>>>>> -  eb:       89 4a f0                mov    %ecx,-0x10(%edx)
>>>>> -  ee:       8b 48 f0                mov    -0x10(%eax),%ecx
>>>>> -  f1:       89 4a f4                mov    %ecx,-0xc(%edx)
>>>>> -  f4:       8b 48 f4                mov    -0xc(%eax),%ecx
>>>>> -  f7:       66 89 4a f8             mov    %cx,-0x8(%edx)
>>>>> -  fb:       66 8b 48 f6             mov    -0xa(%eax),%cx
>>>>> -  ff:       66 89 4a fa             mov    %cx,-0x6(%edx)
>>>>> - 103:       8a 48 f8                mov    -0x8(%eax),%cl
>>>>> - 106:       88 4a fc                mov    %cl,-0x4(%edx)
>>>>> - 109:       8a 48 f9                mov    -0x7(%eax),%cl
>>>>> - 10c:       88 4a fd                mov    %cl,-0x3(%edx)
>>>>> - 10f:       8a 48 fa                mov    -0x6(%eax),%cl
>>>>> - 112:       88 4a fe                mov    %cl,-0x2(%edx)
>>>>> - 115:       eb 95                   jmp    ac <ib_uverbs_poll_cq+0xac>
>>>>> - 117:       8b 14 24                mov    (%esp),%edx
>>>>> - 11a:       8b 4c 24 04             mov    0x4(%esp),%ecx
>>>>> - 11e:       8b 44 24 08             mov    0x8(%esp),%eax
>>>>> - 122:       e8 fc ff ff ff          call   123 <ib_uverbs_poll_cq+0x123>
>>>>> -                    123: R_386_PC32 copy_to_user
>>>>> - 127:       83 f8 01                cmp    $0x1,%eax
>>>>> - 12a:       19 d2                   sbb    %edx,%edx
>>>>> - 12c:       f7 d2                   not    %edx
>>>>> - 12e:       83 e2 f2                and    $0xfffffff2,%edx
>>>>> - 131:       8b 04 24                mov    (%esp),%eax
>>>>> - 134:       89 54 24 04             mov    %edx,0x4(%esp)
>>>>> - 138:       e8 fc ff ff ff          call   139 <ib_uverbs_poll_cq+0x139>
>>>>> -                    139: R_386_PC32 kfree
>>>>> - 13d:       8b 54 24 04             mov    0x4(%esp),%edx
>>>>> - 141:       89 e8                   mov    %ebp,%eax
>>>>> - 143:       89 14 24                mov    %edx,(%esp)
>>>>> - 146:       e8 fc ff ff ff          call   147 <ib_uverbs_poll_cq+0x147>
>>>>> -                    147: R_386_PC32 kfree
>>>>> - 14b:       8b 14 24                mov    (%esp),%edx
>>>>> - 14e:       85 d2                   test   %edx,%edx
>>>>> - 150:       0f 45 f2                cmovne %edx,%esi
>>>>> - 153:       eb 0c                   jmp    161 <ib_uverbs_poll_cq+0x161>
>>>>> - 155:       be f2 ff ff ff          mov    $0xfffffff2,%esi
>>>>> - 15a:       eb 05                   jmp    161 <ib_uverbs_poll_cq+0x161>
>>>>> - 15c:       be f4 ff ff ff          mov    $0xfffffff4,%esi
>>>>> - 161:       83 c4 18                add    $0x18,%esp
>>>>> - 164:       89 f0                   mov    %esi,%eax
>>>>> - 166:       5b                      pop    %ebx
>>>>> - 167:       5e                      pop    %esi
>>>>> - 168:       5f                      pop    %edi
>>>>> - 169:       5d                      pop    %ebp
>>>>> - 16a:       c3                      ret
>>>>> +   2:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
>>>>> +   7:       56                      push   %esi
>>>>> +   8:       53                      push   %ebx
>>>>> +   9:       89 c3                   mov    %eax,%ebx
>>>>> +   b:       81 ec 84 00 00 00       sub    $0x84,%esp
>>>>> +  11:       89 4c 24 04             mov    %ecx,0x4(%esp)
>>>>> +  15:       8d 44 24 10             lea    0x10(%esp),%eax
>>>>> +  19:       b9 10 00 00 00          mov    $0x10,%ecx
>>>>> +  1e:       e8 fc ff ff ff          call   1f <ib_uverbs_poll_cq+0x1f>
>>>>> +                    1f: R_386_PC32  copy_from_user
>>>>> +  23:       89 04 24                mov    %eax,(%esp)
>>>>> +  26:       85 c0                   test   %eax,%eax
>>>>> +  28:       0f 85 1f 01 00 00       jne    14d <ib_uverbs_poll_cq+0x14d>
>>>>> +  2e:       8b 4b 58                mov    0x58(%ebx),%ecx
>>>>> +  31:       6a 00                   push   $0x0
>>>>> +  33:       b8 00 00 00 00          mov    $0x0,%eax
>>>>> +                    34: R_386_32    ib_uverbs_cq_idr
>>>>> +  38:       bf ea ff ff ff          mov    $0xffffffea,%edi
>>>>> +  3d:       8b 54 24 1c             mov    0x1c(%esp),%edx
>>>>> +  41:       e8 fc ff ff ff          call   42 <ib_uverbs_poll_cq+0x42>
>>>>> +                    42: R_386_PC32  .text.idr_read_obj
>>>>> +  46:       89 c3                   mov    %eax,%ebx
>>>>> +  48:       58                      pop    %eax
>>>>> +  49:       85 db                   test   %ebx,%ebx
>>>>> +  4b:       0f 84 fc 00 00 00       je     14d <ib_uverbs_poll_cq+0x14d>
>>>>> +  51:       8b 6c 24 10             mov    0x10(%esp),%ebp
>>>>> +  55:       b9 02 00 00 00          mov    $0x2,%ecx
>>>>> +  5a:       8d 7c 24 08             lea    0x8(%esp),%edi
>>>>> +  5e:       8b 04 24                mov    (%esp),%eax
>>>>> +  61:       8d 75 08                lea    0x8(%ebp),%esi
>>>>> +  64:       f3 ab                   rep stos %eax,%es:(%edi)
>>>>> +  66:       8b 44 24 1c             mov    0x1c(%esp),%eax
>>>>> +  6a:       39 44 24 08             cmp    %eax,0x8(%esp)
>>>>> +  6e:       73 1f                   jae    8f <ib_uverbs_poll_cq+0x8f>
>>>>> +  70:       8b 3b                   mov    (%ebx),%edi
>>>>> +  72:       8d 4c 24 50             lea    0x50(%esp),%ecx
>>>>> +  76:       ba 01 00 00 00          mov    $0x1,%edx
>>>>> +  7b:       89 d8                   mov    %ebx,%eax
>>>>> +  7d:       ff 97 70 01 00 00       call   *0x170(%edi)
>>>>> +  83:       89 c7                   mov    %eax,%edi
>>>>> +  85:       85 c0                   test   %eax,%eax
>>>>> +  87:       0f 88 b9 00 00 00       js     146 <ib_uverbs_poll_cq+0x146>
>>>>> +  8d:       75 21                   jne    b0 <ib_uverbs_poll_cq+0xb0>
>>>>> +  8f:       b9 08 00 00 00          mov    $0x8,%ecx
>>>>> +  94:       8d 54 24 08             lea    0x8(%esp),%edx
>>>>> +  98:       89 e8                   mov    %ebp,%eax
>>>>> +  9a:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
>>>>> +  9f:       e8 fc ff ff ff          call   a0 <ib_uverbs_poll_cq+0xa0>
>>>>> +                    a0: R_386_PC32  copy_to_user
>>>>> +  a4:       85 c0                   test   %eax,%eax
>>>>> +  a6:       0f 44 7c 24 04          cmove  0x4(%esp),%edi
>>>>> +  ab:       e9 96 00 00 00          jmp    146 <ib_uverbs_poll_cq+0x146>
>>>>> +  b0:       8b 44 24 50             mov    0x50(%esp),%eax
>>>>> +  b4:       8b 54 24 54             mov    0x54(%esp),%edx
>>>>> +  b8:       b9 30 00 00 00          mov    $0x30,%ecx
>>>>> +  bd:       89 44 24 20             mov    %eax,0x20(%esp)
>>>>> +  c1:       8b 44 24 58             mov    0x58(%esp),%eax
>>>>> +  c5:       89 54 24 24             mov    %edx,0x24(%esp)
>>>>> +  c9:       8b 54 24 74             mov    0x74(%esp),%edx
>>>>> +  cd:       89 44 24 28             mov    %eax,0x28(%esp)
>>>>> +  d1:       8b 44 24 5c             mov    0x5c(%esp),%eax
>>>>> +  d5:       89 44 24 2c             mov    %eax,0x2c(%esp)
>>>>> +  d9:       8b 44 24 60             mov    0x60(%esp),%eax
>>>>> +  dd:       89 44 24 30             mov    %eax,0x30(%esp)
>>>>> +  e1:       8b 44 24 64             mov    0x64(%esp),%eax
>>>>> +  e5:       89 44 24 34             mov    %eax,0x34(%esp)
>>>>> +  e9:       8b 44 24 6c             mov    0x6c(%esp),%eax
>>>>> +  ed:       89 44 24 38             mov    %eax,0x38(%esp)
>>>>> +  f1:       8b 44 24 68             mov    0x68(%esp),%eax
>>>>> +  f5:       8b 40 20                mov    0x20(%eax),%eax
>>>>> +  f8:       89 54 24 44             mov    %edx,0x44(%esp)
>>>>> +  fc:       8d 54 24 20             lea    0x20(%esp),%edx
>>>>> + 100:       c6 44 24 4f 00          movb   $0x0,0x4f(%esp)
>>>>> + 105:       89 44 24 3c             mov    %eax,0x3c(%esp)
>>>>> + 109:       8b 44 24 70             mov    0x70(%esp),%eax
>>>>> + 10d:       89 44 24 40             mov    %eax,0x40(%esp)
>>>>> + 111:       8b 44 24 78             mov    0x78(%esp),%eax
>>>>> + 115:       89 44 24 48             mov    %eax,0x48(%esp)
>>>>> + 119:       8b 44 24 7c             mov    0x7c(%esp),%eax
>>>>> + 11d:       66 89 44 24 4c          mov    %ax,0x4c(%esp)
>>>>> + 122:       8a 44 24 7e             mov    0x7e(%esp),%al
>>>>> + 126:       88 44 24 4e             mov    %al,0x4e(%esp)
>>>>> + 12a:       89 f0                   mov    %esi,%eax
>>>>> + 12c:       e8 fc ff ff ff          call   12d <ib_uverbs_poll_cq+0x12d>
>>>>> +                    12d: R_386_PC32 copy_to_user
>>>>> + 131:       85 c0                   test   %eax,%eax
>>>>> + 133:       75 0c                   jne    141 <ib_uverbs_poll_cq+0x141>
>>>>> + 135:       83 c6 30                add    $0x30,%esi
>>>>> + 138:       ff 44 24 08             incl   0x8(%esp)
>>>>> + 13c:       e9 25 ff ff ff          jmp    66 <ib_uverbs_poll_cq+0x66>
>>>>> + 141:       bf f2 ff ff ff          mov    $0xfffffff2,%edi
>>>>> + 146:       89 d8                   mov    %ebx,%eax
>>>>> + 148:       e8 fc ff ff ff          call   149 <ib_uverbs_poll_cq+0x149>
>>>>> +                    149: R_386_PC32 .text.put_cq_read
>>>>> + 14d:       81 c4 84 00 00 00       add    $0x84,%esp
>>>>> + 153:       89 f8                   mov    %edi,%eax
>>>>> + 155:       5b                      pop    %ebx
>>>>> + 156:       5e                      pop    %esi
>>>>> + 157:       5f                      pop    %edi
>>>>> + 158:       5d                      pop    %ebp
>>>>> + 159:       c3                      ret
>>>>>
>>>>>  Disassembly of section .text.ib_uverbs_req_notify_cq:
>>>>>
>>>>> @@ -1915,7 +1893,7 @@
>>>>>    94:       b8 7b 00 00 00          mov    $0x7b,%eax
>>>>>                      95: R_386_32    .text.ib_uverbs_create_qp
>>>>>    99:       e8 fc ff ff ff          call   9a <ib_uverbs_create_qp+0x9a>
>>>>> -                    9a: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    9a: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    9e:       59                      pop    %ecx
>>>>>    9f:       c7 85 50 ff ff ff f4    movl   $0xfffffff4,-0xb0(%ebp)
>>>>>    a6:       ff ff ff
>>>>> @@ -2241,7 +2219,7 @@
>>>>>    68:       b8 4f 00 00 00          mov    $0x4f,%eax
>>>>>                      69: R_386_32    .text.ib_uverbs_query_qp
>>>>>    6d:       e8 fc ff ff ff          call   6e <ib_uverbs_query_qp+0x6e>
>>>>> -                    6e: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    6e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    72:       59                      pop    %ecx
>>>>>    73:       c7 85 5c ff ff ff 10    movl   $0x10,-0xa4(%ebp)
>>>>>    7a:       00 00 00
>>>>> @@ -2260,7 +2238,7 @@
>>>>>    a3:       b8 86 00 00 00          mov    $0x86,%eax
>>>>>                      a4: R_386_32    .text.ib_uverbs_query_qp
>>>>>    a8:       e8 fc ff ff ff          call   a9 <ib_uverbs_query_qp+0xa9>
>>>>> -                    a9: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    a9: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    ad:       5a                      pop    %edx
>>>>>    ae:       85 db                   test   %ebx,%ebx
>>>>>    b0:       0f 84 c1 01 00 00       je     277 <ib_uverbs_query_qp+0x277>
>>>>> @@ -2462,7 +2440,7 @@
>>>>>    88:       b8 6f 00 00 00          mov    $0x6f,%eax
>>>>>                      89: R_386_32    .text.ib_uverbs_modify_qp
>>>>>    8d:       e8 fc ff ff ff          call   8e <ib_uverbs_modify_qp+0x8e>
>>>>> -                    8e: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    8e: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    92:       5a                      pop    %edx
>>>>>    93:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>>    98:       85 db                   test   %ebx,%ebx
>>>>> @@ -3129,7 +3107,7 @@
>>>>>    6a:       b8 4c 00 00 00          mov    $0x4c,%eax
>>>>>                      6b: R_386_32    .text.ib_uverbs_create_ah
>>>>>    6f:       e8 fc ff ff ff          call   70 <ib_uverbs_create_ah+0x70>
>>>>> -                    70: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    70: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    74:       58                      pop    %eax
>>>>>    75:       85 db                   test   %ebx,%ebx
>>>>>    77:       75 0a                   jne    83 <ib_uverbs_create_ah+0x83>
>>>>> @@ -3396,7 +3374,7 @@
>>>>>    af:       b8 91 00 00 00          mov    $0x91,%eax
>>>>>                      b0: R_386_32    .text.ib_uverbs_attach_mcast
>>>>>    b4:       e8 fc ff ff ff          call   b5 <ib_uverbs_attach_mcast+0xb5>
>>>>> -                    b5: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    b5: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    b9:       58                      pop    %eax
>>>>>    ba:       85 db                   test   %ebx,%ebx
>>>>>    bc:       75 07                   jne    c5 <ib_uverbs_attach_mcast+0xc5>
>>>>> @@ -3572,7 +3550,7 @@
>>>>>    77:       b8 5e 00 00 00          mov    $0x5e,%eax
>>>>>                      78: R_386_32    .text.ib_uverbs_create_srq
>>>>>    7c:       e8 fc ff ff ff          call   7d <ib_uverbs_create_srq+0x7d>
>>>>> -                    7d: R_386_PC32  .text.trace_kmalloc.constprop.3
>>>>> +                    7d: R_386_PC32  .text.trace_kmalloc.constprop.4
>>>>>    81:       ba f4 ff ff ff          mov    $0xfffffff4,%edx
>>>>>    86:       58                      pop    %eax
>>>>>    87:       85 db                   test   %ebx,%ebx
>>>>>
>>>>> Needless to say, after my change the diff only shows the truly changed
>>>>> functions.
>>>>>
>>>>> - Michael
>>>>>
>>>>
>>>> Updated patch
>>>>
>>>>
>>>> gcc:
>>>> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>>>
>>>>        Make function clone name numbering independent.
>>>>        * cgraph.h (clone_function_name_1): Add clone_num argument
>>>>        * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
>>>>        (clone_function_name): Use counters from the hashtable.
>>>>
>>>> lto:
>>>> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>>>
>>>>        * lto-partition.c (privatize_symbol_name_1): Pass 0 to
>>>>        clone_function_name_1.
>>>>
>>>>
>>>> testsuite:
>>>> 2018-07-26  Michael Ploujnikov  <michael.ploujnikov@oracle.com>
>>>>
>>>>       Clone numbering should be independent for each function.
>>>>       * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
>>>>
>>>> ---
>>>>  gcc/cgraph.h                                  |  2 +-
>>>>  gcc/cgraphclones.c                            | 16 ++++++++---
>>>>  gcc/lto/lto-partition.c                       |  2 +-
>>>>  gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
>>>>  4 files changed, 52 insertions(+), 6 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>>>
>>>> diff --git gcc/cgraph.h gcc/cgraph.h
>>>> index a8b1b4c..fe44bd0 100644
>>>> --- gcc/cgraph.h
>>>> +++ gcc/cgraph.h
>>>> @@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool,
>>>> profile_count);
>>>>  tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
>>>>  /* In cgraphclones.c  */
>>>>
>>>> -tree clone_function_name_1 (const char *, const char *);
>>>> +tree clone_function_name_1 (const char *, const char *, unsigned int);
>>>>  tree clone_function_name (tree decl, const char *);
>>>>
>>>>  void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
>>>> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
>>>> index 6e84a31..7de7a65 100644
>>>> --- gcc/cgraphclones.c
>>>> +++ gcc/cgraphclones.c
>>>> @@ -512,13 +512,13 @@ cgraph_node::create_clone (tree new_decl, profile_count
>>>> prof_count,
>>>>    return new_node;
>>>>  }
>>>>
>>>> -static GTY(()) unsigned int clone_fn_id_num;
>>>> +static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
>>>>
>>>>  /* Return a new assembler name for a clone with SUFFIX of a decl named
>>>>     NAME.  */
>>>>
>>>>  tree
>>>> -clone_function_name_1 (const char *name, const char *suffix)
>>>> +clone_function_name_1 (const char *name, const char *suffix, unsigned int
>>>> clone_num)
>>>>  {
>>>>    size_t len = strlen (name);
>>>>    char *tmp_name, *prefix;
>>>> @@ -527,7 +527,7 @@ clone_function_name_1 (const char *name, const char *suffix)
>>>>    memcpy (prefix, name, len);
>>>>    strcpy (prefix + len + 1, suffix);
>>>>    prefix[len] = symbol_table::symbol_suffix_separator ();
>>>> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
>>>> +  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_num);
>>>>    return get_identifier (tmp_name);
>>>>  }
>>>>
>>>> @@ -537,7 +537,15 @@ tree
>>>>  clone_function_name (tree decl, const char *suffix)
>>>>  {
>>>>    tree name = DECL_ASSEMBLER_NAME (decl);
>>>> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
>>>> +  const char *decl_name = IDENTIFIER_POINTER (name);
>>>> +  unsigned int *suffix_counter;
>>>> +  if (!clone_fn_ids) {
>>>> +    /* Initialize the per-function counter hash table on first call */
>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>>> +  }
>>>> +  suffix_counter = &clone_fn_ids->get_or_insert(decl_name);
>>>> +  *suffix_counter = *suffix_counter + 1;
>>>> +  return clone_function_name_1 (decl_name, suffix, *suffix_counter);
>>>>  }
>>>>
>>>>
>>>> diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
>>>> index c7a5710..4b15232 100644
>>>> --- gcc/lto/lto-partition.c
>>>> +++ gcc/lto/lto-partition.c
>>>> @@ -965,7 +965,7 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
>>>>    name = maybe_rewrite_identifier (name);
>>>>    symtab->change_decl_assembler_name (decl,
>>>>                                     clone_function_name_1 (name,
>>>> -                                                          "lto_priv"));
>>>> +                                                          "lto_priv", 0));
>>>>
>>>>    if (node->lto_file_data)
>>>>      lto_record_renamed_decl (node->lto_file_data, name,
>>>> diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>>> gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>>> new file mode 100644
>>>> index 0000000..d723e20
>>>> --- /dev/null
>>>> +++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
>>>> @@ -0,0 +1,38 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp"  } */
>>>> +
>>>> +extern int printf (const char *, ...);
>>>> +
>>>> +static int __attribute__ ((noinline))
>>>> +foo (int arg)
>>>> +{
>>>> +  return 7 * arg;
>>>> +}
>>>> +
>>>> +static int __attribute__ ((noinline))
>>>> +bar (int arg)
>>>> +{
>>>> +  return arg * arg;
>>>> +}
>>>> +
>>>> +int
>>>> +baz (int arg)
>>>> +{
>>>> +  printf("%d\n", bar (3));
>>>> +  printf("%d\n", bar (4));
>>>> +  printf("%d\n", foo (5));
>>>> +  printf("%d\n", foo (6));
>>>> +  /* adding or removing the following call should not affect foo
>>>> +     function's clone numbering */
>>>> +  printf("%d\n", bar (7));
>>>> +  return foo (8);
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.0" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.1" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function bar.constprop.3" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.0" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.1" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump "Function foo.constprop.2" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.3" "cp" } } */
>>>> +/* { dg-final { scan-ipa-dump-not "Function foo.constprop.4" "cp" } } */
>>>>
>>>
>>> Ping? Could someone else please take a look at this since Richard appears to be
>>> overloaded at the moment?
>>
>> The implementation is now how I suggested.  I'm still not 100% agreeing to
>> the solution of using a hash-map here as I hoped that most callers have
>> a good idea themselves what "number" of clone they are creating.  Thus
>> most of them should be explicit in passing the number.  But I didn't investigate
>> any of them (but the LTO one which I think doesn't need the suffix at all).
>>
>> So if anybody besides me is fine with this approach feel free to approve.
>>
>> One minor nit:
>>
>>>> +  if (!clone_fn_ids) {
>>>> +    /* Initialize the per-function counter hash table on first call */
>>>> +    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
>>>> +  }
>>
>> omit {} around single stmts, the comment doesn't add much information but
>> if you want to keep it move it before the if().
>>
>> Thanks and sorry for the delay (slowly wading through older patches...),
>> Richard.
> 
> No worries, and thank you for the review.
> 
> I dug a little deeper and found that LTO does need the numbered suffixes.
> Otherwise I get errors in a lot of tests that look like:
> 
> ...
> spawn -ignore SIGHUP /gcc/build/gcc/xgcc -B/gcc/build/gcc/ c_lto_pr60820_0.o
> c_lto_pr60820_1.o -fno-diagnostics-show-caret -fdiagnostics-color=never -flto -r
> -nostdlib -O2 -flinker-output=nolto-rel -o gcc-dg-lto-pr60820-01.exe
> 
> pid is 52067 -52067
> lto1: error: two or more sections for
> .gnu.lto_local_in6addr_any.lto_priv.0.lto_priv.0.3f1a2975b4946e93
> 
> lto1: internal compiler error: cannot read LTO decls from /tmp/ccwo9PaB.ltrans0.o
> 
> ...
> 
> which makes sense because lto-partition.c:rename_statics calls
> privatize_symbol_name in a loop over symbols with the same asm name.
> 
> So I'm planning to go back to the version where clone_function_name_1 just takes
> two strings and I'm wondering if I should do:
> 
> name = IDENTIFIER_POINTER (get_identifier (maybe_rewrite_identifier (name)));
> 
> in privatize_symbol_name_1 to guarantee that name always persists.
> 
> 
> - Michael
>

Comments

Jeff Law Aug. 20, 2018, 7:47 p.m. UTC | #1
On 08/13/2018 05:58 PM, Michael Ploujnikov wrote:
> Ping and I've updated the patch since last time as follows:
> 
>   - unittest scans assembly rather than the constprop dump because its
>     forward changed
>   - unittests should handle different hosts where any of
>     NO_DOT_IN_LABEL, NO_DOLLAR_IN_LABEL or __USER_LABEL_PREFIX__ may
>     be defined
>   - not 100% it's safe to change DECL_NAME to DECL_ASSEMBLER_NAME in
>     cgraph_node::create_virtual_clone, but I've attempted to reduce
>     some code duplication
>   - lto-partition.c: privatize_symbol_name_1 *does* need numbered
>     names
>   - but cold sections don't
>   - Expecting an IDENTIFIER_NODE in clone_function_name_1 avoids
>     unreliable string pointer use as pointed out in the first review
>   - renamed clone_function_name_1 and clone_function_name to
>     numbered_clone_function_name_1 and numbered_clone_function_name to
>     clarify purpose and discourage future unintended uses
Richi has more state here than I do, so I'm going to let him own it.  I
know he's just returning from PTO, so it's going to take him a bit of
time to catch up.

jeff
Michael Ploujnikov Aug. 31, 2018, 6:49 p.m. UTC | #2
On 2018-08-13 07:58 PM, Michael Ploujnikov wrote:
> Ping and I've updated the patch since last time as follows:
> 
>   - unittest scans assembly rather than the constprop dump because its
>     forward changed
>   - unittests should handle different hosts where any of
>     NO_DOT_IN_LABEL, NO_DOLLAR_IN_LABEL or __USER_LABEL_PREFIX__ may
>     be defined
>   - not 100% it's safe to change DECL_NAME to DECL_ASSEMBLER_NAME in
>     cgraph_node::create_virtual_clone, but I've attempted to reduce
>     some code duplication
>   - lto-partition.c: privatize_symbol_name_1 *does* need numbered
>     names
>   - but cold sections don't
>   - Expecting an IDENTIFIER_NODE in clone_function_name_1 avoids
>     unreliable string pointer use as pointed out in the first review
>   - renamed clone_function_name_1 and clone_function_name to
>     numbered_clone_function_name_1 and numbered_clone_function_name to
>     clarify purpose and discourage future unintended uses
> 
> Also bootstrapped and regtested.

Ping.

I've done some more digging into the current uses of
numbered_clone_function_name and checked if any tests fail if I change
it to suffixed_function_name:

  - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
    - no new tests fail, inconclusive
    - and despite the comment on redirect_callee_duplicating_thunks
      about "one or more" duplicates it doesn't seem like
      duplicate_thunk_for_node would be called more than once for each
      node, assuming each node is named uniquely, but I'm far from an
      expert in this area
  - gcc/cgraphclones.c:  SET_DECL_ASSEMBLER_NAME (new_decl, numbered_clone_function_name (old_decl, suffix));
    - called by cgraph_node::create_virtual_clone, definitely needs it
    - this is where clones like .constprop come from
  - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, suffix);
    - more tests fail
    - this is where clones like .simdclone come from
    - called by cgraph_node::create_version_clone_with_body, most likely needs it
  - gcc/config/rs6000/rs6000.c:  tree decl_name = numbered_clone_function_name (default_decl, "resolver");
    - doesn't look like this needs the numbering as there should only
      be one resolver per multi-version function, but need someone
      with an rs/6000 to confirm
  - gcc/lto/lto-partition.c:                                      numbered_clone_function_name_1 (identifier,
    - definitely needed for disambiguation, shown by unit tests failing
  - gcc/multiple_target.c:                                numbered_clone_function_name (node->decl,
    - create_dispatcher_calls says it only creates the dispatcher once
    - no new tests fail, inconclusive
  - gcc/multiple_target.c:                                    numbered_clone_function_name (node->decl,
    - I have a feeling this isn't necessary
    - no new tests fail, inconclusive
  - gcc/omp-expand.c:  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
    - no new tests fail, inconclusive
    - I didn't see (and couldn't figure out a way to get) any of the
      existing omp/acc tests actually exercise this codeptah
  - gcc/omp-low.c:  return numbered_clone_function_name (current_function_decl,
    - definitely needed based on
      gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c and others
  - gcc/omp-simd-clone.c:      DECL_NAME (new_decl) = numbered_clone_function_name (old_decl, "simdclone");
    - no tests fail, inconclusive
    - can definitely have more than one simdclone per function as above, but
      maybe not these external types
    - some tests, like declare-simd.c actually exercise this codepath,
      but I couldn't find the resulting .simdclone decl the the
      simdclone pass dump nor in any of the other dumps to verify
  - gcc/symtab.c:  DECL_NAME (new_decl) = numbered_clone_function_name (node->decl, "localalias");
    - no tests fail, inconclusive
    - my understanding of function_and_variable_visibility says that
      there can only be one per function so maybe this isn't; hubicka?

I'll add patches to switch these once someone with expertise in these
areas can confirm that the numbering isn't needed in the respective
decl names.


- Michael
From adde0632266d3f1b0540e09b9db931df4302d2bc Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 16 Jul 2018 12:55:49 -0400
Subject: [PATCH 1/4] Make function assembly more independent.

This changes clone_function_name such that clone names are based on a
per-function counter rather than a global one.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Make function clone name numbering independent.
       * cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
       (clone_function_name_1): Take an IDENTIFIER_NODE instead of a
       string. Use clone_fn_id_num.

lto:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * lto-partition.c (privatize_symbol_name_1): Pass a persistent
       identifier node to clone_function_name_1.

testsuite:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       Clone id counters should be completely independent from one another.
       * gcc.dg/independent-cloneids-1.c: New test.
---
 gcc/cgraph.h                                  |  2 +-
 gcc/cgraphclones.c                            | 19 +++++++++-----
 gcc/lto/lto-partition.c                       |  5 ++--
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 +++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c

diff --git gcc/cgraph.h gcc/cgraph.h
index a8b1b4c..fadc107 100644
--- gcc/cgraph.h
+++ gcc/cgraph.h
@@ -2368,7 +2368,7 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
-tree clone_function_name_1 (const char *, const char *);
+tree clone_function_name_1 (tree identifier, const char *);
 tree clone_function_name (tree decl, const char *);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 6e84a31..fb81fbd 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -512,14 +512,15 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   return new_node;
 }
 
-static GTY(()) unsigned int clone_fn_id_num;
+static GTY(()) hash_map<const char *, unsigned> *clone_fn_ids;
 
-/* Return a new assembler name for a clone with SUFFIX of a decl named
-   NAME.  */
+/* Return a new assembler name for a numbered clone with SUFFIX of a
+   decl named IDENTIFIER.  */
 
 tree
-clone_function_name_1 (const char *name, const char *suffix)
+clone_function_name_1 (tree identifier, const char *suffix)
 {
+  const char *name = IDENTIFIER_POINTER (identifier);
   size_t len = strlen (name);
   char *tmp_name, *prefix;
 
@@ -527,7 +528,12 @@ clone_function_name_1 (const char *name, const char *suffix)
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
+  /* Initialize the per-function counter hash table on first call */
+  if (!clone_fn_ids)
+    clone_fn_ids = hash_map<const char *, unsigned>::create_ggc (64);
+  unsigned int &suffix_counter = clone_fn_ids->get_or_insert(name);
+  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, suffix_counter);
+  suffix_counter++;
   return get_identifier (tmp_name);
 }
 
@@ -536,8 +542,7 @@ clone_function_name_1 (const char *name, const char *suffix)
 tree
 clone_function_name (tree decl, const char *suffix)
 {
-  tree name = DECL_ASSEMBLER_NAME (decl);
-  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+  return clone_function_name_1 (DECL_ASSEMBLER_NAME (decl), suffix);
 }
 
 
diff --git gcc/lto/lto-partition.c gcc/lto/lto-partition.c
index c7a5710..e3efa71 100644
--- gcc/lto/lto-partition.c
+++ gcc/lto/lto-partition.c
@@ -962,11 +962,12 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
   if (must_not_rename (node, name))
     return false;
 
-  name = maybe_rewrite_identifier (name);
+  tree identifier = get_identifier (maybe_rewrite_identifier (name));
   symtab->change_decl_assembler_name (decl,
-				      clone_function_name_1 (name,
+				      clone_function_name_1 (identifier,
 							     "lto_priv"));
 
+  name = IDENTIFIER_POINTER (identifier);
   if (node->lto_file_data)
     lto_record_renamed_decl (node->lto_file_data, name,
 			     IDENTIFIER_POINTER
diff --git gcc/testsuite/gcc.dg/independent-cloneids-1.c gcc/testsuite/gcc.dg/independent-cloneids-1.c
new file mode 100644
index 0000000..3203895
--- /dev/null
+++ gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone"  } */
+
+extern int printf (const char *, ...);
+
+static int __attribute__ ((noinline))
+foo (int arg)
+{
+  return 7 * arg;
+}
+
+static int __attribute__ ((noinline))
+bar (int arg)
+{
+  return arg * arg;
+}
+
+int
+baz (int arg)
+{
+  printf("%d\n", bar (3));
+  printf("%d\n", bar (4));
+  printf("%d\n", foo (5));
+  printf("%d\n", foo (6));
+  /* adding or removing the following call should not affect foo
+     function's clone numbering */
+  printf("%d\n", bar (7));
+  return foo (8);
+}
+
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]3:} } } */
+/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]4:} } } */
Martin Jambor Sept. 3, 2018, 10:01 a.m. UTC | #3
Hi,

On Fri, Aug 31 2018, Michael Ploujnikov wrote:
> I've done some more digging into the current uses of
> numbered_clone_function_name and checked if any tests fail if I change
> it to suffixed_function_name:
>
>   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
>     - no new tests fail, inconclusive
>     - and despite the comment on redirect_callee_duplicating_thunks
>       about "one or more" duplicates it doesn't seem like
>       duplicate_thunk_for_node would be called more than once for each
>       node, assuming each node is named uniquely, but I'm far from an
>       expert in this area

The comment means that if there is a chain of thunks, the method clones
all of them.  Nevertheless, you need name numbering here for the same
reason why you need them for constprop.


>   - gcc/omp-expand.c:  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
>     - no new tests fail, inconclusive
>     - I didn't see (and couldn't figure out a way to get) any of the
>       existing omp/acc tests actually exercise this codeptah

I guess this one should not need it.  Build with
--enable-offload-targets=hsa and run gomp.exp to try yourself.  I can
run run-time HSA tests for you if you want.

Martin
Richard Biener Sept. 3, 2018, 11:08 a.m. UTC | #4
On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor <mjambor@suse.cz> wrote:
>
> Hi,
>
> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
> > I've done some more digging into the current uses of
> > numbered_clone_function_name and checked if any tests fail if I change
> > it to suffixed_function_name:
> >
> >   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
> >     - no new tests fail, inconclusive
> >     - and despite the comment on redirect_callee_duplicating_thunks
> >       about "one or more" duplicates it doesn't seem like
> >       duplicate_thunk_for_node would be called more than once for each
> >       node, assuming each node is named uniquely, but I'm far from an
> >       expert in this area
>
> The comment means that if there is a chain of thunks, the method clones
> all of them.  Nevertheless, you need name numbering here for the same
> reason why you need them for constprop.

The remaining question I had with the patch was if maybe all callers
can handle assigning
the numbering themselves, thus do sth like

   for-each-clone-of (i, fn)
      DECL_NAME (...) = numbered_clone_function_name (...,
"artificial_thunk", i);

which would make the map of name -> number unnecessary.

>
> >   - gcc/omp-expand.c:  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
> >     - no new tests fail, inconclusive
> >     - I didn't see (and couldn't figure out a way to get) any of the
> >       existing omp/acc tests actually exercise this codeptah
>
> I guess this one should not need it.  Build with
> --enable-offload-targets=hsa and run gomp.exp to try yourself.  I can
> run run-time HSA tests for you if you want.
>
> Martin
Martin Jambor Sept. 3, 2018, 1:15 p.m. UTC | #5
Hi,

On Mon, Sep 03 2018, Richard Biener wrote:
> On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor <mjambor@suse.cz> wrote:
>>
>> Hi,
>>
>> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
>> > I've done some more digging into the current uses of
>> > numbered_clone_function_name and checked if any tests fail if I change
>> > it to suffixed_function_name:
>> >
>> >   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
>> >     - no new tests fail, inconclusive
>> >     - and despite the comment on redirect_callee_duplicating_thunks
>> >       about "one or more" duplicates it doesn't seem like
>> >       duplicate_thunk_for_node would be called more than once for each
>> >       node, assuming each node is named uniquely, but I'm far from an
>> >       expert in this area
>>
>> The comment means that if there is a chain of thunks, the method clones
>> all of them.  Nevertheless, you need name numbering here for the same
>> reason why you need them for constprop.
>
> The remaining question I had with the patch was if maybe all callers
> can handle assigning
> the numbering themselves, thus do sth like
>
>    for-each-clone-of (i, fn)
>       DECL_NAME (...) = numbered_clone_function_name (...,
> "artificial_thunk", i);
>
> which would make the map of name -> number unnecessary.
>

That is what I would prefer too but it also has downsides.  Callers
would have to do their book keeping and the various cloning interfaces
would get another parameter when already they have quite a few
(introducing some cloning context classes seems like an overkill to me
:-).

If we want to get rid of .constprop discrepancies, something like the
following (only very mildly tested) patch should be enough (note that
IPA-CP currently does not really need the new has because it creates
clones in only one pass through the call graph, but that is something
likely to change).  We could then adjust other cloners that we care
about.

However, if clones of thunks are one of them, then the optional
parameter will also proliferate to cgraph_node::create_clone(),
cgraph_edge::redirect_callee_duplicating_thunks() and
duplicate_thunk_for_node().  create_version_clone_with_body would almost
certainly need it because of IPA-SRA too (IPA-SRA can of course always
pass zero).

Martin



diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 2b00f0165fa..703c3cfea7b 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -964,11 +964,15 @@ public:
 			     cgraph_node *new_inlined_to,
 			     bitmap args_to_skip, const char *suffix = NULL);
 
-  /* Create callgraph node clone with new declaration.  The actual body will
-     be copied later at compilation stage.  */
+  /* Create callgraph node clone with new declaration.  The actual body will be
+     copied later at compilation stage.  The name of the new clone will be
+     constructed from the name of the original name, SUFFIX and a number which
+     can either be NUM_SUFFIX if non-negative or a unique incremented integer
+     otherwise.  */
   cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers,
 				     vec<ipa_replace_map *, va_gc> *tree_map,
-				     bitmap args_to_skip, const char * suffix);
+				     bitmap args_to_skip, const char * suffix,
+				     int num_suffix = -1);
 
   /* cgraph node being removed from symbol table; see if its entry can be
    replaced by other inline clone.  */
@@ -2376,8 +2380,9 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
 tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
 /* In cgraphclones.c  */
 
-tree clone_function_name_1 (const char *, const char *);
-tree clone_function_name (tree decl, const char *);
+tree clone_function_name_1 (const char *name, const char *suffix,
+			    int num_suffix = -1);
+tree clone_function_name (tree decl, const char *suffix, int num_suffix = -1);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
 			       bool, bitmap, bool, bitmap, basic_block);
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 0c0a94b04a3..865c3fa1ad0 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -513,11 +513,13 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
 
 static GTY(()) unsigned int clone_fn_id_num;
 
-/* Return a new assembler name for a clone with SUFFIX of a decl named
-   NAME.  */
+/* Return a new assembler name for a clone with SUFFIX of a decl named NAME
+   Apart from, string SUFFIX, the new name will end with either NUM_SIFFIX, if
+   non-negative, or a unique unspecified number otherwise.  .  */
 
 tree
-clone_function_name_1 (const char *name, const char *suffix)
+clone_function_name_1 (const char *name, const char *suffix,
+		       int num_suffix)
 {
   size_t len = strlen (name);
   char *tmp_name, *prefix;
@@ -526,22 +528,32 @@ clone_function_name_1 (const char *name, const char *suffix)
   memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
   prefix[len] = symbol_table::symbol_suffix_separator ();
-  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
+  unsigned int num;
+  if (num_suffix < 0)
+    num = clone_fn_id_num++;
+  else
+    num = (unsigned) num_suffix;
+  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, num);
   return get_identifier (tmp_name);
 }
 
-/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+/* Return a new assembler name for a clone of DECL with SUFFIX. Apart from,
+   string SUFFIX, the new name will end with either NUM_SIFFIX, if
+   non-negative, or a unique unspecified number otherwise.  */
 
 tree
-clone_function_name (tree decl, const char *suffix)
+clone_function_name (tree decl, const char *suffix, int num_suffix)
 {
   tree name = DECL_ASSEMBLER_NAME (decl);
-  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix, num_suffix);
 }
 
 
-/* Create callgraph node clone with new declaration.  The actual body will
-   be copied later at compilation stage.
+/* Create callgraph node clone with new declaration.  The actual body will be
+   copied later at compilation stage.  The name of the new clone will be
+   constructed from the name of the original name, SUFFIX and a number which
+   can either be NUM_SUFFIX if non-negative or a unique incremented integer
+   otherwise.
 
    TODO: after merging in ipa-sra use function call notes instead of args_to_skip
    bitmap interface.
@@ -549,7 +561,8 @@ clone_function_name (tree decl, const char *suffix)
 cgraph_node *
 cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
 				   vec<ipa_replace_map *, va_gc> *tree_map,
-				   bitmap args_to_skip, const char * suffix)
+				   bitmap args_to_skip, const char * suffix,
+				   int num_suffix)
 {
   tree old_decl = decl;
   cgraph_node *new_node = NULL;
@@ -584,7 +597,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   strcpy (name + len + 1, suffix);
   name[len] = '.';
   DECL_NAME (new_decl) = get_identifier (name);
-  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix));
+  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix,
+							  num_suffix));
   SET_DECL_RTL (new_decl, NULL);
 
   new_node = create_clone (new_decl, count, false,
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index afc45969558..52690996419 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -376,6 +376,9 @@ static profile_count max_count;
 
 static long overall_size, max_new_size;
 
+/* Hash to give different UID suffixes */
+static hash_map<cgraph_node *, int> *clone_num_suffixes;
+
 /* Return the param lattices structure corresponding to the Ith formal
    parameter of the function described by INFO.  */
 static inline struct ipcp_param_lattices *
@@ -3850,8 +3853,11 @@ create_specialized_node (struct cgraph_node *node,
 	}
     }
 
+  int &suffix_counter = clone_num_suffixes->get_or_insert(node);
   new_node = node->create_virtual_clone (callers, replace_trees,
-					 args_to_skip, "constprop");
+					 args_to_skip, "constprop",
+					 suffix_counter);
+  suffix_counter++;
 
   bool have_self_recursive_calls = !self_recursive_calls.is_empty ();
   for (unsigned j = 0; j < self_recursive_calls.length (); j++)
@@ -5065,6 +5071,7 @@ ipcp_driver (void)
 
   ipa_check_create_node_params ();
   ipa_check_create_edge_args ();
+  clone_num_suffixes = new hash_map<cgraph_node *, int>;
 
   if (dump_file)
     {
@@ -5086,6 +5093,7 @@ ipcp_driver (void)
   ipcp_store_vr_results ();
 
   /* Free all IPCP structures.  */
+  delete clone_num_suffixes;
   free_toporder_info (&topo);
   delete edge_clone_summaries;
   edge_clone_summaries = NULL;
Michael Ploujnikov Sept. 5, 2018, 3:23 a.m. UTC | #6
Hi Martin, Richard,

Thanks for your responses.

On 2018-09-03 09:15 AM, Martin Jambor wrote:
> Hi,
> 
> On Mon, Sep 03 2018, Richard Biener wrote:
>> On Mon, Sep 3, 2018 at 12:02 PM Martin Jambor <mjambor@suse.cz> wrote:
>>>
>>> Hi,
>>>
>>> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
>>>> I've done some more digging into the current uses of
>>>> numbered_clone_function_name and checked if any tests fail if I change
>>>> it to suffixed_function_name:
>>>>
>>>>   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
>>>>     - no new tests fail, inconclusive
>>>>     - and despite the comment on redirect_callee_duplicating_thunks
>>>>       about "one or more" duplicates it doesn't seem like
>>>>       duplicate_thunk_for_node would be called more than once for each
>>>>       node, assuming each node is named uniquely, but I'm far from an
>>>>       expert in this area
>>>
>>> The comment means that if there is a chain of thunks, the method clones
>>> all of them.  Nevertheless, you need name numbering here for the same
>>> reason why you need them for constprop.
>>
>> The remaining question I had with the patch was if maybe all callers
>> can handle assigning
>> the numbering themselves, thus do sth like
>>
>>    for-each-clone-of (i, fn)
>>       DECL_NAME (...) = numbered_clone_function_name (...,
>> "artificial_thunk", i);
>>
>> which would make the map of name -> number unnecessary.

Please see my comment below.

>>
> 
> That is what I would prefer too but it also has downsides.  Callers
> would have to do their book keeping and the various cloning interfaces
> would get another parameter when already they have quite a few
> (introducing some cloning context classes seems like an overkill to me
> :-).

I'm fine with doing it the way you guys are suggesting, but just to
explain my original thinking: The way I see it is that there will
always be some kind of name -> number mapping, even if it's managed by
each individual user and even if it's actually cgraph_node ->
number. Needless to say I'm far from an expert in all of the involved
areas (this is my first contribution to GCC) so the most effective
approach I could come up is doing it all in one place. Now with your
expert advice and as I get a better understanding of how things like
sra and other clones are created, I'm starting to see that a more
tailored approach is possible and probably the way it should have been
done in the first place.

> 
> If we want to get rid of .constprop discrepancies,

Could you please clarify to me what exactly you mean by the
".constprop discrepancies"?

> something like the
> following (only very mildly tested) patch should be enough (note that
> IPA-CP currently does not really need the new has because it creates

What do you mean "it doesn't need the new hash"? My
independent-cloneids-1.c test shows that a function can have more than
one .constprop clone, but I'm probably just not understanding
something.

> clones in only one pass through the call graph, but that is something
> likely to change).  We could then adjust other cloners that we care
> about.
> 
> However, if clones of thunks are one of them, then the optional
> parameter will also proliferate to cgraph_node::create_clone(),
> cgraph_edge::redirect_callee_duplicating_thunks() and
> duplicate_thunk_for_node().  create_version_clone_with_body would almost
> certainly need it because of IPA-SRA too (IPA-SRA can of course always
> pass zero).

I'll try to do this and to convert the other users in the next version
of the patch, but I might need some help with understanding how some
of those passes work!

> 
> Martin
> 
> 
> 
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 2b00f0165fa..703c3cfea7b 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -964,11 +964,15 @@ public:
>  			     cgraph_node *new_inlined_to,
>  			     bitmap args_to_skip, const char *suffix = NULL);
>  
> -  /* Create callgraph node clone with new declaration.  The actual body will
> -     be copied later at compilation stage.  */
> +  /* Create callgraph node clone with new declaration.  The actual body will be
> +     copied later at compilation stage.  The name of the new clone will be
> +     constructed from the name of the original name, SUFFIX and a number which
> +     can either be NUM_SUFFIX if non-negative or a unique incremented integer
> +     otherwise.  */
>    cgraph_node *create_virtual_clone (vec<cgraph_edge *> redirect_callers,
>  				     vec<ipa_replace_map *, va_gc> *tree_map,
> -				     bitmap args_to_skip, const char * suffix);
> +				     bitmap args_to_skip, const char * suffix,
> +				     int num_suffix = -1);
>  
>    /* cgraph node being removed from symbol table; see if its entry can be
>     replaced by other inline clone.  */
> @@ -2376,8 +2380,9 @@ basic_block init_lowered_empty_function (tree, bool, profile_count);
>  tree thunk_adjust (gimple_stmt_iterator *, tree, bool, HOST_WIDE_INT, tree);
>  /* In cgraphclones.c  */
>  
> -tree clone_function_name_1 (const char *, const char *);
> -tree clone_function_name (tree decl, const char *);
> +tree clone_function_name_1 (const char *name, const char *suffix,
> +			    int num_suffix = -1);
> +tree clone_function_name (tree decl, const char *suffix, int num_suffix = -1);

I can see how using the default -1 helped you keep this patch very
small and simple, but for my version of the patch I would actually
prefer to avoid the implicit fallback to the original global counter
and instead make the patch as big as necessary to convert all the
users who need it. The motivation is that implicit behaviour can
sometimes be missed, which I believe is related to why some existing
calls to clone_function_name are unnecessary.

On this note, what do you think about my idea of renaming the function
to numbered_clone_function_name and adding a suffixed_function_name to
make a distinction for users who don't care about numbering?

>  
>  void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
>  			       bool, bitmap, bool, bitmap, basic_block);
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index 0c0a94b04a3..865c3fa1ad0 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -513,11 +513,13 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>  
>  static GTY(()) unsigned int clone_fn_id_num;
>  
> -/* Return a new assembler name for a clone with SUFFIX of a decl named
> -   NAME.  */
> +/* Return a new assembler name for a clone with SUFFIX of a decl named NAME
> +   Apart from, string SUFFIX, the new name will end with either NUM_SIFFIX, if
> +   non-negative, or a unique unspecified number otherwise.  .  */
>  
>  tree
> -clone_function_name_1 (const char *name, const char *suffix)
> +clone_function_name_1 (const char *name, const char *suffix,
> +		       int num_suffix)
>  {
>    size_t len = strlen (name);
>    char *tmp_name, *prefix;
> @@ -526,22 +528,32 @@ clone_function_name_1 (const char *name, const char *suffix)
>    memcpy (prefix, name, len);
>    strcpy (prefix + len + 1, suffix);
>    prefix[len] = symbol_table::symbol_suffix_separator ();
> -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> +  unsigned int num;
> +  if (num_suffix < 0)
> +    num = clone_fn_id_num++;
> +  else
> +    num = (unsigned) num_suffix;
> +  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, num);
>    return get_identifier (tmp_name);
>  }
>  
> -/* Return a new assembler name for a clone of DECL with SUFFIX.  */
> +/* Return a new assembler name for a clone of DECL with SUFFIX. Apart from,
> +   string SUFFIX, the new name will end with either NUM_SIFFIX, if
> +   non-negative, or a unique unspecified number otherwise.  */
>  
>  tree
> -clone_function_name (tree decl, const char *suffix)
> +clone_function_name (tree decl, const char *suffix, int num_suffix)
>  {
>    tree name = DECL_ASSEMBLER_NAME (decl);
> -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
> +  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix, num_suffix);
>  }
>  
>  
> -/* Create callgraph node clone with new declaration.  The actual body will
> -   be copied later at compilation stage.
> +/* Create callgraph node clone with new declaration.  The actual body will be
> +   copied later at compilation stage.  The name of the new clone will be
> +   constructed from the name of the original name, SUFFIX and a number which
> +   can either be NUM_SUFFIX if non-negative or a unique incremented integer
> +   otherwise.
>  
>     TODO: after merging in ipa-sra use function call notes instead of args_to_skip
>     bitmap interface.
> @@ -549,7 +561,8 @@ clone_function_name (tree decl, const char *suffix)
>  cgraph_node *
>  cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
>  				   vec<ipa_replace_map *, va_gc> *tree_map,
> -				   bitmap args_to_skip, const char * suffix)
> +				   bitmap args_to_skip, const char * suffix,
> +				   int num_suffix)
>  {
>    tree old_decl = decl;
>    cgraph_node *new_node = NULL;
> @@ -584,7 +597,8 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
>    strcpy (name + len + 1, suffix);
>    name[len] = '.';
>    DECL_NAME (new_decl) = get_identifier (name);
> -  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix));
> +  SET_DECL_ASSEMBLER_NAME (new_decl, clone_function_name (old_decl, suffix,
> +							  num_suffix));
>    SET_DECL_RTL (new_decl, NULL);
>  
>    new_node = create_clone (new_decl, count, false,
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index afc45969558..52690996419 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -376,6 +376,9 @@ static profile_count max_count;
>  
>  static long overall_size, max_new_size;
>  
> +/* Hash to give different UID suffixes */
> +static hash_map<cgraph_node *, int> *clone_num_suffixes;
> +
>  /* Return the param lattices structure corresponding to the Ith formal
>     parameter of the function described by INFO.  */
>  static inline struct ipcp_param_lattices *
> @@ -3850,8 +3853,11 @@ create_specialized_node (struct cgraph_node *node,
>  	}
>      }
>  
> +  int &suffix_counter = clone_num_suffixes->get_or_insert(node);
>    new_node = node->create_virtual_clone (callers, replace_trees,
> -					 args_to_skip, "constprop");
> +					 args_to_skip, "constprop",
> +					 suffix_counter);
> +  suffix_counter++;
>  
>    bool have_self_recursive_calls = !self_recursive_calls.is_empty ();
>    for (unsigned j = 0; j < self_recursive_calls.length (); j++)
> @@ -5065,6 +5071,7 @@ ipcp_driver (void)
>  
>    ipa_check_create_node_params ();
>    ipa_check_create_edge_args ();
> +  clone_num_suffixes = new hash_map<cgraph_node *, int>;
>  
>    if (dump_file)
>      {
> @@ -5086,6 +5093,7 @@ ipcp_driver (void)
>    ipcp_store_vr_results ();
>  
>    /* Free all IPCP structures.  */
> +  delete clone_num_suffixes;

I like how this allows the reduction of the lifetime of the mapping!
Richard: that helps with your concerns about the extra memory usage,
right?


>    free_toporder_info (&topo);
>    delete edge_clone_summaries;
>    edge_clone_summaries = NULL;
> 


Thanks again,
- Michael
Michael Ploujnikov Sept. 5, 2018, 3:32 a.m. UTC | #7
Hi Martin,

On 2018-09-03 06:01 AM, Martin Jambor wrote:
> Hi,
> 
> On Fri, Aug 31 2018, Michael Ploujnikov wrote:
>> I've done some more digging into the current uses of
>> numbered_clone_function_name and checked if any tests fail if I change
>> it to suffixed_function_name:
>>
>>   - gcc/cgraphclones.c:  DECL_NAME (new_decl) = numbered_clone_function_name (thunk->decl, "artificial_thunk");
>>     - no new tests fail, inconclusive
>>     - and despite the comment on redirect_callee_duplicating_thunks
>>       about "one or more" duplicates it doesn't seem like
>>       duplicate_thunk_for_node would be called more than once for each
>>       node, assuming each node is named uniquely, but I'm far from an
>>       expert in this area
> 
> The comment means that if there is a chain of thunks, the method clones
> all of them.  Nevertheless, you need name numbering here for the same
> reason why you need them for constprop.
> 
> 
>>   - gcc/omp-expand.c:  DECL_NAME (kern_fndecl) = numbered_clone_function_name (kern_fndecl, "kernel");
>>     - no new tests fail, inconclusive
>>     - I didn't see (and couldn't figure out a way to get) any of the
>>       existing omp/acc tests actually exercise this codeptah
> 
> I guess this one should not need it.  Build with
> --enable-offload-targets=hsa and run gomp.exp to try yourself.  I can
> run run-time HSA tests for you if you want.
> 
> Martin
> 

I've tried building with numbered_clone_function_name replaced by
suffixed_function_name and with --enable-offload-targets=hsa and
didn't see any errors in gomp.exp. I don't have a readily available
HSA setup so if you could do a quick test, I would really appreciate
it!

- Michael
Martin Jambor Dec. 4, 2018, 12:48 p.m. UTC | #8
Hi,

On Tue, Sep 04 2018, Michael Ploujnikov wrote:
>
> I've tried building with numbered_clone_function_name replaced by
> suffixed_function_name and with --enable-offload-targets=hsa and
> didn't see any errors in gomp.exp. I don't have a readily available
> HSA setup so if you could do a quick test, I would really appreciate
> it!

Sorry it took so long, I have run regular tests on trunk today and your
patch does not appear to have caused any issues.

Martin
Michael Ploujnikov Dec. 4, 2018, 1:22 p.m. UTC | #9
On 2018-12-04 7:48 a.m., Martin Jambor wrote:
> Hi,
> 
> On Tue, Sep 04 2018, Michael Ploujnikov wrote:
>>
>> I've tried building with numbered_clone_function_name replaced by
>> suffixed_function_name and with --enable-offload-targets=hsa and
>> didn't see any errors in gomp.exp. I don't have a readily available
>> HSA setup so if you could do a quick test, I would really appreciate
>> it!
> 
> Sorry it took so long, I have run regular tests on trunk today and your
> patch does not appear to have caused any issues.
> 
> Martin
> 

Thank you for checking.

- Michael
diff mbox series

Patch

From d96bedcf51c289f3c80a12a681a7831d9e287874 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov <michael.ploujnikov@oracle.com>
Date: Mon, 13 Aug 2018 13:59:46 -0400
Subject: [PATCH 4/4] Use suffixed_function_name in
 cgraph_node::create_virtual_clone.

gcc:
2018-08-02  Michael Ploujnikov  <michael.ploujnikov@oracle.com>

       * cgraphclones.c (cgraph_node::create_virtual_clone): Use
         suffixed_function_name.
---
 gcc/cgraphclones.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git gcc/cgraphclones.c gcc/cgraphclones.c
index 784f1df..876b7fa 100644
--- gcc/cgraphclones.c
+++ gcc/cgraphclones.c
@@ -576,9 +576,8 @@  cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   tree old_decl = decl;
   cgraph_node *new_node = NULL;
   tree new_decl;
-  size_t len, i;
+  size_t i;
   ipa_replace_map *map;
-  char *name;
 
   gcc_checking_assert (local.versionable);
   gcc_assert (local.can_change_signature || !args_to_skip);
@@ -600,12 +599,7 @@  cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
      sometimes storing only clone decl instead of original.  */
 
   /* Generate a new name for the new version. */
-  len = IDENTIFIER_LENGTH (DECL_NAME (old_decl));
-  name = XALLOCAVEC (char, len + strlen (suffix) + 2);
-  memcpy (name, IDENTIFIER_POINTER (DECL_NAME (old_decl)), len);
-  strcpy (name + len + 1, suffix);
-  name[len] = '.';
-  DECL_NAME (new_decl) = get_identifier (name);
+  DECL_NAME (new_decl) = suffixed_function_name (old_decl, suffix);
   SET_DECL_ASSEMBLER_NAME (new_decl, numbered_clone_function_name (old_decl, suffix));
   SET_DECL_RTL (new_decl, NULL);
 
-- 
2.7.4