diff mbox series

c++: Retry the aliasing of base/complete cdtor optimization at import_export_decl time [PR113208]

Message ID Zh99dzErl4LUr2AC@tucnak
State New
Headers show
Series c++: Retry the aliasing of base/complete cdtor optimization at import_export_decl time [PR113208] | expand

Commit Message

Jakub Jelinek April 17, 2024, 7:42 a.m. UTC
Hi!

When expand_or_defer_fn is called at_eof time, it calls import_export_decl
and then maybe_clone_body, which uses DECL_ONE_ONLY and comdat name in a
couple of places to try to optimize cdtors which are known to have the
same body by making the complete cdtor an alias to base cdtor (and in
that case also uses *[CD]5* as comdat group name instead of the normal
comdat group names specific to each mangled name).
Now, this optimization depends on DECL_ONE_ONLY and DECL_INTERFACE_KNOWN,
maybe_clone_body and can_alias_cdtor use:
      if (DECL_ONE_ONLY (fn))
        cgraph_node::get_create (clone)->set_comdat_group (cxx_comdat_group (clone));
...
  bool can_alias = can_alias_cdtor (fn);
...
      /* Tell cgraph if both ctors or both dtors are known to have
         the same body.  */
      if (can_alias
          && fns[0]
          && idx == 1
          && cgraph_node::get_create (fns[0])->create_same_body_alias
               (clone, fns[0]))
        {
          alias = true;
          if (DECL_ONE_ONLY (fns[0]))
            {
              /* For comdat base and complete cdtors put them
                 into the same, *[CD]5* comdat group instead of
                 *[CD][12]*.  */
              comdat_group = cdtor_comdat_group (fns[1], fns[0]);
              cgraph_node::get_create (fns[0])->set_comdat_group (comdat_group);
              if (symtab_node::get (clone)->same_comdat_group)
                symtab_node::get (clone)->remove_from_same_comdat_group ();
              symtab_node::get (clone)->add_to_same_comdat_group
                (symtab_node::get (fns[0]));
            }
        }
and
  /* Don't use aliases for weak/linkonce definitions unless we can put both
     symbols in the same COMDAT group.  */
  return (DECL_INTERFACE_KNOWN (fn)
          && (SUPPORTS_ONE_ONLY || !DECL_WEAK (fn))
          && (!DECL_ONE_ONLY (fn)
              || (HAVE_COMDAT_GROUP && DECL_WEAK (fn))));
The following testcase regressed with Marek's r14-5979 change,
when pr113208_0.C is compiled where the ctor is marked constexpr,
we no longer perform this optimization, where
_ZN6vectorI12QualityValueEC2ERKS1_ was emitted in the
_ZN6vectorI12QualityValueEC5ERKS1_ comdat group and
_ZN6vectorI12QualityValueEC1ERKS1_ was made an alias to it,
instead we emit _ZN6vectorI12QualityValueEC2ERKS1_ in
_ZN6vectorI12QualityValueEC2ERKS1_ comdat group and the same
content _ZN6vectorI12QualityValueEC1ERKS1_ as separate symbol in
_ZN6vectorI12QualityValueEC1ERKS1_ comdat group.
Now, the linker seems to somehow cope with that, eventhough it
probably keeps both copies of the ctor, but seems LTO can't cope
with that and Honza doesn't know what it should do in that case
(linker decides that the prevailing symbol is
_ZN6vectorI12QualityValueEC2ERKS1_ (from the
_ZN6vectorI12QualityValueEC2ERKS1_ comdat group) and
_ZN6vectorI12QualityValueEC1ERKS1_ alias (from the other TU,
from _ZN6vectorI12QualityValueEC5ERKS1_ comdat group)).

Note, the case where some constructor is marked constexpr in one
TU and not in another one happens pretty often in libstdc++ when
one mixes -std= flags used to compile different compilation units.

The reason the optimization doesn't trigger when the constructor is
constexpr is that expand_or_defer_fn is called in that case much earlier
than when it is not constexpr; in the former case it is called when we
try to constant evaluate that constructor.  But DECL_INTERFACE_KNOWN
is false in that case and comdat_linkage hasn't been called either
(I think it is desirable, because comdat group is stored in the cgraph
node and am not sure it is a good idea to create cgraph nodes for
something that might not be needed later on at all), so maybe_clone_body
clones the bodies, but doesn't make them as aliases.

The following patch is an attempt to redo this optimization when
import_export_decl is called at_eof time on the base/complete cdtor
(or deleting dtor).  It will not do anything if maybe_clone_body
hasn't been called uyet (the TREE_ASM_WRITTEN check on the
DECL_MAYBE_IN_CHARGE_CDTOR_P), or when one or both of the base/complete
cdtors have been lowered already, or when maybe_clone_body called
maybe_thunk_body and it was successful.  Otherwise retries the
can_alias_cdtor check and makes the complete cdtor alias to the
base cdtor with adjustments to the comdat group.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-04-17  Jakub Jelinek  <jakub@redhat.com>

	PR lto/113208
	* cp-tree.h (maybe_optimize_cdtor): Declare.
	* decl2.cc (import_export_decl): Call it for cloned cdtors.
	* optimize.cc (maybe_optimize_cdtor): New function.

	* g++.dg/lto/pr113208_0.C: New test.
	* g++.dg/lto/pr113208_1.C: New file.
	* g++.dg/lto/pr113208.h: New file.


	Jakub

Comments

Jan Hubicka April 17, 2024, 9:04 a.m. UTC | #1
> Hi!
Hello,
> The reason the optimization doesn't trigger when the constructor is
> constexpr is that expand_or_defer_fn is called in that case much earlier
> than when it is not constexpr; in the former case it is called when we
> try to constant evaluate that constructor.  But DECL_INTERFACE_KNOWN
> is false in that case and comdat_linkage hasn't been called either
> (I think it is desirable, because comdat group is stored in the cgraph
> node and am not sure it is a good idea to create cgraph nodes for
> something that might not be needed later on at all), so maybe_clone_body
> clones the bodies, but doesn't make them as aliases.

Thanks for working this out! Creating cgraph node with no body is
harmless as it will be removed later.  
> 
> The following patch is an attempt to redo this optimization when
> import_export_decl is called at_eof time on the base/complete cdtor
> (or deleting dtor).  It will not do anything if maybe_clone_body
> hasn't been called uyet (the TREE_ASM_WRITTEN check on the
> DECL_MAYBE_IN_CHARGE_CDTOR_P), or when one or both of the base/complete
> cdtors have been lowered already, or when maybe_clone_body called
> maybe_thunk_body and it was successful.  Otherwise retries the
> can_alias_cdtor check and makes the complete cdtor alias to the
> base cdtor with adjustments to the comdat group.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-04-17  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR lto/113208
> 	* cp-tree.h (maybe_optimize_cdtor): Declare.
> 	* decl2.cc (import_export_decl): Call it for cloned cdtors.
> 	* optimize.cc (maybe_optimize_cdtor): New function.
> 
> 	* g++.dg/lto/pr113208_0.C: New test.
> 	* g++.dg/lto/pr113208_1.C: New file.
> 	* g++.dg/lto/pr113208.h: New file.
> 
> --- gcc/cp/cp-tree.h.jj	2024-04-16 17:18:37.286279533 +0200
> +++ gcc/cp/cp-tree.h	2024-04-16 17:45:01.685635709 +0200
> @@ -7447,6 +7447,7 @@ extern bool handle_module_option (unsign
>  /* In optimize.cc */
>  extern tree clone_attrs				(tree);
>  extern bool maybe_clone_body			(tree);
> +extern void maybe_optimize_cdtor		(tree);
>  
>  /* In parser.cc */
>  extern tree cp_convert_range_for (tree, tree, tree, cp_decomp *, bool,
> --- gcc/cp/decl2.cc.jj	2024-04-16 17:18:37.287279519 +0200
> +++ gcc/cp/decl2.cc	2024-04-16 17:45:01.686635695 +0200
> @@ -3568,6 +3568,9 @@ import_export_decl (tree decl)
>      }
>  
>    DECL_INTERFACE_KNOWN (decl) = 1;
> +
> +  if (DECL_CLONED_FUNCTION_P (decl))
> +    maybe_optimize_cdtor (decl);
>  }
>  
>  /* Return an expression that performs the destruction of DECL, which
> --- gcc/cp/optimize.cc.jj	2024-04-16 17:18:37.374278327 +0200
> +++ gcc/cp/optimize.cc	2024-04-16 21:35:53.597188774 +0200
> @@ -723,3 +723,58 @@ maybe_clone_body (tree fn)
>    /* We don't need to process the original function any further.  */
>    return 1;
>  }
> +
> +/* If maybe_clone_body is called while the cdtor is still tentative,
> +   DECL_ONE_ONLY will be false and so will be can_alias_cdtor (fn).
> +   In that case we wouldn't try to optimize using an alias and instead
> +   would emit separate base and complete cdtor.  The following function
> +   attempts to still optimize that case when we import_export_decl
> +   is called first time on one of the clones.  */
> +
> +void
> +maybe_optimize_cdtor (tree orig_decl)
> +{
> +  tree fns[3];
> +  tree fn = DECL_CLONED_FUNCTION (orig_decl);
> +  gcc_checking_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (fn));
> +  if (DECL_INTERFACE_KNOWN (fn)
> +      || !TREE_ASM_WRITTEN (fn)
> +      || !DECL_ONE_ONLY (orig_decl)
> +      || symtab->global_info_ready)
> +    return;
> +
> +  populate_clone_array (fn, fns);
> +
> +  if (!fns[0] || !fns[1])
> +    return;
> +  for (int i = 2 - !fns[2]; i >= 0; --i)
> +    if (fns[i] != orig_decl && DECL_INTERFACE_KNOWN (fns[i]))
> +      return;
> +  DECL_INTERFACE_KNOWN (fn) = 1;
> +  comdat_linkage (fn);
> +  if (!can_alias_cdtor (fn))
> +    return;
> +  /* For comdat base and complete cdtors put them into the same,
> +     *[CD]5* comdat group instead of *[CD][12]*.  */
> +  auto n0 = cgraph_node::get_create (fns[0]);
> +  auto n1 = cgraph_node::get_create (fns[1]);
> +  auto n2 = fns[2] ? cgraph_node::get_create (fns[1]) : NULL;
> +  if (n0->lowered || n1->lowered || (n2 && n2->lowered))
> +    return;
> +  import_export_decl (fns[0]);
> +  n1->definition = false;
> +  if (!n0->create_same_body_alias (fns[1], fns[0]))
> +    return;
> +
> +  tree comdat_group = cdtor_comdat_group (fns[1], fns[0]);
> +  n1 = cgraph_node::get (fns[1]);
> +  n0->set_comdat_group (comdat_group);
> +  if (n1->same_comdat_group)
> +    n1->remove_from_same_comdat_group ();
> +  n1->add_to_same_comdat_group (n0);
> +  if (fns[2])
> +    n2->add_to_same_comdat_group (n0);
> +  import_export_decl (fns[1]);

So this is handling the case where symbol was already inserted into one
comdat group and later we need to move it into the C5 group?  As long as
we move everythingf rom old comdat group, this should be fine.
> +  /* Remove the body now that it is an alias.  */
> +  DECL_SAVED_TREE (fns[1]) = NULL_TREE;
Maybe using release_function_body since it also knows how to remove
DECL_STRUCT_FUNCTION that exists at this stage?

I was thinking how to solve the problem on cgrpah side.  We definitely
have long lasting bug where aliases are handled incorrectly for which I
made WIP patch (but probably would like to hold it after release branch is
created).  When foo has alias bar and foo is praviled then the alias
target is prevailed too.  This is what causes the ICE about cross comdat
section alias.  However fixing this is not enough as I do not think we
can handle incremental linking correctly (as discussed briefly on IRC
technically we should keep both sections but that would require two
symbols of same name in single .o file).

With the aliasing fixed we turn the other symbol into static but keep
alias, so we end up with one comdat group having the non-aliased
constructor and second comdat group (C5) exporting only the alias, which
is not quite reight either.

Honza
Jakub Jelinek April 17, 2024, 12:32 p.m. UTC | #2
On Wed, Apr 17, 2024 at 11:04:26AM +0200, Jan Hubicka wrote:
> > The reason the optimization doesn't trigger when the constructor is
> > constexpr is that expand_or_defer_fn is called in that case much earlier
> > than when it is not constexpr; in the former case it is called when we
> > try to constant evaluate that constructor.  But DECL_INTERFACE_KNOWN
> > is false in that case and comdat_linkage hasn't been called either
> > (I think it is desirable, because comdat group is stored in the cgraph
> > node and am not sure it is a good idea to create cgraph nodes for
> > something that might not be needed later on at all), so maybe_clone_body
> > clones the bodies, but doesn't make them as aliases.
> 
> Thanks for working this out! Creating cgraph node with no body is
> harmless as it will be removed later.  

That is actually for functions with bodies, maybe_clone_body creates the
bodies for those, but still when it happens early, the cdtors have
tentative_decl_linkage linkage, which in many cases means DECL_EXTERNAL,
DECL_NOT_REALLY_EXTERN (C++ FE special thing), DECL_DEFER_OUTPUT etc.

> > +  tree comdat_group = cdtor_comdat_group (fns[1], fns[0]);
> > +  n1 = cgraph_node::get (fns[1]);
> > +  n0->set_comdat_group (comdat_group);
> > +  if (n1->same_comdat_group)
> > +    n1->remove_from_same_comdat_group ();
> > +  n1->add_to_same_comdat_group (n0);
> > +  if (fns[2])
> > +    n2->add_to_same_comdat_group (n0);
> > +  import_export_decl (fns[1]);
> 
> So this is handling the case where symbol was already inserted into one
> comdat group and later we need to move it into the C5 group?  As long as
> we move everythingf rom old comdat group, this should be fine.

The above is pretty much an adjusted copy of what maybe_clone_body does,
except it doesn't call cgraph_node::get{,_create} all the time and uses
import_export_decl rather than expand_or_defer_fn{,_1}.

> > +  /* Remove the body now that it is an alias.  */
> > +  DECL_SAVED_TREE (fns[1]) = NULL_TREE;
> Maybe using release_function_body since it also knows how to remove
> DECL_STRUCT_FUNCTION that exists at this stage?

Guess I could try that, clearing of DECL_SAVED_TREE was what was done
in maybe_clone_body too.

> I was thinking how to solve the problem on cgrpah side.  We definitely
> have long lasting bug where aliases are handled incorrectly for which I
> made WIP patch (but probably would like to hold it after release branch is
> created).  When foo has alias bar and foo is praviled then the alias
> target is prevailed too.  This is what causes the ICE about cross comdat
> section alias.  However fixing this is not enough as I do not think we
> can handle incremental linking correctly (as discussed briefly on IRC
> technically we should keep both sections but that would require two
> symbols of same name in single .o file).
> 
> With the aliasing fixed we turn the other symbol into static but keep
> alias, so we end up with one comdat group having the non-aliased
> constructor and second comdat group (C5) exporting only the alias, which
> is not quite reight either.

I've tried to see what actually happens during linking without LTO, so compiled
pr113208_0.C with -O1 -fkeep-inline-functions -std=c++20 with vanilla trunk
(so it has those 2 separate comdats, one for C2 and one for C1), though I've
changed the
void m(k);
line to
__attribute__((noipa)) void m(k) {}
in the testcase, then compiled
pr113208_1.C with -O2 -fkeep-inline-functions -std=c++20 -fno-omit-frame-pointer
so that one can clearly differentiate from where the implementation was
picked and finally added
template <typename _Tp> struct _Vector_base {
  int g() const;
  _Vector_base(int, int);
};

struct QualityValue;
template <>
_Vector_base<QualityValue>::_Vector_base(int, int) {}
template <>
int _Vector_base<QualityValue>::g() const { return 0; }
int main () {}
If I link this, I see _ZN6vectorI12QualityValueEC2ERKS1_ and
_ZN6vectorI12QualityValueEC1ERKS1_ as separate functions with the
omitted frame pointer bodies, so clearly the pr113208_0.C versions prevailed
in both cases.  It is unclear why that isn't the case for LTO.

	Jakub
Jan Hubicka April 17, 2024, 1:26 p.m. UTC | #3
> 
> I've tried to see what actually happens during linking without LTO, so compiled
> pr113208_0.C with -O1 -fkeep-inline-functions -std=c++20 with vanilla trunk
> (so it has those 2 separate comdats, one for C2 and one for C1), though I've
> changed the
> void m(k);
> line to
> __attribute__((noipa)) void m(k) {}
> in the testcase, then compiled
> pr113208_1.C with -O2 -fkeep-inline-functions -std=c++20 -fno-omit-frame-pointer
> so that one can clearly differentiate from where the implementation was
> picked and finally added
> template <typename _Tp> struct _Vector_base {
>   int g() const;
>   _Vector_base(int, int);
> };
> 
> struct QualityValue;
> template <>
> _Vector_base<QualityValue>::_Vector_base(int, int) {}
> template <>
> int _Vector_base<QualityValue>::g() const { return 0; }
> int main () {}
> If I link this, I see _ZN6vectorI12QualityValueEC2ERKS1_ and
> _ZN6vectorI12QualityValueEC1ERKS1_ as separate functions with the
> omitted frame pointer bodies, so clearly the pr113208_0.C versions prevailed
> in both cases.  It is unclear why that isn't the case for LTO.

I think it is because of -fkeep-inline-functions which makes the first
object file to define both symbols, while with LTO we optimize out one
of them.  

So to reproduce same behaviour with non-LTO we would probably need use
-O1 and arrange the contructor to be unilinable instead of using
-fkeep-inline-functions.

Honza
Jakub Jelinek April 17, 2024, 2:07 p.m. UTC | #4
On Wed, Apr 17, 2024 at 03:26:53PM +0200, Jan Hubicka wrote:
> > 
> > I've tried to see what actually happens during linking without LTO, so compiled
> > pr113208_0.C with -O1 -fkeep-inline-functions -std=c++20 with vanilla trunk
> > (so it has those 2 separate comdats, one for C2 and one for C1), though I've
> > changed the
> > void m(k);
> > line to
> > __attribute__((noipa)) void m(k) {}
> > in the testcase, then compiled
> > pr113208_1.C with -O2 -fkeep-inline-functions -std=c++20 -fno-omit-frame-pointer
> > so that one can clearly differentiate from where the implementation was
> > picked and finally added
> > template <typename _Tp> struct _Vector_base {
> >   int g() const;
> >   _Vector_base(int, int);
> > };
> > 
> > struct QualityValue;
> > template <>
> > _Vector_base<QualityValue>::_Vector_base(int, int) {}
> > template <>
> > int _Vector_base<QualityValue>::g() const { return 0; }
> > int main () {}
> > If I link this, I see _ZN6vectorI12QualityValueEC2ERKS1_ and
> > _ZN6vectorI12QualityValueEC1ERKS1_ as separate functions with the
> > omitted frame pointer bodies, so clearly the pr113208_0.C versions prevailed
> > in both cases.  It is unclear why that isn't the case for LTO.
> 
> I think it is because of -fkeep-inline-functions which makes the first
> object file to define both symbols, while with LTO we optimize out one
> of them.  
> 
> So to reproduce same behaviour with non-LTO we would probably need use
> -O1 and arrange the contructor to be unilinable instead of using
> -fkeep-inline-functions.

Ah, you're right.
If I compile (the one line modified) pr113208_0.C with
-O -fno-early-inlining -fdisable-ipa-inline -std=c++20
it does have just _ZN6vectorI12QualityValueEC2ERKS1_ in _ZN6vectorI12QualityValueEC2ERKS1_
comdat and no _ZN6vectorI12QualityValueEC1ERKS1_
and pr113208_1.C with -O -fno-early-inlining -fdisable-ipa-inline -std=c++20 -fno-omit-frame-pointer
and link that together with the above mentioned third *.C file, I see
000000000040112a <_ZN6vectorI12QualityValueEC2ERKS1_>:
  40112a:       53                      push   %rbx
  40112b:       48 89 fb                mov    %rdi,%rbx
  40112e:       48 89 f7                mov    %rsi,%rdi
  401131:       e8 9c 00 00 00          call   4011d2 <_ZNK12_Vector_baseI12QualityValueE1gEv>
  401136:       89 c2                   mov    %eax,%edx
  401138:       be 01 00 00 00          mov    $0x1,%esi
  40113d:       48 89 df                mov    %rbx,%rdi
  401140:       e8 7b 00 00 00          call   4011c0 <_ZN12_Vector_baseI12QualityValueEC1Eii>
  401145:       5b                      pop    %rbx
  401146:       c3                      ret    
i.e. the C2 prevailing from pr113208_0.s where it is the only symbol, and
0000000000401196 <_ZN6vectorI12QualityValueEC1ERKS1_>:
  401196:       55                      push   %rbp
  401197:       48 89 e5                mov    %rsp,%rbp
  40119a:       53                      push   %rbx
  40119b:       48 83 ec 08             sub    $0x8,%rsp
  40119f:       48 89 fb                mov    %rdi,%rbx
  4011a2:       48 89 f7                mov    %rsi,%rdi
  4011a5:       e8 28 00 00 00          call   4011d2 <_ZNK12_Vector_baseI12QualityValueE1gEv>
  4011aa:       89 c2                   mov    %eax,%edx
  4011ac:       be 01 00 00 00          mov    $0x1,%esi
  4011b1:       48 89 df                mov    %rbx,%rdi
  4011b4:       e8 07 00 00 00          call   4011c0 <_ZN12_Vector_baseI12QualityValueEC1Eii>
  4011b9:       48 8b 5d f8             mov    -0x8(%rbp),%rbx
  4011bd:       c9                      leave  
  4011be:       c3                      ret    
which is the C1 alias originally aliased to C2 in C5 comdat.
So, that would match linker behavior where it sees C1 -> C2 alias prevails,
but a different version of C2 prevails, so let's either make C1 a non-alias
or alias to a non-exported symbol or something like that.
Though, I admit I have no idea what we do with comdat's during LTO, perhaps
doing what I said above could break stuff if linker after seeing the LTO
resulting objects decides on prevailing symbols differently.

	Jakub
Jan Hubicka April 17, 2024, 2:34 p.m. UTC | #5
> 
> Ah, you're right.
> If I compile (the one line modified) pr113208_0.C with
> -O -fno-early-inlining -fdisable-ipa-inline -std=c++20
> it does have just _ZN6vectorI12QualityValueEC2ERKS1_ in _ZN6vectorI12QualityValueEC2ERKS1_
> comdat and no _ZN6vectorI12QualityValueEC1ERKS1_
> and pr113208_1.C with -O -fno-early-inlining -fdisable-ipa-inline -std=c++20 -fno-omit-frame-pointer
> and link that together with the above mentioned third *.C file, I see
> 000000000040112a <_ZN6vectorI12QualityValueEC2ERKS1_>:
>   40112a:       53                      push   %rbx
>   40112b:       48 89 fb                mov    %rdi,%rbx
>   40112e:       48 89 f7                mov    %rsi,%rdi
>   401131:       e8 9c 00 00 00          call   4011d2 <_ZNK12_Vector_baseI12QualityValueE1gEv>
>   401136:       89 c2                   mov    %eax,%edx
>   401138:       be 01 00 00 00          mov    $0x1,%esi
>   40113d:       48 89 df                mov    %rbx,%rdi
>   401140:       e8 7b 00 00 00          call   4011c0 <_ZN12_Vector_baseI12QualityValueEC1Eii>
>   401145:       5b                      pop    %rbx
>   401146:       c3                      ret    
> i.e. the C2 prevailing from pr113208_0.s where it is the only symbol, and
> 0000000000401196 <_ZN6vectorI12QualityValueEC1ERKS1_>:
>   401196:       55                      push   %rbp
>   401197:       48 89 e5                mov    %rsp,%rbp
>   40119a:       53                      push   %rbx
>   40119b:       48 83 ec 08             sub    $0x8,%rsp
>   40119f:       48 89 fb                mov    %rdi,%rbx
>   4011a2:       48 89 f7                mov    %rsi,%rdi
>   4011a5:       e8 28 00 00 00          call   4011d2 <_ZNK12_Vector_baseI12QualityValueE1gEv>
>   4011aa:       89 c2                   mov    %eax,%edx
>   4011ac:       be 01 00 00 00          mov    $0x1,%esi
>   4011b1:       48 89 df                mov    %rbx,%rdi
>   4011b4:       e8 07 00 00 00          call   4011c0 <_ZN12_Vector_baseI12QualityValueEC1Eii>
>   4011b9:       48 8b 5d f8             mov    -0x8(%rbp),%rbx
>   4011bd:       c9                      leave  
>   4011be:       c3                      ret    
> which is the C1 alias originally aliased to C2 in C5 comdat.
> So, that would match linker behavior where it sees C1 -> C2 alias prevails,
> but a different version of C2 prevails, so let's either make C1 a non-alias
> or alias to a non-exported symbol or something like that.

Yep, I think the linker simply handles the C2 symbol as weak symbol and
after it decides to keep both comdat section (C2 and C5) the C5's weak
symbol is prevailed by C2.

I wrote patch doing the same with LTO - we need to handle alias
references specially and do not update them according to the resolution
info and then look for prevailed symbols which have aliases and turn
them to static symbols.  

I think for most scenarios this is OK, but I am not sure about
incremental linking (both LTO and non-LTO). What seems wrong is that we
produce C5 comdat that is not exporting what it should and thus breaking
the invariant that in valid code all comdats of same name are
semantically equivalent.  Perhaps it makes no difference since this
scenario is pretty special and we know that the functions are
semantically equivalent and their addresses are never compared for
equality (at least I failed to produce some useful testcase).

Honza
Jakub Jelinek April 17, 2024, 2:39 p.m. UTC | #6
On Wed, Apr 17, 2024 at 04:34:07PM +0200, Jan Hubicka wrote:
> I think for most scenarios this is OK, but I am not sure about
> incremental linking (both LTO and non-LTO). What seems wrong is that we
> produce C5 comdat that is not exporting what it should and thus breaking
> the invariant that in valid code all comdats of same name are
> semantically equivalent.

Yeah, exactly.  That is what I'm worried about too.

>  Perhaps it makes no difference since this
> scenario is pretty special and we know that the functions are
> semantically equivalent and their addresses are never compared for
> equality (at least I failed to produce some useful testcase).

Yes, I think one can't take address of a constructor/destructor and compare
that for equality; I guess the destructor address can be stored in vtables,
but code manually reading stuff from vtables and assuming pointer equality
is almost certainly not valid.

	Jakub
diff mbox series

Patch

--- gcc/cp/cp-tree.h.jj	2024-04-16 17:18:37.286279533 +0200
+++ gcc/cp/cp-tree.h	2024-04-16 17:45:01.685635709 +0200
@@ -7447,6 +7447,7 @@  extern bool handle_module_option (unsign
 /* In optimize.cc */
 extern tree clone_attrs				(tree);
 extern bool maybe_clone_body			(tree);
+extern void maybe_optimize_cdtor		(tree);
 
 /* In parser.cc */
 extern tree cp_convert_range_for (tree, tree, tree, cp_decomp *, bool,
--- gcc/cp/decl2.cc.jj	2024-04-16 17:18:37.287279519 +0200
+++ gcc/cp/decl2.cc	2024-04-16 17:45:01.686635695 +0200
@@ -3568,6 +3568,9 @@  import_export_decl (tree decl)
     }
 
   DECL_INTERFACE_KNOWN (decl) = 1;
+
+  if (DECL_CLONED_FUNCTION_P (decl))
+    maybe_optimize_cdtor (decl);
 }
 
 /* Return an expression that performs the destruction of DECL, which
--- gcc/cp/optimize.cc.jj	2024-04-16 17:18:37.374278327 +0200
+++ gcc/cp/optimize.cc	2024-04-16 21:35:53.597188774 +0200
@@ -723,3 +723,58 @@  maybe_clone_body (tree fn)
   /* We don't need to process the original function any further.  */
   return 1;
 }
+
+/* If maybe_clone_body is called while the cdtor is still tentative,
+   DECL_ONE_ONLY will be false and so will be can_alias_cdtor (fn).
+   In that case we wouldn't try to optimize using an alias and instead
+   would emit separate base and complete cdtor.  The following function
+   attempts to still optimize that case when we import_export_decl
+   is called first time on one of the clones.  */
+
+void
+maybe_optimize_cdtor (tree orig_decl)
+{
+  tree fns[3];
+  tree fn = DECL_CLONED_FUNCTION (orig_decl);
+  gcc_checking_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (fn));
+  if (DECL_INTERFACE_KNOWN (fn)
+      || !TREE_ASM_WRITTEN (fn)
+      || !DECL_ONE_ONLY (orig_decl)
+      || symtab->global_info_ready)
+    return;
+
+  populate_clone_array (fn, fns);
+
+  if (!fns[0] || !fns[1])
+    return;
+  for (int i = 2 - !fns[2]; i >= 0; --i)
+    if (fns[i] != orig_decl && DECL_INTERFACE_KNOWN (fns[i]))
+      return;
+  DECL_INTERFACE_KNOWN (fn) = 1;
+  comdat_linkage (fn);
+  if (!can_alias_cdtor (fn))
+    return;
+  /* For comdat base and complete cdtors put them into the same,
+     *[CD]5* comdat group instead of *[CD][12]*.  */
+  auto n0 = cgraph_node::get_create (fns[0]);
+  auto n1 = cgraph_node::get_create (fns[1]);
+  auto n2 = fns[2] ? cgraph_node::get_create (fns[1]) : NULL;
+  if (n0->lowered || n1->lowered || (n2 && n2->lowered))
+    return;
+  import_export_decl (fns[0]);
+  n1->definition = false;
+  if (!n0->create_same_body_alias (fns[1], fns[0]))
+    return;
+
+  tree comdat_group = cdtor_comdat_group (fns[1], fns[0]);
+  n1 = cgraph_node::get (fns[1]);
+  n0->set_comdat_group (comdat_group);
+  if (n1->same_comdat_group)
+    n1->remove_from_same_comdat_group ();
+  n1->add_to_same_comdat_group (n0);
+  if (fns[2])
+    n2->add_to_same_comdat_group (n0);
+  import_export_decl (fns[1]);
+  /* Remove the body now that it is an alias.  */
+  DECL_SAVED_TREE (fns[1]) = NULL_TREE;
+}
--- gcc/testsuite/g++.dg/lto/pr113208_0.C.jj	2024-04-16 17:45:01.687635682 +0200
+++ gcc/testsuite/g++.dg/lto/pr113208_0.C	2024-04-16 17:45:01.687635682 +0200
@@ -0,0 +1,13 @@ 
+// { dg-lto-do link }
+// { dg-lto-options { {-O1 -std=c++20 -flto}} }
+// { dg-extra-ld-options "-r -nostdlib -flinker-output=nolto-rel" }
+// { dg-require-linker-plugin "" }
+
+#define CONSTEXPR constexpr
+#include "pr113208.h"
+
+struct QualityValue;
+struct k : vector<QualityValue> {};
+
+void m(k);
+void n(k i) { m(i); }
--- gcc/testsuite/g++.dg/lto/pr113208_1.C.jj	2024-04-16 17:45:01.687635682 +0200
+++ gcc/testsuite/g++.dg/lto/pr113208_1.C	2024-04-16 17:45:01.687635682 +0200
@@ -0,0 +1,6 @@ 
+#define CONSTEXPR 
+#include "pr113208.h"
+
+struct QualityValue;
+vector<QualityValue> values1;
+vector<QualityValue> values{values1};
--- gcc/testsuite/g++.dg/lto/pr113208.h.jj	2024-04-16 17:45:01.687635682 +0200
+++ gcc/testsuite/g++.dg/lto/pr113208.h	2024-04-16 17:45:01.687635682 +0200
@@ -0,0 +1,10 @@ 
+template <typename _Tp> struct _Vector_base {
+  int g() const;
+  _Vector_base(int, int);
+};
+template <typename _Tp>
+struct vector : _Vector_base<_Tp> {
+  CONSTEXPR vector(const vector &__x)
+      : _Vector_base<_Tp>(1, __x.g()) {}
+ vector() : _Vector_base<_Tp>(1, 2) {}
+};