Patchwork RFA (cgraph): C++ 'structor decloning patch, Mark III

login
register
mail settings
Submitter Jason Merrill
Date Nov. 21, 2013, 5:41 p.m.
Message ID <528E45BB.6030100@redhat.com>
Download mbox | patch
Permalink /patch/293208/
State New
Headers show

Comments

Jason Merrill - Nov. 21, 2013, 5:41 p.m.
This all started with Stuart Hastings' original decloning patch way back 
in 2002:
   http://gcc.gnu.org/ml/gcc-patches/2002-08/msg00354.html
Bill Maddox tried to revive it in 2007:
   http://gcc.gnu.org/ml/gcc-patches/2007-11/msg01147.html

I'm embarrassed that it has taken so long to get this in.  The main 
source of concern in the past was ABI issues.  But now that we have 
COMDAT groups and precedent for putting multiple [cd]tors into the same 
group for aliasing, I see a solution: make the unified 'tor internal to 
the comdat, so the ABI of the comdat is no different from the cloned case.

I had to change various things in cgraph/ipa in order to support the 
notion of a comdat-local symbol which can only be referenced from within 
that comdat, which is what I'm looking for feedback/approval for.  The 
change to can_refer_decl_in_current_unit_p is not actually necessary, 
but seemed worth adding for possible future use of this functionality.

When decloning is turned on it is done regardless of the size of the 
'tor; we don't need to consider the size because the inliner will clean 
everything up for us.  If the unified function is small enough to inline 
into the thunks, it will do so and then the thunks can themselves be 
inlined into their callers; otherwise, inlining of the thunks is prevented.

The patch enables decloning with -Os, or in the rare case when cloning 
breaks semantics (i.e. in the presence of the GNU label address 
extension, as in the testcase).

OK for trunk?
Joseph S. Myers - Nov. 21, 2013, 6:24 p.m.
The new option should be able to use Var() in the .opt file rather than 
having a variable defined explicitly or any explicit handling code in 
c_common_handle_option, and shouldn't need to use UInteger (given the 
option has no arguments).
Jason Merrill - Dec. 9, 2013, 7:48 p.m.
On 11/21/2013 12:41 PM, Jason Merrill wrote:
> I had to change various things in cgraph/ipa in order to support the
> notion of a comdat-local symbol which can only be referenced from within
> that comdat, which is what I'm looking for feedback/approval for.  The
> change to can_refer_decl_in_current_unit_p is not actually necessary,
> but seemed worth adding for possible future use of this functionality.

Ping?

Jason
Jason Merrill - Dec. 9, 2013, 9:07 p.m.
On 11/21/2013 12:41 PM, Jason Merrill wrote:
> I had to change various things in cgraph/ipa in order to support the
> notion of a comdat-local symbol which can only be referenced from within
> that comdat, which is what I'm looking for feedback/approval for.  The
> change to can_refer_decl_in_current_unit_p is not actually necessary,
> but seemed worth adding for possible future use of this functionality.

Ping?

Jason
Jan Hubicka - Dec. 10, 2013, 9:48 a.m.
> This all started with Stuart Hastings' original decloning patch way
> back in 2002:
>   http://gcc.gnu.org/ml/gcc-patches/2002-08/msg00354.html
> Bill Maddox tried to revive it in 2007:
>   http://gcc.gnu.org/ml/gcc-patches/2007-11/msg01147.html
> 
> I'm embarrassed that it has taken so long to get this in.  The main
> source of concern in the past was ABI issues.  But now that we have
> COMDAT groups and precedent for putting multiple [cd]tors into the
> same group for aliasing, I see a solution: make the unified 'tor
> internal to the comdat, so the ABI of the comdat is no different
> from the cloned case.
> 
> I had to change various things in cgraph/ipa in order to support the
> notion of a comdat-local symbol which can only be referenced from
> within that comdat, which is what I'm looking for feedback/approval
> for.  The change to can_refer_decl_in_current_unit_p is not actually
> necessary, but seemed worth adding for possible future use of this
> functionality.
> 
> When decloning is turned on it is done regardless of the size of the
> 'tor; we don't need to consider the size because the inliner will
> clean everything up for us.  If the unified function is small enough
> to inline into the thunks, it will do so and then the thunks can
> themselves be inlined into their callers; otherwise, inlining of the
> thunks is prevented.
> 
> The patch enables decloning with -Os, or in the rare case when
> cloning breaks semantics (i.e. in the presence of the GNU label
> address extension, as in the testcase).
> 
> OK for trunk?

> commit 631d568491a3f4581e7067885a07a6096d06bf02
> Author: Jason Merrill <jason@redhat.com>
> Date:   Thu Jan 12 14:04:42 2012 -0500
> 
>     	PR c++/41090
>     	Add -fdeclone-ctor-dtor.
>     gcc/cp/
>     	* optimize.c (can_alias_cdtor, populate_clone_array): Split out
>     	from maybe_clone_body.
>     	(maybe_thunk_body): New function.
>     	(maybe_clone_body): Call it.
>     	* mangle.c (write_mangled_name): Remove code to suppress
>     	writing of mangled name for cloned constructor or destructor.
>     	(write_special_name_constructor): Handle decloned constructor.
>     	(write_special_name_destructor): Handle decloned destructor.
>     	* method.c (trivial_fn_p): Handle decloning.
>     	* semantics.c (expand_or_defer_fn_1): Clone after setting linkage.
>     gcc/c-family/
>     	* c-common.c, c-common.h (flag_declone_ctor_dtor): New variable.
>     	* c.opt: Add -fdeclone-ctor-dtor.
>     	* c-opts.c (c_common_handle_option): Likewise.
>     	(c_common_post_options): Default to on iff -Os.
>     gcc/
>     	* cgraph.h (comdat_local_p): New.
>     	* cgraph.c (verify_cgraph_node): Make sure we don't call a
>     	comdat-local function from outside its comdat.
>     	* gimple-fold.c (can_refer_decl_in_current_unit_p): Check
>     	references to comdat-local symbols.
>     	* ipa-cp.c (decide_about_value): Don't clone comdat-local fns.
>     	* ipa-inline.c (calls_comdat_local_p): New.
>     	(can_inline_edge_p): Check it.
>     	* ipa.c (symtab_remove_unreachable_nodes): Ignore comdat-local
>     	symbols when marking everything in the group as reachable.
>     	(function_and_variable_visibility): Handle comdat-local fns.
>     include/
>     	* demangle.h (enum gnu_v3_ctor_kinds):
>     	Added literal gnu_v3_unified_ctor.
>     	(enum gnu_v3_ctor_kinds):
>     	Added literal gnu_v3_unified_dtor.
>     libiberty/
>     	* cp-demangle.c (cplus_demangle_fill_ctor,cplus_demangle_fill_dtor):
>     	Handle unified ctor/dtor.
>     	(d_ctor_dtor_name): Handle unified ctor/dtor.
> 

> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index 009a165..0854b95 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2664,10 +2664,20 @@ verify_cgraph_node (struct cgraph_node *node)
>  	  error_found = true;
>  	}
>      }
> +  tree required_group = NULL_TREE;
> +  if (comdat_local_p (node))
> +    required_group = DECL_COMDAT_GROUP (node->decl);
>    for (e = node->callers; e; e = e->next_caller)
>      {
>        if (verify_edge_count_and_frequency (e))
>  	error_found = true;
> +      if (required_group
> +	  && DECL_COMDAT_GROUP (e->caller->decl) != required_group)
> +	{
> +	  error ("comdat-local function called by %s outside its comdat",
> +		 identifier_to_locale (e->caller->name ()));
> +	  error_found = true;
> +	}

You probably want to add same check into verify_symtab_base for referneces
(i.e. walk all references to function/variable and verify that these are in the same
group).
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 15719fb..b791811 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -1390,4 +1390,14 @@ symtab_can_be_discarded (symtab_node *node)
>  	      && node->resolution != LDPR_PREVAILING_DEF_IRONLY
>  	      && node->resolution != LDPR_PREVAILING_DEF_IRONLY_EXP));
>  }
> +
> +/* Return true if NODE is local to a particular COMDAT group, and must not
> +   be named from outside the COMDAT.  This is used for C++ decloned
> +   constructors.  */
> +
> +static inline bool
> +comdat_local_p (symtab_node *node)
> +{
> +  return (node->same_comdat_group && !TREE_PUBLIC (node->decl));
> +}

Until we turn it into methods, lets stick with current practice of having
symtab_ prefixes for the API...

The case where I played with local comdats was the following.
I made cgraph_body_availability to get context argument (i.e. instead of saying
if something is available in current unit, it was saying if it is available
in current function/variable).

If two symbols are in the same comdat group and refering each other, they are
available even though they may seem overwritable to others. I then started to
produce local symbols for those local references that are equivalent to your comdat
local.

We probably want to get in this extension too. (I did not get data on how often
it fires, but it does i.e. for recursive calls of C++ inlines).
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index d0fa8db..96c889e 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3318,6 +3318,10 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
>  					    &caller_count))
>      return false;
>  
> +  /* Don't clone decls local to a comdat group.  */
> +  if (comdat_local_p (node))
> +    return false;

Why? It seems this should work if the clone becomes another comdat local?

On the other hand, I think you want to prevent ipa-cp propagating addresses of comdat
ocals out of the function. (i.e. make it to check can_refer predicate for its subtitutions)
> +
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
>        fprintf (dump_file, " - considering value ");
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index fbb63da..aa49bfe 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -230,6 +230,25 @@ report_inline_failed_reason (struct cgraph_edge *e)
>      }
>  }
>  
> +/* True iff NODE calls another function which is local to its comdat
> +   (i.e. C++ decloned constructor); in that case, calls to NODE cannot be
> +   inlined, as that would cause a reference from outside the comdat to the
> +   local symbol.  If we inline the call from NODE to the local function,
> +   then inlining NODE can succeed.  */
> +
> +static bool
> +calls_comdat_local_p (struct cgraph_node *node)
> +{
> +  if (!node->same_comdat_group)
> +    return false;
> +  for (struct cgraph_edge *e = node->callees; e; e = e->next_callee)
> +    {
> +      if (comdat_local_p (e->callee))
> +	return true;
> +    }
> +  return false;
> +}
> +
>  /* Decide if we can inline the edge and possibly update
>     inline_failed reason.  
>     We check whether inlining is possible at all and whether
> @@ -237,7 +256,7 @@ report_inline_failed_reason (struct cgraph_edge *e)
>  
>     if REPORT is true, output reason to the dump file.  
>  
> -   if DISREGARD_LIMITES is true, ignore size limits.*/
> +   if DISREGARD_LIMITS is true, ignore size limits.*/
>  
>  static bool
>  can_inline_edge_p (struct cgraph_edge *e, bool report,
> @@ -267,6 +286,11 @@ can_inline_edge_p (struct cgraph_edge *e, bool report,
>        e->inline_failed = CIF_BODY_NOT_AVAILABLE;
>        inlinable = false;
>      }
> +  else if (calls_comdat_local_p (callee))

So we should check here if both caller and callee are in the same group and allow
inlining then?
> @@ -949,6 +952,8 @@ function_and_variable_visibility (bool whole_program)
>  	  node->forced_by_abi = false;
>  	}
>        if (!node->externally_visible
> +	  /* Don't mess with a function local to a comdat group.  */
> +	  && !comdat_local_p (node)
>  	  && node->definition && !node->weakref
>  	  && !DECL_EXTERNAL (node->decl))
>  	{

Probably you want to get make_decl_local to preserve comdat group; then you won't need
to care about it here and clonning could work.

Probably also when declaring a comdat symbol local, we want to turn all associated
comdats to loca, right? (i.e. with LTO and hidden comdat)

Thanks, this looks like interesting stuff!
Honza
Jan Hubicka - Dec. 10, 2013, 9:57 a.m.
> >     	* gimple-fold.c (can_refer_decl_in_current_unit_p): Check
> >     	references to comdat-local symbols.

Also i think this change needs more work.  FROM_DECL is not the function you
are going to get the refernece, it is variable whose constructor the value was
take from.
I think we need to rename it to can_refer_decl_in_symbol and add symbol argument
to get proper check. I am not sure where we use it without being sure the symbol
is current_function_decl.  We definitly make use of it during devirt and we need to
start using it in ipa-cp.

Honza
Jan Hubicka - Dec. 10, 2013, 10:06 a.m.
> > diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> > index fbb63da..aa49bfe 100644
> > --- a/gcc/ipa-inline.c
> > +++ b/gcc/ipa-inline.c
> > @@ -230,6 +230,25 @@ report_inline_failed_reason (struct cgraph_edge *e)
> >      }
> >  }
> >  
> > +/* True iff NODE calls another function which is local to its comdat
> > +   (i.e. C++ decloned constructor); in that case, calls to NODE cannot be
> > +   inlined, as that would cause a reference from outside the comdat to the
> > +   local symbol.  If we inline the call from NODE to the local function,
> > +   then inlining NODE can succeed.  */
> > +
> > +static bool
> > +calls_comdat_local_p (struct cgraph_node *node)
> > +{
> > +  if (!node->same_comdat_group)
> > +    return false;
> > +  for (struct cgraph_edge *e = node->callees; e; e = e->next_callee)
> > +    {
> > +      if (comdat_local_p (e->callee))
> > +	return true;
> > +    }
> > +  return false;
> > +}

Also this is not correct - because it ignores functions inlined into NODE and
use in can_inline_edge_p will lead to quadratic time issues.  We probably want
ipa-inline to compute flag if function refer to local symbol just before
inlining starts and propagate it across inlining decisions in make_edge_inline
and only check the flag in can_inline_edge_p.

Honza
Jason Merrill - Dec. 10, 2013, 9:11 p.m.
On 12/10/2013 04:48 AM, Jan Hubicka wrote:
> The case where I played with local comdats was the following.
> I made cgraph_body_availability to get context argument (i.e. instead of saying
> if something is available in current unit, it was saying if it is available
> in current function/variable).
>
> If two symbols are in the same comdat group and refering each other, they are
> available even though they may seem overwritable to others. I then started to
> produce local symbols for those local references that are equivalent to your comdat
> local.
>
> We probably want to get in this extension too. (I did not get data on how often
> it fires, but it does i.e. for recursive calls of C++ inlines).

I wouldn't expect it to affect anything other than recursive calls, 
since before my patch functions in the same COMDAT don't call each 
other, and with my patch they only call functions that are already local.

Also, this optimization would seem to apply to all recursive functions, 
not just those in comdat groups.

Are you thinking to add this after my patch is in?

>> +  /* Don't clone decls local to a comdat group.  */
>> +  if (comdat_local_p (node))
>> +    return false;
>
> Why? It seems this should work if the clone becomes another comdat local?

Right, I think the problem was that the clone wasn't becoming comdat 
local.  And for the specific case of decloning, there's no point in 
cloning the decloned constructor.

> On the other hand, I think you want to prevent ipa-cp propagating addresses of comdat
> locals out of the function. (i.e. make it to check can_refer predicate for its subtitutions)

Right, I wasn't worrying about that because it can't come up with decloning.

> So we should check here if both caller and callee are in the same group and allow
> inlining then?

Makes sense.

> Probably you want to get make_decl_local to preserve comdat group; then you won't need
> to care about it here and clonning could work.

OK.

> Probably also when declaring a comdat symbol local, we want to turn all associated
> comdats to local, right? (i.e. with LTO and hidden comdat)

I don't think so; if we change a public comdat symbol to local, that's a 
change to the ABI of the comdat, so it can't be the same comdat anymore, 
and dissolving the comdat seems appropriate.

> Also i think this change needs more work.  FROM_DECL is not the function you
> are going to get the reference, it is variable whose constructor the value was
> take from.
> I think we need to rename it to can_refer_decl_in_symbol and add symbol argument
> to get proper check. I am not sure where we use it without being sure the symbol
> is current_function_decl.  We definitly make use of it during devirt and we need to
> start using it in ipa-cp.

Would it be OK for me to just drop this change, since it isn't needed 
for decloning?

> Also this is not correct - because it ignores functions inlined into NODE and
> use in can_inline_edge_p will lead to quadratic time issues.  We probably want
> ipa-inline to compute flag if function refer to local symbol just before
> inlining starts and propagate it across inlining decisions in make_edge_inline
> and only check the flag in can_inline_edge_p.

OK.

Jason
Jan Hubicka - Dec. 11, 2013, 1:55 p.m.
> On 12/10/2013 04:48 AM, Jan Hubicka wrote:
> >The case where I played with local comdats was the following.
> >I made cgraph_body_availability to get context argument (i.e. instead of saying
> >if something is available in current unit, it was saying if it is available
> >in current function/variable).
> >
> >If two symbols are in the same comdat group and refering each other, they are
> >available even though they may seem overwritable to others. I then started to
> >produce local symbols for those local references that are equivalent to your comdat
> >local.
> >
> >We probably want to get in this extension too. (I did not get data on how often
> >it fires, but it does i.e. for recursive calls of C++ inlines).
> 
> I wouldn't expect it to affect anything other than recursive calls,
> since before my patch functions in the same COMDAT don't call each
> other, and with my patch they only call functions that are already
> local.
> 
> Also, this optimization would seem to apply to all recursive
> functions, not just those in comdat groups.

Agreed, I already do the conversion for many functions (based on "inline"
keyword implying that there is no overwrite changing semantic). So far the conversion
does not happen for comdats, since it would lead to local comdat and I also ignored the
conversion rule.
I have patch for that that handles it post-inlining + inliner patch that takes advantage
of function context to allow recursive inlining.
> 
> Are you thinking to add this after my patch is in?

Yes, lets do that incrementally.
> 
> >>+  /* Don't clone decls local to a comdat group.  */
> >>+  if (comdat_local_p (node))
> >>+    return false;
> >
> >Why? It seems this should work if the clone becomes another comdat local?
> 
> Right, I think the problem was that the clone wasn't becoming comdat
> local.  And for the specific case of decloning, there's no point in
> cloning the decloned constructor.

If it does not make sense, how we ended up cloning it?
Can you show me some code sample of decloning?  

I assume that we basically turn original cloned constructors into wrappers
around common "worker" that is now comdat local.  I think ipa-cp may end up
deciding on clonning the wrapper that will break because it will end up static
calling the local comdat function.
On the other hand it may decide to clone both the wrapper and the actual function
and in that case bringing both static is fine. 

So to be on safe side, we probably want to consider functions calling local comdat
unclonable but we do not need to worry about local comdats themselves.
For good codegen, I think ipa-cp will need to understand it needs to either clone
both or nothing. I think it is something Martin can look into?
> 
> >On the other hand, I think you want to prevent ipa-cp propagating addresses of comdat
> >locals out of the function. (i.e. make it to check can_refer predicate for its subtitutions)
> 
> Right, I wasn't worrying about that because it can't come up with decloning.
> 
> >So we should check here if both caller and callee are in the same group and allow
> >inlining then?
> 
> Makes sense.
> 
> >Probably you want to get make_decl_local to preserve comdat group; then you won't need
> >to care about it here and clonning could work.
> 
> OK.
> 
> >Probably also when declaring a comdat symbol local, we want to turn all associated
> >comdats to local, right? (i.e. with LTO and hidden comdat)
> 
> I don't think so; if we change a public comdat symbol to local,
> that's a change to the ABI of the comdat, so it can't be the same
> comdat anymore, and dissolving the comdat seems appropriate.

Yep, this is what I had in mind.  When we declare to turn comdat into non-comdat we need
to make sure that the comdat locals are dissolved, too.
> 
> >Also i think this change needs more work.  FROM_DECL is not the function you
> >are going to get the reference, it is variable whose constructor the value was
> >take from.
> >I think we need to rename it to can_refer_decl_in_symbol and add symbol argument
> >to get proper check. I am not sure where we use it without being sure the symbol
> >is current_function_decl.  We definitly make use of it during devirt and we need to
> >start using it in ipa-cp.
> 
> Would it be OK for me to just drop this change, since it isn't
> needed for decloning?

OK, I will make followup patch for this.
Thanks,
Honza
Jason Merrill - Dec. 11, 2013, 3:11 p.m.
On 12/11/2013 08:55 AM, Jan Hubicka wrote:
>>>> +  /* Don't clone decls local to a comdat group.  */
>>>> +  if (comdat_local_p (node))
>>>> +    return false;
>>>
>>> Why? It seems this should work if the clone becomes another comdat local?
>>
>> Right, I think the problem was that the clone wasn't becoming comdat
>> local.  And for the specific case of decloning, there's no point in
>> cloning the decloned constructor.
>
> If it does not make sense, how we ended up cloning it?

I guess the heuristics are making a mistake.

> Can you show me some code sample of decloning?

> I assume that we basically turn original cloned constructors into wrappers
> around common "worker" that is now comdat local.

Right.

> I think ipa-cp may end up
> deciding on clonning the wrapper that will break because it will end up static
> calling the local comdat function.
> On the other hand it may decide to clone both the wrapper and the actual function
> and in that case bringing both static is fine.

In the case I'm looking at (g++.dg/torture/pr41257.C, with decloning 
forced on), we're cloning the local function in order to propagate in a 
'0' passed in from one of the wrappers.  This is pointless because the 
wrapper contains just the one call, so in any situation where cloning 
makes sense, inlining is better.

So, it's probably possible to make it work to clone the comdat local 
function into another comdat local function, but it's not useful, and 
it's easier to just prevent it.

Jason
Jan Hubicka - Dec. 11, 2013, 5:04 p.m.
> On 12/11/2013 08:55 AM, Jan Hubicka wrote:
> >>>>+  /* Don't clone decls local to a comdat group.  */
> >>>>+  if (comdat_local_p (node))
> >>>>+    return false;
> >>>
> >>>Why? It seems this should work if the clone becomes another comdat local?
> >>
> >>Right, I think the problem was that the clone wasn't becoming comdat
> >>local.  And for the specific case of decloning, there's no point in
> >>cloning the decloned constructor.
> >
> >If it does not make sense, how we ended up cloning it?
> 
> I guess the heuristics are making a mistake.
> 
> >Can you show me some code sample of decloning?
> 
> >I assume that we basically turn original cloned constructors into wrappers
> >around common "worker" that is now comdat local.
> 
> Right.
> 
> >I think ipa-cp may end up
> >deciding on clonning the wrapper that will break because it will end up static
> >calling the local comdat function.
> >On the other hand it may decide to clone both the wrapper and the actual function
> >and in that case bringing both static is fine.
> 
> In the case I'm looking at (g++.dg/torture/pr41257.C, with decloning
> forced on), we're cloning the local function in order to propagate
> in a '0' passed in from one of the wrappers.  This is pointless
> because the wrapper contains just the one call, so in any situation
> where cloning makes sense, inlining is better.

ipa-cp does not really contain heuristic to figure out where cloning makes more
sense than inlining basically because if we produce such clone, inliner will inline
it anyway.

It is harder than it may seem to isolate testcases where inlining is always a win.
> 
> So, it's probably possible to make it work to clone the comdat local
> function into another comdat local function, but it's not useful,
> and it's easier to just prevent it.

Yep, with current limited use of comdat locals I guess it makes sense.

Honza
> 
> Jason
Martin Jambor - Dec. 13, 2013, 4 p.m.
Hi,

On Wed, Dec 11, 2013 at 02:55:26PM +0100, Jan Hubicka wrote:
> > On 12/10/2013 04:48 AM, Jan Hubicka wrote:
> > >The case where I played with local comdats was the following.
> > >I made cgraph_body_availability to get context argument (i.e. instead of saying
> > >if something is available in current unit, it was saying if it is available
> > >in current function/variable).
> > >
> > >If two symbols are in the same comdat group and refering each other, they are
> > >available even though they may seem overwritable to others. I then started to
> > >produce local symbols for those local references that are equivalent to your comdat
> > >local.
> > >
> > >We probably want to get in this extension too. (I did not get data on how often
> > >it fires, but it does i.e. for recursive calls of C++ inlines).
> > 
> > I wouldn't expect it to affect anything other than recursive calls,
> > since before my patch functions in the same COMDAT don't call each
> > other, and with my patch they only call functions that are already
> > local.
> > 
> > Also, this optimization would seem to apply to all recursive
> > functions, not just those in comdat groups.
> 
> Agreed, I already do the conversion for many functions (based on "inline"
> keyword implying that there is no overwrite changing semantic). So far the conversion
> does not happen for comdats, since it would lead to local comdat and I also ignored the
> conversion rule.
> I have patch for that that handles it post-inlining + inliner patch that takes advantage
> of function context to allow recursive inlining.
> > 
> > Are you thinking to add this after my patch is in?
> 
> Yes, lets do that incrementally.
> > 
> > >>+  /* Don't clone decls local to a comdat group.  */
> > >>+  if (comdat_local_p (node))
> > >>+    return false;
> > >
> > >Why? It seems this should work if the clone becomes another comdat local?
> > 
> > Right, I think the problem was that the clone wasn't becoming comdat
> > local.  And for the specific case of decloning, there's no point in
> > cloning the decloned constructor.
> 
> If it does not make sense, how we ended up cloning it?
> Can you show me some code sample of decloning?  
> 
> I assume that we basically turn original cloned constructors into wrappers
> around common "worker" that is now comdat local.  I think ipa-cp may end up
> deciding on clonning the wrapper that will break because it will end up static
> calling the local comdat function.
> On the other hand it may decide to clone both the wrapper and the actual function
> and in that case bringing both static is fine. 
> 
> So to be on safe side, we probably want to consider functions calling local comdat
> unclonable but we do not need to worry about local comdats themselves.
> For good codegen, I think ipa-cp will need to understand it needs to either clone
> both or nothing. I think it is something Martin can look into?

Uh, this will be quite a bit ugly but I will put it on my todo list.

Nevertheless, if you believe that a particular function should not be
cloned by ipa-cp, the best place to do make that change is in
ipcp_versionable_function_p, decide_about_value is too late and a lot
of assumptions about versionablity have already been made when it
runs.

Thanks,

Martin

Patch

commit 631d568491a3f4581e7067885a07a6096d06bf02
Author: Jason Merrill <jason@redhat.com>
Date:   Thu Jan 12 14:04:42 2012 -0500

    	PR c++/41090
    	Add -fdeclone-ctor-dtor.
    gcc/cp/
    	* optimize.c (can_alias_cdtor, populate_clone_array): Split out
    	from maybe_clone_body.
    	(maybe_thunk_body): New function.
    	(maybe_clone_body): Call it.
    	* mangle.c (write_mangled_name): Remove code to suppress
    	writing of mangled name for cloned constructor or destructor.
    	(write_special_name_constructor): Handle decloned constructor.
    	(write_special_name_destructor): Handle decloned destructor.
    	* method.c (trivial_fn_p): Handle decloning.
    	* semantics.c (expand_or_defer_fn_1): Clone after setting linkage.
    gcc/c-family/
    	* c-common.c, c-common.h (flag_declone_ctor_dtor): New variable.
    	* c.opt: Add -fdeclone-ctor-dtor.
    	* c-opts.c (c_common_handle_option): Likewise.
    	(c_common_post_options): Default to on iff -Os.
    gcc/
    	* cgraph.h (comdat_local_p): New.
    	* cgraph.c (verify_cgraph_node): Make sure we don't call a
    	comdat-local function from outside its comdat.
    	* gimple-fold.c (can_refer_decl_in_current_unit_p): Check
    	references to comdat-local symbols.
    	* ipa-cp.c (decide_about_value): Don't clone comdat-local fns.
    	* ipa-inline.c (calls_comdat_local_p): New.
    	(can_inline_edge_p): Check it.
    	* ipa.c (symtab_remove_unreachable_nodes): Ignore comdat-local
    	symbols when marking everything in the group as reachable.
    	(function_and_variable_visibility): Handle comdat-local fns.
    include/
    	* demangle.h (enum gnu_v3_ctor_kinds):
    	Added literal gnu_v3_unified_ctor.
    	(enum gnu_v3_ctor_kinds):
    	Added literal gnu_v3_unified_dtor.
    libiberty/
    	* cp-demangle.c (cplus_demangle_fill_ctor,cplus_demangle_fill_dtor):
    	Handle unified ctor/dtor.
    	(d_ctor_dtor_name): Handle unified ctor/dtor.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 97f33c1..ab0fca0 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -259,6 +259,12 @@  enum cxx_dialect cxx_dialect = cxx98;
 
 int max_tinst_depth = 900;
 
+/* For positive value, generate thunks to share a common constructor or
+   destructor body.  Otherwise, the code is replicated in the complete and
+   base-class variants of the ctor/dtor.  */
+
+int flag_declone_ctor_dtor = -1;
+
 /* The elements of `ridpointers' are identifier nodes for the reserved
    type names and storage classes.  It is indexed by a RID_... value.  */
 tree *ridpointers;
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 664e928..6721146 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -666,6 +666,12 @@  extern int c_inhibit_evaluation_warnings;
 
 extern bool done_lexing;
 
+/* For nonzero value, generate thunks to share a common constructor or
+   destructor body.  Otherwise, the code is replicated in the complete and
+   base-class variants of the ctor/dtor.  */
+
+extern int flag_declone_ctor_dtor;
+
 /* C types are partitioned into three subsets: object, function, and
    incomplete types.  */
 #define C_TYPE_OBJECT_P(type) \
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index f368cab..3e3b609 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -496,6 +496,10 @@  c_common_handle_option (size_t scode, const char *arg, int value,
       constant_string_class_name = arg;
       break;
 
+    case OPT_fdeclone_ctor_dtor:
+      flag_declone_ctor_dtor = value;
+      break;
+
     case OPT_fextended_identifiers:
       cpp_opts->extended_identifiers = value;
       break;
@@ -899,6 +903,10 @@  c_common_post_options (const char **pfilename)
   if (warn_implicit_function_declaration == -1)
     warn_implicit_function_declaration = flag_isoc99;
 
+  /* Declone C++ 'structors if -Os.  */
+  if (flag_declone_ctor_dtor == -1)
+    flag_declone_ctor_dtor = optimize_size;
+
   if (cxx_dialect >= cxx11)
     {
       /* If we're allowing C++0x constructs, don't warn about C++98
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index ac67885..0154312 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -891,6 +891,10 @@  fdeduce-init-list
 C++ ObjC++ Var(flag_deduce_init_list) Init(0)
 -fdeduce-init-list	enable deduction of std::initializer_list for a template type parameter from a brace-enclosed initializer-list
 
+fdeclone-ctor-dtor
+C++ ObjC++ UInteger
+Factor complex constructors and destructors to favor space over speed
+
 fdefault-inline
 C++ ObjC++ Ignore
 Does nothing.  Preserved for backward compatibility.
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 009a165..0854b95 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2664,10 +2664,20 @@  verify_cgraph_node (struct cgraph_node *node)
 	  error_found = true;
 	}
     }
+  tree required_group = NULL_TREE;
+  if (comdat_local_p (node))
+    required_group = DECL_COMDAT_GROUP (node->decl);
   for (e = node->callers; e; e = e->next_caller)
     {
       if (verify_edge_count_and_frequency (e))
 	error_found = true;
+      if (required_group
+	  && DECL_COMDAT_GROUP (e->caller->decl) != required_group)
+	{
+	  error ("comdat-local function called by %s outside its comdat",
+		 identifier_to_locale (e->caller->name ()));
+	  error_found = true;
+	}
       if (!e->inline_failed)
 	{
 	  if (node->global.inlined_to
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 15719fb..b791811 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1390,4 +1390,14 @@  symtab_can_be_discarded (symtab_node *node)
 	      && node->resolution != LDPR_PREVAILING_DEF_IRONLY
 	      && node->resolution != LDPR_PREVAILING_DEF_IRONLY_EXP));
 }
+
+/* Return true if NODE is local to a particular COMDAT group, and must not
+   be named from outside the COMDAT.  This is used for C++ decloned
+   constructors.  */
+
+static inline bool
+comdat_local_p (symtab_node *node)
+{
+  return (node->same_comdat_group && !TREE_PUBLIC (node->decl));
+}
 #endif  /* GCC_CGRAPH_H  */
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 500c81f..cc1f8c3 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10154,7 +10154,9 @@  grokdeclarator (const cp_declarator *declarator,
 	    /* The TYPE_DECL is "abstract" because there will be
 	       clones of this constructor/destructor, and there will
 	       be copies of this TYPE_DECL generated in those
-	       clones.  */
+	       clones.  The decloning optimization (for space) may
+               revert this subsequently if it determines that
+               the clones should share a common implementation.  */
 	    DECL_ABSTRACT (decl) = 1;
 	}
       else if (current_class_type
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 8a24d6c..d99062d 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -689,13 +689,6 @@  write_mangled_name (const tree decl, bool top_level)
     mangled_name:;
       write_string ("_Z");
       write_encoding (decl);
-      if (DECL_LANG_SPECIFIC (decl)
-	  && (DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (decl)
-	      || DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (decl)))
-	/* We need a distinct mangled name for these entities, but
-	   we should never actually output it.  So, we append some
-	   characters the assembler won't like.  */
-	write_string (" *INTERNAL* ");
     }
 }
 
@@ -1653,25 +1646,21 @@  write_identifier (const char *identifier)
 		    ::= C2   # base object constructor
 		    ::= C3   # complete object allocating constructor
 
-   Currently, allocating constructors are never used.
-
-   We also need to provide mangled names for the maybe-in-charge
-   constructor, so we treat it here too.  mangle_decl_string will
-   append *INTERNAL* to that, to make sure we never emit it.  */
+   Currently, allocating constructors are never used.  */
 
 static void
 write_special_name_constructor (const tree ctor)
 {
   if (DECL_BASE_CONSTRUCTOR_P (ctor))
     write_string ("C2");
+  /* This is the old-style "[unified]" constructor.
+     In some cases, we may emit this function and call
+     it from the clones in order to share code and save space.  */
+  else if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (ctor))
+    write_string ("C4");
   else
     {
-      gcc_assert (DECL_COMPLETE_CONSTRUCTOR_P (ctor)
-		  /* Even though we don't ever emit a definition of
-		     the old-style destructor, we still have to
-		     consider entities (like static variables) nested
-		     inside it.  */
-		  || DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (ctor));
+      gcc_assert (DECL_COMPLETE_CONSTRUCTOR_P (ctor));
       write_string ("C1");
     }
 }
@@ -1681,11 +1670,7 @@  write_special_name_constructor (const tree ctor)
 
      <special-name> ::= D0 # deleting (in-charge) destructor
 		    ::= D1 # complete object (in-charge) destructor
-		    ::= D2 # base object (not-in-charge) destructor
-
-   We also need to provide mangled names for the maybe-incharge
-   destructor, so we treat it here too.  mangle_decl_string will
-   append *INTERNAL* to that, to make sure we never emit it.  */
+		    ::= D2 # base object (not-in-charge) destructor  */
 
 static void
 write_special_name_destructor (const tree dtor)
@@ -1694,14 +1679,14 @@  write_special_name_destructor (const tree dtor)
     write_string ("D0");
   else if (DECL_BASE_DESTRUCTOR_P (dtor))
     write_string ("D2");
+  else if (DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (dtor))
+    /* This is the old-style "[unified]" destructor.
+       In some cases, we may emit this function and call
+       it from the clones in order to share code and save space.  */
+    write_string ("D4");
   else
     {
-      gcc_assert (DECL_COMPLETE_DESTRUCTOR_P (dtor)
-		  /* Even though we don't ever emit a definition of
-		     the old-style destructor, we still have to
-		     consider entities (like static variables) nested
-		     inside it.  */
-		  || DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (dtor));
+      gcc_assert (DECL_COMPLETE_DESTRUCTOR_P (dtor));
       write_string ("D1");
     }
 }
diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index 7405365..e79a922 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -477,7 +477,8 @@  trivial_fn_p (tree fn)
     return false;
 
   /* If fn is a clone, get the primary variant.  */
-  fn = DECL_ORIGIN (fn);
+  if (tree prim = DECL_CLONED_FUNCTION (fn))
+    fn = prim;
   return type_has_trivial_fn (DECL_CONTEXT (fn), special_function_p (fn));
 }
 
diff --git a/gcc/cp/optimize.c b/gcc/cp/optimize.c
index b8df134..c7480ef 100644
--- a/gcc/cp/optimize.c
+++ b/gcc/cp/optimize.c
@@ -193,30 +193,40 @@  cdtor_comdat_group (tree complete, tree base)
   return get_identifier (grp_name);
 }
 
-/* FN is a function that has a complete body.  Clone the body as
-   necessary.  Returns nonzero if there's no longer any need to
-   process the main body.  */
+/* Returns true iff we can make the base and complete [cd]tor aliases of
+   the same symbol rather than separate functions.  */
 
-bool
-maybe_clone_body (tree fn)
+static bool
+can_alias_cdtor (tree fn)
+{
+#ifndef ASM_OUTPUT_DEF
+  /* If aliases aren't supported by the assembler, fail.  */
+  return false;
+#endif
+  /* We can't use an alias if there are virtual bases.  */
+  if (CLASSTYPE_VBASECLASSES (DECL_CONTEXT (fn)))
+    return false;
+  /* ??? Why not use aliases with -frepo?  */
+  if (flag_use_repository)
+    return false;
+  gcc_assert (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn)
+	      || DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (fn));
+  /* Don't use aliases for weak/linkonce definitions unless we can put both
+     symbols in the same COMDAT group.  */
+  return (DECL_INTERFACE_KNOWN (fn)
+	  && (SUPPORTS_ONE_ONLY || !DECL_WEAK (fn))
+	  && (!DECL_ONE_ONLY (fn)
+	      || (HAVE_COMDAT_GROUP && DECL_WEAK (fn))));
+}
+
+/* FN is a [cd]tor, fns is a pointer to an array of length 3.  Fill fns
+   with pointers to the base, complete, and deleting variants.  */
+
+static void
+populate_clone_array (tree fn, tree *fns)
 {
-  tree comdat_group = NULL_TREE;
   tree clone;
-  tree fns[3];
-  bool first = true;
-  bool in_charge_parm_used;
-  int idx;
-  bool need_alias = false;
 
-  /* We only clone constructors and destructors.  */
-  if (!DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn)
-      && !DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (fn))
-    return 0;
-
-  /* Emit the DWARF1 abstract instance.  */
-  (*debug_hooks->deferred_inline_function) (fn);
-
-  in_charge_parm_used = CLASSTYPE_VBASECLASSES (DECL_CONTEXT (fn)) != NULL;
   fns[0] = NULL_TREE;
   fns[1] = NULL_TREE;
   fns[2] = NULL_TREE;
@@ -234,6 +244,206 @@  maybe_clone_body (tree fn)
       fns[2] = clone;
     else
       gcc_unreachable ();
+}
+
+/* FN is a constructor or destructor, and there are FUNCTION_DECLs
+   cloned from it nearby.  Instead of cloning this body, leave it
+   alone and create tiny one-call bodies for the cloned
+   FUNCTION_DECLs.  These clones are sibcall candidates, and their
+   resulting code will be very thunk-esque.  */
+
+static bool
+maybe_thunk_body (tree fn, bool force)
+{
+  tree bind, block, call, clone, clone_result, fn_parm, fn_parm_typelist;
+  tree last_arg, modify, *args;
+  int parmno, vtt_parmno, max_parms;
+  tree fns[3];
+
+  if (!force && !flag_declone_ctor_dtor)
+    return 0;
+
+  /* If function accepts variable arguments, give up.  */
+  last_arg = tree_last (TYPE_ARG_TYPES (TREE_TYPE (fn)));
+  if (last_arg != void_list_node)
+    return 0;
+
+  /* If we got this far, we've decided to turn the clones into thunks.  */
+
+  /* We're going to generate code for fn, so it is no longer "abstract."
+     Also make the unified ctor/dtor private to either the translation unit
+     (for non-vague linkage ctors) or the COMDAT group (otherwise).  */
+
+  populate_clone_array (fn, fns);
+  DECL_ABSTRACT (fn) = false;
+  if (!DECL_WEAK (fn))
+    {
+      TREE_PUBLIC (fn) = false;
+      DECL_EXTERNAL (fn) = false;
+      DECL_INTERFACE_KNOWN (fn) = true;
+    }
+  else if (HAVE_COMDAT_GROUP)
+    {
+      tree comdat_group = cdtor_comdat_group (fns[1], fns[0]);
+      DECL_COMDAT_GROUP (fns[0]) = comdat_group;
+      symtab_add_to_same_comdat_group (cgraph_get_create_node (fns[1]),
+				       cgraph_get_create_node (fns[0]));
+      symtab_add_to_same_comdat_group (symtab_get_node (fn),
+				       symtab_get_node (fns[0]));
+      if (fns[2])
+	/* If *[CD][12]* dtors go into the *[CD]5* comdat group and dtor is
+	   virtual, it goes into the same comdat group as well.  */
+	symtab_add_to_same_comdat_group (cgraph_get_create_node (fns[2]),
+					 symtab_get_node (fns[0]));
+      TREE_PUBLIC (fn) = false;
+      DECL_EXTERNAL (fn) = false;
+      DECL_INTERFACE_KNOWN (fn) = true;
+      /* function_and_variable_visibility doesn't want !PUBLIC decls to
+	 have these flags set.  */
+      DECL_WEAK (fn) = false;
+      DECL_COMDAT (fn) = false;
+    }
+
+  /* Find the vtt_parm, if present.  */
+  for (vtt_parmno = -1, parmno = 0, fn_parm = DECL_ARGUMENTS (fn);
+       fn_parm;
+       ++parmno, fn_parm = TREE_CHAIN (fn_parm))
+    {
+      if (DECL_ARTIFICIAL (fn_parm)
+	  && DECL_NAME (fn_parm) == vtt_parm_identifier)
+	{
+	  /* Compensate for removed in_charge parameter.  */
+	  vtt_parmno = parmno;
+	  break;
+	}
+    }
+
+  /* Allocate an argument buffer for build_cxx_call().
+     Make sure it is large enough for any of the clones.  */
+  max_parms = 0;
+  FOR_EACH_CLONE (clone, fn)
+    {
+      int length = list_length (DECL_ARGUMENTS (fn));
+      if (length > max_parms)
+        max_parms = length;
+    }
+  args = (tree *) alloca (max_parms * sizeof (tree));
+
+  /* We know that any clones immediately follow FN in TYPE_METHODS.  */
+  FOR_EACH_CLONE (clone, fn)
+    {
+      tree clone_parm;
+
+      /* If we've already generated a body for this clone, avoid
+	 duplicating it.  (Is it possible for a clone-list to grow after we
+	 first see it?)  */
+      if (DECL_SAVED_TREE (clone) || TREE_ASM_WRITTEN (clone))
+	continue;
+
+      /* Start processing the function.  */
+      start_preparsed_function (clone, NULL_TREE, SF_PRE_PARSED);
+
+      if (clone == fns[2])
+	{
+	  for (clone_parm = DECL_ARGUMENTS (clone); clone_parm;
+	       clone_parm = TREE_CHAIN (clone_parm))
+	    DECL_ABSTRACT_ORIGIN (clone_parm) = NULL_TREE;
+	  /* Build the delete destructor by calling complete destructor and
+	     delete function.  */
+	  build_delete_destructor_body (clone, fns[1]);
+	}
+      else
+	{
+	  /* Walk parameter lists together, creating parameter list for
+	     call to original function.  */
+	  for (parmno = 0,
+		 fn_parm = DECL_ARGUMENTS (fn),
+		 fn_parm_typelist = TYPE_ARG_TYPES (TREE_TYPE (fn)),
+		 clone_parm = DECL_ARGUMENTS (clone);
+	       fn_parm;
+	       ++parmno,
+		 fn_parm = TREE_CHAIN (fn_parm))
+	    {
+	      if (parmno == vtt_parmno && ! DECL_HAS_VTT_PARM_P (clone))
+		{
+		  gcc_assert (fn_parm_typelist);
+		  /* Clobber argument with formal parameter type.  */
+		  args[parmno]
+		    = convert (TREE_VALUE (fn_parm_typelist),
+			       null_pointer_node);
+		}
+	      else if (parmno == 1 && DECL_HAS_IN_CHARGE_PARM_P (fn))
+		{
+		  tree in_charge
+		    = copy_node (in_charge_arg_for_name (DECL_NAME (clone)));
+		  args[parmno] = in_charge;
+		}
+	      /* Map other parameters to their equivalents in the cloned
+		 function.  */
+	      else
+		{
+		  gcc_assert (clone_parm);
+		  DECL_ABSTRACT_ORIGIN (clone_parm) = NULL;
+		  args[parmno] = clone_parm;
+		  clone_parm = TREE_CHAIN (clone_parm);
+		}
+	      if (fn_parm_typelist)
+		fn_parm_typelist = TREE_CHAIN (fn_parm_typelist);
+	    }
+
+	  /* We built this list backwards; fix now.  */
+	  mark_used (fn);
+	  call = build_cxx_call (fn, parmno, args, tf_warning_or_error);
+	  /* Arguments passed to the thunk by invisible reference should
+	     be transmitted to the callee unchanged.  Do not create a
+	     temporary and invoke the copy constructor.  The thunking
+	     transformation must not introduce any constructor calls.  */
+	  CALL_FROM_THUNK_P (call) = 1;
+	  block = make_node (BLOCK);
+	  if (targetm.cxx.cdtor_returns_this ())
+	    {
+	      clone_result = DECL_RESULT (clone);
+	      modify = build2 (MODIFY_EXPR, TREE_TYPE (clone_result),
+			       clone_result, call);
+	      add_stmt (modify);
+	      BLOCK_VARS (block) = clone_result;
+	    }
+	  else
+	    {
+	      add_stmt (call);
+	    }
+	  bind = c_build_bind_expr (DECL_SOURCE_LOCATION (clone),
+				    block, cur_stmt_list);
+	  DECL_SAVED_TREE (clone) = push_stmt_list ();
+	  add_stmt (bind);
+	}
+
+      DECL_ABSTRACT_ORIGIN (clone) = NULL;
+      expand_or_defer_fn (finish_function (0));
+    }
+  return 1;
+}
+
+/* FN is a function that has a complete body.  Clone the body as
+   necessary.  Returns nonzero if there's no longer any need to
+   process the main body.  */
+
+bool
+maybe_clone_body (tree fn)
+{
+  tree comdat_group = NULL_TREE;
+  tree clone;
+  tree fns[3];
+  bool first = true;
+  int idx;
+  bool need_alias = false;
+
+  /* We only clone constructors and destructors.  */
+  if (!DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn)
+      && !DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (fn))
+    return 0;
+
+  populate_clone_array (fn, fns);
 
   /* Remember if we can't have multiple clones for some reason.  We need to
      check this before we remap local static initializers in clone_body.  */
@@ -247,9 +457,6 @@  maybe_clone_body (tree fn)
     {
       tree parm;
       tree clone_parm;
-      int parmno;
-      bool alias = false;
-      struct pointer_map_t *decl_map;
 
       clone = fns[idx];
       if (!clone)
@@ -296,26 +503,44 @@  maybe_clone_body (tree fn)
 	   parm = DECL_CHAIN (parm), clone_parm = DECL_CHAIN (clone_parm))
 	/* Update this parameter.  */
 	update_cloned_parm (parm, clone_parm, first);
+    }
+
+  bool can_alias = can_alias_cdtor (fn);
+
+  /* If we decide to turn clones into thunks, they will branch to fn.
+     Must have original function available to call.  */
+  if (!can_alias && maybe_thunk_body (fn, need_alias))
+    {
+      pop_from_top_level ();
+      /* We still need to emit the original function.  */
+      return 0;
+    }
+
+  /* Emit the DWARF1 abstract instance.  */
+  (*debug_hooks->deferred_inline_function) (fn);
+
+  /* We know that any clones immediately follow FN in the TYPE_METHODS list. */
+  for (idx = 0; idx < 3; idx++)
+    {
+      tree parm;
+      tree clone_parm;
+      int parmno;
+      struct pointer_map_t *decl_map;
+      bool alias = false;
+
+      clone = fns[idx];
+      if (!clone)
+	continue;
 
       /* Start processing the function.  */
       start_preparsed_function (clone, NULL_TREE, SF_PRE_PARSED);
 
       /* Tell cgraph if both ctors or both dtors are known to have
 	 the same body.  */
-      if (!in_charge_parm_used
+      if (can_alias
 	  && fns[0]
 	  && idx == 1
-	  && !flag_use_repository
-	  && DECL_INTERFACE_KNOWN (fns[0])
-	  && (SUPPORTS_ONE_ONLY || !DECL_WEAK (fns[0]))
-	  && (!DECL_ONE_ONLY (fns[0])
-	      || (HAVE_COMDAT_GROUP
-		  && DECL_WEAK (fns[0])))
-	  && !flag_syntax_only
-	  /* Set linkage flags appropriately before
-	     cgraph_create_function_alias looks at them.  */
-	  && expand_or_defer_fn_1 (clone)
-	  && cgraph_same_body_alias (cgraph_get_node (fns[0]),
+	  && cgraph_same_body_alias (cgraph_get_create_node (fns[0]),
 				     clone, fns[0]))
 	{
 	  alias = true;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 11f7812..56ea393 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3901,20 +3901,6 @@  expand_or_defer_fn_1 (tree fn)
 
   gcc_assert (DECL_SAVED_TREE (fn));
 
-  /* If this is a constructor or destructor body, we have to clone
-     it.  */
-  if (maybe_clone_body (fn))
-    {
-      /* We don't want to process FN again, so pretend we've written
-	 it out, even though we haven't.  */
-      TREE_ASM_WRITTEN (fn) = 1;
-      /* If this is an instantiation of a constexpr function, keep
-	 DECL_SAVED_TREE for explain_invalid_constexpr_fn.  */
-      if (!is_instantiation_of_constexpr (fn))
-	DECL_SAVED_TREE (fn) = NULL_TREE;
-      return false;
-    }
-
   /* We make a decision about linkage for these functions at the end
      of the compilation.  Until that point, we do not want the back
      end to output them -- but we do want it to see the bodies of
@@ -3962,6 +3948,20 @@  expand_or_defer_fn_1 (tree fn)
 	}
     }
 
+  /* If this is a constructor or destructor body, we have to clone
+     it.  */
+  if (maybe_clone_body (fn))
+    {
+      /* We don't want to process FN again, so pretend we've written
+	 it out, even though we haven't.  */
+      TREE_ASM_WRITTEN (fn) = 1;
+      /* If this is an instantiation of a constexpr function, keep
+	 DECL_SAVED_TREE for explain_invalid_constexpr_fn.  */
+      if (!is_instantiation_of_constexpr (fn))
+	DECL_SAVED_TREE (fn) = NULL_TREE;
+      return false;
+    }
+
   /* There's no reason to do any of the work here if we're only doing
      semantic analysis; this code just generates RTL.  */
   if (flag_syntax_only)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6fc56b9..675ad21 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7265,6 +7265,18 @@  branch-less equivalents.
 
 Enabled at levels @option{-O}, @option{-O2}, @option{-O3}, @option{-Os}.
 
+@item -fdeclone-ctor-dtor
+@opindex fdeclone-ctor-dtor
+The C++ ABI requires multiple entry points for constructors and
+destructors: one for a base subobject, one for a complete object, and
+one for a virtual destructor that calls operator delete afterwards.
+For a hierarchy with virtual bases, the base and complete variants are
+clones, which means two copies of the function.  With this option, the
+base and complete variants are changed to be thunks that call a common
+implementation.
+
+Enabled by @option{-Os}.
+
 @item -fdelete-null-pointer-checks
 @opindex fdelete-null-pointer-checks
 Assume that programs cannot safely dereference null pointers, and that
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 91214bc..a57767e 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -89,6 +89,12 @@  can_refer_decl_in_current_unit_p (tree decl, tree from_decl)
       snode = symtab_get_node (decl);
       if (!snode)
 	return false;
+      /* A decl local to a comdat group can only be referenced by other
+	 decls in that group.  */
+      if (comdat_local_p (snode)
+	  && (!from_decl
+	      || DECL_COMDAT_GROUP (from_decl) != DECL_COMDAT_GROUP (decl)))
+	return false;
       node = dyn_cast <cgraph_node> (snode);
       return !node || !node->global.inlined_to;
     }
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index d0fa8db..96c889e 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3318,6 +3318,10 @@  decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
 					    &caller_count))
     return false;
 
+  /* Don't clone decls local to a comdat group.  */
+  if (comdat_local_p (node))
+    return false;
+
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       fprintf (dump_file, " - considering value ");
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index fbb63da..aa49bfe 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -230,6 +230,25 @@  report_inline_failed_reason (struct cgraph_edge *e)
     }
 }
 
+/* True iff NODE calls another function which is local to its comdat
+   (i.e. C++ decloned constructor); in that case, calls to NODE cannot be
+   inlined, as that would cause a reference from outside the comdat to the
+   local symbol.  If we inline the call from NODE to the local function,
+   then inlining NODE can succeed.  */
+
+static bool
+calls_comdat_local_p (struct cgraph_node *node)
+{
+  if (!node->same_comdat_group)
+    return false;
+  for (struct cgraph_edge *e = node->callees; e; e = e->next_callee)
+    {
+      if (comdat_local_p (e->callee))
+	return true;
+    }
+  return false;
+}
+
 /* Decide if we can inline the edge and possibly update
    inline_failed reason.  
    We check whether inlining is possible at all and whether
@@ -237,7 +256,7 @@  report_inline_failed_reason (struct cgraph_edge *e)
 
    if REPORT is true, output reason to the dump file.  
 
-   if DISREGARD_LIMITES is true, ignore size limits.*/
+   if DISREGARD_LIMITS is true, ignore size limits.*/
 
 static bool
 can_inline_edge_p (struct cgraph_edge *e, bool report,
@@ -267,6 +286,11 @@  can_inline_edge_p (struct cgraph_edge *e, bool report,
       e->inline_failed = CIF_BODY_NOT_AVAILABLE;
       inlinable = false;
     }
+  else if (calls_comdat_local_p (callee))
+    {
+      e->inline_failed = CIF_FUNCTION_NOT_INLINABLE;
+      inlinable = false;
+    }
   else if (!inline_summary (callee)->inlinable 
 	   || (caller_cfun && fn_contains_cilk_spawn_p (caller_cfun)))
     {
diff --git a/gcc/ipa.c b/gcc/ipa.c
index 3950d4e..ab55388 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -363,14 +363,17 @@  symtab_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
 	      enqueue_node (origin_node, &first, reachable);
 	    }
 	  /* If any symbol in a comdat group is reachable, force
-	     all other in the same comdat group to be also reachable.  */
+	     all externally visible symbols in the same comdat
+	     group to be reachable as well.  Comdat-local symbols
+	     can be discarded if all uses were inlined.  */
 	  if (node->same_comdat_group)
 	    {
 	      symtab_node *next;
 	      for (next = node->same_comdat_group;
 		   next != node;
 		   next = next->same_comdat_group)
-		if (!pointer_set_insert (reachable, next))
+		if (!comdat_local_p (next)
+		    && !pointer_set_insert (reachable, next))
 		  enqueue_node (next, &first, reachable);
 	    }
 	  /* Mark references as reachable.  */
@@ -949,6 +952,8 @@  function_and_variable_visibility (bool whole_program)
 	  node->forced_by_abi = false;
 	}
       if (!node->externally_visible
+	  /* Don't mess with a function local to a comdat group.  */
+	  && !comdat_local_p (node)
 	  && node->definition && !node->weakref
 	  && !DECL_EXTERNAL (node->decl))
 	{
diff --git a/gcc/testsuite/g++.dg/ext/label13a.C b/gcc/testsuite/g++.dg/ext/label13a.C
new file mode 100644
index 0000000..3be8e31
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/label13a.C
@@ -0,0 +1,25 @@ 
+// PR c++/41090
+// { dg-do run }
+// { dg-options "-save-temps" }
+// { dg-final { scan-assembler "_ZN1CC4Ev" } }
+// { dg-final cleanup-saved-temps }
+
+int i;
+struct A { A() {} };
+struct C: virtual A
+{
+  C();
+};
+
+C::C()
+{
+  static void *labelref = &&label;
+  goto *labelref;
+ label: i = 1;
+}
+
+int main()
+{
+  C c;
+  return (i != 1);
+}
diff --git a/include/demangle.h b/include/demangle.h
index 58bf547..bbad71b 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -173,6 +173,10 @@  enum gnu_v3_ctor_kinds {
   gnu_v3_complete_object_ctor = 1,
   gnu_v3_base_object_ctor,
   gnu_v3_complete_object_allocating_ctor,
+  /* These are not part of the V3 ABI.  Unified constructors are generated
+     as a speed-for-space optimization when the -fdeclone-ctor-dtor option
+     is used, and are always internal symbols.  */
+  gnu_v3_unified_ctor,
   gnu_v3_object_ctor_group
 };
 
@@ -188,6 +192,10 @@  enum gnu_v3_dtor_kinds {
   gnu_v3_deleting_dtor = 1,
   gnu_v3_complete_object_dtor,
   gnu_v3_base_object_dtor,
+  /* These are not part of the V3 ABI.  Unified destructors are generated
+     as a speed-for-space optimization when the -fdeclone-ctor-dtor option
+     is used, and are always internal symbols.  */
+  gnu_v3_unified_dtor,
   gnu_v3_object_dtor_group
 };
 
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index cbe4d8c..8902dd8 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -2081,6 +2081,9 @@  d_ctor_dtor_name (struct d_info *di)
 	  case '3':
 	    kind = gnu_v3_complete_object_allocating_ctor;
 	    break;
+          case '4':
+	    kind = gnu_v3_unified_ctor;
+	    break;
 	  case '5':
 	    kind = gnu_v3_object_ctor_group;
 	    break;
@@ -2106,6 +2109,10 @@  d_ctor_dtor_name (struct d_info *di)
 	  case '2':
 	    kind = gnu_v3_base_object_dtor;
 	    break;
+          /*  digit '3' is not used */
+	  case '4':
+	    kind = gnu_v3_unified_dtor;
+	    break;
 	  case '5':
 	    kind = gnu_v3_object_dtor_group;
 	    break;