diff mbox

RFA (cgraph): C++ 'structor decloning patch, Mark III

Message ID 52A891BC.7060307@redhat.com
State New
Headers show

Commit Message

Jason Merrill Dec. 11, 2013, 4:24 p.m. UTC
On 12/11/2013 10:11 AM, Jason Merrill wrote:
> So, it's probably possible to make it work to clone the comdat local
> function into another comdat local function, but it's not useful, and
> it's easier to just prevent it.

Well, not that much easier actually.  I'm attaching both a delta from my 
last patch and a complete patch against trunk.  Two questions:

Do you have an opinion about which way to implement symtab_in_same_comdat_p?

Is ipa_passes the right place to initialize calls_comdat_local?

Jason

Comments

Jason Merrill Dec. 11, 2013, 4:28 p.m. UTC | #1
On 12/11/2013 11:24 AM, Jason Merrill wrote:
> Well, not that much easier actually.  I'm attaching both a delta from my
> last patch and a complete patch against trunk.

Oops, forgot to remove the gimple-fold change, doing that now.

Jason
Jan Hubicka Dec. 11, 2013, 5:14 p.m. UTC | #2
> On 12/11/2013 10:11 AM, Jason Merrill wrote:
> >So, it's probably possible to make it work to clone the comdat local
> >function into another comdat local function, but it's not useful, and
> >it's easier to just prevent it.
> 
> Well, not that much easier actually.  I'm attaching both a delta
> from my last patch and a complete patch against trunk.  Two
> questions:
> 
> Do you have an opinion about which way to implement symtab_in_same_comdat_p?

I would go with the #if 1 case.  Don't seem to justify a loop that is trivially
replaceable by comdat.

In longer term the plan is to move all those visibility fields from trees into symbols
and in that case I am sure we will end up having the comdat group name in there too.
> 
> Is ipa_passes the right place to initialize calls_comdat_local?

The flag is probably needed in both early inliner and IPA inliner.  A conservative
place to initialize it would be in inline_analyze_function.
(early inliner analyze function twice, first before inlining and next after
early optimization, so the update should also clear the flag if call disapeared).

I think we also want to clear it if call to comdat local function is marked inline.
This is only way we can end up inlining those constructors now, right?
inlining ctors/dtors of course is rather important for further optimization,
so we should be cureful to not regress optimization here too much.
We will still lose some propagation since inliner won't work out that it can
propagate something useful through the wrapper call (unlike ipa-cp that would if
it knew how to clone local comdats).

I am on way home, I will look in detail on patch next.

Honza
Jan Hubicka Dec. 11, 2013, 5:45 p.m. UTC | #3
Hi,
while thinking of the details on how to handle comdat locals through ipa-cp/inliner/folding
I wonder, if we won't end up with better code going the oposite way.
We can declare those functions static first and then have post-inliner IPA pass that will
travel the callgraph and mark all static functions/variables that are accessed only from
one comdat to be comdat local. We already have issues here - for example when ipa-split
decides to split a comdat function, its part will end up output comdat block.
Similarly when we decide to produce a clone only to optimize comdat function, the clone
may easily become dead.  I also think with the C++ include jungle, people will end up
having static things referenced only from comdats, but I may be wrong.

This approach would have issues by itself tough
 1) we will need to do it at -O0 too and in this case we may want to have flag
    for those decloned functions since we need to do it only for those
 2) since I dropped the ball on merging function labels patch, we may end up
    privatizing function with a label that has address taken from static variable
    that is not privatized.  We may just disable localization for such functions I suppose.
 3) we may surprise users who forget to annotate with used attribute.
 4) inliner does not handle static and comdats the same way, since it assumes comdat
    will be optimized out with some probability.  This won't match on statics that will
    later become part of comdat group.   Not big deal probably.

On the other hand my feeling is that this has a potential of code size savings and
may end up more robust/easier to maintain than support this somewhat counter-intuitive
context thorough the whole IPA optimization queue.

What do you think?

Honza
Jason Merrill Dec. 11, 2013, 7:42 p.m. UTC | #4
On 12/11/2013 12:14 PM, Jan Hubicka wrote:
>> Is ipa_passes the right place to initialize calls_comdat_local?
>
> The flag is probably needed in both early inliner and IPA inliner.  A conservative
> place to initialize it would be in inline_analyze_function.
> (early inliner analyze function twice, first before inlining and next after
> early optimization, so the update should also clear the flag if call disapeared).

Unfortunately early inlining doesn't call inline_analyze_function on all 
functions, so we need to initialize it somewhere else.  Is there a 
reason not to set up an initial value in ipa_passes?

> I think we also want to clear it if call to comdat local function is marked inline.

Makes sense.

Jason
Jason Merrill Dec. 11, 2013, 8:44 p.m. UTC | #5
On 12/11/2013 12:45 PM, Jan Hubicka wrote:
> I wonder, if we won't end up with better code going the oposite way.
> We can declare those functions static first and then have post-inliner IPA pass that will
> travel the callgraph and mark all static functions/variables that are accessed only from
> one comdat to be comdat local.

With that scheme I would be concerned that we would tend to inline the 
wrappers so that we can't move the worker function into the comdat, and 
we end up with copies of it in lots of object files.

Jason
Jan Hubicka Dec. 11, 2013, 8:53 p.m. UTC | #6
> On 12/11/2013 12:45 PM, Jan Hubicka wrote:
> >I wonder, if we won't end up with better code going the oposite way.
> >We can declare those functions static first and then have post-inliner IPA pass that will
> >travel the callgraph and mark all static functions/variables that are accessed only from
> >one comdat to be comdat local.
> 
> With that scheme I would be concerned that we would tend to inline
> the wrappers so that we can't move the worker function into the
> comdat, and we end up with copies of it in lots of object files.

Hmm, you are right. Inliner and ipa-cp will need a cost model for dragging
stuff local in any case.  Perhaps we even want to "privatize" as much as possible
prior inlining. I will play with that incrementally.
Lets go with minimal version of patch that makes things working and I will take
care of solving the inliner limitations.

Honza
> 
> Jason
Jan Hubicka Dec. 11, 2013, 8:58 p.m. UTC | #7
> On 12/11/2013 12:14 PM, Jan Hubicka wrote:
> >>Is ipa_passes the right place to initialize calls_comdat_local?
> >
> >The flag is probably needed in both early inliner and IPA inliner.  A conservative
> >place to initialize it would be in inline_analyze_function.
> >(early inliner analyze function twice, first before inlining and next after
> >early optimization, so the update should also clear the flag if call disapeared).
> 
> Unfortunately early inlining doesn't call inline_analyze_function on
> all functions, so we need to initialize it somewhere else.  Is there
> a reason not to set up an initial value in ipa_passes?

Well, less thing we have done behind passmanager back, the better.  ipa-passes
should really just set up compiler to execute ipa passes, but it should not do
any analysis by itself.

Every function we inline or clone needs to go through the inliner's analysis.
But looking into it now, the place is probably compute_inline_paremeters, since
inline_analyze_function is just a wrapper for the IPA pass, but
pass_data_inline_parameters bypass it (this is bit confusing - I will clean it
up)

Honza
> 
> >I think we also want to clear it if call to comdat local function is marked inline.
> 
> Makes sense.
> 
> Jason
diff mbox

Patch

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index f368cab..3576f7d 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -899,6 +899,10 @@  c_common_post_options (const char **pfilename)
   if (warn_implicit_function_declaration == -1)
     warn_implicit_function_declaration = flag_isoc99;
 
+  /* Declone C++ 'structors if -Os.  */
+  if (flag_declone_ctor_dtor == -1)
+    flag_declone_ctor_dtor = optimize_size;
+
   if (cxx_dialect >= cxx11)
     {
       /* If we're allowing C++0x constructs, don't warn about C++98
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index bfca1e0..d270f77 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -890,6 +890,10 @@  fdeduce-init-list
 C++ ObjC++ Var(flag_deduce_init_list) Init(0)
 -fdeduce-init-list	enable deduction of std::initializer_list for a template type parameter from a brace-enclosed initializer-list
 
+fdeclone-ctor-dtor
+C++ ObjC++ Var(flag_declone_ctor_dtor) Init(-1)
+Factor complex constructors and destructors to favor space over speed
+
 fdefault-inline
 C++ ObjC++ Ignore
 Does nothing.  Preserved for backward compatibility.
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 9501afa..8b5b786 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2623,11 +2623,6 @@  verify_cgraph_node (struct cgraph_node *node)
       error ("execution count is negative");
       error_found = true;
     }
-  if (node->global.inlined_to && node->same_comdat_group)
-    {
-      error ("inline clone in same comdat group list");
-      error_found = true;
-    }
   if (!node->definition && !node->in_other_partition && node->local.local)
     {
       error ("local symbols must be defined");
@@ -2666,10 +2661,18 @@  verify_cgraph_node (struct cgraph_node *node)
 	  error_found = true;
 	}
     }
+  bool check_group = symtab_comdat_local_p (node);
   for (e = node->callers; e; e = e->next_caller)
     {
       if (verify_edge_count_and_frequency (e))
 	error_found = true;
+      if (check_group
+	  && !symtab_in_same_comdat_p (e->caller, node))
+	{
+	  error ("comdat-local function called by %s outside its comdat",
+		 identifier_to_locale (e->caller->name ()));
+	  error_found = true;
+	}
       if (!e->inline_failed)
 	{
 	  if (node->global.inlined_to
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0a88da3..62b7a7c 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -428,6 +428,8 @@  public:
   unsigned tm_clone : 1;
   /* True if this decl is a dispatcher for function versions.  */
   unsigned dispatcher_function : 1;
+  /* True if this decl calls a COMDAT-local function.  */
+  unsigned calls_comdat_local : 1;
 };
 
 
@@ -1490,4 +1492,31 @@  symtab_can_be_discarded (symtab_node *node)
 	      && node->resolution != LDPR_PREVAILING_DEF_IRONLY
 	      && node->resolution != LDPR_PREVAILING_DEF_IRONLY_EXP));
 }
+
+/* Return true if NODE is local to a particular COMDAT group, and must not
+   be named from outside the COMDAT.  This is used for C++ decloned
+   constructors.  */
+
+static inline bool
+symtab_comdat_local_p (symtab_node *node)
+{
+  return (node->same_comdat_group && !TREE_PUBLIC (node->decl));
+}
+
+/* Return true if ONE and TWO are part of the same COMDAT group.  */
+
+static inline bool
+symtab_in_same_comdat_p (symtab_node *one, symtab_node *two)
+{
+#if 1
+  return DECL_COMDAT_GROUP (one->decl) == DECL_COMDAT_GROUP (two->decl);
+#else
+  if (one->same_comdat_group && two->same_comdat_group)
+    for (symtab_node *n = one->same_comdat_group; n != one;
+	 n = n->same_comdat_group)
+      if (n == two)
+	return true;
+  return false;
+#endif
+}
 #endif  /* GCC_CGRAPH_H  */
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 90ef901..cff9a45 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -216,6 +216,8 @@  cgraph_clone_node (struct cgraph_node *n, tree decl, gcov_type count, int freq,
   new_node->clone = n->clone;
   new_node->clone.tree_map = NULL;
   new_node->tp_first_run = n->tp_first_run;
+  if (n->same_comdat_group)
+    symtab_add_to_same_comdat_group (new_node, n);
   if (n->count)
     {
       if (new_node->count > n->count)
@@ -446,13 +448,10 @@  cgraph_create_virtual_clone (struct cgraph_node *old_node,
 				redirect_callers, false, NULL);
   /* Update the properties.
      Make clone visible only within this translation unit.  Make sure
-     that is not weak also.
-     ??? We cannot use COMDAT linkage because there is no
-     ABI support for this.  */
+     that is not weak also.  */
   DECL_EXTERNAL (new_node->decl) = 0;
   if (DECL_ONE_ONLY (old_decl))
     DECL_SECTION_NAME (new_node->decl) = NULL;
-  DECL_COMDAT_GROUP (new_node->decl) = 0;
   TREE_PUBLIC (new_node->decl) = 0;
   DECL_COMDAT (new_node->decl) = 0;
   DECL_WEAK (new_node->decl) = 0;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 44f3afd..a1294ae 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1244,7 +1244,8 @@  mark_functions_to_output (void)
 	      for (next = cgraph (node->same_comdat_group);
 		   next != node;
 		   next = cgraph (next->same_comdat_group))
-		if (!next->thunk.thunk_p && !next->alias)
+		if (!next->thunk.thunk_p && !next->alias
+		    && !symtab_comdat_local_p (next))
 		  next->process = 1;
 	    }
 	}
@@ -1990,6 +1991,12 @@  ipa_passes (void)
 
   invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL);
 
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    if (symtab_comdat_local_p (node))
+      for (struct cgraph_edge *e = node->callers; e; e = e->next_caller)
+	e->caller->calls_comdat_local = true;
+
   if (!in_lto_p)
     {
       execute_ipa_pass_list (passes->all_small_ipa_passes);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index a6744c7..0f33542 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10184,7 +10184,9 @@  grokdeclarator (const cp_declarator *declarator,
 	    /* The TYPE_DECL is "abstract" because there will be
 	       clones of this constructor/destructor, and there will
 	       be copies of this TYPE_DECL generated in those
-	       clones.  */
+	       clones.  The decloning optimization (for space) may
+               revert this subsequently if it determines that
+               the clones should share a common implementation.  */
 	    DECL_ABSTRACT (decl) = 1;
 	}
       else if (current_class_type
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 8a24d6c..d99062d 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -689,13 +689,6 @@  write_mangled_name (const tree decl, bool top_level)
     mangled_name:;
       write_string ("_Z");
       write_encoding (decl);
-      if (DECL_LANG_SPECIFIC (decl)
-	  && (DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (decl)
-	      || DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (decl)))
-	/* We need a distinct mangled name for these entities, but
-	   we should never actually output it.  So, we append some
-	   characters the assembler won't like.  */
-	write_string (" *INTERNAL* ");
     }
 }
 
@@ -1653,25 +1646,21 @@  write_identifier (const char *identifier)
 		    ::= C2   # base object constructor
 		    ::= C3   # complete object allocating constructor
 
-   Currently, allocating constructors are never used.
-
-   We also need to provide mangled names for the maybe-in-charge
-   constructor, so we treat it here too.  mangle_decl_string will
-   append *INTERNAL* to that, to make sure we never emit it.  */
+   Currently, allocating constructors are never used.  */
 
 static void
 write_special_name_constructor (const tree ctor)
 {
   if (DECL_BASE_CONSTRUCTOR_P (ctor))
     write_string ("C2");
+  /* This is the old-style "[unified]" constructor.
+     In some cases, we may emit this function and call
+     it from the clones in order to share code and save space.  */
+  else if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (ctor))
+    write_string ("C4");
   else
     {
-      gcc_assert (DECL_COMPLETE_CONSTRUCTOR_P (ctor)
-		  /* Even though we don't ever emit a definition of
-		     the old-style destructor, we still have to
-		     consider entities (like static variables) nested
-		     inside it.  */
-		  || DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (ctor));
+      gcc_assert (DECL_COMPLETE_CONSTRUCTOR_P (ctor));
       write_string ("C1");
     }
 }
@@ -1681,11 +1670,7 @@  write_special_name_constructor (const tree ctor)
 
      <special-name> ::= D0 # deleting (in-charge) destructor
 		    ::= D1 # complete object (in-charge) destructor
-		    ::= D2 # base object (not-in-charge) destructor
-
-   We also need to provide mangled names for the maybe-incharge
-   destructor, so we treat it here too.  mangle_decl_string will
-   append *INTERNAL* to that, to make sure we never emit it.  */
+		    ::= D2 # base object (not-in-charge) destructor  */
 
 static void
 write_special_name_destructor (const tree dtor)
@@ -1694,14 +1679,14 @@  write_special_name_destructor (const tree dtor)
     write_string ("D0");
   else if (DECL_BASE_DESTRUCTOR_P (dtor))
     write_string ("D2");
+  else if (DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (dtor))
+    /* This is the old-style "[unified]" destructor.
+       In some cases, we may emit this function and call
+       it from the clones in order to share code and save space.  */
+    write_string ("D4");
   else
     {
-      gcc_assert (DECL_COMPLETE_DESTRUCTOR_P (dtor)
-		  /* Even though we don't ever emit a definition of
-		     the old-style destructor, we still have to
-		     consider entities (like static variables) nested
-		     inside it.  */
-		  || DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (dtor));
+      gcc_assert (DECL_COMPLETE_DESTRUCTOR_P (dtor));
       write_string ("D1");
     }
 }
diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index 7405365..e79a922 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -477,7 +477,8 @@  trivial_fn_p (tree fn)
     return false;
 
   /* If fn is a clone, get the primary variant.  */
-  fn = DECL_ORIGIN (fn);
+  if (tree prim = DECL_CLONED_FUNCTION (fn))
+    fn = prim;
   return type_has_trivial_fn (DECL_CONTEXT (fn), special_function_p (fn));
 }
 
diff --git a/gcc/cp/optimize.c b/gcc/cp/optimize.c
index f1b09bf..40494f2 100644
--- a/gcc/cp/optimize.c
+++ b/gcc/cp/optimize.c
@@ -193,30 +193,40 @@  cdtor_comdat_group (tree complete, tree base)
   return get_identifier (grp_name);
 }
 
-/* FN is a function that has a complete body.  Clone the body as
-   necessary.  Returns nonzero if there's no longer any need to
-   process the main body.  */
+/* Returns true iff we can make the base and complete [cd]tor aliases of
+   the same symbol rather than separate functions.  */
 
-bool
-maybe_clone_body (tree fn)
+static bool
+can_alias_cdtor (tree fn)
+{
+#ifndef ASM_OUTPUT_DEF
+  /* If aliases aren't supported by the assembler, fail.  */
+  return false;
+#endif
+  /* We can't use an alias if there are virtual bases.  */
+  if (CLASSTYPE_VBASECLASSES (DECL_CONTEXT (fn)))
+    return false;
+  /* ??? Why not use aliases with -frepo?  */
+  if (flag_use_repository)
+    return false;
+  gcc_assert (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn)
+	      || DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (fn));
+  /* Don't use aliases for weak/linkonce definitions unless we can put both
+     symbols in the same COMDAT group.  */
+  return (DECL_INTERFACE_KNOWN (fn)
+	  && (SUPPORTS_ONE_ONLY || !DECL_WEAK (fn))
+	  && (!DECL_ONE_ONLY (fn)
+	      || (HAVE_COMDAT_GROUP && DECL_WEAK (fn))));
+}
+
+/* FN is a [cd]tor, fns is a pointer to an array of length 3.  Fill fns
+   with pointers to the base, complete, and deleting variants.  */
+
+static void
+populate_clone_array (tree fn, tree *fns)
 {
-  tree comdat_group = NULL_TREE;
   tree clone;
-  tree fns[3];
-  bool first = true;
-  bool in_charge_parm_used;
-  int idx;
-  bool need_alias = false;
 
-  /* We only clone constructors and destructors.  */
-  if (!DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn)
-      && !DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (fn))
-    return 0;
-
-  /* Emit the DWARF1 abstract instance.  */
-  (*debug_hooks->deferred_inline_function) (fn);
-
-  in_charge_parm_used = CLASSTYPE_VBASECLASSES (DECL_CONTEXT (fn)) != NULL;
   fns[0] = NULL_TREE;
   fns[1] = NULL_TREE;
   fns[2] = NULL_TREE;
@@ -234,6 +244,206 @@  maybe_clone_body (tree fn)
       fns[2] = clone;
     else
       gcc_unreachable ();
+}
+
+/* FN is a constructor or destructor, and there are FUNCTION_DECLs
+   cloned from it nearby.  Instead of cloning this body, leave it
+   alone and create tiny one-call bodies for the cloned
+   FUNCTION_DECLs.  These clones are sibcall candidates, and their
+   resulting code will be very thunk-esque.  */
+
+static bool
+maybe_thunk_body (tree fn, bool force)
+{
+  tree bind, block, call, clone, clone_result, fn_parm, fn_parm_typelist;
+  tree last_arg, modify, *args;
+  int parmno, vtt_parmno, max_parms;
+  tree fns[3];
+
+  if (!force && !flag_declone_ctor_dtor)
+    return 0;
+
+  /* If function accepts variable arguments, give up.  */
+  last_arg = tree_last (TYPE_ARG_TYPES (TREE_TYPE (fn)));
+  if (last_arg != void_list_node)
+    return 0;
+
+  /* If we got this far, we've decided to turn the clones into thunks.  */
+
+  /* We're going to generate code for fn, so it is no longer "abstract."
+     Also make the unified ctor/dtor private to either the translation unit
+     (for non-vague linkage ctors) or the COMDAT group (otherwise).  */
+
+  populate_clone_array (fn, fns);
+  DECL_ABSTRACT (fn) = false;
+  if (!DECL_WEAK (fn))
+    {
+      TREE_PUBLIC (fn) = false;
+      DECL_EXTERNAL (fn) = false;
+      DECL_INTERFACE_KNOWN (fn) = true;
+    }
+  else if (HAVE_COMDAT_GROUP)
+    {
+      tree comdat_group = cdtor_comdat_group (fns[1], fns[0]);
+      DECL_COMDAT_GROUP (fns[0]) = comdat_group;
+      symtab_add_to_same_comdat_group (cgraph_get_create_node (fns[1]),
+				       cgraph_get_create_node (fns[0]));
+      symtab_add_to_same_comdat_group (symtab_get_node (fn),
+				       symtab_get_node (fns[0]));
+      if (fns[2])
+	/* If *[CD][12]* dtors go into the *[CD]5* comdat group and dtor is
+	   virtual, it goes into the same comdat group as well.  */
+	symtab_add_to_same_comdat_group (cgraph_get_create_node (fns[2]),
+					 symtab_get_node (fns[0]));
+      TREE_PUBLIC (fn) = false;
+      DECL_EXTERNAL (fn) = false;
+      DECL_INTERFACE_KNOWN (fn) = true;
+      /* function_and_variable_visibility doesn't want !PUBLIC decls to
+	 have these flags set.  */
+      DECL_WEAK (fn) = false;
+      DECL_COMDAT (fn) = false;
+    }
+
+  /* Find the vtt_parm, if present.  */
+  for (vtt_parmno = -1, parmno = 0, fn_parm = DECL_ARGUMENTS (fn);
+       fn_parm;
+       ++parmno, fn_parm = TREE_CHAIN (fn_parm))
+    {
+      if (DECL_ARTIFICIAL (fn_parm)
+	  && DECL_NAME (fn_parm) == vtt_parm_identifier)
+	{
+	  /* Compensate for removed in_charge parameter.  */
+	  vtt_parmno = parmno;
+	  break;
+	}
+    }
+
+  /* Allocate an argument buffer for build_cxx_call().
+     Make sure it is large enough for any of the clones.  */
+  max_parms = 0;
+  FOR_EACH_CLONE (clone, fn)
+    {
+      int length = list_length (DECL_ARGUMENTS (fn));
+      if (length > max_parms)
+        max_parms = length;
+    }
+  args = (tree *) alloca (max_parms * sizeof (tree));
+
+  /* We know that any clones immediately follow FN in TYPE_METHODS.  */
+  FOR_EACH_CLONE (clone, fn)
+    {
+      tree clone_parm;
+
+      /* If we've already generated a body for this clone, avoid
+	 duplicating it.  (Is it possible for a clone-list to grow after we
+	 first see it?)  */
+      if (DECL_SAVED_TREE (clone) || TREE_ASM_WRITTEN (clone))
+	continue;
+
+      /* Start processing the function.  */
+      start_preparsed_function (clone, NULL_TREE, SF_PRE_PARSED);
+
+      if (clone == fns[2])
+	{
+	  for (clone_parm = DECL_ARGUMENTS (clone); clone_parm;
+	       clone_parm = TREE_CHAIN (clone_parm))
+	    DECL_ABSTRACT_ORIGIN (clone_parm) = NULL_TREE;
+	  /* Build the delete destructor by calling complete destructor and
+	     delete function.  */
+	  build_delete_destructor_body (clone, fns[1]);
+	}
+      else
+	{
+	  /* Walk parameter lists together, creating parameter list for
+	     call to original function.  */
+	  for (parmno = 0,
+		 fn_parm = DECL_ARGUMENTS (fn),
+		 fn_parm_typelist = TYPE_ARG_TYPES (TREE_TYPE (fn)),
+		 clone_parm = DECL_ARGUMENTS (clone);
+	       fn_parm;
+	       ++parmno,
+		 fn_parm = TREE_CHAIN (fn_parm))
+	    {
+	      if (parmno == vtt_parmno && ! DECL_HAS_VTT_PARM_P (clone))
+		{
+		  gcc_assert (fn_parm_typelist);
+		  /* Clobber argument with formal parameter type.  */
+		  args[parmno]
+		    = convert (TREE_VALUE (fn_parm_typelist),
+			       null_pointer_node);
+		}
+	      else if (parmno == 1 && DECL_HAS_IN_CHARGE_PARM_P (fn))
+		{
+		  tree in_charge
+		    = copy_node (in_charge_arg_for_name (DECL_NAME (clone)));
+		  args[parmno] = in_charge;
+		}
+	      /* Map other parameters to their equivalents in the cloned
+		 function.  */
+	      else
+		{
+		  gcc_assert (clone_parm);
+		  DECL_ABSTRACT_ORIGIN (clone_parm) = NULL;
+		  args[parmno] = clone_parm;
+		  clone_parm = TREE_CHAIN (clone_parm);
+		}
+	      if (fn_parm_typelist)
+		fn_parm_typelist = TREE_CHAIN (fn_parm_typelist);
+	    }
+
+	  /* We built this list backwards; fix now.  */
+	  mark_used (fn);
+	  call = build_cxx_call (fn, parmno, args, tf_warning_or_error);
+	  /* Arguments passed to the thunk by invisible reference should
+	     be transmitted to the callee unchanged.  Do not create a
+	     temporary and invoke the copy constructor.  The thunking
+	     transformation must not introduce any constructor calls.  */
+	  CALL_FROM_THUNK_P (call) = 1;
+	  block = make_node (BLOCK);
+	  if (targetm.cxx.cdtor_returns_this ())
+	    {
+	      clone_result = DECL_RESULT (clone);
+	      modify = build2 (MODIFY_EXPR, TREE_TYPE (clone_result),
+			       clone_result, call);
+	      add_stmt (modify);
+	      BLOCK_VARS (block) = clone_result;
+	    }
+	  else
+	    {
+	      add_stmt (call);
+	    }
+	  bind = c_build_bind_expr (DECL_SOURCE_LOCATION (clone),
+				    block, cur_stmt_list);
+	  DECL_SAVED_TREE (clone) = push_stmt_list ();
+	  add_stmt (bind);
+	}
+
+      DECL_ABSTRACT_ORIGIN (clone) = NULL;
+      expand_or_defer_fn (finish_function (0));
+    }
+  return 1;
+}
+
+/* FN is a function that has a complete body.  Clone the body as
+   necessary.  Returns nonzero if there's no longer any need to
+   process the main body.  */
+
+bool
+maybe_clone_body (tree fn)
+{
+  tree comdat_group = NULL_TREE;
+  tree clone;
+  tree fns[3];
+  bool first = true;
+  int idx;
+  bool need_alias = false;
+
+  /* We only clone constructors and destructors.  */
+  if (!DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn)
+      && !DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (fn))
+    return 0;
+
+  populate_clone_array (fn, fns);
 
   /* Remember if we can't have multiple clones for some reason.  We need to
      check this before we remap local static initializers in clone_body.  */
@@ -247,9 +457,6 @@  maybe_clone_body (tree fn)
     {
       tree parm;
       tree clone_parm;
-      int parmno;
-      bool alias = false;
-      struct pointer_map_t *decl_map;
 
       clone = fns[idx];
       if (!clone)
@@ -296,26 +503,44 @@  maybe_clone_body (tree fn)
 	   parm = DECL_CHAIN (parm), clone_parm = DECL_CHAIN (clone_parm))
 	/* Update this parameter.  */
 	update_cloned_parm (parm, clone_parm, first);
+    }
+
+  bool can_alias = can_alias_cdtor (fn);
+
+  /* If we decide to turn clones into thunks, they will branch to fn.
+     Must have original function available to call.  */
+  if (!can_alias && maybe_thunk_body (fn, need_alias))
+    {
+      pop_from_top_level ();
+      /* We still need to emit the original function.  */
+      return 0;
+    }
+
+  /* Emit the DWARF1 abstract instance.  */
+  (*debug_hooks->deferred_inline_function) (fn);
+
+  /* We know that any clones immediately follow FN in the TYPE_METHODS list. */
+  for (idx = 0; idx < 3; idx++)
+    {
+      tree parm;
+      tree clone_parm;
+      int parmno;
+      struct pointer_map_t *decl_map;
+      bool alias = false;
+
+      clone = fns[idx];
+      if (!clone)
+	continue;
 
       /* Start processing the function.  */
       start_preparsed_function (clone, NULL_TREE, SF_PRE_PARSED);
 
       /* Tell cgraph if both ctors or both dtors are known to have
 	 the same body.  */
-      if (!in_charge_parm_used
+      if (can_alias
 	  && fns[0]
 	  && idx == 1
-	  && !flag_use_repository
-	  && DECL_INTERFACE_KNOWN (fns[0])
-	  && (SUPPORTS_ONE_ONLY || !DECL_WEAK (fns[0]))
-	  && (!DECL_ONE_ONLY (fns[0])
-	      || (HAVE_COMDAT_GROUP
-		  && DECL_WEAK (fns[0])))
-	  && !flag_syntax_only
-	  /* Set linkage flags appropriately before
-	     cgraph_create_function_alias looks at them.  */
-	  && expand_or_defer_fn_1 (clone)
-	  && cgraph_same_body_alias (cgraph_get_node (fns[0]),
+	  && cgraph_same_body_alias (cgraph_get_create_node (fns[0]),
 				     clone, fns[0]))
 	{
 	  alias = true;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 7c1b18e..30b1b4e 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3901,20 +3901,6 @@  expand_or_defer_fn_1 (tree fn)
 
   gcc_assert (DECL_SAVED_TREE (fn));
 
-  /* If this is a constructor or destructor body, we have to clone
-     it.  */
-  if (maybe_clone_body (fn))
-    {
-      /* We don't want to process FN again, so pretend we've written
-	 it out, even though we haven't.  */
-      TREE_ASM_WRITTEN (fn) = 1;
-      /* If this is an instantiation of a constexpr function, keep
-	 DECL_SAVED_TREE for explain_invalid_constexpr_fn.  */
-      if (!is_instantiation_of_constexpr (fn))
-	DECL_SAVED_TREE (fn) = NULL_TREE;
-      return false;
-    }
-
   /* We make a decision about linkage for these functions at the end
      of the compilation.  Until that point, we do not want the back
      end to output them -- but we do want it to see the bodies of
@@ -3962,6 +3948,20 @@  expand_or_defer_fn_1 (tree fn)
 	}
     }
 
+  /* If this is a constructor or destructor body, we have to clone
+     it.  */
+  if (maybe_clone_body (fn))
+    {
+      /* We don't want to process FN again, so pretend we've written
+	 it out, even though we haven't.  */
+      TREE_ASM_WRITTEN (fn) = 1;
+      /* If this is an instantiation of a constexpr function, keep
+	 DECL_SAVED_TREE for explain_invalid_constexpr_fn.  */
+      if (!is_instantiation_of_constexpr (fn))
+	DECL_SAVED_TREE (fn) = NULL_TREE;
+      return false;
+    }
+
   /* There's no reason to do any of the work here if we're only doing
      semantic analysis; this code just generates RTL.  */
   if (flag_syntax_only)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 782a472..f69461f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7338,6 +7338,18 @@  branch-less equivalents.
 
 Enabled at levels @option{-O}, @option{-O2}, @option{-O3}, @option{-Os}.
 
+@item -fdeclone-ctor-dtor
+@opindex fdeclone-ctor-dtor
+The C++ ABI requires multiple entry points for constructors and
+destructors: one for a base subobject, one for a complete object, and
+one for a virtual destructor that calls operator delete afterwards.
+For a hierarchy with virtual bases, the base and complete variants are
+clones, which means two copies of the function.  With this option, the
+base and complete variants are changed to be thunks that call a common
+implementation.
+
+Enabled by @option{-Os}.
+
 @item -fdelete-null-pointer-checks
 @opindex fdelete-null-pointer-checks
 Assume that programs cannot safely dereference null pointers, and that
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 1705394..850abb8 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -95,6 +95,12 @@  can_refer_decl_in_current_unit_p (tree decl, tree from_decl)
       snode = symtab_get_node (decl);
       if (!snode)
 	return false;
+      /* A decl local to a comdat group can only be referenced by other
+	 decls in that group.  */
+      if (symtab_comdat_local_p (snode)
+	  && (!from_decl
+	      || DECL_COMDAT_GROUP (from_decl) != DECL_COMDAT_GROUP (decl)))
+	return false;
       node = dyn_cast <cgraph_node> (snode);
       return !node || !node->global.inlined_to;
     }
diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
index 7fb4ab9..aa34221 100644
--- a/gcc/ipa-inline-transform.c
+++ b/gcc/ipa-inline-transform.c
@@ -272,6 +272,9 @@  inline_call (struct cgraph_edge *e, bool update_original,
    inline_update_overall_summary (to);
   new_size = inline_summary (to)->size;
 
+  if (callee->calls_comdat_local)
+    to->calls_comdat_local = true;
+
 #ifdef ENABLE_CHECKING
   /* Verify that estimated growth match real growth.  Allow off-by-one
      error due to INLINE_SIZE_SCALE roudoff errors.  */
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 38157ca..72675c6 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -241,7 +241,7 @@  report_inline_failed_reason (struct cgraph_edge *e)
 
    if REPORT is true, output reason to the dump file.  
 
-   if DISREGARD_LIMITES is true, ignore size limits.*/
+   if DISREGARD_LIMITS is true, ignore size limits.*/
 
 static bool
 can_inline_edge_p (struct cgraph_edge *e, bool report,
@@ -271,6 +271,12 @@  can_inline_edge_p (struct cgraph_edge *e, bool report,
       e->inline_failed = CIF_BODY_NOT_AVAILABLE;
       inlinable = false;
     }
+  else if (callee->calls_comdat_local
+	   && !symtab_in_same_comdat_p (e->caller, callee))
+    {
+      e->inline_failed = CIF_FUNCTION_NOT_INLINABLE;
+      inlinable = false;
+    }
   else if (!inline_summary (callee)->inlinable 
 	   || (caller_cfun && fn_contains_cilk_spawn_p (caller_cfun)))
     {
diff --git a/gcc/ipa.c b/gcc/ipa.c
index 1ec4b5f..3c19288 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -362,14 +362,17 @@  symtab_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
 	      enqueue_node (origin_node, &first, reachable);
 	    }
 	  /* If any symbol in a comdat group is reachable, force
-	     all other in the same comdat group to be also reachable.  */
+	     all externally visible symbols in the same comdat
+	     group to be reachable as well.  Comdat-local symbols
+	     can be discarded if all uses were inlined.  */
 	  if (node->same_comdat_group)
 	    {
 	      symtab_node *next;
 	      for (next = node->same_comdat_group;
 		   next != node;
 		   next = next->same_comdat_group)
-		if (!pointer_set_insert (reachable, next))
+		if (!symtab_comdat_local_p (next)
+		    && !pointer_set_insert (reachable, next))
 		  enqueue_node (next, &first, reachable);
 	    }
 	  /* Mark references as reachable.  */
@@ -969,14 +972,14 @@  function_and_variable_visibility (bool whole_program)
 	  node->unique_name = ((node->resolution == LDPR_PREVAILING_DEF_IRONLY
 				      || node->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP)
 				      && TREE_PUBLIC (node->decl));
-	  symtab_make_decl_local (node->decl);
 	  node->resolution = LDPR_PREVAILING_DEF_IRONLY;
-	  if (node->same_comdat_group)
+	  if (node->same_comdat_group && TREE_PUBLIC (node->decl))
 	    /* cgraph_externally_visible_p has already checked all other nodes
 	       in the group and they will all be made local.  We need to
 	       dissolve the group at once so that the predicate does not
 	       segfault though. */
 	    symtab_dissolve_same_comdat_group_list (node);
+	  symtab_make_decl_local (node->decl);
 	}
 
       if (node->thunk.thunk_p
diff --git a/gcc/symtab.c b/gcc/symtab.c
index dc700e7..26ec176 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -844,6 +844,21 @@  verify_symtab_base (symtab_node *node)
 	  n = n->same_comdat_group;
 	}
       while (n != node);
+      if (symtab_comdat_local_p (node))
+	{
+	  struct ipa_ref_list *refs = &node->ref_list;
+	  struct ipa_ref *ref;
+	  for (int i = 0; ipa_ref_list_referring_iterate (refs, i, ref); ++i)
+	    {
+	      if (!symtab_in_same_comdat_p (ref->referring, node))
+		{
+		  error ("comdat-local symbol referred to by %s outside its "
+			 "comdat",
+			 identifier_to_locale (ref->referring->name()));
+		  error_found = true;
+		}
+	    }
+	}
     }
   return error_found;
 }
@@ -911,6 +926,10 @@  symtab_make_decl_local (tree decl)
 {
   rtx rtl, symbol;
 
+  /* Avoid clearing DECL_COMDAT_GROUP on comdat-local decls.  */
+  if (TREE_PUBLIC (decl) == 0)
+    return;
+
   if (TREE_CODE (decl) == VAR_DECL)
     DECL_COMMON (decl) = 0;
   else gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
diff --git a/gcc/testsuite/g++.dg/ext/label13a.C b/gcc/testsuite/g++.dg/ext/label13a.C
new file mode 100644
index 0000000..3be8e31
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/label13a.C
@@ -0,0 +1,25 @@ 
+// PR c++/41090
+// { dg-do run }
+// { dg-options "-save-temps" }
+// { dg-final { scan-assembler "_ZN1CC4Ev" } }
+// { dg-final cleanup-saved-temps }
+
+int i;
+struct A { A() {} };
+struct C: virtual A
+{
+  C();
+};
+
+C::C()
+{
+  static void *labelref = &&label;
+  goto *labelref;
+ label: i = 1;
+}
+
+int main()
+{
+  C c;
+  return (i != 1);
+}
diff --git a/include/demangle.h b/include/demangle.h
index 58bf547..bbad71b 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -173,6 +173,10 @@  enum gnu_v3_ctor_kinds {
   gnu_v3_complete_object_ctor = 1,
   gnu_v3_base_object_ctor,
   gnu_v3_complete_object_allocating_ctor,
+  /* These are not part of the V3 ABI.  Unified constructors are generated
+     as a speed-for-space optimization when the -fdeclone-ctor-dtor option
+     is used, and are always internal symbols.  */
+  gnu_v3_unified_ctor,
   gnu_v3_object_ctor_group
 };
 
@@ -188,6 +192,10 @@  enum gnu_v3_dtor_kinds {
   gnu_v3_deleting_dtor = 1,
   gnu_v3_complete_object_dtor,
   gnu_v3_base_object_dtor,
+  /* These are not part of the V3 ABI.  Unified destructors are generated
+     as a speed-for-space optimization when the -fdeclone-ctor-dtor option
+     is used, and are always internal symbols.  */
+  gnu_v3_unified_dtor,
   gnu_v3_object_dtor_group
 };
 
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 029151e..7f51268 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -2107,6 +2107,9 @@  d_ctor_dtor_name (struct d_info *di)
 	  case '3':
 	    kind = gnu_v3_complete_object_allocating_ctor;
 	    break;
+          case '4':
+	    kind = gnu_v3_unified_ctor;
+	    break;
 	  case '5':
 	    kind = gnu_v3_object_ctor_group;
 	    break;
@@ -2132,6 +2135,10 @@  d_ctor_dtor_name (struct d_info *di)
 	  case '2':
 	    kind = gnu_v3_base_object_dtor;
 	    break;
+          /*  digit '3' is not used */
+	  case '4':
+	    kind = gnu_v3_unified_dtor;
+	    break;
 	  case '5':
 	    kind = gnu_v3_object_dtor_group;
 	    break;