diff mbox

RFC: C++ PATCH to support dynamic initialization and destruction of C++11 and OpenMP TLS variables

Message ID 506DC99E.8010001@redhat.com
State New
Headers show

Commit Message

Jason Merrill Oct. 4, 2012, 5:38 p.m. UTC
Both C++11 and OpenMP specify that thread_local/threadprivate variables 
can have dynamic initialization and destruction semantics.  This 
sequence of patches implements that.

The first patch adds the C++11 thread_local keyword and implements the 
C++11 parsing rules for thread_local, which differ somewhat from 
__thread.  It also allows dynamic initialization of function-local 
thread_local variables.

The second patch adds a __cxa_thread_atexit interface for registering 
cleanups to be run when a thread exits.  The current implementation is 
not fully conformant; on exit, TLS destructors are supposed to run 
before any destructors for non-TLS variables, but I think that will 
require glibc support for __cxa_thread_atexit.

The third patch implements dynamic initialization of non-function-local 
TLS variables using the init-on-first-use idiom: uses of such a variable 
are replaced with calls to a wrapper function, so that

int f();
thread_local int i = f();

implies

void i_init() {
   static bool initialized;
   if (!initialized)
     {
       initialized = true;
       i = f();
     }
}
inline int& i_wrapper() {
   i_init();
   return i;
}

Note that if we see

extern thread_local int i;

in a header, we don't know whether it has a dynamic initializer in its 
defining translation unit, so we need to make conservative assumptions. 
  For a type that doesn't always get dynamic initialization, we make 
i_init a weakref and only call it if it exists.  In the same translation 
unit as the definition, we optimize appropriately.

The wrapper function is the function I'm talking about in
   http://gcc.gnu.org/ml/gcc/2012-10/msg00024.html

Any comments before I check this in?

Jason

Comments

Richard Biener Oct. 5, 2012, 8:29 a.m. UTC | #1
On Thu, Oct 4, 2012 at 7:38 PM, Jason Merrill <jason@redhat.com> wrote:
> Both C++11 and OpenMP specify that thread_local/threadprivate variables can
> have dynamic initialization and destruction semantics.  This sequence of
> patches implements that.
>
> The first patch adds the C++11 thread_local keyword and implements the C++11
> parsing rules for thread_local, which differ somewhat from __thread.  It
> also allows dynamic initialization of function-local thread_local variables.
>
> The second patch adds a __cxa_thread_atexit interface for registering
> cleanups to be run when a thread exits.  The current implementation is not
> fully conformant; on exit, TLS destructors are supposed to run before any
> destructors for non-TLS variables, but I think that will require glibc
> support for __cxa_thread_atexit.
>
> The third patch implements dynamic initialization of non-function-local TLS
> variables using the init-on-first-use idiom: uses of such a variable are
> replaced with calls to a wrapper function, so that
>
> int f();
> thread_local int i = f();
>
> implies
>
> void i_init() {
>   static bool initialized;
>   if (!initialized)
>     {
>       initialized = true;
>       i = f();
>     }
> }
> inline int& i_wrapper() {
>   i_init();
>   return i;
> }
>
> Note that if we see
>
> extern thread_local int i;
>
> in a header, we don't know whether it has a dynamic initializer in its
> defining translation unit, so we need to make conservative assumptions.  For
> a type that doesn't always get dynamic initialization, we make i_init a
> weakref and only call it if it exists.  In the same translation unit as the
> definition, we optimize appropriately.
>
> The wrapper function is the function I'm talking about in
>   http://gcc.gnu.org/ml/gcc/2012-10/msg00024.html
>
> Any comments before I check this in?

I wonder if an implementation is conforming that performs non-local TLS
variable inits at thread creation time instead (probably also would require
glibc support)?

Or if we have the extra indirection via a reference anyway, we could
have a pointer TLS variable (NULL initialized) that on the first access
will trap where in a trap handler we could then perform initialization
and setup of that pointer.

Should we document our choice of implementation somewhere in the
manual?  And thus implicitely provide the hint that using non-local TLS
variables with dynamic initialization comes with an abstraction penalty
that we might not be easily able to remove?

Thanks,
Richard.

> Jason
Jakub Jelinek Oct. 5, 2012, 8:31 a.m. UTC | #2
On Thu, Oct 04, 2012 at 01:38:38PM -0400, Jason Merrill wrote:
> commit 18c01be0ec8b7a3cda6a16e86356e8e434c12f89
> Author: Jason Merrill <jason@redhat.com>
> Date:   Thu Sep 20 16:00:08 2012 -0400
> 
>     	Support C++11 thread_local destructors.
>     libstdc++-v3/
>     	* libsupc++/cxxabi.h: Declare __cxa_thread_atexit.
>     	* libsupc++/atexit_thread.cc: New.
>     	* libsupc++/Makefile.am (nested_exception.lo): Add it.
>     	* config/abi/pre/gnu.ver: Add __cxa_thread_atexit.

If we want to add it to glibc, then it shouldn't be exported
from libstdc++ (or perhaps only as a fallback implementation;
the question is what to do if non-Linux targets add __cxa_thread_atexit
to their C libraries).  If this is a fallback implementation, then it
needs to have some destructor that will call pthread_key_delete
(well, its __gthread_* wrapper), so that it libstdc++.so is dlclosed,
things don't crash (well, they likely will anyway if the C++ shared
library using thread_local destructors is dlclosed with multiple running
threads, but in the libstdc++ case it is at least avoidable).

	Jakub
Jakub Jelinek Oct. 5, 2012, 8:41 a.m. UTC | #3
On Fri, Oct 05, 2012 at 10:29:54AM +0200, Richard Guenther wrote:
> I wonder if an implementation is conforming that performs non-local TLS
> variable inits at thread creation time instead (probably also would require
> glibc support)?

I think it is conforming, but not really doable, because of dlopen.
If you have multiple threads running already and dlopen a library that has
thread_local with ctors, you can construct them in the current thread, but
can't construct them in other threads (even just sending some magic syscall,
interrupting those threads, isn't going to work, because then you'd be in
async-signal context where the number of things you can portably do is
limited, while the constructors can expect to be able to execute arbitrary
code).  This is similar to the reason why we want to make libraries
which have thread_local with dtors non-dlcloseable.  We can destruct in the
current thread at dlclose time, but can't destruct in other threads (and the
C++ standard isn't aware of dlclose, therefore it can't easily say anything
allowing the dtors not to be run in that case).  thread_local dtors for
other threads (from what Jason said, haven't checked the standard) don't
need to run for other threads at exit time (which is a situation very
similar to dlclose).

> Should we document our choice of implementation somewhere in the
> manual?  And thus implicitely provide the hint that using non-local TLS
> variables with dynamic initialization comes with an abstraction penalty
> that we might not be easily able to remove?

Unfortunately, that penalty is not only for thread_local vars with
ctors/dtors.  There is some penalty even for using
extern thread_local int i;
int foo (void)
{
  return i;
}
(as compared to extern __thread int i;), because we have to at least check
whether the weak TLS initializer for i is NULL or not.  Other TU could have
thread_local int i = dynamic_initialization ();

	Jakub
Jason Merrill Oct. 5, 2012, 5:28 p.m. UTC | #4
On 10/05/2012 04:41 AM, Jakub Jelinek wrote:
> Unfortunately, that penalty is not only for thread_local vars with
> ctors/dtors.  There is some penalty even for using
> extern thread_local int i;
> int foo (void)
> {
>    return i;
> }
> (as compared to extern __thread int i;), because we have to at least check
> whether the weak TLS initializer for i is NULL or not.  Other TU could have
> thread_local int i = dynamic_initialization ();

Right.  This is why my patch continues to treat thread_local and 
__thread differently; people that don't want this overhead for exported 
TLS variables with static initialization can continue to use __thread.

Jason
Jason Merrill Oct. 5, 2012, 5:38 p.m. UTC | #5
On 10/05/2012 04:29 AM, Richard Guenther wrote:
> Or if we have the extra indirection via a reference anyway, we could
> have a pointer TLS variable (NULL initialized) that on the first access
> will trap where in a trap handler we could then perform initialization
> and setup of that pointer.

Interesting idea.  But I don't think there's any way to determine from a 
SEGV handler which null pointer needs to be initialized, and in any case 
users might want to have their own SEGV handlers.

Jason
Jason Merrill Oct. 15, 2012, 5:43 p.m. UTC | #6
On 10/05/2012 10:38 AM, Jason Merrill wrote:
> On 10/05/2012 04:29 AM, Richard Guenther wrote:
>> Or if we have the extra indirection via a reference anyway, we could
>> have a pointer TLS variable (NULL initialized) that on the first access
>> will trap where in a trap handler we could then perform initialization
>> and setup of that pointer.
>
> Interesting idea.  But I don't think there's any way to determine from a
> SEGV handler which null pointer needs to be initialized, and in any case
> users might want to have their own SEGV handlers.

But then, with support from the kernel and dynamic loader we aren't 
limited to normal signal handlers; it ought to be possible to mark a 
page of thread_local variables as trapping so that the first reference 
invokes a designated initialization function.  I think that such a 
scheme ought to be link-compatible with this one so long as the 
initialization function is the same; references would just be direct 
rather than through the wrapper function.

It sounds like recent versions of MacOS X support something like this, 
though clang doesn't take advantage of it yet.

Jason
diff mbox

Patch

commit 972a5ac7c33d93b27d86d010af7c154da1281e6b
Author: Jason Merrill <jason@redhat.com>
Date:   Wed Sep 19 11:23:24 2012 -0400

    	Partial implementation of C++11 thread_local.
    c-family/
    	* c-common.c (c_common_reswords): Add thread_local.
    cp/
    	* decl.c (cp_finish_decl): Remove errors about non-trivial
    	initialization and destruction of TLS variables.
    	(register_dtor_fn): Add sorry about TLS variables.
    	(expand_static_init): Add sorry about non-local TLS variables,
    	or error with __thread.
    	Don't emit thread-safety guards for local TLS variables.
    	(grokdeclarator): thread_local in a function implies static.
    	* decl.h: Adjust prototype.
    	* decl2.c (get_guard): Copy DECL_TLS_MODEL.
    	* parser.c (cp_parser_set_storage_class, cp_parser_set_decl_spec_type)
    	(set_and_check_decl_spec_loc): Take the token rather than the location.
    	Distinguish between __thread and thread_local.
    	(cp_parser_set_storage_class): Don't complain about thread_local before
    	extern/static.
    	(token_is__thread): New.
    	* call.c (make_temporary_var_for_ref_to_temp): Handle TLS.
    	* cp-tree.h (DECL_GNU_TLS_P): New.
    	(cp_decl_specifier_seq): Add gnu_thread_keyword_p.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 6de2f1c..fcc9132 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -542,6 +542,7 @@  const struct c_common_resword c_common_reswords[] =
   { "switch",		RID_SWITCH,	0 },
   { "template",		RID_TEMPLATE,	D_CXXONLY | D_CXXWARN },
   { "this",		RID_THIS,	D_CXXONLY | D_CXXWARN },
+  { "thread_local",	RID_THREAD,	D_CXXONLY | D_CXX0X | D_CXXWARN },
   { "throw",		RID_THROW,	D_CXX_OBJC | D_CXXWARN },
   { "true",		RID_TRUE,	D_CXXONLY | D_CXXWARN },
   { "try",		RID_TRY,	D_CXX_OBJC | D_CXXWARN },
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index d0492d8..5cc9bad 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8718,9 +8718,9 @@  perform_direct_initialization_if_possible (tree type,
 
   The next several functions are involved in this lifetime extension.  */
 
-/* DECL is a VAR_DECL whose type is a REFERENCE_TYPE.  The reference
-   is being bound to a temporary.  Create and return a new VAR_DECL
-   with the indicated TYPE; this variable will store the value to
+/* DECL is a VAR_DECL or FIELD_DECL whose type is a REFERENCE_TYPE.  The
+   reference is being bound to a temporary.  Create and return a new
+   VAR_DECL with the indicated TYPE; this variable will store the value to
    which the reference is bound.  */
 
 tree
@@ -8732,13 +8732,15 @@  make_temporary_var_for_ref_to_temp (tree decl, tree type)
   var = create_temporary_var (type);
 
   /* Register the variable.  */
-  if (TREE_STATIC (decl))
+  if (TREE_CODE (decl) == VAR_DECL
+      && (TREE_STATIC (decl) || DECL_THREAD_LOCAL_P (decl)))
     {
       /* Namespace-scope or local static; give it a mangled name.  */
       /* FIXME share comdat with decl?  */
       tree name;
 
-      TREE_STATIC (var) = 1;
+      TREE_STATIC (var) = TREE_STATIC (decl);
+      DECL_TLS_MODEL (var) = DECL_TLS_MODEL (decl);
       name = mangle_ref_init_variable (decl);
       DECL_NAME (var) = name;
       SET_DECL_ASSEMBLER_NAME (var, name);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index bfe7ad7..12ad4ed 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -56,6 +56,7 @@  c-common.h, not after.
       AGGR_INIT_VIA_CTOR_P (in AGGR_INIT_EXPR)
       PTRMEM_OK_P (in ADDR_EXPR, OFFSET_REF, SCOPE_REF)
       PAREN_STRING_LITERAL (in STRING_CST)
+      DECL_GNU_TLS_P (in VAR_DECL)
       KOENIG_LOOKUP_P (in CALL_EXPR)
       STATEMENT_LIST_NO_SCOPE (in STATEMENT_LIST).
       EXPR_STMT_STMT_EXPR_RESULT (in EXPR_STMT)
@@ -2422,6 +2423,11 @@  struct GTY((variable_size)) lang_decl {
   (DECL_NAME (NODE) \
    && !strcmp (IDENTIFIER_POINTER (DECL_NAME (NODE)), "__PRETTY_FUNCTION__"))
 
+/* Nonzero if the thread-local variable was declared with __thread
+   as opposed to thread_local.  */
+#define DECL_GNU_TLS_P(NODE) \
+  (TREE_LANG_FLAG_0 (VAR_DECL_CHECK (NODE)))
+
 /* The _TYPE context in which this _DECL appears.  This field holds the
    class where a virtual function instance is actually defined.  */
 #define DECL_CLASS_CONTEXT(NODE) \
@@ -4722,6 +4728,8 @@  typedef struct cp_decl_specifier_seq {
   BOOL_BITFIELD explicit_int128_p : 1;
   /* True iff "char" was explicitly provided.  */
   BOOL_BITFIELD explicit_char_p : 1;
+  /* True iff ds_thread is set for __thread, not thread_local.  */
+  BOOL_BITFIELD gnu_thread_keyword_p : 1;
 } cp_decl_specifier_seq;
 
 /* The various kinds of declarators.  */
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index d0933ef..980aec2 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6194,13 +6194,6 @@  cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
 
   if (TREE_CODE (decl) == VAR_DECL)
     {
-      /* Only variables with trivial initialization and destruction can
-	 have thread-local storage.  */
-      if (DECL_THREAD_LOCAL_P (decl)
-	  && (type_has_nontrivial_default_init (TREE_TYPE (decl))
-	      || TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (decl))))
-	error ("%qD cannot be thread-local because it has non-trivial "
-	       "type %qT", decl, TREE_TYPE (decl));
       /* If this is a local variable that will need a mangled name,
 	 register it now.  We must do this before processing the
 	 initializer for the variable, since the initialization might
@@ -6246,13 +6239,6 @@  cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
 	    }
 	  cleanups = make_tree_vector ();
 	  init = check_initializer (decl, init, flags, &cleanups);
-	  /* Thread-local storage cannot be dynamically initialized.  */
-	  if (DECL_THREAD_LOCAL_P (decl) && init)
-	    {
-	      error ("%qD is thread-local and so cannot be dynamically "
-		     "initialized", decl);
-	      init = NULL_TREE;
-	    }
 
 	  /* Check that the initializer for a static data member was a
 	     constant.  Although we check in the parser that the
@@ -6701,6 +6687,12 @@  register_dtor_fn (tree decl)
       end_cleanup_fn ();
     }
 
+  if (DECL_THREAD_LOCAL_P (decl))
+    /* We don't have a thread-local atexit yet.  FIXME write one using
+       pthread_key_create and friends.  */
+    sorry ("thread-local variable %q#D with non-trivial "
+	   "destructor", decl);
+
   /* Call atexit with the cleanup function.  */
   mark_used (cleanup);
   cleanup = build_address (cleanup);
@@ -6764,6 +6756,36 @@  expand_static_init (tree decl, tree init)
       && TYPE_HAS_TRIVIAL_DESTRUCTOR (TREE_TYPE (decl)))
     return;
 
+  if (DECL_THREAD_LOCAL_P (decl) && DECL_GNU_TLS_P (decl)
+      && !DECL_FUNCTION_SCOPE_P (decl))
+    {
+      if (init)
+	error ("non-local variable %qD declared %<__thread%> "
+	       "needs dynamic initialization", decl);
+      else
+	error ("non-local variable %qD declared %<__thread%> "
+	       "has a non-trivial destructor", decl);
+      static bool informed;
+      if (!informed)
+	{
+	  inform (DECL_SOURCE_LOCATION (decl),
+		  "C++11 %<thread_local%> allows dynamic initialization "
+		  "and destruction");
+	  informed = true;
+	}
+      return;
+    }
+
+  if (DECL_THREAD_LOCAL_P (decl) && !DECL_FUNCTION_SCOPE_P (decl))
+    {
+      /* We haven't implemented dynamic initialization of non-local
+	 thread-local storage yet.  FIXME transform to singleton
+	 function.  */
+      sorry ("thread-local variable %qD with dynamic initialization outside "
+	     "function scope", decl);
+      return;
+    }
+
   if (DECL_FUNCTION_SCOPE_P (decl))
     {
       /* Emit code to perform this initialization but once.  */
@@ -6771,6 +6793,9 @@  expand_static_init (tree decl, tree init)
       tree then_clause = NULL_TREE, inner_then_clause = NULL_TREE;
       tree guard, guard_addr;
       tree flag, begin;
+      /* We don't need thread-safety code for thread-local vars.  */
+      bool thread_guard = (flag_threadsafe_statics
+			   && !DECL_THREAD_LOCAL_P (decl));
 
       /* Emit code to perform this initialization but once.  This code
 	 looks like:
@@ -6809,7 +6834,7 @@  expand_static_init (tree decl, tree init)
       /* This optimization isn't safe on targets with relaxed memory
 	 consistency.  On such targets we force synchronization in
 	 __cxa_guard_acquire.  */
-      if (!targetm.relaxed_ordering || !flag_threadsafe_statics)
+      if (!targetm.relaxed_ordering || !thread_guard)
 	{
 	  /* Begin the conditional initialization.  */
 	  if_stmt = begin_if_stmt ();
@@ -6817,7 +6842,7 @@  expand_static_init (tree decl, tree init)
 	  then_clause = begin_compound_stmt (BCS_NO_SCOPE);
 	}
 
-      if (flag_threadsafe_statics)
+      if (thread_guard)
 	{
 	  tree vfntype = NULL_TREE;
 	  tree acquire_name, release_name, abort_name;
@@ -6875,14 +6900,14 @@  expand_static_init (tree decl, tree init)
 
       finish_expr_stmt (init);
 
-      if (flag_threadsafe_statics)
+      if (thread_guard)
 	{
 	  finish_compound_stmt (inner_then_clause);
 	  finish_then_clause (inner_if_stmt);
 	  finish_if_stmt (inner_if_stmt);
 	}
 
-      if (!targetm.relaxed_ordering || !flag_threadsafe_statics)
+      if (!targetm.relaxed_ordering || !thread_guard)
 	{
 	  finish_compound_stmt (then_clause);
 	  finish_then_clause (if_stmt);
@@ -7699,7 +7724,11 @@  grokvardecl (tree type,
     }
 
   if (decl_spec_seq_has_spec_p (declspecs, ds_thread))
-    DECL_TLS_MODEL (decl) = decl_default_tls_model (decl);
+    {
+      DECL_TLS_MODEL (decl) = decl_default_tls_model (decl);
+      if (declspecs->gnu_thread_keyword_p)
+	DECL_GNU_TLS_P (decl) = true;
+    }
 
   /* If the type of the decl has no linkage, make sure that we'll
      notice that in mark_used.  */
@@ -8386,7 +8415,7 @@  check_var_type (tree identifier, tree type)
 
 tree
 grokdeclarator (const cp_declarator *declarator,
-		const cp_decl_specifier_seq *declspecs,
+		cp_decl_specifier_seq *declspecs,
 		enum decl_context decl_context,
 		int initialized,
 		tree* attrlist)
@@ -9100,9 +9129,15 @@  grokdeclarator (const cp_declarator *declarator,
 	   && storage_class != sc_extern
 	   && storage_class != sc_static)
     {
-      error ("function-scope %qs implicitly auto and declared %<__thread%>",
-	     name);
-      thread_p = false;
+      if (declspecs->gnu_thread_keyword_p)
+	pedwarn (input_location, 0, "function-scope %qs implicitly auto and "
+		 "declared %<__thread%>", name);
+
+      /* When thread_local is applied to a variable of block scope the
+	 storage-class-specifier static is implied if it does not appear
+	 explicitly.  */
+      storage_class = declspecs->storage_class = sc_static;
+      staticp = 1;
     }
 
   if (storage_class && friendp)
@@ -10337,7 +10372,14 @@  grokdeclarator (const cp_declarator *declarator,
 	else if (storage_class == sc_register)
 	  error ("storage class %<register%> invalid for function %qs", name);
 	else if (thread_p)
-	  error ("storage class %<__thread%> invalid for function %qs", name);
+	  {
+	    if (declspecs->gnu_thread_keyword_p)
+	      error ("storage class %<__thread%> invalid for function %qs",
+		     name);
+	    else
+	      error ("storage class %<thread_local%> invalid for function %qs",
+		     name);
+	  }
 
         if (virt_specifiers)
           error ("virt-specifiers in %qs not allowed outside a class definition", name);
diff --git a/gcc/cp/decl.h b/gcc/cp/decl.h
index a8a2b78..193df27 100644
--- a/gcc/cp/decl.h
+++ b/gcc/cp/decl.h
@@ -34,7 +34,7 @@  enum decl_context
 
 /* We need this in here to get the decl_context definition.  */
 extern tree grokdeclarator (const cp_declarator *,
-			    const cp_decl_specifier_seq *,
+			    cp_decl_specifier_seq *,
 			    enum decl_context, int, tree*);
 
 /* States indicating how grokdeclarator() should handle declspecs marked
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 87e38d3..a240ff4 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -2696,6 +2696,7 @@  get_guard (tree decl)
       TREE_STATIC (guard) = TREE_STATIC (decl);
       DECL_COMMON (guard) = DECL_COMMON (decl);
       DECL_COMDAT (guard) = DECL_COMDAT (decl);
+      DECL_TLS_MODEL (guard) = DECL_TLS_MODEL (decl);
       if (DECL_ONE_ONLY (decl))
 	make_decl_one_only (guard, cxx_comdat_group (guard));
       if (TREE_PUBLIC (decl))
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 155b51a..e3fb9ac 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -2215,12 +2215,12 @@  static tree cp_parser_trait_expr
 static bool cp_parser_declares_only_class_p
   (cp_parser *);
 static void cp_parser_set_storage_class
-  (cp_parser *, cp_decl_specifier_seq *, enum rid, location_t);
+  (cp_parser *, cp_decl_specifier_seq *, enum rid, cp_token *);
 static void cp_parser_set_decl_spec_type
-  (cp_decl_specifier_seq *, tree, location_t, bool);
+  (cp_decl_specifier_seq *, tree, cp_token *, bool);
 static void set_and_check_decl_spec_loc
   (cp_decl_specifier_seq *decl_specs,
-   cp_decl_spec ds, source_location location);
+   cp_decl_spec ds, cp_token *);
 static bool cp_parser_friend_p
   (const cp_decl_specifier_seq *);
 static void cp_parser_required_error
@@ -10672,7 +10672,7 @@  cp_parser_decl_specifier_seq (cp_parser* parser,
 
               /* Set the storage class anyway.  */
               cp_parser_set_storage_class (parser, decl_specs, RID_AUTO,
-					   token->location);
+					   token);
             }
           else
 	    /* C++0x auto type-specifier.  */
@@ -10686,7 +10686,7 @@  cp_parser_decl_specifier_seq (cp_parser* parser,
 	  /* Consume the token.  */
 	  cp_lexer_consume_token (parser->lexer);
           cp_parser_set_storage_class (parser, decl_specs, token->keyword,
-				       token->location);
+				       token);
 	  break;
 	case RID_THREAD:
 	  /* Consume the token.  */
@@ -10706,7 +10706,7 @@  cp_parser_decl_specifier_seq (cp_parser* parser,
 	error ("decl-specifier invalid in condition");
 
       if (ds != ds_last)
-	set_and_check_decl_spec_loc (decl_specs, ds, token->location);
+	set_and_check_decl_spec_loc (decl_specs, ds, token);
 
       /* Constructors are a special case.  The `S' in `S()' is not a
 	 decl-specifier; it is the beginning of the declarator.  */
@@ -10855,7 +10855,7 @@  cp_parser_function_specifier_opt (cp_parser* parser,
   switch (token->keyword)
     {
     case RID_INLINE:
-      set_and_check_decl_spec_loc (decl_specs, ds_inline, token->location);
+      set_and_check_decl_spec_loc (decl_specs, ds_inline, token);
       break;
 
     case RID_VIRTUAL:
@@ -10864,11 +10864,11 @@  cp_parser_function_specifier_opt (cp_parser* parser,
 	 A member function template shall not be virtual.  */
       if (PROCESSING_REAL_TEMPLATE_DECL_P ())
 	error_at (token->location, "templates may not be %<virtual%>");
-      set_and_check_decl_spec_loc (decl_specs, ds_virtual, token->location);
+      set_and_check_decl_spec_loc (decl_specs, ds_virtual, token);
       break;
 
     case RID_EXPLICIT:
-      set_and_check_decl_spec_loc (decl_specs, ds_explicit, token->location);
+      set_and_check_decl_spec_loc (decl_specs, ds_explicit, token);
       break;
 
     default:
@@ -13372,7 +13372,7 @@  cp_parser_type_specifier (cp_parser* parser,
 	  if (decl_specs)
 	    cp_parser_set_decl_spec_type (decl_specs,
 					  type_spec,
-					  token->location,
+					  token,
 					  /*type_definition_p=*/true);
 	  return type_spec;
 	}
@@ -13401,7 +13401,7 @@  cp_parser_type_specifier (cp_parser* parser,
 	  if (decl_specs)
 	    cp_parser_set_decl_spec_type (decl_specs,
 					  type_spec,
-					  token->location,
+					  token,
 					  /*type_definition_p=*/true);
 	  return type_spec;
 	}
@@ -13423,7 +13423,7 @@  cp_parser_type_specifier (cp_parser* parser,
       if (decl_specs)
 	cp_parser_set_decl_spec_type (decl_specs,
 				      type_spec,
-				      token->location,
+				      token,
 				      /*type_definition_p=*/false);
       return type_spec;
 
@@ -13459,7 +13459,7 @@  cp_parser_type_specifier (cp_parser* parser,
     {
       if (decl_specs)
 	{
-	  set_and_check_decl_spec_loc (decl_specs, ds, token->location);
+	  set_and_check_decl_spec_loc (decl_specs, ds, token);
 	  decl_specs->any_specifiers_p = true;
 	}
       return cp_lexer_consume_token (parser->lexer)->u.value;
@@ -13550,7 +13550,7 @@  cp_parser_simple_type_specifier (cp_parser* parser,
       type = boolean_type_node;
       break;
     case RID_SHORT:
-      set_and_check_decl_spec_loc (decl_specs, ds_short, token->location);
+      set_and_check_decl_spec_loc (decl_specs, ds_short, token);
       type = short_integer_type_node;
       break;
     case RID_INT:
@@ -13567,15 +13567,15 @@  cp_parser_simple_type_specifier (cp_parser* parser,
       break;
     case RID_LONG:
       if (decl_specs)
-	set_and_check_decl_spec_loc (decl_specs, ds_long, token->location);
+	set_and_check_decl_spec_loc (decl_specs, ds_long, token);
       type = long_integer_type_node;
       break;
     case RID_SIGNED:
-      set_and_check_decl_spec_loc (decl_specs, ds_signed, token->location);
+      set_and_check_decl_spec_loc (decl_specs, ds_signed, token);
       type = integer_type_node;
       break;
     case RID_UNSIGNED:
-      set_and_check_decl_spec_loc (decl_specs, ds_unsigned, token->location);
+      set_and_check_decl_spec_loc (decl_specs, ds_unsigned, token);
       type = unsigned_type_node;
       break;
     case RID_FLOAT:
@@ -13613,7 +13613,7 @@  cp_parser_simple_type_specifier (cp_parser* parser,
 
       if (decl_specs)
 	cp_parser_set_decl_spec_type (decl_specs, type,
-				      token->location,
+				      token,
 				      /*type_definition_p=*/false);
 
       return type;
@@ -13622,7 +13622,7 @@  cp_parser_simple_type_specifier (cp_parser* parser,
       type = cp_parser_trait_expr (parser, RID_UNDERLYING_TYPE);
       if (decl_specs)
 	cp_parser_set_decl_spec_type (decl_specs, type,
-				      token->location,
+				      token,
 				      /*type_definition_p=*/false);
 
       return type;
@@ -13632,7 +13632,7 @@  cp_parser_simple_type_specifier (cp_parser* parser,
       type = cp_parser_trait_expr (parser, token->keyword);
       if (decl_specs)
        cp_parser_set_decl_spec_type (decl_specs, type,
-                                     token->location,
+                                     token,
                                      /*type_definition_p=*/false);
       return type;
     default:
@@ -13647,7 +13647,7 @@  cp_parser_simple_type_specifier (cp_parser* parser,
       type = token->u.value;
       if (decl_specs)
 	cp_parser_set_decl_spec_type (decl_specs, type,
-				      token->location,
+				      token,
 				      /*type_definition_p=*/false);
       cp_lexer_consume_token (parser->lexer);
       return type;
@@ -13664,7 +13664,7 @@  cp_parser_simple_type_specifier (cp_parser* parser,
 	      && token->keyword != RID_LONG))
 	cp_parser_set_decl_spec_type (decl_specs,
 				      type,
-				      token->location,
+				      token,
 				      /*type_definition_p=*/false);
       if (decl_specs)
 	decl_specs->any_specifiers_p = true;
@@ -13741,7 +13741,7 @@  cp_parser_simple_type_specifier (cp_parser* parser,
 	type = NULL_TREE;
       if (type && decl_specs)
 	cp_parser_set_decl_spec_type (decl_specs, type,
-				      token->location,
+				      token,
 				      /*type_definition_p=*/false);
     }
 
@@ -15092,21 +15092,24 @@  static tree
 cp_parser_alias_declaration (cp_parser* parser)
 {
   tree id, type, decl, pushed_scope = NULL_TREE, attributes;
-  location_t id_location, using_location, attrs_location = 0;
+  location_t id_location;
   cp_declarator *declarator;
   cp_decl_specifier_seq decl_specs;
   bool member_p;
   const char *saved_message = NULL;
 
   /* Look for the `using' keyword.  */
-  using_location = cp_lexer_peek_token (parser->lexer)->location;
-  cp_parser_require_keyword (parser, RID_USING, RT_USING);
+  cp_token *using_token
+    = cp_parser_require_keyword (parser, RID_USING, RT_USING);
+  if (using_token == NULL)
+    return error_mark_node;
+
   id_location = cp_lexer_peek_token (parser->lexer)->location;
   id = cp_parser_identifier (parser);
   if (id == error_mark_node)
     return error_mark_node;
 
-  attrs_location = cp_lexer_peek_token (parser->lexer)->location;
+  cp_token *attrs_token = cp_lexer_peek_token (parser->lexer);
   attributes = cp_parser_attributes_opt (parser);
   if (attributes == error_mark_node)
     return error_mark_node;
@@ -15163,14 +15166,14 @@  cp_parser_alias_declaration (cp_parser* parser)
       decl_specs.attributes = attributes;
       set_and_check_decl_spec_loc (&decl_specs,
 				   ds_attribute,
-				   attrs_location);
+				   attrs_token);
     }
   set_and_check_decl_spec_loc (&decl_specs,
 			       ds_typedef,
-			       using_location);
+			       using_token);
   set_and_check_decl_spec_loc (&decl_specs,
 			       ds_alias,
-			       using_location);
+			       using_token);
 
   declarator = make_id_declarator (NULL_TREE, id, sfk_none);
   declarator->id_loc = id_location;
@@ -22054,13 +22057,13 @@  static void
 cp_parser_set_storage_class (cp_parser *parser,
 			     cp_decl_specifier_seq *decl_specs,
 			     enum rid keyword,
-			     location_t location)
+			     cp_token *token)
 {
   cp_storage_class storage_class;
 
   if (parser->in_unbraced_linkage_specification_p)
     {
-      error_at (location, "invalid use of %qD in linkage specification",
+      error_at (token->location, "invalid use of %qD in linkage specification",
 		ridpointers[keyword]);
       return;
     }
@@ -22071,11 +22074,11 @@  cp_parser_set_storage_class (cp_parser *parser,
     }
 
   if ((keyword == RID_EXTERN || keyword == RID_STATIC)
-      && decl_spec_seq_has_spec_p (decl_specs, ds_thread))
+      && decl_spec_seq_has_spec_p (decl_specs, ds_thread)
+      && decl_specs->gnu_thread_keyword_p)
     {
-      error_at (decl_specs->locations[ds_thread],
+      pedwarn (decl_specs->locations[ds_thread], 0,
 		"%<__thread%> before %qD", ridpointers[keyword]);
-      decl_specs->locations[ds_thread] = 0;
     }
 
   switch (keyword)
@@ -22099,7 +22102,7 @@  cp_parser_set_storage_class (cp_parser *parser,
       gcc_unreachable ();
     }
   decl_specs->storage_class = storage_class;
-  set_and_check_decl_spec_loc (decl_specs, ds_storage_class, location);
+  set_and_check_decl_spec_loc (decl_specs, ds_storage_class, token);
 
   /* A storage class specifier cannot be applied alongside a typedef 
      specifier. If there is a typedef specifier present then set 
@@ -22115,7 +22118,7 @@  cp_parser_set_storage_class (cp_parser *parser,
 static void
 cp_parser_set_decl_spec_type (cp_decl_specifier_seq *decl_specs,
 			      tree type_spec,
-			      location_t location,
+			      cp_token *token,
 			      bool type_definition_p)
 {
   decl_specs->any_specifiers_p = true;
@@ -22140,12 +22143,12 @@  cp_parser_set_decl_spec_type (cp_decl_specifier_seq *decl_specs,
       decl_specs->redefined_builtin_type = type_spec;
       set_and_check_decl_spec_loc (decl_specs,
 				   ds_redefined_builtin_type_spec,
-				   location);
+				   token);
       if (!decl_specs->type)
 	{
 	  decl_specs->type = type_spec;
 	  decl_specs->type_definition_p = false;
-	  set_and_check_decl_spec_loc (decl_specs,ds_type_spec, location);
+	  set_and_check_decl_spec_loc (decl_specs,ds_type_spec, token);
 	}
     }
   else if (decl_specs->type)
@@ -22155,10 +22158,19 @@  cp_parser_set_decl_spec_type (cp_decl_specifier_seq *decl_specs,
       decl_specs->type = type_spec;
       decl_specs->type_definition_p = type_definition_p;
       decl_specs->redefined_builtin_type = NULL_TREE;
-      set_and_check_decl_spec_loc (decl_specs, ds_type_spec, location);
+      set_and_check_decl_spec_loc (decl_specs, ds_type_spec, token);
     }
 }
 
+/* True iff TOKEN is the GNU keyword __thread.  */
+
+static bool
+token_is__thread (cp_token *token)
+{
+  gcc_assert (token->keyword == RID_THREAD);
+  return !strcmp (IDENTIFIER_POINTER (token->u.value), "__thread");
+}
+
 /* Set the location for a declarator specifier and check if it is
    duplicated.
 
@@ -22173,15 +22185,21 @@  cp_parser_set_decl_spec_type (cp_decl_specifier_seq *decl_specs,
 
 static void
 set_and_check_decl_spec_loc (cp_decl_specifier_seq *decl_specs,
-			     cp_decl_spec ds, source_location location)
+			     cp_decl_spec ds, cp_token *token)
 {
   gcc_assert (ds < ds_last);
 
   if (decl_specs == NULL)
     return;
 
+  source_location location = token->location;
+
   if (decl_specs->locations[ds] == 0)
-    decl_specs->locations[ds] = location;
+    {
+      decl_specs->locations[ds] = location;
+      if (ds == ds_thread)
+	decl_specs->gnu_thread_keyword_p = token_is__thread (token);
+    }
   else
     {
       if (ds == ds_long)
@@ -22197,6 +22215,15 @@  set_and_check_decl_spec_loc (cp_decl_specifier_seq *decl_specs,
 			     "ISO C++ 1998 does not support %<long long%>");
 	    }
 	}
+      else if (ds == ds_thread)
+	{
+	  bool gnu = token_is__thread (token);
+	  if (gnu != decl_specs->gnu_thread_keyword_p)
+	    error_at (location,
+		      "both %<__thread%> and %<thread_local%> specified");
+	  else
+	    error_at (location, "duplicate %qD", token->u.value);
+	}
       else
 	{
 	  static const char *const decl_spec_names[] = {
@@ -22214,8 +22241,7 @@  set_and_check_decl_spec_loc (cp_decl_specifier_seq *decl_specs,
 	    "typedef",
 	    "using",
             "constexpr",
-	    "__complex",
-	    "__thread"
+	    "__complex"
 	  };
 	  error_at (location,
 		    "duplicate %qs", decl_spec_names[ds]);
@@ -24056,7 +24082,7 @@  cp_parser_objc_class_ivars (cp_parser* parser)
 	  declspecs.storage_class = sc_none;
 	}
 
-      /* __thread.  */
+      /* thread_local.  */
       if (decl_spec_seq_has_spec_p (&declspecs, ds_thread))
 	{
 	  cp_parser_error (parser, "invalid type for instance variable");
@@ -24635,7 +24661,7 @@  cp_parser_objc_struct_declaration (cp_parser *parser)
       declspecs.storage_class = sc_none;
     }
   
-  /* __thread.  */
+  /* thread_local.  */
   if (decl_spec_seq_has_spec_p (&declspecs, ds_thread))
     {
       cp_parser_error (parser, "invalid type for property");
diff --git a/gcc/testsuite/g++.dg/tls/init-2.C b/gcc/testsuite/g++.dg/tls/init-2.C
index c9f646d..327c309 100644
--- a/gcc/testsuite/g++.dg/tls/init-2.C
+++ b/gcc/testsuite/g++.dg/tls/init-2.C
@@ -2,13 +2,13 @@ 
 /* { dg-require-effective-target tls } */
 
 extern __thread int i;
-__thread int *p = &i;	/* { dg-error "dynamically initialized" } */
+__thread int *p = &i;	/* { dg-error "dynamic initialization" } */
 
 extern int f();
-__thread int j = f();	/* { dg-error "dynamically initialized" } */
+__thread int j = f();	/* { dg-error "dynamic initialization" } */
 
 struct S
 {
   S();
 };
-__thread S s;		/* { dg-error "" } two errors here */
+__thread S s;		/* { dg-error "dynamic initialization" } */
diff --git a/gcc/testsuite/g++.dg/tls/thread_local1.C b/gcc/testsuite/g++.dg/tls/thread_local1.C
new file mode 100644
index 0000000..e7734a0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local1.C
@@ -0,0 +1,21 @@ 
+// { dg-options "-std=c++11" }
+// { dg-require-effective-target tls }
+
+// The variable should have a guard.
+// { dg-final { scan-assembler "_ZGVZ1fvE1a" } }
+// But since it's thread local we don't need to guard against
+// simultaneous execution.
+// { dg-final { scan-assembler-not "cxa_guard" } }
+// The guard should be TLS, not local common.
+// { dg-final { scan-assembler-not "\.comm" } }
+
+struct A
+{
+  A();
+};
+
+A &f()
+{
+  thread_local A a;
+  return a;
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local2.C b/gcc/testsuite/g++.dg/tls/thread_local2.C
new file mode 100644
index 0000000..4cbef15
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local2.C
@@ -0,0 +1,27 @@ 
+// { dg-do run }
+// { dg-options "-std=c++11" }
+// { dg-require-effective-target tls_runtime }
+
+extern "C" void abort();
+
+struct A
+{
+  A();
+  int i;
+};
+
+A &f()
+{
+  thread_local A a;
+  return a;
+}
+
+int j;
+A::A(): i(j) { }
+
+int main()
+{
+  j = 42;
+  if (f().i != 42)
+    abort ();
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local7.C b/gcc/testsuite/g++.dg/tls/thread_local7.C
new file mode 100644
index 0000000..77a1c05
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local7.C
@@ -0,0 +1,10 @@ 
+// { dg-options "-std=c++11" }
+// { dg-require-effective-target tls }
+
+// The reference temp should be TLS, not normal data.
+// { dg-final { scan-assembler-not "\\.data" } }
+
+void f()
+{
+  thread_local int&& ir = 42;
+}

commit 18c01be0ec8b7a3cda6a16e86356e8e434c12f89
Author: Jason Merrill <jason@redhat.com>
Date:   Thu Sep 20 16:00:08 2012 -0400

    	Support C++11 thread_local destructors.
    gcc/cp/
    	* decl.c (get_thread_atexit_node): New.
    	(register_dtor_fn): Use it for TLS.
    libstdc++-v3/
    	* libsupc++/cxxabi.h: Declare __cxa_thread_atexit.
    	* libsupc++/atexit_thread.cc: New.
    	* libsupc++/Makefile.am (nested_exception.lo): Add it.
    	* config/abi/pre/gnu.ver: Add __cxa_thread_atexit.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 980aec2..0d04dc0 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6542,6 +6542,24 @@  get_atexit_node (void)
   return atexit_node;
 }
 
+/* Like get_atexit_node, but for thread-local cleanups.  */
+
+static tree
+get_thread_atexit_node (void)
+{
+  /* The declaration for `__cxa_thread_atexit' is:
+
+     int __cxa_atexit (void (*)(void *), void *)  */
+  tree fn_type = build_function_type_list (integer_type_node,
+					   get_atexit_fn_ptr_type (),
+					   ptr_type_node, ptr_type_node,
+					   NULL_TREE);
+
+  /* Now, build the function declaration.  */
+  tree atexit_fndecl = build_library_fn_ptr ("__cxa_thread_atexit", fn_type);
+  return decay_conversion (atexit_fndecl, tf_warning_or_error);
+}
+
 /* Returns the __dso_handle VAR_DECL.  */
 
 static tree
@@ -6633,23 +6651,27 @@  tree
 register_dtor_fn (tree decl)
 {
   tree cleanup;
+  tree addr;
   tree compound_stmt;
   tree fcall;
   tree type;
-  bool use_dtor;
-  tree arg0, arg1 = NULL_TREE, arg2 = NULL_TREE;
+  bool ob_parm, dso_parm, use_dtor;
+  tree arg0, arg1, arg2;
+  tree atex_node;
 
   type = TREE_TYPE (decl);
   if (TYPE_HAS_TRIVIAL_DESTRUCTOR (type))
     return void_zero_node;
 
-  /* If we're using "__cxa_atexit" (or "__aeabi_atexit"), and DECL is
-     a class object, we can just pass the destructor to
-     "__cxa_atexit"; we don't have to build a temporary function to do
-     the cleanup.  */
-  use_dtor = (flag_use_cxa_atexit 
-	      && !targetm.cxx.use_atexit_for_cxa_atexit ()
-	      && CLASS_TYPE_P (type));
+  /* If we're using "__cxa_atexit" (or "__cxa_thread_atexit" or
+     "__aeabi_atexit"), and DECL is a class object, we can just pass the
+     destructor to "__cxa_atexit"; we don't have to build a temporary
+     function to do the cleanup.  */
+  ob_parm = (DECL_THREAD_LOCAL_P (decl)
+	     || (flag_use_cxa_atexit
+		 && !targetm.cxx.use_atexit_for_cxa_atexit ()));
+  dso_parm = ob_parm;
+  use_dtor = ob_parm && CLASS_TYPE_P (type);
   if (use_dtor)
     {
       int idx;
@@ -6687,44 +6709,48 @@  register_dtor_fn (tree decl)
       end_cleanup_fn ();
     }
 
-  if (DECL_THREAD_LOCAL_P (decl))
-    /* We don't have a thread-local atexit yet.  FIXME write one using
-       pthread_key_create and friends.  */
-    sorry ("thread-local variable %q#D with non-trivial "
-	   "destructor", decl);
-
   /* Call atexit with the cleanup function.  */
   mark_used (cleanup);
   cleanup = build_address (cleanup);
-  if (flag_use_cxa_atexit && !targetm.cxx.use_atexit_for_cxa_atexit ())
+
+  if (DECL_THREAD_LOCAL_P (decl))
+    atex_node = get_thread_atexit_node ();
+  else
+    atex_node = get_atexit_node ();
+
+  if (use_dtor)
     {
-      tree addr;
+      /* We must convert CLEANUP to the type that "__cxa_atexit"
+	 expects.  */
+      cleanup = build_nop (get_atexit_fn_ptr_type (), cleanup);
+      /* "__cxa_atexit" will pass the address of DECL to the
+	 cleanup function.  */
+      mark_used (decl);
+      addr = build_address (decl);
+      /* The declared type of the parameter to "__cxa_atexit" is
+	 "void *".  For plain "T*", we could just let the
+	 machinery in cp_build_function_call convert it -- but if the
+	 type is "cv-qualified T *", then we need to convert it
+	 before passing it in, to avoid spurious errors.  */
+      addr = build_nop (ptr_type_node, addr);
+    }
+  else if (ob_parm)
+    /* Since the cleanup functions we build ignore the address
+       they're given, there's no reason to pass the actual address
+       in, and, in general, it's cheaper to pass NULL than any
+       other value.  */
+    addr = null_pointer_node;
+
+  if (dso_parm)
+    arg2 = cp_build_addr_expr (get_dso_handle_node (),
+			       tf_warning_or_error);
+  else
+    arg2 = NULL_TREE;
 
-      if (use_dtor)
-	{
-	  /* We must convert CLEANUP to the type that "__cxa_atexit"
-	     expects.  */
-	  cleanup = build_nop (get_atexit_fn_ptr_type (), cleanup);
-	  /* "__cxa_atexit" will pass the address of DECL to the
-	     cleanup function.  */
-	  mark_used (decl);
-	  addr = build_address (decl);
-	  /* The declared type of the parameter to "__cxa_atexit" is
-	     "void *".  For plain "T*", we could just let the
-	     machinery in cp_build_function_call convert it -- but if the
-	     type is "cv-qualified T *", then we need to convert it
-	     before passing it in, to avoid spurious errors.  */
-	  addr = build_nop (ptr_type_node, addr);
-	}
-      else
-	/* Since the cleanup functions we build ignore the address
-	   they're given, there's no reason to pass the actual address
-	   in, and, in general, it's cheaper to pass NULL than any
-	   other value.  */
-	addr = null_pointer_node;
-      arg2 = cp_build_addr_expr (get_dso_handle_node (),
-				 tf_warning_or_error);
-      if (targetm.cxx.use_aeabi_atexit ())
+  if (ob_parm)
+    {
+      if (!DECL_THREAD_LOCAL_P (decl)
+	  && targetm.cxx.use_aeabi_atexit ())
 	{
 	  arg1 = cleanup;
 	  arg0 = addr;
@@ -6736,8 +6762,11 @@  register_dtor_fn (tree decl)
 	}
     }
   else
-    arg0 = cleanup;
-  return cp_build_function_call_nary (get_atexit_node (), tf_warning_or_error,
+    {
+      arg0 = cleanup;
+      arg1 = NULL_TREE;
+    }
+  return cp_build_function_call_nary (atex_node, tf_warning_or_error,
 				      arg0, arg1, arg2, NULL_TREE);
 }
 
diff --git a/gcc/testsuite/g++.dg/tls/thread_local3.C b/gcc/testsuite/g++.dg/tls/thread_local3.C
new file mode 100644
index 0000000..461f126
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local3.C
@@ -0,0 +1,37 @@ 
+// { dg-do run }
+// { dg-require-effective-target c++11 }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-effective-target pthread }
+// { dg-options -pthread }
+
+int c;
+int d;
+struct A
+{
+  A() { ++c; }
+  ~A() { ++d; }
+};
+
+void f()
+{
+  thread_local A a;
+}
+
+void *thread_main(void *)
+{
+  f(); f(); f();
+}
+
+#include <pthread.h>
+
+int main()
+{
+  pthread_t thread;
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_join(thread, 0);
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_join(thread, 0);
+
+  if (c != 2 || d != 2)
+    __builtin_abort();
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local4.C b/gcc/testsuite/g++.dg/tls/thread_local4.C
new file mode 100644
index 0000000..53b1f05
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local4.C
@@ -0,0 +1,47 @@ 
+// Test for cleanups with pthread_cancel.
+
+// { dg-do run }
+// { dg-require-effective-target c++11 }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-effective-target pthread }
+// { dg-options -pthread }
+
+#include <pthread.h>
+#include <unistd.h>
+
+int c;
+int d;
+struct A
+{
+  A() { ++c; }
+  ~A() { ++d; }
+};
+
+void f()
+{
+  thread_local A a;
+}
+
+void *thread_main(void *)
+{
+  f(); f(); f();
+  while (true)
+    {
+      pthread_testcancel();
+      sleep (1);
+    }
+}
+
+int main()
+{
+  pthread_t thread;
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_cancel(thread);
+  pthread_join(thread, 0);
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_cancel(thread);
+  pthread_join(thread, 0);
+
+   if (c != 2 || d != 2)
+     __builtin_abort();
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local5.C b/gcc/testsuite/g++.dg/tls/thread_local5.C
new file mode 100644
index 0000000..7ce02f6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local5.C
@@ -0,0 +1,47 @@ 
+// Test for cleanups in the main thread, too.
+
+// { dg-do run }
+// { dg-require-effective-target c++11 }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-effective-target pthread }
+// { dg-options -pthread }
+
+#include <pthread.h>
+#include <unistd.h>
+
+int c;
+int d;
+struct A
+{
+  A() { ++c; }
+  ~A() {
+    if (++d == 3)
+      _exit (0);
+  }
+};
+
+void f()
+{
+  thread_local A a;
+}
+
+void *thread_main(void *)
+{
+  f(); f(); f();
+}
+
+int main()
+{
+  pthread_t thread;
+  thread_main(0);
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_join(thread, 0);
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_join(thread, 0);
+
+  // The dtor for a in the main thread is run after main exits, so we
+  // return 1 now and override the return value with _exit above.
+  if (c != 3 || d != 2)
+    __builtin_abort();
+  return 1;
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local6.C b/gcc/testsuite/g++.dg/tls/thread_local6.C
new file mode 100644
index 0000000..118969a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local6.C
@@ -0,0 +1,33 @@ 
+// Test for cleanups in the main thread without -pthread.
+
+// { dg-do run }
+// { dg-options "-std=c++11" }
+// { dg-require-effective-target tls_runtime }
+
+extern "C" void _exit (int);
+
+int c;
+struct A
+{
+  A() { ++c; }
+  ~A() { if (c == 1) _exit(0); }
+};
+
+void f()
+{
+  thread_local A a;
+}
+
+void *thread_main(void *)
+{
+  f(); f(); f();
+}
+
+int main()
+{
+  thread_main(0);
+
+  // The dtor for a in the main thread is run after main exits, so we
+  // return 1 now and override the return value with _exit above.
+  return 1;
+}
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 396feec..e23fdfb 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1531,6 +1531,10 @@  CXXABI_1.3.6 {
 
 } CXXABI_1.3.5;
 
+CXXABI_1.3.7 {
+    __cxa_thread_atexit;
+} CXXABI_1.3.6;
+
 
 # Symbols in the support library (libsupc++) supporting transactional memory.
 CXXABI_TM_1 {
diff --git a/libstdc++-v3/libsupc++/Makefile.am b/libstdc++-v3/libsupc++/Makefile.am
index 69cbf5c..a019bd8 100644
--- a/libstdc++-v3/libsupc++/Makefile.am
+++ b/libstdc++-v3/libsupc++/Makefile.am
@@ -48,6 +48,7 @@  endif
 sources = \
 	array_type_info.cc \
 	atexit_arm.cc \
+	atexit_thread.cc \
 	bad_alloc.cc \
 	bad_cast.cc \
 	bad_typeid.cc \
@@ -123,6 +124,11 @@  guard.lo: guard.cc
 guard.o: guard.cc
 	$(CXXCOMPILE) -std=gnu++0x -c $<
 
+atexit_thread.lo: atexit_thread.cc
+	$(LTCXXCOMPILE) -std=gnu++0x -c $<
+atexit_thread.o: atexit_thread.cc
+	$(CXXCOMPILE) -std=gnu++0x -c $<
+
 nested_exception.lo: nested_exception.cc
 	$(LTCXXCOMPILE) -std=gnu++0x -c $<
 nested_exception.o: nested_exception.cc
diff --git a/libstdc++-v3/libsupc++/Makefile.in b/libstdc++-v3/libsupc++/Makefile.in
index b2af9ba..e745179 100644
--- a/libstdc++-v3/libsupc++/Makefile.in
+++ b/libstdc++-v3/libsupc++/Makefile.in
@@ -90,7 +90,7 @@  am__installdirs = "$(DESTDIR)$(toolexeclibdir)" "$(DESTDIR)$(bitsdir)" \
 	"$(DESTDIR)$(stddir)"
 LTLIBRARIES = $(noinst_LTLIBRARIES) $(toolexeclib_LTLIBRARIES)
 libsupc___la_LIBADD =
-am__objects_1 = array_type_info.lo atexit_arm.lo bad_alloc.lo \
+am__objects_1 = array_type_info.lo atexit_arm.lo atexit_thread.lo bad_alloc.lo \
 	bad_cast.lo bad_typeid.lo class_type_info.lo del_op.lo \
 	del_opnt.lo del_opv.lo del_opvnt.lo dyncast.lo eh_alloc.lo \
 	eh_arm.lo eh_aux_runtime.lo eh_call.lo eh_catch.lo \
@@ -362,6 +362,7 @@  headers = $(std_HEADERS) $(bits_HEADERS)
 sources = \
 	array_type_info.cc \
 	atexit_arm.cc \
+	atexit_thread.cc \
 	bad_alloc.cc \
 	bad_cast.cc \
 	bad_typeid.cc \
@@ -800,6 +801,11 @@  guard.lo: guard.cc
 guard.o: guard.cc
 	$(CXXCOMPILE) -std=gnu++0x -c $<
 
+atexit_thread.lo: atexit_thread.cc
+	$(LTCXXCOMPILE) -std=gnu++0x -c $<
+atexit_thread.o: atexit_thread.cc
+	$(CXXCOMPILE) -std=gnu++0x -c $<
+
 nested_exception.lo: nested_exception.cc
 	$(LTCXXCOMPILE) -std=gnu++0x -c $<
 nested_exception.o: nested_exception.cc
diff --git a/libstdc++-v3/libsupc++/atexit_thread.cc b/libstdc++-v3/libsupc++/atexit_thread.cc
new file mode 100644
index 0000000..517b4ef
--- /dev/null
+++ b/libstdc++-v3/libsupc++/atexit_thread.cc
@@ -0,0 +1,134 @@ 
+// Copyright (C) 2012 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// GCC is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#include <cxxabi.h>
+#include <new>
+#include "bits/gthr.h"
+
+namespace {
+  // Data structure for the list of destructors: Singly-linked list
+  // of arrays.
+  class list
+  {
+    struct elt
+    {
+      void *object;
+      void (*destructor)(void *);
+    };
+
+    static const int max_nelts = 32;
+
+    list *next;
+    int nelts;
+    elt array[max_nelts];
+
+    elt *allocate_elt();
+  public:
+    void run();
+    static void run(void *p);
+    int add_elt(void (*)(void *), void *);
+  };
+
+  // Return the address of an open slot.
+  list::elt *
+  list::allocate_elt()
+  {
+    if (nelts < max_nelts)
+      return &array[nelts++];
+    if (!next)
+      next = new (std::nothrow) list();
+    if (!next)
+      return 0;
+    return next->allocate_elt();
+  }
+
+  // Run all the cleanups in the list.
+  void
+  list::run()
+  {
+    for (int i = nelts - 1; i >= 0; --i)
+      array[i].destructor (array[i].object);
+    if (next)
+      next->run();
+  }
+
+  // Static version to use as a callback to __gthread_key_create.
+  void
+  list::run(void *p)
+  {
+    static_cast<list *>(p)->run();
+  }
+
+  // The list of cleanups is per-thread.
+  thread_local list first;
+
+  // The pthread data structures for actually running the destructors at
+  // thread exit are shared.  The constructor of the thread-local sentinel
+  // object in add_elt performs the initialization.
+  __gthread_key_t key;
+  __gthread_once_t once = __GTHREAD_ONCE_INIT;
+  void run_current (void *) { first.run(); }
+  void key_init() {
+    __gthread_key_create (&key, list::run);
+    // Also make sure the destructors are run by std::exit.
+    // FIXME TLS cleanups should run before static cleanups and atexit
+    // cleanups.
+    abi::__cxa_atexit (run_current, NULL, NULL);
+  }
+  struct sentinel
+  {
+    sentinel()
+    {
+      if (__gthread_active_p ())
+	{
+	  __gthread_once (&once, key_init);
+	  __gthread_setspecific (key, &first);
+	}
+      else
+	abi::__cxa_atexit (run_current, NULL, NULL);
+    }
+  };
+
+  // Actually insert an element.
+  int
+  list::add_elt(void (*dtor)(void *), void *obj)
+  {
+    thread_local sentinel s;
+    elt *e = allocate_elt ();
+    if (!e)
+      return -1;
+    e->object = obj;
+    e->destructor = dtor;
+    return 0;
+  }
+}
+
+namespace __cxxabiv1
+{
+  extern "C" int
+  __cxa_thread_atexit (void (*dtor)(void *), void *obj, void *dso_handle)
+    _GLIBCXX_NOTHROW
+  {
+    return first.add_elt (dtor, obj);
+  }
+}
diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
index b924fc1..582c435 100644
--- a/libstdc++-v3/libsupc++/cxxabi.h
+++ b/libstdc++-v3/libsupc++/cxxabi.h
@@ -134,6 +134,10 @@  namespace __cxxabiv1
   int
   __cxa_finalize(void*);
 
+  // TLS destruction.
+  int
+  __cxa_thread_atexit(void (*)(void*), void*, void *) _GLIBCXX_NOTHROW;
+
   // Pure virtual functions.
   void
   __cxa_pure_virtual(void) __attribute__ ((__noreturn__));

commit c20f7291032d68f0afe249f1d1a67707acab720d
Author: Jason Merrill <jason@redhat.com>
Date:   Thu Sep 27 06:15:22 2012 -0400

    	Allow dynamic initialization of thread_locals.
    gcc/cp/
    	* decl.c: Define tls_aggregates.
    	(expand_static_init): Remove sorry.  Add to tls_aggregates.
    	* cp-tree.h: Declare tls_aggregates.
    	* call.c (set_up_extended_ref_temp): Add to tls_aggregates.
    	* decl2.c (var_needs_tls_wrapper): New.
    	(var_defined_without_dynamic_init): New.
    	(get_tls_init_fn, get_tls_wrapper_fn): New.
    	(generate_tls_wrapper, handle_tls_init): New.
    	(cp_write_global_declarations): Call handle_tls_init and
    	enerate_tls_wrapper.
    	* mangle.c (write_guarded_var_name): Split out from..
    	(mangle_guard_variable): ...here.
    	(mangle_tls_init_fn, mangle_tls_wrapper_fn): Use it.
    	(decl_tls_wrapper_p): New.
    	* semantics.c (finish_id_expression): Replace use of thread_local
    	variable with a call to its wrapper.
    libiberty/
    	* cp-demangle.c (d_special_name, d_dump): Handle TH and TW.
    	(d_make_comp, d_print_comp): Likewise.
    include/
    	* demangle.h (enum demangle_component_type): Add
    	DEMANGLE_COMPONENT_TLS_INIT and DEMANGLE_COMPONENT_TLS_WRAPPER.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 5cc9bad..943d0c0 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8859,8 +8859,14 @@  set_up_extended_ref_temp (tree decl, tree expr, VEC(tree,gc) **cleanups,
     {
       rest_of_decl_compilation (var, /*toplev=*/1, at_eof);
       if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (type))
-	static_aggregates = tree_cons (NULL_TREE, var,
-				       static_aggregates);
+	{
+	  if (DECL_THREAD_LOCAL_P (var))
+	    tls_aggregates = tree_cons (NULL_TREE, var,
+					tls_aggregates);
+	  else
+	    static_aggregates = tree_cons (NULL_TREE, var,
+					   static_aggregates);
+	}
     }
 
   *initp = init;
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 12ad4ed..97b6d51 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4378,6 +4378,8 @@  extern int at_eof;
    in the TREE_VALUE slot and the initializer is stored in the
    TREE_PURPOSE slot.  */
 extern GTY(()) tree static_aggregates;
+/* Likewise, for thread local storage.  */
+extern GTY(()) tree tls_aggregates;
 
 enum overload_flags { NO_SPECIAL = 0, DTOR_FLAG, TYPENAME_FLAG };
 
@@ -5179,6 +5181,7 @@  extern tree cp_build_parm_decl			(tree, tree);
 extern tree get_guard				(tree);
 extern tree get_guard_cond			(tree);
 extern tree set_guard				(tree);
+extern tree get_tls_wrapper_fn			(tree);
 extern void mark_needed				(tree);
 extern bool decl_needed_p			(tree);
 extern void note_vague_linkage_fn		(tree);
@@ -5973,6 +5976,9 @@  extern tree mangle_ctor_vtbl_for_type		(tree, tree);
 extern tree mangle_thunk			(tree, int, tree, tree);
 extern tree mangle_conv_op_name_for_type	(tree);
 extern tree mangle_guard_variable		(tree);
+extern tree mangle_tls_init_fn			(tree);
+extern tree mangle_tls_wrapper_fn		(tree);
+extern bool decl_tls_wrapper_p			(tree);
 extern tree mangle_ref_init_variable		(tree);
 
 /* in dump.c */
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 0d04dc0..246ba9a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -169,6 +169,9 @@  tree global_scope_name;
    in the TREE_PURPOSE slot.  */
 tree static_aggregates;
 
+/* Like static_aggregates, but for thread_local variables.  */
+tree tls_aggregates;
+
 /* -- end of C++ */
 
 /* A node for the integer constant 2.  */
@@ -6805,16 +6808,6 @@  expand_static_init (tree decl, tree init)
       return;
     }
 
-  if (DECL_THREAD_LOCAL_P (decl) && !DECL_FUNCTION_SCOPE_P (decl))
-    {
-      /* We haven't implemented dynamic initialization of non-local
-	 thread-local storage yet.  FIXME transform to singleton
-	 function.  */
-      sorry ("thread-local variable %qD with dynamic initialization outside "
-	     "function scope", decl);
-      return;
-    }
-
   if (DECL_FUNCTION_SCOPE_P (decl))
     {
       /* Emit code to perform this initialization but once.  */
@@ -6943,6 +6936,8 @@  expand_static_init (tree decl, tree init)
 	  finish_if_stmt (if_stmt);
 	}
     }
+  else if (DECL_THREAD_LOCAL_P (decl))
+    tls_aggregates = tree_cons (init, decl, tls_aggregates);
   else
     static_aggregates = tree_cons (init, decl, static_aggregates);
 }
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index a240ff4..869c616 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -2781,6 +2781,187 @@  set_guard (tree guard)
 			       tf_warning_or_error);
 }
 
+/* Returns true iff we can tell that VAR does not have a dynamic
+   initializer.  */
+
+static bool
+var_defined_without_dynamic_init (tree var)
+{
+  /* If it's defined in another TU, we can't tell.  */
+  if (DECL_EXTERNAL (var))
+    return false;
+  /* If it has a non-trivial destructor, registering the destructor
+     counts as dynamic initialization.  */
+  if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (var)))
+    return false;
+  /* If it's in this TU, its initializer has been processed.  */
+  gcc_assert (DECL_INITIALIZED_P (var));
+  /* If it has no initializer or a constant one, it's not dynamic.  */
+  return (!DECL_NONTRIVIALLY_INITIALIZED_P (var)
+	  || DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (var));
+}
+
+/* Returns true iff VAR is a variable that needs uses to be
+   wrapped for possible dynamic initialization.  */
+
+static bool
+var_needs_tls_wrapper (tree var)
+{
+  return (DECL_THREAD_LOCAL_P (var)
+	  && !DECL_GNU_TLS_P (var)
+	  && !DECL_FUNCTION_SCOPE_P (var)
+	  && !var_defined_without_dynamic_init (var));
+}
+
+/* Get a FUNCTION_DECL for the init function for the thread_local
+   variable VAR.  The init function will be an alias to the function
+   that initializes all the non-local TLS variables in the translation
+   unit.  The init function is only used by the wrapper function.  */
+
+static tree
+get_tls_init_fn (tree var)
+{
+  /* Only C++11 TLS vars need this init fn.  */
+  if (!var_needs_tls_wrapper (var))
+    return NULL_TREE;
+
+  tree sname = mangle_tls_init_fn (var);
+  tree fn = IDENTIFIER_GLOBAL_VALUE (sname);
+  if (!fn)
+    {
+      fn = build_lang_decl (FUNCTION_DECL, sname,
+			    build_function_type (void_type_node,
+						 void_list_node));
+      SET_DECL_LANGUAGE (fn, lang_c);
+      TREE_PUBLIC (fn) = TREE_PUBLIC (var);
+      DECL_ARTIFICIAL (fn) = true;
+      DECL_COMDAT (fn) = DECL_COMDAT (var);
+      DECL_EXTERNAL (fn) = true;
+      if (DECL_ONE_ONLY (var))
+	make_decl_one_only (fn, cxx_comdat_group (fn));
+      if (TREE_PUBLIC (var))
+	{
+	  tree obtype = strip_array_types (non_reference (TREE_TYPE (var)));
+	  /* If the variable might have static initialization, make the
+	     init function a weak reference.  */
+	  if ((!TYPE_NEEDS_CONSTRUCTING (obtype)
+	       || TYPE_HAS_CONSTEXPR_CTOR (obtype))
+	      && TARGET_SUPPORTS_WEAK)
+	    declare_weak (fn);
+	  else
+	    DECL_WEAK (fn) = DECL_WEAK (var);
+	}
+      DECL_VISIBILITY (fn) = DECL_VISIBILITY (var);
+      DECL_VISIBILITY_SPECIFIED (fn) = DECL_VISIBILITY_SPECIFIED (var);
+      DECL_DLLIMPORT_P (fn) = DECL_DLLIMPORT_P (var);
+      DECL_IGNORED_P (fn) = 1;
+      mark_used (fn);
+
+      DECL_BEFRIENDING_CLASSES (fn) = var;
+
+      SET_IDENTIFIER_GLOBAL_VALUE (sname, fn);
+    }
+  return fn;
+}
+
+/* Get a FUNCTION_DECL for the init wrapper function for the thread_local
+   variable VAR.  The wrapper function calls the init function (if any) for
+   VAR and then returns a reference to VAR.  The wrapper function is used
+   in place of VAR everywhere VAR is mentioned.  */
+
+tree
+get_tls_wrapper_fn (tree var)
+{
+  /* Only C++11 TLS vars need this wrapper fn.  */
+  if (!var_needs_tls_wrapper (var))
+    return NULL_TREE;
+
+  tree sname = mangle_tls_wrapper_fn (var);
+  tree fn = IDENTIFIER_GLOBAL_VALUE (sname);
+  if (!fn)
+    {
+      /* A named rvalue reference is an lvalue, so the wrapper should
+	 always return an lvalue reference.  */
+      tree type = non_reference (TREE_TYPE (var));
+      type = build_reference_type (type);
+      tree fntype = build_function_type (type, void_list_node);
+      fn = build_lang_decl (FUNCTION_DECL, sname, fntype);
+      SET_DECL_LANGUAGE (fn, lang_c);
+      TREE_PUBLIC (fn) = TREE_PUBLIC (var);
+      DECL_ARTIFICIAL (fn) = true;
+      DECL_IGNORED_P (fn) = 1;
+      /* The wrapper is inline and emitted everywhere var is used.  */
+      DECL_DECLARED_INLINE_P (fn) = true;
+      if (TREE_PUBLIC (var))
+	{
+	  comdat_linkage (fn);
+#ifdef HAVE_GAS_HIDDEN
+	  /* Make the wrapper bind locally; there's no reason to share
+	     the wrapper between multiple shared objects.  */
+	  DECL_VISIBILITY (fn) = VISIBILITY_INTERNAL;
+	  DECL_VISIBILITY_SPECIFIED (fn) = true;
+#endif
+	}
+      if (!TREE_PUBLIC (fn))
+	DECL_INTERFACE_KNOWN (fn) = true;
+      mark_used (fn);
+      note_vague_linkage_fn (fn);
+
+#if 0
+      /* We want CSE to commonize calls to the wrapper, but marking it as
+	 pure is unsafe since it has side-effects.  I guess we need a new
+	 ECF flag even weaker than ECF_PURE.  FIXME!  */
+      DECL_PURE_P (fn) = true;
+#endif
+
+      DECL_BEFRIENDING_CLASSES (fn) = var;
+
+      SET_IDENTIFIER_GLOBAL_VALUE (sname, fn);
+    }
+  return fn;
+}
+
+/* At EOF, generate the definition for the TLS wrapper function FN:
+
+   T& var_wrapper() {
+     if (init_fn) init_fn();
+     return var;
+   }  */
+
+static void
+generate_tls_wrapper (tree fn)
+{
+  tree var = DECL_BEFRIENDING_CLASSES (fn);
+
+  start_preparsed_function (fn, NULL_TREE, SF_DEFAULT | SF_PRE_PARSED);
+  tree body = begin_function_body ();
+  /* Only call the init fn if there might be one.  */
+  if (tree init_fn = get_tls_init_fn (var))
+    {
+      tree if_stmt = NULL_TREE;
+      /* If init_fn is a weakref, make sure it exists before calling.  */
+      if (lookup_attribute ("weak", DECL_ATTRIBUTES (init_fn)))
+	{
+	  if_stmt = begin_if_stmt ();
+	  tree addr = cp_build_addr_expr (init_fn, tf_warning_or_error);
+	  tree cond = cp_build_binary_op (DECL_SOURCE_LOCATION (var),
+					  NE_EXPR, addr, nullptr_node,
+					  tf_warning_or_error);
+	  finish_if_stmt_cond (cond, if_stmt);
+	}
+      finish_expr_stmt (build_cxx_call
+			(init_fn, 0, NULL, tf_warning_or_error));
+      if (if_stmt)
+	{
+	  finish_then_clause (if_stmt);
+	  finish_if_stmt (if_stmt);
+	}
+    }
+  finish_return_stmt (convert_from_reference (var));
+  finish_function_body (body);
+  expand_or_defer_fn (finish_function (0));
+}
+
 /* Start the process of running a particular set of global constructors
    or destructors.  Subroutine of do_[cd]tors.  */
 
@@ -3668,6 +3849,75 @@  clear_decl_external (struct cgraph_node *node, void * /*data*/)
   return false;
 }
 
+/* Build up the function to run dynamic initializers for thread_local
+   variables in this translation unit and alias the init functions for the
+   individual variables to it.  */
+
+static void
+handle_tls_init (void)
+{
+  tree vars = prune_vars_needing_no_initialization (&tls_aggregates);
+  if (vars == NULL_TREE)
+    return;
+
+  location_t loc = DECL_SOURCE_LOCATION (TREE_VALUE (vars));
+
+  #ifndef ASM_OUTPUT_DEF
+  /* This currently requires alias support.  FIXME other targets could use
+     small thunks instead of aliases.  */
+  input_location = loc;
+  sorry ("dynamic initialization of non-function-local thread_local "
+	 "variables not supported on this target");
+  return;
+  #endif
+
+  write_out_vars (vars);
+
+  tree guard = build_decl (loc, VAR_DECL, get_identifier ("__tls_guard"),
+			   boolean_type_node);
+  TREE_PUBLIC (guard) = false;
+  TREE_STATIC (guard) = true;
+  DECL_ARTIFICIAL (guard) = true;
+  DECL_IGNORED_P (guard) = true;
+  TREE_USED (guard) = true;
+  DECL_TLS_MODEL (guard) = decl_default_tls_model (guard);
+  pushdecl_top_level_and_finish (guard, NULL_TREE);
+
+  tree fn = build_lang_decl (FUNCTION_DECL,
+			     get_identifier ("__tls_init"),
+			     build_function_type (void_type_node,
+						  void_list_node));
+  SET_DECL_LANGUAGE (fn, lang_c);
+  TREE_PUBLIC (fn) = false;
+  DECL_ARTIFICIAL (fn) = true;
+  mark_used (fn);
+  start_preparsed_function (fn, NULL_TREE, SF_PRE_PARSED);
+  tree body = begin_function_body ();
+  tree if_stmt = begin_if_stmt ();
+  tree cond = cp_build_unary_op (TRUTH_NOT_EXPR, guard, false,
+				 tf_warning_or_error);
+  finish_if_stmt_cond (cond, if_stmt);
+  finish_expr_stmt (cp_build_modify_expr (guard, NOP_EXPR, boolean_true_node,
+					  tf_warning_or_error));
+  for (; vars; vars = TREE_CHAIN (vars))
+    {
+      tree var = TREE_VALUE (vars);
+      tree init = TREE_PURPOSE (vars);
+      one_static_initialization_or_destruction (var, init, true);
+
+      tree single_init_fn = get_tls_init_fn (var);
+      cgraph_node *alias
+	= cgraph_same_body_alias (cgraph_get_create_node (fn),
+				  single_init_fn, fn);
+      gcc_assert (alias != NULL);
+    }
+
+  finish_then_clause (if_stmt);
+  finish_if_stmt (if_stmt);
+  finish_function_body (body);
+  expand_or_defer_fn (finish_function (0));
+}
+
 /* This routine is called at the end of compilation.
    Its job is to create all the code needed to initialize and
    destroy the global aggregates.  We do the destruction
@@ -3845,6 +4095,9 @@  cp_write_global_declarations (void)
 	  /* ??? was:  locus.line++; */
 	}
 
+      /* Now do the same for thread_local variables.  */
+      handle_tls_init ();
+
       /* Go through the set of inline functions whose bodies have not
 	 been emitted yet.  If out-of-line copies of these functions
 	 are required, emit them.  */
@@ -3869,6 +4122,9 @@  cp_write_global_declarations (void)
 	      reconsider = true;
 	    }
 
+	  if (!DECL_INITIAL (decl) && decl_tls_wrapper_p (decl))
+	    generate_tls_wrapper (decl);
+
 	  if (!DECL_SAVED_TREE (decl))
 	    continue;
 
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 13c658b..163c18e 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3678,23 +3678,70 @@  mangle_conv_op_name_for_type (const tree type)
   return identifier;
 }
 
-/* Return an identifier for the name of an initialization guard
-   variable for indicated VARIABLE.  */
+/* Write out the appropriate string for this variable when generating
+   another mangled name based on this one.  */
 
-tree
-mangle_guard_variable (const tree variable)
+static void
+write_guarded_var_name (const tree variable)
 {
-  start_mangling (variable);
-  write_string ("_ZGV");
   if (strncmp (IDENTIFIER_POINTER (DECL_NAME (variable)), "_ZGR", 4) == 0)
     /* The name of a guard variable for a reference temporary should refer
        to the reference, not the temporary.  */
     write_string (IDENTIFIER_POINTER (DECL_NAME (variable)) + 4);
   else
     write_name (variable, /*ignore_local_scope=*/0);
+}
+
+/* Return an identifier for the name of an initialization guard
+   variable for indicated VARIABLE.  */
+
+tree
+mangle_guard_variable (const tree variable)
+{
+  start_mangling (variable);
+  write_string ("_ZGV");
+  write_guarded_var_name (variable);
+  return finish_mangling_get_identifier (/*warn=*/false);
+}
+
+/* Return an identifier for the name of a thread_local initialization
+   function for VARIABLE.  */
+
+tree
+mangle_tls_init_fn (const tree variable)
+{
+  start_mangling (variable);
+  write_string ("_ZTH");
+  write_guarded_var_name (variable);
+  return finish_mangling_get_identifier (/*warn=*/false);
+}
+
+/* Return an identifier for the name of a thread_local wrapper
+   function for VARIABLE.  */
+
+#define TLS_WRAPPER_PREFIX "_ZTW"
+
+tree
+mangle_tls_wrapper_fn (const tree variable)
+{
+  start_mangling (variable);
+  write_string (TLS_WRAPPER_PREFIX);
+  write_guarded_var_name (variable);
   return finish_mangling_get_identifier (/*warn=*/false);
 }
 
+/* Return true iff FN is a thread_local wrapper function.  */
+
+bool
+decl_tls_wrapper_p (const tree fn)
+{
+  if (TREE_CODE (fn) != FUNCTION_DECL)
+    return false;
+  tree name = DECL_NAME (fn);
+  return strncmp (IDENTIFIER_POINTER (name), TLS_WRAPPER_PREFIX,
+		  strlen (TLS_WRAPPER_PREFIX)) == 0;
+}
+
 /* Return an identifier for the name of a temporary variable used to
    initialize a static reference.  This isn't part of the ABI, but we might
    as well call them something readable.  */
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 68cbb4b..570ffeb 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3254,7 +3254,17 @@  finish_id_expression (tree id_expression,
 	  *non_integral_constant_expression_p = true;
 	}
 
-      if (scope)
+      tree wrap;
+      if (TREE_CODE (decl) == VAR_DECL
+	  && !cp_unevaluated_operand
+	  && DECL_THREAD_LOCAL_P (decl)
+	  && (wrap = get_tls_wrapper_fn (decl)))
+	{
+	  /* Replace an evaluated use of the thread_local variable with
+	     a call to its wrapper.  */
+	  decl = build_cxx_call (wrap, 0, NULL, tf_warning_or_error);
+	}
+      else if (scope)
 	{
 	  decl = (adjust_result_of_qualified_name_lookup
 		  (decl, scope, current_nonlambda_class_type()));
diff --git a/gcc/testsuite/g++.dg/gomp/tls-5.C b/gcc/testsuite/g++.dg/gomp/tls-5.C
new file mode 100644
index 0000000..74e4faa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/tls-5.C
@@ -0,0 +1,12 @@ 
+// The reference temp should be TLS, not normal data.
+// { dg-require-effective-target c++11 }
+// { dg-final { scan-assembler-not "\\.data" } }
+
+extern int&& ir;
+#pragma omp threadprivate (ir)
+int&& ir = 42;
+
+void f()
+{
+  ir = 24;
+}
diff --git a/gcc/testsuite/g++.dg/gomp/tls-wrap1.C b/gcc/testsuite/g++.dg/gomp/tls-wrap1.C
new file mode 100644
index 0000000..91c9f86
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/tls-wrap1.C
@@ -0,0 +1,13 @@ 
+// If we can see the definition at the use site, we don't need to bother
+// with a wrapper.
+
+// { dg-require-effective-target tls }
+// { dg-final { scan-assembler-not "_ZTW1i" } }
+
+int i = 42;
+#pragma omp threadprivate (i)
+
+int main()
+{
+  return i - 42;
+}
diff --git a/gcc/testsuite/g++.dg/gomp/tls-wrap2.C b/gcc/testsuite/g++.dg/gomp/tls-wrap2.C
new file mode 100644
index 0000000..7aa1371
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/tls-wrap2.C
@@ -0,0 +1,16 @@ 
+// If we can't see the definition at the use site, but it's in this translation
+// unit, we build a wrapper but don't bother with an init function.
+
+// { dg-require-effective-target tls }
+// { dg-final { scan-assembler "_ZTW1i" } }
+// { dg-final { scan-assembler-not "_ZTH1i" } }
+
+extern int i;
+#pragma omp threadprivate (i)
+
+int main()
+{
+  return i - 42;
+}
+
+int i = 42;
diff --git a/gcc/testsuite/g++.dg/gomp/tls-wrap3.C b/gcc/testsuite/g++.dg/gomp/tls-wrap3.C
new file mode 100644
index 0000000..2504d99
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/tls-wrap3.C
@@ -0,0 +1,14 @@ 
+// If we can't see the definition at all, we need to assume there might be
+// an init function.
+
+// { dg-require-effective-target tls }
+// { dg-final { scan-assembler "_ZTW1i" } }
+// { dg-final { scan-assembler "_ZTH1i" } }
+
+extern int i;
+#pragma omp threadprivate (i)
+
+int main()
+{
+  return i - 42;
+}
diff --git a/gcc/testsuite/g++.dg/gomp/tls-wrap4.C b/gcc/testsuite/g++.dg/gomp/tls-wrap4.C
new file mode 100644
index 0000000..1301148
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/tls-wrap4.C
@@ -0,0 +1,13 @@ 
+// We don't need to call the wrapper through the PLT; we can use a separate
+// copy per shared object.
+
+// { dg-require-effective-target tls }
+// { dg-options "-std=c++11 -fPIC" }
+// { dg-final { scan-assembler-not "_ZTW1i@PLT" { target i?86-*-* x86_64-*-* } } }
+
+extern thread_local int i;
+
+int main()
+{
+  return i - 42;
+}
diff --git a/gcc/testsuite/g++.dg/gomp/tls-wrapper-cse.C b/gcc/testsuite/g++.dg/gomp/tls-wrapper-cse.C
new file mode 100644
index 0000000..af2de2f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/tls-wrapper-cse.C
@@ -0,0 +1,18 @@ 
+// Test for CSE of the wrapper function: we should only call it once
+// for the two references to ir.
+// { dg-options "-fopenmp -O -fno-inline" }
+// { dg-require-effective-target tls }
+// { dg-final { scan-assembler-times "call *_ZTW2ir" 1 { xfail *-*-* } } }
+
+// XFAILed until the back end supports a way to mark a function as cseable
+// though not pure.
+
+int f() { return 42; }
+
+int ir = f();
+#pragma omp threadprivate (ir)
+
+int main()
+{
+  return ir + ir - 84;
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local-cse.C b/gcc/testsuite/g++.dg/tls/thread_local-cse.C
new file mode 100644
index 0000000..47c6aed
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local-cse.C
@@ -0,0 +1,20 @@ 
+// Test for CSE of the wrapper function: we should only call it once
+// for the two references to ir.
+// { dg-options "-std=c++11 -O -fno-inline -save-temps" }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-alias }
+// { dg-final { scan-assembler-times "call *_ZTW2ir" 1 { xfail *-*-* } } }
+// { dg-final cleanup-saved-temps }
+// { dg-do run }
+
+// XFAILed until the back end supports a way to mark a function as cseable
+// though not pure.
+
+int f() { return 42; }
+
+thread_local int ir = f();
+
+int main()
+{
+  return ir + ir - 84;
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local-order1.C b/gcc/testsuite/g++.dg/tls/thread_local-order1.C
new file mode 100644
index 0000000..6557e93
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local-order1.C
@@ -0,0 +1,25 @@ 
+// { dg-do run }
+// { dg-options "-std=c++11" }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-alias }
+
+extern "C" void abort();
+extern "C" int printf (const char *, ...);
+#define printf(...)
+
+int c;
+struct A {
+  int i;
+  A(int i): i(i) { printf ("A(%d)\n", i); if (i != c++) abort (); }
+  ~A() { printf("~A(%d)\n", i); if (i != --c) abort(); }
+};
+
+A a0(0);
+thread_local A a1(1);
+thread_local A a2(2);
+A* ap = &a1;
+
+int main()
+{
+  if (c != 3) abort();
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local-order2.C b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
new file mode 100644
index 0000000..eb9c769
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local-order2.C
@@ -0,0 +1,28 @@ 
+// The standard says that a1 should be destroyed before a0 even though
+// that isn't reverse order of construction.  We need to move
+// __cxa_thread_atexit into glibc to get this right.
+
+// { dg-do run { xfail *-*-* } }
+// { dg-options "-std=c++11" }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-alias }
+
+extern "C" void abort();
+extern "C" int printf (const char *, ...);
+#define printf(...)
+
+int c;
+struct A {
+  int i;
+  A(int i): i(i) { printf ("A(%d)\n", i); ++c; }
+  ~A() { printf("~A(%d)\n", i); if (i != --c) abort(); }
+};
+
+thread_local A a1(1);
+A* ap = &a1;
+A a0(0);
+
+int main()
+{
+  if (c != 2) abort();
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local-wrap1.C b/gcc/testsuite/g++.dg/tls/thread_local-wrap1.C
new file mode 100644
index 0000000..56177da
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local-wrap1.C
@@ -0,0 +1,13 @@ 
+// If we can see the definition at the use site, we don't need to bother
+// with a wrapper.
+
+// { dg-require-effective-target tls }
+// { dg-options "-std=c++11" }
+// { dg-final { scan-assembler-not "_ZTW1i" } }
+
+thread_local int i = 42;
+
+int main()
+{
+  return i - 42;
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local-wrap2.C b/gcc/testsuite/g++.dg/tls/thread_local-wrap2.C
new file mode 100644
index 0000000..1e8078f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local-wrap2.C
@@ -0,0 +1,16 @@ 
+// If we can't see the definition at the use site, but it's in this translation
+// unit, we build a wrapper but don't bother with an init function.
+
+// { dg-require-effective-target tls }
+// { dg-options "-std=c++11" }
+// { dg-final { scan-assembler "_ZTW1i" } }
+// { dg-final { scan-assembler-not "_ZTH1i" } }
+
+extern thread_local int i;
+
+int main()
+{
+  return i - 42;
+}
+
+thread_local int i = 42;
diff --git a/gcc/testsuite/g++.dg/tls/thread_local-wrap3.C b/gcc/testsuite/g++.dg/tls/thread_local-wrap3.C
new file mode 100644
index 0000000..19e6ab8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local-wrap3.C
@@ -0,0 +1,14 @@ 
+// If we can't see the definition at all, we need to assume there might be
+// an init function.
+
+// { dg-require-effective-target tls }
+// { dg-options "-std=c++11" }
+// { dg-final { scan-assembler "_ZTW1i" } }
+// { dg-final { scan-assembler "_ZTH1i" } }
+
+extern thread_local int i;
+
+int main()
+{
+  return i - 42;
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local-wrap4.C b/gcc/testsuite/g++.dg/tls/thread_local-wrap4.C
new file mode 100644
index 0000000..1301148
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local-wrap4.C
@@ -0,0 +1,13 @@ 
+// We don't need to call the wrapper through the PLT; we can use a separate
+// copy per shared object.
+
+// { dg-require-effective-target tls }
+// { dg-options "-std=c++11 -fPIC" }
+// { dg-final { scan-assembler-not "_ZTW1i@PLT" { target i?86-*-* x86_64-*-* } } }
+
+extern thread_local int i;
+
+int main()
+{
+  return i - 42;
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local2g.C b/gcc/testsuite/g++.dg/tls/thread_local2g.C
new file mode 100644
index 0000000..36451d2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local2g.C
@@ -0,0 +1,29 @@ 
+// { dg-do run }
+// { dg-options "-std=c++11" }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-alias }
+
+extern "C" void abort();
+
+struct A
+{
+  A();
+  int i;
+};
+
+thread_local A a;
+
+A &f()
+{
+  return a;
+}
+
+int j;
+A::A(): i(j) { }
+
+int main()
+{
+  j = 42;
+  if (f().i != 42)
+    abort ();
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local3g.C b/gcc/testsuite/g++.dg/tls/thread_local3g.C
new file mode 100644
index 0000000..d5e83e8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local3g.C
@@ -0,0 +1,35 @@ 
+// { dg-do run }
+// { dg-require-effective-target c++11 }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-effective-target pthread }
+// { dg-require-alias }
+// { dg-options -pthread }
+
+int c;
+int d;
+struct A
+{
+  A() { ++c; }
+  ~A() { ++d; }
+};
+
+thread_local A a;
+
+void *thread_main(void *)
+{
+  A* ap = &a;
+}
+
+#include <pthread.h>
+
+int main()
+{
+  pthread_t thread;
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_join(thread, 0);
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_join(thread, 0);
+
+  if (c != 2 || d != 2)
+    __builtin_abort();
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local4g.C b/gcc/testsuite/g++.dg/tls/thread_local4g.C
new file mode 100644
index 0000000..574d267
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local4g.C
@@ -0,0 +1,45 @@ 
+// Test for cleanups with pthread_cancel.
+
+// { dg-do run }
+// { dg-require-effective-target c++11 }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-effective-target pthread }
+// { dg-require-alias }
+// { dg-options -pthread }
+
+#include <pthread.h>
+#include <unistd.h>
+
+int c;
+int d;
+struct A
+{
+  A() { ++c; }
+  ~A() { ++d; }
+};
+
+thread_local A a;
+
+void *thread_main(void *)
+{
+  A *ap = &a;
+  while (true)
+    {
+      pthread_testcancel();
+      sleep (1);
+    }
+}
+
+int main()
+{
+  pthread_t thread;
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_cancel(thread);
+  pthread_join(thread, 0);
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_cancel(thread);
+  pthread_join(thread, 0);
+
+   if (c != 2 || d != 2)
+     __builtin_abort();
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local5g.C b/gcc/testsuite/g++.dg/tls/thread_local5g.C
new file mode 100644
index 0000000..badab4f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local5g.C
@@ -0,0 +1,45 @@ 
+// Test for cleanups in the main thread, too.
+
+// { dg-do run }
+// { dg-require-effective-target c++11 }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-effective-target pthread }
+// { dg-require-alias }
+// { dg-options -pthread }
+
+#include <pthread.h>
+#include <unistd.h>
+
+int c;
+int d;
+struct A
+{
+  A() { ++c; }
+  ~A() {
+    if (++d == 3)
+      _exit (0);
+  }
+};
+
+thread_local A a;
+
+void *thread_main(void *)
+{
+  A* ap = &a;
+}
+
+int main()
+{
+  pthread_t thread;
+  thread_main(0);
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_join(thread, 0);
+  pthread_create (&thread, 0, thread_main, 0);
+  pthread_join(thread, 0);
+
+  // The dtor for a in the main thread is run after main exits, so we
+  // return 1 now and override the return value with _exit above.
+  if (c != 3 || d != 2)
+    __builtin_abort();
+  return 1;
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local6g.C b/gcc/testsuite/g++.dg/tls/thread_local6g.C
new file mode 100644
index 0000000..ff8d608
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local6g.C
@@ -0,0 +1,31 @@ 
+// Test for cleanups in the main thread without -pthread.
+
+// { dg-do run }
+// { dg-options "-std=c++11" }
+// { dg-require-effective-target tls_runtime }
+// { dg-require-alias }
+
+extern "C" void _exit (int);
+
+int c;
+struct A
+{
+  A() { ++c; }
+  ~A() { if (c == 1) _exit(0); }
+};
+
+thread_local A a;
+
+void *thread_main(void *)
+{
+  A* ap = &a;
+}
+
+int main()
+{
+  thread_main(0);
+
+  // The dtor for a in the main thread is run after main exits, so we
+  // return 1 now and override the return value with _exit above.
+  return 1;
+}
diff --git a/gcc/testsuite/g++.dg/tls/thread_local7g.C b/gcc/testsuite/g++.dg/tls/thread_local7g.C
new file mode 100644
index 0000000..6960598
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tls/thread_local7g.C
@@ -0,0 +1,13 @@ 
+// { dg-options "-std=c++11" }
+// { dg-require-effective-target tls }
+// { dg-require-alias }
+
+// The reference temp should be TLS, not normal data.
+// { dg-final { scan-assembler-not "\\.data" } }
+
+thread_local int&& ir = 42;
+
+void f()
+{
+  ir = 24;
+}
diff --git a/include/demangle.h b/include/demangle.h
index 34b3ed3..5da79d8 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -272,6 +272,9 @@  enum demangle_component_type
   /* A guard variable.  This has one subtree, the name for which this
      is a guard variable.  */
   DEMANGLE_COMPONENT_GUARD,
+  /* The init and wrapper functions for C++11 thread_local variables.  */
+  DEMANGLE_COMPONENT_TLS_INIT,
+  DEMANGLE_COMPONENT_TLS_WRAPPER,
   /* A reference temporary.  This has one subtree, the name for which
      this is a temporary.  */
   DEMANGLE_COMPONENT_REFTEMP,
diff --git a/libgomp/testsuite/libgomp.c++/tls-init1.C b/libgomp/testsuite/libgomp.c++/tls-init1.C
new file mode 100644
index 0000000..4cbaccb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/tls-init1.C
@@ -0,0 +1,26 @@ 
+extern "C" void abort();
+
+struct A
+{
+  A();
+  int i;
+};
+
+extern A a;
+#pragma omp threadprivate (a)
+A a;
+
+A &f()
+{
+  return a;
+}
+
+int j;
+A::A(): i(j) { }
+
+int main()
+{
+  j = 42;
+  if (f().i != 42)
+    abort ();
+}
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 258aaa7..32df38c 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -696,6 +696,12 @@  d_dump (struct demangle_component *dc, int indent)
     case DEMANGLE_COMPONENT_PACK_EXPANSION:
       printf ("pack expansion\n");
       break;
+    case DEMANGLE_COMPONENT_TLS_INIT:
+      printf ("tls init function\n");
+      break;
+    case DEMANGLE_COMPONENT_TLS_WRAPPER:
+      printf ("tls wrapper function\n");
+      break;
     }
 
   d_dump (d_left (dc), indent + 2);
@@ -832,6 +838,8 @@  d_make_comp (struct d_info *di, enum demangle_component_type type,
     case DEMANGLE_COMPONENT_COVARIANT_THUNK:
     case DEMANGLE_COMPONENT_JAVA_CLASS:
     case DEMANGLE_COMPONENT_GUARD:
+    case DEMANGLE_COMPONENT_TLS_INIT:
+    case DEMANGLE_COMPONENT_TLS_WRAPPER:
     case DEMANGLE_COMPONENT_REFTEMP:
     case DEMANGLE_COMPONENT_HIDDEN_ALIAS:
     case DEMANGLE_COMPONENT_TRANSACTION_CLONE:
@@ -1867,6 +1875,14 @@  d_special_name (struct d_info *di)
 	  return d_make_comp (di, DEMANGLE_COMPONENT_JAVA_CLASS,
 			      cplus_demangle_type (di), NULL);
 
+	case 'H':
+	  return d_make_comp (di, DEMANGLE_COMPONENT_TLS_INIT,
+			      d_name (di), NULL);
+
+	case 'W':
+	  return d_make_comp (di, DEMANGLE_COMPONENT_TLS_WRAPPER,
+			      d_name (di), NULL);
+
 	default:
 	  return NULL;
 	}
@@ -4072,6 +4088,16 @@  d_print_comp (struct d_print_info *dpi, int options,
       d_print_comp (dpi, options, d_left (dc));
       return;
 
+    case DEMANGLE_COMPONENT_TLS_INIT:
+      d_append_string (dpi, "TLS init function for ");
+      d_print_comp (dpi, options, d_left (dc));
+      return;
+
+    case DEMANGLE_COMPONENT_TLS_WRAPPER:
+      d_append_string (dpi, "TLS wrapper function for ");
+      d_print_comp (dpi, options, d_left (dc));
+      return;
+
     case DEMANGLE_COMPONENT_REFTEMP:
       d_append_string (dpi, "reference temporary #");
       d_print_comp (dpi, options, d_right (dc));