Patchwork [RQ,Review] convert emulated TLS so that it works with LTO and WHOPR.

login
register
mail settings
Submitter IainS
Date June 9, 2010, 8:09 p.m.
Message ID <3720F7D1-507D-464F-BA75-5A1932AD903D@sandoe-acoustics.co.uk>
Download mbox | patch
Permalink /patch/55125/
State New
Headers show

Comments

IainS - June 9, 2010, 8:09 p.m.
Emulated TLS is used by a number of gcc targets and, at present, is  
not able to support LTO and WHOPR.

This is because the substitution of the control var for the user one  
is carried out too late to be streamed properly.

The attached patch based on Honza's original suggestions is a "proper"  
solution to PR44132,
(with the currently applied patch recognized as a work-around).

what this does is to move the substitution of __emutls_v.xxxx to every  
place that a variable is finalized or a reference gimplified.

The original "master" or "user" vars are marked as not requiring output.
.. although they are still caught in assemble_variable (and friends)  
just in case someone resets the flag.

Aliases are handled by walking the alias pairs, re-mapping the aliases  
between the original vars so that they now exist between the control  
vars.

There are some small rearrangements in varpool_finalize_decl().
The first is to remove the early calls to  
varpool_assemble_pending_decls() after discussion with Honza on irc.
there is a global call to this in toplev.c
The second is to reorder the
   if (node->needed)
   varpool_enqueue_needed_node (node);

until *after* all the needed checks are made.
AFAICT it's possible for a var to be marked as finalized without being  
queued at present, since the finalized flag causes the routine to exit  
before the enqueing.

I have been using this for a few weeks now on *-darwin{9,10} and  
tested on cris-elf, which is an emutls target with alises (and as a  
NULL test that it makes no difference to i686-pc-linux-gnu).

There is a single regression:
gcc.dg/tls/asm-1.c  - However, I believe that this regression is due  
to gimplify_asm() not considering the possibility of needing to expand  
the arguments (i.e. not specifically a fault in the emu TLS  
implementation).

===

I put a set of test suite additions on the PR referenced, that run  
under the torture conditions and therefore are applied with lto/whopr  
for targets using those.  These extra tests are in my local trees
(in fact, I've also applied the approach of making all TLS content in  
the test- suite applicable to emulated TLS, where appropriate). [see http://gcc.gnu.org/ml/gcc-patches/2010-05/msg00943.html 
, for example].

There is also an issue to resolve with  g++.dg/tls/init-2.C (if that  
is enabled).  Again, this seems to be a problem outside the TLS  
implementation (in that there are two conflicting error messages and  
then a failure when it apparently tries to create a constructor for a  
non-existent var).

====

I would hope that this is reasonably close to applicability, on the  
ground that the general approach was as suggested by Honza.
However, all misunderstanding &| faults with the rendition are  
certainly mine ;)

This is my most complex endeavor so far, so please scrutinize  
carefully and don't bite too hard.

Iain
====

  gcc/Changelog:

	* emults.c (general): Add some comments on the implementation.
	(__emutls_get_address): Adjust memset initial offset.
	(__emutls_register_common): Abort on NULL object pointer (debug code,  
to be removed).

	* expr.c (emutls_var_address): Remove.
	(expand_expr_addr_expr_1): Remove TLS emulation hook.
	(expand_expr_real_1): Ditto.

	* gimplify.c (emutls_var_address): Add proc.
	(gimplify_decl_expr): expand TLS vars.
	(gimplify_var_or_parm_decl): Ditto.
	(omp_notice_variable): Recognize TLS_MODEL_EMULATED.

	* passes.c (rest_of_decl_compilation): Substitute TLS control vars  
for the master.

	* varasm.c (get_emutls_init_templ_addr): Do not reset the caller's  
DECL_INITIAL.
	(get_emutls_init_templ_addr): Copy DECL_PRESERVE_P.  Create the init  
template
	as soon as we see a valid initializer.  Mark the original var with  
error_mark_node so
	that re-initializations will be diagnosed properly.  Mark the  
original var as not to be output.
	(emutls_common_1): Do not rely on DECL_COMMON as a decision on  
whether a constructor
	is needed, it is also required for lcomm.   Ensure that the var is  
laid out before the constructor
	is built.
	(emutls_finalize_control_var): Copy TREE_USED before finalization.   
Assert on
	 !DECL_THREAD_LOCAL_P, debug code to be removed.
	(asm_output_bss): Do not emit TLS vars for emu tls targets (safety  
net, should not be
	required, perhaps an assert would be better).
	(asm_output_aligned_bss): Ditto.
	(get_variable_section): Do not allow lcomm unless the target can  
register common.
	(assemble_variable): Remove template init code.
	Trap and ignore any TLS master vars (should not be needed maybe  
better to assert).
	(do_assemble_alias): Ditto, remove TLS emulation var substitution.
	(finish_aliases_1): Walk alias pairs redirecting aliases between the  
emulated vars, rather
	than their masters.
	
	* varpool.c (varpool_mark_needed_node): Do not handle TLS  
substitution here.
	(decide_is_variable_needed): Or here.
	(varpool_finalize_decl): Handle TLS substitution.
	Remove early calls to varpool_assemble_pending_decls().
	Check enqueuing of vars after all tests for need are complete.
	(varpool_analyze_pending_decls): Do not record references if the  
initializer is error_mark.
Jakub Jelinek - June 9, 2010, 8:22 p.m.
On Wed, Jun 09, 2010 at 09:09:03PM +0100, IainS wrote:
>  gcc/Changelog:
> 
> 	* emults.c (general): Add some comments on the implementation.

Typo in filename?

>        arr->size = size;
> -      memset (arr->data + orig_size, 0,
> +      memset (arr->data + (orig_size * sizeof (void *)), 0,
>  	      (size - orig_size) * sizeof (void *));
>        __gthread_setspecific (emutls_key, (void *) arr);
>      }

This looks wrong.  arr->data is an array of pointers (to pointers),
so arr->data + orig_size is the same as
((char *)arr->data) + (orig_size * sizeof (void *))
What you now use is
((char *)arr->data) + (orig_size * sizeof (void *) * sizeof (void *))

>  void
>  __emutls_register_common (struct __emutls_object *obj,
>  			  word size, word align, void *templ)
>  {
> +  if (!obj)
> +    abort ();

Why?  obj->size will crash too, and gcc shouldn't be calling
__emutls_register_common with NULL anyway.

	Jakub
IainS - June 9, 2010, 9:04 p.m.
On 9 Jun 2010, at 21:22, Jakub Jelinek wrote:

> On Wed, Jun 09, 2010 at 09:09:03PM +0100, IainS wrote:
>> gcc/Changelog:
>>
>> 	* emults.c (general): Add some comments on the implementation.
>
> Typo in filename?

yeah, I do that a lot with emutls.
Were the implementation comments otherwise OK?
(happy to remove them if you like, they were notes for me as I worked  
through it).

>>       arr->size = size;
>> -      memset (arr->data + orig_size, 0,
>> +      memset (arr->data + (orig_size * sizeof (void *)), 0,
>> 	      (size - orig_size) * sizeof (void *));
>>       __gthread_setspecific (emutls_key, (void *) arr);
>>     }
>
> This looks wrong.  arr->data is an array of pointers (to pointers),
> so arr->data + orig_size is the same as
> ((char *)arr->data) + (orig_size * sizeof (void *))
> What you now use is
> ((char *)arr->data) + (orig_size * sizeof (void *) * sizeof (void *))

it is wrong, looked at it a dozen times and misread every time, mea  
culpa.

>> void
>> __emutls_register_common (struct __emutls_object *obj,
>> 			  word size, word align, void *templ)
>> {
>> +  if (!obj)
>> +    abort ();
>
> Why?  obj->size will crash too, and gcc shouldn't be calling
> __emutls_register_common with NULL anyway.

as I noted, this is debug code, to be removed.  I was making changes  
that might have caused a fault here.
FWIW abort gives me a stack trace without having to crank up gdb.

thanks,
Iain
Dave Korn - June 10, 2010, 12:49 a.m.
On 09/06/2010 21:22, Jakub Jelinek wrote:
> On Wed, Jun 09, 2010 at 09:09:03PM +0100, IainS wrote:

>>  void
>>  __emutls_register_common (struct __emutls_object *obj,
>>  			  word size, word align, void *templ)
>>  {
>> +  if (!obj)
>> +    abort ();
> 
> Why?  obj->size will crash too, and gcc shouldn't be calling
> __emutls_register_common with NULL anyway.

  Should be a gcc_assert then, shouldn't it?

> +/*
> +  DECL_PRESERVE_P (to) = 1;*/
> +  

  Commented-out or #if-0'd code should be removed before committing.

> +  TREE_USED (to) = TREE_USED (decl);
> +
>    TREE_STATIC (to) = TREE_STATIC (decl);
> -  TREE_USED (to) = TREE_USED (decl);
> -  TREE_PUBLIC (to) = TREE_PUBLIC (decl);
>    DECL_EXTERNAL (to) = DECL_EXTERNAL (decl);
>    DECL_COMMON (to) = DECL_COMMON (decl);
> +
> +  TREE_PUBLIC (to) = TREE_PUBLIC (decl);

  Should avoid non-functional shuffling of existing code.

> +  if (!targetm.have_tls 
> +	&& (TREE_CODE (decl) == VAR_DECL)
> +	&& DECL_THREAD_LOCAL_P (decl))
> +	/* We should never get here.  */
> +    return ;

  If you really should never get there, abort or ICE, don't return.

    cheers,
      DaveK
IainS - June 10, 2010, 9:47 a.m.
Hi Dave,

other comments absorbed pending an update if the solution is deemed  
acceptable.

On 10 Jun 2010, at 01:49, Dave Korn wrote:

>> +  TREE_USED (to) = TREE_USED (decl);
>> +
>>   TREE_STATIC (to) = TREE_STATIC (decl);
>> -  TREE_USED (to) = TREE_USED (decl);
>> -  TREE_PUBLIC (to) = TREE_PUBLIC (decl);
>>   DECL_EXTERNAL (to) = DECL_EXTERNAL (decl);
>>   DECL_COMMON (to) = DECL_COMMON (decl);
>> +
>> +  TREE_PUBLIC (to) = TREE_PUBLIC (decl);
>
>  Should avoid non-functional shuffling of existing code.

Well, I'm happy to go with the status quo.
However, I have a point of view that collecting together items that  
are related (even though it makes no functional change to the code's  
behavior) does make a benefit in longer term maintenance.

Thanks for reviewing,
Iain
Dave Korn - June 10, 2010, 2:17 p.m.
On 10/06/2010 10:47, IainS wrote:

> However, I have a point of view that collecting together items that are
> related (even though it makes no functional change to the code's
> behavior) does make a benefit in longer term maintenance.

  Yeh, what you say is perfectly reasonable; it's just that I suspect that if
we did that all the time, different people might have different ideas about
what items are "related" and what ordering might seem "obvious", and we'd end
up with a load of back-and-forth churn.  Either way, it's small beer.

    cheers,
      DaveK

Patch

Index: gcc/emutls.c
===================================================================
--- gcc/emutls.c	(revision 160497)
+++ gcc/emutls.c	(working copy)
@@ -43,12 +43,21 @@  struct __emutls_object
   void *templ;
 };
 
+/* Each thread has an array of pointers to the allocated indexed vars.  
+   This is indexed by the id number written back into the 'master' 
+   instance of the variable in the relevant data area of the executable.  */
 struct __emutls_array
 {
   pointer size;
   void **data[];
 };
 
+/* Each var is represented by a malloc'd block which is laid out thus:
+| padder | blockptr | var .................................. |
+^                   ^
+blockptr            *data[i]  (arranged to be ALIGNed per var requirement).
+'padder' is unused, blockptr can be found at at *data[i][-1].  */ 
+
 void *__emutls_get_address (struct __emutls_object *);
 void __emutls_register_common (struct __emutls_object *, word, word, void *);
 
@@ -127,6 +136,9 @@  __emutls_get_address (struct __emutls_object *obj)
 {
   if (! __gthread_active_p ())
     {
+      /* No threading, just allocate an object and overwrite the offset
+	 field in the master instance with the address of the allocated
+	 block.  */
       if (__builtin_expect (obj->loc.ptr == NULL, 0))
 	obj->loc.ptr = emutls_alloc (obj);
       return obj->loc.ptr;
@@ -137,6 +149,9 @@  __emutls_get_address (struct __emutls_object *obj)
 #else
   pointer offset = obj->loc.offset;
 
+  /* If the offset is zero, this is the first time we've seen this var.
+     The very first time through we set up the emutls_key and bind it 
+     to the destructor.  */
   if (__builtin_expect (offset == 0, 0))
     {
       static __gthread_once_t once = __GTHREAD_ONCE_INIT;
@@ -171,7 +186,7 @@  __emutls_get_address (struct __emutls_object *obj)
       if (arr == NULL)
 	abort ();
       arr->size = size;
-      memset (arr->data + orig_size, 0,
+      memset (arr->data + (orig_size * sizeof (void *)), 0,
 	      (size - orig_size) * sizeof (void *));
       __gthread_setspecific (emutls_key, (void *) arr);
     }
@@ -186,10 +201,16 @@  __emutls_get_address (struct __emutls_object *obj)
 #endif
 }
 
+/* We allow for repeated initialization of common templates.
+   In this case, if either the size or the alignment grow in a given
+   transaction, we increase the definition to accommodate.
+*/
 void
 __emutls_register_common (struct __emutls_object *obj,
 			  word size, word align, void *templ)
 {
+  if (!obj)
+    abort ();
   if (obj->size < size)
     {
       obj->size = size;
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 160497)
+++ gcc/expr.c	(working copy)
@@ -6768,21 +6768,7 @@  highest_pow2_factor_for_target (const_tree target,
 
   return MAX (factor, talign);
 }
-
-/* Return &VAR expression for emulated thread local VAR.  */
 
-static tree
-emutls_var_address (tree var)
-{
-  tree emuvar = emutls_decl (var);
-  tree fn = built_in_decls [BUILT_IN_EMUTLS_GET_ADDRESS];
-  tree arg = build_fold_addr_expr_with_type (emuvar, ptr_type_node);
-  tree arglist = build_tree_list (NULL_TREE, arg);
-  tree call = build_function_call_expr (UNKNOWN_LOCATION, fn, arglist);
-  return fold_convert (build_pointer_type (TREE_TYPE (var)), call);
-}
-
-
 /* Subroutine of expand_expr.  Expand the two operands of a binary
    expression EXP0 and EXP1 placing the results in OP0 and OP1.
    The value may be stored in TARGET if TARGET is nonzero.  The
@@ -6876,17 +6862,6 @@  expand_expr_addr_expr_1 (tree exp, rtx target, enu
       break;
 
     case VAR_DECL:
-      /* TLS emulation hook - replace __thread VAR's &VAR with
-	 __emutls_get_address (&_emutls.VAR).  */
-      if (! targetm.have_tls
-	  && TREE_CODE (exp) == VAR_DECL
-	  && DECL_THREAD_LOCAL_P (exp))
-	{
-	  exp = emutls_var_address (exp);
-	  return expand_expr (exp, target, tmode, modifier);
-	}
-      /* Fall through.  */
-
     default:
       /* If the object is a DECL, then expand it for its rtl.  Don't bypass
 	 expand_expr, as that can have various side effects; LABEL_DECLs for
@@ -8416,16 +8391,6 @@  expand_expr_real_1 (tree exp, rtx target, enum mac
 	  && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
 	layout_decl (exp, 0);
 
-      /* TLS emulation hook - replace __thread vars with
-	 *__emutls_get_address (&_emutls.var).  */
-      if (! targetm.have_tls
-	  && TREE_CODE (exp) == VAR_DECL
-	  && DECL_THREAD_LOCAL_P (exp))
-	{
-	  exp = build_fold_indirect_ref_loc (loc, emutls_var_address (exp));
-	  return expand_expr_real_1 (exp, target, tmode, modifier, NULL);
-	}
-
       /* ... fall through ...  */
 
     case FUNCTION_DECL:
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 160497)
+++ gcc/gimplify.c	(working copy)
@@ -1343,7 +1343,19 @@  gimplify_vla_decl (tree decl, gimple_seq *seq_p)
   gimplify_ctxp->save_stack = true;
 }
 
+/* Return &VAR expression for emulated thread local VAR.  */
 
+static tree
+emutls_var_address (tree var)
+{
+  tree emuvar = emutls_decl (var);
+  tree fn = built_in_decls [BUILT_IN_EMUTLS_GET_ADDRESS];
+  tree arg = build_fold_addr_expr_with_type (emuvar, ptr_type_node);
+  tree arglist = build_tree_list (NULL_TREE, arg);
+  tree call = build_function_call_expr (UNKNOWN_LOCATION, fn, arglist);
+  return fold_convert (build_pointer_type (TREE_TYPE (var)), call);
+}
+
 /* Gimplifies a DECL_EXPR node *STMT_P by making any necessary allocation
    and initialization explicit.  */
 
@@ -1358,6 +1370,18 @@  gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_
   if (TREE_TYPE (decl) == error_mark_node)
     return GS_ERROR;
 
+  /* TLS emulation hook - replace __thread VAR's &VAR with
+     __emutls_get_address (&_emutls.VAR). We then ignore the original
+     var.  */
+  if (! targetm.have_tls
+      && TREE_CODE (decl) == VAR_DECL
+      && DECL_THREAD_LOCAL_P (decl))
+    {
+      stmt = build_fold_indirect_ref (emutls_var_address (decl));
+      gimplify_and_add (stmt, seq_p);
+      return GS_ALL_DONE;
+    }
+
   if ((TREE_CODE (decl) == TYPE_DECL
        || TREE_CODE (decl) == VAR_DECL)
       && !TYPE_SIZES_GIMPLIFIED (TREE_TYPE (decl)))
@@ -1877,6 +1901,17 @@  gimplify_var_or_parm_decl (tree *expr_p)
       return GS_ERROR;
     }
 
+  /* TLS emulation hook - replace __thread VAR's &VAR with
+     __emutls_get_address (&_emutls.VAR).  */
+  if (! targetm.have_tls
+      && TREE_CODE (decl) == VAR_DECL
+      && DECL_THREAD_LOCAL_P (decl))
+    {
+      gcc_assert (!DECL_HAS_VALUE_EXPR_P (decl)) ;
+      *expr_p = build_fold_indirect_ref (emutls_var_address (decl));
+      return GS_OK ;
+    }
+
   /* When within an OpenMP context, notice uses of variables.  */
   if (gimplify_omp_ctxp && omp_notice_variable (gimplify_omp_ctxp, decl, true))
     return GS_ALL_DONE;
@@ -5553,14 +5588,15 @@  omp_notice_variable (struct gimplify_omp_ctx *ctx,
   /* Threadprivate variables are predetermined.  */
   if (is_global_var (decl))
     {
-      if (DECL_THREAD_LOCAL_P (decl))
+      if ((DECL_TLS_MODEL (decl) >= TLS_MODEL_EMULATED))
 	return omp_notice_threadprivate_variable (ctx, decl, NULL_TREE);
 
       if (DECL_HAS_VALUE_EXPR_P (decl))
 	{
 	  tree value = get_base_address (DECL_VALUE_EXPR (decl));
 
-	  if (value && DECL_P (value) && DECL_THREAD_LOCAL_P (value))
+	  if (value && DECL_P (value) && 
+	      (DECL_TLS_MODEL (value) >= TLS_MODEL_EMULATED))
 	    return omp_notice_threadprivate_variable (ctx, decl, value);
 	}
     }
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 160497)
+++ gcc/passes.c	(working copy)
@@ -152,6 +152,16 @@  rest_of_decl_compilation (tree decl,
 			  int top_level,
 			  int at_end)
 {
+  if (! targetm.have_tls 
+	&& !in_lto_p 
+	&& TREE_CODE (decl) == VAR_DECL
+	&& DECL_THREAD_LOCAL_P (decl))
+    {
+      /* Substitute the control var. for the user one.  */
+      rest_of_decl_compilation (emutls_decl (decl), top_level, at_end);
+      return;
+    }
+
   /* We deferred calling assemble_alias so that we could collect
      other attributes such as visibility.  Emit the alias now.  */
   {
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 160497)
+++ gcc/varasm.c	(working copy)
@@ -316,14 +316,18 @@  get_emutls_init_templ_addr (tree decl)
   SET_DECL_ASSEMBLER_NAME (to, DECL_NAME (to));
 
   DECL_ARTIFICIAL (to) = 1;
-  TREE_USED (to) = TREE_USED (decl);
   TREE_READONLY (to) = 1;
   DECL_IGNORED_P (to) = 1;
+/*
+  DECL_PRESERVE_P (to) = 1;*/
+  
   DECL_CONTEXT (to) = DECL_CONTEXT (decl);
   DECL_SECTION_NAME (to) = DECL_SECTION_NAME (decl);
-  DECL_PRESERVE_P (to) = DECL_PRESERVE_P (decl);
 
+  TREE_USED (to) = TREE_USED (decl);
+
   DECL_WEAK (to) = DECL_WEAK (decl);
+  
   if (DECL_ONE_ONLY (decl))
     {
       make_decl_one_only (to, DECL_ASSEMBLER_NAME (to));
@@ -336,9 +340,9 @@  get_emutls_init_templ_addr (tree decl)
 
   DECL_VISIBILITY_SPECIFIED (to) = DECL_VISIBILITY_SPECIFIED (decl);
   DECL_INITIAL (to) = DECL_INITIAL (decl);
-  DECL_INITIAL (decl) = NULL;
 
   varpool_finalize_decl (to);
+
   return build_fold_addr_expr (to);
 }
 
@@ -387,8 +391,7 @@  emutls_decl (tree decl)
       DECL_TLS_MODEL (to) = TLS_MODEL_EMULATED;
       DECL_ARTIFICIAL (to) = 1;
       DECL_IGNORED_P (to) = 1;
-      /* FIXME: work around PR44132.  */
-      DECL_PRESERVE_P (to) = 1;
+
       TREE_READONLY (to) = 0;
       SET_DECL_ASSEMBLER_NAME (to, DECL_NAME (to));
       if (DECL_ONE_ONLY (decl))
@@ -409,21 +412,46 @@  emutls_decl (tree decl)
   DECL_DLLIMPORT_P (to) = DECL_DLLIMPORT_P (decl);
   DECL_ATTRIBUTES (to) = targetm.merge_decl_attributes (decl, to);
 
+  TREE_USED (to) = TREE_USED (decl);
+
   TREE_STATIC (to) = TREE_STATIC (decl);
-  TREE_USED (to) = TREE_USED (decl);
-  TREE_PUBLIC (to) = TREE_PUBLIC (decl);
   DECL_EXTERNAL (to) = DECL_EXTERNAL (decl);
   DECL_COMMON (to) = DECL_COMMON (decl);
+
+  TREE_PUBLIC (to) = TREE_PUBLIC (decl);
   DECL_WEAK (to) = DECL_WEAK (decl);
   DECL_VISIBILITY (to) = DECL_VISIBILITY (decl);
   DECL_VISIBILITY_SPECIFIED (to) = DECL_VISIBILITY_SPECIFIED (decl);
+  DECL_PRESERVE_P (to) = DECL_PRESERVE_P (decl);
   
   /* Fortran might pass this to us.  */
   DECL_RESTRICTED_P (to) = DECL_RESTRICTED_P (decl);
+  
+  /* As soon as we see an initializer (and providing one is not already
+     present) we can setup the init. template.  */
+  if (!DECL_INITIAL (to) && 
+      DECL_INITIAL (decl) && (DECL_INITIAL (decl) != error_mark_node)) 
+    if (! DECL_EXTERNAL (to) && ! DECL_COMMON (to))
+      {
+	DECL_INITIAL (to) = targetm.emutls.var_init
+		(to, decl, get_emutls_init_templ_addr (decl));
 
+	/* Make sure the template is marked as needed early enough.
+	   Without this, if the variable is placed in a
+	   section-anchored block, the template will only be marked
+	   when it's too late.*/
+	record_references_in_initializer (to, false);
+	/* We leave this marked, so that any attempt at duplicate
+	   or re-initialization will give the appropriate error.  */
+	DECL_INITIAL (decl) = error_mark_node ;
+      }
+  /* Say we are not interested in emitting this Var.  */
+  TREE_ASM_WRITTEN (decl) = 1;
   return to;
 }
 
+/* Add static constructors for emutls vars, where required.  */
+
 static int
 emutls_common_1 (void **loc, void *xstmts)
 {
@@ -431,17 +459,21 @@  emutls_common_1 (void **loc, void *xstmts)
   tree args, x, *pstmts = (tree *) xstmts;
   tree word_type_node;
 
-  if (! DECL_COMMON (h->base.from)
-      || (DECL_INITIAL (h->base.from)
-	  && DECL_INITIAL (h->base.from) != error_mark_node))
+  /* ??? This is probably not an adequate test, however it is also
+     not enough to check DECL_COMMON because variables might be 
+     allocated in other unititialized sections.  */
+  if (DECL_INITIAL (h->to))
     return 1;
+  
+  /* Refuse to build a zero-sized item, first make sure there's
+     a valid layout.  */
+  if (DECL_SIZE (h->base.from) == 0)
+    layout_decl (h->base.from, 0);
+	
+  gcc_assert (DECL_SIZE (h->base.from) != 0);
 
   word_type_node = lang_hooks.types.type_for_mode (word_mode, 1);
 
-  /* The idea was to call get_emutls_init_templ_addr here, but if we
-     do this and there is an initializer, -fanchor_section loses,
-     because it would be too late to ensure the template is
-     output.  */
   x = null_pointer_node;
   args = tree_cons (NULL, x, NULL);
   x = build_int_cst (word_type_node, DECL_ALIGN_UNIT (h->base.from));
@@ -468,6 +500,9 @@  emutls_finalize_control_var (void **loc,
   if (h != NULL) 
     {
       struct varpool_node *node = varpool_node (h->to);
+      gcc_assert (! DECL_THREAD_LOCAL_P (h->to));
+      if (TREE_USED (h->base.from)) 
+	TREE_USED (h->to) = 1;
       /* Because varpool_finalize_decl () has side-effects,
          only apply to un-finalized vars.  */
       if (node && !node->finalized) 
@@ -791,6 +826,10 @@  asm_output_bss (FILE *file, tree decl ATTRIBUTE_UN
   gcc_assert (strcmp (XSTR (XEXP (DECL_RTL (decl), 0), 0), name) == 0);
   targetm.asm_out.globalize_decl_name (file, decl);
   switch_to_section (bss_section);
+  /* We don't emit the userland vars for emulated TLS, just the control.  */
+  if (! targetm.have_tls && TREE_CODE (decl) == VAR_DECL
+	&& DECL_THREAD_LOCAL_P (decl))
+    return ;
 #ifdef ASM_DECLARE_OBJECT_NAME
   last_assemble_variable_decl = decl;
   ASM_DECLARE_OBJECT_NAME (file, name, decl);
@@ -817,6 +856,10 @@  asm_output_aligned_bss (FILE *file, tree decl ATTR
 {
   switch_to_section (bss_section);
   ASM_OUTPUT_ALIGN (file, floor_log2 (align / BITS_PER_UNIT));
+  /* We don't emit the userland vars for emulated TLS, just the control.  */
+  if (! targetm.have_tls && TREE_CODE (decl) == VAR_DECL
+	&& DECL_THREAD_LOCAL_P (decl))
+    return ;
 #ifdef ASM_DECLARE_OBJECT_NAME
   last_assemble_variable_decl = decl;
   ASM_DECLARE_OBJECT_NAME (file, name, decl);
@@ -1232,7 +1278,12 @@  get_variable_section (tree decl, bool prefer_noswi
   if (IN_NAMED_SECTION (decl))
     return get_named_section (decl, NULL, reloc);
 
-  if (ADDR_SPACE_GENERIC_P (as)
+  /* This cannot be lcomm for an emulated TLS object, without
+     a register_common hook.  */
+  if (DECL_TLS_MODEL (decl) == TLS_MODEL_EMULATED
+      && !targetm.emutls.register_common)
+    ;
+  else if (ADDR_SPACE_GENERIC_P (as)
       && !DECL_THREAD_LOCAL_P (decl)
       && !(prefer_noswitch_p && targetm.have_switchable_bss_sections)
       && bss_initializer_p (decl))
@@ -2152,35 +2203,6 @@  assemble_variable (tree decl, int top_level ATTRIB
   rtx decl_rtl, symbol;
   section *sect;
 
-  if (! targetm.have_tls
-      && TREE_CODE (decl) == VAR_DECL
-      && DECL_THREAD_LOCAL_P (decl))
-    {
-      tree to = emutls_decl (decl);
-
-      /* If this variable is defined locally, then we need to initialize the
-         control structure with size and alignment information.  We do this
-	 at the last moment because tentative definitions can take a locally
-	 defined but uninitialized variable and initialize it later, which
-	 would result in incorrect contents.  */
-      if (! DECL_EXTERNAL (to)
-	  && (! DECL_COMMON (to)
-	      || (DECL_INITIAL (decl)
-		  && DECL_INITIAL (decl) != error_mark_node)))
-	{
-	  DECL_INITIAL (to) = targetm.emutls.var_init
-	    (to, decl, get_emutls_init_templ_addr (decl));
-
-	  /* Make sure the template is marked as needed early enough.
-	     Without this, if the variable is placed in a
-	     section-anchored block, the template will only be marked
-	     when it's too late.  */
-	  record_references_in_initializer (to, false);
-	}
-
-      decl = to;
-    }
-
   last_assemble_variable_decl = 0;
 
   /* Normally no need to say anything here for external references,
@@ -2196,6 +2218,14 @@  assemble_variable (tree decl, int top_level ATTRIB
   if (TREE_CODE (decl) == FUNCTION_DECL)
     return;
 
+  /* The first declaration of a variable that comes through this function
+     decides whether it is global (in C, has external linkage)
+     or local (in C, has internal linkage).  So do nothing more
+     if this function has already run.  */
+
+  if (TREE_ASM_WRITTEN (decl))
+    return;
+
   /* Do nothing for global register variables.  */
   if (DECL_RTL_SET_P (decl) && REG_P (DECL_RTL (decl)))
     {
@@ -2219,14 +2249,6 @@  assemble_variable (tree decl, int top_level ATTRIB
       return;
     }
 
-  /* The first declaration of a variable that comes through this function
-     decides whether it is global (in C, has external linkage)
-     or local (in C, has internal linkage).  So do nothing more
-     if this function has already run.  */
-
-  if (TREE_ASM_WRITTEN (decl))
-    return;
-
   /* Make sure targetm.encode_section_info is invoked before we set
      ASM_WRITTEN.  */
   decl_rtl = DECL_RTL (decl);
@@ -2237,6 +2259,13 @@  assemble_variable (tree decl, int top_level ATTRIB
   if (flag_syntax_only)
     return;
 
+  /* We don't emit the userland vars for emulated TLS - they should never
+     get to here, only the control vars should be emitted.  */
+  if (! targetm.have_tls
+	&& TREE_CODE (decl) == VAR_DECL
+	&& DECL_THREAD_LOCAL_P (decl))
+    return ;
+
   if (! dont_output_data
       && ! host_integerp (DECL_SIZE_UNIT (decl), 1))
     {
@@ -5705,18 +5734,16 @@  do_assemble_alias (tree decl, tree target)
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (DECL_ASSEMBLER_NAME (decl)) = 1;
 
+  if (!targetm.have_tls 
+	&& (TREE_CODE (decl) == VAR_DECL)
+	&& DECL_THREAD_LOCAL_P (decl))
+	/* We should never get here.  */
+    return ;
+
   if (lookup_attribute ("weakref", DECL_ATTRIBUTES (decl)))
     {
       ultimate_transparent_alias_target (&target);
 
-      if (!targetm.have_tls
-	  && TREE_CODE (decl) == VAR_DECL
-	  && DECL_THREAD_LOCAL_P (decl))
-	{
-	  decl = emutls_decl (decl);
-	  target = get_emutls_object_name (target);
-	}
-
       if (!TREE_SYMBOL_REFERENCED (target))
 	weakref_targets = tree_cons (decl, target, weakref_targets);
 
@@ -5735,14 +5762,6 @@  do_assemble_alias (tree decl, tree target)
       return;
     }
 
-  if (!targetm.have_tls
-      && TREE_CODE (decl) == VAR_DECL
-      && DECL_THREAD_LOCAL_P (decl))
-    {
-      decl = emutls_decl (decl);
-      target = get_emutls_object_name (target);
-    }
-
 #ifdef ASM_OUTPUT_DEF
   /* Make name accessible from other files, if appropriate.  */
 
@@ -5824,6 +5843,15 @@  remove_unreachable_alias_pairs (void)
     }
 }
 
+/* Lookup the decl for a symbol, either in the varpool or cgraph.  */
+static tree
+var_decl_for_asm (tree symbol)
+{
+  struct varpool_node *vnode = varpool_node_for_asm  (symbol);
+  if (vnode) 
+    return vnode->decl;
+  return NULL;
+}
 
 /* First pass of completing pending aliases.  Make sure that cgraph knows
    which symbols will be required.  */
@@ -5836,8 +5864,55 @@  finish_aliases_1 (void)
 
   for (i = 0; VEC_iterate (alias_pair, alias_pairs, i, p); i++)
     {
-      tree target_decl;
+      tree target_decl=NULL;
 
+      /* When emulated TLS is in effect, redirect aliases so that they 
+         are registered between the control vars.  */
+      if (!targetm.have_tls 
+          && TREE_CODE (p->decl) == VAR_DECL
+          && (DECL_TLS_MODEL (p->decl) >= TLS_MODEL_EMULATED))
+	{
+	  tree tsym = p->target ;
+	  target_decl = var_decl_for_asm (tsym) ;
+	  if (!target_decl) 
+	    {
+	      /* If we didn't find the user's symbol, it could
+	         be because the alias really refers to a control 
+	         var.  */
+	      tsym = get_emutls_object_name (p->target);
+	      target_decl = var_decl_for_asm (tsym);
+	    }
+	  if (target_decl) 
+	    {
+	      struct varpool_node *vnode;
+	      /* If it hasn't been done already, substitute the control
+	         var for the original.  */
+	      if (DECL_THREAD_LOCAL_P (p->decl))
+		p->decl = emutls_decl (p->decl);
+	      /* If not TLS target, we've made a mistake.  */
+	      if (DECL_TLS_MODEL (target_decl) < TLS_MODEL_EMULATED)
+		error ("TLS symbol %q+D aliased to non-TLS symbol %qE",
+			p->decl, p->target);
+	      /* If it's the original we need to substitute the contol.  */
+	      else if (DECL_THREAD_LOCAL_P (target_decl))
+		{
+		  target_decl = emutls_decl (target_decl);
+		  tsym = get_emutls_object_name (p->target);
+		}
+	      /* else it's already the emulation control.  */
+	      /* Mark the var needed.  */
+	      vnode = varpool_node (target_decl);
+	      if (vnode) 
+	        {
+		  varpool_mark_needed_node (vnode);
+		  vnode->force_output = 1;
+	        }
+	      p->target = tsym;
+	    }
+	  /* Else we didn't find a decl for the symbol, which is an error
+	     unless there's a weak ref.  */
+	} 
+      else
       target_decl = find_decl_and_mark_needed (p->decl, p->target);
       if (target_decl == NULL)
 	{
Index: gcc/varpool.c
===================================================================
--- gcc/varpool.c	(revision 160497)
+++ gcc/varpool.c	(working copy)
@@ -312,6 +312,14 @@  varpool_mark_needed_node (struct varpool_node *nod
       && !TREE_ASM_WRITTEN (node->decl))
     varpool_enqueue_needed_node (node);
   node->needed = 1;
+  /* If we need the var, and it's an emulated TLS entity, that
+     means we need the control var.  */
+  if (!targetm.have_tls && DECL_THREAD_LOCAL_P (node->decl))
+    {
+      struct varpool_node *cv_node;
+      cv_node = varpool_node (emutls_decl (node->decl)) ;
+      varpool_mark_needed_node (cv_node);
+    }
 }
 
 /* Reset the queue of needed nodes.  */
@@ -346,17 +354,6 @@  decide_is_variable_needed (struct varpool_node *no
       && !DECL_EXTERNAL (decl))
     return true;
 
-  /* When emulating tls, we actually see references to the control
-     variable, rather than the user-level variable.  */
-  if (!targetm.have_tls
-      && TREE_CODE (decl) == VAR_DECL
-      && DECL_THREAD_LOCAL_P (decl))
-    {
-      tree control = emutls_decl (decl);
-      if (decide_is_variable_needed (varpool_node (control), control))
-	return true;
-    }
-
   /* When not reordering top level variables, we have to assume that
      we are going to keep everything.  */
   if (flag_toplevel_reorder)
@@ -381,14 +378,19 @@  varpool_finalize_decl (tree decl)
      or local (in C, has internal linkage).  So do nothing more
      if this function has already run.  */
   if (node->finalized)
+      return;
+
+  /* For emulated TLS vars, if we are in a position to finalize the userland
+     var, then we should be able to finalize the control var too.  */
+  if (!targetm.have_tls && decl 
+      && TREE_CODE (decl) == VAR_DECL
+      && DECL_THREAD_LOCAL_P (decl))
     {
-      if (cgraph_global_info_ready)
-	varpool_assemble_pending_decls ();
+      varpool_finalize_decl (emutls_decl (decl)) ;
+      node->finalized = true;    
       return;
     }
-  if (node->needed)
-    varpool_enqueue_needed_node (node);
-  node->finalized = true;
+
   if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl))
     node->force_output = true;
 
@@ -399,8 +401,11 @@  varpool_finalize_decl (tree decl)
      there.  */
   else if (TREE_PUBLIC (decl) && !DECL_COMDAT (decl) && !DECL_EXTERNAL (decl))
     varpool_mark_needed_node (node);
-  if (cgraph_global_info_ready)
-    varpool_assemble_pending_decls ();
+
+  if (node->needed)
+    varpool_enqueue_needed_node (node);
+
+  node->finalized = true;
 }
 
 /* Return variable availability.  See cgraph.h for description of individual
@@ -449,7 +454,7 @@  varpool_analyze_pending_decls (void)
 	     already informed about increased alignment.  */
           align_variable (decl, 0);
 	}
-      if (DECL_INITIAL (decl))
+      if (DECL_INITIAL (decl) && (DECL_INITIAL (decl) != error_mark_node))
 	record_references_in_initializer (decl, analyzed);
       if (node->same_comdat_group)
 	{