Patchwork [updated] Make emulated TLS lto-friendly.

login
register
mail settings
Submitter IainS
Date July 3, 2010, 2:10 p.m.
Message ID <ED394B6D-943B-4EC5-8ABF-B540166AF6D8@sandoe-acoustics.co.uk>
Download mbox | patch
Permalink /patch/57808/
State New
Headers show

Comments

IainS - July 3, 2010, 2:10 p.m.
This is an update of a patch originally posted and discussed in this  
thread:
http://gcc.gnu.org/ml/gcc-patches/2010-06/threads.html#00995

I guess it needs a global approver - so cc-ing Diego who, IIRC,  
approved the previous EMUTLS changes.

changes from the previous thread.
(a) emutls.c is no longer modified in any way.
(b) I have taken on board the comments about not rearranging code.
(c) the cases where we should not have emutls variables are now asserts.

tested on {i686,powerpc*}-apple-darwin{8,9},  x86_64-apple-darwin10 -  
EMUTLS target without aliases.
    {i686,x86_64}unknown-linux-gnu - native TLS targets (i.e. for a  
null response)
   cris-elf(sim)  - EMUTLS target with aliases
  + armel-linux-eabi(compile-only), mipsabi64-elf(sim), s390x(sompile  
only).

The test-suite additions are motivated by:
(a) the fact that initialization of the emults vars was a weakness  
w.r.t. optimization,
(b) to include the test that is used as the configury check for TLS  
compliance.


OK for trunk now? (and eventually 4.5)?

Iain

gcc/Changelog (email addresses omitted)
	
	Iain Sandoe
	Jan Hubika

	PR target/44132
	* expr.c (emutls_var_address): Remove.
	(expand_expr_addr_expr_1): Remove TLS emulation hook.
	(expand_expr_real_1): Ditto.

	* gimplify.c (emutls_var_address): Add proc.
	(gimplify_decl_expr): expand TLS vars.
	(gimplify_var_or_parm_decl): Ditto.
	(omp_notice_variable): Recognize TLS_MODEL_EMULATED.

	* passes.c (rest_of_decl_compilation): Substitute TLS control vars  
for the master.

	* varasm.c (get_emutls_init_templ_addr): Do not reset the caller's  
DECL_INITIAL.
	(emutls_decl): copy DECL_PRESERVE_P instead of setting it, copy
	TREE_ADDRESSABLE.  Create the init template as soon as we see a valid  
initializer.
	 Mark the original var with error_mark_node so that re- 
initializations will be diagnosed
	properly.  Mark the original var as not to be output.
	(emutls_common_1): Do not rely on DECL_COMMON as a decision on  
whether a constructor
	is needed, it is also required for lcomm.   Ensure that the var is  
laid out before the constructor
	is built.
	(emutls_finalize_control_var): Assert on incorrect type, copy USED.
	(asm_output_bss): Assert on DECL_THREAD_LOCAL_P for EMUTLS targets.
	(asm_output_aligned_bss): Likewise.
	(assemble_variable): Remove TLS var. substitution.  Back up  
TREE_ASM_WRITTEN so that
	assemble_variable() has no side-effects for unwritten vars.  Assert  
on DECL_THREAD_LOCAL_P
	for EMUTLS targets.
	(do_assemble_alias): Assert on DECL_THREAD_LOCAL_P for EMUTLS targets.
	(var_decl_for_asm): New.
	(finish_aliases_1): Walk alias pairs substituting tls controls for  
originals.

	* passes.c (rest_of_decl_compilation): Substitute TLS control vars  
for the master.

	* varpool.c (varpool_mark_needed_node): Do not handle TLS  
substitution here.
	(decide_is_variable_needed): Or here.
	(varpool_finalize_decl): Handle TLS substitution.
	Remove early calls to varpool_assemble_pending_decls().
	Check enqueuing of vars after all tests for need are complete.
	(varpool_analyze_pending_decls): Do not record references if the  
initializer is error_mark.

testsuite:
	
	PR target/44132
	* gcc.dg/tls/thr-init-1.c: New.
	* gcc.dg/tls/thr-init-2.c: New.
	* gcc.dg/torture/tls New.
	* gcc.dg/torture/tls/tls-test.c: New.
	* gcc.dg/torture/tls/thr-init-1.c: New.
	* gcc.dg/torture/tls/tls.exp: New.
	* gcc.dg/torture/tls/thr-init-2.c: New.


Index: gcc/testsuite/gcc.dg/tls/thr-init-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tls/thr-init-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tls/thr-init-1.c	(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-require-effective-target tls } */
+/* { dg-do compile } */
+
+static __thread int fstat ;
+static __thread int fstat = 1 ;
+static __thread int fstat ;
+static __thread int fstat = 2; /* { dg-error "redefinition of 'fstat'" } */
+				/* { dg-message "note: previous definition of 'fstat' was here" "" { target *-*-* } 5 } */
Index: gcc/testsuite/gcc.dg/tls/thr-init-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tls/thr-init-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tls/thr-init-2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-require-effective-target tls } */
+/* { dg-do run } */
+
+extern void abort() ;
+
+static __thread int fstat ;
+static __thread int fstat = 1;
+
+int test_code(int b)
+{
+  fstat += b ;
+  return fstat;
+}
+
+int main (int ac, char *av[])
+{
+  int a = test_code(1);
+  
+  if ((a != 2) || (fstat != 2))
+    abort () ;
+  
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/tls/tls-test.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/tls/tls-test.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/tls/tls-test.c	(revision 0)
@@ -0,0 +1,51 @@
+/* { dg-do run }  */
+/* { dg-require-effective-target tls  }  */
+/* { dg-require-effective-target pthread } */
+
+#include <pthread.h>
+extern int printf (char *,...);
+__thread int a = 5; 
+int *volatile a_in_other_thread = (int *) 12345;
+
+static void *
+thread_func (void *arg)
+{
+  a_in_other_thread = &a;
+  a+=5;
+  *((int *) arg) = a;
+  return (void *)0;
+}
+
+int
+main ()
+{
+  pthread_t thread;
+  void *thread_retval;
+  int *volatile a_in_main_thread;
+  int *volatile again ;
+  int thr_a;
+
+  a_in_main_thread = &a;
+
+  if (pthread_create (&thread, (pthread_attr_t *)0, thread_func, &thr_a))
+    return 0;
+
+  if (pthread_join (thread, &thread_retval))
+    return 0;
+
+  again = &a;
+  if (again != a_in_main_thread)
+    {
+      printf ("FAIL: main thread addy changed from 0x%0x to 0x%0x\n", 
+		a_in_other_thread, again);
+      return 1;
+    }
+
+  if (a != 5 || thr_a != 10 || (a_in_other_thread == a_in_main_thread))
+    {
+      printf ("FAIL: a= %d, thr_a = %d Addr = 0x%0x\n", 
+		a, thr_a, a_in_other_thread);
+      return 1;
+    }
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/tls/thr-init-1.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/tls/thr-init-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/tls/thr-init-1.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls } */
+
+extern int printf (char *,...);
+extern void abort() ;
+
+int test_code(int b)
+{
+static __thread int fstat = 1;
+  fstat += b ;
+  return fstat;
+}
+
+int main (int ac, char *av[])
+{
+  int a = test_code(1);
+  
+  if ( a != 2 )
+    {
+      printf ("a=%d\n", a) ;
+      abort ();
+    }
+  
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/tls/tls.exp
===================================================================
--- gcc/testsuite/gcc.dg/torture/tls/tls.exp	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/tls/tls.exp	(revision 0)
@@ -0,0 +1,36 @@
+#   Copyright (C) 2010 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
+        $DEFAULT_CFLAGS
+
+# All done.
+dg-finish
Index: gcc/testsuite/gcc.dg/torture/tls/thr-init-2.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/tls/thr-init-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/tls/thr-init-2.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls } */
+
+extern int printf (char *,...);
+extern void abort() ;
+
+static __thread int fstat ;
+static __thread int fstat = 1;
+static __thread int fstat ;
+
+int test_code(int b)
+{
+  fstat += b ;
+  return fstat;
+}
+
+int main (int ac, char *av[])
+{
+  int a = test_code(1);
+  
+  if ( a != 2 || fstat != 2 )
+    {
+    printf ("a=%d fstat=%d\n", a, fstat) ;
+    abort ();
+    }
+  
+  return 0;
+}
Richard Henderson - July 7, 2010, 11:22 p.m.
On 07/03/2010 07:10 AM, IainS wrote:
> +      return GS_OK ;

s/ ;/;/g

There are at least 2 occurrences.

> +      if ((DECL_TLS_MODEL (decl) >= TLS_MODEL_EMULATED))

Extra ().

Possibly better written as != TLS_MODEL_NONE as well.

> +	  if (value && DECL_P (value) && 
> +	      (DECL_TLS_MODEL (value) >= TLS_MODEL_EMULATED))

> +  if (!DECL_INITIAL (to) && 
> +      DECL_INITIAL (decl) && (DECL_INITIAL (decl) != error_mark_node)) 
> +    if (! DECL_EXTERNAL (to) && ! DECL_COMMON (to))

Coding style -- && should be on the new line.
Extra ().
Merge the unnecessarily nested ifs.

> +  /* ??? It is not enough to check DECL_COMMON because variables might be 
> +     allocated in other uninitialized sections.  However, this might still
> +     not be an adequate test.  */
> +  if (DECL_INITIAL (h->to))

Huh?  Why is DECL_COMMON not sufficient?

> +  /* Refuse to build a zero-sized item, first make sure there's
> +     a valid layout.  */
> +  if (DECL_SIZE (h->base.from) == 0)
> +    layout_decl (h->base.from, 0);

Really?  What is a testcase for which the decl is not layed out?
That really sounds like a bug elsewhere.

> +  gcc_assert (DECL_SIZE (h->base.from) != 0);

I'm pretty sure you can declare zero sized objects as a gcc extension.

$ cat z.c
__thread int x[0] __attribute__((common));
$ cat z.s
	.file	"z.c"
	.tls_common	x,0,4
	.ident	"GCC: (GNU) 4.4.4 20100630 (Red Hat 4.4.4-10)"
	.section	.note.GNU-stack,"",@progbits

So here's a test case that will trigger that abort.

> +  /* We don't emit the userland vars for emulated TLS - they should never
> +     get to here, only the control vars should be emitted.  */
> +  gcc_assert (targetm.have_tls  
> +	      || (TREE_CODE (decl) != VAR_DECL)
> +	      || !DECL_THREAD_LOCAL_P (decl));

Extra ().  I'm sure there are more, but I won't comment again.
This pattern occurs many places.  Pull this out to a function.

> +  /* If we need the var, and it's an emulated TLS entity, that
> +     means we need the control var.  */
> +  if (!targetm.have_tls && DECL_THREAD_LOCAL_P (node->decl))
> +    {
> +      struct varpool_node *cv_node;
> +      cv_node = varpool_node (emutls_decl (node->decl)) ;
> +      varpool_mark_needed_node (cv_node);
> +    }

Do you really need this kind of thing anymore, since you're exposing
the use of the control variable so early?  I would have thought that
varpool.c would no longer need any special-casing for !have_tls.

Finally, as a follow-up, I think the alias oracle needs to be taught what 
__builtin_emutls_get_address does.  Namely, return the address of the
associated TLS variable.  This is the only way I can think of to avoid
the optimization penalty that you're introducing because of exposing the
control variable and the function call at this point.


r~
IainS - July 8, 2010, 8:42 a.m.
Hello Richard,
Thanks for the review and sorry about the typographical nits...
I'll re-work the other things and investigate the alias oracle.

one question tho:

On 8 Jul 2010, at 00:22, Richard Henderson wrote:

>> +  /* ??? It is not enough to check DECL_COMMON because variables  
>> might be
>> +     allocated in other uninitialized sections.  However, this  
>> might still
>> +     not be an adequate test.  */
>> +  if (DECL_INITIAL (h->to))
>
> Huh?  Why is DECL_COMMON not sufficient?

int foo (void)
{
   static __thread int a;
   return a;
}

places __emutls_v.a.1700  in .lcomm and therefore needs an  
initializer, but the decl is not marked DECL_COMMON.
do you believe that indicates a bug elsewhere, or is it a reasonable  
explanation?

thanks
Iain
Richard Guenther - July 8, 2010, 9:25 a.m.
On Thu, Jul 8, 2010 at 1:22 AM, Richard Henderson <rth@redhat.com> wrote:

> Finally, as a follow-up, I think the alias oracle needs to be taught what
> __builtin_emutls_get_address does.  Namely, return the address of the
> associated TLS variable.  This is the only way I can think of to avoid
> the optimization penalty that you're introducing because of exposing the
> control variable and the function call at this point.

While teaching the oracle about __builtin_emutls_get_address will
help disambiguating against emutls accesses it will not be enough
to trigger CSE.  So it would be nice to have a testcase that shows
we can properly CSE emutls accesses in FRE (in theory it shuld
work as __builtin_emutls_get_address is const).

Richard.

>
>
> r~
>
Jakub Jelinek - July 8, 2010, 9:34 a.m.
On Thu, Jul 08, 2010 at 09:42:16AM +0100, IainS wrote:
> >Huh?  Why is DECL_COMMON not sufficient?
> 
> int foo (void)
> {
>   static __thread int a;
>   return a;
> }

This testcase should be optimized into just
int foo (void)
{
  return 0;
}
with no TLS vars at all, at least it is with non-emulated TLS.
Please make sure it doesn't regress for emulated TLS with your patch.

	Jakub
IainS - July 8, 2010, noon
On 8 Jul 2010, at 10:34, Jakub Jelinek wrote:

> On Thu, Jul 08, 2010 at 09:42:16AM +0100, IainS wrote:
>>> Huh?  Why is DECL_COMMON not sufficient?
>>
>> int foo (void)
>> {
>>  static __thread int a;
>>  return a;
>> }
>
> This testcase should be optimized into just
> int foo (void)
> {
>  return 0;
> }
> with no TLS vars at all, at least it is with non-emulated TLS.
> Please make sure it doesn't regress for emulated TLS with your patch.

Well, this wasn't the purpose of the example, O0 is sufficient for that.

However, it *does* regress on this test, so I've got more work to do  
there.
Thanks for catching that.

Iain.
Richard Henderson - July 8, 2010, 4:14 p.m.
On 07/08/2010 01:42 AM, IainS wrote:
>> Huh?  Why is DECL_COMMON not sufficient?
> 
> int foo (void)
> {
>   static __thread int a;
>   return a;
> }
> 
> places __emutls_v.a.1700  in .lcomm and therefore needs an initializer,
> but the decl is not marked DECL_COMMON.
> do you believe that indicates a bug elsewhere, or is it a reasonable
> explanation?

The purpose of __emutls_register_common is to merge COMMON block sizes,
as would normally be done in the linker.  The variable A is not a COMMON
variable (as you note by DECL_COMMON not being set).  No initialization
is needed.

You've simply been confused about .lcomm for zero-initialization of the
control variable and true COMMON block variables.


r~
IainS - July 8, 2010, 4:20 p.m.
On 8 Jul 2010, at 17:14, Richard Henderson wrote:

> On 07/08/2010 01:42 AM, IainS wrote:
>>> Huh?  Why is DECL_COMMON not sufficient?
>>
>> int foo (void)
>> {
>>  static __thread int a;
>>  return a;
>> }
>>
>> places __emutls_v.a.1700  in .lcomm and therefore needs an  
>> initializer,
>> but the decl is not marked DECL_COMMON.
>> do you believe that indicates a bug elsewhere, or is it a reasonable
>> explanation?
>
> The purpose of __emutls_register_common is to merge COMMON block  
> sizes,
> as would normally be done in the linker.  The variable A is not a  
> COMMON
> variable (as you note by DECL_COMMON not being set).  No  
> initialization
> is needed.
>
> You've simply been confused about .lcomm for zero-initialization of  
> the
> control variable and true COMMON block variables.

I'm sure it's confusion on my part - but :

__emutls_v.a
has size and align fields which cannot be 0.

so, either it should not be placed into .lcomm (I infer this is  
probably the case from your response), or these fields must be  
initialized by the emutls_register_common hook.

Iain
Richard Henderson - July 8, 2010, 5:04 p.m.
On 07/08/2010 09:20 AM, IainS wrote:
> I'm sure it's confusion on my part - but :
> 
> __emutls_v.a
> has size and align fields which cannot be 0.
> 
> so, either it should not be placed into .lcomm (I infer this is probably
> the case from your response), or these fields must be initialized by the
> emutls_register_common hook.

Err, that's true.  In this case I'd expect __emutls_v.a to *not*
reside in .lcomm and not be initialized by the runtime.  Certainly
that's what happens with current HEAD.


r~
Richard Henderson - July 8, 2010, 5:10 p.m.
On 07/08/2010 05:00 AM, IainS wrote:
> On 8 Jul 2010, at 10:34, Jakub Jelinek wrote:
>> with no TLS vars at all, at least it is with non-emulated TLS.
>> Please make sure it doesn't regress for emulated TLS with your patch.
> 
> Well, this wasn't the purpose of the example, O0 is sufficient for that.
> 
> However, it *does* regress on this test, so I've got more work to do there.
> Thanks for catching that.

I very much doubt you'll be able to fix this (minor) regression.

The only reasonable way to prevent that is to *not* lower the TLS
variable so early, so that the optimizers see DECLs where they 
expect to see DECLs.  And that goal appears to be at odds with
making emulated tls LTO friendly.


r~
Jan Hubicka - July 8, 2010, 5:20 p.m.
> On 07/08/2010 05:00 AM, IainS wrote:
> > On 8 Jul 2010, at 10:34, Jakub Jelinek wrote:
> >> with no TLS vars at all, at least it is with non-emulated TLS.
> >> Please make sure it doesn't regress for emulated TLS with your patch.
> > 
> > Well, this wasn't the purpose of the example, O0 is sufficient for that.
> > 
> > However, it *does* regress on this test, so I've got more work to do there.
> > Thanks for catching that.
> 
> I very much doubt you'll be able to fix this (minor) regression.
> 
> The only reasonable way to prevent that is to *not* lower the TLS
> variable so early, so that the optimizers see DECLs where they 
> expect to see DECLs.  And that goal appears to be at odds with
> making emulated tls LTO friendly.

There is option to have emultls lowering pass somewhere at the end of early
optimization queue.  I am not however sure it is worth a special purpose pass.
Not lowering TLS is also leading to code quality problems, for instance inliner
will not take into account htat functions using TLS are rather expensive etc.

Honza
> 
> 
> r~
Richard Henderson - July 8, 2010, 5:33 p.m.
On 07/08/2010 10:20 AM, Jan Hubicka wrote:
> There is option to have emultls lowering pass somewhere at the end of early
> optimization queue.  I am not however sure it is worth a special purpose pass.

I'm not sure it's worth it either.  I can't think of any interesting early
optimizations that doesn't work equally well with the const function call.

> Not lowering TLS is also leading to code quality problems, for instance inliner
> will not take into account htat functions using TLS are rather expensive etc.

Well, *that* should simply be a matter of adjusting the costs.  Because
we'll never be able to lower real TLS -- the code sequences required for
linker optimization are extremely specific, padding prefixes and all.


r~
IainS - July 8, 2010, 7:07 p.m.
On 8 Jul 2010, at 00:22, Richard Henderson wrote:

>
>> +  /* If we need the var, and it's an emulated TLS entity, that
>> +     means we need the control var.  */
>> +  if (!targetm.have_tls && DECL_THREAD_LOCAL_P (node->decl))
>> +    {
>> +      struct varpool_node *cv_node;
>> +      cv_node = varpool_node (emutls_decl (node->decl)) ;
>> +      varpool_mark_needed_node (cv_node);
>> +    }
>
> Do you really need this kind of thing anymore, since you're exposing
> the use of the control variable so early?  I would have thought that
> varpool.c would no longer need any special-casing for !have_tls.

this is my understanding (which might be flawed &| incomplete).

Whilst we are on the parse side - and building varpool & cgraph, the  
relationship is not fully exposed (that happens when gimplication is  
done).

So my reasoning was that the "ghost/proxy" vars should be made to  
track the user-land ones until then.

It would be possible to put all the checks for 'used-ness' and  
finalization in emutls_finish() but it seemed more elegant to do so in  
the places I put it.

I'm happy either way ..

Iain
Richard Henderson - July 8, 2010, 7:19 p.m.
On 07/08/2010 12:07 PM, IainS wrote:
> 
> On 8 Jul 2010, at 00:22, Richard Henderson wrote:
>> Do you really need this kind of thing anymore, since you're exposing
>> the use of the control variable so early?  I would have thought that
>> varpool.c would no longer need any special-casing for !have_tls.
> 
> this is my understanding (which might be flawed &| incomplete).
> 
> Whilst we are on the parse side - and building varpool & cgraph, the
> relationship is not fully exposed (that happens when gimplication is done).
> 
> So my reasoning was that the "ghost/proxy" vars should be made to track
> the user-land ones until then.

Hmm.  So what you're saying is that there's extra cleanup work that
needs to happen while lowering the representation.  It's not merely
a matter of code substitution during gimplification.

In which case I wonder if it wouldn't be better to do as Honza
suggested and separate all of this out into a new pass_lower_emutls.
Perhaps to be placed just after pass_lower_vector.  That placement
is before pass_build_cgraph_edges, which I believe means you would
not have to fix up the cgraph edges for the new function calls.  All
you'd need to do is transform the tls variable references and fix up
the varpool references.

Honza, thoughts here?


r~

Patch

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 161773)
+++ gcc/expr.c	(working copy)
@@ -6862,21 +6862,7 @@  highest_pow2_factor_for_target (const_tree target,
 
   return MAX (factor, talign);
 }
-
-/* Return &VAR expression for emulated thread local VAR.  */
 
-static tree
-emutls_var_address (tree var)
-{
-  tree emuvar = emutls_decl (var);
-  tree fn = built_in_decls [BUILT_IN_EMUTLS_GET_ADDRESS];
-  tree arg = build_fold_addr_expr_with_type (emuvar, ptr_type_node);
-  tree arglist = build_tree_list (NULL_TREE, arg);
-  tree call = build_function_call_expr (UNKNOWN_LOCATION, fn, arglist);
-  return fold_convert (build_pointer_type (TREE_TYPE (var)), call);
-}
-
-
 /* Subroutine of expand_expr.  Expand the two operands of a binary
    expression EXP0 and EXP1 placing the results in OP0 and OP1.
    The value may be stored in TARGET if TARGET is nonzero.  The
@@ -6980,17 +6966,6 @@  expand_expr_addr_expr_1 (tree exp, rtx target, enu
       break;
 
     case VAR_DECL:
-      /* TLS emulation hook - replace __thread VAR's &VAR with
-	 __emutls_get_address (&_emutls.VAR).  */
-      if (! targetm.have_tls
-	  && TREE_CODE (exp) == VAR_DECL
-	  && DECL_THREAD_LOCAL_P (exp))
-	{
-	  exp = emutls_var_address (exp);
-	  return expand_expr (exp, target, tmode, modifier);
-	}
-      /* Fall through.  */
-
     default:
       /* If the object is a DECL, then expand it for its rtl.  Don't bypass
 	 expand_expr, as that can have various side effects; LABEL_DECLs for
@@ -8428,16 +8403,6 @@  expand_expr_real_1 (tree exp, rtx target, enum mac
 	  && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
 	layout_decl (exp, 0);
 
-      /* TLS emulation hook - replace __thread vars with
-	 *__emutls_get_address (&_emutls.var).  */
-      if (! targetm.have_tls
-	  && TREE_CODE (exp) == VAR_DECL
-	  && DECL_THREAD_LOCAL_P (exp))
-	{
-	  exp = build_fold_indirect_ref_loc (loc, emutls_var_address (exp));
-	  return expand_expr_real_1 (exp, target, tmode, modifier, NULL);
-	}
-
       /* ... fall through ...  */
 
     case FUNCTION_DECL:
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 161773)
+++ gcc/gimplify.c	(working copy)
@@ -1341,7 +1341,19 @@  gimplify_vla_decl (tree decl, gimple_seq *seq_p)
   gimplify_ctxp->save_stack = true;
 }
 
+/* Return &VAR expression for emulated thread local VAR.  */
 
+static tree
+emutls_var_address (tree var)
+{
+  tree emuvar = emutls_decl (var);
+  tree fn = built_in_decls [BUILT_IN_EMUTLS_GET_ADDRESS];
+  tree arg = build_fold_addr_expr_with_type (emuvar, ptr_type_node);
+  tree arglist = build_tree_list (NULL_TREE, arg);
+  tree call = build_function_call_expr (UNKNOWN_LOCATION, fn, arglist);
+  return fold_convert (build_pointer_type (TREE_TYPE (var)), call);
+}
+
 /* Gimplifies a DECL_EXPR node *STMT_P by making any necessary allocation
    and initialization explicit.  */
 
@@ -1356,6 +1368,18 @@  gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_
   if (TREE_TYPE (decl) == error_mark_node)
     return GS_ERROR;
 
+  /* TLS emulation hook - replace __thread VAR's &VAR with
+     __emutls_get_address (&_emutls.VAR). We then ignore the original
+     var.  */
+  if (! targetm.have_tls
+      && TREE_CODE (decl) == VAR_DECL
+      && DECL_THREAD_LOCAL_P (decl))
+    {
+      stmt = build_fold_indirect_ref (emutls_var_address (decl));
+      gimplify_and_add (stmt, seq_p);
+      return GS_ALL_DONE;
+    }
+
   if ((TREE_CODE (decl) == TYPE_DECL
        || TREE_CODE (decl) == VAR_DECL)
       && !TYPE_SIZES_GIMPLIFIED (TREE_TYPE (decl)))
@@ -1875,6 +1899,17 @@  gimplify_var_or_parm_decl (tree *expr_p)
       return GS_ERROR;
     }
 
+  /* TLS emulation hook - replace __thread VAR's &VAR with
+     __emutls_get_address (&_emutls.VAR).  */
+  if (! targetm.have_tls
+      && TREE_CODE (decl) == VAR_DECL
+      && DECL_THREAD_LOCAL_P (decl))
+    {
+      gcc_assert (!DECL_HAS_VALUE_EXPR_P (decl)) ;
+      *expr_p = build_fold_indirect_ref (emutls_var_address (decl));
+      return GS_OK ;
+    }
+
   /* When within an OpenMP context, notice uses of variables.  */
   if (gimplify_omp_ctxp && omp_notice_variable (gimplify_omp_ctxp, decl, true))
     return GS_ALL_DONE;
@@ -5555,14 +5590,15 @@  omp_notice_variable (struct gimplify_omp_ctx *ctx,
   /* Threadprivate variables are predetermined.  */
   if (is_global_var (decl))
     {
-      if (DECL_THREAD_LOCAL_P (decl))
+      if ((DECL_TLS_MODEL (decl) >= TLS_MODEL_EMULATED))
 	return omp_notice_threadprivate_variable (ctx, decl, NULL_TREE);
 
       if (DECL_HAS_VALUE_EXPR_P (decl))
 	{
 	  tree value = get_base_address (DECL_VALUE_EXPR (decl));
 
-	  if (value && DECL_P (value) && DECL_THREAD_LOCAL_P (value))
+	  if (value && DECL_P (value) && 
+	      (DECL_TLS_MODEL (value) >= TLS_MODEL_EMULATED))
 	    return omp_notice_threadprivate_variable (ctx, decl, value);
 	}
     }
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 161773)
+++ gcc/varasm.c	(working copy)
@@ -319,9 +319,11 @@  get_emutls_init_templ_addr (tree decl)
   TREE_USED (to) = TREE_USED (decl);
   TREE_READONLY (to) = 1;
   DECL_IGNORED_P (to) = 1;
+
+  DECL_PRESERVE_P (to) = 1;
+  
   DECL_CONTEXT (to) = DECL_CONTEXT (decl);
   DECL_SECTION_NAME (to) = DECL_SECTION_NAME (decl);
-  DECL_PRESERVE_P (to) = DECL_PRESERVE_P (decl);
 
   DECL_WEAK (to) = DECL_WEAK (decl);
   if (DECL_ONE_ONLY (decl))
@@ -336,7 +338,6 @@  get_emutls_init_templ_addr (tree decl)
 
   DECL_VISIBILITY_SPECIFIED (to) = DECL_VISIBILITY_SPECIFIED (decl);
   DECL_INITIAL (to) = DECL_INITIAL (decl);
-  DECL_INITIAL (decl) = NULL;
 
   varpool_finalize_decl (to);
   return build_fold_addr_expr (to);
@@ -361,6 +362,8 @@  emutls_decl (tree decl)
   if (! emutls_htab)
     emutls_htab = htab_create_ggc (512, tree_map_hash, tree_map_eq, 0);
 
+  gcc_assert (DECL_NAME (decl)) ;
+  
   name = DECL_ASSEMBLER_NAME (decl);
 
   /* Note that we use the hash of the decl's name, rather than a hash
@@ -387,8 +390,7 @@  emutls_decl (tree decl)
       DECL_TLS_MODEL (to) = TLS_MODEL_EMULATED;
       DECL_ARTIFICIAL (to) = 1;
       DECL_IGNORED_P (to) = 1;
-      /* FIXME: work around PR44132.  */
-      DECL_PRESERVE_P (to) = 1;
+
       TREE_READONLY (to) = 0;
       SET_DECL_ASSEMBLER_NAME (to, DECL_NAME (to));
       if (DECL_ONE_ONLY (decl))
@@ -412,18 +414,42 @@  emutls_decl (tree decl)
   TREE_STATIC (to) = TREE_STATIC (decl);
   TREE_USED (to) = TREE_USED (decl);
   TREE_PUBLIC (to) = TREE_PUBLIC (decl);
+  TREE_ADDRESSABLE (to) = TREE_ADDRESSABLE (decl);
   DECL_EXTERNAL (to) = DECL_EXTERNAL (decl);
   DECL_COMMON (to) = DECL_COMMON (decl);
   DECL_WEAK (to) = DECL_WEAK (decl);
   DECL_VISIBILITY (to) = DECL_VISIBILITY (decl);
   DECL_VISIBILITY_SPECIFIED (to) = DECL_VISIBILITY_SPECIFIED (decl);
+  DECL_PRESERVE_P (to) = DECL_PRESERVE_P (decl);
   
   /* Fortran might pass this to us.  */
   DECL_RESTRICTED_P (to) = DECL_RESTRICTED_P (decl);
+  
+  /* As soon as we see an initializer (and providing one is not already
+     present) we can setup the init. template.  */
+  if (!DECL_INITIAL (to) && 
+      DECL_INITIAL (decl) && (DECL_INITIAL (decl) != error_mark_node)) 
+    if (! DECL_EXTERNAL (to) && ! DECL_COMMON (to))
+      {
+	DECL_INITIAL (to) = targetm.emutls.var_init
+		(to, decl, get_emutls_init_templ_addr (decl));
 
+	/* Make sure the template is marked as needed early enough.
+	   Without this, if the variable is placed in a
+	   section-anchored block, the template will only be marked
+	   when it's too late.*/
+	record_references_in_initializer (to, false);
+	/* We leave this marked, so that any attempt at duplicate
+	   or re-initialization will give the appropriate error.  */
+	DECL_INITIAL (decl) = error_mark_node ;
+      }
+  /* Say we are not interested in emitting this Var.  */
+  TREE_ASM_WRITTEN (decl) = 1;
   return to;
 }
 
+/* Add static constructors for emutls vars, where required.  */
+
 static int
 emutls_common_1 (void **loc, void *xstmts)
 {
@@ -431,17 +457,21 @@  emutls_common_1 (void **loc, void *xstmts)
   tree args, x, *pstmts = (tree *) xstmts;
   tree word_type_node;
 
-  if (! DECL_COMMON (h->base.from)
-      || (DECL_INITIAL (h->base.from)
-	  && DECL_INITIAL (h->base.from) != error_mark_node))
+  /* ??? It is not enough to check DECL_COMMON because variables might be 
+     allocated in other uninitialized sections.  However, this might still
+     not be an adequate test.  */
+  if (DECL_INITIAL (h->to))
     return 1;
+  
+  /* Refuse to build a zero-sized item, first make sure there's
+     a valid layout.  */
+  if (DECL_SIZE (h->base.from) == 0)
+    layout_decl (h->base.from, 0);
+	
+  gcc_assert (DECL_SIZE (h->base.from) != 0);
 
   word_type_node = lang_hooks.types.type_for_mode (word_mode, 1);
 
-  /* The idea was to call get_emutls_init_templ_addr here, but if we
-     do this and there is an initializer, -fanchor_section loses,
-     because it would be too late to ensure the template is
-     output.  */
   x = null_pointer_node;
   args = tree_cons (NULL, x, NULL);
   x = build_int_cst (word_type_node, DECL_ALIGN_UNIT (h->base.from));
@@ -468,6 +498,9 @@  emutls_finalize_control_var (void **loc,
   if (h != NULL) 
     {
       struct varpool_node *node = varpool_node (h->to);
+      gcc_assert (! DECL_THREAD_LOCAL_P (h->to));
+      if (TREE_USED (h->base.from)) 
+	TREE_USED (h->to) = 1;
       /* Because varpool_finalize_decl () has side-effects,
          only apply to un-finalized vars.  */
       if (node && !node->finalized) 
@@ -791,6 +824,10 @@  asm_output_bss (FILE *file, tree decl ATTRIBUTE_UN
   gcc_assert (strcmp (XSTR (XEXP (DECL_RTL (decl), 0), 0), name) == 0);
   targetm.asm_out.globalize_decl_name (file, decl);
   switch_to_section (bss_section);
+  /* We don't emit the userland vars for emulated TLS, just the control.  */
+  gcc_assert (targetm.have_tls  
+	      || (TREE_CODE (decl) != VAR_DECL)
+	      || !DECL_THREAD_LOCAL_P (decl));
 #ifdef ASM_DECLARE_OBJECT_NAME
   last_assemble_variable_decl = decl;
   ASM_DECLARE_OBJECT_NAME (file, name, decl);
@@ -817,6 +854,10 @@  asm_output_aligned_bss (FILE *file, tree decl ATTR
 {
   switch_to_section (bss_section);
   ASM_OUTPUT_ALIGN (file, floor_log2 (align / BITS_PER_UNIT));
+  /* We don't emit the userland vars for emulated TLS, just the control.  */
+  gcc_assert (targetm.have_tls  
+	      || (TREE_CODE (decl) != VAR_DECL)
+	      || !DECL_THREAD_LOCAL_P (decl));
 #ifdef ASM_DECLARE_OBJECT_NAME
   last_assemble_variable_decl = decl;
   ASM_DECLARE_OBJECT_NAME (file, name, decl);
@@ -1232,7 +1273,12 @@  get_variable_section (tree decl, bool prefer_noswi
   if (IN_NAMED_SECTION (decl))
     return get_named_section (decl, NULL, reloc);
 
-  if (ADDR_SPACE_GENERIC_P (as)
+  /* This cannot be lcomm for an emulated TLS object, without
+     a register_common hook.  */
+  if (DECL_TLS_MODEL (decl) == TLS_MODEL_EMULATED
+      && !targetm.emutls.register_common)
+    ;
+  else if (ADDR_SPACE_GENERIC_P (as)
       && !DECL_THREAD_LOCAL_P (decl)
       && !(prefer_noswitch_p && targetm.have_switchable_bss_sections)
       && bss_initializer_p (decl))
@@ -2152,35 +2198,6 @@  assemble_variable (tree decl, int top_level ATTRIB
   rtx decl_rtl, symbol;
   section *sect;
 
-  if (! targetm.have_tls
-      && TREE_CODE (decl) == VAR_DECL
-      && DECL_THREAD_LOCAL_P (decl))
-    {
-      tree to = emutls_decl (decl);
-
-      /* If this variable is defined locally, then we need to initialize the
-         control structure with size and alignment information.  We do this
-	 at the last moment because tentative definitions can take a locally
-	 defined but uninitialized variable and initialize it later, which
-	 would result in incorrect contents.  */
-      if (! DECL_EXTERNAL (to)
-	  && (! DECL_COMMON (to)
-	      || (DECL_INITIAL (decl)
-		  && DECL_INITIAL (decl) != error_mark_node)))
-	{
-	  DECL_INITIAL (to) = targetm.emutls.var_init
-	    (to, decl, get_emutls_init_templ_addr (decl));
-
-	  /* Make sure the template is marked as needed early enough.
-	     Without this, if the variable is placed in a
-	     section-anchored block, the template will only be marked
-	     when it's too late.  */
-	  record_references_in_initializer (to, false);
-	}
-
-      decl = to;
-    }
-
   last_assemble_variable_decl = 0;
 
   /* Normally no need to say anything here for external references,
@@ -2196,6 +2213,14 @@  assemble_variable (tree decl, int top_level ATTRIB
   if (TREE_CODE (decl) == FUNCTION_DECL)
     return;
 
+  /* The first declaration of a variable that comes through this function
+     decides whether it is global (in C, has external linkage)
+     or local (in C, has internal linkage).  So do nothing more
+     if this function has already run.  */
+
+  if (TREE_ASM_WRITTEN (decl))
+    return;
+
   /* Do nothing for global register variables.  */
   if (DECL_RTL_SET_P (decl) && REG_P (DECL_RTL (decl)))
     {
@@ -2219,14 +2244,6 @@  assemble_variable (tree decl, int top_level ATTRIB
       return;
     }
 
-  /* The first declaration of a variable that comes through this function
-     decides whether it is global (in C, has external linkage)
-     or local (in C, has internal linkage).  So do nothing more
-     if this function has already run.  */
-
-  if (TREE_ASM_WRITTEN (decl))
-    return;
-
   /* Make sure targetm.encode_section_info is invoked before we set
      ASM_WRITTEN.  */
   decl_rtl = DECL_RTL (decl);
@@ -2237,6 +2254,12 @@  assemble_variable (tree decl, int top_level ATTRIB
   if (flag_syntax_only)
     return;
 
+  /* We don't emit the userland vars for emulated TLS - they should never
+     get to here, only the control vars should be emitted.  */
+  gcc_assert (targetm.have_tls  
+	      || (TREE_CODE (decl) != VAR_DECL)
+	      || !DECL_THREAD_LOCAL_P (decl));
+
   if (! dont_output_data
       && ! host_integerp (DECL_SIZE_UNIT (decl), 1))
     {
@@ -5700,18 +5723,14 @@  do_assemble_alias (tree decl, tree target)
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (DECL_ASSEMBLER_NAME (decl)) = 1;
 
+  gcc_assert (targetm.have_tls  
+	      || (TREE_CODE (decl) != VAR_DECL)
+	      || !DECL_THREAD_LOCAL_P (decl));
+
   if (lookup_attribute ("weakref", DECL_ATTRIBUTES (decl)))
     {
       ultimate_transparent_alias_target (&target);
 
-      if (!targetm.have_tls
-	  && TREE_CODE (decl) == VAR_DECL
-	  && DECL_THREAD_LOCAL_P (decl))
-	{
-	  decl = emutls_decl (decl);
-	  target = get_emutls_object_name (target);
-	}
-
       if (!TREE_SYMBOL_REFERENCED (target))
 	weakref_targets = tree_cons (decl, target, weakref_targets);
 
@@ -5730,14 +5749,6 @@  do_assemble_alias (tree decl, tree target)
       return;
     }
 
-  if (!targetm.have_tls
-      && TREE_CODE (decl) == VAR_DECL
-      && DECL_THREAD_LOCAL_P (decl))
-    {
-      decl = emutls_decl (decl);
-      target = get_emutls_object_name (target);
-    }
-
 #ifdef ASM_OUTPUT_DEF
   /* Make name accessible from other files, if appropriate.  */
 
@@ -5819,6 +5830,15 @@  remove_unreachable_alias_pairs (void)
     }
 }
 
+/* Lookup the decl for a symbol in the varpool.  */
+static tree
+var_decl_for_asm (tree symbol)
+{
+  struct varpool_node *vnode = varpool_node_for_asm  (symbol);
+  if (vnode) 
+    return vnode->decl;
+  return NULL;
+}
 
 /* First pass of completing pending aliases.  Make sure that cgraph knows
    which symbols will be required.  */
@@ -5831,8 +5851,55 @@  finish_aliases_1 (void)
 
   for (i = 0; VEC_iterate (alias_pair, alias_pairs, i, p); i++)
     {
-      tree target_decl;
+      tree target_decl=NULL;
 
+      /* When emulated TLS is in effect, redirect aliases so that they 
+         are registered between the control vars.  */
+      if (!targetm.have_tls 
+          && TREE_CODE (p->decl) == VAR_DECL
+          && (DECL_TLS_MODEL (p->decl) >= TLS_MODEL_EMULATED))
+	{
+	  tree tsym = p->target ;
+	  target_decl = var_decl_for_asm (tsym) ;
+	  if (!target_decl) 
+	    {
+	      /* If we didn't find the user's symbol, it could
+	         be because the alias really refers to a control 
+	         var.  */
+	      tsym = get_emutls_object_name (p->target);
+	      target_decl = var_decl_for_asm (tsym);
+	    }
+	  if (target_decl) 
+	    {
+	      struct varpool_node *vnode;
+	      /* If it hasn't been done already, substitute the control
+	         var for the original.  */
+	      if (DECL_THREAD_LOCAL_P (p->decl))
+		p->decl = emutls_decl (p->decl);
+	      /* If not TLS target, we've made a mistake.  */
+	      if (DECL_TLS_MODEL (target_decl) < TLS_MODEL_EMULATED)
+		error ("TLS symbol %q+D aliased to non-TLS symbol %qE",
+			p->decl, p->target);
+	      /* If it's the original we need to substitute the contol.  */
+	      else if (DECL_THREAD_LOCAL_P (target_decl))
+		{
+		  target_decl = emutls_decl (target_decl);
+		  tsym = get_emutls_object_name (p->target);
+		}
+	      /* else it's already the emulation control.  */
+	      /* Mark the var needed.  */
+	      vnode = varpool_node (target_decl);
+	      if (vnode) 
+	        {
+		  varpool_mark_needed_node (vnode);
+		  vnode->force_output = 1;
+	        }
+	      p->target = tsym;
+	    }
+	  /* Else we didn't find a decl for the symbol, which is an error
+	     unless there's a weak ref.  */
+	} 
+      else
       target_decl = find_decl_and_mark_needed (p->decl, p->target);
       if (target_decl == NULL)
 	{
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 161773)
+++ gcc/passes.c	(working copy)
@@ -152,6 +152,16 @@  rest_of_decl_compilation (tree decl,
 			  int top_level,
 			  int at_end)
 {
+  if (! targetm.have_tls 
+	&& !in_lto_p 
+	&& TREE_CODE (decl) == VAR_DECL
+	&& DECL_THREAD_LOCAL_P (decl))
+    {
+      /* Substitute the control var. for the user one.  */
+      rest_of_decl_compilation (emutls_decl (decl), top_level, at_end);
+      return;
+    }
+
   /* We deferred calling assemble_alias so that we could collect
      other attributes such as visibility.  Emit the alias now.  */
   {
Index: gcc/varpool.c
===================================================================
--- gcc/varpool.c	(revision 161773)
+++ gcc/varpool.c	(working copy)
@@ -312,6 +312,14 @@  varpool_mark_needed_node (struct varpool_node *nod
       && !TREE_ASM_WRITTEN (node->decl))
     varpool_enqueue_needed_node (node);
   node->needed = 1;
+  /* If we need the var, and it's an emulated TLS entity, that
+     means we need the control var.  */
+  if (!targetm.have_tls && DECL_THREAD_LOCAL_P (node->decl))
+    {
+      struct varpool_node *cv_node;
+      cv_node = varpool_node (emutls_decl (node->decl)) ;
+      varpool_mark_needed_node (cv_node);
+    }
 }
 
 /* Reset the queue of needed nodes.  */
@@ -346,17 +354,6 @@  decide_is_variable_needed (struct varpool_node *no
       && !DECL_EXTERNAL (decl))
     return true;
 
-  /* When emulating tls, we actually see references to the control
-     variable, rather than the user-level variable.  */
-  if (!targetm.have_tls
-      && TREE_CODE (decl) == VAR_DECL
-      && DECL_THREAD_LOCAL_P (decl))
-    {
-      tree control = emutls_decl (decl);
-      if (decide_is_variable_needed (varpool_node (control), control))
-	return true;
-    }
-
   /* When not reordering top level variables, we have to assume that
      we are going to keep everything.  */
   if (flag_toplevel_reorder)
@@ -381,14 +378,19 @@  varpool_finalize_decl (tree decl)
      or local (in C, has internal linkage).  So do nothing more
      if this function has already run.  */
   if (node->finalized)
+      return;
+
+  /* For emulated TLS vars, if we are in a position to finalize the userland
+     var, then we should be able to finalize the control var too.  */
+  if (!targetm.have_tls && decl 
+      && TREE_CODE (decl) == VAR_DECL
+      && DECL_THREAD_LOCAL_P (decl))
     {
-      if (cgraph_global_info_ready)
-	varpool_assemble_pending_decls ();
+      varpool_finalize_decl (emutls_decl (decl)) ;
+      node->finalized = true;    
       return;
     }
-  if (node->needed)
-    varpool_enqueue_needed_node (node);
-  node->finalized = true;
+
   if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl))
     node->force_output = true;
 
@@ -399,8 +401,11 @@  varpool_finalize_decl (tree decl)
      there.  */
   else if (TREE_PUBLIC (decl) && !DECL_COMDAT (decl) && !DECL_EXTERNAL (decl))
     varpool_mark_needed_node (node);
-  if (cgraph_global_info_ready)
-    varpool_assemble_pending_decls ();
+
+  if (node->needed)
+    varpool_enqueue_needed_node (node);
+
+  node->finalized = true;
 }
 
 /* Return variable availability.  See cgraph.h for description of individual
@@ -449,7 +454,7 @@  varpool_analyze_pending_decls (void)
 	     already informed about increased alignment.  */
           align_variable (decl, 0);
 	}
-      if (DECL_INITIAL (decl))
+      if (DECL_INITIAL (decl) && (DECL_INITIAL (decl) != error_mark_node))
 	record_references_in_initializer (decl, analyzed);
       if (node->same_comdat_group)
 	{