diff mbox

[cxx-mem-model] disallow load data races (1 of some)

Message ID 4D8B8051.2030307@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez March 24, 2011, 5:33 p.m. UTC
This is my first stab at disallowing load data races that happen when we 
cache the value of a global.  I would appreciate input before 
committing, especially on the RTL bits, cause it's been quite a while 
since I typed d-e-b-u-g-_-r-t-x. :-)

In the example below we usually hoist "global" into a register or 
temporary to avoid reading from it at each step.  This would cause a 
race if another thread had modified "global" in between iterations.

	for (x=0; x< 5; x++)
		sum[x] =  global;

As expected, multiple passes can cause the same end result, so plugging 
the problem involves multiple places.  First, loop invariant movement 
moves the invariant.  Barring that, PRE moves the load, and even if we 
get past the SSA passes, rtl-CSE pulls the rug from under us.  If that 
weren't enough, the post-reload pass performs CSE over the hard 
registers, and we end up eliminating subsequent loads of "global".  I am 
sure there are many other places, but I'm starting with whatever it 
takes to fix gcc.dg/memmodel/global-hoist.c.

The patch below fixes the test in question with "--param 
allow-load-data-races=0".  What do y'all think?

Thanks.
* tree.h (DECL_THREAD_VISIBLE_P): New.
	* cselib.c (cselib_lookup_mem): Return 0 for globals for
	restrictive memory model.
	* cse.c (hash_rtx_cb): Same.
	* gimple.h (gimple_has_thread_visible_ops): Protoize.
	* gimple.c (gimple_has_thread_visible_ops): New.
	* tree-ssa-loop-im.c (movement_possibility): Disallow when
	avoiding load data races.
	* tree-ssa-pre.c (compute_avail): Same, for globals.

Comments

Richard Henderson March 24, 2011, 7:58 p.m. UTC | #1
On 03/24/2011 10:33 AM, Aldy Hernandez wrote:
> In the example below we usually hoist "global" into a register or
> temporary to avoid reading from it at each step. This would cause a
> race if another thread had modified "global" in between iterations.
> 
>     for (x=0; x< 5; x++)
>         sum[x] =  global;

Um, what?  Doesn't the c++ memory model have, like, sequence points
or somesuch verbage that includes some language like an "atomic"?

Your argument above, in absence of some serializing entity, does not
sound right at all.


r~
Aldy Hernandez March 24, 2011, 9:31 p.m. UTC | #2
On 03/24/11 14:58, Richard Henderson wrote:
> On 03/24/2011 10:33 AM, Aldy Hernandez wrote:
>> In the example below we usually hoist "global" into a register or
>> temporary to avoid reading from it at each step. This would cause a
>> race if another thread had modified "global" in between iterations.
>>
>>      for (x=0; x<  5; x++)
>>          sum[x] =  global;
>
> Um, what?  Doesn't the c++ memory model have, like, sequence points
> or somesuch verbage that includes some language like an "atomic"?
>
> Your argument above, in absence of some serializing entity, does not
> sound right at all.

This work is independent of the C++ memory model.  It is just to prevent 
the optimizers from introducing new data races due to code movement. 
This initial pass is just to make the optimizations data race safe so 
that if you have a program which is proven to be free of data races, you 
can run the optimizers and it will still be data race free after the 
optimizers have been run.

See the following summary by Andrew (which in turn is based on a paper 
by Hans Boehm):

http://gcc.gnu.org/wiki/Atomic/GCCMM/DataRaces

The code movement is disallowed under specific --param options and will 
later be used for the C++ memory model, though the kernel folk have been 
asking for something similar, so it's not C++ specific.

Does this make sense now?  Or were Andrew and I smoking some heavy duty 
stuff when reading the specs... and not sharing with you?

Aldy
Joseph Myers March 24, 2011, 9:51 p.m. UTC | #3
On Thu, 24 Mar 2011, Aldy Hernandez wrote:

> This work is independent of the C++ memory model.  It is just to prevent the
> optimizers from introducing new data races due to code movement. This initial
> pass is just to make the optimizations data race safe so that if you have a
> program which is proven to be free of data races, you can run the optimizers
> and it will still be data race free after the optimizers have been run.
> 
> See the following summary by Andrew (which in turn is based on a paper by Hans
> Boehm):
> 
> http://gcc.gnu.org/wiki/Atomic/GCCMM/DataRaces

But hoisting global in this case doesn't result in a data race, since the 
loop always accesses global and contains no synchronisation code.  If it 
were only accessed conditionally, as in the examples under "Avoiding 
Speculation" on that page, then there would be a race in hoisting it, but 
not for the code you gave; any data races with the hoisting would still 
have been present without it.
Andrew MacLeod March 24, 2011, 11:57 p.m. UTC | #4
> On Thu, 24 Mar 2011, Aldy Hernandez wrote:
>
>> This work is independent of the C++ memory model.  It is just to prevent the
>> optimizers from introducing new data races due to code movement. This initial
>> pass is just to make the optimizations data race safe so that if you have a
>> program which is proven to be free of data races, you can run the optimizers
>> and it will still be data race free after the optimizers have been run.
>>
>> See the following summary by Andrew (which in turn is based on a paper by Hans
>> Boehm):
>>
>> http://gcc.gnu.org/wiki/Atomic/GCCMM/DataRaces
> But hoisting global in this case doesn't result in a data race, since the
> loop always accesses global and contains no synchronisation code.  If it
> were only accessed conditionally, as in the examples under "Avoiding
> Speculation" on that page, then there would be a race in hoisting it, but
> not for the code you gave; any data races with the hoisting would still
> have been present without it.
>
My fault for not being specific about it... I tend to just use data race 
as a catch all for all these types of things when talking about them 
with Aldy.

the testcase should have      for (x=0; x< limit; x++)  sum[x] = global; 
     rather than x<5,  so that its not a known loop count.

The hoisted load is then not allowed as it become a speculative load of 
'global' on the main path which would not otherwise have been there. 
When the value of limit < 0, this can cause a load race detection on 
systems with either a software or hardware data race detector.

It can be allowed as long as the load happens inside a guard for the 
loop, but I dont think we are that sophisticated yet.

Bottom line is these flags are to prevent the introduction of loads of 
globals on code paths which didn't have had them before.

Andrew
Richard Biener March 25, 2011, 10:03 a.m. UTC | #5
On Fri, Mar 25, 2011 at 12:57 AM, Andrew MacLeod <amacleod@redhat.com> wrote:
>> On Thu, 24 Mar 2011, Aldy Hernandez wrote:
>>
>>> This work is independent of the C++ memory model.  It is just to prevent
>>> the
>>> optimizers from introducing new data races due to code movement. This
>>> initial
>>> pass is just to make the optimizations data race safe so that if you have
>>> a
>>> program which is proven to be free of data races, you can run the
>>> optimizers
>>> and it will still be data race free after the optimizers have been run.
>>>
>>> See the following summary by Andrew (which in turn is based on a paper by
>>> Hans
>>> Boehm):
>>>
>>> http://gcc.gnu.org/wiki/Atomic/GCCMM/DataRaces
>>
>> But hoisting global in this case doesn't result in a data race, since the
>> loop always accesses global and contains no synchronisation code.  If it
>> were only accessed conditionally, as in the examples under "Avoiding
>> Speculation" on that page, then there would be a race in hoisting it, but
>> not for the code you gave; any data races with the hoisting would still
>> have been present without it.
>>
> My fault for not being specific about it... I tend to just use data race as
> a catch all for all these types of things when talking about them with Aldy.
>
> the testcase should have      for (x=0; x< limit; x++)  sum[x] = global;
> rather than x<5,  so that its not a known loop count.
>
> The hoisted load is then not allowed as it become a speculative load of
> 'global' on the main path which would not otherwise have been there. When
> the value of limit < 0, this can cause a load race detection on systems with
> either a software or hardware data race detector.

But speculative loads are never a problem.  So I'd like to avoid cluttering
GCC code with stuff to avoid them.  I honestly don't care about diagnostic
tools that fail to see if a value read is used or not.

Richard.

> It can be allowed as long as the load happens inside a guard for the loop,
> but I dont think we are that sophisticated yet.
>
> Bottom line is these flags are to prevent the introduction of loads of
> globals on code paths which didn't have had them before.
>
> Andrew
>
>
>
Aldy Hernandez March 25, 2011, 12:57 p.m. UTC | #6
>>> But hoisting global in this case doesn't result in a data race, since the
>>> loop always accesses global and contains no synchronisation code.  If it
>>> were only accessed conditionally, as in the examples under "Avoiding
>>> Speculation" on that page, then there would be a race in hoisting it, but
>>> not for the code you gave; any data races with the hoisting would still
>>> have been present without it.
>>>
>> My fault for not being specific about it... I tend to just use data race as
>> a catch all for all these types of things when talking about them with Aldy.

Arghh... I took Andrew's testcase unmodified, and assumed it did not 
conform to the model.  Sorry for the confusion.

>> the testcase should have      for (x=0; x<  limit; x++)  sum[x] = global;
>> rather than x<5,  so that its not a known loop count.

And in this variant, we no longer have a data race.  The load from 
global is guarded by the limit check.

> But speculative loads are never a problem.  So I'd like to avoid cluttering
> GCC code with stuff to avoid them.  I honestly don't care about diagnostic
> tools that fail to see if a value read is used or not.

Frankly, I agree.  The standard (and the aforementioned paper) says that 
load data races are disallowed...

"If we had hypothetical hardware (or a virtual machine) that aborted the 
program when a race was detected, compilers would not be allowed to 
introduce potentially racing loads... even if the result of the load 
were unused would change the semantics of the program. Although such 
environments are not commonplace, they have certainly been proposed, and 
may well turn out to be very useful."

...but, if there's no hardware for it, I see no need for spending time 
on this.  Do y'all agree, so I can drop this witch hunt?

In which case, unfortunately, all the tests I had from Andrew (that 
actually trigger) are load data races.  So I'll have to hunt for invalid 
speculative stores instead.

Back to the drawing board.

Thanks folks.
Michael Matz March 25, 2011, 3:20 p.m. UTC | #7
Hi,

On Fri, 25 Mar 2011, Aldy Hernandez wrote:

> > But speculative loads are never a problem.  So I'd like to avoid 
> > cluttering GCC code with stuff to avoid them.  I honestly don't care 
> > about diagnostic tools that fail to see if a value read is used or 
> > not.
> 
> Frankly, I agree.  The standard (and the aforementioned paper) says that 
> load data races are disallowed...
> 
> "If we had hypothetical hardware (or a virtual machine) that aborted the
> program when a race was detected, compilers would not be allowed to introduce
> potentially racing loads... even if the result of the load were unused would
> change the semantics of the program. Although such environments are not
> commonplace, they have certainly been proposed, and may well turn out to be
> very useful."
> 
> ...but, if there's no hardware for it, I see no need for spending time on
> this.

Even if there was such (IMO useless) hardware, or somebody would waste his 
time in writing such (equally useless) virtual machine that can detect 
fabricated problems somebody invented for some standard that never are 
going to be problems in the real world, we shouldn't feel obliged to 
uglify GCC for that.

(OTOH my opinion about the new c++ memory model is known ;) )


Ciao,
Michael.
Jeff Law March 25, 2011, 3:30 p.m. UTC | #8
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/25/11 09:20, Michael Matz wrote:

> Even if there was such (IMO useless) hardware, or somebody would waste his 
> time in writing such (equally useless) virtual machine that can detect 
> fabricated problems somebody invented for some standard that never are 
> going to be problems in the real world, we shouldn't feel obliged to 
> uglify GCC for that.
> 
> (OTOH my opinion about the new c++ memory model is known ;) )
I'm not going to chime in on this specific problem; however, it is worth
noting that many of the issues raised by the C++0x memory model also
affect the linux kernel.

In fact, it was the realization that the kernel guys are fighting
closely related issues with data races that bumped the priority of the
memory model work to a level that we (Red Hat) felt it was necessary to
start pushing these issues upstream now.

jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJNjLUOAAoJEBRtltQi2kC7FTkIAKtk5XF1auo5IPtpYJ1G8Yo8
EjW0a8hMcWFM0lWVt6+t9qStBcSCT35W9XYtWf4EmX+2HX6Hs3+KJIOoKXboqM0j
oA05YuQHHGqX2Br9jqlmocfB3E0qW+aCJLi3hu/nJWlMTUbOlf5BXF9sc0Q0Veio
oGwF8aMoXG1IXQJ5nq4SU03n6lrBWn7x/4vyQCesdEMzzeOL2/LN0i2FBdZkXXNj
xK8+R5uKni/pO5V/mKbm4l0ExRLmgO2iyxiTQO/jGlwfS1EnR4Zj2NiWHlXgWH2B
o55zGtfy1xNSONGdTAKwFHk3ShvmatRvZZxX5A+M3NlPBpbF0CHL1la+XaSZkZ8=
=ZlUI
-----END PGP SIGNATURE-----
Jakub Jelinek March 25, 2011, 3:33 p.m. UTC | #9
On Fri, Mar 25, 2011 at 09:30:22AM -0600, Jeff Law wrote:
> I'm not going to chime in on this specific problem; however, it is worth
> noting that many of the issues raised by the C++0x memory model also
> affect the linux kernel.

But what they are seeing are certainly store data races, not load races,
because no hw they care about (or no hw at all?) detects the latter.
Having options to avoid store data races is useful not just for C++0x memory
model compliance and Linux kernel, but e.g. for OpenMP too.

> In fact, it was the realization that the kernel guys are fighting
> closely related issues with data races that bumped the priority of the
> memory model work to a level that we (Red Hat) felt it was necessary to
> start pushing these issues upstream now.

	Jakub
Richard Biener March 25, 2011, 3:39 p.m. UTC | #10
On Fri, Mar 25, 2011 at 4:33 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Mar 25, 2011 at 09:30:22AM -0600, Jeff Law wrote:
>> I'm not going to chime in on this specific problem; however, it is worth
>> noting that many of the issues raised by the C++0x memory model also
>> affect the linux kernel.
>
> But what they are seeing are certainly store data races, not load races,
> because no hw they care about (or no hw at all?) detects the latter.
> Having options to avoid store data races is useful not just for C++0x memory
> model compliance and Linux kernel, but e.g. for OpenMP too.

And we have in the past removed code that created store data races
anyway.  There is nothing new here.

As stated multiple times in the past things get "interesting" when you
look at non-x86 hardware.

Richard.
Jeff Law March 31, 2011, 1:08 p.m. UTC | #11
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/24/11 17:57, Andrew MacLeod wrote:
> My fault for not being specific about it... I tend to just use data race
> as a catch all for all these types of things when talking about them
> with Aldy.
> 
> the testcase should have      for (x=0; x< limit; x++)  sum[x] = global;
>     rather than x<5,  so that its not a known loop count.
> 
> The hoisted load is then not allowed as it become a speculative load of
> 'global' on the main path which would not otherwise have been there.
> When the value of limit < 0, this can cause a load race detection on
> systems with either a software or hardware data race detector.
We generally don't allow this kind of code motion anyway -- though it's
more because we haven't built any kind of analysis to determine that
this kind of speculative load is often profitable and can't result in a
fault.

However, having a switch we can throw to avoid this behaviour (assuming
we add this kind of speculative code motion one day) is, IMHO, a good thing.

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJNlHzgAAoJEBRtltQi2kC7h24H/1TJ9gdSKfPwjtiKQL7BLraO
yAwvJJJihacO3xM2pkVZIXRVlHwFyqzU03x0ygExMOex3QMuLMByBbpxsk3uQV7t
30xF5MDEdXEmQ7L1jWI7gYXbKZk5lnJhDQKewPePdUbcVzJj3H71SQGP2uWCLMUE
7ToPioN2w3DazoWdnYC/Jqi5JwBLnKpcxkGXQPYbCvqII8W5Xkmlh743Zq5RF7ax
q1nAkRmKdsgIq0SJJNF0GVxdzfOQkt1q7yzR7W74PCG1h6Yam6DNsePSbswDSu46
0gxkqnlGb6dkgx1iCHAS0PtNDAurGVCAOd5rijH1yib7C+P48XLmoF10Y13v82I=
=D+1J
-----END PGP SIGNATURE-----
Richard Biener March 31, 2011, 1:26 p.m. UTC | #12
On Thu, Mar 31, 2011 at 3:08 PM, Jeff Law <law@redhat.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 03/24/11 17:57, Andrew MacLeod wrote:
>> My fault for not being specific about it... I tend to just use data race
>> as a catch all for all these types of things when talking about them
>> with Aldy.
>>
>> the testcase should have      for (x=0; x< limit; x++)  sum[x] = global;
>>     rather than x<5,  so that its not a known loop count.
>>
>> The hoisted load is then not allowed as it become a speculative load of
>> 'global' on the main path which would not otherwise have been there.
>> When the value of limit < 0, this can cause a load race detection on
>> systems with either a software or hardware data race detector.
> We generally don't allow this kind of code motion anyway -- though it's
> more because we haven't built any kind of analysis to determine that
> this kind of speculative load is often profitable and can't result in a
> fault.
>
> However, having a switch we can throw to avoid this behaviour (assuming
> we add this kind of speculative code motion one day) is, IMHO, a good thing.

If we know the access to global cannot trap I see no reason to disallow
speculative reading of it.

Richard.

> Jeff
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
>
> iQEcBAEBAgAGBQJNlHzgAAoJEBRtltQi2kC7h24H/1TJ9gdSKfPwjtiKQL7BLraO
> yAwvJJJihacO3xM2pkVZIXRVlHwFyqzU03x0ygExMOex3QMuLMByBbpxsk3uQV7t
> 30xF5MDEdXEmQ7L1jWI7gYXbKZk5lnJhDQKewPePdUbcVzJj3H71SQGP2uWCLMUE
> 7ToPioN2w3DazoWdnYC/Jqi5JwBLnKpcxkGXQPYbCvqII8W5Xkmlh743Zq5RF7ax
> q1nAkRmKdsgIq0SJJNF0GVxdzfOQkt1q7yzR7W74PCG1h6Yam6DNsePSbswDSu46
> 0gxkqnlGb6dkgx1iCHAS0PtNDAurGVCAOd5rijH1yib7C+P48XLmoF10Y13v82I=
> =D+1J
> -----END PGP SIGNATURE-----
>
Jeff Law March 31, 2011, 1:27 p.m. UTC | #13
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/31/11 07:26, Richard Guenther wrote:
> On Thu, Mar 31, 2011 at 3:08 PM, Jeff Law <law@redhat.com> wrote:
> On 03/24/11 17:57, Andrew MacLeod wrote:
>>>> My fault for not being specific about it... I tend to just use data race
>>>> as a catch all for all these types of things when talking about them
>>>> with Aldy.
>>>>
>>>> the testcase should have      for (x=0; x< limit; x++)  sum[x] = global;
>>>>     rather than x<5,  so that its not a known loop count.
>>>>
>>>> The hoisted load is then not allowed as it become a speculative load of
>>>> 'global' on the main path which would not otherwise have been there.
>>>> When the value of limit < 0, this can cause a load race detection on
>>>> systems with either a software or hardware data race detector.
> We generally don't allow this kind of code motion anyway -- though it's
> more because we haven't built any kind of analysis to determine that
> this kind of speculative load is often profitable and can't result in a
> fault.
> 
> However, having a switch we can throw to avoid this behaviour (assuming
> we add this kind of speculative code motion one day) is, IMHO, a good thing.
> 
>> If we know the access to global cannot trap I see no reason to disallow
>> speculative reading of it.
Without trip count estimates the speculative read can introduce a
performance regression.

jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJNlIFYAAoJEBRtltQi2kC7+PsH/jRDrhp30ZgRcKctox4/4w46
GBPWsizmDttIg9INfWejX5xvD6wj++qKwBEjD4+5CiDPnjVg6A+5Kr4NrfyQoINx
m/kJHmSlO70jHTkfux24tA+Psyj+eyxEFIsXtntaW06MN0J4umEUceJj1SKygi4Q
qVfzR9q8gqDKZSEaNzDDpuZh8ANGt2Te4aAK27zd1aWImqjNdXJquESnr0i9Wyqo
Q6539rpfemCmSY5dusu0IGUoEWmKqk7EpfJPSsfZgb5E4FgnDMUCclHcNeMQatAC
aDmTVeP/AAtd+9ILwVyu0/j38Es6v5kMUXIRJxhMtjLAIQ6DTSajM7IUmuOs4Lg=
=TF6O
-----END PGP SIGNATURE-----
Michael Matz March 31, 2011, 8:52 p.m. UTC | #14
Hi,

On Thu, 31 Mar 2011, Jeff Law wrote:

> Without trip count estimates the speculative read can introduce a 
> performance regression.

Like any other transformation.  So what are you trying to say?  That 
therefore introducing a mode in GCC disallowing any speculative reads 
(even if they happen to improve performance!?) is a bright idea?

I sure hope not.

The right way to deal with performance concerns is to model them and (try 
to) avoid them.  Not to introduce random restrictions onto a (timing 
independend) as-if model.

So, please let's leave out any disguising arguments about performance in 
connection with the cxx mem-model, shall we?  It has different goals 
AFAIU, and if it has to resort to performance for showing its usefulness 
it's bound to fail already.


Ciao,
Michael.
diff mbox

Patch

Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 170745)
+++ tree-ssa-loop-im.c	(working copy)
@@ -377,6 +377,11 @@  movement_possibility (gimple stmt)
       || stmt_could_throw_p (stmt))
     return MOVE_IMPOSSIBLE;
 
+  /* Do not move loads of globals if under a restrictive memory model.  */
+  if (PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES) == 0
+      && gimple_has_thread_visible_ops (stmt))
+    return MOVE_IMPOSSIBLE;
+
   if (is_gimple_call (stmt))
     {
       /* While pure or const call is guaranteed to have no side effects, we
Index: tree.h
===================================================================
--- tree.h	(revision 170745)
+++ tree.h	(working copy)
@@ -3102,6 +3102,10 @@  struct GTY(()) tree_parm_decl {
 #define DECL_THREAD_LOCAL_P(NODE) \
   (VAR_DECL_CHECK (NODE)->decl_with_vis.tls_model >= TLS_MODEL_REAL)
 
+/* Return true if a VAR_DECL is visible from another thread.  */
+#define DECL_THREAD_VISIBLE_P(NODE) \
+  (TREE_STATIC (NODE) && !DECL_THREAD_LOCAL_P (NODE))
+
 /* In a non-local VAR_DECL with static storage duration, true if the
    variable has an initialization priority.  If false, the variable
    will be initialized at the DEFAULT_INIT_PRIORITY.  */
Index: cse.c
===================================================================
--- cse.c	(revision 170745)
+++ cse.c	(working copy)
@@ -2402,6 +2402,19 @@  hash_rtx_cb (const_rtx x, enum machine_m
       }
 
     case MEM:
+      /* Avoid hoisting loads of globals if under a restrictive memory
+	 model.  */
+      if (PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES) == 0)
+	{
+	  tree decl = MEM_EXPR (x);
+	  if (do_not_record_p
+	      && decl && TREE_CODE (decl) == VAR_DECL
+	      && DECL_THREAD_VISIBLE_P (decl))
+	    {
+	      *do_not_record_p = 1;
+	      return 0;
+	    }
+	}
       /* We don't record if marked volatile or if BLKmode since we don't
 	 know the size of the move.  */
       if (do_not_record_p && (MEM_VOLATILE_P (x) || GET_MODE (x) == BLKmode))
Index: cselib.c
===================================================================
--- cselib.c	(revision 170745)
+++ cselib.c	(working copy)
@@ -1190,6 +1190,16 @@  cselib_lookup_mem (rtx x, int create)
   cselib_val *mem_elt;
   struct elt_list *l;
 
+  /* Forget any reads from globals if under a restrictive memory
+     model.  */
+  if (PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES) == 0)
+    {
+      tree decl = MEM_EXPR (x);
+      if (decl && TREE_CODE (decl) == VAR_DECL
+	  && DECL_THREAD_VISIBLE_P (decl))
+	return 0;
+    }
+
   if (MEM_VOLATILE_P (x) || mode == BLKmode
       || !cselib_record_memory
       || (FLOAT_MODE_P (mode) && flag_float_store))
Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c	(revision 170745)
+++ tree-ssa-pre.c	(working copy)
@@ -4000,6 +4000,12 @@  compute_avail (void)
 	      || stmt_could_throw_p (stmt))
 	    continue;
 
+	  /* Do not move loads of globals if under a restrictive
+	     memory model.  */
+	  if (PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES) == 0
+	      && gimple_has_thread_visible_ops (stmt))
+	    continue;
+
 	  switch (gimple_code (stmt))
 	    {
 	    case GIMPLE_RETURN:
Index: gimple.c
===================================================================
--- gimple.c	(revision 170745)
+++ gimple.c	(working copy)
@@ -2388,6 +2388,36 @@  gimple_rhs_has_side_effects (const_gimpl
   return false;
 }
 
+/* Return true if any of the operands in S are visible from another
+   thread (globals that are not TLS.  */
+/* ?? If this becomes a performance penalty, we should cache this
+   value as we do with volatiles.  */
+
+bool
+gimple_has_thread_visible_ops (const_gimple s)
+{
+  unsigned int i;
+
+  if (is_gimple_debug (s))
+    return false;
+
+  if (is_gimple_assign (s))
+    {
+      /* Skip the LHS.  */
+      i = 1;
+    }
+  else
+    i = 0;
+  for (; i < gimple_num_ops (s); i++)
+    {
+      tree op = gimple_op (s, i);
+      if (op && TREE_CODE (op) == VAR_DECL
+	  && DECL_THREAD_VISIBLE_P (op))
+	return true;
+    }
+  return false;
+}
+
 /* Helper for gimple_could_trap_p and gimple_assign_rhs_could_trap_p.
    Return true if S can trap.  When INCLUDE_MEM is true, check whether
    the memory operations could trap.  When INCLUDE_STORES is true and
Index: gimple.h
===================================================================
--- gimple.h	(revision 170745)
+++ gimple.h	(working copy)
@@ -882,6 +882,7 @@  gimple gimple_build_cond_from_tree (tree
 void gimple_cond_set_condition_from_tree (gimple, tree);
 bool gimple_has_side_effects (const_gimple);
 bool gimple_rhs_has_side_effects (const_gimple);
+bool gimple_has_thread_visible_ops (const_gimple);
 bool gimple_could_trap_p (gimple);
 bool gimple_could_trap_p_1 (gimple, bool, bool);
 bool gimple_assign_rhs_could_trap_p (gimple);