
[C11-atomic] gimple atomic statements

Message ID 4F7B5218.6070609@redhat.com
State New

Commit Message

Andrew MacLeod April 3, 2012, 7:40 p.m. UTC
Here is my first step in promoting the __atomic builtins into gimple 
statements.  I originally planned this as tree codes, but when I 
prototyped it, it became obvious that a gimple statement was a far 
superior solution since we also need to deal with LHS and memory 
address issues.

Motivations are manifold, but primarily it makes manipulating atomics 
easier and exposes more of their side effects to the optimizers.  In 
particular, we can now expose both return values of the compare-and-swap 
when implementing compare_exchange and get more efficient code 
generation (soon :-).  It is item number 3 on my 4.8 task list: 
http://gcc.gnu.org/wiki/Atomic/GCCMM/gcc4.8.

This first step adds a GIMPLE_ATOMIC statement class which handles all 
the __atomic built-in calls.  Right after the CFG is built, all built-in 
__atomic calls are converted to gimple_atomic statements.  I considered 
doing the conversion right in the gimplifier, but elected to leave it as 
a pass for the time being.  These statements then pass through all the 
optimizers to cfgexpand, where they are converted directly into RTL.

This currently produces the same code that the builtins do in 4.7.  I 
expect that I missed a few places in the optimizers where they aren't 
yet properly treated as barriers, but I'll get to tracking those down 
in a bit.

I also have not implemented non-integral atomics yet, nor do I issue 
library calls when inline expansion cannot be done.  That's next... I 
just want to get the basics checked into the branch.

I expect to be able to wrap the __sync routines into this as well, 
eliminating all the atomic and sync builtin expansion code and keeping 
everything in one easy statement class.  Then I'll add the _Atomic type 
qualifier to the parser, and have that simply translate expressions 
involving those types into gimple_atomic statements at the same time 
the calls are converted.
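
As a rough illustration of the kind of lowering I have in mind (the 
variable name and the exact GIMPLE notation here are made up; the final 
form is still in flux):

   _Atomic int counter;
   ...
   counter += 5;   /* C11: an atomic read-modify-write, seq_cst.  */

   /* would turn into something along the lines of  */
   atomic_fetch_add <&counter, 5, SEQ_CST>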

This bootstraps on x86_64-unknown-linux-gnu, and the only testsuite 
regressions involve issuing library calls.  There is a toolchain build 
problem with libjava, however... during libjava construction some files 
end up not being created due to permission problems in .svn 
directories.   Pretty darn weird, but I'll look into it later, when the 
atomic gimple support is complete, if the problem still exists then.

Anyone see anything obviously flawed about the approach?

Andrew

Comments

Richard Biener April 4, 2012, 8:45 a.m. UTC | #1
On Tue, Apr 3, 2012 at 9:40 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> Here is my first step in promoting the __atomic builtins into gimple
> statements.  I originally planned this as tree codes, but when prototyped it
> became obvious that a gimple statement was a far superior solution since we
> also need to deal with LHS and memory address issues.
>
> Motivations are many-fold, but primarily it makes manipulating atomics
> easier and exposes more of their side effects to the optimizers.  In
> particular we can now expose both return values of the compare and swap when
> implementing compare_exchange and get more efficient code generation. (soon
> :-)   It is item number 3 on my 4.8 task list:
> http://gcc.gnu.org/wiki/Atomic/GCCMM/gcc4.8.
>
> This first step adds a GIMPLE_ATOMIC statement class which handles all the
> __atomic built-in calls. Right after the cfg is built, all built-in __atomic
> calls are converted to gimple_atomic statements.  I considered doing the
> conversion right in the gimplifier, but elected to leave it as a pass for
> the time being.  This then  passes through all the optimizers to cfgexpand,
> where they are then converted directly into rtl.
>
> This currently  produces the same code that the builtins do in 4.7.  I
> expect that I missed a few places in the optimizers where they aren't
> properly treated as barriers yet,  but I'll get to tracking those down in a
> bit.
>
> I also have not implemented non-integral atomics yet, nor do I issuing
> library calls when inline expansion cannot be done.  That's next... I just
> want to get the basics checked into the branch.
>
> I expect to be able to wrap the __sync routines into this as well,
> eliminating all the atomic and sync builtin expansion code, keeping
> everything in one easy statement class.   Then I'll add the _Atomic type
> qualifier to the parser, and have that simply translate expressions
> involving those types into gimple_atomic statements at the same time calls
> are converted.
>
> This bootstraps on x86_64-unknown-linux-gnu, and the only testsuite
> regressions are one involving issuing library calls.  There is a toolchain
> build problem with libjava however... During libjava construction there ends
> up being files which cant be created due to permission problems in .svn
> directories ...   Pretty darn weird, but I'll look into it later when the
> atomic gimple support is complete, if the problem it still exists then.
>
> Anyone see anything obviously flawed about the approach?

The fact that you need to touch every place that wants to look at memory
accesses shows that you are doing it wrong.  Instead my plan was to
force _all_ memory accesses to GIMPLE_ASSIGNs (yes, including those
we have now in calls).  You're making a backwards step in my eyes.

What do you think is "easier" when you use a GIMPLE_ATOMIC?
(Why do you need a fntype field?!  Should the type not be available
via the operand types?!)

Your tree-cfg.c parts need to be filled in.  They are the specification of
GIMPLE_ATOMIC - at the moment you allow any garbage.

Similar to how I dislike the choice of adding GIMPLE_TRANSACTION
instead of using builtin functions, I dislike this.

I suppose you do not want to use builtins because for primitive types you
end up with multiple statements for something "atomic"?

So, please tell us why the current scheme does not work and how the
new scheme overcomes this (that's entirely missing in your proposal...)

Thanks,
Richard.

> Andrew
>
>
Richard Henderson April 4, 2012, 1:26 p.m. UTC | #2
On 04/04/2012 04:45 AM, Richard Guenther wrote:
> I suppose you do not want to use builtins because for primitive types you
> end up with multiple statements for something "atomic"?

The primary motivation is that builtins cannot return two values.

Our current builtin returns one of the two values by reference, as
one would do from plain old C.  Even though we drop the reference
when we convert from gimple to rtl, this is not good enough to clean
up the variable we forced to a stack slot.
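
For reference, the current (4.7) builtin is type-generic and its
signature looks roughly like this; the old value comes back through
*expected rather than as a second return value:

   bool __atomic_compare_exchange_n (type *ptr, type *expected,
                                     type desired, bool weak,
                                     int success_memorder,
                                     int failure_memorder);

   /* On failure the current contents of *ptr are written back into
      *expected, which is why "expected" ends up in addressable memory
      (a stack slot) rather than staying in a register.  */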

I suggested a specialized GIMPLE_ATOMIC opcode instead of doing a
totally generalized GIMPLE_ASSIGN_N, returning N values.


r~
Andrew MacLeod April 4, 2012, 1:28 p.m. UTC | #3
On 04/04/2012 04:45 AM, Richard Guenther wrote:
>
> The fact that you need to touch every place that wants to look at memory
> accesses shows that you are doing it wrong.  Instead my plan was to
> force _all_ memory accesses to GIMPLE_ASSIGNs (yes, including those
> we have now in calls).  You're making a backwards step in my eyes.
I'm not sure I understand what you are saying, or at least I don't know 
what this plan you are talking about is...  Are you saying that you are 
planning to change gimple so that none of the gimple statement types 
other than GIMPLE_ASSIGN ever see an ADDR_EXPR or memory reference?  
It seems like that change, when it happens, would simply affect 
GIMPLE_ATOMIC like all the other gimple classes.  And if it was done 
before I tried to merge the branch, it would fall on me to fix.  Right 
now, I'm just handling what the compiler sends my way...  A bunch of 
places need to understand a new gimple statement kind...
> What do you think is "easier" when you use a GIMPLE_ATOMIC
> (why do you need a fntype field?!  Should the type not be available
> via the operand types?!)

This is a WIP... that fntype field is there for simplicity.  And no: 
you can do a 1-byte atomic operation on a full-word object if you want 
by using __atomic_store_1 (), so you can't just look at the object.  We 
might be able to sort that type out eventually if all the casts are 
correct, but until everything is finished, this is safer.  I'm actually 
hoping eventually to not have a bunch of casts on the params; they are 
just there to get around the builtin's type-checking system.  We should 
be able to just take care of required promotions at expansion time and 
do type-checking during verification.

>
> Your tree-cfg.c parts need to be filled in.  They are the specification of
> GIMPLE_ATOMIC - at the moment you allow any garbage.

Well, of course... this isn't supposed to be a final patch; it's to get 
the core changes into a branch while I continue working on it.  There 
are a number of parts that aren't filled in or fleshed out yet.  Once 
it's all working and what is expected is well defined, then I'll fill 
in the verification stuff.

> Similar to how I dislike the choice of adding GIMPLE_TRANSACTION
> instead of using builtin functions I dislike this.
>
> I suppose you do not want to use builtins because for primitive types you
> end up with multiple statements for something "atomic"?
Builtins are just more awkward to work with, and don't support more 
than one result.
Compare-and-swap was the worst case: it has two results, and that does 
not map to a built-in function very well.  We struggled all last fall 
with how to do it efficiently, and eventually gave up.  Given:

   int p = 1;
   bool ret;
   ret = __atomic_compare_exchange_n (&flag2, &p, 0, 0,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
   return ret;

with GCC 4.7 we currently end up generating

   p = 1;
   ret_1 = __atomic_compare_exchange_4 (&flag2, &p, 0, 0, 5, 5);
   return ret_1;

Note this actually requires leaving a local (p) on the stack, and 
reduces the optimizations that can be performed on it, even though there 
isn't really any need.

By going to a gimple statement, we can expose both results properly, and 
this ends up generating

   (ret_3, cmpxchg.2_4) = atomic_compare_exchange_strong <&flag2, 1, 0,
                                                          SEQ_CST, SEQ_CST>
   return ret_3;

and during expansion to RTL, we can trivially see that cmpxchg.2_4 is 
not used and generate the really efficient compare-and-swap pattern 
which only produces a boolean result.  If only cmpxchg.2_4 were used, we 
can generate the C&S pattern which only returns the value.  Only if we 
see that both are actually used do we have to fall back to the much 
uglier pattern that produces both results.  Currently we always 
generate this pattern.
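
To illustrate the three cases (hypothetical source; the names are made up):

   /* Only the success flag is used: the boolean-result pattern is enough.  */
   if (__atomic_compare_exchange_n (&v, &expected, desired, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
     ...

   /* Only the old value (written back into 'expected') is used: the
      value-result pattern is enough.  */
   __atomic_compare_exchange_n (&v, &expected, desired, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
   use (expected);

   /* Both results are used: only here do we need the combined pattern.  */
   ok = __atomic_compare_exchange_n (&v, &expected, desired, 0,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
   use2 (ok, expected);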

Next, we have the C11 atomic type qualifier which needs to be 
implemented.  Every reference to such a variable is going to have to be 
expanded into one or more atomic operations of some sort.  Yes, I 
probably could do that by emitting built-in functions, but they are a 
bit more unwieldy; it's far simpler to just create gimple statements.

I discovered last fall, when I tried to translate one built-in function 
into another form, that dealing with parameters and all the call bits 
was a pain, especially when the library call I needed to emit had a 
different number of parameters than the built-in did.  A GIMPLE_ATOMIC 
statement makes all of this trivial.

I also hope that, when this is done, I can remove all the ugly built-in 
overload code that was created for __sync and continues to be used by 
__atomic.  This would clean up the places where we have to take func_n 
and turn it into a func_1 or func_2 or whatever.  We also had to bend 
over and issue a crapload of different casts early to squeeze all the 
parameters into the 'proper' form for the builtins.  This made it more 
awkward to dig down and find the things being operated on and manipulate 
them.  The type-checking code is not a thing of beauty either.  
Expansion of GIMPLE_ATOMIC should take care of cleaning all that up.

So, bottom line, a GIMPLE_ATOMIC statement is just an object that is 
much easier to work with.  It cleans up both initial creation and RTL 
generation, as well as being easier to manipulate.  It also encompasses 
an entire class of operations that are becoming more integral *if* we 
can make them efficient, and I hope to actually do some optimizations on 
them eventually.  I had a discussion last fall with Linus about what we 
needed to be able to do to them in order for the kernel to use __atomic 
instead of their home-rolled solutions.  Could I do everything with 
builtins?  Sure... it's just more awkward, and this approach seems 
cleaner to me.

I wasn't excited about creating a new gimple statement, but it seemed 
the best solution to my issues.  In the end, I think this works very 
cleanly.  I'm certainly open to better solutions.  If there is a plan to 
change gimple in some way that this doesn't work with, then it would be 
good to know what that plan is.

Andrew
Richard Biener April 4, 2012, 1:46 p.m. UTC | #4
On Wed, Apr 4, 2012 at 3:26 PM, Richard Henderson <rth@redhat.com> wrote:
> On 04/04/2012 04:45 AM, Richard Guenther wrote:
>> I suppose you do not want to use builtins because for primitive types you
>> end up with multiple statements for something "atomic"?
>
> The primary motivation is that builtins cannot return two values.
>
> Our current builtin returns one of the two values by reference, as
> one would do from plain old C.  Even though we drop the reference
> when we convert from gimple to rtl, this is not good enough to clean
> up the variable we forced to a stack slot.

If that is the only reason, you can return two values by using a complex
or vector type (that would be only an IL implementation detail as far
as I can see).
We use that trick to get sincos () "sane" in our IL as well.
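
For example, pairs of sin/cos calls are rewritten to use a single
complex-typed result (a rough sketch of the IL shape):

   s = sin (x);
   c = cos (x);

   /* becomes something like  */
   _tmp = __builtin_cexpi (x);
   c = REALPART_EXPR <_tmp>;
   s = IMAGPART_EXPR <_tmp>;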

Are there other reasons to go with a new GIMPLE code?

> I suggested a specialized GIMPLE_ATOMIC opcode instead of doing a
> totally generalized GIMPLE_ASSIGN_N, returning N values.

We already support multiple SSA defs, btw; there is just no operand slot
for them that is properly named or handled by the operand scanner.  Thus a
new GIMPLE_ASSIGN sub-code class would do, too (of course nobody
expects multiple DEFs here, so it would not be a very good idea to do
that, IMHO).

Richard.

>
> r~
Andrew MacLeod April 4, 2012, 1:58 p.m. UTC | #5
On 04/04/2012 09:28 AM, Andrew MacLeod wrote:
>
>
> I wasn't excited about creating a new gimple statement, but it seemed 
> the best solution to my issues. In the end, I think this works very 
> cleanly.  Im certainly open to better solutions. If there is a plan to 
> change gimple in some way that this doesnt work with, then it would be 
> good to know what that plan is.
>
Btw, I did start my prototyping of this by creating atomic tree codes 
for each of the atomic builtins rather than a gimple atomic, but found 
that did not integrate very well (I forget exactly what the issue was 
now... something to do with when I was trying to translate them from 
builtins to tree codes), so I evolved to a gimple statement, which gave 
me more control over things.

If gimple is going to change somehow in a way that will make this work 
better, I'm also fine with doing that.  I still have some of that code 
lying around.  Or I can go back and revisit it to remember exactly what 
the issue was.

Andrew
Richard Henderson April 4, 2012, 2:32 p.m. UTC | #7
On 04/04/2012 09:46 AM, Richard Guenther wrote:
> If that is the only reason you can return two values by using a complex
> or vector type (that would be only an IL implementation detail as far
> as I can see).
> We use that trick to get sincos () "sane" in our IL as well.

That would work if the two values were of the same type, as they
are with sincos.  In the case of compare_exchange, they are not.



r~
Richard Biener April 4, 2012, 2:33 p.m. UTC | #8
On Wed, Apr 4, 2012 at 3:28 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 04/04/2012 04:45 AM, Richard Guenther wrote:
>>
>>
>> The fact that you need to touch every place that wants to look at memory
>> accesses shows that you are doing it wrong.  Instead my plan was to
>> force _all_ memory accesses to GIMPLE_ASSIGNs (yes, including those
>> we have now in calls).  You're making a backwards step in my eyes.
>
> I'm not sure I understand what you are saying, or at least I don't know what
> this plan you are talking about is...   Are you saying that you are planning
> to change gimple so that none of the gimple statement types other than
> GIMPLE_ASSIGN ever see an ADDR_EXPR or memory reference?

A memory reference, yes.  And at most one, thus no aggregate copies
anymore.

>     Seems like that
> change, when it happens, would simply affect GIMPLE_ATOMIC like all the
> other gimple classes.  And if it was done before I tried to merge the
> branch, would fall on me to fix.  Right now, I'm just handling what the
> compiler sends my way...  A bunch of places need to understand a new
> gimple_statement kind...

I'm not sure if I will ever end up finishing the above; I just wanted
to mention it.

>> What do you think is "easier" when you use a GIMPLE_ATOMIC
>> (why do you need a fntype field?!  Should the type not be available
>> via the operand types?!)
>
>
> This is a WIP... that fntype fields is there for simplicity..   and no...
> you can do a 1 byte atomic operation on a full word object if you want by
> using __atomic_store_1 ()... so you can't just look at the object. We might
> be able to sort that type out eventually if all the casts are correct, but
> until everything is finished, this is safer.  I'm actually hoping eventually
> to not have a bunch of casts on the params, they are just there to get
> around the builtin's type-checking system.. we should be able to  just take
> care of required promotions at expansion time and do type-checking during
> verification.

Oh, so you rather need a size or a mode specified, not a "fntype"?

>
>
>>
>> Your tree-cfg.c parts need to be filled in.  They are the specification of
>> GIMPLE_ATOMIC - at the moment you allow any garbage.
>
>
> well of course.... this isnt suppose to be a final patch, its to get the
> core changes into a branch while I continue working on it.  There are a
> number of parts that aren't filled in or flushed out yet.   Once its all
> working and what is expected is well defined, then I'll fill in the
> verification stuff.
>
>
>> Similar to how I dislike the choice of adding GIMPLE_TRANSACTION
>> instead of using builtin functions I dislike this.
>>
>> I suppose you do not want to use builtins because for primitive types you
>> end up with multiple statements for something "atomic"?
>
> builtins are just more awkward to work with, and don't support more than 1
> result.
> compare_and swap was the worst case.. it has 2 results and that does not map
> to a built in function very well. we struggled all last fall with how to do
> it efficiently, and eventually gave up. given:
>
>  int p = 1;
>  bool ret;
>  ret = __atomic_compare_exchange_n (&flag2, &p, 0, 0, __ATOMIC_SEQ_CST,
> __ATOMIC_SEQ_CST);
>  return ret;
>
> with GCC 4.7 we currently end up generating
>
>  p = 1;
>  ret_1 = __atomic_compare_exchange_4 (&flag2, &p, 0, 0, 5, 5);
>  return ret_1;
>
> Note this actually requires leaving a local (p) on the stack, and reduces
> the optimizations that can be performed on it, even though there isn't
> really a need.

You could use a vector, complex or aggregate return.

> By going to a gimple statement, we can expose both results properly, and
> this ends up generating
>
>  (ret_3, cmpxchg.2_4) = atomic_compare_exchange_strong <&flag2, 1, 0,
> SEQ_CST, SEQ_CST>
>  return ret_3;

In the example you only ever use address operands (not memory operands)
to the GIMPLE_ATOMIC - is that true in all cases?  Is the result always
non-memory?

I suppose the GIMPLE_ATOMICs are still optimization barriers for all
memory, not just that possibly referenced by them?

> and during expansion to RTL, can trivially see that cmpxchg.2_4 is not used,
> and generate the really efficient compare and swap pattern which only
> produces a boolean result.

I suppose gimple stmt folding could transform it as well?

>   if only cmpxchg.2_4 were used, we can generate
> the C&S pattern which only returns the result.  Only if we see both are
> actually used do we have to fall back to the much uglier pattern we have
> that produces both results.  Currently we always generate this pattern.
>
> Next, we have the C11 atomic type qualifier which needs to be implemented.
>  Every reference to this variable is going to have to be expanded into one
> or more atomic operations of some sort.  Yes, I probably could do that by
> emitting built-in functions, but they are a bit more unwieldy, its far
> simpler to just create gimple_statements.

As I understand it, you first generate builtins anyway and then lower them?
Or are you planning on emitting those for GENERIC as well?  Remember
GENERIC is not GIMPLE, so you'd need new tree codes anyway ;)
Or do you plan to make __atomic an integral part of GENERIC and thus
do this lowering during gimplification?

> I discovered last fall that when I tried to translate one built-in function
> into another form that dealing with parameters and all the call bits was a
> pain.  Especially when the library call I need to emit had a different
> number of parameters than the built-in did.   A GIMPLE_ATOMIC statement
> makes all of this trivial.

Trivial?  Surely only as trivial as providing functions that do the task for
atomics - something which you can do for re-writing atomic builtins as well ...

> I also hope that when done, I can also remove all the ugly built-in overload
> code that was created for __sync and continues to be used by __atomic.

But the builtins will stay for our users' consumption and libstdc++ use, no?

> This
> would clean up where we have to take func_n and turn it into a func_1 or
> func_2 or whatever.  We also had to bend over and issue a crap load of
> different casts early to squeeze all the parameters into the 'proper' form
> for the builtins. This made it more awkward to dig down and find the things
> being operated on and manipulate them. The type-checking code is not a thing
> of beauty either.   Expansion of GIMPLE_ATOMIC should take care of cleaning
> all that up.

You have internal functions now that are "untyped"; I suppose atomics would
map to those more easily given they are never emitted as calls but always
explicitly expanded.

> So bottom line, a GIMPLE_ATOMIC statement is just an object that is much
> easier to work with.

Yes, I see that it is easier to work with for you.  All other statements will
see GIMPLE_ATOMICs as blockers for their work though, even if they
already deal with calls just fine - that's why I usually suggest
using builtins (or internal fns) to do things (I bet you forgot to update
enough predicates ...).  Can GIMPLE_ATOMICs throw with -fnon-call-exceptions?
I suppose yes.  One thing you missed at least ;)

>  It cleans up both initial creation and rtl generation,
> as well as being easier to manipulate. It also encompasses an entire class
> of operations that are becoming more integral *if* we can make them
> efficient, and I hope to actually do some optimizations on them eventually.
>  I had a discussion last fall with Linus about what we needed to be able to
> do to them in order for the kernel to use __atomic instead of their
> home-rolled solutions.  Could I do everything with builtins? sure... its
> just more awkward and this approach seems cleaner to me.

Cleaner if you look at it in isolation - messy if you consider that it is
not only code working with atomics that has to (not) deal with this new
stmt kind.

Richard.

> I wasn't excited about creating a new gimple statement, but it seemed the
> best solution to my issues. In the end, I think this works very cleanly.  Im
> certainly open to better solutions. If there is a plan to change gimple in
> some way that this doesnt work with, then it would be good to know what that
> plan is.
>
> Andrew
>
>
>
>
Richard Biener April 4, 2012, 2:34 p.m. UTC | #9
On Wed, Apr 4, 2012 at 4:32 PM, Richard Henderson <rth@redhat.com> wrote:
> On 04/04/2012 09:46 AM, Richard Guenther wrote:
>> If that is the only reason you can return two values by using a complex
>> or vector type (that would be only an IL implementation detail as far
>> as I can see).
>> We use that trick to get sincos () "sane" in our IL as well.
>
> That would work if the two values were of the same type, as they
> are with sincos.  In the case of compare_exchange, they are not.

You can return an aggregate then (or adjust the IL so that they do have the
same type and only fix that up during expansion).

Richard.

>
>
> r~
Andrew MacLeod April 4, 2012, 3:50 p.m. UTC | #10
On 04/04/2012 10:33 AM, Richard Guenther wrote:
> On Wed, Apr 4, 2012 at 3:28 PM, Andrew MacLeod<amacleod@redhat.com>  wrote:
> This is a WIP... that fntype fields is there for simplicity..   and no...
> you can do a 1 byte atomic operation on a full word object if you want by
>
> Oh, so you rather need a size or a mode specified, not a "fntype"?

Yes, poorly named perhaps as I created things... it's just a type node at 
the moment that indicates the size being operated on, which I collected 
from the built-in function.

>
> In the example you only ever use address operands (not memory operands)
> to the GIMPLE_ATOMIC - is that true in all cases?  Is the result always
> non-memory?
The atomic address can be any arbitrary memory location... I haven't 
gotten to that yet.  It's commonly just an address, so I'm working with 
that first as a proof of concept.  When it gets something else it'll 
trap and I'll know :-)

Results are always non-memory, other than the side effects of the atomic 
contents changing and having to update the second parameter to the 
compare_exchange routine.  The generic routines for arbitrary structures 
(not added in yet) actually just work with blocks of memory, but they 
are all handled by addresses and the functions themselves are typically 
void.  I was planning on folding them right into the existing 
atomic_kinds as well... I can recognize from the type that it won't map 
to an integral type.  I needed separate builtins in 4.7 for them since 
the parameter list was different.
> I suppose the GIMPLE_ATOMICs are still optimization barriers for all
> memory, not just that possibly referenced by them?

Yes, depending on the memory model used.  It can force synchronization 
with other CPUs/threads, which will have the appearance of changing any 
shared memory location.  Various guarantees are made about whether those 
changes are visible to this thread after an atomic operation, so we 
can't reuse shared values in those cases.  Various guarantees are also 
made about which of this thread's changes are visible to other 
CPUs/threads at an atomic call, so that precludes moving stores downward 
in some models.
>
>> and during expansion to RTL, can trivially see that cmpxchg.2_4 is not used,
>> and generate the really efficient compare and swap pattern which only
>> produces a boolean result.
> I suppose gimple stmt folding could transform it as well?
It could, if I provided gimple statements for the three different forms 
of C&S.  I was planning to just leave it this way, since it's the 
interface being forced by C++11 as well as C11, and then just emit the 
appropriate RTL for this one C&S type.  The RTL patterns are already 
defined for the two easy cases for the __sync routines; the third one 
was added for __atomic.  It's possible that the process of integrating 
the __sync routines with GIMPLE_ATOMIC will indicate it's better to add 
those forms as atomic_kinds, and then gimple_fold_stmt could take care 
of it as well.  Maybe that is just a good idea anyway...  I'll keep it 
in mind.

>
>>    if only cmpxchg.2_4 were used, we can generate
>> the C&S pattern which only returns the result.  Only if we see both are
>> actually used do we have to fall back to the much uglier pattern we have
>> that produces both results.  Currently we always generate this pattern.
>>
>> Next, we have the C11 atomic type qualifier which needs to be implemented.
>>   Every reference to this variable is going to have to be expanded into one
>> or more atomic operations of some sort.  Yes, I probably could do that by
>> emitting built-in functions, but they are a bit more unwieldy, its far
>> simpler to just create gimple_statements.
> As I understand you first generate builtins anyway and then lower them?
> Or are you planning on emitting those for GENERIC as well?  Remember
> GENERIC is not GIMPLE, so you'd need new tree codes anyway ;)
> Or do you plan to make __atomic integral part of GENERIC and thus
> do this lowering during gimplification?
I was actually thinking about doing it during gimplification... I hadn't 
gotten as far as figuring out what to do with the functions from the 
front end yet.  I don't know that code well, but I was in fact hoping 
there was a way to 'recognize' the function names easily and avoid 
built-in functions completely...

The C parser is going to have to understand the set of C11 routine names 
for all of these anyway... I figured there was something in there that 
could be done.


>> I also hope that when done, I can also remove all the ugly built-in overload
>> code that was created for __sync and continues to be used by __atomic.
> But the builtins will stay for our users consumption and libstdc++ use, no?

Well, the names must remain exposed and recognizable since they are 'out 
there'.  Maybe under the covers I can just leave them as normal calls 
and then during gimplification simply recognize the names and generate 
GIMPLE_ATOMIC statements directly from the CALL_EXPR.  That would be 
ideal.  That way there would be no builtins any more.


>> So bottom line, a GIMPLE_ATOMIC statement is just an object that is much
>> easier to work with.
> Yes, I see that it is easier to work with for you.  All other statements will
> see GIMPLE_ATOMICs as blockers for their work though, even if they
> already deal with calls just fine - that's why I most of the time suggest
> to use builtins (or internal fns) to do things (I bet you forgot to update
> enough predicates ...).  Can GIMPLE_ATOMICs throw with -fnon-call-exceptions?
> I suppose yes.  One thing you missed at least ;)

Not that I am aware of; they are 'noexcept'.  But I'm sure I've missed 
more than a few things so far.  I'm pretty early in the process :-)
>
>>   It cleans up both initial creation and rtl generation,
>> as well as being easier to manipulate. It also encompasses an entire class
>> of operations that are becoming more integral *if* we can make them
>> efficient, and I hope to actually do some optimizations on them eventually.
>>   I had a discussion last fall with Linus about what we needed to be able to
>> do to them in order for the kernel to use __atomic instead of their
>> home-rolled solutions.  Could I do everything with builtins? sure... its
>> just more awkward and this approach seems cleaner to me.
> Cleaner if you look at it in isolation - messy if you consider that not only
> things working with atomics need to (not) deal with these new stmt kind.

They can affect shared memory in some ways like a call, but don't have 
many of the other attributes of a call.  They are really more like an 
assignment or other operation with arbitrary shared memory side 
effects.  I do hope to be able to teach the optimizers the 
directionality of the memory model restrictions.  I.e., ACQUIRE is only 
a barrier to hoisting shared memory code; stores can be moved downward 
past this mode.  RELEASE is only a barrier to sinking code.  RELAXED is 
no barrier at all to code motion.  In fact, a relaxed store is barely 
different from a real store... but there is a slight difference, so we 
can't make it a normal store :-P.
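
As a small (hypothetical) example of the kind of motion each model would 
still allow:

   /* RELEASE store: the store to 'data' may not sink below it, but
      unrelated accesses after it may be hoisted above it.  */
   data = 42;
   __atomic_store_n (&flag, 1, __ATOMIC_RELEASE);

   /* ACQUIRE load: the read of 'data' may not be hoisted above it, but
      unrelated stores before it may sink below it.  */
   while (!__atomic_load_n (&flag, __ATOMIC_ACQUIRE))
     ;
   tmp = data;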

By teaching the other parts of the compiler about GIMPLE_ATOMIC, we 
could hopefully lessen their impact eventually.

Andrew
Richard Biener April 5, 2012, 9:14 a.m. UTC | #11
On Wed, Apr 4, 2012 at 5:50 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 04/04/2012 10:33 AM, Richard Guenther wrote:
>>
>> On Wed, Apr 4, 2012 at 3:28 PM, Andrew MacLeod<amacleod@redhat.com>
>>  wrote:
>> This is a WIP... that fntype fields is there for simplicity..   and no...
>> you can do a 1 byte atomic operation on a full word object if you want by
>>
>> Oh, so you rather need a size or a mode specified, not a "fntype"?
>
>
> yes, poorly named perhaps as I created things... its just a type node at the
> moment that indicates the size being operated on that I collected from the
> builtin in function.

Ok.  Remember that you should use non-tree things if you can in GIMPLE
land.  This probably means that both the size and the memmodel "operands"
should be

+ struct GTY(()) gimple_statement_atomic
+ {
+   /* [ WORD 1-8 ]  */
+   struct gimple_statement_with_memory_ops_base membase;
+
+   /* [ WORD 9 ] */
+   enum gimple_atomic_kind kind;
+
     enum gimple_atomic_memmodel memmodel;

     unsigned size;

and not be trees in the ops array.  Even more, both kind and memmodel
should probably go in membase.base.subcode.

>> In the example you only ever use address operands (not memory operands)
>> to the GIMPLE_ATOMIC - is that true in all cases?  Is the result always
>> non-memory?
>
> The atomic address can be any arbitrary memory location... I haven't gotten
> to that yet.  its commonly just an address so I'm working with that first as
> proof of concept. When it gets something else it'll trap and I'll know :-)

Ok.  Please try to avoid memory operands and stick to address operands ;)
You can make tree-ssa-operands.c not consider ADDR_EXPRs in the
address operands, similar to the ADDR_EXPRs in MEM_REF operand zero.

> Results are always non-memory, other than the side effects of the atomic
> contents changing and having to up date the second parameter to the
> compare_exchange routine.  The generic routines for arbitary structures (not
> added in yet), actually just work with blocks of memory, but they are all
> handled by addresses and the functions themselves are typically void.  I was
> planning on folding them right into the existing atomic_kinds as well... I
> can recognize from the type that it wont map to a integral type.  I needed
> separate builtins in 4.7  for them since the parameter list was different.
>
>> I suppose the GIMPLE_ATOMICs are still optimization barriers for all
>> memory, not just that possibly referenced by them?
>
>
> yes, depending on the memory model used.  It can force synchronization with
> other CPUs/threads which will have the appearence of changing any shared
> memory location.  Various guarantees are made about whether those changes
> are visible to this thread after an atomic operation so we can't reuse
> shared values in those cases.  Various guarantees are made about what
> changes this thread has made are visible to other CPUs/threads at an atomic
> call as well, so that precludes moving stores downward in some models.
>
>>
>>> and during expansion to RTL, can trivially see that cmpxchg.2_4 is not
>>> used,
>>> and generate the really efficient compare and swap pattern which only
>>> produces a boolean result.
>>
>> I suppose gimple stmt folding could transform it as well?
>
> it could if I provided gimple statements for the 3 different forms of C&S. I
> was planning to just leave it this way since its the interface being forced
> by C++11 as well as C11... and then just emit the appropriate RTL for this
> one C&S type.  The RTL patterns are already defined for the 2 easy cases for
> the __sync routines. the third one was added for __atomic.  Its possible
> that the process of integrating the __sync routines with GIMPLE_ATOMIC will
> indicate its better to add those forms as atomic_kinds and then
> gimple_fold_stmt could take care of it as well.   Maybe that is just a good
> idea anyway...  I'll keep it in mind.
>
>
>>
>>>   if only cmpxchg.2_4 were used, we can generate
>>> the C&S pattern which only returns the result.  Only if we see both are
>>> actually used do we have to fall back to the much uglier pattern we have
>>> that produces both results.  Currently we always generate this pattern.
>>>
>>> Next, we have the C11 atomic type qualifier which needs to be
>>> implemented.
>>>  Every reference to this variable is going to have to be expanded into
>>> one
>>> or more atomic operations of some sort.  Yes, I probably could do that by
>>> emitting built-in functions, but they are a bit more unwieldy, its far
>>> simpler to just create gimple_statements.
>>
>> As I understand you first generate builtins anyway and then lower them?
>> Or are you planning on emitting those for GENERIC as well?  Remember
>> GENERIC is not GIMPLE, so you'd need new tree codes anyway ;)
>> Or do you plan to make __atomic integral part of GENERIC and thus
>> do this lowering during gimplification?
>
> I was actually thinking about doing it during gimplification... I hadnt
> gotten as far as figuring out what to do with the functions from the front
> end yet.  I dont know that code well, but I was in fact hoping there was a
> way to 'recognize' the function names easily and avoid built in functions
> completely...

Heh ... you'd still need a GENERIC representation then.  Possibly
an ATOMIC_OP tree may do.

> The C parser is going to have to understand the set of C11 routine names for
> all these anyway.. I figured there was something in there that could be
> done.
>
>
>
>>> I also hope that when done, I can also remove all the ugly built-in
>>> overload
>>> code that was created for __sync and continues to be used by __atomic.
>>
>> But the builtins will stay for our users consumption and libstdc++ use,
>> no?
>
>
> well, the names must remain exposed and recognizable since they are 'out
> there'.  Maybe under the covers I can just leave them as normal calls and
> then during gimplification simply recognize the names and generate
> GIMPLE_ATOMIC statements directly from the CALL_EXPR.  That would be ideal.
>  That way there are no builtins any more.

I suppose my question was whether the frontends need to do something about
the __atomic keyword or if that is simply translated to some type flag - or
does that keyword apply to operations, not to objects or types?

>
>
>>> So bottom line, a GIMPLE_ATOMIC statement is just an object that is much
>>> easier to work with.
>>
>> Yes, I see that it is easier to work with for you.  All other statements
>> will
>> see GIMPLE_ATOMICs as blockers for their work though, even if they
>> already deal with calls just fine - that's why I most of the time suggest
>> to use builtins (or internal fns) to do things (I bet you forgot to update
>> enough predicates ...).  Can GIMPLE_ATOMICs throw with
>> -fnon-call-exceptions?
>> I suppose yes.  One thing you missed at least ;)
>
>
> Not that I am aware of, they are 'noexcept'.  But I'm sure I've missed more
> than a few things so far.  Im pretty early in the process :-)

What about compare-exchange on a pointer dereference? The pointer
dereference surely can trap, so it can throw with -fnon-call-exceptions.  No?

>>
>>>  It cleans up both initial creation and rtl generation,
>>> as well as being easier to manipulate. It also encompasses an entire
>>> class
>>> of operations that are becoming more integral *if* we can make them
>>> efficient, and I hope to actually do some optimizations on them
>>> eventually.
>>>  I had a discussion last fall with Linus about what we needed to be able
>>> to
>>> do to them in order for the kernel to use __atomic instead of their
>>> home-rolled solutions.  Could I do everything with builtins? sure... its
>>> just more awkward and this approach seems cleaner to me.
>>
>> Cleaner if you look at it in isolation - messy if you consider that not
>> only
>> things working with atomics need to (not) deal with these new stmt kind.
>
>
> They can affect shared memory in some ways like a call, but don't have many
> of the other attributes of call.  They are really more like an assignment or
> other operation with arbitrary shared memory side effects.  I do hope to be
> able to teach the optimizers the directionality of the memory model
> restrictions.  ie, ACQUIRE is only a barrier to hoisting shared memory
> code...  stores can be moved downward past this mode. RELEASE is only a
> barrier to sinking code.   RELAXED is no barrier at all to code motion.  In
> fact, a relaxed store is barely different than a real store... but there is
> a slight difference so we can't make it a normal store :-P.
>
> By teaching the other parts of the compiler about a GIMPLE_ ATOMIC, we could
> hopefully lessen their impact eventually.

Ok.  I suppose having a GIMPLE_ATOMIC is fine then.

+ /* In tree-atomic.c.  */
+ extern bool expand_gimple_atomic_load (gimple);

err, gimple-atomic.c please ;)

+ /* Return the expression field of atomic operation GS.  */
+
+ static inline tree
+ gimple_atomic_expr (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_expr (gs));
+   return gimple_op (gs, 2);
+ }

err - what's "expression" in this context?  I hope it's not an arbitrary
tcc_expression tree?!

+ static inline bool
+ gimple_atomic_has_fail_order (const_gimple gs)
+ {
+   return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE;
+ }

btw, these kinds of predicates look superfluous to me - if they are true
exactly for one atomic kind then users should use the predicate to
test for that specific atomic kind, not for some random field presence.

+ /* Return the arithmetic operation tree code for atomic operation GS.  */
+
+ static inline enum tree_code
+ gimple_atomic_op_code (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FETCH_OP ||
+             gimple_atomic_kind (gs) == GIMPLE_ATOMIC_OP_FETCH);
+   return (enum tree_code) gs->gsbase.subcode;
+ }

now, what was it with this "expression" thing again?  Btw, ||s go to the
next line.  subcode should be the atomic kind - it seems that you glob
too much into GIMPLE_ATOMIC and that you should maybe sub-class
GIMPLE_ATOMIC properly via the subcode field.  You'd have a
gimple_atomic_base which fences could use, for example.

All asserts in inline functions should be gcc_checking_asserts.

+ gimple
+ gimple_build_atomic_load (tree type, tree target, tree order)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 3);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_LOAD);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_type (s, type);
+   gimple_set_has_volatile_ops (s, true);

you should have a gimple_build_atomic_raw function that takes all
operands and the atomic kind, avoiding the need for all the repeated
calls of gimple_atomic_set_* as well as avoiding all the repeated
checking this causes.
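
Something along these lines (just a sketch; the operand layout and
builder internals here are assumed, not taken from the patch):

   /* Sketch: build an atomic statement in one go, setting the operands
      directly instead of going through the checked setters.  */
   static gimple
   gimple_build_atomic_raw (enum gimple_atomic_kind kind, tree type,
                            tree target, tree order)
   {
     gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 3);
     gimple_atomic_set_kind (s, kind);
     gimple_set_op (s, 0, type);
     gimple_set_op (s, 1, target);
     gimple_set_op (s, 2, order);
     gimple_set_has_volatile_ops (s, true);
     return s;
   }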

+   else if (is_gimple_atomic (stmt))
+     {
+       tree t;
+       if (visit_store)
+         {
+         for (i = 0; i < gimple_atomic_num_lhs (stmt); i++)
+           {
+             t = gimple_atomic_lhs (stmt, i);
+             if (t)
+               {
+                 t = get_base_loadstore (t);
+                 if (t)
+                   ret |= visit_store (stmt, t, data);
+               }

I thought results are always registers?  The walk_stmt_load_store_addr_ops
change looks wrong to me.  As the loads/stores are implicit (you only have
addresses) you have to adjust all callers of walk_stmt_load_store_addr_ops
to handle atomics specially, as they expect to see all loads/stores that
way.  Those also handle calls and asms (for "memory" clobber) specially.


+     case GIMPLE_ATOMIC:
+       /* Atomic operations are memory barriers in both directions for now.  */
+       add_virtual_operand (stmt, opf_def | opf_use);

Surely the relevant point is not that they are barriers "in both directions
for now", but that "Atomic operations have side-effects on memory."

+       for (n = 0; n < gimple_atomic_num_lhs (stmt); n++)
+       get_expr_operands (stmt, gimple_atomic_lhs_ptr (stmt, n), opf_def);
+       for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+       get_expr_operands (stmt, gimple_op_ptr (stmt, n), opf_use);
+       break;

Do address-takens in operands make the addresses escape?  If not
you should pass opf_non_addressable as well.

Index: tree-ssa-alias.c
===================================================================
*** tree-ssa-alias.c    (revision 186098)
--- tree-ssa-alias.c    (working copy)
*************** ref_maybe_used_by_stmt_p (gimple stmt, t
*** 1440,1445 ****
--- 1440,1447 ----
      }
    else if (is_gimple_call (stmt))
      return ref_maybe_used_by_call_p (stmt, ref);
+   else if (is_gimple_atomic (stmt))
+     return true;

please add a comment before these atomic handlings that we assume
they are using/clobbering all refs because they are considered memory
optimization barriers.  Btw, does this apply to non-address-taken automatic
references?  I suppose not.  Consider:

int foo()
{
  struct { int a; } s;
  atomic_fence();
  s.a = 1;
  atomic_fence();
  return s.a;
}

we still should be able to optimize this to return 1, no?  At least SRA will
happily do similar things in a non-flow-sensitive way.  Please add a FIXME
to the alias predicates at least, or even better fix this missed optimization.
There is no need to establish "backwards missed-optimization compatibility"
just like we do for asms.

*************** stmt_kills_ref_p_1 (gimple stmt, ao_ref
*** 1814,1819 ****
--- 1818,1825 ----
        }
      }

+   if (is_gimple_atomic (stmt))
+     return true;

that's for sure wrong ;)  It should return false.

Index: tree-ssa-sink.c
===================================================================
*** tree-ssa-sink.c     (revision 186098)
--- tree-ssa-sink.c     (working copy)
*************** is_hidden_global_store (gimple stmt)
*** 145,150 ****
--- 145,154 ----
      {
        tree lhs;

+       /* Don't optimize across an atomic operation.  */
+       if (is_gimple_atomic (stmt))
+         return true;
+

that's bogus, too (really all uses of is_hidden_global_store should go away).
Please look into the few callers of this function and handle atomics in
a correct way explicitly.

+         else if (is_gimple_atomic (stmt))
+           {
+             unsigned n;
+
+             /* We may be able to lessen this with more relaxed memory
+                models, but for now, its a full barrier.  */
+             mark_all_reaching_defs_necessary (stmt);
+
+             for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+               {
+                 tree t = gimple_op (stmt, n);
+                 if (TREE_CODE (t) != SSA_NAME &&
+                     TREE_CODE (t) != INTEGER_CST &&
+                     !is_gimple_min_invariant (t) &&
+                     !ref_may_be_aliased (t))
+                   mark_aliased_reaching_defs_necessary (stmt, t);
+               }
+           }

for sure atomics do not make non-aliased automatic variable stores necessary.
At least I hope so.  As there are only addresses in the atomic ops the
code looks wrong to me (and &&s go to the next line ...).  As you are
marking everything needed anyway you can just remove the bogus loop
completely.

+     case GIMPLE_ATOMIC:
+       /* Treat this like a call for now, it may expand into a call.  */
+       if (gimple_atomic_kind (stmt) != GIMPLE_ATOMIC_FENCE)
+       cost = gimple_num_ops (stmt) *
+              estimate_move_cost (TREE_TYPE (gimple_atomic_target (stmt)));
+       else
+         cost = 1;
+       break;

for sure this counts way too many "ops".  There is at most a single
memory operand as far as I understand.  A size/speed cost of 1 for
a FENCE is also too low.

Handling of GIMPLE_ATOMICs in tree-ssa-structalias.c is missing.

Richard.

> Andrew
>
Andrew MacLeod April 5, 2012, 11:58 a.m. UTC | #12
On 04/05/2012 05:14 AM, Richard Guenther wrote:
>
> Ok.  I suppose having a GIMPLE_ATOMIC is fine then.
>
Thanks for all the feedback; I haven't really had to play in gimple land 
much lately.  For some of those routines I guessed at what to do in order 
to get through initial compilation, and your comments help clear those up 
too :-)

I will address everything next week, since I doubt I will get to it 
before the long weekend arrives here.

Andrew
Richard Sandiford April 6, 2012, 8:13 a.m. UTC | #13
Richard Guenther <richard.guenther@gmail.com> writes:
>> They can affect shared memory in some ways like a call, but don't have many
>> of the other attributes of call.  They are really more like an assignment or
>> other operation with arbitrary shared memory side effects.  I do hope to be
>> able to teach the optimizers the directionality of the memory model
>> restrictions.  ie, ACQUIRE is only a barrier to hoisting shared memory
>> code...  stores can be moved downward past this mode. RELEASE is only a
>> barrier to sinking code.   RELAXED is no barrier at all to code motion.  In
>> fact, a relaxed store is barely different than a real store... but there is
>> a slight difference so we can't make it a normal store :-P.
>>
>> By teaching the other parts of the compiler about a GIMPLE_ ATOMIC, we could
>> hopefully lessen their impact eventually.
>
> Ok.  I suppose having a GIMPLE_ATOMIC is fine then.

Just for my own education really, but: does this mean that there'd
be unnecessary pessimisation in representing the thing as a call?
The interleaved load/store internal fns are really assignments too,
so if calls aren't right for that kind of operation, maybe we need
to replace the internal fns with something else.  Or at least come
up with some new call properties.

Which is a roundabout way of wondering what the main difficulties
would be in attaching things like directionality to a call.

Not arguing for anything here, just an onlooker wanting to understand. :-)

(BTW, it sounds like restricting memory accesses to GIMPLE_ASSIGN
might cause trouble for the interleave load/store stuff too.)

Richard
Richard Biener April 9, 2012, 10:50 a.m. UTC | #14
On Fri, Apr 6, 2012 at 10:13 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>>> They can affect shared memory in some ways like a call, but don't have many
>>> of the other attributes of call.  They are really more like an assignment or
>>> other operation with arbitrary shared memory side effects.  I do hope to be
>>> able to teach the optimizers the directionality of the memory model
>>> restrictions.  ie, ACQUIRE is only a barrier to hoisting shared memory
>>> code...  stores can be moved downward past this mode. RELEASE is only a
>>> barrier to sinking code.   RELAXED is no barrier at all to code motion.  In
>>> fact, a relaxed store is barely different than a real store... but there is
>>> a slight difference so we can't make it a normal store :-P.
>>>
>>> By teaching the other parts of the compiler about a GIMPLE_ ATOMIC, we could
>>> hopefully lessen their impact eventually.
>>
>> Ok.  I suppose having a GIMPLE_ATOMIC is fine then.
>
> Just for my own education really, but: does this mean that there'd
> be unnecessary pessimisation in representing the thing as a call?

No, there are not.  They are more pessimized than GIMPLE_ASSIGNs
(unless you handle them specially in a few places).  But the same is
true for GIMPLE_ATOMIC.  The question is one of convenience as
far as I understand.  In general I would like to avoid new GIMPLE codes,
especially "random" ones.  You can do everything with builtins or
internal functions just fine.

> The interleaved load/store internal fns are really assignments too,
> so if calls aren't right for that kind of operation, maybe we need
> to replace the internal fns with something else.  Or at least come
> up with some new call properties.

What missed optimizations do you see?

> Which is a roundabout way of wondering what the main difficulties
> would be in attaching things like directionality to a call.

Directionality?

> Not arguing for anything here, just an onlooker wanting to understand. :-)
>
> (BTW, it sounds like restricting memory accesses to GIMPLE_ASSIGN
> might cause trouble for the interleave load/store stuff too.)

Well.  In the end my plan was to have a GIMPLE_LOAD and GIMPLE_STORE
stmt, where the load would load to an SSA name and the store would store from
a constant or an SSA name.  Advantages would be simplified data-flow analysis
and things like aggregate copy propagation and value-numbering for free.  It
would also be easier to attach special data / properties to the now
single load / store
sites (where in calls you can have an arbitrary number of loads at
least).  Whatever
special property the interleaved load/store has would be stored there, too.  The
idea was to be able to fold most of the implicit information about
loads/stores that
are in the MEM_REF /COMPONENT_REF / ARRAY_REF trees into proper "gimple"
level information, like having an array of index, stride tuples for
(multidimensional)
array accesses.  Think of it like a place where we can properly embed the
MEM_ATTRs we have on the RTL side for example.  That's much easier if
there were loads/stores only in very defined places.
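
To make that concrete, a minimal C example (the dump syntax in the
comment is purely hypothetical, just to illustrate the index/stride idea):

/* A plain two-dimensional access; under such a scheme the gimple-level
   load might carry (index, stride) tuples instead of nested ARRAY_REFs,
   e.g. something like
     _1 = GIMPLE_LOAD <a, (i, 40), (j, 4)>
   -- hypothetical dump syntax, strides assuming a 4-byte int.  */
int
get_elem (int a[10][10], int i, int j)
{
  return a[i][j];
}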

Richard.
Richard Sandiford April 10, 2012, 1:14 p.m. UTC | #15
Richard Guenther <richard.guenther@gmail.com> writes:
> On Fri, Apr 6, 2012 at 10:13 AM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Richard Guenther <richard.guenther@gmail.com> writes:
>>>> They can affect shared memory in some ways like a call, but don't have many
>>>> of the other attributes of call.  They are really more like an assignment or
>>>> other operation with arbitrary shared memory side effects.  I do hope to be
>>>> able to teach the optimizers the directionality of the memory model
>>>> restrictions.  ie, ACQUIRE is only a barrier to hoisting shared memory
>>>> code...  stores can be moved downward past this mode. RELEASE is only a
>>>> barrier to sinking code.   RELAXED is no barrier at all to code motion.  In
>>>> fact, a relaxed store is barely different than a real store... but there is
>>>> a slight difference so we can't make it a normal store :-P.
>>>>
>>>> By teaching the other parts of the compiler about a GIMPLE_ ATOMIC, we could
>>>> hopefully lessen their impact eventually.
>>>
>>> Ok.  I suppose having a GIMPLE_ATOMIC is fine then.
>>
>> Just for my own education really, but: does this mean that there'd
>> be unnecessary pessimisation in representing the thing as a call?
>
> No, there are not.  They are more pessimized than GIMPLE_ASSIGNs
> (unless you handle them specially in a few places).  But the same is
> true for GIMPLE_ATOMIC.  The question is one of convenience as
> far as I understand.  In general I would like to avoid new GIMPLE codes,
> especially "random" ones.  You can do everything with builtins or
> internal functions just fine.
>
>> The interleaved load/store internal fns are really assignments too,
>> so if calls aren't right for that kind of operation, maybe we need
>> to replace the internal fns with something else.  Or at least come
>> up with some new call properties.
>
> What missed optimizations do you see?

None. :-)  But...

>> Which is a roundabout way of wondering what the main difficulties
>> would be in attaching things like directionality to a call.
>
> Directionality?

[See above.]

...I was asking in the context quoted above, which seemed to be the bit
that convinced you GIMPLE_ATOMIC would be OK after all.  And the two
main reasons in the bit quoted above seemed to be that GIMPLE_ATOMIC
was more like GIMPLE_ASSIGN (which is true of the current internal
fns too, and was why I was interested) and that we wanted to add the
directionality of the memory model (which seemed at face value like
something that could be attached to a call).

>> Not arguing for anything here, just an onlooker wanting to understand. :-)
>>
>> (BTW, it sounds like restricting memory accesses to GIMPLE_ASSIGN
>> might cause trouble for the interleave load/store stuff too.)
>
> Well.  In the end my plan was to have a GIMPLE_LOAD and GIMPLE_STORE
> stmt, where the load would load to an SSA name and the store would store from
> a constant or an SSA name.  Advantages would be simplified data-flow analysis
> and things like aggregate copy propagation and value-numbering for free.  It
> would also be easier to attach special data / properties to the now
> single load / store
> sites (where in calls you can have an arbitrary number of loads at
> least).  Whatever
> special property the interleaved load/store has would be stored there, too.  The
> idea was to be able to fold most of the implicit information about
> loads/stores that
> are in the MEM_REF /COMPONENT_REF / ARRAY_REF trees into proper "gimple"
> level information, like having an array of index, stride tuples for
> (multidimensional)
> array accesses.  Think of it like a place where we can properly embed the
> MEM_ATTRs we have on the RTL side for example.  That's much easier if
> there were loads/stores only in very defined places.

Ah, OK, so there'd still be a single gimple stmt (GIMPLE_LOAD or GIMPLE_STORE),
and that stmt would do the interleaving too?  Sounds good.  I was worried at
first that we'd have two separate stmts (e.g. a load and an interleave)
which was what the internal fns were supposed to avoid.

Thanks,
Richard
Richard Biener April 10, 2012, 2:39 p.m. UTC | #16
On Tue, Apr 10, 2012 at 3:14 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>> On Fri, Apr 6, 2012 at 10:13 AM, Richard Sandiford
>> <rdsandiford@googlemail.com> wrote:
>>> Richard Guenther <richard.guenther@gmail.com> writes:
>>>>> They can affect shared memory in some ways like a call, but don't have many
>>>>> of the other attributes of call.  They are really more like an assignment or
>>>>> other operation with arbitrary shared memory side effects.  I do hope to be
>>>>> able to teach the optimizers the directionality of the memory model
>>>>> restrictions.  ie, ACQUIRE is only a barrier to hoisting shared memory
>>>>> code...  stores can be moved downward past this mode. RELEASE is only a
>>>>> barrier to sinking code.   RELAXED is no barrier at all to code motion.  In
>>>>> fact, a relaxed store is barely different than a real store... but there is
>>>>> a slight difference so we can't make it a normal store :-P.
>>>>>
>>>>> By teaching the other parts of the compiler about a GIMPLE_ ATOMIC, we could
>>>>> hopefully lessen their impact eventually.
>>>>
>>>> Ok.  I suppose having a GIMPLE_ATOMIC is fine then.
>>>
>>> Just for my own education really, but: does this mean that there'd
>>> be unnecessary pessimisation in representing the thing as a call?
>>
>> No, there are not.  They are more pessimized than GIMPLE_ASSIGNs
>> (unless you handle them specially in a few places).  But the same is
>> true for GIMPLE_ATOMIC.  The question is one of convenience as
>> far as I understand.  In general I would like to avoid new GIMPLE codes,
>> especially "random" ones.  You can do everything with builtins or
>> internal functions just fine.
>>
>>> The interleaved load/store internal fns are really assignments too,
>>> so if calls aren't right for that kind of operation, maybe we need
>>> to replace the internal fns with something else.  Or at least come
>>> up with some new call properties.
>>
>> What missed optimizations do you see?
>
> None. :-)  But...
>
>>> Which is a roundabout way of wondering what the main difficulties
>>> would be in attaching things like directionality to a call.
>>
>> Directionality?
>
> [See above.]
>
> ...I was asking in the context quoted above, which seemed to be the bit
> that convinced you GIMPLE_ATOMIC would be OK after all.  And the two
> main reasons in the bit quoted above seemed to be that GIMPLE_ATOMIC
> was more like GIMPLE_ASSIGN (which is true of the current internal
> fns too, and was why I was interested) and that we wanted to add the
> directionality of the memory model (which seemed at face value like
> something that could be attached to a call).
>
>>> Not arguing for anything here, just an onlooker wanting to understand. :-)
>>>
>>> (BTW, it sounds like restricting memory accesses to GIMPLE_ASSIGN
>>> might cause trouble for the interleave load/store stuff too.)
>>
>> Well.  In the end my plan was to have a GIMPLE_LOAD and GIMPLE_STORE
>> stmt, where the load would load to an SSA name and the store would store from
>> a constant or an SSA name.  Advantages would be simplified data-flow analysis
>> and things like aggregate copy propagation and value-numbering for free.  It
>> would also be easier to attach special data / properties to the now
>> single load / store
>> sites (where in calls you can have an arbitrary number of loads at
>> least).  Whatever
>> special property the interleaved load/store has would be stored there, too.  The
>> idea was to be able to fold most of the implicit information about
>> loads/stores that
>> are in the MEM_REF /COMPONENT_REF / ARRAY_REF trees into proper "gimple"
>> level information, like having an array of index, stride tuples for
>> (multidimensional)
>> array accesses.  Think of it like a place where we can properly embed the
>> MEM_ATTRs we have on the RTL side for example.  That's much easier if
>> there were loads/stores only in very defined places.
>
> Ah, OK, so there'd still be a single gimple stmt (GIMPLE_LOAD or GIMPLE_STORE),
> and that stmt would do the interleaving too?

Well, whatever we'd like to add (part of the atomic stuff would fit here, too,
just not the operation part like increment).

Richard.

>  Sounds good.  I was worried at
> first that we'd have two separate stmts (e.g. a load and an interleave)
> which was what the internal fns were supposed to avoid.



> Thanks,
> Richard
Andrew MacLeod April 26, 2012, 5:53 p.m. UTC | #17
On 04/05/2012 05:14 AM, Richard Guenther wrote:
>
> Ok.  Remember that you should use non-tree things if you can in GIMPLE
> land.  This probably means that both the size and the memmodel "operands"
> should be
>
> + struct GTY(()) gimple_statement_atomic
> + {
> +   /* [ WORD 1-8 ]  */
> +   struct gimple_statement_with_memory_ops_base membase;
> +
> +   /* [ WORD 9 ] */
> +   enum gimple_atomic_kind kind;
> +
>       enum gimple_atomic_memmodel memmodel;
>
>       unsigned size;
>
> and not be trees in the ops array.  Even more, both kind and memmodel
> should probably go in membase.base.subcode

I'm using subcode now for the fetch_op and op_fetch operations when 
we actually need a tree code for plus, sub, and, etc...  so I am 
utilizing that field.  Since the 'kind' field is present in ALL nodes, and 
the tree code was only needed in some, it looked cleaner to utilize the 
subcode field for the 'sometimes' field so that it wouldn't be an obvious 
wasted field in all the non-fetch nodes.  In the end it amounts to the 
same thing, but it just looked cleaner :-)  I could change it if you felt 
strongly about it and use subcode for the kind and create a tree_code 
field in the object for the operation.

Since integral atomics are always of an unsigned type, I could switch 
over and use 'unsigned size' instead of 'tree fntype' for them (I will 
rename it), but then things may  be more complicated when dealing with 
generic atomics...  those can be structure or array types and I was 
planning to allow leaving the type in case I discover something useful I 
can do with it.  It may ultimately turn out that the real type isn't 
going to matter, in which case I will remove it and replace it with an 
unsigned int for size.

And the reason memmodel is a tree is that, as ridiculous as it seems, 
it can ultimately be a runtime value.  Even barring that, it shows up 
as a variable after inlining, before the various propagation engines run, 
especially in C++ templates.  So it must be a tree.
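
A minimal sketch of how that happens with the existing builtins (nothing
patch-specific here; the memory order is just another argument, so nothing
forces it to be a constant):

/* ORDER is a plain runtime parameter; until propagation turns it into a
   constant it is simply an SSA name, and if it never becomes one we fall
   back to SEQ_CST at expansion time.  */
int
load_with_order (int *p, int order)
{
  return __atomic_load_n (p, order);
}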

>> I was actually thinking about doing it during gimplification... I hadnt
>> gotten as far as figuring out what to do with the functions from the front
>> end yet.  I dont know that code well, but I was in fact hoping there was a
>> way to 'recognize' the function names easily and avoid built in functions
>> completely...
> Heh ... you'd still need a GENERIC representation then.  Possibly
> a ATOMIC_OP tree may do.
possibly...   or maybe a single generic atomic_builtin with a kind and a 
variable list of  parameters.

> well, the names must remain exposed and recognizable since they are 'out
> there'.  Maybe under the covers I can just leave them as normal calls and
> then during gimplification simply recognize the names and generate
> GIMPLE_ATOMIC statements directly from the CALL_EXPR.  That would be ideal.
>   That way there are no builtins any more.
> I suppose my question was whether the frontends need to do sth about the
> __atomic keyword or if that is simply translated to some type flag - or
> is that keyword applying to operations, not to objects or types?
>
The _Atomic keyword is a type modifier like const or volatile.  So 
during gimplification I'll also look for all occurrences of variables in 
normal expressions which have that bit set in the type, then translate 
the expression to utilize the new gimple atomic node.  So

  _Atomic int var = 0;
  var += 4;
  foo (var);

would become

  __atomic_add_fetch (&var, 4, SEQ_CST);
  D.123 = __atomic_load (&var, SEQ_CST);
  foo (D.123);


>>
>>>> So bottom line, a GIMPLE_ATOMIC statement is just an object that is much
>>>> easier to work with.
>>> Yes, I see that it is easier to work with for you.  All other statements
>>> will
>>> see GIMPLE_ATOMICs as blockers for their work though, even if they
>>> already deal with calls just fine - that's why I most of the time suggest
>>> to use builtins (or internal fns) to do things (I bet you forgot to update
>>> enough predicates ...).  Can GIMPLE_ATOMICs throw with
>>> -fnon-call-exceptions?
>>> I suppose yes.  One thing you missed at least ;)
>>
>> Not that I am aware of, they are 'noexcept'.  But I'm sure I've missed more
>> than a few things so far.  Im pretty early in the process :-)
> What about compare-exchange on a pointer dereference? The pointer
> dereference surely can trap, so it can throw with -fnon-call-exceptions.  No?
In theory *any* gimple atomic operation could end up being expanded as a 
library call, so I will have to ensure they are treated as calls because 
of things like that.
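
A minimal sketch of the case being raised, using the existing builtin:
with -fnon-call-exceptions the dereference of P implied by the operation
can trap, so the statement cannot be unconditionally marked nothrow.

#include <stdbool.h>

/* Compare-exchange through a possibly-bad pointer; the implied
   dereference of P is what can trap under -fnon-call-exceptions.  */
bool
try_claim (int *p)
{
  int expected = 0;
  return __atomic_compare_exchange_n (p, &expected, 1, false,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}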

These are all things I will focus on once all the basic functionality is 
there.  This patch is not meant to be fully fleshed out; I just wanted 
some eyes on it before I checked it into the branch so I don't carry this 
huge patch set around when making changes.  When I get things functional 
in the branch I'll revisit all this and any of the other implementation 
comments I don't get to, and then submit another patch for review.
>
> + /* Return the expression field of atomic operation GS.  */
> +
> + static inline tree
> + gimple_atomic_expr (const_gimple gs)
> + {
> +   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
> +   gcc_assert (gimple_atomic_has_expr (gs));
> +   return gimple_op (gs, 2);
> + }
>
> err - what's "expression" in this context?  I hope it's not an arbitrary
> tcc_expression tree?!

It's just the 'expression' parameter of atomic operations which have it, 
like store, fetch_add, or exchange.  It would normally be an SSA_NAME 
or a constant.  I suppose we could call it 'value' or something if 
'expression' is confusing.


>
> + /* Return the arithmetic operation tree code for atomic operation GS.  */
> +
> + static inline enum tree_code
> + gimple_atomic_op_code (const_gimple gs)
> + {
> +   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
> +   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FETCH_OP ||
> +             gimple_atomic_kind (gs) == GIMPLE_ATOMIC_OP_FETCH);
> +   return (enum tree_code) gs->gsbase.subcode;
> + }
>
> now, what was it with this "expression" thing again?  Btw, ||s go to the
> next line.  subcode should be the atomic kind - it seems that you glob
> too much into GIMPLE_ATOMIC and that you maybe should sub-class
> GIMPLE_ATOMIC properly via the subcode field.  You'd have an
> gimple_atomic_base which fences could use for example.

I was trying to make it simple and utilize the variable length ops array 
to handle the variable stuff :-) It took many attempts to arrive at this 
layout.

They don't really break down well into sub-components.  I wanted to use 
a single routine to access a given field for all atomic kinds, so 
gimple_atomic_target(), gimple_atomic_lhs(), and gimple_atomic_expr() 
just work.  The problem is that they don't break down into a tree 
structure, but into more of a multiple-inheritance type thing.

LOAD has a LHS and a TARGET, no EXPR
STORE has no LHS, but has a TARGET and EXPR
EXCHANGE has a LHS, a TARGET and an EXPR
FENCE has no target or anything else.
COMPARE_EXCHANGE has 2 LHS, a TARGET, and an EXPR, not to mention an 
additional memorder

I set it up so that the variable length array always stores a given 
parameter at the same offset, and also allows for contiguous trees in 
the array to represent all the LHS or RHS parameters for easy generic 
operand scanning or whatever.  And each statement type only allocates 
the right amount of memory required.

I planned to add a comment to the description showing the layout of the 
various nodes:
/*
   LOAD          | ORDER | TARGET | LHS  |
   STORE         | ORDER | TARGET | EXPR |
   EXCHANGE      | ORDER | TARGET | EXPR | LHS      |
   COMPARE_EXCHG | ORDER | TARGET | EXPR | EXPECTED | FAIL_ORDER | LHS2 | LHS1 |
   FETCH         | ORDER | TARGET | EXPR | LHS      |
   TEST_AND_SET  | ORDER | TARGET | LHS  |
   CLEAR         | ORDER | TARGET |
   FENCE         | ORDER |

   This allows all the RHS values to be contiguous for bulk processing like
   operands, and located at the same index for an easy access routine.  The
   LHS is viewed from right to left, and allows more than one value as
   required by compare_exchange.  The LHS slots are also contiguous for
   generic access, and a single LHS routine can be used to access those
   values.
*/

Is that OK?  It seemed better than trying to map it onto some sort of 
hierarchical structure it doesn't really fit into.

Andrew
Andrew MacLeod April 26, 2012, 8:07 p.m. UTC | #18
On 04/05/2012 05:14 AM, Richard Guenther wrote:
> + static inline bool
> + gimple_atomic_has_fail_order (const_gimple gs)
> + {
> +   return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE;
> + }
>
> btw, these kind of predicates look superfluous to me - if they are true
> exactly for one atomic kind then users should use the predicate to
> test for that specific atomic kind, not for some random field presence.

yeah, a previous incarnation artifact, removed.
>
> All asserts in inline functions should be gcc_checking_asserts.
indeed.
>
>
> you should have a gimple_build_atomic_raw function that takes all
> operands and the atomic kind, avoiding the need for all the repeated
> calls of gimple_atomic_set_* as well as avoid all the repeated checking
> this causes.
as well. done.

>
> +   else if (is_gimple_atomic (stmt))
> +     {
> +       tree t;
> +       if (visit_store)
> +         {
> +         for (i = 0; i<  gimple_atomic_num_lhs (stmt); i++)
> +           {
> +             t = gimple_atomic_lhs (stmt, i);
> +             if (t)
> +               {
> +                 t = get_base_loadstore (t);
> +                 if (t)
> +                   ret |= visit_store (stmt, t, data);
> +               }
>
> I thought results are always registers?  The walk_stmt_load_store_addr_ops
> looks wrong to me.  As the loads/stores are implicit (you only have addresses)
> you have to adjust all callers of walk_stmt_load_store_addr_ops to handle
> atomics specially as they expect to come along all loads/stores that way.
> Those also handle calls and asms (for "memory" clobber) specially.

Hmm, yeah, they always return a value.  I was just copying the 
gimple_call code...  Why would we need to do this processing for a 
GIMPLE_CALL lhs and not a GIMPLE_ATOMIC lhs?  And the RHS processing is 
the same as for an is_gimple_call as well...  it was lifted from the code 
immediately following...  I tried to write all this code to return the 
same values as would have been returned had it actually been a built-in 
__sync or __atomic call like we have today.  Once I had them all, we 
could then actually make any improvements based on our known side-effect 
limits.

I guess I don't understand the rest of the comment about why I need to 
do something different here than with a call...
> Index: tree-ssa-alias.c
> ===================================================================
> *** tree-ssa-alias.c    (revision 186098)
> --- tree-ssa-alias.c    (working copy)
> *************** ref_maybe_used_by_stmt_p (gimple stmt, t
> *** 1440,1445 ****
> --- 1440,1447 ----
>        }
>      else if (is_gimple_call (stmt))
>        return ref_maybe_used_by_call_p (stmt, ref);
> +   else if (is_gimple_atomic (stmt))
> +     return true;
>
> please add a comment before these atomic handlings that we assume
> they are using/clobbering all refs because they are considered memory
> optimization barriers.  Btw, does this apply to non-address-taken automatic
> references?  I suppose not.  Consider:
>
> int foo()
> {
>    struct s;
>    atomic_fence();
>    s.a = 1;
>    atomic_fence();
>    return s.a;
> }
>
> we still should be able to optimize this to return 1, no?  At least SRA will
Yes, locals can do anything they want since they aren't visible to other 
processes.  At the moment we'll leave those fences in because we don't 
optimize atomics at all, but "in the fullness of time" this will be 
optimized to:
int foo()
{
   atomic_fence()
   return 1;
}

at the moment we produce:

int foo()
{
   atomic_fence()
   atomic_fence()
   return 1;
}

>
> *************** stmt_kills_ref_p_1 (gimple stmt, ao_ref
> *** 1814,1819 ****
> --- 1818,1825 ----
>          }
>        }
>
> +   if (is_gimple_atomic (stmt))
> +     return true;
>
> that's for sure wrong ;)  It should return false.

err, yeah. oops.
>
> Index: tree-ssa-sink.c
> ===================================================================
> *** tree-ssa-sink.c     (revision 186098)
> --- tree-ssa-sink.c     (working copy)
> *************** is_hidden_global_store (gimple stmt)
> *** 145,150 ****
> --- 145,154 ----
>        {
>          tree lhs;
>
> +       /* Don't optimize across an atomic operation.  */
> +       if (is_gimple_atomic (stmt))
> +         return true;
> +
>
> that's bogus, too (really all uses of is_hidden_global_store should go away).
> Please look into the few callers of this function and  handle atomics in
> a correct way explicitely.
All atomic operations will have a VDEF so in theory it should be fine to 
ignore.  There are only 2 uses:
   DCE already has code added to handle them, and
   DSE handles it through ref_maybe_used_by_stmt_p.

removed.


>
> +         else if (is_gimple_atomic (stmt))
> +           {
> +             unsigned n;
> +
> +             /* We may be able to lessen this with more relaxed memory
> +                models, but for now, its a full barrier.  */
> +             mark_all_reaching_defs_necessary (stmt);
> +
> +             for (n = 0; n<  gimple_atomic_num_rhs (stmt); n++)
> +               {
> +                 tree t = gimple_op (stmt, n);
> +                 if (TREE_CODE (t) != SSA_NAME&&
> +                     TREE_CODE (t) != INTEGER_CST&&
> +                     !is_gimple_min_invariant (t)&&
> +                     !ref_may_be_aliased (t))
> +                   mark_aliased_reaching_defs_necessary (stmt, t);
> +               }
> +           }
>
> for sure atomics do not make non-aliased automatic variable stores necessary.
> At least I hope so.  As there are only addresses in the atomic ops the
> code looks wrong to me (and&&s go to the next line ...).  As you are
> marking everything needed anway you can just remove the bogus loop
> completely.
I was simply copying the code path that was followed for gimple_call, 
which handled the old __sync_ and __atomic builtins...  this is what 
that code did.  There are addresses in the atomic ops, but the 
operations behind those operands can dereference and store to or load 
from those addresses...?
i.e. the generic
   atomic_store (&atomic, &expr)
dereferences *expr and stores it into *atomic...

I figured whatever I can see as a function call argument to __atomic_* 
would also be visible in a GIMPLE_ATOMIC operand position...?
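
For example (a small sketch with the existing generic builtin; the struct
and names are made up): both arguments are only addresses at the call,
but the operation itself reads through one and writes through the other.

struct pair { long a, b; };

/* Generic atomic store: *VAL is read and *OBJ is written, even though the
   statement only mentions the two addresses.  */
void
store_pair (struct pair *obj, struct pair *val)
{
  __atomic_store (obj, val, __ATOMIC_SEQ_CST);
}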


> +     case GIMPLE_ATOMIC:
> +       /* Treat this like a call for now, it may expand into a call.  */
> +       if (gimple_atomic_kind (stmt) != GIMPLE_ATOMIC_FENCE)
> +       cost = gimple_num_ops (stmt) *
> +              estimate_move_cost (TREE_TYPE (gimple_atomic_target (stmt)));
> +       else
> +         cost = 1;
> +       break;
>
> for sure this counts way too many "ops".  There is at most a single
> memory operand as far as I understand.  A size/speed cost of 1 for
> a FENCE is also too low.
These were just filler values to get it to compile until I had time to 
understand the proper way to find them.  I thought they were just 
instruction count guesses, and again mimicked what the gimple_call 
code did.

> I miss handling of GIMPLE_ATOMICs in tree-ssa-structalias.c.
>
>
No doubt I haven't gotten there yet because it did cause a compilation 
failure :-) :-)

Andrew
Richard Biener April 27, 2012, 8:37 a.m. UTC | #19
On Thu, Apr 26, 2012 at 7:53 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 04/05/2012 05:14 AM, Richard Guenther wrote:
>>
>>
>> Ok.  Remember that you should use non-tree things if you can in GIMPLE
>> land.  This probably means that both the size and the memmodel "operands"
>> should be
>>
>> + struct GTY(()) gimple_statement_atomic
>> + {
>> +   /* [ WORD 1-8 ]  */
>> +   struct gimple_statement_with_memory_ops_base membase;
>> +
>> +   /* [ WORD 9 ] */
>> +   enum gimple_atomic_kind kind;
>> +
>>      enum gimple_atomic_memmodel memmodel;
>>
>>      unsigned size;
>>
>> and not be trees in the ops array.  Even more, both kind and memmodel
>> should probably go in membase.base.subcode
>
>
> I'm using subcode now for for the fetch_op and op_fetch operation when we
> actually need a tree code for plus, sub, and, etc...  so I am utilizing that
> field. Since the 'kind' field is present in ALL node, and the tree code was
> only needed in some, it looked cleaner to utilize the subcode field for the
> 'sometimes' field so that it wouldnt be an obvious wasted field in all the
> non-fetch nodes.  In the end it amounts to the same thing, but just looked
> cleaner :-)   I could change if it you felt strongly about it and use
> subcode for the kind and create a tree_code field in the object for the
> operation.
>
> Since integral atomics are always of an unsigned type ,  I could switch over
> and use 'unsigned size' instead of 'tree fntype' for them (I will rename
> it), but then things may  be more complicated when dealing with generic
> atomics...  those can be structure or array types and I was planning to
> allow leaving the type in case I discover something useful I can do with it.
>  It may ultimately turn out that the real type isn't going to matter, in
> which case I will remove it and replace it with an unsigned int for size.

So it eventually will support variable-size types?

> And the reason memmodel is a tree is because, as ridiculous as it seems, it
> can ultimately be a runtime value.    Even barring that, it shows up as a
> variable after inlining before various propagation engines run, especially
> in the  C++ templates.  So it must be a tree.

Ick.  That sounds gross.  So, if it ends up a variable at the time we generate
assembly we use a "conservative" constant value for it?

>
>>> I was actually thinking about doing it during gimplification... I hadnt
>>> gotten as far as figuring out what to do with the functions from the
>>> front
>>> end yet.  I dont know that code well, but I was in fact hoping there was
>>> a
>>> way to 'recognize' the function names easily and avoid built in functions
>>> completely...
>>
>> Heh ... you'd still need a GENERIC representation then.  Possibly
>> a ATOMIC_OP tree may do.
>
> possibly...   or maybe a single generic atomic_builtin with a kind and a
> variable list of  parameters.
>
>
>> well, the names must remain exposed and recognizable since they are 'out
>> there'.  Maybe under the covers I can just leave them as normal calls and
>> then during gimplification simply recognize the names and generate
>> GIMPLE_ATOMIC statements directly from the CALL_EXPR.  That would be
>> ideal.
>>  That way there are no builtins any more.
>> I suppose my question was whether the frontends need to do sth about the
>> __atomic keyword or if that is simply translated to some type flag - or
>> is that keyword applying to operations, not to objects or types?
>>
> The _Atomic keyword is a type modifier like const or volatile.  So during
> gimplification I'll also look for all occurrences of variables in normal
> expressions which have that bit set in the type,  then translate the
> expression to utilize the new gimple atomic node.  so
>
> _Atomic int var = 0;
>  var += 4;
>  foo (var);
>
> would become
>
>  __atomic_add_fetch (&var, 4, SEQ_CST);
>  D.123 = __atomic_load (&var, SEQ_CST);
> foo (D.123);

Hmm, ok.  So you are changing GENERIC in fact.  I suppose _Atomic is
restricted to global variables.  Can I attach _Atomic to allocated storage
via pointer casts?

Consider writing up semantics of _Atomic into generic.texi please.

>
>
>
>
>>>
>>>>> So bottom line, a GIMPLE_ATOMIC statement is just an object that is
>>>>> much
>>>>> easier to work with.
>>>>
>>>> Yes, I see that it is easier to work with for you.  All other statements
>>>> will
>>>> see GIMPLE_ATOMICs as blockers for their work though, even if they
>>>> already deal with calls just fine - that's why I most of the time
>>>> suggest
>>>> to use builtins (or internal fns) to do things (I bet you forgot to
>>>> update
>>>> enough predicates ...).  Can GIMPLE_ATOMICs throw with
>>>> -fnon-call-exceptions?
>>>> I suppose yes.  One thing you missed at least ;)
>>>
>>>
>>> Not that I am aware of, they are 'noexcept'.  But I'm sure I've missed
>>> more
>>> than a few things so far.  Im pretty early in the process :-)
>>
>> What about compare-exchange on a pointer dereference? The pointer
>> dereference surely can trap, so it can throw with -fnon-call-exceptions.
>>  No?
>
> in theory *any* gimple atomic operation could end up being expanded as a
> library call, so I will have to ensure they are treated as calls because of
> things like that.

Well, they are calls with a very well-defined set of side-effects.  Otherwise
not representing them as calls would be a waste of time.  Thus, no - they
do not need to be considered "calls" everywhere (well, treating them the
same as calls should be conservative) but treating them as "atomics" even
if they were expanded as calls needs to be correct.

> These are all things I will focus on once all the basic functionality is
> there.  This patch is not meant to be fully flushed out, I just wanted some
> eyes on it before I checked it into the branch so i dont carry this huge
> patch set around when making changes.  When I get things functional in the
> branch  I'll revisit all this and any of the other implementation comments I
> don't get to and then submit another patch for review.

Ok, I see.

>>
>> + /* Return the expression field of atomic operation GS.  */
>> +
>> + static inline tree
>> + gimple_atomic_expr (const_gimple gs)
>> + {
>> +   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
>> +   gcc_assert (gimple_atomic_has_expr (gs));
>> +   return gimple_op (gs, 2);
>> + }
>>
>> err - what's "expression" in this context?  I hope it's not an arbitrary
>> tcc_expression tree?!
>
>
> Its just the 'expression' parameter of atomic operations which have it, like
>  store , fetch_add, or exchange. It would normally be an SSA_NAME or const.
>  I suppose we coud call it 'value' or something if expression is confusing.

Or even 'op'.

>
>
>>
>> + /* Return the arithmetic operation tree code for atomic operation GS.
>>  */
>> +
>> + static inline enum tree_code
>> + gimple_atomic_op_code (const_gimple gs)
>> + {
>> +   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
>> +   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FETCH_OP ||
>> +             gimple_atomic_kind (gs) == GIMPLE_ATOMIC_OP_FETCH);
>> +   return (enum tree_code) gs->gsbase.subcode;
>> + }
>>
>> now, what was it with this "expression" thing again?  Btw, ||s go to the
>> next line.  subcode should be the atomic kind - it seems that you glob
>> too much into GIMPLE_ATOMIC and that you maybe should sub-class
>> GIMPLE_ATOMIC properly via the subcode field.  You'd have an
>> gimple_atomic_base which fences could use for example.
>
>
> I was trying to make it simple and utilize the variable length ops array to
> handle the variable stuff :-) It took many attempts to arrive at this
> layout.
>
> They don't really break down well into sub-components.  I wanted to use a
> single routine to access a given field for all atomic kinds, so
> gimple_atomic_target(),  gimple_atomic_lhs(), and gimple_atomic_expr() just
> work .  The problem is that they don't break down into a tree structure, but
> more of a multiple-inheritance type thing.
>
> LOAD has a LHS and a TARGET, no EXPR
> STORE has no LHS, but has a TARGET and EXPR
> EXCHANGE has a LHS, a TARGET and an EXPR
> FENCE has no target or anything else.
> COMPARE_EXCHANGE has 2 LHS, a TARGET, and an EXPR, not to mention an
> additional memorder

Looks like a hierarchy "no target or anything else" -> "target" ->
"expr" -> "memorder" would work, with only "lhs" being optional and
present everywhere.
Of course the hierarchy only makes sense for things that are not trees
(I was thinking of memorder originally - but that thought fell apart).
So in the end, apart from abstracting a base for FENCE, the flat
hierarchy makes sense (all but FENCE have a type / size).

> I set it up so that the variable length array always stores a given
> parameter at the same offset, and also allows for contiguous trees in the
> array to represent all the LHS or RHS parameters for easy generic operand
> scanning or whatever.  And each statement type only allocated the right
> amount of memory required.
>
> I planned to add a comment to the description showing the layout of the
> various nodes:
> /*
>  LOAD         | ORDER | TARGET | LHS  |
>   STORE        | ORDER | TARGET | EXPR |
>   EXCHANGE     | ORDER | TARGET | EXPR | LHS      |
>   COMPARE_EXCHG| ORDER | TARGET | EXPR | EXPECTED | FAIL_ORDER | LHS2 | LHS1
> |
>   FETCH        | ORDER | TARGET | EXPR | LHS      |
>   TEST_AND_SET | ORDER | TARGET | LHS  |
>   CLEAR        | ORDER | TARGET |
>   FENCE        | ORDER |
>
>
> This allows all the RHS values to be contiguous for bulk processing like
> operands, and located at the same index for an easy access routine.  The LHS
> is viewed from right to left, and allows more than 1 value  as required by
> compare_exchange.  They also are contiguous for generic access, and a single
> LHS routine can also be used to access those values.
> */
>
> Is that OK?     it seemed better than trying to map it to some sort of
> hierarchical structure it doesn't really fit into.

I suppose it's ok.  Please consider making a base available for FENCE
(does FENCE have a memorder?)

Richard.

> Andrew
>
>
>
Richard Biener April 27, 2012, 8:52 a.m. UTC | #20
On Thu, Apr 26, 2012 at 10:07 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 04/05/2012 05:14 AM, Richard Guenther wrote:
>>
>> + static inline bool
>> + gimple_atomic_has_fail_order (const_gimple gs)
>> + {
>> +   return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE;
>> + }
>>
>> btw, these kind of predicates look superfluous to me - if they are true
>> exactly for one atomic kind then users should use the predicate to
>> test for that specific atomic kind, not for some random field presence.
>
>
> yeah, a previous incarnation artifact, removed.
>
>>
>> All asserts in inline functions should be gcc_checking_asserts.
>
> indeed.
>
>>
>>
>> you should have a gimple_build_atomic_raw function that takes all
>> operands and the atomic kind, avoiding the need for all the repeated
>> calls of gimple_atomic_set_* as well as avoid all the repeated checking
>> this causes.
>
> as well. done.
>
>
>>
>> +   else if (is_gimple_atomic (stmt))
>> +     {
>> +       tree t;
>> +       if (visit_store)
>> +         {
>> +         for (i = 0; i<  gimple_atomic_num_lhs (stmt); i++)
>> +           {
>> +             t = gimple_atomic_lhs (stmt, i);
>> +             if (t)
>> +               {
>> +                 t = get_base_loadstore (t);
>> +                 if (t)
>> +                   ret |= visit_store (stmt, t, data);
>> +               }
>>
>> I thought results are always registers?  The walk_stmt_load_store_addr_ops
>> looks wrong to me.  As the loads/stores are implicit (you only have
>> addresses)
>> you have to adjust all callers of walk_stmt_load_store_addr_ops to handle
>> atomics specially as they expect to come along all loads/stores that way.
>> Those also handle calls and asms (for "memory" clobber) specially.
>
>
> hmm, yeah they always return a value.     I was just copying the gimple_call
> code...  Why would we need to do this processing  for a GIMPLE_CALL lhs and
> not a GIMPLE_ATOMIC lhs?

GIMPLE_CALL lhs can be memory if the call returns an aggregate, similar
GIMPLE_CALL arguments can be memory if they are aggregate.

Can this be the case for atomics as well?

>   And the RHS processing is the same as for a
> is_gimple_call as well...  it was lifted from the code immediately
> following.,..  I tried to write all this code to return the same values as
> would have been returned had it actually been a  built-in __sync or __atomic
> call like we have today.  Once I had them all, then we could actually make
> any improvements based on our known side effect limits.
>
> I guess I don't understand the rest of the comment about why I need to do
> something different here than with a call...

Well, all callers that use walk_stmt_load_store/addr_ops need to handle
non-explicit loads/stores represented by the stmt.  For calls this includes
the loads and stores the callee may perform.  For atomics this includes ...?
(That depends on whether the operand of an atomic load is a pointer or an
object; I suppose it is a pointer for the following.)  For atomics this
includes the implicit load of *{address operand}, which will _not_ be returned by
walk_stmt_load_store/addr_ops.  Thus callers that expect to see all loads/stores
(like ipa-pure-const.c) need to explicitely handle atomics similar to how they
handle calls and asms (if only in a very conservative way).  Similar the
alias oracle needs to handle them (obviously).

>> Index: tree-ssa-alias.c
>> ===================================================================
>> *** tree-ssa-alias.c    (revision 186098)
>> --- tree-ssa-alias.c    (working copy)
>> *************** ref_maybe_used_by_stmt_p (gimple stmt, t
>> *** 1440,1445 ****
>> --- 1440,1447 ----
>>       }
>>     else if (is_gimple_call (stmt))
>>       return ref_maybe_used_by_call_p (stmt, ref);
>> +   else if (is_gimple_atomic (stmt))
>> +     return true;
>>
>> please add a comment before these atomic handlings that we assume
>> they are using/clobbering all refs because they are considered memory
>> optimization barriers.  Btw, does this apply to non-address-taken
>> automatic
>> references?  I suppose not.  Consider:
>>
>> int foo()
>> {
>>   struct s;
>>   atomic_fence();
>>   s.a = 1;
>>   atomic_fence();
>>   return s.a;
>> }
>>
>> we still should be able to optimize this to return 1, no?  At least SRA
>> will
>
> yes, locals can do anything they want since they aren't visible to other
> processes.  at the moment, we'll leave those fences in because we dont
> optimize atomics at all, but  "in the fullness of time" this will be
> optimized to:
> int foo()
> {
>  atomic_fence()
>  return 1;
> }
>
> at the moment we produce:
>
> int foo()
> {
>  atomic_fence()
>  atomic_fence()
>  return 1;
>
> }

Which is inconsistent with the alias-oracle implementation.  Please fix it
to at _least_ not consider non-aliased locals.  You can look at the call
handling for how to do this - where you can also see how to do even better
by using points-to information from the pointer operand of the atomics
that specify the memory loaded/stored.  You don't want to hide all your
implementation bugs by making the alias oracle stupid ;)

>>
>> *************** stmt_kills_ref_p_1 (gimple stmt, ao_ref
>> *** 1814,1819 ****
>> --- 1818,1825 ----
>>         }
>>       }
>>
>> +   if (is_gimple_atomic (stmt))
>> +     return true;
>>
>> that's for sure wrong ;)  It should return false.
>
>
> err, yeah. oops.
>
>>
>> Index: tree-ssa-sink.c
>> ===================================================================
>> *** tree-ssa-sink.c     (revision 186098)
>> --- tree-ssa-sink.c     (working copy)
>> *************** is_hidden_global_store (gimple stmt)
>> *** 145,150 ****
>> --- 145,154 ----
>>       {
>>         tree lhs;
>>
>> +       /* Don't optimize across an atomic operation.  */
>> +       if (is_gimple_atomic (stmt))
>> +         return true;
>> +
>>
>> that's bogus, too (really all uses of is_hidden_global_store should go
>> away).
>> Please look into the few callers of this function and  handle atomics in
>> a correct way explicitely.
>
> All atomic operations will have a VDEF so in theory it should be fine to
> ignore.

Ignore?  Returning 'false' would be ignoring them.  Why do all atomic
operations have a VDEF?  At least atomic loads are not considered
stores, are they?

>  There are only 2 uses:
>  DCE already has code added to handle them, and
>  DSE handles it through ref_maybe_used_by_stmt_p.

Yes.  And eventually what matters for both callers should be folded into them
(let me put that somewhere on top of my TODO ...)

For example DCE would happily remove atomic loads if they look like

 result-SSA-name = ATOMIC-LOAD <ptr-SSA-name>;

if result-SSA-name is not used anywhere.  And I don't see why we should
not be allowed to do this?  Returning true from the above function will
make DCE not remove it.

> removed.
>
>
>>
>> +         else if (is_gimple_atomic (stmt))
>> +           {
>> +             unsigned n;
>> +
>> +             /* We may be able to lessen this with more relaxed memory
>> +                models, but for now, its a full barrier.  */
>> +             mark_all_reaching_defs_necessary (stmt);
>> +
>> +             for (n = 0; n<  gimple_atomic_num_rhs (stmt); n++)
>> +               {
>> +                 tree t = gimple_op (stmt, n);
>> +                 if (TREE_CODE (t) != SSA_NAME&&
>> +                     TREE_CODE (t) != INTEGER_CST&&
>> +                     !is_gimple_min_invariant (t)&&
>> +                     !ref_may_be_aliased (t))
>> +                   mark_aliased_reaching_defs_necessary (stmt, t);
>> +               }
>> +           }
>>
>> for sure atomics do not make non-aliased automatic variable stores
>> necessary.
>> At least I hope so.  As there are only addresses in the atomic ops the
>> code looks wrong to me (and&&s go to the next line ...).  As you are
>>
>> marking everything needed anway you can just remove the bogus loop
>> completely.
>
> I was simply copying the code path that was followed for gimple_call which
> handled the old __sync_ and __atomic builtins...  this is what that code
> did.  there are addresses in the atomic ops, but the functionality of those
> operands can dereference and store or load from those address...?
> ie  the generic
>  atomic_store (&atomic, &expr)
> dereferences  *expr and stores it into *atomic...

Sure.  But it only can ever load/store from aliased variables (not
non-address-taken
local automatics).  That we did not handle this for the __sync_ atomics very
well is no excuse to not handle it well for atomics ;)

> I figured whatever I can see as a function call argument to __atomic_* would
> be seeable in a GIMPLE_ATOMIC position as well...?

Yes.  But you know - compared to a random call - there are no other side-effects
on memory.

>> +     case GIMPLE_ATOMIC:
>> +       /* Treat this like a call for now, it may expand into a call.  */
>> +       if (gimple_atomic_kind (stmt) != GIMPLE_ATOMIC_FENCE)
>> +       cost = gimple_num_ops (stmt) *
>> +              estimate_move_cost (TREE_TYPE (gimple_atomic_target
>> (stmt)));
>> +       else
>> +         cost = 1;
>> +       break;
>>
>> for sure this counts way too many "ops".  There is at most a single
>> memory operand as far as I understand.  A size/speed cost of 1 for
>> a FENCE is also too low.
>
> These were just fillers values to get it to compile until I had time to
> understand the proper way to find these values.  I thought there were just
> instruction count guesses, and again mimiced what the gimple_call code did.

Sure, still it does not make sense to account for "move" cost of the memorder
operand the same as for the "move" cost of the eventual load/store that
happens.  If it's a filler just fill in 'cost = 10' with a comment.

>
>> I miss handling of GIMPLE_ATOMICs in tree-ssa-structalias.c.
>>
>>
> No doubt I havent gotten there yet because it did cause a compilation
> failure :-) :-)

;)  It will only cause ICEs and miscompiles.

Richard.

> Andrew
Andrew MacLeod April 27, 2012, 12:31 p.m. UTC | #21
On 04/27/2012 04:37 AM, Richard Guenther wrote:
> Since integral atomics are always of an unsigned type ,  I could switch over
> and use 'unsigned size' instead of 'tree fntype' for them (I will rename
> it), but then things may  be more complicated when dealing with generic
> atomics...  those can be structure or array types and I was planning to
> allow leaving the type in case I discover something useful I can do with it.
>   It may ultimately turn out that the real type isn't going to matter, in
> which case I will remove it and replace it with an unsigned int for size.
> So it eventually will support variable-size types?
We support arbitrary-sized objects now for exchange, compare_exchange, 
load, and store.  I just haven't added the support to gimple_atomic yet.  
That's next on my list.
>> And the reason memmodel is a tree is because, as ridiculous as it seems, it
>> can ultimately be a runtime value.    Even barring that, it shows up as a
>> variable after inlining before various propagation engines run, especially
>> in the  C++ templates.  So it must be a tree.
> Ick.  That sounds gross.  So, if it ends up a variable at the time we generate
> assembly we use a "conservative" constant value for it?
Indeed,  ICK,  and yes, I make it SEQ_CST at compile time if a variable 
value gets all the way through to codegen.
> Hmm, ok. So you are changing GENERIC in fact. I suppose _Atomic is 
> restricted to global variables. Can I attach _Atomic to allocated 
> storage via pointer casts? Consider writing up semantics of _Atomic 
> into generic.texi please.

Yes, I have a start on implementing _Atomic which adds the bit to the 
type.  And no, it isn't restricted to globals... it just indicates that 
said hunk of memory must be accessed in an atomic manner.  You can use 
it on automatics if you want...

I believe _Atomic can also be used in casts, so an atomic call may have 
to be injected into a dereference expression as well to load or update 
memory.
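
A sketch of the cast case, assuming the planned C11 front-end support
(this is not something the parser accepts today): the dereference would
have to be lowered to an atomic load during gimplification.

/* Hypothetical until _Atomic parsing lands: raw storage accessed through
   a pointer-to-_Atomic type; the load of *p would be rewritten into an
   atomic load (e.g. __atomic_load (p, SEQ_CST)).  */
int
read_flag (void *raw)
{
  _Atomic int *p = (_Atomic int *) raw;
  return *p;
}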

I'll update .texi along with the patch when I get to it.



> Well, they are calls with a very well-defined set of side-effects. 
> Otherwise not representing them as calls would be a waste of time. 
> Thus, no - they do not need to be considered "calls" everywhere (well, 
> treating them the same as calls should be conservative) but treating 
> them as "atomics" even if they were expanded as calls needs to be 
> correct. 

Yeah, I was starting with them as calls just to be conservative and get 
the same behaviour, then refine them by examining each GIMPLE_ATOMIC use 
in detail.
> Looks like a hieararchy "no target or anything else" ->  "target" ->
> "expr" ->  "memorder" would work, with only "lhs" being optional and
> present everywhere.
> Of course the hierarchy only makes sense for things that are not trees
> (I was thinking of memorder originally - but that thought fell apart).
>   So in the
> end apart from abstracting a base for FENCE the flat hieararchy makes sense
> (all but FENCE have a type / size).
Is it really worth it to have 2 different types of gimple_atomic nodes 
in order to avoid having an unused type field in atomic_fence?  
Atomic_fence has an order field, so it requires the use of op[0] in 
order for gimple_atomic_order() to work properly; we can't have all the 
other gimple nodes inherit from that, so we'd need a base which 
basically has just the KIND in it, with target and no-target nodes 
inheriting from it.  So we'd have:

               /  gimple_atomic_fence
  atomic_base
               \  gimple_atomic_all_others

Then all the checks for is_gimple_atomic() have to look for both 
gimple_atomic_all_others and gimple_atomic_fence.


Perhaps a better option is to shift all the operands by one since the 
type is a tree... i.e., if there is a target, there is also a type entry, so
op[0] remains ORDER,
op[1] is the newly inserted TYPE,
op[2] then becomes the TARGET, and
op[3...], when present, are all shifted over by one.

The type can then be removed from the structure and there would be no 
wastage.  Is that reasonable?

Andrew
Andrew MacLeod April 27, 2012, 1:36 p.m. UTC | #22
On 04/27/2012 04:52 AM, Richard Guenther wrote:
> hmm, yeah they always return a value.     I was just copying the gimple_call
> code...  Why would we need to do this processing  for a GIMPLE_CALL lhs and
> not a GIMPLE_ATOMIC lhs?
> GIMPLE_CALL lhs can be memory if the call returns an aggregate, similar
> GIMPLE_CALL arguments can be memory if they are aggregate.
>
> Can this be the case for atomics as well?

Ahhhh, point taken.  No.  No aggregates can ever be returned by an 
atomic, so yes, I can trim that out then.
When generic atomics are used to handle aggregates, it's all handled by 
pointer parameters and the function is usually void, except for 
compare_exchange, which returns a boolean.
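
For reference, a minimal sketch with the existing generic builtin (struct
and names invented): all aggregate values are passed by address and the
only direct result is the success flag.

#include <stdbool.h>

struct widget { long a, b; };

/* Generic compare-exchange on an aggregate: OBJ, EXPECTED and DESIRED are
   all pointers; only a boolean comes back.  */
bool
swap_widget (struct widget *obj, struct widget *expected,
             struct widget *desired)
{
  return __atomic_compare_exchange (obj, expected, desired, false,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}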
>
>>    And the RHS processing is the same as for a
>> is_gimple_call as well...  it was lifted from the code immediately
>> following.,..  I tried to write all this code to return the same values as
>> would have been returned had it actually been a  built-in __sync or __atomic
>> call like we have today.  Once I had them all, then we could actually make
>> any improvements based on our known side effect limits.
>>
>> I guess I don't understand the rest of the comment about why I need to do
>> something different here than with a call...
> Well, all callers that use walk_stmt_load_store/addr_ops need to handle
> non-explicit loads/stores represented by the stmt.  For calls this includes
> the loads and stores the callee may perform.  For atomics this includes .... ?
The only implicit thing a normal atomic operation can do is load or 
store the TARGET.
When the type is an aggregate, the EXPR (soon to be op :-) and/or 
EXPECTED field may also be loaded or stored indirectly, as all the 
parameters become pointers.  When I add that code I will reflect that 
properly.
> (depends on whether the operand of an atomic-load is a pointer or an object,
> I suppose it is a pointer for the following) For atomic this includes the
> implicit load of *{address operand} which will _not_ be returned by
> walk_stmt_load_store/addr_ops.  Thus callers that expect to see all loads/stores
> (like ipa-pure-const.c) need to explicitely handle atomics similar to how they
> handle calls and asms (if only in a very conservative way).  Similar the
> alias oracle needs to handle them (obviously).
OK, I understand better :-)
>> yes, locals can do anything they want since they aren't visible to other
>> processes.  at the moment, we'll leave those fences in because we dont
>> optimize atomics at all, but  "in the fullness of time" this will be
>> optimized to:
>> int foo()
>> {
>>   atomic_fence()
>>   return 1;
>> }
>>
>> at the moment we produce:
>>
>> int foo()
>> {
>>   atomic_fence()
>>   atomic_fence()
>>   return 1;
>>
>> }
> Which is inconsistend with the alias-oracle implementation.  Please fix it
> to at _least_ not consider non-aliased locals.  You can look at the call
> handling for how to do this - where you can also see how to do even better
> by using points-to information from the pointer operand of the atomics
> that specify the memory loaded/stored.  You don't want to hide all your
> implementation bugs by making the alias oracle stupid ;)
I'm too transparently lazy obviously :-)  I'll look at it :-)
>
> All atomic operations will have a VDEF so in theory it should be fine to
> ignore.
> Ignore?  Returning 'false' would be ignoring them.  Why do all atomic
> operations have a VDEF?  At least atomic loads are not considered
> stores, are they?
Very true, and neither do fences...  But we can't move any shared memory 
load or store past them, so making them all VDEFs prevents that 
activity.  Once gimple_atomic is working properly, it might be possible 
to go back and adjust places.  In particular, we may be able to allow 
directional movement based on the memory order.  seq_cst will always 
block motion in both directions, but the acquire model allows shared 
memory accesses to be sunk, and the release model allows shared memory 
accesses to be hoisted.  Relaxed allows both.  We may be able to get this 
behaviour through appropriate uses of VDEFs and VUSEs...  or maybe it 
will be through a more basic understanding of GIMPLE_ATOMIC in the 
appropriate places.  I will visit that once everything is working 
conservatively.
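
A small example of the directionality in question, using the usual C11
reading of the models (nothing the current conservative VDEF treatment
exploits yet):

/* With an acquire load, later shared accesses may not be hoisted above
   it, but the earlier store to *OUT may legitimately be sunk below it;
   a seq_cst load would block motion in both directions.  */
int
wait_side (int *flag, int *out)
{
  *out = 0;                                   /* may sink past the load */
  return __atomic_load_n (flag, __ATOMIC_ACQUIRE);
}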

>>   There are only 2 uses:
>>   DCE already has code added to handle them, and
>>   DSE handles it through ref_maybe_used_by_stmt_p.
> Yes.  And eventually what matters for both callers should be folded into them
> (let me put that somewhere ontop of my TODO ...)
>
> For example DCE would happily remove atomic loads if they look like
>
>   result-SSA-name = ATOMIC-LOAD<ptr-SSA-name>;
>
> if result-SSA-name is not used anywhere.  And I don't see why we should
> not be allowed to do this?  Returning true from the above function will
> make DCE not remove it.
at the moment, we are NOT allowed to remove any atomic operation. There 
are synchronization side effects and the only way we can remove such a 
statement is by analyzing the atomic operations in relation to each 
other. (ie, 2 loads from the same atomic with no other intervening 
atomic operation may make the first one dead if the memory orders are 
compatible)

I will also revisit this once we have a set of operations and conditions
upon which we can operate.  It may be possible to simply remove this
load, but for the moment it stays.  My optimization wiki page indicates
that a dead atomic load might be removable; I just haven't had
confirmation from 'experts' on atomic operations yet, so I won't act on
it.
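
To make that concrete, a hypothetical example using the relaxed-order
builtins (nothing on the branch removes this today):

int
maybe_dead_load (int *p)
{
  /* Two adjacent relaxed loads of the same atomic location with no other
     intervening atomic operation; if 'first' is otherwise unused, the
     first load is the kind of candidate described above.  */
  int first = __atomic_load_n (p, __ATOMIC_RELAXED);
  int second = __atomic_load_n (p, __ATOMIC_RELAXED);
  (void) first;
  return second;
}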



>> I was simply copying the code path that was followed for gimple_call
>> which handled the old __sync_ and __atomic builtins...  This is what
>> that code did.  There are addresses in the atomic ops, but the
>> functionality of those operands can dereference and store or load
>> from those addresses...?  I.e., the generic atomic_store (&atomic, &expr)
>> dereferences *expr and stores it into *atomic...
> Sure.  But it only can ever load/store from aliased variables (not
> non-address-taken
> local automatics).  That we did not handle this for the __sync_ atomics very
> well is no excuse to not handle it well for atomics ;)
Absolutely.  And I'm going to be rolling in all the __sync builtins
eventually as well.  I was just explaining where the code came from, not
that it's optimal :-)  I'll remove the bogus loop :-)
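
For reference, the generic address-based form discussed above looks like
this at the source level (the struct and names here are only for
illustration):

struct pair { int a; int b; };

void
store_pair (struct pair *atomic_obj, struct pair *value)
{
  /* Generic form: both operands are passed by address; *value is read and
     stored into *atomic_obj atomically (by library call if necessary).  */
  __atomic_store (atomic_obj, value, __ATOMIC_SEQ_CST);
}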
>> These were just filler values to get it to compile until I had time to
>> understand the proper way to find these values.  I thought they were just
>> instruction count guesses, and again mimicked what the gimple_call code did.
> Sure, but it still does not make sense to account for the "move" cost of the
> memorder operand the same as for the "move" cost of the eventual load/store
> that happens.  If it's a filler, just fill in 'cost = 10' with a comment.
will do.
>
>>> I miss handling of GIMPLE_ATOMICs in tree-ssa-structalias.c.
>>>
>>>
>> No doubt I haven't gotten there yet because it did cause a compilation
>> failure :-) :-)
> ;)  It will only cause ICEs and miscompiles.
Err, I meant to say it did NOT ICE.  I'll have a look at what I need in
tree-ssa-structalias.c.

Thanks
Andrew
Richard Biener May 3, 2012, 9:59 a.m. UTC | #23
On Fri, Apr 27, 2012 at 2:31 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 04/27/2012 04:37 AM, Richard Guenther wrote:
>>
>> Since integral atomics are always of an unsigned type, I could switch
>> over
>> and use 'unsigned size' instead of 'tree fntype' for them (I will rename
>> it), but then things may  be more complicated when dealing with generic
>> atomics...  those can be structure or array types and I was planning to
>> allow leaving the type in case I discover something useful I can do with
>> it.
>>  It may ultimately turn out that the real type isn't going to matter, in
>> which case I will remove it and replace it with an unsigned int for size.
>> So it eventually will support variable-size types?
>
> We support arbitrary-sized objects now for exchange, compare_exchange, load,
> and store.  I just haven't added the support to gimple_atomic yet.  That's
> next on my list.
>
>>> And the reason memmodel is a tree is because, as ridiculous as it seems,
>>> it
>>> can ultimately be a runtime value.    Even barring that, it shows up as a
>>> variable after inlining before various propagation engines run,
>>> especially
>>> in the  C++ templates.  So it must be a tree.
>>
>> Ick.  That sounds gross.  So, if it ends up a variable at the time we
>> generate
>> assembly we use a "conservative" constant value for it?
>
> Indeed,  ICK,  and yes, I make it SEQ_CST at compile time if a variable
> value gets all the way through to codegen.
>
>> Hmm, ok. So you are changing GENERIC in fact. I suppose _Atomic is
>> restricted to global variables. Can I attach _Atomic to allocated storage
>> via pointer casts? Consider writing up semantics of _Atomic into
>> generic.texi please.
>
>
> Yes, I have a start on implementing _Atomic which adds the bit to the type.
>  And no, it isn't restricted to globals... it just indicates said hunk of
> memory must be accessed in an atomic manner.   You can use it on automatics
> if you want...
>
> I believe _Atomic can also be used in casts.  So an atomic call may
> have to be injected into a dereference expression as well to load or update
> memory.
>
> I'll update .texi along with the patch when I get to it.
>
>
>
>
>> Well, they are calls with a very well-defined set of side-effects.
>> Otherwise not representing them as calls would be a waste of time. Thus, no
>> - they do not need to be considered "calls" everywhere (well, treating them
>> the same as calls should be conservative) but treating them as "atomics"
>> even if they were expanded as calls needs to be correct.
>
>
> Yeah, I was starting with them as calls just to be conservative and get the
> same behaviour, and then refining them by examining each GIMPLE_ATOMIC use in
> detail.
>
>> Looks like a hierarchy "no target or anything else" ->  "target" ->
>> "expr" ->  "memorder" would work, with only "lhs" being optional and
>> present everywhere.
>> Of course the hierarchy only makes sense for things that are not trees
>> (I was thinking of memorder originally - but that thought fell apart).
>>  So in the
>> end apart from abstracting a base for FENCE the flat hieararchy makes
>> sense
>> (all but FENCE have a type / size).
>
> Is it really worth it to have 2 different types of gimple_atomic nodes in
> order to avoid having an unused type field in atomic_fence?  Atomic_fence
> has an order field, so it requires the use of op[0] in order for
> gimple_atomic_order() to work properly.  We can't have all the other gimple
> nodes inherit from that, so we'd need a base which basically has just the
> KIND in it, and target/no-target nodes inherit from that.  So we'd have:
>
>                    /    gimple_atomic_fence
> atomic_base
>                    \   gimple_atomic_all_others
>
> then all the checks for is_gimple_atomic() have to look for both
> gimple_atomic_all_others and gimple_atomic_fence.
>
>
> Perhaps a better option is to shift all the operands by one since the type
> is a tree... i.e., if there is a target, there is also a type entry, so
> op[0] remains ORDER,
> op[1] is the newly inserted TYPE,
> op[2] then becomes the TARGET,
> op[3...], when present, are all shifted over by one.
>
> the type can be removed from the structure and now there would be no
> wastage.  Is that reasonable?

Putting types in ops is a bad idea (tried that once - heh); it wrecks too much generic
code.

Richard.

> Andrew
Andrew MacLeod May 3, 2012, 4:31 p.m. UTC | #24
On 05/03/2012 05:59 AM, Richard Guenther wrote:
>
>> the type can be removed from the structure and now there would be no
>> wastage.  Is that reasonable?
> types in ops is a bad idea (tried that once - heh), it wrecks too much generic
> code.
>
Heh, no doubt.  As I looked at handling the generic atomics, it looks like
I'm going to restructure it a bit anyway, and it will actually match what
you were looking for in the first place.  The generic functions deal
with slabs of memory, so instead of a LHS, they pass in a pointer for
the result, as well as any other values like the RHS expression value.

So now I expect to see something more along the lines of:

gimple_atomic_base  { kind,  order }

gimple_atomic_target_base : gimple_atomic_base { type, target }

gimple_atomic : gimple_atomic_target_base { ops[] }

gimple_atomic_generic : gimple_atomic_target_base { return_val;  ops[] }


So FENCEs will be gimple_atomic_base,
all the rest of the normal ones will be gimple_atomic,
and the generic ones will be gimple_atomic_generic.

Or something like that.  It ain't working yet, so it isn't set in stone and
those are just preliminary names, but it will have a proper hierarchy of
some sort. :-)
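
For concreteness, here is a minimal C sketch of that hierarchy using the
preliminary names above; the GTY machinery, membase fields, and exact layout
are omitted, so treat it as a rough shape rather than the final interface:

typedef union tree_node *tree;   /* Stands in for GCC's tree pointer type.  */

/* Abridged from the gimple.h enum in the patch.  */
enum gimple_atomic_kind { GIMPLE_ATOMIC_LOAD, GIMPLE_ATOMIC_STORE,
			  GIMPLE_ATOMIC_FENCE /* ... */ };

/* FENCE would use this directly.  */
struct gimple_atomic_base
{
  enum gimple_atomic_kind kind;   /* Which atomic operation.  */
  tree order;                     /* Memory order operand.  */
};

/* Everything that has an atomic target location.  */
struct gimple_atomic_target_base
{
  struct gimple_atomic_base base;
  tree type;                      /* Base type of the operation.  */
  tree target;                    /* Atomic memory location.  */
};

/* The "normal" (integral) operations.  */
struct gimple_atomic
{
  struct gimple_atomic_target_base tbase;
  tree ops[1];                    /* Trailing operand vector.  */
};

/* The generic, arbitrary-sized operations.  */
struct gimple_atomic_generic
{
  struct gimple_atomic_target_base tbase;
  tree return_val;                /* Pointer for the result instead of a LHS.  */
  tree ops[1];                    /* Trailing operand vector.  */
};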

Andrew
diff mbox

Patch


	* sync-builtins.def (BUILT_IN_ATOMIC_ALWAYS_LOCK_FREE,
	BUILT_IN_ATOMIC_IS_LOCK_FREE): Relocate to make it easier to identify
	atomic builtins that map to tree codes.
	* gsstruct.def (GSS_ATOMIC): New gimple garbage collection format.
	* gimple.def (GIMPLE_ATOMIC): New gimple statement type.
	* gimple.h (GF_ATOMIC_THREAD_FENCE, GF_ATOMIC_WEAK): New flags.
	(enum gimple_atomic_kind): New.  Kind of atomic operations.
	(struct gimple_statement_atomic): New. Gimple atomic statement.
	(is_gimple_atomic, gimple_atomic_kind, gimple_atomic_set_kind,
	gimple_atomic_type, gimple_atomic_set_type, gimple_atomic_num_lhs,
	gimple_atomic_num_rhs, gimple_atomic_has_lhs,
	gimple_atomic_lhs, gimple_atomic_lhs_ptr, gimple_atomic_set_lhs,
	gimple_atomic_order, gimple_atomic_order_ptr, gimple_atomic_set_order,
	gimple_atomic_has_target, gimple_atomic_target,
	gimple_atomic_target_ptr, gimple_atomic_set_target,
	gimple_atomic_has_expr, gimple_atomic_expr, gimple_atomic_expr_ptr,
	gimple_atomic_set_expr, gimple_atomic_has_expected,
	gimple_atomic_expected, gimple_atomic_expected_ptr,
	gimple_atomic_set_expected, gimple_atomic_has_fail_order,
	gimple_atomic_fail_order, gimple_atomic_fail_order_ptr,
	gimple_atomic_set_fail_order, gimple_atomic_op_code,
	gimple_atomic_set_op_code, gimple_atomic_thread_fence,
	gimple_atomic_set_thread_fence, gimple_atomic_weak,
	gimple_atomic_set_weak): New. Helper functions for new atomic statement.
	* gimple.c (gimple_build_atomic_load, gimple_build_atomic_store,
	gimple_build_atomic_exchange, gimple_build_atomic_compare_exchange,
	gimple_build_atomic_fetch_op, gimple_build_atomic_op_fetch,
	gimple_build_atomic_test_and_set, gimple_build_atomic_clear,
	gimple_build_atomic_fence): New. Functions to construct atomic
	statements.
	(walk_gimple_op): Handle GIMPLE_ATOMIC case.
	(walk_stmt_load_store_addr_ops): Handle walking GIMPLE_ATOMIC.
	* cfgexpand.c (expand_atomic_stmt): New.  Expand a GIMPLE_ATOMIC stmt
	into RTL.
	(expand_gimple_assign_move, expand_gimple_assign): Split out from
	GIMPLE_ASSIGN case of expand_gimple_stmt_1.
	(expand_gimple_stmt_1): Handle GIMPLE_ATOMIC case.
	* Makefile.in (tree-atomic.o): Add new object file.
	* tree-atomic.c: New file.
	(get_atomic_type): New.  Return the type of an atomic operation.
	(get_memmodel): New.  Get memory model for an operation.
	(gimple_verify_memmodel): New.  Verify the validity of a memory model.
	(expand_atomic_target): New:  Expand atomic memory location to RTL.
	(expand_expr_force_mode): New.  Force expression to the correct mode.
	(get_atomic_lhs_rtx): New.  Expand RTL for a LHS expression.
	(expand_gimple_atomic_library_call): New.  Turn an atomic operation
	into a library call.
	(expand_gimple_atomic_load, expand_gimple_atomic_store,
	expand_gimple_atomic_exchange, expand_gimple_atomic_compare_exchange):
	New.  Expand atomic operations to RTL.
	(rtx_code_from_tree_code): New.  Tree code to rtx code.
	(expand_atomic_fetch, expand_gimple_atomic_fetch_op,
	expand_gimple_atomic_op_fetch, expand_gimple_atomic_test_and_set,
	expand_gimple_atomic_clear, expand_gimple_atomic_fence): New.  Expand
	atomic operations to RTL.
	(is_built_in_atomic): New.  Check for atomic builtin functions.
	(atomic_func_type): New.  Base type of atomic builtin function.
	(lower_atomic_call): New.  Convert an atomic builtin to gimple.
	(lower_atomics): New.  Entry point to lower all atomic operations.
	(gate_lower_atomics): New gate routine.
	(pass_lower_atomics): New pass structure.
	* tree-ssa-operands.c (parse_ssa_operands): Handle GIMPLE_ATOMIC.
	* gimple-pretty-print.c (dump_gimple_atomic_kind_op): New.  Print
	atomic statement kind.
	(dump_gimple_atomic_order): New.  Print atomic memory order.
	(dump_gimple_atomic_type_size): New.  Append size of atomic operation.
	(dump_gimple_atomic): New.  Dump an atomic statement.
	(dump_gimple_stmt): Handle GIMPLE_ATOMIC case.
	* tree-cfg.c (verify_gimple_atomic): New.  Verify gimple atomic stmt.
	(verify_gimple_stmt): Handle GIMPLE_ATOMIC case.
	* tree-pass.h (pass_lower_atomics): Declare.
	* passes.c (init_optimization_passes): Add pass_lower_atomics right
	after CFG construction.
	* gimple-low.c (lower_stmt): Handle GIMPLE_ATOMIC case.
	* tree-ssa-alias.c (ref_maybe_used_by_stmt_p): Handle GIMPLE_ATOMIC.
	(stmt_may_clobber_ref_p_1): Handle GIMPLE_ATOMIC.
	(stmt_kills_ref_p_1): Handle GIMPLE_ATOMIC.
	* tree-ssa-sink.c (is_hidden_global_store): GIMPLE_ATOMIC prevents
	optimization.
	* tree-ssa-dce.c (propagate_necessity): Handle GIMPLE_ATOMIC in
	reaching defs.
	* tree-inline.c (estimate_num_insns): Handle GIMPLE_ATOMIC.
	* ipa-pure-const.c (check_stmt): GIMPLE_ATOMIC affects pure/const.

Index: sync-builtins.def
===================================================================
*** sync-builtins.def	(revision 186098)
--- sync-builtins.def	(working copy)
*************** DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_FETCH_
*** 583,597 ****
  		  "__atomic_fetch_or_16",
  		  BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROW_LEAF_LIST)
  
- DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_ALWAYS_LOCK_FREE,
- 		  "__atomic_always_lock_free",
- 		  BT_FN_BOOL_SIZE_CONST_VPTR, ATTR_CONST_NOTHROW_LEAF_LIST)
- 
- DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_IS_LOCK_FREE,
- 		  "__atomic_is_lock_free",
- 		  BT_FN_BOOL_SIZE_CONST_VPTR, ATTR_CONST_NOTHROW_LEAF_LIST)
- 
- 
  DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_THREAD_FENCE,
  		  "__atomic_thread_fence",
  		  BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
--- 583,588 ----
*************** DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_SIGNAL
*** 600,602 ****
--- 591,602 ----
  		  "__atomic_signal_fence",
  		  BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
  
+ 
+ DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_ALWAYS_LOCK_FREE,
+ 		  "__atomic_always_lock_free",
+ 		  BT_FN_BOOL_SIZE_CONST_VPTR, ATTR_CONST_NOTHROW_LEAF_LIST)
+ 
+ DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_IS_LOCK_FREE,
+ 		  "__atomic_is_lock_free",
+ 		  BT_FN_BOOL_SIZE_CONST_VPTR, ATTR_CONST_NOTHROW_LEAF_LIST)
+ 
Index: gsstruct.def
===================================================================
*** gsstruct.def	(revision 186098)
--- gsstruct.def	(working copy)
*************** DEFGSSTRUCT(GSS_WITH_OPS, gimple_stateme
*** 30,35 ****
--- 30,36 ----
  DEFGSSTRUCT(GSS_WITH_MEM_OPS_BASE, gimple_statement_with_memory_ops_base, false)
  DEFGSSTRUCT(GSS_WITH_MEM_OPS, gimple_statement_with_memory_ops, true)
  DEFGSSTRUCT(GSS_CALL, gimple_statement_call, true)
+ DEFGSSTRUCT(GSS_ATOMIC, gimple_statement_atomic, true)
  DEFGSSTRUCT(GSS_ASM, gimple_statement_asm, true)
  DEFGSSTRUCT(GSS_BIND, gimple_statement_bind, false)
  DEFGSSTRUCT(GSS_PHI, gimple_statement_phi, false)
Index: gimple.def
===================================================================
*** gimple.def	(revision 186098)
--- gimple.def	(working copy)
*************** DEFGSCODE(GIMPLE_ASM, "gimple_asm", GSS_
*** 124,129 ****
--- 124,138 ----
      CHAIN is the optional static chain link for nested functions.  */
  DEFGSCODE(GIMPLE_CALL, "gimple_call", GSS_CALL)
  
+ /* GIMPLE_ATOMIC <KIND, TYPE, ARG1... ARGN> Represents an atomic
+    operation which maps to a builtin function call.
+ 
+    KIND is the gimple atomic operation kind.
+    TYPE is the base type of the atomic operation.
+ 
+    ARG1-ARGN are the other arguments required by the various operations.  */
+ DEFGSCODE(GIMPLE_ATOMIC, "gimple_atomic", GSS_ATOMIC)
+ 
  /* GIMPLE_TRANSACTION <BODY, LABEL> represents __transaction_atomic and
     __transaction_relaxed blocks.
     BODY is the sequence of statements inside the transaction.
Index: gimple.h
===================================================================
*** gimple.h	(revision 186098)
--- gimple.h	(working copy)
*************** enum gimple_rhs_class
*** 97,102 ****
--- 97,104 ----
  enum gf_mask {
      GF_ASM_INPUT		= 1 << 0,
      GF_ASM_VOLATILE		= 1 << 1,
+     GF_ATOMIC_THREAD_FENCE	= 1 << 0,
+     GF_ATOMIC_WEAK		= 1 << 0,
      GF_CALL_FROM_THUNK		= 1 << 0,
      GF_CALL_RETURN_SLOT_OPT	= 1 << 1,
      GF_CALL_TAILCALL		= 1 << 2,
*************** struct GTY(()) gimple_statement_call
*** 424,429 ****
--- 426,466 ----
    tree GTY((length ("%h.membase.opbase.gsbase.num_ops"))) op[1];
  };
  
+ /* Kind of GIMPLE_ATOMIC statements.  */
+ enum gimple_atomic_kind
+ {
+   GIMPLE_ATOMIC_LOAD,
+   GIMPLE_ATOMIC_STORE,
+   GIMPLE_ATOMIC_EXCHANGE,
+   GIMPLE_ATOMIC_COMPARE_EXCHANGE,
+   GIMPLE_ATOMIC_FETCH_OP,
+   GIMPLE_ATOMIC_OP_FETCH,
+   GIMPLE_ATOMIC_TEST_AND_SET,
+   GIMPLE_ATOMIC_CLEAR,
+   GIMPLE_ATOMIC_FENCE 
+ };
+ 
+ /* GIMPLE_ATOMIC statement.  */
+ 
+ struct GTY(()) gimple_statement_atomic
+ {
+   /* [ WORD 1-8 ]  */
+   struct gimple_statement_with_memory_ops_base membase;
+ 
+   /* [ WORD 9 ] */
+   enum gimple_atomic_kind kind;
+ 
+   /* [ WORD 10 ] */
+   tree fntype;
+ 
+   /* [ WORD 11 ]
+      Operand vector.  NOTE!  This must always be the last field
+      of this structure.  In particular, this means that this
+      structure cannot be embedded inside another one.  */
+   tree GTY((length ("%h.membase.opbase.gsbase.num_ops"))) op[1];
+ };
+ 
+ 
  
  /* OpenMP statements (#pragma omp).  */
  
*************** union GTY ((desc ("gimple_statement_stru
*** 821,826 ****
--- 858,864 ----
    struct gimple_statement_with_memory_ops_base GTY ((tag ("GSS_WITH_MEM_OPS_BASE"))) gsmembase;
    struct gimple_statement_with_memory_ops GTY ((tag ("GSS_WITH_MEM_OPS"))) gsmem;
    struct gimple_statement_call GTY ((tag ("GSS_CALL"))) gimple_call;
+   struct gimple_statement_atomic GTY ((tag("GSS_ATOMIC"))) gimple_atomic;
    struct gimple_statement_omp GTY ((tag ("GSS_OMP"))) omp;
    struct gimple_statement_bind GTY ((tag ("GSS_BIND"))) gimple_bind;
    struct gimple_statement_catch GTY ((tag ("GSS_CATCH"))) gimple_catch;
*************** gimple gimple_build_debug_source_bind_st
*** 878,883 ****
--- 916,932 ----
  #define gimple_build_debug_source_bind(var,val,stmt)			\
    gimple_build_debug_source_bind_stat ((var), (val), (stmt) MEM_STAT_INFO)
  
+ gimple gimple_build_atomic_load (tree, tree, tree);
+ gimple gimple_build_atomic_store (tree, tree, tree, tree);
+ gimple gimple_build_atomic_exchange (tree, tree, tree, tree);
+ gimple gimple_build_atomic_compare_exchange (tree, tree, tree, tree, tree,
+ 					     tree, bool);
+ gimple gimple_build_atomic_fetch_op (tree, tree, tree, enum tree_code, tree);
+ gimple gimple_build_atomic_op_fetch (tree, tree, tree, enum tree_code, tree);
+ gimple gimple_build_atomic_test_and_set (tree, tree);
+ gimple gimple_build_atomic_clear (tree, tree);
+ gimple gimple_build_atomic_fence (tree, bool);
+ 
  gimple gimple_build_call_vec (tree, VEC(tree, heap) *);
  gimple gimple_build_call (tree, unsigned, ...);
  gimple gimple_build_call_valist (tree, unsigned, va_list);
*************** extern void gimplify_function_tree (tree
*** 1133,1138 ****
--- 1182,1201 ----
  
  /* In cfgexpand.c.  */
  extern tree gimple_assign_rhs_to_tree (gimple);
+ extern void expand_gimple_assign_move (tree, rtx, rtx, bool);
+ 
+ /* In tree-atomic.c.  */
+ extern bool expand_gimple_atomic_load (gimple);
+ extern bool expand_gimple_atomic_store (gimple);
+ extern bool expand_gimple_atomic_exchange (gimple);
+ extern bool expand_gimple_atomic_compare_exchange (gimple);
+ extern bool expand_gimple_atomic_fetch_op (gimple);
+ extern bool expand_gimple_atomic_op_fetch (gimple);
+ extern void expand_gimple_atomic_test_and_set (gimple);
+ extern void expand_gimple_atomic_clear (gimple);
+ extern void expand_gimple_atomic_fence (gimple);
+ extern void expand_gimple_atomic_library_call (gimple);
+ extern void gimple_verify_memmodel (gimple);
  
  /* In builtins.c  */
  extern bool validate_gimple_arglist (const_gimple, ...);
*************** gimple_set_op (gimple gs, unsigned i, tr
*** 1800,1805 ****
--- 1863,2247 ----
    gimple_ops (gs)[i] = op;
  }
  
+ /* Return true if GS is a GIMPLE_ATOMIC.  */
+ 
+ static inline bool
+ is_gimple_atomic (const_gimple gs)
+ {
+   return gimple_code (gs) == GIMPLE_ATOMIC;
+ }
+ 
+ /* Return the kind of atomic operation GS.  */
+ 
+ static inline enum gimple_atomic_kind
+ gimple_atomic_kind (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   return gs->gimple_atomic.kind;
+ }
+ 
+ /* Set the kind of atomic operation GS to K.  */
+ 
+ static inline void
+ gimple_atomic_set_kind (gimple gs, enum gimple_atomic_kind k)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gs->gimple_atomic.kind = k;
+ }
+ 
+ /* Return the base type of the atomic operation GS.  */
+ static inline tree
+ gimple_atomic_type (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   return gs->gimple_atomic.fntype;
+ }
+ 
+ /* Set the base type of atomic operation GS to T.  */
+ 
+ static inline void
+ gimple_atomic_set_type (gimple gs, tree t)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gs->gimple_atomic.fntype = t;
+ }
+ 
+ /*  Return the number of possible results for atomic operation GS.  */
+ 
+ static inline unsigned
+ gimple_atomic_num_lhs (const_gimple gs)
+ {
+   switch (gimple_atomic_kind (gs))
+     {
+     case GIMPLE_ATOMIC_COMPARE_EXCHANGE:
+       return 2;
+ 
+     case GIMPLE_ATOMIC_STORE:
+     case GIMPLE_ATOMIC_CLEAR:
+     case GIMPLE_ATOMIC_FENCE:
+       return 0;
+ 
+     default:
+       break;
+     }
+   return 1;
+ }
+ 
+ /* Return the number of rhs operands for atomic operation GS.  */
+ 
+ static inline unsigned
+ gimple_atomic_num_rhs (const_gimple gs)
+ {
+   return (gimple_num_ops (gs) - gimple_atomic_num_lhs (gs));
+ }
+ 
+ /* Return true if atomic operation GS can have at least one result.  */
+ 
+ static inline bool
+ gimple_atomic_has_lhs (const_gimple gs)
+ {
+   return (gimple_atomic_num_lhs (gs) > 0);
+ }
+ 
+ /* Return the LHS number INDEX of atomic operation GS.  */
+ 
+ static inline tree
+ gimple_atomic_lhs (const_gimple gs, unsigned index)
+ {
+   unsigned n;
+ 
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   n = gimple_atomic_num_lhs (gs);
+   gcc_assert ((n > 0) && (index < n));
+   return gimple_op (gs, gimple_num_ops (gs) - index - 1);
+ }
+ 
+ /* Return the pointer to LHS number INDEX of atomic operation GS.  */
+ 
+ static inline tree *
+ gimple_atomic_lhs_ptr (const_gimple gs, unsigned index)
+ {
+   unsigned n;
+ 
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   n = gimple_atomic_num_lhs (gs);
+   gcc_assert ((n > 0) && (index < n));
+   return gimple_op_ptr (gs, gimple_num_ops (gs) - index - 1);
+ }
+ 
+ /* Set the LHS number INDEX of atomic operation GS to EXPR.  */
+ 
+ static inline void
+ gimple_atomic_set_lhs (gimple gs, unsigned index, tree expr)
+ {
+   unsigned n;
+ 
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   n = gimple_atomic_num_lhs (gs);
+   gcc_assert ((n > 0) && (index < n));
+   gimple_set_op (gs, gimple_num_ops (gs) - index - 1, expr);
+ }
+ 
+ /* Return the memory order for atomic operation GS.  */
+ 
+ static inline tree 
+ gimple_atomic_order (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   return gimple_op (gs, 0);
+ }
+ 
+ /* Return a pointer to the memory order for atomic operation GS.  */
+ 
+ static inline tree *
+ gimple_atomic_order_ptr (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   return gimple_op_ptr (gs, 0);
+ }
+ 
+ 
+ /* Set the memory order for atomic operation GS to T.  */
+ 
+ static inline void
+ gimple_atomic_set_order (gimple gs, tree t)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gimple_set_op (gs, 0, t);
+ }
+ 
+ /* Return true if atomic operation GS contains an atomic target location.  */
+ 
+ static inline bool
+ gimple_atomic_has_target (const_gimple gs)
+ {
+   return (gimple_atomic_kind (gs) != GIMPLE_ATOMIC_FENCE);
+ }
+ 
+ /* Return the target location of atomic operation GS.  */
+ 
+ static inline tree
+ gimple_atomic_target (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_target (gs));
+   return gimple_op (gs, 1);
+ }
+ 
+ /* Return a pointer to the target location of atomic operation GS.  */
+ 
+ static inline tree *
+ gimple_atomic_target_ptr (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_target (gs));
+   return gimple_op_ptr (gs, 1);
+ }
+ 
+ /* Set the target location of atomic operation GS to T.  */
+ 
+ static inline void
+ gimple_atomic_set_target (gimple gs, tree t)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_target (gs));
+   gimple_set_op (gs, 1, t);
+ }
+ 
+ /* Return true if atomic operation GS has an expression field.  */
+ 
+ static inline bool
+ gimple_atomic_has_expr (const_gimple gs)
+ {
+   switch (gimple_atomic_kind (gs))
+   {
+     case GIMPLE_ATOMIC_COMPARE_EXCHANGE:
+     case GIMPLE_ATOMIC_EXCHANGE:
+     case GIMPLE_ATOMIC_STORE:
+     case GIMPLE_ATOMIC_FETCH_OP:
+     case GIMPLE_ATOMIC_OP_FETCH:
+       return true;
+ 
+     default:
+       return false;
+   }
+ }
+ 
+ /* Return the expression field of atomic operation GS.  */
+ 
+ static inline tree
+ gimple_atomic_expr (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_expr (gs));
+   return gimple_op (gs, 2);
+ }
+ 
+ /* Return a pointer to the expression field of atomic operation GS.  */
+ 
+ static inline tree *
+ gimple_atomic_expr_ptr (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_expr (gs));
+   return gimple_op_ptr (gs, 2);
+ }
+ 
+ /* Set the expression field of atomic operation GS.  */
+ 
+ static inline void
+ gimple_atomic_set_expr (gimple gs, tree t)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_expr (gs));
+   gimple_set_op (gs, 2, t);
+ }
+ 
+ /* Return true if atomic operation GS has an expected field.  */
+ 
+ static inline bool
+ gimple_atomic_has_expected (const_gimple gs)
+ {
+   return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE;
+ }
+ 
+ /* Return the expected field of atomic operation GS.  */
+ 
+ static inline tree
+ gimple_atomic_expected (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_expected (gs));
+   return gimple_op (gs, 3);
+ }
+ 
+ /* Return a pointer to the expected field of atomic operation GS.  */
+ 
+ static inline tree *
+ gimple_atomic_expected_ptr (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_expected (gs));
+   return gimple_op_ptr (gs, 3);
+ }
+ 
+ /* Set the expected field of atomic operation GS.  */
+ 
+ static inline void
+ gimple_atomic_set_expected (gimple gs, tree t)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_expected (gs));
+   gimple_set_op (gs, 3, t);
+ }
+ 
+ /* Return true if atomic operation GS has a fail order field.  */
+ 
+ static inline bool
+ gimple_atomic_has_fail_order (const_gimple gs)
+ {
+   return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE;
+ }
+ 
+ /* Return the fail_order field of atomic operation GS.  */
+ 
+ static inline tree
+ gimple_atomic_fail_order (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_fail_order (gs));
+   return gimple_op (gs, 4);
+ }
+ 
+ /* Return a pointer to the fail_order field of atomic operation GS.  */
+ 
+ static inline tree *
+ gimple_atomic_fail_order_ptr (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_fail_order (gs));
+   return gimple_op_ptr (gs, 4);
+ }
+ 
+ 
+ /* Set the fail_order field of atomic operation GS.  */
+ 
+ static inline void
+ gimple_atomic_set_fail_order (gimple gs, tree t)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_has_fail_order (gs));
+   gimple_set_op (gs, 4, t);
+ }
+ 
+ /* Return the arithmetic operation tree code for atomic operation GS.  */
+ 
+ static inline enum tree_code
+ gimple_atomic_op_code (const_gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FETCH_OP ||
+ 	      gimple_atomic_kind (gs) == GIMPLE_ATOMIC_OP_FETCH);
+   return (enum tree_code) gs->gsbase.subcode;
+ }
+ 
+ /* Set the arithmetic operation tree code for atomic operation GS.  */
+ 
+ static inline void
+ gimple_atomic_set_op_code (gimple gs, enum tree_code tc)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FETCH_OP ||
+ 	      gimple_atomic_kind (gs) == GIMPLE_ATOMIC_OP_FETCH);
+   gs->gsbase.subcode = tc;
+ }
+ 
+ /* Return true if atomic fence GS is a thread fence.  */
+ 
+ static inline bool
+ gimple_atomic_thread_fence (gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FENCE);
+   return (gs->gsbase.subcode & GF_ATOMIC_THREAD_FENCE) != 0;
+ }
+ 
+ /* Set the thread fence field of atomic fence GS to THREAD.  */
+ 
+ static inline void 
+ gimple_atomic_set_thread_fence (gimple gs, bool thread)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FENCE);
+   if (thread)
+     gs->gsbase.subcode |= GF_ATOMIC_THREAD_FENCE;
+   else
+     gs->gsbase.subcode &= ~GF_ATOMIC_THREAD_FENCE;
+ }
+ 
+ /* Return the weak flag of atomic operation GS.  */
+ 
+ static inline bool
+ gimple_atomic_weak (gimple gs)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE);
+   return (gs->gsbase.subcode & GF_ATOMIC_WEAK) != 0;
+ }
+ 
+ /* Set the weak flag of atomic operation GS to WEAK.  */
+ 
+ static inline void
+ gimple_atomic_set_weak (gimple gs, bool weak)
+ {
+   GIMPLE_CHECK (gs, GIMPLE_ATOMIC);
+   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE);
+   if (weak)
+     gs->gsbase.subcode |= GF_ATOMIC_WEAK;
+   else
+     gs->gsbase.subcode &= ~GF_ATOMIC_WEAK;
+ }
+ 
  /* Return true if GS is a GIMPLE_ASSIGN.  */
  
  static inline bool
Index: gimple.c
===================================================================
*** gimple.c	(revision 186098)
--- gimple.c	(working copy)
*************** gimple_build_return (tree retval)
*** 200,205 ****
--- 200,373 ----
    return s;
  }
  
+ /* Build a GIMPLE_ATOMIC statement of GIMPLE_ATOMIC_LOAD kind.
+    TYPE is the underlying type of the atomic operation.
+    TARGET is the atomic memory location being operated on.
+    ORDER is the memory model to be used.  */
+ 
+ gimple
+ gimple_build_atomic_load (tree type, tree target, tree order)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 3);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_LOAD);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_type (s, type);
+   gimple_set_has_volatile_ops (s, true);
+   return s;
+ }
+ 
+ /* Build a GIMPLE_ATOMIC statement of GIMPLE_ATOMIC_STORE kind.
+    TYPE is the underlying type of the atomic operation.
+    TARGET is the atomic memory location being operated on.
+    EXPR is the expression to be stored.
+    ORDER is the memory model to be used.  */
+ 
+ gimple
+ gimple_build_atomic_store (tree type, tree target, tree expr, tree order)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 3);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_STORE);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_expr (s, expr);
+   gimple_atomic_set_type (s, type);
+   gimple_set_has_volatile_ops (s, true);
+   return s;
+ }
+ 
+ /* Build a GIMPLE_ATOMIC statement of GIMPLE_ATOMIC_EXCHANGE kind.
+    TYPE is the underlying type of the atomic operation.
+    TARGET is the atomic memory location being operated on.
+    EXPR is the expression to be stored.
+    ORDER is the memory model to be used.  */
+ 
+ gimple
+ gimple_build_atomic_exchange (tree type, tree target, tree expr, tree order)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 4);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_EXCHANGE);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_expr (s, expr);
+   gimple_atomic_set_type (s, type);
+   gimple_set_has_volatile_ops (s, true);
+   return s;
+ }
+ 
+ /* Build a GIMPLE_ATOMIC statement of GIMPLE_ATOMIC_COMPARE_EXCHANGE kind.
+    TYPE is the underlying type of the atomic operation.
+    TARGET is the atomic memory location being operated on.
+    EXPECTED is the value thought to be in the atomic memory location.
+    EXPR is the expression to be stored if EXPECTED matches.
+    SUCCESS is the memory model to be used for a success operation.
+    FAIL is the memory model to be used for a failed operation.
+    WEAK is true if this is a weak compare_exchange, otherwise it is strong.  */
+ 
+ gimple
+ gimple_build_atomic_compare_exchange (tree type, tree target, tree expected,
+ 				      tree expr, tree success, tree fail,
+ 				      bool weak)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 7);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_COMPARE_EXCHANGE);
+   gimple_atomic_set_order (s, success);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_expr (s, expr);
+   gimple_atomic_set_expected (s, expected);
+   gimple_atomic_set_fail_order (s, fail);
+   gimple_atomic_set_weak (s, weak);
+   gimple_atomic_set_type (s, type);
+   gimple_set_has_volatile_ops (s, true);
+   return s;
+ }
+ 
+ /* Build a GIMPLE_ATOMIC statement of GIMPLE_ATOMIC_FETCH_OP kind.
+    TYPE is the underlying type of the atomic operation.
+    TARGET is the atomic memory location being operated on.
+    EXPR is the expression to be stored.
+    OP is the tree code for the operation to be performed.
+    ORDER is the memory model to be used.  */
+ 
+ gimple
+ gimple_build_atomic_fetch_op (tree type, tree target, tree expr,
+ 			      enum tree_code op, tree order)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 4);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_FETCH_OP);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_expr (s, expr);
+   gimple_atomic_set_op_code (s, op);
+   gimple_atomic_set_type (s, type);
+   gimple_set_has_volatile_ops (s, true);
+   return s;
+ }
+ 
+ /* Build a GIMPLE_ATOMIC statement of GIMPLE_ATOMIC_OP_FETCH kind.
+    TYPE is the underlying type of the atomic operation.
+    TARGET is the atomic memory location being operated on.
+    EXPR is the expression to be stored.
+    OP is the tree code for the operation to be performed.
+    ORDER is the memory model to be used.  */
+ 
+ gimple
+ gimple_build_atomic_op_fetch (tree type, tree target, tree expr,
+ 			      enum tree_code op, tree order)
+ {
+   gimple s = gimple_build_atomic_fetch_op (type, target, expr, op, order);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_OP_FETCH);
+   return s;
+ }
+ 
+ /* Build a GIMPLE_ATOMIC statement of GIMPLE_ATOMIC_TEST_AND_SET kind.
+    TARGET is the atomic memory location being operated on.
+    ORDER is the memory model to be used.  */
+ 
+ gimple
+ gimple_build_atomic_test_and_set (tree target, tree order)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 3);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_TEST_AND_SET);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_type (s, boolean_type_node);
+   gimple_set_has_volatile_ops (s, true);
+   return s;
+ }
+ 
+ /* Build a GIMPLE_ATOMIC statement of GIMPLE_ATOMIC_CLEAR kind.
+    TARGET is the atomic memory location being operated on.
+    ORDER is the memory model to be used.  */
+ 
+ gimple
+ gimple_build_atomic_clear (tree target, tree order)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 2);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_CLEAR);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_type (s, boolean_type_node);
+   gimple_set_has_volatile_ops (s, true);
+   return s;
+ }
+ 
+ /* Build a GIMPLE_ATOMIC statement of GIMPLE_ATOMIC_FENCE kind.
+    ORDER is the memory model to be used.
+    THREAD is true if this is a thread barrier, otherwise it is a
+    signal barrier for just the local CPU.  */
+ 
+ gimple
+ gimple_build_atomic_fence (tree order, bool thread)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 1);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_FENCE);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_thread_fence (s, thread);
+   gimple_set_has_volatile_ops (s, true);
+   return s;
+ }
+ 
  /* Reset alias information on call S.  */
  
  void
*************** walk_gimple_op (gimple stmt, walk_tree_f
*** 1519,1524 ****
--- 1687,1721 ----
  	}
        break;
  
+     case GIMPLE_ATOMIC:
+       if (wi)
+ 	wi->val_only = true;
+ 
+       /* Walk the RHS.  */
+       for (i = 0; i < gimple_atomic_num_rhs (stmt) ; i++)
+        {
+  	 ret = walk_tree (gimple_op_ptr (stmt, i), callback_op, wi,
+ 			  pset);
+ 	 if (ret)
+ 	   return ret;
+        }
+ 
+       if (wi)
+ 	wi->is_lhs = true;
+ 
+       for (i = 0; i < gimple_atomic_num_lhs (stmt) ; i++)
+        {
+  	 ret = walk_tree (gimple_atomic_lhs_ptr (stmt, i), callback_op, wi,
+ 			  pset);
+ 	 if (ret)
+ 	   return ret;
+        }
+ 
+       if (wi)
+ 	wi->is_lhs = false;
+ 
+       break;
+ 
      case GIMPLE_CALL:
        if (wi)
  	{
*************** walk_stmt_load_store_addr_ops (gimple st
*** 5281,5286 ****
--- 5478,5518 ----
  	    }
  	}
      }
+   else if (is_gimple_atomic (stmt))
+     {
+       tree t;
+       if (visit_store)
+         {
+ 	  for (i = 0; i < gimple_atomic_num_lhs (stmt); i++)
+ 	    {
+ 	      t = gimple_atomic_lhs (stmt, i);
+ 	      if (t)
+ 	        {
+ 		  t = get_base_loadstore (t);
+ 		  if (t)
+ 		    ret |= visit_store (stmt, t, data);
+ 		}
+ 	    }
+ 	}
+       if (visit_load || visit_addr)
+         {
+ 	  for (i = 0; i < gimple_atomic_num_rhs (stmt); i++)
+ 	    {
+ 	      t = gimple_op (stmt, i);
+ 	      if (t)
+ 	        {
+ 		  if (visit_addr && TREE_CODE (t) == ADDR_EXPR)
+ 		    ret |= visit_addr (stmt, TREE_OPERAND (t, 0), data);
+ 		  else if (visit_load)
+ 		    {
+ 		      t = get_base_loadstore (t);
+ 		      if (t)
+ 			ret |= visit_load (stmt, t, data);
+ 		    }
+ 		}
+ 	    }
+ 	}    
+       }
    else if (is_gimple_call (stmt))
      {
        if (visit_store)
Index: cfgexpand.c
===================================================================
*** cfgexpand.c	(revision 186098)
--- cfgexpand.c	(working copy)
*************** mark_transaction_restart_calls (gimple s
*** 1990,1995 ****
--- 1990,2076 ----
      }
  }
  
+ 
+ /* Expand a GIMPLE_ATOMIC statement STMT into RTL.  */
+ static void
+ expand_atomic_stmt (gimple stmt)
+ {
+   enum gimple_atomic_kind kind = gimple_atomic_kind (stmt);
+   bool emitted = false;
+   bool try_inline;
+ 
+   /* Fences, test_and_set, and clear operations are required to be inlined.  */
+   try_inline = flag_inline_atomics || (kind == GIMPLE_ATOMIC_FENCE) ||
+ 	       (kind == GIMPLE_ATOMIC_TEST_AND_SET) ||
+ 	       (kind == GIMPLE_ATOMIC_CLEAR);
+ 
+   /* Try emitting inline code if requested.  */
+   if (try_inline)
+     {
+       switch (kind)
+ 	{
+ 	case GIMPLE_ATOMIC_LOAD:
+ 	  emitted = expand_gimple_atomic_load (stmt);
+ 	  break;
+ 
+ 	case GIMPLE_ATOMIC_STORE:
+ 	  emitted = expand_gimple_atomic_store (stmt);
+ 	  break;
+ 
+ 	case GIMPLE_ATOMIC_EXCHANGE:
+ 	  emitted = expand_gimple_atomic_exchange (stmt);
+ 	  break;
+ 
+ 	case GIMPLE_ATOMIC_COMPARE_EXCHANGE:
+ 	  emitted = expand_gimple_atomic_compare_exchange (stmt);
+ 	  break;
+ 
+ 	case GIMPLE_ATOMIC_FETCH_OP:
+ 	  emitted = expand_gimple_atomic_fetch_op (stmt);
+ 	  break;
+ 
+ 	case GIMPLE_ATOMIC_OP_FETCH:
+ 	  emitted = expand_gimple_atomic_op_fetch (stmt);
+ 	  break;
+ 
+ 	case GIMPLE_ATOMIC_TEST_AND_SET:
+ 	  expand_gimple_atomic_test_and_set (stmt);
+ 	  return;
+ 
+ 	case GIMPLE_ATOMIC_CLEAR:
+ 	  expand_gimple_atomic_clear (stmt);
+ 	  return;
+ 
+ 	case GIMPLE_ATOMIC_FENCE:
+ 	  expand_gimple_atomic_fence (stmt);
+ 	  return;
+ 
+ 	default:
+ 	  gcc_unreachable ();
+ 	}
+     }
+ 
+   /* If no code was emitted, issue a library call.  */
+   if (!emitted)
+     {
+       switch (kind)
+         {
+ 	case GIMPLE_ATOMIC_LOAD:
+ 	case GIMPLE_ATOMIC_STORE:
+ 	case GIMPLE_ATOMIC_EXCHANGE:
+ 	case GIMPLE_ATOMIC_COMPARE_EXCHANGE:
+ 	case GIMPLE_ATOMIC_FETCH_OP:
+ 	case GIMPLE_ATOMIC_OP_FETCH:
+ 	  expand_gimple_atomic_library_call (stmt);
+ 	  return;
+ 
+ 	default:
+ 	  /* The remaining kinds must be inlined or unsupported.  */
+ 	  gcc_unreachable ();
+ 	}
+     }
+ }
+ 
  /* A subroutine of expand_gimple_stmt_1, expanding one GIMPLE_CALL
     statement STMT.  */
  
*************** expand_call_stmt (gimple stmt)
*** 2079,2084 ****
--- 2160,2277 ----
    mark_transaction_restart_calls (stmt);
  }
  
+ 
+ /* A subroutine of expand_gimple_assign.  Take care of moving the RHS of an
+    assignment into TARGET, which is of type TARGET_TREE_TYPE.  Moves can
+    be NONTEMPORAL.  */
+ 
+ void
+ expand_gimple_assign_move (tree target_tree_type, rtx target, rtx rhs,
+ 			   bool nontemporal) 
+ {
+   bool promoted = false;
+ 
+   if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
+     promoted = true;
+ 
+   if (rhs == target)
+     ;
+   else if (promoted)
+     {
+       int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);
+       /* If TEMP is a VOIDmode constant, use convert_modes to make
+ 	 sure that we properly convert it.  */
+       if (CONSTANT_P (rhs) && GET_MODE (rhs) == VOIDmode)
+ 	{
+ 	  rhs = convert_modes (GET_MODE (target),
+ 				TYPE_MODE (target_tree_type),
+ 				rhs, unsignedp);
+ 	  rhs = convert_modes (GET_MODE (SUBREG_REG (target)),
+ 				GET_MODE (target), rhs, unsignedp);
+ 	}
+ 
+       convert_move (SUBREG_REG (target), rhs, unsignedp);
+     }
+   else if (nontemporal && emit_storent_insn (target, rhs))
+     ;
+   else
+     {
+       rhs = force_operand (rhs, target);
+       if (rhs != target)
+ 	emit_move_insn (target, rhs);
+     }
+ }
+ 
+ /* A subroutine of expand_gimple_stmt_1, expanding one GIMPLE_ASSIGN
+    statement STMT.  */
+ 
+ static void
+ expand_gimple_assign (gimple stmt)
+ {
+   tree lhs = gimple_assign_lhs (stmt);
+ 
+   /* Tree expand used to fiddle with |= and &= of two bitfield
+      COMPONENT_REFs here.  This can't happen with gimple, the LHS
+      of binary assigns must be a gimple reg.  */
+ 
+   if (TREE_CODE (lhs) != SSA_NAME
+       || get_gimple_rhs_class (gimple_expr_code (stmt))
+ 	 == GIMPLE_SINGLE_RHS)
+     {
+       tree rhs = gimple_assign_rhs1 (stmt);
+       gcc_assert (get_gimple_rhs_class (gimple_expr_code (stmt))
+ 		  == GIMPLE_SINGLE_RHS);
+       if (gimple_has_location (stmt) && CAN_HAVE_LOCATION_P (rhs))
+ 	SET_EXPR_LOCATION (rhs, gimple_location (stmt));
+       if (TREE_CLOBBER_P (rhs))
+ 	/* This is a clobber to mark the going out of scope for
+ 	   this LHS.  */
+ 	;
+       else
+ 	expand_assignment (lhs, rhs,
+ 			   gimple_assign_nontemporal_move_p (stmt));
+     }
+   else
+     {
+       rtx target, temp;
+       bool nontemporal = gimple_assign_nontemporal_move_p (stmt);
+       struct separate_ops ops;
+       bool promoted = false;
+ 
+       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+       if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
+ 	promoted = true;
+ 
+       ops.code = gimple_assign_rhs_code (stmt);
+       ops.type = TREE_TYPE (lhs);
+       switch (get_gimple_rhs_class (gimple_expr_code (stmt)))
+ 	{
+ 	  case GIMPLE_TERNARY_RHS:
+ 	    ops.op2 = gimple_assign_rhs3 (stmt);
+ 	    /* Fallthru */
+ 	  case GIMPLE_BINARY_RHS:
+ 	    ops.op1 = gimple_assign_rhs2 (stmt);
+ 	    /* Fallthru */
+ 	  case GIMPLE_UNARY_RHS:
+ 	    ops.op0 = gimple_assign_rhs1 (stmt);
+ 	    break;
+ 	  default:
+ 	    gcc_unreachable ();
+ 	}
+       ops.location = gimple_location (stmt);
+ 
+       /* If we want to use a nontemporal store, force the value to
+ 	 register first.  If we store into a promoted register,
+ 	 don't directly expand to target.  */
+       temp = nontemporal || promoted ? NULL_RTX : target;
+       temp = expand_expr_real_2 (&ops, temp, GET_MODE (target),
+ 				 EXPAND_NORMAL);
+ 
+       expand_gimple_assign_move (TREE_TYPE (lhs), target, temp, nontemporal);
+     }
+ }
+ 
+ 
  /* A subroutine of expand_gimple_stmt, expanding one gimple statement
     STMT that doesn't require special handling for outgoing edges.  That
     is no tailcalls and no GIMPLE_COND.  */
*************** expand_gimple_stmt_1 (gimple stmt)
*** 2115,2120 ****
--- 2308,2316 ----
      case GIMPLE_CALL:
        expand_call_stmt (stmt);
        break;
+     case GIMPLE_ATOMIC:
+       expand_atomic_stmt (stmt);
+       break;
  
      case GIMPLE_RETURN:
        op0 = gimple_return_retval (stmt);
*************** expand_gimple_stmt_1 (gimple stmt)
*** 2147,2240 ****
        break;
  
      case GIMPLE_ASSIGN:
!       {
! 	tree lhs = gimple_assign_lhs (stmt);
! 
! 	/* Tree expand used to fiddle with |= and &= of two bitfield
! 	   COMPONENT_REFs here.  This can't happen with gimple, the LHS
! 	   of binary assigns must be a gimple reg.  */
! 
! 	if (TREE_CODE (lhs) != SSA_NAME
! 	    || get_gimple_rhs_class (gimple_expr_code (stmt))
! 	       == GIMPLE_SINGLE_RHS)
! 	  {
! 	    tree rhs = gimple_assign_rhs1 (stmt);
! 	    gcc_assert (get_gimple_rhs_class (gimple_expr_code (stmt))
! 			== GIMPLE_SINGLE_RHS);
! 	    if (gimple_has_location (stmt) && CAN_HAVE_LOCATION_P (rhs))
! 	      SET_EXPR_LOCATION (rhs, gimple_location (stmt));
! 	    if (TREE_CLOBBER_P (rhs))
! 	      /* This is a clobber to mark the going out of scope for
! 		 this LHS.  */
! 	      ;
! 	    else
! 	      expand_assignment (lhs, rhs,
! 				 gimple_assign_nontemporal_move_p (stmt));
! 	  }
! 	else
! 	  {
! 	    rtx target, temp;
! 	    bool nontemporal = gimple_assign_nontemporal_move_p (stmt);
! 	    struct separate_ops ops;
! 	    bool promoted = false;
! 
! 	    target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
! 	    if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
! 	      promoted = true;
! 
! 	    ops.code = gimple_assign_rhs_code (stmt);
! 	    ops.type = TREE_TYPE (lhs);
! 	    switch (get_gimple_rhs_class (gimple_expr_code (stmt)))
! 	      {
! 		case GIMPLE_TERNARY_RHS:
! 		  ops.op2 = gimple_assign_rhs3 (stmt);
! 		  /* Fallthru */
! 		case GIMPLE_BINARY_RHS:
! 		  ops.op1 = gimple_assign_rhs2 (stmt);
! 		  /* Fallthru */
! 		case GIMPLE_UNARY_RHS:
! 		  ops.op0 = gimple_assign_rhs1 (stmt);
! 		  break;
! 		default:
! 		  gcc_unreachable ();
! 	      }
! 	    ops.location = gimple_location (stmt);
! 
! 	    /* If we want to use a nontemporal store, force the value to
! 	       register first.  If we store into a promoted register,
! 	       don't directly expand to target.  */
! 	    temp = nontemporal || promoted ? NULL_RTX : target;
! 	    temp = expand_expr_real_2 (&ops, temp, GET_MODE (target),
! 				       EXPAND_NORMAL);
! 
! 	    if (temp == target)
! 	      ;
! 	    else if (promoted)
! 	      {
! 		int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);
! 		/* If TEMP is a VOIDmode constant, use convert_modes to make
! 		   sure that we properly convert it.  */
! 		if (CONSTANT_P (temp) && GET_MODE (temp) == VOIDmode)
! 		  {
! 		    temp = convert_modes (GET_MODE (target),
! 					  TYPE_MODE (ops.type),
! 					  temp, unsignedp);
! 		    temp = convert_modes (GET_MODE (SUBREG_REG (target)),
! 					  GET_MODE (target), temp, unsignedp);
! 		  }
! 
! 		convert_move (SUBREG_REG (target), temp, unsignedp);
! 	      }
! 	    else if (nontemporal && emit_storent_insn (target, temp))
! 	      ;
! 	    else
! 	      {
! 		temp = force_operand (temp, target);
! 		if (temp != target)
! 		  emit_move_insn (target, temp);
! 	      }
! 	  }
!       }
        break;
  
      default:
--- 2343,2349 ----
        break;
  
      case GIMPLE_ASSIGN:
!       expand_gimple_assign (stmt);
        break;
  
      default:
Index: Makefile.in
===================================================================
*** Makefile.in	(revision 186098)
--- Makefile.in	(working copy)
*************** OBJS = \
*** 1356,1361 ****
--- 1356,1362 ----
  	tracer.o \
  	trans-mem.o \
  	tree-affine.o \
+ 	tree-atomic.o \
  	tree-call-cdce.o \
  	tree-cfg.o \
  	tree-cfgcleanup.o \
*************** omp-low.o : omp-low.c $(CONFIG_H) $(SYST
*** 2585,2590 ****
--- 2586,2596 ----
     $(TREE_FLOW_H) $(TIMEVAR_H) $(FLAGS_H) $(EXPR_H) $(DIAGNOSTIC_CORE_H) \
     $(TREE_PASS_H) $(GGC_H) $(EXCEPT_H) $(SPLAY_TREE_H) $(OPTABS_H) \
     $(CFGLOOP_H) tree-iterator.h gt-omp-low.h
+ tree-atomic.o : tree-atomic.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+    $(TREE_H) $(RTL_H) $(GIMPLE_H) $(TREE_INLINE_H) langhooks.h \
+    $(DIAGNOSTIC_CORE_H) $(TREE_FLOW_H) $(TIMEVAR_H) $(FLAGS_H) $(EXPR_H) \
+    $(DIAGNOSTIC_CORE_H) $(TREE_PASS_H) $(GGC_H) $(EXCEPT_H) $(SPLAY_TREE_H) \
+    $(OPTABS_H) $(CFGLOOP_H) tree-iterator.h
  tree-browser.o : tree-browser.c tree-browser.def $(CONFIG_H) $(SYSTEM_H) \
     coretypes.h $(TREE_H) tree-pretty-print.h
  omega.o : omega.c omega.h $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \
Index: tree-atomic.c
===================================================================
*** tree-atomic.c	(revision 0)
--- tree-atomic.c	(revision 0)
***************
*** 0 ****
--- 1,967 ----
+ /* Pass for lowering and manipulating atomic operations.
+    Various __atomic builtin function calls are turned into GIMPLE_ATOMIC
+    statements.
+    Any memory references of atomic type are also translated into
+    the appropriate GIMPLE_ATOMIC statements.
+    Contributed by Andrew MacLeod <amacleod@redhat.com>
+ 
+    Copyright (C) 2012
+    Free Software Foundation, Inc.
+ 
+ This file is part of GCC.
+ 
+ GCC is free software; you can redistribute it and/or modify it under
+ the terms of the GNU General Public License as published by the Free
+ Software Foundation; either version 3, or (at your option) any later
+ version.
+ 
+ GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+ 
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING3.  If not see
+ <http://www.gnu.org/licenses/>.  */
+ 
+ #include "config.h"
+ #include "system.h"
+ #include "coretypes.h"
+ #include "tm.h"
+ #include "tree.h"
+ #include "rtl.h"
+ #include "gimple.h"
+ #include "tree-iterator.h"
+ #include "tree-inline.h"
+ #include "langhooks.h"
+ #include "diagnostic-core.h"
+ #include "tree-flow.h"
+ #include "timevar.h"
+ #include "flags.h"
+ #include "function.h"
+ #include "expr.h"
+ #include "tree-pass.h"
+ #include "ggc.h"
+ #include "except.h"
+ #include "splay-tree.h"
+ #include "optabs.h"
+ #include "cfgloop.h"
+ #include "tree-pretty-print.h"
+ #include "gimple-pretty-print.h"
+ 
+ 
+ /* Return the type of atomic operation STMT.  */
+ 
+ static tree
+ get_atomic_type (gimple stmt)
+ {
+   tree t;
+ 
+   t = gimple_atomic_type (stmt);
+   gcc_assert (INTEGRAL_TYPE_P (t));
+ 
+   return t;
+ }
+ 
+ 
+ /* Extract memory model from tree EXP, and verify it is valid for KIND.  */
+ 
+ static enum memmodel
+ get_memmodel (enum gimple_atomic_kind kind, tree exp)
+ {
+   rtx op;
+   enum memmodel model;
+ 
+   /* If the parameter is not a constant, it's a run time value so we'll just
+      convert it to MEMMODEL_SEQ_CST to avoid annoying runtime checking.  */
+   if (TREE_CODE (exp) != INTEGER_CST)
+     return MEMMODEL_SEQ_CST;
+ 
+   op = expand_normal (exp);
+   if (INTVAL (op) < 0 || INTVAL (op) >= MEMMODEL_LAST)
+     {
+       warning (OPT_Winvalid_memory_model,
+ 	       "invalid memory model argument for atomic operation");
+       return MEMMODEL_SEQ_CST;
+     }
+   model = (enum memmodel) INTVAL (op);
+ 
+   switch (kind)
+     {
+     case GIMPLE_ATOMIC_LOAD:
+       if (model != MEMMODEL_RELEASE && model != MEMMODEL_ACQ_REL)
+         return model;
+       break;
+ 
+     case GIMPLE_ATOMIC_STORE:
+       if (model == MEMMODEL_RELAXED || model == MEMMODEL_SEQ_CST ||
+ 	  model == MEMMODEL_RELEASE)
+ 	 return model;
+       break;
+ 
+     case GIMPLE_ATOMIC_EXCHANGE:
+       if (model != MEMMODEL_CONSUME)
+         return model;
+       break;
+ 
+     case GIMPLE_ATOMIC_CLEAR:
+       if (model != MEMMODEL_ACQUIRE && model != MEMMODEL_ACQ_REL)
+         return model;
+       break;
+ 
+     default:
+       return model;
+     }
+ 
+   error ("invalid memory model for atomic operation");
+   return MEMMODEL_SEQ_CST;
+ }
+ 
+ /* Verify that all the memory models are valid for STMT.  */
+ 
+ void
+ gimple_verify_memmodel (gimple stmt)
+ {
+   enum memmodel a,b;
+ 
+   a = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_order (stmt));
+ 
+   if (gimple_atomic_kind (stmt) != GIMPLE_ATOMIC_COMPARE_EXCHANGE)
+     return;
+ 
+   b = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_fail_order (stmt));
+   if (b == MEMMODEL_RELEASE || b == MEMMODEL_ACQ_REL)
+     error ("invalid failure memory model for %<__atomic_compare_exchange%>");
+   if (b > a)
+     error ("failure memory model cannot be stronger than success "
+ 	   "memory model for %<__atomic_compare_exchange%>");
+ }
+ 
+ /* Generate RTL for accessing the atomic location LOC in MODE.  */
+ 
+ static rtx
+ expand_atomic_target (tree loc, enum machine_mode mode)
+ {
+   rtx addr, mem;
+ 
+   addr = expand_expr (loc, NULL_RTX, ptr_mode, EXPAND_SUM);
+   addr = convert_memory_address (Pmode, addr);
+ 
+   /* Note that we explicitly do not want any alias information for this
+      memory, so that we kill all other live memories.  Otherwise we don't
+      satisfy the full barrier semantics of the intrinsic.  */
+   mem = validize_mem (gen_rtx_MEM (mode, addr));
+ 
+   /* The alignment needs to be at least according to that of the mode.  */
+   set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
+ 			   get_pointer_alignment (loc)));
+   set_mem_alias_set (mem, ALIAS_SET_MEMORY_BARRIER);
+   MEM_VOLATILE_P (mem) = 1;
+ 
+   return mem;
+ }
+ 
+ 
+ /* Make sure an argument is in the right mode.
+    EXP is the tree argument. 
+    MODE is the mode it should be in.  */
+ 
+ static rtx
+ expand_expr_force_mode (tree exp, enum machine_mode mode)
+ {
+   rtx val;
+   enum machine_mode old_mode;
+ 
+   val = expand_expr (exp, NULL_RTX, mode, EXPAND_NORMAL);
+   /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
+      of CONST_INTs, where we know the old_mode only from the call argument.  */
+ 
+   old_mode = GET_MODE (val);
+   if (old_mode == VOIDmode)
+     old_mode = TYPE_MODE (TREE_TYPE (exp));
+   val = convert_modes (mode, old_mode, val, 1);
+   return val;
+ }
+ 
+ /* Get the RTL for lhs #INDEX of STMT.  */
+ 
+ static rtx
+ get_atomic_lhs_rtx (gimple stmt, unsigned index)
+ {
+   tree tree_lhs;
+   rtx rtl_lhs;
+   
+   tree_lhs = gimple_atomic_lhs (stmt, index);
+   if (!tree_lhs)
+     return NULL_RTX;
+ 
+   gcc_assert (TREE_CODE (tree_lhs) == SSA_NAME);
+ 
+   rtl_lhs = expand_expr (tree_lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+   return rtl_lhs;
+ }
+ 
+ /* Expand STMT into a library call.  */
+ 
+ void
+ expand_gimple_atomic_library_call (gimple stmt) 
+ {
+   /* Verify the models if inlining hasn't been attempted.  */
+   if (!flag_inline_atomics)
+     get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_order (stmt));
+ 
+   /* Trigger an error to make us come and look so we can complete writing this.  */
+   gcc_assert (stmt == NULL);
+ }
+ 
+ /* Expand atomic load STMT into RTL.  Return true if successful.  */
+ 
+ bool
+ expand_gimple_atomic_load (gimple stmt)
+ {
+   enum machine_mode mode;
+   enum memmodel model;
+   tree type;
+   rtx mem, rtl_rhs, rtl_lhs;
+ 
+   gcc_assert (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_LOAD);
+ 
+   type = get_atomic_type (stmt);
+   mode = mode_for_size (tree_low_cst (TYPE_SIZE (type), 1), MODE_INT, 0);
+   gcc_assert (mode != BLKmode);
+ 
+   model = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_order (stmt));
+ 
+   mem = expand_atomic_target (gimple_atomic_target (stmt), mode);
+ 
+   rtl_lhs = get_atomic_lhs_rtx (stmt, 0);
+   rtl_rhs = expand_atomic_load (rtl_lhs, mem, model);
+ 
+   /* If no rtl is generated, indicate the code was not inlined.  */
+   if (!rtl_rhs)
+     return false;
+ 
+   if (rtl_lhs)
+     expand_gimple_assign_move (TREE_TYPE (type), rtl_lhs, rtl_rhs, false);
+   return true;
+ }
+ 
+ 
+ /* Expand atomic store STMT into RTL.  Return true if successful.  */
+ 
+ bool
+ expand_gimple_atomic_store (gimple stmt)
+ {
+   rtx mem, val, rtl_rhs;
+   enum memmodel model;
+   enum machine_mode mode;
+   tree type;
+ 
+   gcc_assert (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_STORE);
+ 
+   type = get_atomic_type (stmt);
+   mode = mode_for_size (tree_low_cst (TYPE_SIZE (type), 1), MODE_INT, 0);
+   gcc_assert (mode != BLKmode);
+ 
+   model = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_order (stmt));
+ 
+   /* Expand the operands.  */
+   mem = expand_atomic_target (gimple_atomic_target (stmt), mode);
+   val = expand_expr_force_mode (gimple_atomic_expr (stmt), mode);
+ 
+   rtl_rhs = expand_atomic_store (mem, val, model, false);
+ 
+   /* If no rtl is generated, indicate the code was not inlined.  */
+   if (!rtl_rhs)
+     return false;
+ 
+   return true;
+ }
+ 
+ /* Expand atomic exchange STMT into RTL.  Return true if successful.  */
+ 
+ bool
+ expand_gimple_atomic_exchange (gimple stmt)
+ {
+   rtx mem, val, rtl_rhs, rtl_lhs;
+   enum memmodel model;
+   enum machine_mode mode;
+   tree type;
+ 
+   gcc_assert (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_EXCHANGE);
+ 
+   type = get_atomic_type (stmt);
+   mode = mode_for_size (tree_low_cst (TYPE_SIZE (type), 1), MODE_INT, 0);
+   gcc_assert (mode != BLKmode);
+ 
+   model = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_order (stmt));
+ 
+   /* Expand the operands.  */
+   mem = expand_atomic_target (gimple_atomic_target (stmt), mode);
+   val = expand_expr_force_mode (gimple_atomic_expr (stmt), mode);
+ 
+   rtl_lhs = get_atomic_lhs_rtx (stmt, 0);
+   rtl_rhs = expand_atomic_exchange (rtl_lhs, mem, val, model);
+ 
+   /* If no rtl is generated, indicate the code was not inlined.  */
+   if (!rtl_rhs)
+     return false;
+ 
+   if (rtl_lhs)
+     expand_gimple_assign_move (TREE_TYPE (type), rtl_lhs, rtl_rhs, false);
+   return true;
+ }
+ 
+ /* Expand atomic compare_exchange STMT into RTL.  Return true if successful.  */
+ 
+ bool
+ expand_gimple_atomic_compare_exchange (gimple stmt)
+ {
+   rtx mem, val, rtl_lhs1, rtl_lhs2, expect;
+   rtx real_lhs1, real_lhs2;
+   enum memmodel success, failure;
+   enum machine_mode mode;
+   tree type;
+   bool is_weak, emitted;
+ 
+   gcc_assert (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_COMPARE_EXCHANGE);
+ 
+   type = get_atomic_type (stmt);
+   mode = mode_for_size (tree_low_cst (TYPE_SIZE (type), 1), MODE_INT, 0);
+   gcc_assert (mode != BLKmode);
+ 
+   success = get_memmodel (gimple_atomic_kind (stmt),  gimple_atomic_order (stmt));
+   failure = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_fail_order (stmt));
+ 
+   /* compare_exchange has additional restrictions on the failure order.  */
+   if (failure == MEMMODEL_RELEASE || failure == MEMMODEL_ACQ_REL)
+     error ("invalid failure memory model for %<__atomic_compare_exchange%>");
+ 
+   if (failure > success)
+     {
+       error ("failure memory model cannot be stronger than success "
+ 	     "memory model for %<__atomic_compare_exchange%>");
+     }
+   
+   /* Expand the operands.  */
+   mem = expand_atomic_target (gimple_atomic_target (stmt), mode);
+   val = expand_expr_force_mode (gimple_atomic_expr (stmt), mode);
+   expect = expand_expr_force_mode (gimple_atomic_expected (stmt), mode);
+   is_weak = gimple_atomic_weak (stmt);
+ 
+   rtl_lhs1 = get_atomic_lhs_rtx (stmt, 0);
+   rtl_lhs2 = get_atomic_lhs_rtx (stmt, 1);
+   real_lhs1 = rtl_lhs1;
+   real_lhs2 = rtl_lhs2;
+   emitted = expand_atomic_compare_and_swap (&real_lhs1, &real_lhs2, mem, expect,
+ 					    val, is_weak, success, failure);
+ 
+   /* If no rtl is generated, indicate the code was not inlined.  */
+   if (!emitted)
+     return false;
+ 
+   if (rtl_lhs1)
+     expand_gimple_assign_move (TREE_TYPE (type), rtl_lhs1, real_lhs1, false);
+   /* The second result is not optional.  */
+   expand_gimple_assign_move (TREE_TYPE (type), rtl_lhs2, real_lhs2, false);
+   return true;
+ }
+ 
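For reference, the statement produces two results at this point: lhs 0 is the optional
boolean success flag and lhs 1 is the value actually observed in memory.  A minimal
C-level sketch of the usual consumer of both results, using only the documented
__atomic_compare_exchange_n builtin (names invented, not part of the patch):

   /* Typical CAS retry loop.  Because the observed value is returned as well,
      a failed attempt can feed it straight back into 'expected' without an
      extra atomic load.  */
   int
   fetch_and_double (int *p)
   {
     int expected = __atomic_load_n (p, __ATOMIC_RELAXED);
     while (!__atomic_compare_exchange_n (p, &expected, expected * 2,
                                          false /* strong */,
                                          __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
       ;  /* On failure, 'expected' already holds the current value.  */
     return expected;
   }
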
+ /* Return the RTL code for the tree operation TCODE.  */
+ 
+ static enum rtx_code
+ rtx_code_from_tree_code (enum tree_code tcode)
+ {
+   switch (tcode)
+     {
+     case PLUS_EXPR:
+       return PLUS;
+     case MINUS_EXPR:
+       return MINUS;
+     case BIT_AND_EXPR:
+       return AND;
+     case BIT_IOR_EXPR:
+       return IOR;
+     case BIT_XOR_EXPR:
+       return XOR;
+     case BIT_NOT_EXPR:
+       return NOT;
+     default:
+       error ("invalid operation type in atomic fetch operation");
+     }
+   return PLUS;
+ }
+ 
+ 
+ /* Expand atomic fetch operation STMT into RTL.  FETCH_AFTER is true if the 
+    value returned is the post operation value.  Return true if successful.  */
+ 
+ static bool
+ expand_atomic_fetch (gimple stmt, bool fetch_after)
+ {
+   rtx mem, val, rtl_rhs, rtl_lhs;
+   enum memmodel model;
+   enum machine_mode mode;
+   tree type;
+   enum rtx_code rcode;
+ 
+ 
+   type = get_atomic_type (stmt);
+   mode = mode_for_size (tree_low_cst (TYPE_SIZE (type), 1), MODE_INT, 0);
+   gcc_assert (mode != BLKmode);
+ 
+   model = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_order (stmt));
+ 
+   /* Expand the operands.  */
+   mem = expand_atomic_target (gimple_atomic_target (stmt), mode);
+   val = expand_expr_force_mode (gimple_atomic_expr (stmt), mode);
+   rcode = rtx_code_from_tree_code (gimple_atomic_op_code (stmt));
+ 
+   rtl_lhs = get_atomic_lhs_rtx (stmt, 0);
+   rtl_rhs = expand_atomic_fetch_op (rtl_lhs, mem, val, rcode, model,
+ 				    fetch_after);
+ 
+   /* If no rtl is generated, indicate the code was not inlined.  */
+   if (!rtl_rhs)
+     return false;
+ 
+   /* If the result is used, make sure it is in the correct LHS.  */
+   if (rtl_lhs)
+     expand_gimple_assign_move (TREE_TYPE (type), rtl_lhs, rtl_rhs, false);
+   return true;
+ }
+ 
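The FETCH_AFTER flag mirrors the two builtin families: __atomic_fetch_<op> returns the
pre-operation value (GIMPLE_ATOMIC_FETCH_OP, FETCH_AFTER false), while __atomic_<op>_fetch
returns the post-operation value (GIMPLE_ATOMIC_OP_FETCH, FETCH_AFTER true).  A small
illustrative example, assuming nothing beyond the documented builtins:

   int counter;

   int
   fetch_flavours (void)
   {
     int old = __atomic_fetch_add (&counter, 5, __ATOMIC_SEQ_CST);  /* pre-op value */
     int cur = __atomic_add_fetch (&counter, 5, __ATOMIC_SEQ_CST);  /* post-op value */
     return cur - old;  /* 10, barring interference from other threads.  */
   }
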
+ 
+ /* Expand atomic fetch_op operation STMT into RTL.  Return true if successful.  */
+ 
+ bool
+ expand_gimple_atomic_fetch_op (gimple stmt)
+ {
+   gcc_assert (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_FETCH_OP);
+   return expand_atomic_fetch (stmt, false);
+ }
+ 
+ /* Expand atomic op_fetch operation STMT into RTL.  Return true if successful.  */
+ 
+ bool
+ expand_gimple_atomic_op_fetch (gimple stmt)
+ {
+   gcc_assert (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_OP_FETCH);
+   return expand_atomic_fetch (stmt, true);
+ }
+ 
+ /* Expand atomic test_and_set STMT into RTL.  */
+ 
+ void
+ expand_gimple_atomic_test_and_set (gimple stmt)
+ {
+   rtx mem, rtl_rhs, rtl_lhs;
+   enum memmodel model;
+   enum machine_mode mode;
+   tree type;
+ 
+   gcc_assert (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_TEST_AND_SET);
+ 
+   type = get_atomic_type (stmt);
+   mode = mode_for_size (tree_low_cst (TYPE_SIZE (type), 1), MODE_INT, 0);
+   gcc_assert (mode != BLKmode);
+ 
+   mode = mode_for_size (BOOL_TYPE_SIZE, MODE_INT, 0);
+   model = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_order (stmt));
+ 
+   /* Expand the operands.  */
+   mem = expand_atomic_target (gimple_atomic_target (stmt), mode);
+ 
+   rtl_lhs = get_atomic_lhs_rtx (stmt, 0);
+   rtl_rhs = expand_atomic_test_and_set (rtl_lhs, mem, model);
+ 
+   /* Test and set is not allowed to fail.  */
+   gcc_assert (rtl_rhs);
+ 
+   if (rtl_lhs)
+     expand_gimple_assign_move (TREE_TYPE (type), rtl_lhs, rtl_rhs, false);
+ }
+ 
+ #ifndef HAVE_atomic_clear
+ # define HAVE_atomic_clear 0
+ # define gen_atomic_clear(x,y) (gcc_unreachable (), NULL_RTX)
+ #endif
+ 
+ /* Expand atomic clear STMT into RTL.  */
+ 
+ void
+ expand_gimple_atomic_clear (gimple stmt)
+ {
+   rtx mem, ret;
+   enum memmodel model;
+   enum machine_mode mode;
+   tree type;
+ 
+   gcc_assert (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_CLEAR);
+ 
+   type = get_atomic_type (stmt);
+   mode = mode_for_size (tree_low_cst (TYPE_SIZE (type), 1), MODE_INT, 0);
+   gcc_assert (mode != BLKmode);
+ 
+   mode = mode_for_size (BOOL_TYPE_SIZE, MODE_INT, 0);
+   model = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_order (stmt));
+   mem = expand_atomic_target (gimple_atomic_target (stmt), mode);
+ 
+   if (HAVE_atomic_clear)
+     {
+       emit_insn (gen_atomic_clear (mem, model));
+       return;
+     }
+ 
+   /* Try issuing an __atomic_store, and allow fallback to __sync_lock_release.
+      Failing that, a store is issued by __atomic_store.  The only way this can
+      fail is if the bool type is larger than a word size.  Unlikely, but
+      handle it anyway for completeness.  Assume a single threaded model since
+      there is no atomic support in this case, and no barriers are required.  */
+   ret = expand_atomic_store (mem, const0_rtx, model, true);
+   if (!ret)
+     emit_move_insn (mem, const0_rtx);
+ }
+ 
+ /* Expand atomic fence STMT into RTL.  */
+ 
+ void
+ expand_gimple_atomic_fence (gimple stmt)
+ {
+   enum memmodel model;
+   gcc_assert (gimple_atomic_kind (stmt) == GIMPLE_ATOMIC_FENCE);
+ 
+   model = get_memmodel (gimple_atomic_kind (stmt), gimple_atomic_order (stmt));
+ 
+   if (gimple_atomic_thread_fence (stmt))
+     expand_mem_thread_fence (model);
+   else
+     expand_mem_signal_fence (model);
+ }
+ 
+ 
+ /* Return true if FNDECL is an atomic builtin function that can be mapped to a
+    GIMPLE_ATOMIC statement.  */
+ 
+ static bool
+ is_built_in_atomic (tree fndecl)
+ {
+   enum built_in_function fcode;
+ 
+ 
+   if (!fndecl || !DECL_BUILT_IN (fndecl))
+     return false;
+ 
+   if (DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+     return false;
+ 
+   fcode = DECL_FUNCTION_CODE (fndecl);
+   if (fcode >= BUILT_IN_ATOMIC_TEST_AND_SET &&
+       fcode <= BUILT_IN_ATOMIC_SIGNAL_FENCE)
+     return true;
+ 
+   return false;
+ }
+ 
+ /* Return the base type for an atomic builtin function.  I is the offset of
+    the size-suffixed builtin from its generic _N variant.  */
+ 
+ static tree
+ atomic_func_type (unsigned i)
+ {
+   gcc_assert (i <= 5);
+ 
+   switch (i)
+     {
+     case 0:
+       gcc_unreachable ();
+     case 1:
+       return unsigned_intQI_type_node;
+     case 2:
+       return unsigned_intHI_type_node;
+     case 3:
+       return unsigned_intSI_type_node;
+     case 4:
+       return unsigned_intDI_type_node;
+     case 5:
+       return unsigned_intTI_type_node;
+     default:
+       gcc_unreachable ();
+     }
+ }
+ 
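For reference, I is just the offset of the sized builtin from its generic _N entry, so
the mapping from the switch above works out to (illustration only):

   /* I == fcode - BUILT_IN_ATOMIC_<OP>_N
      __atomic_load_1  -> I == 1 -> unsigned_intQI_type_node  (8 bits)
      __atomic_load_4  -> I == 3 -> unsigned_intSI_type_node  (32 bits)
      __atomic_load_16 -> I == 5 -> unsigned_intTI_type_node  (128 bits)  */
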
+ /* Convert an atomic builtin call at GSI_P into a GIMPLE_ATOMIC statement.  */
+ 
+ static void
+ lower_atomic_call (gimple_stmt_iterator *gsi_p)
+ {
+   tree fndecl;
+   enum built_in_function fcode;
+   gimple s = NULL;
+   tree order;
+   tree target;
+   tree expr;
+   tree type;
+   enum tree_code op;
+   bool fetch_op;
+   gimple stmt = gsi_stmt (*gsi_p);
+ 
+   fndecl = gimple_call_fndecl (stmt);
+   gcc_assert (is_built_in_atomic (fndecl));
+ 
+   fcode = DECL_FUNCTION_CODE (fndecl);
+ 
+   switch (fcode) {
+     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE:
+     case BUILT_IN_ATOMIC_STORE:
+     case BUILT_IN_ATOMIC_LOAD:
+     case BUILT_IN_ATOMIC_EXCHANGE:
+       /* Do nothing for the generic functions at the moment.  */
+       return;
+ 
+     case BUILT_IN_ATOMIC_LOAD_N:
+     case BUILT_IN_ATOMIC_LOAD_1:
+     case BUILT_IN_ATOMIC_LOAD_2:
+     case BUILT_IN_ATOMIC_LOAD_4:
+     case BUILT_IN_ATOMIC_LOAD_8:
+     case BUILT_IN_ATOMIC_LOAD_16:
+       gcc_assert (gimple_call_num_args (stmt) == 2);
+       order = gimple_call_arg (stmt, 1);
+       target = gimple_call_arg (stmt, 0);
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_LOAD_N);
+       s = gimple_build_atomic_load (type, target, order);
+       if (gimple_call_lhs (stmt))
+         gimple_atomic_set_lhs (s, 0, gimple_call_lhs (stmt));
+       break;
+ 
+ 
+     case BUILT_IN_ATOMIC_EXCHANGE_N:
+     case BUILT_IN_ATOMIC_EXCHANGE_1:
+     case BUILT_IN_ATOMIC_EXCHANGE_2:
+     case BUILT_IN_ATOMIC_EXCHANGE_4:
+     case BUILT_IN_ATOMIC_EXCHANGE_8:
+     case BUILT_IN_ATOMIC_EXCHANGE_16:
+       gcc_assert (gimple_call_num_args (stmt) == 3);
+       target = gimple_call_arg (stmt, 0);
+       expr = gimple_call_arg (stmt, 1);
+       order = gimple_call_arg (stmt, 2);
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_EXCHANGE_N);
+       s = gimple_build_atomic_exchange (type, target, expr, order);
+       if (gimple_call_lhs (stmt))
+         gimple_atomic_set_lhs (s, 0, gimple_call_lhs (stmt));
+       break;
+ 
+     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_N:
+     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1:
+     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_2:
+     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_4:
+     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_8:
+     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_16:
+       {
+         tree tmp1, tmp2, tmp1_type, tmp2_type, deref, deref_tmp;
+         tree expected, fail, weak;
+ 	bool is_weak = false;
+ 
+         gcc_assert (gimple_call_num_args (stmt) == 6);
+ 	target = gimple_call_arg (stmt, 0);
+ 	expected = gimple_call_arg (stmt, 1);
+ 	expr = gimple_call_arg (stmt, 2);
+ 	weak = gimple_call_arg (stmt, 3);
+ 	if (host_integerp (weak, 0) && tree_low_cst (weak, 0) != 0)
+ 	    is_weak = true;
+ 	order = gimple_call_arg (stmt, 4);
+ 	fail = gimple_call_arg (stmt, 5);
+ 
+ 	/* TODO : Translate the original
+ 	   bool = cmp_xch (t,expect,...)
+ 	      into
+ 	   tmp1 = expect;
+ 	   bool, tmp2 = cmp_xch (t,*tmp1,e)
+ 	   *tmp1 = tmp2;  */
+ 	/* tmp1 = expect */
+ 	tmp1_type = TREE_TYPE (expected);
+ 	/* TODO: handle other tree codes here when this assert starts triggering.  */
+ 	gcc_assert (TREE_CODE (expected) == ADDR_EXPR);
+ 	tmp2_type = TREE_TYPE (TREE_OPERAND (expected, 0));
+ 
+ 	tmp1 = create_tmp_var (tmp1_type, "cmpxchg_p");
+ 	s = gimple_build_assign (tmp1, expected);
+ 	gimple_set_location (s, gimple_location (stmt));
+ 	gsi_insert_before (gsi_p, s, GSI_SAME_STMT);
+ 
+ 	/* deref_tmp = *tmp1 */
+ 	deref = build2 (MEM_REF, tmp2_type, tmp1, 
+ 			build_int_cst_wide (tmp1_type, 0, 0));
+ 	deref_tmp = create_tmp_var (tmp2_type, "cmpxchg_d");
+ 	s = gimple_build_assign (deref_tmp, deref);
+ 	gimple_set_location (s, gimple_location (stmt));
+ 	gsi_insert_before (gsi_p, s, GSI_SAME_STMT);
+ 
+         /* bool, tmp2 = cmp_exchange (t, deref_tmp, ...) */
+ 	type = atomic_func_type (fcode - BUILT_IN_ATOMIC_COMPARE_EXCHANGE_N);
+ 	s = gimple_build_atomic_compare_exchange (type, target, deref_tmp, expr,
+ 						  order, fail, is_weak);
+ 	gimple_atomic_set_lhs (s, 0, gimple_call_lhs (stmt));
+ 
+ 	tmp2 = create_tmp_var (tmp2_type, "cmpxchg");
+ 	gimple_atomic_set_lhs (s, 1, tmp2);
+ 	gimple_set_location (s, gimple_location (stmt));
+ 
+ 	gsi_insert_before (gsi_p, s, GSI_SAME_STMT);
+ 
+ 	/* *tmp1 = tmp2  */
+ 	deref = build2 (MEM_REF, tmp2_type, tmp1, 
+ 			build_int_cst_wide (tmp1_type, 0, 0));
+ 	s = gimple_build_assign (deref, tmp2);
+ 	break;
+       }
+ 
+     case BUILT_IN_ATOMIC_STORE_N:
+     case BUILT_IN_ATOMIC_STORE_1:
+     case BUILT_IN_ATOMIC_STORE_2:
+     case BUILT_IN_ATOMIC_STORE_4:
+     case BUILT_IN_ATOMIC_STORE_8:
+     case BUILT_IN_ATOMIC_STORE_16:
+       gcc_assert (gimple_call_num_args (stmt) == 3);
+       target = gimple_call_arg (stmt, 0);
+       expr = gimple_call_arg (stmt, 1);
+       order = gimple_call_arg (stmt, 2);
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_STORE_N);
+       s = gimple_build_atomic_store (type, target, expr, order);
+       break;
+ 
+     case BUILT_IN_ATOMIC_ADD_FETCH_N:
+     case BUILT_IN_ATOMIC_ADD_FETCH_1:
+     case BUILT_IN_ATOMIC_ADD_FETCH_2:
+     case BUILT_IN_ATOMIC_ADD_FETCH_4:
+     case BUILT_IN_ATOMIC_ADD_FETCH_8:
+     case BUILT_IN_ATOMIC_ADD_FETCH_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_ADD_FETCH_N);
+       op = PLUS_EXPR;
+       fetch_op = false;
+ fetch_body:
+       gcc_assert (gimple_call_num_args (stmt) == 3);
+       target = gimple_call_arg (stmt, 0);
+       expr = gimple_call_arg (stmt, 1);
+       order = gimple_call_arg (stmt, 2);
+       if (fetch_op)
+ 	s = gimple_build_atomic_fetch_op (type, target, expr, op, order);
+       else
+ 	s = gimple_build_atomic_op_fetch (type, target, expr, op, order);
+       if (gimple_call_lhs (stmt))
+         gimple_atomic_set_lhs (s, 0, gimple_call_lhs (stmt));
+       break;
+ 
+     case BUILT_IN_ATOMIC_FETCH_ADD_N:
+     case BUILT_IN_ATOMIC_FETCH_ADD_1:
+     case BUILT_IN_ATOMIC_FETCH_ADD_2:
+     case BUILT_IN_ATOMIC_FETCH_ADD_4:
+     case BUILT_IN_ATOMIC_FETCH_ADD_8:
+     case BUILT_IN_ATOMIC_FETCH_ADD_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_FETCH_ADD_N);
+       op = PLUS_EXPR;
+       fetch_op = true;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_SUB_FETCH_N:
+     case BUILT_IN_ATOMIC_SUB_FETCH_1:
+     case BUILT_IN_ATOMIC_SUB_FETCH_2:
+     case BUILT_IN_ATOMIC_SUB_FETCH_4:
+     case BUILT_IN_ATOMIC_SUB_FETCH_8:
+     case BUILT_IN_ATOMIC_SUB_FETCH_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_SUB_FETCH_N);
+       op = MINUS_EXPR;
+       fetch_op = false;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_FETCH_SUB_N:
+     case BUILT_IN_ATOMIC_FETCH_SUB_1:
+     case BUILT_IN_ATOMIC_FETCH_SUB_2:
+     case BUILT_IN_ATOMIC_FETCH_SUB_4:
+     case BUILT_IN_ATOMIC_FETCH_SUB_8:
+     case BUILT_IN_ATOMIC_FETCH_SUB_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_FETCH_SUB_N);
+       op = MINUS_EXPR;
+       fetch_op = true;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_AND_FETCH_N:
+     case BUILT_IN_ATOMIC_AND_FETCH_1:
+     case BUILT_IN_ATOMIC_AND_FETCH_2:
+     case BUILT_IN_ATOMIC_AND_FETCH_4:
+     case BUILT_IN_ATOMIC_AND_FETCH_8:
+     case BUILT_IN_ATOMIC_AND_FETCH_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_AND_FETCH_N);
+       op = BIT_AND_EXPR;
+       fetch_op = false;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_FETCH_AND_N:
+     case BUILT_IN_ATOMIC_FETCH_AND_1:
+     case BUILT_IN_ATOMIC_FETCH_AND_2:
+     case BUILT_IN_ATOMIC_FETCH_AND_4:
+     case BUILT_IN_ATOMIC_FETCH_AND_8:
+     case BUILT_IN_ATOMIC_FETCH_AND_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_FETCH_AND_N);
+       op = BIT_AND_EXPR;
+       fetch_op = true;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_XOR_FETCH_N:
+     case BUILT_IN_ATOMIC_XOR_FETCH_1:
+     case BUILT_IN_ATOMIC_XOR_FETCH_2:
+     case BUILT_IN_ATOMIC_XOR_FETCH_4:
+     case BUILT_IN_ATOMIC_XOR_FETCH_8:
+     case BUILT_IN_ATOMIC_XOR_FETCH_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_XOR_FETCH_N);
+       op = BIT_XOR_EXPR;
+       fetch_op = false;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_FETCH_XOR_N:
+     case BUILT_IN_ATOMIC_FETCH_XOR_1:
+     case BUILT_IN_ATOMIC_FETCH_XOR_2:
+     case BUILT_IN_ATOMIC_FETCH_XOR_4:
+     case BUILT_IN_ATOMIC_FETCH_XOR_8:
+     case BUILT_IN_ATOMIC_FETCH_XOR_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_FETCH_XOR_N);
+       op = BIT_XOR_EXPR;
+       fetch_op = true;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_OR_FETCH_N:
+     case BUILT_IN_ATOMIC_OR_FETCH_1:
+     case BUILT_IN_ATOMIC_OR_FETCH_2:
+     case BUILT_IN_ATOMIC_OR_FETCH_4:
+     case BUILT_IN_ATOMIC_OR_FETCH_8:
+     case BUILT_IN_ATOMIC_OR_FETCH_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_OR_FETCH_N);
+       op = BIT_IOR_EXPR;
+       fetch_op = false;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_FETCH_OR_N:
+     case BUILT_IN_ATOMIC_FETCH_OR_1:
+     case BUILT_IN_ATOMIC_FETCH_OR_2:
+     case BUILT_IN_ATOMIC_FETCH_OR_4:
+     case BUILT_IN_ATOMIC_FETCH_OR_8:
+     case BUILT_IN_ATOMIC_FETCH_OR_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_FETCH_OR_N);
+       op = BIT_IOR_EXPR;
+       fetch_op = true;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_NAND_FETCH_N:
+     case BUILT_IN_ATOMIC_NAND_FETCH_1:
+     case BUILT_IN_ATOMIC_NAND_FETCH_2:
+     case BUILT_IN_ATOMIC_NAND_FETCH_4:
+     case BUILT_IN_ATOMIC_NAND_FETCH_8:
+     case BUILT_IN_ATOMIC_NAND_FETCH_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_NAND_FETCH_N);
+       op = BIT_NOT_EXPR;
+       fetch_op = false;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_FETCH_NAND_N:
+     case BUILT_IN_ATOMIC_FETCH_NAND_1:
+     case BUILT_IN_ATOMIC_FETCH_NAND_2:
+     case BUILT_IN_ATOMIC_FETCH_NAND_4:
+     case BUILT_IN_ATOMIC_FETCH_NAND_8:
+     case BUILT_IN_ATOMIC_FETCH_NAND_16:
+       type = atomic_func_type (fcode - BUILT_IN_ATOMIC_FETCH_NAND_N);
+       op = BIT_NOT_EXPR;
+       fetch_op = true;
+       goto fetch_body;
+ 
+     case BUILT_IN_ATOMIC_TEST_AND_SET:
+       gcc_assert (gimple_call_num_args (stmt) == 2);
+       target = gimple_call_arg (stmt, 0);
+       order = gimple_call_arg (stmt, 1);
+       s = gimple_build_atomic_test_and_set (target, order);
+       if (gimple_call_lhs (stmt))
+         gimple_atomic_set_lhs (s, 0, gimple_call_lhs (stmt));
+       break;
+ 
+     case BUILT_IN_ATOMIC_CLEAR:
+       gcc_assert (gimple_call_num_args (stmt) == 2);
+       target = gimple_call_arg (stmt, 0);
+       order = gimple_call_arg (stmt, 1);
+       s = gimple_build_atomic_clear (target, order);
+       break;
+ 
+     case BUILT_IN_ATOMIC_THREAD_FENCE:
+       gcc_assert (gimple_call_num_args (stmt) == 1);
+       order = gimple_call_arg (stmt, 0);
+       s = gimple_build_atomic_fence (order, true);
+       break;
+ 
+     case BUILT_IN_ATOMIC_SIGNAL_FENCE:
+       gcc_assert (gimple_call_num_args (stmt) == 1);
+       order = gimple_call_arg (stmt, 0);
+       s = gimple_build_atomic_fence (order, false);
+       break;
+ 
+     default:
+       gcc_unreachable ();
+   }
+ 
+   gcc_assert (s != NULL);
+ 
+   gimple_set_location (s, gimple_location (stmt));
+   gsi_insert_after (gsi_p, s, GSI_SAME_STMT);
+   gsi_remove (gsi_p, true);
+ 
+ }
+ 
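To make the compare_exchange case above easier to follow, here is a rough sketch of the
sequence it emits; temporary names are invented, the real ones come from create_tmp_var,
and the atomic statement is shown in the dump syntax introduced further down:

   ok = __atomic_compare_exchange_4 (t, expect_p, newval, 0, so, fo);

 becomes approximately

   cmpxchg_p = expect_p;
   cmpxchg_d = *cmpxchg_p;
   ok, cmpxchg = ATOMIC_COMPARE_EXCHANGE_STRONG_4 <t, cmpxchg_d, newval, so, fo>;
   *cmpxchg_p = cmpxchg;
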
+ /* Conversion of atomic builtin functions to GIMPLE_ATOMIC statements.  Scan
+    the function looking for BUILT_IN_ATOMIC_* calls and replace them with the
+    equivalent GIMPLE_ATOMIC statements.  */
+ 
+ static unsigned int
+ lower_atomics (void)
+ {
+   basic_block bb;
+   gimple_stmt_iterator gsi;
+ 
+   FOR_EACH_BB (bb)
+     {
+       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+       	{
+ 	  if (gimple_code (gsi_stmt (gsi)) == GIMPLE_CALL)
+ 	    {
+ 	      if (is_built_in_atomic (gimple_call_fndecl (gsi_stmt (gsi))))
+ 		lower_atomic_call (&gsi);
+ 	    }
+ 	}
+     }
+   return 0;
+ }
+ 
+ 
+ /* Gate to enable lowering of atomic operations.  As this will replace the 
+    built-in support, always do it.  */
+ 
+ static bool
+ gate_lower_atomics (void)
+ {
+   return true;
+ }
+ 
+ struct gimple_opt_pass pass_lower_atomics =
+ {
+   {
+     GIMPLE_PASS,
+     "lower_atomics",			/* name */
+     gate_lower_atomics,			/* gate */
+     lower_atomics,			/* execute */
+     NULL,				/* sub */
+     NULL,				/* next */
+     0,					/* static_pass_number */
+     TV_NONE,				/* tv_id */
+     PROP_cfg,				/* properties_required */
+     0,					/* properties_provided */
+     0,					/* properties_destroyed */
+     0,					/* todo_flags_start */
+     0,					/* todo_flags_finish */
+   }
+ };
+ 
Index: tree-ssa-operands.c
===================================================================
*** tree-ssa-operands.c	(revision 186098)
--- tree-ssa-operands.c	(working copy)
*************** parse_ssa_operands (gimple stmt)
*** 1063,1068 ****
--- 1063,1078 ----
  			   opf_use | opf_no_vops);
        break;
  
+     case GIMPLE_ATOMIC:
+       /* Atomic operations are memory barriers in both directions for now.  */
+       add_virtual_operand (stmt, opf_def | opf_use);
+       
+       for (n = 0; n < gimple_atomic_num_lhs (stmt); n++)
+ 	get_expr_operands (stmt, gimple_atomic_lhs_ptr (stmt, n), opf_def);
+       for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+ 	get_expr_operands (stmt, gimple_op_ptr (stmt, n), opf_use);
+       break;
+       
      case GIMPLE_RETURN:
        append_vuse (gimple_vop (cfun));
        goto do_default;
Index: gimple-pretty-print.c
===================================================================
*** gimple-pretty-print.c	(revision 186098)
--- gimple-pretty-print.c	(working copy)
*************** dump_gimple_call (pretty_printer *buffer
*** 749,754 ****
--- 749,1016 ----
      }
  }
  
+ /* Dump the tree opcode for an atomic fetch_op/op_fetch stmt GS into BUFFER.  */
+ 
+ static void
+ dump_gimple_atomic_kind_op (pretty_printer *buffer, const_gimple gs)
+ {
+   switch (gimple_atomic_op_code (gs))
+     {
+     case PLUS_EXPR:
+       pp_string (buffer, "ADD");
+       break;
+ 
+     case MINUS_EXPR:
+       pp_string (buffer, "SUB");
+       break;
+ 
+     case BIT_AND_EXPR:
+       pp_string (buffer, "AND");
+       break;
+ 
+     case BIT_IOR_EXPR:
+       pp_string (buffer, "OR");
+       break;
+ 
+     case BIT_XOR_EXPR:
+       pp_string (buffer, "XOR");
+       break;
+ 
+     case BIT_NOT_EXPR:	/* This is used for NAND in the builtins.  */
+       pp_string (buffer, "NAND");
+       break;
+ 
+     default:
+       gcc_unreachable ();
+     }
+ }
+ 
+ /* Dump the memory order node T.  BUFFER, SPC and FLAGS are as in
+    dump_generic_node.  */
+ 
+ static void
+ dump_gimple_atomic_order (pretty_printer *buffer, tree t, int spc, int flags)
+ {
+   enum memmodel order;
+ 
+   if (TREE_CODE (t) != INTEGER_CST)
+     {
+       dump_generic_node (buffer, t, spc, flags, false);
+       return;
+     }
+ 
+   order = (enum memmodel) TREE_INT_CST_LOW (t);
+   switch (order)
+     {
+     case MEMMODEL_RELAXED:
+       pp_string (buffer, "RELAXED");
+       break;
+ 
+     case MEMMODEL_CONSUME:
+       pp_string (buffer, "CONSUME");
+       break;
+ 
+     case MEMMODEL_ACQUIRE:
+       pp_string (buffer, "ACQUIRE");
+       break;
+ 
+     case MEMMODEL_RELEASE:
+       pp_string (buffer, "RELEASE");
+       break;
+ 
+     case MEMMODEL_ACQ_REL:
+       pp_string (buffer, "ACQ_REL");
+       break;
+ 
+     case MEMMODEL_SEQ_CST:
+       pp_string (buffer, "SEQ_CST");
+       break;
+ 
+     default:
+       gcc_unreachable ();
+       break;
+     }
+ }
+ 
+ /* Dump the appropriate suffix size for an atomic statement GS into BUFFER.  */
+ 
+ static void
+ dump_gimple_atomic_type_size (pretty_printer *buffer, const_gimple gs)
+ {
+   tree t = gimple_atomic_type (gs);
+   unsigned n = TREE_INT_CST_LOW (TYPE_SIZE (t));
+   switch (n)
+     {
+     case 8:
+       pp_string (buffer, "_1 <");
+       break;
+ 
+     case 16:
+       pp_string (buffer, "_2 <");
+       break;
+ 
+     case 32:
+       pp_string (buffer, "_4 <");
+       break;
+ 
+     case 64:
+       pp_string (buffer, "_8 <");
+       break;
+ 
+     case 128:
+       pp_string (buffer, "_16 <");
+       break;
+ 
+     default:
+       pp_string (buffer, " <");
+       break;
+     }
+ }
+ 
+ /* Dump the atomic statement GS.  BUFFER, SPC and FLAGS are as in
+    dump_gimple_stmt.  */
+ 
+ static void
+ dump_gimple_atomic (pretty_printer *buffer, gimple gs, int spc, int flags)
+ {
+   if (gimple_atomic_num_lhs (gs) == 1)
+     {
+       dump_generic_node (buffer, gimple_atomic_lhs (gs, 0), spc, flags, false);
+       pp_string (buffer, " = ");
+     }
+   else if (gimple_atomic_num_lhs (gs) > 1)
+     {
+       /* The first LHS is still optional, so print both results only if the
+          first one is present.  */
+       if (gimple_atomic_lhs (gs, 0))
+         {
+ 	  pp_string (buffer, "(");
+ 
+ 	  dump_generic_node (buffer, gimple_atomic_lhs (gs, 0), spc, flags,
+ 			     false);
+ 	  pp_string (buffer, ", ");
+ 	  dump_generic_node (buffer, gimple_atomic_lhs (gs, 1), spc, flags,
+ 			     false);
+ 	  pp_string (buffer, ") = ");
+ 	}
+       else
+        {
+ 	  /* Otherwise just print the result that has to be there.  */
+ 	  dump_generic_node (buffer, gimple_atomic_lhs (gs, 1), spc, flags,
+ 			     false);
+ 	  pp_string (buffer, " = ");
+        }
+     }
+    
+   switch (gimple_atomic_kind (gs))
+     {
+     case GIMPLE_ATOMIC_LOAD:
+       pp_string (buffer, "ATOMIC_LOAD");
+       dump_gimple_atomic_type_size (buffer, gs);
+       dump_generic_node (buffer, gimple_atomic_target (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_gimple_atomic_order (buffer, gimple_atomic_order (gs), spc, flags);
+       pp_string (buffer, "> ");
+       break;
+ 
+     case GIMPLE_ATOMIC_STORE:
+       pp_string (buffer, "ATOMIC_STORE");
+       dump_gimple_atomic_type_size (buffer, gs);
+       dump_generic_node (buffer, gimple_atomic_target (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_generic_node (buffer, gimple_atomic_expr (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_gimple_atomic_order (buffer, gimple_atomic_order (gs), spc, flags);
+       pp_string (buffer, "> ");
+       break;
+ 
+     case GIMPLE_ATOMIC_EXCHANGE:
+       pp_string (buffer, "ATOMIC_EXCHANGE");
+       dump_gimple_atomic_type_size (buffer, gs);
+       dump_generic_node (buffer, gimple_atomic_target (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_generic_node (buffer, gimple_atomic_expr (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_gimple_atomic_order (buffer, gimple_atomic_order (gs), spc, flags);
+       pp_string (buffer, "> ");
+       break;
+ 
+     case GIMPLE_ATOMIC_COMPARE_EXCHANGE:
+       pp_string (buffer, "ATOMIC_COMPARE_EXCHANGE_");
+       if (gimple_atomic_weak (gs))
+ 	pp_string (buffer, "WEAK");
+       else
+ 	pp_string (buffer, "STRONG");
+       dump_gimple_atomic_type_size (buffer, gs);
+       dump_generic_node (buffer, gimple_atomic_target (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_generic_node (buffer, gimple_atomic_expected (gs), spc, flags,
+ 			false);
+       pp_string (buffer, ", ");
+       dump_generic_node (buffer, gimple_atomic_expr (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_gimple_atomic_order (buffer, gimple_atomic_order (gs), spc, flags);
+       pp_string (buffer, ", ");
+       dump_gimple_atomic_order (buffer, gimple_atomic_fail_order (gs), spc,
+ 				flags);
+       pp_string (buffer, "> ");
+       break;
+ 
+     case GIMPLE_ATOMIC_FETCH_OP:
+       pp_string (buffer, "ATOMIC_FETCH_");
+       dump_gimple_atomic_kind_op (buffer, gs);
+       dump_gimple_atomic_type_size (buffer, gs);
+       dump_generic_node (buffer, gimple_atomic_target (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_generic_node (buffer, gimple_atomic_expr (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_gimple_atomic_order (buffer, gimple_atomic_order (gs), spc, flags);
+       pp_string (buffer, "> ");
+       break;
+ 
+     case GIMPLE_ATOMIC_OP_FETCH:
+       pp_string (buffer, "ATOMIC_");
+       dump_gimple_atomic_kind_op (buffer, gs);
+       pp_string (buffer, "_FETCH");
+       dump_gimple_atomic_type_size (buffer, gs);
+       dump_generic_node (buffer, gimple_atomic_target (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_generic_node (buffer, gimple_atomic_expr (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_gimple_atomic_order (buffer, gimple_atomic_order (gs), spc, flags);
+       pp_string (buffer, "> ");
+       break;
+ 
+     case GIMPLE_ATOMIC_TEST_AND_SET:
+       pp_string (buffer, "ATOMIC_TEST_AND_SET <");
+       dump_generic_node (buffer, gimple_atomic_target (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_gimple_atomic_order (buffer, gimple_atomic_order (gs), spc, flags);
+       pp_string (buffer, "> ");
+       break;
+ 
+     case GIMPLE_ATOMIC_CLEAR:
+       pp_string (buffer, "ATOMIC_CLEAR <");
+       dump_generic_node (buffer, gimple_atomic_target (gs), spc, flags, false);
+       pp_string (buffer, ", ");
+       dump_gimple_atomic_order (buffer, gimple_atomic_order (gs), spc, flags);
+       pp_string (buffer, "> ");
+       break;
+ 
+     case GIMPLE_ATOMIC_FENCE:
+       if (gimple_atomic_thread_fence (gs))
+ 	pp_string (buffer, "ATOMIC_THREAD_FENCE <");
+       else
+ 	pp_string (buffer, "ATOMIC_SIGNAL_FENCE <");
+       dump_gimple_atomic_order (buffer, gimple_atomic_order (gs), spc, flags);
+       pp_string (buffer, "> ");
+       break;
+ 
+     default:
+       gcc_unreachable ();
+     }
+ }
+ 
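For reference, with the format above an acquire load and a sequentially consistent
fetch-add of a 32-bit int dump roughly as follows (SSA names invented):

   _3 = ATOMIC_LOAD_4 <&counter, ACQUIRE>
   old_4 = ATOMIC_FETCH_ADD_4 <&counter, 5, SEQ_CST>
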
  
  /* Dump the switch statement GS.  BUFFER, SPC and FLAGS are as in
     dump_gimple_stmt.  */
*************** dump_gimple_stmt (pretty_printer *buffer
*** 1920,1925 ****
--- 2182,2191 ----
        dump_gimple_call (buffer, gs, spc, flags);
        break;
  
+     case GIMPLE_ATOMIC:
+       dump_gimple_atomic (buffer, gs, spc, flags);
+       break;
+ 
      case GIMPLE_COND:
        dump_gimple_cond (buffer, gs, spc, flags);
        break;
Index: tree-cfg.c
===================================================================
*** tree-cfg.c	(revision 186098)
--- tree-cfg.c	(working copy)
*************** verify_gimple_return (gimple stmt)
*** 4073,4078 ****
--- 4073,4113 ----
    return false;
  }
  
+ /* Verify that STMT is a valid GIMPLE_ATOMIC statement.  Returns true when
+    there is a problem, otherwise false.  */
+ 
+ static bool
+ verify_gimple_atomic (gimple stmt)
+ {
+   enum gimple_atomic_kind kind = gimple_atomic_kind (stmt);
+ 
+   switch (kind)
+     {
+     case GIMPLE_ATOMIC_LOAD:
+       break;
+ 
+     case GIMPLE_ATOMIC_STORE:
+     case GIMPLE_ATOMIC_EXCHANGE:
+       break;
+ 
+     case GIMPLE_ATOMIC_COMPARE_EXCHANGE:
+       break;
+ 
+     case GIMPLE_ATOMIC_FETCH_OP:
+     case GIMPLE_ATOMIC_OP_FETCH:
+       break;
+ 
+     case GIMPLE_ATOMIC_TEST_AND_SET:
+     case GIMPLE_ATOMIC_CLEAR:
+       break;
+ 
+     case GIMPLE_ATOMIC_FENCE:
+       break;
+ 
+     default:
+       gcc_unreachable ();
+     }
+   return false;
+ }
  
  /* Verify the contents of a GIMPLE_GOTO STMT.  Returns true when there
     is a problem, otherwise false.  */
*************** verify_gimple_stmt (gimple stmt)
*** 4174,4179 ****
--- 4209,4217 ----
      case GIMPLE_ASSIGN:
        return verify_gimple_assign (stmt);
  
+     case GIMPLE_ATOMIC:
+       return verify_gimple_atomic (stmt);
+ 
      case GIMPLE_LABEL:
        return verify_gimple_label (stmt);
  
Index: tree-pass.h
===================================================================
*** tree-pass.h	(revision 186098)
--- tree-pass.h	(working copy)
*************** extern struct gimple_opt_pass pass_tm_me
*** 455,460 ****
--- 455,461 ----
  extern struct gimple_opt_pass pass_tm_edges;
  extern struct gimple_opt_pass pass_split_functions;
  extern struct gimple_opt_pass pass_feedback_split_functions;
+ extern struct gimple_opt_pass pass_lower_atomics;
  
  /* IPA Passes */
  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: passes.c
===================================================================
*** passes.c	(revision 186098)
--- passes.c	(working copy)
*************** init_optimization_passes (void)
*** 1188,1193 ****
--- 1188,1194 ----
    NEXT_PASS (pass_refactor_eh);
    NEXT_PASS (pass_lower_eh);
    NEXT_PASS (pass_build_cfg);
+   NEXT_PASS (pass_lower_atomics);
    NEXT_PASS (pass_warn_function_return);
    NEXT_PASS (pass_build_cgraph_edges);
    *p = NULL;
Index: gimple-low.c
===================================================================
*** gimple-low.c	(revision 186098)
--- gimple-low.c	(working copy)
*************** lower_stmt (gimple_stmt_iterator *gsi, s
*** 404,409 ****
--- 404,410 ----
      case GIMPLE_NOP:
      case GIMPLE_ASM:
      case GIMPLE_ASSIGN:
+     case GIMPLE_ATOMIC:
      case GIMPLE_PREDICT:
      case GIMPLE_LABEL:
      case GIMPLE_EH_MUST_NOT_THROW:
Index: tree-ssa-alias.c
===================================================================
*** tree-ssa-alias.c	(revision 186098)
--- tree-ssa-alias.c	(working copy)
*************** ref_maybe_used_by_stmt_p (gimple stmt, t
*** 1440,1445 ****
--- 1440,1447 ----
      }
    else if (is_gimple_call (stmt))
      return ref_maybe_used_by_call_p (stmt, ref);
+   else if (is_gimple_atomic (stmt))
+     return true;
    else if (gimple_code (stmt) == GIMPLE_RETURN)
      {
        tree retval = gimple_return_retval (stmt);
*************** stmt_may_clobber_ref_p_1 (gimple stmt, a
*** 1762,1767 ****
--- 1764,1771 ----
      }
    else if (gimple_code (stmt) == GIMPLE_ASM)
      return true;
+   else if (is_gimple_atomic (stmt))
+     return true;
  
    return false;
  }
*************** stmt_kills_ref_p_1 (gimple stmt, ao_ref 
*** 1814,1819 ****
--- 1818,1825 ----
  	}
      }
  
+   if (is_gimple_atomic (stmt))
+     return true;
    if (is_gimple_call (stmt))
      {
        tree callee = gimple_call_fndecl (stmt);
Index: tree-ssa-sink.c
===================================================================
*** tree-ssa-sink.c	(revision 186098)
--- tree-ssa-sink.c	(working copy)
*************** is_hidden_global_store (gimple stmt)
*** 145,150 ****
--- 145,154 ----
      {
        tree lhs;
  
+       /* Don't optimize across an atomic operation.  */
+       if (is_gimple_atomic (stmt))
+         return true;
+ 
        gcc_assert (is_gimple_assign (stmt) || is_gimple_call (stmt));
  
        /* Note that we must not check the individual virtual operands
Index: tree-ssa-dce.c
===================================================================
*** tree-ssa-dce.c	(revision 186098)
--- tree-ssa-dce.c	(working copy)
*************** propagate_necessity (struct edge_list *e
*** 920,925 ****
--- 920,943 ----
  		    mark_aliased_reaching_defs_necessary (stmt, arg);
  		}
  	    }
+ 	  else if (is_gimple_atomic (stmt))
+ 	    {
+ 	      unsigned n;
+ 
+ 	      /* We may be able to lessen this with more relaxed memory
+ 	         models, but for now it's a full barrier.  */
+ 	      mark_all_reaching_defs_necessary (stmt);
+ 
+ 	      for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+ 	        {
+ 		  tree t = gimple_op (stmt, n);
+ 		  if (TREE_CODE (t) != SSA_NAME
+ 		      && TREE_CODE (t) != INTEGER_CST
+ 		      && !is_gimple_min_invariant (t)
+ 		      && !ref_may_be_aliased (t))
+ 		    mark_aliased_reaching_defs_necessary (stmt, t);
+ 		}
+ 	    }
  	  else if (gimple_assign_single_p (stmt))
  	    {
  	      tree rhs;
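As a small illustrative example of the ordering this conservative treatment preserves
(not taken from the patch): the plain store below has to stay, and stay in front of the
atomic release store, since another thread may read 'data' after observing 'ready';
treating the atomic statement as a full barrier keeps its reaching definitions necessary.

   int data;
   int ready;

   void
   publish (int v)
   {
     data = v;   /* Must remain, and remain before the atomic store.  */
     __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);
   }
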
Index: tree-inline.c
===================================================================
*** tree-inline.c	(revision 186098)
--- tree-inline.c	(working copy)
*************** estimate_num_insns (gimple stmt, eni_wei
*** 3565,3570 ****
--- 3565,3579 ----
  	break;
        }
  
+     case GIMPLE_ATOMIC:
+       /* Treat this like a call for now; it may expand into a library call.  */
+       if (gimple_atomic_kind (stmt) != GIMPLE_ATOMIC_FENCE)
+ 	cost = gimple_num_ops (stmt) *
+ 	       estimate_move_cost (TREE_TYPE (gimple_atomic_target (stmt)));
+       else
+         cost = 1;
+       break;
+ 
      case GIMPLE_RETURN:
        return weights->return_cost;
  
Index: ipa-pure-const.c
===================================================================
*** ipa-pure-const.c	(revision 186098)
--- ipa-pure-const.c	(working copy)
*************** check_stmt (gimple_stmt_iterator *gsip, 
*** 712,717 ****
--- 712,722 ----
            local->looping = true;
  	}
        return;
+     case GIMPLE_ATOMIC:
+       if (dump_file)
+ 	fprintf (dump_file, "    atomic is not const/pure\n");
+       local->pure_const_state = IPA_NEITHER;
+       return;
      default:
        break;
      }