diff mbox

[PR51752] publication safety violations in loop invariant motion pass

Message ID 4F46833C.2090808@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez Feb. 23, 2012, 6:19 p.m. UTC
In this PR we have a publication safety violation in a transaction.  The 
loop invariant motion pass hoists a load out of a loop, creating a load 
data race.  Those unfamiliar with load data races in transactions, 
please see the PR, as this has been confusing to most (me included).

In the snippet below, we are not allowed to hoist DATA out unless it 
will be loaded on every path out of the loop:

  __transaction_atomic
    {
      for (i = 0; i < 10; i++)
        if (x[i])
          x[i] += DATA;
      // OK to hoist DATA above out if also loaded here:
      // blah = DATA;
    }

rth gave me the evil eye on a previous incantation of this patch, and 
I'm sure this one is not totally devoid of grime.

The main problem is how to record all loads in a given function in an 
efficient manner, so we can properly tease the information out in 
every_path_out_of_loop_has_load().  In my first revision I had some data 
flow equations that calculated loads on different paths, and rth just 
about hit me.  Instead now I save all loads in a function and iterate 
through them in a brute force way.  I'd like to rewrite this into a hash 
of some sort, but before I go any further I'm interested to know if the 
main idea is ok.

FYI, it has been suggested to use the mem_ref_p information already 
available in the pass, but I need information on all loads, not just 
those occurring inside loops.

BTW, this only fixes the loop invariant motion pass.  I'm sure there are 
other passes that will need equally painful fixes.

Aldy
* tree-ssa-loop-im.c: New global all_tm_blocks.
	(every_path_out_of_loop_has_load): New.
	(movement_possibility): Restrict movement of transaction loads.
	(tree_ssa_lim_initialize): Call tree_ssa_lim_tm_initialize.
	(tree_ssa_lim_finalize): Call tree_ssa_lim_tm_finalize.
	(tree_ssa_lim_tm_initialize): New.
	(tree_ssa_lim_tm_finalize): New.
	* tree.h (get_all_tm_blocks): Protoize.
	* trans-mem.c (tm_region_init): Use the heap to store BB
	auxilliary data.
	(get_all_tm_blocks): New.

Comments

Aldy Hernandez Feb. 23, 2012, 9:11 p.m. UTC | #1
On 02/23/12 12:19, Aldy Hernandez wrote:

> about hit me. Instead now I save all loads in a function and iterate
> through them in a brute force way. I'd like to rewrite this into a hash
> of some sort, but before I go any further I'm interested to know if the
> main idea is ok.

For the record, it may be ideal to reuse some of the iterations we 
already do over the function's basic blocks, so we don't have to iterate 
yet again over the IL stream.  Though it may complicate the pass 
unnecessarily.
Richard Biener Feb. 24, 2012, 8:58 a.m. UTC | #2
On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 02/23/12 12:19, Aldy Hernandez wrote:
>
>> about hit me. Instead now I save all loads in a function and iterate
>> through them in a brute force way. I'd like to rewrite this into a hash
>> of some sort, but before I go any further I'm interested to know if the
>> main idea is ok.
>
>
> For the record, it may be ideal to reuse some of the iterations we already
> do over the function's basic blocks, so we don't have to iterate yet again
> over the IL stream.  Though it may complicate the pass unnecessarily.

It's definitely ugly that you need to do this.  And yes, you have to look at
very many passes I guess, PRE comes to my mind at least.

I still do not understand the situation very well, at least why for

 transaction { for () if (b) load; } load;

it should be safe to hoist load out of the transaction while for

 load; transaction { for () if (b) load; }

it is not.  Neither do I understand why it's not ok for

 transaction { for () if (b) load; }

to hoist the load out of the transaction.

I assume all the issue is about hoisting things over the trasaction start.
So - why does the transaction start not simply clobber all global memory?
That would avoid the hoisting.  I assume that  transforming the above to

 transaction { tem = load; for () if (b) = tem; }

is ok?

Thanks,
Richard.
Torvald Riegel Feb. 24, 2012, 1:10 p.m. UTC | #3
On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote:
> On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> > On 02/23/12 12:19, Aldy Hernandez wrote:
> >
> >> about hit me. Instead now I save all loads in a function and iterate
> >> through them in a brute force way. I'd like to rewrite this into a hash
> >> of some sort, but before I go any further I'm interested to know if the
> >> main idea is ok.
> >
> >
> > For the record, it may be ideal to reuse some of the iterations we already
> > do over the function's basic blocks, so we don't have to iterate yet again
> > over the IL stream.  Though it may complicate the pass unnecessarily.
> 
> It's definitely ugly that you need to do this.

Indeed.  But that's simply the price we have to pay for making
publication with transactions easier for the programmer yet still at
acceptable performance.

Also note that what we primarily have to care about for this PR is to
not hoist loads _within_ transactions if they would violate publication
safety.  I didn't have time to look at Aldy's patch yet, but a first
safe and conservative way would be to treat transactions as full
transformation barriers, and prevent publication-safety-violating
transformations _within_ transactions.  Which I would prefer until we're
confident that we understood all of it.

For hoisting out of or across transactions, we have to reason about more
than just publication safety.

> And yes, you have to look at
> very many passes I guess, PRE comes to my mind at least.
> 
> I still do not understand the situation very well, at least why for
> 
>  transaction { for () if (b) load; } load;
> 
> it should be safe to hoist load out of the transaction

This one is funny.  *Iff* this is an atomic txn, we can assume that the
transaction does not wait for a concurrent event. If it is a relaxed
txn, then we must not have a loop which terminates depending on an
atomic load (which we don't in that example); otherwise, we cannot hoist
load to before the txn (same reason as why we must not hoist to before
an atomic load with memory_order_acquire).
Now, if these things hold, then the load will always be executed after
the txn.  Thus, we can assume that it will happen anyway and nothing can
stop it (irrespective of which position the txn gets in the
Transactional Synchronization Order (see the C++ TM spec)).  It is a
nonatomic load, and we can rely on the program being data-race-free, so
we cannot have other threads storing to the same location, so we can
hoist it across because in that part of the program, the location is
guaranteed to be thread-local data (or immutable).

As I said, this isn't related to just pub safety anymore.  And this is
tricky enough that I'd rather not try to optimize this currently but
instead wait until we have more confidence in our understanding of the
matter.

> while for
> 
>  load; transaction { for () if (b) load; }
> 
> it is not.

If the same assumptions as above hold, I think you can hoist it out,
because again you can assume that it targets thread-local/immutable
data: the nontransactional (+nonatomic!) load can happen at any time,
essentially, irrespective of b's value or how/when other threads modify
it.  Thus, it cannot have been changed between the two loads in a
data-race-free program.

> Neither do I understand why it's not ok for
> 
>  transaction { for () if (b) load; }
> 
> to hoist the load out of the transaction.

You would violate publication safety.

Also, you don't have any reason to believe that load targets
thread-local/immutable data, so you must not make it nontransactional
(otherwise, you could introduce a race condition).

> 
> I assume all the issue is about hoisting things over the trasaction start.
> So - why does the transaction start not simply clobber all global memory?
> That would avoid the hoisting.  I assume that  transforming the above to
> 
>  transaction { tem = load; for () if (b) = tem; }
> 
> is ok?

No, it is not.  Actually, this is precisely the transformation that we
need to prevent from happening.
As Aldy said, please see the explanations in the PR, or have a look at
the C++ TM specification and the C++11 memory model alternatively.  We
could also discuss this on IRC, if this would be easier.

Torvald
Aldy Hernandez Feb. 24, 2012, 4:34 p.m. UTC | #4
On 02/24/12 07:10, Torvald Riegel wrote:
> On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote:
>> On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez<aldyh@redhat.com>  wrote:
>>> On 02/23/12 12:19, Aldy Hernandez wrote:
>>>
>>>> about hit me. Instead now I save all loads in a function and iterate
>>>> through them in a brute force way. I'd like to rewrite this into a hash
>>>> of some sort, but before I go any further I'm interested to know if the
>>>> main idea is ok.
>>>
>>>
>>> For the record, it may be ideal to reuse some of the iterations we already
>>> do over the function's basic blocks, so we don't have to iterate yet again
>>> over the IL stream.  Though it may complicate the pass unnecessarily.
>>
>> It's definitely ugly that you need to do this.
>
> Indeed.  But that's simply the price we have to pay for making
> publication with transactions easier for the programmer yet still at
> acceptable performance.
>
> Also note that what we primarily have to care about for this PR is to
> not hoist loads _within_ transactions if they would violate publication

For that matter, didn't rth add a memory barrier at the beginning of 
transactions last week?  That would mean that we can't hoist anything 
outside of a transaction anyhow.  Or was it not a full memory barrier?

> safety.  I didn't have time to look at Aldy's patch yet, but a first
> safe and conservative way would be to treat transactions as full
> transformation barriers, and prevent publication-safety-violating
> transformations _within_ transactions.  Which I would prefer until we're
> confident that we understood all of it.

Do you mean disallow hoisting of *any* loads that happen inside of a 
transaction (regardless of whether a subsequent load happens on every 
path out of the loop)?  This would definitely be safe and quite easily 
doable, simply by checking if loads to be hoisted are within a transaction.

>
> For hoisting out of or across transactions, we have to reason about more
> than just publication safety.

Again, __transactions being barriers and all, I don't think we should 
complicate things unnecessarily at this point, since it doesn't happen.

Aldy
Richard Biener Feb. 26, 2012, 3:15 p.m. UTC | #5
On Fri, Feb 24, 2012 at 2:10 PM, Torvald Riegel <triegel@redhat.com> wrote:
> On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote:
>> On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> > On 02/23/12 12:19, Aldy Hernandez wrote:
>> >
>> >> about hit me. Instead now I save all loads in a function and iterate
>> >> through them in a brute force way. I'd like to rewrite this into a hash
>> >> of some sort, but before I go any further I'm interested to know if the
>> >> main idea is ok.
>> >
>> >
>> > For the record, it may be ideal to reuse some of the iterations we already
>> > do over the function's basic blocks, so we don't have to iterate yet again
>> > over the IL stream.  Though it may complicate the pass unnecessarily.
>>
>> It's definitely ugly that you need to do this.
>
> Indeed.  But that's simply the price we have to pay for making
> publication with transactions easier for the programmer yet still at
> acceptable performance.
>
> Also note that what we primarily have to care about for this PR is to
> not hoist loads _within_ transactions if they would violate publication
> safety.  I didn't have time to look at Aldy's patch yet, but a first
> safe and conservative way would be to treat transactions as full
> transformation barriers, and prevent publication-safety-violating
> transformations _within_ transactions.  Which I would prefer until we're
> confident that we understood all of it.
>
> For hoisting out of or across transactions, we have to reason about more
> than just publication safety.
>
>> And yes, you have to look at
>> very many passes I guess, PRE comes to my mind at least.
>>
>> I still do not understand the situation very well, at least why for
>>
>>  transaction { for () if (b) load; } load;
>>
>> it should be safe to hoist load out of the transaction
>
> This one is funny.  *Iff* this is an atomic txn, we can assume that the
> transaction does not wait for a concurrent event. If it is a relaxed
> txn, then we must not have a loop which terminates depending on an
> atomic load (which we don't in that example); otherwise, we cannot hoist
> load to before the txn (same reason as why we must not hoist to before
> an atomic load with memory_order_acquire).
> Now, if these things hold, then the load will always be executed after
> the txn.  Thus, we can assume that it will happen anyway and nothing can
> stop it (irrespective of which position the txn gets in the
> Transactional Synchronization Order (see the C++ TM spec)).  It is a
> nonatomic load, and we can rely on the program being data-race-free, so
> we cannot have other threads storing to the same location, so we can
> hoist it across because in that part of the program, the location is
> guaranteed to be thread-local data (or immutable).
>
> As I said, this isn't related to just pub safety anymore.  And this is
> tricky enough that I'd rather not try to optimize this currently but
> instead wait until we have more confidence in our understanding of the
> matter.
>
>> while for
>>
>>  load; transaction { for () if (b) load; }
>>
>> it is not.
>
> If the same assumptions as above hold, I think you can hoist it out,
> because again you can assume that it targets thread-local/immutable
> data: the nontransactional (+nonatomic!) load can happen at any time,
> essentially, irrespective of b's value or how/when other threads modify
> it.  Thus, it cannot have been changed between the two loads in a
> data-race-free program.
>
>> Neither do I understand why it's not ok for
>>
>>  transaction { for () if (b) load; }
>>
>> to hoist the load out of the transaction.
>
> You would violate publication safety.
>
> Also, you don't have any reason to believe that load targets
> thread-local/immutable data, so you must not make it nontransactional
> (otherwise, you could introduce a race condition).
>
>>
>> I assume all the issue is about hoisting things over the trasaction start.
>> So - why does the transaction start not simply clobber all global memory?
>> That would avoid the hoisting.  I assume that  transforming the above to
>>
>>  transaction { tem = load; for () if (b) = tem; }
>>
>> is ok?
>
> No, it is not.  Actually, this is precisely the transformation that we
> need to prevent from happening.
> As Aldy said, please see the explanations in the PR, or have a look at
> the C++ TM specification and the C++11 memory model alternatively.  We
> could also discuss this on IRC, if this would be easier.

Ok. I see.  So, I think what would be best is to have a way to check whether
a store/load is part of a transaction - do we have a way to do that right now?
(For example a flag on a gimple stmt?)  Then we can simply avoid the LIM
transform by making transaction load/store stmts behave the same as
potentially trapping stmts (thus, only optimize if the memory is accessed
unconditional somewhere else).  That would work for PRE as well.
[easiest would be to make *_could_trap_p return true for memory ops
inside a transaction]

Does that sound like it would make TM happy (well, apart from missed
optimizations)?  Thus, is it enough to avoid transforms that would be
invalid for loads/stores that might trap?

Thanks,
Richard.

> Torvald
>
Richard Biener Feb. 26, 2012, 3:17 p.m. UTC | #6
On Fri, Feb 24, 2012 at 5:34 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 02/24/12 07:10, Torvald Riegel wrote:
>>
>> On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote:
>>>
>>> On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez<aldyh@redhat.com>
>>>  wrote:
>>>>
>>>> On 02/23/12 12:19, Aldy Hernandez wrote:
>>>>
>>>>> about hit me. Instead now I save all loads in a function and iterate
>>>>> through them in a brute force way. I'd like to rewrite this into a hash
>>>>> of some sort, but before I go any further I'm interested to know if the
>>>>> main idea is ok.
>>>>
>>>>
>>>>
>>>> For the record, it may be ideal to reuse some of the iterations we
>>>> already
>>>> do over the function's basic blocks, so we don't have to iterate yet
>>>> again
>>>> over the IL stream.  Though it may complicate the pass unnecessarily.
>>>
>>>
>>> It's definitely ugly that you need to do this.
>>
>>
>> Indeed.  But that's simply the price we have to pay for making
>> publication with transactions easier for the programmer yet still at
>> acceptable performance.
>>
>> Also note that what we primarily have to care about for this PR is to
>> not hoist loads _within_ transactions if they would violate publication
>
>
> For that matter, didn't rth add a memory barrier at the beginning of
> transactions last week?  That would mean that we can't hoist anything
> outside of a transaction anyhow.  Or was it not a full memory barrier?

It's now a full memory barrier for all global memory and for local statics
if their address is taken (and for automatic vars with their address taken).
Do we need to be concerned about non-address-taken local statics?

>
>> safety.  I didn't have time to look at Aldy's patch yet, but a first
>> safe and conservative way would be to treat transactions as full
>> transformation barriers, and prevent publication-safety-violating
>> transformations _within_ transactions.  Which I would prefer until we're
>> confident that we understood all of it.
>
>
> Do you mean disallow hoisting of *any* loads that happen inside of a
> transaction (regardless of whether a subsequent load happens on every path
> out of the loop)?  This would definitely be safe and quite easily doable,
> simply by checking if loads to be hoisted are within a transaction.

Yes, simply handle memory accesses inside a transaction as if they may
trap.  Then all optimizers should be fine (or they are buggy even without
TM).

Richard.

>
>>
>> For hoisting out of or across transactions, we have to reason about more
>> than just publication safety.
>
>
> Again, __transactions being barriers and all, I don't think we should
> complicate things unnecessarily at this point, since it doesn't happen.
>
> Aldy
Torvald Riegel Feb. 27, 2012, 9:24 a.m. UTC | #7
On Fri, 2012-02-24 at 10:34 -0600, Aldy Hernandez wrote:
> On 02/24/12 07:10, Torvald Riegel wrote:
> > safety.  I didn't have time to look at Aldy's patch yet, but a first
> > safe and conservative way would be to treat transactions as full
> > transformation barriers, and prevent publication-safety-violating
> > transformations _within_ transactions.  Which I would prefer until we're
> > confident that we understood all of it.
> 
> Do you mean disallow hoisting of *any* loads that happen inside of a 
> transaction (regardless of whether a subsequent load happens on every 
> path out of the loop)?  This would definitely be safe and quite easily 
> doable, simply by checking if loads to be hoisted are within a transaction.

If we cannot get the less conservative approach in on time (the one
you've been working on), then we could also use this big hammer.  I
don't quite know how high performance degradation would be (we could
measure number of loads/stores at runtime with STAMP, for example, and I
could build the libitm code for that if you want), but having
publication-safety work correctly in 4.7 might be more important.

> 
> >
> > For hoisting out of or across transactions, we have to reason about more
> > than just publication safety.
> 
> Again, __transactions being barriers and all, I don't think we should 
> complicate things unnecessarily at this point, since it doesn't happen.

Yes.  Based on Richard Guenther's examples, my question was whether your
code (without having actually looked at it ;) ) would also allow
post-dominating loads in nontransactional code to enable hoisting (as in
the __transaction_atomic { if (foo) load; } load case.  I believe only
loads within the same transaction should count, and I wasn't sure
whether you were ensuring that (and/or whether a barrier would enforce
this either).

Torvald
Richard Biener Feb. 27, 2012, 9:44 a.m. UTC | #8
On Mon, Feb 27, 2012 at 10:24 AM, Torvald Riegel <triegel@redhat.com> wrote:
> On Fri, 2012-02-24 at 10:34 -0600, Aldy Hernandez wrote:
>> On 02/24/12 07:10, Torvald Riegel wrote:
>> > safety.  I didn't have time to look at Aldy's patch yet, but a first
>> > safe and conservative way would be to treat transactions as full
>> > transformation barriers, and prevent publication-safety-violating
>> > transformations _within_ transactions.  Which I would prefer until we're
>> > confident that we understood all of it.
>>
>> Do you mean disallow hoisting of *any* loads that happen inside of a
>> transaction (regardless of whether a subsequent load happens on every
>> path out of the loop)?  This would definitely be safe and quite easily
>> doable, simply by checking if loads to be hoisted are within a transaction.
>
> If we cannot get the less conservative approach in on time (the one
> you've been working on), then we could also use this big hammer.  I
> don't quite know how high performance degradation would be (we could
> measure number of loads/stores at runtime with STAMP, for example, and I
> could build the libitm code for that if you want), but having
> publication-safety work correctly in 4.7 might be more important.
>
>>
>> >
>> > For hoisting out of or across transactions, we have to reason about more
>> > than just publication safety.
>>
>> Again, __transactions being barriers and all, I don't think we should
>> complicate things unnecessarily at this point, since it doesn't happen.
>
> Yes.  Based on Richard Guenther's examples, my question was whether your
> code (without having actually looked at it ;) ) would also allow
> post-dominating loads in nontransactional code to enable hoisting (as in
> the __transaction_atomic { if (foo) load; } load case.  I believe only
> loads within the same transaction should count, and I wasn't sure
> whether you were ensuring that (and/or whether a barrier would enforce
> this either).

We should at this point not disrupt the tree by complex changes to optimizers.
You can try patching tree_could_trap_p, though I am not sure there is enough
context provided, or add a few TM checks in selected places (loop IM and PRE).
We are close to a release.

Richard.

> Torvald
>
Aldy Hernandez Feb. 27, 2012, 4:15 p.m. UTC | #9
>> Again, __transactions being barriers and all, I don't think we should
>> complicate things unnecessarily at this point, since it doesn't happen.
>
> Yes.  Based on Richard Guenther's examples, my question was whether your
> code (without having actually looked at it ;) ) would also allow
> post-dominating loads in nontransactional code to enable hoisting (as in
> the __transaction_atomic { if (foo) load; } load case.  I believe only
> loads within the same transaction should count, and I wasn't sure
> whether you were ensuring that (and/or whether a barrier would enforce
> this either).

In my patch, post-dominating loads in nontransactional code is fair game 
for hoisting, and will get hoisted out.  However, tweaking it to make 
sure this post-dominating load occurs in a transaction is trivial since 
we have a bitmap of all BBs in transactions.
Aldy Hernandez Feb. 27, 2012, 4:22 p.m. UTC | #10
> Ok. I see.  So, I think what would be best is to have a way to check whether
> a store/load is part of a transaction - do we have a way to do that right now?
> (For example a flag on a gimple stmt?)  Then we can simply avoid the LIM

We do not (*).  My patch accumulates that information on demand.  I can 
certainly add a gimple bit for this, but can't the optimizations 
change/rewrite the stores/loads so we lose this information?

> transform by making transaction load/store stmts behave the same as
> potentially trapping stmts (thus, only optimize if the memory is accessed
> unconditional somewhere else).  That would work for PRE as well.
> [easiest would be to make *_could_trap_p return true for memory ops
> inside a transaction]

Provided the gimple bit works, this seems reasonable, though quite a big 
hammer.  But given that we are nearing a release, I would be in favor of it.

Richard Henderson, what do you think?

Aldy

(*) Well technically we do, but much later in the tmmark pass, when we 
instrument the loads, but this is useless since all the tree optimizers 
have been run (loop IM, etc).
Aldy Hernandez Feb. 27, 2012, 4:25 p.m. UTC | #11
>> For that matter, didn't rth add a memory barrier at the beginning of
>> transactions last week?  That would mean that we can't hoist anything
>> outside of a transaction anyhow.  Or was it not a full memory barrier?
>
> It's now a full memory barrier for all global memory and for local statics
> if their address is taken (and for automatic vars with their address taken).
> Do we need to be concerned about non-address-taken local statics?

It is my understanding that non-address-taken local statics are not 
visible to other threads, so we don't have to worry about these.

A
Andrew MacLeod Feb. 27, 2012, 4:44 p.m. UTC | #12
On 02/27/2012 11:22 AM, Aldy Hernandez wrote:
>
>> Ok. I see.  So, I think what would be best is to have a way to check 
>> whether
>> a store/load is part of a transaction - do we have a way to do that 
>> right now?
>> (For example a flag on a gimple stmt?)  Then we can simply avoid the LIM
>
> We do not (*).  My patch accumulates that information on demand.  I 
> can certainly add a gimple bit for this, but can't the optimizations 
> change/rewrite the stores/loads so we lose this information?
>
I t would seem appropriate to me that in the future, perhaps the CFG 
could have a flag set for any basic block which is in a transaction...  
This would make it pretty trivial at all times to tell if a statement is 
part of a transaction or not.  It seems like the process of CFG 
construction/updating should be able to maintain that info for little 
cost. (ie, I'd guess it's already making the traversals requires to 
collect that info)

It would certainly be convenient :-)

Andrew
Michael Matz Feb. 27, 2012, 5:02 p.m. UTC | #13
Hi,

On Mon, 27 Feb 2012, Aldy Hernandez wrote:

> 
> > > For that matter, didn't rth add a memory barrier at the beginning of
> > > transactions last week?  That would mean that we can't hoist anything
> > > outside of a transaction anyhow.  Or was it not a full memory barrier?
> >
> > It's now a full memory barrier for all global memory and for local statics
> > if their address is taken (and for automatic vars with their address taken).
> > Do we need to be concerned about non-address-taken local statics?
> 
> It is my understanding that non-address-taken local statics are not 
> visible to other threads,

void f () { static int i; i++; }

Running 'f' in different threads will expose the storage to 'i' to each of 
them without taking its address :-/


Ciao,
Michael.
Aldy Hernandez Feb. 27, 2012, 5:29 p.m. UTC | #14
On 02/27/12 11:02, Michael Matz wrote:
> Hi,
>
> On Mon, 27 Feb 2012, Aldy Hernandez wrote:
>
>>
>>>> For that matter, didn't rth add a memory barrier at the beginning of
>>>> transactions last week?  That would mean that we can't hoist anything
>>>> outside of a transaction anyhow.  Or was it not a full memory barrier?
>>>
>>> It's now a full memory barrier for all global memory and for local statics
>>> if their address is taken (and for automatic vars with their address taken).
>>> Do we need to be concerned about non-address-taken local statics?
>>
>> It is my understanding that non-address-taken local statics are not
>> visible to other threads,
>
> void f () { static int i; i++; }
>
> Running 'f' in different threads will expose the storage to 'i' to each of
> them without taking its address :-/

In that case, shouldn't we make transactions barriers for local statics 
as well, regardless of if their address was taken or not?
Richard Biener Feb. 28, 2012, 9:35 a.m. UTC | #15
On Mon, Feb 27, 2012 at 6:29 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 02/27/12 11:02, Michael Matz wrote:
>>
>> Hi,
>>
>> On Mon, 27 Feb 2012, Aldy Hernandez wrote:
>>
>>>
>>>>> For that matter, didn't rth add a memory barrier at the beginning of
>>>>> transactions last week?  That would mean that we can't hoist anything
>>>>> outside of a transaction anyhow.  Or was it not a full memory barrier?
>>>>
>>>>
>>>> It's now a full memory barrier for all global memory and for local
>>>> statics
>>>> if their address is taken (and for automatic vars with their address
>>>> taken).
>>>> Do we need to be concerned about non-address-taken local statics?
>>>
>>>
>>> It is my understanding that non-address-taken local statics are not
>>> visible to other threads,
>>
>>
>> void f () { static int i; i++; }
>>
>> Running 'f' in different threads will expose the storage to 'i' to each of
>> them without taking its address :-/
>
>
> In that case, shouldn't we make transactions barriers for local statics as
> well, regardless of if their address was taken or not?

We don't have a convenient bit for this right now, but OTOH we assume
that the transaction may re-enter the function, and thus it should already
be a barrier for local statics.  Unless the builtins have the "leaf" attribute
on them.

Richard.
Richard Biener Feb. 28, 2012, 9:39 a.m. UTC | #16
On Mon, Feb 27, 2012 at 5:44 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 02/27/2012 11:22 AM, Aldy Hernandez wrote:
>>
>>
>>> Ok. I see.  So, I think what would be best is to have a way to check
>>> whether
>>> a store/load is part of a transaction - do we have a way to do that right
>>> now?
>>> (For example a flag on a gimple stmt?)  Then we can simply avoid the LIM
>>
>>
>> We do not (*).  My patch accumulates that information on demand.  I can
>> certainly add a gimple bit for this, but can't the optimizations
>> change/rewrite the stores/loads so we lose this information?

Yes, we'd have to either fix them all up or compute that bit automagically
(like we do for the has_volatile_ops flag).  I assume that you _can_ determine
if a stmt is part of a transaction (it may just not be a O(1) operation)?  Thus,
a bit crude way would be to have the gimple flag but have it to-be computed
on-demand.  And you'd compute it before critical passes such as LIM and PRE.

> I t would seem appropriate to me that in the future, perhaps the CFG could
> have a flag set for any basic block which is in a transaction...  This would
> make it pretty trivial at all times to tell if a statement is part of a
> transaction or not.  It seems like the process of CFG construction/updating
> should be able to maintain that info for little cost. (ie, I'd guess it's
> already making the traversals requires to collect that info)
>
> It would certainly be convenient :-)

That would sound easier than maintaining the flag on the stmt.  OTOH we
already maintain on-the-side EH tables, so it can't be that big of a problem.

What do we do if we move stmts inside a transaction though?  Do they
magically become part of it and thus we cannot move them out again?

If you want to fix this issue for 4.7, add the gimple flag and an on-demand
compute helper and check the flag in PRE/LIM.

Richard.

> Andrew
>
>
>
Andrew MacLeod Feb. 28, 2012, 2:20 p.m. UTC | #17
On 02/28/2012 04:39 AM, Richard Guenther wrote:
> On Mon, Feb 27, 2012 at 5:44 PM, Andrew MacLeod<amacleod@redhat.com>  wrote:
> I t would seem appropriate to me that in the future, perhaps the CFG could
> have a flag set for any basic block which is in a transaction...  This would
> make it pretty trivial at all times to tell if a statement is part of a
> transaction or not.  It seems like the process of CFG construction/updating
> should be able to maintain that info for little cost. (ie, I'd guess it's
> already making the traversals requires to collect that info)
>
> It would certainly be convenient :-)
> That would sound easier than maintaining the flag on the stmt.  OTOH we
> already maintain on-the-side EH tables, so it can't be that big of a problem.
>
> What do we do if we move stmts inside a transaction though?  Do they
> magically become part of it and thus we cannot move them out again?
well, I think by definition if you move a statement into a basic block 
of a transaction it becomes a part of the transaction.  I believe 
transactions are block based... the entire block is in f a transaction, 
or not.    Moving a statement out of a transaction would be as easy as 
moving it in... just change the block it is in.

> If you want to fix this issue for 4.7, add the gimple flag and an on-demand
> compute helper and check the flag in PRE/LIM.
>
>
yes, a targeted approach is required for 4.7.  I just wanted to plant 
the seed while the discussion was ongoing.

Andrew
Richard Henderson Feb. 28, 2012, 5:05 p.m. UTC | #18
On 02/27/12 08:22, Aldy Hernandez wrote:
>> transform by making transaction load/store stmts behave the same as
>> potentially trapping stmts (thus, only optimize if the memory is accessed
>> unconditional somewhere else).  That would work for PRE as well.
>> [easiest would be to make *_could_trap_p return true for memory ops
>> inside a transaction]
> 
> Provided the gimple bit works, this seems reasonable, though quite a big hammer.  But given that we are nearing a release, I would be in favor of it.
> 
> Richard Henderson, what do you think?

Well, hooking could_trap_p sounds like an easy solution.

Gimple bits, on the other hand, are not.  Keeping those up-to-date is
always a real pain.  We have had several gimple bits in the history of
the TM code, and we've gotten rid of them all because they were too
invasive to maintain.

OTOH, I have no better suggestion...

r~
diff mbox

Patch

Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 184445)
+++ tree-ssa-loop-im.c	(working copy)
@@ -150,7 +150,7 @@  typedef struct mem_ref
 
   bitmap indep_ref;		/* The set of memory references on that
 				   this reference is independent.  */
-  bitmap dep_ref;		/* The complement of DEP_REF.  */
+  bitmap dep_ref;		/* The complement of INDEP_REF.  */
 } *mem_ref_p;
 
 DEF_VEC_P(mem_ref_p);
@@ -189,6 +189,13 @@  static struct
 
 static bool ref_indep_loop_p (struct loop *, mem_ref_p);
 
+/* All basic blocks that are within a transaction in the current
+   function.  */
+static bitmap all_tm_blocks;
+
+/* All the loads in the current function.  */
+static mem_ref_locs_p all_loads;
+
 /* Minimum cost of an expensive expression.  */
 #define LIM_EXPENSIVE ((unsigned) PARAM_VALUE (PARAM_LIM_EXPENSIVE))
 
@@ -337,6 +344,26 @@  for_each_index (tree *addr_p, bool (*cbc
     }
 }
 
+/* Return true if every path out of the loop containing STMT loads the
+   decl loaded in STMT.  */
+
+static bool
+every_path_out_of_loop_has_load (gimple stmt)
+{
+  basic_block bb = gimple_bb (stmt);
+  basic_block header = bb->loop_father->header;
+  unsigned int i;
+  mem_ref_loc_p aref;
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  /* Return true if the same load occurs on every path out of the loop.  */
+  FOR_EACH_VEC_ELT (mem_ref_loc_p, all_loads->locs, i, aref)
+    if (rhs == *aref->ref
+	&& dominated_by_p (CDI_POST_DOMINATORS, header, gimple_bb (aref->stmt)))
+      return true;
+  return false;
+}
+
 /* If it is possible to hoist the statement STMT unconditionally,
    returns MOVE_POSSIBLE.
    If it is possible to hoist the statement STMT, but we must avoid making
@@ -412,6 +439,26 @@  movement_possibility (gimple stmt)
       || gimple_could_trap_p (stmt))
     return MOVE_PRESERVE_EXECUTION;
 
+  /* Non local loads in a transaction cannot be hoisted out unless the
+     load happens on every path out of the loop.  */
+  if (flag_tm
+      && bitmap_bit_p (all_tm_blocks, gimple_bb (stmt)->index)
+      && gimple_assign_single_p (stmt))
+    {
+      tree rhs = gimple_assign_rhs1 (stmt);
+      if (DECL_P (rhs) && is_global_var (rhs)
+	  && !every_path_out_of_loop_has_load (stmt))
+	{
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Cannot hoist conditional load of ");
+	      print_generic_expr (dump_file, rhs, TDF_SLIM);
+	      fprintf (dump_file, " because it is in a transaction.\n");
+	    }
+	  return MOVE_IMPOSSIBLE;
+	}
+    }
+
   return ret;
 }
 
@@ -2358,6 +2405,42 @@  fill_always_executed_in (struct loop *lo
     fill_always_executed_in (loop, contains_call);
 }
 
+/* Compute global information needed for transactional restrictions in
+   the loop invariant motion pass.  */
+
+static void
+tree_ssa_lim_tm_initialize (void)
+{
+  basic_block bb;
+  gimple_stmt_iterator bsi;
+
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+  all_tm_blocks = get_all_tm_blocks ();
+  all_loads = mem_ref_locs_alloc ();
+
+  /* Save all the loads in the current function.  */
+  /* ?? This needs to be rewritten into something more efficient, like
+     a hash.  */
+  FOR_EACH_BB (bb)
+    for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+      {
+	gimple stmt = gsi_stmt (bsi);
+
+	if (is_gimple_assign (stmt)
+	    && gimple_assign_single_p (stmt))
+	  {
+	    tree *rhs_p = gimple_assign_rhs1_ptr (stmt);
+	    if (DECL_P (*rhs_p) && is_global_var (*rhs_p))
+	      {
+		mem_ref_loc_p aref = XNEW (struct mem_ref_loc);
+		aref->stmt = stmt;
+		aref->ref = rhs_p;
+		VEC_safe_push (mem_ref_loc_p, heap, all_loads->locs, aref);
+	      }
+	  }
+      }
+}
+
 /* Compute the global information needed by the loop invariant motion pass.  */
 
 static void
@@ -2387,6 +2470,16 @@  tree_ssa_lim_initialize (void)
   sbitmap_free (contains_call);
 
   lim_aux_data_map = pointer_map_create ();
+
+  if (flag_tm)
+    tree_ssa_lim_tm_initialize ();
+}
+
+static void
+tree_ssa_lim_tm_finalize (void)
+{
+  BITMAP_FREE (all_tm_blocks);
+  free_mem_ref_locs (all_loads);
 }
 
 /* Cleans up after the invariant motion pass.  */
@@ -2420,6 +2513,9 @@  tree_ssa_lim_finalize (void)
 
   if (memory_accesses.ttae_cache)
     pointer_map_destroy (memory_accesses.ttae_cache);
+
+  if (flag_tm)
+    tree_ssa_lim_tm_finalize ();
 }
 
 /* Moves invariants from loops.  Only "expensive" invariants are moved out --
Index: tree.h
===================================================================
--- tree.h	(revision 184445)
+++ tree.h	(working copy)
@@ -5875,6 +5875,7 @@  extern bool is_tm_may_cancel_outer (tree
 extern bool is_tm_ending_fndecl (tree);
 extern void record_tm_replacement (tree, tree);
 extern void tm_malloc_replacement (tree);
+extern bitmap get_all_tm_blocks (void);
 
 static inline bool
 is_tm_safe_or_pure (const_tree x)
Index: testsuite/gcc.dg/tm/pub-safety-1.c
===================================================================
--- testsuite/gcc.dg/tm/pub-safety-1.c	(revision 0)
+++ testsuite/gcc.dg/tm/pub-safety-1.c	(revision 0)
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim1" } */
+
+/* Test that thread visible loads do not get hoisted out of loops if
+   the load would not have occurred on each path out of the loop.  */
+
+int x[10] = {0,0,0,0,0,0,0,0,0,0};
+int DATA_DATA = 0;
+
+void reader()
+{
+  int i;
+  __transaction_atomic
+    { 
+      for (i = 0; i < 10; i++)
+        if (x[i])
+          x[i] += DATA_DATA;
+      /* If we loaded DATA_DATA here, we could hoist it before the loop,
+	 but since we don't... we can't.  */
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/tm/pub-safety-2.c
===================================================================
--- testsuite/gcc.dg/tm/pub-safety-2.c	(revision 0)
+++ testsuite/gcc.dg/tm/pub-safety-2.c	(revision 0)
@@ -0,0 +1,27 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim1" } */
+
+/* Test that thread visible loads do not get hoisted out of loops if
+   the load would not have occurred on each path out of the loop.  */
+
+int x[10] = {0,0,0,0,0,0,0,0,0,0};
+int DATA_DATA = 0, george;
+
+void reader()
+{
+  int i;
+  __transaction_atomic
+    { 
+      for (i = 0; i < 10; i++)
+        if (x[i])
+          x[i] += DATA_DATA;
+
+      /* Because of this load of DATA_DATA, we are loading DATA_DATA
+	 on every path out of the loop.  It is OK to hoist DATA_DATA
+	 out.  */
+      george = DATA_DATA;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 0 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: trans-mem.c
===================================================================
--- trans-mem.c	(revision 184445)
+++ trans-mem.c	(working copy)
@@ -1858,18 +1858,25 @@  tm_region_init (struct tm_region *region
   VEC(basic_block, heap) *queue = NULL;
   bitmap visited_blocks = BITMAP_ALLOC (NULL);
   struct tm_region *old_region;
+  struct tm_region **region_worklist;
 
   all_tm_regions = region;
   bb = single_succ (ENTRY_BLOCK_PTR);
 
+  /* We could store this information in bb->aux, but we may get called
+     through get_all_tm_blocks() from another pass that may be already
+     using bb->aux.  */
+  region_worklist =
+    (struct tm_region **) xcalloc (sizeof (struct tm_region *),
+				  n_basic_blocks + NUM_FIXED_BLOCKS);
+
   VEC_safe_push (basic_block, heap, queue, bb);
-  gcc_assert (!bb->aux);	/* FIXME: Remove me.  */
-  bb->aux = region;
+  region_worklist[bb->index] = region;
   do
     {
       bb = VEC_pop (basic_block, queue);
-      region = (struct tm_region *)bb->aux;
-      bb->aux = NULL;
+      region = region_worklist[bb->index];
+      region_worklist[bb->index] = NULL;
 
       /* Record exit and irrevocable blocks.  */
       region = tm_region_init_1 (region, bb);
@@ -1886,20 +1893,20 @@  tm_region_init (struct tm_region *region
 	  {
 	    bitmap_set_bit (visited_blocks, e->dest->index);
 	    VEC_safe_push (basic_block, heap, queue, e->dest);
-	    gcc_assert (!e->dest->aux); /* FIXME: Remove me.  */
 
 	    /* If the current block started a new region, make sure that only
 	       the entry block of the new region is associated with this region.
 	       Other successors are still part of the old region.  */
 	    if (old_region != region && e->dest != region->entry_block)
-	      e->dest->aux = old_region;
+	      region_worklist[e->dest->index] = old_region;
 	    else
-	      e->dest->aux = region;
+	      region_worklist[e->dest->index] = region;
 	  }
     }
   while (!VEC_empty (basic_block, queue));
   VEC_free (basic_block, heap, queue);
   BITMAP_FREE (visited_blocks);
+  free (region_worklist);
 }
 
 /* The "gate" function for all transactional memory expansion and optimization
@@ -2424,6 +2431,39 @@  get_tm_region_blocks (basic_block entry_
   return bbs;
 }
 
+/* Return a bitmap with all the basic blocks that appear in
+   transactions throughout current_function_decl.  The bitmap must be
+   freed by the caller.
+
+   This bitmap is only usable by passes that don't adjust the CFG in
+   the middle of the pass.  */
+
+bitmap
+get_all_tm_blocks (void)
+{
+  struct tm_region *region;
+  VEC (basic_block, heap) *queue;
+  bitmap all_tm_blocks = BITMAP_ALLOC (NULL);
+
+  /* ?? Perhaps we need to abstract gate_tm_init further, because we
+     certainly don't need it to calculate CDI_DOMINATOR info.  */
+  gate_tm_init ();
+
+  for (region = all_tm_regions; region; region = region->next)
+    {
+      queue = get_tm_region_blocks (region->entry_block,
+				    region->exit_blocks,
+				    region->irr_blocks,
+				    all_tm_blocks,
+				    /*stop_at_irr_p=*/true);
+      VEC_free (basic_block, heap, queue);
+    }
+
+  if (all_tm_regions)
+    bitmap_obstack_release (&tm_obstack);
+  return all_tm_blocks;
+}
+
 /* Entry point to the MARK phase of TM expansion.  Here we replace
    transactional memory statements with calls to builtins, and function
    calls with their transactional clones (if available).  But we don't