Message ID | 4F46833C.2090808@redhat.com |
---|---|
State | New |
Headers | show |
On 02/23/12 12:19, Aldy Hernandez wrote: > about hit me. Instead now I save all loads in a function and iterate > through them in a brute force way. I'd like to rewrite this into a hash > of some sort, but before I go any further I'm interested to know if the > main idea is ok. For the record, it may be ideal to reuse some of the iterations we already do over the function's basic blocks, so we don't have to iterate yet again over the IL stream. Though it may complicate the pass unnecessarily.
On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote: > On 02/23/12 12:19, Aldy Hernandez wrote: > >> about hit me. Instead now I save all loads in a function and iterate >> through them in a brute force way. I'd like to rewrite this into a hash >> of some sort, but before I go any further I'm interested to know if the >> main idea is ok. > > > For the record, it may be ideal to reuse some of the iterations we already > do over the function's basic blocks, so we don't have to iterate yet again > over the IL stream. Though it may complicate the pass unnecessarily. It's definitely ugly that you need to do this. And yes, you have to look at very many passes I guess, PRE comes to my mind at least. I still do not understand the situation very well, at least why for transaction { for () if (b) load; } load; it should be safe to hoist load out of the transaction while for load; transaction { for () if (b) load; } it is not. Neither do I understand why it's not ok for transaction { for () if (b) load; } to hoist the load out of the transaction. I assume all the issue is about hoisting things over the trasaction start. So - why does the transaction start not simply clobber all global memory? That would avoid the hoisting. I assume that transforming the above to transaction { tem = load; for () if (b) = tem; } is ok? Thanks, Richard.
On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote: > On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote: > > On 02/23/12 12:19, Aldy Hernandez wrote: > > > >> about hit me. Instead now I save all loads in a function and iterate > >> through them in a brute force way. I'd like to rewrite this into a hash > >> of some sort, but before I go any further I'm interested to know if the > >> main idea is ok. > > > > > > For the record, it may be ideal to reuse some of the iterations we already > > do over the function's basic blocks, so we don't have to iterate yet again > > over the IL stream. Though it may complicate the pass unnecessarily. > > It's definitely ugly that you need to do this. Indeed. But that's simply the price we have to pay for making publication with transactions easier for the programmer yet still at acceptable performance. Also note that what we primarily have to care about for this PR is to not hoist loads _within_ transactions if they would violate publication safety. I didn't have time to look at Aldy's patch yet, but a first safe and conservative way would be to treat transactions as full transformation barriers, and prevent publication-safety-violating transformations _within_ transactions. Which I would prefer until we're confident that we understood all of it. For hoisting out of or across transactions, we have to reason about more than just publication safety. > And yes, you have to look at > very many passes I guess, PRE comes to my mind at least. > > I still do not understand the situation very well, at least why for > > transaction { for () if (b) load; } load; > > it should be safe to hoist load out of the transaction This one is funny. *Iff* this is an atomic txn, we can assume that the transaction does not wait for a concurrent event. If it is a relaxed txn, then we must not have a loop which terminates depending on an atomic load (which we don't in that example); otherwise, we cannot hoist load to before the txn (same reason as why we must not hoist to before an atomic load with memory_order_acquire). Now, if these things hold, then the load will always be executed after the txn. Thus, we can assume that it will happen anyway and nothing can stop it (irrespective of which position the txn gets in the Transactional Synchronization Order (see the C++ TM spec)). It is a nonatomic load, and we can rely on the program being data-race-free, so we cannot have other threads storing to the same location, so we can hoist it across because in that part of the program, the location is guaranteed to be thread-local data (or immutable). As I said, this isn't related to just pub safety anymore. And this is tricky enough that I'd rather not try to optimize this currently but instead wait until we have more confidence in our understanding of the matter. > while for > > load; transaction { for () if (b) load; } > > it is not. If the same assumptions as above hold, I think you can hoist it out, because again you can assume that it targets thread-local/immutable data: the nontransactional (+nonatomic!) load can happen at any time, essentially, irrespective of b's value or how/when other threads modify it. Thus, it cannot have been changed between the two loads in a data-race-free program. > Neither do I understand why it's not ok for > > transaction { for () if (b) load; } > > to hoist the load out of the transaction. You would violate publication safety. Also, you don't have any reason to believe that load targets thread-local/immutable data, so you must not make it nontransactional (otherwise, you could introduce a race condition). > > I assume all the issue is about hoisting things over the trasaction start. > So - why does the transaction start not simply clobber all global memory? > That would avoid the hoisting. I assume that transforming the above to > > transaction { tem = load; for () if (b) = tem; } > > is ok? No, it is not. Actually, this is precisely the transformation that we need to prevent from happening. As Aldy said, please see the explanations in the PR, or have a look at the C++ TM specification and the C++11 memory model alternatively. We could also discuss this on IRC, if this would be easier. Torvald
On 02/24/12 07:10, Torvald Riegel wrote: > On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote: >> On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez<aldyh@redhat.com> wrote: >>> On 02/23/12 12:19, Aldy Hernandez wrote: >>> >>>> about hit me. Instead now I save all loads in a function and iterate >>>> through them in a brute force way. I'd like to rewrite this into a hash >>>> of some sort, but before I go any further I'm interested to know if the >>>> main idea is ok. >>> >>> >>> For the record, it may be ideal to reuse some of the iterations we already >>> do over the function's basic blocks, so we don't have to iterate yet again >>> over the IL stream. Though it may complicate the pass unnecessarily. >> >> It's definitely ugly that you need to do this. > > Indeed. But that's simply the price we have to pay for making > publication with transactions easier for the programmer yet still at > acceptable performance. > > Also note that what we primarily have to care about for this PR is to > not hoist loads _within_ transactions if they would violate publication For that matter, didn't rth add a memory barrier at the beginning of transactions last week? That would mean that we can't hoist anything outside of a transaction anyhow. Or was it not a full memory barrier? > safety. I didn't have time to look at Aldy's patch yet, but a first > safe and conservative way would be to treat transactions as full > transformation barriers, and prevent publication-safety-violating > transformations _within_ transactions. Which I would prefer until we're > confident that we understood all of it. Do you mean disallow hoisting of *any* loads that happen inside of a transaction (regardless of whether a subsequent load happens on every path out of the loop)? This would definitely be safe and quite easily doable, simply by checking if loads to be hoisted are within a transaction. > > For hoisting out of or across transactions, we have to reason about more > than just publication safety. Again, __transactions being barriers and all, I don't think we should complicate things unnecessarily at this point, since it doesn't happen. Aldy
On Fri, Feb 24, 2012 at 2:10 PM, Torvald Riegel <triegel@redhat.com> wrote: > On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote: >> On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote: >> > On 02/23/12 12:19, Aldy Hernandez wrote: >> > >> >> about hit me. Instead now I save all loads in a function and iterate >> >> through them in a brute force way. I'd like to rewrite this into a hash >> >> of some sort, but before I go any further I'm interested to know if the >> >> main idea is ok. >> > >> > >> > For the record, it may be ideal to reuse some of the iterations we already >> > do over the function's basic blocks, so we don't have to iterate yet again >> > over the IL stream. Though it may complicate the pass unnecessarily. >> >> It's definitely ugly that you need to do this. > > Indeed. But that's simply the price we have to pay for making > publication with transactions easier for the programmer yet still at > acceptable performance. > > Also note that what we primarily have to care about for this PR is to > not hoist loads _within_ transactions if they would violate publication > safety. I didn't have time to look at Aldy's patch yet, but a first > safe and conservative way would be to treat transactions as full > transformation barriers, and prevent publication-safety-violating > transformations _within_ transactions. Which I would prefer until we're > confident that we understood all of it. > > For hoisting out of or across transactions, we have to reason about more > than just publication safety. > >> And yes, you have to look at >> very many passes I guess, PRE comes to my mind at least. >> >> I still do not understand the situation very well, at least why for >> >> transaction { for () if (b) load; } load; >> >> it should be safe to hoist load out of the transaction > > This one is funny. *Iff* this is an atomic txn, we can assume that the > transaction does not wait for a concurrent event. If it is a relaxed > txn, then we must not have a loop which terminates depending on an > atomic load (which we don't in that example); otherwise, we cannot hoist > load to before the txn (same reason as why we must not hoist to before > an atomic load with memory_order_acquire). > Now, if these things hold, then the load will always be executed after > the txn. Thus, we can assume that it will happen anyway and nothing can > stop it (irrespective of which position the txn gets in the > Transactional Synchronization Order (see the C++ TM spec)). It is a > nonatomic load, and we can rely on the program being data-race-free, so > we cannot have other threads storing to the same location, so we can > hoist it across because in that part of the program, the location is > guaranteed to be thread-local data (or immutable). > > As I said, this isn't related to just pub safety anymore. And this is > tricky enough that I'd rather not try to optimize this currently but > instead wait until we have more confidence in our understanding of the > matter. > >> while for >> >> load; transaction { for () if (b) load; } >> >> it is not. > > If the same assumptions as above hold, I think you can hoist it out, > because again you can assume that it targets thread-local/immutable > data: the nontransactional (+nonatomic!) load can happen at any time, > essentially, irrespective of b's value or how/when other threads modify > it. Thus, it cannot have been changed between the two loads in a > data-race-free program. > >> Neither do I understand why it's not ok for >> >> transaction { for () if (b) load; } >> >> to hoist the load out of the transaction. > > You would violate publication safety. > > Also, you don't have any reason to believe that load targets > thread-local/immutable data, so you must not make it nontransactional > (otherwise, you could introduce a race condition). > >> >> I assume all the issue is about hoisting things over the trasaction start. >> So - why does the transaction start not simply clobber all global memory? >> That would avoid the hoisting. I assume that transforming the above to >> >> transaction { tem = load; for () if (b) = tem; } >> >> is ok? > > No, it is not. Actually, this is precisely the transformation that we > need to prevent from happening. > As Aldy said, please see the explanations in the PR, or have a look at > the C++ TM specification and the C++11 memory model alternatively. We > could also discuss this on IRC, if this would be easier. Ok. I see. So, I think what would be best is to have a way to check whether a store/load is part of a transaction - do we have a way to do that right now? (For example a flag on a gimple stmt?) Then we can simply avoid the LIM transform by making transaction load/store stmts behave the same as potentially trapping stmts (thus, only optimize if the memory is accessed unconditional somewhere else). That would work for PRE as well. [easiest would be to make *_could_trap_p return true for memory ops inside a transaction] Does that sound like it would make TM happy (well, apart from missed optimizations)? Thus, is it enough to avoid transforms that would be invalid for loads/stores that might trap? Thanks, Richard. > Torvald >
On Fri, Feb 24, 2012 at 5:34 PM, Aldy Hernandez <aldyh@redhat.com> wrote: > On 02/24/12 07:10, Torvald Riegel wrote: >> >> On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote: >>> >>> On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez<aldyh@redhat.com> >>> wrote: >>>> >>>> On 02/23/12 12:19, Aldy Hernandez wrote: >>>> >>>>> about hit me. Instead now I save all loads in a function and iterate >>>>> through them in a brute force way. I'd like to rewrite this into a hash >>>>> of some sort, but before I go any further I'm interested to know if the >>>>> main idea is ok. >>>> >>>> >>>> >>>> For the record, it may be ideal to reuse some of the iterations we >>>> already >>>> do over the function's basic blocks, so we don't have to iterate yet >>>> again >>>> over the IL stream. Though it may complicate the pass unnecessarily. >>> >>> >>> It's definitely ugly that you need to do this. >> >> >> Indeed. But that's simply the price we have to pay for making >> publication with transactions easier for the programmer yet still at >> acceptable performance. >> >> Also note that what we primarily have to care about for this PR is to >> not hoist loads _within_ transactions if they would violate publication > > > For that matter, didn't rth add a memory barrier at the beginning of > transactions last week? That would mean that we can't hoist anything > outside of a transaction anyhow. Or was it not a full memory barrier? It's now a full memory barrier for all global memory and for local statics if their address is taken (and for automatic vars with their address taken). Do we need to be concerned about non-address-taken local statics? > >> safety. I didn't have time to look at Aldy's patch yet, but a first >> safe and conservative way would be to treat transactions as full >> transformation barriers, and prevent publication-safety-violating >> transformations _within_ transactions. Which I would prefer until we're >> confident that we understood all of it. > > > Do you mean disallow hoisting of *any* loads that happen inside of a > transaction (regardless of whether a subsequent load happens on every path > out of the loop)? This would definitely be safe and quite easily doable, > simply by checking if loads to be hoisted are within a transaction. Yes, simply handle memory accesses inside a transaction as if they may trap. Then all optimizers should be fine (or they are buggy even without TM). Richard. > >> >> For hoisting out of or across transactions, we have to reason about more >> than just publication safety. > > > Again, __transactions being barriers and all, I don't think we should > complicate things unnecessarily at this point, since it doesn't happen. > > Aldy
On Fri, 2012-02-24 at 10:34 -0600, Aldy Hernandez wrote: > On 02/24/12 07:10, Torvald Riegel wrote: > > safety. I didn't have time to look at Aldy's patch yet, but a first > > safe and conservative way would be to treat transactions as full > > transformation barriers, and prevent publication-safety-violating > > transformations _within_ transactions. Which I would prefer until we're > > confident that we understood all of it. > > Do you mean disallow hoisting of *any* loads that happen inside of a > transaction (regardless of whether a subsequent load happens on every > path out of the loop)? This would definitely be safe and quite easily > doable, simply by checking if loads to be hoisted are within a transaction. If we cannot get the less conservative approach in on time (the one you've been working on), then we could also use this big hammer. I don't quite know how high performance degradation would be (we could measure number of loads/stores at runtime with STAMP, for example, and I could build the libitm code for that if you want), but having publication-safety work correctly in 4.7 might be more important. > > > > > For hoisting out of or across transactions, we have to reason about more > > than just publication safety. > > Again, __transactions being barriers and all, I don't think we should > complicate things unnecessarily at this point, since it doesn't happen. Yes. Based on Richard Guenther's examples, my question was whether your code (without having actually looked at it ;) ) would also allow post-dominating loads in nontransactional code to enable hoisting (as in the __transaction_atomic { if (foo) load; } load case. I believe only loads within the same transaction should count, and I wasn't sure whether you were ensuring that (and/or whether a barrier would enforce this either). Torvald
On Mon, Feb 27, 2012 at 10:24 AM, Torvald Riegel <triegel@redhat.com> wrote: > On Fri, 2012-02-24 at 10:34 -0600, Aldy Hernandez wrote: >> On 02/24/12 07:10, Torvald Riegel wrote: >> > safety. I didn't have time to look at Aldy's patch yet, but a first >> > safe and conservative way would be to treat transactions as full >> > transformation barriers, and prevent publication-safety-violating >> > transformations _within_ transactions. Which I would prefer until we're >> > confident that we understood all of it. >> >> Do you mean disallow hoisting of *any* loads that happen inside of a >> transaction (regardless of whether a subsequent load happens on every >> path out of the loop)? This would definitely be safe and quite easily >> doable, simply by checking if loads to be hoisted are within a transaction. > > If we cannot get the less conservative approach in on time (the one > you've been working on), then we could also use this big hammer. I > don't quite know how high performance degradation would be (we could > measure number of loads/stores at runtime with STAMP, for example, and I > could build the libitm code for that if you want), but having > publication-safety work correctly in 4.7 might be more important. > >> >> > >> > For hoisting out of or across transactions, we have to reason about more >> > than just publication safety. >> >> Again, __transactions being barriers and all, I don't think we should >> complicate things unnecessarily at this point, since it doesn't happen. > > Yes. Based on Richard Guenther's examples, my question was whether your > code (without having actually looked at it ;) ) would also allow > post-dominating loads in nontransactional code to enable hoisting (as in > the __transaction_atomic { if (foo) load; } load case. I believe only > loads within the same transaction should count, and I wasn't sure > whether you were ensuring that (and/or whether a barrier would enforce > this either). We should at this point not disrupt the tree by complex changes to optimizers. You can try patching tree_could_trap_p, though I am not sure there is enough context provided, or add a few TM checks in selected places (loop IM and PRE). We are close to a release. Richard. > Torvald >
>> Again, __transactions being barriers and all, I don't think we should >> complicate things unnecessarily at this point, since it doesn't happen. > > Yes. Based on Richard Guenther's examples, my question was whether your > code (without having actually looked at it ;) ) would also allow > post-dominating loads in nontransactional code to enable hoisting (as in > the __transaction_atomic { if (foo) load; } load case. I believe only > loads within the same transaction should count, and I wasn't sure > whether you were ensuring that (and/or whether a barrier would enforce > this either). In my patch, post-dominating loads in nontransactional code is fair game for hoisting, and will get hoisted out. However, tweaking it to make sure this post-dominating load occurs in a transaction is trivial since we have a bitmap of all BBs in transactions.
> Ok. I see. So, I think what would be best is to have a way to check whether > a store/load is part of a transaction - do we have a way to do that right now? > (For example a flag on a gimple stmt?) Then we can simply avoid the LIM We do not (*). My patch accumulates that information on demand. I can certainly add a gimple bit for this, but can't the optimizations change/rewrite the stores/loads so we lose this information? > transform by making transaction load/store stmts behave the same as > potentially trapping stmts (thus, only optimize if the memory is accessed > unconditional somewhere else). That would work for PRE as well. > [easiest would be to make *_could_trap_p return true for memory ops > inside a transaction] Provided the gimple bit works, this seems reasonable, though quite a big hammer. But given that we are nearing a release, I would be in favor of it. Richard Henderson, what do you think? Aldy (*) Well technically we do, but much later in the tmmark pass, when we instrument the loads, but this is useless since all the tree optimizers have been run (loop IM, etc).
>> For that matter, didn't rth add a memory barrier at the beginning of >> transactions last week? That would mean that we can't hoist anything >> outside of a transaction anyhow. Or was it not a full memory barrier? > > It's now a full memory barrier for all global memory and for local statics > if their address is taken (and for automatic vars with their address taken). > Do we need to be concerned about non-address-taken local statics? It is my understanding that non-address-taken local statics are not visible to other threads, so we don't have to worry about these. A
On 02/27/2012 11:22 AM, Aldy Hernandez wrote: > >> Ok. I see. So, I think what would be best is to have a way to check >> whether >> a store/load is part of a transaction - do we have a way to do that >> right now? >> (For example a flag on a gimple stmt?) Then we can simply avoid the LIM > > We do not (*). My patch accumulates that information on demand. I > can certainly add a gimple bit for this, but can't the optimizations > change/rewrite the stores/loads so we lose this information? > I t would seem appropriate to me that in the future, perhaps the CFG could have a flag set for any basic block which is in a transaction... This would make it pretty trivial at all times to tell if a statement is part of a transaction or not. It seems like the process of CFG construction/updating should be able to maintain that info for little cost. (ie, I'd guess it's already making the traversals requires to collect that info) It would certainly be convenient :-) Andrew
Hi, On Mon, 27 Feb 2012, Aldy Hernandez wrote: > > > > For that matter, didn't rth add a memory barrier at the beginning of > > > transactions last week? That would mean that we can't hoist anything > > > outside of a transaction anyhow. Or was it not a full memory barrier? > > > > It's now a full memory barrier for all global memory and for local statics > > if their address is taken (and for automatic vars with their address taken). > > Do we need to be concerned about non-address-taken local statics? > > It is my understanding that non-address-taken local statics are not > visible to other threads, void f () { static int i; i++; } Running 'f' in different threads will expose the storage to 'i' to each of them without taking its address :-/ Ciao, Michael.
On 02/27/12 11:02, Michael Matz wrote: > Hi, > > On Mon, 27 Feb 2012, Aldy Hernandez wrote: > >> >>>> For that matter, didn't rth add a memory barrier at the beginning of >>>> transactions last week? That would mean that we can't hoist anything >>>> outside of a transaction anyhow. Or was it not a full memory barrier? >>> >>> It's now a full memory barrier for all global memory and for local statics >>> if their address is taken (and for automatic vars with their address taken). >>> Do we need to be concerned about non-address-taken local statics? >> >> It is my understanding that non-address-taken local statics are not >> visible to other threads, > > void f () { static int i; i++; } > > Running 'f' in different threads will expose the storage to 'i' to each of > them without taking its address :-/ In that case, shouldn't we make transactions barriers for local statics as well, regardless of if their address was taken or not?
On Mon, Feb 27, 2012 at 6:29 PM, Aldy Hernandez <aldyh@redhat.com> wrote: > On 02/27/12 11:02, Michael Matz wrote: >> >> Hi, >> >> On Mon, 27 Feb 2012, Aldy Hernandez wrote: >> >>> >>>>> For that matter, didn't rth add a memory barrier at the beginning of >>>>> transactions last week? That would mean that we can't hoist anything >>>>> outside of a transaction anyhow. Or was it not a full memory barrier? >>>> >>>> >>>> It's now a full memory barrier for all global memory and for local >>>> statics >>>> if their address is taken (and for automatic vars with their address >>>> taken). >>>> Do we need to be concerned about non-address-taken local statics? >>> >>> >>> It is my understanding that non-address-taken local statics are not >>> visible to other threads, >> >> >> void f () { static int i; i++; } >> >> Running 'f' in different threads will expose the storage to 'i' to each of >> them without taking its address :-/ > > > In that case, shouldn't we make transactions barriers for local statics as > well, regardless of if their address was taken or not? We don't have a convenient bit for this right now, but OTOH we assume that the transaction may re-enter the function, and thus it should already be a barrier for local statics. Unless the builtins have the "leaf" attribute on them. Richard.
On Mon, Feb 27, 2012 at 5:44 PM, Andrew MacLeod <amacleod@redhat.com> wrote: > On 02/27/2012 11:22 AM, Aldy Hernandez wrote: >> >> >>> Ok. I see. So, I think what would be best is to have a way to check >>> whether >>> a store/load is part of a transaction - do we have a way to do that right >>> now? >>> (For example a flag on a gimple stmt?) Then we can simply avoid the LIM >> >> >> We do not (*). My patch accumulates that information on demand. I can >> certainly add a gimple bit for this, but can't the optimizations >> change/rewrite the stores/loads so we lose this information? Yes, we'd have to either fix them all up or compute that bit automagically (like we do for the has_volatile_ops flag). I assume that you _can_ determine if a stmt is part of a transaction (it may just not be a O(1) operation)? Thus, a bit crude way would be to have the gimple flag but have it to-be computed on-demand. And you'd compute it before critical passes such as LIM and PRE. > I t would seem appropriate to me that in the future, perhaps the CFG could > have a flag set for any basic block which is in a transaction... This would > make it pretty trivial at all times to tell if a statement is part of a > transaction or not. It seems like the process of CFG construction/updating > should be able to maintain that info for little cost. (ie, I'd guess it's > already making the traversals requires to collect that info) > > It would certainly be convenient :-) That would sound easier than maintaining the flag on the stmt. OTOH we already maintain on-the-side EH tables, so it can't be that big of a problem. What do we do if we move stmts inside a transaction though? Do they magically become part of it and thus we cannot move them out again? If you want to fix this issue for 4.7, add the gimple flag and an on-demand compute helper and check the flag in PRE/LIM. Richard. > Andrew > > >
On 02/28/2012 04:39 AM, Richard Guenther wrote: > On Mon, Feb 27, 2012 at 5:44 PM, Andrew MacLeod<amacleod@redhat.com> wrote: > I t would seem appropriate to me that in the future, perhaps the CFG could > have a flag set for any basic block which is in a transaction... This would > make it pretty trivial at all times to tell if a statement is part of a > transaction or not. It seems like the process of CFG construction/updating > should be able to maintain that info for little cost. (ie, I'd guess it's > already making the traversals requires to collect that info) > > It would certainly be convenient :-) > That would sound easier than maintaining the flag on the stmt. OTOH we > already maintain on-the-side EH tables, so it can't be that big of a problem. > > What do we do if we move stmts inside a transaction though? Do they > magically become part of it and thus we cannot move them out again? well, I think by definition if you move a statement into a basic block of a transaction it becomes a part of the transaction. I believe transactions are block based... the entire block is in f a transaction, or not. Moving a statement out of a transaction would be as easy as moving it in... just change the block it is in. > If you want to fix this issue for 4.7, add the gimple flag and an on-demand > compute helper and check the flag in PRE/LIM. > > yes, a targeted approach is required for 4.7. I just wanted to plant the seed while the discussion was ongoing. Andrew
On 02/27/12 08:22, Aldy Hernandez wrote: >> transform by making transaction load/store stmts behave the same as >> potentially trapping stmts (thus, only optimize if the memory is accessed >> unconditional somewhere else). That would work for PRE as well. >> [easiest would be to make *_could_trap_p return true for memory ops >> inside a transaction] > > Provided the gimple bit works, this seems reasonable, though quite a big hammer. But given that we are nearing a release, I would be in favor of it. > > Richard Henderson, what do you think? Well, hooking could_trap_p sounds like an easy solution. Gimple bits, on the other hand, are not. Keeping those up-to-date is always a real pain. We have had several gimple bits in the history of the TM code, and we've gotten rid of them all because they were too invasive to maintain. OTOH, I have no better suggestion... r~
Index: tree-ssa-loop-im.c =================================================================== --- tree-ssa-loop-im.c (revision 184445) +++ tree-ssa-loop-im.c (working copy) @@ -150,7 +150,7 @@ typedef struct mem_ref bitmap indep_ref; /* The set of memory references on that this reference is independent. */ - bitmap dep_ref; /* The complement of DEP_REF. */ + bitmap dep_ref; /* The complement of INDEP_REF. */ } *mem_ref_p; DEF_VEC_P(mem_ref_p); @@ -189,6 +189,13 @@ static struct static bool ref_indep_loop_p (struct loop *, mem_ref_p); +/* All basic blocks that are within a transaction in the current + function. */ +static bitmap all_tm_blocks; + +/* All the loads in the current function. */ +static mem_ref_locs_p all_loads; + /* Minimum cost of an expensive expression. */ #define LIM_EXPENSIVE ((unsigned) PARAM_VALUE (PARAM_LIM_EXPENSIVE)) @@ -337,6 +344,26 @@ for_each_index (tree *addr_p, bool (*cbc } } +/* Return true if every path out of the loop containing STMT loads the + decl loaded in STMT. */ + +static bool +every_path_out_of_loop_has_load (gimple stmt) +{ + basic_block bb = gimple_bb (stmt); + basic_block header = bb->loop_father->header; + unsigned int i; + mem_ref_loc_p aref; + tree rhs = gimple_assign_rhs1 (stmt); + + /* Return true if the same load occurs on every path out of the loop. */ + FOR_EACH_VEC_ELT (mem_ref_loc_p, all_loads->locs, i, aref) + if (rhs == *aref->ref + && dominated_by_p (CDI_POST_DOMINATORS, header, gimple_bb (aref->stmt))) + return true; + return false; +} + /* If it is possible to hoist the statement STMT unconditionally, returns MOVE_POSSIBLE. If it is possible to hoist the statement STMT, but we must avoid making @@ -412,6 +439,26 @@ movement_possibility (gimple stmt) || gimple_could_trap_p (stmt)) return MOVE_PRESERVE_EXECUTION; + /* Non local loads in a transaction cannot be hoisted out unless the + load happens on every path out of the loop. */ + if (flag_tm + && bitmap_bit_p (all_tm_blocks, gimple_bb (stmt)->index) + && gimple_assign_single_p (stmt)) + { + tree rhs = gimple_assign_rhs1 (stmt); + if (DECL_P (rhs) && is_global_var (rhs) + && !every_path_out_of_loop_has_load (stmt)) + { + if (dump_file) + { + fprintf (dump_file, "Cannot hoist conditional load of "); + print_generic_expr (dump_file, rhs, TDF_SLIM); + fprintf (dump_file, " because it is in a transaction.\n"); + } + return MOVE_IMPOSSIBLE; + } + } + return ret; } @@ -2358,6 +2405,42 @@ fill_always_executed_in (struct loop *lo fill_always_executed_in (loop, contains_call); } +/* Compute global information needed for transactional restrictions in + the loop invariant motion pass. */ + +static void +tree_ssa_lim_tm_initialize (void) +{ + basic_block bb; + gimple_stmt_iterator bsi; + + calculate_dominance_info (CDI_POST_DOMINATORS); + all_tm_blocks = get_all_tm_blocks (); + all_loads = mem_ref_locs_alloc (); + + /* Save all the loads in the current function. */ + /* ?? This needs to be rewritten into something more efficient, like + a hash. */ + FOR_EACH_BB (bb) + for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi)) + { + gimple stmt = gsi_stmt (bsi); + + if (is_gimple_assign (stmt) + && gimple_assign_single_p (stmt)) + { + tree *rhs_p = gimple_assign_rhs1_ptr (stmt); + if (DECL_P (*rhs_p) && is_global_var (*rhs_p)) + { + mem_ref_loc_p aref = XNEW (struct mem_ref_loc); + aref->stmt = stmt; + aref->ref = rhs_p; + VEC_safe_push (mem_ref_loc_p, heap, all_loads->locs, aref); + } + } + } +} + /* Compute the global information needed by the loop invariant motion pass. */ static void @@ -2387,6 +2470,16 @@ tree_ssa_lim_initialize (void) sbitmap_free (contains_call); lim_aux_data_map = pointer_map_create (); + + if (flag_tm) + tree_ssa_lim_tm_initialize (); +} + +static void +tree_ssa_lim_tm_finalize (void) +{ + BITMAP_FREE (all_tm_blocks); + free_mem_ref_locs (all_loads); } /* Cleans up after the invariant motion pass. */ @@ -2420,6 +2513,9 @@ tree_ssa_lim_finalize (void) if (memory_accesses.ttae_cache) pointer_map_destroy (memory_accesses.ttae_cache); + + if (flag_tm) + tree_ssa_lim_tm_finalize (); } /* Moves invariants from loops. Only "expensive" invariants are moved out -- Index: tree.h =================================================================== --- tree.h (revision 184445) +++ tree.h (working copy) @@ -5875,6 +5875,7 @@ extern bool is_tm_may_cancel_outer (tree extern bool is_tm_ending_fndecl (tree); extern void record_tm_replacement (tree, tree); extern void tm_malloc_replacement (tree); +extern bitmap get_all_tm_blocks (void); static inline bool is_tm_safe_or_pure (const_tree x) Index: testsuite/gcc.dg/tm/pub-safety-1.c =================================================================== --- testsuite/gcc.dg/tm/pub-safety-1.c (revision 0) +++ testsuite/gcc.dg/tm/pub-safety-1.c (revision 0) @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim1" } */ + +/* Test that thread visible loads do not get hoisted out of loops if + the load would not have occurred on each path out of the loop. */ + +int x[10] = {0,0,0,0,0,0,0,0,0,0}; +int DATA_DATA = 0; + +void reader() +{ + int i; + __transaction_atomic + { + for (i = 0; i < 10; i++) + if (x[i]) + x[i] += DATA_DATA; + /* If we loaded DATA_DATA here, we could hoist it before the loop, + but since we don't... we can't. */ + } +} + +/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim1" } } */ +/* { dg-final { cleanup-tree-dump "lim1" } } */ Index: testsuite/gcc.dg/tm/pub-safety-2.c =================================================================== --- testsuite/gcc.dg/tm/pub-safety-2.c (revision 0) +++ testsuite/gcc.dg/tm/pub-safety-2.c (revision 0) @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim1" } */ + +/* Test that thread visible loads do not get hoisted out of loops if + the load would not have occurred on each path out of the loop. */ + +int x[10] = {0,0,0,0,0,0,0,0,0,0}; +int DATA_DATA = 0, george; + +void reader() +{ + int i; + __transaction_atomic + { + for (i = 0; i < 10; i++) + if (x[i]) + x[i] += DATA_DATA; + + /* Because of this load of DATA_DATA, we are loading DATA_DATA + on every path out of the loop. It is OK to hoist DATA_DATA + out. */ + george = DATA_DATA; + } +} + +/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 0 "lim1" } } */ +/* { dg-final { cleanup-tree-dump "lim1" } } */ Index: trans-mem.c =================================================================== --- trans-mem.c (revision 184445) +++ trans-mem.c (working copy) @@ -1858,18 +1858,25 @@ tm_region_init (struct tm_region *region VEC(basic_block, heap) *queue = NULL; bitmap visited_blocks = BITMAP_ALLOC (NULL); struct tm_region *old_region; + struct tm_region **region_worklist; all_tm_regions = region; bb = single_succ (ENTRY_BLOCK_PTR); + /* We could store this information in bb->aux, but we may get called + through get_all_tm_blocks() from another pass that may be already + using bb->aux. */ + region_worklist = + (struct tm_region **) xcalloc (sizeof (struct tm_region *), + n_basic_blocks + NUM_FIXED_BLOCKS); + VEC_safe_push (basic_block, heap, queue, bb); - gcc_assert (!bb->aux); /* FIXME: Remove me. */ - bb->aux = region; + region_worklist[bb->index] = region; do { bb = VEC_pop (basic_block, queue); - region = (struct tm_region *)bb->aux; - bb->aux = NULL; + region = region_worklist[bb->index]; + region_worklist[bb->index] = NULL; /* Record exit and irrevocable blocks. */ region = tm_region_init_1 (region, bb); @@ -1886,20 +1893,20 @@ tm_region_init (struct tm_region *region { bitmap_set_bit (visited_blocks, e->dest->index); VEC_safe_push (basic_block, heap, queue, e->dest); - gcc_assert (!e->dest->aux); /* FIXME: Remove me. */ /* If the current block started a new region, make sure that only the entry block of the new region is associated with this region. Other successors are still part of the old region. */ if (old_region != region && e->dest != region->entry_block) - e->dest->aux = old_region; + region_worklist[e->dest->index] = old_region; else - e->dest->aux = region; + region_worklist[e->dest->index] = region; } } while (!VEC_empty (basic_block, queue)); VEC_free (basic_block, heap, queue); BITMAP_FREE (visited_blocks); + free (region_worklist); } /* The "gate" function for all transactional memory expansion and optimization @@ -2424,6 +2431,39 @@ get_tm_region_blocks (basic_block entry_ return bbs; } +/* Return a bitmap with all the basic blocks that appear in + transactions throughout current_function_decl. The bitmap must be + freed by the caller. + + This bitmap is only usable by passes that don't adjust the CFG in + the middle of the pass. */ + +bitmap +get_all_tm_blocks (void) +{ + struct tm_region *region; + VEC (basic_block, heap) *queue; + bitmap all_tm_blocks = BITMAP_ALLOC (NULL); + + /* ?? Perhaps we need to abstract gate_tm_init further, because we + certainly don't need it to calculate CDI_DOMINATOR info. */ + gate_tm_init (); + + for (region = all_tm_regions; region; region = region->next) + { + queue = get_tm_region_blocks (region->entry_block, + region->exit_blocks, + region->irr_blocks, + all_tm_blocks, + /*stop_at_irr_p=*/true); + VEC_free (basic_block, heap, queue); + } + + if (all_tm_regions) + bitmap_obstack_release (&tm_obstack); + return all_tm_blocks; +} + /* Entry point to the MARK phase of TM expansion. Here we replace transactional memory statements with calls to builtins, and function calls with their transactional clones (if available). But we don't