Message ID: 52B31920.9080501@redhat.com
State: New
On Thu, Dec 19, 2013 at 5:04 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> As discussed in the PR, we already remove nested transactions in the
> tmlower stage, but inlining may add more nested transactions later.
>
> The problem with removing these nested transactions after proper IPA is
> that we'd either have to add another IPA pass later or remove the
> problematic transactions much later in the .tmmark stage.  Either of
> these places will be after we have added the uninstrumented code path,
> so things look a little messy CFG-wise (not impossible to do, but
> messy).
>
> What I've done in this patch is to fix the inliner cost model so that
> the early inliner sees and inlines the transaction early on.  This way,
> .tmipa can see the inlined nested transaction before it adds the
> uninstrumented code path, making everything simpler.  Then it's just a
> matter of removing the GIMPLE_TRANSACTION and associated commit.
>
> I'm still unsure whether the IPA inliner (not the early inliner) will
> add other nested transactions, so we may have to do everything in
> .tmmark and handle the dual code paths :(.  Either way, this is a
> start.

Sure it will.  At least with cross-unit inlining with LTO.  You can of
course simply disallow inlining a function that has transactions into a
transaction when transactions are lowered already.

Richard.

> Also, is there anything I should check here, or is checking for
> GTMA_HAVE_ABORT enough?
>
> +      /* An inner transaction which has no explicit aborts can be
> +	 folded into its outer transaction.  */
> +      unsigned int sub = gimple_transaction_subcode (region->transaction_stmt);
> +      if (outer && !(sub & GTMA_HAVE_ABORT))
>
> Tested on x86-64 Linux.
>
> How does this look?
>> I'm still unsure whether the IPA inliner (not the early inliner) will
>> add other nested transactions, so we may have to do everything in
>> .tmmark and handle the dual code paths :(.  Either way, this is a
>> start.
>
> Sure it will.  At least with cross-unit inlining with LTO.  You can of
> course simply disallow inlining a function that has transactions into a
> transaction when transactions are lowered already.

I'd still like to catch the common cases, like I do with this patch.

Perhaps we move this code to the .tmmark pass and handle the
uninstrumented case.  rth?
Aldy Hernandez <aldyh@redhat.com> wrote:
>>> I'm still unsure whether the IPA inliner (not the early inliner) will
>>> add other nested transactions, so we may have to do everything in
>>> .tmmark and handle the dual code paths :(.  Either way, this is a
>>> start.
>>
>> Sure it will.  At least with cross-unit inlining with LTO.  You can of
>> course simply disallow inlining a function that has transactions into
>> a transaction when transactions are lowered already.
>
> I'd still like to catch the common cases, like I do with this patch.
>
> Perhaps we move this code to the .tmmark pass and handle the
> uninstrumented case.  rth?

Btw, don't you want an IPA pass that clones functions called from within
transactions and optimize those accordingly?  Thus make the tm lowering
a real IPA pass?

Richard.
On 12/19/2013 11:06 AM, Richard Biener wrote:
> Aldy Hernandez <aldyh@redhat.com> wrote:
>> I'd still like to catch the common cases, like I do with this patch.
>>
>> Perhaps we move this code to the .tmmark pass and handle the
>> uninstrumented case.  rth?

tmmark is way, way later than you'd want.  I believe that ipa_tm is the
right place.  That's where we generate clones.  The clones know a priori
that they're called within a transaction, and thus all internal
transactions may be eliminated.  And thus any inlining that would happen
after ipa_tm would properly inline the clone, and thus no more need be
done.

> Btw, don't you want an IPA pass that clones functions called from
> within transactions and optimize those accordingly?

That's what we have, modulo the "optimize those accordingly".

> Thus make the tm lowering a real IPA pass?

There are plenty of tm-related passes, one of which is "lower_tm".  Like
most of the other "lower" passes, the high-level gimple that it
processes can't be used as input to build_cfg.

r~
On 01/06/14 13:40, Richard Henderson wrote:
> On 12/19/2013 11:06 AM, Richard Biener wrote:
>> Aldy Hernandez <aldyh@redhat.com> wrote:
>>> I'd still like to catch the common cases, like I do with this patch.
>>>
>>> Perhaps we move this code to the .tmmark pass and handle the
>>> uninstrumented case.  rth?
>
> tmmark is way, way later than you'd want.  I believe that ipa_tm is the
> right place.  That's where we generate clones.  The clones know a
> priori that they're called within a transaction, and thus all internal
> transactions may be eliminated.  And thus any inlining that would
> happen after ipa_tm would properly inline the clone, and thus no more
> need be done.

But ipa_tm still runs before the later (IPA) inliner:

...
u.c.041i.tmipa
u.c.043i.whole-program
u.c.044i.profile_estimate
u.c.045i.devirt
u.c.046i.cp
u.c.048i.inline

So even though my proposed patch works in the supplied testcase (because
the early inliner ran before .tmipa and inlined the nested transaction),
if the later inliner added a nested transaction we're in the same boat.
I just saw this happen with LTO, as Richi pointed out.

What do you suggest?
On 01/06/14 15:04, Aldy Hernandez wrote:
> On 01/06/14 13:40, Richard Henderson wrote:
>> On 12/19/2013 11:06 AM, Richard Biener wrote:
>>> Aldy Hernandez <aldyh@redhat.com> wrote:
>>>> I'd still like to catch the common cases, like I do with this
>>>> patch.
>>>>
>>>> Perhaps we move this code to the .tmmark pass and handle the
>>>> uninstrumented case.  rth?
>>
>> tmmark is way, way later than you'd want.  I believe that ipa_tm is
>> the right place.  That's where we generate clones.  The clones know a
>> priori that they're called within a transaction, and thus all internal
>> transactions may be eliminated.  And thus any inlining that would
>> happen after ipa_tm would properly inline the clone, and thus no more
>> need be done.
>
> But ipa_tm still runs before the later (IPA) inliner:
>
> ...
> u.c.041i.tmipa
> u.c.043i.whole-program
> u.c.044i.profile_estimate
> u.c.045i.devirt
> u.c.046i.cp
> u.c.048i.inline
>
> So even though my proposed patch works in the supplied testcase
> (because the early inliner ran before .tmipa and inlined the nested
> transaction), if the later inliner added a nested transaction we're in
> the same boat.  I just saw this happen with LTO, as Richi pointed out.
>
> What do you suggest?

Though, let me clarify what I saw.  LTO would only do its cross-TU
inlining thing if the inlinee was marked as transaction_safe, because
anything else wouldn't be valid inside an atomic transaction:

reynosa:/dev/shm/trunk/gcc/u$ cat d.c
int a;

void inc()
{
  __transaction_atomic { a++; }
}

reynosa:/dev/shm/trunk/gcc/u$ cat e.c
extern void inc() __attribute__((transaction_safe));

main()
{
  __transaction_atomic { inc(); }
}

So perhaps we don't care about LTO inlining, since most of what it would
inline isn't valid within the transactional context??  So perhaps just
getting rid of transactions appearing in a clone we're about to create
in ipa-tm is all we need?

Is that what you were getting at, or am I running around in circles
here? :)

Aldy
diff --git a/gcc/testsuite/gcc.dg/tm/pr56572.c b/gcc/testsuite/gcc.dg/tm/pr56572.c
new file mode 100644
index 0000000..0b43569
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tm/pr56572.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -fdump-ipa-tmipa -O2" } */
+
+/* Test that nested unnecessary transactions are properly folded.  */
+
+int a;
+
+static inline void f() {
+  __transaction_atomic {
+    ++a;
+  }
+}
+
+void g() {
+  __transaction_atomic {
+    f();
+  }
+}
+
+/* { dg-final { scan-ipa-dump-times "__transaction_atomic" 1 "tmipa" } } */
+/* { dg-final { cleanup-ipa-dump "tmipa" } } */
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index ba34488..5e9e2ab 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -2956,6 +2956,120 @@ propagate_tm_flags_out (struct tm_region *region)
     propagate_tm_flags_out (region->next);
 }
 
+/* Callback for expand_regions.
+
+   Given a transaction in REGION, remove the GIMPLE_TRANSACTION and
+   its corresponding BUILT_IN_TM_COMMIT*.  The tm_region itself is
+   removed by the caller.  */
+
+static void *
+remove_transaction (struct tm_region *region, void *data ATTRIBUTE_UNUSED)
+{
+  // Remove the transaction.
+  if (dump_file)
+    fprintf (dump_file, "Folding unnecessary inner transaction in BB%d\n",
+	     gimple_bb (region->transaction_stmt)->index);
+  gimple_stmt_iterator gsi = gsi_for_stmt (region->transaction_stmt);
+  gsi_remove (&gsi, true);
+
+  // Find the corresponding BUILT_IN_TM_COMMIT* and remove it as well.
+  if (region->exit_blocks)
+    {
+      bitmap_iterator bi;
+      unsigned i;
+
+      EXECUTE_IF_SET_IN_BITMAP (region->exit_blocks, 0, i, bi)
+	{
+	  gimple_stmt_iterator gsi
+	    = gsi_last_bb (BASIC_BLOCK_FOR_FN (cfun, i));
+	  gimple stmt = gsi_stmt (gsi);
+	  if (gimple_code (stmt) != GIMPLE_CALL)
+	    continue;
+	  tree fndecl = gimple_call_fndecl (stmt);
+	  if (fndecl
+	      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+	      && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_TM_COMMIT
+		  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_TM_COMMIT_EH))
+	    {
+	      tree vdef = gimple_vdef (stmt);
+	      tree vuse = gimple_vuse (stmt);
+	      if (vdef && vuse)
+		replace_uses_by (vdef, vuse);
+	      gsi_remove (&gsi, true);
+	    }
+	}
+    }
+  return NULL;
+}
+
+/* A helper function for flatten_nested_transactions.
+
+   Go through all the transactional regions, starting at TOP, and
+   remove any nested transactions that are redundant, removing both
+   the start/end of a transaction as well as removing the region
+   from the tm_region list.
+
+   OUTER is TOP's outer transaction and is set to null if TOP is the
+   outermost transaction.
+
+   Returns FALSE if the region passed in TOP should be removed from
+   the list of regions by the caller.  */
+
+static bool
+flatten_nested_transactions_1 (struct tm_region *top, struct tm_region *outer)
+{
+  /* We could fold this traversal into ipa_tm_scan_calls_transaction,
+     and only traverse all_tm_regions once, but the nested
+     transactions we handle below would complicate things.  Besides,
+     there aren't liable to be many TM regions in a function to make
+     this extra iteration so unprofitable.
+
+     ?? We could even only call flatten_nested_transactions from
+     within ipa_tm_scan_calls_transaction only if we have any inner
+     transactions.  Hmmm, probably not worth it.
+  */
+  struct tm_region *prev = NULL;
+  for (struct tm_region *region = top; region; region = region->next)
+    {
+      if (region->inner
+	  && !flatten_nested_transactions_1 (region->inner, region))
+	region->inner = NULL;
+
+      /* An inner transaction which has no explicit aborts can be
+	 folded into its outer transaction.  */
+      unsigned int sub = gimple_transaction_subcode (region->transaction_stmt);
+      if (outer && !(sub & GTMA_HAVE_ABORT))
+	{
+	  expand_regions (region, remove_transaction, NULL, false);
+
+	  /* Remove region from the list if it's anything but the
+	     first item, otherwise make the caller remove the first
+	     item.  */
+	  if (prev == NULL)
+	    return false;
+	  else
+	    prev->next = region->next;
+	}
+      prev = region;
+    }
+  return true;
+}
+
+/* Go through all the transactional regions and remove any nested
+   transactions that are redundant.  This is done by removing the
+   GIMPLE_TRANSACTION + BUILT_IN_TM_COMMIT* pairs and by removing the
+   transaction region from the all_tm_regions list.
+
+   Note: We already did this in the tmlower pass (lower_transaction),
+   but inlining may have added more nested transactions that are now
+   candidates for removal.  */
+
+static void
+flatten_nested_transactions ()
+{
+  if (!flatten_nested_transactions_1 (all_tm_regions, NULL))
+    all_tm_regions = NULL;
+}
+
+
 /* Entry point to the MARK phase of TM expansion.  Here we replace
    transactional memory statements with calls to builtins, and function
    calls with their transactional clones (if available).  But we don't
@@ -5358,6 +5472,8 @@ ipa_tm_execute (void)
     {
       d = get_cg_data (&node, true);
 
+      flatten_nested_transactions ();
+
       /* Scan for calls that are in each transaction, and
	 generate the uninstrumented code path.
	 */
       ipa_tm_scan_calls_transaction (d, &tm_callees);
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 96a4805..6c6ef41 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3972,7 +3972,7 @@ init_inline_once (void)
   eni_size_weights.target_builtin_call_cost = 1;
   eni_size_weights.div_mod_cost = 1;
   eni_size_weights.omp_cost = 40;
-  eni_size_weights.tm_cost = 10;
+  eni_size_weights.tm_cost = 2;
   eni_size_weights.time_based = false;
   eni_size_weights.return_cost = 1;