diff mbox

PR56572 flatten unnecessary nested transactions after inlining

Message ID 52B31920.9080501@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez Dec. 19, 2013, 4:04 p.m. UTC
As discussed in the PR, we already remove nested transactions in the 
tmlower stage, but inlining may add more nested transactions later.

The problem with removing these nested transactions after proper IPA, is 
that we'd either have to add another IPA pass later or remove the 
problematic transactions much later in the .tmmark stage.  Either of 
these places will be after we have added the uninstrumented code path, 
so things look a little messy CFG-wise (not impossible to do, but messy).

What I've done in this patch is to fix the inliner cost model so that 
the early inliner sees and inlines the transaction early on.  This way, 
.tmipa can see the inlined nested transaction before it adds the 
uninstrumented code path, making everything simpler.  Then it's just a 
matter of removing the GIMPLE_TRANSACTION and associated commit.

I'm still unsure whether the IPA inliner (not the early inliner) will 
add other nested transactions, so we may have to do everything in 
.tmmark and handle the dual code paths :(.  Either way, this is a start.

Also, is there anything I should check here, or is checking for 
GTMA_HAVE_ABORT enough?

+      /* An inner transactions which has no explicit aborts can be
+        folded into its outer transaction.  */
+      unsigned int sub = gimple_transaction_subcode 
(region->transaction_stmt);
+      if (outer && !(sub & GTMA_HAVE_ABORT))

Tested on x86-64 Linux.

How does this look?
commit 460af022f8dcc5fb67410a8cd6a33d2be13819b9
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Wed Dec 18 09:53:54 2013 -0800

    	PR tree-optimization/56572
    	* tree-inline.c (init_inline_once): Adjust .tm_cost size.
    	* trans-mem.c (remove_transaction): New.
    	(flatten_nested_transactions_1): New.
    	(flatten_nested_transactions): New.
    	(ipa_tm_execute): Call flatten_nested_transactions.

Comments

Richard Biener Dec. 19, 2013, 4:26 p.m. UTC | #1
On Thu, Dec 19, 2013 at 5:04 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> As discussed in the PR, we already remove nested transactions in the tmlower
> stage, but inlining may add more nested transactions later.
>
> The problem with removing these nested transactions after proper IPA, is
> that we'd either have to add another IPA pass later or remove the
> problematic transactions much later in the .tmmark stage.  Either of these
> places will be after we have added the uninstrumented code path, so things
> look a little messy CFG-wise (not impossible to do, but messy).
>
> What I've done in this patch is to fix the inliner cost model so that the
> early inliner sees and inlines the transaction early on.  This way, .tmipa
> can see the inlined nested transaction before it adds the uninstrumented
> code path, making everything simpler.  Then it's just a matter of removing
> the GIMPLE_TRANSACTION and associated commit.
>
> I'm still unsure whether the IPA inliner (not the early inliner) will add
> other nested transactions, so we may have to do everything in .tmmark and
> handle the dual code paths :(.  Either way, this is a start.

Sure it will.  At least with cross-unit inlining with LTO.  You can of
course simply
disallow inlining a function that has transactions into a transaction
when transactions
are lowered already.

Richard.

> Also, is there anything I should check here, or is checking for
> GTMA_HAVE_ABORT enough?
>
> +      /* An inner transactions which has no explicit aborts can be
> +        folded into its outer transaction.  */
> +      unsigned int sub = gimple_transaction_subcode
> (region->transaction_stmt);
> +      if (outer && !(sub & GTMA_HAVE_ABORT))
>
> Tested on x86-64 Linux.
>
> How does this look?
Aldy Hernandez Dec. 19, 2013, 4:58 p.m. UTC | #2
>> I'm still unsure whether the IPA inliner (not the early inliner) will add
>> other nested transactions, so we may have to do everything in .tmmark and
>> handle the dual code paths :(.  Either way, this is a start.
>
> Sure it will.  At least with cross-unit inlining with LTO.  You can of
> course simply
> disallow inlining a function that has transactions into a transaction
> when transactions
> are lowered already.

I'd still like to catch the common cases, like I do with this patch.

Perhaps we move this code to the .tmmark pass and handle the 
uninstrumented case.  rth?
Richard Biener Dec. 19, 2013, 7:06 p.m. UTC | #3
Aldy Hernandez <aldyh@redhat.com> wrote:
>
>>> I'm still unsure whether the IPA inliner (not the early inliner)
>will add
>>> other nested transactions, so we may have to do everything in
>.tmmark and
>>> handle the dual code paths :(.  Either way, this is a start.
>>
>> Sure it will.  At least with cross-unit inlining with LTO.  You can
>of
>> course simply
>> disallow inlining a function that has transactions into a transaction
>> when transactions
>> are lowered already.
>
>I'd still like to catch the common cases, like I do with this patch.
>
>Perhaps we move this code to the .tmmark pass and handle the 
>uninstrumented case.  rth?

Btw, don't you want an IPA pass that clones functions called from within transactions and optimize those accordingly? Thus make the tm lowering a real IPA pass?

Richard.
Richard Henderson Jan. 6, 2014, 9:40 p.m. UTC | #4
On 12/19/2013 11:06 AM, Richard Biener wrote:
> Aldy Hernandez <aldyh@redhat.com> wrote:
>> I'd still like to catch the common cases, like I do with this patch.
>>
>> Perhaps we move this code to the .tmmark pass and handle the 
>> uninstrumented case.  rth?

tmmark is way way later than you'd want.  I believe that ipa_tm is the right
place.  That's where we generate clones.  The clones know a-priori that they're
called within a transaction and thus all internal transations may be
eliminated.  And thus any inlining that would happen after ipa_tm would
properly inline the clone, and thus no more need be done.

> Btw, don't you want an IPA pass that clones functions called from within
> transactions and optimize those accordingly?

That's what we have, modulo the "optimize those accordingly".

> Thus make the tm lowering a real IPA pass?

There are plenty of tm-related passes, one of which is "lower_tm".  Like most
of the other "lower" passes, the high-level gimple that it processes can't be
used as input to build_cfg.


r~
Aldy Hernandez Jan. 6, 2014, 11:04 p.m. UTC | #5
On 01/06/14 13:40, Richard Henderson wrote:
> On 12/19/2013 11:06 AM, Richard Biener wrote:
>> Aldy Hernandez <aldyh@redhat.com> wrote:
>>> I'd still like to catch the common cases, like I do with this patch.
>>>
>>> Perhaps we move this code to the .tmmark pass and handle the
>>> uninstrumented case.  rth?
>
> tmmark is way way later than you'd want.  I believe that ipa_tm is the right
> place.  That's where we generate clones.  The clones know a-priori that they're
> called within a transaction and thus all internal transations may be
> eliminated.  And thus any inlining that would happen after ipa_tm would
> properly inline the clone, and thus no more need be done.

But ipa_tm still runs before the latter inliner:

...
u.c.041i.tmipa
u.c.043i.whole-program
u.c.044i.profile_estimate
u.c.045i.devirt
u.c.046i.cp
u.c.048i.inline

So even though my proposed patch works in the supplied testcase (because 
the early inliner ran before .tmipa and inlined the nested transaction), 
if the latter inliner added a nested transaction we're in the same boat. 
  I just saw this happen with LTO, as Richi pointed out.

What do you suggest?
Aldy Hernandez Jan. 6, 2014, 11:14 p.m. UTC | #6
On 01/06/14 15:04, Aldy Hernandez wrote:
> On 01/06/14 13:40, Richard Henderson wrote:
>> On 12/19/2013 11:06 AM, Richard Biener wrote:
>>> Aldy Hernandez <aldyh@redhat.com> wrote:
>>>> I'd still like to catch the common cases, like I do with this patch.
>>>>
>>>> Perhaps we move this code to the .tmmark pass and handle the
>>>> uninstrumented case.  rth?
>>
>> tmmark is way way later than you'd want.  I believe that ipa_tm is the
>> right
>> place.  That's where we generate clones.  The clones know a-priori
>> that they're
>> called within a transaction and thus all internal transations may be
>> eliminated.  And thus any inlining that would happen after ipa_tm would
>> properly inline the clone, and thus no more need be done.
>
> But ipa_tm still runs before the latter inliner:
>
> ...
> u.c.041i.tmipa
> u.c.043i.whole-program
> u.c.044i.profile_estimate
> u.c.045i.devirt
> u.c.046i.cp
> u.c.048i.inline
>
> So even though my proposed patch works in the supplied testcase (because
> the early inliner ran before .tmipa and inlined the nested transaction),
> if the latter inliner added a nested transaction we're in the same boat.
>   I just saw this happen with LTO, as Richi pointed out.
>
> What do you suggest?
>

Though, let me clarify what I saw.  LTO would only do it's cross TU 
inlining thing if the inlinee was marked as transaction_safe, because 
anything else wouldn't be valid inside an atomic transaction:

reynosa:/dev/shm/trunk/gcc/u$ cat d.c
int a;
void inc(){
     __transaction_atomic { a++; }
}
reynosa:/dev/shm/trunk/gcc/u$ cat e.c
extern void inc() __attribute__((transaction_safe));
main()
{
   __transaction_atomic {
       inc();
   }
}

So perhaps we don't care about LTO inlining, since most of what it would 
inline isn't valid within the transactional context ??.  So perhaps just 
getting rid of transactions appearing in a clone we're about to create 
in ipa-tm is all we need?  Is that what you were getting at, or am I 
running around in circles here? :)

Aldy
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.dg/tm/pr56572.c b/gcc/testsuite/gcc.dg/tm/pr56572.c
new file mode 100644
index 0000000..0b43569
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tm/pr56572.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -fdump-ipa-tmipa -O2" } */
+
+/* Test that nested unnecessary transactions are properly folded.  */
+
+int a;
+
+static inline void f() {
+  __transaction_atomic {
+    ++a;
+  }
+}
+
+void g() {
+  __transaction_atomic {
+    f();
+  }
+}
+
+/* { dg-final { scan-ipa-dump-times "__transaction_atomic" 1 "tmipa" } } */
+/* { dg-final { cleanup-ipa-dump "tmipa" } } */
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index ba34488..5e9e2ab 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -2956,6 +2956,120 @@  propagate_tm_flags_out (struct tm_region *region)
   propagate_tm_flags_out (region->next);
 }
 
+/* Callback for expand_regions.
+
+   Given a transaction in REGION, remove the GIMPLE_TRANSACTION and
+   its corresponding BUILT_IN_TM_COMMIT*.  The tm_region itself is
+   removed by the caller.  */
+
+static void *
+remove_transaction (struct tm_region *region, void *data ATTRIBUTE_UNUSED)
+{
+  // Remove the transaction.
+  if (dump_file)
+    fprintf (dump_file, "Folding unnecessary inner transaction in BB%d\n",
+	     gimple_bb (region->transaction_stmt)->index);
+  gimple_stmt_iterator gsi = gsi_for_stmt (region->transaction_stmt);
+  gsi_remove (&gsi, true);
+
+  // Find the corresponding BUILT_IN_TM_COMMIT* and remove it as well.
+  if (region->exit_blocks)
+    {
+      bitmap_iterator bi;
+      unsigned i;
+
+      EXECUTE_IF_SET_IN_BITMAP (region->exit_blocks, 0, i, bi)
+	{
+	  gimple_stmt_iterator gsi
+	    = gsi_last_bb (BASIC_BLOCK_FOR_FN (cfun, i));
+	  gimple stmt = gsi_stmt (gsi);
+	  if (gimple_code (stmt) != GIMPLE_CALL)
+	    continue;
+	  tree fndecl = gimple_call_fndecl (stmt);
+	  if (fndecl
+	      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+	      && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_TM_COMMIT
+		  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_TM_COMMIT_EH))
+	    {
+	      tree vdef = gimple_vdef (stmt);
+	      tree vuse = gimple_vuse (stmt);
+	      if (vdef && vuse)
+		replace_uses_by (vdef, vuse);
+	      gsi_remove (&gsi, true);
+	    }
+	}
+    }
+  return NULL;
+}
+
+/* A helper function for flatten_nested_transactions.
+
+   Go through all the transactional regions, starting at TOP, and
+   remove any nested transactions that are redundant, removing both
+   the start/end of a transaction as well as removing the region
+   from the tm_region list.
+
+   OUTER is TOP's outer transaction and is set to null if TOP is the
+   outermost transaction.
+
+   Returns FALSE if the region passed in TOP should be removed from
+   the list of regions by the caller.  */
+static bool
+flatten_nested_transactions_1 (struct tm_region *top, struct tm_region *outer)
+{
+  /* We could fold this traversal into ipa_tm_scan_calls_transaction,
+     and only traverse all_tm_regions once, but the nested
+     transactions we handle below would complicate things.  Besides,
+     there aren't liable to be many TM regions in a function to make
+     this extra iteration so unprofitable.
+
+     ?? We could even only call flatten_nested_transactions from
+     within ipa_tm_scan_calls_transaction only if we have any inner
+     transactions.  Hmmm, probably not worth it.  */
+  struct tm_region *prev = NULL;
+  for (struct tm_region *region = top; region; region = region->next)
+    {
+      if (region->inner
+	  && !flatten_nested_transactions_1 (region->inner, region))
+	region->inner = NULL;
+
+      /* An inner transactions which has no explicit aborts can be
+	 folded into its outer transaction.  */
+      unsigned int sub = gimple_transaction_subcode (region->transaction_stmt);
+      if (outer && !(sub & GTMA_HAVE_ABORT))
+	{
+	  expand_regions (region, remove_transaction, NULL, false);
+
+	  /* Remove region from the list if its anything but the first
+	     item, otherwise make the caller remove the first
+	     item.  */
+	  if (prev == NULL)
+	    return false;
+	  else
+	    prev->next = region->next;
+	}
+      prev = region;
+    }
+  return true;
+}
+
+/* Go through all the transactional regions and remove any nested
+   transactions that are redundant.  This is done by removing the
+   GIMPLE_TRANSACTION + BUILT_IN_TM_COMMIT* pairs and by removing the
+   transaction region from the all_tm_regions list.
+
+   Note: We already did this in the tmlower pass (lower_transaction),
+   but inlining may have added more nested transactions that are now
+   candidates for removal.  */
+
+static void
+flatten_nested_transactions ()
+{
+  if (!flatten_nested_transactions_1 (all_tm_regions, NULL))
+    all_tm_regions = NULL;
+}
+
+
 /* Entry point to the MARK phase of TM expansion.  Here we replace
    transactional memory statements with calls to builtins, and function
    calls with their transactional clones (if available).  But we don't
@@ -5358,6 +5472,8 @@  ipa_tm_execute (void)
 	  {
 	    d = get_cg_data (&node, true);
 
+	    flatten_nested_transactions ();
+
 	    /* Scan for calls that are in each transaction, and
 	       generate the uninstrumented code path.  */
 	    ipa_tm_scan_calls_transaction (d, &tm_callees);
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 96a4805..6c6ef41 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3972,7 +3972,7 @@  init_inline_once (void)
   eni_size_weights.target_builtin_call_cost = 1;
   eni_size_weights.div_mod_cost = 1;
   eni_size_weights.omp_cost = 40;
-  eni_size_weights.tm_cost = 10;
+  eni_size_weights.tm_cost = 2;
   eni_size_weights.time_based = false;
   eni_size_weights.return_cost = 1;