Patchwork [PR51752] publication safety violations in loop invariant motion pass

login
register
mail settings
Submitter Aldy Hernandez
Date Feb. 28, 2012, 5:44 p.m.
Message ID <4F4D1277.9080206@redhat.com>
Download mbox | patch
Permalink /patch/143515/
State New
Headers show

Comments

Aldy Hernandez - Feb. 28, 2012, 5:44 p.m.
On 02/28/12 11:05, Richard Henderson wrote:
> On 02/27/12 08:22, Aldy Hernandez wrote:
>>> transform by making transaction load/store stmts behave the same as
>>> potentially trapping stmts (thus, only optimize if the memory is accessed
>>> unconditional somewhere else).  That would work for PRE as well.
>>> [easiest would be to make *_could_trap_p return true for memory ops
>>> inside a transaction]
>>
>> Provided the gimple bit works, this seems reasonable, though quite a big hammer.  But given that we are nearing a release, I would be in favor of it.
>>
>> Richard Henderson, what do you think?
>
> Well, hooking could_trap_p sounds like an easy solution.
>
> Gimple bits, on the other hand, are not.  Keeping those up-to-date is
> always a real pain.  We have had several gimple bits in the history of
> the TM code, and we've gotten rid of them all because they were too
> invasive to maintain.
>
> OTOH, I have no better suggestion...

Well, my solution wasn't pretty either.

However, Mr. Guenther's solution involves recalculating the bits when 
needed, so at least they won't be out of date.

Richards, what do y'all think of this approach?  It is a big hammer as 
discussed, but keeps most of us happy for 4.7.

Tested on x86-64 Linux.

Aldy
PR middle-end/51752
	* gimple.h (gimple_in_transaction): New.
	(gimple_set_in_transaction): New.
	(struct gimple_statement_base): Add in_transaction field.
	* tree-ssa-loop-im.c: (movement_possibility): Restrict movement of
	transaction loads.
	(tree_ssa_lim_initialize): Compute transaction bits.
	* tree.h (compute_transaction_bits): Protoize.
	* trans-mem.c (tm_region_init): Use the heap to store BB
	auxilliary data.
	(compute_transaction_bits): New.
Richard Henderson - Feb. 28, 2012, 7:12 p.m.
On 02/28/12 09:44, Aldy Hernandez wrote:
> 	PR middle-end/51752
> 	* gimple.h (gimple_in_transaction): New.
> 	(gimple_set_in_transaction): New.
> 	(struct gimple_statement_base): Add in_transaction field.
> 	* tree-ssa-loop-im.c: (movement_possibility): Restrict movement of
> 	transaction loads.
> 	(tree_ssa_lim_initialize): Compute transaction bits.
> 	* tree.h (compute_transaction_bits): Protoize.
> 	* trans-mem.c (tm_region_init): Use the heap to store BB
> 	auxilliary data.
> 	(compute_transaction_bits): New.

Looks good.  Thanks for your patience.


r~
Aldy Hernandez - Feb. 28, 2012, 8:11 p.m.
On 02/28/12 13:12, Richard Henderson wrote:
> On 02/28/12 09:44, Aldy Hernandez wrote:
>> 	PR middle-end/51752
>> 	* gimple.h (gimple_in_transaction): New.
>> 	(gimple_set_in_transaction): New.
>> 	(struct gimple_statement_base): Add in_transaction field.
>> 	* tree-ssa-loop-im.c: (movement_possibility): Restrict movement of
>> 	transaction loads.
>> 	(tree_ssa_lim_initialize): Compute transaction bits.
>> 	* tree.h (compute_transaction_bits): Protoize.
>> 	* trans-mem.c (tm_region_init): Use the heap to store BB
>> 	auxilliary data.
>> 	(compute_transaction_bits): New.
>
> Looks good.  Thanks for your patience.
>
>
> r~

Thank you.  I have committed the patch.

I will also look into the tree_could_trap business (and PRE and other 
passes) to see if we can divine some context.  But I probably won't get 
to it before early next week.

Thanks.
Richard Guenther - Feb. 29, 2012, 9:22 a.m.
On Tue, Feb 28, 2012 at 9:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 02/28/12 13:12, Richard Henderson wrote:
>>
>> On 02/28/12 09:44, Aldy Hernandez wrote:
>>>
>>>        PR middle-end/51752
>>>        * gimple.h (gimple_in_transaction): New.
>>>        (gimple_set_in_transaction): New.
>>>        (struct gimple_statement_base): Add in_transaction field.
>>>        * tree-ssa-loop-im.c: (movement_possibility): Restrict movement of
>>>        transaction loads.
>>>        (tree_ssa_lim_initialize): Compute transaction bits.
>>>        * tree.h (compute_transaction_bits): Protoize.
>>>        * trans-mem.c (tm_region_init): Use the heap to store BB
>>>        auxilliary data.
>>>        (compute_transaction_bits): New.
>>
>>
>> Looks good.  Thanks for your patience.
>>
>>
>> r~
>
>
> Thank you.  I have committed the patch.
>
> I will also look into the tree_could_trap business (and PRE and other
> passes) to see if we can divine some context.  But I probably won't get to
> it before early next week.

The tree_could_trap business is definitely harder because you lack
a stmt context - this helper takes a 'tree' argument.  And it's not enough
to adjust gimple_could_trap as both are used regularly...

So fixing up individual passes is easier - I can only think of PRE being
problematic right now, I am not aware that any other pass moves loads
or stores.  So I'd simply pre-compute the stmt bit in PRE and adjust
the

          if (gimple_has_volatile_ops (stmt)
              || stmt_could_throw_p (stmt))
            continue;

in compute_avail accordingly.

Richard.

> Thanks.
Aldy Hernandez - March 6, 2012, 5:55 p.m.
On 02/29/12 03:22, Richard Guenther wrote:

> So fixing up individual passes is easier - I can only think of PRE being
> problematic right now, I am not aware that any other pass moves loads
> or stores.  So I'd simply pre-compute the stmt bit in PRE and adjust
> the
>
>            if (gimple_has_volatile_ops (stmt)
>                || stmt_could_throw_p (stmt))
>              continue;
>
> in compute_avail accordingly.

Initially I thought PRE would be problematic for transactions, but 
perhaps it isn't.  As I understand, for PRE we hoist loads/computations 
that are mostly redundant, but will be performed on every path:

	if (flag)
		a = b + c;
	else
		stuff;
	d = b + c;		<-- [b + c] always computed

Even if we hoist [b + c] before the flag, [b + c] will be computed on 
every path out of "if (flag)...".  So... we can allow this 
transformation within transactions, right?

Torvald?
Richard Guenther - March 6, 2012, 8:18 p.m.
On Tue, Mar 6, 2012 at 6:55 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 02/29/12 03:22, Richard Guenther wrote:
>
>> So fixing up individual passes is easier - I can only think of PRE being
>> problematic right now, I am not aware that any other pass moves loads
>> or stores.  So I'd simply pre-compute the stmt bit in PRE and adjust
>> the
>>
>>           if (gimple_has_volatile_ops (stmt)
>>               || stmt_could_throw_p (stmt))
>>             continue;
>>
>> in compute_avail accordingly.
>
>
> Initially I thought PRE would be problematic for transactions, but perhaps
> it isn't.  As I understand, for PRE we hoist loads/computations that are
> mostly redundant, but will be performed on every path:
>
>        if (flag)
>                a = b + c;
>        else
>                stuff;
>        d = b + c;              <-- [b + c] always computed
>
> Even if we hoist [b + c] before the flag, [b + c] will be computed on every
> path out of "if (flag)...".  So... we can allow this transformation within
> transactions, right?

Note that partial PRE (enabled at -O3) can insert expressions into paths
that did _not_ execute the expression.  For regular PRE you are right.

Richard.

> Torvald?
Torvald Riegel - March 6, 2012, 8:56 p.m.
On Tue, 2012-03-06 at 21:18 +0100, Richard Guenther wrote:
> On Tue, Mar 6, 2012 at 6:55 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> > On 02/29/12 03:22, Richard Guenther wrote:
> >
> >> So fixing up individual passes is easier - I can only think of PRE being
> >> problematic right now, I am not aware that any other pass moves loads
> >> or stores.  So I'd simply pre-compute the stmt bit in PRE and adjust
> >> the
> >>
> >>           if (gimple_has_volatile_ops (stmt)
> >>               || stmt_could_throw_p (stmt))
> >>             continue;
> >>
> >> in compute_avail accordingly.
> >
> >
> > Initially I thought PRE would be problematic for transactions, but perhaps
> > it isn't.  As I understand, for PRE we hoist loads/computations that are
> > mostly redundant, but will be performed on every path:
> >
> >        if (flag)
> >                a = b + c;
> >        else
> >                stuff;
> >        d = b + c;              <-- [b + c] always computed
> >
> > Even if we hoist [b + c] before the flag, [b + c] will be computed on every
> > path out of "if (flag)...".  So... we can allow this transformation within
> > transactions, right?

In this particular example, I agree.  We can move [b + c] into the else
branch, and then move it to before flag because it will happen on all
paths to the exit anyway.

> Note that partial PRE (enabled at -O3) can insert expressions into paths
> that did _not_ execute the expression.  For regular PRE you are right.

I suppose if only loads will be moved around by PRE, then this could be
fine, as long as those expressions do not have visible side effects or
can crash if reading garbage.  For examples, dereferencing pointers
could lead to accessing unmapped memory and thus segfaults, speculative
stores are not allowed (even if you undo them later on), etc.

Also, if PRE inserts expressions into paths that did not execute the
transactions, can it happen that then something like loop invariant
motion comes around and optimizes based on that and moves the code to
before "if (flag)..."?  If so, PRE would break publication safety
indirectly by pretending that the expression happened on every path to
the exit, tricking subsequent passes to believe things that were not in
place in the source code.  Is this a realistic scenario?
Richard Guenther - March 7, 2012, 9:18 a.m.
On Tue, Mar 6, 2012 at 9:56 PM, Torvald Riegel <triegel@redhat.com> wrote:
> On Tue, 2012-03-06 at 21:18 +0100, Richard Guenther wrote:
>> On Tue, Mar 6, 2012 at 6:55 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> > On 02/29/12 03:22, Richard Guenther wrote:
>> >
>> >> So fixing up individual passes is easier - I can only think of PRE being
>> >> problematic right now, I am not aware that any other pass moves loads
>> >> or stores.  So I'd simply pre-compute the stmt bit in PRE and adjust
>> >> the
>> >>
>> >>           if (gimple_has_volatile_ops (stmt)
>> >>               || stmt_could_throw_p (stmt))
>> >>             continue;
>> >>
>> >> in compute_avail accordingly.
>> >
>> >
>> > Initially I thought PRE would be problematic for transactions, but perhaps
>> > it isn't.  As I understand, for PRE we hoist loads/computations that are
>> > mostly redundant, but will be performed on every path:
>> >
>> >        if (flag)
>> >                a = b + c;
>> >        else
>> >                stuff;
>> >        d = b + c;              <-- [b + c] always computed
>> >
>> > Even if we hoist [b + c] before the flag, [b + c] will be computed on every
>> > path out of "if (flag)...".  So... we can allow this transformation within
>> > transactions, right?
>
> In this particular example, I agree.  We can move [b + c] into the else
> branch, and then move it to before flag because it will happen on all
> paths to the exit anyway.
>
>> Note that partial PRE (enabled at -O3) can insert expressions into paths
>> that did _not_ execute the expression.  For regular PRE you are right.
>
> I suppose if only loads will be moved around by PRE, then this could be
> fine, as long as those expressions do not have visible side effects or
> can crash if reading garbage.  For examples, dereferencing pointers
> could lead to accessing unmapped memory and thus segfaults, speculative
> stores are not allowed (even if you undo them later on), etc.
>
> Also, if PRE inserts expressions into paths that did not execute the
> transactions, can it happen that then something like loop invariant
> motion comes around and optimizes based on that and moves the code to
> before "if (flag)..."?  If so, PRE would break publication safety
> indirectly by pretending that the expression happened on every path to
> the exit, tricking subsequent passes to believe things that were not in
> place in the source code.  Is this a realistic scenario?

I think so.

Richard.
Aldy Hernandez - March 7, 2012, 3:06 p.m.
On 03/07/12 03:18, Richard Guenther wrote:
> On Tue, Mar 6, 2012 at 9:56 PM, Torvald Riegel<triegel@redhat.com>  wrote:
>> On Tue, 2012-03-06 at 21:18 +0100, Richard Guenther wrote:
>>> On Tue, Mar 6, 2012 at 6:55 PM, Aldy Hernandez<aldyh@redhat.com>  wrote:
>>>> On 02/29/12 03:22, Richard Guenther wrote:
>>>>
>>>>> So fixing up individual passes is easier - I can only think of PRE being
>>>>> problematic right now, I am not aware that any other pass moves loads
>>>>> or stores.  So I'd simply pre-compute the stmt bit in PRE and adjust
>>>>> the
>>>>>
>>>>>            if (gimple_has_volatile_ops (stmt)
>>>>>                || stmt_could_throw_p (stmt))
>>>>>              continue;
>>>>>
>>>>> in compute_avail accordingly.
>>>>
>>>>
>>>> Initially I thought PRE would be problematic for transactions, but perhaps
>>>> it isn't.  As I understand, for PRE we hoist loads/computations that are
>>>> mostly redundant, but will be performed on every path:
>>>>
>>>>         if (flag)
>>>>                 a = b + c;
>>>>         else
>>>>                 stuff;
>>>>         d = b + c;<-- [b + c] always computed
>>>>
>>>> Even if we hoist [b + c] before the flag, [b + c] will be computed on every
>>>> path out of "if (flag)...".  So... we can allow this transformation within
>>>> transactions, right?
>>
>> In this particular example, I agree.  We can move [b + c] into the else
>> branch, and then move it to before flag because it will happen on all
>> paths to the exit anyway.
>>
>>> Note that partial PRE (enabled at -O3) can insert expressions into paths
>>> that did _not_ execute the expression.  For regular PRE you are right.
>>
>> I suppose if only loads will be moved around by PRE, then this could be
>> fine, as long as those expressions do not have visible side effects or
>> can crash if reading garbage.  For examples, dereferencing pointers
>> could lead to accessing unmapped memory and thus segfaults, speculative
>> stores are not allowed (even if you undo them later on), etc.
>>
>> Also, if PRE inserts expressions into paths that did not execute the
>> transactions, can it happen that then something like loop invariant
>> motion comes around and optimizes based on that and moves the code to
>> before "if (flag)..."?  If so, PRE would break publication safety
>> indirectly by pretending that the expression happened on every path to
>> the exit, tricking subsequent passes to believe things that were not in
>> place in the source code.  Is this a realistic scenario?
>
> I think so.

Wow, yeah.  I hadn't thought about that.

Fortunately in the current code base this won't happen because loop 
invariant motion will refuse to move _any_ loads that happen inside a 
transaction, and because of the memory barrier at the beginning of 
transactions, no code will be moved out of a transaction.  However, when 
we optimize things (4.8?) and allow loop invariant motion inside 
transactions when the load happens on every path to exit, then yes... we 
need to keep this problematic scenario in mind.

For now I believe we're safe, modulo the partial PRE scenario that 
Richard G pointed out.

Aldy

Patch

Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 184271)
+++ tree-ssa-loop-im.c	(working copy)
@@ -150,7 +150,7 @@  typedef struct mem_ref
 
   bitmap indep_ref;		/* The set of memory references on that
 				   this reference is independent.  */
-  bitmap dep_ref;		/* The complement of DEP_REF.  */
+  bitmap dep_ref;		/* The complement of INDEP_REF.  */
 } *mem_ref_p;
 
 DEF_VEC_P(mem_ref_p);
@@ -412,6 +412,26 @@  movement_possibility (gimple stmt)
       || gimple_could_trap_p (stmt))
     return MOVE_PRESERVE_EXECUTION;
 
+  /* Non local loads in a transaction cannot be hoisted out.  Well,
+     unless the load happens on every path out of the loop, but we
+     don't take this into account yet.  */
+  if (flag_tm
+      && gimple_in_transaction (stmt)
+      && gimple_assign_single_p (stmt))
+    {
+      tree rhs = gimple_assign_rhs1 (stmt);
+      if (DECL_P (rhs) && is_global_var (rhs))
+	{
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Cannot hoist conditional load of ");
+	      print_generic_expr (dump_file, rhs, TDF_SLIM);
+	      fprintf (dump_file, " because it is in a transaction.\n");
+	    }
+	  return MOVE_IMPOSSIBLE;
+	}
+    }
+
   return ret;
 }
 
@@ -2387,6 +2407,9 @@  tree_ssa_lim_initialize (void)
   sbitmap_free (contains_call);
 
   lim_aux_data_map = pointer_map_create ();
+
+  if (flag_tm)
+    compute_transaction_bits ();
 }
 
 /* Cleans up after the invariant motion pass.  */
Index: testsuite/gcc.dg/tm/pub-safety-1.c
===================================================================
--- testsuite/gcc.dg/tm/pub-safety-1.c	(revision 0)
+++ testsuite/gcc.dg/tm/pub-safety-1.c	(revision 0)
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim1" } */
+
+/* Test that thread visible loads do not get hoisted out of loops if
+   the load would not have occurred on each path out of the loop.  */
+
+int x[10] = {0,0,0,0,0,0,0,0,0,0};
+int DATA_DATA = 0;
+
+void reader()
+{
+  int i;
+  __transaction_atomic
+    { 
+      for (i = 0; i < 10; i++)
+        if (x[i])
+          x[i] += DATA_DATA;
+      /* If we loaded DATA_DATA here, we could hoist it before the loop,
+	 but since we don't... we can't.  */
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: trans-mem.c
===================================================================
--- trans-mem.c	(revision 184272)
+++ trans-mem.c	(working copy)
@@ -1858,18 +1858,25 @@  tm_region_init (struct tm_region *region
   VEC(basic_block, heap) *queue = NULL;
   bitmap visited_blocks = BITMAP_ALLOC (NULL);
   struct tm_region *old_region;
+  struct tm_region **region_worklist;
 
   all_tm_regions = region;
   bb = single_succ (ENTRY_BLOCK_PTR);
 
+  /* We could store this information in bb->aux, but we may get called
+     through get_all_tm_blocks() from another pass that may be already
+     using bb->aux.  */
+  region_worklist =
+    (struct tm_region **) xcalloc (sizeof (struct tm_region *),
+				  n_basic_blocks + NUM_FIXED_BLOCKS + 2);
+
   VEC_safe_push (basic_block, heap, queue, bb);
-  gcc_assert (!bb->aux);	/* FIXME: Remove me.  */
-  bb->aux = region;
+  region_worklist[bb->index] = region;
   do
     {
       bb = VEC_pop (basic_block, queue);
-      region = (struct tm_region *)bb->aux;
-      bb->aux = NULL;
+      region = region_worklist[bb->index];
+      region_worklist[bb->index] = NULL;
 
       /* Record exit and irrevocable blocks.  */
       region = tm_region_init_1 (region, bb);
@@ -1886,20 +1893,20 @@  tm_region_init (struct tm_region *region
 	  {
 	    bitmap_set_bit (visited_blocks, e->dest->index);
 	    VEC_safe_push (basic_block, heap, queue, e->dest);
-	    gcc_assert (!e->dest->aux); /* FIXME: Remove me.  */
 
 	    /* If the current block started a new region, make sure that only
 	       the entry block of the new region is associated with this region.
 	       Other successors are still part of the old region.  */
 	    if (old_region != region && e->dest != region->entry_block)
-	      e->dest->aux = old_region;
+	      region_worklist[e->dest->index] = old_region;
 	    else
-	      e->dest->aux = region;
+	      region_worklist[e->dest->index] = region;
 	  }
     }
   while (!VEC_empty (basic_block, queue));
   VEC_free (basic_block, heap, queue);
   BITMAP_FREE (visited_blocks);
+  free (region_worklist);
 }
 
 /* The "gate" function for all transactional memory expansion and optimization
@@ -2424,6 +2431,42 @@  get_tm_region_blocks (basic_block entry_
   return bbs;
 }
 
+/* Set the IN_TRANSACTION for all gimple statements that appear in a
+   transaction.  */
+
+void
+compute_transaction_bits (void)
+{
+  struct tm_region *region;
+  VEC (basic_block, heap) *queue;
+  unsigned int i;
+  gimple_stmt_iterator gsi;
+  basic_block bb;
+
+  /* ?? Perhaps we need to abstract gate_tm_init further, because we
+     certainly don't need it to calculate CDI_DOMINATOR info.  */
+  gate_tm_init ();
+
+  for (region = all_tm_regions; region; region = region->next)
+    {
+      queue = get_tm_region_blocks (region->entry_block,
+				    region->exit_blocks,
+				    region->irr_blocks,
+				    NULL,
+				    /*stop_at_irr_p=*/true);
+      for (i = 0; VEC_iterate (basic_block, queue, i, bb); ++i)
+	for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	  {
+	    gimple stmt = gsi_stmt (gsi);
+	    gimple_set_in_transaction (stmt, true);
+	  }
+      VEC_free (basic_block, heap, queue);
+    }
+
+  if (all_tm_regions)
+    bitmap_obstack_release (&tm_obstack);
+}
+
 /* Entry point to the MARK phase of TM expansion.  Here we replace
    transactional memory statements with calls to builtins, and function
    calls with their transactional clones (if available).  But we don't
Index: gimple.h
===================================================================
--- gimple.h	(revision 184271)
+++ gimple.h	(working copy)
@@ -305,8 +305,10 @@  struct GTY(()) gimple_statement_base {
   /* Nonzero if this statement contains volatile operands.  */
   unsigned has_volatile_ops 	: 1;
 
-  /* Padding to get subcode to 16 bit alignment.  */
-  unsigned pad			: 1;
+  /* Nonzero if this statement appears inside a transaction.  This bit
+     is calculated on de-mand and has relevant information only after
+     it has been calculated with compute_transaction_bits.  */
+  unsigned in_transaction	: 1;
 
   /* The SUBCODE field can be used for tuple-specific flags for tuples
      that do not require subcodes.  Note that SUBCODE should be at
@@ -1122,6 +1124,7 @@  extern tree omp_reduction_init (tree, tr
 
 /* In trans-mem.c.  */
 extern void diagnose_tm_safe_errors (tree);
+extern void compute_transaction_bits (void);
 
 /* In tree-nested.c.  */
 extern void lower_nested_functions (tree);
@@ -1586,6 +1589,21 @@  gimple_set_has_volatile_ops (gimple stmt
     stmt->gsbase.has_volatile_ops = (unsigned) volatilep;
 }
 
+/* Return true if STMT is in a transaction.  */
+
+static inline bool
+gimple_in_transaction (gimple stmt)
+{
+  return stmt->gsbase.in_transaction;
+}
+
+/* Set the IN_TRANSACTION flag to TRANSACTIONP.  */
+
+static inline void
+gimple_set_in_transaction (gimple stmt, bool transactionp)
+{
+  stmt->gsbase.in_transaction = (unsigned) transactionp;
+}
 
 /* Return true if statement STMT may access memory.  */