Message ID: 4F4D1277.9080206@redhat.com
State: New
On 02/28/12 09:44, Aldy Hernandez wrote:
> 	PR middle-end/51752
> 	* gimple.h (gimple_in_transaction): New.
> 	(gimple_set_in_transaction): New.
> 	(struct gimple_statement_base): Add in_transaction field.
> 	* tree-ssa-loop-im.c (movement_possibility): Restrict movement of
> 	transaction loads.
> 	(tree_ssa_lim_initialize): Compute transaction bits.
> 	* tree.h (compute_transaction_bits): Protoize.
> 	* trans-mem.c (tm_region_init): Use the heap to store BB
> 	auxiliary data.
> 	(compute_transaction_bits): New.

Looks good.  Thanks for your patience.


r~
On 02/28/12 13:12, Richard Henderson wrote:
> On 02/28/12 09:44, Aldy Hernandez wrote:
>> 	PR middle-end/51752
>> 	[ChangeLog snipped]
>
> Looks good.  Thanks for your patience.
>
> r~

Thank you.  I have committed the patch.

I will also look into the tree_could_trap business (and PRE and other
passes) to see if we can divine some context.  But I probably won't get
to it before early next week.

Thanks.
On Tue, Feb 28, 2012 at 9:11 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 02/28/12 13:12, Richard Henderson wrote:
>> [ChangeLog and approval snipped]
>
> Thank you.  I have committed the patch.
>
> I will also look into the tree_could_trap business (and PRE and other
> passes) to see if we can divine some context.  But I probably won't get
> to it before early next week.

The tree_could_trap business is definitely harder because you lack a
stmt context - this helper takes a 'tree' argument.  And it's not enough
to adjust gimple_could_trap as both are used regularly...

So fixing up individual passes is easier - I can only think of PRE being
problematic right now, I am not aware that any other pass moves loads
or stores.  So I'd simply pre-compute the stmt bit in PRE and adjust the

  if (gimple_has_volatile_ops (stmt)
      || stmt_could_throw_p (stmt))
    continue;

in compute_avail accordingly.

Richard.
On 02/29/12 03:22, Richard Guenther wrote:
> So fixing up individual passes is easier - I can only think of PRE being
> problematic right now, I am not aware that any other pass moves loads
> or stores.  So I'd simply pre-compute the stmt bit in PRE and adjust the
>
>   if (gimple_has_volatile_ops (stmt)
>       || stmt_could_throw_p (stmt))
>     continue;
>
> in compute_avail accordingly.

Initially I thought PRE would be problematic for transactions, but
perhaps it isn't.  As I understand it, PRE hoists loads/computations
that are mostly redundant but will be performed on every path:

  if (flag)
    a = b + c;
  else
    stuff;
  d = b + c;   <-- [b + c] always computed

Even if we hoist [b + c] before the flag, [b + c] will be computed on
every path out of "if (flag)...".  So... we can allow this
transformation within transactions, right?

Torvald?
On Tue, Mar 6, 2012 at 6:55 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> [example snipped]
>
> Even if we hoist [b + c] before the flag, [b + c] will be computed on
> every path out of "if (flag)...".  So... we can allow this
> transformation within transactions, right?

Note that partial PRE (enabled at -O3) can insert expressions into paths
that did _not_ execute the expression.  For regular PRE you are right.

Richard.
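The regular vs. partial PRE distinction can be sketched in plain C. All function names below are invented for illustration; this shows what the transformation looks like conceptually, not GCC's actual output:

```c
/* Invented names; a conceptual sketch of partial PRE.  In
   partial_pre_before, [b + c] is only *partially* redundant: it is
   computed on the flag path but not on the else path, yet recomputed
   afterwards on both.  Partial PRE (-O3) makes it fully redundant by
   inserting the computation into the path that never executed it,
   which is exactly the kind of insertion that matters inside a
   transaction.  */

int partial_pre_before (int flag, int b, int c, int *stuff)
{
  int a = 0;
  if (flag)
    a = b + c;          /* [b + c] computed only on this path */
  else
    *stuff = 1;         /* the "stuff" path: no [b + c] here */
  return a + (b + c);   /* partially redundant recomputation */
}

int partial_pre_after (int flag, int b, int c, int *stuff)
{
  int a = 0, t;
  if (flag)
    t = a = b + c;
  else
    {
      *stuff = 1;
      t = b + c;        /* inserted by partial PRE on a path that
                           never computed it in the source */
    }
  return a + t;         /* now fully redundant: just reuse t */
}
```

Single-threaded the two versions are equivalent; the inserted load on the else path is precisely what breaks down under publication safety.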
On Tue, 2012-03-06 at 21:18 +0100, Richard Guenther wrote:
> On Tue, Mar 6, 2012 at 6:55 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> Initially I thought PRE would be problematic for transactions, but
>> perhaps it isn't.  [example snipped]  So... we can allow this
>> transformation within transactions, right?

In this particular example, I agree.  We can move [b + c] into the else
branch, and then move it to before the flag, because it will happen on
all paths to the exit anyway.

> Note that partial PRE (enabled at -O3) can insert expressions into paths
> that did _not_ execute the expression.  For regular PRE you are right.

I suppose if only loads are moved around by PRE, this could be fine, as
long as those expressions have no visible side effects and cannot crash
when reading garbage.  For example, dereferencing a pointer could access
unmapped memory and thus segfault; speculative stores are not allowed
(even if you undo them later on); etc.

Also, if PRE inserts expressions into paths that did not execute the
transaction, can it happen that something like loop invariant motion
then comes around, optimizes based on that, and moves the code to before
"if (flag)..."?  If so, PRE would break publication safety indirectly by
pretending that the expression happened on every path to the exit,
tricking subsequent passes into believing things that were not in place
in the source code.  Is this a realistic scenario?
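The segfault concern above can be made concrete with a small single-threaded C sketch (all names invented; the real hazard is concurrent, so this only illustrates the structure a speculative load would violate):

```c
#include <stddef.h>

/* Publication-safety sketch: payload is NULL until the publisher
   initializes it and then sets the flag.  A correct reader
   dereferences the pointer only on the path where the flag was seen
   set.  If an optimizer speculatively hoisted the load of *payload
   above the flag test, the unpublished path would dereference NULL
   and segfault; that is the kind of inserted load partial PRE could
   create inside a transaction.  */

static int *payload;      /* NULL until published */
static int published;     /* the "flag" guarding the payload */

void publish (int *p)
{
  payload = p;            /* initialize the data first ...  */
  published = 1;          /* ... then publish the flag       */
}

int reader (void)
{
  if (published)
    return *payload;      /* load happens only on this path */
  return -1;              /* hoisting *payload up here would crash
                             while payload is still NULL    */
}
```

In the transactional-memory setting the flag write and load would additionally be ordered by the transaction machinery; the sketch only shows why the load must stay on its guarded path.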
On Tue, Mar 6, 2012 at 9:56 PM, Torvald Riegel <triegel@redhat.com> wrote:
> [earlier discussion snipped]
>
> Also, if PRE inserts expressions into paths that did not execute the
> transaction, can it happen that something like loop invariant motion
> then comes around, optimizes based on that, and moves the code to
> before "if (flag)..."?  If so, PRE would break publication safety
> indirectly by pretending that the expression happened on every path to
> the exit, tricking subsequent passes into believing things that were
> not in place in the source code.  Is this a realistic scenario?

I think so.

Richard.
On 03/07/12 03:18, Richard Guenther wrote:
> On Tue, Mar 6, 2012 at 9:56 PM, Torvald Riegel <triegel@redhat.com> wrote:
>> Also, if PRE inserts expressions into paths that did not execute the
>> transaction, can it happen that something like loop invariant motion
>> then comes around [...]?  Is this a realistic scenario?
>
> I think so.

Wow, yeah.  I hadn't thought about that.

Fortunately, in the current code base this won't happen, because loop
invariant motion will refuse to move _any_ loads that happen inside a
transaction, and because of the memory barrier at the beginning of a
transaction, no code will be moved out of it.

However, when we optimize things (4.8?) and allow loop invariant motion
inside transactions when the load happens on every path to the exit,
then yes... we need to keep this problematic scenario in mind.

For now I believe we're safe, modulo the partial PRE scenario that
Richard G pointed out.

Aldy
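The restriction loop invariant motion now enforces can be illustrated with a plain-C analogue of the pub-safety-1.c testcase in the patch (invented function names; single-threaded, so both versions agree here, and the difference only matters when another thread publishes x[] and DATA_DATA concurrently):

```c
/* Single-threaded analogue (invented names) of the hazard the lim
   change guards against.  In reader_as_written, the global DATA_DATA
   is loaded only on iterations where x[i] is nonzero.
   reader_hoisted is what loop invariant motion would produce: an
   unconditional load executed even when no x[i] is set, i.e. on
   executions that never loaded DATA_DATA in the source.  Both agree
   single-threaded; under publication safety only the first is a
   valid compilation of a transaction.  */

int x[10];
int DATA_DATA;

int reader_as_written (void)
{
  int i, sum = 0;
  for (i = 0; i < 10; i++)
    if (x[i])
      sum += DATA_DATA;        /* conditional load */
  return sum;
}

int reader_hoisted (void)      /* what LIM must NOT do in a transaction */
{
  int i, sum = 0;
  int tmp = DATA_DATA;         /* load hoisted: now unconditional */
  for (i = 0; i < 10; i++)
    if (x[i])
      sum += tmp;
  return sum;
}
```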
Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 184271)
+++ tree-ssa-loop-im.c	(working copy)
@@ -150,7 +150,7 @@ typedef struct mem_ref
   bitmap indep_ref;		/* The set of memory references on that this
				   reference is independent.  */
-  bitmap dep_ref;		/* The complement of DEP_REF.  */
+  bitmap dep_ref;		/* The complement of INDEP_REF.  */
 } *mem_ref_p;
 
 DEF_VEC_P(mem_ref_p);
@@ -412,6 +412,26 @@ movement_possibility (gimple stmt)
       || gimple_could_trap_p (stmt))
     return MOVE_PRESERVE_EXECUTION;
 
+  /* Non local loads in a transaction cannot be hoisted out.  Well,
+     unless the load happens on every path out of the loop, but we
+     don't take this into account yet.  */
+  if (flag_tm
+      && gimple_in_transaction (stmt)
+      && gimple_assign_single_p (stmt))
+    {
+      tree rhs = gimple_assign_rhs1 (stmt);
+      if (DECL_P (rhs) && is_global_var (rhs))
+	{
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Cannot hoist conditional load of ");
+	      print_generic_expr (dump_file, rhs, TDF_SLIM);
+	      fprintf (dump_file, " because it is in a transaction.\n");
+	    }
+	  return MOVE_IMPOSSIBLE;
+	}
+    }
+
   return ret;
 }
@@ -2387,6 +2407,9 @@ tree_ssa_lim_initialize (void)
   sbitmap_free (contains_call);
 
   lim_aux_data_map = pointer_map_create ();
+
+  if (flag_tm)
+    compute_transaction_bits ();
 }
 
 /* Cleans up after the invariant motion pass.  */
Index: testsuite/gcc.dg/tm/pub-safety-1.c
===================================================================
--- testsuite/gcc.dg/tm/pub-safety-1.c	(revision 0)
+++ testsuite/gcc.dg/tm/pub-safety-1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim1" } */
+
+/* Test that thread visible loads do not get hoisted out of loops if
+   the load would not have occurred on each path out of the loop.  */
+
+int x[10] = {0,0,0,0,0,0,0,0,0,0};
+int DATA_DATA = 0;
+
+void reader()
+{
+  int i;
+  __transaction_atomic
+  {
+    for (i = 0; i < 10; i++)
+      if (x[i])
+	x[i] += DATA_DATA;
+    /* If we loaded DATA_DATA here, we could hoist it before the loop,
+       but since we don't... we can't.  */
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: trans-mem.c
===================================================================
--- trans-mem.c	(revision 184272)
+++ trans-mem.c	(working copy)
@@ -1858,18 +1858,25 @@ tm_region_init (struct tm_region *region
   VEC(basic_block, heap) *queue = NULL;
   bitmap visited_blocks = BITMAP_ALLOC (NULL);
   struct tm_region *old_region;
+  struct tm_region **region_worklist;
 
   all_tm_regions = region;
   bb = single_succ (ENTRY_BLOCK_PTR);
 
+  /* We could store this information in bb->aux, but we may get called
+     through get_all_tm_blocks() from another pass that may be already
+     using bb->aux.  */
+  region_worklist =
+    (struct tm_region **) xcalloc (sizeof (struct tm_region *),
+				   n_basic_blocks + NUM_FIXED_BLOCKS + 2);
+
   VEC_safe_push (basic_block, heap, queue, bb);
-  gcc_assert (!bb->aux);	/* FIXME: Remove me.  */
-  bb->aux = region;
+  region_worklist[bb->index] = region;
   do
     {
       bb = VEC_pop (basic_block, queue);
-      region = (struct tm_region *)bb->aux;
-      bb->aux = NULL;
+      region = region_worklist[bb->index];
+      region_worklist[bb->index] = NULL;
 
       /* Record exit and irrevocable blocks.  */
       region = tm_region_init_1 (region, bb);
@@ -1886,20 +1893,20 @@ tm_region_init (struct tm_region *region
	{
	  bitmap_set_bit (visited_blocks, e->dest->index);
	  VEC_safe_push (basic_block, heap, queue, e->dest);
-	  gcc_assert (!e->dest->aux); /* FIXME: Remove me.  */
 
	  /* If the current block started a new region, make sure that only
	     the entry block of the new region is associated with this region.
	     Other successors are still part of the old region.  */
	  if (old_region != region && e->dest != region->entry_block)
-	    e->dest->aux = old_region;
+	    region_worklist[e->dest->index] = old_region;
	  else
-	    e->dest->aux = region;
+	    region_worklist[e->dest->index] = region;
	}
     }
   while (!VEC_empty (basic_block, queue));
   VEC_free (basic_block, heap, queue);
   BITMAP_FREE (visited_blocks);
+  free (region_worklist);
 }
 
 /* The "gate" function for all transactional memory expansion and optimization
@@ -2424,6 +2431,42 @@ get_tm_region_blocks (basic_block entry_
   return bbs;
 }
 
+/* Set the IN_TRANSACTION bit for all gimple statements that appear in
+   a transaction.  */
+
+void
+compute_transaction_bits (void)
+{
+  struct tm_region *region;
+  VEC (basic_block, heap) *queue;
+  unsigned int i;
+  gimple_stmt_iterator gsi;
+  basic_block bb;
+
+  /* ?? Perhaps we need to abstract gate_tm_init further, because we
+     certainly don't need it to calculate CDI_DOMINATOR info.  */
+  gate_tm_init ();
+
+  for (region = all_tm_regions; region; region = region->next)
+    {
+      queue = get_tm_region_blocks (region->entry_block,
+				    region->exit_blocks,
+				    region->irr_blocks,
+				    NULL,
+				    /*stop_at_irr_p=*/true);
+      for (i = 0; VEC_iterate (basic_block, queue, i, bb); ++i)
+	for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	  {
+	    gimple stmt = gsi_stmt (gsi);
+	    gimple_set_in_transaction (stmt, true);
+	  }
+      VEC_free (basic_block, heap, queue);
+    }
+
+  if (all_tm_regions)
+    bitmap_obstack_release (&tm_obstack);
+}
+
 /* Entry point to the MARK phase of TM expansion.  Here we replace
    transactional memory statements with calls to builtins, and function
    calls with their transactional clones (if available).  But we don't
Index: gimple.h
===================================================================
--- gimple.h	(revision 184271)
+++ gimple.h	(working copy)
@@ -305,8 +305,10 @@ struct GTY(()) gimple_statement_base {
   /* Nonzero if this statement contains volatile operands.  */
   unsigned has_volatile_ops	: 1;
 
-  /* Padding to get subcode to 16 bit alignment.  */
-  unsigned pad			: 1;
+  /* Nonzero if this statement appears inside a transaction.  This bit
+     is calculated on demand and has relevant information only after
+     it has been calculated with compute_transaction_bits.  */
+  unsigned in_transaction	: 1;
 
   /* The SUBCODE field can be used for tuple-specific flags for tuples
      that do not require subcodes.  Note that SUBCODE should be at
@@ -1122,6 +1124,7 @@ extern tree omp_reduction_init (tree, tr
 
 /* In trans-mem.c.  */
 extern void diagnose_tm_safe_errors (tree);
+extern void compute_transaction_bits (void);
 
 /* In tree-nested.c.  */
 extern void lower_nested_functions (tree);
@@ -1586,6 +1589,21 @@ gimple_set_has_volatile_ops (gimple stmt
   stmt->gsbase.has_volatile_ops = (unsigned) volatilep;
 }
 
+/* Return true if STMT is in a transaction.  */
+
+static inline bool
+gimple_in_transaction (gimple stmt)
+{
+  return stmt->gsbase.in_transaction;
+}
+
+/* Set the IN_TRANSACTION flag to TRANSACTIONP.  */
+
+static inline void
+gimple_set_in_transaction (gimple stmt, bool transactionp)
+{
+  stmt->gsbase.in_transaction = (unsigned) transactionp;
+}
+
 /* Return true if statement STMT may access memory.  */
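For readers unfamiliar with the gimple flag plumbing, here is a minimal self-contained mock (not GCC's real types) of the pattern the gimple.h hunk uses: the former padding bit becomes in_transaction, with inline accessors analogous to gimple_in_transaction and gimple_set_in_transaction:

```c
#include <stdbool.h>

/* Mock of the statement-base bitfield layout; GCC's real struct
   gimple_statement_base has many more fields and flags.  */
struct stmt_base_mock
{
  unsigned has_volatile_ops : 1;
  unsigned in_transaction : 1;   /* the repurposed padding bit */
  unsigned subcode : 16;
};

/* Analogue of gimple_in_transaction.  */
static inline bool
mock_in_transaction (const struct stmt_base_mock *s)
{
  return s->in_transaction;
}

/* Analogue of gimple_set_in_transaction.  */
static inline void
mock_set_in_transaction (struct stmt_base_mock *s, bool transactionp)
{
  s->in_transaction = (unsigned) transactionp;
}
```

Because the bit lives in previously unused padding, the statement size is unchanged; the flag is meaningful only after compute_transaction_bits has run, exactly as the comment in the patch notes.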