
[PR tree-optimization/52558]: RFC: questions on store data race

Message ID 4FA85518.4070106@redhat.com
State New

Commit Message

Aldy Hernandez May 7, 2012, 11:04 p.m. UTC
Hi.  Sorry for the delay.  There were various tricky hiccups along the 
way to bootstrappability and regression cleanliness...

On 04/26/12 04:51, Richard Guenther wrote:
> On Wed, 25 Apr 2012, Aldy Hernandez wrote:
>
>> On 04/25/12 06:45, Richard Guenther wrote:
>>> On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<aldyh@redhat.com>   wrote:
>>>> On 04/13/12 03:46, Richard Guenther wrote:
>>>>>
>>>>> On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>
>>>>> wrote:

>> Speak of loads, I am keeping the information as an additional bitmap in
>> `memory_accesses', as ->refs_in_loop was set for stores as well, so I couldn't
>> depend on it.  Let me know if you have another idea.
>
> Hmm, refs_in_loop&  ~all_refs_stored_in_loop, so instead of
>
> +      bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
> +                               loop->num);
> +      ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
>
> ref_read_in_loop_p = bitmap_bit_p (refs, ref->id)&&  !bitmap_bit_p
> (stores, ref->id);
>
> ?  But maybe that doesn't work if a ref is both read and stored to.
> Btw, rather than adding a bitmap to memory_accesses I'd rather add
> a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
> both into one) and a bitmap to struct mem_ref.

I have done so as mark_ref_loaded().  I first tried to merge both into a 
generic mark_ref(), to avoid iterating twice, but I was concerned with 
the early loop exit of !bitmap_bit_p(ref->stored, loop->num), not 
knowing if I should exit the loop altogether in the merged case or what. 
  In either case, the code was somewhat convoluted, so I opted for a 
separate clean function.

In scanning the gimple stream for the loads I noticed that instances of 
two component refs (say foo.bar) were not pointer comparable, so I am 
asking the alias oracle with refs_may_alias_p.  It also seemed to make 
more sense wrt memory chunks that may alias.  What do you think?
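
To make that concrete, here is a minimal sketch (hypothetical code, not 
from the patch or its testcases): each textual access to the same field 
carries its own COMPONENT_REF tree in the gimple stream, so a 
pointer-equality check between ref->mem and the statement's RHS can miss 
a load of the very location being promoted:

	/* Both statements access s.bar, but the IL holds two distinct
	   COMPONENT_REF trees for "s.bar", so comparing them with pointer
	   equality fails; a structural or alias-based comparison is needed
	   to recognize the load.  */
	struct S { int bar; } s;

	int
	f (int n)
	{
	  int i, sum = 0;
	  for (i = 0; i < n; i++)
	    {
	      sum += s.bar;	/* load: one COMPONENT_REF for s.bar */
	      s.bar = sum;	/* store: a distinct COMPONENT_REF for s.bar */
	    }
	  return sum;
	}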

This patch is far more robust and fully tested.  Two wrenches to 
complicate things though...

First, while inserting the code that writes the _LSM_ temporaries back 
into the original variables, I found a handful of tests where the order 
of the writes mattered (aliased locations, inlined code, etc., IIRC; a 
small sketch of why is below).  [This is in loops where multiple stores 
are moved.]  So iterating and inserting on the exit edges caused the 
stores to be written back in reverse order.  I am now saving the edges 
and chaining the generated code, so the write-backs happen in the 
correct order:

		if (foo_flag_lsm)
			foo = foo_lsm;
		if (foo2_flag_lsm)
			foo2 = foo2_lsm;
	actual_exit:

This required some edge redirection so we didn't jump to the original 
actual exit prematurely when handling multiple moved stores.  The 
dominator tree needed to be updated, and there is one instance where I 
couldn't find a clever way of saving the order without jumping through 
an inordinate number of hoops.  It seemed easier to call 
recompute_dominator.
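
To illustrate why the order matters, here is a small hypothetical sketch 
(whether the pass would actually promote both of these particular stores 
is beside the point; it only shows that overlapping locations make the 
write-back order observable):

	void
	f (int *p, int *q, int n)
	{
	  int i;
	  for (i = 0; i < n; i++)
	    {
	      *p = 1;	/* would become p_lsm */
	      *q = 2;	/* would become q_lsm; may refer to the same int as *p */
	    }
	  /* The flag-guarded write-backs on the exit edge must keep the
	     source order:

		if (p_lsm_flag) *p = p_lsm;
		if (q_lsm_flag) *q = q_lsm;

	     Emitting them in reverse leaves *p == 1 instead of 2 when
	     p == q.  */
	}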

Second, avoiding the load unless it happens in the loop, as you 
suggested, brought about some bogus uninitialized-use warnings for the 
LSM temporaries.  What happens is that compute_ssa() creates uses of the 
LSM temporaries that are not yet defined, so we end up with PHI nodes 
whose arguments are default definitions (D).  Since we are sure the 
eventual uses will be gated by their corresponding flag, I created PHI 
nodes with an initial value of 0 in the preheader of the loops in which 
they are used.  This solved the problem.  I also played with 
initializing to 0 in the entry block, but that seemed a bit too invasive.
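
As a source-level sketch of the result (loosely modeled on the 
pr52558-2.c testcase below; the _lsm names are illustrative, not literal 
compiler output), avoiding the initial load and seeding the temporary 
with 0 looks roughly like this:

	int g_1 = 1;
	int g_2 = 0;

	int
	func_1 (void)
	{
	  int l;
	  int g_2_lsm = 0;	  /* 0 seeded via the PHI; the initial load of
				     g_2 is omitted since g_2 is never read
				     inside the loop */
	  _Bool g_2_lsm_flag = 0; /* set only on iterations that store g_2 */

	  for (l = 0; l < 1234; l++)
	    {
	      if (g_1)
		break;
	      g_2_lsm = 0;	  /* the store, redirected to the temporary */
	      g_2_lsm_flag = 1;
	    }

	  if (g_2_lsm_flag)	  /* flag-guarded write-back on the exit */
	    g_2 = g_2_lsm;

	  return l < 1234 ? l : 999;
	}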

Andrew suggested the correct fix was to add a new pass that was able to 
do some ?? flow sensitive data flow analysis ?? that could discover 
these unreachable paths and insert the 0 phis at the start of the blocks 
automatically.  But that seemed like far too much work, considering how 
long it's taken me to get this far ;-).

Bootstrapped and fully tested with multi_threaded_model_p=true 
hard-coded.  This is the revision I'd like to contribute, sans the 
hardcoded value.

What do you think?

	* cfg.c (alloc_aux_for_edge): Fix comment.
	(alloc_aux_for_edge): Remove static.
	* gimple.h (gimple_set_in_transaction): Remove.
	(gimple_in_transaction): Look in BB instead.
	(gimple_statement_base): Remove in_transaction field.
	* basic-block.h (enum bb_flags): Add BB_IN_TRANSACTION.
	(alloc_aux_for_edge): Protoize.
	* trans-mem.c (compute_transaction_bits): Place transaction bit
	information into basic blocks.
	* Makefile.in (tree-ssa-loop-im.o): Depend on TARGET_H.
	* doc/tm.texi: Regenerate.
	* tree-ssa-loop-im.c (execute_sm_if_changed): New.
	(execute_sm_if_changed_flag): New.
	(execute_sm_if_changed_flag_set): New.
	(execute_sm): Do not generate data races unless requested.
	(mark_ref): Rename from mark_ref_stored.  Set loaded bit.
	(mem_ref_alloc): Initialize loaded field.
	(memref_free): Free loaded field.
	(tree_ssa_lim_initialize): Call alloc_aux_for_edges.
	(tree_ssa_lim_finalize): Call free_aux_for_edges.
	* targhooks.c (default_can_compare_bitwise_p): New.
	* targhooks.h (default_can_compare_bitwise_p): Protoize.
	* target.def (can_compare_bitwise_p): New hook.

Comments

Andrew MacLeod May 8, 2012, 12:11 a.m. UTC | #1
On 05/07/2012 07:04 PM, Aldy Hernandez wrote:
>
>
> Andrew suggested the correct fix was to add a new pass that was able 
> to do some ?? flow sensitive data flow analysis ?? that could discover 
> these unreachable paths and insert the 0 phis at the start of the 
> blocks automatically.  But that seemed like far too much work, 
> considering how long it's taken me to get this far ;-).
>

Wait, no.  I didn't suggest writing an entire generic pass was the 
correct fix for you... I said that this was a technique that a generic 
pass which identified this sort of flow sensitive data flow info could 
use to work within the CFG: simply zero out uses of the load in PHI 
nodes on paths it has determined are not actually reachable by that 
value.  You were supposed to integrate just the bits of that technique 
that you required... just replace any uses of your LSM temporary with 
0.  I never intended you should write an entire pass...

Andrew
Aldy Hernandez May 8, 2012, 7:50 p.m. UTC | #2
On 05/07/12 19:11, Andrew MacLeod wrote:
> On 05/07/2012 07:04 PM, Aldy Hernandez wrote:
>>
>>
>> Andrew suggested the correct fix was to add a new pass that was able
>> to do some ?? flow sensitive data flow analysis ?? that could discover
>> these unreachable paths and insert the 0 phis at the start of the
>> blocks automatically. But that seemed like far too much work,
>> considering how long it's taken me to get this far ;-).
>>
>
> Wait, no. I didn't suggest writing an entire generic pass was the
> correct fix for you... I said that this was a technique that a generic
> pass which identified this sort of flow sensitive data flow info could
> use to work within the CFG. Simply zeroing out uses of the load in PHI
> nodes on paths it has determined are not actually reachable by that
> value, and you were suppose to integrate just the bits of that technique
> that you required.. just replace any uses of your LSM temporary with 0.
> I never intended you should write an entire pass...

Ah, well in that case, I've already done that.  Well, I don't do it on 
every such path, just in the loop header, and things propagate down from 
there quite nicely :).

Aldy
Aldy Hernandez May 15, 2012, 2:27 p.m. UTC | #3
PING.

> Hi. Sorry for the delay. There were various tricky hiccups along the way
> to bootstrappability and regression cleanliness...
>
> On 04/26/12 04:51, Richard Guenther wrote:
>> On Wed, 25 Apr 2012, Aldy Hernandez wrote:
>>
>>> On 04/25/12 06:45, Richard Guenther wrote:
>>>> On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<aldyh@redhat.com>
>>>> wrote:
>>>>> On 04/13/12 03:46, Richard Guenther wrote:
>>>>>>
>>>>>> On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>
>>>>>> wrote:
>
>>> Speak of loads, I am keeping the information as an additional bitmap in
>>> `memory_accesses', as ->refs_in_loop was set for stores as well, so I
>>> couldn't
>>> depend on it. Let me know if you have another idea.
>>
>> Hmm, refs_in_loop& ~all_refs_stored_in_loop, so instead of
>>
>> + bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
>> + loop->num);
>> + ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
>>
>> ref_read_in_loop_p = bitmap_bit_p (refs, ref->id)&& !bitmap_bit_p
>> (stores, ref->id);
>>
>> ? But maybe that doesn't work if a ref is both read and stored to.
>> Btw, rather than adding a bitmap to memory_accesses I'd rather add
>> a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
>> both into one) and a bitmap to struct mem_ref.
>
> I have done so as mark_ref_loaded(). I first tried to merge both into a
> generic mark_ref(), to avoid iterating twice, but I was concerned with
> the early loop exit of !bitmap_bit_p(ref->stored, loop->num), not
> knowing if I should exit the loop altogether in the merged case or what.
> In either case, the code was somewhat convoluted, so I opted for a
> separate clean function.
>
> In scanning the gimple stream for the loads I noticed that instances of
> two component refs (say foo.bar) were not pointer comparable, so I am
> asking the alias oracle with refs_may_alias_p. It also seemed to make
> more sense wrt memory chunks that may alias. What do you think?
>
> This patch is far more robust and fully tested. Two wrenches to
> complicate things though...
>
> First, while inserting the code writing the _LSM_ temporary writes into
> the original variables, I found a handful of tests where the order of
> the writes mattered (aliased locations, inlined code, etc, IIRC). [This
> is in loops where multiple stores are moved.] So iterating and inserting
> on the exit edges caused the stores to be saved in reverse. I am now
> saving the edges and threading the code, so they happen in the correct
> order:
>
> if (foo_flag_lsm)
> foo = foo_lsm;
> if (foo2_flag_lsm)
> foo2 = foo2_lsm;
> actual_exit:
>
> This required some edge redirection so we didn't jump to the original
> actual exit prematurely when handling multiple moved stores. The
> dominator tree needed to be updated, and there is one instance where I
> couldn't find a clever way of saving the order without jumping through
> an inordinate amount of hoops. It seemed easier to call
> recompute_dominator.
>
> Second, avoiding the load, unless it happens in the loop like you have
> suggested, brought about some bogus unused warnings with the LSM
> temporaries. What happens is that compute_ssa() creates uses of the LSM
> temporaries that are not yet defined, so we end up with PHI nodes with a
> definition entry (D). Since we are sure the eventual uses will be gated
> by their corresponding flag, I created phi nodes with an initial value
> of 0 in the preheader of the loops in which they are used. This solved
> the problem. I also played with initializing to 0 at the entry block,
> but that seemed a bit too invasive.
>
> Andrew suggested the correct fix was to add a new pass that was able to
> do some ?? flow sensitive data flow analysis ?? that could discover
> these unreachable paths and insert the 0 phis at the start of the blocks
> automatically. But that seemed like far too much work, considering how
> long it's taken me to get this far ;-).
>
> Bootstraped and fully tested with multi_threaded_model_p=true
> hard-coded. This is the revision I'd like to contribute, sans the
> hardcoded value.
>
> What do you think?
Richard Biener May 16, 2012, 12:53 p.m. UTC | #4
On Mon, 7 May 2012, Aldy Hernandez wrote:

> Hi.  Sorry for the delay.  There were various tricky hiccups along the way to
> bootstrappability and regression cleanliness...
> 
> On 04/26/12 04:51, Richard Guenther wrote:
> > On Wed, 25 Apr 2012, Aldy Hernandez wrote:
> >
> > > On 04/25/12 06:45, Richard Guenther wrote:
> > > > On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<aldyh@redhat.com>
> > > > wrote:
> > > > > On 04/13/12 03:46, Richard Guenther wrote:
> > > > > >
> > > > > > On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>
> > > > > > wrote:
> 
> > > Speak of loads, I am keeping the information as an additional bitmap in
> > > `memory_accesses', as ->refs_in_loop was set for stores as well, so I
> > > couldn't
> > > depend on it.  Let me know if you have another idea.
> >
> > Hmm, refs_in_loop&  ~all_refs_stored_in_loop, so instead of
> >
> > +      bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
> > +                               loop->num);
> > +      ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
> >
> > ref_read_in_loop_p = bitmap_bit_p (refs, ref->id)&&  !bitmap_bit_p
> > (stores, ref->id);
> >
> > ?  But maybe that doesn't work if a ref is both read and stored to.
> > Btw, rather than adding a bitmap to memory_accesses I'd rather add
> > a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
> > both into one) and a bitmap to struct mem_ref.
> 
> I have done so as mark_ref_loaded().  I first tried to merge both into a
> generic mark_ref(), to avoid iterating twice, but I was concerned with the
> early loop exit of !bitmap_bit_p(ref->stored, loop->num), not knowing if I
> should exit the loop altogether in the merged case or what.  In either case,
> the code was somewhat convoluted, so I opted for a separate clean function.
> 
> In scanning the gimple stream for the loads I noticed that instances of two
> component refs (say foo.bar) were not pointer comparable, so I am asking the
> alias oracle with refs_may_alias_p.  It also seemed to make more sense wrt
> memory chunks that may alias.  What do you think?

No, it asks for equal must aliases (it will replace them after all!).
You should use operand_equal_p to compare reference trees.

But why do all the following dance

+
+  if (gimple_assign_single_p (stmt))
+    {
+      tree t = gimple_assign_rhs1 (stmt);
+      /* ?? For some reason two exact COMPONENT_REFs cannot be
+        compared with pointer equality, so ask the alias oracle.  */
+      if (ref->mem == t
+         || ((TREE_CODE (t) == SSA_NAME
+              || DECL_P (t)
+              || handled_component_p (t)
+              || TREE_CODE (t) == MEM_REF
+              || TREE_CODE (t) == TARGET_MEM_REF)
+             && refs_may_alias_p (t, ref->mem)))
+       mark_ref_loaded (ref, loop);
+    }

at all and not just

   if (is_stored)
     mark_ref_stored (ref, loop);
   else
     mark_ref_loaded (ref, loop);

and similarly, in the !mem case, mark the ref loaded unconditionally:

   mark_ref_loaded (ref, loop);
   if (gimple_vdef (stmt))
     mark_ref_stored (ref, loop);

> This patch is far more robust and fully tested.  Two wrenches to complicate
> things though...
> 
> First, while inserting the code writing the _LSM_ temporary writes into the
> original variables, I found a handful of tests where the order of the writes
> mattered (aliased locations, inlined code, etc, IIRC).  [This is in loops
> where multiple stores are moved.]  So iterating and inserting on the exit
> edges caused the stores to be saved in reverse.  I am now saving the edges and
> threading the code, so they happen in the correct order:
> 
> 		if (foo_flag_lsm)
> 			foo = foo_lsm;
> 		if (foo2_flag_lsm)
> 			foo2 = foo2_lsm;
> 	actual_exit:
> 
> This required some edge redirection so we didn't jump to the original actual
> exit prematurely when handling multiple moved stores.  The dominator tree
> needed to be updated, and there is one instance where I couldn't find a clever
> way of saving the order without jumping through an inordinate amount of hoops.
> It seemed easier to call recompute_dominator.
> 
> Second, avoiding the load, unless it happens in the loop like you have
> suggested, brought about some bogus unused warnings with the LSM temporaries.
> What happens is that compute_ssa() creates uses of the LSM temporaries that
> are not yet defined, so we end up with PHI nodes with a definition entry (D).
> Since we are sure the eventual uses will be gated by their corresponding flag,
> I created phi nodes with an initial value of 0 in the preheader of the loops
> in which they are used.  This solved the problem.  I also played with
> initializing to 0 at the entry block, but that seemed a bit too invasive.

Hm.  Btw, see also PR39612 which you could solve with that?

> Andrew suggested the correct fix was to add a new pass that was able to do
> some ?? flow sensitive data flow analysis ?? that could discover these
> unreachable paths and insert the 0 phis at the start of the blocks
> automatically.  But that seemed like far too much work, considering how long
> it's taken me to get this far ;-).
> 
> Bootstraped and fully tested with multi_threaded_model_p=true hard-coded.
> This is the revision I'd like to contribute, sans the hardcoded value.
> 
> What do you think?

I notice

+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);

inserting code into the latch block is bad - the vectorizer will
be confused by non-empty latches.

You might at first (as you suggested yourself ;)) avoid trying to
tackle the load issue.  The patch is already complex enough, and
a candidate for splitting out is the BB_IN_TRANSACTION change
(hereby pre-approved).

Your ChangeLog mentions changes that are no longer necessary
(the target hook).

I didn't quite get the store order issue - we _do_ preserve store 
ordering right now, don't we?  So how come introducing the flags 
changes that?

Thanks,
Richard.

Patch

Index: testsuite/gcc.dg/pr52558-2.c
===================================================================
--- testsuite/gcc.dg/pr52558-2.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-2.c	(revision 0)
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that g_2 is not written to unless !g_1.  */
+
+int g_1 = 1;
+int g_2 = 0;
+
+int func_1(void)
+{
+ int l;
+ for (l = 0; l < 1234; l++)
+ {
+   if (g_1)
+     return l;
+   else
+     g_2 = 0;
+ }
+ return 999;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM.*g_2_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/pr52558-1.c
===================================================================
--- testsuite/gcc.dg/pr52558-1.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-1.c	(revision 0)
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data > 0.  */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  for (p = q; p; p = p->next)
+    if (p->data > 0)
+      count++;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/tm/reg-promotion.c
===================================================================
--- testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
+++ testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data>0.  */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  __transaction_atomic {
+    for (p = q; p; p = p->next)
+      if (p->data > 0)
+	count++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 187051)
+++ tree-ssa-loop-im.c	(working copy)
@@ -52,7 +52,7 @@  along with GCC; see the file COPYING3.
 	 }
      }
 
-   Where COND and INV are is invariants, but evaluating INV may trap or be
+   Where COND and INV are invariants, but evaluating INV may trap or be
    invalid from some other reason if !COND.  This may be transformed to
 
    if (cond)
@@ -132,6 +132,10 @@  typedef struct mem_ref
   hashval_t hash;		/* Its hash value.  */
   bitmap stored;		/* The set of loops in that this memory location
 				   is stored to.  */
+  bitmap loaded;		/* The set of loops in that this
+				   memory location may be loaded.
+				   This includes aliases of the memory
+				   location.  */
   VEC (mem_ref_locs_p, heap) *accesses_in_loop;
 				/* The locations of the accesses.  Vector
 				   indexed by the loop number.  */
@@ -1487,6 +1491,7 @@  memref_free (void *obj)
   mem_ref_locs_p accs;
 
   BITMAP_FREE (mem->stored);
+  BITMAP_FREE (mem->loaded);
   BITMAP_FREE (mem->indep_loop);
   BITMAP_FREE (mem->dep_loop);
   BITMAP_FREE (mem->indep_ref);
@@ -1510,6 +1515,7 @@  mem_ref_alloc (tree mem, unsigned hash,
   ref->id = id;
   ref->hash = hash;
   ref->stored = BITMAP_ALLOC (NULL);
+  ref->loaded = BITMAP_ALLOC (NULL);
   ref->indep_loop = BITMAP_ALLOC (NULL);
   ref->dep_loop = BITMAP_ALLOC (NULL);
   ref->indep_ref = BITMAP_ALLOC (NULL);
@@ -1569,6 +1575,18 @@  mark_ref_stored (mem_ref_p ref, struct l
     bitmap_set_bit (ref->stored, loop->num);
 }
 
+/* Marks reference REF as loaded in LOOP.  */
+
+static void
+mark_ref_loaded (mem_ref_p ref, struct loop *loop)
+{
+  for (;
+       loop != current_loops->tree_root
+       && !bitmap_bit_p (ref->loaded, loop->num);
+       loop = loop_outer (loop))
+    bitmap_set_bit (ref->loaded, loop->num);
+}
+
 /* Gathers memory references in statement STMT in LOOP, storing the
    information about them in the memory_accesses structure.  Marks
    the vops accessed through unrecognized statements there as
@@ -1626,6 +1644,22 @@  gather_mem_refs_stmt (struct loop *loop,
 	  fprintf (dump_file, "\n");
 	}
     }
+
+  if (gimple_assign_single_p (stmt))
+    {
+      tree t = gimple_assign_rhs1 (stmt);
+      /* ?? For some reason two exact COMPONENT_REFs cannot be
+	 compared with pointer equality, so ask the alias oracle.  */
+      if (ref->mem == t
+	  || ((TREE_CODE (t) == SSA_NAME
+	       || DECL_P (t)
+	       || handled_component_p (t)
+	       || TREE_CODE (t) == MEM_REF
+	       || TREE_CODE (t) == TARGET_MEM_REF)
+	      && refs_may_alias_p (t, ref->mem)))
+	mark_ref_loaded (ref, loop);
+    }
+
   if (is_stored)
     mark_ref_stored (ref, loop);
 
@@ -1956,6 +1990,175 @@  get_lsm_tmp_name (tree ref, unsigned n)
   return lsm_tmp_name;
 }
 
+struct prev_flag_edges {
+  /* Edge to insert new flag comparison code.  */
+  edge append_cond_position;
+
+  /* Edge for fall through from previous flag comparison.  */
+  edge last_cond_fallthru;
+};
+
+/* Helper function for execute_sm.  Emit code to store TMP_VAR into
+   MEM along edge EX.
+
+   The store is only done if MEM has changed.  We do this so no
+   changes to MEM occur on code paths that did not originally store
+   into it.
+
+   The common case for execute_sm will transform:
+
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         MEM = TMP_VAR;
+     }
+
+   into:
+
+     lsm = MEM;
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         lsm = TMP_VAR;
+     }
+     MEM = lsm;
+
+  This function will generate:
+
+     // Note: this load can be avoided unless there is a load already
+     //       present in the loop.
+     lsm = MEM;
+
+     lsm_flag = false;
+     ...
+     for (...) {
+       if (foo)
+         stuff;
+       else {
+         lsm = TMP_VAR;
+         lsm_flag = true;
+       }
+     }
+     if (lsm_flag)	<--
+       MEM = lsm;	<--
+*/
+
+static void
+execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag)
+{
+  basic_block new_bb, then_bb, old_dest;
+  bool loop_has_only_one_exit;
+  edge then_old_edge, orig_ex = ex;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+  struct prev_flag_edges *prev_edges = (struct prev_flag_edges *) ex->aux;
+
+  /* ?? Insert store after previous store if applicable.  See note
+     below.  */
+  if (prev_edges)
+    ex = prev_edges->append_cond_position;
+
+  loop_has_only_one_exit = single_pred_p (ex->dest);
+
+  if (loop_has_only_one_exit)
+    ex = split_block_after_labels (ex->dest);
+
+  old_dest = ex->dest;
+  new_bb = split_edge (ex);
+  then_bb = create_empty_bb (new_bb);
+  if (current_loops && new_bb->loop_father)
+    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+  gsi = gsi_start_bb (new_bb);
+  stmt = gimple_build_cond (NE_EXPR, flag, boolean_false_node,
+			    NULL_TREE, NULL_TREE);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  gsi = gsi_start_bb (then_bb);
+  /* Insert actual store.  */
+  stmt = gimple_build_assign (unshare_expr (mem), tmp_var);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+
+  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+  if (prev_edges)
+    {
+      basic_block prevbb = prev_edges->last_cond_fallthru->src;
+      redirect_edge_succ (prev_edges->last_cond_fallthru, new_bb);
+      set_immediate_dominator (CDI_DOMINATORS, new_bb, prevbb);
+      set_immediate_dominator (CDI_DOMINATORS, old_dest,
+			       recompute_dominator (CDI_DOMINATORS, old_dest));
+    }
+
+  /* ?? Because stores may alias, they must happen in the exact
+     sequence they originally happened.  Save the position right after
+     the (_lsm) store we just created so we can continue appending after
+     it and maintain the original order.  */
+  {
+    struct prev_flag_edges *p;
+
+    if (orig_ex->aux)
+      orig_ex->aux = NULL;
+    alloc_aux_for_edge (orig_ex, sizeof (struct prev_flag_edges));
+    p = (struct prev_flag_edges *) orig_ex->aux;
+    p->append_cond_position = then_old_edge;
+    p->last_cond_fallthru = find_edge (new_bb, old_dest);
+    orig_ex->aux = (void *) p;
+  }
+
+  if (!loop_has_only_one_exit)
+    for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple phi = gsi_stmt (gsi);
+	unsigned i;
+
+	for (i = 0; i < gimple_phi_num_args (phi); i++)
+	  if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+	    {
+	      tree arg = gimple_phi_arg_def (phi, i);
+	      add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
+	      update_stmt (phi);
+	    }
+      }
+  /* Remove the original fall through edge.  This was the
+     single_succ_edge (new_bb).  */
+  EDGE_SUCC (new_bb, 0)->flags &= ~EDGE_FALLTHRU;
+}
+
+/* Helper function for execute_sm.  On every location where REF is
+   set, set an appropriate flag indicating the store.  */
+
+static tree
+execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
+{
+  unsigned i;
+  mem_ref_loc_p loc;
+  tree flag;
+  VEC (mem_ref_loc_p, heap) *locs = NULL;
+  char *str = get_lsm_tmp_name (ref->mem, ~0);
+
+  lsm_tmp_name_add ("_flag");
+  flag = make_rename_temp (boolean_type_node, str);
+  get_all_locs_in_loop (loop, ref, &locs);
+  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
+    {
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+
+      gsi = gsi_for_stmt (loc->stmt);
+      stmt = gimple_build_assign (flag, boolean_true_node);
+      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+    }
+  VEC_free (mem_ref_loc_p, heap, locs);
+  return flag;
+}
+
 /* Executes store motion of memory reference REF from LOOP.
    Exits from the LOOP are stored in EXITS.  The initialization of the
    temporary variable is put to the preheader of the loop, and assignments
@@ -1964,12 +2167,16 @@  get_lsm_tmp_name (tree ref, unsigned n)
 static void
 execute_sm (struct loop *loop, VEC (edge, heap) *exits, mem_ref_p ref)
 {
-  tree tmp_var;
+  tree tmp_var, store_flag;
   unsigned i;
-  gimple load, store;
+  gimple load;
   struct fmt_data fmt_data;
-  edge ex;
+  edge ex, latch_edge;
   struct lim_aux_data *lim_data;
+  bool multi_threaded_model_p = false;
+  /* ?? FIXME TESTING TESTING ?? */
+  multi_threaded_model_p=true;
+  /* ?? FIXME TESTING TESTING ?? */
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1985,23 +2192,69 @@  execute_sm (struct loop *loop, VEC (edge
   fmt_data.orig_loop = loop;
   for_each_index (&ref->mem, force_move_till, &fmt_data);
 
+  if ((flag_tm && loop_preheader_edge (loop)->src->flags & BB_IN_TRANSACTION)
+      || !PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+    multi_threaded_model_p = true;
+
+  if (multi_threaded_model_p)
+    store_flag = execute_sm_if_changed_flag_set (loop, ref);
+
   rewrite_mem_refs (loop, ref, tmp_var);
 
-  /* Emit the load & stores.  */
-  load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
-  lim_data = init_lim_data (load);
-  lim_data->max_loop = loop;
-  lim_data->tgt_loop = loop;
-
-  /* Put this into the latch, so that we are sure it will be processed after
-     all dependencies.  */
-  gsi_insert_on_edge (loop_latch_edge (loop), load);
 
-  FOR_EACH_VEC_ELT (edge, exits, i, ex)
-    {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
+  /* For the multi-threaded variant, we can avoid the load altogether,
+     since the store is predicated by a flag.  However, do the load if
+     it was originally in the loop.  */
+  if (!multi_threaded_model_p || bitmap_bit_p (ref->loaded, loop->num))
+    {
+      load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
+    }
+  else if (multi_threaded_model_p)
+    {
+      /* We know that all uses of tmp_var will be gated by lsm_flag,
+	 but subsequent passes can be confused and think tmp_var is
+	 used before it is defined.  If we avoided the load, create a
+	 PHI node of 0 for the data flow path we know to be
+	 unreachable.
+
+	 ?? Ideally what we'd want is a flow sensitive data flow
+	 analysis pass that can discover the above, and both remove
+	 the load and add the dummy PHI nodes for unreachable
+	 blocks. */
+      gimple phi;
+      edge e = loop_preheader_edge (loop);
+      tree t = make_ssa_name (tmp_var, NULL);
+
+      phi = create_phi_node (t, e->dest);
+      SSA_NAME_DEF_STMT (t) = phi;
+      add_phi_arg (phi, build_zero_cst (TREE_TYPE (t)), e, UNKNOWN_LOCATION);
+    }
+  if (multi_threaded_model_p)
+    {
+      load = gimple_build_assign (store_flag, boolean_false_node);
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
     }
+
+  /* Sink the store to every exit from the loop.  */
+  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+    if (!multi_threaded_model_p)
+      {
+	gimple store;
+	store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	gsi_insert_on_edge (ex, store);
+      }
+    else
+      execute_sm_if_changed (ex, ref->mem, tmp_var, store_flag);
 }
 
 /* Hoists memory references MEM_REFS out of LOOP.  EXITS is the list of exit
@@ -2410,6 +2663,8 @@  tree_ssa_lim_initialize (void)
 
   if (flag_tm)
     compute_transaction_bits ();
+
+  alloc_aux_for_edges (0);
 }
 
 /* Cleans up after the invariant motion pass.  */
@@ -2421,6 +2676,8 @@  tree_ssa_lim_finalize (void)
   unsigned i;
   bitmap b;
 
+  free_aux_for_edges ();
+
   FOR_EACH_BB (bb)
     SET_ALWAYS_EXECUTED_IN (bb, NULL);
 
Index: trans-mem.c
===================================================================
--- trans-mem.c	(revision 187051)
+++ trans-mem.c	(working copy)
@@ -2449,13 +2449,15 @@  compute_transaction_bits (void)
   struct tm_region *region;
   VEC (basic_block, heap) *queue;
   unsigned int i;
-  gimple_stmt_iterator gsi;
   basic_block bb;
 
   /* ?? Perhaps we need to abstract gate_tm_init further, because we
      certainly don't need it to calculate CDI_DOMINATOR info.  */
   gate_tm_init ();
 
+  FOR_EACH_BB (bb)
+    bb->flags &= ~BB_IN_TRANSACTION;
+
   for (region = all_tm_regions; region; region = region->next)
     {
       queue = get_tm_region_blocks (region->entry_block,
@@ -2464,11 +2466,7 @@  compute_transaction_bits (void)
 				    NULL,
 				    /*stop_at_irr_p=*/true);
       for (i = 0; VEC_iterate (basic_block, queue, i, bb); ++i)
-	for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	  {
-	    gimple stmt = gsi_stmt (gsi);
-	    gimple_set_in_transaction (stmt, true);
-	  }
+	bb->flags |= BB_IN_TRANSACTION;
       VEC_free (basic_block, heap, queue);
     }
 
Index: gimple.h
===================================================================
--- gimple.h	(revision 187051)
+++ gimple.h	(working copy)
@@ -305,11 +305,6 @@  struct GTY(()) gimple_statement_base {
   /* Nonzero if this statement contains volatile operands.  */
   unsigned has_volatile_ops 	: 1;
 
-  /* Nonzero if this statement appears inside a transaction.  This bit
-     is calculated on de-mand and has relevant information only after
-     it has been calculated with compute_transaction_bits.  */
-  unsigned in_transaction	: 1;
-
   /* The SUBCODE field can be used for tuple-specific flags for tuples
      that do not require subcodes.  Note that SUBCODE should be at
      least as wide as tree codes, as several tuples store tree codes
@@ -1585,15 +1580,7 @@  gimple_set_has_volatile_ops (gimple stmt
 static inline bool
 gimple_in_transaction (gimple stmt)
 {
-  return stmt->gsbase.in_transaction;
-}
-
-/* Set the IN_TRANSACTION flag to TRANSACTIONP.  */
-
-static inline void
-gimple_set_in_transaction (gimple stmt, bool transactionp)
-{
-  stmt->gsbase.in_transaction = (unsigned) transactionp;
+  return gimple_bb (stmt)->flags & BB_IN_TRANSACTION;
 }
 
 /* Return true if statement STMT may access memory.  */
Index: cfg.c
===================================================================
--- cfg.c	(revision 187051)
+++ cfg.c	(working copy)
@@ -814,10 +814,10 @@  free_aux_for_blocks (void)
   clear_aux_for_blocks ();
 }
 
-/* Allocate a memory edge of SIZE as BB->aux.  The obstack must
+/* Allocate a memory edge of SIZE as E->aux.  The obstack must
    be first initialized by alloc_aux_for_edges.  */
 
-static void
+void
 alloc_aux_for_edge (edge e, int size)
 {
   /* Verify that aux field is clear.  */
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 187051)
+++ basic-block.h	(working copy)
@@ -256,7 +256,12 @@  enum bb_flags
      df_set_bb_dirty, but not cleared by df_analyze, so it can be used
      to test whether a block has been modified prior to a df_analyze
      call.  */
-  BB_MODIFIED = 1 << 12
+  BB_MODIFIED = 1 << 12,
+
+  /* Set on blocks that are in a transaction.  This is calculated on
+     demand, and is available after calling
+     compute_transaction_bits().  */
+  BB_IN_TRANSACTION = 1 << 13
 };
 
 /* Dummy flag for convenience in the hot/cold partitioning code.  */
@@ -788,6 +793,7 @@  extern basic_block alloc_block (void);
 extern void alloc_aux_for_blocks (int);
 extern void clear_aux_for_blocks (void);
 extern void free_aux_for_blocks (void);
+extern void alloc_aux_for_edge (edge, int);
 extern void alloc_aux_for_edges (int);
 extern void clear_aux_for_edges (void);
 extern void free_aux_for_edges (void);