
Remove noce_mem_write_may_trap_or_fault_p in ifcvt

Message ID 564CCE73.1090907@redhat.com
State New

Commit Message

Bernd Schmidt Nov. 18, 2015, 7:16 p.m. UTC
Ok, so this is a thorny problem. I thought I had a solution, and I'll 
start by describing it, but in the end I think it doesn't actually work.

On 11/06/2015 10:09 PM, Jeff Law wrote:
> On 11/06/2015 12:30 PM, Bernd Schmidt wrote:
>> Well, I think if MEM_READONLY_P is insufficient (and I guess people
>> could cast away const - bother), then the current
>> noce_mem_write_may_trap_or_fault_p should be simplified to "return
>> true;" and eliminated.  We could try to special case stack accesses
>> within a known limit later on.
>>
>> Maybe it doesn't even matter given that we call noce_can_store_speculate
>> immediately after this test.
> may_trap_or_fault_p already has this kind of knowledge.  It just doesn't
> know if the memory is a read or a write.  Hence my suggestion we should
> fix may_trap_or_fault_p.

For the current issue I've come to the conclusion that this kind of 
analysis is irrelevant here (and that it is not subject to the problem I'll 
describe later), because of the use of noce_can_store_speculate. See below.

> Ripping out noce_mem_write_may_trap_or_fault_p without fixing
> may_trap_or_fault_p introduces a latent code generation issue.

I don't think so, actually. One safe option would be to rip it out and 
just stop transforming this case, but let's start by looking at the code 
just a bit further down, calling noce_can_store_speculate. This was 
added later than the code we're discussing, and it tries to verify that 
the same memory location will unconditionally be written to at a point 
later than the one we're trying to convert (but why aren't we testing 
for prior writes?). That should make any may_trap test unnecessary, and 
we can simply delete the call to 
noce_mem_write_may_trap_or_fault_p. (We could leave such a test in as a 
compile-time focused early-out, but it may actually be too conservative. 
We may have an address which we can't prove safe in isolation, but if we 
know that we also have an unconditional store, it must be safe.)

So... I was about to propose the attached patch, which also fixes some 
oversights in the can_store_speculate path: we shouldn't allow autoinc 
addresses here. The added test in noce_can_store_speculate_p is not 
quite necessary given that the same one is also added to 
memory_must_be_modified_in_insn_p, but it avoids the need for an insn 
walk in cases where it isn't necessary. This bootstrapped and tested ok 
on x86_64-linux.

But then, I looked at noce_can_store_speculate again, and it seems 
broken. We walk over the post-dominators of the block, see if we find a 
store, and fail if something modifies the address:

   for (dominator = get_immediate_dominator (CDI_POST_DOMINATORS, top_bb);
        dominator != NULL;
        dominator = get_immediate_dominator (CDI_POST_DOMINATORS, dominator))
     {
       rtx_insn *insn;

       FOR_BB_INSNS (dominator, insn)
	{
[...]
	  if (modified_in_p (XEXP (mem, 0), insn))
	    return false;
	}

But what if the address is modified, but not in a post-dominator block? 
Let's say

  if (cond)
    *p = x; // this is the store we want to convert
  if (other_cond)
    p = some_pointer;
  else
    p = some_other_pointer;
  *p = y; // we'll see this store which postdominates the original
          // one, but not the modifications of p

So I think that algorithm doesn't work. My suggestion at this point 
would be to give up on converting stores entirely (deleting 
noce_can_store_speculate_p and noce_mem_write_may_trap_or_fault_p) until 
someone produces a reasonable scratchpad patch.


Bernd

Comments

Jeff Law Nov. 18, 2015, 11:49 p.m. UTC | #1
On 11/18/2015 12:16 PM, Bernd Schmidt wrote:
>
> For the current issue I've come to the conclusion that this kind of
> analysis is irrelevant here (and that is not subject to the problem I'll
> describe later), because of the use of noce_can_store_speculate. See below.
Hmm....

>
>> Ripping out noce_mem_write_may_trap_or_fault_p without fixing
>> may_trap_or_fault_p introduces a latent code generation issue.
>
> I don't think so, actually. One safe option would be to rip it out and
> just stop transforming this case, but let's start by looking at the code
> just a bit further down, calling noce_can_store_speculate. This was
> added later than the code we're discussing, and it tries to verify that
> the same memory location will unconditionally be written to at a point
> later than the one we're trying to convert
And if we dig into that thread, Ian suggests this isn't terribly 
important anyway.

However, if we go even further upthread, we find an assertion from 
Michael that this is critical for 456.hmmer and references BZ 27313.

https://gcc.gnu.org/ml/gcc-patches/2007-08/msg01978.html

Sadly, no testcase was included.


> (but why aren't we testing
> for prior writes?).
Oversight I suspect.  Essentially you want to know if a store to the 
location is available (occurs on all paths to a given point).  The 
companion term, for occurring on all paths leaving a given point, is 
anticipated (sometimes called "very busy").

>
> So... I was about to propose the attached patch, which also fixes some
> oversights in the can_store_speculate path: we shouldn't allow autoinc
> addresses here. The added test in noce_can_store_speculate_p is not
> quite necessary given that the same one is also added to
> memory_must_be_modified_in_insn_p, but it avoids the need for an insn
> walk in cases where it isn't necessary. This bootstrapped and tested ok
> on x86_64-linux.
Right.....

>
> But then, I looked at noce_can_store_speculate again, and it seems
> broken. We walk over the post-dominators of the block, see if we find a
> store, and fail if something modifies the address:
Egad.  That's clearly wrong.  Post domination means that paths to the 
exit must go through the post-dominator.  It says nothing about what 
happens to the address in the blocks in between.


>
>    for (dominator = get_immediate_dominator (CDI_POST_DOMINATORS, top_bb);
>         dominator != NULL;
>         dominator = get_immediate_dominator (CDI_POST_DOMINATORS,
> dominator))
>      {
>        rtx_insn *insn;
>
>        FOR_BB_INSNS (dominator, insn)
>      {
> [...]
>        if (modified_in_p (XEXP (mem, 0), insn))
>          return false;
>      }
>
> But what if the address is modified, but not in a post-dominator block?
> Let's say
>
>   if (cond)
>     *p = x; // this is the store we want to convert
>   if (other_cond)
>     p = some_pointer;
>   else
>     p = some_other_pointer;
>   *p = y; // we'll see this store which postdominates the original
>           // one, but not the modifications of p
Right.  You essentially have to check all the blocks in the path.  At 
which point you might as well just run standard dataflow algorithms 
rather than coding up something ad-hoc here.

>
> So I think that algorithm doesn't work. My suggestion at this point
> would be to give up on converting stores entirely (deleting
> noce_can_store_speculate_p and noce_mem_write_may_trap_or_fault_p) until
> someone produces a reasonable scratchpad patch.
So if it weren't for the assertion that it's critical for hmmer, I'd be 
convinced that just ripping it out was the right thing to do.

Can you look at 27313 and hmmer and see if there's an impact?  Just maybe 
the critical stuff for those is handled by the tree if converter and we 
can just rip out the clearly incorrect RTL bits without regressing 
anything performance-wise.  If there is an impact, then I think we have 
to look at either improving the tree bits (so we can remove the rtl 
bits) or we have to do real dataflow analysis in the rtl bits.


jeff
Bernd Schmidt Nov. 20, 2015, 2:08 p.m. UTC | #2
On 11/19/2015 12:49 AM, Jeff Law wrote:
> On 11/18/2015 12:16 PM, Bernd Schmidt wrote:
>> I don't think so, actually. One safe option would be to rip it out and
>> just stop transforming this case, but let's start by looking at the code
>> just a bit further down, calling noce_can_store_speculate. This was
>> added later than the code we're discussing, and it tries to verify that
>> the same memory location will unconditionally be written to at a point
>> later than the one we're trying to convert
> And if we dig into that thread, Ian suggests this isn't terribly
> important anyway.
>
> However, if we go even further upthread, we find an assertion from
> Michael that this is critical for 456.hmmer and references BZ 27313.

Cc'd in case he has additional input.

> https://gcc.gnu.org/ml/gcc-patches/2007-08/msg01978.html
>
> Sadly, no testcase was included.

BZ27313 is marked as fixed by the introduction of the tree cselim pass, 
thus the problem won't even be seen at RTL level.
I'm undecided on whether cs-elim is safe wrt the store speculation vs 
locks concerns raised in the thread discussing Ian's 
noce_can_store_speculate_p, but that's not something we have to consider 
to solve the problem at hand.

> So if it weren't for the assertion that it's critical for hmmer, I'd be
> convinced that just ripping it out was the right thing to do.
>
> Can you look at 27313 and hmmer and see if there's an impact?  Just maybe
> the critical stuff for those is handled by the tree if converter and we
> can just rip out the clearly incorrect RTL bits without regressing
> anything performance-wise.  If there is an impact, then I think we have
> to look at either improving the tree bits (so we can remove the rtl
> bits) or we have to do real dataflow analysis in the rtl bits.

So I made this change:

    if (!set_b && MEM_P (orig_x))
-    {
-      /* Disallow the "if (...) x = a;" form (implicit "else x = x;")
-        for optimizations if writing to x may trap or fault,
-        i.e. it's a memory other than a static var or a stack slot,
-        is misaligned on strict aligned machines or is read-only.  If
-        x is a read-only memory, then the program is valid only if we
-        avoid the store into it.  If there are stores on both the
-        THEN and ELSE arms, then we can go ahead with the conversion;
-        either the program is broken, or the condition is always
-        false such that the other memory is selected.  */
-      if (noce_mem_write_may_trap_or_fault_p (orig_x))
-       return FALSE;
-
-      /* Avoid store speculation: given "if (...) x = a" where x is a
-        MEM, we only want to do the store if x is always set
-        somewhere in the function.  This avoids cases like
-          if (pthread_mutex_trylock(mutex))
-            ++global_variable;
-        where we only want global_variable to be changed if the mutex
-        is held.  FIXME: This should ideally be expressed directly in
-        RTL somehow.  */
-      if (!noce_can_store_speculate_p (test_bb, orig_x))
-       return FALSE;
-    }
+    return FALSE;

As far as I can tell hmmer and the 27313 testcase are unaffected at -O2 
(if anything, hmmer was very slightly faster afterwards). The run wasn't 
super-scientific, but I wouldn't have expected anything else given the 
existence of cs-elim.

Ok to do the above, removing all the bits made unnecessary (including 
memory_must_be_modified_in_insn_p in alias.c)?


Bernd
Jeff Law Nov. 20, 2015, 6:57 p.m. UTC | #3
On 11/20/2015 07:08 AM, Bernd Schmidt wrote:
>
> BZ27313 is marked as fixed by the introduction of the tree cselim pass,
> thus the problem won't even be seen at RTL level.
Cool.

> I'm undecided on whether cs-elim is safe wrt the store speculation vs
> locks concerns raised in the thread discussing Ian's
> noce_can_store_speculate_p, but that's not something we have to consider
> to solve the problem at hand.
I don't think cs-elim is safe WRT locks and such in multi-threaded code.

    In particular it replaces this:

      bb0:
        if (cond) goto bb2; else goto bb1;
      bb1:
        *p = RHS;
      bb2:

    with

      bb0:
        if (cond) goto bb1; else goto bb2;
      bb1:
        condtmp' = *p;
      bb2:
        condtmp = PHI <RHS, condtmp'>
        *p = condtmp;


If *p is a shared memory location, then there may be another writer.  If 
that writer happens to store something in that location after the load 
of *p, but before the store to *p, then that store will get lost in the 
transformed pseudo code.

That seems to introduce a data race.  Presumably one would call the 
original code ill-formed WRT the C11/C++11 memory model since the shared 
location is not marked as such.

I'm willing to consider this an independent problem.



>
> As far as I can tell hmmer and the 27313 testcase are unaffected at -O2
> (if anything, hmmer was very slightly faster afterwards). The run wasn't
> super-scientific, but I wouldn't have expected anything else given the
> existence of cs-elim.
Cool.  It doesn't have to be super-scientific.  Good to see it didn't 
harm anything -- thanks for going the extra mile on this one.

>
> Ok to do the above, removing all the bits made unnecessary (including
> memory_must_be_modified_in_insn_p in alias.c)?
Yup.  Zap it.
jeff
Michael Matz Nov. 23, 2015, 3:54 p.m. UTC | #4
Hi,

On Fri, 20 Nov 2015, Bernd Schmidt wrote:

> On 11/19/2015 12:49 AM, Jeff Law wrote:
> > On 11/18/2015 12:16 PM, Bernd Schmidt wrote:
> > > I don't think so, actually. One safe option would be to rip it out and
> > > just stop transforming this case, but let's start by looking at the code
> > > just a bit further down, calling noce_can_store_speculate. This was
> > > added later than the code we're discussing, and it tries to verify that
> > > the same memory location will unconditionally be written to at a point
> > > later than the one we're trying to convert
> > And if we dig into that thread, Ian suggests this isn't terribly
> > important anyway.
> > 
> > However, if we go even further upthread, we find an assertion from
> > Michael that this is critical for 456.hmmer and references BZ 27313.
> 
> Cc'd in case he has additional input.

The transformation I indicated in the mail introducing the cselim pass is 
indeed crucial for hmmer.  But that has nothing to do with 
noce_mem_write_may_trap_or_fault_p.  The latter is purely an RTL 
transformation, the transformation needs to happen on gimple level.  I 
agree with Bernd that the dominator walking code on the RTL side trying to 
find an unconditional write to the same address in order to validate the 
transformation looks wrong.


Ciao,
Michael.
Michael Matz Nov. 23, 2015, 4:05 p.m. UTC | #5
Hi,

On Fri, 20 Nov 2015, Jeff Law wrote:

> > I'm undecided on whether cs-elim is safe wrt the store speculation vs
> > locks concerns raised in the thread discussing Ian's
> > noce_can_store_speculate_p, but that's not something we have to consider
> > to solve the problem at hand.
> I don't think cs-elim is safe WRT locks and such in multi-threaded code.
> 
>    In particular it replaces this:
> 
>      bb0:
>        if (cond) goto bb2; else goto bb1;
>      bb1:
>        *p = RHS;
>      bb2:
> 
>    with
> 
>      bb0:
>        if (cond) goto bb1; else goto bb2;
>      bb1:
>        condtmp' = *p;
>      bb2:
>        condtmp = PHI <RHS, condtmp'>
>        *p = condtmp;

It only does so under some conditions, amongst them if it sees a 
dominating access to the same memory of the same type (load or store) and 
size.  So it doesn't introduce writes on paths that don't already contain 
a write, and hence is multi-thread safe.  Or, at least, that's the 
intention.

> If *p is a shared memory location, then there may be another writer.  
> If that writer happens to store something in that location after the 
> load of *p, but before the store to *p, then that store will get lost in 
> the transformed pseudo code.

Due to the above checking, the first thread must already have been a 
writer, so there are two writers on the same location, and hence the 
race is pre-existing.  It's only when you introduce a write where there was 
none whatsoever before (perhaps due to conditionals) that you create a new 
problem.


Ciao,
Michael.
Bernd Schmidt Nov. 25, 2015, 10:44 a.m. UTC | #6
On 11/23/2015 05:05 PM, Michael Matz wrote:
>
> It only does so under some conditions, amongst them if it sees a
> dominating access to the same memory of the same type (load or store) and
> size.  So it doesn't introduce writes on paths that don't already contain
> a write, and hence are multi-thread safe.  Or, at least, that's the
> intention.

Does it also ensure there's no memory barrier in between?


Bernd
Richard Biener Nov. 25, 2015, 11:38 a.m. UTC | #7
On Wed, Nov 25, 2015 at 11:44 AM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 11/23/2015 05:05 PM, Michael Matz wrote:
>>
>>
>> It only does so under some conditions, amongst them if it sees a
>> dominating access to the same memory of the same type (load or store) and
>> size.  So it doesn't introduce writes on paths that don't already contain
>> a write, and hence are multi-thread safe.  Or, at least, that's the
>> intention.
>
>
> Does it also ensure there's no memory barrier in between?

I don't think so.  Btw, if you want to add this please add a new gimple
predicate to identify "memory barrier" (any call or asm with a VDEF).

Richard.

>
> Bernd
>
Michael Matz Nov. 25, 2015, 1:17 p.m. UTC | #8
Hi,

On Wed, 25 Nov 2015, Bernd Schmidt wrote:

> On 11/23/2015 05:05 PM, Michael Matz wrote:
> > 
> > It only does so under some conditions, amongst them if it sees a
> > dominating access to the same memory of the same type (load or store) and
> > size.  So it doesn't introduce writes on paths that don't already contain
> > a write, and hence are multi-thread safe.  Or, at least, that's the
> > intention.
> 
> Does it also ensure there's no memory barrier in between?

Yes.  Whenever it sees any unknown call (including mem barriers) it 
forgets all known mem accesses (and hence can't mark accesses further down 
as not trapping).


Ciao,
Michael.
Michael Matz Nov. 25, 2015, 1:18 p.m. UTC | #9
Hi,

On Wed, 25 Nov 2015, Richard Biener wrote:

> I don't think so.  Btw, if you want to add this please add a new gimple 
> predicate to identify "memory barrier" (any call or asm with a VDEF).

      if (is_gimple_call (stmt) && !nonfreeing_call_p (stmt))
        nt_call_phase++;


Ciao,
Michael.
Richard Biener Nov. 25, 2015, 2:52 p.m. UTC | #10
On Wed, Nov 25, 2015 at 2:18 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Wed, 25 Nov 2015, Richard Biener wrote:
>
>> I don't think so.  Btw, if you want to add this please add a new gimple
>> predicate to identify "memory barrier" (any call or asm with a VDEF).
>
>       if (is_gimple_call (stmt) && !nonfreeing_call_p (stmt))
>         nt_call_phase++;

That looks bogus to me.  It misses asm()s, and at least today
nonfreeing_call_p pretty much checks just what it sounds like it checks.
In practice it might work though.  At least all the __sync_* and
__atomic_* calls are _not_ barriers this way.  A pthread_mutex_lock is
though as we don't have a builtin for it.

I'd change the above to a conservative

   if ((is_gimple_call (stmt) || is_gimple_asm (stmt)) && gimple_vuse (stmt))

Richard.

>
> Ciao,
> Michael.
Jakub Jelinek Nov. 25, 2015, 3 p.m. UTC | #11
On Wed, Nov 25, 2015 at 03:52:10PM +0100, Richard Biener wrote:
> On Wed, Nov 25, 2015 at 2:18 PM, Michael Matz <matz@suse.de> wrote:
> > Hi,
> >
> > On Wed, 25 Nov 2015, Richard Biener wrote:
> >
> >> I don't think so.  Btw, if you want to add this please add a new gimple
> >> predicate to identify "memory barrier" (any call or asm with a VDEF).
> >
> >       if (is_gimple_call (stmt) && !nonfreeing_call_p (stmt))
> >         nt_call_phase++;
> 
> That looks bogus to me.  It misses asm()s and at least today
> nonfreeing_call_p too much checks what it sounds like it checks.
> In practice it might work though.  At least all the __sync_* and
> __atomic_* calls are _not_ barriers this way.  A pthread_mutex_lock is
> though as we don't have a builtin for it.
> 
> I'd change the above to a conservative
> 
>    if ((is_gimple_call (stmt) || is_gimple_asm (stmt)) && gimple_vuse (stmt))

nonfreeing_call_p is one necessary condition (if it is false, the call
may free memory, so the first access might not trap while the second one
does).
But I agree that we need a predicate for nonbarrier_call_p or similar.
Some atomic builtins are not barriers though and IMHO should not be treated
as such, at least relaxed atomic loads/stores and perhaps even other relaxed
atomics.  And it would be nice to have IPA discovery of nonbarrier_call_p
calls, like we have for nonfreeing_fn.

	Jakub
Michael Matz Nov. 25, 2015, 3:16 p.m. UTC | #12
Hi,

On Wed, 25 Nov 2015, Jakub Jelinek wrote:

> > That looks bogus to me.  It misses asm()s and at least today 
> > nonfreeing_call_p too much checks what it sounds like it checks. In 
> > practice it might work though.  At least all the __sync_* and 
> > __atomic_* calls are _not_ barriers this way.  A pthread_mutex_lock is 
> > though as we don't have a builtin for it.
> > 
> > I'd change the above to a conservative
> > 
> >    if ((is_gimple_call (stmt) || is_gimple_asm (stmt)) && gimple_vuse (stmt))
> 
> nonfreeing_call_p is one necessary condition (if that is true, it means
> the call could mean that the first access does not trap while the second one
> does).

Yes, Richi's test conservatively catches this as well: it clears the 
table for all calls and asms that can even remotely change memory state 
(well, when using gimple_vdef, not gimple_vuse).


Ciao,
Michael.

Patch

	* alias.c (memory_must_be_modified_in_insn_p): Return false for
	addresses with side effects.
	* ifcvt.c (noce_mem_write_may_trap_or_fault_p): Delete function.
	(noce_can_store_speculate_p): Return false for addresses with side
	effects.
	(noce_process_if_block): Rely only on noce_can_store_speculate_p.

Index: gcc/alias.c
===================================================================
--- gcc/alias.c	(revision 230541)
+++ gcc/alias.c	(working copy)
@@ -2974,6 +2974,10 @@  memory_must_be_modified_in_insn_p (const
 {
   if (!INSN_P (insn))
     return false;
+  /* Our rtx_equal_p tests will not do the right thing in this case.  */
+  if (side_effects_p (XEXP (mem, 0)))
+    return false;
+
   insn = PATTERN (insn);
   if (GET_CODE (insn) == SET)
     return set_dest_equal_p (insn, mem);
@@ -2984,7 +2988,7 @@  memory_must_be_modified_in_insn_p (const
 	{
 	  rtx sub = XVECEXP (insn, 0, i);
 	  if (GET_CODE (sub) == SET
-	      &&  set_dest_equal_p (sub, mem))
+	      && set_dest_equal_p (sub, mem))
 	    return true;
 	}
     }
Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	(revision 230541)
+++ gcc/ifcvt.c	(working copy)
@@ -2919,59 +2919,6 @@  noce_operand_ok (const_rtx op)
   return ! may_trap_p (op);
 }
 
-/* Return true if a write into MEM may trap or fault.  */
-
-static bool
-noce_mem_write_may_trap_or_fault_p (const_rtx mem)
-{
-  rtx addr;
-
-  if (MEM_READONLY_P (mem))
-    return true;
-
-  if (may_trap_or_fault_p (mem))
-    return true;
-
-  addr = XEXP (mem, 0);
-
-  /* Call target hook to avoid the effects of -fpic etc....  */
-  addr = targetm.delegitimize_address (addr);
-
-  while (addr)
-    switch (GET_CODE (addr))
-      {
-      case CONST:
-      case PRE_DEC:
-      case PRE_INC:
-      case POST_DEC:
-      case POST_INC:
-      case POST_MODIFY:
-	addr = XEXP (addr, 0);
-	break;
-      case LO_SUM:
-      case PRE_MODIFY:
-	addr = XEXP (addr, 1);
-	break;
-      case PLUS:
-	if (CONST_INT_P (XEXP (addr, 1)))
-	  addr = XEXP (addr, 0);
-	else
-	  return false;
-	break;
-      case LABEL_REF:
-	return true;
-      case SYMBOL_REF:
-	if (SYMBOL_REF_DECL (addr)
-	    && decl_readonly_section (SYMBOL_REF_DECL (addr), 0))
-	  return true;
-	return false;
-      default:
-	return false;
-      }
-
-  return false;
-}
-
 /* Return whether we can use store speculation for MEM.  TOP_BB is the
    basic block above the conditional block where we are considering
    doing the speculative store.  We look for whether MEM is set
@@ -2982,6 +2929,9 @@  noce_can_store_speculate_p (basic_block
 {
   basic_block dominator;
 
+  if (side_effects_p (XEXP (mem, 0)))
+    return false;
+
   for (dominator = get_immediate_dominator (CDI_POST_DOMINATORS, top_bb);
        dominator != NULL;
        dominator = get_immediate_dominator (CDI_POST_DOMINATORS, dominator))
@@ -3540,26 +3490,21 @@  noce_process_if_block (struct noce_if_in
 
   if (!set_b && MEM_P (orig_x))
     {
-      /* Disallow the "if (...) x = a;" form (implicit "else x = x;")
-	 for optimizations if writing to x may trap or fault,
-	 i.e. it's a memory other than a static var or a stack slot,
-	 is misaligned on strict aligned machines or is read-only.  If
-	 x is a read-only memory, then the program is valid only if we
-	 avoid the store into it.  If there are stores on both the
-	 THEN and ELSE arms, then we can go ahead with the conversion;
-	 either the program is broken, or the condition is always
-	 false such that the other memory is selected.  */
-      if (noce_mem_write_may_trap_or_fault_p (orig_x))
-	return FALSE;
+      /* Here we have the "if (...) x = a;" form (with an implicit
+	 "else x = x;"), where x is a store to memory.
 
-      /* Avoid store speculation: given "if (...) x = a" where x is a
-	 MEM, we only want to do the store if x is always set
-	 somewhere in the function.  This avoids cases like
+	 We want to avoid store speculation: we only want to do the store
+	 if x is always set on a path leading to this point without an
+	 intervening memory barrier.  This avoids cases like
 	   if (pthread_mutex_trylock(mutex))
 	     ++global_variable;
 	 where we only want global_variable to be changed if the mutex
 	 is held.  FIXME: This should ideally be expressed directly in
-	 RTL somehow.  */
+	 RTL somehow.
+
+	 A success of this test implies that the access cannot trap, so
+	 we need not call may_trap_or_fault_p; doing so may even lose
+	 valid optimization opportunities.  */
       if (!noce_can_store_speculate_p (test_bb, orig_x))
 	return FALSE;
     }