
Fixing ifcvt issue as exposed by BZ89430

Message ID BL0PR0102MB358803D85C639D61B05604FF9C750@BL0PR0102MB3588.prod.exchangelabs.com
State New
Series Fixing ifcvt issue as exposed by BZ89430

Commit Message

JiangNing OS Feb. 28, 2019, 12:26 p.m. UTC
To solve BZ89430, the following changes are needed:

(1) The code below in noce_try_cmove_arith needs to be fixed.

  /* ??? We could handle this if we knew that a load from A or B could
     not trap or fault.  This is also true if we've already loaded
     from the address along the path from ENTRY.  */
  else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
    return FALSE;

Finding a dominating memory access helps decide whether the memory access following the conditional move is valid or not:
* If there is a dominating memory write in test_bb to the same memory address as the one from x=a, the transformation is always safe.
* When the dominating memory access in test_bb to the same memory address is a read, a load out of x=a is always safe; a store out of x=a is still safe if the memory is on the stack (see the sketch after this list).
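
As a concrete illustration (a hypothetical example, not taken from the patch), it is the unconditional access in test_bb that proves the address is valid and makes the speculative access in the branch safe:

  /* Hypothetical example for illustration; f, p and c are made up.  */
  int f (int *p, int c)
  {
    int t = *p;   /* Dominating read of *p in test_bb.  */
    int x;
    if (c)
      x = *p;     /* Load out of x=a: cannot trap, since *p was just read.  */
    else
      x = t + 1;
    return x;
  }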

Besides, there is a bug in how the then_bb and else_bb layout is rearranged, which also needs to be fixed.

(2) The fix from https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html is overkill: if the write target following the conditional move is a memory access, it bails out immediately.

  if (!set_b && MEM_P (orig_x))
    /* We want to avoid store speculation to avoid cases like
         if (pthread_mutex_trylock(mutex))
           ++global_variable;
       Rather than go to much effort here, we rely on the SSA optimizers,
       which do a good enough job these days.  */
    return FALSE;

It looks rather hard for the compiler to know that the program semantics are related to a mutex and thereby avoid store speculation. Without removing this overly conservative check, the fix mentioned in (1) would not work. Any ideas? An alternative solution is to conservatively detect the following pattern,

         if (a_function_call(...))
           ++local_variable;

If local_variable doesn't have its address taken anywhere in the current function, it can't carry pthread mutex lock semantics, and then the following patch would solve (1) and pass the last case described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
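
To make the concern concrete, here is a hypothetical sketch (not part of the patch; only the pthread calls are real API) of what goes wrong if the store is speculated in the mutex case:

  #include <pthread.h>

  extern int global_variable;

  /* 'update' is a hypothetical function for illustration.  */
  void update (pthread_mutex_t *mutex)
  {
    if (pthread_mutex_trylock (mutex) == 0)
      {
        ++global_variable;   /* Protected by the mutex.  */
        pthread_mutex_unlock (mutex);
      }
    /* If ifcvt speculated the store, global_variable would be written
       even when the lock was not acquired, racing with the thread that
       does hold the lock.  A local whose address is never taken cannot
       be shared across threads, so speculating a store to it is safe.  */
  }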


Thanks,
-Jiangning

Comments

Jeff Law March 19, 2019, 8:41 p.m. UTC | #1
On 2/28/19 5:26 AM, JiangNing OS wrote:
> To solve BZ89430, the following changes are needed:
> 
> (1) The code below in noce_try_cmove_arith needs to be fixed.
> 
>   /* ??? We could handle this if we knew that a load from A or B could
>      not trap or fault.  This is also true if we've already loaded
>      from the address along the path from ENTRY.  */
>   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
>     return FALSE;
> 
> Finding a dominating memory access helps decide whether the memory access following the conditional move is valid or not:
> * If there is a dominating memory write in test_bb to the same memory address as the one from x=a, the transformation is always safe.
> * When the dominating memory access in test_bb to the same memory address is a read, a load out of x=a is always safe; a store out of x=a is still safe if the memory is on the stack.
> 
> Besides, there is a bug in how the then_bb and else_bb layout is rearranged, which also needs to be fixed.
> 
> (2) The fix from https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html is overkill: if the write target following the conditional move is a memory access, it bails out immediately.
> 
>   if (!set_b && MEM_P (orig_x))
>     /* We want to avoid store speculation to avoid cases like
>          if (pthread_mutex_trylock(mutex))
>            ++global_variable;
>        Rather than go to much effort here, we rely on the SSA optimizers,
>        which do a good enough job these days.  */
>     return FALSE;
> 
> It looks rather hard for the compiler to know that the program semantics are related to a mutex and thereby avoid store speculation. Without removing this overly conservative check, the fix mentioned in (1) would not work. Any ideas? An alternative solution is to conservatively detect the following pattern,
> 
>          if (a_function_call(...))
>            ++local_variable;
> 
> If local_variable doesn't have its address taken anywhere in the current function, it can't carry pthread mutex lock semantics, and then the following patch would solve (1) and pass the last case described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
Just a note. I'm deferring this to gcc-10.
jeff
Richard Biener May 8, 2019, 7:34 a.m. UTC | #2
On Thu, Feb 28, 2019 at 1:26 PM JiangNing OS
<jiangning@os.amperecomputing.com> wrote:
>
> To solve BZ89430, the following changes are needed:
>
> (1) The code below in noce_try_cmove_arith needs to be fixed.
>
>   /* ??? We could handle this if we knew that a load from A or B could
>      not trap or fault.  This is also true if we've already loaded
>      from the address along the path from ENTRY.  */
>   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
>     return FALSE;
>
> Finding a dominating memory access helps decide whether the memory access following the conditional move is valid or not:
> * If there is a dominating memory write in test_bb to the same memory address as the one from x=a, the transformation is always safe.
> * When the dominating memory access in test_bb to the same memory address is a read, a load out of x=a is always safe; a store out of x=a is still safe if the memory is on the stack.
>
> Besides, there is a bug in how the then_bb and else_bb layout is rearranged, which also needs to be fixed.
>
> (2) The fix from https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html is overkill: if the write target following the conditional move is a memory access, it bails out immediately.
>
>   if (!set_b && MEM_P (orig_x))
>     /* We want to avoid store speculation to avoid cases like
>          if (pthread_mutex_trylock(mutex))
>            ++global_variable;
>        Rather than go to much effort here, we rely on the SSA optimizers,
>        which do a good enough job these days.  */
>     return FALSE;
>
> It looks rather hard for the compiler to know that the program semantics are related to a mutex and thereby avoid store speculation. Without removing this overly conservative check, the fix mentioned in (1) would not work. Any ideas? An alternative solution is to conservatively detect the following pattern,

But it's important not to break this.  Note we do have
--param allow-store-data-races which the user can use to override this.
Note that accesses to the stack can obviously not cause store
speculation if their address didn't escape.  But that's probably what is
referred to by "rely on the SSA optimizers".
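
For instance (a hypothetical sketch, not from the patch; 'publish' and 'h' are made-up names):

  extern void publish (int *);

  int h (int c)
  {
    int a = 0, b = 0;
    publish (&b);  /* b's address escapes.  */
    if (c)
      ++a;         /* Store speculation on 'a' is unobservable.  */
    if (c)
      ++b;         /* Store speculation on 'b' may introduce a data race.  */
    return a + b;
  }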

>          if (a_function_call(...))
>            ++local_variable;
>
> If local_variable doesn't have its address taken anywhere in the current function, it can't carry pthread mutex lock semantics, and then the following patch would solve (1) and pass the last case described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
>
> [patch snipped]
JiangNing OS May 9, 2019, 6:06 a.m. UTC | #3
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, May 8, 2019 3:35 PM
> To: JiangNing OS <jiangning@os.amperecomputing.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Biener <rguenther@suse.de>;
> pinskia@gcc.gnu.org
> Subject: Re: Fixing ifcvt issue as exposed by BZ89430
> 
> On Thu, Feb 28, 2019 at 1:26 PM JiangNing OS
> <jiangning@os.amperecomputing.com> wrote:
> >
> > To solve BZ89430, the following changes are needed:
> >
> > (1) The code below in noce_try_cmove_arith needs to be fixed.
> >
> >   /* ??? We could handle this if we knew that a load from A or B could
> >      not trap or fault.  This is also true if we've already loaded
> >      from the address along the path from ENTRY.  */
> >   else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> >     return FALSE;
> >
> > Finding a dominating memory access helps decide whether the memory
> > access following the conditional move is valid or not:
> > * If there is a dominating memory write in test_bb to the same memory
> >   address as the one from x=a, the transformation is always safe.
> > * When the dominating memory access in test_bb to the same memory
> >   address is a read, a load out of x=a is always safe; a store out of
> >   x=a is still safe if the memory is on the stack.
> >
> > Besides, there is a bug in how the then_bb and else_bb layout is
> > rearranged, which also needs to be fixed.
> >
> > (2) The fix from https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02998.html
> > is overkill: if the write target following the conditional move is a
> > memory access, it bails out immediately.
> >
> >   if (!set_b && MEM_P (orig_x))
> >     /* We want to avoid store speculation to avoid cases like
> >          if (pthread_mutex_trylock(mutex))
> >            ++global_variable;
> >        Rather than go to much effort here, we rely on the SSA optimizers,
> >        which do a good enough job these days.  */
> >     return FALSE;
> >
> > It looks rather hard for the compiler to know that the program
> > semantics are related to a mutex and thereby avoid store speculation.
> > Without removing this overly conservative check, the fix mentioned in
> > (1) would not work. Any ideas? An alternative solution is to
> > conservatively detect the following pattern,
> 
> But it's important not to break this.  Note we do have --param
> allow-store-data-races which the user can use to override this.
> Note that accesses to the stack can obviously not cause store
> speculation if their address didn't escape.  But that's probably what
> is referred to by "rely on the SSA optimizers".

Yes! So I sent out a patch titled "[PATCH] improve ifcvt optimization
(PR rtl-optimization/89430)" two months ago to detect the access to the
stack. The SSA optimization pass called "tree-if-conv" in the middle end
can't really cover this case. The key part of my change is:

-    /* We want to avoid store speculation to avoid cases like
-	 if (pthread_mutex_trylock(mutex))
-	   ++global_variable;
-       Rather than go to much effort here, we rely on the SSA optimizers,
-       which do a good enough job these days.  */
-    return FALSE;
+    {
+      /* We want to avoid store speculation to avoid cases like
+           if (pthread_mutex_trylock(mutex))
+             ++global_variable;  
+         Tree if-conversion cannot handle this case well; it is intended
+         to help loop vectorization only. */
+      if (!noce_mem_is_on_stack(insn_a, orig_x))
+        return FALSE;
 
+      /* For case like,
+           if (pthread_mutex_trylock(mutex))
+             ++local_variable;
+         If any stack variable address is taken, potentially this local
+         variable could be modified by other threads and introduce store
+         speculation. */
+      if (!cfun_no_stack_address_taken)
+        return FALSE;
+    }

Can you please take a look if you have time? Thank you!

> 
> >          if (a_function_call(...))
> >            ++local_variable;
> >
> > If local_variable doesn't have its address taken anywhere in the
> > current function, it can't carry pthread mutex lock semantics, and
> > then the following patch would solve (1) and pass the last case
> > described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89430.
> >
> > [patch snipped]

Patch

Index: gcc/cse.c
===================================================================
--- gcc/cse.c	(revision 268256)
+++ gcc/cse.c	(working copy)
@@ -540,7 +540,6 @@ 
    already as part of an already processed extended basic block.  */
 static sbitmap cse_visited_basic_blocks;
 
-static bool fixed_base_plus_p (rtx x);
 static int notreg_cost (rtx, machine_mode, enum rtx_code, int);
 static int preferable (int, int, int, int);
 static void new_basic_block (void);
@@ -606,30 +605,7 @@ 
 
 static const struct rtl_hooks cse_rtl_hooks = RTL_HOOKS_INITIALIZER;
 

-/* Nonzero if X has the form (PLUS frame-pointer integer).  */
 
-static bool
-fixed_base_plus_p (rtx x)
-{
-  switch (GET_CODE (x))
-    {
-    case REG:
-      if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx)
-	return true;
-      if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM])
-	return true;
-      return false;
-
-    case PLUS:
-      if (!CONST_INT_P (XEXP (x, 1)))
-	return false;
-      return fixed_base_plus_p (XEXP (x, 0));
-
-    default:
-      return false;
-    }
-}
-
 /* Dump the expressions in the equivalence class indicated by CLASSP.
    This function is used only for debugging.  */
 DEBUG_FUNCTION void
Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	(revision 268256)
+++ gcc/ifcvt.c	(working copy)
@@ -76,6 +76,9 @@ 
 /* Whether conditional execution changes were made.  */
 static int cond_exec_changed_p;
 
+/* Bitmap of registers defined from the stack or frame pointer.  */
+static bitmap bba_sets_sfp;
+
 /* Forward references.  */
 static int count_bb_insns (const_basic_block);
 static bool cheap_bb_rtx_cost_p (const_basic_block, profile_probability, int);
@@ -99,6 +102,7 @@ 
 			       edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
+static void collect_all_fp_insns (void);
 

 /* Count the number of non-jump active insns in BB.  */
 
@@ -2029,6 +2033,110 @@ 
   return true;
 }
 
+/* Return true if MEM X is on the stack.  A_INSN contains X, if it exists.  */
+
+static bool
+noce_mem_is_on_stack (rtx_insn *a_insn, const_rtx x)
+{
+  df_ref use;
+
+  gcc_assert (x);
+  gcc_assert (MEM_P (x));
+
+  /* Exit early if the base register is a stack-related register.  */
+  rtx a = XEXP (x, 0);
+  if (fixed_base_plus_p (a))
+    return true;
+
+  if (!a_insn)
+    return false;
+
+  /* Check whether X is on the stack.  Assume a MEM expression using
+     registers related to the stack or frame pointer is always on the stack. */
+  FOR_EACH_INSN_USE (use, a_insn)
+    {
+      if (bitmap_bit_p (bba_sets_sfp, DF_REF_REGNO (use)))
+        return true;
+    }
+
+  return false;
+}
+
+/* Always return true if there is a dominating write.
+
+   When there is a dominating read from memory on the stack,
+   1) if x = a is a memory read, return true;
+   2) if x = a is a memory write, return true if the memory is on the
+      stack, which guarantees the memory is *not* read-only.  */
+
+static bool
+noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
+                           const_rtx x, bool is_store)
+{
+  rtx_insn *insn;
+  rtx set;
+  bool find_load = false;
+
+  gcc_assert (MEM_P (x));
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      set = single_set (insn);
+      if (!set)
+        continue;
+
+      /* Dominating store.  */
+      if (rtx_equal_p (x, SET_DEST (set)))
+        return true;
+
+      /* Dominating load.  */
+      if (rtx_equal_p (x, SET_SRC (set)))
+        find_load = true;
+    }
+
+  if (find_load)
+    {
+      if (is_store && noce_mem_is_on_stack (a_insn, x))
+        return true;
+    }
+
+  return false;
+}
+
+/* Return false if the memory access A or B must be valid.
+   This function must be called before any potential swap of A and B.  */
+
+static bool
+noce_mem_maybe_invalid_p (struct noce_if_info *if_info)
+{
+  /* For a memory load.  */
+  if (if_info->a && MEM_P (if_info->a))
+    {
+      return !noce_valid_for_dominating (if_info->test_bb,
+                                         if_info->insn_a,
+                                         if_info->a, false);
+    }
+
+  /* For a memory store.  */
+  else if (if_info->b && MEM_P (if_info->b))
+    {
+      if (!if_info->else_bb)
+        return !noce_valid_for_dominating (if_info->test_bb,
+                                           if_info->insn_a,
+                                           if_info->b, true);
+      else
+        return !noce_valid_for_dominating (if_info->test_bb,
+                                           if_info->insn_b,
+                                           if_info->b, true);
+    }
+
+  /* ??? We could handle this if we knew that a load from A or B could
+     not trap or fault.  This is also true if we've already loaded
+     from the address along the path from ENTRY.  */
+  return (may_trap_or_fault_p (if_info->a)
+          || may_trap_or_fault_p (if_info->b));
+}
+
 /* Try more complex cases involving conditional_move.  */
 
 static int
@@ -2065,10 +2173,7 @@ 
       is_mem = 1;
     }
 
-  /* ??? We could handle this if we knew that a load from A or B could
-     not trap or fault.  This is also true if we've already loaded
-     from the address along the path from ENTRY.  */
-  else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
+  else if (noce_mem_maybe_invalid_p (if_info))
     return FALSE;
 
   /* if (test) x = a + b; else x = c - d;
@@ -2234,7 +2339,7 @@ 
   /* If insn to set up A clobbers any registers B depends on, try to
      swap insn that sets up A with the one that sets up B.  If even
      that doesn't help, punt.  */
-  if (modified_in_a && !modified_in_b)
+  if (!modified_in_a && modified_in_b)
     {
       if (!noce_emit_bb (emit_b, else_bb, b_simple))
 	goto end_seq_and_fail;
@@ -2242,7 +2347,7 @@ 
       if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
     }
-  else if (!modified_in_a)
+  else if (!modified_in_b)
     {
       if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
@@ -5347,6 +5452,30 @@ 
 
   return FALSE;
 }
+
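+/* Record in bba_sets_sfp each register that is defined by an insn
+   whose source is a fixed base (frame or arg pointer) plus offset.  */
+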
+static void
+collect_all_fp_insns (void)
+{
+  rtx_insn *a_insn;
+  bba_sets_sfp = BITMAP_ALLOC (&reg_obstack);
+  df_ref def;
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, cfun)
+    FOR_BB_INSNS (bb, a_insn)
+      {
+        rtx sset_a = single_set (a_insn);
+        if (!sset_a)
+          continue;
+
+        if (fixed_base_plus_p (SET_SRC (sset_a)))
+          {
+            FOR_EACH_INSN_DEF (def, a_insn)
+              bitmap_set_bit (bba_sets_sfp, DF_REF_REGNO (def));
+          }
+      }
+}
+
 

 /* Main entry point for all if-conversion.  AFTER_COMBINE is true if
    we are after combine pass.  */
@@ -5381,6 +5510,8 @@ 
 
   df_set_flags (DF_LR_RUN_DCE);
 
+  collect_all_fp_insns ();
+
   /* Go through each of the basic blocks looking for things to convert.  If we
      have conditional execution, we make multiple passes to allow us to handle
      IF-THEN{-ELSE} blocks within other IF-THEN{-ELSE} blocks.  */
@@ -5413,6 +5544,8 @@ 
     }
   while (cond_exec_changed_p);
 
+  BITMAP_FREE (bba_sets_sfp);
+
 #ifdef IFCVT_MULTIPLE_DUMPS
   if (dump_file)
     fprintf (dump_file, "\n\n========== no more changes\n");
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	(revision 268256)
+++ gcc/rtl.h	(working copy)
@@ -3751,6 +3751,8 @@ 
 #define hard_frame_pointer_rtx	(global_rtl[GR_HARD_FRAME_POINTER])
 #define arg_pointer_rtx		(global_rtl[GR_ARG_POINTER])
 
+extern bool fixed_base_plus_p (rtx x);
+
 #ifndef GENERATOR_FILE
 /* Return the attributes of a MEM rtx.  */
 static inline const struct mem_attrs *
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	(revision 268256)
+++ gcc/rtlanal.c	(working copy)
@@ -341,6 +341,30 @@ 
   return 0;
 }
 
+/* Nonzero if X has the form (PLUS frame-pointer integer).  */
+
+bool
+fixed_base_plus_p (rtx x)
+{
+  switch (GET_CODE (x))
+    {
+    case REG:
+      if (x == frame_pointer_rtx || x == hard_frame_pointer_rtx)
+        return true;
+      if (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM])
+        return true;
+      return false;
+
+    case PLUS:
+      if (!CONST_INT_P (XEXP (x, 1)))
+        return false;
+      return fixed_base_plus_p (XEXP (x, 0));
+
+    default:
+      return false;
+    }
+}
+
 /* Compute an approximation for the offset between the register
    FROM and TO for the current function, as it was at the start
    of the routine.  */