Patchwork Further restrict TER replacing over calls (PR55752)

login
register
mail settings
Submitter Richard Guenther
Date Dec. 20, 2012, 2:15 p.m.
Message ID <alpine.LNX.2.00.1212201515140.6889@zhemvz.fhfr.qr>
Download mbox | patch
Permalink /patch/207667/
State New
Headers show

Comments

Richard Guenther - Dec. 20, 2012, 2:15 p.m.
On Thu, 20 Dec 2012, Richard Biener wrote:

> On Thu, 20 Dec 2012, Jakub Jelinek wrote:
> 
> > On Thu, Dec 20, 2012 at 02:51:55PM +0100, Richard Biener wrote:
> > > In the PR we perform expression replacement of an FP operation
> > > across a builtin call that sets the FP control register.  This
> > > patch restricts replacement across calls further, from allowing
> > > all builtins to only allowing those without side-effects.
> > > 
> > > Allowing replacement over calls at all was to not pessimize
> > > FP code generation for example for sqrt which is most often
> > > expanded to a single instruction.
> > > 
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > > 
> > > Comments?
> > 
> > Wouldn't it be better to have there a list of known builtins over which it
> > is fine to do TER?  I'd bet most of memory or string builtins that don't
> > call malloc/free should be still ok, but they surely have side-effects.
> 
> I'm not sure - the original reason was that replacing across calls
> made us spill more because there was a call.  We agreed that replacing
> across calls isn't usually a good idea but put in the (admittedly bad)
> workaround to still allow doing so across likely-not-calls.
> string builtins generally will expand to calls though.
> 
> I was thinking of even making it stronger and increment "cur_call_cnt"
> when the stmt (even non-call) has side-effects (would for example
> cover volatile asms or general volatile touching insns).

Like so:

       /* Now see if we are creating a new expression or not.  */

Richard.
Richard Guenther - Dec. 20, 2012, 3:14 p.m.
On Thu, 20 Dec 2012, Richard Biener wrote:

> On Thu, 20 Dec 2012, Richard Biener wrote:
> 
> > On Thu, 20 Dec 2012, Jakub Jelinek wrote:
> > 
> > > On Thu, Dec 20, 2012 at 02:51:55PM +0100, Richard Biener wrote:
> > > > In the PR we perform expression replacement of an FP operation
> > > > across a builtin call that sets the FP control register.  This
> > > > patch restricts replacement across calls further, from allowing
> > > > all builtins to only allowing those without side-effects.
> > > > 
> > > > Allowing replacement over calls at all was to not pessimize
> > > > FP code generation for example for sqrt which is most often
> > > > expanded to a single instruction.
> > > > 
> > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > > > 
> > > > Comments?
> > > 
> > > Wouldn't it be better to have there a list of known builtins over which it
> > > is fine to do TER?  I'd bet most of memory or string builtins that don't
> > > call malloc/free should be still ok, but they surely have side-effects.
> > 
> > I'm not sure - the original reason was that replacing across calls
> > made us spill more because there was a call.  We agreed that replacing
> > across calls isn't usually a good idea but put in the (admittedly bad)
> > workaround to still allow doing so across likely-not-calls.
> > string builtins generally will expand to calls though.
> > 
> > I was thinking of even making it stronger and increment "cur_call_cnt"
> > when the stmt (even non-call) has side-effects (would for example
> > cover volatile asms or general volatile touching insns).

After discussing on IRC I am testing the following which adds
a target hook and just treats ldmxcsr and stmxcsr differently
as well as all volatile asms and internal functions.

Bootstrap & regtest on x86_64-unknown-linux-gnu running.

Ok for trunk?

Thanks,
Richard.

2012-12-20  Richard Biener  <rguenther@suse.de>

	PR middle-end/55752
	* target.def (sched): Add scheduling_barrier_p.
	* targhooks.c (default_scheduling_barrier_p): New function.
	* targhooks.h (default_scheduling_barrier_p): Declare.
	* doc/tm.texi.in (TARGET_SCHED_SCHEDULING_BARRIER_P): Add.
	* doc/tm.texi: Update.
	* tree-ssa-ter.c: Include target.h.
	(find_replaceable_in_bb): Do not schedule across volatile
	asms or stmts the target thinks are scheduling barriers.
	Do not treat internal functions as scheduling barrier by default.
	* i386/i386.c (TARGET_SCHED_SCHEDULING_BARRIER_P): Override.
	(ix86_scheduling_barrier_p): New function.  Handle
	IX86_BUILTIN_LDMXCSR and IX86_BUILTIN_STMXCSR.
	* Makefile.in (tree-ssa-ter.o): Add $(TARGET_H) dependency.

Index: gcc/target.def
===================================================================
*** gcc/target.def	(revision 194632)
--- gcc/target.def	(working copy)
*************** parallelism required in output calculati
*** 939,944 ****
--- 939,954 ----
  int, (unsigned int opc, enum machine_mode mode),
  hook_int_uint_mode_1)
  
+ /* The following member value is a function that returns whether
+    the statement is considered a barrier for scheduling.  By default
+    this returns false.  */
+ DEFHOOK
+ (scheduling_barrier_p,
+ "This hook is called by TER to determine whether the statement is\n\
+ a scheduling barrier.",
+ bool, (gimple stmt),
+ default_scheduling_barrier_p)
+ 
  HOOK_VECTOR_END (sched)
  
  /* Functions relating to vectorization.  */
Index: gcc/targhooks.c
===================================================================
*** gcc/targhooks.c	(revision 194632)
--- gcc/targhooks.c	(working copy)
*************** default_canonicalize_comparison (int *,
*** 1547,1550 ****
--- 1547,1557 ----
  {
  }
  
+ /* Default version of scheduling_barrier_p.  */
+ bool
+ default_scheduling_barrier_p (gimple)
+ {
+   return false;
+ }
+ 
  #include "gt-targhooks.h"
Index: gcc/targhooks.h
===================================================================
*** gcc/targhooks.h	(revision 194632)
--- gcc/targhooks.h	(working copy)
*************** extern const char *default_pch_valid_p (
*** 195,197 ****
--- 195,199 ----
  extern void default_asm_output_ident_directive (const char*);
  
  extern bool default_member_type_forces_blk (const_tree, enum machine_mode);
+ 
+ extern bool default_scheduling_barrier_p (gimple);
Index: gcc/doc/tm.texi.in
===================================================================
*** gcc/doc/tm.texi.in	(revision 194632)
--- gcc/doc/tm.texi.in	(working copy)
*************** in its second parameter.
*** 6737,6742 ****
--- 6737,6744 ----
  
  @hook TARGET_SCHED_REASSOCIATION_WIDTH
  
+ @hook TARGET_SCHED_SCHEDULING_BARRIER_P
+ 
  @node Sections
  @section Dividing the Output into Sections (Texts, Data, @dots{})
  @c the above section title is WAY too long.  maybe cut the part between
Index: gcc/doc/tm.texi
===================================================================
*** gcc/doc/tm.texi	(revision 194632)
--- gcc/doc/tm.texi	(working copy)
*************** This hook is called by tree reassociator
*** 6840,6845 ****
--- 6840,6850 ----
  parallelism required in output calculations chain.
  @end deftypefn
  
+ @deftypefn {Target Hook} bool TARGET_SCHED_SCHEDULING_BARRIER_P (gimple @var{stmt})
+ This hook is called by TER to determine whether the statement is
+ a scheduling barrier.
+ @end deftypefn
+ 
  @node Sections
  @section Dividing the Output into Sections (Texts, Data, @dots{})
  @c the above section title is WAY too long.  maybe cut the part between
Index: gcc/tree-ssa-ter.c
===================================================================
*** gcc/tree-ssa-ter.c	(revision 194632)
--- gcc/tree-ssa-ter.c	(working copy)
*************** along with GCC; see the file COPYING3.
*** 31,36 ****
--- 31,37 ----
  #include "dumpfile.h"
  #include "tree-ssa-live.h"
  #include "flags.h"
+ #include "target.h"
  
  
  /* Temporary Expression Replacement (TER)
*************** find_replaceable_in_bb (temp_expr_table_
*** 681,692 ****
  	    kill_expr (tab, partition);
  	}
  
!       /* Increment counter if this is a non BUILT_IN call. We allow
! 	 replacement over BUILT_IN calls since many will expand to inline
! 	 insns instead of a true call.  */
!       if (is_gimple_call (stmt)
! 	  && !((fndecl = gimple_call_fndecl (stmt))
! 	       && DECL_BUILT_IN (fndecl)))
  	cur_call_cnt++;
  
        /* Now see if we are creating a new expression or not.  */
--- 682,697 ----
  	    kill_expr (tab, partition);
  	}
  
!       /* Increment counter if this is not a BUILT_IN call or a stmt with
! 	 side-effects.  We allow replacement over BUILT_IN calls
! 	 since many will expand to inline insns instead of a true call.  */
!       if ((gimple_code (stmt) == GIMPLE_ASM
! 	   && gimple_asm_volatile_p (stmt))
! 	  || (is_gimple_call (stmt)
! 	      && !gimple_call_internal_p (stmt)
! 	      && !((fndecl = gimple_call_fndecl (stmt))
! 		   && DECL_BUILT_IN (fndecl)))
! 	  || targetm.sched.scheduling_barrier_p (stmt))
  	cur_call_cnt++;
  
        /* Now see if we are creating a new expression or not.  */
Index: gcc/config/i386/i386.c
===================================================================
*** gcc/config/i386/i386.c	(revision 194632)
--- gcc/config/i386/i386.c	(working copy)
*************** ix86_enum_va_list (int idx, const char *
*** 41159,41164 ****
--- 41159,41166 ----
  #define TARGET_SCHED_ADJUST_PRIORITY ix86_adjust_priority
  #undef TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK
  #define TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK ix86_dependencies_evaluation_hook
+ #undef TARGET_SCHED_SCHEDULING_BARRIER_P
+ #define TARGET_SCHED_SCHEDULING_BARRIER_P ix86_scheduling_barrier_p
  
  /* The size of the dispatch window is the total number of bytes of
     object code allowed in a window.  */
*************** ix86_reassociation_width (unsigned int o
*** 41982,41987 ****
--- 41984,42014 ----
    return res;
  }
  
+ /* Implementation of scheduling_barrier_p.  */
+ 
+ static bool
+ ix86_scheduling_barrier_p (gimple stmt)
+ {
+   if (!is_gimple_call (stmt))
+     return false;
+ 
+   tree fndecl = gimple_call_fndecl (stmt);
+   if (!fndecl
+       || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_MD)
+     return false;
+ 
+   switch (DECL_FUNCTION_CODE (fndecl))
+     {
+     case IX86_BUILTIN_LDMXCSR:
+     case IX86_BUILTIN_STMXCSR:
+       return true;
+ 
+     default:;
+     }
+ 
+   return false;
+ }
+ 
  /* ??? No autovectorization into MMX or 3DNOW until we can reliably
     place emms and femms instructions.  */
  
Index: gcc/Makefile.in
===================================================================
*** gcc/Makefile.in	(revision 194632)
--- gcc/Makefile.in	(working copy)
*************** tree-into-ssa.o : tree-into-ssa.c $(TREE
*** 2272,2278 ****
  tree-ssa-ter.o : tree-ssa-ter.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(TREE_H) $(DIAGNOSTIC_H) $(TM_H) coretypes.h dumpfile.h \
     $(TREE_SSA_LIVE_H) $(BITMAP_H) $(FLAGS_H) \
!    $(GIMPLE_PRETTY_PRINT_H)
  tree-ssa-coalesce.o : tree-ssa-coalesce.c $(TREE_FLOW_H) $(CONFIG_H) \
     $(SYSTEM_H) $(TREE_H) $(DIAGNOSTIC_H) $(TM_H) coretypes.h dumpfile.h \
     $(TREE_SSA_LIVE_H) $(BITMAP_H) $(FLAGS_H) $(HASH_TABLE_H) \
--- 2272,2278 ----
  tree-ssa-ter.o : tree-ssa-ter.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
     $(TREE_H) $(DIAGNOSTIC_H) $(TM_H) coretypes.h dumpfile.h \
     $(TREE_SSA_LIVE_H) $(BITMAP_H) $(FLAGS_H) \
!    $(GIMPLE_PRETTY_PRINT_H) $(TARGET_H)
  tree-ssa-coalesce.o : tree-ssa-coalesce.c $(TREE_FLOW_H) $(CONFIG_H) \
     $(SYSTEM_H) $(TREE_H) $(DIAGNOSTIC_H) $(TM_H) coretypes.h dumpfile.h \
     $(TREE_SSA_LIVE_H) $(BITMAP_H) $(FLAGS_H) $(HASH_TABLE_H) \

Patch

Index: gcc/tree-ssa-ter.c
===================================================================
--- gcc/tree-ssa-ter.c  (revision 194632)
+++ gcc/tree-ssa-ter.c  (working copy)
@@ -681,12 +681,13 @@  find_replaceable_in_bb (temp_expr_table_
            kill_expr (tab, partition);
        }
 
-      /* Increment counter if this is a non BUILT_IN call. We allow
-        replacement over BUILT_IN calls since many will expand to inline
-        insns instead of a true call.  */
-      if (is_gimple_call (stmt)
-         && !((fndecl = gimple_call_fndecl (stmt))
-              && DECL_BUILT_IN (fndecl)))
+      /* Increment counter if this is not a BUILT_IN call or a stmt with
+        side-effects.  We allow replacement over BUILT_IN calls
+        since many will expand to inline insns instead of a true call.  
*/
+      if (gimple_has_side_effects (stmt)
+         || (is_gimple_call (stmt)
+             && !((fndecl = gimple_call_fndecl (stmt))
+                  && DECL_BUILT_IN (fndecl))))
        cur_call_cnt++;