diff mbox series

[GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk

Message ID alpine.LSU.2.20.1709271302380.26836@zhemvz.fhfr.qr
State New
Headers show
Series [GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk | expand

Commit Message

Richard Biener Sept. 27, 2017, 11:07 a.m. UTC
This removes another quadraticness from SCOP detection, gather_bbs
domwalk.  This is done by enhancing domwalk to handle SEME regions
via a special return value from before_dom_children.

With this I'm now confident to remove the 
PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit.
Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its
documented default value which enables 90 more loos to be processed
in SPEC CPU 2006.  I've also made a value of zero magic in disabling
the limit (a trick commonly used in GCC).

Statistics I have gathered a few patches before for SPEC CPU 2006:

1255 multi-loop SESEs in SCOP processing
max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8
max. drs per scop 869, 10 scops >= 100
max. pbbs per scop 36, 12 scops >= 10
919 SCOPs fail in build_alias_sets

which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP
is reasonable (if tuned to SPEC CPU 2006).

I've also included the hunk that allows -fgraphite-identity
to work ontop of -floop-nest-optimize and for -floop-nest-optimize
-ftree-parallelize-all also make sure to code-gen loops that
end up not transformed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006
tested, applied to trunk.

Richard.

2017-09-27  Richard Biener  <rguenther@suse.de>

	* doc/invoke.texi (graphite-max-bbs-per-function): Remove.
	(graphite-max-nb-scop-params): Document special value zero.
	* domwalk.h (dom_walker::STOP): New symbolical constant.
	(dom_walker::dom_walker): Add optional parameter for bb to
	RPO mapping.
	(dom_walker::~dom_walker): Declare.
	(dom_walker::before_dom_children): Document STOP return value.
	(dom_walker::m_user_bb_to_rpo): New member.
	(dom_walker::m_bb_to_rpo): Likewise.
	* domwalk.c (dom_walker::dom_walker): Compute bb to RPO
	mapping here if not provided by the user.
	(dom_walker::~dom_walker): Free bb to RPO mapping if not
	provided by the user.
	(dom_walker::STOP): Define.
	(dom_walker::walk): Do not compute bb to RPO mapping here.
	Support STOP return value from before_dom_children to stop
	walking.
	* graphite-optimize-isl.c (optimize_isl): If the schedule
	is the same still generate code if -fgraphite-identity
	or -floop-parallelize-all are given.
	* graphite-scop-detection.c: Include cfganal.h.
	(gather_bbs::gather_bbs): Get and pass through bb to RPO
	mapping.
	(gather_bbs::before_dom_children): Return STOP for BBs
	not in the region.
	(build_scops): Compute bb to RPO mapping and pass it to
	the domwalk.  Treat --param graphite-max-nb-scop-params=0
	as not limiting the number of params.
	* graphite.c (graphite_initialize): Remove limit on the
	number of basic-blocks in a function.
	* params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove.
	(PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented
	default value of 10.

Comments

Sebastian Pop Sept. 28, 2017, 8:12 p.m. UTC | #1
On Wed, Sep 27, 2017 at 6:07 AM, Richard Biener <rguenther@suse.de> wrote:
>
> This removes another quadraticness from SCOP detection, gather_bbs
> domwalk.  This is done by enhancing domwalk to handle SEME regions
> via a special return value from before_dom_children.
>
> With this I'm now confident to remove the
> PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit.
> Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its
> documented default value which enables 90 more loos to be processed
> in SPEC CPU 2006.  I've also made a value of zero magic in disabling
> the limit (a trick commonly used in GCC).
>
> Statistics I have gathered a few patches before for SPEC CPU 2006:
>
> 1255 multi-loop SESEs in SCOP processing
> max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8
> max. drs per scop 869, 10 scops >= 100
> max. pbbs per scop 36, 12 scops >= 10
> 919 SCOPs fail in build_alias_sets
>
> which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP
> is reasonable (if tuned to SPEC CPU 2006).
>
> I've also included the hunk that allows -fgraphite-identity
> to work ontop of -floop-nest-optimize and for -floop-nest-optimize
> -ftree-parallelize-all also make sure to code-gen loops that
> end up not transformed.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006
> tested, applied to trunk.
>
> Richard.
>
> 2017-09-27  Richard Biener  <rguenther@suse.de>
>
>         * doc/invoke.texi (graphite-max-bbs-per-function): Remove.
>         (graphite-max-nb-scop-params): Document special value zero.
>         * domwalk.h (dom_walker::STOP): New symbolical constant.
>         (dom_walker::dom_walker): Add optional parameter for bb to
>         RPO mapping.
>         (dom_walker::~dom_walker): Declare.
>         (dom_walker::before_dom_children): Document STOP return value.
>         (dom_walker::m_user_bb_to_rpo): New member.
>         (dom_walker::m_bb_to_rpo): Likewise.
>         * domwalk.c (dom_walker::dom_walker): Compute bb to RPO
>         mapping here if not provided by the user.
>         (dom_walker::~dom_walker): Free bb to RPO mapping if not
>         provided by the user.
>         (dom_walker::STOP): Define.
>         (dom_walker::walk): Do not compute bb to RPO mapping here.
>         Support STOP return value from before_dom_children to stop
>         walking.
>         * graphite-optimize-isl.c (optimize_isl): If the schedule
>         is the same still generate code if -fgraphite-identity
>         or -floop-parallelize-all are given.
>         * graphite-scop-detection.c: Include cfganal.h.
>         (gather_bbs::gather_bbs): Get and pass through bb to RPO
>         mapping.
>         (gather_bbs::before_dom_children): Return STOP for BBs
>         not in the region.
>         (build_scops): Compute bb to RPO mapping and pass it to
>         the domwalk.  Treat --param graphite-max-nb-scop-params=0
>         as not limiting the number of params.
>         * graphite.c (graphite_initialize): Remove limit on the
>         number of basic-blocks in a function.
>         * params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove.
>         (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented
>         default value of 10.

The patch looks good.  Thanks!

>
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi (revision 253224)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -10512,13 +10512,9 @@ sequence pairs.  This option only applie
>  @item graphite-max-nb-scop-params
>  To avoid exponential effects in the Graphite loop transforms, the
>  number of parameters in a Static Control Part (SCoP) is bounded.  The
> -default value is 10 parameters.

Now that we have "compute-out" functionality in all supported
versions of isl, let's remove this parameter.

We needed this in the past when isl was not able to stop an
exponential computation, and that happened when operating
on large dimension spaces.
Sebastian Pop Sept. 28, 2017, 8:29 p.m. UTC | #2
On Wed, Sep 27, 2017 at 6:07 AM, Richard Biener <rguenther@suse.de> wrote:
>  /* Maximal number of array references in a scop.  */
>
DEFPARAM (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP,
          "graphite-max-arrays-per-scop",
          "maximum number of arrays per scop.",
          100, 0, 0)

Let's also remove this param as we now have max-isl-operations.

Thanks,
Sebastian
diff mbox series

Patch

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 253224)
+++ gcc/doc/invoke.texi	(working copy)
@@ -10512,13 +10512,9 @@  sequence pairs.  This option only applie
 @item graphite-max-nb-scop-params
 To avoid exponential effects in the Graphite loop transforms, the
 number of parameters in a Static Control Part (SCoP) is bounded.  The
-default value is 10 parameters.  A variable whose value is unknown at
-compilation time and defined outside a SCoP is a parameter of the SCoP.
-
-@item graphite-max-bbs-per-function
-To avoid exponential effects in the detection of SCoPs, the size of
-the functions analyzed by Graphite is bounded.  The default value is
-100 basic blocks.
+default value is 10 parameters, a value of zero can be used to lift
+the bound.  A variable whose value is unknown at compilation time and
+defined outside a SCoP is a parameter of the SCoP.
 
 @item loop-block-tile-size
 Loop blocking or strip mining transforms, enabled with
Index: gcc/domwalk.c
===================================================================
--- gcc/domwalk.c	(revision 253224)
+++ gcc/domwalk.c	(working copy)
@@ -174,13 +174,29 @@  sort_bbs_postorder (basic_block *bbs, in
    If SKIP_UNREACHBLE_BLOCKS is true, then we need to set
    EDGE_EXECUTABLE on every edge in the CFG. */
 dom_walker::dom_walker (cdi_direction direction,
-			bool skip_unreachable_blocks)
+			bool skip_unreachable_blocks,
+			int *bb_index_to_rpo)
   : m_dom_direction (direction),
     m_skip_unreachable_blocks (skip_unreachable_blocks),
-    m_unreachable_dom (NULL)
+    m_user_bb_to_rpo (bb_index_to_rpo != NULL),
+    m_unreachable_dom (NULL),
+    m_bb_to_rpo (bb_index_to_rpo)
 {
+  /* Compute the basic-block index to RPO mapping if not provided by
+     the user.  */
+  if (! m_bb_to_rpo && direction == CDI_DOMINATORS)
+    {
+      int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+      int postorder_num = pre_and_rev_post_order_compute (NULL, postorder,
+							  true);
+      m_bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
+      for (int i = 0; i < postorder_num; ++i)
+	m_bb_to_rpo[postorder[i]] = i;
+      free (postorder);
+    }
+
   /* If we are not skipping unreachable blocks, then there is nothing
-     to do.  */
+     further to do.  */
   if (!m_skip_unreachable_blocks)
     return;
 
@@ -194,6 +210,14 @@  dom_walker::dom_walker (cdi_direction di
     }
 }
 
+/* Destructor.  */
+
+dom_walker::~dom_walker ()
+{
+  if (! m_user_bb_to_rpo)
+    free (m_bb_to_rpo);
+}
+
 /* Return TRUE if BB is reachable, false otherwise.  */
 
 bool
@@ -254,6 +278,8 @@  dom_walker::propagate_unreachable_to_edg
     m_unreachable_dom = bb;
 }
 
+const edge dom_walker::STOP = (edge)-1;
+
 /* Recursively walk the dominator tree.
    BB is the basic block we are currently visiting.  */
 
@@ -264,17 +290,7 @@  dom_walker::walk (basic_block bb)
   basic_block *worklist = XNEWVEC (basic_block,
 				   n_basic_blocks_for_fn (cfun) * 2);
   int sp = 0;
-  int *postorder, postorder_num;
-
-  if (m_dom_direction == CDI_DOMINATORS)
-    {
-      postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
-      postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
-      bb_postorder = XNEWVEC (int, last_basic_block_for_fn (cfun));
-      for (int i = 0; i < postorder_num; ++i)
-	bb_postorder[postorder[i]] = i;
-      free (postorder);
-    }
+  bb_postorder = m_bb_to_rpo;
 
   while (true)
     {
@@ -283,13 +299,14 @@  dom_walker::walk (basic_block bb)
 	  || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)
 	  || bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
 	{
+	  edge taken_edge = NULL;
 
 	  /* Callback for subclasses to do custom things before we have walked
 	     the dominator children, but before we walk statements.  */
 	  if (this->bb_reachable (cfun, bb))
 	    {
-	      edge taken_edge = before_dom_children (bb);
-	      if (taken_edge)
+	      taken_edge = before_dom_children (bb);
+	      if (taken_edge && taken_edge != STOP)
 		{
 		  edge_iterator ei;
 		  edge e;
@@ -306,12 +323,17 @@  dom_walker::walk (basic_block bb)
 	  worklist[sp++] = bb;
 	  worklist[sp++] = NULL;
 
-	  int saved_sp = sp;
-	  for (dest = first_dom_son (m_dom_direction, bb);
-	       dest; dest = next_dom_son (m_dom_direction, dest))
-	    worklist[sp++] = dest;
-	  if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
-	    sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
+	  /* If the callback returned NONE then we are supposed to
+	     stop and not even propagate EDGE_EXECUTABLE further.  */
+	  if (taken_edge != STOP)
+	    {
+	      int saved_sp = sp;
+	      for (dest = first_dom_son (m_dom_direction, bb);
+		   dest; dest = next_dom_son (m_dom_direction, dest))
+		worklist[sp++] = dest;
+	      if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
+		sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
+	    }
 	}
       /* NULL is used to mark pop operations in the recursion stack.  */
       while (sp > 0 && !worklist[sp - 1])
@@ -331,10 +353,6 @@  dom_walker::walk (basic_block bb)
       else
 	break;
     }
-  if (m_dom_direction == CDI_DOMINATORS)
-    {
-      free (bb_postorder);
-      bb_postorder = NULL;
-    }
+  bb_postorder = NULL;
   free (worklist);
 }
Index: gcc/domwalk.h
===================================================================
--- gcc/domwalk.h	(revision 253224)
+++ gcc/domwalk.h	(working copy)
@@ -30,14 +30,22 @@  along with GCC; see the file COPYING3.
 class dom_walker
 {
 public:
+  static const edge STOP;
+
   /* Use SKIP_UNREACHBLE_BLOCKS = true when your client can discover
      that some edges are not executable.
 
      If a client can discover that a COND, SWITCH or GOTO has a static
      target in the before_dom_children callback, the taken edge should
      be returned.  The generic walker will clear EDGE_EXECUTABLE on all
-     edges it can determine are not executable.  */
-  dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false);
+     edges it can determine are not executable.
+     
+     You can provide a mapping of basic-block index to RPO if you
+     have that readily available or you do multiple walks.  */
+  dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false,
+	      int *bb_index_to_rpo = NULL);
+
+  ~dom_walker ();
 
   /* Walk the dominator tree.  */
   void walk (basic_block);
@@ -48,7 +56,10 @@  public:
      edges, NULL otherwise.  When skipping unreachable blocks, the walker
      uses the taken edge information to clear EDGE_EXECUTABLE on the other
      edges, exposing unreachable blocks.  A NULL return value means all
-     outgoing edges should still be considered executable.  */
+     outgoing edges should still be considered executable.  A return value
+     of STOP means to stop the domwalk from processing dominated blocks from
+     here.  This can be used to process a SEME region only (note domwalk
+     will still do work linear in function size).  */
   virtual edge before_dom_children (basic_block) { return NULL; }
 
   /* Function to call after the recursive walk of the dominator children.  */
@@ -61,7 +72,9 @@  private:
      dominator tree.  */
   const ENUM_BITFIELD (cdi_direction) m_dom_direction : 2;
   bool m_skip_unreachable_blocks;
+  bool m_user_bb_to_rpo;
   basic_block m_unreachable_dom;
+  int *m_bb_to_rpo;
 
   /* Query whether or not the given block is reachable or not.  */
   bool bb_reachable (struct function *, basic_block);
Index: gcc/graphite-optimize-isl.c
===================================================================
--- gcc/graphite-optimize-isl.c	(revision 253224)
+++ gcc/graphite-optimize-isl.c	(working copy)
@@ -189,7 +189,7 @@  optimize_isl (scop_p scop)
 	print_schedule_ast (dump_file, scop->original_schedule, scop);
       isl_schedule_free (scop->transformed_schedule);
       scop->transformed_schedule = isl_schedule_copy (scop->original_schedule);
-      return false;
+      return flag_graphite_identity || flag_loop_parallelize_all;
     }
 
   return true;
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c	(revision 253224)
+++ gcc/graphite-scop-detection.c	(working copy)
@@ -48,6 +48,7 @@  along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "tree-ssa-propagate.h"
 #include "gimple-pretty-print.h"
+#include "cfganal.h"
 #include "graphite.h"
 
 class debug_printer
@@ -1544,7 +1545,7 @@  build_alias_set (scop_p scop)
 class gather_bbs : public dom_walker
 {
 public:
-  gather_bbs (cdi_direction, scop_p);
+  gather_bbs (cdi_direction, scop_p, int *);
 
   virtual edge before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
@@ -1554,8 +1555,8 @@  private:
   scop_p scop;
 };
 }
-gather_bbs::gather_bbs (cdi_direction direction, scop_p scop)
-  : dom_walker (direction), scop (scop)
+gather_bbs::gather_bbs (cdi_direction direction, scop_p scop, int *bb_to_rpo)
+  : dom_walker (direction, false, bb_to_rpo), scop (scop)
 {
 }
 
@@ -1589,7 +1590,7 @@  gather_bbs::before_dom_children (basic_b
 {
   sese_info_p region = scop->scop_info;
   if (!bb_in_sese_p (bb, region->region))
-    return NULL;
+    return dom_walker::STOP;
 
   record_loop_in_sese (bb, region);
 
@@ -1708,6 +1709,15 @@  build_scops (vec<scop_p> *scops)
 
   /* Now create scops from the lightweight SESEs.  */
   vec<sese_l> scops_l = sb.get_scops ();
+
+  /* Domwalk needs a bb to RPO mapping.  Compute it once here.  */
+  int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+  int postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
+  int *bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
+  for (int i = 0; i < postorder_num; ++i)
+    bb_to_rpo[postorder[i]] = i;
+  free (postorder);
+
   int i;
   sese_l *s;
   FOR_EACH_VEC_ELT (scops_l, i, s)
@@ -1715,7 +1725,7 @@  build_scops (vec<scop_p> *scops)
       scop_p scop = new_scop (s->entry, s->exit);
 
       /* Record all basic blocks and their conditions in REGION.  */
-      gather_bbs (CDI_DOMINATORS, scop).walk (s->entry->dest);
+      gather_bbs (CDI_DOMINATORS, scop, bb_to_rpo).walk (s->entry->dest);
 
       /* domwalk does not fulfil our code-generations constraints on the
          order of pbb which is to produce sth like execution order, delaying
@@ -1757,8 +1767,8 @@  build_scops (vec<scop_p> *scops)
 
       find_scop_parameters (scop);
       graphite_dim_t max_dim = PARAM_VALUE (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS);
-
-      if (scop_nb_params (scop) > max_dim)
+      if (max_dim > 0
+	  && scop_nb_params (scop) > max_dim)
 	{
 	  DEBUG_PRINT (dp << "[scop-detection-fail] too many parameters: "
 			  << scop_nb_params (scop)
@@ -1771,6 +1781,7 @@  build_scops (vec<scop_p> *scops)
       scops->safe_push (scop);
     }
 
+  free (bb_to_rpo);
   DEBUG_PRINT (dp << "number of SCoPs: " << (scops ? scops->length () : 0););
 }
 
Index: gcc/graphite.c
===================================================================
--- gcc/graphite.c	(revision 253224)
+++ gcc/graphite.c	(working copy)
@@ -218,14 +218,9 @@  static bool
 graphite_initialize (void)
 {
   int min_loops = PARAM_VALUE (PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION);
-  int max_bbs = PARAM_VALUE (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION);
-  int nbbs = n_basic_blocks_for_fn (cfun);
   int nloops = number_of_loops (cfun);
 
-  if (nloops <= min_loops
-      /* FIXME: This limit on the number of basic blocks of a function
-	 should be removed when the SCOP detection is faster.  */
-      || (nbbs > max_bbs))
+  if (nloops <= min_loops)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -234,10 +229,6 @@  graphite_initialize (void)
 		     "PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION = %d.\n",
 		     min_loops);
 
-	  else if (nbbs > max_bbs)
-	    fprintf (dump_file, "\nFunction has too many basic blocks: "
-		     "PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION = %d.\n", max_bbs);
-
 	  fprintf (dump_file, "\nnumber of SCoPs: 0\n");
 	  print_global_statistics (dump_file);
 	}
Index: gcc/params.def
===================================================================
--- gcc/params.def	(revision 253224)
+++ gcc/params.def	(working copy)
@@ -873,14 +873,7 @@  DEFPARAM (PARAM_LOOP_BLOCK_TILE_SIZE,
 DEFPARAM (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS,
 	  "graphite-max-nb-scop-params",
 	  "maximum number of parameters in a SCoP.",
-	  7, 0, 0)
-
-/* Maximal number of basic blocks in the functions analyzed by Graphite.  */
-
-DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
-	  "graphite-max-bbs-per-function",
-	  "maximum number of basic blocks per function to be analyzed by Graphite.",
-	  100, 0, 0)
+	  10, 0, 0)
 
 /* Maximal number of array references in a scop.  */