diff mbox

[PR67476] Add param parloops-schedule

Message ID 55F6DBD7.9000109@mentor.com
State New
Headers show

Commit Message

Tom de Vries Sept. 14, 2015, 2:38 p.m. UTC
On 14/09/15 16:28, Tom de Vries wrote:
> On 14/09/15 11:09, Jakub Jelinek wrote:
>> On Fri, Sep 11, 2015 at 03:28:01PM +0200, Tom de Vries wrote:
>>> On 11/09/15 12:57, Jakub Jelinek wrote:
>>>> On Fri, Sep 11, 2015 at 12:55:00PM +0200, Tom de Vries wrote:
>>>>>> Hi,
>>>>>>
>>>>>> this patch adds a param parloops-schedule=<0-4>, which sets the
>>>>>> omp schedule
>>>>>> for loops paralellized by parloops.
>>>>>>
>>>>>> The <0-4> maps onto <static|dynamic|guided|auto|runtime>.
>>>>>>
>>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>>
>>>>>> OK for trunk?
>>>> I don't really like it, the mapping of the integers to the enum values
>>>> is non-obvious and hard to remember.
>>>> Perhaps add support for enumeration params if you want this instead?
>>>>
>>>
>>> This patch adds handling of a DEFPARAMENUM macro, which is similar to
>>> the
>>> DEFPARAM macro, but allows the values to be named.
>>>
>>> So the definition of param parloop-schedule becomes:
>>> ...
>>> DEFPARAMENUM PARAM_PARLOOPS_SCHEDULE,
>>>               "parloops-schedule",
>>>               "Schedule type of omp schedule for loops parallelized by "
>>>               "parloops (static, dynamic, guided, auto, runtime)",
>>>               0, 0, 4, "static", "dynamic", "guided", "auto", "runtime")
>>> ...
>>> [ I'll repost the original patch containing this update. ]
>>>
>>> OK for trunk if x86_64 bootstrap and reg-test succeeds?
>>
>> That still allows numeric arguments for the param, which is IMHO
>> undesirable.  If it is enum kind, only the enum values should be
>> accepted.
>
> Fixed.
>
>> Also, it would be nice if params.h in that case would define an enum with
>> the values like
>> PARAM_PARLOOPS_SCHEDULE_KIND_{static,dynamic,guided,auto,runtime}, so use
>> values not wrapped in ""s and only in a macro or generator make both
>> enums and string array out of that.
>
> Done, there's now a file params-enum.h containing these enums.
>
>> There is also the question if we can use __VA_ARGS__, isn't that C99 or
>> C++11 and later feature?  I see gengtype.h and ipa-icf-gimple.h use
>> that too, so maybe yes, but am not sure.
>>
>
> I've removed the use of variadic macros, meaning we now use
> DEFPARAMENUM5 instead of DEFPARAMENUM.
>
> Also, I've remove the min/max arguments in DEFPARAMENUM5.
>
> And I've ensured that the default is now specified as a string rather
> than an integer.
>
> So the new definition of PARAM_PARLOOPS_SCHEDULE looks like:
>
> DEFPARAMENUM5 (PARAM_PARLOOPS_SCHEDULE,
>                "parloops-schedule",
>                "Schedule type of omp schedule for loops parallelized by "
>                "parloops (static, dynamic, guided, auto, runtime)",
>                static,
>                static, dynamic, guided, auto, runtime)
>
> [ Again, I'll repost the original patch containing this update. ]

This patch adds param parloops-schedule (and now uses the enum 
associated with the param).

OK for trunk, if bootstrap and reg-test on x86_64 succeeds?

Thanks,
- Tom
diff mbox

Patch

Add param parloops-schedule

2015-09-10  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/67476
	* doc/invoke.texi (@item parloops-schedule): New item.
	* omp-low.c (expand_omp_for_generic): Handle simple latch.  Add missing
	phis.  Handle original loop.
	* params.def (PARAM_PARLOOPS_SCHEDULE): New DEFPARAMENUM5.
	* tree-parloops.c: Include params-enum.h.
	(create_parallel_loop): Handle PARAM_PARLOOPS_SCHEDULE.

	* testsuite/libgomp.c/autopar-3.c: New test.
	* testsuite/libgomp.c/autopar-4.c: New test.
	* testsuite/libgomp.c/autopar-5.c: New test.
	* testsuite/libgomp.c/autopar-6.c: New test.
	* testsuite/libgomp.c/autopar-7.c: New test.
	* testsuite/libgomp.c/autopar-8.c: New test.
---
 gcc/doc/invoke.texi                     |  4 +++
 gcc/omp-low.c                           | 57 +++++++++++++++++++++++++++++++--
 gcc/params.def                          | 12 +++++++
 gcc/tree-parloops.c                     | 26 ++++++++++++++-
 libgomp/testsuite/libgomp.c/autopar-3.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-4.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-5.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-6.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-7.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-8.c |  4 +++
 10 files changed, 119 insertions(+), 4 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-8.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 76e5e29..2221795 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11005,6 +11005,10 @@  automaton.  The default is 50.
 Chunk size of omp schedule for loops parallelized by parloops.  The default
 is 0.
 
+@item parloops-schedule
+Schedule type of omp schedule for loops parallelized by parloops (static,
+dynamic, guided, auto, runtime).  The default is static.
+
 @end table
 @end table
 
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 88a5149..4f0498b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -239,6 +239,7 @@  static vec<omp_context *> taskreg_contexts;
 
 static void scan_omp (gimple_seq *, omp_context *);
 static tree scan_omp_1_op (tree *, int *, void *);
+static gphi *find_phi_with_arg_on_edge (tree, edge);
 
 #define WALK_SUBSTMTS  \
     case GIMPLE_BIND: \
@@ -6155,7 +6156,9 @@  expand_omp_for_generic (struct omp_region *region,
   if (!broken_loop)
     {
       l2_bb = create_empty_bb (cont_bb);
-      gcc_assert (BRANCH_EDGE (cont_bb)->dest == l1_bb);
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == l1_bb
+		  || (single_succ_edge (BRANCH_EDGE (cont_bb)->dest)->dest
+		      == l1_bb));
       gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
     }
   else
@@ -6429,8 +6432,12 @@  expand_omp_for_generic (struct omp_region *region,
       remove_edge (e);
 
       make_edge (cont_bb, l2_bb, EDGE_FALSE_VALUE);
-      add_bb_to_loop (l2_bb, cont_bb->loop_father);
       e = find_edge (cont_bb, l1_bb);
+      if (e == NULL)
+	{
+	  e = BRANCH_EDGE (cont_bb);
+	  gcc_assert (single_succ (e->dest) == l1_bb);
+	}
       if (gimple_omp_for_combined_p (fd->for_stmt))
 	{
 	  remove_edge (e);
@@ -6454,7 +6461,45 @@  expand_omp_for_generic (struct omp_region *region,
 	  e->flags = EDGE_FALLTHRU;
 	}
       make_edge (l2_bb, l0_bb, EDGE_TRUE_VALUE);
+    }
+
+
+  if (gimple_in_ssa_p (cfun))
+    {
+      gphi_iterator psi;
+
+      for (psi = gsi_start_phis (l3_bb); !gsi_end_p (psi); gsi_next (&psi))
+	{
+	  source_location locus;
+	  gphi *nphi;
+	  gphi *exit_phi = psi.phi ();
+
+	  edge l2_to_l3 = find_edge (l2_bb, l3_bb);
+	  tree exit_res = PHI_ARG_DEF_FROM_EDGE (exit_phi, l2_to_l3);
 
+	  basic_block latch = BRANCH_EDGE (cont_bb)->dest;
+	  edge latch_to_l1 = find_edge (latch, l1_bb);
+	  gphi *inner_phi = find_phi_with_arg_on_edge (exit_res, latch_to_l1);
+
+	  tree t = gimple_phi_result (exit_phi);
+	  tree new_res = copy_ssa_name (t, NULL);
+	  nphi = create_phi_node (new_res, l0_bb);
+
+	  edge l0_to_l1 = find_edge (l0_bb, l1_bb);
+	  t = PHI_ARG_DEF_FROM_EDGE (inner_phi, l0_to_l1);
+	  locus = gimple_phi_arg_location_from_edge (inner_phi, l0_to_l1);
+	  edge entry_to_l0 = find_edge (entry_bb, l0_bb);
+	  add_phi_arg (nphi, t, entry_to_l0, locus);
+
+	  edge l2_to_l0 = find_edge (l2_bb, l0_bb);
+	  add_phi_arg (nphi, exit_res, l2_to_l0, UNKNOWN_LOCATION);
+
+	  add_phi_arg (inner_phi, new_res, l0_to_l1, UNKNOWN_LOCATION);
+	};
+    }
+
+  if (!broken_loop)
+    {
       set_immediate_dominator (CDI_DOMINATORS, l2_bb,
 			       recompute_dominator (CDI_DOMINATORS, l2_bb));
       set_immediate_dominator (CDI_DOMINATORS, l3_bb,
@@ -6464,14 +6509,20 @@  expand_omp_for_generic (struct omp_region *region,
       set_immediate_dominator (CDI_DOMINATORS, l1_bb,
 			       recompute_dominator (CDI_DOMINATORS, l1_bb));
 
+      struct loop *loop = l1_bb->loop_father;
+      add_bb_to_loop (l2_bb, entry_bb->loop_father);
+
       struct loop *outer_loop = alloc_loop ();
       outer_loop->header = l0_bb;
       outer_loop->latch = l2_bb;
       add_loop (outer_loop, l0_bb->loop_father);
 
+      if (loop != entry_bb->loop_father)
+	return;
+
       if (!gimple_omp_for_combined_p (fd->for_stmt))
 	{
-	  struct loop *loop = alloc_loop ();
+	  loop = alloc_loop ();
 	  loop->header = l1_bb;
 	  /* The loop may have multiple latches.  */
 	  add_loop (loop, outer_loop);
diff --git a/gcc/params.def b/gcc/params.def
index 34e2025..76699d4 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -35,6 +35,11 @@  along with GCC; see the file COPYING3.  If not see
      - The maximum acceptable value for the parameter (if greater than
      the minimum).
 
+   The DEFPARAMENUM<N> macro is similar, but instead of the minumum and maximum
+   arguments, it contains a list of <N> allowed strings, corresponding to
+   integer values 0..<N>-1.  Note that the default argument needs to be
+   specified as one of the allowed strings, rather than an integer value.
+
    Be sure to add an entry to invoke.texi summarizing the parameter.  */
 
 /* When branch is predicted to be taken with probability lower than this
@@ -1145,6 +1150,13 @@  DEFPARAM (PARAM_PARLOOPS_CHUNK_SIZE,
 	  "parloops-chunk-size",
 	  "Chunk size of omp schedule for loops parallelized by parloops",
 	  0, 0, 0)
+
+DEFPARAMENUM5 (PARAM_PARLOOPS_SCHEDULE,
+	       "parloops-schedule",
+	       "Schedule type of omp schedule for loops parallelized by "
+	       "parloops (static, dynamic, guided, auto, runtime)",
+	       static,
+	       static, dynamic, guided, auto, runtime)
 /*
 
 Local variables:
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index c164121..4df631e 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -58,6 +58,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "tree-ssa.h"
 #include "params.h"
+#include "params-enum.h"
 
 /* This pass tries to distribute iterations of loops into several threads.
    The implementation is straightforward -- for each loop we test whether its
@@ -2092,8 +2093,31 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   gimple_cond_set_lhs (cond_stmt, cvar_base);
   type = TREE_TYPE (cvar);
   t = build_omp_clause (loc, OMP_CLAUSE_SCHEDULE);
-  OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
   int chunk_size = PARAM_VALUE (PARAM_PARLOOPS_CHUNK_SIZE);
+  enum PARAM_PARLOOPS_SCHEDULE_KIND schedule_type \
+    = (enum PARAM_PARLOOPS_SCHEDULE_KIND) PARAM_VALUE (PARAM_PARLOOPS_SCHEDULE);
+  switch (schedule_type)
+    {
+    case PARAM_PARLOOPS_SCHEDULE_KIND_static:
+      OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
+      break;
+    case PARAM_PARLOOPS_SCHEDULE_KIND_dynamic:
+      OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_DYNAMIC;
+      break;
+    case PARAM_PARLOOPS_SCHEDULE_KIND_guided:
+      OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_GUIDED;
+      break;
+    case PARAM_PARLOOPS_SCHEDULE_KIND_auto:
+      OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_AUTO;
+      chunk_size = 0;
+      break;
+    case PARAM_PARLOOPS_SCHEDULE_KIND_runtime:
+      OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_RUNTIME;
+      chunk_size = 0;
+      break;
+    default:
+      gcc_unreachable ();
+    }
   if (chunk_size != 0)
     OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (t)
       = build_int_cst (integer_type_node, chunk_size);
diff --git a/libgomp/testsuite/libgomp.c/autopar-3.c b/libgomp/testsuite/libgomp.c/autopar-3.c
new file mode 100644
index 0000000..1c25a44
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/autopar-3.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=4 -ffast-math --param parloops-schedule=dynamic" } */
+
+#include "autopar-1.c"
diff --git a/libgomp/testsuite/libgomp.c/autopar-4.c b/libgomp/testsuite/libgomp.c/autopar-4.c
new file mode 100644
index 0000000..78c77d9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/autopar-4.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=4 -ffast-math --param parloops-schedule=dynamic --param parloops-chunk-size=100" } */
+
+#include "autopar-1.c"
diff --git a/libgomp/testsuite/libgomp.c/autopar-5.c b/libgomp/testsuite/libgomp.c/autopar-5.c
new file mode 100644
index 0000000..f0acb20
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/autopar-5.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=4 -ffast-math --param parloops-schedule=guided" } */
+
+#include "autopar-1.c"
diff --git a/libgomp/testsuite/libgomp.c/autopar-6.c b/libgomp/testsuite/libgomp.c/autopar-6.c
new file mode 100644
index 0000000..f6e723e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/autopar-6.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=4 -ffast-math --param parloops-schedule=guided --param parloops-chunk-size=100" } */
+
+#include "autopar-1.c"
diff --git a/libgomp/testsuite/libgomp.c/autopar-7.c b/libgomp/testsuite/libgomp.c/autopar-7.c
new file mode 100644
index 0000000..5f15508f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/autopar-7.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=4 -ffast-math --param parloops-schedule=auto" } */
+
+#include "autopar-1.c"
diff --git a/libgomp/testsuite/libgomp.c/autopar-8.c b/libgomp/testsuite/libgomp.c/autopar-8.c
new file mode 100644
index 0000000..6099e9f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/autopar-8.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=4 -ffast-math --param parloops-schedule=runtime" } */
+
+#include "autopar-1.c"
-- 
1.9.1