
gomp-nvptx branch - middle-end changes

Message ID: 20161111081221.GP3541@tucnak.redhat.com
State: New

Commit Message

Jakub Jelinek Nov. 11, 2016, 8:12 a.m. UTC
On Thu, Nov 10, 2016 at 08:12:27PM +0300, Alexander Monakov wrote:
> gcc/
> 	* internal-fn.c (expand_GOMP_SIMT_LANE): New.
> 	(expand_GOMP_SIMT_VF): New.
> 	(expand_GOMP_SIMT_LAST_LANE): New.
> 	(expand_GOMP_SIMT_ORDERED_PRED): New.
> 	(expand_GOMP_SIMT_VOTE_ANY): New.
> 	(expand_GOMP_SIMT_XCHG_BFLY): New.
> 	(expand_GOMP_SIMT_XCHG_IDX): New.
> 	* internal-fn.def (GOMP_SIMT_LANE): New.
> 	(GOMP_SIMT_VF): New.
> 	(GOMP_SIMT_LAST_LANE): New.
> 	(GOMP_SIMT_ORDERED_PRED): New.
> 	(GOMP_SIMT_VOTE_ANY): New.
> 	(GOMP_SIMT_XCHG_BFLY): New.
> 	(GOMP_SIMT_XCHG_IDX): New.
> 	* omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
> 	(create_omp_child_function): ...here.  Set "omp target entrypoint"
> 	or "omp declare target" attribute based on is_gimple_omp_offloaded.
> 	(omp_max_simt_vf): New.  Use it...
> 	(omp_max_vf): ...here.
> 	(lower_rec_input_clauses): Add reduction lowering for SIMT execution.
> 	(lower_lastprivate_clauses): Likewise, for "lastprivate" lowering.
> 	(lower_omp_ordered): Likewise, for "ordered" lowering.
> 	(expand_omp_simd): Add SIMT transforms.
> 	(pass_data_lower_omp): Add PROP_gimple_lomp_dev.
> 	(execute_omp_device_lower): New.
> 	(pass_data_omp_device_lower): New.
> 	(pass_omp_device_lower): New pass.
> 	(make_pass_omp_device_lower): New.
> 	* passes.def (pass_omp_device_lower): Position new pass.
> 	* tree-pass.h (PROP_gimple_lomp_dev): Define.
> 	(make_pass_omp_device_lower): Declare.

Ok for trunk, once the needed corresponding config/nvptx bits are committed,
with one nit below that needs immediate action; the rest can be resolved
incrementally.  Afterwards I'd like to check in the attached patch, at least
for now, so that non-offloaded SIMD code is less affected.  Once you have
the intended outlining of SIMT regions for PTX offloading done (IMHO the
best place to do that is in omp expansion, not gimplification), you can
either base it on that, or revert it and do it earlier.

> +
> +/* Return maximum SIMT width if offloading may target SIMT hardware.  */
> +
> +static int
> +omp_max_simt_vf (void)
> +{
> +  if (!optimize)
> +    return 0;
> +  if (ENABLE_OFFLOADING)
> +    for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c; )
> +      {
> +	if (!strncmp (c, "nvptx", strlen ("nvptx")))
> +	  return 32;
> +	else if ((c = strchr (c, ',')))
> +	  c++;
> +      }
> +  return 0;
> +}

As discussed privately, this means one has to manually set OFFLOAD_TARGET_NAMES
in the environment when invoking ./cc1 or ./cc1plus in order to match the
behavior of ./gcc -B ./ etc.  I think it would be better to change the driver
so that it sets OFFLOAD_TARGET_NAMES= (i.e. empty) in the environment when
ENABLE_OFFLOADING but the -foffload option is used to disable all offloading,
and then have this function use the configured-in offloading targets if
ENABLE_OFFLOADING and OFFLOAD_TARGET_NAMES is not in the environment.  Can be
done incrementally.
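
For illustration, that incremental change might look roughly like the
following (only a sketch, not part of this patch; the OFFLOAD_TARGETS
configure-time macro as the fallback source and the empty-string convention
when -foffload disables offloading are assumptions here):

static int
omp_max_simt_vf (void)
{
  if (!optimize)
    return 0;
  if (ENABLE_OFFLOADING)
    {
      /* Prefer the list the driver put into the environment; if it is
         absent, fall back to the offload targets configured into the
         compiler (assumed available as the OFFLOAD_TARGETS string).  An
         empty string would mean -foffload disabled all offloading.  */
      const char *c = getenv ("OFFLOAD_TARGET_NAMES");
      if (c == NULL)
        c = OFFLOAD_TARGETS;
      while (c && *c)
        {
          if (!strncmp (c, "nvptx", strlen ("nvptx")))
            return 32;
          else if ((c = strchr (c, ',')))
            c++;
        }
    }
  return 0;
}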

> +
>  /* Return maximum possible vectorization factor for the target.  */
>  
>  static int
> @@ -4277,16 +4306,18 @@ omp_max_vf (void)
>                || global_options_set.x_flag_tree_vectorize)))
>      return 1;
>  
> +  int vf = 1;
>    int vs = targetm.vectorize.autovectorize_vector_sizes ();
>    if (vs)
> +    vf = 1 << floor_log2 (vs);
> +  else
>      {
> -      vs = 1 << floor_log2 (vs);
> -      return vs;
> +      machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> +      if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> +	vf = GET_MODE_NUNITS (vqimode);
>      }
> -  machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> -  if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> -    return GET_MODE_NUNITS (vqimode);
> -  return 1;
> +  int svf = omp_max_simt_vf ();
> +  return MAX (vf, svf);

Increasing the vf even for the host in non-offloaded regions is undesirable.
This can be partly solved by the attached patch I'm planning to apply
incrementally; the other part is the simd modifier of the schedule clause,
where I think what we want is to use a conditional expression
(GOMP_USE_SIMT () ? omp_max_simt_vf () : omp_max_vf ()).  I'll try to handle
the schedule clause later.
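
In GIMPLE terms the intent would be roughly the following (pseudo-dump; 32 and
16 stand in for omp_max_simt_vf () and omp_max_vf (), and pass_omp_device_lower
later folds GOMP_USE_SIMT () to a constant so the dead arm goes away):

  _1 = GOMP_USE_SIMT ();
  _2 = _1 ? 32 : 16;
  /* ... chunking for schedule (simd: static, N) then uses _2 rather than a
     vf fixed unconditionally at omp lowering time ...  */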

> +class pass_omp_device_lower : public gimple_opt_pass
> +{
> +public:
> +  pass_omp_device_lower (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_omp_device_lower, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *fun)
> +    {
> +      /* FIXME: inlining does not propagate the lomp_dev property.  */
> +      return 1 || !(fun->curr_properties & PROP_gimple_lomp_dev);

Please change this into
(ENABLE_OFFLOADING && (flag_openmp || in_lto))
for now, so that we don't waste compile time when it clearly isn't needed,
and incrementally change the inliner to propagate the property.
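
I.e. something along these lines (just a sketch; in_lto_p stands in for
whatever "in lto" check turns out to be appropriate):

  /* opt_pass methods: */
  virtual bool gate (function *)
    {
      /* Only spend compile time here when offloading is configured and
         OpenMP (or LTO of offloaded code) is involved.  */
      return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
    }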

	Jakub
2016-11-11  Jakub Jelinek  <jakub@redhat.com>

	* internal-fn.c (expand_GOMP_USE_SIMT): New function.
	* tree.c (omp_clause_num_ops): OMP_CLAUSE__SIMT_ has 0 operands.
	(omp_clause_code_name): Add _simt_ name.
	(walk_tree_1): Handle OMP_CLAUSE__SIMT_.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SIMT_.
	* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__SIMT_.
	(scan_omp_simd): New function.
	(scan_omp_1_stmt): Use it in target regions if needed.
	(omp_max_vf): Don't max with omp_max_simt_vf.
	(lower_rec_simd_input_clauses): Do it here, only if
	OMP_CLAUSE__SIMT_ is present.
	(lower_rec_input_clauses): Compute maybe_simt from presence of
	OMP_CLAUSE__SIMT_.
	(lower_lastprivate_clauses): Likewise.
	(expand_omp_simd): Likewise.
	(execute_omp_device_lower): Lower IFN_GOMP_USE_SIMT.
	* internal-fn.def (GOMP_USE_SIMT): New internal function.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__SIMT_.




Comments

Alexander Monakov Nov. 11, 2016, 8:52 a.m. UTC | #1
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
[...]
> the intended outlining of SIMT regions for PTX offloading done (IMHO the
> best place to do that is in omp expansion, not gimplification)

Sorry, I couldn't find a good way to implement that during omp expansion.  The
reason I went for gimplification is automatic discovery of sharing clauses -
I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
gimplifier's help.  Does this sound sensible?

Thanks.
Alexander
Jakub Jelinek Nov. 11, 2016, 9:12 a.m. UTC | #2
On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> [...]
> > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > best place to do that is in omp expansion, not gimplification)
> 
> Sorry, I couldn't find a good way to implement that during omp expansion.  The
> reason I went for gimplification is automatic discovery of sharing clauses -
> I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
> gimplifier's help.  Does this sound sensible?

Sure, for discovery of needed sharing clauses the gimplifier has the right
infrastructure.  But that doesn't mean you can't add those clauses at
gimplification time and do the outlining at omp expansion time.
That is what is done for omp parallel, task etc. as well.  If the standard
OpenMP clauses can't serve that purpose, there is always the possibility of
adding further internal clauses, that would e.g. be only considered for the
SIMT stuff.  For the outlining, our current infrastructure really wants to
have CFG etc., something you don't have at gimplification time.

	Jakub
Alexander Monakov Nov. 11, 2016, 9:28 a.m. UTC | #3
On Fri, 11 Nov 2016, Jakub Jelinek wrote:

> On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> > On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > [...]
> > > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > > best place to do that is in omp expansion, not gimplification)
> > 
> > Sorry, I couldn't find a good way to implement that during omp expansion.  The
> > reason I went for gimplification is automatic discovery of sharing clauses -
> > I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
> > gimplifier's help.  Does this sound sensible?
> 
> Sure, for discovery of needed sharing clauses the gimplifier has the right
> infrastructure.  But that doesn't mean you can't add those clauses at
> gimplification time and do the outlining at omp expansion time.
> That is what is done for omp parallel, task etc. as well.  If the standard
> OpenMP clauses can't serve that purpose, there is always the possibility of
> adding further internal clauses, that would e.g. be only considered for the
> SIMT stuff.  For the outlining, our current infrastructure really wants to
> have CFG etc., something you don't have at gimplification time.

Yes, that is exactly what I'm doing. I'm first tweaking the gimplifier to inject
a parallel region with an artificial _simtreg_ clause, transforming

  #pragma omp simd
  for (...)

into

  #pragma omp parallel _simtreg_
    #pragma omp simd
    for (...)

and then expansion of 'omp parallel' can check presence of _simtreg_ clause and
emit a direct call rather than an invocation of GOMP_parallel.

(A few days ago I sent you a patch privately implementing the above.)

Thanks.
Alexander
Jakub Jelinek Nov. 11, 2016, 10:04 a.m. UTC | #4
On Fri, Nov 11, 2016 at 12:28:16PM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> 
> > On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> > > On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > > [...]
> > > > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > > > best place to do that is in omp expansion, not gimplification)
> > > 
> > > Sorry, I couldn't find a good way to implement that during omp expansion.  The
> > > reason I went for gimplification is automatic discovery of sharing clauses -
> > > I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
> > > gimplifier's help.  Does this sound sensible?
> > 
> > Sure, for discovery of needed sharing clauses the gimplifier has the right
> > infrastructure.  But that doesn't mean you can't add those clauses at
> > gimplification time and do the outlining at omp expansion time.
> > That is what is done for omp parallel, task etc. as well.  If the standard
> > OpenMP clauses can't serve that purpose, there is always the possibility of
> > adding further internal clauses, that would e.g. be only considered for the
> > SIMT stuff.  For the outlining, our current infrastructure really wants to
> > have CFG etc., something you don't have at gimplification time.
> 
> Yes, that is exactly what I'm doing. I'm first tweaking the gimplifier to inject
> a parallel region with an artificial _simtreg_ clause, transforming
> 
>   #pragma omp simd
>   for (...)
> 
> into
> 
>   #pragma omp parallel _simtreg_
>     #pragma omp simd
>     for (...)
> 
> and then expansion of 'omp parallel' can check presence of _simtreg_ clause and
> emit a direct call rather than an invocation of GOMP_parallel.

Well, I meant keep #pragma omp simd as is and just add some data-sharing-like
clauses, _simt_shared_(x) or whatever you need.  The omplower versioning patch
I've posted could then e.g. drop those _simt_shared_ (or whatever else you
need) clauses for the omp simd without the _simt_ clause, omp lowering would
do whatever is needed for those _simt_shared_ clauses, and finally omp
expansion would outline it.  Adding omp parallel around the omp simd is just
weird, it has nothing to do with omp parallel.
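
That is, keep the simd construct itself, along the lines of (the
_simt_shared_ spelling is only a placeholder for whatever internal clause
ends up being used):

  #pragma omp simd _simt_shared_(x)   /* internal clause added at gimplification */
  for (...)

with omp lowering acting on the internal clause and omp expansion doing the
outlining.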

	Jakub

Patch

--- gcc/internal-fn.c.jj	2016-10-12 10:38:54.000000000 +0200
+++ gcc/internal-fn.c	2016-10-24 15:25:58.162292706 +0200
@@ -154,6 +154,12 @@  expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+static void
+expand_GOMP_USE_SIMT (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
    without SIMT execution this should be expanded in omp_device_lower pass.  */
 
--- gcc/tree.c.jj	2016-10-12 10:38:56.000000000 +0200
+++ gcc/tree.c	2016-10-24 15:49:48.487890952 +0200
@@ -320,6 +320,7 @@  unsigned const char omp_clause_num_ops[]
   1, /* OMP_CLAUSE_HINT  */
   0, /* OMP_CLAUSE_DEFAULTMAP  */
   1, /* OMP_CLAUSE__SIMDUID_  */
+  0, /* OMP_CLAUSE__SIMT_  */
   1, /* OMP_CLAUSE__CILK_FOR_COUNT_  */
   0, /* OMP_CLAUSE_INDEPENDENT  */
   1, /* OMP_CLAUSE_WORKER  */
@@ -392,6 +393,7 @@  const char * const omp_clause_code_name[
   "hint",
   "defaultmap",
   "_simduid_",
+  "_simt_",
   "_Cilk_for_count_",
   "independent",
   "worker",
@@ -11671,6 +11673,7 @@  walk_tree_1 (tree *tp, walk_tree_fn func
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
 	case OMP_CLAUSE_TILE:
+	case OMP_CLAUSE__SIMT_:
 	  WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
 
 	case OMP_CLAUSE_LASTPRIVATE:
--- gcc/tree-core.h.jj	2016-10-12 10:38:55.000000000 +0200
+++ gcc/tree-core.h	2016-10-24 15:46:48.996193955 +0200
@@ -435,6 +435,10 @@  enum omp_clause_code {
   /* Internally used only clause, holding SIMD uid.  */
   OMP_CLAUSE__SIMDUID_,
 
+  /* Internally used only clause, flag whether this is SIMT simd
+     loop or not.  */
+  OMP_CLAUSE__SIMT_,
+
   /* Internally used only clause, holding _Cilk_for # of iterations
      on OMP_PARALLEL.  */
   OMP_CLAUSE__CILK_FOR_COUNT_,
--- gcc/omp-low.c.jj	2016-10-12 10:38:54.000000000 +0200
+++ gcc/omp-low.c	2016-10-25 17:54:39.563307069 +0200
@@ -275,6 +275,7 @@  static bool omp_any_child_fn_dumped;
 static void scan_omp (gimple_seq *, omp_context *);
 static tree scan_omp_1_op (tree *, int *, void *);
 static gphi *find_phi_with_arg_on_edge (tree, edge);
+static int omp_max_simt_vf (void);
 
 #define WALK_SUBSTMTS  \
     case GIMPLE_BIND: \
@@ -2188,6 +2189,7 @@  scan_sharing_clauses (tree clauses, omp_
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE__SIMT_:
 	  break;
 
 	case OMP_CLAUSE_ALIGNED:
@@ -2363,6 +2365,7 @@  scan_sharing_clauses (tree clauses, omp_
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
 	case OMP_CLAUSE__GRIDDIM_:
+	case OMP_CLAUSE__SIMT_:
 	  break;
 
 	case OMP_CLAUSE_DEVICE_RESIDENT:
@@ -3067,6 +3070,48 @@  scan_omp_for (gomp_for *stmt, omp_contex
   scan_omp (gimple_omp_body_ptr (stmt), ctx);
 }
 
+/* Duplicate #pragma omp simd, one for SIMT, another one for SIMD.  */
+
+static void
+scan_omp_simd (gimple_stmt_iterator *gsi, gomp_for *stmt,
+	       omp_context *outer_ctx)
+{
+  gbind *bind = gimple_build_bind (NULL, NULL, NULL);
+  gsi_replace (gsi, bind, false);
+  gimple_seq seq = NULL;
+  gimple *g = gimple_build_call_internal (IFN_GOMP_USE_SIMT, 0);
+  tree cond = create_tmp_var_raw (boolean_type_node);
+  DECL_CONTEXT (cond) = current_function_decl;
+  DECL_SEEN_IN_BIND_EXPR_P (cond) = 1;
+  gimple_bind_set_vars (bind, cond);
+  gimple_call_set_lhs (g, cond);
+  gimple_seq_add_stmt (&seq, g);
+  tree lab1 = create_artificial_label (UNKNOWN_LOCATION);
+  tree lab2 = create_artificial_label (UNKNOWN_LOCATION);
+  tree lab3 = create_artificial_label (UNKNOWN_LOCATION);
+  g = gimple_build_cond (NE_EXPR, cond, boolean_false_node, lab1, lab2);
+  gimple_seq_add_stmt (&seq, g);
+  g = gimple_build_label (lab1);
+  gimple_seq_add_stmt (&seq, g);
+  gimple_seq new_seq = copy_gimple_seq_and_replace_locals (stmt);
+  gomp_for *new_stmt = as_a <gomp_for *> (new_seq);
+  tree clause = build_omp_clause (gimple_location (stmt), OMP_CLAUSE__SIMT_);
+  OMP_CLAUSE_CHAIN (clause) = gimple_omp_for_clauses (new_stmt);
+  gimple_omp_for_set_clauses (new_stmt, clause);
+  gimple_seq_add_stmt (&seq, new_stmt);
+  g = gimple_build_goto (lab3);
+  gimple_seq_add_stmt (&seq, g);
+  g = gimple_build_label (lab2);
+  gimple_seq_add_stmt (&seq, g);
+  gimple_seq_add_stmt (&seq, stmt);
+  g = gimple_build_label (lab3);
+  gimple_seq_add_stmt (&seq, g);
+  gimple_bind_set_body (bind, seq);
+  update_stmt (bind);
+  scan_omp_for (new_stmt, outer_ctx);
+  scan_omp_for (stmt, outer_ctx);
+}
+
 /* Scan an OpenMP sections directive.  */
 
 static void
@@ -3955,7 +4000,13 @@  scan_omp_1_stmt (gimple_stmt_iterator *g
       break;
 
     case GIMPLE_OMP_FOR:
-      scan_omp_for (as_a <gomp_for *> (stmt), ctx);
+      if (((gimple_omp_for_kind (as_a <gomp_for *> (stmt))
+	    & GF_OMP_FOR_KIND_MASK) == GF_OMP_FOR_KIND_SIMD)
+	  && omp_maybe_offloaded_ctx (ctx)
+	  && omp_max_simt_vf ())
+	scan_omp_simd (gsi, as_a <gomp_for *> (stmt), ctx);
+      else
+	scan_omp_for (as_a <gomp_for *> (stmt), ctx);
       break;
 
     case GIMPLE_OMP_SECTIONS:
@@ -4300,8 +4351,7 @@  omp_max_vf (void)
       if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
 	vf = GET_MODE_NUNITS (vqimode);
     }
-  int svf = omp_max_simt_vf ();
-  return MAX (vf, svf);
+  return vf;
 }
 
 /* Helper function of lower_rec_input_clauses, used for #pragma omp simd
@@ -4314,6 +4364,12 @@  lower_rec_simd_input_clauses (tree new_v
   if (max_vf == 0)
     {
       max_vf = omp_max_vf ();
+      if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
+			   OMP_CLAUSE__SIMT_))
+	{
+	  int max_simt = omp_max_simt_vf ();
+	  max_vf = MAX (max_vf, max_simt);
+	}
       if (max_vf > 1)
 	{
 	  tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
@@ -4387,8 +4443,7 @@  lower_rec_input_clauses (tree clauses, g
   int pass;
   bool is_simd = (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
 		  && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD);
-  bool maybe_simt
-    = is_simd && omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1;
+  bool maybe_simt = is_simd && find_omp_clause (clauses, OMP_CLAUSE__SIMT_);
   int max_vf = 0;
   tree lane = NULL_TREE, idx = NULL_TREE;
   tree simt_lane = NULL_TREE;
@@ -5477,7 +5532,7 @@  lower_lastprivate_clauses (tree clauses,
   if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
       && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD)
     {
-      maybe_simt = omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1;
+      maybe_simt = find_omp_clause (orig_clauses, OMP_CLAUSE__SIMT_);
       simduid = find_omp_clause (orig_clauses, OMP_CLAUSE__SIMDUID_);
       if (simduid)
 	simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
@@ -10601,7 +10656,11 @@  expand_omp_simd (struct omp_region *regi
   bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
   for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)
     offloaded = rgn->type == GIMPLE_OMP_TARGET;
-  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
+  bool is_simt
+    = (offloaded
+       && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
+			   OMP_CLAUSE__SIMT_)
+       && safelen_int > 1);
   tree simt_lane = NULL_TREE, simt_maxlane = NULL_TREE;
   if (is_simt)
     {
@@ -21358,6 +21417,9 @@  execute_omp_device_lower ()
 	tree type = lhs ? TREE_TYPE (lhs) : integer_type_node;
 	switch (gimple_call_internal_fn (stmt))
 	  {
+	  case IFN_GOMP_USE_SIMT:
+	    rhs = vf == 1 ? boolean_false_node : boolean_true_node;
+	    break;
 	  case IFN_GOMP_SIMT_LANE:
 	  case IFN_GOMP_SIMT_LAST_LANE:
 	    rhs = vf == 1 ? build_zero_cst (type) : NULL_TREE;
--- gcc/internal-fn.def.jj	2016-10-12 10:38:54.000000000 +0200
+++ gcc/internal-fn.def	2016-10-24 15:24:32.468380502 +0200
@@ -141,6 +141,7 @@  DEF_INTERNAL_INT_FN (FFS, ECF_CONST, ffs
 DEF_INTERNAL_INT_FN (PARITY, ECF_CONST, parity, unary)
 DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST, popcount, unary)
 
+DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/tree-pretty-print.c.jj	2016-10-12 10:38:55.000000000 +0200
+++ gcc/tree-pretty-print.c	2016-10-24 15:50:32.043336116 +0200
@@ -812,6 +812,10 @@  dump_omp_clause (pretty_printer *pp, tre
       pp_right_paren (pp);
       break;
 
+    case OMP_CLAUSE__SIMT_:
+      pp_string (pp, "_simt_");
+      break;
+
     case OMP_CLAUSE_GANG:
       pp_string (pp, "gang");
       if (OMP_CLAUSE_GANG_EXPR (clause) != NULL_TREE)