diff mbox

[gomp4.5] Make even Fortran target use firstprivate for scalars by default, assorted fixes

Message ID 20160520161244.GN28550@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek May 20, 2016, 4:12 p.m. UTC
Hi!

This patch turns on implicit firstprivate for scalars (unless
defaultmap(tofrom: scalar) is present) for !$omp target, and assorted fixes
so that the testsuite passes again.

Tested on x86_64-linux, committed to branch.

2016-05-20  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* langhooks.h (struct lang_hooks_for_decls): Add omp_scalar_p.
	* langhooks-def.h (lhd_omp_scalar_p): New prototype.
	(LANG_HOOKS_OMP_SCALAR_P): Define.
	(LANG_HOOKS_DECLS): Use it.
	* langhooks.c (lhd_omp_scalar_p): New function.
	* gimplify.c (omp_notice_variable): Use lang_hooks.decls.omp_scalar_p.
	(omp_no_lastprivate): Removed.
	(gimplify_scan_omp_clauses): Set ctx->target_map_scalars_firstprivate
	on OMP_TARGET even for Fortran.  Remove omp_no_lastprivate callers.
	Propagate lastprivate on combined teams distribute parallel for simd
	even to distribute and teams construct.
	(gimplify_adjust_omp_clauses): Remove omp_no_lastprivate callers.
	(gimplify_omp_for): Likewise.
	(computable_teams_clause): Fail for automatic vars from current
	function not yet seen in bind expr.
	* omp-low.c (lower_omp_target): Fix up argument to is_reference.
	* varpool.c (varpool_node::get_create): Set node->offloading
	even for DECL_EXTERNAL decls.
gcc/fortran/
	* trans.h (gfc_omp_scalar_p): New prototype.
	* f95-lang.c (LANG_HOOKS_OMP_SCALAR_P): Redefine to gfc_omp_scalar_p.
	* trans-openmp.c (gfc_omp_scalar_p): New function.
	(gfc_trans_omp_do): Clear sched_simd flag.
	(gfc_split_omp_clauses): Change firstprivate and lastprivate
	handling for OpenMP 4.5.
	(gfc_trans_omp_teams): Add omp_clauses argument, add it to other
	teams clauses.
	(gfc_trans_omp_target): For -fopenmp, translate num_teams and
	thread_limit clauses on combined target teams early and pass to
	gfc_trans_omp_teams.
	(gfc_trans_omp_directive): Adjust gfc_trans_omp_teams caller.
libgomp/
	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90
	(fib_wrapper): Add map(from: x) clause.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90
	(e_53_2): Likewise.
	* testsuite/libgomp.fortran/examples-4/declare_target-4.f90
	(accum): Add map(tmp) clause.
	* testsuite/libgomp.fortran/examples-4/declare_target-5.f90
	(accum): Add map(tofrom: tmp) clause.
	* testsuite/libgomp.fortran/examples-4/target_data-3.f90
	(gramSchmidt): Likewise.
	* testsuite/libgomp.fortran/examples-4/teams-2.f90 (dotprod): Add
	map(tofrom: sum) clause.
	* testsuite/libgomp.fortran/nestedfn5.f90 (foo): Add twice
	map (alloc: a, l) clause.  Add defaultmap(tofrom: scalar) clause.
	* testsuite/libgomp.fortran/pr66199-2.f90: Adjust for linear clause
	only allowed on the loop iterator.
	* testsuite/libgomp.fortran/target4.f90 (foo): Add map(t) clause.


	Jakub

Comments

Jakub Jelinek May 20, 2016, 4:21 p.m. UTC | #1
Hi!

While working on this patch, I've noticed the need to do:

On Fri, May 20, 2016 at 06:12:44PM +0200, Jakub Jelinek wrote:
> 	* varpool.c (varpool_node::get_create): Set node->offloading
> 	even for DECL_EXTERNAL decls.
...
> --- gcc/varpool.c.jj	2016-05-04 18:43:25.000000000 +0200
> +++ gcc/varpool.c	2016-05-20 12:18:14.636755302 +0200
> @@ -149,11 +149,11 @@ varpool_node::get_create (tree decl)
>    node = varpool_node::create_empty ();
>    node->decl = decl;
>  
> -  if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl)
> +  if ((flag_openacc || flag_openmp)
>        && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
>      {
>        node->offloadable = 1;
> -      if (ENABLE_OFFLOADING)
> +      if (ENABLE_OFFLOADING && !DECL_EXTERNAL (decl))
>  	{
>  	  g->have_offload = true;
>  	  if (!in_lto_p)

but that made me think on what handling do we want for the
"omp declare target" DECL_EXTERNAL vars.
The reason I needed the above is that both gimplify.c and omp-low.c
test just the node->offloadable flag, bit the attribute, and so when
it is external and the flag wasn't set, we could privatize the vars
even when we were supposed to map them etc.
In the C/C++ FEs, we set not just node->offloadable, but also
for ENABLE_OFFLOADING g->have_offload and offload_vars too.
Wonder if that means we register even non-local vars, that would be IMHO a
bug.  On the other side, we need to watch for an extern declaration
of a VAR_DECL marked for offloading and only later on locally defined,
in that case if we haven't set g->have_offload and added entry to
offload_vars, we'd need to do it when merging the extern decl with the
definition.

So, your thoughts on that?

	Jakub
Alexander Monakov May 23, 2016, 2:37 p.m. UTC | #2
Hello,

On Fri, 20 May 2016, Jakub Jelinek wrote:
[snip]
> The reason I needed the above is that both gimplify.c and omp-low.c
> test just the node->offloadable flag, bit the attribute, and so when
> it is external and the flag wasn't set, we could privatize the vars
> even when we were supposed to map them etc.
> In the C/C++ FEs, we set not just node->offloadable, but also
> for ENABLE_OFFLOADING g->have_offload and offload_vars too.
> Wonder if that means we register even non-local vars, that would be IMHO a
> bug.  On the other side, we need to watch for an extern declaration
> of a VAR_DECL marked for offloading and only later on locally defined,
> in that case if we haven't set g->have_offload and added entry to
> offload_vars, we'd need to do it when merging the extern decl with the
> definition.
> 
> So, your thoughts on that?

As I'm relatively late to this game, at times like this it's hard for me to
follow what's the general model is. It appears that 'omp declare target' is
superfluous given symtab_node::offloadable. Offloading compilation still needs
to distinguish target region entry points from the rest of the functions
(hence 'omp target entrypoint' serves a clear purpose), but does plain 'omp
declare target' have a particular meaning not conveyed by
symtab_node::offloadable && !'omp target entrypoint'?

Is/should be there an invariant like "when omp-low is completed, all decls
annotated with 'omp declare target' will also have symtab_node::offloadable
set"?

Thanks.
Alexander
Jakub Jelinek May 23, 2016, 2:54 p.m. UTC | #3
On Mon, May 23, 2016 at 05:37:17PM +0300, Alexander Monakov wrote:
> Hello,
> 
> On Fri, 20 May 2016, Jakub Jelinek wrote:
> [snip]
> > The reason I needed the above is that both gimplify.c and omp-low.c
> > test just the node->offloadable flag, bit the attribute, and so when
> > it is external and the flag wasn't set, we could privatize the vars
> > even when we were supposed to map them etc.
> > In the C/C++ FEs, we set not just node->offloadable, but also
> > for ENABLE_OFFLOADING g->have_offload and offload_vars too.
> > Wonder if that means we register even non-local vars, that would be IMHO a
> > bug.  On the other side, we need to watch for an extern declaration
> > of a VAR_DECL marked for offloading and only later on locally defined,
> > in that case if we haven't set g->have_offload and added entry to
> > offload_vars, we'd need to do it when merging the extern decl with the
> > definition.
> > 
> > So, your thoughts on that?
> 
> As I'm relatively late to this game, at times like this it's hard for me to
> follow what's the general model is. It appears that 'omp declare target' is
> superfluous given symtab_node::offloadable. Offloading compilation still needs
> to distinguish target region entry points from the rest of the functions
> (hence 'omp target entrypoint' serves a clear purpose), but does plain 'omp
> declare target' have a particular meaning not conveyed by
> symtab_node::offloadable && !'omp target entrypoint'?

"omp declare target" and "omp declare target link" attributes are FE
representation, symtab_node::offloadable is ME representation.
We have just one bit in the latter right now, so e.g. it does not
differentiate between the two kinds of offloadable vars.  In the C/C++ FE,
we set the offloadable bit right away next to the creation of the attribute,
in the Fortran FE we don't (and not sure if it is even safe to create symtab
node at that point yet).

> Is/should be there an invariant like "when omp-low is completed, all decls
> annotated with 'omp declare target' will also have symtab_node::offloadable
> set"?

Without my patch to gomp4.5 branch, that invariant didn't hold for Fortran
DECL_EXTERNAL vars.  With the patch it holds, but we need to come to
agreement what behavior we do want for DECL_EXTERNAL vars.

	Jakub
Alexander Monakov May 23, 2016, 4:15 p.m. UTC | #4
On Fri, 20 May 2016, Jakub Jelinek wrote:
> but that made me think on what handling do we want for the
> "omp declare target" DECL_EXTERNAL vars.
[snip]
> In the C/C++ FEs, we set not just node->offloadable, but also for
> ENABLE_OFFLOADING g->have_offload and offload_vars too.  Wonder if that
> means we register even non-local vars, that would be IMHO a bug.

(it's unclear to me what you mean by 'non-local vars' here, from the context
it looks like it's 'variables with an external declaration and no definition
in the current TU'; correct?)

Looking at the OpenMP 4.5 spec, there's a requirement that

    [2.10.6 declare target directive, Restrictions, C/C++]
    * All declarations and definitions for a function must have a declare
    target directive if one is specified for any of them. Otherwise, the
    result is unspecified.

(why are variables exempted?)

A natural way to conform to that requirement is to have a '#pragma omp declare
target' in the header file declaring the offloaded function. But that means
every TU that includes that header will have g->have_offload set, even if
otherwise it doesn't touch OpenMP at all.

So from that perspective it's undesirable to have 'omp declare target' on
declarations that don't define anything.

> On the other side, we need to watch for an extern declaration
> of a VAR_DECL marked for offloading and only later on locally defined,
> in that case if we haven't set g->have_offload and added entry to
> offload_vars, we'd need to do it when merging the extern decl with the
> definition.

Yes, but I wonder if setting g->have_offload etc. in the front-ends is the
right thing to do at all.  Shouldn't frontends simply set 'omp declare target'
and leave the rest to omp-low?

Hope that's constructive.
Alexander
Jakub Jelinek May 23, 2016, 4:51 p.m. UTC | #5
On Mon, May 23, 2016 at 07:15:48PM +0300, Alexander Monakov wrote:
> (it's unclear to me what you mean by 'non-local vars' here, from the context
> it looks like it's 'variables with an external declaration and no definition
> in the current TU'; correct?)

Sure.

> Looking at the OpenMP 4.5 spec, there's a requirement that
> 
>     [2.10.6 declare target directive, Restrictions, C/C++]
>     * All declarations and definitions for a function must have a declare
>     target directive if one is specified for any of them. Otherwise, the
>     result is unspecified.
> 
> (why are variables exempted?)

I'll ask on the lang committee.  That said, for external declarations of
variables, all we care is that if any external declaration is specified in
to/link clause on declare target directive, then the definition (for common
vars all the definitions I guess) are also specified in the same kind of
clause.

Having the externs specified in omp declare target to is important for
code generation, we need to know that whether the vars should be mapped
implicitly on target constructs and remapped in the target construct bodies,
or whether the actual vars should be used in the regions.

Thus, 

> So from that perspective it's undesirable to have 'omp declare target' on
> declarations that don't define anything.

is just wrong, we at least need the symbol_table::offloadable bit set.

About g->head_offload and offload_vars, I guess it is fine not to set those
for externs but we need to arrange that to be set when we actually define it
when it has been previously extern, and we need some sensible handling
of the case where the var is only declared extern and omp declare target,
used in some target region, but not actually defined anywhere in the same
shared library or executable.

	Jakub
Alexander Monakov May 23, 2016, 6:19 p.m. UTC | #6
On Mon, 23 May 2016, Jakub Jelinek wrote:
> Having the externs specified in omp declare target to is important for
> code generation, we need to know that whether the vars should be mapped
> implicitly on target constructs and remapped in the target construct bodies,
> or whether the actual vars should be used in the regions.

Yep, sorry for missing that.

> Thus, 
> 
> > So from that perspective it's undesirable to have 'omp declare target' on
> > declarations that don't define anything.
> 
> is just wrong, we at least need the symbol_table::offloadable bit set.

So unlike for functions, for variables GCC needs to know exactly whether they
are 'omp declare target [link]' at all points of use, not just at the point of
definition.

There's a pitfall if the user forgets the pragma on the external declaration:

=== a.c

#pragma omp declare target
int a;
void set_a()
{
  a = 42;
}
#pragma omp end declare target

=== main.c

extern int a;
extern void set_a();
#pragma omp declare target to(set_a)

int main()
{
  a = 0;
  #pragma omp target map(tofrom:a)
    set_a();

  if (a != 42) abort();
}
===

As I understand, this aborts, and it's not obvious how to take measures to
produce a compile-time diagnostic.  And I'm not sure if the letter of the spec
is being violated there.

Sorry if I'm elaborating on the more obvious stuff without contributing to
your original question; I hope this is of some value (like it is for me).

> About g->head_offload and offload_vars, I guess it is fine not to set those
> for externs but we need to arrange that to be set when we actually define it
> when it has been previously extern,

+1, it should be nice to avoid unnecessary streaming of externs; as for the
latter point, wouldn't moving handling from frontends to a point in the
middle-end when the symtab is complete solve that automatically?

> and we need some sensible handling of the case where the var is only
> declared extern and omp declare target, used in some target region, but not
> actually defined anywhere in the same shared library or executable.

I think on NVPTX it yields a link error at mkoffload time.

Alexander
Jakub Jelinek May 23, 2016, 6:28 p.m. UTC | #7
On Mon, May 23, 2016 at 09:19:47PM +0300, Alexander Monakov wrote:
> 
> So unlike for functions, for variables GCC needs to know exactly whether they
> are 'omp declare target [link]' at all points of use, not just at the point of
> definition.

There are many bugs that just can't be diagnosed by the compiler.
It is up to the users to make sure they write sane code.

> There's a pitfall if the user forgets the pragma on the external declaration:
> 
> === a.c
> 
> #pragma omp declare target
> int a;
> void set_a()
> {
>   a = 42;
> }
> #pragma omp end declare target
> 
> === main.c
> 
> extern int a;
> extern void set_a();
> #pragma omp declare target to(set_a)
> 
> int main()
> {
>   a = 0;
>   #pragma omp target map(tofrom:a)
>     set_a();
> 
>   if (a != 42) abort();
> }
> ===

The above will abort always, no matter if you have #pragma omp declare target to(a)
in main.c or not, because a is already mapped (with infinite refcount), so
the map(tofrom:a) doesn't actually do anything (but prevent
firstprivatization of the var).  With map clause on the target, the only
change would be that the body of the target (but not functions it calls), if
they reference a, would be less efficient (would reference a through some
pointer set up during the mapping, instead of a directly).

	Jakub
diff mbox

Patch

--- gcc/langhooks.h.jj	2016-05-04 18:37:42.000000000 +0200
+++ gcc/langhooks.h	2016-05-19 18:14:56.474549712 +0200
@@ -256,6 +256,10 @@  struct lang_hooks_for_decls
 
   /* Do language specific checking on an implicitly determined clause.  */
   void (*omp_finish_clause) (tree clause, gimple_seq *pre_p);
+
+  /* Return true if DECL is a scalar variable (for the purpose of
+     implicit firstprivatization).  */
+  bool (*omp_scalar_p) (tree decl);
 };
 
 /* Language hooks related to LTO serialization.  */
--- gcc/langhooks-def.h.jj	2016-05-04 18:43:30.000000000 +0200
+++ gcc/langhooks-def.h	2016-05-19 18:13:40.817541557 +0200
@@ -80,6 +80,7 @@  struct gimplify_omp_ctx;
 extern void lhd_omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *,
 					       tree);
 extern bool lhd_omp_mappable_type (tree);
+extern bool lhd_omp_scalar_p (tree);
 
 #define LANG_HOOKS_NAME			"GNU unknown"
 #define LANG_HOOKS_IDENTIFIER_SIZE	sizeof (struct lang_identifier)
@@ -225,6 +226,7 @@  extern tree lhd_make_node (enum tree_cod
 #define LANG_HOOKS_OMP_CLAUSE_LINEAR_CTOR NULL
 #define LANG_HOOKS_OMP_CLAUSE_DTOR hook_tree_tree_tree_null
 #define LANG_HOOKS_OMP_FINISH_CLAUSE lhd_omp_finish_clause
+#define LANG_HOOKS_OMP_SCALAR_P lhd_omp_scalar_p
 
 #define LANG_HOOKS_DECLS { \
   LANG_HOOKS_GLOBAL_BINDINGS_P, \
@@ -249,7 +251,8 @@  extern tree lhd_make_node (enum tree_cod
   LANG_HOOKS_OMP_CLAUSE_ASSIGN_OP, \
   LANG_HOOKS_OMP_CLAUSE_LINEAR_CTOR, \
   LANG_HOOKS_OMP_CLAUSE_DTOR, \
-  LANG_HOOKS_OMP_FINISH_CLAUSE \
+  LANG_HOOKS_OMP_FINISH_CLAUSE, \
+  LANG_HOOKS_OMP_SCALAR_P \
 }
 
 /* LTO hooks.  */
--- gcc/langhooks.c.jj	2016-05-04 18:37:41.000000000 +0200
+++ gcc/langhooks.c	2016-05-19 18:24:57.213864107 +0200
@@ -514,6 +514,24 @@  lhd_omp_finish_clause (tree, gimple_seq
 {
 }
 
+/* Return true if DECL is a scalar variable (for the purpose of
+   implicit firstprivatization).  */
+
+bool
+lhd_omp_scalar_p (tree decl)
+{
+  tree type = TREE_TYPE (decl);
+  if (TREE_CODE (type) == REFERENCE_TYPE)
+    type = TREE_TYPE (type);
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+    type = TREE_TYPE (type);
+  if (INTEGRAL_TYPE_P (type)
+      || SCALAR_FLOAT_TYPE_P (type)
+      || TREE_CODE (type) == POINTER_TYPE)
+    return true;
+  return false;
+}
+
 /* Register language specific type size variables as potentially OpenMP
    firstprivate variables.  */
 
--- gcc/gimplify.c.jj	2016-05-04 18:37:55.000000000 +0200
+++ gcc/gimplify.c	2016-05-20 17:34:28.464852536 +0200
@@ -6212,17 +6212,7 @@  omp_notice_variable (struct gimplify_omp
 		  is_declare_target = octx == NULL;
 		}
 	      if (!is_declare_target && ctx->target_map_scalars_firstprivate)
-		{
-		  tree type = TREE_TYPE (decl);
-		  if (TREE_CODE (type) == REFERENCE_TYPE)
-		    type = TREE_TYPE (type);
-		  if (TREE_CODE (type) == COMPLEX_TYPE)
-		    type = TREE_TYPE (type);
-		  if (INTEGRAL_TYPE_P (type)
-		      || SCALAR_FLOAT_TYPE_P (type)
-		      || TREE_CODE (type) == POINTER_TYPE)
-		    is_scalar = true;
-		}
+		is_scalar = lang_hooks.decls.omp_scalar_p (decl);
 	      if (is_declare_target)
 		;
 	      else if (ctx->target_map_pointers_as_0len_arrays
@@ -6491,36 +6481,6 @@  omp_check_private (struct gimplify_omp_c
   return false;
 }
 
-/* Return true if the CTX is combined with distribute and thus
-   lastprivate can't be supported.  */
-
-static bool
-omp_no_lastprivate (struct gimplify_omp_ctx *ctx)
-{
-  do
-    {
-      if (ctx->outer_context == NULL)
-	return false;
-      ctx = ctx->outer_context;
-      switch (ctx->region_type)
-	{
-	case ORT_WORKSHARE:
-	  if (!ctx->combined_loop)
-	    return false;
-	  if (ctx->distribute)
-	    return lang_GNU_Fortran ();
-	  break;
-	case ORT_COMBINED_PARALLEL:
-	  break;
-	case ORT_COMBINED_TEAMS:
-	  return lang_GNU_Fortran ();
-	default:
-	  return false;
-	}
-    }
-  while (1);
-}
-
 /* Callback for walk_tree to find a DECL_EXPR for the given DECL.  */
 
 static tree
@@ -6552,11 +6512,10 @@  gimplify_scan_omp_clauses (tree *list_p,
 
   ctx = new_omp_context (region_type);
   outer_ctx = ctx->outer_context;
-  if (code == OMP_TARGET && !lang_GNU_Fortran ())
+  if (code == OMP_TARGET)
     {
-      ctx->target_map_pointers_as_0len_arrays = true;
-      /* FIXME: For Fortran we want to set this too, when
-	 the Fortran FE is updated to OpenMP 4.5.  */
+      if (!lang_GNU_Fortran ())
+	ctx->target_map_pointers_as_0len_arrays = true;
       ctx->target_map_scalars_firstprivate = true;
     }
   if (!lang_GNU_Fortran ())
@@ -6603,12 +6562,7 @@  gimplify_scan_omp_clauses (tree *list_p,
 	  flags = GOVD_LASTPRIVATE | GOVD_SEEN | GOVD_EXPLICIT;
 	  check_non_private = "lastprivate";
 	  decl = OMP_CLAUSE_DECL (c);
-	  if (omp_no_lastprivate (ctx))
-	    {
-	      notice_outer = false;
-	      flags |= GOVD_LINEAR_LASTPRIVATE_NO_OUTER;
-	    }
-	  else if (error_operand_p (decl))
+	  if (error_operand_p (decl))
 	    goto do_add;
 	  else if (outer_ctx
 		   && (outer_ctx->region_type == ORT_COMBINED_PARALLEL
@@ -6648,7 +6602,31 @@  gimplify_scan_omp_clauses (tree *list_p,
 		  struct gimplify_omp_ctx *octx = outer_ctx->outer_context;
 		  omp_add_variable (octx, decl, GOVD_SHARED | GOVD_SEEN);
 		  if (octx->outer_context)
-		    omp_notice_variable (octx->outer_context, decl, true);
+		    {
+		      octx = octx->outer_context;
+		      if (octx->region_type == ORT_WORKSHARE
+			  && octx->combined_loop
+			  && splay_tree_lookup (octx->variables,
+						(splay_tree_key) decl) == NULL
+			  && !omp_check_private (octx, decl, false))
+			{
+			  omp_add_variable (octx, decl,
+					    GOVD_LASTPRIVATE | GOVD_SEEN);
+			  octx = octx->outer_context;
+			  if (octx
+			      && octx->region_type == ORT_COMBINED_TEAMS
+			      && (splay_tree_lookup (octx->variables,
+						     (splay_tree_key) decl)
+				  == NULL))
+			    {
+			      omp_add_variable (octx, decl,
+						GOVD_SHARED | GOVD_SEEN);
+			      octx = octx->outer_context;
+			    }
+			}
+		      if (octx)
+			omp_notice_variable (octx, decl, true);
+		    }
 		}
 	      else if (outer_ctx->outer_context)
 		omp_notice_variable (outer_ctx->outer_context, decl, true);
@@ -6727,8 +6705,7 @@  gimplify_scan_omp_clauses (tree *list_p,
 		  if (octx
 		      && octx->region_type == ORT_WORKSHARE
 		      && octx->combined_loop
-		      && octx->distribute
-		      && !lang_GNU_Fortran ())
+		      && octx->distribute)
 		    {
 		      error_at (OMP_CLAUSE_LOCATION (c),
 				"%<linear%> clause for variable other than "
@@ -6743,8 +6720,6 @@  gimplify_scan_omp_clauses (tree *list_p,
 		 parallel.  Similarly for #pragma omp for simd.  */
 	      struct gimplify_omp_ctx *octx = outer_ctx;
 	      decl = NULL_TREE;
-	      if (omp_no_lastprivate (ctx))
-		OMP_CLAUSE_LINEAR_NO_COPYOUT (c) = 1;
 	      do
 		{
 		  if (OMP_CLAUSE_LINEAR_NO_COPYIN (c)
@@ -7931,15 +7906,8 @@  gimplify_adjust_omp_clauses (gimple_seq
 	  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
 	  OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE (c)
 	    = (n->value & GOVD_FIRSTPRIVATE) != 0;
-	  if (omp_no_lastprivate (ctx))
-	    {
-	      if (OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE (c))
-		remove = true;
-	      else
-		OMP_CLAUSE_CODE (c) = OMP_CLAUSE_PRIVATE;
-	    }
-	  else if (code == OMP_DISTRIBUTE
-		   && OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE (c))
+	  if (code == OMP_DISTRIBUTE
+	      && OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE (c))
 	    {
 	      remove = true;
 	      error_at (OMP_CLAUSE_LOCATION (c),
@@ -8733,9 +8701,8 @@  gimplify_omp_for (tree *expr_p, gimple_s
 	      c = build_omp_clause (input_location, OMP_CLAUSE_LINEAR);
 	      OMP_CLAUSE_LINEAR_NO_COPYIN (c) = 1;
 	      unsigned int flags = GOVD_LINEAR | GOVD_EXPLICIT | GOVD_SEEN;
-	      if ((has_decl_expr
-		   && bitmap_bit_p (has_decl_expr, DECL_UID (decl)))
-		  || omp_no_lastprivate (gimplify_omp_ctxp))
+	      if (has_decl_expr
+		  && bitmap_bit_p (has_decl_expr, DECL_UID (decl)))
 		{
 		  OMP_CLAUSE_LINEAR_NO_COPYOUT (c) = 1;
 		  flags |= GOVD_LINEAR_LASTPRIVATE_NO_OUTER;
@@ -8856,8 +8823,7 @@  gimplify_omp_for (tree *expr_p, gimple_s
 	    {
 	      bool lastprivate
 		= (!has_decl_expr
-		   || !bitmap_bit_p (has_decl_expr, DECL_UID (decl)))
-		  && !omp_no_lastprivate (gimplify_omp_ctxp);
+		   || !bitmap_bit_p (has_decl_expr, DECL_UID (decl)));
 	      struct gimplify_omp_ctx *outer
 		= gimplify_omp_ctxp->outer_context;
 	      if (outer && lastprivate)
@@ -9422,6 +9388,11 @@  computable_teams_clause (tree *tp, int *
 	      || lookup_attribute ("omp declare target link",
 				   DECL_ATTRIBUTES (*tp))))
 	return *tp;
+      if (VAR_P (*tp)
+	  && !DECL_SEEN_IN_BIND_EXPR_P (*tp)
+	  && !is_global_var (*tp)
+	  && decl_function_context (*tp) == current_function_decl)
+	return *tp;
       n = splay_tree_lookup (gimplify_omp_ctxp->variables,
 			     (splay_tree_key) *tp);
       if (n == NULL)
--- gcc/omp-low.c.jj	2016-05-04 18:29:41.000000000 +0200
+++ gcc/omp-low.c	2016-05-20 14:52:59.989224660 +0200
@@ -16313,7 +16313,7 @@  lower_omp_target (gimple_stmt_iterator *
 	      }
 	    if (tkind == GOMP_MAP_FIRSTPRIVATE_INT)
 	      s = size_int (0);
-	    else if (is_reference (var))
+	    else if (is_reference (ovar))
 	      s = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (ovar)));
 	    else
 	      s = TYPE_SIZE_UNIT (TREE_TYPE (ovar));
--- gcc/varpool.c.jj	2016-05-04 18:43:25.000000000 +0200
+++ gcc/varpool.c	2016-05-20 12:18:14.636755302 +0200
@@ -149,11 +149,11 @@  varpool_node::get_create (tree decl)
   node = varpool_node::create_empty ();
   node->decl = decl;
 
-  if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl)
+  if ((flag_openacc || flag_openmp)
       && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
     {
       node->offloadable = 1;
-      if (ENABLE_OFFLOADING)
+      if (ENABLE_OFFLOADING && !DECL_EXTERNAL (decl))
 	{
 	  g->have_offload = true;
 	  if (!in_lto_p)
--- gcc/fortran/trans.h.jj	2016-05-04 18:37:30.000000000 +0200
+++ gcc/fortran/trans.h	2016-05-19 19:00:27.285481315 +0200
@@ -719,6 +719,7 @@  tree gfc_omp_clause_assign_op (tree, tre
 tree gfc_omp_clause_linear_ctor (tree, tree, tree, tree);
 tree gfc_omp_clause_dtor (tree, tree);
 void gfc_omp_finish_clause (tree, gimple_seq *);
+bool gfc_omp_scalar_p (tree);
 bool gfc_omp_disregard_value_expr (tree, bool);
 bool gfc_omp_private_debug_clause (tree, bool);
 bool gfc_omp_private_outer_ref (tree);
--- gcc/fortran/f95-lang.c.jj	2016-05-04 18:37:33.000000000 +0200
+++ gcc/fortran/f95-lang.c	2016-05-19 18:26:41.692474285 +0200
@@ -121,6 +121,7 @@  static const struct attribute_spec gfc_a
 #undef LANG_HOOKS_OMP_CLAUSE_LINEAR_CTOR
 #undef LANG_HOOKS_OMP_CLAUSE_DTOR
 #undef LANG_HOOKS_OMP_FINISH_CLAUSE
+#undef LANG_HOOKS_OMP_SCALAR_P
 #undef LANG_HOOKS_OMP_DISREGARD_VALUE_EXPR
 #undef LANG_HOOKS_OMP_PRIVATE_DEBUG_CLAUSE
 #undef LANG_HOOKS_OMP_PRIVATE_OUTER_REF
@@ -153,6 +154,7 @@  static const struct attribute_spec gfc_a
 #define LANG_HOOKS_OMP_CLAUSE_LINEAR_CTOR	gfc_omp_clause_linear_ctor
 #define LANG_HOOKS_OMP_CLAUSE_DTOR		gfc_omp_clause_dtor
 #define LANG_HOOKS_OMP_FINISH_CLAUSE		gfc_omp_finish_clause
+#define LANG_HOOKS_OMP_SCALAR_P			gfc_omp_scalar_p
 #define LANG_HOOKS_OMP_DISREGARD_VALUE_EXPR	gfc_omp_disregard_value_expr
 #define LANG_HOOKS_OMP_PRIVATE_DEBUG_CLAUSE	gfc_omp_private_debug_clause
 #define LANG_HOOKS_OMP_PRIVATE_OUTER_REF	gfc_omp_private_outer_ref
--- gcc/fortran/trans-openmp.c.jj	2016-05-17 12:21:11.000000000 +0200
+++ gcc/fortran/trans-openmp.c	2016-05-20 16:33:23.688799835 +0200
@@ -1141,6 +1141,34 @@  gfc_omp_finish_clause (tree c, gimple_se
 }
 
 
+/* Return true if DECL is a scalar variable (for the purpose of
+   implicit firstprivatization).  */
+
+bool
+gfc_omp_scalar_p (tree decl)
+{
+  tree type = TREE_TYPE (decl);
+  if (TREE_CODE (type) == REFERENCE_TYPE)
+    type = TREE_TYPE (type);
+  if (TREE_CODE (type) == POINTER_TYPE)
+    {
+      if (GFC_DECL_GET_SCALAR_ALLOCATABLE (decl)
+	  || GFC_DECL_GET_SCALAR_POINTER (decl))
+	type = TREE_TYPE (type);
+      if (GFC_ARRAY_TYPE_P (type)
+	  || GFC_CLASS_TYPE_P (type))
+	return false;
+    }
+  if (TYPE_STRING_FLAG (type))
+    return false;
+  if (INTEGRAL_TYPE_P (type)
+      || SCALAR_FLOAT_TYPE_P (type)
+      || COMPLEX_FLOAT_TYPE_P (type))
+    return true;
+  return false;
+}
+
+
 /* Return true if DECL's DECL_VALUE_EXPR (if any) should be
    disregarded in OpenMP construct, because it is going to be
    remapped during OpenMP lowering.  SHARED is true if DECL
@@ -3336,6 +3364,11 @@  gfc_trans_omp_do (gfc_code *code, gfc_ex
       pblock = &block;
     }
 
+  /* simd schedule modifier is only useful for composite do simd and other
+     constructs including that, where gfc_trans_omp_do is only called
+     on the simd construct and DO's clauses are translated elsewhere.  */
+  do_clauses->sched_simd = false;
+
   omp_clauses = gfc_trans_omp_clauses (pblock, do_clauses, code->loc);
 
   for (i = 0; i < collapse; i++)
@@ -4006,7 +4039,7 @@  gfc_split_omp_clauses (gfc_code *code,
 	}
       /* Private clause is supported on all constructs,
 	 it is enough to put it on the innermost one.  For
-	 !$ omp do put it on parallel though,
+	 !$ omp parallel do put it on parallel though,
 	 as that's what we did for OpenMP 3.1.  */
       clausesa[innermost == GFC_OMP_SPLIT_DO
 	       ? (int) GFC_OMP_SPLIT_PARALLEL
@@ -4014,7 +4047,10 @@  gfc_split_omp_clauses (gfc_code *code,
 	= code->ext.omp_clauses->lists[OMP_LIST_PRIVATE];
       /* Firstprivate clause is supported on all constructs but
 	 simd.  Put it on the outermost of those and duplicate
-	 on parallel.  */
+	 on parallel and teams.  */
+      if (mask & GFC_OMP_MASK_TARGET)
+	clausesa[GFC_OMP_SPLIT_TARGET].lists[OMP_LIST_FIRSTPRIVATE]
+	  = code->ext.omp_clauses->lists[OMP_LIST_FIRSTPRIVATE];
       if (mask & GFC_OMP_MASK_TEAMS)
 	clausesa[GFC_OMP_SPLIT_TEAMS].lists[OMP_LIST_FIRSTPRIVATE]
 	  = code->ext.omp_clauses->lists[OMP_LIST_FIRSTPRIVATE];
@@ -4027,9 +4063,12 @@  gfc_split_omp_clauses (gfc_code *code,
       else if (mask & GFC_OMP_MASK_DO)
 	clausesa[GFC_OMP_SPLIT_DO].lists[OMP_LIST_FIRSTPRIVATE]
 	  = code->ext.omp_clauses->lists[OMP_LIST_FIRSTPRIVATE];
-      /* Lastprivate is allowed on do and simd.  In
-	 parallel do{, simd} we actually want to put it on
+      /* Lastprivate is allowed on distribute, do and simd.
+         In parallel do{, simd} we actually want to put it on
 	 parallel rather than do.  */
+      if (mask & GFC_OMP_MASK_DISTRIBUTE)
+	clausesa[GFC_OMP_SPLIT_DISTRIBUTE].lists[OMP_LIST_LASTPRIVATE]
+	  = code->ext.omp_clauses->lists[OMP_LIST_LASTPRIVATE];
       if (mask & GFC_OMP_MASK_PARALLEL)
 	clausesa[GFC_OMP_SPLIT_PARALLEL].lists[OMP_LIST_LASTPRIVATE]
 	  = code->ext.omp_clauses->lists[OMP_LIST_LASTPRIVATE];
@@ -4401,11 +4440,12 @@  gfc_trans_omp_distribute (gfc_code *code
 }
 
 static tree
-gfc_trans_omp_teams (gfc_code *code, gfc_omp_clauses *clausesa)
+gfc_trans_omp_teams (gfc_code *code, gfc_omp_clauses *clausesa,
+		     tree omp_clauses)
 {
   stmtblock_t block;
   gfc_omp_clauses clausesa_buf[GFC_OMP_SPLIT_NUM];
-  tree stmt, omp_clauses = NULL_TREE;
+  tree stmt;
   bool combined = true;
 
   gfc_start_block (&block);
@@ -4416,8 +4456,9 @@  gfc_trans_omp_teams (gfc_code *code, gfc
     }
   if (flag_openmp)
     omp_clauses
-      = gfc_trans_omp_clauses (&block, &clausesa[GFC_OMP_SPLIT_TEAMS],
-			       code->loc);
+      = chainon (omp_clauses,
+		 gfc_trans_omp_clauses (&block, &clausesa[GFC_OMP_SPLIT_TEAMS],
+					code->loc));
   switch (code->op)
     {
     case EXEC_OMP_TARGET_TEAMS:
@@ -4500,8 +4541,30 @@  gfc_trans_omp_target (gfc_code *code)
 	poplevel (0, 0);
       break;
     default:
-      pushlevel ();
-      stmt = gfc_trans_omp_teams (code, clausesa);
+      if (flag_openmp
+	  && (clausesa[GFC_OMP_SPLIT_TEAMS].num_teams
+	      || clausesa[GFC_OMP_SPLIT_TEAMS].thread_limit))
+	{
+	  gfc_omp_clauses clausesb;
+	  tree teams_clauses;
+	  /* For combined !$omp target teams, the num_teams and
+	     thread_limit clauses are evaluated before entering the
+	     target construct.  */
+	  memset (&clausesb, '\0', sizeof (clausesb));
+	  clausesb.num_teams = clausesa[GFC_OMP_SPLIT_TEAMS].num_teams;
+	  clausesb.thread_limit = clausesa[GFC_OMP_SPLIT_TEAMS].thread_limit;
+	  clausesa[GFC_OMP_SPLIT_TEAMS].num_teams = NULL;
+	  clausesa[GFC_OMP_SPLIT_TEAMS].thread_limit = NULL;
+	  teams_clauses
+	    = gfc_trans_omp_clauses (&block, &clausesb, code->loc);
+	  pushlevel ();
+	  stmt = gfc_trans_omp_teams (code, clausesa, teams_clauses);
+	}
+      else
+	{
+	  pushlevel ();
+	  stmt = gfc_trans_omp_teams (code, clausesa, NULL_TREE);
+	}
       if (TREE_CODE (stmt) != BIND_EXPR)
 	stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
       else
@@ -4880,7 +4943,7 @@  gfc_trans_omp_directive (gfc_code *code)
     case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO:
     case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
     case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD:
-      return gfc_trans_omp_teams (code, NULL);
+      return gfc_trans_omp_teams (code, NULL, NULL_TREE);
     case EXEC_OMP_WORKSHARE:
       return gfc_trans_omp_workshare (code, code->ext.omp_clauses);
     default:
--- libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90.jj	2015-10-14 10:24:10.000000000 +0200
+++ libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90	2016-05-19 15:23:02.386246868 +0200
@@ -17,7 +17,7 @@  contains
 
   integer function fib_wrapper (n)
     integer :: x
-    !$omp target map(to: n) if(n > THRESHOLD)
+    !$omp target map(to: n) map(from: x) if(n > THRESHOLD)
       x = fib (n)
     !$omp end target
     fib_wrapper = x
--- libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90.jj	2015-10-14 10:24:10.000000000 +0200
+++ libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90	2016-05-19 15:23:23.778958079 +0200
@@ -3,7 +3,7 @@ 
 program e_53_2
   !$omp declare target (fib)
   integer :: x, fib
-  !$omp target
+  !$omp target map(from: x)
     x = fib (25)
   !$omp end target
   if (x /= fib (25)) call abort
--- libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90.jj	2015-10-14 10:24:10.000000000 +0200
+++ libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90	2016-05-19 15:24:05.546394243 +0200
@@ -16,7 +16,7 @@  real function accum (k) result (tmp)
   use e_53_4_mod
   integer :: i, k
   tmp = 0.0e0
-  !$omp target
+  !$omp target map(tmp)
     !$omp parallel do reduction(+:tmp)
     do i = 1, N
       tmp = tmp + Pfun (k, i)
--- libgomp/testsuite/libgomp.fortran/examples-4/declare_target-5.f90.jj	2015-10-14 10:24:10.000000000 +0200
+++ libgomp/testsuite/libgomp.fortran/examples-4/declare_target-5.f90	2016-05-19 15:25:32.889215166 +0200
@@ -21,7 +21,7 @@  real function accum () result (tmp)
   real :: tmp1
   integer :: i
   tmp = 0.0e0
-  !$omp target
+  !$omp target map(tofrom: tmp)
     !$omp parallel do private(tmp1) reduction(+:tmp)
     do i = 1, N
       tmp1 = 0.0e0
--- libgomp/testsuite/libgomp.fortran/examples-4/target_data-3.f90.jj	2015-10-14 10:24:10.000000000 +0200
+++ libgomp/testsuite/libgomp.fortran/examples-4/target_data-3.f90	2016-05-20 14:40:16.710411782 +0200
@@ -45,7 +45,7 @@  contains
     !$omp target data map(Q)
       do k = 1, cols
         tmp = 0.0d0
-        !$omp target
+        !$omp target map(tofrom: tmp)
           !$omp parallel do reduction(+:tmp)
           do i = 1, rows
             tmp = tmp + (Q(i,k) * Q(i,k))
--- libgomp/testsuite/libgomp.fortran/examples-4/teams-2.f90.jj	2015-10-14 10:24:10.000000000 +0200
+++ libgomp/testsuite/libgomp.fortran/examples-4/teams-2.f90	2016-05-19 15:30:18.801353547 +0200
@@ -15,7 +15,8 @@  function dotprod (B, C, N, block_size, n
   real :: B(N), C(N), sum
   integer :: N, block_size, num_teams, block_threads, i, i0
   sum = 0.0e0
-  !$omp target map(to: B, C, block_size, num_teams, block_threads)
+  !$omp target map(to: B, C, block_size, num_teams, block_threads) &
+  !$omp& map(tofrom: sum)
     !$omp teams num_teams(num_teams) thread_limit(block_threads) &
     !$omp& reduction(+:sum)
       !$omp distribute
--- libgomp/testsuite/libgomp.fortran/nestedfn5.f90.jj	2015-10-14 10:24:10.000000000 +0200
+++ libgomp/testsuite/libgomp.fortran/nestedfn5.f90	2016-05-20 15:27:23.092637352 +0200
@@ -52,7 +52,7 @@  contains
 !$omp end parallel
     b = 10
 !$omp target data map (tofrom: a, d(2:3,4:4), q) map (from: l)
-!$omp target map (tofrom: b, d(2:3,4:4))
+!$omp target map (tofrom: b, d(2:3,4:4)) map (alloc: a, l)
     l = .false.
     if (a /= 22 .or. any (q /= 5)) l = .true.
     if (lbound (q, 1) /= 19 .or. ubound (q, 1) /= 27) l = .true.
@@ -71,7 +71,7 @@  contains
     q = 14
     d = 15
 !$omp target update to (a, q, d(2:3,4:4))
-!$omp target map (tofrom: b, d(2:3,4:4))
+!$omp target map (tofrom: b, d(2:3,4:4)) map (alloc: a, l)
     if (a /= 12 .or. b /= 13 .or. any (q /= 14)) l = .true.
     l = l .or. any (d(2:3,4:4) /= 15)
 !$omp end target
@@ -85,7 +85,8 @@  contains
     if (l) call abort
 !$omp target teams distribute parallel do simd if (.not.l) device(a) &
 !$omp & num_teams(b) dist_schedule(static, c) num_threads (h) &
-!$omp & reduction (+: m) safelen (n) schedule(static, o)
+!$omp & reduction (+: m) safelen (n) schedule(static, o) &
+!$omp & defaultmap(tofrom: scalar)
     do p = 1, 64
       m = m + 1
     end do
--- libgomp/testsuite/libgomp.fortran/pr66199-2.f90.jj	2015-10-14 10:24:10.000000000 +0200
+++ libgomp/testsuite/libgomp.fortran/pr66199-2.f90	2016-05-20 15:32:07.577829835 +0200
@@ -14,12 +14,11 @@ 
   c = 17
   d = 75
   !$omp target teams distribute parallel do simd default(none) &
-  !$omp& firstprivate (a, b) shared(u, v, w) &
-  !$omp& linear(d) linear(c:5) lastprivate(e)
+  !$omp& firstprivate (a, b, c) shared(u, v, w) &
+  !$omp& linear(d) lastprivate(e)
   do d = a, b
     u(d) = v(d) + w(d)
-    c = c + 5
-    e = c
+    e = c + d * 5
   end do
   a1 = 0
   a2 = 0
--- libgomp/testsuite/libgomp.fortran/target4.f90.jj	2015-10-14 10:24:10.000000000 +0200
+++ libgomp/testsuite/libgomp.fortran/target4.f90	2016-05-19 15:32:09.992851380 +0200
@@ -8,7 +8,7 @@  contains
     !$omp target data map(a) map(to: m, n)
     do i=1,n
       t = 0.0d0
-      !$omp target
+      !$omp target map(t)
         !$omp parallel do reduction(+:t)
           do j=1,m
             t = t + a(j,i) * a(j,i)