diff mbox

OpenACC Firstprivate

Message ID 563E01A4.20607@acm.org
State New
Headers show

Commit Message

Nathan Sidwell Nov. 7, 2015, 1:50 p.m. UTC
Jakub,
this patch implements firstprivate support for openacc.  This is pretty straight 
forwards -- they're just regular auto variables, but with an initialization 
value from the host.

The gimplify.c implementation is somewhat different to gomp4 branch, as I've 
added new bits to enum omp_region_type, rather than add 2 new fields to 
omp_region_ctx.  The new enums use bits already defined in omp_region_type:

+  ORT_ACC = 0x40,  /* An OpenACC region.  */
+  ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA, /* Data construct.  */
+  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
+  ORT_ACC_KERNELS  = ORT_ACC | ORT_TARGET | 0x80,  /* Kernels construct.  */

On gomp4 we were already setting those bits, but then setting the new fields to 
indicate 'openacc'.  Many places in gimplify.c where we check for '== 
ORT_TARGET_DATA' or ORT_TARGET get changed to '& ORT_TARGET_DATA' etc.

On gomp4 for things like an openacc loop we were setting ORT_WORKSHARE, so 
nearly all checks for == ORT_WORKSHARE get an additional '|| X == ORT_ACC'.

Although this patch doesn't make use of the difference between ORT_ACC_KERNELS 
and ORT_ACC_PARALLEL, the default handling patch will -- they have different 
behaviours.

I think the gimpify.c changes are then obvious from that, but let me know.

in omp-low the changes are to remove 'sorry' and build the initializer exprs in 
lower_omp_target.

As you can see this fixes a few xfails.

I'll post the default handling patch, which is much more localized.

nathan

Comments

Jakub Jelinek Nov. 9, 2015, 1:46 p.m. UTC | #1
On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c	(revision 229892)
> +++ gcc/gimplify.c	(working copy)
> @@ -108,9 +108,15 @@ enum omp_region_type
>    /* Data region with offloading.  */
>    ORT_TARGET = 32,
>    ORT_COMBINED_TARGET = 33,
> +
> +  ORT_ACC = 0x40,  /* An OpenACC region.  */
> +  ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA, /* Data construct.  */
> +  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
> +  ORT_ACC_KERNELS  = ORT_ACC | ORT_TARGET | 0x80,  /* Kernels construct.  */
> +
>    /* Dummy OpenMP region, used to disable expansion of
>       DECL_VALUE_EXPRs in taskloop pre body.  */
> -  ORT_NONE = 64
> +  ORT_NONE = 0x100
>  };

If you want to switch to hexadecimal, you should change all values
in the enum to hexadecimal for consistency.
>  
>  /* Gimplify hashtable helper.  */
> @@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re
>    else
>      c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
>  
> +  c->combined_loop = false;
> +  c->distribute = false;
> +  c->target_map_scalars_firstprivate = false;
> +  c->target_map_pointers_as_0len_arrays = false;
> +  c->target_firstprivatize_array_bases = false;

Why this?  c is XCNEW allocated, so zero initialized.

> @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct
>        /* We shouldn't be re-adding the decl with the same data
>  	 sharing class.  */
>        gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0);
> -      /* The only combination of data sharing classes we should see is
> -	 FIRSTPRIVATE and LASTPRIVATE.  */
>        nflags = n->value | flags;
> -      gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
> -		  == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)
> +      /* The only combination of data sharing classes we should see is
> +	 FIRSTPRIVATE and LASTPRIVATE.  However, OpenACC permits
> +	 reduction variables to be used in data sharing clauses.  */
> +      gcc_assert ((ctx->region_type & ORT_ACC) != 0
> +		  || ((nflags & GOVD_DATA_SHARE_CLASS)
> +		      == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE))
>  		  || (flags & GOVD_DATA_SHARE_CLASS) == 0);

Are you sure you want to give up on any kind of consistency checks for
OpenACC?  If only reduction is special on OpenACC, perhaps you could tweak
the assert for that instead?  Something that can be done incrementally of
course.

> +
> +	  /*  OpenMP doesn't look in outer contexts to find an
> +	      enclosing data clause.  */

I'm puzzled by the comment.  OpenMP does look in outer context for clauses
that need that (pretty much all closes but private), that is the do_outer:
recursion in omp_notice_variable.  Say for firstprivate in order to copy (or
copy construct) the private variable one needs the access to the outer
context's var etc.).
So perhaps it would help to document what you are doing here for OpenACC and
why.

> +	  struct gimplify_omp_ctx *octx = ctx->outer_context;
> +	  if ((ctx->region_type & ORT_ACC) && octx)
> +	    {
> +	      omp_notice_variable (octx, decl, in_code);
> +	      
> +	      for (; octx; octx = octx->outer_context)
> +		{
> +		  if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
> +		    break;
> +		  splay_tree_node n2
> +		    = splay_tree_lookup (octx->variables,
> +					 (splay_tree_key) decl);
> +		  if (n2)
> +		    {
> +		      nflags |= GOVD_MAP;
> +		      goto found_outer;
> +		    }
> +		}
>  	    }
> -	  else if (nflags == flags)
> -	    nflags |= GOVD_MAP;
> +

The main issue I have is with the omp-low.c changes.
I see:
"2.5.9
private clause
The private clause is allowed on the parallel construct; it declares that a copy of each
item on the list will be created for each parallel gang.

2.5.10
firstprivate clause
The firstprivate clause is allowed on the parallel construct; it declares that a copy
of each item on the list will be created for each parallel gang, and that the copy will be
initialized with the value of that item on the host when the parallel construct is
encountered."

but looking at what you actually emit looks like standard present_copyin
clause I think with a private variable defined in the region where the
value of the present_copyin mapped variable is assigned to the private one.
This I'm afraid performs often two copies rather than just one (one to copy
the host value to the present_copyin mapped value, another one in the
region), but more importantly, if the var is already mapped, you could
initialize the private var with old data.
Say
  int arr[64];
// initialize arr
#pragma acc data copyin (arr)
{
  // modify arr on the host
  # pragma acc parallel firstprivate (arr)
  {
    ...
  }
}
Is that really what you want?  If not, any reason not to implement
GOMP_MAP_FIRSTPRIVATE and GOMP_MAP_FIRSTPRIVATE_INT on the libgomp oacc-*
side and just use the OpenMP firstprivate handling in omp-low.c?

	Jakub
Nathan Sidwell Nov. 9, 2015, 1:59 p.m. UTC | #2
On 11/09/15 08:46, Jakub Jelinek wrote:
> On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:
>> Index: gcc/gimplify.c
>> ===================================================================

>
> If you want to switch to hexadecimal, you should change all values
> in the enum to hexadecimal for consistency.

ok.

>>
>>   /* Gimplify hashtable helper.  */
>> @@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re
>>     else
>>       c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
>>
>> +  c->combined_loop = false;
>> +  c->distribute = false;
>> +  c->target_map_scalars_firstprivate = false;
>> +  c->target_map_pointers_as_0len_arrays = false;
>> +  c->target_firstprivatize_array_bases = false;
>
> Why this?  c is XCNEW allocated, so zero initialized.

I presumed it necessary, as it was on the branch.  will  remove.

>
>> @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct
>>         /* We shouldn't be re-adding the decl with the same data
>>   	 sharing class.  */
>>         gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0);
>> -      /* The only combination of data sharing classes we should see is
>> -	 FIRSTPRIVATE and LASTPRIVATE.  */
>>         nflags = n->value | flags;
>> -      gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
>> -		  == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)
>> +      /* The only combination of data sharing classes we should see is
>> +	 FIRSTPRIVATE and LASTPRIVATE.  However, OpenACC permits
>> +	 reduction variables to be used in data sharing clauses.  */
>> +      gcc_assert ((ctx->region_type & ORT_ACC) != 0
>> +		  || ((nflags & GOVD_DATA_SHARE_CLASS)
>> +		      == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE))
>>   		  || (flags & GOVD_DATA_SHARE_CLASS) == 0);
>
> Are you sure you want to give up on any kind of consistency checks for
> OpenACC?  If only reduction is special on OpenACC, perhaps you could tweak
> the assert for that instead?  Something that can be done incrementally of
> course.

Will investigate (later)

>
>> +
>> +	  /*  OpenMP doesn't look in outer contexts to find an
>> +	      enclosing data clause.  */
>
> I'm puzzled by the comment.  OpenMP does look in outer context for clauses
> that need that (pretty much all closes but private), that is the do_outer:
> recursion in omp_notice_variable.  Say for firstprivate in order to copy (or
> copy construct) the private variable one needs the access to the outer
> context's var etc.).
> So perhaps it would help to document what you are doing here for OpenACC and
> why.

Ok.  It seemed (and it may become clearer with default handling added), that 
OpenACC  and OpenMP scanned scopes in opposite orders.  I remember trying to 
get the ACC code to scan in the same order, but came up blank.  Anyway, you're 
right, it should say what OpenACC is trying.


> The main issue I have is with the omp-low.c changes.
> I see:
> "2.5.9
> private clause
> The private clause is allowed on the parallel construct; it declares that a copy of each
> item on the list will be created for each parallel gang.
>
> 2.5.10
> firstprivate clause
> The firstprivate clause is allowed on the parallel construct; it declares that a copy
> of each item on the list will be created for each parallel gang, and that the copy will be
> initialized with the value of that item on the host when the parallel construct is
> encountered."
>
> but looking at what you actually emit looks like standard present_copyin
> clause I think with a private variable defined in the region where the
> value of the present_copyin mapped variable is assigned to the private one.


> This I'm afraid performs often two copies rather than just one (one to copy
> the host value to the present_copyin mapped value, another one in the
> region),

I don't think that can be avoided.  The host doesn't have control over when the 
CTAs (a gang) start -- they may even be serialized onto the same physical HW. 
So each gang has to initialize its own instance.  Or did you mean something else?

> but more importantly, if the var is already mapped, you could
> initialize the private var with old data.


> Say
>    int arr[64];
> // initialize arr
> #pragma acc data copyin (arr)
> {
>    // modify arr on the host
>    # pragma acc parallel firstprivate (arr)
>    {
>      ...
>    }
> }

Hm, I suspect that is either ill formed or the std does not contemplate.

> Is that really what you want?  If not, any reason not to implement
> GOMP_MAP_FIRSTPRIVATE and GOMP_MAP_FIRSTPRIVATE_INT on the libgomp oacc-*
> side and just use the OpenMP firstprivate handling in omp-low.c?

I would have to investigate ...

nathan
Nathan Sidwell Nov. 9, 2015, 2:06 p.m. UTC | #3
On 11/09/15 08:59, Nathan Sidwell wrote:
> On 11/09/15 08:46, Jakub Jelinek wrote:
>> On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:

>
>> Say
>>    int arr[64];
>> // initialize arr
>> #pragma acc data copyin (arr)
>> {
>>    // modify arr on the host
>>    # pragma acc parallel firstprivate (arr)
>>    {
>>      ...
>>    }
>> }
>
> Hm, I suspect that is either ill formed or the std does not contemplate.

just realized, there are two ways to consider the above.

1) it's  ill formed.   Once you've transferred data to the device, modifying it 
on the host is unspecified.  I'm having trouble finding words in the std that 
actually say that though :(

2) on a system with shared physical global memory, the host modification would 
be visiable on the device (possibly at an arbitrary point due to lack of 
synchronization primitive?)

I don't think this changes 'why not use OpenMP's ...' question, because IIUC you 
think that can be made to DTRT anyway?

nathan
Jakub Jelinek Nov. 9, 2015, 2:10 p.m. UTC | #4
On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote:
> >This I'm afraid performs often two copies rather than just one (one to copy
> >the host value to the present_copyin mapped value, another one in the
> >region),
> 
> I don't think that can be avoided.  The host doesn't have control over when
> the CTAs (a gang) start -- they may even be serialized onto the same
> physical HW. So each gang has to initialize its own instance.  Or did you
> mean something else?

So, what is the scope of the private and firstprivate vars in OpenACC?
In OpenMP if a variable is private or firstprivate on the target construct,
unless further privatized in inner constructs it is really shared among all
the threads in all the teams (ro one var per all CTAs/workers in PTX terms).
Is that the case for OpenACC too, or are the vars e.g. private to each CTA
already or to each thread in each CTA, something different?
If they are shared by all CTAs, then you should hopefully be able to use the
GOMP_MAP_FIRSTPRIVATE{,_INT}, if not, then I'd say you should at least use
those to provide you the initializer data to initialize your private vars
from as a cheaper alternative to mapping.

	Jakub
Nathan Sidwell Nov. 9, 2015, 2:46 p.m. UTC | #5
On 11/09/15 09:10, Jakub Jelinek wrote:
> On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote:
>>> This I'm afraid performs often two copies rather than just one (one to copy
>>> the host value to the present_copyin mapped value, another one in the
>>> region),
>>
>> I don't think that can be avoided.  The host doesn't have control over when
>> the CTAs (a gang) start -- they may even be serialized onto the same
>> physical HW. So each gang has to initialize its own instance.  Or did you
>> mean something else?
>
> So, what is the scope of the private and firstprivate vars in OpenACC?
> In OpenMP if a variable is private or firstprivate on the target construct,
> unless further privatized in inner constructs it is really shared among all
> the threads in all the teams (ro one var per all CTAs/workers in PTX terms).
> Is that the case for OpenACC too, or are the vars e.g. private to each CTA
> already or to each thread in each CTA, something different?
> If they are shared by all CTAs, then you should hopefully be able to use the
> GOMP_MAP_FIRSTPRIVATE{,_INT}, if not, then I'd say you should at least use
> those to provide you the initializer data to initialize your private vars
> from as a cheaper alternative to mapping.

I'm going to try and get clarification, but I think the intent is to initialize 
with the value seen on the device.  Consider:


int foo = 0;
#pragma acc data copyin(foo)
{
   #pragma acc parallel present(foo)
   {
     foo = 2;
   }

   if (expr){
     #pragma update host (foo)
   }

   #pragma acc parallel firstprivate (foo)
   {
   // which initialization value?
   }
}

Here we copy data to the device, then set it a distinct value there.  We 
conditionally update the host's instance from the device.

My thinking is that the intent of the firstprivate is to initialize with the 
value known on the device (and behave as-if copyin, if it's not there).  Not the 
value most recently seen on the host -- the update clause could change that, and 
may well be being used as a debugging aide, so it seems bizarre that it can 
change program semantics in such a way.
diff mbox

Patch

2015-11-06  Nathan Sidwell  <nathan@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* gcc/gimplify.c (enum  omp_region_type): Add ORT_ACC,
	ORT_ACC_DATA, ORT_ACC_PARALLEL, ORT_ACC_KERNELS.  Adjust ORT_NONE.
	(new_omp_context): Initialize all fields.
	(gimple_add_tmp_var): Add ORT_ACC checks.
	(gimplify_var_or_parm_decl): Likewise.
	(omp_firstprivatize_variable): Likewise. Use ORT_TARGET_DATA as a
	mask.
	(omp_add_variable): Look in outer contexts for openacc and allow
	reductions with other sharing. Add ORT_ACC and ORT_TARGET_DATA
	checks.
	(omp_notice_variable, omp_is_private, omp_check_private): Add
	ORT_ACC checks.
	(gimplify_scan_omp_clauses: Treat ORT_ACC as ORT_WORKSHARE.
	Permit private openacc reductions.
	(gimplify_oacc_cache): Specify ORT_ACC.
	(gimplify_omp_workshare): Adjust OpenACC region types.
	(gimplify_omp_target_update): Likewise.
	* gcc/omp-low.c (scan_sharing_clauses): Remove Openacc
	firstprivate sorry.
	(lower-rec_input_clauses): Don't handle openacc firstprivate
	references here.
	(lower_omp_target): Emit initializers for openacc firstprivate vars.

	gcc/testsuite/
	* gfortran.dg/goacc/private-3.f95: Remove xfail.
	* gfortran.dg/goacc/combined_loop.f90: Remove xfail.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Remove xfail.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Remove xfail.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: New.

Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 229892)
+++ gcc/gimplify.c	(working copy)
@@ -108,9 +108,15 @@  enum omp_region_type
   /* Data region with offloading.  */
   ORT_TARGET = 32,
   ORT_COMBINED_TARGET = 33,
+
+  ORT_ACC = 0x40,  /* An OpenACC region.  */
+  ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA, /* Data construct.  */
+  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
+  ORT_ACC_KERNELS  = ORT_ACC | ORT_TARGET | 0x80,  /* Kernels construct.  */
+
   /* Dummy OpenMP region, used to disable expansion of
      DECL_VALUE_EXPRs in taskloop pre body.  */
-  ORT_NONE = 64
+  ORT_NONE = 0x100
 };
 
 /* Gimplify hashtable helper.  */
@@ -377,6 +383,12 @@  new_omp_context (enum omp_region_type re
   else
     c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
 
+  c->combined_loop = false;
+  c->distribute = false;
+  c->target_map_scalars_firstprivate = false;
+  c->target_map_pointers_as_0len_arrays = false;
+  c->target_firstprivatize_array_bases = false;
+
   return c;
 }
 
@@ -689,7 +701,8 @@  gimple_add_tmp_var (tree tmp)
 	  struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
 	  while (ctx
 		 && (ctx->region_type == ORT_WORKSHARE
-		     || ctx->region_type == ORT_SIMD))
+		     || ctx->region_type == ORT_SIMD
+		     || ctx->region_type == ORT_ACC))
 	    ctx = ctx->outer_context;
 	  if (ctx)
 	    omp_add_variable (ctx, tmp, GOVD_LOCAL | GOVD_SEEN);
@@ -1804,7 +1817,8 @@  gimplify_var_or_parm_decl (tree *expr_p)
 	  struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
 	  while (ctx
 		 && (ctx->region_type == ORT_WORKSHARE
-		     || ctx->region_type == ORT_SIMD))
+		     || ctx->region_type == ORT_SIMD
+		     || ctx->region_type == ORT_ACC))
 	    ctx = ctx->outer_context;
 	  if (!ctx && !nonlocal_vlas->add (decl))
 	    {
@@ -5579,7 +5593,8 @@  omp_firstprivatize_variable (struct gimp
 	}
       else if (ctx->region_type != ORT_WORKSHARE
 	       && ctx->region_type != ORT_SIMD
-	       && ctx->region_type != ORT_TARGET_DATA)
+	       && ctx->region_type != ORT_ACC
+	       && !(ctx->region_type & ORT_TARGET_DATA))
 	omp_add_variable (ctx, decl, GOVD_FIRSTPRIVATE);
 
       ctx = ctx->outer_context;
@@ -5667,11 +5682,13 @@  omp_add_variable (struct gimplify_omp_ct
       /* We shouldn't be re-adding the decl with the same data
 	 sharing class.  */
       gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0);
-      /* The only combination of data sharing classes we should see is
-	 FIRSTPRIVATE and LASTPRIVATE.  */
       nflags = n->value | flags;
-      gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
-		  == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)
+      /* The only combination of data sharing classes we should see is
+	 FIRSTPRIVATE and LASTPRIVATE.  However, OpenACC permits
+	 reduction variables to be used in data sharing clauses.  */
+      gcc_assert ((ctx->region_type & ORT_ACC) != 0
+		  || ((nflags & GOVD_DATA_SHARE_CLASS)
+		      == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE))
 		  || (flags & GOVD_DATA_SHARE_CLASS) == 0);
       n->value = nflags;
       return;
@@ -5968,20 +5985,47 @@  omp_notice_variable (struct gimplify_omp
 	      else if (is_scalar)
 		nflags |= GOVD_FIRSTPRIVATE;
 	    }
-	  tree type = TREE_TYPE (decl);
-	  if (nflags == flags
-	      && gimplify_omp_ctxp->target_firstprivatize_array_bases
-	      && lang_hooks.decls.omp_privatize_by_reference (decl))
-	    type = TREE_TYPE (type);
-	  if (nflags == flags
-	      && !lang_hooks.types.omp_mappable_type (type))
-	    {
-	      error ("%qD referenced in target region does not have "
-		     "a mappable type", decl);
-	      nflags |= GOVD_MAP | GOVD_EXPLICIT;
+
+	  /*  OpenMP doesn't look in outer contexts to find an
+	      enclosing data clause.  */
+	  struct gimplify_omp_ctx *octx = ctx->outer_context;
+	  if ((ctx->region_type & ORT_ACC) && octx)
+	    {
+	      omp_notice_variable (octx, decl, in_code);
+	      
+	      for (; octx; octx = octx->outer_context)
+		{
+		  if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
+		    break;
+		  splay_tree_node n2
+		    = splay_tree_lookup (octx->variables,
+					 (splay_tree_key) decl);
+		  if (n2)
+		    {
+		      nflags |= GOVD_MAP;
+		      goto found_outer;
+		    }
+		}
 	    }
-	  else if (nflags == flags)
-	    nflags |= GOVD_MAP;
+
+	  {
+	    tree type = TREE_TYPE (decl);
+
+	    if (nflags == flags
+		&& gimplify_omp_ctxp->target_firstprivatize_array_bases
+		&& lang_hooks.decls.omp_privatize_by_reference (decl))
+	      type = TREE_TYPE (type);
+	    if (nflags == flags
+		&& !lang_hooks.types.omp_mappable_type (type))
+	      {
+		error ("%qD referenced in target region does not have "
+		       "a mappable type", decl);
+		nflags |= GOVD_MAP | GOVD_EXPLICIT;
+	      }
+	    else if (nflags == flags)
+	      nflags |= GOVD_MAP;
+	  }
+	found_outer:
 	  omp_add_variable (ctx, decl, nflags);
 	}
       else
@@ -5998,7 +6042,8 @@  omp_notice_variable (struct gimplify_omp
     {
       if (ctx->region_type == ORT_WORKSHARE
 	  || ctx->region_type == ORT_SIMD
-	  || ctx->region_type == ORT_TARGET_DATA)
+	  || ctx->region_type == ORT_ACC
+	  || (ctx->region_type & ORT_TARGET_DATA) != 0)
 	goto do_outer;
 
       flags = omp_default_clause (ctx, decl, in_code, flags);
@@ -6112,7 +6157,8 @@  omp_is_private (struct gimplify_omp_ctx
     }
 
   if (ctx->region_type != ORT_WORKSHARE
-      && ctx->region_type != ORT_SIMD)
+      && ctx->region_type != ORT_SIMD
+      && ctx->region_type != ORT_ACC)
     return false;
   else if (ctx->outer_context)
     return omp_is_private (ctx->outer_context, decl, simd);
@@ -6168,7 +6214,8 @@  omp_check_private (struct gimplify_omp_c
 	}
     }
   while (ctx->region_type == ORT_WORKSHARE
-	 || ctx->region_type == ORT_SIMD);
+	 || ctx->region_type == ORT_SIMD
+	 || ctx->region_type == ORT_ACC);
   return false;
 }
 
@@ -6311,7 +6358,8 @@  gimplify_scan_omp_clauses (tree *list_p,
 		omp_notice_variable (outer_ctx->outer_context, decl, true);
 	    }
 	  else if (outer_ctx
-		   && outer_ctx->region_type == ORT_WORKSHARE
+		   && (outer_ctx->region_type == ORT_WORKSHARE
+		       || outer_ctx->region_type == ORT_ACC)
 		   && outer_ctx->combined_loop
 		   && splay_tree_lookup (outer_ctx->variables,
 					 (splay_tree_key) decl) == NULL
@@ -6335,7 +6383,9 @@  gimplify_scan_omp_clauses (tree *list_p,
 	  goto do_add;
 	case OMP_CLAUSE_REDUCTION:
 	  flags = GOVD_REDUCTION | GOVD_SEEN | GOVD_EXPLICIT;
-	  check_non_private = "reduction";
+	  /* OpenACC permits reductions on private variables.  */
+	  if (!(region_type & ORT_ACC))
+	    check_non_private = "reduction";
 	  decl = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (decl) == MEM_REF)
 	    {
@@ -7703,7 +7753,7 @@  gimplify_oacc_cache (tree *expr_p, gimpl
 {
   tree expr = *expr_p;
 
-  gimplify_scan_omp_clauses (&OACC_CACHE_CLAUSES (expr), pre_p, ORT_WORKSHARE,
+  gimplify_scan_omp_clauses (&OACC_CACHE_CLAUSES (expr), pre_p, ORT_ACC,
 			     OACC_CACHE);
   gimplify_adjust_omp_clauses (pre_p, &OACC_CACHE_CLAUSES (expr), OACC_CACHE);
 
@@ -7832,7 +7882,9 @@  gimplify_omp_for (tree *expr_p, gimple_s
     case OMP_FOR:
     case CILK_FOR:
     case OMP_DISTRIBUTE:
+      break;
     case OACC_LOOP:
+      ort = ORT_ACC;
       break;
     case OMP_TASKLOOP:
       if (find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_UNTIED))
@@ -8894,10 +8946,14 @@  gimplify_omp_workshare (tree *expr_p, gi
       ort = OMP_TARGET_COMBINED (expr) ? ORT_COMBINED_TARGET : ORT_TARGET;
       break;
     case OACC_KERNELS:
+      ort = ORT_ACC_KERNELS;
+      break;
     case OACC_PARALLEL:
-      ort = ORT_TARGET;
+      ort = ORT_ACC_PARALLEL;
       break;
     case OACC_DATA:
+      ort = ORT_ACC_DATA;
+      break;
     case OMP_TARGET_DATA:
       ort = ORT_TARGET_DATA;
       break;
@@ -8919,7 +8975,7 @@  gimplify_omp_workshare (tree *expr_p, gi
 	pop_gimplify_context (g);
       else
 	pop_gimplify_context (NULL);
-      if (ort == ORT_TARGET_DATA)
+      if ((ort & ORT_TARGET_DATA) != 0)
 	{
 	  enum built_in_function end_ix;
 	  switch (TREE_CODE (expr))
@@ -8994,17 +9050,18 @@  gimplify_omp_target_update (tree *expr_p
   tree expr = *expr_p;
   int kind;
   gomp_target *stmt;
+  enum omp_region_type ort = ORT_WORKSHARE;
 
   switch (TREE_CODE (expr))
     {
     case OACC_ENTER_DATA:
-      kind = GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA;
-      break;
     case OACC_EXIT_DATA:
       kind = GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA;
+      ort = ORT_ACC;
       break;
     case OACC_UPDATE:
       kind = GF_OMP_TARGET_KIND_OACC_UPDATE;
+      ort = ORT_ACC;
       break;
     case OMP_TARGET_UPDATE:
       kind = GF_OMP_TARGET_KIND_UPDATE;
@@ -9019,7 +9076,7 @@  gimplify_omp_target_update (tree *expr_p
       gcc_unreachable ();
     }
   gimplify_scan_omp_clauses (&OMP_STANDALONE_CLAUSES (expr), pre_p,
-			     ORT_WORKSHARE, TREE_CODE (expr));
+			     ort, TREE_CODE (expr));
   gimplify_adjust_omp_clauses (pre_p, &OMP_STANDALONE_CLAUSES (expr),
 			       TREE_CODE (expr));
   stmt = gimple_build_omp_target (NULL, kind, OMP_STANDALONE_CLAUSES (expr));
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 229892)
+++ gcc/omp-low.c	(working copy)
@@ -1896,12 +1896,6 @@  scan_sharing_clauses (tree clauses, omp_
 	  /* FALLTHRU */
 
 	case OMP_CLAUSE_FIRSTPRIVATE:
-	  if (is_gimple_omp_oacc (ctx->stmt))
-	    {
-	      sorry ("clause not supported yet");
-	      break;
-	    }
-	  /* FALLTHRU */
 	case OMP_CLAUSE_LINEAR:
 	  decl = OMP_CLAUSE_DECL (c);
 	do_private:
@@ -2167,12 +2161,6 @@  scan_sharing_clauses (tree clauses, omp_
 	  /* FALLTHRU */
 
 	case OMP_CLAUSE_FIRSTPRIVATE:
-	  if (is_gimple_omp_oacc (ctx->stmt))
-	    {
-	      sorry ("clause not supported yet");
-	      break;
-	    }
-	  /* FALLTHRU */
 	case OMP_CLAUSE_PRIVATE:
 	case OMP_CLAUSE_LINEAR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
@@ -4684,7 +4672,7 @@  lower_rec_input_clauses (tree clauses, g
 		  gimplify_assign (ptr, x, ilist);
 		}
 	    }
-	  else if (is_reference (var))
+	  else if (is_reference (var) && !is_oacc_parallel (ctx))
 	    {
 	      /* For references that are being privatized for Fortran,
 		 allocate new backing storage for the new pointer
@@ -14878,7 +14866,7 @@  lower_omp_target (gimple_stmt_iterator *
   tree child_fn, t, c;
   gomp_target *stmt = as_a <gomp_target *> (gsi_stmt (*gsi_p));
   gbind *tgt_bind, *bind, *dep_bind = NULL;
-  gimple_seq tgt_body, olist, ilist, new_body;
+  gimple_seq tgt_body, olist, ilist, fplist, new_body;
   location_t loc = gimple_location (stmt);
   bool offloaded, data_region;
   unsigned int map_cnt = 0;
@@ -14930,6 +14918,7 @@  lower_omp_target (gimple_stmt_iterator *
   child_fn = ctx->cb.dst_fn;
 
   push_gimplify_context ();
+  fplist = NULL;
 
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
     switch (OMP_CLAUSE_CODE (c))
@@ -14974,6 +14963,7 @@  lower_omp_target (gimple_stmt_iterator *
 	  /* FALLTHRU */
       case OMP_CLAUSE_TO:
       case OMP_CLAUSE_FROM:
+      oacc_firstprivate:
 	var = OMP_CLAUSE_DECL (c);
 	if (!DECL_P (var))
 	  {
@@ -14996,6 +14986,7 @@  lower_omp_target (gimple_stmt_iterator *
 	  }
 
 	if (offloaded
+	    && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	    && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
 		|| OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_REFERENCE))
 	  {
@@ -15024,17 +15015,40 @@  lower_omp_target (gimple_stmt_iterator *
 	    x = build_receiver_ref (var, true, ctx);
 	    tree new_var = lookup_decl (var, ctx);
 
-	    if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
+	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+		&& OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
 		&& !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
 		&& TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE)
 	      x = build_simple_mem_ref (x);
-	    SET_DECL_VALUE_EXPR (new_var, x);
-	    DECL_HAS_VALUE_EXPR_P (new_var) = 1;
+	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
+	      {
+		gcc_assert (is_gimple_omp_oacc (ctx->stmt));
+		if (is_reference (new_var))
+		  {
+		    /* Create a local object to hold the instance
+		       value.  */
+		    tree inst = create_tmp_var
+		      (TREE_TYPE (TREE_TYPE (new_var)),
+		       IDENTIFIER_POINTER (DECL_NAME (new_var)));
+		    gimplify_assign (inst, fold_indirect_ref (x), &fplist);
+		    x = build_fold_addr_expr (inst);
+		  }
+		gimplify_assign (new_var, x, &fplist);
+	      }
+	    else if (DECL_P (new_var))
+	      {
+		SET_DECL_VALUE_EXPR (new_var, x);
+		DECL_HAS_VALUE_EXPR_P (new_var) = 1;
+	      }
+	    else
+	      gcc_unreachable ();
 	  }
 	map_cnt++;
 	break;
 
       case OMP_CLAUSE_FIRSTPRIVATE:
+	if (is_oacc_parallel (ctx))
+	  goto oacc_firstprivate;
 	map_cnt++;
 	var = OMP_CLAUSE_DECL (c);
 	if (!is_reference (var)
@@ -15059,6 +15073,8 @@  lower_omp_target (gimple_stmt_iterator *
 	break;
 
       case OMP_CLAUSE_PRIVATE:
+	if (is_gimple_omp_oacc (ctx->stmt))
+	  break;
 	var = OMP_CLAUSE_DECL (c);
 	if (is_variable_sized (var))
 	  {
@@ -15162,9 +15178,11 @@  lower_omp_target (gimple_stmt_iterator *
 
 	  default:
 	    break;
+
 	  case OMP_CLAUSE_MAP:
 	  case OMP_CLAUSE_TO:
 	  case OMP_CLAUSE_FROM:
+	  oacc_firstprivate_map:
 	    nc = c;
 	    ovar = OMP_CLAUSE_DECL (c);
 	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
@@ -15215,9 +15233,9 @@  lower_omp_target (gimple_stmt_iterator *
 		x = build_sender_ref (ovar, ctx);
 
 		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-			 && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
-			 && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
-			 && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
+		    && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
+		    && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
+		    && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
 		  {
 		    gcc_assert (offloaded);
 		    tree avar
@@ -15228,6 +15246,15 @@  lower_omp_target (gimple_stmt_iterator *
 		    avar = build_fold_addr_expr (avar);
 		    gimplify_assign (x, avar, &ilist);
 		  }
+		else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
+		  {
+		    gcc_assert (is_gimple_omp_oacc (ctx->stmt));
+		    if (!is_reference (var))
+		      var = build_fold_addr_expr (var);
+		    else
+		      talign = TYPE_ALIGN_UNIT (TREE_TYPE (TREE_TYPE (ovar)));
+		    gimplify_assign (x, var, &ilist);
+		  }
 		else if (is_gimple_reg (var))
 		  {
 		    gcc_assert (offloaded);
@@ -15256,7 +15283,17 @@  lower_omp_target (gimple_stmt_iterator *
 		    gimplify_assign (x, var, &ilist);
 		  }
 	      }
-	    s = OMP_CLAUSE_SIZE (c);
+	    s = NULL_TREE;
+	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
+	      {
+		gcc_checking_assert (is_gimple_omp_oacc (ctx->stmt));
+		s = TREE_TYPE (ovar);
+		if (TREE_CODE (s) == REFERENCE_TYPE)
+		  s = TREE_TYPE (s);
+		s = TYPE_SIZE_UNIT (s);
+	      }
+	    else
+	      s = OMP_CLAUSE_SIZE (c);
 	    if (s == NULL_TREE)
 	      s = TYPE_SIZE_UNIT (TREE_TYPE (ovar));
 	    s = fold_convert (size_type_node, s);
@@ -15297,6 +15334,11 @@  lower_omp_target (gimple_stmt_iterator *
 		      tkind_zero = tkind;
 		  }
 		break;
+	      case OMP_CLAUSE_FIRSTPRIVATE:
+		gcc_checking_assert (is_gimple_omp_oacc (ctx->stmt));
+		tkind = GOMP_MAP_TO;
+		tkind_zero = tkind;
+		break;
 	      case OMP_CLAUSE_TO:
 		tkind = GOMP_MAP_TO;
 		tkind_zero = tkind;
@@ -15336,6 +15378,8 @@  lower_omp_target (gimple_stmt_iterator *
 	    break;
 
 	  case OMP_CLAUSE_FIRSTPRIVATE:
+	    if (is_oacc_parallel (ctx))
+	      goto oacc_firstprivate_map;
 	    ovar = OMP_CLAUSE_DECL (c);
 	    if (is_reference (ovar))
 	      talign = TYPE_ALIGN_UNIT (TREE_TYPE (TREE_TYPE (ovar)));
@@ -15510,6 +15554,7 @@  lower_omp_target (gimple_stmt_iterator *
       gimple_seq_add_stmt (&new_body,
 	  		   gimple_build_assign (ctx->receiver_decl, t));
     }
+  gimple_seq_add_seq (&new_body, fplist);
 
   if (offloaded || data_region)
     {
@@ -15521,6 +15566,8 @@  lower_omp_target (gimple_stmt_iterator *
 	  default:
 	    break;
 	  case OMP_CLAUSE_FIRSTPRIVATE:
+	    if (is_gimple_omp_oacc (ctx->stmt))
+	      break;
 	    var = OMP_CLAUSE_DECL (c);
 	    if (is_reference (var)
 		|| is_gimple_reg_type (TREE_TYPE (var)))
@@ -15606,6 +15653,8 @@  lower_omp_target (gimple_stmt_iterator *
 	      }
 	    break;
 	  case OMP_CLAUSE_PRIVATE:
+	    if (is_gimple_omp_oacc (ctx->stmt))
+	      break;
 	    var = OMP_CLAUSE_DECL (c);
 	    if (is_reference (var))
 	      {
@@ -15694,7 +15743,7 @@  lower_omp_target (gimple_stmt_iterator *
       /* Handle GOMP_MAP_FIRSTPRIVATE_{POINTER,REFERENCE} in second pass,
 	 so that firstprivate vars holding OMP_CLAUSE_SIZE if needed
 	 are already handled.  */
-      for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
+      for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
 	switch (OMP_CLAUSE_CODE (c))
 	  {
 	    tree var;
Index: gcc/testsuite/gfortran.dg/goacc/private-3.f95
===================================================================
--- gcc/testsuite/gfortran.dg/goacc/private-3.f95	(revision 229864)
+++ gcc/testsuite/gfortran.dg/goacc/private-3.f95	(working copy)
@@ -1,6 +1,4 @@ 
 ! { dg-do compile }
-! <http://news.gmane.org/find-root.php?message_id=%3C563B78B5.5090506%40acm.org%3E>
-! { dg-xfail-if "TODO" { *-*-* } }
 
 ! test for private variables in a reduction clause
 
Index: gcc/testsuite/gfortran.dg/goacc/combined_loop.f90
===================================================================
--- gcc/testsuite/gfortran.dg/goacc/combined_loop.f90	(revision 229864)
+++ gcc/testsuite/gfortran.dg/goacc/combined_loop.f90	(working copy)
@@ -1,6 +1,4 @@ 
 ! { dg-do compile } 
-! <http://news.gmane.org/find-root.php?message_id=%3C563B78B5.5090506%40acm.org%3E>
-! { dg-xfail-if "TODO" { *-*-* } }
 
 !
 ! PR fortran/64726
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c	(revision 229852)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c	(working copy)
@@ -1,7 +1,5 @@ 
 /* { dg-do run } */
 /* { dg-additional-options "-O2" */
-/* <http://news.gmane.org/find-root.php?message_id=%3C563B78B5.5090506%40acm.org%3E>
-   { dg-xfail-if "TODO" { *-*-* } } */
 
 #include <stdio.h>
 
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c	(revision 229852)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c	(working copy)
@@ -1,7 +1,5 @@ 
 /* { dg-do run } */
 /* { dg-additional-options "-O2" */
-/* <http://news.gmane.org/find-root.php?message_id=%3C563B78B5.5090506%40acm.org%3E>
-   { dg-xfail-if "TODO" { *-*-* } } */
 
 #include <stdio.h>
 
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c	(working copy)
@@ -0,0 +1,41 @@ 
+/* { dg-do run } */
+
+#include  <openacc.h>
+
+int main ()
+{
+  int ok = 1;
+  int val = 2;
+  int ary[32];
+  int ondev = 0;
+
+  for (int i = 0; i < 32; i++)
+    ary[i] = ~0;
+  
+#pragma acc parallel num_gangs (32) copy (ok) firstprivate (val) copy(ary, ondev)
+  {
+    ondev = acc_on_device (acc_device_not_host);
+#pragma acc loop gang(static:1)
+    for (unsigned i = 0; i < 32; i++)
+      {
+	if (val != 2)
+	  ok = 0;
+	val += i;
+	ary[i] = val;
+      }
+  }
+
+  if (ondev)
+    {
+      if (!ok)
+	return 1;
+      if (val != 2)
+	return 1;
+
+      for (int i = 0; i < 32; i++)
+	if (ary[i] != 2 + i)
+	  return 1;
+    }
+  
+  return 0;
+}