diff mbox series

[WIP] openmp: Add OpenMP 5.0 task detach clause support

Message ID c8848e55-3f93-f59e-63be-4a6d6a9cbf68@codesourcery.com
State New
Headers show
Series [WIP] openmp: Add OpenMP 5.0 task detach clause support | expand

Commit Message

Kwok Cheung Yeung Nov. 11, 2020, 7:06 p.m. UTC
Hello

This is a WIP implementation of the OpenMP 5.0 task detach clause. The task 
construct can now take a detach clause, passing in a variable of type 
omp_event_handle_t. When the construct is encountered, space for an event is 
allocated and the event variable is set to point to the new event. When the task 
is run, it is not complete until a new function omp_fulfill_event has been 
called on the event variable, either in the task itself or in another thread of 
execution.

lower_detach_clause generates code to call GOMP_new_event, which allocates, 
initializes and returns a pointer to a gomp_allow_completion_event struct. The 
return value is then type-cast to a omp_event_handle_t and assigned to the event 
variable, before the data environment for the task construct is set up.

The event variable is passed into the call to GOMP_task, where it is assigned to 
a field in the gomp_task struct. If the task is not deferred, then it will wait 
for the detach event for be fulfilled inside GOMP_task, otherwise it needs to be 
handled in omp_barrier_handle_tasks.

When a task finishes in omp_barrier_handle_tasks and the detach event has not 
been fulfilled, it is placed onto a separate queue of unfulfilled tasks before 
the current thread continues with another task. When the current thread has no 
more tasks, then it will remove a task from the queue of unfulfilled tasks and 
wait for it to complete. When it does, it is removed and any dependent tasks are 
requeued for execution.

We cannot simply block after a task with an unfulfilled event has finished 
because in the case where there are more tasks than threads, there is the 
possibility that all the threads will be tied up waiting, while a task that 
results in an event getting fulfilled never gets run, causing execution to stall.

The memory allocated for the event is released when the associated task is 
destroyed.

Issues that I can see with the current implementation at the moment are:

- No error checking at the front-end.
- The memory for the event is not mapped on the target. This means that if 
omp_fulfill_event is called from an 'omp target' section with a target that does 
not share memory with the host, the event will not be fulfilled (and a segfault 
will probably occur).
- The tasks awaiting event fulfillment currently wait until there are no other 
runnable tasks left. A better approach would be to poll (without blocking) the 
waiting tasks whenever any task completes, immediately removing any now-complete 
tasks and requeuing any dependent tasks.

This patchset has only been very lightly tested on a x86-64 host. Any 
comments/thoughts/suggestions on this implementation?

Thanks

Kwok
commit 4c3926d9abb1a7e6089a9098e2099e2d574ebfec
Author: Kwok Cheung Yeung <kcy@codesourcery.com>
Date:   Tue Nov 3 03:06:26 2020 -0800

    openmp: Add support for the OpenMP 5.0 task detach clause
    
    2020-11-11  Kwok Cheung Yeung  <kcy@codesourcery.com>
    
    	gcc/
    	* builtin-types.def (BT_PTR_SIZED_INT): New primitive type.
    	(BT_FN_PSINT_VOID): New function type.
    	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
    	to...
    	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT):
    	...this.  Add extra argument.
    	* gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DETACH.
    	(gimplify_adjust_omp_clauses): Likewise.
    	* omp-builtins.def (BUILT_IN_GOMP_TASK): Change function type to
    	BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT.
    	(BUILT_IN_GOMP_NEW_EVENT): New.
    	* omp-expand.c (expand_task_call): Add detach argument when generating
    	call to	GOMP_task.
    	* omp-low.c (scan_sharing_clauses): Setup data environment for detach
    	clause.
    	(lower_detach_clause): New.
    	(lower_omp_taskreg): Call lower_detach_clause for detach clause.  Add
    	Gimple statements generated for detach clause.
    	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DETACH.
    	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_DETACH.
    	* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_DETACH.
    	(omp_clause_code_name): Add entry for OMP_CLAUSE_DETACH.
    	(walk_tree_1): Handle OMP_CLAUSE_DETACH.
    	* tree.h (OMP_CLAUSE_DETACH_EXPR): New.
    
    	gcc/c-family/
    	* c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_DETACH.
    	Redefine PRAGMA_OACC_CLAUSE_DETACH.
    
    	gcc/c/
    	* c-parser.c (c_parser_omp_clause_detach): New.
    	(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH clause.
    	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
    	* c-typeck.c (c_finish_omp_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH
    	clause.
    
    	gcc/cp/
    	* parser.c (cp_parser_omp_all_clauses): Handle
    	PRAGMA_OMP_CLAUSE_DETACH.
    	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
    	* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
    
    	gcc/fortran/
    	* dump-parse-tree.c (show_omp_clauses): Handle detach clause.
    	* frontend-passes.c (gfc_code_walker): Walk detach expression.
    	* gfortran.h (struct gfc_omp_clauses): Add detach field.
    	* openmp.c (gfc_free_omp_clauses): Free detach clause.
    	(enum omp_mask1): Add OMP_CLAUSE_DETACH.
    	(enum omp_mask2): Remove OMP_CLAUSE_DETACH.
    	(gfc_match_omp_clauses): Handle OMP_CLAUSE_DETACH for OpenMP.
    	(OMP_TASK_CLAUSES): Add OMP_CLAUSE_DETACH.
    	* trans-openmp.c (gfc_trans_omp_clauses): Handle detach clause.
    	* types.def (BT_PTR_SIZED_INT): New type.
    	(BT_FN_PSINT_VOID): New function type.
    	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
    	to...
    	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT):
    	...this.  Add extra argument.
    
    	libgomp/
    	* fortran.c (omp_fulfill_event_): New.
    	* libgomp.h (struct gomp_allow_completion_event): New.
    	(struct gomp_task): Add detach_event field.
    	(struct gomp_team): Add task_detach_queue and task_detach_count
    	fields.
    	(gomp_finish_task): Delete detach_event.
    	* libgomp.map (OMP_5.0.1): Add omp_fulfill_event and omp_fulfill_event_.
    	(GOMP_5.0): Add GOMP_new_event.
    	* libgomp_g.h (GOMP_new_event): New.
    	(GOMP_task): Add uintptr_t argument.
    	* omp.h.in (enum omp_event_handle_t): New.
    	(omp_fulfill_event): New.
    	* omp_lib.f90.in (omp_event_handle_kind): New.
    	(omp_fulfill_event): New.
    	* omp_lib.h.in (omp_event_handle_kind): New.
    	(omp_event_handle_kind): New.
    	(omp_fulfill_event): Declare.
    	* task.c (gomp_init_task): Initialize detach_event field.
    	(GOMP_new_event): New.
    	(GOMP_task): Add detach argument.  Initialize detach_event field.
    	Wait for detach event if task not deferred.
    	(gomp_barrier_handle_tasks): Queue tasks with unfulfilled events.
    	When idle, wait for events to be fulfilled, remove completed tasks
    	and requeue dependent tasks.
    	(omp_fulfill_event): New.
    	* team.c (gomp_new_team): Initialize task_detach_queue and
    	task_detach_count fields.
    	(free_team): Free task_detach_queue field.
    	* testsuite/libgomp.c-c++-common/task-detach-1.c: New testcase.
    	* testsuite/libgomp.c-c++-common/task-detach-2.c: New testcase.
    	* testsuite/libgomp.c-c++-common/task-detach-3.c: New testcase.
    	* testsuite/libgomp.fortran/task-detach-1.f90: New testcase.
    	* testsuite/libgomp.fortran/task-detach-2.f90: New testcase.
    	* testsuite/libgomp.fortran/task-detach-3.f90: New testcase.

Comments

Kwok Cheung Yeung Nov. 27, 2020, 9:48 p.m. UTC | #1
Hello

This is an updated version of the WIP patch for task detach support. Any 
comments would be welcome!

On 11/11/2020 7:06 pm, Kwok Cheung Yeung wrote:
> - No error checking at the front-end.

The detach clause is now parsed properly in C, C++ and Fortran, and will raise 
an error if the syntax is incorrect or if the event variable is of the wrong type.

> - The memory for the event is not mapped on the target. This means that if 
> omp_fulfill_event is called from an 'omp target' section with a target that does 
> not share memory with the host, the event will not be fulfilled (and a segfault 
> will probably occur).

I was thinking of something along the lines of:

#pragma omp task detach (event)
{
}

#pragma omp target
{
   omp_fulfill_event (event);
}

Would something like this be expected to work? I cannot find many examples of 
the detach clause online, and none of them use any offloading constructs.

> - The tasks awaiting event fulfillment currently wait until there are no other 
> runnable tasks left. A better approach would be to poll (without blocking) the 
> waiting tasks whenever any task completes, immediately removing any now-complete 
> tasks and requeuing any dependent tasks.

This has now been implemented. On every iteration of the main loop in 
gomp_barrier_handle_tasks, it first checks to see if any tasks in the detach 
queue have a fulfilled completion event, and if so it will remove the task and 
requeue any dependent tasks.

Thanks

Kwok
From 3611024b39ea5b264ec2fd35ffa64360861052af Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Fri, 27 Nov 2020 11:59:12 -0800
Subject: [PATCH] openmp: Add support for the OpenMP 5.0 task detach clause

2020-11-27  Kwok Cheung Yeung  <kcy@codesourcery.com>

	gcc/
	* builtin-types.def (BT_PTR_SIZED_INT): New primitive type.
	(BT_FN_PSINT_VOID): New function type.
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
	to...
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT):
	...this.  Add extra argument.
	* gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DETACH.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-builtins.def (BUILT_IN_GOMP_TASK): Change function type to
	BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT.
	(BUILT_IN_GOMP_NEW_EVENT): New.
	* omp-expand.c (expand_task_call): Add detach argument when generating
	call to	GOMP_task.
	* omp-low.c (scan_sharing_clauses): Setup data environment for detach
	clause.
	(lower_detach_clause): New.
	(lower_omp_taskreg): Call lower_detach_clause for detach clause.  Add
	Gimple statements generated for detach clause.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DETACH.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_DETACH.
	* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_DETACH.
	(omp_clause_code_name): Add entry for OMP_CLAUSE_DETACH.
	(walk_tree_1): Handle OMP_CLAUSE_DETACH.
	* tree.h (OMP_CLAUSE_DETACH_EXPR): New.

	gcc/c-family/
	* c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_DETACH.
	Redefine PRAGMA_OACC_CLAUSE_DETACH.

	gcc/c/
	* c-parser.c (c_parser_omp_clause_detach): New.
	(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH clause.
	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
	* c-typeck.c (c_finish_omp_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH
	clause.

	gcc/cp/
	* parser.c (cp_parser_omp_clause_detach): New.
	(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH.
	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
	* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_DETACH clause.

	gcc/fortran/
	* dump-parse-tree.c (show_omp_clauses): Handle detach clause.
	* frontend-passes.c (gfc_code_walker): Walk detach expression.
	* gfortran.h (struct gfc_omp_clauses): Add detach field.
	(gfc_c_intptr_kind): New.
	* openmp.c (gfc_free_omp_clauses): Free detach clause.
	(gfc_match_omp_detach): New.
	(enum omp_mask1): Add OMP_CLAUSE_DETACH.
	(enum omp_mask2): Remove OMP_CLAUSE_DETACH.
	(gfc_match_omp_clauses): Handle OMP_CLAUSE_DETACH for OpenMP.
	(OMP_TASK_CLAUSES): Add OMP_CLAUSE_DETACH.
	* trans-openmp.c (gfc_trans_omp_clauses): Handle detach clause.
	* trans-types.c (gfc_c_intptr_kind): New.
	(gfc_init_kinds): Initialize gfc_c_intptr_kind.
	* types.def (BT_PTR_SIZED_INT): New type.
	(BT_FN_PSINT_VOID): New function type.
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
	to...
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT):
	...this.  Add extra argument.

	libgomp/
	* fortran.c (omp_fulfill_event_): New.
	* libgomp.h (struct gomp_allow_completion_event): New.
	(struct gomp_task): Add detach_event field.
	(struct gomp_team): Add task_detach_queue and task_detach_count
	fields.
	(gomp_finish_task): Delete detach_event.
	* libgomp.map (OMP_5.0.1): Add omp_fulfill_event and omp_fulfill_event_.
	(GOMP_5.0): Add GOMP_new_event.
	* libgomp_g.h (GOMP_new_event): New.
	(GOMP_task): Add uintptr_t argument.
	* omp.h.in (enum omp_event_handle_t): New.
	(omp_fulfill_event): New.
	* omp_lib.f90.in (omp_event_handle_kind): New.
	(omp_fulfill_event): New.
	* omp_lib.h.in (omp_event_handle_kind): New.
	(omp_event_handle_kind): New.
	(omp_fulfill_event): Declare.
	* priority_queue.c (priority_tree_find): New.
	(priority_list_find): New.
	(priority_queue_find): New.
	* priority_queue.h (priority_queue_predicate): New.
	(priority_queue_find): New.
	* task.c (gomp_init_task): Initialize detach_event field.
	(GOMP_new_event): New.
	(GOMP_task): Add detach argument.  Initialize detach_event field.
	Wait for detach event if task not deferred.
	(task_fulfilled_p): New.
	(gomp_barrier_handle_tasks): Queue tasks with unfulfilled events.
	Remove completed tasks and requeue dependent tasks.
	(omp_fulfill_event): New.
	* team.c (gomp_new_team): Initialize task_detach_queue and
	task_detach_count fields.
	(free_team): Free task_detach_queue field.
	* testsuite/libgomp.c-c++-common/task-detach-1.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-2.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-3.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-4.c: New testcase.
	* testsuite/libgomp.fortran/task-detach-1.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-2.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-3.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-4.f90: New testcase.
---
 gcc/builtin-types.def                              |  10 +-
 gcc/c-family/c-pragma.h                            |   3 +-
 gcc/c/c-parser.c                                   |  54 ++++++-
 gcc/c/c-typeck.c                                   |   5 +
 gcc/cp/parser.c                                    |  67 ++++++++-
 gcc/cp/semantics.c                                 |   3 +
 gcc/fortran/dump-parse-tree.c                      |   6 +
 gcc/fortran/frontend-passes.c                      |   1 +
 gcc/fortran/gfortran.h                             |   2 +
 gcc/fortran/openmp.c                               |  44 +++++-
 gcc/fortran/trans-openmp.c                         |  15 ++
 gcc/fortran/trans-types.c                          |   3 +
 gcc/fortran/types.def                              |  10 +-
 gcc/gimplify.c                                     |   5 +
 gcc/omp-builtins.def                               |   4 +-
 gcc/omp-expand.c                                   |  10 +-
 gcc/omp-low.c                                      |  35 +++++
 gcc/tree-core.h                                    |   3 +
 gcc/tree-pretty-print.c                            |   6 +
 gcc/tree.c                                         |   3 +
 gcc/tree.h                                         |   3 +
 libgomp/fortran.c                                  |   6 +
 libgomp/libgomp.h                                  |  14 ++
 libgomp/libgomp.map                                |   3 +
 libgomp/libgomp_g.h                                |   3 +-
 libgomp/omp.h.in                                   |   7 +
 libgomp/omp_lib.f90.in                             |   8 +
 libgomp/omp_lib.h.in                               |   4 +
 libgomp/priority_queue.c                           |  50 +++++++
 libgomp/priority_queue.h                           |   5 +
 libgomp/task.c                                     | 166 ++++++++++++++++++---
 libgomp/team.c                                     |   4 +
 .../testsuite/libgomp.c-c++-common/task-detach-1.c |  39 +++++
 .../testsuite/libgomp.c-c++-common/task-detach-2.c |  39 +++++
 .../testsuite/libgomp.c-c++-common/task-detach-3.c |  39 +++++
 .../testsuite/libgomp.c-c++-common/task-detach-4.c |  26 ++++
 .../testsuite/libgomp.fortran/task-detach-1.f90    |  33 ++++
 .../testsuite/libgomp.fortran/task-detach-2.f90    |  33 ++++
 .../testsuite/libgomp.fortran/task-detach-3.f90    |  33 ++++
 .../testsuite/libgomp.fortran/task-detach-4.f90    |  21 +++
 40 files changed, 786 insertions(+), 39 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-4.f90

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 4a82ee4..d28ec75 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -79,6 +79,7 @@ DEF_PRIMITIVE_TYPE (BT_UINT128, uint128_type_node
 DEF_PRIMITIVE_TYPE (BT_WORD, (*lang_hooks.types.type_for_mode) (word_mode, 1))
 DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lang_hooks.types.type_for_mode)
 				    (targetm.unwind_word_mode (), 1))
+DEF_PRIMITIVE_TYPE (BT_PTR_SIZED_INT, pointer_sized_int_node)
 DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
@@ -253,6 +254,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_LONG_LONGDOUBLE, BT_LONG, BT_LONGDOUBLE)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT, BT_LONGLONG, BT_FLOAT)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_DOUBLE, BT_LONGLONG, BT_DOUBLE)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_LONGDOUBLE, BT_LONGLONG, BT_LONGDOUBLE)
+DEF_FUNCTION_TYPE_1 (BT_FN_PSINT_VOID, BT_PTR_SIZED_INT, BT_VOID)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTR, BT_VOID, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_SIZE_CONST_STRING, BT_SIZE, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_CONST_STRING, BT_INT, BT_CONST_STRING)
@@ -756,10 +758,6 @@ DEF_FUNCTION_TYPE_8 (BT_FN_BOOL_UINT_ULLPTR_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_LONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG, BT_PTR, BT_PTR)
 
-DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
-		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
-		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
-		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
 		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
@@ -767,6 +765,10 @@ DEF_FUNCTION_TYPE_9 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR_PTR_PTR
 		     BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
 		     BT_PTR_LONG, BT_PTR_LONG, BT_PTR, BT_PTR)
 
+DEF_FUNCTION_TYPE_10 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT,
+		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
+		      BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
+		      BT_BOOL, BT_UINT, BT_PTR, BT_INT, BT_PTR_SIZED_INT)
 DEF_FUNCTION_TYPE_10 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		      BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		      BT_ULONGLONG, BT_LONG, BT_ULONGLONG, BT_PTR_ULONGLONG,
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 5a493fe..fb784e9 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -94,6 +94,7 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_DEFAULT,
   PRAGMA_OMP_CLAUSE_DEFAULTMAP,
   PRAGMA_OMP_CLAUSE_DEPEND,
+  PRAGMA_OMP_CLAUSE_DETACH,
   PRAGMA_OMP_CLAUSE_DEVICE,
   PRAGMA_OMP_CLAUSE_DEVICE_TYPE,
   PRAGMA_OMP_CLAUSE_DIST_SCHEDULE,
@@ -150,7 +151,6 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_COPYOUT,
   PRAGMA_OACC_CLAUSE_CREATE,
   PRAGMA_OACC_CLAUSE_DELETE,
-  PRAGMA_OACC_CLAUSE_DETACH,
   PRAGMA_OACC_CLAUSE_DEVICEPTR,
   PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
   PRAGMA_OACC_CLAUSE_FINALIZE,
@@ -173,6 +173,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_COPYIN = PRAGMA_OMP_CLAUSE_COPYIN,
   PRAGMA_OACC_CLAUSE_DEVICE = PRAGMA_OMP_CLAUSE_DEVICE,
   PRAGMA_OACC_CLAUSE_DEFAULT = PRAGMA_OMP_CLAUSE_DEFAULT,
+  PRAGMA_OACC_CLAUSE_DETACH = PRAGMA_OMP_CLAUSE_DETACH,
   PRAGMA_OACC_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
   PRAGMA_OACC_CLAUSE_IF = PRAGMA_OMP_CLAUSE_IF,
   PRAGMA_OACC_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7540a15..d3546d4 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15976,6 +15976,53 @@ c_parser_omp_clause_uniform (c_parser *parser, tree list)
   return list;
 }
 
+/* OpenMP 5.0:
+   detach ( event-handle ) */
+
+static tree
+c_parser_omp_clause_detach (c_parser *parser, tree list)
+{
+  matching_parens parens;
+  location_t clause_loc = c_parser_peek_token (parser)->location;
+
+  if (!parens.require_open (parser))
+    return list;
+
+  if (c_parser_next_token_is_not (parser, CPP_NAME)
+      || c_parser_peek_token (parser)->id_kind != C_ID_ID)
+    {
+      c_parser_error (parser, "expected identifier");
+      return list;
+    }
+
+  tree t = lookup_name (c_parser_peek_token (parser)->value);
+  if (t == NULL_TREE)
+    {
+      undeclared_variable (c_parser_peek_token (parser)->location,
+			   c_parser_peek_token (parser)->value);
+      return list;
+    }
+  c_parser_consume_token (parser);
+
+  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (t));
+  if (!INTEGRAL_TYPE_P (type)
+      || TREE_CODE (type) != ENUMERAL_TYPE
+      || TYPE_NAME (type) != get_identifier ("omp_event_handle_t"))
+    {
+      error_at (clause_loc, "%<detach%> clause event handle "
+			    "has type %qT rather than "
+			    "%<omp_event_handle_t%>",
+			    type);
+      return list;
+    }
+
+  tree u = build_omp_clause (clause_loc, OMP_CLAUSE_DETACH);
+  OMP_CLAUSE_DECL (u) = t;
+  OMP_CLAUSE_CHAIN (u) = list;
+  parens.skip_until_found_close (parser);
+  return u;
+}
+
 /* Parse all OpenACC clauses.  The set clauses allowed by the directive
    is a bitmask in MASK.  Return the list of clauses found.  */
 
@@ -16242,6 +16289,10 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_omp_clause_default (parser, clauses, false);
 	  c_name = "default";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DETACH:
+	  clauses = c_parser_omp_clause_detach (parser, clauses);
+	  c_name = "detach";
+	  break;
 	case PRAGMA_OMP_CLAUSE_FIRSTPRIVATE:
 	  clauses = c_parser_omp_clause_firstprivate (parser, clauses);
 	  c_name = "firstprivate";
@@ -19140,7 +19191,8 @@ c_parser_omp_single (location_t loc, c_parser *parser, bool *if_p)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEPEND)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRIORITY)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION))
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DETACH))
 
 static tree
 c_parser_omp_task (location_t loc, c_parser *parser, bool *if_p)
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 286f3d9..0c2ed59 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -14942,6 +14942,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  pc = &OMP_CLAUSE_CHAIN (c);
 	  continue;
 
+	case OMP_CLAUSE_DETACH:
+	  t = OMP_CLAUSE_DECL (c);
+	  pc = &OMP_CLAUSE_CHAIN (c);
+	  continue;
+
 	case OMP_CLAUSE_IF:
 	case OMP_CLAUSE_NUM_THREADS:
 	case OMP_CLAUSE_NUM_TEAMS:
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 8802124..5731a29 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -36957,6 +36957,66 @@ cp_parser_omp_clause_depend_sink (cp_parser *parser, location_t clause_loc,
 }
 
 /* OpenMP 5.0:
+   detach ( event-handle ) */
+
+static tree
+cp_parser_omp_clause_detach (cp_parser *parser, tree list)
+{
+  matching_parens parens;
+
+  if (!parens.require_open (parser))
+    return list;
+
+  if (cp_lexer_next_token_is_not (parser->lexer, CPP_NAME))
+    {
+      cp_parser_error (parser, "expected identifier");
+      return list;
+    }
+
+  location_t id_loc = cp_lexer_peek_token (parser->lexer)->location;
+  tree t, identifier = cp_parser_identifier (parser);
+
+  if (identifier == error_mark_node)
+    t = error_mark_node;
+  else
+    {
+      t = cp_parser_lookup_name_simple
+	    (parser, identifier,
+	     cp_lexer_peek_token (parser->lexer)->location);
+      if (t == error_mark_node)
+	cp_parser_name_lookup_error (parser, identifier, t, NLE_NULL,
+				     id_loc);
+      else
+	{
+	  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (t));
+	  if (!INTEGRAL_TYPE_P (type)
+	      || TREE_CODE (type) != ENUMERAL_TYPE
+	      || DECL_NAME (TYPE_NAME (type))
+		   != get_identifier ("omp_event_handle_t"))
+	    {
+	      error_at (id_loc, "%<detach%> clause event handle "
+			    "has type %qT rather than "
+			    "%<omp_event_handle_t%>",
+			    type);
+	      return list;
+	    }
+	}
+    }
+
+  if (t == error_mark_node
+      || !parens.require_close (parser))
+    cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+					   /*or_comma=*/false,
+					   /*consume_paren=*/true);
+
+  tree u = build_omp_clause (id_loc, OMP_CLAUSE_DETACH);
+  OMP_CLAUSE_DECL (u) = t;
+  OMP_CLAUSE_CHAIN (u) = list;
+
+  return u;
+}
+
+/* OpenMP 5.0:
    iterators ( iterators-definition )
 
    iterators-definition:
@@ -38012,6 +38072,10 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 						 token->location);
 	  c_name = "depend";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DETACH:
+	  clauses = cp_parser_omp_clause_detach (parser, clauses);
+	  c_name = "detach";
+	  break;
 	case PRAGMA_OMP_CLAUSE_MAP:
 	  clauses = cp_parser_omp_clause_map (parser, clauses);
 	  c_name = "map";
@@ -40546,7 +40610,8 @@ cp_parser_omp_single (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEPEND)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRIORITY)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION))
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DETACH))
 
 static tree
 cp_parser_omp_task (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 5ff70ff..38eecf5 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7394,6 +7394,9 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 		}
 	    }
 	  break;
+	case OMP_CLAUSE_DETACH:
+	  t = OMP_CLAUSE_DECL (c);
+	  break;
 
 	case OMP_CLAUSE_MAP:
 	case OMP_CLAUSE_TO:
diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index cab0fb2..baf1e3c 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1700,6 +1700,12 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
       show_expr (omp_clauses->priority);
       fputc (')', dumpfile);
     }
+  if (omp_clauses->detach)
+    {
+      fputs (" DETACH(", dumpfile);
+      show_expr (omp_clauses->detach);
+      fputc (')', dumpfile);
+    }
   for (i = 0; i < OMP_IF_LAST; i++)
     if (omp_clauses->if_exprs[i])
       {
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index 83f6fd8..699b354 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -5597,6 +5597,7 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 		  WALK_SUBEXPR (co->ext.omp_clauses->hint);
 		  WALK_SUBEXPR (co->ext.omp_clauses->num_tasks);
 		  WALK_SUBEXPR (co->ext.omp_clauses->priority);
+		  WALK_SUBEXPR (co->ext.omp_clauses->detach);
 		  for (idx = 0; idx < OMP_IF_LAST; idx++)
 		    WALK_SUBEXPR (co->ext.omp_clauses->if_exprs[idx]);
 		  for (idx = 0;
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 6467985..fec96af 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1414,6 +1414,7 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *hint;
   struct gfc_expr *num_tasks;
   struct gfc_expr *priority;
+  struct gfc_expr *detach;
   struct gfc_expr *if_exprs[OMP_IF_LAST];
   enum gfc_omp_sched_kind dist_sched_kind;
   struct gfc_expr *dist_chunk_size;
@@ -3102,6 +3103,7 @@ extern int gfc_default_character_kind;
 extern int gfc_default_logical_kind;
 extern int gfc_default_complex_kind;
 extern int gfc_c_int_kind;
+extern int gfc_c_intptr_kind;
 extern int gfc_atomic_int_kind;
 extern int gfc_atomic_logical_kind;
 extern int gfc_intio_kind;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 68d0b65..4c58447 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -91,6 +91,7 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->hint);
   gfc_free_expr (c->num_tasks);
   gfc_free_expr (c->priority);
+  gfc_free_expr (c->detach);
   for (i = 0; i < OMP_IF_LAST; i++)
     gfc_free_expr (c->if_exprs[i]);
   gfc_free_expr (c->async_expr);
@@ -448,6 +449,39 @@ cleanup:
   return MATCH_ERROR;
 }
 
+/* Match detach(event-handle).  */
+
+static match
+gfc_match_omp_detach (gfc_expr **expr)
+{
+  locus old_loc = gfc_current_locus;
+
+  if (gfc_match ("detach ( ") != MATCH_YES)
+    goto syntax_error;
+
+  if (gfc_match_variable (expr, 0) != MATCH_YES)
+    goto syntax_error;
+
+  if ((*expr)->ts.type != BT_INTEGER || (*expr)->ts.kind != gfc_c_intptr_kind)
+    {
+      gfc_error ("%qs at %L should be of type "
+		 "integer(kind=omp_event_handle_kind)",
+		 (*expr)->symtree->n.sym->name, &(*expr)->where);
+      return MATCH_ERROR;
+    }
+
+  if (gfc_match_char (')') != MATCH_YES)
+    goto syntax_error;
+
+  return MATCH_YES;
+
+syntax_error:
+   gfc_error ("Syntax error in OpenMP detach clause at %C");
+   gfc_current_locus = old_loc;
+   return MATCH_ERROR;
+
+}
+
 /* Match depend(sink : ...) construct a namelist from it.  */
 
 static match
@@ -807,6 +841,7 @@ enum omp_mask1
   OMP_CLAUSE_ATOMIC,  /* OpenMP 5.0.  */
   OMP_CLAUSE_CAPTURE,  /* OpenMP 5.0.  */
   OMP_CLAUSE_MEMORDER,  /* OpenMP 5.0.  */
+  OMP_CLAUSE_DETACH,  /* OpenMP 5.0.  */
   OMP_CLAUSE_NOWAIT,
   /* This must come last.  */
   OMP_MASK1_LAST
@@ -840,7 +875,6 @@ enum omp_mask2
   OMP_CLAUSE_IF_PRESENT,
   OMP_CLAUSE_FINALIZE,
   OMP_CLAUSE_ATTACH,
-  OMP_CLAUSE_DETACH,
   /* This must come last.  */
   OMP_MASK2_LAST
 };
@@ -1378,6 +1412,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		gfc_current_locus = old_loc;
 	    }
 	  if ((mask & OMP_CLAUSE_DETACH)
+	      && !openacc
+	      && gfc_match_omp_detach (&c->detach) == MATCH_YES)
+	    continue;
+	  if ((mask & OMP_CLAUSE_DETACH)
+	      && openacc
 	      && gfc_match ("detach ( ") == MATCH_YES
 	      && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
 					   OMP_MAP_DETACH, false,
@@ -2763,7 +2802,8 @@ cleanup:
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_SHARED | OMP_CLAUSE_IF | OMP_CLAUSE_DEFAULT		\
    | OMP_CLAUSE_UNTIED | OMP_CLAUSE_FINAL | OMP_CLAUSE_MERGEABLE	\
-   | OMP_CLAUSE_DEPEND | OMP_CLAUSE_PRIORITY | OMP_CLAUSE_IN_REDUCTION)
+   | OMP_CLAUSE_DEPEND | OMP_CLAUSE_PRIORITY | OMP_CLAUSE_IN_REDUCTION	\
+   | OMP_CLAUSE_DETACH)
 #define OMP_TASKLOOP_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_LASTPRIVATE | OMP_CLAUSE_SHARED | OMP_CLAUSE_IF		\
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 6b4ad6a..314e00d 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3667,6 +3667,21 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }
 
+  if (clauses->detach)
+    {
+      tree detach;
+
+      gfc_init_se (&se, NULL);
+      gfc_conv_expr (&se, clauses->detach);
+      gfc_add_block_to_block (block, &se.pre);
+      detach = se.expr;
+      gfc_add_block_to_block (block, &se.post);
+
+      c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_DETACH);
+      OMP_CLAUSE_DETACH_EXPR (c) = detach;
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+    }
+
   if (clauses->hint)
     {
       tree hint;
diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index b7129dc..bd9ca14 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -114,6 +114,7 @@ int gfc_default_character_kind;
 int gfc_default_logical_kind;
 int gfc_default_complex_kind;
 int gfc_c_int_kind;
+int gfc_c_intptr_kind;
 int gfc_atomic_int_kind;
 int gfc_atomic_logical_kind;
 
@@ -691,6 +692,8 @@ gfc_init_kinds (void)
   /* Choose atomic kinds to match C's int.  */
   gfc_atomic_int_kind = gfc_c_int_kind;
   gfc_atomic_logical_kind = gfc_c_int_kind;
+
+  gfc_c_intptr_kind = POINTER_SIZE / 8;
 }
 
 
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 5736bba..7cac4dc 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -53,6 +53,7 @@ DEF_PRIMITIVE_TYPE (BT_LONG, long_integer_type_node)
 DEF_PRIMITIVE_TYPE (BT_ULONGLONG, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (BT_WORD, (*lang_hooks.types.type_for_mode) (word_mode, 1))
 DEF_PRIMITIVE_TYPE (BT_SIZE, size_type_node)
+DEF_PRIMITIVE_TYPE (BT_PTR_SIZED_INT, pointer_sized_int_node)
 
 DEF_PRIMITIVE_TYPE (BT_I1, builtin_type_for_size (BITS_PER_UNIT*1, 1))
 DEF_PRIMITIVE_TYPE (BT_I2, builtin_type_for_size (BITS_PER_UNIT*2, 1))
@@ -86,6 +87,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_PTR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_PSINT_VOID, BT_PTR_SIZED_INT, BT_VOID)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_INT, BT_VOID, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_BOOL, BT_VOID, BT_BOOL)
 DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT)
@@ -235,10 +237,6 @@ DEF_FUNCTION_TYPE_8 (BT_FN_BOOL_UINT_ULLPTR_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_LONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG, BT_PTR, BT_PTR)
 
-DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
-		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
-		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
-		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
 		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
@@ -246,6 +244,10 @@ DEF_FUNCTION_TYPE_9 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR_PTR_PTR
 		     BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
 		     BT_PTR_LONG, BT_PTR_LONG, BT_PTR, BT_PTR)
 
+DEF_FUNCTION_TYPE_10 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT,
+		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
+		      BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
+		      BT_BOOL, BT_UINT, BT_PTR, BT_INT, BT_PTR_SIZED_INT)
 DEF_FUNCTION_TYPE_10 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		      BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		      BT_ULONGLONG, BT_LONG, BT_ULONGLONG, BT_PTR_ULONGLONG,
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 53ec9ec..b0b3922 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -9733,6 +9733,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	    }
 	  break;
 
+	case OMP_CLAUSE_DETACH:
+	  decl = OMP_CLAUSE_DECL (c);
+	  goto do_notice;
+
 	case OMP_CLAUSE_IF:
 	  if (OMP_CLAUSE_IF_MODIFIER (c) != ERROR_MARK
 	      && OMP_CLAUSE_IF_MODIFIER (c) != code)
@@ -10870,6 +10874,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	case OMP_CLAUSE_DEFAULTMAP:
 	case OMP_CLAUSE_ORDER:
 	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index f9b78ed..b3bf17f 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -381,7 +381,7 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL_REDUCTIONS,
 		  "GOMP_parallel_reductions",
 		  BT_FN_UINT_OMPFN_PTR_UINT_UINT, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASK, "GOMP_task",
-		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
+		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT,
 		  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASKLOOP, "GOMP_taskloop",
 		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
@@ -451,3 +451,5 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ALLOC,
 		  ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_FREE,
 		  "GOMP_free", BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_NEW_EVENT, "GOMP_new_event",
+		  BT_FN_PSINT_VOID, ATTR_NOTHROW_LEAF_LIST)
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index c0e94e5..533f47d 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -762,6 +762,7 @@ expand_task_call (struct omp_region *region, basic_block bb,
   tree depend = omp_find_clause (clauses, OMP_CLAUSE_DEPEND);
   tree finalc = omp_find_clause (clauses, OMP_CLAUSE_FINAL);
   tree priority = omp_find_clause (clauses, OMP_CLAUSE_PRIORITY);
+  tree detach = omp_find_clause (clauses, OMP_CLAUSE_DETACH);
 
   unsigned int iflags
     = (untied ? GOMP_TASK_FLAG_UNTIED : 0)
@@ -853,6 +854,11 @@ expand_task_call (struct omp_region *region, basic_block bb,
     priority = integer_zero_node;
 
   gsi = gsi_last_nondebug_bb (bb);
+
+  detach = detach
+      ? fold_convert (pointer_sized_int_node, OMP_CLAUSE_DETACH_EXPR (detach))
+      : null_pointer_node;
+
   tree t = gimple_omp_task_data_arg (entry_stmt);
   if (t == NULL)
     t2 = null_pointer_node;
@@ -875,10 +881,10 @@ expand_task_call (struct omp_region *region, basic_block bb,
 			 num_tasks, priority, startvar, endvar, step);
   else
     t = build_call_expr (builtin_decl_explicit (BUILT_IN_GOMP_TASK),
-			 9, t1, t2, t3,
+			 10, t1, t2, t3,
 			 gimple_omp_task_arg_size (entry_stmt),
 			 gimple_omp_task_arg_align (entry_stmt), cond, flags,
-			 depend, priority);
+			 depend, priority, detach);
 
   force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 			    false, GSI_CONTINUE_LINKING);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 09a8cbd..3dd15d1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1412,6 +1412,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_NUM_GANGS:
 	case OMP_CLAUSE_NUM_WORKERS:
 	case OMP_CLAUSE_VECTOR_LENGTH:
+	case OMP_CLAUSE_DETACH:
 	  if (ctx->outer)
 	    scan_omp_op (&OMP_CLAUSE_OPERAND (c, 0), ctx->outer);
 	  break;
@@ -1779,6 +1780,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_SIMDLEN:
 	case OMP_CLAUSE_ALIGNED:
 	case OMP_CLAUSE_DEPEND:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_ALLOCATE:
 	case OMP_CLAUSE__LOOPTEMP_:
 	case OMP_CLAUSE__REDUCTEMP_:
@@ -11453,6 +11455,26 @@ create_task_copyfn (gomp_task *task_stmt, omp_context *ctx)
 }
 
 static void
+lower_detach_clause (tree *pclauses, gimple_seq *iseq, omp_context *ctx)
+{
+  tree clause = omp_find_clause (*pclauses, OMP_CLAUSE_DETACH);
+  gcc_assert (clause);
+
+  tree event_decl = OMP_CLAUSE_DECL (clause);
+  tree event_ref = lookup_decl_in_outer_ctx (event_decl, ctx);
+  tree fn_decl = builtin_decl_explicit (BUILT_IN_GOMP_NEW_EVENT);
+  tree handle = create_tmp_var (pointer_sized_int_node);
+
+  gimple *call_stmt = gimple_build_call (fn_decl, 0);
+  gimple_call_set_lhs (call_stmt, handle);
+  gimple_seq_add_stmt (iseq, call_stmt);
+
+  gimplify_assign (event_ref,
+		   fold_convert (TREE_TYPE (event_decl), handle),
+		   iseq);
+}
+
+static void
 lower_depend_clauses (tree *pclauses, gimple_seq *iseq, gimple_seq *oseq)
 {
   tree c, clauses;
@@ -11601,6 +11623,15 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       if (ws_num == 1)
 	gimple_omp_parallel_set_combined_p (stmt, true);
     }
+
+  gimple_seq detach_ilist = NULL;
+  if (gimple_code (stmt) == GIMPLE_OMP_TASK
+      && omp_find_clause (clauses, OMP_CLAUSE_DETACH))
+    {
+      lower_detach_clause (gimple_omp_task_clauses_ptr (stmt), &detach_ilist,
+			   ctx);
+    }
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (gimple_code (stmt) == GIMPLE_OMP_TASK
@@ -11678,6 +11709,10 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   gimple_seq olist = NULL;
   gimple_seq ilist = NULL;
+
+  if (detach_ilist)
+    gimple_seq_add_seq (&ilist, detach_ilist);
+
   lower_send_clauses (clauses, &ilist, &olist, ctx);
   lower_send_shared_vars (&ilist, &olist, ctx);
 
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 313a6af..5b028da 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -301,6 +301,9 @@ enum omp_clause_code {
   /* OpenMP clause: to (variable-list).  */
   OMP_CLAUSE_TO,
 
+  /* OpenMP clause: detach (event-handle).  */
+  OMP_CLAUSE_DETACH,
+
   /* OpenACC clauses: {copy, copyin, copyout, create, delete, deviceptr,
      device, host (self), present, present_or_copy (pcopy), present_or_copyin
      (pcopyin), present_or_copyout (pcopyout), present_or_create (pcreate)}
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 5a93c4d..073d14b 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1265,6 +1265,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
     case OMP_CLAUSE_FINALIZE:
       pp_string (pp, "finalize");
       break;
+    case OMP_CLAUSE_DETACH:
+      pp_string (pp, "detach(");
+      dump_generic_node (pp, OMP_CLAUSE_DETACH_EXPR (clause), spc, flags,
+			 false);
+      pp_right_paren (pp);
+      break;
 
     default:
       gcc_unreachable ();
diff --git a/gcc/tree.c b/gcc/tree.c
index d6ba553..921c928 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -299,6 +299,7 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_LINK  */
   2, /* OMP_CLAUSE_FROM  */
   2, /* OMP_CLAUSE_TO  */
+  1, /* OMP_CLAUSE_DETACH  */
   2, /* OMP_CLAUSE_MAP  */
   1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
   1, /* OMP_CLAUSE_USE_DEVICE_ADDR  */
@@ -384,6 +385,7 @@ const char * const omp_clause_code_name[] =
   "link",
   "from",
   "to",
+  "detach",
   "map",
   "use_device_ptr",
   "use_device_addr",
@@ -12234,6 +12236,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE_HINT:
 	case OMP_CLAUSE_TO_DECLARE:
 	case OMP_CLAUSE_LINK:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
diff --git a/gcc/tree.h b/gcc/tree.h
index 078919b..c76ee74 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1588,6 +1588,9 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_PRIORITY_EXPR(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PRIORITY),0)
 
+#define OMP_CLAUSE_DETACH_EXPR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DETACH),0)
+
 /* OpenACC clause expressions  */
 #define OMP_CLAUSE_EXPR(NODE, CLAUSE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, CLAUSE), 0)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index cd719f9..976b248 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -605,6 +605,12 @@ omp_get_max_task_priority_ (void)
 }
 
 void
+omp_fulfill_event_ (intptr_t event)
+{
+  omp_fulfill_event ((omp_event_handle_t) event);
+}
+
+void
 omp_set_affinity_format_ (const char *format, size_t format_len)
 {
   gomp_set_affinity_format (format, format_len);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 070d29c..88fc217 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -516,6 +516,12 @@ struct gomp_taskwait
   gomp_sem_t taskwait_sem;
 };
 
+struct gomp_allow_completion_event
+{
+  bool fulfilled;
+  gomp_sem_t completion_sem;
+};
+
 /* This structure describes a "task" to be run by a thread.  */
 
 struct gomp_task
@@ -545,6 +551,8 @@ struct gomp_task
      entries and the gomp_task in which they reside.  */
   struct priority_node pnode[3];
 
+  struct gomp_allow_completion_event *detach_event;
+
   struct gomp_task_icv icv;
   void (*fn) (void *);
   void *fn_data;
@@ -685,6 +693,10 @@ struct gomp_team
   int work_share_cancelled;
   int team_cancelled;
 
+  /* Tasks waiting for their completion event to be fulfilled.  */
+  struct priority_queue task_detach_queue;
+  unsigned int task_detach_count;
+
   /* This array contains structures for implicit tasks.  */
   struct gomp_task implicit_task[];
 };
@@ -931,6 +943,8 @@ gomp_finish_task (struct gomp_task *task)
 {
   if (__builtin_expect (task->depend_hash != NULL, 0))
     free (task->depend_hash);
+  if (task->detach_event)
+    free (task->detach_event);
 }
 
 /* team.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2c95f78..434dfc3 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -195,6 +195,8 @@ OMP_5.0.1 {
 	omp_free;
 	omp_get_supported_active_levels;
 	omp_get_supported_active_levels_;
+	omp_fulfill_event;
+	omp_fulfill_event_;
 } OMP_5.0;
 
 GOMP_1.0 {
@@ -347,6 +349,7 @@ GOMP_5.0 {
 	GOMP_loop_ull_nonmonotonic_runtime_start;
 	GOMP_loop_ull_ordered_start;
 	GOMP_loop_ull_start;
+	GOMP_new_event;
 	GOMP_parallel_loop_maybe_nonmonotonic_runtime;
 	GOMP_parallel_loop_nonmonotonic_runtime;
 	GOMP_parallel_reductions;
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index b20e186..ca27c53 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -293,8 +293,9 @@ extern bool GOMP_cancellation_point (int);
 
 /* task.c */
 
+extern uintptr_t GOMP_new_event (void);
 extern void GOMP_task (void (*) (void *), void *, void (*) (void *, void *),
-		       long, long, bool, unsigned, void **, int);
+		       long, long, bool, unsigned, void **, int, uintptr_t);
 extern void GOMP_taskloop (void (*) (void *), void *,
 			   void (*) (void *, void *), long, long, unsigned,
 			   unsigned long, int, long, long, long);
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 4424a16..62b6c0f 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -171,6 +171,11 @@ typedef struct omp_alloctrait_t
   omp_uintptr_t value;
 } omp_alloctrait_t;
 
+typedef enum omp_event_handle_t __GOMP_UINTPTR_T_ENUM
+{
+  __omp_event_handle_t_max__ = __UINTPTR_MAX__
+} omp_event_handle_t;
+
 #ifdef __cplusplus
 extern "C" {
 # define __GOMP_NOTHROW throw ()
@@ -245,6 +250,8 @@ extern int omp_is_initial_device (void) __GOMP_NOTHROW;
 extern int omp_get_initial_device (void) __GOMP_NOTHROW;
 extern int omp_get_max_task_priority (void) __GOMP_NOTHROW;
 
+extern void omp_fulfill_event (omp_event_handle_t) __GOMP_NOTHROW;
+
 extern void *omp_target_alloc (__SIZE_TYPE__, int) __GOMP_NOTHROW;
 extern void omp_target_free (void *, int) __GOMP_NOTHROW;
 extern int omp_target_is_present (const void *, int) __GOMP_NOTHROW;
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 3b7f0cb..7b70d8b 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -39,6 +39,7 @@
         integer, parameter :: omp_alloctrait_val_kind = c_intptr_t
         integer, parameter :: omp_memspace_handle_kind = c_intptr_t
         integer, parameter :: omp_depend_kind = @OMP_DEPEND_KIND@
+        integer, parameter :: omp_event_handle_kind = c_intptr_t
         integer (omp_sched_kind), parameter :: omp_sched_static = 1
         integer (omp_sched_kind), parameter :: omp_sched_dynamic = 2
         integer (omp_sched_kind), parameter :: omp_sched_guided = 3
@@ -556,6 +557,13 @@
         end interface
 
         interface
+          subroutine omp_fulfill_event (event)
+            use omp_lib_kinds
+            integer (kind=omp_event_handle_kind), value, intent(in) :: event
+          end subroutine omp_fulfill_event
+        end interface
+
+        interface
           subroutine omp_set_affinity_format (format)
             character(len=*), intent(in) :: format
           end subroutine omp_set_affinity_format
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index eb1dcc4..5b4053f 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -82,10 +82,12 @@
 
       integer omp_allocator_handle_kind, omp_alloctrait_key_kind
       integer omp_alloctrait_val_kind, omp_memspace_handle_kind
+      integer omp_event_handle_kind
       parameter (omp_allocator_handle_kind = @INTPTR_T_KIND@)
       parameter (omp_alloctrait_key_kind = 4)
       parameter (omp_alloctrait_val_kind = @INTPTR_T_KIND@)
       parameter (omp_memspace_handle_kind = @INTPTR_T_KIND@)
+      parameter (omp_event_handle_kind = @INTPTR_T_KIND@)
       integer (omp_alloctrait_key_kind) omp_atk_sync_hint
       integer (omp_alloctrait_key_kind) omp_atk_alignment
       integer (omp_alloctrait_key_kind) omp_atk_access
@@ -245,6 +247,8 @@
       external omp_get_max_task_priority
       integer(4) omp_get_max_task_priority
 
+      external omp_fulfill_event
+
       external omp_set_affinity_format, omp_get_affinity_format
       external omp_display_affinity, omp_capture_affinity
       integer(4) omp_get_affinity_format
diff --git a/libgomp/priority_queue.c b/libgomp/priority_queue.c
index 9b8d2ff..0c6b556 100644
--- a/libgomp/priority_queue.c
+++ b/libgomp/priority_queue.c
@@ -168,6 +168,56 @@ priority_queue_verify (enum priority_queue_type type,
 }
 #endif /* _LIBGOMP_CHECKING_ */
 
+static struct gomp_task *
+priority_tree_find (enum priority_queue_type type,
+		    prio_splay_tree_node node,
+		    priority_queue_predicate pred)
+{
+ again:
+  if (!node)
+    return NULL;
+  struct gomp_task *task = priority_tree_find (type, node->right, pred);
+  if (task)
+    return task;
+  task = priority_node_to_task (type, node->key.l.tasks);
+  if (pred (task))
+    return task;
+  node = node->left;
+  goto again;
+}
+
+static struct gomp_task *
+priority_list_find (enum priority_queue_type type,
+		     struct priority_list *list,
+		     priority_queue_predicate pred)
+{
+  struct priority_node *node = list->tasks;
+  if (!node)
+    return NULL;
+
+  do
+    {
+      struct gomp_task *task = priority_node_to_task (type, node);
+      if (pred (task))
+	return task;
+      node = node->next;
+    }
+  while (node != list->tasks);
+
+  return NULL;
+}
+
+struct gomp_task *
+priority_queue_find (enum priority_queue_type type,
+		     struct priority_queue *head,
+		     priority_queue_predicate pred)
+{
+  if (priority_queue_multi_p (head))
+    return priority_tree_find (type, head->t.root, pred);
+  else
+    return priority_list_find (type, &head->l, pred);
+}
+
 /* Remove NODE from priority queue HEAD, wherever it may be inside the
    tree.  HEAD contains tasks of type TYPE.  */
 
diff --git a/libgomp/priority_queue.h b/libgomp/priority_queue.h
index 0ad78f5..c6fd80d 100644
--- a/libgomp/priority_queue.h
+++ b/libgomp/priority_queue.h
@@ -113,6 +113,8 @@ enum priority_queue_type
   PQ_IGNORED = 999
 };
 
+typedef bool (*priority_queue_predicate)(struct gomp_task *);
+
 /* Priority queue implementation prototypes.  */
 
 extern bool priority_queue_task_in_queue_p (enum priority_queue_type,
@@ -122,6 +124,9 @@ extern void priority_queue_dump (enum priority_queue_type,
 				 struct priority_queue *);
 extern void priority_queue_verify (enum priority_queue_type,
 				   struct priority_queue *, bool);
+extern struct gomp_task *priority_queue_find (enum priority_queue_type,
+					      struct priority_queue *,
+					      priority_queue_predicate);
 extern void priority_tree_remove (enum priority_queue_type,
 				  struct priority_queue *,
 				  struct priority_node *);
diff --git a/libgomp/task.c b/libgomp/task.c
index a95067c..d40a42a 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -86,6 +86,7 @@ gomp_init_task (struct gomp_task *task, struct gomp_task *parent_task,
   task->dependers = NULL;
   task->depend_hash = NULL;
   task->depend_count = 0;
+  task->detach_event = NULL;
 }
 
 /* Clean up a task, after completing it.  */
@@ -326,6 +327,21 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
     }
 }
 
+uintptr_t
+GOMP_new_event ()
+{
+  struct gomp_allow_completion_event *event;
+
+  event = (struct gomp_allow_completion_event *)
+	    gomp_malloc (sizeof (struct gomp_allow_completion_event));
+  event->fulfilled = false;
+  gomp_sem_init (&event->completion_sem, 0);
+
+  gomp_debug (0, "GOMP_new_event: %p\n", event);
+
+  return (uintptr_t) event;
+}
+
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
    false, then we must not delay in executing the task.  If UNTIED is true,
    then the task may be executed by any member of the team.
@@ -347,11 +363,14 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
 void
 GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	   long arg_size, long arg_align, bool if_clause, unsigned flags,
-	   void **depend, int priority)
+	   void **depend, int priority, uintptr_t detach)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr->ts.team;
 
+  struct gomp_allow_completion_event *detach_event =
+    detach ? (struct gomp_allow_completion_event *) detach : NULL;
+
 #ifdef HAVE_BROKEN_POSIX_SEMAPHORES
   /* If pthread_mutex_* is used for omp_*lock*, then each task must be
      tied to one thread all the time.  This means UNTIED tasks must be
@@ -404,6 +423,10 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
       task.final_task = (thr->task && thr->task->final_task)
 			|| (flags & GOMP_TASK_FLAG_FINAL);
       task.priority = priority;
+
+      if (detach)
+	task.detach_event = detach_event;
+
       if (thr->task)
 	{
 	  task.in_tied_task = thr->task->in_tied_task;
@@ -420,6 +443,10 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	}
       else
 	fn (data);
+
+      if (detach)
+	  gomp_sem_wait (&task.detach_event->completion_sem);
+
       /* Access to "children" is normally done inside a task_lock
 	 mutex region, but the only way this particular task.children
 	 can be set is if this thread's task work function (fn)
@@ -435,6 +462,7 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	  gomp_clear_parent (&task.children_queue);
 	  gomp_mutex_unlock (&team->task_lock);
 	}
+
       gomp_end_task ();
     }
   else
@@ -458,6 +486,8 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
       task->kind = GOMP_TASK_UNDEFERRED;
       task->in_tied_task = parent->in_tied_task;
       task->taskgroup = taskgroup;
+      if (detach)
+	task->detach_event = detach_event;
       thr->task = task;
       if (cpyfn)
 	{
@@ -1299,6 +1329,13 @@ gomp_task_run_post_remove_taskgroup (struct gomp_task *child_task)
     }
 }
 
+static bool
+task_fulfilled_p (struct gomp_task *task)
+{
+  return __atomic_load_n (&task->detach_event->fulfilled,
+			  __ATOMIC_RELAXED);
+}
+
 void
 gomp_barrier_handle_tasks (gomp_barrier_state_t state)
 {
@@ -1310,6 +1347,8 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
   int do_wake = 0;
 
   gomp_mutex_lock (&team->task_lock);
+  gomp_debug (0, "thread: %d, task_count %d\n",
+	      thr->ts.team_id, team->task_count);
   if (gomp_barrier_last_thread (state))
     {
       if (team->task_count == 0)
@@ -1325,6 +1364,22 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
   while (1)
     {
       bool cancelled = false;
+
+      /* Look for a queued detached task with a fulfilled completion event
+	 that is ready to finish.  */
+      child_task = priority_queue_find (PQ_TEAM, &team->task_detach_queue,
+					task_fulfilled_p);
+      if (child_task)
+	{
+	  priority_queue_remove (PQ_TEAM, &team->task_detach_queue,
+				 child_task, MEMMODEL_RELAXED);
+	  --team->task_detach_count;
+	  gomp_debug (0,
+		      "thread: %d, found task with fulfilled event %p\n",
+		      thr->ts.team_id, child_task->detach_event);
+	  goto finish_cancelled;
+	}
+
       if (!priority_queue_empty_p (&team->task_queue, MEMMODEL_RELAXED))
 	{
 	  bool ignored;
@@ -1388,34 +1443,86 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
 	  thr->task = task;
 	}
       else
-	return;
-      gomp_mutex_lock (&team->task_lock);
-      if (child_task)
 	{
-	 finish_cancelled:;
-	  size_t new_tasks
-	    = gomp_task_run_post_handle_depend (child_task, team);
-	  gomp_task_run_post_remove_parent (child_task);
-	  gomp_clear_parent (&child_task->children_queue);
-	  gomp_task_run_post_remove_taskgroup (child_task);
-	  to_free = child_task;
-	  child_task = NULL;
-	  if (!cancelled)
-	    team->task_running_count--;
-	  if (new_tasks > 1)
+	  bool ignored;
+
+	  /* If there are no tasks left, return.  */
+	  gomp_mutex_lock (&team->task_lock);
+	  if (priority_queue_empty_p (&team->task_detach_queue,
+				      MEMMODEL_RELAXED))
 	    {
-	      do_wake = team->nthreads - team->task_running_count;
-	      if (do_wake > new_tasks)
-		do_wake = new_tasks;
+	      gomp_mutex_unlock (&team->task_lock);
+	      return;
 	    }
-	  if (--team->task_count == 0
-	      && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+
+	  /* Retrieve a queued detached task.  */
+	  child_task
+	    = priority_queue_next_task (PQ_TEAM, &team->task_detach_queue,
+					PQ_IGNORED, NULL,
+					&ignored);
+	  priority_queue_remove (PQ_TEAM, &team->task_detach_queue,
+				 child_task, MEMMODEL_RELAXED);
+	  --team->task_detach_count;
+	  if (!task_fulfilled_p (child_task))
 	    {
-	      gomp_team_barrier_done (&team->barrier, state);
+	      /* Wait for detached task to finish.  */
 	      gomp_mutex_unlock (&team->task_lock);
-	      gomp_team_barrier_wake (&team->barrier, 0);
+	      gomp_debug (0,
+			  "thread: %d, waiting for event to be fulfilled %p\n",
+			  thr->ts.team_id, child_task->detach_event);
+	      gomp_sem_wait (&child_task->detach_event->completion_sem);
 	      gomp_mutex_lock (&team->task_lock);
 	    }
+	  else
+	    gomp_debug (0, "thread: %d, queued event already fulfilled %p\n",
+			thr->ts.team_id, child_task->detach_event);
+	  goto finish_cancelled;
+	}
+      gomp_mutex_lock (&team->task_lock);
+      if (child_task)
+	{
+	  if (child_task->detach_event
+	      && !task_fulfilled_p (child_task))
+	    {
+	      priority_queue_insert (PQ_TEAM, &team->task_detach_queue,
+				     child_task, child_task->priority,
+				     PRIORITY_INSERT_END,
+				     false, false);
+	      ++team->task_detach_count;
+	      gomp_debug (0, "thread: %d, queueing detached %p\n",
+			  thr->ts.team_id, child_task->detach_event);
+	      child_task = NULL;
+	    }
+	  else
+	    {
+	      if (child_task->detach_event)
+		gomp_debug (0, "thread: %d, event already fulfilled %p\n",
+			    thr->ts.team_id, child_task->detach_event);
+	     finish_cancelled:;
+	      size_t new_tasks
+		= gomp_task_run_post_handle_depend (child_task, team);
+	      gomp_task_run_post_remove_parent (child_task);
+	      gomp_clear_parent (&child_task->children_queue);
+	      gomp_task_run_post_remove_taskgroup (child_task);
+	      to_free = child_task;
+	      child_task = NULL;
+	      if (!cancelled)
+		team->task_running_count--;
+	      if (new_tasks > 1)
+		{
+		  do_wake = team->nthreads - team->task_running_count;
+		  if (do_wake > new_tasks)
+		    do_wake = new_tasks;
+		}
+	      if (--team->task_count == 0
+		  && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+		{
+		  gomp_team_barrier_done (&team->barrier, state);
+		  gomp_mutex_unlock (&team->task_lock);
+		  gomp_team_barrier_wake (&team->barrier, 0);
+		  gomp_mutex_lock (&team->task_lock);
+		}
+	    }
 	}
     }
 }
@@ -2326,3 +2433,18 @@ omp_in_final (void)
 }
 
 ialias (omp_in_final)
+
+void omp_fulfill_event(omp_event_handle_t event)
+{
+  struct gomp_allow_completion_event *ev =
+		(struct gomp_allow_completion_event *) event;
+
+  if (__atomic_load_n (&ev->fulfilled, __ATOMIC_RELAXED))
+    gomp_fatal ("omp_fulfill_enent: Event already fulfilled!\n");
+
+  gomp_debug(0, "omp_fulfill_event: %p\n", ev);
+  __atomic_store_n (&ev->fulfilled, true, __ATOMIC_RELAXED);
+  gomp_sem_post (&ev->completion_sem);
+}
+
+ialias (omp_fulfill_event)
diff --git a/libgomp/team.c b/libgomp/team.c
index cbc3aec..ee488f2 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -206,6 +206,9 @@ gomp_new_team (unsigned nthreads)
   team->work_share_cancelled = 0;
   team->team_cancelled = 0;
 
+  priority_queue_init (&team->task_detach_queue);
+  team->task_detach_count = 0;
+
   return team;
 }
 
@@ -221,6 +224,7 @@ free_team (struct gomp_team *team)
   gomp_barrier_destroy (&team->barrier);
   gomp_mutex_destroy (&team->task_lock);
   priority_queue_free (&team->task_queue);
+  priority_queue_free (&team->task_detach_queue);
   team_free (team);
 }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
new file mode 100644
index 0000000..7f2319c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+omp_event_handle_t detach_event1, detach_event2;
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel
+  {
+    #pragma omp single
+    {
+      #pragma omp task detach(detach_event1)
+      {
+	x++;
+      }
+
+      #pragma omp task detach(detach_event2)
+      {
+	y++;
+	omp_fulfill_event (detach_event1);
+      }
+
+      #pragma omp task
+      {
+	z++;
+	omp_fulfill_event (detach_event2);
+      }
+    }
+    #pragma omp taskwait
+  }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
new file mode 100644
index 0000000..330c936
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel num_threads(1)
+  {
+    #pragma omp single
+    {
+      omp_event_handle_t detach_event1, detach_event2;
+
+      #pragma omp task detach(detach_event1)
+      {
+	x++;
+      }
+
+      #pragma omp task detach(detach_event2)
+      {
+	y++;
+	omp_fulfill_event (detach_event1);
+      }
+
+      #pragma omp task
+      {
+	z++;
+	omp_fulfill_event (detach_event2);
+      }
+    }
+    #pragma omp taskwait
+  }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
new file mode 100644
index 0000000..a16f5336
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel
+  {
+    #pragma omp single
+    {
+      omp_event_handle_t detach_event;
+      int dep;
+
+      #pragma omp task depend(out:dep) detach(detach_event)
+      {
+	x++;
+      }
+
+      #pragma omp task
+      {
+	y++;
+	omp_fulfill_event(detach_event);
+      }
+
+      #pragma omp task depend(in:dep)
+      {
+	z++;
+      }
+    }
+    #pragma omp taskwait
+  }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
new file mode 100644
index 0000000..3d4d491
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+int main (void)
+{
+  int x = 0;
+
+  #pragma omp parallel
+  {
+    #pragma omp single
+    {
+      omp_event_handle_t detach_event;
+
+      #pragma omp task detach(detach_event)
+      {
+	x++;
+	omp_fulfill_event(detach_event);
+      }
+    }
+    #pragma omp taskwait
+  }
+
+  assert (x == 1);
+}
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-1.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
new file mode 100644
index 0000000..20e3675
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
@@ -0,0 +1,33 @@
+! { dg-do run }
+
+program task_detach_1
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+
+  !$omp parallel
+    !$omp single
+
+      !$omp task detach(detach_event1)
+        x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2)
+        y = y + 1
+	call omp_fulfill_event (detach_event1)
+      !$omp end task
+
+      !$omp task
+        z = z + 1
+	call omp_fulfill_event (detach_event2)
+      !$omp end task
+    !$omp end single
+
+    !$omp taskwait
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-2.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
new file mode 100644
index 0000000..bd0f016
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
@@ -0,0 +1,33 @@
+! { dg-do run }
+
+program task_detach_2
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+
+  !$omp parallel num_threads(1)
+    !$omp single
+
+      !$omp task detach(detach_event1)
+        x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2)
+        y = y + 1
+	call omp_fulfill_event (detach_event1)
+      !$omp end task
+
+      !$omp task
+        z = z + 1
+	call omp_fulfill_event (detach_event2)
+      !$omp end task
+    !$omp end single
+
+    !$omp taskwait
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-3.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
new file mode 100644
index 0000000..8a2ae48
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
@@ -0,0 +1,33 @@
+! { dg-do run }
+
+program task_detach_3
+
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event
+  integer :: x = 0, y = 0, z = 0
+  integer :: dep
+
+  !$omp parallel
+    !$omp single
+      !$omp task depend(out:dep) detach(detach_event)
+        x = x + 1
+      !$omp end task
+
+      !$omp task
+        y = y + 1
+	call omp_fulfill_event(detach_event)
+      !$omp end task
+
+      !$omp task depend(in:dep)
+        z = z + 1
+      !$omp end task
+    !$omp end single
+
+    !$omp taskwait
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-4.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-4.f90
new file mode 100644
index 0000000..a4ca3e0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-4.f90
@@ -0,0 +1,21 @@
+! { dg-do run }
+
+program task_detach_4
+
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event
+  integer :: x = 0
+
+  !$omp parallel
+    !$omp single
+      !$omp task detach(detach_event)
+        x = x + 1
+	call omp_fulfill_event(detach_event)
+      !$omp end task
+    !$omp end single
+    !$omp taskwait
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+end program
Kwok Cheung Yeung Dec. 9, 2020, 5:37 p.m. UTC | #2
Hello

This is a further update of the patch for task detach support.

>> - The memory for the event is not mapped on the target. This means that if 
>> omp_fulfill_event is called from an 'omp target' section with a target that 
>> does not share memory with the host, the event will not be fulfilled (and a 
>> segfault will probably occur).
> 
> I was thinking of something along the lines of:
> 
> #pragma omp task detach (event)
> {
> }
> 
> #pragma omp target
> {
>    omp_fulfill_event (event);
> }
> 
> Would something like this be expected to work? I cannot find many examples of 
> the detach clause online, and none of them use any offloading constructs.

I have asked on the omp-lang mailing list - this is not expected to work.

>> - The tasks awaiting event fulfillment currently wait until there are no other 
>> runnable tasks left. A better approach would be to poll (without blocking) the 
>> waiting tasks whenever any task completes, immediately removing any 
>> now-complete tasks and requeuing any dependent tasks.
> 
> This has now been implemented. On every iteration of the main loop in 
> gomp_barrier_handle_tasks, it first checks to see if any tasks in the detach 
> queue have a fulfilled completion event, and if so it will remove the task and 
> requeue any dependent tasks.
> 

I have found another problem with the original blocking approach when the tasks 
are on offload devices. On Nvidia and GCN, a bar.sync/s_barrier instruction is 
issued when gomp_team_barrier_wake is called to synchronise the threads. 
However, if some of the barrier threads are stuck waiting for semaphores 
associated with completion events, and the fulfillment of those events are in 
other tasks waiting to run, then the result is a deadlock as the threads cannot 
synchronise without all the semaphores being released.

I have removed the blocking path on gomp_barrier_handle_tasks altogether, and 
omp_fulfill_event now directly wakes the barrier threads to process any tasks 
that are now complete.

I have also ensured that the event handle specified on the detach clause is 
firstprivate by default on enclosing scopes.

I believe this patch is largely complete now. I have done a bootstrap on x86_64 
and run the testsuites with no regressions. I have also run the libgomp 
testsuite with offloading to Nvidia and AMD GCN devices, also with no 
regressions. Is this patch okay for trunk (or would it be more appropriate to 
wait until GCC 11 is branched off)?

Thanks

Kwok
commit 3d82db0fc3623e9dc241bed4c4cfd266574d45e7
Author: Kwok Cheung Yeung <kcy@codesourcery.com>
Date:   Wed Dec 9 09:33:46 2020 -0800

    openmp: Add support for the OpenMP 5.0 task detach clause
    
    2020-12-09  Kwok Cheung Yeung  <kcy@codesourcery.com>
    
    	gcc/
    	* builtin-types.def (BT_PTR_SIZED_INT): New primitive type.
    	(BT_FN_PSINT_VOID): New function type.
    	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
    	to...
    	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT):
    	...this.  Add extra argument.
    	* gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DETACH.
    	(gimplify_adjust_omp_clauses): Likewise.
    	* omp-builtins.def (BUILT_IN_GOMP_TASK): Change function type to
    	BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT.
    	(BUILT_IN_GOMP_NEW_EVENT): New.
    	* omp-expand.c (expand_task_call): Add detach argument when generating
    	call to	GOMP_task.
    	* omp-low.c (scan_sharing_clauses): Setup data environment for detach
    	clause.
    	(lower_detach_clause): New.
    	(lower_omp_taskreg): Call lower_detach_clause for detach clause.  Add
    	Gimple statements generated for detach clause.
    	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DETACH.
    	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_DETACH.
    	* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_DETACH.
    	(omp_clause_code_name): Add entry for OMP_CLAUSE_DETACH.
    	(walk_tree_1): Handle OMP_CLAUSE_DETACH.
    	* tree.h (OMP_CLAUSE_DETACH_EXPR): New.
    
    	gcc/c-family/
    	* c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_DETACH.
    	Redefine PRAGMA_OACC_CLAUSE_DETACH.
    
    	gcc/c/
    	* c-parser.c (c_parser_omp_clause_detach): New.
    	(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH clause.
    	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
    	* c-typeck.c (c_finish_omp_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH
    	clause.
    
    	gcc/cp/
    	* parser.c (cp_parser_omp_clause_detach): New.
    	(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH.
    	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
    	* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
    
    	gcc/fortran/
    	* dump-parse-tree.c (show_omp_clauses): Handle detach clause.
    	* frontend-passes.c (gfc_code_walker): Walk detach expression.
    	* gfortran.h (struct gfc_omp_clauses): Add detach field.
    	(gfc_c_intptr_kind): New.
    	* openmp.c (gfc_free_omp_clauses): Free detach clause.
    	(gfc_match_omp_detach): New.
    	(enum omp_mask1): Add OMP_CLAUSE_DETACH.
    	(enum omp_mask2): Remove OMP_CLAUSE_DETACH.
    	(gfc_match_omp_clauses): Handle OMP_CLAUSE_DETACH for OpenMP.
    	(OMP_TASK_CLAUSES): Add OMP_CLAUSE_DETACH.
    	* trans-openmp.c (gfc_trans_omp_clauses): Handle detach clause.
    	* trans-types.c (gfc_c_intptr_kind): New.
    	(gfc_init_kinds): Initialize gfc_c_intptr_kind.
    	* types.def (BT_PTR_SIZED_INT): New type.
    	(BT_FN_PSINT_VOID): New function type.
    	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
    	to...
    	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT):
    	...this.  Add extra argument.
    
    	libgomp/
    	* fortran.c (omp_fulfill_event_): New.
    	* libgomp.h (struct gomp_allow_completion_event): New.
    	(struct gomp_task): Add detach_event field.
    	(struct gomp_team): Add task_detach_queue and task_detach_count
    	fields.
    	(gomp_finish_task): Delete detach_event.
    	* libgomp.map (OMP_5.0.1): Add omp_fulfill_event and omp_fulfill_event_.
    	(GOMP_5.0): Add GOMP_new_event.
    	* libgomp_g.h (GOMP_new_event): New.
    	(GOMP_task): Add uintptr_t argument.
    	* omp.h.in (enum omp_event_handle_t): New.
    	(omp_fulfill_event): New.
    	* omp_lib.f90.in (omp_event_handle_kind): New.
    	(omp_fulfill_event): New.
    	* omp_lib.h.in (omp_event_handle_kind): New.
    	(omp_event_handle_kind): New.
    	(omp_fulfill_event): Declare.
    	* priority_queue.c (priority_tree_find): New.
    	(priority_list_find): New.
    	(priority_queue_find): New.
    	* priority_queue.h (priority_queue_predicate): New.
    	(priority_queue_find): New.
    	* task.c (gomp_init_task): Initialize detach_event field.
    	(GOMP_new_event): New.
    	(GOMP_task): Add detach argument.  Initialize detach_event field.
    	Wait for detach event if task not deferred.
    	(task_fulfilled_p): New.
    	(gomp_barrier_handle_tasks): Queue tasks with unfulfilled events.
    	Remove completed tasks and requeue dependent tasks.
    	(omp_fulfill_event): New.
    	* team.c (gomp_new_team): Initialize task_detach_queue and
    	task_detach_count fields.
    	(free_team): Free task_detach_queue field.
    	* testsuite/libgomp.c-c++-common/task-detach-1.c: New testcase.
    	* testsuite/libgomp.c-c++-common/task-detach-2.c: New testcase.
    	* testsuite/libgomp.c-c++-common/task-detach-3.c: New testcase.
    	* testsuite/libgomp.c-c++-common/task-detach-4.c: New testcase.
    	* testsuite/libgomp.c-c++-common/task-detach-5.c: New testcase.
    	* testsuite/libgomp.c-c++-common/task-detach-6.c: New testcase.
    	* testsuite/libgomp.fortran/task-detach-1.f90: New testcase.
    	* testsuite/libgomp.fortran/task-detach-2.f90: New testcase.
    	* testsuite/libgomp.fortran/task-detach-3.f90: New testcase.
    	* testsuite/libgomp.fortran/task-detach-4.f90: New testcase.
    	* testsuite/libgomp.fortran/task-detach-5.f90: New testcase.
    	* testsuite/libgomp.fortran/task-detach-6.f90: New testcase.

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 4a82ee4..d28ec75 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -79,6 +79,7 @@ DEF_PRIMITIVE_TYPE (BT_UINT128, uint128_type_node
 DEF_PRIMITIVE_TYPE (BT_WORD, (*lang_hooks.types.type_for_mode) (word_mode, 1))
 DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lang_hooks.types.type_for_mode)
 				    (targetm.unwind_word_mode (), 1))
+DEF_PRIMITIVE_TYPE (BT_PTR_SIZED_INT, pointer_sized_int_node)
 DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
@@ -253,6 +254,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_LONG_LONGDOUBLE, BT_LONG, BT_LONGDOUBLE)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT, BT_LONGLONG, BT_FLOAT)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_DOUBLE, BT_LONGLONG, BT_DOUBLE)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_LONGDOUBLE, BT_LONGLONG, BT_LONGDOUBLE)
+DEF_FUNCTION_TYPE_1 (BT_FN_PSINT_VOID, BT_PTR_SIZED_INT, BT_VOID)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTR, BT_VOID, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_SIZE_CONST_STRING, BT_SIZE, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_CONST_STRING, BT_INT, BT_CONST_STRING)
@@ -756,10 +758,6 @@ DEF_FUNCTION_TYPE_8 (BT_FN_BOOL_UINT_ULLPTR_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_LONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG, BT_PTR, BT_PTR)
 
-DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
-		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
-		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
-		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
 		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
@@ -767,6 +765,10 @@ DEF_FUNCTION_TYPE_9 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR_PTR_PTR
 		     BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
 		     BT_PTR_LONG, BT_PTR_LONG, BT_PTR, BT_PTR)
 
+DEF_FUNCTION_TYPE_10 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT,
+		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
+		      BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
+		      BT_BOOL, BT_UINT, BT_PTR, BT_INT, BT_PTR_SIZED_INT)
 DEF_FUNCTION_TYPE_10 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		      BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		      BT_ULONGLONG, BT_LONG, BT_ULONGLONG, BT_PTR_ULONGLONG,
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 5a493fe..fb784e9 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -94,6 +94,7 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_DEFAULT,
   PRAGMA_OMP_CLAUSE_DEFAULTMAP,
   PRAGMA_OMP_CLAUSE_DEPEND,
+  PRAGMA_OMP_CLAUSE_DETACH,
   PRAGMA_OMP_CLAUSE_DEVICE,
   PRAGMA_OMP_CLAUSE_DEVICE_TYPE,
   PRAGMA_OMP_CLAUSE_DIST_SCHEDULE,
@@ -150,7 +151,6 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_COPYOUT,
   PRAGMA_OACC_CLAUSE_CREATE,
   PRAGMA_OACC_CLAUSE_DELETE,
-  PRAGMA_OACC_CLAUSE_DETACH,
   PRAGMA_OACC_CLAUSE_DEVICEPTR,
   PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
   PRAGMA_OACC_CLAUSE_FINALIZE,
@@ -173,6 +173,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_COPYIN = PRAGMA_OMP_CLAUSE_COPYIN,
   PRAGMA_OACC_CLAUSE_DEVICE = PRAGMA_OMP_CLAUSE_DEVICE,
   PRAGMA_OACC_CLAUSE_DEFAULT = PRAGMA_OMP_CLAUSE_DEFAULT,
+  PRAGMA_OACC_CLAUSE_DETACH = PRAGMA_OMP_CLAUSE_DETACH,
   PRAGMA_OACC_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
   PRAGMA_OACC_CLAUSE_IF = PRAGMA_OMP_CLAUSE_IF,
   PRAGMA_OACC_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7540a15..d3546d4 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15976,6 +15976,53 @@ c_parser_omp_clause_uniform (c_parser *parser, tree list)
   return list;
 }
 
+/* OpenMP 5.0:
+   detach ( event-handle ) */
+
+static tree
+c_parser_omp_clause_detach (c_parser *parser, tree list)
+{
+  matching_parens parens;
+  location_t clause_loc = c_parser_peek_token (parser)->location;
+
+  if (!parens.require_open (parser))
+    return list;
+
+  if (c_parser_next_token_is_not (parser, CPP_NAME)
+      || c_parser_peek_token (parser)->id_kind != C_ID_ID)
+    {
+      c_parser_error (parser, "expected identifier");
+      return list;
+    }
+
+  tree t = lookup_name (c_parser_peek_token (parser)->value);
+  if (t == NULL_TREE)
+    {
+      undeclared_variable (c_parser_peek_token (parser)->location,
+			   c_parser_peek_token (parser)->value);
+      return list;
+    }
+  c_parser_consume_token (parser);
+
+  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (t));
+  if (!INTEGRAL_TYPE_P (type)
+      || TREE_CODE (type) != ENUMERAL_TYPE
+      || TYPE_NAME (type) != get_identifier ("omp_event_handle_t"))
+    {
+      error_at (clause_loc, "%<detach%> clause event handle "
+			    "has type %qT rather than "
+			    "%<omp_event_handle_t%>",
+			    type);
+      return list;
+    }
+
+  tree u = build_omp_clause (clause_loc, OMP_CLAUSE_DETACH);
+  OMP_CLAUSE_DECL (u) = t;
+  OMP_CLAUSE_CHAIN (u) = list;
+  parens.skip_until_found_close (parser);
+  return u;
+}
+
 /* Parse all OpenACC clauses.  The set clauses allowed by the directive
    is a bitmask in MASK.  Return the list of clauses found.  */
 
@@ -16242,6 +16289,10 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_omp_clause_default (parser, clauses, false);
 	  c_name = "default";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DETACH:
+	  clauses = c_parser_omp_clause_detach (parser, clauses);
+	  c_name = "detach";
+	  break;
 	case PRAGMA_OMP_CLAUSE_FIRSTPRIVATE:
 	  clauses = c_parser_omp_clause_firstprivate (parser, clauses);
 	  c_name = "firstprivate";
@@ -19140,7 +19191,8 @@ c_parser_omp_single (location_t loc, c_parser *parser, bool *if_p)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEPEND)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRIORITY)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION))
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DETACH))
 
 static tree
 c_parser_omp_task (location_t loc, c_parser *parser, bool *if_p)
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 286f3d9..0c2ed59 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -14942,6 +14942,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  pc = &OMP_CLAUSE_CHAIN (c);
 	  continue;
 
+	case OMP_CLAUSE_DETACH:
+	  t = OMP_CLAUSE_DECL (c);
+	  pc = &OMP_CLAUSE_CHAIN (c);
+	  continue;
+
 	case OMP_CLAUSE_IF:
 	case OMP_CLAUSE_NUM_THREADS:
 	case OMP_CLAUSE_NUM_TEAMS:
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 8802124..5731a29 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -36957,6 +36957,66 @@ cp_parser_omp_clause_depend_sink (cp_parser *parser, location_t clause_loc,
 }
 
 /* OpenMP 5.0:
+   detach ( event-handle ) */
+
+static tree
+cp_parser_omp_clause_detach (cp_parser *parser, tree list)
+{
+  matching_parens parens;
+
+  if (!parens.require_open (parser))
+    return list;
+
+  if (cp_lexer_next_token_is_not (parser->lexer, CPP_NAME))
+    {
+      cp_parser_error (parser, "expected identifier");
+      return list;
+    }
+
+  location_t id_loc = cp_lexer_peek_token (parser->lexer)->location;
+  tree t, identifier = cp_parser_identifier (parser);
+
+  if (identifier == error_mark_node)
+    t = error_mark_node;
+  else
+    {
+      t = cp_parser_lookup_name_simple
+	    (parser, identifier,
+	     cp_lexer_peek_token (parser->lexer)->location);
+      if (t == error_mark_node)
+	cp_parser_name_lookup_error (parser, identifier, t, NLE_NULL,
+				     id_loc);
+      else
+	{
+	  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (t));
+	  if (!INTEGRAL_TYPE_P (type)
+	      || TREE_CODE (type) != ENUMERAL_TYPE
+	      || DECL_NAME (TYPE_NAME (type))
+		   != get_identifier ("omp_event_handle_t"))
+	    {
+	      error_at (id_loc, "%<detach%> clause event handle "
+			    "has type %qT rather than "
+			    "%<omp_event_handle_t%>",
+			    type);
+	      return list;
+	    }
+	}
+    }
+
+  if (t == error_mark_node
+      || !parens.require_close (parser))
+    cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+					   /*or_comma=*/false,
+					   /*consume_paren=*/true);
+
+  tree u = build_omp_clause (id_loc, OMP_CLAUSE_DETACH);
+  OMP_CLAUSE_DECL (u) = t;
+  OMP_CLAUSE_CHAIN (u) = list;
+
+  return u;
+}
+
+/* OpenMP 5.0:
    iterators ( iterators-definition )
 
    iterators-definition:
@@ -38012,6 +38072,10 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 						 token->location);
 	  c_name = "depend";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DETACH:
+	  clauses = cp_parser_omp_clause_detach (parser, clauses);
+	  c_name = "detach";
+	  break;
 	case PRAGMA_OMP_CLAUSE_MAP:
 	  clauses = cp_parser_omp_clause_map (parser, clauses);
 	  c_name = "map";
@@ -40546,7 +40610,8 @@ cp_parser_omp_single (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEPEND)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRIORITY)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION))
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DETACH))
 
 static tree
 cp_parser_omp_task (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 5ff70ff..38eecf5 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7394,6 +7394,9 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 		}
 	    }
 	  break;
+	case OMP_CLAUSE_DETACH:
+	  t = OMP_CLAUSE_DECL (c);
+	  break;
 
 	case OMP_CLAUSE_MAP:
 	case OMP_CLAUSE_TO:
diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index cab0fb2..baf1e3c 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1700,6 +1700,12 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
       show_expr (omp_clauses->priority);
       fputc (')', dumpfile);
     }
+  if (omp_clauses->detach)
+    {
+      fputs (" DETACH(", dumpfile);
+      show_expr (omp_clauses->detach);
+      fputc (')', dumpfile);
+    }
   for (i = 0; i < OMP_IF_LAST; i++)
     if (omp_clauses->if_exprs[i])
       {
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index 83f6fd8..699b354 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -5597,6 +5597,7 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 		  WALK_SUBEXPR (co->ext.omp_clauses->hint);
 		  WALK_SUBEXPR (co->ext.omp_clauses->num_tasks);
 		  WALK_SUBEXPR (co->ext.omp_clauses->priority);
+		  WALK_SUBEXPR (co->ext.omp_clauses->detach);
 		  for (idx = 0; idx < OMP_IF_LAST; idx++)
 		    WALK_SUBEXPR (co->ext.omp_clauses->if_exprs[idx]);
 		  for (idx = 0;
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 6467985..fec96af 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1414,6 +1414,7 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *hint;
   struct gfc_expr *num_tasks;
   struct gfc_expr *priority;
+  struct gfc_expr *detach;
   struct gfc_expr *if_exprs[OMP_IF_LAST];
   enum gfc_omp_sched_kind dist_sched_kind;
   struct gfc_expr *dist_chunk_size;
@@ -3102,6 +3103,7 @@ extern int gfc_default_character_kind;
 extern int gfc_default_logical_kind;
 extern int gfc_default_complex_kind;
 extern int gfc_c_int_kind;
+extern int gfc_c_intptr_kind;
 extern int gfc_atomic_int_kind;
 extern int gfc_atomic_logical_kind;
 extern int gfc_intio_kind;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 68d0b65..4c58447 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -91,6 +91,7 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->hint);
   gfc_free_expr (c->num_tasks);
   gfc_free_expr (c->priority);
+  gfc_free_expr (c->detach);
   for (i = 0; i < OMP_IF_LAST; i++)
     gfc_free_expr (c->if_exprs[i]);
   gfc_free_expr (c->async_expr);
@@ -448,6 +449,39 @@ cleanup:
   return MATCH_ERROR;
 }
 
+/* Match detach(event-handle).  */
+
+static match
+gfc_match_omp_detach (gfc_expr **expr)
+{
+  locus old_loc = gfc_current_locus;
+
+  if (gfc_match ("detach ( ") != MATCH_YES)
+    goto syntax_error;
+
+  if (gfc_match_variable (expr, 0) != MATCH_YES)
+    goto syntax_error;
+
+  if ((*expr)->ts.type != BT_INTEGER || (*expr)->ts.kind != gfc_c_intptr_kind)
+    {
+      gfc_error ("%qs at %L should be of type "
+		 "integer(kind=omp_event_handle_kind)",
+		 (*expr)->symtree->n.sym->name, &(*expr)->where);
+      return MATCH_ERROR;
+    }
+
+  if (gfc_match_char (')') != MATCH_YES)
+    goto syntax_error;
+
+  return MATCH_YES;
+
+syntax_error:
+   gfc_error ("Syntax error in OpenMP detach clause at %C");
+   gfc_current_locus = old_loc;
+   return MATCH_ERROR;
+
+}
+
 /* Match depend(sink : ...) construct a namelist from it.  */
 
 static match
@@ -807,6 +841,7 @@ enum omp_mask1
   OMP_CLAUSE_ATOMIC,  /* OpenMP 5.0.  */
   OMP_CLAUSE_CAPTURE,  /* OpenMP 5.0.  */
   OMP_CLAUSE_MEMORDER,  /* OpenMP 5.0.  */
+  OMP_CLAUSE_DETACH,  /* OpenMP 5.0.  */
   OMP_CLAUSE_NOWAIT,
   /* This must come last.  */
   OMP_MASK1_LAST
@@ -840,7 +875,6 @@ enum omp_mask2
   OMP_CLAUSE_IF_PRESENT,
   OMP_CLAUSE_FINALIZE,
   OMP_CLAUSE_ATTACH,
-  OMP_CLAUSE_DETACH,
   /* This must come last.  */
   OMP_MASK2_LAST
 };
@@ -1378,6 +1412,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		gfc_current_locus = old_loc;
 	    }
 	  if ((mask & OMP_CLAUSE_DETACH)
+	      && !openacc
+	      && gfc_match_omp_detach (&c->detach) == MATCH_YES)
+	    continue;
+	  if ((mask & OMP_CLAUSE_DETACH)
+	      && openacc
 	      && gfc_match ("detach ( ") == MATCH_YES
 	      && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
 					   OMP_MAP_DETACH, false,
@@ -2763,7 +2802,8 @@ cleanup:
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_SHARED | OMP_CLAUSE_IF | OMP_CLAUSE_DEFAULT		\
    | OMP_CLAUSE_UNTIED | OMP_CLAUSE_FINAL | OMP_CLAUSE_MERGEABLE	\
-   | OMP_CLAUSE_DEPEND | OMP_CLAUSE_PRIORITY | OMP_CLAUSE_IN_REDUCTION)
+   | OMP_CLAUSE_DEPEND | OMP_CLAUSE_PRIORITY | OMP_CLAUSE_IN_REDUCTION	\
+   | OMP_CLAUSE_DETACH)
 #define OMP_TASKLOOP_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_LASTPRIVATE | OMP_CLAUSE_SHARED | OMP_CLAUSE_IF		\
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 6b4ad6a..314e00d 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3667,6 +3667,21 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }
 
+  if (clauses->detach)
+    {
+      tree detach;
+
+      gfc_init_se (&se, NULL);
+      gfc_conv_expr (&se, clauses->detach);
+      gfc_add_block_to_block (block, &se.pre);
+      detach = se.expr;
+      gfc_add_block_to_block (block, &se.post);
+
+      c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_DETACH);
+      OMP_CLAUSE_DETACH_EXPR (c) = detach;
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+    }
+
   if (clauses->hint)
     {
       tree hint;
diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index b7129dc..bd9ca14 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -114,6 +114,7 @@ int gfc_default_character_kind;
 int gfc_default_logical_kind;
 int gfc_default_complex_kind;
 int gfc_c_int_kind;
+int gfc_c_intptr_kind;
 int gfc_atomic_int_kind;
 int gfc_atomic_logical_kind;
 
@@ -691,6 +692,8 @@ gfc_init_kinds (void)
   /* Choose atomic kinds to match C's int.  */
   gfc_atomic_int_kind = gfc_c_int_kind;
   gfc_atomic_logical_kind = gfc_c_int_kind;
+
+  gfc_c_intptr_kind = POINTER_SIZE / 8;
 }
 
 
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 5736bba..7cac4dc 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -53,6 +53,7 @@ DEF_PRIMITIVE_TYPE (BT_LONG, long_integer_type_node)
 DEF_PRIMITIVE_TYPE (BT_ULONGLONG, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (BT_WORD, (*lang_hooks.types.type_for_mode) (word_mode, 1))
 DEF_PRIMITIVE_TYPE (BT_SIZE, size_type_node)
+DEF_PRIMITIVE_TYPE (BT_PTR_SIZED_INT, pointer_sized_int_node)
 
 DEF_PRIMITIVE_TYPE (BT_I1, builtin_type_for_size (BITS_PER_UNIT*1, 1))
 DEF_PRIMITIVE_TYPE (BT_I2, builtin_type_for_size (BITS_PER_UNIT*2, 1))
@@ -86,6 +87,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_PTR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_PSINT_VOID, BT_PTR_SIZED_INT, BT_VOID)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_INT, BT_VOID, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_BOOL, BT_VOID, BT_BOOL)
 DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT)
@@ -235,10 +237,6 @@ DEF_FUNCTION_TYPE_8 (BT_FN_BOOL_UINT_ULLPTR_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_LONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG, BT_PTR, BT_PTR)
 
-DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
-		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
-		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
-		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
 		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
@@ -246,6 +244,10 @@ DEF_FUNCTION_TYPE_9 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR_PTR_PTR
 		     BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
 		     BT_PTR_LONG, BT_PTR_LONG, BT_PTR, BT_PTR)
 
+DEF_FUNCTION_TYPE_10 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT,
+		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
+		      BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
+		      BT_BOOL, BT_UINT, BT_PTR, BT_INT, BT_PTR_SIZED_INT)
 DEF_FUNCTION_TYPE_10 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		      BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		      BT_ULONGLONG, BT_LONG, BT_ULONGLONG, BT_PTR_ULONGLONG,
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 53ec9ec..20d39b5 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -9733,6 +9733,19 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	    }
 	  break;
 
+	case OMP_CLAUSE_DETACH:
+	  decl = OMP_CLAUSE_DECL (c);
+	  if (outer_ctx)
+	    {
+	      splay_tree_node on
+		= splay_tree_lookup (outer_ctx->variables,
+				     (splay_tree_key)decl);
+	      if (on == NULL || (on->value & GOVD_DATA_SHARE_CLASS) == 0)
+		omp_firstprivatize_variable (outer_ctx, decl);
+	      omp_notice_variable (outer_ctx, decl, true);
+	    }
+	  break;
+
 	case OMP_CLAUSE_IF:
 	  if (OMP_CLAUSE_IF_MODIFIER (c) != ERROR_MARK
 	      && OMP_CLAUSE_IF_MODIFIER (c) != code)
@@ -10870,6 +10883,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	case OMP_CLAUSE_DEFAULTMAP:
 	case OMP_CLAUSE_ORDER:
 	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index f9b78ed..b3bf17f 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -381,7 +381,7 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL_REDUCTIONS,
 		  "GOMP_parallel_reductions",
 		  BT_FN_UINT_OMPFN_PTR_UINT_UINT, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASK, "GOMP_task",
-		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
+		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT,
 		  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASKLOOP, "GOMP_taskloop",
 		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
@@ -451,3 +451,5 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ALLOC,
 		  ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_FREE,
 		  "GOMP_free", BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_NEW_EVENT, "GOMP_new_event",
+		  BT_FN_PSINT_VOID, ATTR_NOTHROW_LEAF_LIST)
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index c0e94e5..533f47d 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -762,6 +762,7 @@ expand_task_call (struct omp_region *region, basic_block bb,
   tree depend = omp_find_clause (clauses, OMP_CLAUSE_DEPEND);
   tree finalc = omp_find_clause (clauses, OMP_CLAUSE_FINAL);
   tree priority = omp_find_clause (clauses, OMP_CLAUSE_PRIORITY);
+  tree detach = omp_find_clause (clauses, OMP_CLAUSE_DETACH);
 
   unsigned int iflags
     = (untied ? GOMP_TASK_FLAG_UNTIED : 0)
@@ -853,6 +854,11 @@ expand_task_call (struct omp_region *region, basic_block bb,
     priority = integer_zero_node;
 
   gsi = gsi_last_nondebug_bb (bb);
+
+  detach = detach
+      ? fold_convert (pointer_sized_int_node, OMP_CLAUSE_DETACH_EXPR (detach))
+      : null_pointer_node;
+
   tree t = gimple_omp_task_data_arg (entry_stmt);
   if (t == NULL)
     t2 = null_pointer_node;
@@ -875,10 +881,10 @@ expand_task_call (struct omp_region *region, basic_block bb,
 			 num_tasks, priority, startvar, endvar, step);
   else
     t = build_call_expr (builtin_decl_explicit (BUILT_IN_GOMP_TASK),
-			 9, t1, t2, t3,
+			 10, t1, t2, t3,
 			 gimple_omp_task_arg_size (entry_stmt),
 			 gimple_omp_task_arg_align (entry_stmt), cond, flags,
-			 depend, priority);
+			 depend, priority, detach);
 
   force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 			    false, GSI_CONTINUE_LINKING);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 09a8cbd..3dd15d1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1412,6 +1412,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_NUM_GANGS:
 	case OMP_CLAUSE_NUM_WORKERS:
 	case OMP_CLAUSE_VECTOR_LENGTH:
+	case OMP_CLAUSE_DETACH:
 	  if (ctx->outer)
 	    scan_omp_op (&OMP_CLAUSE_OPERAND (c, 0), ctx->outer);
 	  break;
@@ -1779,6 +1780,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_SIMDLEN:
 	case OMP_CLAUSE_ALIGNED:
 	case OMP_CLAUSE_DEPEND:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_ALLOCATE:
 	case OMP_CLAUSE__LOOPTEMP_:
 	case OMP_CLAUSE__REDUCTEMP_:
@@ -11453,6 +11455,26 @@ create_task_copyfn (gomp_task *task_stmt, omp_context *ctx)
 }
 
 static void
+lower_detach_clause (tree *pclauses, gimple_seq *iseq, omp_context *ctx)
+{
+  tree clause = omp_find_clause (*pclauses, OMP_CLAUSE_DETACH);
+  gcc_assert (clause);
+
+  tree event_decl = OMP_CLAUSE_DECL (clause);
+  tree event_ref = lookup_decl_in_outer_ctx (event_decl, ctx);
+  tree fn_decl = builtin_decl_explicit (BUILT_IN_GOMP_NEW_EVENT);
+  tree handle = create_tmp_var (pointer_sized_int_node);
+
+  gimple *call_stmt = gimple_build_call (fn_decl, 0);
+  gimple_call_set_lhs (call_stmt, handle);
+  gimple_seq_add_stmt (iseq, call_stmt);
+
+  gimplify_assign (event_ref,
+		   fold_convert (TREE_TYPE (event_decl), handle),
+		   iseq);
+}
+
+static void
 lower_depend_clauses (tree *pclauses, gimple_seq *iseq, gimple_seq *oseq)
 {
   tree c, clauses;
@@ -11601,6 +11623,15 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       if (ws_num == 1)
 	gimple_omp_parallel_set_combined_p (stmt, true);
     }
+
+  gimple_seq detach_ilist = NULL;
+  if (gimple_code (stmt) == GIMPLE_OMP_TASK
+      && omp_find_clause (clauses, OMP_CLAUSE_DETACH))
+    {
+      lower_detach_clause (gimple_omp_task_clauses_ptr (stmt), &detach_ilist,
+			   ctx);
+    }
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (gimple_code (stmt) == GIMPLE_OMP_TASK
@@ -11678,6 +11709,10 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   gimple_seq olist = NULL;
   gimple_seq ilist = NULL;
+
+  if (detach_ilist)
+    gimple_seq_add_seq (&ilist, detach_ilist);
+
   lower_send_clauses (clauses, &ilist, &olist, ctx);
   lower_send_shared_vars (&ilist, &olist, ctx);
 
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 313a6af..5b028da 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -301,6 +301,9 @@ enum omp_clause_code {
   /* OpenMP clause: to (variable-list).  */
   OMP_CLAUSE_TO,
 
+  /* OpenMP clause: detach (event-handle).  */
+  OMP_CLAUSE_DETACH,
+
   /* OpenACC clauses: {copy, copyin, copyout, create, delete, deviceptr,
      device, host (self), present, present_or_copy (pcopy), present_or_copyin
      (pcopyin), present_or_copyout (pcopyout), present_or_create (pcreate)}
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 5a93c4d..073d14b 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1265,6 +1265,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
     case OMP_CLAUSE_FINALIZE:
       pp_string (pp, "finalize");
       break;
+    case OMP_CLAUSE_DETACH:
+      pp_string (pp, "detach(");
+      dump_generic_node (pp, OMP_CLAUSE_DETACH_EXPR (clause), spc, flags,
+			 false);
+      pp_right_paren (pp);
+      break;
 
     default:
       gcc_unreachable ();
diff --git a/gcc/tree.c b/gcc/tree.c
index d6ba553..921c928 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -299,6 +299,7 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_LINK  */
   2, /* OMP_CLAUSE_FROM  */
   2, /* OMP_CLAUSE_TO  */
+  1, /* OMP_CLAUSE_DETACH  */
   2, /* OMP_CLAUSE_MAP  */
   1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
   1, /* OMP_CLAUSE_USE_DEVICE_ADDR  */
@@ -384,6 +385,7 @@ const char * const omp_clause_code_name[] =
   "link",
   "from",
   "to",
+  "detach",
   "map",
   "use_device_ptr",
   "use_device_addr",
@@ -12234,6 +12236,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE_HINT:
 	case OMP_CLAUSE_TO_DECLARE:
 	case OMP_CLAUSE_LINK:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
diff --git a/gcc/tree.h b/gcc/tree.h
index 078919b..c76ee74 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1588,6 +1588,9 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_PRIORITY_EXPR(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PRIORITY),0)
 
+#define OMP_CLAUSE_DETACH_EXPR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DETACH),0)
+
 /* OpenACC clause expressions  */
 #define OMP_CLAUSE_EXPR(NODE, CLAUSE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, CLAUSE), 0)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index cd719f9..976b248 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -605,6 +605,12 @@ omp_get_max_task_priority_ (void)
 }
 
 void
+omp_fulfill_event_ (intptr_t event)
+{
+  omp_fulfill_event ((omp_event_handle_t) event);
+}
+
+void
 omp_set_affinity_format_ (const char *format, size_t format_len)
 {
   gomp_set_affinity_format (format, format_len);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 070d29c..88fc217 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -516,6 +516,12 @@ struct gomp_taskwait
   gomp_sem_t taskwait_sem;
 };
 
+struct gomp_allow_completion_event
+{
+  bool fulfilled;
+  gomp_sem_t completion_sem;
+};
+
 /* This structure describes a "task" to be run by a thread.  */
 
 struct gomp_task
@@ -545,6 +551,8 @@ struct gomp_task
      entries and the gomp_task in which they reside.  */
   struct priority_node pnode[3];
 
+  struct gomp_allow_completion_event *detach_event;
+
   struct gomp_task_icv icv;
   void (*fn) (void *);
   void *fn_data;
@@ -685,6 +693,10 @@ struct gomp_team
   int work_share_cancelled;
   int team_cancelled;
 
+  /* Tasks waiting for their completion event to be fulfilled.  */
+  struct priority_queue task_detach_queue;
+  unsigned int task_detach_count;
+
   /* This array contains structures for implicit tasks.  */
   struct gomp_task implicit_task[];
 };
@@ -931,6 +943,8 @@ gomp_finish_task (struct gomp_task *task)
 {
   if (__builtin_expect (task->depend_hash != NULL, 0))
     free (task->depend_hash);
+  if (task->detach_event)
+    free (task->detach_event);
 }
 
 /* team.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2c95f78..434dfc3 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -195,6 +195,8 @@ OMP_5.0.1 {
 	omp_free;
 	omp_get_supported_active_levels;
 	omp_get_supported_active_levels_;
+	omp_fulfill_event;
+	omp_fulfill_event_;
 } OMP_5.0;
 
 GOMP_1.0 {
@@ -347,6 +349,7 @@ GOMP_5.0 {
 	GOMP_loop_ull_nonmonotonic_runtime_start;
 	GOMP_loop_ull_ordered_start;
 	GOMP_loop_ull_start;
+	GOMP_new_event;
 	GOMP_parallel_loop_maybe_nonmonotonic_runtime;
 	GOMP_parallel_loop_nonmonotonic_runtime;
 	GOMP_parallel_reductions;
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index b20e186..ca27c53 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -293,8 +293,9 @@ extern bool GOMP_cancellation_point (int);
 
 /* task.c */
 
+extern uintptr_t GOMP_new_event (void);
 extern void GOMP_task (void (*) (void *), void *, void (*) (void *, void *),
-		       long, long, bool, unsigned, void **, int);
+		       long, long, bool, unsigned, void **, int, uintptr_t);
 extern void GOMP_taskloop (void (*) (void *), void *,
 			   void (*) (void *, void *), long, long, unsigned,
 			   unsigned long, int, long, long, long);
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 4424a16..62b6c0f 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -171,6 +171,11 @@ typedef struct omp_alloctrait_t
   omp_uintptr_t value;
 } omp_alloctrait_t;
 
+typedef enum omp_event_handle_t __GOMP_UINTPTR_T_ENUM
+{
+  __omp_event_handle_t_max__ = __UINTPTR_MAX__
+} omp_event_handle_t;
+
 #ifdef __cplusplus
 extern "C" {
 # define __GOMP_NOTHROW throw ()
@@ -245,6 +250,8 @@ extern int omp_is_initial_device (void) __GOMP_NOTHROW;
 extern int omp_get_initial_device (void) __GOMP_NOTHROW;
 extern int omp_get_max_task_priority (void) __GOMP_NOTHROW;
 
+extern void omp_fulfill_event (omp_event_handle_t) __GOMP_NOTHROW;
+
 extern void *omp_target_alloc (__SIZE_TYPE__, int) __GOMP_NOTHROW;
 extern void omp_target_free (void *, int) __GOMP_NOTHROW;
 extern int omp_target_is_present (const void *, int) __GOMP_NOTHROW;
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 3b7f0cb..7b70d8b 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -39,6 +39,7 @@
         integer, parameter :: omp_alloctrait_val_kind = c_intptr_t
         integer, parameter :: omp_memspace_handle_kind = c_intptr_t
         integer, parameter :: omp_depend_kind = @OMP_DEPEND_KIND@
+        integer, parameter :: omp_event_handle_kind = c_intptr_t
         integer (omp_sched_kind), parameter :: omp_sched_static = 1
         integer (omp_sched_kind), parameter :: omp_sched_dynamic = 2
         integer (omp_sched_kind), parameter :: omp_sched_guided = 3
@@ -556,6 +557,13 @@
         end interface
 
         interface
+          subroutine omp_fulfill_event (event)
+            use omp_lib_kinds
+            integer (kind=omp_event_handle_kind), value, intent(in) :: event
+          end subroutine omp_fulfill_event
+        end interface
+
+        interface
           subroutine omp_set_affinity_format (format)
             character(len=*), intent(in) :: format
           end subroutine omp_set_affinity_format
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index eb1dcc4..5b4053f 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -82,10 +82,12 @@
 
       integer omp_allocator_handle_kind, omp_alloctrait_key_kind
       integer omp_alloctrait_val_kind, omp_memspace_handle_kind
+      integer omp_event_handle_kind
       parameter (omp_allocator_handle_kind = @INTPTR_T_KIND@)
       parameter (omp_alloctrait_key_kind = 4)
       parameter (omp_alloctrait_val_kind = @INTPTR_T_KIND@)
       parameter (omp_memspace_handle_kind = @INTPTR_T_KIND@)
+      parameter (omp_event_handle_kind = @INTPTR_T_KIND@)
       integer (omp_alloctrait_key_kind) omp_atk_sync_hint
       integer (omp_alloctrait_key_kind) omp_atk_alignment
       integer (omp_alloctrait_key_kind) omp_atk_access
@@ -245,6 +247,8 @@
       external omp_get_max_task_priority
       integer(4) omp_get_max_task_priority
 
+      external omp_fulfill_event
+
       external omp_set_affinity_format, omp_get_affinity_format
       external omp_display_affinity, omp_capture_affinity
       integer(4) omp_get_affinity_format
diff --git a/libgomp/priority_queue.c b/libgomp/priority_queue.c
index 9b8d2ff..0c6b556 100644
--- a/libgomp/priority_queue.c
+++ b/libgomp/priority_queue.c
@@ -168,6 +168,56 @@ priority_queue_verify (enum priority_queue_type type,
 }
 #endif /* _LIBGOMP_CHECKING_ */
 
+static struct gomp_task *
+priority_tree_find (enum priority_queue_type type,
+		    prio_splay_tree_node node,
+		    priority_queue_predicate pred)
+{
+ again:
+  if (!node)
+    return NULL;
+  struct gomp_task *task = priority_tree_find (type, node->right, pred);
+  if (task)
+    return task;
+  task = priority_node_to_task (type, node->key.l.tasks);
+  if (pred (task))
+    return task;
+  node = node->left;
+  goto again;
+}
+
+static struct gomp_task *
+priority_list_find (enum priority_queue_type type,
+		     struct priority_list *list,
+		     priority_queue_predicate pred)
+{
+  struct priority_node *node = list->tasks;
+  if (!node)
+    return NULL;
+
+  do
+    {
+      struct gomp_task *task = priority_node_to_task (type, node);
+      if (pred (task))
+	return task;
+      node = node->next;
+    }
+  while (node != list->tasks);
+
+  return NULL;
+}
+
+struct gomp_task *
+priority_queue_find (enum priority_queue_type type,
+		     struct priority_queue *head,
+		     priority_queue_predicate pred)
+{
+  if (priority_queue_multi_p (head))
+    return priority_tree_find (type, head->t.root, pred);
+  else
+    return priority_list_find (type, &head->l, pred);
+}
+
 /* Remove NODE from priority queue HEAD, wherever it may be inside the
    tree.  HEAD contains tasks of type TYPE.  */
 
diff --git a/libgomp/priority_queue.h b/libgomp/priority_queue.h
index 0ad78f5..c6fd80d 100644
--- a/libgomp/priority_queue.h
+++ b/libgomp/priority_queue.h
@@ -113,6 +113,8 @@ enum priority_queue_type
   PQ_IGNORED = 999
 };
 
+typedef bool (*priority_queue_predicate)(struct gomp_task *);
+
 /* Priority queue implementation prototypes.  */
 
 extern bool priority_queue_task_in_queue_p (enum priority_queue_type,
@@ -122,6 +124,9 @@ extern void priority_queue_dump (enum priority_queue_type,
 				 struct priority_queue *);
 extern void priority_queue_verify (enum priority_queue_type,
 				   struct priority_queue *, bool);
+extern struct gomp_task *priority_queue_find (enum priority_queue_type,
+					      struct priority_queue *,
+					      priority_queue_predicate);
 extern void priority_tree_remove (enum priority_queue_type,
 				  struct priority_queue *,
 				  struct priority_node *);
diff --git a/libgomp/task.c b/libgomp/task.c
index a95067c..ae1fcf7 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -29,6 +29,7 @@
 #include "libgomp.h"
 #include <stdlib.h>
 #include <string.h>
+#include <stdio.h>
 #include "gomp-constants.h"
 
 typedef struct gomp_task_depend_entry *hash_entry_type;
@@ -86,6 +87,7 @@ gomp_init_task (struct gomp_task *task, struct gomp_task *parent_task,
   task->dependers = NULL;
   task->depend_hash = NULL;
   task->depend_count = 0;
+  task->detach_event = NULL;
 }
 
 /* Clean up a task, after completing it.  */
@@ -326,6 +328,28 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
     }
 }
 
+uintptr_t
+GOMP_new_event ()
+{
+  struct gomp_allow_completion_event *event;
+
+  event = (struct gomp_allow_completion_event *)
+	    gomp_malloc (sizeof (struct gomp_allow_completion_event));
+  event->fulfilled = false;
+  gomp_sem_init (&event->completion_sem, 0);
+
+  gomp_debug (0, "GOMP_new_event: %p\n", event);
+
+  return (uintptr_t) event;
+}
+
+static bool
+task_fulfilled_p (struct gomp_task *task)
+{
+  return __atomic_load_n (&task->detach_event->fulfilled,
+			  __ATOMIC_RELAXED);
+}
+
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
    false, then we must not delay in executing the task.  If UNTIED is true,
    then the task may be executed by any member of the team.
@@ -347,11 +371,14 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
 void
 GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	   long arg_size, long arg_align, bool if_clause, unsigned flags,
-	   void **depend, int priority)
+	   void **depend, int priority, uintptr_t detach)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr->ts.team;
 
+  struct gomp_allow_completion_event *detach_event =
+    detach ? (struct gomp_allow_completion_event *) detach : NULL;
+
 #ifdef HAVE_BROKEN_POSIX_SEMAPHORES
   /* If pthread_mutex_* is used for omp_*lock*, then each task must be
      tied to one thread all the time.  This means UNTIED tasks must be
@@ -404,6 +431,10 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
       task.final_task = (thr->task && thr->task->final_task)
 			|| (flags & GOMP_TASK_FLAG_FINAL);
       task.priority = priority;
+
+      if (detach)
+	task.detach_event = detach_event;
+
       if (thr->task)
 	{
 	  task.in_tied_task = thr->task->in_tied_task;
@@ -420,6 +451,10 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	}
       else
 	fn (data);
+
+      if (detach && !task_fulfilled_p (&task))
+	  gomp_sem_wait (&task.detach_event->completion_sem);
+
       /* Access to "children" is normally done inside a task_lock
 	 mutex region, but the only way this particular task.children
 	 can be set is if this thread's task work function (fn)
@@ -435,6 +470,7 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	  gomp_clear_parent (&task.children_queue);
 	  gomp_mutex_unlock (&team->task_lock);
 	}
+
       gomp_end_task ();
     }
   else
@@ -458,6 +494,8 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
       task->kind = GOMP_TASK_UNDEFERRED;
       task->in_tied_task = parent->in_tied_task;
       task->taskgroup = taskgroup;
+      if (detach)
+	task->detach_event = detach_event;
       thr->task = task;
       if (cpyfn)
 	{
@@ -1325,6 +1363,28 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
   while (1)
     {
       bool cancelled = false;
+
+      /* Look for a queued detached task with a fulfilled completion event
+	 that is ready to finish.  */
+      child_task = priority_queue_find (PQ_TEAM, &team->task_detach_queue,
+					task_fulfilled_p);
+      if (child_task)
+	{
+	  priority_queue_remove (PQ_TEAM, &team->task_detach_queue,
+				 child_task, MEMMODEL_RELAXED);
+	  --team->task_detach_count;
+	  gomp_debug (0, "thread %d: found task with fulfilled event %p\n",
+		      thr->ts.team_id, child_task->detach_event);
+
+	if (to_free)
+	  {
+	    gomp_finish_task (to_free);
+	    free (to_free);
+	    to_free = NULL;
+	  }
+	  goto finish_cancelled;
+	}
+
       if (!priority_queue_empty_p (&team->task_queue, MEMMODEL_RELAXED))
 	{
 	  bool ignored;
@@ -1392,29 +1452,44 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
       gomp_mutex_lock (&team->task_lock);
       if (child_task)
 	{
-	 finish_cancelled:;
-	  size_t new_tasks
-	    = gomp_task_run_post_handle_depend (child_task, team);
-	  gomp_task_run_post_remove_parent (child_task);
-	  gomp_clear_parent (&child_task->children_queue);
-	  gomp_task_run_post_remove_taskgroup (child_task);
-	  to_free = child_task;
-	  child_task = NULL;
-	  if (!cancelled)
-	    team->task_running_count--;
-	  if (new_tasks > 1)
+	  if (child_task->detach_event
+	      && !task_fulfilled_p (child_task))
 	    {
-	      do_wake = team->nthreads - team->task_running_count;
-	      if (do_wake > new_tasks)
-		do_wake = new_tasks;
+	      priority_queue_insert (PQ_TEAM, &team->task_detach_queue,
+				     child_task, child_task->priority,
+				     PRIORITY_INSERT_END,
+				     false, false);
+	      ++team->task_detach_count;
+	      gomp_debug (0, "thread %d: queueing task with event %p\n",
+			  thr->ts.team_id, child_task->detach_event);
+	      child_task = NULL;
 	    }
-	  if (--team->task_count == 0
-	      && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+	  else
 	    {
-	      gomp_team_barrier_done (&team->barrier, state);
-	      gomp_mutex_unlock (&team->task_lock);
-	      gomp_team_barrier_wake (&team->barrier, 0);
-	      gomp_mutex_lock (&team->task_lock);
+	     finish_cancelled:;
+	      size_t new_tasks
+		= gomp_task_run_post_handle_depend (child_task, team);
+	      gomp_task_run_post_remove_parent (child_task);
+	      gomp_clear_parent (&child_task->children_queue);
+	      gomp_task_run_post_remove_taskgroup (child_task);
+	      to_free = child_task;
+	      child_task = NULL;
+	      if (!cancelled)
+		team->task_running_count--;
+	      if (new_tasks > 1)
+		{
+		  do_wake = team->nthreads - team->task_running_count;
+		  if (do_wake > new_tasks)
+		    do_wake = new_tasks;
+		}
+	      if (--team->task_count == 0
+		  && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+		{
+		  gomp_team_barrier_done (&team->barrier, state);
+		  gomp_mutex_unlock (&team->task_lock);
+		  gomp_team_barrier_wake (&team->barrier, 0);
+		  gomp_mutex_lock (&team->task_lock);
+		}
 	    }
 	}
     }
@@ -2326,3 +2401,22 @@ omp_in_final (void)
 }
 
 ialias (omp_in_final)
+
+void omp_fulfill_event(omp_event_handle_t event)
+{
+  struct gomp_allow_completion_event *ev =
+		(struct gomp_allow_completion_event *) event;
+  struct gomp_thread *thr = gomp_thread ();
+  struct gomp_team *team = thr ? thr->ts.team : NULL;
+
+  if (__atomic_load_n (&ev->fulfilled, __ATOMIC_RELAXED))
+    gomp_fatal ("omp_fulfill_event: Event already fulfilled!\n");
+
+  gomp_debug(0, "omp_fulfill_event: %p\n", ev);
+  __atomic_store_n (&ev->fulfilled, true, __ATOMIC_RELAXED);
+  gomp_sem_post (&ev->completion_sem);
+  if (team)
+    gomp_team_barrier_wake (&team->barrier, 1);
+}
+
+ialias (omp_fulfill_event)
diff --git a/libgomp/team.c b/libgomp/team.c
index cbc3aec..ee488f2 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -206,6 +206,9 @@ gomp_new_team (unsigned nthreads)
   team->work_share_cancelled = 0;
   team->team_cancelled = 0;
 
+  priority_queue_init (&team->task_detach_queue);
+  team->task_detach_count = 0;
+
   return team;
 }
 
@@ -221,6 +224,7 @@ free_team (struct gomp_team *team)
   gomp_barrier_destroy (&team->barrier);
   gomp_mutex_destroy (&team->task_lock);
   priority_queue_free (&team->task_queue);
+  priority_queue_free (&team->task_detach_queue);
   team_free (team);
 }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
new file mode 100644
index 0000000..8583e37
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test chaining of detached tasks, with each task fulfilling the
+   completion event of the previous one.  */
+
+int main (void)
+{
+  omp_event_handle_t detach_event1, detach_event2;
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel
+    #pragma omp single
+    {
+      #pragma omp task detach(detach_event1)
+	x++;
+
+      #pragma omp task detach(detach_event2)
+      {
+	y++;
+	omp_fulfill_event (detach_event1);
+      }
+
+      #pragma omp task
+      {
+	z++;
+	omp_fulfill_event (detach_event2);
+      }
+    }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
new file mode 100644
index 0000000..943ac2a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test handling of detach clause with only a single thread.  The runtime
+   should not block when a task with an unfulfilled event finishes
+   running.  */
+
+int main (void)
+{
+  omp_event_handle_t detach_event1, detach_event2;
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel num_threads(1)
+    #pragma omp single
+    {
+      #pragma omp task detach(detach_event1)
+	x++;
+
+      #pragma omp task detach(detach_event2)
+      {
+	y++;
+	omp_fulfill_event (detach_event1);
+      }
+
+      #pragma omp task
+      {
+	z++;
+	omp_fulfill_event (detach_event2);
+      }
+    }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
new file mode 100644
index 0000000..2609fb1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test the task detach clause used together with dependencies.  */
+
+int main (void)
+{
+  omp_event_handle_t detach_event;
+  int x = 0, y = 0, z = 0;
+  int dep;
+
+  #pragma omp parallel
+    #pragma omp single
+    {
+      #pragma omp task depend(out:dep) detach(detach_event)
+	x++;
+
+      #pragma omp task
+      {
+	y++;
+	omp_fulfill_event(detach_event);
+      }
+
+      #pragma omp task depend(in:dep)
+	z++;
+    }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
new file mode 100644
index 0000000..0be74af
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test detach clause, where a task fulfills its own completion event.  */
+
+int main (void)
+{
+  omp_event_handle_t detach_event;
+  int x = 0;
+
+  #pragma omp parallel
+    #pragma omp single
+      #pragma omp task detach(detach_event)
+      {
+	x++;
+	omp_fulfill_event(detach_event);
+      }
+
+  assert (x == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
new file mode 100644
index 0000000..ab80ed8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test tasks with detach clause.  Each thread spawns off a chain of tasks,
+   that can then be executed by any available thread.  */
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+  int thread_count;
+  omp_event_handle_t detach_event1, detach_event2;
+
+  #pragma omp parallel
+  {
+    #pragma omp single
+      thread_count = omp_get_num_threads();
+
+    #pragma omp task detach(detach_event1) untied
+      #pragma omp atomic update
+	x++;
+
+    #pragma omp task detach(detach_event2) untied
+    {
+      #pragma omp atomic update
+	y++;
+      omp_fulfill_event (detach_event1);
+    }
+
+    #pragma omp task untied
+    {
+      #pragma omp atomic update
+	z++;
+      omp_fulfill_event (detach_event2);
+    }
+  }
+
+  assert (x == thread_count);
+  assert (y == thread_count);
+  assert (z == thread_count);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
new file mode 100644
index 0000000..45683ee
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test tasks with detach clause on an offload device.  Each device
+   thread spawns off a chain of tasks, that can then be executed by
+   any available thread.  */
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+  int thread_count;
+  omp_event_handle_t detach_event1, detach_event2;
+
+  #pragma omp target map(tofrom: x, y, z) map(from: thread_count)
+    #pragma omp parallel
+      {
+	#pragma omp single
+	  thread_count = omp_get_num_threads();
+
+	#pragma omp task detach(detach_event1) untied
+	  #pragma omp atomic update
+	    x++;
+
+	#pragma omp task detach(detach_event2) untied
+	{
+	  #pragma omp atomic update
+	    y++;
+	  omp_fulfill_event (detach_event1);
+	}
+
+	#pragma omp task untied
+	{
+	  #pragma omp atomic update
+	    z++;
+	  omp_fulfill_event (detach_event2);
+	}
+
+	#pragma omp taskwait
+      }
+
+  assert (x == thread_count);
+  assert (y == thread_count);
+  assert (z == thread_count);
+}
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-1.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
new file mode 100644
index 0000000..217bf65
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
@@ -0,0 +1,33 @@
+! { dg-do run }
+
+! Test chaining of detached tasks, with each task fulfilling the
+! completion event of the previous one.
+
+program task_detach_1
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+
+  !$omp parallel
+    !$omp single
+      !$omp task detach(detach_event1)
+        x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2)
+        y = y + 1
+	call omp_fulfill_event (detach_event1)
+      !$omp end task
+
+      !$omp task
+        z = z + 1
+	call omp_fulfill_event (detach_event2)
+      !$omp end task
+    !$omp end single
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-2.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
new file mode 100644
index 0000000..ecb4829
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
@@ -0,0 +1,34 @@
+! { dg-do run }
+
+! Test handling of detach clause with only a single thread.  The runtime
+! should not block when a task with an unfulfilled event finishes
+! running.
+
+program task_detach_2
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+
+  !$omp parallel num_threads(1)
+    !$omp single
+      !$omp task detach(detach_event1)
+        x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2)
+        y = y + 1
+	call omp_fulfill_event (detach_event1)
+      !$omp end task
+
+      !$omp task
+        z = z + 1
+	call omp_fulfill_event (detach_event2)
+      !$omp end task
+    !$omp end single
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-3.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
new file mode 100644
index 0000000..bdf93a5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
@@ -0,0 +1,33 @@
+! { dg-do run }
+
+! Test the task detach clause used together with dependencies.
+
+program task_detach_3
+
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event
+  integer :: x = 0, y = 0, z = 0
+  integer :: dep
+
+  !$omp parallel
+    !$omp single
+      !$omp task depend(out:dep) detach(detach_event)
+        x = x + 1
+      !$omp end task
+
+      !$omp task
+        y = y + 1
+	call omp_fulfill_event(detach_event)
+      !$omp end task
+
+      !$omp task depend(in:dep)
+        z = z + 1
+      !$omp end task
+    !$omp end single
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-4.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-4.f90
new file mode 100644
index 0000000..6d0843c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-4.f90
@@ -0,0 +1,22 @@
+! { dg-do run }
+
+! Test detach clause, where a task fulfills its own completion event.
+
+program task_detach_4
+
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event
+  integer :: x = 0
+
+  !$omp parallel
+    !$omp single
+      !$omp task detach(detach_event)
+        x = x + 1
+	call omp_fulfill_event(detach_event)
+      !$omp end task
+    !$omp end single
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-5.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
new file mode 100644
index 0000000..96e6387
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+
+! Test tasks with detach clause.  Each thread spawns off a chain of tasks,
+! that can then be executed by any available thread.
+
+program task_detach_5
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+  integer :: thread_count
+
+  !$omp parallel
+    !$omp single
+      thread_count = omp_get_num_threads()
+    !$omp end single
+
+    !$omp task detach(detach_event1) untied
+      !$omp atomic update
+	x = x + 1
+    !$omp end task
+
+    !$omp task detach(detach_event2) untied
+      !$omp atomic update
+	y = y + 1
+      call omp_fulfill_event (detach_event1);
+    !$omp end task
+
+    !$omp task untied
+      !$omp atomic update
+	z = z + 1
+      call omp_fulfill_event (detach_event2);
+    !$omp end task
+  !$omp end parallel
+
+  if (x /= thread_count) stop 1
+  if (y /= thread_count) stop 2
+  if (z /= thread_count) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
new file mode 100644
index 0000000..54227ef
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
@@ -0,0 +1,44 @@
+! { dg-do run }
+
+! Test tasks with detach clause on an offload device.  Each device
+! thread spawns off a chain of tasks, that can then be executed by
+! any available thread.
+
+program task_detach_6
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+  integer :: thread_count
+
+  !$omp target map(tofrom: x, y, z) map(from: thread_count)
+    !$omp parallel
+      !$omp single
+	thread_count = omp_get_num_threads()
+      !$omp end single
+
+      !$omp task detach(detach_event1) untied
+	!$omp atomic update
+	  x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2) untied
+	!$omp atomic update
+	  y = y + 1
+	call omp_fulfill_event (detach_event1);
+      !$omp end task
+
+      !$omp task untied
+	!$omp atomic update
+	  z = z + 1
+	call omp_fulfill_event (detach_event2);
+      !$omp end task
+
+      !$omp taskwait
+    !$omp end parallel
+  !$omp end target
+
+  if (x /= thread_count) stop 1
+  if (y /= thread_count) stop 2
+  if (z /= thread_count) stop 3
+end program
Jakub Jelinek Dec. 9, 2020, 5:53 p.m. UTC | #3
On Wed, Dec 09, 2020 at 05:37:24PM +0000, Kwok Cheung Yeung wrote:
> I believe this patch is largely complete now. I have done a bootstrap on
> x86_64 and run the testsuites with no regressions. I have also run the
> libgomp testsuite with offloading to Nvidia and AMD GCN devices, also with
> no regressions. Is this patch okay for trunk (or would it be more
> appropriate to wait until GCC 11 is branched off)?

I think it is desirable for GCC 11, doesn't need to be deferred, and sorry
it is taking me so long.  I've paged in the standard wording related to this
yesterday and hoped I'd look at this, but didn't manage, will try to do that
tomorrow or worst case on Friday.

	Jakub
Kwok Cheung Yeung Dec. 9, 2020, 5:55 p.m. UTC | #4
On 09/12/2020 5:53 pm, Jakub Jelinek wrote:
> On Wed, Dec 09, 2020 at 05:37:24PM +0000, Kwok Cheung Yeung wrote:
>> I believe this patch is largely complete now. I have done a bootstrap on
>> x86_64 and run the testsuites with no regressions. I have also run the
>> libgomp testsuite with offloading to Nvidia and AMD GCN devices, also with
>> no regressions. Is this patch okay for trunk (or would it be more
>> appropriate to wait until GCC 11 is branched off)?
> 
> I think it is desirable for GCC 11, doesn't need to be deferred, and sorry
> it is taking me so long.  I've paged in the standard wording related to this
> yesterday and hoped I'd look at this, but didn't manage, will try to do that
> tomorrow or worst case on Friday.

No problem :-), and thanks for looking at the patch.

Kwok
Jakub Jelinek Dec. 10, 2020, 2:38 p.m. UTC | #5
On Wed, Dec 09, 2020 at 05:37:24PM +0000, Kwok Cheung Yeung wrote:
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -14942,6 +14942,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
>  	  pc = &OMP_CLAUSE_CHAIN (c);
>  	  continue;
>  
> +	case OMP_CLAUSE_DETACH:
> +	  t = OMP_CLAUSE_DECL (c);
> +	  pc = &OMP_CLAUSE_CHAIN (c);
> +	  continue;
> +

If you wouldn't need to do anything for C for the detach clause, just would
just add:
	case OMP_CLAUSE_DETACH:
at the end of the case list that starts below:
>  	case OMP_CLAUSE_IF:
>  	case OMP_CLAUSE_NUM_THREADS:
>  	case OMP_CLAUSE_NUM_TEAMS:

But you actually do need to do something, even for C.

There are two restrictions:
- At most one detach clause can appear on the directive.
- If a detach clause appears on the directive, then a mergeable clause cannot appear on the same directive.
that should be checked and diagnosed.  One place to do that would be
like usually in all the FEs separately, that would mean adding
  bool mergeable_seen = false, detach_seen = false;
vars and for those clauses setting the *_seen, plus for DETACH
already complain if detach_seen is already true and remove the clause.
And at the end of the loop if mergeable_seen && detach_seen, diagnose
and remove one of them (perhaps better detach clause).
There is the optional second loop that can be used for the removal...

Testcase coverage should include:
  #pragma omp task detach (x) detach (y)
as well as
  #pragma omp task mergeable detach (x)
and
  #pragma omp task detach (x) mergeable
(and likewise for Fortran).

> +  if (cp_lexer_next_token_is_not (parser->lexer, CPP_NAME))
> +    {
> +      cp_parser_error (parser, "expected identifier");
> +      return list;
> +    }
> +
> +  location_t id_loc = cp_lexer_peek_token (parser->lexer)->location;
> +  tree t, identifier = cp_parser_identifier (parser);
> +
> +  if (identifier == error_mark_node)
> +    t = error_mark_node;
> +  else
> +    {
> +      t = cp_parser_lookup_name_simple
> +	    (parser, identifier,
> +	     cp_lexer_peek_token (parser->lexer)->location);
> +      if (t == error_mark_node)
> +	cp_parser_name_lookup_error (parser, identifier, t, NLE_NULL,
> +				     id_loc);

The above doesn't match what cp_parser_omp_var_list_no_open does,
in particular it should use cp_parser_id_expression
instead of cp_parser_identifier etc.

> +      else
> +	{
> +	  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (t));
> +	  if (!INTEGRAL_TYPE_P (type)
> +	      || TREE_CODE (type) != ENUMERAL_TYPE
> +	      || DECL_NAME (TYPE_NAME (type))
> +		   != get_identifier ("omp_event_handle_t"))
> +	    {
> +	      error_at (id_loc, "%<detach%> clause event handle "
> +			    "has type %qT rather than "
> +			    "%<omp_event_handle_t%>",
> +			    type);
> +	      return list;

You can't do this here for C++, it needs to be done in finish_omp_clauses
instead and only be done if the type is not a dependent type.
Consider (e.g. should be in testsuite)
template <typename T>
void
foo ()
{
  T t;
  #pragma omp task detach (t)
  ;
}

template <typename T>
void
bar ()
{
  T t;
  #pragma omp task detach (t)
  ;
}

void
baz ()
{
  foo <omp_event_handle_t> ();
  bar <int> (); // Instantiating this should error
}

> @@ -7394,6 +7394,9 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
>  		}
>  	    }
>  	  break;
> +	case OMP_CLAUSE_DETACH:
> +	  t = OMP_CLAUSE_DECL (c);
> +	  break;
>  

Again, restriction checking here, plus check the type if it is
non-dependent, otherwise defer that checking for finish_omp_clauses when
it will not be dependent anymore.

I think you need to handle OMP_CLAUSE_DETACH in cp/pt.c too.

> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -9733,6 +9733,19 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
>  	    }
>  	  break;
>  
> +	case OMP_CLAUSE_DETACH:
> +	  decl = OMP_CLAUSE_DECL (c);
> +	  if (outer_ctx)
> +	    {
> +	      splay_tree_node on
> +		= splay_tree_lookup (outer_ctx->variables,
> +				     (splay_tree_key)decl);
> +	      if (on == NULL || (on->value & GOVD_DATA_SHARE_CLASS) == 0)
> +		omp_firstprivatize_variable (outer_ctx, decl);
> +	      omp_notice_variable (outer_ctx, decl, true);
> +	    }
> +	  break;

I don't understand this.  My reading of:
"The event-handle will be considered as if it was specified on a
firstprivate clause. The use of a variable in a detach clause expression of a task
construct causes an implicit reference to the variable in all enclosing
constructs."
is that we should do:
      case OMP_CLAUSE_DETACH:
	decl = OMP_CLAUSE_DECL (c);
	goto do_notice;
which does the second sentence, and for the first sentence I believe it
talks about the task construct rather than about the outer construct.
So (again, something for testsuite):
void
foo (void)
{
  omp_event_handle_t t;
  #pragma omp parallel master default (none) /* { dg-error "..." } */
  {
    #pragma omp task detach (t)
    ;
  }
}
The dg-error should be the usual error about t being referenced in the
construct but not specified in the data sharing clauses on parallel.
And then
void
bar (void)
{
  omp_event_handle_t t;
  #pragma omp task detach (t) default (none)
  omp_fullfill_event (t); // This should be ok, above first sentence says
// that it is as if firstprivate (t)
}

But I think it is actually even stronger than that,
  #pragma omp task detach (t) firstprivate (t)
and
  #pragma omp task detach (t) shared (t)
etc. should be invalid too (at least in pedantic reading).
I guess we should ask on omp-lang.  If it actually works as firstprivate
(t), perhaps we should handle it that way already in the FEs.
> --- a/gcc/omp-expand.c
> +++ b/gcc/omp-expand.c
> @@ -762,6 +762,7 @@ expand_task_call (struct omp_region *region, basic_block bb,
>    tree depend = omp_find_clause (clauses, OMP_CLAUSE_DEPEND);
>    tree finalc = omp_find_clause (clauses, OMP_CLAUSE_FINAL);
>    tree priority = omp_find_clause (clauses, OMP_CLAUSE_PRIORITY);
> +  tree detach = omp_find_clause (clauses, OMP_CLAUSE_DETACH);
>  
>    unsigned int iflags
>      = (untied ? GOMP_TASK_FLAG_UNTIED : 0)
> @@ -853,6 +854,11 @@ expand_task_call (struct omp_region *region, basic_block bb,
>      priority = integer_zero_node;
>  
>    gsi = gsi_last_nondebug_bb (bb);
> +
> +  detach = detach
> +      ? fold_convert (pointer_sized_int_node, OMP_CLAUSE_DETACH_EXPR (detach))
> +      : null_pointer_node;

Formatting is wrong, would need to be
  detach = (detach
	    ? fold_convert (pointer_sized_int_node,
			    OMP_CLAUSE_DETACH_EXPR (detach))
	    : ...);
Now, null_pointer_node doesn't have pointer_sized_int type, so there would
be type mismatch.  But additionally, let's talk about how it should be
implemented.  I'd say best would be if:
  #pragma omp task detach (x)
  ;
expands as
  GOMP_task (..., ..., ..., ..., ..., ..., other_flags | GOMP_TASK_FLAG_DETACH, ..., ..., &x);
where GOMP_task allocates the structure for the struct gomp_task the way it
usually does, will have something in that structure include a gomp_sem_t and
and a flag that detach is needed (and if detach is needed gomp_sem_init that
semaphore) and stores to *detach the address of that semaphore or so.
Then you don't need a separate call and allocate something separately.
The GOMP_TASK_FLAG_DETACH is needed, because we keep extending GOMP_task
rather than creating new versions of that, and so to stay (at least on
ABIs I care about) ABI compatible we must not assume detach will have
meaningful value if GOMP_TASK_FLAG_DETACH bit is not set in flags.
I think we can make the 10th argument to BUILT_IN_GOMP_TASK just void *
e.g. to simplify omp-builtins.def and builtin-types.def etc., and just
cast it to omp_eventhandler_t * inside of GOMP_task.

And, when we pass &x to GOMP_task, we need the detach decl to be
addressable, so best in the FEs ensure it is addressable by calling
*mark_addressable there.

> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -299,6 +299,7 @@ unsigned const char omp_clause_num_ops[] =
>    1, /* OMP_CLAUSE_LINK  */
>    2, /* OMP_CLAUSE_FROM  */
>    2, /* OMP_CLAUSE_TO  */
> +  1, /* OMP_CLAUSE_DETACH  */
>    2, /* OMP_CLAUSE_MAP  */
>    1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
>    1, /* OMP_CLAUSE_USE_DEVICE_ADDR  */

This position for OMP_CLAUSE_DETACH is incorrect (but so is
USE_DEVICE_*/IS_DEVICE*/INCLUSIVE/EXCLUSIVE it seems), given:
#define OMP_CLAUSE_SIZE(NODE)                                           \
  OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE),  \
                                              OMP_CLAUSE_FROM,          \
                                              OMP_CLAUSE__CACHE_), 1)
I think for OMP_CLAUSE_SIZE we just want from/to/map/_cache_ and nothing
else, so I'd move detach/use*/is*/*clusive right before from clause.
Here 3 times and in tree-core.h to match.
We want OMP_CLAUSE_DECL to work for them, which is OMP_CLAUSE_PRIVATE ..
OMP_CLAUSE__SCANTEMP_.  

> +#define OMP_CLAUSE_DETACH_EXPR(NODE) \
> +  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DETACH),0)

No need for that, OMP_CLAUSE_DECL can be used for that.
It is not an expression, but decl anyway.

I don't see tree-nested.c changes, that is needed too.

> @@ -556,6 +557,13 @@
>          end interface
>  
>          interface
> +          subroutine omp_fulfill_event (event)
> +            use omp_lib_kinds
> +            integer (kind=omp_event_handle_kind), value, intent(in) :: event

Too long line?

> diff --git a/libgomp/priority_queue.h b/libgomp/priority_queue.h
> index 0ad78f5..c6fd80d 100644
> --- a/libgomp/priority_queue.h
> +++ b/libgomp/priority_queue.h
> @@ -113,6 +113,8 @@ enum priority_queue_type
>    PQ_IGNORED = 999
>  };
>  
> +typedef bool (*priority_queue_predicate)(struct gomp_task *);

Formatting, space before (.
> +
>  /* Priority queue implementation prototypes.  */
>  
>  extern bool priority_queue_task_in_queue_p (enum priority_queue_type,
> @@ -122,6 +124,9 @@ extern void priority_queue_dump (enum priority_queue_type,
>  				 struct priority_queue *);
>  extern void priority_queue_verify (enum priority_queue_type,
>  				   struct priority_queue *, bool);
> +extern struct gomp_task *priority_queue_find (enum priority_queue_type,
> +					      struct priority_queue *,
> +					      priority_queue_predicate);
>  extern void priority_tree_remove (enum priority_queue_type,
>  				  struct priority_queue *,
>  				  struct priority_node *);
> diff --git a/libgomp/task.c b/libgomp/task.c
> index a95067c..ae1fcf7 100644
> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -29,6 +29,7 @@
>  #include "libgomp.h"
>  #include <stdlib.h>
>  #include <string.h>
> +#include <stdio.h>
>  #include "gomp-constants.h"
>  
>  typedef struct gomp_task_depend_entry *hash_entry_type;
> @@ -86,6 +87,7 @@ gomp_init_task (struct gomp_task *task, struct gomp_task *parent_task,
>    task->dependers = NULL;
>    task->depend_hash = NULL;
>    task->depend_count = 0;
> +  task->detach_event = NULL;
>  }
>  
>  /* Clean up a task, after completing it.  */
> @@ -326,6 +328,28 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
>      }
>  }
>  
> +uintptr_t
> +GOMP_new_event ()
> +{
> +  struct gomp_allow_completion_event *event;
> +
> +  event = (struct gomp_allow_completion_event *)
> +	    gomp_malloc (sizeof (struct gomp_allow_completion_event));
> +  event->fulfilled = false;
> +  gomp_sem_init (&event->completion_sem, 0);
> +
> +  gomp_debug (0, "GOMP_new_event: %p\n", event);
> +
> +  return (uintptr_t) event;
> +}
> +
> +static bool
> +task_fulfilled_p (struct gomp_task *task)
> +{
> +  return __atomic_load_n (&task->detach_event->fulfilled,
> +			  __ATOMIC_RELAXED);
> +}

I don't understand the fulfilled and sem duality, I think there should be
a flag that the flag has detach at all, and then just a semaphore, the
semaphore state itself would indicate if it is fulfilled or not.

> @@ -347,11 +371,14 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
>  void
>  GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
>  	   long arg_size, long arg_align, bool if_clause, unsigned flags,
> -	   void **depend, int priority)
> +	   void **depend, int priority, uintptr_t detach)
>  {
>    struct gomp_thread *thr = gomp_thread ();
>    struct gomp_team *team = thr->ts.team;
>  
> +  struct gomp_allow_completion_event *detach_event =
> +    detach ? (struct gomp_allow_completion_event *) detach : NULL;

Formatting (no = at the end of line), but as I said above, I'd prefer
to do it differently.

> +void omp_fulfill_event(omp_event_handle_t event)

Formatting.
omp_fulfill_event on separate line from void,
and space before (.

> +{
> +  struct gomp_allow_completion_event *ev =
> +		(struct gomp_allow_completion_event *) event;

Formatting, no = at the end of line and the (struct ... is indented
wierdly.

	Jakub
Jakub Jelinek Dec. 10, 2020, 3:37 p.m. UTC | #6
On Thu, Dec 10, 2020 at 03:38:40PM +0100, Jakub Jelinek via Gcc-patches wrote:
> I don't understand this.  My reading of:
> "The event-handle will be considered as if it was specified on a
> firstprivate clause. The use of a variable in a detach clause expression of a task
> construct causes an implicit reference to the variable in all enclosing
> constructs."
> is that we should do:
>       case OMP_CLAUSE_DETACH:
> 	decl = OMP_CLAUSE_DECL (c);
> 	goto do_notice;
> which does the second sentence, and for the first sentence I believe it
> talks about the task construct rather than about the outer construct.
> So (again, something for testsuite):
> void
> foo (void)
> {
>   omp_event_handle_t t;
>   #pragma omp parallel master default (none) /* { dg-error "..." } */
>   {
>     #pragma omp task detach (t)
>     ;
>   }
> }
> The dg-error should be the usual error about t being referenced in the
> construct but not specified in the data sharing clauses on parallel.
> And then
> void
> bar (void)
> {
>   omp_event_handle_t t;
>   #pragma omp task detach (t) default (none)
>   omp_fullfill_event (t); // This should be ok, above first sentence says
> // that it is as if firstprivate (t)
> }
> 
> But I think it is actually even stronger than that,
>   #pragma omp task detach (t) firstprivate (t)
> and
>   #pragma omp task detach (t) shared (t)
> etc. should be invalid too (at least in pedantic reading).
> I guess we should ask on omp-lang.  If it actually works as firstprivate
> (t), perhaps we should handle it that way already in the FEs.

Asked and Alex said that both should be invalid.  Though, if we implement
detach as passing address of the variable to GOMP_task, if we implicitly
add firstprivate clause it would copy the value from before it has been
initialized.  One way to handle that would be not add firstprivate clause
next to detach, but treat detach like a firstprivate clause in most places,
and just for the passing pass it specially (let parent of task pass address
of the variable and let the receiving side recieve the value instead,
which would force task_cpyfn, or handle it more like we handle the bounds
of a taskloop - force the omp_eventhandler_t to be the first variable in the
structure and let GOMP_task write the address not just to *detach, but also
to the first element in the structure.

	Jakub
Jakub Jelinek Dec. 18, 2020, 4:42 p.m. UTC | #7
On Thu, Dec 10, 2020 at 03:38:40PM +0100, Jakub Jelinek via Gcc-patches wrote:
One further thing, the detach clause effectively writes to the var, so
unless you mark the clause decl addressable as I've suggested, you should
if (omp_shared_to_firstprivate_optimizable_decl_p (decl))
  omp_mark_stores (gimplify_omp_ctxp->outer_context, decl);
e.g. in gimplify_adjust_omp_clauses OMP_CLAUSE_DETACH: handling, to make
sure that outer parallel/task etc. regions don't try to optimize the var
from shared to firstprivate through OMP_CLAUSE_SHARED_READONLY flag on
OMP_CLAUSE_SHARED.

	Jakub
Kwok Cheung Yeung Jan. 11, 2021, 3:02 p.m. UTC | #8
Hello

Thanks for the review. Due to the Christmas holidays I have not finished 
addressing all these issues yet, but I expect to be done by the end of this 
week. Can this patch still make it for GCC 10, as I believe stage 4 is starting 
soon?

Thanks

Kwok

On 10/12/2020 2:38 pm, Jakub Jelinek wrote:
> On Wed, Dec 09, 2020 at 05:37:24PM +0000, Kwok Cheung Yeung wrote:
>> --- a/gcc/c/c-typeck.c
>> +++ b/gcc/c/c-typeck.c
>> @@ -14942,6 +14942,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
>>   	  pc = &OMP_CLAUSE_CHAIN (c);
>>   	  continue;
>>   
>> +	case OMP_CLAUSE_DETACH:
>> +	  t = OMP_CLAUSE_DECL (c);
>> +	  pc = &OMP_CLAUSE_CHAIN (c);
>> +	  continue;
>> +
> 
> If you wouldn't need to do anything for C for the detach clause, just would
> just add:
> 	case OMP_CLAUSE_DETACH:
> at the end of the case list that starts below:
>>   	case OMP_CLAUSE_IF:
>>   	case OMP_CLAUSE_NUM_THREADS:
>>   	case OMP_CLAUSE_NUM_TEAMS:
> 
> But you actually do need to do something, even for C.
> 
> There are two restrictions:
> - At most one detach clause can appear on the directive.
> - If a detach clause appears on the directive, then a mergeable clause cannot appear on the same directive.
> that should be checked and diagnosed.  One place to do that would be
> like usually in all the FEs separately, that would mean adding
>    bool mergeable_seen = false, detach_seen = false;
> vars and for those clauses setting the *_seen, plus for DETACH
> already complain if detach_seen is already true and remove the clause.
> And at the end of the loop if mergeable_seen && detach_seen, diagnose
> and remove one of them (perhaps better detach clause).
> There is the optional second loop that can be used for the removal...
> 
> Testcase coverage should include:
>    #pragma omp task detach (x) detach (y)
> as well as
>    #pragma omp task mergeable detach (x)
> and
>    #pragma omp task detach (x) mergeable
> (and likewise for Fortran).
> 
>> +  if (cp_lexer_next_token_is_not (parser->lexer, CPP_NAME))
>> +    {
>> +      cp_parser_error (parser, "expected identifier");
>> +      return list;
>> +    }
>> +
>> +  location_t id_loc = cp_lexer_peek_token (parser->lexer)->location;
>> +  tree t, identifier = cp_parser_identifier (parser);
>> +
>> +  if (identifier == error_mark_node)
>> +    t = error_mark_node;
>> +  else
>> +    {
>> +      t = cp_parser_lookup_name_simple
>> +	    (parser, identifier,
>> +	     cp_lexer_peek_token (parser->lexer)->location);
>> +      if (t == error_mark_node)
>> +	cp_parser_name_lookup_error (parser, identifier, t, NLE_NULL,
>> +				     id_loc);
> 
> The above doesn't match what cp_parser_omp_var_list_no_open does,
> in particular it should use cp_parser_id_expression
> instead of cp_parser_identifier etc.
> 
>> +      else
>> +	{
>> +	  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (t));
>> +	  if (!INTEGRAL_TYPE_P (type)
>> +	      || TREE_CODE (type) != ENUMERAL_TYPE
>> +	      || DECL_NAME (TYPE_NAME (type))
>> +		   != get_identifier ("omp_event_handle_t"))
>> +	    {
>> +	      error_at (id_loc, "%<detach%> clause event handle "
>> +			    "has type %qT rather than "
>> +			    "%<omp_event_handle_t%>",
>> +			    type);
>> +	      return list;
> 
> You can't do this here for C++, it needs to be done in finish_omp_clauses
> instead and only be done if the type is not a dependent type.
> Consider (e.g. should be in testsuite)
> template <typename T>
> void
> foo ()
> {
>    T t;
>    #pragma omp task detach (t)
>    ;
> }
> 
> template <typename T>
> void
> bar ()
> {
>    T t;
>    #pragma omp task detach (t)
>    ;
> }
> 
> void
> baz ()
> {
>    foo <omp_event_handle_t> ();
>    bar <int> (); // Instantiating this should error
> }
> 
>> @@ -7394,6 +7394,9 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
>>   		}
>>   	    }
>>   	  break;
>> +	case OMP_CLAUSE_DETACH:
>> +	  t = OMP_CLAUSE_DECL (c);
>> +	  break;
>>   
> 
> Again, restriction checking here, plus check the type if it is
> non-dependent, otherwise defer that checking for finish_omp_clauses when
> it will not be dependent anymore.
> 
> I think you need to handle OMP_CLAUSE_DETACH in cp/pt.c too.
> 
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -9733,6 +9733,19 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
>>   	    }
>>   	  break;
>>   
>> +	case OMP_CLAUSE_DETACH:
>> +	  decl = OMP_CLAUSE_DECL (c);
>> +	  if (outer_ctx)
>> +	    {
>> +	      splay_tree_node on
>> +		= splay_tree_lookup (outer_ctx->variables,
>> +				     (splay_tree_key)decl);
>> +	      if (on == NULL || (on->value & GOVD_DATA_SHARE_CLASS) == 0)
>> +		omp_firstprivatize_variable (outer_ctx, decl);
>> +	      omp_notice_variable (outer_ctx, decl, true);
>> +	    }
>> +	  break;
> 
> I don't understand this.  My reading of:
> "The event-handle will be considered as if it was specified on a
> firstprivate clause. The use of a variable in a detach clause expression of a task
> construct causes an implicit reference to the variable in all enclosing
> constructs."
> is that we should do:
>        case OMP_CLAUSE_DETACH:
> 	decl = OMP_CLAUSE_DECL (c);
> 	goto do_notice;
> which does the second sentence, and for the first sentence I believe it
> talks about the task construct rather than about the outer construct.
> So (again, something for testsuite):
> void
> foo (void)
> {
>    omp_event_handle_t t;
>    #pragma omp parallel master default (none) /* { dg-error "..." } */
>    {
>      #pragma omp task detach (t)
>      ;
>    }
> }
> The dg-error should be the usual error about t being referenced in the
> construct but not specified in the data sharing clauses on parallel.
> And then
> void
> bar (void)
> {
>    omp_event_handle_t t;
>    #pragma omp task detach (t) default (none)
>    omp_fullfill_event (t); // This should be ok, above first sentence says
> // that it is as if firstprivate (t)
> }
> 
> But I think it is actually even stronger than that,
>    #pragma omp task detach (t) firstprivate (t)
> and
>    #pragma omp task detach (t) shared (t)
> etc. should be invalid too (at least in pedantic reading).
> I guess we should ask on omp-lang.  If it actually works as firstprivate
> (t), perhaps we should handle it that way already in the FEs.
>> --- a/gcc/omp-expand.c
>> +++ b/gcc/omp-expand.c
>> @@ -762,6 +762,7 @@ expand_task_call (struct omp_region *region, basic_block bb,
>>     tree depend = omp_find_clause (clauses, OMP_CLAUSE_DEPEND);
>>     tree finalc = omp_find_clause (clauses, OMP_CLAUSE_FINAL);
>>     tree priority = omp_find_clause (clauses, OMP_CLAUSE_PRIORITY);
>> +  tree detach = omp_find_clause (clauses, OMP_CLAUSE_DETACH);
>>   
>>     unsigned int iflags
>>       = (untied ? GOMP_TASK_FLAG_UNTIED : 0)
>> @@ -853,6 +854,11 @@ expand_task_call (struct omp_region *region, basic_block bb,
>>       priority = integer_zero_node;
>>   
>>     gsi = gsi_last_nondebug_bb (bb);
>> +
>> +  detach = detach
>> +      ? fold_convert (pointer_sized_int_node, OMP_CLAUSE_DETACH_EXPR (detach))
>> +      : null_pointer_node;
> 
> Formatting is wrong, would need to be
>    detach = (detach
> 	    ? fold_convert (pointer_sized_int_node,
> 			    OMP_CLAUSE_DETACH_EXPR (detach))
> 	    : ...);
> Now, null_pointer_node doesn't have pointer_sized_int type, so there would
> be type mismatch.  But additionally, let's talk about how it should be
> implemented.  I'd say best would be if:
>    #pragma omp task detach (x)
>    ;
> expands as
>    GOMP_task (..., ..., ..., ..., ..., ..., other_flags | GOMP_TASK_FLAG_DETACH, ..., ..., &x);
> where GOMP_task allocates the structure for the struct gomp_task the way it
> usually does, will have something in that structure include a gomp_sem_t and
> and a flag that detach is needed (and if detach is needed gomp_sem_init that
> semaphore) and stores to *detach the address of that semaphore or so.
> Then you don't need a separate call and allocate something separately.
> The GOMP_TASK_FLAG_DETACH is needed, because we keep extending GOMP_task
> rather than creating new versions of that, and so to stay (at least on
> ABIs I care about) ABI compatible we must not assume detach will have
> meaningful value if GOMP_TASK_FLAG_DETACH bit is not set in flags.
> I think we can make the 10th argument to BUILT_IN_GOMP_TASK just void *
> e.g. to simplify omp-builtins.def and builtin-types.def etc., and just
> cast it to omp_eventhandler_t * inside of GOMP_task.
> 
> And, when we pass &x to GOMP_task, we need the detach decl to be
> addressable, so best in the FEs ensure it is addressable by calling
> *mark_addressable there.
> 
>> --- a/gcc/tree.c
>> +++ b/gcc/tree.c
>> @@ -299,6 +299,7 @@ unsigned const char omp_clause_num_ops[] =
>>     1, /* OMP_CLAUSE_LINK  */
>>     2, /* OMP_CLAUSE_FROM  */
>>     2, /* OMP_CLAUSE_TO  */
>> +  1, /* OMP_CLAUSE_DETACH  */
>>     2, /* OMP_CLAUSE_MAP  */
>>     1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
>>     1, /* OMP_CLAUSE_USE_DEVICE_ADDR  */
> 
> This position for OMP_CLAUSE_DETACH is incorrect (but so is
> USE_DEVICE_*/IS_DEVICE*/INCLUSIVE/EXCLUSIVE it seems), given:
> #define OMP_CLAUSE_SIZE(NODE)                                           \
>    OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE),  \
>                                                OMP_CLAUSE_FROM,          \
>                                                OMP_CLAUSE__CACHE_), 1)
> I think for OMP_CLAUSE_SIZE we just want from/to/map/_cache_ and nothing
> else, so I'd move detach/use*/is*/*clusive right before from clause.
> Here 3 times and in tree-core.h to match.
> We want OMP_CLAUSE_DECL to work for them, which is OMP_CLAUSE_PRIVATE ..
> OMP_CLAUSE__SCANTEMP_.
> 
>> +#define OMP_CLAUSE_DETACH_EXPR(NODE) \
>> +  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DETACH),0)
> 
> No need for that, OMP_CLAUSE_DECL can be used for that.
> It is not an expression, but decl anyway.
> 
> I don't see tree-nested.c changes, that is needed too.
> 
>> @@ -556,6 +557,13 @@
>>           end interface
>>   
>>           interface
>> +          subroutine omp_fulfill_event (event)
>> +            use omp_lib_kinds
>> +            integer (kind=omp_event_handle_kind), value, intent(in) :: event
> 
> Too long line?
> 
>> diff --git a/libgomp/priority_queue.h b/libgomp/priority_queue.h
>> index 0ad78f5..c6fd80d 100644
>> --- a/libgomp/priority_queue.h
>> +++ b/libgomp/priority_queue.h
>> @@ -113,6 +113,8 @@ enum priority_queue_type
>>     PQ_IGNORED = 999
>>   };
>>   
>> +typedef bool (*priority_queue_predicate)(struct gomp_task *);
> 
> Formatting, space before (.
>> +
>>   /* Priority queue implementation prototypes.  */
>>   
>>   extern bool priority_queue_task_in_queue_p (enum priority_queue_type,
>> @@ -122,6 +124,9 @@ extern void priority_queue_dump (enum priority_queue_type,
>>   				 struct priority_queue *);
>>   extern void priority_queue_verify (enum priority_queue_type,
>>   				   struct priority_queue *, bool);
>> +extern struct gomp_task *priority_queue_find (enum priority_queue_type,
>> +					      struct priority_queue *,
>> +					      priority_queue_predicate);
>>   extern void priority_tree_remove (enum priority_queue_type,
>>   				  struct priority_queue *,
>>   				  struct priority_node *);
>> diff --git a/libgomp/task.c b/libgomp/task.c
>> index a95067c..ae1fcf7 100644
>> --- a/libgomp/task.c
>> +++ b/libgomp/task.c
>> @@ -29,6 +29,7 @@
>>   #include "libgomp.h"
>>   #include <stdlib.h>
>>   #include <string.h>
>> +#include <stdio.h>
>>   #include "gomp-constants.h"
>>   
>>   typedef struct gomp_task_depend_entry *hash_entry_type;
>> @@ -86,6 +87,7 @@ gomp_init_task (struct gomp_task *task, struct gomp_task *parent_task,
>>     task->dependers = NULL;
>>     task->depend_hash = NULL;
>>     task->depend_count = 0;
>> +  task->detach_event = NULL;
>>   }
>>   
>>   /* Clean up a task, after completing it.  */
>> @@ -326,6 +328,28 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
>>       }
>>   }
>>   
>> +uintptr_t
>> +GOMP_new_event ()
>> +{
>> +  struct gomp_allow_completion_event *event;
>> +
>> +  event = (struct gomp_allow_completion_event *)
>> +	    gomp_malloc (sizeof (struct gomp_allow_completion_event));
>> +  event->fulfilled = false;
>> +  gomp_sem_init (&event->completion_sem, 0);
>> +
>> +  gomp_debug (0, "GOMP_new_event: %p\n", event);
>> +
>> +  return (uintptr_t) event;
>> +}
>> +
>> +static bool
>> +task_fulfilled_p (struct gomp_task *task)
>> +{
>> +  return __atomic_load_n (&task->detach_event->fulfilled,
>> +			  __ATOMIC_RELAXED);
>> +}
> 
> I don't understand the fulfilled and sem duality, I think there should be
> a flag that the flag has detach at all, and then just a semaphore, the
> semaphore state itself would indicate if it is fulfilled or not.
> 
>> @@ -347,11 +371,14 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
>>   void
>>   GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
>>   	   long arg_size, long arg_align, bool if_clause, unsigned flags,
>> -	   void **depend, int priority)
>> +	   void **depend, int priority, uintptr_t detach)
>>   {
>>     struct gomp_thread *thr = gomp_thread ();
>>     struct gomp_team *team = thr->ts.team;
>>   
>> +  struct gomp_allow_completion_event *detach_event =
>> +    detach ? (struct gomp_allow_completion_event *) detach : NULL;
> 
> Formatting (no = at the end of line), but as I said above, I'd prefer
> to do it differently.
> 
>> +void omp_fulfill_event(omp_event_handle_t event)
> 
> Formatting.
> omp_fulfill_event on separate line from void,
> and space before (.
> 
>> +{
>> +  struct gomp_allow_completion_event *ev =
>> +		(struct gomp_allow_completion_event *) event;
> 
> Formatting, no = at the end of line and the (struct ... is indented
> wierdly.
> 
> 	Jakub
>
Kwok Cheung Yeung Jan. 15, 2021, 3:07 p.m. UTC | #9
On 10/12/2020 2:38 pm, Jakub Jelinek wrote:
> On Wed, Dec 09, 2020 at 05:37:24PM +0000, Kwok Cheung Yeung wrote:
>> --- a/gcc/c/c-typeck.c
>> +++ b/gcc/c/c-typeck.c
>> @@ -14942,6 +14942,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
>>   	  pc = &OMP_CLAUSE_CHAIN (c);
>>   	  continue;
>>   
>> +	case OMP_CLAUSE_DETACH:
>> +	  t = OMP_CLAUSE_DECL (c);
>> +	  pc = &OMP_CLAUSE_CHAIN (c);
>> +	  continue;
>> +
> 
> If you wouldn't need to do anything for C for the detach clause, just would
> just add:
> 	case OMP_CLAUSE_DETACH:
> at the end of the case list that starts below:
>>   	case OMP_CLAUSE_IF:
>>   	case OMP_CLAUSE_NUM_THREADS:
>>   	case OMP_CLAUSE_NUM_TEAMS:
> 
> But you actually do need to do something, even for C.
> 
> There are two restrictions:
> - At most one detach clause can appear on the directive.
> - If a detach clause appears on the directive, then a mergeable clause cannot appear on the same directive.
> that should be checked and diagnosed.  One place to do that would be
> like usually in all the FEs separately, that would mean adding
>    bool mergeable_seen = false, detach_seen = false;
> vars and for those clauses setting the *_seen, plus for DETACH
> already complain if detach_seen is already true and remove the clause.
> And at the end of the loop if mergeable_seen && detach_seen, diagnose
> and remove one of them (perhaps better detach clause).
> There is the optional second loop that can be used for the removal...
> 
> Testcase coverage should include:
>    #pragma omp task detach (x) detach (y)
> as well as
>    #pragma omp task mergeable detach (x)
> and
>    #pragma omp task detach (x) mergeable
> (and likewise for Fortran).
> 

I have implemented checking for multiple detach clauses and usage with 
mergeable. I have included testcases in c-c++-common/gomp/task-detach-1.c and
gfortran.dg/gomp/task-detach-1.f90.

>> +  if (cp_lexer_next_token_is_not (parser->lexer, CPP_NAME))
>> +    {
>> +      cp_parser_error (parser, "expected identifier");
>> +      return list;
>> +    }
>> +
>> +  location_t id_loc = cp_lexer_peek_token (parser->lexer)->location;
>> +  tree t, identifier = cp_parser_identifier (parser);
>> +
>> +  if (identifier == error_mark_node)
>> +    t = error_mark_node;
>> +  else
>> +    {
>> +      t = cp_parser_lookup_name_simple
>> +	    (parser, identifier,
>> +	     cp_lexer_peek_token (parser->lexer)->location);
>> +      if (t == error_mark_node)
>> +	cp_parser_name_lookup_error (parser, identifier, t, NLE_NULL,
>> +				     id_loc);
> 
> The above doesn't match what cp_parser_omp_var_list_no_open does,
> in particular it should use cp_parser_id_expression
> instead of cp_parser_identifier etc.
> 

Changed to use cp_parser_id_expression, and added extra logic from 
cp_parser_omp_var_list in looking up the decl.

>> +      else
>> +	{
>> +	  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (t));
>> +	  if (!INTEGRAL_TYPE_P (type)
>> +	      || TREE_CODE (type) != ENUMERAL_TYPE
>> +	      || DECL_NAME (TYPE_NAME (type))
>> +		   != get_identifier ("omp_event_handle_t"))
>> +	    {
>> +	      error_at (id_loc, "%<detach%> clause event handle "
>> +			    "has type %qT rather than "
>> +			    "%<omp_event_handle_t%>",
>> +			    type);
>> +	      return list;
> 
> You can't do this here for C++, it needs to be done in finish_omp_clauses
> instead and only be done if the type is not a dependent type.
> Consider (e.g. should be in testsuite)
> template <typename T>
> void
> foo ()
> {
>    T t;
>    #pragma omp task detach (t)
>    ;
> }
> 
> template <typename T>
> void
> bar ()
> {
>    T t;
>    #pragma omp task detach (t)
>    ;
> }
> 
> void
> baz ()
> {
>    foo <omp_event_handle_t> ();
>    bar <int> (); // Instantiating this should error
> }
> 

Moved type checking to finish_omp_clauses, and testcase added at 
g++.dg/gomp/task-detach-1.C.

>> @@ -7394,6 +7394,9 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
>>   		}
>>   	    }
>>   	  break;
>> +	case OMP_CLAUSE_DETACH:
>> +	  t = OMP_CLAUSE_DECL (c);
>> +	  break;
>>   
> 
> Again, restriction checking here, plus check the type if it is
> non-dependent, otherwise defer that checking for finish_omp_clauses when
> it will not be dependent anymore.
> 
> I think you need to handle OMP_CLAUSE_DETACH in cp/pt.c too.
> 

Done. g++.dg/gomp/task-detach-1.C contains a test for templates.

>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -9733,6 +9733,19 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
>>   	    }
>>   	  break;
>>   
>> +	case OMP_CLAUSE_DETACH:
>> +	  decl = OMP_CLAUSE_DECL (c);
>> +	  if (outer_ctx)
>> +	    {
>> +	      splay_tree_node on
>> +		= splay_tree_lookup (outer_ctx->variables,
>> +				     (splay_tree_key)decl);
>> +	      if (on == NULL || (on->value & GOVD_DATA_SHARE_CLASS) == 0)
>> +		omp_firstprivatize_variable (outer_ctx, decl);
>> +	      omp_notice_variable (outer_ctx, decl, true);
>> +	    }
>> +	  break;
> 
> I don't understand this.  My reading of:
> "The event-handle will be considered as if it was specified on a
> firstprivate clause. The use of a variable in a detach clause expression of a task
> construct causes an implicit reference to the variable in all enclosing
> constructs."
> is that we should do:
>        case OMP_CLAUSE_DETACH:
> 	decl = OMP_CLAUSE_DECL (c);
> 	goto do_notice;
> which does the second sentence, and for the first sentence I believe it
> talks about the task construct rather than about the outer construct.
> So (again, something for testsuite):
> void
> foo (void)
> {
>    omp_event_handle_t t;
>    #pragma omp parallel master default (none) /* { dg-error "..." } */
>    {
>      #pragma omp task detach (t)
>      ;
>    }
> }
> The dg-error should be the usual error about t being referenced in the
> construct but not specified in the data sharing clauses on parallel.

Done, and test case added to c-c++-common/gomp/task-detach-1.c.

> And then
> void
> bar (void)
> {
>    omp_event_handle_t t;
>    #pragma omp task detach (t) default (none)
>    omp_fullfill_event (t); // This should be ok, above first sentence says
> // that it is as if firstprivate (t)
> }
> 
> But I think it is actually even stronger than that,
>    #pragma omp task detach (t) firstprivate (t)
> and
>    #pragma omp task detach (t) shared (t)
> etc. should be invalid too (at least in pedantic reading).
> I guess we should ask on omp-lang.  If it actually works as firstprivate
> (t), perhaps we should handle it that way already in the FEs.

I have added code to the FEs to make changing the data-sharing type of the 
detach decl an error. omp_default_clause is modified so that firstprivate is 
always applied by default on the detach clause decl.

>> --- a/gcc/omp-expand.c
>> +++ b/gcc/omp-expand.c
>> @@ -762,6 +762,7 @@ expand_task_call (struct omp_region *region, basic_block bb,
>>     tree depend = omp_find_clause (clauses, OMP_CLAUSE_DEPEND);
>>     tree finalc = omp_find_clause (clauses, OMP_CLAUSE_FINAL);
>>     tree priority = omp_find_clause (clauses, OMP_CLAUSE_PRIORITY);
>> +  tree detach = omp_find_clause (clauses, OMP_CLAUSE_DETACH);
>>   
>>     unsigned int iflags
>>       = (untied ? GOMP_TASK_FLAG_UNTIED : 0)
>> @@ -853,6 +854,11 @@ expand_task_call (struct omp_region *region, basic_block bb,
>>       priority = integer_zero_node;
>>   
>>     gsi = gsi_last_nondebug_bb (bb);
>> +
>> +  detach = detach
>> +      ? fold_convert (pointer_sized_int_node, OMP_CLAUSE_DETACH_EXPR (detach))
>> +      : null_pointer_node;
> 
> Formatting is wrong, would need to be
>    detach = (detach
> 	    ? fold_convert (pointer_sized_int_node,
> 			    OMP_CLAUSE_DETACH_EXPR (detach))
> 	    : ...);

Fixed.

> Now, null_pointer_node doesn't have pointer_sized_int type, so there would
> be type mismatch.  But additionally, let's talk about how it should be
> implemented.  I'd say best would be if:
>    #pragma omp task detach (x)
>    ;
> expands as
>    GOMP_task (..., ..., ..., ..., ..., ..., other_flags | GOMP_TASK_FLAG_DETACH, ..., ..., &x);
> where GOMP_task allocates the structure for the struct gomp_task the way it
> usually does, will have something in that structure include a gomp_sem_t and
> and a flag that detach is needed (and if detach is needed gomp_sem_init that
> semaphore) and stores to *detach the address of that semaphore or so.
> Then you don't need a separate call and allocate something separately.

I think this was originally done to get around the problem of the detach clause 
decl being passed into the context of the task construct by value, such that the 
value received was the one _before_ it was modified by GOMP_task. I have made 
the changes, at the expense of some complexity elsewhere...

> Though, if we implement
> detach as passing address of the variable to GOMP_task, if we implicitly
> add firstprivate clause it would copy the value from before it has been
> initialized.  One way to handle that would be not add firstprivate clause
> next to detach, but treat detach like a firstprivate clause in most places,
> and just for the passing pass it specially (let parent of task pass address
> of the variable and let the receiving side recieve the value instead,
> which would force task_cpyfn, or handle it more like we handle the bounds
> of a taskloop - force the omp_eventhandler_t to be the first variable in the
> structure and let GOMP_task write the address not just to *detach, but also
> to the first element in the structure.

I have opted for the second approach of forcing the variable holding the event 
handle to first place in the ctx->record type. Since taskloops cannot take a 
detach clause, there is no conflict between the two.

However, there is an issue when the event handle is not referenced within the 
task construct, such that there is no entry for it in ctx->record_type. In this 
case, I just add a dummy entry for it, otherwise GOMP_task would clobber some 
other variable when it attempts to write the address to the data.

> The GOMP_TASK_FLAG_DETACH is needed, because we keep extending GOMP_task
> rather than creating new versions of that, and so to stay (at least on
> ABIs I care about) ABI compatible we must not assume detach will have
> meaningful value if GOMP_TASK_FLAG_DETACH bit is not set in flags.
> I think we can make the 10th argument to BUILT_IN_GOMP_TASK just void *
> e.g. to simplify omp-builtins.def and builtin-types.def etc., and just
> cast it to omp_eventhandler_t * inside of GOMP_task.
> 
> And, when we pass &x to GOMP_task, we need the detach decl to be
> addressable, so best in the FEs ensure it is addressable by calling
> *mark_addressable there.
> 

Done.

>> --- a/gcc/tree.c
>> +++ b/gcc/tree.c
>> @@ -299,6 +299,7 @@ unsigned const char omp_clause_num_ops[] =
>>     1, /* OMP_CLAUSE_LINK  */
>>     2, /* OMP_CLAUSE_FROM  */
>>     2, /* OMP_CLAUSE_TO  */
>> +  1, /* OMP_CLAUSE_DETACH  */
>>     2, /* OMP_CLAUSE_MAP  */
>>     1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
>>     1, /* OMP_CLAUSE_USE_DEVICE_ADDR  */
> 
> This position for OMP_CLAUSE_DETACH is incorrect (but so is
> USE_DEVICE_*/IS_DEVICE*/INCLUSIVE/EXCLUSIVE it seems), given:
> #define OMP_CLAUSE_SIZE(NODE)                                           \
>    OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE),  \
>                                                OMP_CLAUSE_FROM,          \
>                                                OMP_CLAUSE__CACHE_), 1)
> I think for OMP_CLAUSE_SIZE we just want from/to/map/_cache_ and nothing
> else, so I'd move detach/use*/is*/*clusive right before from clause.
> Here 3 times and in tree-core.h to match.
> We want OMP_CLAUSE_DECL to work for them, which is OMP_CLAUSE_PRIVATE ..
> OMP_CLAUSE__SCANTEMP_.
> 

Fixed.

>> +#define OMP_CLAUSE_DETACH_EXPR(NODE) \
>> +  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DETACH),0)
> 
> No need for that, OMP_CLAUSE_DECL can be used for that.
> It is not an expression, but decl anyway.
> 

Removed.

> I don't see tree-nested.c changes, that is needed too.
> 

Done. There is a test for nested functions in gcc.dg/gomp/task-detach-1.c.

>> @@ -556,6 +557,13 @@
>>           end interface
>>   
>>           interface
>> +          subroutine omp_fulfill_event (event)
>> +            use omp_lib_kinds
>> +            integer (kind=omp_event_handle_kind), value, intent(in) :: event
> 
> Too long line?
> 

Do you mean the 72 column limit of Fortran? I don't think it matters here as we 
are not using fixed form, but I have broken the line up anyway.

>> diff --git a/libgomp/priority_queue.h b/libgomp/priority_queue.h
>> index 0ad78f5..c6fd80d 100644
>> --- a/libgomp/priority_queue.h
>> +++ b/libgomp/priority_queue.h
>> @@ -113,6 +113,8 @@ enum priority_queue_type
>>     PQ_IGNORED = 999
>>   };
>>   
>> +typedef bool (*priority_queue_predicate)(struct gomp_task *);
> 
> Formatting, space before (.

Fixed.

>> +uintptr_t
>> +GOMP_new_event ()
>> +{
>> +  struct gomp_allow_completion_event *event;
>> +
>> +  event = (struct gomp_allow_completion_event *)
>> +	    gomp_malloc (sizeof (struct gomp_allow_completion_event));
>> +  event->fulfilled = false;
>> +  gomp_sem_init (&event->completion_sem, 0);
>> +
>> +  gomp_debug (0, "GOMP_new_event: %p\n", event);
>> +
>> +  return (uintptr_t) event;
>> +}
>> +
>> +static bool
>> +task_fulfilled_p (struct gomp_task *task)
>> +{
>> +  return __atomic_load_n (&task->detach_event->fulfilled,
>> +			  __ATOMIC_RELAXED);
>> +}
> 
> I don't understand the fulfilled and sem duality, I think there should be
> a flag that the flag has detach at all, and then just a semaphore, the
> semaphore state itself would indicate if it is fulfilled or not.

The event implementation is now just two fields in the task struct, with the 
event handle being the address of the semaphore field.

>> @@ -347,11 +371,14 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
>>   void
>>   GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
>>   	   long arg_size, long arg_align, bool if_clause, unsigned flags,
>> -	   void **depend, int priority)
>> +	   void **depend, int priority, uintptr_t detach)
>>   {
>>     struct gomp_thread *thr = gomp_thread ();
>>     struct gomp_team *team = thr->ts.team;
>>   
>> +  struct gomp_allow_completion_event *detach_event =
>> +    detach ? (struct gomp_allow_completion_event *) detach : NULL;
> 
> Formatting (no = at the end of line), but as I said above, I'd prefer
> to do it differently.
> 

Now done differently :-).

>> +void omp_fulfill_event(omp_event_handle_t event)
> 
> Formatting.
> omp_fulfill_event on separate line from void,
> and space before (.
> 

Fixed.

>> +{
>> +  struct gomp_allow_completion_event *ev =
>> +		(struct gomp_allow_completion_event *) event;
> 
> Formatting, no = at the end of line and the (struct ... is indented
> wierdly.

No longer present.

I have tested bootstrapping on x86_64 (no offloading) with no issues, and 
running the libgomp testsuite with Nvidia offloading shows no regressions. I 
have also tested all the gomp.exp tests in the main gcc testsuite, also with no 
issues. I am currently still running the full testsuite, but do not anticipate 
any problems.

Okay to commit on trunk, if the full testsuite run does not show any regressions?

Thanks

Kwok
From 788687f87ad41e51258738ce068ee38d7b24defc Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Fri, 15 Jan 2021 04:49:36 -0800
Subject: [PATCH] openmp: Add support for the OpenMP 5.0 task detach clause

2021-01-15  Kwok Cheung Yeung  <kcy@codesourcery.com>

	gcc/
	* builtin-types.def
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
	to...
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR):
	...this.  Add extra argument.
	* gimplify.c (omp_default_clause): Ensure that event handle is
	firstprivate in a task region.
	(gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DETACH.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-builtins.def (BUILT_IN_GOMP_TASK): Change function type to
	BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR.
	* omp-expand.c (expand_task_call): Add GOMP_TASK_FLAG_DETACH to flags
	if detach clause specified.  Add detach argument when generating
	call to	GOMP_task.
	* omp-low.c (scan_sharing_clauses): Setup data environment for detach
	clause.
	(finish_taskreg_scan): Move field for variable containing the event
	handle to the front of the struct.  Create a temporary field if one
	is not already present.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DETACH.  Fix
	ordering.
	* tree-nested.c (convert_nonlocal_omp_clauses): Handle
	OMP_CLAUSE_DETACH clause.
	(convert_local_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_DETACH.
	* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_DETACH.
	Fix ordering.
	(omp_clause_code_name): Add entry for OMP_CLAUSE_DETACH.  Fix
	ordering.
	(walk_tree_1): Handle OMP_CLAUSE_DETACH.
	* tree.h (OMP_CLAUSE_DETACH_EXPR): New.

	gcc/c-family/
	* c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_DETACH.
	Redefine PRAGMA_OACC_CLAUSE_DETACH.

	gcc/c/
	* c-parser.c (c_parser_omp_clause_detach): New.
	(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH clause.
	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
	* c-typeck.c (c_finish_omp_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH
	clause.  Prevent use of detach with mergeable and overriding the
	data sharing mode of the event handle.

	gcc/cp/
	* parser.c (cp_parser_omp_clause_detach): New.
	(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH.
	(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
	* pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
	* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
	Prevent use of detach with mergeable and overriding the	data sharing
	mode of the event handle.

	gcc/fortran/
	* dump-parse-tree.c (show_omp_clauses): Handle detach clause.
	* frontend-passes.c (gfc_code_walker): Walk detach expression.
	* gfortran.h (struct gfc_omp_clauses): Add detach field.
	(gfc_c_intptr_kind): New.
	* openmp.c (gfc_free_omp_clauses): Free detach clause.
	(gfc_match_omp_detach): New.
	(enum omp_mask1): Add OMP_CLAUSE_DETACH.
	(enum omp_mask2): Remove OMP_CLAUSE_DETACH.
	(gfc_match_omp_clauses): Handle OMP_CLAUSE_DETACH for OpenMP.
	(OMP_TASK_CLAUSES): Add OMP_CLAUSE_DETACH.
	(resolve_omp_clauses): Prevent use of detach with mergeable and
	overriding the data sharing mode of the event handle.
	* trans-openmp.c (gfc_trans_omp_clauses): Handle detach clause.
	* trans-types.c (gfc_c_intptr_kind): New.
	(gfc_init_kinds): Initialize gfc_c_intptr_kind.
	* types.def
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
	to...
	(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR):
	...this.  Add extra argument.

	gcc/testsuite/
	* c-c++-common/gomp/task-detach-1.c: New.
	* g++.dg/gomp/task-detach-1.C: New.
	* gcc.dg/gomp/task-detach-1.c: New.
	* gfortran.dg/gomp/task-detach-1.f90: New.

	include/
	* gomp-constants.h (GOMP_TASK_FLAG_DETACH): New.

	libgomp/
	* fortran.c (omp_fulfill_event_): New.
	* libgomp.h (struct gomp_task): Add detach and completion_sem fields.
	(struct gomp_team): Add task_detach_queue and task_detach_count
	fields.
	* libgomp.map (OMP_5.0.1): Add omp_fulfill_event and omp_fulfill_event_.
	* libgomp_g.h (GOMP_task): Add extra argument.
	* omp.h.in (enum omp_event_handle_t): New.
	(omp_fulfill_event): New.
	* omp_lib.f90.in (omp_event_handle_kind): New.
	(omp_fulfill_event): New.
	* omp_lib.h.in (omp_event_handle_kind): New.
	(omp_fulfill_event): Declare.
	* priority_queue.c (priority_tree_find): New.
	(priority_list_find): New.
	(priority_queue_find): New.
	* priority_queue.h (priority_queue_predicate): New.
	(priority_queue_find): New.
	* task.c (gomp_init_task): Initialize detach field.
	(task_fulfilled_p): New.
	(GOMP_task): Add detach argument.  Ignore detach argument if
	GOMP_TASK_FLAG_DETACH not set in flags.  Initialize completion_sem
	field.	Copy address of completion_sem into detach argument and
	into the start of the data record.  Wait for detach event if task
	not deferred.
	(gomp_barrier_handle_tasks): Queue tasks with unfulfilled events.
	Remove completed tasks and requeue dependent tasks.
	(omp_fulfill_event): New.
	* team.c (gomp_new_team): Initialize task_detach_queue and
	task_detach_count fields.
	(free_team): Free task_detach_queue field.
	* testsuite/libgomp.c-c++-common/task-detach-1.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-2.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-3.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-4.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-5.c: New testcase.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: New testcase.
	* testsuite/libgomp.fortran/task-detach-1.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-2.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-3.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-4.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-5.f90: New testcase.
	* testsuite/libgomp.fortran/task-detach-6.f90: New testcase.
---
 gcc/builtin-types.def                              |   8 +-
 gcc/c-family/c-pragma.h                            |   3 +-
 gcc/c/c-parser.c                                   |  57 ++++++++-
 gcc/c/c-typeck.c                                   |  58 ++++++++-
 gcc/cp/parser.c                                    |  53 ++++++++-
 gcc/cp/pt.c                                        |   1 +
 gcc/cp/semantics.c                                 |  61 +++++++++-
 gcc/fortran/dump-parse-tree.c                      |   6 +
 gcc/fortran/frontend-passes.c                      |   1 +
 gcc/fortran/gfortran.h                             |   2 +
 gcc/fortran/openmp.c                               |  60 +++++++++-
 gcc/fortran/trans-openmp.c                         |  16 +++
 gcc/fortran/trans-types.c                          |   3 +
 gcc/fortran/types.def                              |   8 +-
 gcc/gimplify.c                                     |  14 +++
 gcc/omp-builtins.def                               |   2 +-
 gcc/omp-expand.c                                   |  19 ++-
 gcc/omp-low.c                                      |  63 ++++++++++
 gcc/testsuite/c-c++-common/gomp/task-detach-1.c    |  32 +++++
 gcc/testsuite/g++.dg/gomp/task-detach-1.C          |  18 +++
 gcc/testsuite/gcc.dg/gomp/task-detach-1.c          |  15 +++
 gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90   |  27 +++++
 gcc/tree-core.h                                    |  29 +++--
 gcc/tree-nested.c                                  |   2 +
 gcc/tree-pretty-print.c                            |   6 +
 gcc/tree.c                                         |  15 ++-
 include/gomp-constants.h                           |   1 +
 libgomp/fortran.c                                  |   6 +
 libgomp/libgomp.h                                  |   7 ++
 libgomp/libgomp.map                                |   2 +
 libgomp/libgomp_g.h                                |   2 +-
 libgomp/omp.h.in                                   |   7 ++
 libgomp/omp_lib.f90.in                             |   9 ++
 libgomp/omp_lib.h.in                               |   4 +
 libgomp/priority_queue.c                           |  57 +++++++++
 libgomp/priority_queue.h                           |   5 +
 libgomp/task.c                                     | 132 +++++++++++++++++----
 libgomp/team.c                                     |   4 +
 .../testsuite/libgomp.c-c++-common/task-detach-1.c |  36 ++++++
 .../testsuite/libgomp.c-c++-common/task-detach-2.c |  37 ++++++
 .../testsuite/libgomp.c-c++-common/task-detach-3.c |  33 ++++++
 .../testsuite/libgomp.c-c++-common/task-detach-4.c |  24 ++++
 .../testsuite/libgomp.c-c++-common/task-detach-5.c |  42 +++++++
 .../testsuite/libgomp.c-c++-common/task-detach-6.c |  46 +++++++
 .../testsuite/libgomp.fortran/task-detach-1.f90    |  33 ++++++
 .../testsuite/libgomp.fortran/task-detach-2.f90    |  34 ++++++
 .../testsuite/libgomp.fortran/task-detach-3.f90    |  33 ++++++
 .../testsuite/libgomp.fortran/task-detach-4.f90    |  22 ++++
 .../testsuite/libgomp.fortran/task-detach-5.f90    |  39 ++++++
 .../testsuite/libgomp.fortran/task-detach-6.f90    |  44 +++++++
 50 files changed, 1176 insertions(+), 62 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/task-detach-1.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/task-detach-1.C
 create mode 100644 gcc/testsuite/gcc.dg/gomp/task-detach-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-5.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/task-detach-6.f90

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 47abf90..d160826 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -759,10 +759,6 @@ DEF_FUNCTION_TYPE_8 (BT_FN_BOOL_UINT_ULLPTR_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_LONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG, BT_PTR, BT_PTR)
 
-DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
-		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
-		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
-		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
 		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
@@ -770,6 +766,10 @@ DEF_FUNCTION_TYPE_9 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR_PTR_PTR
 		     BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
 		     BT_PTR_LONG, BT_PTR_LONG, BT_PTR, BT_PTR)
 
+DEF_FUNCTION_TYPE_10 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR,
+		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
+		      BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
+		      BT_BOOL, BT_UINT, BT_PTR, BT_INT, BT_PTR)
 DEF_FUNCTION_TYPE_10 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		      BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		      BT_ULONGLONG, BT_LONG, BT_ULONGLONG, BT_PTR_ULONGLONG,
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index de45203..6c34ffa 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -95,6 +95,7 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_DEFAULT,
   PRAGMA_OMP_CLAUSE_DEFAULTMAP,
   PRAGMA_OMP_CLAUSE_DEPEND,
+  PRAGMA_OMP_CLAUSE_DETACH,
   PRAGMA_OMP_CLAUSE_DEVICE,
   PRAGMA_OMP_CLAUSE_DEVICE_TYPE,
   PRAGMA_OMP_CLAUSE_DIST_SCHEDULE,
@@ -151,7 +152,6 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_COPYOUT,
   PRAGMA_OACC_CLAUSE_CREATE,
   PRAGMA_OACC_CLAUSE_DELETE,
-  PRAGMA_OACC_CLAUSE_DETACH,
   PRAGMA_OACC_CLAUSE_DEVICEPTR,
   PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
   PRAGMA_OACC_CLAUSE_FINALIZE,
@@ -174,6 +174,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_COPYIN = PRAGMA_OMP_CLAUSE_COPYIN,
   PRAGMA_OACC_CLAUSE_DEVICE = PRAGMA_OMP_CLAUSE_DEVICE,
   PRAGMA_OACC_CLAUSE_DEFAULT = PRAGMA_OMP_CLAUSE_DEFAULT,
+  PRAGMA_OACC_CLAUSE_DETACH = PRAGMA_OMP_CLAUSE_DETACH,
   PRAGMA_OACC_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
   PRAGMA_OACC_CLAUSE_IF = PRAGMA_OMP_CLAUSE_IF,
   PRAGMA_OACC_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c77d9fc..a8df208 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15977,6 +15977,56 @@ c_parser_omp_clause_uniform (c_parser *parser, tree list)
   return list;
 }
 
+/* OpenMP 5.0:
+   detach ( event-handle ) */
+
+static tree
+c_parser_omp_clause_detach (c_parser *parser, tree list)
+{
+  matching_parens parens;
+  location_t clause_loc = c_parser_peek_token (parser)->location;
+
+  if (!parens.require_open (parser))
+    return list;
+
+  if (c_parser_next_token_is_not (parser, CPP_NAME)
+      || c_parser_peek_token (parser)->id_kind != C_ID_ID)
+    {
+      c_parser_error (parser, "expected identifier");
+      parens.skip_until_found_close (parser);
+      return list;
+    }
+
+  tree t = lookup_name (c_parser_peek_token (parser)->value);
+  if (t == NULL_TREE)
+    {
+      undeclared_variable (c_parser_peek_token (parser)->location,
+			   c_parser_peek_token (parser)->value);
+      parens.skip_until_found_close (parser);
+      return list;
+    }
+  c_parser_consume_token (parser);
+
+  tree type = TYPE_MAIN_VARIANT (TREE_TYPE (t));
+  if (!INTEGRAL_TYPE_P (type)
+      || TREE_CODE (type) != ENUMERAL_TYPE
+      || TYPE_NAME (type) != get_identifier ("omp_event_handle_t"))
+    {
+      error_at (clause_loc, "%<detach%> clause event handle "
+			    "has type %qT rather than "
+			    "%<omp_event_handle_t%>",
+			    type);
+      parens.skip_until_found_close (parser);
+      return list;
+    }
+
+  tree u = build_omp_clause (clause_loc, OMP_CLAUSE_DETACH);
+  OMP_CLAUSE_DECL (u) = t;
+  OMP_CLAUSE_CHAIN (u) = list;
+  parens.skip_until_found_close (parser);
+  return u;
+}
+
 /* Parse all OpenACC clauses.  The set clauses allowed by the directive
    is a bitmask in MASK.  Return the list of clauses found.  */
 
@@ -16243,6 +16293,10 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_omp_clause_default (parser, clauses, false);
 	  c_name = "default";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DETACH:
+	  clauses = c_parser_omp_clause_detach (parser, clauses);
+	  c_name = "detach";
+	  break;
 	case PRAGMA_OMP_CLAUSE_FIRSTPRIVATE:
 	  clauses = c_parser_omp_clause_firstprivate (parser, clauses);
 	  c_name = "firstprivate";
@@ -19190,7 +19244,8 @@ c_parser_omp_single (location_t loc, c_parser *parser, bool *if_p)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEPEND)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRIORITY)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION))
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DETACH))
 
 static tree
 c_parser_omp_task (location_t loc, c_parser *parser, bool *if_p)
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 67c0080..4e9b21b 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13854,6 +13854,8 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
   tree simdlen = NULL_TREE, safelen = NULL_TREE;
   bool branch_seen = false;
   bool copyprivate_seen = false;
+  bool mergeable_seen = false;
+  tree *detach_seen = NULL;
   bool linear_variable_step_check = false;
   tree *nowait_clause = NULL;
   tree ordered_clause = NULL_TREE;
@@ -14935,6 +14937,21 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  pc = &OMP_CLAUSE_CHAIN (c);
 	  continue;
 
+	case OMP_CLAUSE_DETACH:
+	  t = OMP_CLAUSE_DECL (c);
+	  if (detach_seen)
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"there can be at most one %<detach%> clause in a "
+			"task construct");
+	      remove = true;
+	      break;
+	    }
+	  detach_seen = pc;
+	  pc = &OMP_CLAUSE_CHAIN (c);
+	  c_mark_addressable (t);
+	  continue;
+
 	case OMP_CLAUSE_IF:
 	case OMP_CLAUSE_NUM_THREADS:
 	case OMP_CLAUSE_NUM_TEAMS:
@@ -14943,7 +14960,6 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	case OMP_CLAUSE_UNTIED:
 	case OMP_CLAUSE_COLLAPSE:
 	case OMP_CLAUSE_FINAL:
-	case OMP_CLAUSE_MERGEABLE:
 	case OMP_CLAUSE_DEVICE:
 	case OMP_CLAUSE_DIST_SCHEDULE:
 	case OMP_CLAUSE_PARALLEL:
@@ -14977,6 +14993,11 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  pc = &OMP_CLAUSE_CHAIN (c);
 	  continue;
 
+	case OMP_CLAUSE_MERGEABLE:
+	  mergeable_seen = true;
+	  pc = &OMP_CLAUSE_CHAIN (c);
+	  continue;
+
 	case OMP_CLAUSE_NOGROUP:
 	  nogroup_seen = pc;
 	  pc = &OMP_CLAUSE_CHAIN (c);
@@ -15228,6 +15249,41 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
       *nogroup_seen = OMP_CLAUSE_CHAIN (*nogroup_seen);
     }
 
+  if (detach_seen)
+    {
+      if (mergeable_seen)
+	{
+	  error_at (OMP_CLAUSE_LOCATION (*detach_seen),
+		    "%<detach%> clause must not be used together with "
+		    "%<mergeable%> clause");
+	  *detach_seen = OMP_CLAUSE_CHAIN (*detach_seen);
+	}
+      else
+	{
+	  tree detach_decl = OMP_CLAUSE_DECL (*detach_seen);
+
+	  for (pc = &clauses, c = clauses; c ; c = *pc)
+	    {
+	      bool remove = false;
+	      if ((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SHARED
+		   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE
+		   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE
+		   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE)
+		  && OMP_CLAUSE_DECL (c) == detach_decl)
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "the event handle of a %<detach%> clause "
+			    "should not be in a data-sharing clause");
+		  remove = true;
+		}
+	      if (remove)
+		*pc = OMP_CLAUSE_CHAIN (c);
+	      else
+		pc = &OMP_CLAUSE_CHAIN (c);
+	    }
+	}
+    }
+
   bitmap_obstack_release (NULL);
   return clauses;
 }
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index c713852..e67339d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -37412,6 +37412,52 @@ cp_parser_omp_clause_depend_sink (cp_parser *parser, location_t clause_loc,
 }
 
 /* OpenMP 5.0:
+   detach ( event-handle ) */
+
+static tree
+cp_parser_omp_clause_detach (cp_parser *parser, tree list)
+{
+  matching_parens parens;
+
+  if (!parens.require_open (parser))
+    return list;
+
+  cp_token *token;
+  tree name, decl;
+
+  token = cp_lexer_peek_token (parser->lexer);
+  name = cp_parser_id_expression (parser, /*template_p=*/false,
+					  /*check_dependency_p=*/true,
+					  /*template_p=*/NULL,
+					  /*declarator_p=*/false,
+					  /*optional_p=*/false);
+  if (name == error_mark_node)
+    decl = error_mark_node;
+  else
+    {
+      if (identifier_p (name))
+	decl = cp_parser_lookup_name_simple (parser, name, token->location);
+      else
+	decl = name;
+      if (decl == error_mark_node)
+	cp_parser_name_lookup_error (parser, name, decl, NLE_NULL,
+				     token->location);
+    }
+
+  if (decl == error_mark_node
+      || !parens.require_close (parser))
+    cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+					   /*or_comma=*/false,
+					   /*consume_paren=*/true);
+
+  tree u = build_omp_clause (token->location, OMP_CLAUSE_DETACH);
+  OMP_CLAUSE_DECL (u) = decl;
+  OMP_CLAUSE_CHAIN (u) = list;
+
+  return u;
+}
+
+/* OpenMP 5.0:
    iterators ( iterators-definition )
 
    iterators-definition:
@@ -38470,6 +38516,10 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 						 token->location);
 	  c_name = "depend";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DETACH:
+	  clauses = cp_parser_omp_clause_detach (parser, clauses);
+	  c_name = "detach";
+	  break;
 	case PRAGMA_OMP_CLAUSE_MAP:
 	  clauses = cp_parser_omp_clause_map (parser, clauses);
 	  c_name = "map";
@@ -41045,7 +41095,8 @@ cp_parser_omp_single (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEPEND)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRIORITY)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION))
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DETACH))
 
 static tree
 cp_parser_omp_task (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c27ef6d..9571401 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -17354,6 +17354,7 @@ tsubst_omp_clauses (tree clauses, enum c_omp_region_type ort,
 	case OMP_CLAUSE_VECTOR:
 	case OMP_CLAUSE_ASYNC:
 	case OMP_CLAUSE_WAIT:
+	case OMP_CLAUSE_DETACH:
 	  OMP_CLAUSE_OPERAND (nc, 0)
 	    = tsubst_expr (OMP_CLAUSE_OPERAND (oc, 0), args, complain,
 			   in_decl, /*integral_constant_expression_p=*/false);
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index c6b4c70..9dfaea2 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6401,6 +6401,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
      has been seen, -2 if mixed inscan/normal reduction diagnosed.  */
   int reduction_seen = 0;
   bool allocate_seen = false;
+  bool detach_seen = false;
+  bool mergeable_seen = false;
 
   bitmap_obstack_initialize (NULL);
   bitmap_initialize (&generic_head, &bitmap_default_obstack);
@@ -7418,6 +7420,36 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 		}
 	    }
 	  break;
+	case OMP_CLAUSE_DETACH:
+	  t = OMP_CLAUSE_DECL (c);
+	  if (detach_seen)
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"there can be at most one %<detach%> clause in a "
+			"task construct");
+	      remove = true;
+	      break;
+	    }
+	  else
+	    {
+	      tree type = TYPE_MAIN_VARIANT (TREE_TYPE (t));
+	      if (!type_dependent_expression_p (t)
+		  && (!INTEGRAL_TYPE_P (type)
+		      || TREE_CODE (type) != ENUMERAL_TYPE
+		      || DECL_NAME (TYPE_NAME (type))
+			   != get_identifier ("omp_event_handle_t")))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "%<detach%> clause event handle "
+			    "has type %qT rather than "
+			    "%<omp_event_handle_t%>",
+			    type);
+		  remove = true;
+		}
+	      detach_seen = true;
+	      cxx_mark_addressable (t);
+	    }
+	  break;
 
 	case OMP_CLAUSE_MAP:
 	case OMP_CLAUSE_TO:
@@ -7949,7 +7981,6 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	case OMP_CLAUSE_DEFAULT:
 	case OMP_CLAUSE_UNTIED:
 	case OMP_CLAUSE_COLLAPSE:
-	case OMP_CLAUSE_MERGEABLE:
 	case OMP_CLAUSE_PARALLEL:
 	case OMP_CLAUSE_FOR:
 	case OMP_CLAUSE_SECTIONS:
@@ -7968,6 +7999,10 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	case OMP_CLAUSE_FINALIZE:
 	  break;
 
+	case OMP_CLAUSE_MERGEABLE:
+	  mergeable_seen = true;
+	  break;
+
 	case OMP_CLAUSE_TILE:
 	  for (tree list = OMP_CLAUSE_TILE_LIST (c); !remove && list;
 	       list = TREE_CHAIN (list))
@@ -8205,6 +8240,17 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	    }
 	  pc = &OMP_CLAUSE_CHAIN (c);
 	  continue;
+	case OMP_CLAUSE_DETACH:
+	  if (mergeable_seen)
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"%<detach%> clause must not be used together with "
+			"%<mergeable%> clause");
+	      *pc = OMP_CLAUSE_CHAIN (c);
+	      continue;
+	    }
+	  pc = &OMP_CLAUSE_CHAIN (c);
+	  continue;
 	case OMP_CLAUSE_NOWAIT:
 	  if (copyprivate_seen)
 	    {
@@ -8365,6 +8411,19 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	    }
 	}
 
+      if (detach_seen
+	  && (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SHARED
+	      || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE
+	      || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE
+	      || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE)
+	  && OMP_CLAUSE_DECL (c) == t)
+	{
+	  error_at (OMP_CLAUSE_LOCATION (c),
+		    "the event handle of a %<detach%> clause "
+		    "should not be in a data-sharing clause");
+	  remove = true;
+	}
+
       /* We're interested in the base element, not arrays.  */
       inner_type = type = TREE_TYPE (t);
       if ((need_complete_type
diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index a612804..059d842 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1727,6 +1727,12 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
       show_expr (omp_clauses->priority);
       fputc (')', dumpfile);
     }
+  if (omp_clauses->detach)
+    {
+      fputs (" DETACH(", dumpfile);
+      show_expr (omp_clauses->detach);
+      fputc (')', dumpfile);
+    }
   for (i = 0; i < OMP_IF_LAST; i++)
     if (omp_clauses->if_exprs[i])
       {
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index b1e3926..63ee7b9 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -5597,6 +5597,7 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 		  WALK_SUBEXPR (co->ext.omp_clauses->hint);
 		  WALK_SUBEXPR (co->ext.omp_clauses->num_tasks);
 		  WALK_SUBEXPR (co->ext.omp_clauses->priority);
+		  WALK_SUBEXPR (co->ext.omp_clauses->detach);
 		  for (idx = 0; idx < OMP_IF_LAST; idx++)
 		    WALK_SUBEXPR (co->ext.omp_clauses->if_exprs[idx]);
 		  for (idx = 0;
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 4dd72b6..7935aca 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1416,6 +1416,7 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *hint;
   struct gfc_expr *num_tasks;
   struct gfc_expr *priority;
+  struct gfc_expr *detach;
   struct gfc_expr *if_exprs[OMP_IF_LAST];
   enum gfc_omp_sched_kind dist_sched_kind;
   struct gfc_expr *dist_chunk_size;
@@ -3104,6 +3105,7 @@ extern int gfc_default_character_kind;
 extern int gfc_default_logical_kind;
 extern int gfc_default_complex_kind;
 extern int gfc_c_int_kind;
+extern int gfc_c_intptr_kind;
 extern int gfc_atomic_int_kind;
 extern int gfc_atomic_logical_kind;
 extern int gfc_intio_kind;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index cb166f9..a9ecd96 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -91,6 +91,7 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->hint);
   gfc_free_expr (c->num_tasks);
   gfc_free_expr (c->priority);
+  gfc_free_expr (c->detach);
   for (i = 0; i < OMP_IF_LAST; i++)
     gfc_free_expr (c->if_exprs[i]);
   gfc_free_expr (c->async_expr);
@@ -448,6 +449,39 @@ cleanup:
   return MATCH_ERROR;
 }
 
+/* Match detach(event-handle).  */
+
+static match
+gfc_match_omp_detach (gfc_expr **expr)
+{
+  locus old_loc = gfc_current_locus;
+
+  if (gfc_match ("detach ( ") != MATCH_YES)
+    goto syntax_error;
+
+  if (gfc_match_variable (expr, 0) != MATCH_YES)
+    goto syntax_error;
+
+  if ((*expr)->ts.type != BT_INTEGER || (*expr)->ts.kind != gfc_c_intptr_kind)
+    {
+      gfc_error ("%qs at %L should be of type "
+		 "integer(kind=omp_event_handle_kind)",
+		 (*expr)->symtree->n.sym->name, &(*expr)->where);
+      return MATCH_ERROR;
+    }
+
+  if (gfc_match_char (')') != MATCH_YES)
+    goto syntax_error;
+
+  return MATCH_YES;
+
+syntax_error:
+   gfc_error ("Syntax error in OpenMP detach clause at %C");
+   gfc_current_locus = old_loc;
+   return MATCH_ERROR;
+
+}
+
 /* Match depend(sink : ...) construct a namelist from it.  */
 
 static match
@@ -807,6 +841,7 @@ enum omp_mask1
   OMP_CLAUSE_ATOMIC,  /* OpenMP 5.0.  */
   OMP_CLAUSE_CAPTURE,  /* OpenMP 5.0.  */
   OMP_CLAUSE_MEMORDER,  /* OpenMP 5.0.  */
+  OMP_CLAUSE_DETACH,  /* OpenMP 5.0.  */
   OMP_CLAUSE_NOWAIT,
   /* This must come last.  */
   OMP_MASK1_LAST
@@ -840,7 +875,6 @@ enum omp_mask2
   OMP_CLAUSE_IF_PRESENT,
   OMP_CLAUSE_FINALIZE,
   OMP_CLAUSE_ATTACH,
-  OMP_CLAUSE_DETACH,
   /* This must come last.  */
   OMP_MASK2_LAST
 };
@@ -1378,6 +1412,12 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		gfc_current_locus = old_loc;
 	    }
 	  if ((mask & OMP_CLAUSE_DETACH)
+	      && !openacc
+	      && !c->detach
+	      && gfc_match_omp_detach (&c->detach) == MATCH_YES)
+	    continue;
+	  if ((mask & OMP_CLAUSE_DETACH)
+	      && openacc
 	      && gfc_match ("detach ( ") == MATCH_YES
 	      && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
 					   OMP_MAP_DETACH, false,
@@ -2763,7 +2803,8 @@ cleanup:
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_SHARED | OMP_CLAUSE_IF | OMP_CLAUSE_DEFAULT		\
    | OMP_CLAUSE_UNTIED | OMP_CLAUSE_FINAL | OMP_CLAUSE_MERGEABLE	\
-   | OMP_CLAUSE_DEPEND | OMP_CLAUSE_PRIORITY | OMP_CLAUSE_IN_REDUCTION)
+   | OMP_CLAUSE_DEPEND | OMP_CLAUSE_PRIORITY | OMP_CLAUSE_IN_REDUCTION	\
+   | OMP_CLAUSE_DETACH)
 #define OMP_TASKLOOP_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_LASTPRIVATE | OMP_CLAUSE_SHARED | OMP_CLAUSE_IF		\
@@ -5061,6 +5102,10 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 		if (n->sym->attr.associate_var)
 		  gfc_error ("ASSOCIATE name %qs in SHARED clause at %L",
 			     n->sym->name, &n->where);
+		if (omp_clauses->detach
+		    && n->sym == omp_clauses->detach->symtree->n.sym)
+		  gfc_error ("DETACH event handle %qs in SHARED clause at %L",
+			     n->sym->name, &n->where);
 	      }
 	    break;
 	  case OMP_LIST_ALIGNED:
@@ -5387,7 +5432,13 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 		    default:
 		      break;
 		    }
-
+		if (omp_clauses->detach
+		    && (list == OMP_LIST_PRIVATE
+			|| list == OMP_LIST_FIRSTPRIVATE
+			|| list == OMP_LIST_LASTPRIVATE)
+		    && n->sym == omp_clauses->detach->symtree->n.sym)
+		  gfc_error ("DETACH event handle %qs in %s clause at %L",
+			     n->sym->name, name, &n->where);
 		switch (list)
 		  {
 		  case OMP_LIST_REDUCTION_INSCAN:
@@ -5684,6 +5735,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	gfc_error ("%s must contain at least one MAP clause at %L",
 		   p, &code->loc);
     }
+  if (!openacc && omp_clauses->mergeable && omp_clauses->detach)
+    gfc_error ("%<DETACH%> clause at %L must not be used together with "
+	       "%<MERGEABLE%> clause", &omp_clauses->detach->where);
 }
 
 
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 70bfc02..00358ca 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3673,6 +3673,22 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }
 
+  if (clauses->detach)
+    {
+      tree detach;
+
+      gfc_init_se (&se, NULL);
+      gfc_conv_expr (&se, clauses->detach);
+      gfc_add_block_to_block (block, &se.pre);
+      detach = se.expr;
+      gfc_add_block_to_block (block, &se.post);
+
+      c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_DETACH);
+      TREE_ADDRESSABLE (detach) = 1;
+      OMP_CLAUSE_DECL (c) = detach;
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+    }
+
   if (clauses->hint)
     {
       tree hint;
diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index d153db7..ccdc468 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -114,6 +114,7 @@ int gfc_default_character_kind;
 int gfc_default_logical_kind;
 int gfc_default_complex_kind;
 int gfc_c_int_kind;
+int gfc_c_intptr_kind;
 int gfc_atomic_int_kind;
 int gfc_atomic_logical_kind;
 
@@ -691,6 +692,8 @@ gfc_init_kinds (void)
   /* Choose atomic kinds to match C's int.  */
   gfc_atomic_int_kind = gfc_c_int_kind;
   gfc_atomic_logical_kind = gfc_c_int_kind;
+
+  gfc_c_intptr_kind = POINTER_SIZE / 8;
 }
 
 
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 3d7f365..8626ed0 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -235,10 +235,6 @@ DEF_FUNCTION_TYPE_8 (BT_FN_BOOL_UINT_ULLPTR_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_LONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG, BT_PTR, BT_PTR)
 
-DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
-		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
-		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
-		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
 		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
@@ -246,6 +242,10 @@ DEF_FUNCTION_TYPE_9 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR_PTR_PTR
 		     BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
 		     BT_PTR_LONG, BT_PTR_LONG, BT_PTR, BT_PTR)
 
+DEF_FUNCTION_TYPE_10 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR,
+		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
+		      BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
+		      BT_BOOL, BT_UINT, BT_PTR, BT_INT, BT_PTR)
 DEF_FUNCTION_TYPE_10 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		      BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		      BT_ULONGLONG, BT_LONG, BT_ULONGLONG, BT_PTR_ULONGLONG,
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 18a154a..5fbe2fc 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7220,6 +7220,15 @@ omp_default_clause (struct gimplify_omp_ctx *ctx, tree decl,
   enum omp_clause_default_kind kind;
 
   kind = lang_hooks.decls.omp_predetermined_sharing (decl);
+  if (ctx->region_type & ORT_TASK)
+    {
+      tree detach_clause = omp_find_clause (ctx->clauses, OMP_CLAUSE_DETACH);
+
+      /* The event-handle specified by a detach clause should always be firstprivate,
+	 regardless of the current default.  */
+      if (detach_clause && OMP_CLAUSE_DECL (detach_clause) == decl)
+	kind = OMP_CLAUSE_DEFAULT_FIRSTPRIVATE;
+    }
   if (kind != OMP_CLAUSE_DEFAULT_UNSPECIFIED)
     default_kind = kind;
   else if (VAR_P (decl) && TREE_STATIC (decl) && DECL_IN_CONSTANT_POOL (decl))
@@ -9754,6 +9763,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	    }
 	  break;
 
+	case OMP_CLAUSE_DETACH:
+	  decl = OMP_CLAUSE_DECL (c);
+	  goto do_notice;
+
 	case OMP_CLAUSE_IF:
 	  if (OMP_CLAUSE_IF_MODIFIER (c) != ERROR_MARK
 	      && OMP_CLAUSE_IF_MODIFIER (c) != code)
@@ -10900,6 +10913,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	case OMP_CLAUSE_DEFAULTMAP:
 	case OMP_CLAUSE_ORDER:
 	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 04356ac..cfbf1e6 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -381,7 +381,7 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL_REDUCTIONS,
 		  "GOMP_parallel_reductions",
 		  BT_FN_UINT_OMPFN_PTR_UINT_UINT, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASK, "GOMP_task",
-		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
+		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR,
 		  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASKLOOP, "GOMP_taskloop",
 		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 54b11ad..ea0f058 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -762,6 +762,7 @@ expand_task_call (struct omp_region *region, basic_block bb,
   tree depend = omp_find_clause (clauses, OMP_CLAUSE_DEPEND);
   tree finalc = omp_find_clause (clauses, OMP_CLAUSE_FINAL);
   tree priority = omp_find_clause (clauses, OMP_CLAUSE_PRIORITY);
+  tree detach = omp_find_clause (clauses, OMP_CLAUSE_DETACH);
 
   unsigned int iflags
     = (untied ? GOMP_TASK_FLAG_UNTIED : 0)
@@ -811,8 +812,13 @@ expand_task_call (struct omp_region *region, basic_block bb,
       if (omp_find_clause (clauses, OMP_CLAUSE_REDUCTION))
 	iflags |= GOMP_TASK_FLAG_REDUCTION;
     }
-  else if (priority)
-    iflags |= GOMP_TASK_FLAG_PRIORITY;
+  else
+    {
+      if (priority)
+	iflags |= GOMP_TASK_FLAG_PRIORITY;
+      if (detach)
+	iflags |= GOMP_TASK_FLAG_DETACH;
+    }
 
   tree flags = build_int_cst (unsigned_type_node, iflags);
 
@@ -853,6 +859,11 @@ expand_task_call (struct omp_region *region, basic_block bb,
     priority = integer_zero_node;
 
   gsi = gsi_last_nondebug_bb (bb);
+
+  detach = detach
+	   ? build_fold_addr_expr (OMP_CLAUSE_DECL (detach))
+	   : null_pointer_node;
+
   tree t = gimple_omp_task_data_arg (entry_stmt);
   if (t == NULL)
     t2 = null_pointer_node;
@@ -875,10 +886,10 @@ expand_task_call (struct omp_region *region, basic_block bb,
 			 num_tasks, priority, startvar, endvar, step);
   else
     t = build_call_expr (builtin_decl_explicit (BUILT_IN_GOMP_TASK),
-			 9, t1, t2, t3,
+			 10, t1, t2, t3,
 			 gimple_omp_task_arg_size (entry_stmt),
 			 gimple_omp_task_arg_align (entry_stmt), cond, flags,
-			 depend, priority);
+			 depend, priority, detach);
 
   force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 			    false, GSI_CONTINUE_LINKING);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 29c8da1..60a651f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1412,6 +1412,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_NUM_GANGS:
 	case OMP_CLAUSE_NUM_WORKERS:
 	case OMP_CLAUSE_VECTOR_LENGTH:
+	case OMP_CLAUSE_DETACH:
 	  if (ctx->outer)
 	    scan_omp_op (&OMP_CLAUSE_OPERAND (c, 0), ctx->outer);
 	  break;
@@ -1779,6 +1780,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_SIMDLEN:
 	case OMP_CLAUSE_ALIGNED:
 	case OMP_CLAUSE_DEPEND:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_ALLOCATE:
 	case OMP_CLAUSE__LOOPTEMP_:
 	case OMP_CLAUSE__REDUCTEMP_:
@@ -2350,6 +2352,9 @@ finish_taskreg_scan (omp_context *ctx)
     {
       location_t loc = gimple_location (ctx->stmt);
       tree *p, vla_fields = NULL_TREE, *q = &vla_fields;
+      tree detach_clause
+	= omp_find_clause (gimple_omp_task_clauses (ctx->stmt),
+			   OMP_CLAUSE_DETACH);
       /* Move VLA fields to the end.  */
       p = &TYPE_FIELDS (ctx->record_type);
       while (*p)
@@ -2416,6 +2421,64 @@ finish_taskreg_scan (omp_context *ctx)
 	      TYPE_FIELDS (ctx->srecord_type) = f1;
 	    }
 	}
+      if (detach_clause)
+	{
+	  tree c, field;
+
+	  /* Look for a firstprivate clause with the detach event handle.  */
+	  for (c = gimple_omp_taskreg_clauses (ctx->stmt);
+	       c; c = OMP_CLAUSE_CHAIN (c))
+	    {
+	      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_FIRSTPRIVATE)
+		continue;
+	      if (maybe_lookup_decl_in_outer_ctx (OMP_CLAUSE_DECL (c), ctx)
+		  == OMP_CLAUSE_DECL (detach_clause))
+		break;
+	    }
+
+	  if (c)
+	    field = lookup_field (OMP_CLAUSE_DECL (c), ctx);
+	  else
+	    {
+	      /* The detach event handle is not referenced within the
+		 task context, so add a temporary field for it here.  */
+	      field = build_decl (OMP_CLAUSE_LOCATION (detach_clause),
+				  FIELD_DECL, NULL_TREE, ptr_type_node);
+	      insert_field_into_struct (ctx->record_type, field);
+
+	      if (ctx->srecord_type)
+		{
+		  tree sfield
+		    = build_decl (OMP_CLAUSE_LOCATION (detach_clause),
+				  FIELD_DECL, NULL_TREE, ptr_type_node);
+		  insert_field_into_struct (ctx->srecord_type, sfield);
+		}
+	    }
+
+	  /* Move field corresponding to the detach clause first.
+	     This is filled by GOMP_task and needs to be in a
+	     specific position.  */
+	  p = &TYPE_FIELDS (ctx->record_type);
+	  while (*p)
+	    if (*p == field)
+	      *p = DECL_CHAIN (*p);
+	    else
+	      p = &DECL_CHAIN (*p);
+	  DECL_CHAIN (field) = TYPE_FIELDS (ctx->record_type);
+	  TYPE_FIELDS (ctx->record_type) = field;
+	  if (ctx->srecord_type)
+	    {
+	      field = lookup_sfield (OMP_CLAUSE_DECL (detach_clause), ctx);
+	      p = &TYPE_FIELDS (ctx->srecord_type);
+	      while (*p)
+		if (*p == field)
+		  *p = DECL_CHAIN (*p);
+		else
+		  p = &DECL_CHAIN (*p);
+	      DECL_CHAIN (field) = TYPE_FIELDS (ctx->srecord_type);
+	      TYPE_FIELDS (ctx->srecord_type) = field;
+	    }
+	}
       layout_type (ctx->record_type);
       fixup_child_record_type (ctx);
       if (ctx->srecord_type)
diff --git a/gcc/testsuite/c-c++-common/gomp/task-detach-1.c b/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
new file mode 100644
index 0000000..c7dda82
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#include <omp.h>
+
+void f (omp_event_handle_t x, omp_event_handle_t y, int z)
+{
+  #pragma omp task detach (x) detach (y) /* { dg-error "there can be at most one 'detach' clause in a task construct" } */
+    ;
+
+  #pragma omp task mergeable detach (x) /* { dg-error "'detach' clause must not be used together with 'mergeable' clause" } */
+    ;
+
+  #pragma omp task detach (x) mergeable /* { dg-error "'detach' clause must not be used together with 'mergeable' clause" } */
+    ;
+
+  #pragma omp task detach (z) /* { dg-error "'detach' clause event handle has type 'int' rather than 'omp_event_handle_t'" } */
+    ;
+
+  #pragma omp parallel master default (none) /* { dg-message "enclosing 'parallel'" } */
+    #pragma omp task detach (x) /* { dg-error "'x' not specified in enclosing 'parallel'" } */
+      ;
+
+  #pragma omp task detach (x) default (none) /* This should work.  */
+    omp_fulfill_event (x);
+
+  #pragma omp task detach (x) firstprivate (x) /* { dg-error "the event handle of a 'detach' clause should not be in a data-sharing clause" } */
+    ;
+
+  #pragma omp task detach (x) shared (x) /* { dg-error "the event handle of a 'detach' clause should not be in a data-sharing clause" } */
+    ;
+}
diff --git a/gcc/testsuite/g++.dg/gomp/task-detach-1.C b/gcc/testsuite/g++.dg/gomp/task-detach-1.C
new file mode 100644
index 0000000..443d3e8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/task-detach-1.C
@@ -0,0 +1,18 @@
+// { dg-do compile }
+// { dg-options "-fopenmp" }
+
+#include <omp.h>
+
+template <typename T>
+void func ()
+{
+  T t;
+  #pragma omp task detach (t) // { dg-error "'detach' clause event handle has type 'int' rather than 'omp_event_handle_t'" }
+    ;
+}
+
+void f()
+{
+  func <omp_event_handle_t> ();
+  func <int> (); // { dg-message "required from here" }
+}
diff --git a/gcc/testsuite/gcc.dg/gomp/task-detach-1.c b/gcc/testsuite/gcc.dg/gomp/task-detach-1.c
new file mode 100644
index 0000000..fa7315e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/task-detach-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#include <omp.h>
+
+void f (omp_event_handle_t x)
+{
+  void g (void)
+  {
+    #pragma omp task detach (x)
+      omp_fulfill_event (x);
+  }
+
+  g ();
+}
diff --git a/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90 b/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90
new file mode 100644
index 0000000..dc51345
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90
@@ -0,0 +1,27 @@
+! { dg-do compile }
+! { dg-options "-fopenmp" }
+
+program task_detach_1
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: x, y
+  integer :: z
+  
+  !$omp task detach(x) detach(y) ! { dg-error "Failed to match clause at \\\(1\\\)" }
+  !$omp end task ! { dg-error "Unexpected !\\\$OMP END TASK statement at \\\(1\\\)" }
+
+  !$omp task mergeable detach(x) ! { dg-error "'DETACH' clause at \\\(1\\\) must not be used together with 'MERGEABLE' clause" }
+  !$omp end task
+
+  !$omp task detach(x) mergeable ! { dg-error "'DETACH' clause at \\\(1\\\) must not be used together with 'MERGEABLE' clause" }
+  !$omp end task
+
+  !$omp task detach(z) ! { dg-error "'z' at \\\(1\\\) should be of type integer\\\(kind=omp_event_handle_kind\\\)" }
+  !$omp end task ! { dg-error "Unexpected !\\\$OMP END TASK statement at \\\(1\\\)" }
+  
+  !$omp task detach (x) firstprivate (x) ! { dg-error "DETACH event handle 'x' in FIRSTPRIVATE clause at \\\(1\\\)" }
+  !$omp end task
+
+  !$omp task detach (x) shared (x) ! { dg-error "DETACH event handle 'x' in SHARED clause at \\\(1\\\)" }
+  !$omp end task
+end program
\ No newline at end of file
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index d3c37a0..d2e6c89 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -299,19 +299,8 @@ enum omp_clause_code {
   /* OpenMP clause: link (variable-list).  */
   OMP_CLAUSE_LINK,
 
-  /* OpenMP clause: from (variable-list).  */
-  OMP_CLAUSE_FROM,
-
-  /* OpenMP clause: to (variable-list).  */
-  OMP_CLAUSE_TO,
-
-  /* OpenACC clauses: {copy, copyin, copyout, create, delete, deviceptr,
-     device, host (self), present, present_or_copy (pcopy), present_or_copyin
-     (pcopyin), present_or_copyout (pcopyout), present_or_create (pcreate)}
-     (variable-list).
-
-     OpenMP clause: map ({alloc:,to:,from:,tofrom:,}variable-list).  */
-  OMP_CLAUSE_MAP,
+  /* OpenMP clause: detach (event-handle).  */
+  OMP_CLAUSE_DETACH,
 
   /* OpenACC clause: use_device (variable-list).
      OpenMP clause: use_device_ptr (ptr-list).  */
@@ -329,6 +318,20 @@ enum omp_clause_code {
   /* OpenMP clause: exclusive (variable-list).  */
   OMP_CLAUSE_EXCLUSIVE,
 
+  /* OpenMP clause: from (variable-list).  */
+  OMP_CLAUSE_FROM,
+
+  /* OpenMP clause: to (variable-list).  */
+  OMP_CLAUSE_TO,
+
+  /* OpenACC clauses: {copy, copyin, copyout, create, delete, deviceptr,
+     device, host (self), present, present_or_copy (pcopy), present_or_copyin
+     (pcopyin), present_or_copyout (pcopyout), present_or_create (pcreate)}
+     (variable-list).
+
+     OpenMP clause: map ({alloc:,to:,from:,tofrom:,}variable-list).  */
+  OMP_CLAUSE_MAP,
+
   /* Internal structure to hold OpenACC cache directive's variable-list.
      #pragma acc cache (variable-list).  */
   OMP_CLAUSE__CACHE_,
diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index 8f2ccff..1b52669 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -1339,6 +1339,7 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
+	case OMP_CLAUSE_DETACH:
 	do_decl_clause:
 	  if (pdecl == NULL)
 	    pdecl = &OMP_CLAUSE_DECL (clause);
@@ -2108,6 +2109,7 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
+	case OMP_CLAUSE_DETACH:
 	do_decl_clause:
 	  if (pdecl == NULL)
 	    pdecl = &OMP_CLAUSE_DECL (clause);
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index f52a785..aabe6bb 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1265,6 +1265,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
     case OMP_CLAUSE_FINALIZE:
       pp_string (pp, "finalize");
       break;
+    case OMP_CLAUSE_DETACH:
+      pp_string (pp, "detach(");
+      dump_generic_node (pp, OMP_CLAUSE_DECL (clause), spc, flags,
+			 false);
+      pp_right_paren (pp);
+      break;
 
     default:
       gcc_unreachable ();
diff --git a/gcc/tree.c b/gcc/tree.c
index e0a1d512..a25c71f 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -297,14 +297,15 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_UNIFORM  */
   1, /* OMP_CLAUSE_TO_DECLARE  */
   1, /* OMP_CLAUSE_LINK  */
-  2, /* OMP_CLAUSE_FROM  */
-  2, /* OMP_CLAUSE_TO  */
-  2, /* OMP_CLAUSE_MAP  */
+  1, /* OMP_CLAUSE_DETACH  */
   1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
   1, /* OMP_CLAUSE_USE_DEVICE_ADDR  */
   1, /* OMP_CLAUSE_IS_DEVICE_PTR  */
   1, /* OMP_CLAUSE_INCLUSIVE  */
   1, /* OMP_CLAUSE_EXCLUSIVE  */
+  2, /* OMP_CLAUSE_FROM  */
+  2, /* OMP_CLAUSE_TO  */
+  2, /* OMP_CLAUSE_MAP  */
   2, /* OMP_CLAUSE__CACHE_  */
   2, /* OMP_CLAUSE_GANG  */
   1, /* OMP_CLAUSE_ASYNC  */
@@ -382,14 +383,15 @@ const char * const omp_clause_code_name[] =
   "uniform",
   "to",
   "link",
-  "from",
-  "to",
-  "map",
+  "detach",
   "use_device_ptr",
   "use_device_addr",
   "is_device_ptr",
   "inclusive",
   "exclusive",
+  "from",
+  "to",
+  "map",
   "_cache_",
   "gang",
   "async",
@@ -12240,6 +12242,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE_HINT:
 	case OMP_CLAUSE_TO_DECLARE:
 	case OMP_CLAUSE_LINK:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 11a9308..6e163b0 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -221,6 +221,7 @@ enum gomp_map_kind
 #define GOMP_TASK_FLAG_IF		(1 << 10)
 #define GOMP_TASK_FLAG_NOGROUP		(1 << 11)
 #define GOMP_TASK_FLAG_REDUCTION	(1 << 12)
+#define GOMP_TASK_FLAG_DETACH		(1 << 13)
 
 /* GOMP_target{_ext,update_ext,enter_exit_data} flags argument.  */
 #define GOMP_TARGET_FLAG_NOWAIT		(1 << 0)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index e6bccb9..4ec39c4 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -605,6 +605,12 @@ omp_get_max_task_priority_ (void)
 }
 
 void
+omp_fulfill_event_ (intptr_t event)
+{
+  omp_fulfill_event ((omp_event_handle_t) event);
+}
+
+void
 omp_set_affinity_format_ (const char *format, size_t format_len)
 {
   gomp_set_affinity_format (format, format_len);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 305cba3..b4d0c93 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -545,6 +545,9 @@ struct gomp_task
      entries and the gomp_task in which they reside.  */
   struct priority_node pnode[3];
 
+  bool detach;
+  gomp_sem_t completion_sem;
+
   struct gomp_task_icv icv;
   void (*fn) (void *);
   void *fn_data;
@@ -685,6 +688,10 @@ struct gomp_team
   int work_share_cancelled;
   int team_cancelled;
 
+  /* Tasks waiting for their completion event to be fulfilled.  */
+  struct priority_queue task_detach_queue;
+  unsigned int task_detach_count;
+
   /* This array contains structures for implicit tasks.  */
   struct gomp_task implicit_task[];
 };
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2c95f78..4ad190a 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -195,6 +195,8 @@ OMP_5.0.1 {
 	omp_free;
 	omp_get_supported_active_levels;
 	omp_get_supported_active_levels_;
+	omp_fulfill_event;
+	omp_fulfill_event_;
 } OMP_5.0;
 
 GOMP_1.0 {
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 0e1fbee..3cbe0a4 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -294,7 +294,7 @@ extern bool GOMP_cancellation_point (int);
 /* task.c */
 
 extern void GOMP_task (void (*) (void *), void *, void (*) (void *, void *),
-		       long, long, bool, unsigned, void **, int);
+		       long, long, bool, unsigned, void **, int, void *);
 extern void GOMP_taskloop (void (*) (void *), void *,
 			   void (*) (void *, void *), long, long, unsigned,
 			   unsigned long, int, long, long, long);
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index f88e360..69f96f0 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -171,6 +171,11 @@ typedef struct omp_alloctrait_t
   omp_uintptr_t value;
 } omp_alloctrait_t;
 
+typedef enum omp_event_handle_t __GOMP_UINTPTR_T_ENUM
+{
+  __omp_event_handle_t_max__ = __UINTPTR_MAX__
+} omp_event_handle_t;
+
 #ifdef __cplusplus
 extern "C" {
 # define __GOMP_NOTHROW throw ()
@@ -245,6 +250,8 @@ extern int omp_is_initial_device (void) __GOMP_NOTHROW;
 extern int omp_get_initial_device (void) __GOMP_NOTHROW;
 extern int omp_get_max_task_priority (void) __GOMP_NOTHROW;
 
+extern void omp_fulfill_event (omp_event_handle_t) __GOMP_NOTHROW;
+
 extern void *omp_target_alloc (__SIZE_TYPE__, int) __GOMP_NOTHROW;
 extern void omp_target_free (void *, int) __GOMP_NOTHROW;
 extern int omp_target_is_present (const void *, int) __GOMP_NOTHROW;
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index ff00afa..851f85f 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -39,6 +39,7 @@
         integer, parameter :: omp_alloctrait_val_kind = c_intptr_t
         integer, parameter :: omp_memspace_handle_kind = c_intptr_t
         integer, parameter :: omp_depend_kind = @OMP_DEPEND_KIND@
+        integer, parameter :: omp_event_handle_kind = c_intptr_t
         integer (omp_sched_kind), parameter :: omp_sched_static = 1
         integer (omp_sched_kind), parameter :: omp_sched_dynamic = 2
         integer (omp_sched_kind), parameter :: omp_sched_guided = 3
@@ -556,6 +557,14 @@
         end interface
 
         interface
+          subroutine omp_fulfill_event (event)
+            use omp_lib_kinds
+            integer (kind=omp_event_handle_kind), &
+              value, intent(in) :: event
+          end subroutine omp_fulfill_event
+        end interface
+
+        interface
           subroutine omp_set_affinity_format (format)
             character(len=*), intent(in) :: format
           end subroutine omp_set_affinity_format
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index a00d9bd..06d17b5 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -82,10 +82,12 @@
 
       integer omp_allocator_handle_kind, omp_alloctrait_key_kind
       integer omp_alloctrait_val_kind, omp_memspace_handle_kind
+      integer omp_event_handle_kind
       parameter (omp_allocator_handle_kind = @INTPTR_T_KIND@)
       parameter (omp_alloctrait_key_kind = 4)
       parameter (omp_alloctrait_val_kind = @INTPTR_T_KIND@)
       parameter (omp_memspace_handle_kind = @INTPTR_T_KIND@)
+      parameter (omp_event_handle_kind = @INTPTR_T_KIND@)
       integer (omp_alloctrait_key_kind) omp_atk_sync_hint
       integer (omp_alloctrait_key_kind) omp_atk_alignment
       integer (omp_alloctrait_key_kind) omp_atk_access
@@ -245,6 +247,8 @@
       external omp_get_max_task_priority
       integer(4) omp_get_max_task_priority
 
+      external omp_fulfill_event
+
       external omp_set_affinity_format, omp_get_affinity_format
       external omp_display_affinity, omp_capture_affinity
       integer(4) omp_get_affinity_format
diff --git a/libgomp/priority_queue.c b/libgomp/priority_queue.c
index 6361f36..39b69f4 100644
--- a/libgomp/priority_queue.c
+++ b/libgomp/priority_queue.c
@@ -168,6 +168,63 @@ priority_queue_verify (enum priority_queue_type type,
 }
 #endif /* _LIBGOMP_CHECKING_ */
 
+/* Tree version of priority_queue_find.  */
+
+static struct gomp_task *
+priority_tree_find (enum priority_queue_type type,
+		    prio_splay_tree_node node,
+		    priority_queue_predicate pred)
+{
+ again:
+  if (!node)
+    return NULL;
+  struct gomp_task *task = priority_tree_find (type, node->right, pred);
+  if (task)
+    return task;
+  task = priority_node_to_task (type, node->key.l.tasks);
+  if (pred (task))
+    return task;
+  node = node->left;
+  goto again;
+}
+
+/* List version of priority_queue_find.  */
+
+static struct gomp_task *
+priority_list_find (enum priority_queue_type type,
+		     struct priority_list *list,
+		     priority_queue_predicate pred)
+{
+  struct priority_node *node = list->tasks;
+  if (!node)
+    return NULL;
+
+  do
+    {
+      struct gomp_task *task = priority_node_to_task (type, node);
+      if (pred (task))
+	return task;
+      node = node->next;
+    }
+  while (node != list->tasks);
+
+  return NULL;
+}
+
+/* Return the highest priority task in the priority queue HEAD that
+   satisfies the predicate PRED.  HEAD contains tasks of type TYPE.  */
+
+struct gomp_task *
+priority_queue_find (enum priority_queue_type type,
+		     struct priority_queue *head,
+		     priority_queue_predicate pred)
+{
+  if (priority_queue_multi_p (head))
+    return priority_tree_find (type, head->t.root, pred);
+  else
+    return priority_list_find (type, &head->l, pred);
+}
+
 /* Remove NODE from priority queue HEAD, wherever it may be inside the
    tree.  HEAD contains tasks of type TYPE.  */
 
diff --git a/libgomp/priority_queue.h b/libgomp/priority_queue.h
index 41f5c73..d8d31b7 100644
--- a/libgomp/priority_queue.h
+++ b/libgomp/priority_queue.h
@@ -113,6 +113,8 @@ enum priority_queue_type
   PQ_IGNORED = 999
 };
 
+typedef bool (*priority_queue_predicate) (struct gomp_task *);
+
 /* Priority queue implementation prototypes.  */
 
 extern bool priority_queue_task_in_queue_p (enum priority_queue_type,
@@ -122,6 +124,9 @@ extern void priority_queue_dump (enum priority_queue_type,
 				 struct priority_queue *);
 extern void priority_queue_verify (enum priority_queue_type,
 				   struct priority_queue *, bool);
+extern struct gomp_task *priority_queue_find (enum priority_queue_type,
+					      struct priority_queue *,
+					      priority_queue_predicate);
 extern void priority_tree_remove (enum priority_queue_type,
 				  struct priority_queue *,
 				  struct priority_node *);
diff --git a/libgomp/task.c b/libgomp/task.c
index 0e9887d..f02c1ea 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -86,6 +86,7 @@ gomp_init_task (struct gomp_task *task, struct gomp_task *parent_task,
   task->dependers = NULL;
   task->depend_hash = NULL;
   task->depend_count = 0;
+  task->detach = false;
 }
 
 /* Clean up a task, after completing it.  */
@@ -326,6 +327,12 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
     }
 }
 
+static bool
+task_fulfilled_p (struct gomp_task *task)
+{
+  return __atomic_load_n (&task->completion_sem, __ATOMIC_RELAXED);
+}
+
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
    false, then we must not delay in executing the task.  If UNTIED is true,
    then the task may be executed by any member of the team.
@@ -347,7 +354,7 @@ gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
 void
 GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	   long arg_size, long arg_align, bool if_clause, unsigned flags,
-	   void **depend, int priority)
+	   void **depend, int priority, void *detach)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr->ts.team;
@@ -383,6 +390,9 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
   else if (priority > gomp_max_task_priority_var)
     priority = gomp_max_task_priority_var;
 
+  if ((flags & GOMP_TASK_FLAG_DETACH) == 0)
+    detach = NULL;
+
   if (!if_clause || team == NULL
       || (thr->task && thr->task->final_task)
       || team->task_count > 64 * team->nthreads)
@@ -404,6 +414,18 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
       task.final_task = (thr->task && thr->task->final_task)
 			|| (flags & GOMP_TASK_FLAG_FINAL);
       task.priority = priority;
+
+      if (detach)
+	{
+	  task.detach = true;
+	  gomp_sem_init (&task.completion_sem, 0);
+	  *(void **) detach = &task.completion_sem;
+	  if (data)
+	    *(void **) data = &task.completion_sem;
+
+	  gomp_debug (0, "New event: %p\n", &task.completion_sem);
+	}
+
       if (thr->task)
 	{
 	  task.in_tied_task = thr->task->in_tied_task;
@@ -420,6 +442,10 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	}
       else
 	fn (data);
+
+      if (detach && !task_fulfilled_p (&task))
+	gomp_sem_wait (&task.completion_sem);
+
       /* Access to "children" is normally done inside a task_lock
 	 mutex region, but the only way this particular task.children
 	 can be set is if this thread's task work function (fn)
@@ -458,6 +484,16 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
       task->kind = GOMP_TASK_UNDEFERRED;
       task->in_tied_task = parent->in_tied_task;
       task->taskgroup = taskgroup;
+      if (detach)
+	{
+	  task->detach = true;
+	  gomp_sem_init (&task->completion_sem, 0);
+	  *(void **) detach = &task->completion_sem;
+	  if (data)
+	    *(void **) data = &task->completion_sem;
+
+	  gomp_debug (0, "New event: %p\n", &task->completion_sem);
+	}
       thr->task = task;
       if (cpyfn)
 	{
@@ -1325,6 +1361,28 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
   while (1)
     {
       bool cancelled = false;
+
+      /* Look for a queued detached task with a fulfilled completion event
+	 that is ready to finish.  */
+      child_task = priority_queue_find (PQ_TEAM, &team->task_detach_queue,
+					task_fulfilled_p);
+      if (child_task)
+	{
+	  priority_queue_remove (PQ_TEAM, &team->task_detach_queue,
+				 child_task, MEMMODEL_RELAXED);
+	  --team->task_detach_count;
+	  gomp_debug (0, "thread %d: found task with fulfilled event %p\n",
+		      thr->ts.team_id, &child_task->completion_sem);
+
+	if (to_free)
+	  {
+	    gomp_finish_task (to_free);
+	    free (to_free);
+	    to_free = NULL;
+	  }
+	  goto finish_cancelled;
+	}
+
       if (!priority_queue_empty_p (&team->task_queue, MEMMODEL_RELAXED))
 	{
 	  bool ignored;
@@ -1392,29 +1450,43 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
       gomp_mutex_lock (&team->task_lock);
       if (child_task)
 	{
-	 finish_cancelled:;
-	  size_t new_tasks
-	    = gomp_task_run_post_handle_depend (child_task, team);
-	  gomp_task_run_post_remove_parent (child_task);
-	  gomp_clear_parent (&child_task->children_queue);
-	  gomp_task_run_post_remove_taskgroup (child_task);
-	  to_free = child_task;
-	  child_task = NULL;
-	  if (!cancelled)
-	    team->task_running_count--;
-	  if (new_tasks > 1)
+	  if (child_task->detach && !task_fulfilled_p (child_task))
 	    {
-	      do_wake = team->nthreads - team->task_running_count;
-	      if (do_wake > new_tasks)
-		do_wake = new_tasks;
+	      priority_queue_insert (PQ_TEAM, &team->task_detach_queue,
+				     child_task, child_task->priority,
+				     PRIORITY_INSERT_END,
+				     false, false);
+	      ++team->task_detach_count;
+	      gomp_debug (0, "thread %d: queueing task with event %p\n",
+			  thr->ts.team_id, &child_task->completion_sem);
+	      child_task = NULL;
 	    }
-	  if (--team->task_count == 0
-	      && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+	  else
 	    {
-	      gomp_team_barrier_done (&team->barrier, state);
-	      gomp_mutex_unlock (&team->task_lock);
-	      gomp_team_barrier_wake (&team->barrier, 0);
-	      gomp_mutex_lock (&team->task_lock);
+	     finish_cancelled:;
+	      size_t new_tasks
+		= gomp_task_run_post_handle_depend (child_task, team);
+	      gomp_task_run_post_remove_parent (child_task);
+	      gomp_clear_parent (&child_task->children_queue);
+	      gomp_task_run_post_remove_taskgroup (child_task);
+	      to_free = child_task;
+	      child_task = NULL;
+	      if (!cancelled)
+		team->task_running_count--;
+	      if (new_tasks > 1)
+		{
+		  do_wake = team->nthreads - team->task_running_count;
+		  if (do_wake > new_tasks)
+		    do_wake = new_tasks;
+		}
+	      if (--team->task_count == 0
+		  && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+		{
+		  gomp_team_barrier_done (&team->barrier, state);
+		  gomp_mutex_unlock (&team->task_lock);
+		  gomp_team_barrier_wake (&team->barrier, 0);
+		  gomp_mutex_lock (&team->task_lock);
+		}
 	    }
 	}
     }
@@ -2326,3 +2398,21 @@ omp_in_final (void)
 }
 
 ialias (omp_in_final)
+
+void
+omp_fulfill_event (omp_event_handle_t event)
+{
+  gomp_sem_t *sem = (gomp_sem_t *) event;
+  struct gomp_thread *thr = gomp_thread ();
+  struct gomp_team *team = thr ? thr->ts.team : NULL;
+
+  if (__atomic_load_n (sem, __ATOMIC_RELAXED))
+    gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", sem);
+
+  gomp_debug(0, "omp_fulfill_event: %p\n", sem);
+  gomp_sem_post (sem);
+  if (team)
+    gomp_team_barrier_wake (&team->barrier, 1);
+}
+
+ialias (omp_fulfill_event)
diff --git a/libgomp/team.c b/libgomp/team.c
index 85d5305..0f3707c 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -206,6 +206,9 @@ gomp_new_team (unsigned nthreads)
   team->work_share_cancelled = 0;
   team->team_cancelled = 0;
 
+  priority_queue_init (&team->task_detach_queue);
+  team->task_detach_count = 0;
+
   return team;
 }
 
@@ -221,6 +224,7 @@ free_team (struct gomp_team *team)
   gomp_barrier_destroy (&team->barrier);
   gomp_mutex_destroy (&team->task_lock);
   priority_queue_free (&team->task_queue);
+  priority_queue_free (&team->task_detach_queue);
   team_free (team);
 }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
new file mode 100644
index 0000000..8583e37
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test chaining of detached tasks, with each task fulfilling the
+   completion event of the previous one.  */
+
+int main (void)
+{
+  omp_event_handle_t detach_event1, detach_event2;
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel
+    #pragma omp single
+    {
+      #pragma omp task detach(detach_event1)
+	x++;
+
+      #pragma omp task detach(detach_event2)
+      {
+	y++;
+	omp_fulfill_event (detach_event1);
+      }
+
+      #pragma omp task
+      {
+	z++;
+	omp_fulfill_event (detach_event2);
+      }
+    }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
new file mode 100644
index 0000000..943ac2a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test handling of detach clause with only a single thread.  The runtime
+   should not block when a task with an unfulfilled event finishes
+   running.  */
+
+int main (void)
+{
+  omp_event_handle_t detach_event1, detach_event2;
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel num_threads(1)
+    #pragma omp single
+    {
+      #pragma omp task detach(detach_event1)
+	x++;
+
+      #pragma omp task detach(detach_event2)
+      {
+	y++;
+	omp_fulfill_event (detach_event1);
+      }
+
+      #pragma omp task
+      {
+	z++;
+	omp_fulfill_event (detach_event2);
+      }
+    }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
new file mode 100644
index 0000000..2609fb1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test the task detach clause used together with dependencies.  */
+
+int main (void)
+{
+  omp_event_handle_t detach_event;
+  int x = 0, y = 0, z = 0;
+  int dep;
+
+  #pragma omp parallel
+    #pragma omp single
+    {
+      #pragma omp task depend(out:dep) detach(detach_event)
+	x++;
+
+      #pragma omp task
+      {
+	y++;
+	omp_fulfill_event(detach_event);
+      }
+
+      #pragma omp task depend(in:dep)
+	z++;
+    }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
new file mode 100644
index 0000000..eeb9554
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-4.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test detach clause, where a task fulfills its own completion event.  */
+
+int main (void)
+{
+  omp_event_handle_t detach_event;
+  int x = 0;
+
+  detach_event = (omp_event_handle_t) 0x123456789abcdef0;
+
+  #pragma omp parallel
+    #pragma omp single
+      #pragma omp task detach(detach_event)
+      {
+	x++;
+	omp_fulfill_event(detach_event);
+      }
+
+  assert (x == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
new file mode 100644
index 0000000..5a01517
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test tasks with detach clause.  Each thread spawns off a chain of tasks,
+   that can then be executed by any available thread.  */
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+  int thread_count;
+  omp_event_handle_t detach_event1, detach_event2;
+
+  #pragma omp parallel firstprivate(detach_event1, detach_event2)
+  {
+    #pragma omp single
+      thread_count = omp_get_num_threads();
+
+    #pragma omp task detach(detach_event1) untied
+      #pragma omp atomic update
+	x++;
+
+    #pragma omp task detach(detach_event2) untied
+    {
+      #pragma omp atomic update
+	y++;
+      omp_fulfill_event (detach_event1);
+    }
+
+    #pragma omp task untied
+    {
+      #pragma omp atomic update
+	z++;
+      omp_fulfill_event (detach_event2);
+    }
+  }
+
+  assert (x == thread_count);
+  assert (y == thread_count);
+  assert (z == thread_count);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
new file mode 100644
index 0000000..b5f68cc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+/* Test tasks with detach clause on an offload device.  Each device
+   thread spawns off a chain of tasks, that can then be executed by
+   any available thread.  */
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+  int thread_count;
+  omp_event_handle_t detach_event1, detach_event2;
+
+  #pragma omp target map(tofrom: x, y, z) map(from: thread_count)
+    #pragma omp parallel firstprivate(detach_event1, detach_event2)
+      {
+	#pragma omp single
+	  thread_count = omp_get_num_threads();
+
+	#pragma omp task detach(detach_event1) untied
+	  #pragma omp atomic update
+	    x++;
+
+	#pragma omp task detach(detach_event2) untied
+	{
+	  #pragma omp atomic update
+	    y++;
+	  omp_fulfill_event (detach_event1);
+	}
+
+	#pragma omp task untied
+	{
+	  #pragma omp atomic update
+	    z++;
+	  omp_fulfill_event (detach_event2);
+	}
+
+	#pragma omp taskwait
+      }
+
+  assert (x == thread_count);
+  assert (y == thread_count);
+  assert (z == thread_count);
+}
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-1.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
new file mode 100644
index 0000000..217bf65
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
@@ -0,0 +1,33 @@
+! { dg-do run }
+
+! Test chaining of detached tasks, with each task fulfilling the
+! completion event of the previous one.
+
+program task_detach_1
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+
+  !$omp parallel
+    !$omp single
+      !$omp task detach(detach_event1)
+        x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2)
+        y = y + 1
+	call omp_fulfill_event (detach_event1)
+      !$omp end task
+
+      !$omp task
+        z = z + 1
+	call omp_fulfill_event (detach_event2)
+      !$omp end task
+    !$omp end single
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-2.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
new file mode 100644
index 0000000..ecb4829
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
@@ -0,0 +1,34 @@
+! { dg-do run }
+
+! Test handling of detach clause with only a single thread.  The runtime
+! should not block when a task with an unfulfilled event finishes
+! running.
+
+program task_detach_2
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+
+  !$omp parallel num_threads(1)
+    !$omp single
+      !$omp task detach(detach_event1)
+        x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2)
+        y = y + 1
+	call omp_fulfill_event (detach_event1)
+      !$omp end task
+
+      !$omp task
+        z = z + 1
+	call omp_fulfill_event (detach_event2)
+      !$omp end task
+    !$omp end single
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-3.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
new file mode 100644
index 0000000..bdf93a5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
@@ -0,0 +1,33 @@
+! { dg-do run }
+
+! Test the task detach clause used together with dependencies.
+
+program task_detach_3
+
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event
+  integer :: x = 0, y = 0, z = 0
+  integer :: dep
+
+  !$omp parallel
+    !$omp single
+      !$omp task depend(out:dep) detach(detach_event)
+        x = x + 1
+      !$omp end task
+
+      !$omp task
+        y = y + 1
+	call omp_fulfill_event(detach_event)
+      !$omp end task
+
+      !$omp task depend(in:dep)
+        z = z + 1
+      !$omp end task
+    !$omp end single
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-4.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-4.f90
new file mode 100644
index 0000000..6d0843c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-4.f90
@@ -0,0 +1,22 @@
+! { dg-do run }
+
+! Test detach clause, where a task fulfills its own completion event.
+
+program task_detach_4
+
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event
+  integer :: x = 0
+
+  !$omp parallel
+    !$omp single
+      !$omp task detach(detach_event)
+        x = x + 1
+	call omp_fulfill_event(detach_event)
+      !$omp end task
+    !$omp end single
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-5.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
new file mode 100644
index 0000000..955d687
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+
+! Test tasks with detach clause.  Each thread spawns off a chain of tasks,
+! that can then be executed by any available thread.
+
+program task_detach_5
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+  integer :: thread_count
+
+  !$omp parallel firstprivate(detach_event1, detach_event2)
+    !$omp single
+      thread_count = omp_get_num_threads()
+    !$omp end single
+
+    !$omp task detach(detach_event1) untied
+      !$omp atomic update
+	x = x + 1
+    !$omp end task
+
+    !$omp task detach(detach_event2) untied
+      !$omp atomic update
+	y = y + 1
+      call omp_fulfill_event (detach_event1);
+    !$omp end task
+
+    !$omp task untied
+      !$omp atomic update
+	z = z + 1
+      call omp_fulfill_event (detach_event2);
+    !$omp end task
+  !$omp end parallel
+
+  if (x /= thread_count) stop 1
+  if (y /= thread_count) stop 2
+  if (z /= thread_count) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
new file mode 100644
index 0000000..0fe2155
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
@@ -0,0 +1,44 @@
+! { dg-do run }
+
+! Test tasks with detach clause on an offload device.  Each device
+! thread spawns off a chain of tasks, that can then be executed by
+! any available thread.
+
+program task_detach_6
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+  integer :: thread_count
+
+  !$omp target map(tofrom: x, y, z) map(from: thread_count)
+    !$omp parallel firstprivate(detach_event1, detach_event2)
+      !$omp single
+	thread_count = omp_get_num_threads()
+      !$omp end single
+
+      !$omp task detach(detach_event1) untied
+	!$omp atomic update
+	  x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2) untied
+	!$omp atomic update
+	  y = y + 1
+	call omp_fulfill_event (detach_event1);
+      !$omp end task
+
+      !$omp task untied
+	!$omp atomic update
+	  z = z + 1
+	call omp_fulfill_event (detach_event2);
+      !$omp end task
+
+      !$omp taskwait
+    !$omp end parallel
+  !$omp end target
+
+  if (x /= thread_count) stop 1
+  if (y /= thread_count) stop 2
+  if (z /= thread_count) stop 3
+end program
Kwok Cheung Yeung Jan. 15, 2021, 4:58 p.m. UTC | #10
On 15/01/2021 3:07 pm, Kwok Cheung Yeung wrote:
> I have tested bootstrapping on x86_64 (no offloading) with no issues, and 
> running the libgomp testsuite with Nvidia offloading shows no regressions. I 
> have also tested all the gomp.exp tests in the main gcc testsuite, also with no 
> issues. I am currently still running the full testsuite, but do not anticipate 
> any problems.
> 
> Okay to commit on trunk, if the full testsuite run does not show any regressions?

Found an issue already :-( - the libgomp include files are not found when the 
tests are run via 'make check'. I have now included the relevant parts of the 
include files in the tests themselves. Okay for trunk (to be merged into the 
main patch)?

Thanks

Kwok
diff --git a/gcc/testsuite/c-c++-common/gomp/task-detach-1.c b/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
index c7dda82..f50f748 100644
--- a/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
@@ -1,7 +1,12 @@
 /* { dg-do compile } */
 /* { dg-options "-fopenmp" } */
 
-#include <omp.h>
+typedef enum omp_event_handle_t
+{
+  __omp_event_handle_t_max__ = __UINTPTR_MAX__
+} omp_event_handle_t;
+
+extern void omp_fulfill_event (omp_event_handle_t);
 
 void f (omp_event_handle_t x, omp_event_handle_t y, int z)
 {
diff --git a/gcc/testsuite/g++.dg/gomp/task-detach-1.C b/gcc/testsuite/g++.dg/gomp/task-detach-1.C
index 443d3e8..2f0c650 100644
--- a/gcc/testsuite/g++.dg/gomp/task-detach-1.C
+++ b/gcc/testsuite/g++.dg/gomp/task-detach-1.C
@@ -1,7 +1,10 @@
 // { dg-do compile }
 // { dg-options "-fopenmp" }
 
-#include <omp.h>
+typedef enum omp_event_handle_t
+{
+  __omp_event_handle_t_max__ = __UINTPTR_MAX__
+} omp_event_handle_t;
 
 template <typename T>
 void func ()
diff --git a/gcc/testsuite/gcc.dg/gomp/task-detach-1.c b/gcc/testsuite/gcc.dg/gomp/task-detach-1.c
index fa7315e..611044d 100644
--- a/gcc/testsuite/gcc.dg/gomp/task-detach-1.c
+++ b/gcc/testsuite/gcc.dg/gomp/task-detach-1.c
@@ -1,7 +1,12 @@
 /* { dg-do compile } */
 /* { dg-options "-fopenmp" } */
 
-#include <omp.h>
+typedef enum omp_event_handle_t
+{
+  __omp_event_handle_t_max__ = __UINTPTR_MAX__
+} omp_event_handle_t;
+
+extern void omp_fulfill_event (omp_event_handle_t);
 
 void f (omp_event_handle_t x)
 {
diff --git a/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90 b/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90
index dc51345..114068e 100644
--- a/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90
@@ -2,8 +2,10 @@
 ! { dg-options "-fopenmp" }
 
 program task_detach_1
-  use omp_lib
-
+  use iso_c_binding, only: c_intptr_t
+  implicit none
+  
+  integer, parameter :: omp_event_handle_kind = c_intptr_t
   integer (kind=omp_event_handle_kind) :: x, y
   integer :: z
Jakub Jelinek Jan. 15, 2021, 6:01 p.m. UTC | #11
On Fri, Jan 15, 2021 at 04:58:25PM +0000, Kwok Cheung Yeung wrote:
> On 15/01/2021 3:07 pm, Kwok Cheung Yeung wrote:
> > I have tested bootstrapping on x86_64 (no offloading) with no issues,
> > and running the libgomp testsuite with Nvidia offloading shows no
> > regressions. I have also tested all the gomp.exp tests in the main gcc
> > testsuite, also with no issues. I am currently still running the full
> > testsuite, but do not anticipate any problems.
> > 
> > Okay to commit on trunk, if the full testsuite run does not show any regressions?
> 
> Found an issue already :-( - the libgomp include files are not found when
> the tests are run via 'make check'. I have now included the relevant parts
> of the include files in the tests themselves. Okay for trunk (to be merged
> into the main patch)?

This incremental patch is ok.
I'll try to review the previous patch tomorrow.

	Jakub
Jakub Jelinek Jan. 16, 2021, 9:25 a.m. UTC | #12
On Fri, Jan 15, 2021 at 03:07:56PM +0000, Kwok Cheung Yeung wrote:
> +	{
> +	  tree detach_decl = OMP_CLAUSE_DECL (*detach_seen);
> +
> +	  for (pc = &clauses, c = clauses; c ; c = *pc)
> +	    {
> +	      bool remove = false;
> +	      if ((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SHARED
> +		   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE
> +		   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE
> +		   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE)
> +		  && OMP_CLAUSE_DECL (c) == detach_decl)
> +		{
> +		  error_at (OMP_CLAUSE_LOCATION (c),
> +			    "the event handle of a %<detach%> clause "
> +			    "should not be in a data-sharing clause");
> +		  remove = true;
> +		}

I think you don't need this loop, instead you could just check
		if (bitmap_bit_p (&generic_head, DECL_UID (detach_decl))
		    || bitmap_bit_p (&firstprivate_head, DECL_UID (detach_decl))
		    || bitmap_bit_p (&lastprivate_head, DECL_UID (detach_decl)))

> +		      || TREE_CODE (type) != ENUMERAL_TYPE
		      || DECL_NAME (TYPE_NAME (type))
			   != get_identifier ("omp_event_handle_t")))

The formatting is off, and I think as a service to Emacs users we are
usually formatting it as:
		      || (DECL_NAME (TYPE_NAME (type))
			  != get_identifier ("omp_event_handle_t"))))

> +
> +  detach = detach
> +	   ? build_fold_addr_expr (OMP_CLAUSE_DECL (detach))
> +	   : null_pointer_node;

Again formatting nit, please write:
  detach = (detach
	    ? build_fold_addr_expr (OMP_CLAUSE_DECL (detach))
	    : null_pointer_node);

> @@ -2416,6 +2421,64 @@ finish_taskreg_scan (omp_context *ctx)
>  	      TYPE_FIELDS (ctx->srecord_type) = f1;
>  	    }
>  	}
> +      if (detach_clause)
> +	{
> +	  tree c, field;
> +
> +	  /* Look for a firstprivate clause with the detach event handle.  */
> +	  for (c = gimple_omp_taskreg_clauses (ctx->stmt);
> +	       c; c = OMP_CLAUSE_CHAIN (c))
> +	    {
> +	      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_FIRSTPRIVATE)
> +		continue;
> +	      if (maybe_lookup_decl_in_outer_ctx (OMP_CLAUSE_DECL (c), ctx)
> +		  == OMP_CLAUSE_DECL (detach_clause))
> +		break;
> +	    }
> +
> +	  if (c)
> +	    field = lookup_field (OMP_CLAUSE_DECL (c), ctx);
> +	  else
> +	    {
> +	      /* The detach event handle is not referenced within the
> +		 task context, so add a temporary field for it here.  */
> +	      field = build_decl (OMP_CLAUSE_LOCATION (detach_clause),
> +				  FIELD_DECL, NULL_TREE, ptr_type_node);
> +	      insert_field_into_struct (ctx->record_type, field);

Can't you just force the firstprivate clause during gimplification, so that
it doesn't go away even if not referenced?
That would mean just forcing in | GOVD_SEEN when it is added.
If not, not a big deal, just thought it could be easier.

> +
> +	      if (ctx->srecord_type)
> +		{
> +		  tree sfield
> +		    = build_decl (OMP_CLAUSE_LOCATION (detach_clause),
> +				  FIELD_DECL, NULL_TREE, ptr_type_node);
> +		  insert_field_into_struct (ctx->srecord_type, sfield);
> +		}
> +	    }
> +
> +	  /* Move field corresponding to the detach clause first.
> +	     This is filled by GOMP_task and needs to be in a
> +	     specific position.  */
> +	  p = &TYPE_FIELDS (ctx->record_type);
> +	  while (*p)
> +	    if (*p == field)
> +	      *p = DECL_CHAIN (*p);
> +	    else
> +	      p = &DECL_CHAIN (*p);
> +	  DECL_CHAIN (field) = TYPE_FIELDS (ctx->record_type);
> +	  TYPE_FIELDS (ctx->record_type) = field;
> +	  if (ctx->srecord_type)
> +	    {
> +	      field = lookup_sfield (OMP_CLAUSE_DECL (detach_clause), ctx);
> +	      p = &TYPE_FIELDS (ctx->srecord_type);
> +	      while (*p)
> +		if (*p == field)
> +		  *p = DECL_CHAIN (*p);
> +		else
> +		  p = &DECL_CHAIN (*p);
> +	      DECL_CHAIN (field) = TYPE_FIELDS (ctx->srecord_type);
> +	      TYPE_FIELDS (ctx->srecord_type) = field;
> +	    }
> +	}
>        layout_type (ctx->record_type);
>        fixup_child_record_type (ctx);
>        if (ctx->srecord_type)
> diff --git a/gcc/testsuite/c-c++-common/gomp/task-detach-1.c b/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
> new file mode 100644
> index 0000000..c7dda82
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fopenmp" } */
> +
> +#include <omp.h>
> +
> +void f (omp_event_handle_t x, omp_event_handle_t y, int z)
> +{
> +  #pragma omp task detach (x) detach (y) /* { dg-error "there can be at most one 'detach' clause in a task construct" } */

It would be on a task construct rather than in a task construct, but the
common wording for this diagnostics is
"too many %qs clauses", "detach"
Please use that wording.
> +
> +template <typename T>
> +void func ()
> +{
> +  T t;
> +  #pragma omp task detach (t) // { dg-error "'detach' clause event handle has type 'int' rather than 'omp_event_handle_t'" }
> +    ;
> +}
> +
> +void f()
> +{
> +  func <omp_event_handle_t> ();
> +  func <int> (); // { dg-message "required from here" }
> +}

I'd suggest two different templates with the same content, because the
dg-error is in the template and splitting them might make it clearer in
which one it is from.  The required from here message is I think ignored by
default, so if it incorrectly diagnosed both spots, one wouldn't notice.
> +  
> +  !$omp task detach (x) firstprivate (x) ! { dg-error "DETACH event handle 'x' in FIRSTPRIVATE clause at \\\(1\\\)" }
> +  !$omp end task
> +
> +  !$omp task detach (x) shared (x) ! { dg-error "DETACH event handle 'x' in SHARED clause at \\\(1\\\)" }
> +  !$omp end task
> +end program
> \ No newline at end of file

Please add newline.
> +
> +  gomp_debug(0, "omp_fulfill_event: %p\n", sem);

Space before (.

Otherwise LGTM, thanks.

	Jakub
Kwok Cheung Yeung Jan. 16, 2021, 7:19 p.m. UTC | #13
Thanks for the review.

On 16/01/2021 9:25 am, Jakub Jelinek wrote:
> On Fri, Jan 15, 2021 at 03:07:56PM +0000, Kwok Cheung Yeung wrote:
>> +	{
>> +	  tree detach_decl = OMP_CLAUSE_DECL (*detach_seen);
>> +
>> +	  for (pc = &clauses, c = clauses; c ; c = *pc)
>> +	    {
>> +	      bool remove = false;
>> +	      if ((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SHARED
>> +		   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE
>> +		   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE
>> +		   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE)
>> +		  && OMP_CLAUSE_DECL (c) == detach_decl)
>> +		{
>> +		  error_at (OMP_CLAUSE_LOCATION (c),
>> +			    "the event handle of a %<detach%> clause "
>> +			    "should not be in a data-sharing clause");
>> +		  remove = true;
>> +		}
> 
> I think you don't need this loop, instead you could just check
> 		if (bitmap_bit_p (&generic_head, DECL_UID (detach_decl))
> 		    || bitmap_bit_p (&firstprivate_head, DECL_UID (detach_decl))
> 		    || bitmap_bit_p (&lastprivate_head, DECL_UID (detach_decl)))
> 

I think the main problem with this is that you cannot then point to the location 
of the offending data-sharing clause. Given a task construct with 'detach(x) 
shared(x)', I would tend to think of the 'shared(x)' as being the incorrect part 
here, and so would want the error to point to it. Unless you have any 
objections, I am inclined to keep this as it is?

>> @@ -2416,6 +2421,64 @@ finish_taskreg_scan (omp_context *ctx)
>>   	      TYPE_FIELDS (ctx->srecord_type) = f1;
>>   	    }
>>   	}
>> +      if (detach_clause)
>> +	{
>> +	  tree c, field;
>> +
>> +	  /* Look for a firstprivate clause with the detach event handle.  */
>> +	  for (c = gimple_omp_taskreg_clauses (ctx->stmt);
>> +	       c; c = OMP_CLAUSE_CHAIN (c))
>> +	    {
>> +	      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_FIRSTPRIVATE)
>> +		continue;
>> +	      if (maybe_lookup_decl_in_outer_ctx (OMP_CLAUSE_DECL (c), ctx)
>> +		  == OMP_CLAUSE_DECL (detach_clause))
>> +		break;
>> +	    }
>> +
>> +	  if (c)
>> +	    field = lookup_field (OMP_CLAUSE_DECL (c), ctx);
>> +	  else
>> +	    {
>> +	      /* The detach event handle is not referenced within the
>> +		 task context, so add a temporary field for it here.  */
>> +	      field = build_decl (OMP_CLAUSE_LOCATION (detach_clause),
>> +				  FIELD_DECL, NULL_TREE, ptr_type_node);
>> +	      insert_field_into_struct (ctx->record_type, field);
> 
> Can't you just force the firstprivate clause during gimplification, so that
> it doesn't go away even if not referenced?
> That would mean just forcing in | GOVD_SEEN when it is added.
> If not, not a big deal, just thought it could be easier.
> 

I've tried this diff:

         case OMP_CLAUSE_DETACH:
-         decl = OMP_CLAUSE_DECL (c);
-         goto do_notice;
+         flags = GOVD_FIRSTPRIVATE | GOVD_SEEN;
+         goto do_add;

and just asserted that a suitable firstprivate clause is found in 
finish_taskreg_scan, and it seems to work fine :-).

>> +  #pragma omp task detach (x) detach (y) /* { dg-error "there can be at most one 'detach' clause in a task construct" } */
> 
> It would be on a task construct rather than in a task construct, but the
> common wording for this diagnostics is
> "too many %qs clauses", "detach"
> Please use that wording.

Done, though I don't see the point of using a %qs format code with a constant 
string here...

I have applied your other suggestions, and have retested the gomp.exp and 
libgomp tests. The full testrun started yesterday showed no regressions. If you 
have no further issues then I will commit this later tonight ahead of stage4.

Thanks

Kwok
commit 68f17e5d3f28b4150fc0fa9112671403c4519c05
Author: Kwok Cheung Yeung <kcy@codesourcery.com>
Date:   Sat Jan 16 09:27:28 2021 -0800

    More task detach fixes.

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 4e9b21b..b938e6a 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -14942,8 +14942,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  if (detach_seen)
 	    {
 	      error_at (OMP_CLAUSE_LOCATION (c),
-			"there can be at most one %<detach%> clause in a "
-			"task construct");
+			"too many %<detach%> clauses on a task construct");
 	      remove = true;
 	      break;
 	    }
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 9dfaea2..c28cde0 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7425,8 +7425,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  if (detach_seen)
 	    {
 	      error_at (OMP_CLAUSE_LOCATION (c),
-			"there can be at most one %<detach%> clause in a "
-			"task construct");
+			"too many %<detach%> clauses on a task construct");
 	      remove = true;
 	      break;
 	    }
@@ -7436,8 +7435,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	      if (!type_dependent_expression_p (t)
 		  && (!INTEGRAL_TYPE_P (type)
 		      || TREE_CODE (type) != ENUMERAL_TYPE
-		      || DECL_NAME (TYPE_NAME (type))
-			   != get_identifier ("omp_event_handle_t")))
+		      || (DECL_NAME (TYPE_NAME (type))
+			  != get_identifier ("omp_event_handle_t"))))
 		{
 		  error_at (OMP_CLAUSE_LOCATION (c),
 			    "%<detach%> clause event handle "
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 5fbe2fc..d2ac5f9 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -9764,8 +9764,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	  break;
 
 	case OMP_CLAUSE_DETACH:
-	  decl = OMP_CLAUSE_DECL (c);
-	  goto do_notice;
+	  flags = GOVD_FIRSTPRIVATE | GOVD_SEEN;
+	  goto do_add;
 
 	case OMP_CLAUSE_IF:
 	  if (OMP_CLAUSE_IF_MODIFIER (c) != ERROR_MARK
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index ea0f058..7559ec8 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -860,9 +860,9 @@ expand_task_call (struct omp_region *region, basic_block bb,
 
   gsi = gsi_last_nondebug_bb (bb);
 
-  detach = detach
-	   ? build_fold_addr_expr (OMP_CLAUSE_DECL (detach))
-	   : null_pointer_node;
+  detach = (detach
+	    ? build_fold_addr_expr (OMP_CLAUSE_DECL (detach))
+	    : null_pointer_node);
 
   tree t = gimple_omp_task_data_arg (entry_stmt);
   if (t == NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 60a651f..c1267dc 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2436,24 +2436,8 @@ finish_taskreg_scan (omp_context *ctx)
 		break;
 	    }
 
-	  if (c)
-	    field = lookup_field (OMP_CLAUSE_DECL (c), ctx);
-	  else
-	    {
-	      /* The detach event handle is not referenced within the
-		 task context, so add a temporary field for it here.  */
-	      field = build_decl (OMP_CLAUSE_LOCATION (detach_clause),
-				  FIELD_DECL, NULL_TREE, ptr_type_node);
-	      insert_field_into_struct (ctx->record_type, field);
-
-	      if (ctx->srecord_type)
-		{
-		  tree sfield
-		    = build_decl (OMP_CLAUSE_LOCATION (detach_clause),
-				  FIELD_DECL, NULL_TREE, ptr_type_node);
-		  insert_field_into_struct (ctx->srecord_type, sfield);
-		}
-	    }
+	  gcc_assert (c);
+	  field = lookup_field (OMP_CLAUSE_DECL (c), ctx);
 
 	  /* Move field corresponding to the detach clause first.
 	     This is filled by GOMP_task and needs to be in a
diff --git a/gcc/testsuite/c-c++-common/gomp/task-detach-1.c b/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
index f50f748..4558bc1 100644
--- a/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/task-detach-1.c
@@ -10,7 +10,7 @@ extern void omp_fulfill_event (omp_event_handle_t);
 
 void f (omp_event_handle_t x, omp_event_handle_t y, int z)
 {
-  #pragma omp task detach (x) detach (y) /* { dg-error "there can be at most one 'detach' clause in a task construct" } */
+  #pragma omp task detach (x) detach (y) /* { dg-error "too many 'detach' clauses on a task construct" } */
     ;
 
   #pragma omp task mergeable detach (x) /* { dg-error "'detach' clause must not be used together with 'mergeable' clause" } */
diff --git a/gcc/testsuite/g++.dg/gomp/task-detach-1.C b/gcc/testsuite/g++.dg/gomp/task-detach-1.C
index 2f0c650..6028cb4 100644
--- a/gcc/testsuite/g++.dg/gomp/task-detach-1.C
+++ b/gcc/testsuite/g++.dg/gomp/task-detach-1.C
@@ -7,7 +7,15 @@ typedef enum omp_event_handle_t
 } omp_event_handle_t;
 
 template <typename T>
-void func ()
+void foo ()
+{
+  T t;
+  #pragma omp task detach (t)
+    ;
+}
+
+template <typename T>
+void bar ()
 {
   T t;
   #pragma omp task detach (t) // { dg-error "'detach' clause event handle has type 'int' rather than 'omp_event_handle_t'" }
@@ -16,6 +24,6 @@ void func ()
 
 void f()
 {
-  func <omp_event_handle_t> ();
-  func <int> (); // { dg-message "required from here" }
+  foo <omp_event_handle_t> ();
+  bar <int> (); // { dg-message "required from here" }
 }
diff --git a/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90 b/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90
index 114068e..4763f13 100644
--- a/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/task-detach-1.f90
@@ -26,4 +26,4 @@ program task_detach_1
 
   !$omp task detach (x) shared (x) ! { dg-error "DETACH event handle 'x' in SHARED clause at \\\(1\\\)" }
   !$omp end task
-end program
\ No newline at end of file
+end program
diff --git a/libgomp/task.c b/libgomp/task.c
index f02c1ea..5ece878 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -2409,7 +2409,7 @@ omp_fulfill_event (omp_event_handle_t event)
   if (__atomic_load_n (sem, __ATOMIC_RELAXED))
     gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", sem);
 
-  gomp_debug(0, "omp_fulfill_event: %p\n", sem);
+  gomp_debug (0, "omp_fulfill_event: %p\n", sem);
   gomp_sem_post (sem);
   if (team)
     gomp_team_barrier_wake (&team->barrier, 1);
Jakub Jelinek Jan. 16, 2021, 7:45 p.m. UTC | #14
On Sat, Jan 16, 2021 at 07:19:51PM +0000, Kwok Cheung Yeung wrote:
> > I think you don't need this loop, instead you could just check
> > 		if (bitmap_bit_p (&generic_head, DECL_UID (detach_decl))
> > 		    || bitmap_bit_p (&firstprivate_head, DECL_UID (detach_decl))
> > 		    || bitmap_bit_p (&lastprivate_head, DECL_UID (detach_decl)))
> > 
> 
> I think the main problem with this is that you cannot then point to the
> location of the offending data-sharing clause. Given a task construct with
> 'detach(x) shared(x)', I would tend to think of the 'shared(x)' as being the
> incorrect part here, and so would want the error to point to it. Unless you
> have any objections, I am inclined to keep this as it is?

Ok.  As detach clause is at most one, the loop is acceptable, but we
certainly would want to avoid O(n^2) complexities in number of clauses.

> I've tried this diff:
> 
>         case OMP_CLAUSE_DETACH:
> -         decl = OMP_CLAUSE_DECL (c);
> -         goto do_notice;
> +         flags = GOVD_FIRSTPRIVATE | GOVD_SEEN;
> +         goto do_add;
> 
> and just asserted that a suitable firstprivate clause is found in
> finish_taskreg_scan, and it seems to work fine :-).

Yeah, that should DTRT.
> 
> > > +  #pragma omp task detach (x) detach (y) /* { dg-error "there can be at most one 'detach' clause in a task construct" } */
> > 
> > It would be on a task construct rather than in a task construct, but the
> > common wording for this diagnostics is
> > "too many %qs clauses", "detach"
> > Please use that wording.
> 
> Done, though I don't see the point of using a %qs format code with a
> constant string here...

Helping translators.
They already have the "too many %qs clauses" string to translate (and many
have translated it already), the detach word shouldn't be translated, and we
don't want them to translate
"too many %<detach%> clauses"
"too many %<if%> clauses"
"too many %<schedule%> clauses"
...

> 
> I have applied your other suggestions, and have retested the gomp.exp and
> libgomp tests. The full testrun started yesterday showed no regressions. If
> you have no further issues then I will commit this later tonight ahead of
> stage4.

LGTM, thanks.

	Jakub
Rainer Orth Jan. 17, 2021, 11:26 a.m. UTC | #15
Hi Jakub,

>> I have applied your other suggestions, and have retested the gomp.exp and
>> libgomp tests. The full testrun started yesterday showed no regressions. If
>> you have no further issues then I will commit this later tonight ahead of
>> stage4.
>
> LGTM, thanks.

this patch broke Solaris bootstrap, but probably all non-Linux targets:

/vol/gcc/src/hg/master/local/libgomp/task.c: In function 'task_fulfilled_p':
/vol/gcc/src/hg/master/local/libgomp/task.c:334:1: error: control reaches end of non-void function [-Werror=return-type]
  334 | }
      | ^

task_fulfilled_p is

  return __atomic_load_n (&task->completion_sem, __ATOMIC_RELAXED);

but in config/posix/sem.h gomp_sem_t is (for
!HAVE_BROKEN_POSIX_SEMAPHORES):

  typedef sem_t gomp_sem_t;

and sem_t being a struct in Solaris <semaphores.h>.

	Rainer
Jakub Jelinek Jan. 17, 2021, 12:33 p.m. UTC | #16
On Sun, Jan 17, 2021 at 12:26:26PM +0100, Rainer Orth wrote:
> >> I have applied your other suggestions, and have retested the gomp.exp and
> >> libgomp tests. The full testrun started yesterday showed no regressions. If
> >> you have no further issues then I will commit this later tonight ahead of
> >> stage4.
> >
> > LGTM, thanks.
> 
> this patch broke Solaris bootstrap, but probably all non-Linux targets:
> 
> /vol/gcc/src/hg/master/local/libgomp/task.c: In function 'task_fulfilled_p':
> /vol/gcc/src/hg/master/local/libgomp/task.c:334:1: error: control reaches end of non-void function [-Werror=return-type]
>   334 | }
>       | ^
> 
> task_fulfilled_p is
> 
>   return __atomic_load_n (&task->completion_sem, __ATOMIC_RELAXED);
> 
> but in config/posix/sem.h gomp_sem_t is (for
> !HAVE_BROKEN_POSIX_SEMAPHORES):
> 
>   typedef sem_t gomp_sem_t;
> 
> and sem_t being a struct in Solaris <semaphores.h>.

Oops.
I guess we want to add to sem.h some API to query current value of the
semaphore, which could be atomic load for the config/{linux,accel}/sem.h,
sem_getvalue for config/posix/sem.h (does Solaris implement that?)
and dunno what for config/rtems/sem.h.

Kwok, I guess you can reproduce it even on Linux with --disable-linux-futex

	Jakub
Rainer Orth Jan. 17, 2021, 12:40 p.m. UTC | #17
Hi Jakub,

> On Sun, Jan 17, 2021 at 12:26:26PM +0100, Rainer Orth wrote:
>> >> I have applied your other suggestions, and have retested the gomp.exp and
>> >> libgomp tests. The full testrun started yesterday showed no regressions. If
>> >> you have no further issues then I will commit this later tonight ahead of
>> >> stage4.
>> >
>> > LGTM, thanks.
>> 
>> this patch broke Solaris bootstrap, but probably all non-Linux targets:
>> 
>> /vol/gcc/src/hg/master/local/libgomp/task.c: In function 'task_fulfilled_p':
>> /vol/gcc/src/hg/master/local/libgomp/task.c:334:1: error: control reaches end of non-void function [-Werror=return-type]
>>   334 | }
>>       | ^
>> 
>> task_fulfilled_p is
>> 
>>   return __atomic_load_n (&task->completion_sem, __ATOMIC_RELAXED);
>> 
>> but in config/posix/sem.h gomp_sem_t is (for
>> !HAVE_BROKEN_POSIX_SEMAPHORES):
>> 
>>   typedef sem_t gomp_sem_t;
>> 
>> and sem_t being a struct in Solaris <semaphores.h>.
>
> Oops.
> I guess we want to add to sem.h some API to query current value of the
> semaphore, which could be atomic load for the config/{linux,accel}/sem.h,
> sem_getvalue for config/posix/sem.h (does Solaris implement that?)

it does: this was already in POSIX.1-2001 and even Solaris 11.3 supports XPG6.

	Rainer
Andreas Schwab Jan. 17, 2021, 3:25 p.m. UTC | #18
On Jan 17 2021, Jakub Jelinek via Gcc-patches wrote:

> Kwok, I guess you can reproduce it even on Linux with --disable-linux-futex

And all targets that are not explicitly configured in
libcomp/configure.tgt, where --enable-linux-futex is a no-op.

Andreas.
Jakub Jelinek Jan. 17, 2021, 4:04 p.m. UTC | #19
On Sun, Jan 17, 2021 at 04:25:24PM +0100, Andreas Schwab wrote:
> On Jan 17 2021, Jakub Jelinek via Gcc-patches wrote:
> 
> > Kwok, I guess you can reproduce it even on Linux with --disable-linux-futex
> 
> And all targets that are not explicitly configured in
> libcomp/configure.tgt, where --enable-linux-futex is a no-op.

Completely untested patch (except for the linux futex version; and RTEMS
stuff is missing; I think it doesn't have a function for it but has a
counter in the struct, so perhaps fetch it manually from there), it is
Sunday, don't want to do more tonight:

--- libgomp/config/linux/sem.h.jj	2021-01-04 10:25:56.160037625 +0100
+++ libgomp/config/linux/sem.h	2021-01-17 16:49:39.900750416 +0100
@@ -85,4 +85,13 @@ gomp_sem_post (gomp_sem_t *sem)
   if (__builtin_expect (count & SEM_WAIT, 0))
     gomp_sem_post_slow (sem);
 }
+
+static inline int
+gomp_sem_getcount (gomp_sem_t *sem)
+{
+  int count = __atomic_load_n (sem, MEMMODEL_RELAXED);
+  if ((count & SEM_WAIT) != 0)
+    return -1;
+  return count / SEM_INC;
+}
 #endif /* GOMP_SEM_H */
--- libgomp/config/posix/sem.h.jj	2021-01-04 10:25:56.166037557 +0100
+++ libgomp/config/posix/sem.h	2021-01-17 16:49:53.605593659 +0100
@@ -64,6 +64,8 @@ extern void gomp_sem_post (gomp_sem_t *s
 
 extern void gomp_sem_destroy (gomp_sem_t *sem);
 
+extern int gomp_sem_getcount (gomp_sem_t *sem);
+
 #else /* HAVE_BROKEN_POSIX_SEMAPHORES  */
 
 typedef sem_t gomp_sem_t;
@@ -84,5 +86,13 @@ static inline void gomp_sem_destroy (gom
 {
   sem_destroy (sem);
 }
+
+static inline int gomp_sem_getcount (gomp_sem_t *sem)
+{
+  int val;
+  if (sem_getvalue (sem, &val) < 0)
+    return -1;
+  return val;  
+}
 #endif /* doesn't HAVE_BROKEN_POSIX_SEMAPHORES  */
 #endif /* GOMP_SEM_H  */
--- libgomp/config/posix/sem.c.jj	2021-01-04 10:25:56.184037354 +0100
+++ libgomp/config/posix/sem.c	2021-01-17 16:52:00.207145847 +0100
@@ -112,6 +112,26 @@ void gomp_sem_destroy (gomp_sem_t *sem)
 
   return;
 }
+
+int gomp_sem_getcount (gomp_sem_t *sem)
+{
+  int ret, count;
+
+  ret = pthread_mutex_lock (&sem->mutex);
+  if (ret)
+    return -1;
+
+  count = sem->value;
+
+  ret = pthread_mutex_unlock (&sem->mutex);
+  if (ret)
+    return -1;
+
+  if (count < 0)
+    return -1;
+
+  return count;
+}
 #else /* HAVE_BROKEN_POSIX_SEMAPHORES  */
 void
 gomp_sem_wait (gomp_sem_t *sem)
--- libgomp/config/accel/sem.h.jj	2021-01-04 10:25:56.261036482 +0100
+++ libgomp/config/accel/sem.h	2021-01-17 16:53:13.381309036 +0100
@@ -62,4 +62,13 @@ gomp_sem_post (gomp_sem_t *sem)
 {
   (void) __atomic_add_fetch (sem, 1, MEMMODEL_RELEASE);
 }
+
+static inline int
+gomp_sem_getcount (gomp_sem_t *sem)
+{
+  int count = __atomic_load_n (sem, MEMMODEL_RELAXED);
+  if (count < 0)
+    return -1;
+  return count;
+}
 #endif /* GOMP_SEM_H */
--- libgomp/task.c.jj	2021-01-16 22:52:33.749412323 +0100
+++ libgomp/task.c	2021-01-17 16:54:54.315154777 +0100
@@ -330,7 +330,7 @@ gomp_task_handle_depend (struct gomp_tas
 static bool
 task_fulfilled_p (struct gomp_task *task)
 {
-  return __atomic_load_n (&task->completion_sem, __ATOMIC_RELAXED);
+  return gomp_sem_getcount (&task->completion_sem) > 0;
 }
 
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
@@ -2406,7 +2406,7 @@ omp_fulfill_event (omp_event_handle_t ev
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr ? thr->ts.team : NULL;
 
-  if (__atomic_load_n (sem, __ATOMIC_RELAXED))
+  if (gomp_sem_getcount (sem) > 0)
     gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", sem);
 
   gomp_debug (0, "omp_fulfill_event: %p\n", sem);


	Jakub
Sebastian Huber Jan. 18, 2021, 6:10 a.m. UTC | #20
Hello Jakub,

On 17/01/2021 17:04, Jakub Jelinek via Gcc-patches wrote:

> On Sun, Jan 17, 2021 at 04:25:24PM +0100, Andreas Schwab wrote:
>> On Jan 17 2021, Jakub Jelinek via Gcc-patches wrote:
>>
>>> Kwok, I guess you can reproduce it even on Linux with --disable-linux-futex
>> And all targets that are not explicitly configured in
>> libcomp/configure.tgt, where --enable-linux-futex is a no-op.
> Completely untested patch (except for the linux futex version; and RTEMS
> stuff is missing; I think it doesn't have a function for it but has a
> counter in the struct, so perhaps fetch it manually from there), it is
> Sunday, don't want to do more tonight:

here is the RTEMS part:

diff --git a/libgomp/config/rtems/sem.h b/libgomp/config/rtems/sem.h
index 50b650ab807..0cd74153b05 100644
--- a/libgomp/config/rtems/sem.h
+++ b/libgomp/config/rtems/sem.h
@@ -47,6 +47,11 @@ static inline void gomp_sem_post (gomp_sem_t *sem)
    _Semaphore_Post (sem);
  }

+static inline int gomp_sem_getcount (gomp_sem_t *sem)
+{
+  return (int) __atomic_load_n (&sem->_count, MEMMODEL_RELAXED);
+}
+
  static inline void gomp_sem_destroy (gomp_sem_t *sem)
  {
    _Semaphore_Destroy (sem);
Jakub Jelinek Jan. 18, 2021, 6:21 a.m. UTC | #21
On Mon, Jan 18, 2021 at 07:10:07AM +0100, Sebastian Huber wrote:
> Hello Jakub,
> 
> On 17/01/2021 17:04, Jakub Jelinek via Gcc-patches wrote:
> 
> > On Sun, Jan 17, 2021 at 04:25:24PM +0100, Andreas Schwab wrote:
> > > On Jan 17 2021, Jakub Jelinek via Gcc-patches wrote:
> > > 
> > > > Kwok, I guess you can reproduce it even on Linux with --disable-linux-futex
> > > And all targets that are not explicitly configured in
> > > libcomp/configure.tgt, where --enable-linux-futex is a no-op.
> > Completely untested patch (except for the linux futex version; and RTEMS
> > stuff is missing; I think it doesn't have a function for it but has a
> > counter in the struct, so perhaps fetch it manually from there), it is
> > Sunday, don't want to do more tonight:
> 
> here is the RTEMS part:

Ok for trunk with ChangeLog entry, thanks.

I have now committed this after bootstrapping/regtesting on x86_64-linux and
i686-linux and additionally building on x86_64-linux with
--disable-linux-futex and testing libgomp there.

2021-01-18  Jakub Jelinek  <jakub@redhat.com>

	* config/linux/sem.h (gomp_sem_getcount): New function.
	* config/posix/sem.h (gomp_sem_getcount): New function.
	* config/posix/sem.c (gomp_sem_getcount): New function.
	* config/accel/sem.h (gomp_sem_getcount): New function.
	* task.c (task_fulfilled_p): Use gomp_sem_getcount.
	(omp_fulfill_event): Likewise.

--- libgomp/config/linux/sem.h.jj	2021-01-04 10:25:56.160037625 +0100
+++ libgomp/config/linux/sem.h	2021-01-17 16:49:39.900750416 +0100
@@ -85,4 +85,13 @@ gomp_sem_post (gomp_sem_t *sem)
   if (__builtin_expect (count & SEM_WAIT, 0))
     gomp_sem_post_slow (sem);
 }
+
+static inline int
+gomp_sem_getcount (gomp_sem_t *sem)
+{
+  int count = __atomic_load_n (sem, MEMMODEL_RELAXED);
+  if ((count & SEM_WAIT) != 0)
+    return -1;
+  return count / SEM_INC;
+}
 #endif /* GOMP_SEM_H */
--- libgomp/config/posix/sem.h.jj	2021-01-04 10:25:56.166037557 +0100
+++ libgomp/config/posix/sem.h	2021-01-17 16:49:53.605593659 +0100
@@ -64,6 +64,8 @@ extern void gomp_sem_post (gomp_sem_t *s
 
 extern void gomp_sem_destroy (gomp_sem_t *sem);
 
+extern int gomp_sem_getcount (gomp_sem_t *sem);
+
 #else /* HAVE_BROKEN_POSIX_SEMAPHORES  */
 
 typedef sem_t gomp_sem_t;
@@ -84,5 +86,13 @@ static inline void gomp_sem_destroy (gom
 {
   sem_destroy (sem);
 }
+
+static inline int gomp_sem_getcount (gomp_sem_t *sem)
+{
+  int val;
+  if (sem_getvalue (sem, &val) < 0)
+    return -1;
+  return val;  
+}
 #endif /* doesn't HAVE_BROKEN_POSIX_SEMAPHORES  */
 #endif /* GOMP_SEM_H  */
--- libgomp/config/posix/sem.c.jj	2021-01-04 10:25:56.184037354 +0100
+++ libgomp/config/posix/sem.c	2021-01-17 16:52:00.207145847 +0100
@@ -112,6 +112,26 @@ void gomp_sem_destroy (gomp_sem_t *sem)
 
   return;
 }
+
+int gomp_sem_getcount (gomp_sem_t *sem)
+{
+  int ret, count;
+
+  ret = pthread_mutex_lock (&sem->mutex);
+  if (ret)
+    return -1;
+
+  count = sem->value;
+
+  ret = pthread_mutex_unlock (&sem->mutex);
+  if (ret)
+    return -1;
+
+  if (count < 0)
+    return -1;
+
+  return count;
+}
 #else /* HAVE_BROKEN_POSIX_SEMAPHORES  */
 void
 gomp_sem_wait (gomp_sem_t *sem)
--- libgomp/config/accel/sem.h.jj	2021-01-04 10:25:56.261036482 +0100
+++ libgomp/config/accel/sem.h	2021-01-17 16:53:13.381309036 +0100
@@ -62,4 +62,13 @@ gomp_sem_post (gomp_sem_t *sem)
 {
   (void) __atomic_add_fetch (sem, 1, MEMMODEL_RELEASE);
 }
+
+static inline int
+gomp_sem_getcount (gomp_sem_t *sem)
+{
+  int count = __atomic_load_n (sem, MEMMODEL_RELAXED);
+  if (count < 0)
+    return -1;
+  return count;
+}
 #endif /* GOMP_SEM_H */
--- libgomp/task.c.jj	2021-01-16 22:52:33.749412323 +0100
+++ libgomp/task.c	2021-01-17 16:54:54.315154777 +0100
@@ -330,7 +330,7 @@ gomp_task_handle_depend (struct gomp_tas
 static bool
 task_fulfilled_p (struct gomp_task *task)
 {
-  return __atomic_load_n (&task->completion_sem, __ATOMIC_RELAXED);
+  return gomp_sem_getcount (&task->completion_sem) > 0;
 }
 
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
@@ -2406,7 +2406,7 @@ omp_fulfill_event (omp_event_handle_t ev
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr ? thr->ts.team : NULL;
 
-  if (__atomic_load_n (sem, __ATOMIC_RELAXED))
+  if (gomp_sem_getcount (sem) > 0)
     gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", sem);
 
   gomp_debug (0, "omp_fulfill_event: %p\n", sem);


	Jakub
Rainer Orth Jan. 18, 2021, 11:59 a.m. UTC | #22
Hi Jakub,

> On Sun, Jan 17, 2021 at 04:25:24PM +0100, Andreas Schwab wrote:
>> On Jan 17 2021, Jakub Jelinek via Gcc-patches wrote:
>> 
>> > Kwok, I guess you can reproduce it even on Linux with --disable-linux-futex
>> 
>> And all targets that are not explicitly configured in
>> libcomp/configure.tgt, where --enable-linux-futex is a no-op.
>
> Completely untested patch (except for the linux futex version; and RTEMS
> stuff is missing; I think it doesn't have a function for it but has a
> counter in the struct, so perhaps fetch it manually from there), it is
> Sunday, don't want to do more tonight:

this worked for me on both i386-pc-solaris2.11 and
sparc-sun-solaris2.11, thanks.  However, I had to rerun the builds with
the DWARF-5 patch backed out since that caused so much breakage that the
results were all but useless.

Two comments, though:

> --- libgomp/config/linux/sem.h.jj	2021-01-04 10:25:56.160037625 +0100
> +++ libgomp/config/linux/sem.h	2021-01-17 16:49:39.900750416 +0100
> @@ -85,4 +85,13 @@ gomp_sem_post (gomp_sem_t *sem)
>    if (__builtin_expect (count & SEM_WAIT, 0))
>      gomp_sem_post_slow (sem);
>  }
> +
> +static inline int
> +gomp_sem_getcount (gomp_sem_t *sem)
> +{
> +  int count = __atomic_load_n (sem, MEMMODEL_RELAXED);
> +  if ((count & SEM_WAIT) != 0)
> +    return -1;
> +  return count / SEM_INC;
> +}
>  #endif /* GOMP_SEM_H */
> --- libgomp/config/posix/sem.h.jj	2021-01-04 10:25:56.166037557 +0100
> +++ libgomp/config/posix/sem.h	2021-01-17 16:49:53.605593659 +0100
> @@ -64,6 +64,8 @@ extern void gomp_sem_post (gomp_sem_t *s
>  
>  extern void gomp_sem_destroy (gomp_sem_t *sem);
>  
> +extern int gomp_sem_getcount (gomp_sem_t *sem);
> +
>  #else /* HAVE_BROKEN_POSIX_SEMAPHORES  */
>  
>  typedef sem_t gomp_sem_t;
> @@ -84,5 +86,13 @@ static inline void gomp_sem_destroy (gom
>  {
>    sem_destroy (sem);
>  }
> +
> +static inline int gomp_sem_getcount (gomp_sem_t *sem)

Shouldn't there be a line break before gomp_semp_getcount (and once
again in posix/sem.c), as done in linux/sem.h above?  libgomp seems a
bit inconsistent in that matter, though.

Besides, I've seen regular timeouts on both Solaris and Linux/x86_64 for
one of the new tests:

WARNING: libgomp.fortran/task-detach-6.f90   -O2  execution test program timed out.
FAIL: libgomp.fortran/task-detach-6.f90   -O2  execution test

It doesn't happen every time when manually running the test, but every
third or forth time.

	Rainer
Florian Weimer Jan. 20, 2021, 4:04 p.m. UTC | #23
* Kwok Cheung Yeung:

> From 788687f87ad41e51258738ce068ee38d7b24defc Mon Sep 17 00:00:00 2001
> From: Kwok Cheung Yeung <kcy@codesourcery.com>
> Date: Fri, 15 Jan 2021 04:49:36 -0800
> Subject: [PATCH] openmp: Add support for the OpenMP 5.0 task detach clause

Sorry, this appears to cause OpenMP task state corruption in RPM.  We
have only seen this on s390x.

The relevant code looks like this:

745      * (largest first) to help achieve an optimal load distribution.
746      */
747     rpmRC packageBinaries(rpmSpec spec, const char *cookie, int cheating)
748     {
749         rpmRC rc = RPMRC_OK;
750         Package pkg;
751         Package *tasks;
752         int npkgs = 0;
753
754         for (pkg = spec->packages; pkg != NULL; pkg = pkg->next)
755             npkgs++;
756         tasks = xcalloc(npkgs, sizeof(Package));
757
758         pkg = spec->packages;
759         for (int i = 0; i < npkgs; i++) {
760             tasks[i] = pkg;
761             pkg = pkg->next;
762         }
763         qsort(tasks, npkgs, sizeof(Package), compareBinaries);
764
765         #pragma omp parallel
766         #pragma omp single
767         for (int i = 0; i < npkgs; i++) {
768             Package pkg = tasks[i];
769             #pragma omp task untied priority(i)
770             {
771             pkg->rc = packageBinary(spec, pkg, cookie, cheating, &pkg->filename);
772             rpmlog(RPMLOG_DEBUG,
773                     _("Finished binary package job, result %d, filename %s\n"),
774                     pkg->rc, pkg->filename);
775             if (pkg->rc) {
776                 #pragma omp critical
777                 rc = pkg->rc;
778             }
779             } /* omp task */
780             if (rc)
781                 break;
782         }
783
784         /* Now check the package set if enabled */
785         if (rc == RPMRC_OK)
786             rc = checkPackageSet(spec->packages);
787
788         free(tasks);
789
790         return rc;
791     }

Backtrace looks like this:

Thread 1 "rpmbuild" received signal SIGSEGV, Segmentation fault.
0x000003fffde89e60 in packageBinaries._omp_fn.0 () at pack.c:780
780             if (rc)
(gdb) bt
#0  0x000003fffde89e60 in packageBinaries._omp_fn.0 () at pack.c:780
#1  0x000003fffca94806 in GOMP_parallel (
    fn=0x3fffde89d80 <packageBinaries._omp_fn.0>, data=0x3fffffff478, 
    num_threads=2, flags=<optimized out>) at ../../../libgomp/parallel.c:178
#2  0x000003fffde953fa in packageBinaries (cheating=0, cookie=0x0, 
    spec=0x2aa00065570) at pack.c:765
#3  buildSpec (ts=<optimized out>, buildArgs=<optimized out>, 
    spec=0x2aa00065570, what=<optimized out>) at build.c:411
#4  0x000003fffde98074 in rpmSpecBuild (ts=<optimized out>, 
    spec=<optimized out>, buildArgs=<optimized out>) at build.c:452
#5  0x000002aa00003e74 in buildForTarget (ts=0x2aa00069a80, 
    arg=<optimized out>, ba=0x2aa00007990 <rpmBTArgs>) at rpmbuild.c:500
#6  0x000002aa0000409a in build (ts=0x2aa00069a80, 
    arg=0x3fffffffe3a "/builddir/build/SPECS/compsize.spec", rcfile=0x0, 
    ba=0x2aa00007990 <rpmBTArgs>) at rpmbuild.c:552
#7  0x000002aa00002f84 in main (argc=<optimized out>, argv=<optimized out>)
    at rpmbuild.c:690

Debuginfo is a bit wonky, but the task pointer appears to be null.
Disassembly of the function around the crash site:

(gdb) disassemble 
Dump of assembler code for function packageBinaries._omp_fn.0:
   0x000003fffde89d80 <+0>:     stmg    %r6,%r15,48(%r15)
   0x000003fffde89d86 <+6>:     ear     %r1,%a0
   0x000003fffde89d8a <+10>:    lgr     %r14,%r15
   0x000003fffde89d8e <+14>:    lay     %r15,-280(%r15)
   0x000003fffde89d94 <+20>:    aghi    %r14,-24
   0x000003fffde89d98 <+24>:    std     %f8,0(%r14)
   0x000003fffde89d9c <+28>:    std     %f10,8(%r14)
   0x000003fffde89da0 <+32>:    std     %f14,16(%r14)
   0x000003fffde89da4 <+36>:    sllg    %r1,%r1,32
   0x000003fffde89daa <+42>:    ear     %r1,%a1
   0x000003fffde89dae <+46>:    l       %r11,32(%r2)
   0x000003fffde89db2 <+50>:    stg     %r1,200(%r15)
   0x000003fffde89db8 <+56>:    lg      %r10,16(%r2)
   0x000003fffde89dbe <+62>:    l       %r13,24(%r2)
   0x000003fffde89dc2 <+66>:    mvc     248(8,%r15),40(%r1)
   0x000003fffde89dc8 <+72>:    ld      %f8,8(%r2)
   0x000003fffde89dcc <+76>:    ld      %f10,0(%r2)
   0x000003fffde89dd0 <+80>:    lgr     %r8,%r2
   0x000003fffde89dd4 <+84>:    brasl   %r14,0x3fffde87530 <GOMP_single_start@plt>
   0x000003fffde89dda <+90>:    cije    %r2,0,0x3fffde89e76 <packageBinaries._omp_fn.0+246>
   0x000003fffde89de0 <+96>:    cijnh   %r11,0,0x3fffde89e76 <packageBinaries._omp_fn.0+246>
   0x000003fffde89de6 <+102>:   stg     %r8,192(%r15)
   0x000003fffde89dec <+108>:   la      %r7,208(%r15)
   0x000003fffde89df0 <+112>:   la      %r1,28(%r8)
   0x000003fffde89df4 <+116>:   lgr     %r8,%r7
   0x000003fffde89df8 <+120>:   lgdr    %r7,%f8
   0x000003fffde89dfc <+124>:   ldgr    %f14,%r1
   0x000003fffde89e00 <+128>:   lhi     %r9,0
   0x000003fffde89e04 <+132>:   lg      %r1,0(%r10)
   0x000003fffde89e0a <+138>:   std     %f10,208(%r15)
   0x000003fffde89e0e <+142>:   std     %f14,224(%r15)
   0x000003fffde89e12 <+146>:   stg     %r1,232(%r15)
   0x000003fffde89e18 <+152>:   st      %r13,240(%r15)
   0x000003fffde89e1c <+156>:   stg     %r7,216(%r15)
   0x000003fffde89e22 <+162>:   lgfr    %r1,%r9
   0x000003fffde89e26 <+166>:   stg     %r1,184(%r15)
   0x000003fffde89e2c <+172>:   mvghi   176(%r15),0
   0x000003fffde89e32 <+178>:   mvghi   168(%r15),17
   0x000003fffde89e38 <+184>:   mvghi   160(%r15),1
   0x000003fffde89e3e <+190>:   lgr     %r3,%r8
   0x000003fffde89e42 <+194>:   lghi    %r6,8
   0x000003fffde89e46 <+198>:   lghi    %r5,40
   0x000003fffde89e4a <+202>:   lghi    %r4,0
   0x000003fffde89e4e <+206>:   larl    %r2,0x3fffde910c0 <packageBinaries._omp_fn.1>
   0x000003fffde89e54 <+212>:   brasl   %r14,0x3fffde87c70 <GOMP_task@plt>
   0x000003fffde89e5a <+218>:   lg      %r1,192(%r15)
=> 0x000003fffde89e60 <+224>:   lt      %r1,28(%r1)
   0x000003fffde89e66 <+230>:   jne     0x3fffde89e76 <packageBinaries._omp_fn.0+246>
   0x000003fffde89e6a <+234>:   ahi     %r9,1
   0x000003fffde89e6e <+238>:   aghi    %r10,8
   0x000003fffde89e72 <+242>:   brct    %r11,0x3fffde89e04 <packageBinaries._omp_fn.0+132>
   0x000003fffde89e76 <+246>:   lg      %r1,200(%r15)
   0x000003fffde89e7c <+252>:   clc     248(8,%r15),40(%r1)
   0x000003fffde89e82 <+258>:   jne     0x3fffde89e9e <packageBinaries._omp_fn.0+286>
   0x000003fffde89e86 <+262>:   ld      %f8,256(%r15)
   0x000003fffde89e8a <+266>:   ld      %f10,264(%r15)
   0x000003fffde89e8e <+270>:   ld      %f14,272(%r15)
   0x000003fffde89e92 <+274>:   lmg     %r6,%r15,328(%r15)
   0x000003fffde89e98 <+280>:   jg      0x3fffde895f0 <GOMP_barrier@plt>
   0x000003fffde89e9e <+286>:   brasl   %r14,0x3fffde88810 <__stack_chk_fail@plt>
(gdb) print/x $r1
$1 = 0x0

Confirmed via bisection on libgomp.so.1.  I did not rebuild rpmbuild.

Thanks,
Florian
Jakub Jelinek Jan. 20, 2021, 4:40 p.m. UTC | #24
On Wed, Jan 20, 2021 at 05:04:39PM +0100, Florian Weimer wrote:
> Sorry, this appears to cause OpenMP task state corruption in RPM.  We
> have only seen this on s390x.

Haven't actually verified it, but my suspection is that this is a caller
stack corruption.

We play with fire with the GOMP_task API/ABI extensions, the GOMP_task
function used to be:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
           long arg_size, long arg_align, bool if_clause, unsigned flags);
and later:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
           long arg_size, long arg_align, bool if_clause, unsigned flags,
           void **depend);
and later:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
           long arg_size, long arg_align, bool if_clause, unsigned flags,
           void **depend, int priority);
and now:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
           long arg_size, long arg_align, bool if_clause, unsigned flags,
           void **depend, int priority, void *detach)
and which of those depend, priority and detach argument is present depends
on the bits in flags.
I'm afraid the compiler just decided to spill the detach = NULL store in
  if ((flags & GOMP_TASK_FLAG_DETACH) == 0)
    detach = NULL;
on s390x into the argument stack slot.  Not a problem if the caller passes
all those 10 arguments, but if not, can clobber random stack location.

This hack should fix it up.  Priority doesn't need changing, but I've
changed it anyway just to be safe.  With the patch none of the 3 arguments
are ever modified, so I'd hope gcc doesn't decide to spill something
unrelated there.

2021-01-20  Jakub Jelinek  <jakub@redhat.com>

	* task.c (GOMP_task): Rename priority argument to priority_arg,
	add priority automatic variable and modify that variable.  Instead of
	clearing detach argument when GOMP_TASK_FLAG_DETACH bit is not set,
	check flags for that bit.

--- libgomp/task.c.jj	2021-01-18 07:18:42.362339622 +0100
+++ libgomp/task.c	2021-01-20 17:23:36.973758174 +0100
@@ -354,10 +354,11 @@ task_fulfilled_p (struct gomp_task *task
 void
 GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	   long arg_size, long arg_align, bool if_clause, unsigned flags,
-	   void **depend, int priority, void *detach)
+	   void **depend, int priority_arg, void *detach)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr->ts.team;
+  int priority = 0;
 
 #ifdef HAVE_BROKEN_POSIX_SEMAPHORES
   /* If pthread_mutex_* is used for omp_*lock*, then each task must be
@@ -385,13 +386,12 @@ GOMP_task (void (*fn) (void *), void *da
 	}
     }
 
-  if ((flags & GOMP_TASK_FLAG_PRIORITY) == 0)
-    priority = 0;
-  else if (priority > gomp_max_task_priority_var)
-    priority = gomp_max_task_priority_var;
-
-  if ((flags & GOMP_TASK_FLAG_DETACH) == 0)
-    detach = NULL;
+  if (__builtin_expect ((flags & GOMP_TASK_FLAG_PRIORITY) != 0, 0))
+    {
+      priority = priority_arg;
+      if (priority > gomp_max_task_priority_var)
+	priority = gomp_max_task_priority_var;
+    }
 
   if (!if_clause || team == NULL
       || (thr->task && thr->task->final_task)
@@ -415,7 +415,7 @@ GOMP_task (void (*fn) (void *), void *da
 			|| (flags & GOMP_TASK_FLAG_FINAL);
       task.priority = priority;
 
-      if (detach)
+      if ((flags & GOMP_TASK_FLAG_DETACH) != 0)
 	{
 	  task.detach = true;
 	  gomp_sem_init (&task.completion_sem, 0);
@@ -443,7 +443,7 @@ GOMP_task (void (*fn) (void *), void *da
       else
 	fn (data);
 
-      if (detach && !task_fulfilled_p (&task))
+      if (task.detach && !task_fulfilled_p (&task))
 	gomp_sem_wait (&task.completion_sem);
 
       /* Access to "children" is normally done inside a task_lock
@@ -484,7 +484,7 @@ GOMP_task (void (*fn) (void *), void *da
       task->kind = GOMP_TASK_UNDEFERRED;
       task->in_tied_task = parent->in_tied_task;
       task->taskgroup = taskgroup;
-      if (detach)
+      if ((flags & GOMP_TASK_FLAG_DETACH) != 0)
 	{
 	  task->detach = true;
 	  gomp_sem_init (&task->completion_sem, 0);


	Jakub
Thomas Schwinge Jan. 21, 2021, 7:32 a.m. UTC | #25
Hi!

On 2021-01-20T17:40:58+0100, Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> On Wed, Jan 20, 2021 at 05:04:39PM +0100, Florian Weimer wrote:
>> Sorry, this appears to cause OpenMP task state corruption in RPM.  We
>> have only seen this on s390x.
>
> Haven't actually verified it, but my suspection is that this is a caller
> stack corruption.
>
> We play with fire with the GOMP_task API/ABI extensions, the GOMP_task
> function used to be:
> void
> GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
>            long arg_size, long arg_align, bool if_clause, unsigned flags);
> and later:
> void
> GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
>            long arg_size, long arg_align, bool if_clause, unsigned flags,
>            void **depend);
> and later:
> void
> GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
>            long arg_size, long arg_align, bool if_clause, unsigned flags,
>            void **depend, int priority);
> and now:
> void
> GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
>            long arg_size, long arg_align, bool if_clause, unsigned flags,
>            void **depend, int priority, void *detach)

Yeah, I'd wondered about that, too.

> and which of those depend, priority and detach argument is present depends
> on the bits in flags.
> I'm afraid the compiler just decided to spill the detach = NULL store in
>   if ((flags & GOMP_TASK_FLAG_DETACH) == 0)
>     detach = NULL;
> on s390x into the argument stack slot.  Not a problem if the caller passes
> all those 10 arguments, but if not, can clobber random stack location.
>
> This hack should fix it up.  Priority doesn't need changing, but I've
> changed it anyway just to be safe.  With the patch none of the 3 arguments
> are ever modified, so I'd hope gcc doesn't decide to spill something
> unrelated there.

That still seems fragile; is "hope gcc doesn't decide to spill" really
sufficient?

Cannot we (easily) use symbol versioning to introduce new entry point
variants (which then internally all route to the same function)?


Grüße
 Thomas


> 2021-01-20  Jakub Jelinek  <jakub@redhat.com>
>
>       * task.c (GOMP_task): Rename priority argument to priority_arg,
>       add priority automatic variable and modify that variable.  Instead of
>       clearing detach argument when GOMP_TASK_FLAG_DETACH bit is not set,
>       check flags for that bit.
>
> --- libgomp/task.c.jj 2021-01-18 07:18:42.362339622 +0100
> +++ libgomp/task.c    2021-01-20 17:23:36.973758174 +0100
> @@ -354,10 +354,11 @@ task_fulfilled_p (struct gomp_task *task
>  void
>  GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
>          long arg_size, long arg_align, bool if_clause, unsigned flags,
> -        void **depend, int priority, void *detach)
> +        void **depend, int priority_arg, void *detach)
>  {
>    struct gomp_thread *thr = gomp_thread ();
>    struct gomp_team *team = thr->ts.team;
> +  int priority = 0;
>
>  #ifdef HAVE_BROKEN_POSIX_SEMAPHORES
>    /* If pthread_mutex_* is used for omp_*lock*, then each task must be
> @@ -385,13 +386,12 @@ GOMP_task (void (*fn) (void *), void *da
>       }
>      }
>
> -  if ((flags & GOMP_TASK_FLAG_PRIORITY) == 0)
> -    priority = 0;
> -  else if (priority > gomp_max_task_priority_var)
> -    priority = gomp_max_task_priority_var;
> -
> -  if ((flags & GOMP_TASK_FLAG_DETACH) == 0)
> -    detach = NULL;
> +  if (__builtin_expect ((flags & GOMP_TASK_FLAG_PRIORITY) != 0, 0))
> +    {
> +      priority = priority_arg;
> +      if (priority > gomp_max_task_priority_var)
> +     priority = gomp_max_task_priority_var;
> +    }
>
>    if (!if_clause || team == NULL
>        || (thr->task && thr->task->final_task)
> @@ -415,7 +415,7 @@ GOMP_task (void (*fn) (void *), void *da
>                       || (flags & GOMP_TASK_FLAG_FINAL);
>        task.priority = priority;
>
> -      if (detach)
> +      if ((flags & GOMP_TASK_FLAG_DETACH) != 0)
>       {
>         task.detach = true;
>         gomp_sem_init (&task.completion_sem, 0);
> @@ -443,7 +443,7 @@ GOMP_task (void (*fn) (void *), void *da
>        else
>       fn (data);
>
> -      if (detach && !task_fulfilled_p (&task))
> +      if (task.detach && !task_fulfilled_p (&task))
>       gomp_sem_wait (&task.completion_sem);
>
>        /* Access to "children" is normally done inside a task_lock
> @@ -484,7 +484,7 @@ GOMP_task (void (*fn) (void *), void *da
>        task->kind = GOMP_TASK_UNDEFERRED;
>        task->in_tied_task = parent->in_tied_task;
>        task->taskgroup = taskgroup;
> -      if (detach)
> +      if ((flags & GOMP_TASK_FLAG_DETACH) != 0)
>       {
>         task->detach = true;
>         gomp_sem_init (&task->completion_sem, 0);
>
>
>       Jakub
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Jakub Jelinek Jan. 21, 2021, 7:39 a.m. UTC | #26
On Thu, Jan 21, 2021 at 08:32:52AM +0100, Thomas Schwinge wrote:
> > and which of those depend, priority and detach argument is present depends
> > on the bits in flags.
> > I'm afraid the compiler just decided to spill the detach = NULL store in
> >   if ((flags & GOMP_TASK_FLAG_DETACH) == 0)
> >     detach = NULL;
> > on s390x into the argument stack slot.  Not a problem if the caller passes
> > all those 10 arguments, but if not, can clobber random stack location.
> >
> > This hack should fix it up.  Priority doesn't need changing, but I've
> > changed it anyway just to be safe.  With the patch none of the 3 arguments
> > are ever modified, so I'd hope gcc doesn't decide to spill something
> > unrelated there.
> 
> That still seems fragile; is "hope gcc doesn't decide to spill" really
> sufficient?
> 
> Cannot we (easily) use symbol versioning to introduce new entry point
> variants (which then internally all route to the same function)?

Not all targets support GNU style function versioning where one can have
multiple symbol versions for the same symbol name, and I wanted to
avoid GOMP_task2, GOMP_task3 etc.
In retrospect, it would have been probably better to make the function
void (*fn) (void *), /* ... */, unsigned flags, ...)
but it is too late to change that now.
And we can't really change at least the GOMP_task with just flags, with also
depend and with also priority, so we have that potential problem anyway.

On the other side, I really don't see a reason why gcc would try to spill
something that has never been changed.

	Jakub
Florian Weimer Jan. 21, 2021, 8:44 a.m. UTC | #27
* Jakub Jelinek:

> Not all targets support GNU style function versioning where one can have
> multiple symbol versions for the same symbol name, and I wanted to
> avoid GOMP_task2, GOMP_task3 etc.

Why?  If it is about aesthetics, wouldn't compliance with language
standards and ABI requirements be more important?

> In retrospect, it would have been probably better to make the function
> void (*fn) (void *), /* ... */, unsigned flags, ...)
> but it is too late to change that now.

Right, on some targets, the caller would have to allocate a parameter
save area, resulting in stack corruption if it is missing.

> On the other side, I really don't see a reason why gcc would try to spill
> something that has never been changed.

If the argument slot is dead, GCC might reuse it for something else (if
the ABI says that the callee owns those slots).

Thanks,
Florian
diff mbox series

Patch

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c46b1bc..cb2d5c2 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -79,6 +79,7 @@  DEF_PRIMITIVE_TYPE (BT_UINT128, uint128_type_node
 DEF_PRIMITIVE_TYPE (BT_WORD, (*lang_hooks.types.type_for_mode) (word_mode, 1))
 DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lang_hooks.types.type_for_mode)
 				    (targetm.unwind_word_mode (), 1))
+DEF_PRIMITIVE_TYPE (BT_PTR_SIZED_INT, pointer_sized_int_node)
 DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
@@ -253,6 +254,7 @@  DEF_FUNCTION_TYPE_1 (BT_FN_LONG_LONGDOUBLE, BT_LONG, BT_LONGDOUBLE)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_FLOAT, BT_LONGLONG, BT_FLOAT)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_DOUBLE, BT_LONGLONG, BT_DOUBLE)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGLONG_LONGDOUBLE, BT_LONGLONG, BT_LONGDOUBLE)
+DEF_FUNCTION_TYPE_1 (BT_FN_PSINT_VOID, BT_PTR_SIZED_INT, BT_VOID)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTR, BT_VOID, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_SIZE_CONST_STRING, BT_SIZE, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_CONST_STRING, BT_INT, BT_CONST_STRING)
@@ -754,10 +756,6 @@  DEF_FUNCTION_TYPE_8 (BT_FN_BOOL_UINT_ULLPTR_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_LONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG, BT_PTR, BT_PTR)
 
-DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
-		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
-		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
-		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
 		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
@@ -765,6 +763,10 @@  DEF_FUNCTION_TYPE_9 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR_PTR_PTR
 		     BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
 		     BT_PTR_LONG, BT_PTR_LONG, BT_PTR, BT_PTR)
 
+DEF_FUNCTION_TYPE_10 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT,
+		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
+		      BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
+		      BT_BOOL, BT_UINT, BT_PTR, BT_INT, BT_PTR_SIZED_INT)
 DEF_FUNCTION_TYPE_10 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		      BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		      BT_ULONGLONG, BT_LONG, BT_ULONGLONG, BT_PTR_ULONGLONG,
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 5a493fe..fb784e9 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -94,6 +94,7 @@  enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_DEFAULT,
   PRAGMA_OMP_CLAUSE_DEFAULTMAP,
   PRAGMA_OMP_CLAUSE_DEPEND,
+  PRAGMA_OMP_CLAUSE_DETACH,
   PRAGMA_OMP_CLAUSE_DEVICE,
   PRAGMA_OMP_CLAUSE_DEVICE_TYPE,
   PRAGMA_OMP_CLAUSE_DIST_SCHEDULE,
@@ -150,7 +151,6 @@  enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_COPYOUT,
   PRAGMA_OACC_CLAUSE_CREATE,
   PRAGMA_OACC_CLAUSE_DELETE,
-  PRAGMA_OACC_CLAUSE_DETACH,
   PRAGMA_OACC_CLAUSE_DEVICEPTR,
   PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
   PRAGMA_OACC_CLAUSE_FINALIZE,
@@ -173,6 +173,7 @@  enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_COPYIN = PRAGMA_OMP_CLAUSE_COPYIN,
   PRAGMA_OACC_CLAUSE_DEVICE = PRAGMA_OMP_CLAUSE_DEVICE,
   PRAGMA_OACC_CLAUSE_DEFAULT = PRAGMA_OMP_CLAUSE_DEFAULT,
+  PRAGMA_OACC_CLAUSE_DETACH = PRAGMA_OMP_CLAUSE_DETACH,
   PRAGMA_OACC_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
   PRAGMA_OACC_CLAUSE_IF = PRAGMA_OMP_CLAUSE_IF,
   PRAGMA_OACC_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index ecc3d21..3b766fd 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13407,6 +13407,15 @@  c_parser_omp_clause_default (c_parser *parser, tree list, bool is_oacc)
   return c;
 }
 
+/* OpenMP 5.0:
+   detach ( event-handle ) */
+
+static tree
+c_parser_omp_clause_detach (c_parser *parser, tree list)
+{
+  return c_parser_omp_var_list_parens (parser, OMP_CLAUSE_DETACH, list);
+}
+
 /* OpenMP 2.5:
    firstprivate ( variable-list ) */
 
@@ -16244,6 +16253,10 @@  c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_omp_clause_default (parser, clauses, false);
 	  c_name = "default";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DETACH:
+	  clauses = c_parser_omp_clause_detach (parser, clauses);
+	  c_name = "detach";
+	  break;
 	case PRAGMA_OMP_CLAUSE_FIRSTPRIVATE:
 	  clauses = c_parser_omp_clause_firstprivate (parser, clauses);
 	  c_name = "firstprivate";
@@ -19142,7 +19155,8 @@  c_parser_omp_single (location_t loc, c_parser *parser, bool *if_p)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEPEND)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRIORITY)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION))
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DETACH))
 
 static tree
 c_parser_omp_task (location_t loc, c_parser *parser, bool *if_p)
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 9684037..9a5018c 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -14937,6 +14937,11 @@  c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  pc = &OMP_CLAUSE_CHAIN (c);
 	  continue;
 
+	case OMP_CLAUSE_DETACH:
+	  t = OMP_CLAUSE_DECL (c);
+	  pc = &OMP_CLAUSE_CHAIN (c);
+	  continue;
+
 	case OMP_CLAUSE_IF:
 	case OMP_CLAUSE_NUM_THREADS:
 	case OMP_CLAUSE_NUM_TEAMS:
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index bbf157e..92457929 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -37847,6 +37847,10 @@  cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 						 token->location);
 	  c_name = "depend";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DETACH:
+	  clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_DETACH, clauses);
+	  c_name = "detach";
+	  break;
 	case PRAGMA_OMP_CLAUSE_MAP:
 	  clauses = cp_parser_omp_clause_map (parser, clauses);
 	  c_name = "map";
@@ -40381,7 +40385,8 @@  cp_parser_omp_single (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEPEND)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRIORITY)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION))
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DETACH))
 
 static tree
 cp_parser_omp_task (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index a550db6..6972d22 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7391,6 +7391,9 @@  finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 		}
 	    }
 	  break;
+	case OMP_CLAUSE_DETACH:
+	  t = OMP_CLAUSE_DECL (c);
+	  break;
 
 	case OMP_CLAUSE_MAP:
 	case OMP_CLAUSE_TO:
diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 43b97ba..b28fe73 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1692,6 +1692,12 @@  show_omp_clauses (gfc_omp_clauses *omp_clauses)
       show_expr (omp_clauses->priority);
       fputc (')', dumpfile);
     }
+  if (omp_clauses->detach)
+    {
+      fputs (" DETACH(", dumpfile);
+      show_expr (omp_clauses->detach);
+      fputc (')', dumpfile);
+    }
   for (i = 0; i < OMP_IF_LAST; i++)
     if (omp_clauses->if_exprs[i])
       {
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index 83f6fd8..699b354 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -5597,6 +5597,7 @@  gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 		  WALK_SUBEXPR (co->ext.omp_clauses->hint);
 		  WALK_SUBEXPR (co->ext.omp_clauses->num_tasks);
 		  WALK_SUBEXPR (co->ext.omp_clauses->priority);
+		  WALK_SUBEXPR (co->ext.omp_clauses->detach);
 		  for (idx = 0; idx < OMP_IF_LAST; idx++)
 		    WALK_SUBEXPR (co->ext.omp_clauses->if_exprs[idx]);
 		  for (idx = 0;
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index dfd7796..a2193dc 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1410,6 +1410,7 @@  typedef struct gfc_omp_clauses
   struct gfc_expr *hint;
   struct gfc_expr *num_tasks;
   struct gfc_expr *priority;
+  struct gfc_expr *detach;
   struct gfc_expr *if_exprs[OMP_IF_LAST];
   enum gfc_omp_sched_kind dist_sched_kind;
   struct gfc_expr *dist_chunk_size;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 2270c85..c361859 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -91,6 +91,7 @@  gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->hint);
   gfc_free_expr (c->num_tasks);
   gfc_free_expr (c->priority);
+  gfc_free_expr (c->detach);
   for (i = 0; i < OMP_IF_LAST; i++)
     gfc_free_expr (c->if_exprs[i]);
   gfc_free_expr (c->async_expr);
@@ -805,6 +806,7 @@  enum omp_mask1
   OMP_CLAUSE_ATOMIC,  /* OpenMP 5.0.  */
   OMP_CLAUSE_CAPTURE,  /* OpenMP 5.0.  */
   OMP_CLAUSE_MEMORDER,  /* OpenMP 5.0.  */
+  OMP_CLAUSE_DETACH,  /* OpenMP 5.0.  */
   OMP_CLAUSE_NOWAIT,
   /* This must come last.  */
   OMP_MASK1_LAST
@@ -838,7 +840,6 @@  enum omp_mask2
   OMP_CLAUSE_IF_PRESENT,
   OMP_CLAUSE_FINALIZE,
   OMP_CLAUSE_ATTACH,
-  OMP_CLAUSE_DETACH,
   /* This must come last.  */
   OMP_MASK2_LAST
 };
@@ -1219,6 +1220,12 @@  gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		gfc_current_locus = old_loc;
 	    }
 	  if ((mask & OMP_CLAUSE_DETACH)
+	      && !openacc
+	      && c->detach == NULL
+	      && gfc_match ("detach ( %e )", &c->detach) == MATCH_YES)
+	    continue;
+	  if ((mask & OMP_CLAUSE_DETACH)
+	      && openacc
 	      && gfc_match ("detach ( ") == MATCH_YES
 	      && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
 					   OMP_MAP_DETACH, false,
@@ -2696,7 +2703,7 @@  cleanup:
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_SHARED | OMP_CLAUSE_IF | OMP_CLAUSE_DEFAULT		\
    | OMP_CLAUSE_UNTIED | OMP_CLAUSE_FINAL | OMP_CLAUSE_MERGEABLE	\
-   | OMP_CLAUSE_DEPEND | OMP_CLAUSE_PRIORITY)
+   | OMP_CLAUSE_DEPEND | OMP_CLAUSE_PRIORITY | OMP_CLAUSE_DETACH)
 #define OMP_TASKLOOP_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_LASTPRIVATE | OMP_CLAUSE_SHARED | OMP_CLAUSE_IF		\
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 1d652a0..9182482 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3639,6 +3639,21 @@  gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }
 
+  if (clauses->detach)
+    {
+      tree detach;
+
+      gfc_init_se (&se, NULL);
+      gfc_conv_expr (&se, clauses->detach);
+      gfc_add_block_to_block (block, &se.pre);
+      detach = se.expr;
+      gfc_add_block_to_block (block, &se.post);
+
+      c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_DETACH);
+      OMP_CLAUSE_DETACH_EXPR (c) = detach;
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+    }
+
   if (clauses->hint)
     {
       tree hint;
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 7b4925c..36543d2 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -53,6 +53,7 @@  DEF_PRIMITIVE_TYPE (BT_LONG, long_integer_type_node)
 DEF_PRIMITIVE_TYPE (BT_ULONGLONG, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (BT_WORD, (*lang_hooks.types.type_for_mode) (word_mode, 1))
 DEF_PRIMITIVE_TYPE (BT_SIZE, size_type_node)
+DEF_PRIMITIVE_TYPE (BT_PTR_SIZED_INT, pointer_sized_int_node)
 
 DEF_PRIMITIVE_TYPE (BT_I1, builtin_type_for_size (BITS_PER_UNIT*1, 1))
 DEF_PRIMITIVE_TYPE (BT_I2, builtin_type_for_size (BITS_PER_UNIT*2, 1))
@@ -85,6 +86,7 @@  DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_PTR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_PSINT_VOID, BT_PTR_SIZED_INT, BT_VOID)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_INT, BT_VOID, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_BOOL, BT_VOID, BT_BOOL)
 DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT)
@@ -230,10 +232,6 @@  DEF_FUNCTION_TYPE_8 (BT_FN_BOOL_UINT_ULLPTR_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_LONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG, BT_PTR, BT_PTR)
 
-DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
-		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
-		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
-		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
 		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
@@ -241,6 +239,10 @@  DEF_FUNCTION_TYPE_9 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR_PTR_PTR
 		     BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG, BT_LONG,
 		     BT_PTR_LONG, BT_PTR_LONG, BT_PTR, BT_PTR)
 
+DEF_FUNCTION_TYPE_10 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT,
+		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
+		      BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
+		      BT_BOOL, BT_UINT, BT_PTR, BT_INT, BT_PTR_SIZED_INT)
 DEF_FUNCTION_TYPE_10 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_LONG_ULL_ULLPTR_ULLPTR_PTR_PTR,
 		      BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		      BT_ULONGLONG, BT_LONG, BT_ULONGLONG, BT_PTR_ULONGLONG,
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index aa3b914..dc41805 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -9523,6 +9523,10 @@  gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	    }
 	  break;
 
+	case OMP_CLAUSE_DETACH:
+	  decl = OMP_CLAUSE_DECL (c);
+	  goto do_notice;
+
 	case OMP_CLAUSE_IF:
 	  if (OMP_CLAUSE_IF_MODIFIER (c) != ERROR_MARK
 	      && OMP_CLAUSE_IF_MODIFIER (c) != code)
@@ -10621,6 +10625,7 @@  gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	case OMP_CLAUSE_DEFAULTMAP:
 	case OMP_CLAUSE_ORDER:
 	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index f461d60..c883ec8 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -379,7 +379,7 @@  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL_REDUCTIONS,
 		  "GOMP_parallel_reductions",
 		  BT_FN_UINT_OMPFN_PTR_UINT_UINT, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASK, "GOMP_task",
-		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
+		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT,
 		  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TASKLOOP, "GOMP_taskloop",
 		  BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
@@ -446,3 +446,5 @@  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WORKSHARE_TASK_REDUCTION_UNREGISTER,
 		  BT_FN_VOID_BOOL, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DECLARE, "GOACC_declare",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_NEW_EVENT, "GOMP_new_event",
+		  BT_FN_PSINT_VOID, ATTR_NOTHROW_LEAF_LIST)
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 6583c88..9f979ac 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -762,6 +762,7 @@  expand_task_call (struct omp_region *region, basic_block bb,
   tree depend = omp_find_clause (clauses, OMP_CLAUSE_DEPEND);
   tree finalc = omp_find_clause (clauses, OMP_CLAUSE_FINAL);
   tree priority = omp_find_clause (clauses, OMP_CLAUSE_PRIORITY);
+  tree detach = omp_find_clause (clauses, OMP_CLAUSE_DETACH);
 
   unsigned int iflags
     = (untied ? GOMP_TASK_FLAG_UNTIED : 0)
@@ -853,6 +854,11 @@  expand_task_call (struct omp_region *region, basic_block bb,
     priority = integer_zero_node;
 
   gsi = gsi_last_nondebug_bb (bb);
+
+  detach = detach
+      ? fold_convert (pointer_sized_int_node, OMP_CLAUSE_DETACH_EXPR (detach))
+      : null_pointer_node;
+
   tree t = gimple_omp_task_data_arg (entry_stmt);
   if (t == NULL)
     t2 = null_pointer_node;
@@ -875,10 +881,10 @@  expand_task_call (struct omp_region *region, basic_block bb,
 			 num_tasks, priority, startvar, endvar, step);
   else
     t = build_call_expr (builtin_decl_explicit (BUILT_IN_GOMP_TASK),
-			 9, t1, t2, t3,
+			 10, t1, t2, t3,
 			 gimple_omp_task_arg_size (entry_stmt),
 			 gimple_omp_task_arg_align (entry_stmt), cond, flags,
-			 depend, priority);
+			 depend, priority, detach);
 
   force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 			    false, GSI_CONTINUE_LINKING);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ea9008b..c5221f6 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1313,6 +1313,14 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  install_var_field (decl, by_ref, 3, ctx);
 	  break;
 
+	case OMP_CLAUSE_DETACH:
+	  decl = OMP_CLAUSE_DECL (c);
+	  install_var_field (decl, true, 3, ctx);
+	  install_var_local (decl, ctx);
+	  if (ctx->outer)
+	    scan_omp_op (&OMP_CLAUSE_OPERAND (c, 0), ctx->outer);
+	  break;
+
 	case OMP_CLAUSE_FINAL:
 	case OMP_CLAUSE_IF:
 	case OMP_CLAUSE_NUM_THREADS:
@@ -1654,6 +1662,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_SIMDLEN:
 	case OMP_CLAUSE_ALIGNED:
 	case OMP_CLAUSE_DEPEND:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_ALLOCATE:
 	case OMP_CLAUSE__LOOPTEMP_:
 	case OMP_CLAUSE__REDUCTEMP_:
@@ -11094,6 +11103,26 @@  create_task_copyfn (gomp_task *task_stmt, omp_context *ctx)
 }
 
 static void
+lower_detach_clause (tree *pclauses, gimple_seq *iseq, omp_context *ctx)
+{
+  tree clause = omp_find_clause (*pclauses, OMP_CLAUSE_DETACH);
+  gcc_assert (clause);
+
+  tree event_decl = OMP_CLAUSE_DECL (clause);
+  tree event_ref = lookup_decl_in_outer_ctx (event_decl, ctx);
+  tree fn_decl = builtin_decl_explicit (BUILT_IN_GOMP_NEW_EVENT);
+  tree handle = create_tmp_var (pointer_sized_int_node);
+
+  gimple *call_stmt = gimple_build_call (fn_decl, 0);
+  gimple_call_set_lhs (call_stmt, handle);
+  gimple_seq_add_stmt (iseq, call_stmt);
+
+  gimplify_assign (event_ref,
+		   fold_convert (TREE_TYPE (event_decl), handle),
+		   iseq);
+}
+
+static void
 lower_depend_clauses (tree *pclauses, gimple_seq *iseq, gimple_seq *oseq)
 {
   tree c, clauses;
@@ -11242,6 +11271,15 @@  lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       if (ws_num == 1)
 	gimple_omp_parallel_set_combined_p (stmt, true);
     }
+
+  gimple_seq detach_ilist = NULL;
+  if (gimple_code (stmt) == GIMPLE_OMP_TASK
+      && omp_find_clause (clauses, OMP_CLAUSE_DETACH))
+    {
+      lower_detach_clause (gimple_omp_task_clauses_ptr (stmt), &detach_ilist,
+			   ctx);
+    }
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (gimple_code (stmt) == GIMPLE_OMP_TASK
@@ -11319,6 +11357,10 @@  lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   gimple_seq olist = NULL;
   gimple_seq ilist = NULL;
+
+  if (detach_ilist)
+    gimple_seq_add_seq (&ilist, detach_ilist);
+
   lower_send_clauses (clauses, &ilist, &olist, ctx);
   lower_send_shared_vars (&ilist, &olist, ctx);
 
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index c9280a8..54a436b 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -301,6 +301,9 @@  enum omp_clause_code {
   /* OpenMP clause: to (variable-list).  */
   OMP_CLAUSE_TO,
 
+  /* OpenMP clause: detach (event-handle).  */
+  OMP_CLAUSE_DETACH,
+
   /* OpenACC clauses: {copy, copyin, copyout, create, delete, deviceptr,
      device, host (self), present, present_or_copy (pcopy), present_or_copyin
      (pcopyin), present_or_copyout (pcopyout), present_or_create (pcreate)}
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 318f048..f0fef6a 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1265,6 +1265,12 @@  dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
     case OMP_CLAUSE_FINALIZE:
       pp_string (pp, "finalize");
       break;
+    case OMP_CLAUSE_DETACH:
+      pp_string (pp, "detach(");
+      dump_generic_node (pp, OMP_CLAUSE_DETACH_EXPR (clause), spc, flags,
+			 false);
+      pp_right_paren (pp);
+      break;
 
     default:
       gcc_unreachable ();
diff --git a/gcc/tree.c b/gcc/tree.c
index 9260772..1c8baed 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -299,6 +299,7 @@  unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_LINK  */
   2, /* OMP_CLAUSE_FROM  */
   2, /* OMP_CLAUSE_TO  */
+  1, /* OMP_CLAUSE_DETACH  */
   2, /* OMP_CLAUSE_MAP  */
   1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
   1, /* OMP_CLAUSE_USE_DEVICE_ADDR  */
@@ -384,6 +385,7 @@  const char * const omp_clause_code_name[] =
   "link",
   "from",
   "to",
+  "detach",
   "map",
   "use_device_ptr",
   "use_device_addr",
@@ -12176,6 +12178,7 @@  walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE_HINT:
 	case OMP_CLAUSE_TO_DECLARE:
 	case OMP_CLAUSE_LINK:
+	case OMP_CLAUSE_DETACH:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_USE_DEVICE_ADDR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
diff --git a/gcc/tree.h b/gcc/tree.h
index f8f0a60..5bf1c1d 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1585,6 +1585,9 @@  class auto_suppress_location_wrappers
 #define OMP_CLAUSE_PRIORITY_EXPR(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PRIORITY),0)
 
+#define OMP_CLAUSE_DETACH_EXPR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DETACH),0)
+
 /* OpenACC clause expressions  */
 #define OMP_CLAUSE_EXPR(NODE, CLAUSE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, CLAUSE), 0)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index cd719f9..976b248 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -605,6 +605,12 @@  omp_get_max_task_priority_ (void)
 }
 
 void
+omp_fulfill_event_ (intptr_t event)
+{
+  omp_fulfill_event ((omp_event_handle_t) event);
+}
+
+void
 omp_set_affinity_format_ (const char *format, size_t format_len)
 {
   gomp_set_affinity_format (format, format_len);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index da7ac03..d9d2514 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -517,6 +517,12 @@  struct gomp_taskwait
   gomp_sem_t taskwait_sem;
 };
 
+struct gomp_allow_completion_event
+{
+  bool fulfilled;
+  gomp_sem_t completion_sem;
+};
+
 /* This structure describes a "task" to be run by a thread.  */
 
 struct gomp_task
@@ -546,6 +552,8 @@  struct gomp_task
      entries and the gomp_task in which they reside.  */
   struct priority_node pnode[3];
 
+  struct gomp_allow_completion_event *detach_event;
+
   struct gomp_task_icv icv;
   void (*fn) (void *);
   void *fn_data;
@@ -686,6 +694,10 @@  struct gomp_team
   int work_share_cancelled;
   int team_cancelled;
 
+  /* Tasks waiting for their completion event to be fulfilled.  */
+  struct priority_queue task_detach_queue;
+  unsigned int task_detach_count;
+
   /* This array contains structures for implicit tasks.  */
   struct gomp_task implicit_task[];
 };
@@ -932,6 +944,8 @@  gomp_finish_task (struct gomp_task *task)
 {
   if (__builtin_expect (task->depend_hash != NULL, 0))
     free (task->depend_hash);
+  if (task->detach_event)
+    free (task->detach_event);
 }
 
 /* team.c */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index c5f52f7..d29e05b 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -195,6 +195,8 @@  OMP_5.0.1 {
 	omp_free;
 	omp_get_supported_active_levels;
 	omp_get_supported_active_levels_;
+	omp_fulfill_event;
+	omp_fulfill_event_;
 } OMP_5.0;
 
 GOMP_1.0 {
@@ -347,6 +349,7 @@  GOMP_5.0 {
 	GOMP_loop_ull_nonmonotonic_runtime_start;
 	GOMP_loop_ull_ordered_start;
 	GOMP_loop_ull_start;
+	GOMP_new_event;
 	GOMP_parallel_loop_maybe_nonmonotonic_runtime;
 	GOMP_parallel_loop_nonmonotonic_runtime;
 	GOMP_parallel_reductions;
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 59e3697..670cb2d 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -293,8 +293,9 @@  extern bool GOMP_cancellation_point (int);
 
 /* task.c */
 
+extern uintptr_t GOMP_new_event (void);
 extern void GOMP_task (void (*) (void *), void *, void (*) (void *, void *),
-		       long, long, bool, unsigned, void **, int);
+		       long, long, bool, unsigned, void **, int, uintptr_t);
 extern void GOMP_taskloop (void (*) (void *), void *,
 			   void (*) (void *, void *), long, long, unsigned,
 			   unsigned long, int, long, long, long);
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index be7df6d..b7c3eea 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -171,6 +171,11 @@  typedef struct omp_alloctrait_t
   omp_uintptr_t value;
 } omp_alloctrait_t;
 
+typedef enum omp_event_handle_t __GOMP_UINTPTR_T_ENUM
+{
+  __omp_event_handle_t_max__ = __UINTPTR_MAX__
+} omp_event_handle_t;
+
 #ifdef __cplusplus
 extern "C" {
 # define __GOMP_NOTHROW throw ()
@@ -245,6 +250,8 @@  extern int omp_is_initial_device (void) __GOMP_NOTHROW;
 extern int omp_get_initial_device (void) __GOMP_NOTHROW;
 extern int omp_get_max_task_priority (void) __GOMP_NOTHROW;
 
+extern void omp_fulfill_event (omp_event_handle_t) __GOMP_NOTHROW;
+
 extern void *omp_target_alloc (__SIZE_TYPE__, int) __GOMP_NOTHROW;
 extern void omp_target_free (void *, int) __GOMP_NOTHROW;
 extern int omp_target_is_present (const void *, int) __GOMP_NOTHROW;
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 3b7f0cb..7b70d8b 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -39,6 +39,7 @@ 
         integer, parameter :: omp_alloctrait_val_kind = c_intptr_t
         integer, parameter :: omp_memspace_handle_kind = c_intptr_t
         integer, parameter :: omp_depend_kind = @OMP_DEPEND_KIND@
+        integer, parameter :: omp_event_handle_kind = c_intptr_t
         integer (omp_sched_kind), parameter :: omp_sched_static = 1
         integer (omp_sched_kind), parameter :: omp_sched_dynamic = 2
         integer (omp_sched_kind), parameter :: omp_sched_guided = 3
@@ -556,6 +557,13 @@ 
         end interface
 
         interface
+          subroutine omp_fulfill_event (event)
+            use omp_lib_kinds
+            integer (kind=omp_event_handle_kind), value, intent(in) :: event
+          end subroutine omp_fulfill_event
+        end interface
+
+        interface
           subroutine omp_set_affinity_format (format)
             character(len=*), intent(in) :: format
           end subroutine omp_set_affinity_format
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index eb1dcc4..5b4053f 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -82,10 +82,12 @@ 
 
       integer omp_allocator_handle_kind, omp_alloctrait_key_kind
       integer omp_alloctrait_val_kind, omp_memspace_handle_kind
+      integer omp_event_handle_kind
       parameter (omp_allocator_handle_kind = @INTPTR_T_KIND@)
       parameter (omp_alloctrait_key_kind = 4)
       parameter (omp_alloctrait_val_kind = @INTPTR_T_KIND@)
       parameter (omp_memspace_handle_kind = @INTPTR_T_KIND@)
+      parameter (omp_event_handle_kind = @INTPTR_T_KIND@)
       integer (omp_alloctrait_key_kind) omp_atk_sync_hint
       integer (omp_alloctrait_key_kind) omp_atk_alignment
       integer (omp_alloctrait_key_kind) omp_atk_access
@@ -245,6 +247,8 @@ 
       external omp_get_max_task_priority
       integer(4) omp_get_max_task_priority
 
+      external omp_fulfill_event
+
       external omp_set_affinity_format, omp_get_affinity_format
       external omp_display_affinity, omp_capture_affinity
       integer(4) omp_get_affinity_format
diff --git a/libgomp/task.c b/libgomp/task.c
index a95067c..a09a133 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -86,6 +86,7 @@  gomp_init_task (struct gomp_task *task, struct gomp_task *parent_task,
   task->dependers = NULL;
   task->depend_hash = NULL;
   task->depend_count = 0;
+  task->detach_event = NULL;
 }
 
 /* Clean up a task, after completing it.  */
@@ -326,6 +327,21 @@  gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
     }
 }
 
+uintptr_t
+GOMP_new_event ()
+{
+  struct gomp_allow_completion_event *event;
+
+  event = (struct gomp_allow_completion_event *)
+	    gomp_malloc (sizeof (struct gomp_allow_completion_event));
+  event->fulfilled = false;
+  gomp_sem_init (&event->completion_sem, 0);
+
+  gomp_debug (0, "GOMP_new_event: %p\n", event);
+
+  return (uintptr_t) event;
+}
+
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
    false, then we must not delay in executing the task.  If UNTIED is true,
    then the task may be executed by any member of the team.
@@ -347,11 +363,14 @@  gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
 void
 GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	   long arg_size, long arg_align, bool if_clause, unsigned flags,
-	   void **depend, int priority)
+	   void **depend, int priority, uintptr_t detach)
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr->ts.team;
 
+  struct gomp_allow_completion_event *detach_event =
+    detach ? (struct gomp_allow_completion_event *) detach : NULL;
+
 #ifdef HAVE_BROKEN_POSIX_SEMAPHORES
   /* If pthread_mutex_* is used for omp_*lock*, then each task must be
      tied to one thread all the time.  This means UNTIED tasks must be
@@ -404,6 +423,10 @@  GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
       task.final_task = (thr->task && thr->task->final_task)
 			|| (flags & GOMP_TASK_FLAG_FINAL);
       task.priority = priority;
+
+      if (detach)
+	task.detach_event = detach_event;
+
       if (thr->task)
 	{
 	  task.in_tied_task = thr->task->in_tied_task;
@@ -420,6 +443,10 @@  GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	}
       else
 	fn (data);
+
+      if (detach)
+	  gomp_sem_wait (&task.detach_event->completion_sem);
+
       /* Access to "children" is normally done inside a task_lock
 	 mutex region, but the only way this particular task.children
 	 can be set is if this thread's task work function (fn)
@@ -435,6 +462,7 @@  GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	  gomp_clear_parent (&task.children_queue);
 	  gomp_mutex_unlock (&team->task_lock);
 	}
+
       gomp_end_task ();
     }
   else
@@ -458,6 +486,8 @@  GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
       task->kind = GOMP_TASK_UNDEFERRED;
       task->in_tied_task = parent->in_tied_task;
       task->taskgroup = taskgroup;
+      if (detach)
+	task->detach_event = detach_event;
       thr->task = task;
       if (cpyfn)
 	{
@@ -1310,6 +1340,8 @@  gomp_barrier_handle_tasks (gomp_barrier_state_t state)
   int do_wake = 0;
 
   gomp_mutex_lock (&team->task_lock);
+  gomp_debug (0, "thread: %d, task_count %d\n",
+	      thr->ts.team_id, team->task_count);
   if (gomp_barrier_last_thread (state))
     {
       if (team->task_count == 0)
@@ -1388,34 +1420,87 @@  gomp_barrier_handle_tasks (gomp_barrier_state_t state)
 	  thr->task = task;
 	}
       else
-	return;
-      gomp_mutex_lock (&team->task_lock);
-      if (child_task)
 	{
-	 finish_cancelled:;
-	  size_t new_tasks
-	    = gomp_task_run_post_handle_depend (child_task, team);
-	  gomp_task_run_post_remove_parent (child_task);
-	  gomp_clear_parent (&child_task->children_queue);
-	  gomp_task_run_post_remove_taskgroup (child_task);
-	  to_free = child_task;
-	  child_task = NULL;
-	  if (!cancelled)
-	    team->task_running_count--;
-	  if (new_tasks > 1)
+	  bool ignored;
+
+	  gomp_mutex_lock (&team->task_lock);
+
+	  if (priority_queue_empty_p (&team->task_detach_queue,
+				      MEMMODEL_RELAXED))
 	    {
-	      do_wake = team->nthreads - team->task_running_count;
-	      if (do_wake > new_tasks)
-		do_wake = new_tasks;
+	      gomp_mutex_unlock (&team->task_lock);
+	      return;
 	    }
-	  if (--team->task_count == 0
-	      && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+
+	  child_task
+	    = priority_queue_next_task (PQ_TEAM, &team->task_detach_queue,
+					PQ_IGNORED, NULL,
+					&ignored);
+	  priority_queue_remove (PQ_TEAM, &team->task_detach_queue,
+				 child_task, MEMMODEL_RELAXED);
+	  --team->task_detach_count;
+	  if (!__atomic_load_n (&child_task->detach_event->fulfilled,
+				__ATOMIC_RELAXED))
 	    {
-	      gomp_team_barrier_done (&team->barrier, state);
+	      /* Wait for detached task to finish.  */
 	      gomp_mutex_unlock (&team->task_lock);
-	      gomp_team_barrier_wake (&team->barrier, 0);
+	      gomp_debug (0,
+			  "thread: %d, waiting for event to be fulfilled %p\n",
+			  thr->ts.team_id, child_task->detach_event);
+	      gomp_sem_wait (&child_task->detach_event->completion_sem);
 	      gomp_mutex_lock (&team->task_lock);
 	    }
+	  else
+	    gomp_debug (0, "thread: %d, queued event already fulfilled %p\n",
+			thr->ts.team_id, child_task->detach_event);
+	  goto finish_cancelled;
+	}
+      gomp_mutex_lock (&team->task_lock);
+      if (child_task)
+	{
+	  if (child_task->detach_event
+	      && !__atomic_load_n (&child_task->detach_event->fulfilled,
+				   __ATOMIC_RELAXED))
+	    {
+	      priority_queue_insert (PQ_TEAM, &team->task_detach_queue,
+				     child_task, child_task->priority,
+				     PRIORITY_INSERT_END,
+				     false, false);
+	      ++team->task_detach_count;
+	      gomp_debug (0, "thread: %d, queueing detached %p\n",
+			  thr->ts.team_id, child_task->detach_event);
+	      child_task = NULL;
+	    }
+	  else
+	    {
+	      if (child_task->detach_event)
+		gomp_debug (0, "thread: %d, event already fulfilled %p\n",
+			    thr->ts.team_id, child_task->detach_event);
+	     finish_cancelled:;
+	      size_t new_tasks
+		= gomp_task_run_post_handle_depend (child_task, team);
+	      gomp_task_run_post_remove_parent (child_task);
+	      gomp_clear_parent (&child_task->children_queue);
+	      gomp_task_run_post_remove_taskgroup (child_task);
+	      to_free = child_task;
+	      child_task = NULL;
+	      if (!cancelled)
+		team->task_running_count--;
+	      if (new_tasks > 1)
+		{
+		  do_wake = team->nthreads - team->task_running_count;
+		  if (do_wake > new_tasks)
+		    do_wake = new_tasks;
+		}
+	      if (--team->task_count == 0
+		  && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+		{
+		  gomp_team_barrier_done (&team->barrier, state);
+		  gomp_mutex_unlock (&team->task_lock);
+		  gomp_team_barrier_wake (&team->barrier, 0);
+		  gomp_mutex_lock (&team->task_lock);
+		}
+	    }
 	}
     }
 }
@@ -2326,3 +2411,18 @@  omp_in_final (void)
 }
 
 ialias (omp_in_final)
+
+void omp_fulfill_event(omp_event_handle_t event)
+{
+  struct gomp_allow_completion_event *ev =
+		(struct gomp_allow_completion_event *) event;
+
+  if (__atomic_load_n (&ev->fulfilled, __ATOMIC_RELAXED))
+    gomp_fatal ("omp_fulfill_enent: Event already fulfilled!\n");
+
+  gomp_debug(0, "omp_fulfill_event: %p\n", ev);
+  __atomic_store_n (&ev->fulfilled, true, __ATOMIC_RELAXED);
+  gomp_sem_post (&ev->completion_sem);
+}
+
+ialias (omp_fulfill_event)
diff --git a/libgomp/team.c b/libgomp/team.c
index cbc3aec..ee488f2 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -206,6 +206,9 @@  gomp_new_team (unsigned nthreads)
   team->work_share_cancelled = 0;
   team->team_cancelled = 0;
 
+  priority_queue_init (&team->task_detach_queue);
+  team->task_detach_count = 0;
+
   return team;
 }
 
@@ -221,6 +224,7 @@  free_team (struct gomp_team *team)
   gomp_barrier_destroy (&team->barrier);
   gomp_mutex_destroy (&team->task_lock);
   priority_queue_free (&team->task_queue);
+  priority_queue_free (&team->task_detach_queue);
   team_free (team);
 }
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
new file mode 100644
index 0000000..7f2319c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-1.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+omp_event_handle_t detach_event1, detach_event2;
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel
+  {
+    #pragma omp single
+    {
+      #pragma omp task detach(detach_event1)
+      {
+	x++;
+      }
+
+      #pragma omp task detach(detach_event2)
+      {
+	y++;
+	omp_fulfill_event (detach_event1);
+      }
+
+      #pragma omp task
+      {
+	z++;
+	omp_fulfill_event (detach_event2);
+      }
+    }
+    #pragma omp taskwait
+  }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
new file mode 100644
index 0000000..330c936
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-2.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel num_threads(1)
+  {
+    #pragma omp single
+    {
+      omp_event_handle_t detach_event1, detach_event2;
+
+      #pragma omp task detach(detach_event1)
+      {
+	x++;
+      }
+
+      #pragma omp task detach(detach_event2)
+      {
+	y++;
+	omp_fulfill_event (detach_event1);
+      }
+
+      #pragma omp task
+      {
+	z++;
+	omp_fulfill_event (detach_event2);
+      }
+    }
+    #pragma omp taskwait
+  }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
new file mode 100644
index 0000000..a16f5336
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-3.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run } */
+
+#include <omp.h>
+#include <assert.h>
+
+int main (void)
+{
+  int x = 0, y = 0, z = 0;
+
+  #pragma omp parallel
+  {
+    #pragma omp single
+    {
+      omp_event_handle_t detach_event;
+      int dep;
+
+      #pragma omp task depend(out:dep) detach(detach_event)
+      {
+	x++;
+      }
+
+      #pragma omp task
+      {
+	y++;
+	omp_fulfill_event(detach_event);
+      }
+
+      #pragma omp task depend(in:dep)
+      {
+	z++;
+      }
+    }
+    #pragma omp taskwait
+  }
+
+  assert (x == 1);
+  assert (y == 1);
+  assert (z == 1);
+}
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-1.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
new file mode 100644
index 0000000..20e3675
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-1.f90
@@ -0,0 +1,33 @@ 
+! { dg-do run }
+
+program task_detach_1
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+
+  !$omp parallel
+    !$omp single
+
+      !$omp task detach(detach_event1)
+        x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2)
+        y = y + 1
+	call omp_fulfill_event (detach_event1)
+      !$omp end task
+
+      !$omp task
+        z = z + 1
+	call omp_fulfill_event (detach_event2)
+      !$omp end task
+    !$omp end single
+
+    !$omp taskwait
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-2.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
new file mode 100644
index 0000000..bd0f016
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-2.f90
@@ -0,0 +1,33 @@ 
+! { dg-do run }
+
+program task_detach_2
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event1, detach_event2
+  integer :: x = 0, y = 0, z = 0
+
+  !$omp parallel num_threads(1)
+    !$omp single
+
+      !$omp task detach(detach_event1)
+        x = x + 1
+      !$omp end task
+
+      !$omp task detach(detach_event2)
+        y = y + 1
+	call omp_fulfill_event (detach_event1)
+      !$omp end task
+
+      !$omp task
+        z = z + 1
+	call omp_fulfill_event (detach_event2)
+      !$omp end task
+    !$omp end single
+
+    !$omp taskwait
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-3.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
new file mode 100644
index 0000000..8a2ae48
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-3.f90
@@ -0,0 +1,33 @@ 
+! { dg-do run }
+
+program task_detach_3
+
+  use omp_lib
+
+  integer (kind=omp_event_handle_kind) :: detach_event
+  integer :: x = 0, y = 0, z = 0
+  integer :: dep
+
+  !$omp parallel
+    !$omp single
+      !$omp task depend(out:dep) detach(detach_event)
+        x = x + 1
+      !$omp end task
+
+      !$omp task
+        y = y + 1
+	call omp_fulfill_event(detach_event)
+      !$omp end task
+
+      !$omp task depend(in:dep)
+        z = z + 1
+      !$omp end task
+    !$omp end single
+
+    !$omp taskwait
+  !$omp end parallel
+
+  if (x /= 1) stop 1
+  if (y /= 1) stop 2
+  if (z /= 1) stop 3
+end program