
openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]

Message ID f465050f-125d-846d-86f8-ee671602fb5a@codesourcery.com
State New
Series openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]

Commit Message

Kwok Cheung Yeung Jan. 21, 2021, 7:33 p.m. UTC
Hello

This patch addresses the intermittent hangs seen in the 
libgomp.c-c++-common/task-detach-6.c and libgomp.fortran/task-detach-6.f90 
tests.

The main problem is due to the 'omp taskwait' in the test. GOMP_taskwait can run 
tasks, so for correct semantics it needs to be able to place finished tasks that 
have unfulfilled completion events into the detach queue, rather than just 
finishing them immediately (in effect ignoring the detach clause).
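
For illustration, the shape that goes wrong is roughly this (a minimal 
sketch, not the actual testcase):

#include <omp.h>

int
main ()
{
  omp_event_handle_t ev;
  int x = 0;

  #pragma omp parallel
  #pragma omp single
  {
    #pragma omp task detach (ev)
    x = 1;

    #pragma omp task shared (ev)
    omp_fulfill_event (ev);

    /* Must wait until the first task's event has been fulfilled, not
       merely until its body has finished executing.  */
    #pragma omp taskwait
  }
  return 0;
}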

Unfinished tasks in the detach queue are still children of their parent task, so 
they can appear in next_task in the main GOMP_taskwait loop. If next_task is 
fulfilled then it can be finished immediately; otherwise the thread waits on 
taskwait_sem.

omp_fulfill_event needs to be able to post the taskwait_sem semaphore as well 
as wake the team barrier. Since that semaphore is located on the parent of the 
task whose completion event is being fulfilled, I have changed the event 
handle to be a pointer to the task itself, instead of just to its completion 
semaphore, so that the parent field can be accessed.
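
Concretely (condensed from the hunks below), the handle is now the task 
pointer cast both ways:

   /* GOMP_task, when a detach clause is present: */
   *(void **) detach = task;

   /* omp_fulfill_event: */
   struct gomp_task *task = (struct gomp_task *) event;
   struct gomp_task *parent = task->parent;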

This type of code is currently used to wake the threads for the team barrier:

   if (team->nthreads > team->task_running_count)
     gomp_team_barrier_wake (&team->barrier, 1);

This issues a gomp_team_barrier_wake if any of the threads are not running a 
task (and so might be sleeping). However, detached tasks that are queued 
waiting for a completion event are currently included in task_running_count 
(because the finish_cancelled code executed later decrements it). Since 
gomp_barrier_handle_tasks does not block while unfinished detached tasks 
remain (during development I found that blocking there could cause deadlocks 
in single-threaded code), threads could be sleeping even when team->nthreads 
== team->task_running_count, and this code would fail to wake them. I fixed 
this by decrementing task_running_count when queuing an unfinished detached 
task, and skipping the decrement in finish_cancelled if the task was a queued 
detached task. I added a new gomp_task_kind GOMP_TASK_DETACHED to mark this 
type of task.
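
Condensed from the gomp_barrier_handle_tasks hunk below, the accounting now 
looks like this:

   if (child_task->detach && !task_fulfilled_p (child_task))
     {
       child_task->kind = GOMP_TASK_DETACHED;
       /* ...queue on team->task_detach_queue...  */
       ++team->task_detach_count;
       --team->task_running_count;  /* No longer counted as running.  */
       child_task = NULL;
       continue;
     }
   /* ...and later, under finish_cancelled:  */
   if (!cancelled && child_task->kind != GOMP_TASK_DETACHED)
     team->task_running_count--;  /* Already decremented when queued.  */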

I have tried running the task-detach-6 testcases (C and Fortran) for 10,000 
iterations at a time using 32 threads, on an x86_64 Linux machine with GCC 
built with --disable-linux-futex, and saw no hangs. I have checked that it 
bootstraps, and noticed no regressions in the libgomp testsuite when run 
without offloading.

With Nvidia and GCN offloading though, task-detach-6 hangs... I _think_ the 
reason why it 'worked' before was because the taskwait allowed tasks with detach 
clauses to always complete immediately after execution. Since that backdoor has 
been closed, task-detach-6 hangs with or without the taskwait.

I think GOMP_taskgroup_end and maybe gomp_task_maybe_wait_for_dependencies 
need the same type of TLC, as they can also run tasks, but there are currently 
no tests that exercise this.

The detach support clearly needs more work, but is this particular patch okay 
for trunk?

Thanks

Kwok
From 12cc24c937e9294d5616dd0cd9a754c02ffb26fa Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Thu, 21 Jan 2021 05:38:47 -0800
Subject: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp
 tests [PR98738]

This adds support for the task detach clause to taskwait, and fixes a
number of problems related to semaphores that may lead to a hang in
some circumstances.

2021-01-21  Kwok Cheung Yeung  <kcy@codesourcery.com>

	libgomp/

	PR libgomp/98738
	* libgomp.h (enum gomp_task_kind): Add GOMP_TASK_DETACHED.
	* task.c (task_fulfilled_p): Check detach field as well.
	(GOMP_task): Use address of task as the event handle.
	(gomp_barrier_handle_tasks): Fix indentation.  Use address of task
	as event handle. Set kind of suspended detach task to
	GOMP_TASK_DETACHED and decrement task_running_count.  Move
	finish_cancelled block out of else branch.  Skip decrement of
	task_running_count if task kind is GOMP_TASK_DETACHED.
	(GOMP_taskwait): Finish fulfilled detach tasks.  Update comment.
	Queue detach tasks that have not been fulfilled.
	(omp_fulfill_event): Use address of task as event handle.  Post
	to taskwait_sem and taskgroup_sem if necessary.  Check
	task_running_count before calling gomp_team_barrier_wake.
	* testsuite/libgomp.c-c++-common/task-detach-5.c (main): Change
	data-sharing of detach events on enclosing parallel to private.
	* testsuite/libgomp.c-c++-common/task-detach-6.c (main): Likewise.
	* testsuite/libgomp.fortran/task-detach-5.f90 (task_detach_5):
	Likewise.
	* testsuite/libgomp.fortran/task-detach-6.f90 (task_detach_6):
	Likewise.
---
 libgomp/libgomp.h                                  |   5 +-
 libgomp/task.c                                     | 155 ++++++++++++++-------
 .../testsuite/libgomp.c-c++-common/task-detach-5.c |   2 +-
 .../testsuite/libgomp.c-c++-common/task-detach-6.c |   2 +-
 .../testsuite/libgomp.fortran/task-detach-5.f90    |   2 +-
 .../testsuite/libgomp.fortran/task-detach-6.f90    |   2 +-
 6 files changed, 115 insertions(+), 53 deletions(-)

Comments

Kwok Cheung Yeung Jan. 21, 2021, 10:46 p.m. UTC | #1
On 21/01/2021 7:33 pm, Kwok Cheung Yeung wrote:
> With Nvidia and GCN offloading though, task-detach-6 hangs... I _think_ the 
> reason why it 'worked' before was because the taskwait allowed tasks with detach 
> clauses to always complete immediately after execution. Since that backdoor has 
> been closed, task-detach-6 hangs with or without the taskwait.

It turns out that the hang occurs because the team barrier threads fail to 
wake up when gomp_team_barrier_wake is called from omp_fulfill_event while 
task_lock is held. When the lock is released first, the wake works as 
expected and the test completes.

Is this patch okay for trunk (to be squashed into the previous patch)?

Thanks

Kwok
From 2ee183c22772bc7d80d24ae75d5bd57f419712fd Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Thu, 21 Jan 2021 14:01:16 -0800
Subject: [PATCH] openmp: Fix hangs when task constructs with detach clauses
 are offloaded

2021-01-21  Kwok Cheung Yeung  <kcy@codesourcery.com>

	libgomp/
	* task.c (GOMP_task): Add thread to debug message.
	(gomp_barrier_handle_tasks): Do not take address of child_task in
	debug message.
	(omp_fulfill_event): Release team->task_lock before waking team
	barrier threads.
---
 libgomp/task.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libgomp/task.c b/libgomp/task.c
index dbd6284..60b598e 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -492,7 +492,7 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	  if (data)
 	    *(void **) data = task;
 
-	  gomp_debug (0, "New event: %p\n", task);
+	  gomp_debug (0, "Thread %d: new event: %p\n", thr->ts.team_id, task);
 	}
       thr->task = task;
       if (cpyfn)
@@ -1372,7 +1372,7 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
 				 child_task, MEMMODEL_RELAXED);
 	  --team->task_detach_count;
 	  gomp_debug (0, "thread %d: found task with fulfilled event %p\n",
-		      thr->ts.team_id, &child_task);
+		      thr->ts.team_id, child_task);
 
 	  if (to_free)
 	    {
@@ -2470,8 +2470,12 @@ omp_fulfill_event (omp_event_handle_t event)
       gomp_sem_post (&task->taskgroup->taskgroup_sem);
     }
   if (team && team->nthreads > team->task_running_count)
-    gomp_team_barrier_wake (&team->barrier, 1);
-  gomp_mutex_unlock (&team->task_lock);
+    {
+      gomp_mutex_unlock (&team->task_lock);
+      gomp_team_barrier_wake (&team->barrier, 1);
+    }
+  else
+    gomp_mutex_unlock (&team->task_lock);
 }
 
 ialias (omp_fulfill_event)
Jakub Jelinek Jan. 29, 2021, 3:03 p.m. UTC | #2
On Thu, Jan 21, 2021 at 07:33:34PM +0000, Kwok Cheung Yeung wrote:
> The detach support clearly needs more work, but is this particular patch
> okay for trunk?

Sorry for the delay.

I'm afraid it is far from being ready.

> @@ -2402,17 +2437,41 @@ ialias (omp_in_final)
>  void
>  omp_fulfill_event (omp_event_handle_t event)
>  {
> -  gomp_sem_t *sem = (gomp_sem_t *) event;
> +  struct gomp_task *task = (struct gomp_task *) event;
> +  struct gomp_task *parent = task->parent;
>    struct gomp_thread *thr = gomp_thread ();
>    struct gomp_team *team = thr ? thr->ts.team : NULL;
>  
> -  if (gomp_sem_getcount (sem) > 0)
> -    gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", sem);
> +  if (gomp_sem_getcount (&task->completion_sem) > 0)
> +    gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", task);

As written earlier, the intent of omp_fulfill_event is that it should be
callable from anywhere, not necessarily from one of the threads in the team.
The application could have other threads (often called unshackled threads)
from which it would call it, or just some other parallel or whatever else,
as long as it is not racy to pass the omp_event_handle_t there.
So,
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr ? thr->ts.team : NULL;
is incorrect; it will give you the team of the current thread rather than
the team of the task to be fulfilled.

It can also crash if team is NULL, which will happen any time
this is called outside of a parallel.  Just try (should go into testsuite
too):
#include <omp.h>

int
main ()
{
  omp_event_handle_t ev;
  #pragma omp task detach (ev)
  omp_fulfill_event (ev);
  return 0;
}

Additionally, there is an important difference between fulfilling for
included tasks and for non-included tasks: for the former there is no team
or anything to care about; for the latter there is a team and one needs to
take the task_lock, but at that point it can do pretty much everything in
omp_fulfill_event rather than handling it elsewhere.

So, what I'm suggesting is:

Replace
  bool detach;
  gomp_sem_t completion_sem;
with
  struct gomp_task_detach *detach;
and add struct gomp_task_detach that would contain everything that will be
needed (indirect so that we don't waste space for it in every task, but only
for those that have a detach clause).
We need:
1) some way to tell if it is an included task or not
2) for included tasks the gomp_sem_t completion_sem
(and nothing but 1) and 2) for those),
3) struct gomp_team * for non-included tasks
4) some way to find out if the task has finished and is just waiting for
the fulfill event (perhaps your GOMP_TASK_DETACHED is ok for that)
5) some way to find out if the task has been fulfilled already
(a gomp_sem_t for that seems overkill though)

1) could be done through the struct gomp_team *team; member,
set it to NULL in included tasks (no matter if they are in some team or not)
and to non-NULL team of the task (non-included tasks must have a team).

And I don't see the point of task_detach_queue if we can handle the
dependers etc. all in omp_fulfill_event, which I think we can if we take the
task_lock.

So, I think omp_fulfill_event should look at the task->detach it got.  If
task->detach->team is NULL, it is an included task: GOMP_task should have
initialized task->detach->completion_sem, omp_fulfill_event should just
gomp_sem_post it, and that is all; GOMP_task for an included task needs to
gomp_sem_wait after the task finishes, before it returns.

Otherwise, take the team's task_lock and look at whether the task is still
running; in that case just set the bool that says it has been fulfilled (or
whatever way of signalling 5 we choose; perhaps it can be, say, clearing the
task->detach pointer).  When creating non-included tasks with a detach
clause, GOMP_task would add the size needed for struct gomp_task_detach to
its gomp_malloc allocation.
But if the task is already in the GOMP_TASK_DETACHED state, we instead need,
while holding the task_lock, to do everything that would normally have been
done on task finish but was skipped because the task had not been fulfilled,
including the waking/sem_posts when something could be waiting on that task.
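
Put together, the proposed layout might look something like this (a sketch
only; the exact field names are illustrative):

struct gomp_task_detach
{
  /* 1)/3): NULL for included tasks (no matter if they are in some team or
     not); otherwise the team of the task.  */
  struct gomp_team *team;
  /* 2): for included tasks, GOMP_task blocks on this after the task body
     finishes, and omp_fulfill_event just posts it.  */
  gomp_sem_t completion_sem;
  /* 5): whether the event has already been fulfilled (alternatively,
     signalled by clearing the task's detach pointer); 4) is covered by the
     task's kind being GOMP_TASK_DETACHED.  */
  bool fulfilled;
};

/* In struct gomp_task, replacing the bool/semaphore pair: */
struct gomp_task_detach *detach;  /* Non-NULL only with a detach clause.  */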

Do you agree with this, or see some reason why this can't work?

And the testsuite should also include cases where we wait for tasks with a
detach clause to be fulfilled at the end of a taskgroup (i.e. we need to
cover all of taskwait, taskgroup end and barrier).
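
For instance, something along these lines for the taskgroup case (an
untested sketch):

#include <omp.h>

int
main ()
{
  omp_event_handle_t ev;
  int x = 0;

  #pragma omp parallel
  #pragma omp single
  #pragma omp taskgroup
  {
    #pragma omp task detach (ev)
    x = 1;

    #pragma omp task shared (ev)
    omp_fulfill_event (ev);
  }  /* The taskgroup must wait for the event, not just the task body.  */
  return 0;
}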

	Jakub
H.J. Lu Feb. 12, 2021, 2:36 p.m. UTC | #3
On Fri, Jan 29, 2021 at 7:53 AM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Jan 21, 2021 at 07:33:34PM +0000, Kwok Cheung Yeung wrote:
> > The detach support clearly needs more work, but is this particular patch
> > okay for trunk?
>
> Sorry for the delay.
>
> I'm afraid it is far from being ready.
> [...]

task-detach-6.f90 should be disabled for now.  It has been blocking my testers
for weeks.

--
H.J.

Patch

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index b4d0c93..b24de5c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -481,7 +481,10 @@  enum gomp_task_kind
      but not yet completed.  Once that completes, they will be readded
      into the queues as GOMP_TASK_WAITING in order to perform the var
      unmapping.  */
-  GOMP_TASK_ASYNC_RUNNING
+  GOMP_TASK_ASYNC_RUNNING,
+  /* Task that has finished executing but is waiting for its
+     completion event to be fulfilled.  */
+  GOMP_TASK_DETACHED
 };
 
 struct gomp_task_depend_entry
diff --git a/libgomp/task.c b/libgomp/task.c
index b242e7c..dbd6284 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -330,7 +330,7 @@  gomp_task_handle_depend (struct gomp_task *task, struct gomp_task *parent,
 static bool
 task_fulfilled_p (struct gomp_task *task)
 {
-  return gomp_sem_getcount (&task->completion_sem) > 0;
+  return task->detach && gomp_sem_getcount (&task->completion_sem) > 0;
 }
 
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
@@ -419,11 +419,11 @@  GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	{
 	  task.detach = true;
 	  gomp_sem_init (&task.completion_sem, 0);
-	  *(void **) detach = &task.completion_sem;
+	  *(void **) detach = &task;
 	  if (data)
-	    *(void **) data = &task.completion_sem;
+	    *(void **) data = &task;
 
-	  gomp_debug (0, "New event: %p\n", &task.completion_sem);
+	  gomp_debug (0, "New event: %p\n", &task);
 	}
 
       if (thr->task)
@@ -488,11 +488,11 @@  GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
 	{
 	  task->detach = true;
 	  gomp_sem_init (&task->completion_sem, 0);
-	  *(void **) detach = &task->completion_sem;
+	  *(void **) detach = task;
 	  if (data)
-	    *(void **) data = &task->completion_sem;
+	    *(void **) data = task;
 
-	  gomp_debug (0, "New event: %p\n", &task->completion_sem);
+	  gomp_debug (0, "New event: %p\n", task);
 	}
       thr->task = task;
       if (cpyfn)
@@ -1372,14 +1372,14 @@  gomp_barrier_handle_tasks (gomp_barrier_state_t state)
 				 child_task, MEMMODEL_RELAXED);
 	  --team->task_detach_count;
 	  gomp_debug (0, "thread %d: found task with fulfilled event %p\n",
-		      thr->ts.team_id, &child_task->completion_sem);
+		      thr->ts.team_id, &child_task);
 
-	if (to_free)
-	  {
-	    gomp_finish_task (to_free);
-	    free (to_free);
-	    to_free = NULL;
-	  }
+	  if (to_free)
+	    {
+	      gomp_finish_task (to_free);
+	      free (to_free);
+	      to_free = NULL;
+	    }
 	  goto finish_cancelled;
 	}
 
@@ -1452,41 +1452,43 @@  gomp_barrier_handle_tasks (gomp_barrier_state_t state)
 	{
 	  if (child_task->detach && !task_fulfilled_p (child_task))
 	    {
+	      child_task->kind = GOMP_TASK_DETACHED;
 	      priority_queue_insert (PQ_TEAM, &team->task_detach_queue,
 				     child_task, child_task->priority,
 				     PRIORITY_INSERT_END,
 				     false, false);
 	      ++team->task_detach_count;
-	      gomp_debug (0, "thread %d: queueing task with event %p\n",
-			  thr->ts.team_id, &child_task->completion_sem);
+	      --team->task_running_count;
+	      gomp_debug (0,
+			  "thread %d: queuing detached task with event %p\n",
+			  thr->ts.team_id, child_task);
 	      child_task = NULL;
+	      continue;
 	    }
-	  else
+
+	 finish_cancelled:;
+	  size_t new_tasks
+	    = gomp_task_run_post_handle_depend (child_task, team);
+	  gomp_task_run_post_remove_parent (child_task);
+	  gomp_clear_parent (&child_task->children_queue);
+	  gomp_task_run_post_remove_taskgroup (child_task);
+	  to_free = child_task;
+	  if (!cancelled && child_task->kind != GOMP_TASK_DETACHED)
+	    team->task_running_count--;
+	  child_task = NULL;
+	  if (new_tasks > 1)
 	    {
-	     finish_cancelled:;
-	      size_t new_tasks
-		= gomp_task_run_post_handle_depend (child_task, team);
-	      gomp_task_run_post_remove_parent (child_task);
-	      gomp_clear_parent (&child_task->children_queue);
-	      gomp_task_run_post_remove_taskgroup (child_task);
-	      to_free = child_task;
-	      child_task = NULL;
-	      if (!cancelled)
-		team->task_running_count--;
-	      if (new_tasks > 1)
-		{
-		  do_wake = team->nthreads - team->task_running_count;
-		  if (do_wake > new_tasks)
-		    do_wake = new_tasks;
-		}
-	      if (--team->task_count == 0
-		  && gomp_team_barrier_waiting_for_tasks (&team->barrier))
-		{
-		  gomp_team_barrier_done (&team->barrier, state);
-		  gomp_mutex_unlock (&team->task_lock);
-		  gomp_team_barrier_wake (&team->barrier, 0);
-		  gomp_mutex_lock (&team->task_lock);
-		}
+	      do_wake = team->nthreads - team->task_running_count;
+	      if (do_wake > new_tasks)
+		do_wake = new_tasks;
+	    }
+	  if (--team->task_count == 0
+	      && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+	    {
+	      gomp_team_barrier_done (&team->barrier, state);
+	      gomp_mutex_unlock (&team->task_lock);
+	      gomp_team_barrier_wake (&team->barrier, 0);
+	      gomp_mutex_lock (&team->task_lock);
 	    }
 	}
     }
@@ -1556,10 +1558,28 @@  GOMP_taskwait (void)
 	      goto finish_cancelled;
 	    }
 	}
+      else if (next_task->kind == GOMP_TASK_DETACHED
+	       && task_fulfilled_p (next_task))
+	{
+	  child_task = next_task;
+	  gomp_debug (0, "thread %d: found task with fulfilled event %p\n",
+		      thr->ts.team_id, &child_task);
+	  priority_queue_remove (PQ_TEAM, &team->task_detach_queue,
+				 child_task, MEMMODEL_RELAXED);
+	  --team->task_detach_count;
+	  if (to_free)
+	    {
+	      gomp_finish_task (to_free);
+	      free (to_free);
+	      to_free = NULL;
+	    }
+	  goto finish_cancelled;
+	}
       else
 	{
 	/* All tasks we are waiting for are either running in other
-	   threads, or they are tasks that have not had their
+	   threads, are detached and waiting for the completion event to be
+	   fulfilled, or they are tasks that have not had their
 	   dependencies met (so they're not even in the queue).  Wait
 	   for them.  */
 	  if (task->taskwait == NULL)
@@ -1614,6 +1634,21 @@  GOMP_taskwait (void)
       gomp_mutex_lock (&team->task_lock);
       if (child_task)
 	{
+	  if (child_task->detach && !task_fulfilled_p (child_task))
+	    {
+	      child_task->kind = GOMP_TASK_DETACHED;
+	      priority_queue_insert (PQ_TEAM, &team->task_detach_queue,
+				     child_task, child_task->priority,
+				     PRIORITY_INSERT_END,
+				     false, false);
+	      ++team->task_detach_count;
+	      gomp_debug (0,
+			  "thread %d: queuing detached task with event %p\n",
+			  thr->ts.team_id, child_task);
+	      child_task = NULL;
+	      continue;
+	    }
+
 	 finish_cancelled:;
 	  size_t new_tasks
 	    = gomp_task_run_post_handle_depend (child_task, team);
@@ -2402,17 +2437,41 @@  ialias (omp_in_final)
 void
 omp_fulfill_event (omp_event_handle_t event)
 {
-  gomp_sem_t *sem = (gomp_sem_t *) event;
+  struct gomp_task *task = (struct gomp_task *) event;
+  struct gomp_task *parent = task->parent;
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr ? thr->ts.team : NULL;
 
-  if (gomp_sem_getcount (sem) > 0)
-    gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", sem);
+  if (gomp_sem_getcount (&task->completion_sem) > 0)
+    gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", task);
 
-  gomp_debug (0, "omp_fulfill_event: %p\n", sem);
-  gomp_sem_post (sem);
-  if (team)
+  gomp_debug (0, "omp_fulfill_event: %p\n", task);
+  gomp_sem_post (&task->completion_sem);
+
+  /* Wake up any threads that may be waiting for the detached task
+     to complete.  */
+  gomp_mutex_lock (&team->task_lock);
+  if (parent && parent->taskwait)
+    {
+      if (parent->taskwait->in_taskwait)
+	{
+	  parent->taskwait->in_taskwait = false;
+	  gomp_sem_post (&parent->taskwait->taskwait_sem);
+	}
+      else if (parent->taskwait->in_depend_wait)
+	{
+	  parent->taskwait->in_depend_wait = false;
+	  gomp_sem_post (&parent->taskwait->taskwait_sem);
+	}
+    }
+  if (task->taskgroup && task->taskgroup->in_taskgroup_wait)
+    {
+      task->taskgroup->in_taskgroup_wait = false;
+      gomp_sem_post (&task->taskgroup->taskgroup_sem);
+    }
+  if (team && team->nthreads > team->task_running_count)
     gomp_team_barrier_wake (&team->barrier, 1);
+  gomp_mutex_unlock (&team->task_lock);
 }
 
 ialias (omp_fulfill_event)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
index 5a01517..71bcde9 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
@@ -12,7 +12,7 @@  int main (void)
   int thread_count;
   omp_event_handle_t detach_event1, detach_event2;
 
-  #pragma omp parallel firstprivate(detach_event1, detach_event2)
+  #pragma omp parallel private(detach_event1, detach_event2)
   {
     #pragma omp single
       thread_count = omp_get_num_threads();
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
index b5f68cc..e7af05a 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
@@ -14,7 +14,7 @@  int main (void)
   omp_event_handle_t detach_event1, detach_event2;
 
   #pragma omp target map(tofrom: x, y, z) map(from: thread_count)
-    #pragma omp parallel firstprivate(detach_event1, detach_event2)
+    #pragma omp parallel private(detach_event1, detach_event2)
       {
 	#pragma omp single
 	  thread_count = omp_get_num_threads();
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-5.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
index 955d687..8bebb5c 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
@@ -10,7 +10,7 @@  program task_detach_5
   integer :: x = 0, y = 0, z = 0
   integer :: thread_count
 
-  !$omp parallel firstprivate(detach_event1, detach_event2)
+  !$omp parallel private(detach_event1, detach_event2)
     !$omp single
       thread_count = omp_get_num_threads()
     !$omp end single
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90 b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
index 0fe2155..437ca66 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
@@ -12,7 +12,7 @@  program task_detach_6
   integer :: thread_count
 
   !$omp target map(tofrom: x, y, z) map(from: thread_count)
-    !$omp parallel firstprivate(detach_event1, detach_event2)
+    !$omp parallel private(detach_event1, detach_event2)
       !$omp single
 	thread_count = omp_get_num_threads()
       !$omp end single