Patchwork PR libgomp/56073: benchmark regression due to PR libgomp/51376 fix

login
register
mail settings
Submitter Alan Modra
Date Jan. 23, 2013, 1:11 a.m.
Message ID <20130123011152.GO3244@bubble.grove.modra.org>
Download mbox | patch
Permalink /patch/214706/
State New
Headers show

Comments

Alan Modra - Jan. 23, 2013, 1:11 a.m.
On Tue, Jan 22, 2013 at 02:20:23PM +0100, Torvald Riegel wrote:
> > @@ -116,11 +116,10 @@ GOMP_task (void (*fn) (void *), void *data, void (
> >  	}
> >        else
> >  	fn (data);
> > -      if (team != NULL)
> > +      if (task.children != NULL)
> 
> Why does this not need an acquire barrier?

I explained that in my patch submission.  Briefly, the only way this
particular task.children can be set is if this thread's task work
function creates children.  So since the setter is *this* thread, we
need no barriers.  We can have task.children set by the current
thread then changed by a child thread, but seeing a stale non-NULL
value is not a problem here.  Once past the task_lock acquisition,
this thread will see the real value of task.children.

> >  	{
> >  	  gomp_mutex_lock (&team->task_lock);
> > -	  if (task.children != NULL)
> 
> Are you sure that you can remove this check to task.children here,
> especially if you don't use an acquire barrier previously?
> 
> Can there be several threads trying to call gomp_clear_parent?

Yes and yes.  gomp_clear_parent handles a NULL arg, and as you can
see, runs inside a task_lock mutex region here.

> > -	    gomp_clear_parent (task.children);
> > +	  gomp_clear_parent (task.children);
> >  	  gomp_mutex_unlock (&team->task_lock);
> 
> Please add comments or reference the comments in the other pieces of
> synchronizing code related to this.

Please note that all of the above is a reversion!  Adding a comment
has the risk that the comment is *wrong*.  That said, I appreciate the
need to document threaded code well..

> > -  if (task == NULL || team == NULL)
> > +  if (task == NULL
> > +      || __atomic_load_n (&task->children, MEMMODEL_ACQUIRE) == NULL)
> 
> Please mention with which stores this acquire load synchronizes.

How does this look?

	* task.c (GOMP_task, GOMP_taskwait): Comment.
Torvald Riegel - Jan. 23, 2013, 10:39 a.m.
On Wed, 2013-01-23 at 11:41 +1030, Alan Modra wrote:
> On Tue, Jan 22, 2013 at 02:20:23PM +0100, Torvald Riegel wrote:
> > > @@ -116,11 +116,10 @@ GOMP_task (void (*fn) (void *), void *data, void (
> > >  	}
> > >        else
> > >  	fn (data);
> > > -      if (team != NULL)
> > > +      if (task.children != NULL)
> > 
> > Why does this not need an acquire barrier?
> 
> I explained that in my patch submission.

Right, I missed this when looking at just the code later.  Sorry.

> Briefly, the only way this
> particular task.children can be set is if this thread's task work
> function creates children.  So since the setter is *this* thread, we
> need no barriers.  We can have task.children set by the current
> thread then changed by a child thread, but seeing a stale non-NULL
> value is not a problem here.  Once past the task_lock acquisition,
> this thread will see the real value of task.children.
> 
> > >  	{
> > >  	  gomp_mutex_lock (&team->task_lock);
> > > -	  if (task.children != NULL)
> > 
> > Are you sure that you can remove this check to task.children here,
> > especially if you don't use an acquire barrier previously?
> > 
> > Can there be several threads trying to call gomp_clear_parent?
> 
> Yes and yes.  gomp_clear_parent handles a NULL arg, and as you can
> see, runs inside a task_lock mutex region here.
> 
> > > -	    gomp_clear_parent (task.children);
> > > +	  gomp_clear_parent (task.children);
> > >  	  gomp_mutex_unlock (&team->task_lock);
> > 
> > Please add comments or reference the comments in the other pieces of
> > synchronizing code related to this.
> 
> Please note that all of the above is a reversion!

I understand.  But you took the time to look at this in detail, so it
would be a shame if this would be "hidden" in the email for the patch,
not in the code.

> Adding a comment
> has the risk that the comment is *wrong*.

That's true, but I think this would be still better than adding no
comment.  If the comments summarize the intent/understanding of you or
someone else, and the code doesn't match, then it's easier for everyone
to see that something might be wrong.  This might be unnecessary or even
counter-productive for sequential code, but I think that concurrent code
is often sufficiently complex for the additional redundancy to be
worthwhile (e.g., because you often need global reasoning in contrast to
typically local reasoning for sequential code, etc.).

> That said, I appreciate the
> need to document threaded code well..
> 
> > > -  if (task == NULL || team == NULL)
> > > +  if (task == NULL
> > > +      || __atomic_load_n (&task->children, MEMMODEL_ACQUIRE) == NULL)
> > 
> > Please mention with which stores this acquire load synchronizes.
> 
> How does this look?

Thanks!  This looks good.

Torvald

> 	* task.c (GOMP_task, GOMP_taskwait): Comment.
> 
> Index: libgomp/task.c
> ===================================================================
> --- libgomp/task.c	(revision 195370)
> +++ libgomp/task.c	(working copy)
> @@ -116,6 +116,15 @@
>  	}
>        else
>  	fn (data);
> +      /* Access to "children" is normally done inside a task_lock
> +	 mutex region, but the only way this particular task.children
> +	 can be set is if this thread's task work function (fn)
> +	 creates children.  So since the setter is *this* thread, we
> +	 need no barriers here when testing for non-NULL.  We can have
> +	 task.children set by the current thread then changed by a
> +	 child thread, but seeing a stale non-NULL value is not a
> +	 problem.  Once past the task_lock acquisition, this thread
> +	 will see the real value of task.children.  */
>        if (task.children != NULL)
>  	{
>  	  gomp_mutex_lock (&team->task_lock);
> @@ -296,6 +305,12 @@
>    struct gomp_task *child_task = NULL;
>    struct gomp_task *to_free = NULL;
>  
> +  /* The acquire barrier on load of task->children here synchronizes
> +     with the write of a NULL in gomp_barrier_handle_tasks.  It is
> +     not necessary that we synchronize with other non-NULL writes at
> +     this point, but we must ensure that all writes to memory by a
> +     child thread task work function are seen before we exit from
> +     GOMP_taskwait.  */
>    if (task == NULL
>        || __atomic_load_n (&task->children, MEMMODEL_ACQUIRE) == NULL)
>      return;
>

Patch

Index: libgomp/task.c
===================================================================
--- libgomp/task.c	(revision 195370)
+++ libgomp/task.c	(working copy)
@@ -116,6 +116,15 @@ 
 	}
       else
 	fn (data);
+      /* Access to "children" is normally done inside a task_lock
+	 mutex region, but the only way this particular task.children
+	 can be set is if this thread's task work function (fn)
+	 creates children.  So since the setter is *this* thread, we
+	 need no barriers here when testing for non-NULL.  We can have
+	 task.children set by the current thread then changed by a
+	 child thread, but seeing a stale non-NULL value is not a
+	 problem.  Once past the task_lock acquisition, this thread
+	 will see the real value of task.children.  */
       if (task.children != NULL)
 	{
 	  gomp_mutex_lock (&team->task_lock);
@@ -296,6 +305,12 @@ 
   struct gomp_task *child_task = NULL;
   struct gomp_task *to_free = NULL;
 
+  /* The acquire barrier on load of task->children here synchronizes
+     with the write of a NULL in gomp_barrier_handle_tasks.  It is
+     not necessary that we synchronize with other non-NULL writes at
+     this point, but we must ensure that all writes to memory by a
+     child thread task work function are seen before we exit from
+     GOMP_taskwait.  */
   if (task == NULL
       || __atomic_load_n (&task->children, MEMMODEL_ACQUIRE) == NULL)
     return;