libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
diff mbox

Message ID 20150414151502.5ca882a4@octopus
State New
Headers show

Commit Message

Julian Brown April 14, 2015, 2:15 p.m. UTC
On Wed, 8 Apr 2015 17:58:56 +0300
Ilya Verbin <iverbin@gmail.com> wrote:

> On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
> > This version is mostly the same as the last posted version but has a
> > tweak in GOACC_parallel to account for the new splay tree
> > arrangement for target functions:
> > 
> > -      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
> > +      tgt_fn = (void (*)) tgt_fn_key->tgt_offset;
> > 
> > Have there been any other changes I might have missed?
> 
> No.
> 
> > It passes libgomp testing on NVPTX. OK?
> 
> Have you tested it with disabled offloading?
> 
> I see several regressions:
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

I think there may be multiple issues here. The attached patch addresses
one -- acc_device_type not distinguishing between "offloaded" and host
code with the host_nonshm plugin.

The other problem is that it appears that the ACC_DEVICE_TYPE
environment variable is not getting set properly on the target for (any
of) the OpenACC tests: this means a lot of the time the "wrong" plugin
is being tested, and means that the above tests (and several others)
still fail. That will apparently need some more engineering (on our
part).

(Not asking for review just yet, JFYI.)

Julian

ChangeLog

    libgomp/
    * oacc-init.c (acc_on_device): Check whether we're in an offloaded
    region for host_nonshm plugin.
    * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
    nonshm_exec flag in thread-local data.
    (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
    data for host_nonshm plugin.
    (+GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
    for host_nonshm plugin.
    * plugin/plugin-host.h: New.

Comments

Thomas Schwinge April 14, 2015, 3:34 p.m. UTC | #1
Hi!

On Tue, 14 Apr 2015 15:15:02 +0100, Julian Brown <julian@codesourcery.com> wrote:
> The other problem is that it appears that the ACC_DEVICE_TYPE
> environment variable is not getting set properly on the target for (any
> of) the OpenACC tests: this means a lot of the time the "wrong" plugin
> is being tested, and means that the above tests (and several others)
> still fail. That will apparently need some more engineering (on our
> part).

This should be working fine for "local" testing (that is, build-tree
testing, without a remote board).  Setting/communicating environment
variables to remote boards (which we use a lot for our internal testing)
indeed does not work in DejaGnu.  This has been reported several times in
the last years, by different people, but nobody ever came up with a
solution that was sufficiently generic so that was acceptable for
inclusion.  (Need a way to specify whether environment variables should
be set for the host and/or target system, and so on.)


For the problem at hand, I once had a different suggestion, which I'm
paraphrasing here: the existing code should be changed such that when
-foffload=nvptx,intelmic,... is specified (also considering a comple-time
default value based on --enable-offload-targets=[...]), the first
offloading target (nvptx in my example) is the one that is used by
default (acc_device_default) with OpenACC, and for OpenMP as device ID 0,
and so on.  That way, we could compile the libgomp test cases with
-foffload=$offload_target_openacc (see
libgomp/testsuite/libgomp.oacc-c/c.exp for context), and wouldn't have to
care about the ACC_DEVICE_TYPE environment variable anymore.

Does that make sense?


Grüße,
 Thomas
Thomas Schwinge April 14, 2015, 3:43 p.m. UTC | #2
Hi!

On Tue, 14 Apr 2015 15:15:02 +0100, Julian Brown <julian@codesourcery.com> wrote:
> On Wed, 8 Apr 2015 17:58:56 +0300
> Ilya Verbin <iverbin@gmail.com> wrote:
> > I see several regressions:
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> 
> I think there may be multiple issues here. The attached patch addresses
> one -- acc_device_type not distinguishing between "offloaded" and host
> code with the host_nonshm plugin.

(You mean acc_on_device?)

> --- libgomp/oacc-init.c	(revision 221922)
> +++ libgomp/oacc-init.c	(working copy)
> @@ -548,7 +549,14 @@ ialias (acc_set_device_num)
>  int
>  acc_on_device (acc_device_t dev)
>  {
> -  if (acc_get_device_type () == acc_device_host_nonshm)
> +  struct goacc_thread *thr = goacc_thread ();
> +
> +  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
> +     code -- i.e. within a parallel region.  Test a flag set by the
> +     openacc_parallel hook of the host_nonshm plugin to determine that.  */
> +  if (acc_get_device_type () == acc_device_host_nonshm
> +      && thr && thr->target_tls
> +      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
>      return dev == acc_device_host_nonshm || dev == acc_device_not_host;
>  
>    /* Just rely on the compiler builtin.  */

Really, acc_on_device is implemented as a compiler builtin (which is just
disabled for a few libgomp test cases, in order to test the acc_on_device
library function in libgomp), and I never understood why the "fallback"
implementation in libgomp (cited above) should be doing anything
different from the GCC builtin.  Is the "problem" actually, that some
libgomp test cases are expecting from acc_on_device for
acc_device_host_nonshm a different answer than the one they're currently
getting?  What is the expected answer?  Given that the OpenACC
specification doesn't talk about a host_nonshm device type, can we
accordingly define what the expected behavior is, so that we can just
have libgomp/oacc-init.c:acc_on_device »rely on the compiler builtin«?


Grüße,
 Thomas
Julian Brown April 17, 2015, 9:54 a.m. UTC | #3
On Tue, 14 Apr 2015 15:15:02 +0100
Julian Brown <julian@codesourcery.com> wrote:

> On Wed, 8 Apr 2015 17:58:56 +0300
> Ilya Verbin <iverbin@gmail.com> wrote:
> 
> > On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
> > > This version is mostly the same as the last posted version but
> > > has a tweak in GOACC_parallel to account for the new splay tree
> > > arrangement for target functions:
> > > 
> > > -      tgt_fn = (void (*)) tgt_fn_key->tgt->tgt_start;
> > > +      tgt_fn = (void (*)) tgt_fn_key->tgt_offset;
> > > 
> > > Have there been any other changes I might have missed?
> > 
> > No.
> > 
> > > It passes libgomp testing on NVPTX. OK?
> > 
> > Have you tested it with disabled offloading?
> > 
> > I see several regressions:
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> 
> I think there may be multiple issues here. The attached patch
> addresses one -- acc_device_type not distinguishing between
> "offloaded" and host code with the host_nonshm plugin.

The patch appears to fix the original issue after all: I've re-run
tests with host==target and the failures no longer appear. Also the
same has been noted by Dominique d'Humieres in PR65742.

> The other problem is that it appears that the ACC_DEVICE_TYPE
> environment variable is not getting set properly on the target for
> (any of) the OpenACC tests: this means a lot of the time the "wrong"
> plugin is being tested, and means that the above tests (and several
> others) still fail. That will apparently need some more engineering
> (on our part).

Fixing this turns out to require more DejaGNU-fu than I have: AFAICT,
setting a per-test environment variable from an .exp file can't easily
be done at present. The potentially useful-looking
{dg-}set-target-env-var doesn't look quite suitable for this purpose,
and besides which doesn't actually seem to be implemented for host !=
target anyway.

(At least, if this fragment of gcc-dg.exp is anything to go by:

   if { [info exists set_target_env_var] \
        && [llength $set_target_env_var] != 0 } {
     if { [is_remote target] } {
       return [list "unsupported" ""]
     } ...
).

So: OK for trunk?

Thanks,

Julian

> ChangeLog
> 
>     libgomp/
>     * oacc-init.c (acc_on_device): Check whether we're in an offloaded
>     region for host_nonshm plugin.
>     * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
>     nonshm_exec flag in thread-local data.
>     (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
>     data for host_nonshm plugin.
>     (+GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local
> data for host_nonshm plugin.
>     * plugin/plugin-host.h: New.
Jakub Jelinek April 17, 2015, 1:16 p.m. UTC | #4
On Tue, Apr 14, 2015 at 05:43:26PM +0200, Thomas Schwinge wrote:
> On Tue, 14 Apr 2015 15:15:02 +0100, Julian Brown <julian@codesourcery.com> wrote:
> > On Wed, 8 Apr 2015 17:58:56 +0300
> > Ilya Verbin <iverbin@gmail.com> wrote:
> > > I see several regressions:
> > > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
> > > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> > > FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> > > -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> > 
> > I think there may be multiple issues here. The attached patch addresses
> > one -- acc_device_type not distinguishing between "offloaded" and host
> > code with the host_nonshm plugin.
> 
> (You mean acc_on_device?)
> 
> > --- libgomp/oacc-init.c	(revision 221922)
> > +++ libgomp/oacc-init.c	(working copy)
> > @@ -548,7 +549,14 @@ ialias (acc_set_device_num)
> >  int
> >  acc_on_device (acc_device_t dev)
> >  {
> > -  if (acc_get_device_type () == acc_device_host_nonshm)
> > +  struct goacc_thread *thr = goacc_thread ();
> > +
> > +  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
> > +     code -- i.e. within a parallel region.  Test a flag set by the
> > +     openacc_parallel hook of the host_nonshm plugin to determine that.  */
> > +  if (acc_get_device_type () == acc_device_host_nonshm
> > +      && thr && thr->target_tls
> > +      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
> >      return dev == acc_device_host_nonshm || dev == acc_device_not_host;
> >  
> >    /* Just rely on the compiler builtin.  */
> 
> Really, acc_on_device is implemented as a compiler builtin (which is just
> disabled for a few libgomp test cases, in order to test the acc_on_device
> library function in libgomp), and I never understood why the "fallback"
> implementation in libgomp (cited above) should be doing anything
> different from the GCC builtin.  Is the "problem" actually, that some

The question is if the builtin expansion isn't wrong, at least as long as
the host_nonshm device is meant to be supported.  The
#ifdef ACCEL_COMPILER
case is easier, at least as long as ACCEL_COMPILER compiled code is not
meant to be able to offload to other devices (or host again), but the
non-ACCEL_COMPILER case means the code is either on the host, or
host_nonshm, or e.g. with Intel MIC you could have some shared library be
compiled by the host compiler, but then actuall linked into the MIC
offloaded path.  In all those cases, I think it is just the library that
can determine the return value.

E.g. OpenMP omp_is_initial_device function is also only implemented in the
library, perhaps at some point I could expand it for #ifdef ACCEL_COMPILER
as builtin, but not for the host code, at least not due to the host-nonshm
plugin.

	Jakub

Patch
diff mbox

Index: libgomp/oacc-init.c
===================================================================
--- libgomp/oacc-init.c	(revision 221922)
+++ libgomp/oacc-init.c	(working copy)
@@ -29,6 +29,7 @@ 
 #include "libgomp.h"
 #include "oacc-int.h"
 #include "openacc.h"
+#include "plugin/plugin-host.h"
 #include <assert.h>
 #include <stdlib.h>
 #include <strings.h>
@@ -548,7 +549,14 @@  ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
-  if (acc_get_device_type () == acc_device_host_nonshm)
+  struct goacc_thread *thr = goacc_thread ();
+
+  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
+     code -- i.e. within a parallel region.  Test a flag set by the
+     openacc_parallel hook of the host_nonshm plugin to determine that.  */
+  if (acc_get_device_type () == acc_device_host_nonshm
+      && thr && thr->target_tls
+      && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
   /* Just rely on the compiler builtin.  */
Index: libgomp/plugin/plugin-host.c
===================================================================
--- libgomp/plugin/plugin-host.c	(revision 221922)
+++ libgomp/plugin/plugin-host.c	(working copy)
@@ -44,6 +44,7 @@ 
 #include <stdlib.h>
 #include <string.h>
 #include <stdio.h>
+#include <stdbool.h>
 
 #ifdef HOST_NONSHM_PLUGIN
 #define STATIC
@@ -55,6 +56,10 @@ 
 #define SELF "host: "
 #endif
 
+#ifdef HOST_NONSHM_PLUGIN
+#include "plugin-host.h"
+#endif
+
 STATIC const char *
 GOMP_OFFLOAD_get_name (void)
 {
@@ -174,7 +179,10 @@  GOMP_OFFLOAD_openacc_parallel (void (*fn
 			       void *targ_mem_desc __attribute__ ((unused)))
 {
 #ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread ();
+  thd->nonshm_exec = true;
   fn (devaddrs);
+  thd->nonshm_exec = false;
 #else
   fn (hostaddrs);
 #endif
@@ -232,11 +240,20 @@  STATIC void *
 GOMP_OFFLOAD_openacc_create_thread_data (int ord
 					 __attribute__ ((unused)))
 {
+#ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd
+    = GOMP_PLUGIN_malloc (sizeof (struct nonshm_thread));
+  thd->nonshm_exec = false;
+  return thd;
+#else
   return NULL;
+#endif
 }
 
 STATIC void
-GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
-					  __attribute__ ((unused)))
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data)
 {
+#ifdef HOST_NONSHM_PLUGIN
+  free (tls_data);
+#endif
 }
Index: libgomp/plugin/plugin-host.h
===================================================================
--- libgomp/plugin/plugin-host.h	(revision 0)
+++ libgomp/plugin/plugin-host.h	(revision 0)
@@ -0,0 +1,37 @@ 
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef PLUGIN_HOST_H
+#define PLUGIN_HOST_H
+
+struct nonshm_thread
+{
+  bool nonshm_exec;
+};
+
+#endif