diff mbox

[0/10] OpenACC 2.0 support for libgomp

Message ID 20140924123231.GB17454@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Sept. 24, 2014, 12:32 p.m. UTC
On Tue, Sep 23, 2014 at 07:17:25PM +0100, Julian Brown wrote:
> The upcoming patch series constitutes our current (still in-progress)
> implementation of run-time support for OpenACC 2.0 in libgomp. We've
> tried to build on top of the (also currently WIP) support for OpenMP
> 4.0's "target" construct, sharing code where possible: because of this,
> I've also prepared versions of (a fairly minimal, hopefully correct set
> of) prerequisite patches that apply to current mainline (and were
> previously on the gomp 4.0 branch), although in many cases we weren't
> the original authors of those.
> 
> Other parts of the OpenACC support for GCC are being sent upstream
> concurrently with this runtime support (and are co-dependent with it),
> so unfortunately, though the main part of the implementation (part 7/10)
> works on our internal branch, I haven't yet been able to convincingly
> test the series I'm about to post upstream. However this code will be
> useful to others who are posting their bits of OpenACC support
> upstream, so perhaps it'd be useful to commit it anyway (we have to
> start somewhere!).
> 
> I've tried to retain proper attribution for all the forthcoming patches,
> but I may have made mistakes. Please let me know if so!

Just random comments about all the 10 patches:


oacc-init.c:__thread  void *ACC_handle;
oacc-init.c:static __thread int handle_num = -1;
oacc-init.c:static __thread struct gomp_device_descr const *saved_bound_dev;
oacc-mem.c:__thread struct memmap_t *ACC_memmap;
oacc-parallel.c:static __thread struct devgeom devgeom = { 1, 1, 1 };
oacc-parallel.c:static __thread struct target_mem_desc *mapped_data = NULL;

Do you really need all those __thread vars?  As libgomp uses IE model
for performance reasons, growing the total size too much might very well
mean that the dynamic linker will refuse to dlopen it.  Couldn't you e.g.
use just a single __thread pointer to a struct that will contain all of
this?  Also, note that libgomp must be supported also for the
!HAVE_TLS case, where you shouldn't use __thread at all, use
pthread_getspecific etc. instead (so it would really help if you'd just
use a single pointer).

+void
+gomp_notify(const char *msg, ...)

Formatting, missing space before (.

   char bind_var;
+  int acc_notify_var;
   /* Internal ICV.  */
   struct target_mem_desc *target_data;

This is again in TLS, and duplicated/copied on any OpenMP parallel/task
etc., so it also affects performance of #pragma omp parallel/task.
Why do you need to put ACC stuff in there?  Can't it live in
target_data or elsewhere?

+       gomp_plugin_malloc;
+       gomp_plugin_malloc_cleared;
...
Please use GOMP_PLUGIN_ instead.  Also, please make sure the entrypoints
libgomp looks for in the plugins have similar/same prefix.

+__attribute__((used)) static void
+dump_mappings (FILE *f, splay_tree_node node)
+{

IMHO this should be guarded by some define, while it can be useful for
debugging the library, it is unneeded for production libgomp.

+  if (device->get_caps_func () & TARGET_CAP_OPENMP_400)
+    DLSYM (device_run);
+  if (device->get_caps_func () & TARGET_CAP_OPENACC_200)

Cache the return value?  Also, I must say I'm not particularly excited
about different plugins not supporting both OpenMP 4.0 and OpenACC 2.0
offloading.  Why is that needed?

+      /* Make sure all the CUDA functions are there if any of them are.  */
+      if (optional_present && optional_present != optional_total)
+       {
+         err = "plugin missing OpenACC CUDA handler function";
+         goto out;
+       }

So, any plugin that doesn't support CUDA will not support OpenACC?
I hoped OpenACC would not be so tied to one particular HW...

//#define DEBUG
//#define DISABLE_ASYNC

I don't like these in the files, debug flags should be enabled in
the Makefile instead IMHO.

#ifdef DEBUG
  fprintf (stderr, "libgomp plugin: %s:%s using stream %p (CUDA stream %p) "
           "for async %d\n", __FILE__, __FUNCTION__, stream,
           stream ? stream->stream : NULL, orig_async);
#endif

I'd find it cleaner to create a macro like gomp_debug_printf which
would expand to nothing and ignore all the arguments unless some debug
macro is defined, instead of using #ifdef everywhere.  Or drop these
from the patch.

const int TARGET_TYPE_HOST = 0;

libgomp isn't written in C++, please use #define or enum.  And, isn't
this already defined in target.h?

	(gomp_map_vars, gomp_unmap_tgt, gomp_unmap_vars, gomp_update):
	  Use these.
	(resolve_device, gomp_find_available_plugins): Remove ID 257
	  hack.

that is not how ChangeLog entries should look like, if a line is not
starting with ( after the tab, it should not contain extra spaces
after the tab, so move Use these. and hack. (and in other spots)
two columns to the left.

	Jakub

Comments

Julian Brown Oct. 2, 2014, 2:49 p.m. UTC | #1
Hi,

On Wed, 24 Sep 2014 14:32:31 +0200
Jakub Jelinek <jakub@redhat.com> wrote:

> On Tue, Sep 23, 2014 at 07:17:25PM +0100, Julian Brown wrote:
> > The upcoming patch series constitutes our current (still
> > in-progress) implementation of run-time support for OpenACC 2.0 in
> > libgomp. We've tried to build on top of the (also currently WIP)
> > support for OpenMP 4.0's "target" construct, sharing code where
> > possible: because of this, I've also prepared versions of (a fairly
> > minimal, hopefully correct set of) prerequisite patches that apply
> > to current mainline (and were previously on the gomp 4.0 branch),
> > although in many cases we weren't the original authors of those.
> > 
> > Other parts of the OpenACC support for GCC are being sent upstream
> > concurrently with this runtime support (and are co-dependent with
> > it), so unfortunately, though the main part of the implementation
> > (part 7/10) works on our internal branch, I haven't yet been able
> > to convincingly test the series I'm about to post upstream. However
> > this code will be useful to others who are posting their bits of
> > OpenACC support upstream, so perhaps it'd be useful to commit it
> > anyway (we have to start somewhere!).
> 
> Just random comments about all the 10 patches:

Thanks for your comments -- I'm planning to address the things you've
bought up, but will probably change tack a little and do that work on
the gomp-4_0-branch (rather than working directly on mainline). That
way I can (hopefully) send incremental patches rather than working
entirely locally then sending another over-sized patch.

> Cache the return value?  Also, I must say I'm not particularly excited
> about different plugins not supporting both OpenMP 4.0 and OpenACC 2.0
> offloading.  Why is that needed?

For now, because OpenACC supports some stuff that (AFAIK!) OpenMP
doesn't, such as asynchronous execution. The eventual plan is for the
plugin interface to be generic, but we're not there yet.

> +      /* Make sure all the CUDA functions are there if any of them
> are.  */
> +      if (optional_present && optional_present != optional_total)
> +       {
> +         err = "plugin missing OpenACC CUDA handler function";
> +         goto out;
> +       }
> 
> So, any plugin that doesn't support CUDA will not support OpenACC?
> I hoped OpenACC would not be so tied to one particular HW...

The intention was for that section to allow zero CUDA handling
functions, or all of them. For better or worse, OpenACC defines a few
APIs which are target-specific (for NVidia, AMD, Intel so far, IIRC).
An OpenACC application doesn't have to use any of those, of course.

> that is not how ChangeLog entries should look like, if a line is not
> starting with ( after the tab, it should not contain extra spaces
> after the tab, so move Use these. and hack. (and in other spots)
> two columns to the left.

That was merely a copy/paste error of some sort, apologies.

Thanks,

Julian
diff mbox

Patch

--- libgomp/Makefile.am (revision 215546)
+++ libgomp/Makefile.am (working copy)
@@ -14,13 +14,35 @@  libsubincludedir = $(libdir)/gcc/$(targe
 
 vpath % $(strip $(search_path))
 
-AM_CPPFLAGS = $(addprefix -I, $(search_path))
+AM_CPPFLAGS = $(addprefix -I, $(search_path)) \
+       $(addprefix -I, $(search_path)/../include)
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)

This looks wrong, search_path is typically something like:
$(top_srcdir)/config/linux/x86 $(top_srcdir)/config/linux \
$(top_srcdir)/config/posix $(top_srcdir)
so $(search_path)/../include means you duplicate all the
*/config/* paths again.  Just add -I$(top_srcdir)/../include
to AM_CPPFLAGS.

As for plugins, my preference would be to move their sources
to a libgomp/plugins/ subdirectory and build them in that subdirectory
(for mic, which builds its plugin inside of libmicoffload it
could copy it there).

# TODO: not for OpenACC?
libgomp really needs to be built against libpthread, so if you don't
want that, you'd need to move the openacc bits to a separate shared library.

In general, I'd prefer if the stuff that gets committed to trunk
contains as few TODO: and FIXME: comments as possible, keep them
on the branch if you really need them.

 static void
+goacc_parse_device_num (void)
+{

Any reason why you don't want to use parse_int for this?
Does the standard require you parse and don't reject negative
numbers?