[2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

Message ID 20150731142007.GA64740@msticlxl57.ims.intel.com
State New

Commit Message

Ilya Verbin July 31, 2015, 2:20 p.m. UTC
On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
> We had established the use of a boolean flag have_offload in gcc::context
> to indicate whether during compilation, we've actually seen any code to
> be offloaded (see cited below the relevant parts of the patch by Ilya et
> al.).  This means that currently, the whole offload machinery will not be
> run unless we actually have any offloaded data.  This means that the
> configured mkoffload programs (-foffload=[...], defaulting to
> configure-time --enable-offload-targets=[...]) will not be invoked unless
> we actually have any offloaded data.  This means that we will not
> actually generate constructor code to call libgomp's
> GOMP_offload_register unless we actually have any offloaded data.

Yes, that was the plan.

> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
> targets have been specified during compilation.
> 
> But: at runtime, I'd like to know which -foffload=[...] targets have been
> specified during compilation, so that we can, for example, reliably
> resort to host fallback execution for -foffload=disable instead of
> getting an error message that an offloaded function is missing.

It's easy to fix:



> other hand, for example, for -foffload=nvptx-none, even if user program
> code doesn't contain any offloaded data (and thus the offload machinery
> has not been run), the user program might still contain any executable
> directives or OpenACC runtime library calls, so we'd still like to use
> the libgomp nvptx plugin.  However, we currently cannot detect this
> situation.
> 
> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
> configuration in the executable (as a string, for example) for libgomp to
> look that up, or b) make it a requirement that (if configured via
> -foffload=[...]), the offload machinery is run even if there is not
> actually any data to be offloaded, so we then reliably get the respective
> constructor call to libgomp's GOMP_offload_register.  I once began to
> implement a), but this got a bit ugly, so I then looked into b) instead.
> Compared to the status quo, always running the whole offloading machinery
> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
> are active, certainly does introduce some overhead when there isn't
> actually any code to be offloaded, so I'm not sure whether that is
> acceptable?

I vote for (a).

  -- Ilya

Comments

Richard Biener Aug. 5, 2015, 8:40 a.m. UTC | #1
On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
>> We had established the use of a boolean flag have_offload in gcc::context
>> to indicate whether during compilation, we've actually seen any code to
>> be offloaded (see cited below the relevant parts of the patch by Ilya et
>> al.).  This means that currently, the whole offload machinery will not be
>> run unless we actually have any offloaded data.  This means that the
>> configured mkoffload programs (-foffload=[...], defaulting to
>> configure-time --enable-offload-targets=[...]) will not be invoked unless
>> we actually have any offloaded data.  This means that we will not
>> actually generate constructor code to call libgomp's
>> GOMP_offload_register unless we actually have any offloaded data.
>
> Yes, that was the plan.
>
>> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
>> targets have been specified during compilation.
>>
>> But: at runtime, I'd like to know which -foffload=[...] targets have been
>> specified during compilation, so that we can, for example, reliably
>> resort to host fallback execution for -foffload=disable instead of
>> getting an error message that an offloaded function is missing.
>
> It's easy to fix:
>
> diff --git a/libgomp/target.c b/libgomp/target.c
> index a5fb164..f81d570 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
>        k.host_end = k.host_start + 1;
>        splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
>        gomp_mutex_unlock (&devicep->lock);
> -      if (tgt_fn == NULL)
> -       gomp_fatal ("Target function wasn't mapped");
> -
>        return (void *) tgt_fn->tgt_offset;
>      }
>  }
> @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
>      return gomp_target_fallback (fn, hostaddrs);
>
>    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> +  if (fn_addr == NULL)
> +    return gomp_target_fallback (fn, hostaddrs);
>
>    struct target_mem_desc *tgt_vars
>      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
> @@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
>      }
>
>    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> +  if (fn_addr == NULL)
> +    return gomp_target_fallback (fn, hostaddrs);
>
>    struct target_mem_desc *tgt_vars
>      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
>
>
>> other hand, for example, for -foffload=nvptx-none, even if user program
>> code doesn't contain any offloaded data (and thus the offload machinery
>> has not been run), the user program might still contain any executable
>> directives or OpenACC runtime library calls, so we'd still like to use
>> the libgomp nvptx plugin.  However, we currently cannot detect this
>> situation.
>>
>> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
>> configuration in the executable (as a string, for example) for libgomp to
>> look that up, or b) make it a requirement that (if configured via
>> -foffload=[...]), the offload machinery is run even if there is not
>> actually any data to be offloaded, so we then reliably get the respective
>> constructor call to libgomp's GOMP_offload_register.  I once began to
>> implement a), but this got a bit ugly, so I then looked into b) instead.
>> Compared to the status quo, always running the whole offloading machinery
>> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
>> are active, certainly does introduce some overhead when there isn't
>> actually any code to be offloaded, so I'm not sure whether that is
>> acceptable?
>
> I vote for (a).

What happens for conflicting -foffload=[...] options in different TUs?

Richard.

>   -- Ilya
Ilya Verbin Aug. 5, 2015, 3:09 p.m. UTC | #2
On Wed, Aug 05, 2015 at 10:40:44 +0200, Richard Biener wrote:
> On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> > On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
> >> We had established the use of a boolean flag have_offload in gcc::context
> >> to indicate whether during compilation, we've actually seen any code to
> >> be offloaded (see cited below the relevant parts of the patch by Ilya et
> >> al.).  This means that currently, the whole offload machinery will not be
> >> run unless we actually have any offloaded data.  This means that the
> >> configured mkoffload programs (-foffload=[...], defaulting to
> >> configure-time --enable-offload-targets=[...]) will not be invoked unless
> >> we actually have any offloaded data.  This means that we will not
> >> actually generate constructor code to call libgomp's
> >> GOMP_offload_register unless we actually have any offloaded data.
> >
> > Yes, that was the plan.
> >
> >> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
> >> targets have been specified during compilation.
> >>
> >> But: at runtime, I'd like to know which -foffload=[...] targets have been
> >> specified during compilation, so that we can, for example, reliably
> >> resort to host fallback execution for -foffload=disable instead of
> >> getting an error message that an offloaded function is missing.
> >
> > It's easy to fix:
> >
> > diff --git a/libgomp/target.c b/libgomp/target.c
> > index a5fb164..f81d570 100644
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
> >        k.host_end = k.host_start + 1;
> >        splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
> >        gomp_mutex_unlock (&devicep->lock);
> > -      if (tgt_fn == NULL)
> > -       gomp_fatal ("Target function wasn't mapped");
> > -
> >        return (void *) tgt_fn->tgt_offset;
> >      }
> >  }
> > @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
> >      return gomp_target_fallback (fn, hostaddrs);
> >
> >    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +    return gomp_target_fallback (fn, hostaddrs);
> >
> >    struct target_mem_desc *tgt_vars
> >      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
> > @@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
> >      }
> >
> >    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +    return gomp_target_fallback (fn, hostaddrs);
> >
> >    struct target_mem_desc *tgt_vars
> >      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
> >
> >
> >> other hand, for example, for -foffload=nvptx-none, even if user program
> >> code doesn't contain any offloaded data (and thus the offload machinery
> >> has not been run), the user program might still contain any executable
> >> directives or OpenACC runtime library calls, so we'd still like to use
> >> the libgomp nvptx plugin.  However, we currently cannot detect this
> >> situation.
> >>
> >> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
> >> configuration in the executable (as a string, for example) for libgomp to
> >> look that up, or b) make it a requirement that (if configured via
> >> -foffload=[...]), the offload machinery is run even if there is not
> >> actually any data to be offloaded, so we then reliably get the respective
> >> constructor call to libgomp's GOMP_offload_register.  I once began to
> >> implement a), but this got a bit ugly, so I then looked into b) instead.
> >> Compared to the status quo, always running the whole offloading machinery
> >> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
> >> are active, certainly does introduce some overhead when there isn't
> >> actually any code to be offloaded, so I'm not sure whether that is
> >> acceptable?
> >
> > I vote for (a).
> 
> What happens for conflicting -foffload=[...] options in different TUs?

If you're asking about what happens now: only the list of offload targets from
the link-time -foffload=tgt1,tgt2 option matters.

I don't like plan (b) because it calls ipa_write_summaries unconditionally for
all OpenMP programs.  That creates IR sections, which increases file size and may
cause other problems, e.g. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63868>.
Compile time also increases because of the LTO machinery, mkoffloads, etc.

If OpenACC requires some registration in libgomp even without offload, maybe you
can run this machinery only under flag_openacc?

  -- Ilya
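
For illustration, plan (a) could look roughly like the sketch below: the
compile-time -foffload=[...] list is kept as a comma-separated string, and
the runtime checks it before deciding whether to attempt offloading.  This is
only a sketch under assumptions.  The helper names offload_list_contains and
offload_disabled are hypothetical and not part of libgomp, and how the string
would actually be embedded in the executable is left open, as it is in the
discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if NAME appears as a whole entry in
   the comma-separated LIST, e.g. "tgt2" in "tgt1,tgt2".  */
static bool
offload_list_contains (const char *list, const char *name)
{
  assert (list != NULL && name != NULL);
  size_t name_len = strlen (name);
  const char *p = list;
  while (*p != '\0')
    {
      /* Length of the current entry, up to the next comma or the end.  */
      size_t len = strcspn (p, ",");
      if (len == name_len && strncmp (p, name, len) == 0)
        return true;
      p += len;
      if (*p == ',')
        p++;
    }
  return false;
}

/* Hypothetical helper: with -foffload=disable recorded in the list, the
   runtime can go straight to host fallback instead of failing later.  */
static bool
offload_disabled (const char *list)
{
  return offload_list_contains (list, "disable");
}
```

Matching whole entries rather than substrings matters here, because one
target name may be a prefix of another.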

Patch

diff --git a/libgomp/target.c b/libgomp/target.c
index a5fb164..f81d570 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1066,9 +1066,6 @@  gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
       k.host_end = k.host_start + 1;
       splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       gomp_mutex_unlock (&devicep->lock);
-      if (tgt_fn == NULL)
-	gomp_fatal ("Target function wasn't mapped");
-
       return (void *) tgt_fn->tgt_offset;
     }
 }
@@ -1095,6 +1092,8 @@  GOMP_target (int device, void (*fn) (void *), const void *unused,
     return gomp_target_fallback (fn, hostaddrs);
 
   void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
+  if (fn_addr == NULL)
+    return gomp_target_fallback (fn, hostaddrs);
 
   struct target_mem_desc *tgt_vars
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
@@ -1155,6 +1154,8 @@  GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
     }
 
   void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
+  if (fn_addr == NULL)
+    return gomp_target_fallback (fn, hostaddrs);
 
   struct target_mem_desc *tgt_vars
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
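
The control flow that the patch aims for can be sketched in isolation as
follows.  This is a simplified stand-in, not libgomp code: struct fn_mapping,
lookup_target_fn and run_target are hypothetical names, and a linear table
replaces the splay tree.  Note that for the callers' NULL checks to be
reachable, the lookup itself has to return NULL on a miss rather than
dereference the missing entry; the sketch assumes exactly that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of a device's host-to-target map.  */
struct fn_mapping
{
  void (*host_fn) (void *);   /* host address used as the lookup key */
  void *tgt_addr;             /* address of the offloaded version */
};

/* Return the target address registered for HOST_FN, or NULL if the
   function was never mapped.  */
static void *
lookup_target_fn (const struct fn_mapping *map, size_t n,
                  void (*host_fn) (void *))
{
  assert (host_fn != NULL);
  for (size_t i = 0; i < n; i++)
    if (map[i].host_fn == host_fn)
      return map[i].tgt_addr;
  return NULL;
}

/* Choose between the device path and host fallback for FN.
   Returns 1 if a mapping was found, 0 if FN was run on the host.  */
static int
run_target (const struct fn_mapping *map, size_t n,
            void (*fn) (void *), void *data)
{
  void *fn_addr = lookup_target_fn (map, n, fn);
  if (fn_addr == NULL)
    {
      fn (data);   /* host fallback: call the host version directly */
      return 0;
    }
  /* A real runtime would launch FN_ADDR on the device here; this sketch
     only reports the decision.  */
  return 1;
}

/* Toy functions standing in for outlined target regions.  */
static void add_one (void *p) { ++*(int *) p; }
static void add_two (void *p) { *(int *) p += 2; }
```

With a table that maps only add_one, calling run_target for add_two takes
the host path and updates the data in place, while add_one is reported as
offloadable.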