diff mbox

[RFC,gomp4] Offloading patches (1/3): Add '-fopenmp_target' option

Message ID 20131217113522.GA23078@msticlxl57.ims.intel.com
State New
Headers show

Commit Message

Michael Zolotukhin Dec. 17, 2013, 11:35 a.m. UTC
Hi everybody,

Here is a set of patches implementing one more piece of offloading support in
GCC.  These three patches allow to build a host binary with target image and all
tables embedded.  Along with patches for libgomp and libgomp plugin, which
hopefully will be sent soon, that gives a functional and runnable executable (or
DSO) with actual offloading to MIC.

There is still a lot to do in this area, but this is the necessary basics - with
this we could actually run offloaded code, produced fully by compiler.

We would like to hear any feedback on these patches: what issues we should
address first before commit (if any), how the patches fit OpenACC work, etc.

Here is a patch 1/3: Add '-fopenmp_target' option This option tells lto1 to look
for "*.target_lto*" sections instead of usual "*.lto*".  That option is passed
to target compiler when we invoke it to build target image.

Thanks,
Michael


---
 gcc/lto/lang.opt     |    4 ++++
 gcc/lto/lto-object.c |    5 +++--
 gcc/lto/lto.c        |    7 ++++++-
 3 files changed, 13 insertions(+), 3 deletions(-)

Comments

Michael Zolotukhin Jan. 16, 2014, 11:36 a.m. UTC | #1
Ping.

On 17 December 2013 15:35, Michael V. Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> Hi everybody,
>
> Here is a set of patches implementing one more piece of offloading support in
> GCC.  These three patches allow to build a host binary with target image and all
> tables embedded.  Along with patches for libgomp and libgomp plugin, which
> hopefully will be sent soon, that gives a functional and runnable executable (or
> DSO) with actual offloading to MIC.
>
> There is still a lot to do in this area, but this is the necessary basics - with
> this we could actually run offloaded code, produced fully by compiler.
>
> We would like to hear any feedback on these patches: what issues we should
> address first before commit (if any), how the patches fit OpenACC work, etc.
>
> Here is a patch 1/3: Add '-fopenmp_target' option This option tells lto1 to look
> for "*.target_lto*" sections instead of usual "*.lto*".  That option is passed
> to target compiler when we invoke it to build target image.
>
> Thanks,
> Michael
>
>
> ---
>  gcc/lto/lang.opt     |    4 ++++
>  gcc/lto/lto-object.c |    5 +++--
>  gcc/lto/lto.c        |    7 ++++++-
>  3 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
> index 7a9aede..cd0098c 100644
> --- a/gcc/lto/lang.opt
> +++ b/gcc/lto/lang.opt
> @@ -40,4 +40,8 @@ fresolution=
>  LTO Joined
>  The resolution file
>
> +fopenmp_target
> +LTO Var(flag_openmp_target)
> +Run LTO infrastructure to read target-side bytecode and to build it.
> +
>  ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
> index 19f10cc..64274f3 100644
> --- a/gcc/lto/lto-object.c
> +++ b/gcc/lto/lto-object.c
> @@ -59,6 +59,8 @@ struct lto_simple_object
>
>  static simple_object_attributes *saved_attributes;
>
> +extern const char *section_name_prefix;
> +
>  /* Initialize FILE, an LTO file object for FILENAME.  */
>
>  static void
> @@ -229,8 +231,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
>    void **slot;
>    struct lto_section_list *list = loasd->list;
>
> -  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
> -              strlen (LTO_SECTION_NAME_PREFIX)) != 0)
> +  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
>      return 1;
>
>    new_name = xstrdup (name);
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index 0211437..dedf8a8 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "context.h"
>  #include "pass_manager.h"
>
> +extern const char *section_name_prefix;
> +
>  /* Vector to keep track of external variables we've seen so far.  */
>  vec<tree, va_gc> *lto_global_var_decls;
>
> @@ -2081,7 +2083,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
>  {
>    const char *s;
>
> -  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
> +  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
>      return 0;
>    s = strrchr (name, '.');
>    return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
> @@ -2757,6 +2759,9 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
>
>    timevar_push (TV_IPA_LTO_DECL_IN);
>
> +  if (flag_openmp_target)
> +    section_name_prefix = OMP_SECTION_NAME_PREFIX;
> +
>    real_file_decl_data
>      = decl_data = ggc_alloc_cleared_vec_lto_file_decl_data_ptr (nfiles + 1);
>    real_file_count = nfiles;
> --
> 1.7.1
>
>
>
Bernd Schmidt Jan. 21, 2014, 2:40 p.m. UTC | #2
On 12/17/2013 12:35 PM, Michael V. Zolotukhin wrote:
> Here is a set of patches implementing one more piece of offloading support in
> GCC.  These three patches allow to build a host binary with target image and all
> tables embedded.  Along with patches for libgomp and libgomp plugin, which
> hopefully will be sent soon, that gives a functional and runnable executable (or
> DSO) with actual offloading to MIC.

Do you have a testcase that can be used to see what this does in action?


Bernd
Andrey Turetskiy Jan. 22, 2014, 10:53 a.m. UTC | #3
Hi Bernd,

We have some testcases, but they require XeonPhi hardware and a
working libgomp plugin. Our current version of the plugin depends on
some libraries, that are not open-sourced yet, so currently we can’t
share it.

However, you could examine what these patches do, making the following steps:
1) Build GCC with patches:
        http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01484.html
        http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01485.html
        http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01486.html
        http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01896.html
2) Set environment variables (e.g. for two ‘targets’):
        export OFFLOAD_TARGET_NAMES=mic:hsail              (for now
names don’t really matter)
        export OFFLOAD_TARGET_COMPILERS=./gcc:./gcc    (use GCC with
patches above as target compiler, because it must support the
-fopenmp_target option)
3) Build any example with #pragma omp target (e.g. see attachment):
        ./gcc -flto -fopenmp test.c -o test.exe
    Options -flto and -fopenmp are necessary for using.

Now you have a binary with target images embedded and tables properly
filled. You can’t run it due to reasons mentioned above, though you
could examine it with objdump/nm/readelf to see new sections and their
content: there will be .offload_image_section with ‘target’ code and
.offload_func_table_section with ‘target’ function table.

On Tue, Jan 21, 2014 at 6:40 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 12/17/2013 12:35 PM, Michael V. Zolotukhin wrote:
>>
>> Here is a set of patches implementing one more piece of offloading support
>> in
>> GCC.  These three patches allow to build a host binary with target image
>> and all
>> tables embedded.  Along with patches for libgomp and libgomp plugin, which
>> hopefully will be sent soon, that gives a functional and runnable
>> executable (or
>> DSO) with actual offloading to MIC.
>
>
> Do you have a testcase that can be used to see what this does in action?
>
>
> Bernd
>
Bernd Schmidt Jan. 27, 2014, 1:12 p.m. UTC | #4
On 01/22/2014 11:53 AM, Andrey Turetskiy wrote:
> We have some testcases, but they require XeonPhi hardware and a
> working libgomp plugin. Our current version of the plugin depends on
> some libraries, that are not open-sourced yet, so currently we can’t
> share it.
>
> However, you could examine what these patches do, making the following steps:
> 1) Build GCC with patches:
>          http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01484.html
>          http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01485.html
>          http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01486.html
>          http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01896.html
> 2) Set environment variables (e.g. for two ‘targets’):
>          export OFFLOAD_TARGET_NAMES=mic:hsail              (for now
> names don’t really matter)
>          export OFFLOAD_TARGET_COMPILERS=./gcc:./gcc    (use GCC with
> patches above as target compiler, because it must support the
> -fopenmp_target option)
> 3) Build any example with #pragma omp target (e.g. see attachment):
>          ./gcc -flto -fopenmp test.c -o test.exe
>      Options -flto and -fopenmp are necessary for using.
>
> Now you have a binary with target images embedded and tables properly
> filled. You can’t run it due to reasons mentioned above, though you
> could examine it with objdump/nm/readelf to see new sections and their
> content: there will be .offload_image_section with ‘target’ code and
> .offload_func_table_section with ‘target’ function table.

I played around with this for a while last week. To have a slightly more 
realistic scenario where the offload compiler is for a different target, 
I built an aarch64-linux compiler and used that in 
OFFLOAD_TARGET_COMPILERS. This exposed some problems.

> +  /* Run gcc for target.  */
> +  obstack_init (&argv_obstack);
> +  obstack_ptr_grow (&argv_obstack, compiler);
> +  obstack_ptr_grow (&argv_obstack, "-shared");
> +  obstack_ptr_grow (&argv_obstack, "-fPIC");
> +  obstack_ptr_grow (&argv_obstack, "-xlto");
> +  obstack_ptr_grow (&argv_obstack, "-fopenmp_target");
> +  obstack_ptr_grow (&argv_obstack, "-o");
> +  obstack_ptr_grow (&argv_obstack, target_image_file_name);

Since environment variables such as GCC_EXEC_PREFIX and COMPILER_PATH 
are set at this point, the compiler we're running here won't find the 
correct lto1 - best case it doesn't find anything, worst case it finds 
the lto1 for the host compiler and produces an image for the host, not 
the target (this fails with an arm compiler since the host assembler 
doesn't understand -meabi=5, but it could silently do the wrong thing 
with other offload toolchains).

Once I worked around this by unsetting the environment variables around 
this compiler invocation here, the next problem is exposed - the code 
tries to link together files compiled for the target (created by the 
code quoted above) and the host (the _omp_descr file, I believe). Linker 
errors ensue.

As mentioned before, I think all this target-specific code has no place 
in lto-wrapper to begin with. For ptx, we're going to require some quite 
different mechanisms, so I think it might be best to invoke a new tool, 
maybe called $target-gen-offload, which knows how to produce an image 
that can be linked into the host executable. Different offload targets 
can then use different strategies to produce such an image. Probably 
each such image should contain its own code to register itself with 
libgomp, so that we don't have to construct a table.

Some other observations:
  * is OFFLOAD_TARGET_NAMES actually useful, or would any string
    generated at link time suffice?
  * Is the user expected to set OFFLOAD_TARGET_COMPILERS, or should
    this be done by the gcc driver, possibly based on command line
    options (I'd much prefer that)?
  * Do we actually need an -fopenmp-target option? The way I imagine it
    (and which was somewhat present in the Makefile patches I posted
    last year) is that an offload compiler is specially configured to
    know that that's how it will be used, and to know what the host
    architecture is. A $target-gen-offload could then be built with
    knowledge of the host architecture and installed in the host
    compiler's libexec install directory.

I think I'll need to implement my own set of mechanisms for ptx, since 
this code doesn't seem suitable for inclusion in its current state. I'll 
try to take on board some of the ideas I've found here in the hope that 
we'll converge on something that works for everybody.


Bernd
Ilya Verbin Jan. 28, 2014, 1:47 p.m. UTC | #5
Hi Bernd,

2014/1/27 Bernd Schmidt <bernds@codesourcery.com>:
> Once I worked around this by unsetting the environment variables around this
> compiler invocation here, the next problem is exposed - the code tries to
> link together files compiled for the target (created by the code quoted
> above) and the host (the _omp_descr file, I believe). Linker errors ensue.

Thanks, that's a bug.  Fortunately, it could be fixed easily.

> As mentioned before, I think all this target-specific code has no place in
> lto-wrapper to begin with. For ptx, we're going to require some quite
> different mechanisms, so I think it might be best to invoke a new tool,
> maybe called $target-gen-offload, which knows how to produce an image that
> can be linked into the host executable. Different offload targets can then
> use different strategies to produce such an image.

That's quite a viable way.  We added all this stuff to these patches
to allow other targets to reuse it as much as possible.  I.e. we
wasn't aware if other targets support objcopy et al., so we proposed
our way so that others could reuse it as-is if everything is
available.  It turned out, that the targets differ much in this place,
so as you suggested, it's better to move all this stuff to
target-specific utils.  Certainly, these patches don't pretend to be a
final version - they are just RFC, and we currently only want to show
what we need and find out what other targets need.

> Probably each such image
> should contain its own code to register itself with libgomp, so that we
> don't have to construct a table.

If other targets use another mapping scheme, then I think these tables
could easily be omitted from host/target executables (e.g. we could
add a corresponding flag to the target images descriptor).  But
personally I believe this part is general enough to satisfy all
targets.  Could you please describe how functions would be invoked on
PTX?

> Some other observations:
>  * is OFFLOAD_TARGET_NAMES actually useful, or would any string
>    generated at link time suffice?

Yep, it might be redundant for now, because all we need is target
compilers.  Target names aren't necessary.

>  * Is the user expected to set OFFLOAD_TARGET_COMPILERS, or should
>    this be done by the gcc driver, possibly based on command line
>    options (I'd much prefer that)?

It's supposed to be set by gcc driver.  Initial work in this direction
could be found here:
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01242.html

>  * Do we actually need an -fopenmp-target option? The way I imagine it
>    (and which was somewhat present in the Makefile patches I posted
>    last year) is that an offload compiler is specially configured to
>    know that that's how it will be used, and to know what the host
>    architecture is. A $target-gen-offload could then be built with
>    knowledge of the host architecture and installed in the host
>    compiler's libexec install directory.

Our idea here was that a single x86 compiler could serve both as host
and as target compiler, depending on presence of this option.  If
these compilers are always different, then indeed this option isn't
needed.

> Bernd

  -- Ilya
diff mbox

Patch

diff --git a/gcc/lto/lang.opt b/gcc/lto/lang.opt
index 7a9aede..cd0098c 100644
--- a/gcc/lto/lang.opt
+++ b/gcc/lto/lang.opt
@@ -40,4 +40,8 @@  fresolution=
 LTO Joined
 The resolution file
 
+fopenmp_target
+LTO Var(flag_openmp_target)
+Run LTO infrastructure to read target-side bytecode and to build it.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
index 19f10cc..64274f3 100644
--- a/gcc/lto/lto-object.c
+++ b/gcc/lto/lto-object.c
@@ -59,6 +59,8 @@  struct lto_simple_object
 
 static simple_object_attributes *saved_attributes;
 
+extern const char *section_name_prefix;
+
 /* Initialize FILE, an LTO file object for FILENAME.  */
 
 static void
@@ -229,8 +231,7 @@  lto_obj_add_section (void *data, const char *name, off_t offset,
   void **slot;
   struct lto_section_list *list = loasd->list;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
-	       strlen (LTO_SECTION_NAME_PREFIX)) != 0)
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 1;
 
   new_name = xstrdup (name);
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 0211437..dedf8a8 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -49,6 +49,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "context.h"
 #include "pass_manager.h"
 
+extern const char *section_name_prefix;
+
 /* Vector to keep track of external variables we've seen so far.  */
 vec<tree, va_gc> *lto_global_var_decls;
 
@@ -2081,7 +2083,7 @@  lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 {
   const char *s;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 0;
   s = strrchr (name, '.');
   return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
@@ -2757,6 +2759,9 @@  read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
 
   timevar_push (TV_IPA_LTO_DECL_IN);
 
+  if (flag_openmp_target)
+    section_name_prefix = OMP_SECTION_NAME_PREFIX;
+
   real_file_decl_data
     = decl_data = ggc_alloc_cleared_vec_lto_file_decl_data_ptr (nfiles + 1);
   real_file_count = nfiles;