[hsa,merge,07/10] IPA-HSA pass

Message ID 20160113173925.776317025@virgil.suse.cz
State New

Commit Message

Martin Jambor Jan. 13, 2016, 5:39 p.m. UTC
Hi,

this patch contains IPA-related changes that we need to bring about
for HSA.

The patch is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00720.html but so far we
have not received any feedback.  Let me quote the original
accompanying email here for reference:

When a target construct is gridified, the HSA GPU function is
associated with the CPU function throughout the compilation, so that
they can be registered as a pair in libgomp.

Ungridified target constructs and, more importantly, "pragma omp
declare target" marked functions emerge out of OMP expansion as one
gimple function for both the host and the accelerator. However, at
some point we need to create a special HSA function representation so
that we can modify behavior of a (very) few optimization passes for
them.

Both are done by the following new IPA pass, which creates new HSA
clones in these cases.  Moreover, it redirects the appropriate call
graph edges so that they connect HSA implementations, marks HSA clones
with the flatten attribute to minimize any call overhead (which is
much more significant on GPUs) and makes sure both the CPU and GPU
functions are coupled together and remain in the same LTO partition so
that they can be registered together with libgomp.

Thanks,

Martin


2016-01-13  Martin Liska  <mliska@suse.cz>
	    Martin Jambor  <mjambor@suse.cz>

	* ipa-hsa.c: New file.
	* lto-section-in.c (lto_section_name): Add hsa section name.
	* lto-streamer.h (lto_section_type): Add hsa section.
	* lto-partition.c: Include "hsa.h"
	(add_symbol_to_partition_1): Put hsa implementations into the
	same partition as host implementations.
	* timevar.def (TV_IPA_HSA): New.
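
The clone-and-redirect behaviour described in the commit message can be
pictured with a small standalone model.  This is only a sketch: Node,
Kind and create_hsa_clones are hypothetical stand-ins for cgraph_node,
hsa_function_summary and the pass body, they are not GCC API, and the
flatten attribute and LTO partitioning are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for cgraph_node plus its hsa_function_summary;
// the real GCC types carry far more state.
enum class Kind { None, Kernel, Function };

struct Node
{
  std::string name;
  Kind kind = Kind::None;    // models m_kind
  bool gpu_impl = false;     // models m_gpu_implementation_p
  int bound = -1;            // models m_binded_function (node index, -1 = none)
  std::vector<int> callees;  // outgoing call edges as node indices
};

void
create_hsa_clones (std::vector<Node> &nodes)
{
  // Step 1: every HSA-marked host function that is not yet linked gets a
  // ".hsa" clone, and the two are linked to each other.
  const std::size_t n = nodes.size ();
  for (std::size_t i = 0; i < n; i++)
    {
      if (nodes[i].kind == Kind::None || nodes[i].bound != -1)
	continue;
      Node clone = nodes[i];  // the clone starts with the same call edges
      clone.name += ".hsa";
      clone.gpu_impl = true;
      clone.bound = static_cast<int> (i);
      nodes[i].bound = static_cast<int> (nodes.size ());
      nodes.push_back (clone);
    }

  // Step 2: an edge leaving a GPU clone and ending in the host version of
  // an HSA function is redirected to that function's GPU clone.
  for (Node &src : nodes)
    {
      if (src.kind == Kind::None || !src.gpu_impl)
	continue;
      for (int &callee : src.callees)
	{
	  const Node &dst = nodes[callee];
	  if (dst.kind != Kind::None && !dst.gpu_impl)
	    {
	      assert (dst.bound != -1);  // holds in this model after step 1
	      callee = dst.bound;
	    }
	}
    }
}
```

In this model a kernel calling one HSA helper and one host-only
function ends up with a GPU clone whose first edge points at the
helper's clone, while the second edge and all edges of the host
version stay as they were.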

Comments

Jakub Jelinek Jan. 14, 2016, 12:58 p.m. UTC | #1
On Wed, Jan 13, 2016 at 06:39:32PM +0100, Martin Jambor wrote:

> +	  cgraph_node *clone = node->create_virtual_clone
> +	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");

Nicer formatting would be
	  cgraph_node *clone
	    = node->create_virtual_clone (vec <cgraph_edge *> (),
					  NULL, NULL, "hsa");

> +	  cgraph_node *clone = node->create_virtual_clone
> +	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");

Ditto.

> +  const struct lto_function_header *header =
> +    (const struct lto_function_header *) data;

= goes on the next line.

> +  const int cfg_offset = sizeof (struct lto_function_header);
> +  const int main_offset = cfg_offset + header->cfg_size;
> +  const int string_offset = main_offset + header->main_size;
> +  struct data_in *data_in;
> +  unsigned int i;
> +  unsigned int count;
> +
> +  lto_input_block ib_main ((const char *) data + main_offset,
> +			   header->main_size, file_data->mode_table);
> +
> +  data_in =

Ditto.

> +bool
> +pass_ipa_hsa::gate (function *)
> +{
> +  return hsa_gen_requested_p () || in_lto_p;

Does it really need to be enabled whenever in_lto_p?
I mean, if HSA is not configured in, I think the gate should be false too.

Otherwise LGTM.

	Jakub
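
The effect of the gate question can be shown with a standalone model.
This is a sketch under assumptions: GateInputs and both functions are
hypothetical, the two booleans merely stand in for the results of
hsa_gen_requested_p () and in_lto_p, and the second gate is one reading
of the suggestion, not the form that was eventually installed.

```cpp
#include <cassert>

// Hypothetical stand-ins for the values of hsa_gen_requested_p () and
// in_lto_p at the time the gate is evaluated.
struct GateInputs
{
  bool hsa_requested;
  bool in_lto;
};

// The gate as posted in the patch: the pass runs in every LTO
// compilation, whether or not HSA code generation was requested.
bool
gate_as_posted (const GateInputs &in)
{
  return in.hsa_requested || in.in_lto;
}

// One reading of the review suggestion: the pass runs only when HSA code
// generation was requested.
bool
gate_without_lto (const GateInputs &in)
{
  return in.hsa_requested;
}
```

The two gates differ only for an LTO compilation in which HSA was not
requested, which is the case the review comment is about.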

Jan Hubicka Jan. 15, 2016, 9:52 a.m. UTC | #2
> 2016-01-13  Martin Liska  <mliska@suse.cz>
> 	    Martin Jambor  <mjambor@suse.cz>
> 
> 	* ipa-hsa.c: New file.
> 	* lto-section-in.c (lto_section_name): Add hsa section name.
> 	* lto-streamer.h (lto_section_type): Add hsa section.
> 	* lto-partition.c: Include "hsa.h"
> 	(add_symbol_to_partition_1): Put hsa implementations into the
> 	same partition as host implementations.
> 	* timevar.def (TV_IPA_HSA): New.
> 
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 81a63a5..0a56170 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-prop.h"
>  #include "ipa-inline.h"
>  #include "lto-partition.h"
> +#include "hsa.h"
>  
>  vec<ltrans_partition> ltrans_partitions;
>  
> @@ -170,6 +171,24 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
>  	 Therefore put it into the same partition.  */
>        if (cnode->instrumented_version)
>  	add_symbol_to_partition_1 (part, cnode->instrumented_version);
> +
> +      /* Add an HSA function associated with the symbol.  */
> +      if (hsa_summaries != NULL)
> +	{
> +	  hsa_function_summary *s = hsa_summaries->get (cnode);
> +	  if (s->m_kind == HSA_KERNEL)
> +	    {
> +	      /* Add binded function.  */
> +	      bool added = add_symbol_to_partition_1 (part,
> +						      s->m_binded_function);
> +	      gcc_assert (added);
> +	      if (symtab->dump_file)
> +		fprintf (symtab->dump_file,
> +			 "adding an HSA function (host/gpu) to the "
> +			 "partition: %s\n",
> +			 s->m_binded_function->name ());
> +	    }
> +	}

Do we really need to look that up in the hsa summary? Why can these not be partitioned the
usual way?

The patch looks OK to me modulo Jakub's comments.

Honza
>      }
>  
>    add_references_to_partition (part, node);
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 2765179..d9a5066 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -97,6 +97,7 @@ DEFTIMEVAR (TV_WHOPR_WPA_IO          , "whopr wpa I/O")
>  DEFTIMEVAR (TV_WHOPR_PARTITIONING    , "whopr partitioning")
>  DEFTIMEVAR (TV_WHOPR_LTRANS          , "whopr ltrans")
>  DEFTIMEVAR (TV_IPA_REFERENCE         , "ipa reference")
> +DEFTIMEVAR (TV_IPA_HSA		     , "ipa HSA")
>  DEFTIMEVAR (TV_IPA_PROFILE           , "ipa profile")
>  DEFTIMEVAR (TV_IPA_AUTOFDO           , "auto profile")
>  DEFTIMEVAR (TV_IPA_PURE_CONST        , "ipa pure const")

Martin Liška Jan. 15, 2016, 11:47 a.m. UTC | #3
On 01/15/2016 10:52 AM, Jan Hubicka wrote:
> Do we really need to look that up in the hsa summary? Why can these not be partitioned the
> usual way?

Hi.

Yes, it's needed because hsa-brig.c uses the host function declaration of a kernel as a key for libgomp.
That's why we want to put the pair into the same LTO partition.

Martin
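
The pairing requirement can be illustrated with a standalone model of a
partition.  This is a sketch only: Partition, KernelPairs and
add_symbol are hypothetical names, the map plays the role of the
m_binded_function links between kernels, and nothing here is the real
lto-partition.c interface.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// A partition is modelled as a set of symbol names.
using Partition = std::set<std::string>;

// Hypothetical table linking each kernel to its counterpart (host to GPU
// and GPU to host), standing in for the m_binded_function pointers.
using KernelPairs = std::map<std::string, std::string>;

// Add SYM to PART and pull in its kernel counterpart, if it has one.
// Returns false when SYM was already in the partition.
bool
add_symbol (Partition &part, const KernelPairs &pairs, const std::string &sym)
{
  assert (!sym.empty ());
  if (!part.insert (sym).second)
    return false;
  // SYM is already recorded, so the recursive call cannot loop forever
  // even though the pairing is bidirectional.
  KernelPairs::const_iterator it = pairs.find (sym);
  if (it != pairs.end ())
    add_symbol (part, pairs, it->second);
  return true;
}
```

Whichever half of a kernel pair is added first, the other half ends up
in the same partition, so a key derived from the host declaration
always has its GPU counterpart next to it.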

Martin Liška Jan. 15, 2016, 11:48 a.m. UTC | #4
On 01/14/2016 01:58 PM, Jakub Jelinek wrote:
> Does it really need to be enabled whenever in_lto_p?
> I mean, if HSA is not configured in, I think the gate should be false too.

Sure, it can be removed; the change will be incorporated in the final installed version of the file.

Thanks,
Martin

Jan Hubicka Jan. 16, 2016, 10 a.m. UTC | #5
> On 01/15/2016 10:52 AM, Jan Hubicka wrote:
> > Do we really need to look that up in the hsa summary? Why can these not be partitioned the
> > usual way?
> 
> Hi.
> 
> Yes, it's needed because hsa-brig.c uses the host function declaration of a kernel as a key for libgomp.
> That's why we want to put the pair into the same LTO partition.

Can't it be represented via explicit REF_ADDR or something like that?

Honza
> 
> Martin

Patch

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
new file mode 100644
index 0000000..dd47995
--- /dev/null
+++ b/gcc/ipa-hsa.c
@@ -0,0 +1,329 @@ 
+/* Interprocedural pass responsible for creation of HSA clones.
+   Copyright (C) 2015-2016 Free Software Foundation, Inc.
+   Contributed by Martin Liska <mliska@suse.cz>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Interprocedural HSA pass is responsible for creation of HSA clones.
+   For all these HSA clones, we emit HSAIL instructions and pass processing
+   is terminated.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "hash-set.h"
+#include "vec.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "tree-streamer.h"
+#include "stringpool.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+namespace {
+
+/* If NODE is not versionable, warn about not emitting HSAIL and return false.
+   Otherwise return true.  */
+
+static bool
+check_warn_node_versionable (cgraph_node *node)
+{
+  if (!node->local.versionable)
+    {
+      warning_at (EXPR_LOCATION (node->decl), OPT_Whsa,
+		  "could not emit HSAIL for function %s: function cannot be "
+		  "cloned", node->name ());
+      return false;
+    }
+  return true;
+}
+
+/* The function creates HSA clones for all functions that were either
+   marked as HSA kernels or are callable HSA functions.  Apart from that,
+   we redirect all edges that come from an HSA clone and end in another
+   HSA clone to connect these two functions.  */
+
+static unsigned int
+process_hsa_functions (void)
+{
+  struct cgraph_node *node;
+
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      /* A linked function is skipped.  */
+      if (s->m_binded_function != NULL)
+	continue;
+
+      if (s->m_kind != HSA_NONE)
+	{
+	  if (!check_warn_node_versionable (node))
+	    continue;
+	  cgraph_node *clone = node->create_virtual_clone
+	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+	  clone->force_output = true;
+	  hsa_summaries->link_functions (clone, node, s->m_kind, false);
+
+	  if (dump_file)
+	    fprintf (dump_file, "Created a new HSA clone: %s, type: %s\n",
+		     clone->name (),
+		     s->m_kind == HSA_KERNEL ? "kernel" : "function");
+	}
+      else if (hsa_callable_function_p (node->decl))
+	{
+	  if (!check_warn_node_versionable (node))
+	    continue;
+	  cgraph_node *clone = node->create_virtual_clone
+	    (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+	  if (!cgraph_local_p (node))
+	    clone->force_output = true;
+	  hsa_summaries->link_functions (clone, node, HSA_FUNCTION, false);
+
+	  if (dump_file)
+	    fprintf (dump_file, "Created a new HSA function clone: %s\n",
+		     clone->name ());
+	}
+    }
+
+  /* Redirect all edges that are between HSA clones.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      cgraph_edge *e = node->callees;
+
+      while (e)
+	{
+	  hsa_function_summary *src = hsa_summaries->get (node);
+	  if (src->m_kind != HSA_NONE && src->m_gpu_implementation_p)
+	    {
+	      hsa_function_summary *dst = hsa_summaries->get (e->callee);
+	      if (dst->m_kind != HSA_NONE && !dst->m_gpu_implementation_p)
+		{
+		  e->redirect_callee (dst->m_binded_function);
+		  if (dump_file)
+		    fprintf (dump_file,
+			     "Redirecting edge to HSA function: %s->%s\n",
+			     xstrdup_for_dump (e->caller->name ()),
+			     xstrdup_for_dump (e->callee->name ()));
+		}
+	    }
+
+	  e = e->next_callee;
+	}
+    }
+
+  return 0;
+}
+
+/* Iterate all HSA functions and stream out HSA function summary.  */
+
+static void
+ipa_hsa_write_summary (void)
+{
+  struct bitpack_d bp;
+  struct cgraph_node *node;
+  struct output_block *ob;
+  unsigned int count = 0;
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder;
+
+  if (!hsa_summaries)
+    return;
+
+  ob = create_output_block (LTO_section_ipa_hsa);
+  encoder = ob->decl_state->symtab_node_encoder;
+  ob->symbol = NULL;
+  for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
+       lsei_next_function_in_partition (&lsei))
+    {
+      node = lsei_cgraph_node (lsei);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      if (s->m_kind != HSA_NONE)
+	count++;
+    }
+
+  streamer_write_uhwi (ob, count);
+
+  /* Process all of the functions.  */
+  for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
+       lsei_next_function_in_partition (&lsei))
+    {
+      node = lsei_cgraph_node (lsei);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      if (s->m_kind != HSA_NONE)
+	{
+	  encoder = ob->decl_state->symtab_node_encoder;
+	  int node_ref = lto_symtab_encoder_encode (encoder, node);
+	  streamer_write_uhwi (ob, node_ref);
+
+	  bp = bitpack_create (ob->main_stream);
+	  bp_pack_value (&bp, s->m_kind, 2);
+	  bp_pack_value (&bp, s->m_gpu_implementation_p, 1);
+	  bp_pack_value (&bp, s->m_binded_function != NULL, 1);
+	  streamer_write_bitpack (&bp);
+	  if (s->m_binded_function)
+	    stream_write_tree (ob, s->m_binded_function->decl, true);
+	}
+    }
+
+  streamer_write_char_stream (ob->main_stream, 0);
+  produce_asm (ob, NULL);
+  destroy_output_block (ob);
+}
+
+/* Read section in file FILE_DATA of length LEN with data DATA.  */
+
+static void
+ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
+		       size_t len)
+{
+  const struct lto_function_header *header =
+    (const struct lto_function_header *) data;
+  const int cfg_offset = sizeof (struct lto_function_header);
+  const int main_offset = cfg_offset + header->cfg_size;
+  const int string_offset = main_offset + header->main_size;
+  struct data_in *data_in;
+  unsigned int i;
+  unsigned int count;
+
+  lto_input_block ib_main ((const char *) data + main_offset,
+			   header->main_size, file_data->mode_table);
+
+  data_in =
+    lto_data_in_create (file_data, (const char *) data + string_offset,
+			header->string_size, vNULL);
+  count = streamer_read_uhwi (&ib_main);
+
+  for (i = 0; i < count; i++)
+    {
+      unsigned int index;
+      struct cgraph_node *node;
+      lto_symtab_encoder_t encoder;
+
+      index = streamer_read_uhwi (&ib_main);
+      encoder = file_data->symtab_node_encoder;
+      node = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder,
+								index));
+      gcc_assert (node->definition);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      struct bitpack_d bp = streamer_read_bitpack (&ib_main);
+      s->m_kind = (hsa_function_kind) bp_unpack_value (&bp, 2);
+      s->m_gpu_implementation_p = bp_unpack_value (&bp, 1);
+      bool has_tree = bp_unpack_value (&bp, 1);
+
+      if (has_tree)
+	{
+	  tree decl = stream_read_tree (&ib_main, data_in);
+	  s->m_binded_function = cgraph_node::get_create (decl);
+	}
+    }
+  lto_free_section_data (file_data, LTO_section_ipa_hsa, NULL, data,
+			 len);
+  lto_data_in_delete (data_in);
+}
+
+/* Load streamed HSA functions summary and assign the summary to a function.  */
+
+static void
+ipa_hsa_read_summary (void)
+{
+  struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
+  struct lto_file_decl_data *file_data;
+  unsigned int j = 0;
+
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+
+  while ((file_data = file_data_vec[j++]))
+    {
+      size_t len;
+      const char *data = lto_get_section_data (file_data, LTO_section_ipa_hsa,
+					       NULL, &len);
+
+      if (data)
+	ipa_hsa_read_section (file_data, data, len);
+    }
+}
+
+const pass_data pass_data_ipa_hsa =
+{
+  IPA_PASS, /* type */
+  "hsa", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_IPA_HSA, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_dump_symtab, /* todo_flags_finish */
+};
+
+class pass_ipa_hsa : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_hsa (gcc::context *ctxt)
+    : ipa_opt_pass_d (pass_data_ipa_hsa, ctxt,
+		      NULL, /* generate_summary */
+		      ipa_hsa_write_summary, /* write_summary */
+		      ipa_hsa_read_summary, /* read_summary */
+		      ipa_hsa_write_summary, /* write_optimization_summary */
+		      ipa_hsa_read_summary, /* read_optimization_summary */
+		      NULL, /* stmt_fixup */
+		      0, /* function_transform_todo_flags_start */
+		      NULL, /* function_transform */
+		      NULL) /* variable_transform */
+    {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+
+  virtual unsigned int execute (function *) { return process_hsa_functions (); }
+
+}; // class pass_ipa_hsa
+
+bool
+pass_ipa_hsa::gate (function *)
+{
+  return hsa_gen_requested_p () || in_lto_p;
+}
+
+} // anon namespace
+
+ipa_opt_pass_d *
+make_pass_ipa_hsa (gcc::context *ctxt)
+{
+  return new pass_ipa_hsa (ctxt);
+}
diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index 972f062..93b82be 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -51,7 +51,8 @@  const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "ipcp_trans",
   "icf",
   "offload_table",
-  "mode_table"
+  "mode_table",
+  "hsa"
 };
 
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 42654f5..0cb200e 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -244,6 +244,7 @@  enum lto_section_type
   LTO_section_ipa_icf,
   LTO_section_offload_table,
   LTO_section_mode_table,
+  LTO_section_ipa_hsa,
   LTO_N_SECTION_TYPES		/* Must be last.  */
 };
 
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 81a63a5..0a56170 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -34,6 +34,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "ipa-inline.h"
 #include "lto-partition.h"
+#include "hsa.h"
 
 vec<ltrans_partition> ltrans_partitions;
 
@@ -170,6 +171,24 @@  add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
 	 Therefore put it into the same partition.  */
       if (cnode->instrumented_version)
 	add_symbol_to_partition_1 (part, cnode->instrumented_version);
+
+      /* Add an HSA function associated with the symbol.  */
+      if (hsa_summaries != NULL)
+	{
+	  hsa_function_summary *s = hsa_summaries->get (cnode);
+	  if (s->m_kind == HSA_KERNEL)
+	    {
+	      /* Add binded function.  */
+	      bool added = add_symbol_to_partition_1 (part,
+						      s->m_binded_function);
+	      gcc_assert (added);
+	      if (symtab->dump_file)
+		fprintf (symtab->dump_file,
+			 "adding an HSA function (host/gpu) to the "
+			 "partition: %s\n",
+			 s->m_binded_function->name ());
+	    }
+	}
     }
 
   add_references_to_partition (part, node);
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 2765179..d9a5066 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -97,6 +97,7 @@  DEFTIMEVAR (TV_WHOPR_WPA_IO          , "whopr wpa I/O")
 DEFTIMEVAR (TV_WHOPR_PARTITIONING    , "whopr partitioning")
 DEFTIMEVAR (TV_WHOPR_LTRANS          , "whopr ltrans")
 DEFTIMEVAR (TV_IPA_REFERENCE         , "ipa reference")
+DEFTIMEVAR (TV_IPA_HSA		     , "ipa HSA")
 DEFTIMEVAR (TV_IPA_PROFILE           , "ipa profile")
 DEFTIMEVAR (TV_IPA_AUTOFDO           , "auto profile")
 DEFTIMEVAR (TV_IPA_PURE_CONST        , "ipa pure const")
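
The summary streaming in ipa-hsa.c relies on the reader unpacking
fields in the order and with the widths the writer packed them: 2 bits
of kind, 1 bit for the GPU implementation flag and 1 bit saying whether
a bound function follows.  A standalone sketch of that symmetry is
below.  StreamedSummary, pack_summary and unpack_summary are
hypothetical names, and the bit positions are an assumption of this
model, not a statement about GCC's bitpack layout.

```cpp
#include <cassert>

// Hypothetical mirror of the per-function fields written to the stream.
struct StreamedSummary
{
  unsigned kind;   // must fit in 2 bits, like m_kind
  bool gpu_impl;   // models m_gpu_implementation_p
  bool has_bound;  // models m_binded_function != NULL
};

// Pack the fields into one word: kind first, then the two flags.
unsigned
pack_summary (const StreamedSummary &s)
{
  assert (s.kind < 4);
  unsigned word = 0;
  unsigned pos = 0;
  word |= s.kind << pos;
  pos += 2;
  word |= (s.gpu_impl ? 1u : 0u) << pos;
  pos += 1;
  word |= (s.has_bound ? 1u : 0u) << pos;
  return word;
}

// Unpack in exactly the same order and with the same widths.
StreamedSummary
unpack_summary (unsigned word)
{
  StreamedSummary s;
  s.kind = word & 0x3u;
  word >>= 2;
  s.gpu_impl = (word & 1u) != 0;
  word >>= 1;
  s.has_bound = (word & 1u) != 0;
  return s;
}
```

If the reader used a different order or width for any field, every
later field in the word would be misread, which is why the write and
read functions in the patch mirror each other.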