diff mbox

[hsa,5/12] New HSA-related GCC options

Message ID 20151109165856.GH14925@virgil.suse.cz
State New
Headers show

Commit Message

Martin Jambor Nov. 9, 2015, 4:58 p.m. UTC
Hi,

On Fri, Nov 06, 2015 at 09:42:25AM +0100, Richard Biener wrote:
> On Thu, 5 Nov 2015, Martin Jambor wrote:
> 
> > Hi,
> > 
> > the following small part of the merge deals with new options.  It adds
> > four independent things:
> > 
> > 1) flag_disable_hsa is used by code in opts.c (in the first patch) to
> >    remember whether HSA has been explicitely disabled on the compiler
> >    command line.
> 
> But I don't see any way to disable it on the command line?  (no switch?)

No, the switch is -foffload, which has missing documentation (PR
67300) and is only described at https://gcc.gnu.org/wiki/Offloading
Nevertheless, the option allows the user to specify compiler option
-foffload=disable and no offloading should happen, not even HSA.  The
user can also enumerate just the offload targets they want (and pass
them special command line stuff).

It seems I have misplaced a hunk in the patch series.  Nevertheless,
in the first patch (with configuration stuff), there is a change to
opts.c which scans the -foffload= contents and sets the flag variable
if hsa is not present.

Whenever the compiler has to decide whether HSA is enabled for the
given compilation or not, it has to look at this variable (if
configured for HSA).

> 
> > 2) -Whsa is a new warning we emit whenever we fail to produce HSAIL
> >    for some source code.  It is on by default but of course only
> >    emitted by HSAIL generating code so should never affect anybody who
> >    does not use HSA-enabled compiler and OpenMP 4 device constructs.
> > 
> > We have found the following two additions very useful for debugging on
> > the branch but will understand if they are not deemed suitable for
> > trunk and will gladly remove them:
> > 
> > 3) -fdisable-hsa-gridification disables the gridification process to
> >    ease experimenting with dynamic parallelism.  With this option,
> >    HSAIL is always generated from the CPU-intended gimple.
> 
> So this sounds like sth a user should never do which means
> it shouln't be a switch (but a parameter or removed).

Martin said he likes the capability to switch gridification off so I
turned it into a parameter.

> 
> > 4) Parameter hsa-gen-debug-stores will be obsolete once HSA run-time
> >    supports debugging traps.  Before that, we have to do with
> >    debugging stores to memory at defined places, which however can
> >    cost speed in benchmarks.  So they are only enabled with this
> >    parameter.  We decided to make it a parameter rather than a switch
> >    to emphasize the fact it will go away and to possibly allow us
> >    select different levels of verbosity of the stores in the future).
> 
> You miss documentation in invoke.texi for new switches and parameters.

Right, I have added that together with other changes addressing the
above comments and am about to commit the following to the branch:


2015-11-09  Martin Jambor  <mjambor@suse.cz>

	* common.opt (-fdisable-hsa-gridification): Removed.
	* params.def (PARAM_OMP_GPU_GRIDIFY): New.
	* omp-low.c: Include params.h.
	(execute_lower_omp): Check parameter PARAM_OMP_GPU_GRIDIFY instead of
	flag_disable-hsa-gridification.
	* doc/invoke.texi (Optimize Options): Add description of
	omp-gpu-gridify and hsa-gen-debug-stores parameters.

Comments

Richard Biener Nov. 10, 2015, 9:01 a.m. UTC | #1
On Mon, 9 Nov 2015, Martin Jambor wrote:

> Hi,
> 
> On Fri, Nov 06, 2015 at 09:42:25AM +0100, Richard Biener wrote:
> > On Thu, 5 Nov 2015, Martin Jambor wrote:
> > 
> > > Hi,
> > > 
> > > the following small part of the merge deals with new options.  It adds
> > > four independent things:
> > > 
> > > 1) flag_disable_hsa is used by code in opts.c (in the first patch) to
> > >    remember whether HSA has been explicitely disabled on the compiler
> > >    command line.
> > 
> > But I don't see any way to disable it on the command line?  (no switch?)
> 
> No, the switch is -foffload, which has missing documentation (PR
> 67300) and is only described at https://gcc.gnu.org/wiki/Offloading
> Nevertheless, the option allows the user to specify compiler option
> -foffload=disable and no offloading should happen, not even HSA.  The
> user can also enumerate just the offload targets they want (and pass
> them special command line stuff).
> 
> It seems I have misplaced a hunk in the patch series.  Nevertheless,
> in the first patch (with configuration stuff), there is a change to
> opts.c which scans the -foffload= contents and sets the flag variable
> if hsa is not present.
> 
> Whenever the compiler has to decide whether HSA is enabled for the
> given compilation or not, it has to look at this variable (if
> configured for HSA).
> 
> > 
> > > 2) -Whsa is a new warning we emit whenever we fail to produce HSAIL
> > >    for some source code.  It is on by default but of course only
> > >    emitted by HSAIL generating code so should never affect anybody who
> > >    does not use HSA-enabled compiler and OpenMP 4 device constructs.
> > > 
> > > We have found the following two additions very useful for debugging on
> > > the branch but will understand if they are not deemed suitable for
> > > trunk and will gladly remove them:
> > > 
> > > 3) -fdisable-hsa-gridification disables the gridification process to
> > >    ease experimenting with dynamic parallelism.  With this option,
> > >    HSAIL is always generated from the CPU-intended gimple.
> > 
> > So this sounds like sth a user should never do which means
> > it shouln't be a switch (but a parameter or removed).
> 
> Martin said he likes the capability to switch gridification off so I
> turned it into a parameter.
> 
> > 
> > > 4) Parameter hsa-gen-debug-stores will be obsolete once HSA run-time
> > >    supports debugging traps.  Before that, we have to do with
> > >    debugging stores to memory at defined places, which however can
> > >    cost speed in benchmarks.  So they are only enabled with this
> > >    parameter.  We decided to make it a parameter rather than a switch
> > >    to emphasize the fact it will go away and to possibly allow us
> > >    select different levels of verbosity of the stores in the future).
> > 
> > You miss documentation in invoke.texi for new switches and parameters.
> 
> Right, I have added that together with other changes addressing the
> above comments and am about to commit the following to the branch:

Looks good to me.

Thanks,
Richard.

> 
> 2015-11-09  Martin Jambor  <mjambor@suse.cz>
> 
> 	* common.opt (-fdisable-hsa-gridification): Removed.
> 	* params.def (PARAM_OMP_GPU_GRIDIFY): New.
> 	* omp-low.c: Include params.h.
> 	(execute_lower_omp): Check parameter PARAM_OMP_GPU_GRIDIFY instead of
> 	flag_disable-hsa-gridification.
> 	* doc/invoke.texi (Optimize Options): Add description of
> 	omp-gpu-gridify and hsa-gen-debug-stores parameters.
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 9cb52db..8bee504 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1115,10 +1115,6 @@ fdiagnostics-show-location=
>  Common Joined RejectNegative Enum(diagnostic_prefixing_rule)
>  -fdiagnostics-show-location=[once|every-line]	How often to emit source location at the beginning of line-wrapped diagnostics.
>  
> -fdisable-hsa-gridification
> -Common Report Var(flag_disable_hsa_gridification)
> -Disable HSA gridification for OMP pragmas
> -
>  ; Required for these enum values.
>  SourceInclude
>  pretty-print.h
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4fc7d88..b9fb1e1 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11171,6 +11171,17 @@ dynamic, guided, auto, runtime).  The default is static.
>  Maximum depth of recursion when querying properties of SSA names in things
>  like fold routines.  One level of recursion corresponds to following a
>  use-def chain.
> +
> +@item omp-gpu-gridify
> +Enable creation of gridified GPU kernels out of loops within target
> +OpenMP constructs.  This conversion is enabled by default when
> +offloading to HSA, to disable it, use @option{--param omp-gpu-gridify=0}
> +
> +@item hsa-gen-debug-stores
> +Enable emission of special debug stores within HSA kernels which are
> +then read and reported by libgomp plugin.  Generation of these stores
> +is disabled by default, use @option{--param hsa-gen-debug-stores=1} to
> +enable it.
>  @end table
>  @end table
>  
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index 34aafc8..f90a698 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-pretty-print.h"
>  #include "symbol-summary.h"
>  #include "hsa.h"
> +#include "params.h"
>  
>  /* Lowering of OMP parallel and workshare constructs proceeds in two
>     phases.  The first phase scans the function looking for OMP statements
> @@ -17449,7 +17450,8 @@ execute_lower_omp (void)
>  
>    body = gimple_body (current_function_decl);
>  
> -  if (hsa_gen_requested_p () && !flag_disable_hsa_gridification)
> +  if (hsa_gen_requested_p ()
> +      && PARAM_VALUE (PARAM_OMP_GPU_GRIDIFY) == 1)
>      create_target_gpukernels (&body);
>  
>    scan_omp (&body, NULL);
> diff --git a/gcc/params.def b/gcc/params.def
> index 86911e2..f12755b 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -1178,6 +1178,12 @@ DEFPARAM (PARAM_MAX_SSA_NAME_QUERY_DEPTH,
>  	  " SSA name.",
>  	  2, 1, 0)
>  
> +DEFPARAM (PARAM_OMP_GPU_GRIDIFY,
> +	  "omp-gpu-gridify",
> +	  "Enable creation of gridified GPU kernels out of OpenMP target "
> +	  "constructs",
> +	  1, 0, 1)
> +
>  DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
>  	  "hsa-gen-debug-stores",
>  	  "Level of hsa debug stores verbosity",
> 
>
Jakub Jelinek Nov. 12, 2015, 11:19 a.m. UTC | #2
On Mon, Nov 09, 2015 at 05:58:56PM +0100, Martin Jambor wrote:
> > But I don't see any way to disable it on the command line?  (no switch?)
> 
> No, the switch is -foffload, which has missing documentation (PR
> 67300) and is only described at https://gcc.gnu.org/wiki/Offloading
> Nevertheless, the option allows the user to specify compiler option
> -foffload=disable and no offloading should happen, not even HSA.  The
> user can also enumerate just the offload targets they want (and pass
> them special command line stuff).
> 
> It seems I have misplaced a hunk in the patch series.  Nevertheless,
> in the first patch (with configuration stuff), there is a change to
> opts.c which scans the -foffload= contents and sets the flag variable
> if hsa is not present.
> 
> Whenever the compiler has to decide whether HSA is enabled for the
> given compilation or not, it has to look at this variable (if
> configured for HSA).

Buut what is the difference between
-foffload=disable
or
-foffload={list not including hsa}
and the new param?  If you don't gridify, you don't emit any kernels...

	Jakub
Martin Jambor Nov. 13, 2015, 1:01 p.m. UTC | #3
On Thu, Nov 12, 2015 at 12:19:50PM +0100, Jakub Jelinek wrote:
> On Mon, Nov 09, 2015 at 05:58:56PM +0100, Martin Jambor wrote:
> > > But I don't see any way to disable it on the command line?  (no switch?)
> > 
> > No, the switch is -foffload, which has missing documentation (PR
> > 67300) and is only described at https://gcc.gnu.org/wiki/Offloading
> > Nevertheless, the option allows the user to specify compiler option
> > -foffload=disable and no offloading should happen, not even HSA.  The
> > user can also enumerate just the offload targets they want (and pass
> > them special command line stuff).
> > 
> > It seems I have misplaced a hunk in the patch series.  Nevertheless,
> > in the first patch (with configuration stuff), there is a change to
> > opts.c which scans the -foffload= contents and sets the flag variable
> > if hsa is not present.
> > 
> > Whenever the compiler has to decide whether HSA is enabled for the
> > given compilation or not, it has to look at this variable (if
> > configured for HSA).
> 
> Buut what is the difference between
> -foffload=disable
> or
> -foffload={list not including hsa}
> and the new param?  If you don't gridify, you don't emit any kernels...
> 

We do.  When a kernel cannot be gridified, we try to handle it via
dynamic parallelism (i.e. launching a kernel from a kernel) .  Even
though we have not been able to get any good performance with it and
there are several limitations and open problems, I still include this
option in my plans because it is unlikely we will be able to handle
complex scenarios without it (and I hope that as HSA evolves, it will
become a viable, even though most probably always a bit slower,
option).

Apart from the performance degradation, the biggest problem is that
currently HSA dynamic parallelism does not allow you wait for the
completion of the child kernel in a straightforward way.  There is a
hack that allowed us to do it, but by its nature it only allows depth
three dispatch (i.e. kernel->kernel->kernel).  We are limiting
ourselves to depth two, at the moment.

Dynamic parallelism also requires non-trivial preparation at the CPU
side, and quite a few HSA characteristics of the to-be dispatched
kernels have to be known and passed to the GPU when the first kernel
is invoked.  In our current scheme, we have to know the "dependencies"
of each kernel at compile-time, which is sometimes not possible, for
example if the second kernel is invoked from a function that is in a
different compilation unit than the first kernel.

As I said, I hope that with time we will be able to overcome all of
this, but at the moment, dynamic parallelism is clearly just an
experimental feature (that is why I suggested warning when not
gridifying).

I hope this answers the question and explains the situation a bit,

Martin
diff mbox

Patch

diff --git a/gcc/common.opt b/gcc/common.opt
index 9cb52db..8bee504 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1115,10 +1115,6 @@  fdiagnostics-show-location=
 Common Joined RejectNegative Enum(diagnostic_prefixing_rule)
 -fdiagnostics-show-location=[once|every-line]	How often to emit source location at the beginning of line-wrapped diagnostics.
 
-fdisable-hsa-gridification
-Common Report Var(flag_disable_hsa_gridification)
-Disable HSA gridification for OMP pragmas
-
 ; Required for these enum values.
 SourceInclude
 pretty-print.h
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4fc7d88..b9fb1e1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11171,6 +11171,17 @@  dynamic, guided, auto, runtime).  The default is static.
 Maximum depth of recursion when querying properties of SSA names in things
 like fold routines.  One level of recursion corresponds to following a
 use-def chain.
+
+@item omp-gpu-gridify
+Enable creation of gridified GPU kernels out of loops within target
+OpenMP constructs.  This conversion is enabled by default when
+offloading to HSA, to disable it, use @option{--param omp-gpu-gridify=0}
+
+@item hsa-gen-debug-stores
+Enable emission of special debug stores within HSA kernels which are
+then read and reported by libgomp plugin.  Generation of these stores
+is disabled by default, use @option{--param hsa-gen-debug-stores=1} to
+enable it.
 @end table
 @end table
 
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 34aafc8..f90a698 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -82,6 +82,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "symbol-summary.h"
 #include "hsa.h"
+#include "params.h"
 
 /* Lowering of OMP parallel and workshare constructs proceeds in two
    phases.  The first phase scans the function looking for OMP statements
@@ -17449,7 +17450,8 @@  execute_lower_omp (void)
 
   body = gimple_body (current_function_decl);
 
-  if (hsa_gen_requested_p () && !flag_disable_hsa_gridification)
+  if (hsa_gen_requested_p ()
+      && PARAM_VALUE (PARAM_OMP_GPU_GRIDIFY) == 1)
     create_target_gpukernels (&body);
 
   scan_omp (&body, NULL);
diff --git a/gcc/params.def b/gcc/params.def
index 86911e2..f12755b 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1178,6 +1178,12 @@  DEFPARAM (PARAM_MAX_SSA_NAME_QUERY_DEPTH,
 	  " SSA name.",
 	  2, 1, 0)
 
+DEFPARAM (PARAM_OMP_GPU_GRIDIFY,
+	  "omp-gpu-gridify",
+	  "Enable creation of gridified GPU kernels out of OpenMP target "
+	  "constructs",
+	  1, 0, 1)
+
 DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
 	  "hsa-gen-debug-stores",
 	  "Level of hsa debug stores verbosity",