[5/16] Add in_oacc_kernels_region in struct loop

Message ID 5640CA57.7090007@mentor.com
State New
Commit Message

Tom de Vries Nov. 9, 2015, 4:31 p.m. UTC
On 09/11/15 16:35, Tom de Vries wrote:
> Hi,
>
> this patch series for stage1 trunk adds support to:
> - parallelize oacc kernels regions using parloops, and
> - map the loops onto the oacc gang dimension.
>
> The patch series contains these patches:
>
>       1    Insert new exit block only when needed in
>          transform_to_exit_first_loop_alt
>       2    Make create_parallel_loop return void
>       3    Ignore reduction clause on kernels directive
>       4    Implement -foffload-alias
>       5    Add in_oacc_kernels_region in struct loop
>       6    Add pass_oacc_kernels
>       7    Add pass_dominator_oacc_kernels
>       8    Add pass_ch_oacc_kernels
>       9    Add pass_parallelize_loops_oacc_kernels
>      10    Add pass_oacc_kernels pass group in passes.def
>      11    Update testcases after adding kernels pass group
>      12    Handle acc loop directive
>      13    Add c-c++-common/goacc/kernels-*.c
>      14    Add gfortran.dg/goacc/kernels-*.f95
>      15    Add libgomp.oacc-c-c++-common/kernels-*.c
>      16    Add libgomp.oacc-fortran/kernels-*.f95
>
> The first 9 patches are more or less independent, but patches 10-16 are
> intended to be committed at the same time.
>
> Bootstrapped and reg-tested on x86_64.
>
> Build and reg-tested with nvidia accelerator, in combination with a
> patch that enables accelerator testing (which is submitted at
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
>
> I'll post the individual patches in reply to this message.

This patch adds and initializes the in_oacc_kernels_region field
in struct loop.

The field is used to signal to subsequent passes that we're dealing with
a loop in a kernels region that we're trying to parallelize.

Note that we do not parallelize kernels regions with more than one loop
nest. [ In general, kernels regions with more than one loop nest should
be split up into separate kernels regions, but that's not supported at
the moment. ]
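
For illustration only (this is not code from the patch), the marking policy
described above can be modelled in a few lines of standalone C++. ToyLoop and
mark_loops are hypothetical names; the real patch works on GCC's struct loop
and decides region membership from dominator information rather than from a
precomputed in_region flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GCC's struct loop; only what the sketch needs.
struct ToyLoop {
  int parent;                   // index of the enclosing loop, -1 if outermost
  bool in_region;               // assumed: header lies inside the kernels region
  bool in_oacc_kernels_region;  // the flag the patch adds
};

// Mark every loop in the region, but only when the region contains exactly
// one outermost loop, i.e. a single loop nest.  Returns true if loops were
// marked, false if the region was left alone.
bool mark_loops(std::vector<ToyLoop> &loops) {
  unsigned nr_outer_loops = 0;
  for (const ToyLoop &l : loops) {
    // A parent index must be -1 or refer to an existing loop.
    assert(l.parent >= -1 && l.parent < static_cast<int>(loops.size()));
    if (l.parent == -1 && l.in_region)
      ++nr_outer_loops;
  }
  if (nr_outer_loops != 1)
    return false;

  for (ToyLoop &l : loops)
    if (l.in_region)
      l.in_oacc_kernels_region = true;
  return true;
}
```

With one nest in the region, both the outer loop and its inner loops get the
flag; with two nests, nothing is marked and later passes see no change.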

Thanks,
- Tom

Comments

Richard Biener Nov. 11, 2015, 10:55 a.m. UTC | #1
On Mon, 9 Nov 2015, Tom de Vries wrote:

> This patch adds and initializes the in_oacc_kernels_region field in
> struct loop.
> 
> The field is used to signal to subsequent passes that we're dealing with a
> loop in a kernels region that we're trying to parallelize.
> 
> Note that we do not parallelize kernels regions with more than one loop nest.
> [ In general, kernels regions with more than one loop nest should be split up
> into separate kernels regions, but that's not supported at the moment. ]

I think mark_loops_in_oacc_kernels_region can be greatly simplified.

Both region entry and exit should have the same ->loop_father (it is a SESE
region).  Then you can just walk that loop's inner loops (and their
siblings), checking how each header relates by dominance to the region
entry and exit (only necessary for the direct inner loops).
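
A rough standalone sketch of that walk, under stated assumptions: ToyLoop,
dominated_by, mark_nest and mark_region_loops are hypothetical names, the
dominator tree is given as a plain immediate-dominator array, and a
region_exit of -1 stands for "no exit block". It only models the idea and is
not the GCC loop or dominance API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop-tree node: first child in `inner`, siblings via `next`.
struct ToyLoop {
  int header;                   // basic-block index of the loop header
  ToyLoop *inner;               // first directly nested loop, or nullptr
  ToyLoop *next;                // next sibling loop, or nullptr
  bool in_oacc_kernels_region;
};

// idom[b] is the immediate dominator of block b; the function entry has -1.
// A block is treated as dominating itself.
bool dominated_by(const std::vector<int> &idom, int bb, int dom) {
  for (int b = bb; b != -1; b = idom[b])
    if (b == dom)
      return true;
  return false;
}

// Flag a loop and everything nested in it; no dominance checks are needed
// below the direct inner loops.
void mark_nest(ToyLoop *loop) {
  loop->in_oacc_kernels_region = true;
  for (ToyLoop *l = loop->inner; l; l = l->next)
    mark_nest(l);
}

// FATHER is the loop containing both region entry and exit.  Counts the
// direct inner loops whose header is dominated by the entry but not by the
// exit, and marks the nest only when there is exactly one such loop.
unsigned mark_region_loops(ToyLoop *father, const std::vector<int> &idom,
                           int region_entry, int region_exit) {
  ToyLoop *only = nullptr;
  unsigned count = 0;
  for (ToyLoop *l = father->inner; l; l = l->next) {
    bool after_exit =
        region_exit != -1 && dominated_by(idom, l->header, region_exit);
    if (dominated_by(idom, l->header, region_entry) && !after_exit) {
      ++count;
      only = l;
    }
  }
  if (count == 1)
    mark_nest(only);
  return count;
}
```

Compared with the bitmap version in the patch, this visits only the children
of one loop instead of every loop in the function, and needs no temporary
bitmaps.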

Richard.

Patch

Add in_oacc_kernels_region in struct loop

2015-11-09  Tom de Vries  <tom@codesourcery.com>

	* cfgloop.h (struct loop): Add in_oacc_kernels_region field.
	* omp-low.c (mark_loops_in_oacc_kernels_region): New function.
	(expand_omp_target): Call mark_loops_in_oacc_kernels_region.
---
 gcc/cfgloop.h |  3 +++
 gcc/omp-low.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 6af6893..ee73bf9 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -191,6 +191,9 @@  struct GTY ((chain_next ("%h.next"))) loop {
   /* True if we should try harder to vectorize this loop.  */
   bool force_vectorize;
 
+  /* True if the loop is part of an oacc kernels region.  */
+  bool in_oacc_kernels_region;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d052c13..7121d73 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12429,6 +12429,61 @@  get_oacc_ifn_dim_arg (const gimple *stmt)
   return (int) axis;
 }
 
+/* Mark the loops inside the kernels region starting at REGION_ENTRY and ending
+   at REGION_EXIT.  */
+
+static void
+mark_loops_in_oacc_kernels_region (basic_block region_entry,
+				   basic_block region_exit)
+{
+  bitmap dominated_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
+  unsigned di;
+  basic_block bb;
+
+  bitmap_clear (dominated_bitmap);
+  bitmap_clear (excludes_bitmap);
+
+  /* Get all the blocks dominated by the region entry.  That will include the
+     entire region.  */
+  vec<basic_block> dominated
+    = get_all_dominated_blocks (CDI_DOMINATORS, region_entry);
+  FOR_EACH_VEC_ELT (dominated, di, bb)
+    bitmap_set_bit (dominated_bitmap, bb->index);
+
+  /* Exclude all the blocks which are not in the region: the blocks dominated by
+     the region exit.  */
+  if (region_exit != NULL)
+    {
+      vec<basic_block> excludes
+	= get_all_dominated_blocks (CDI_DOMINATORS, region_exit);
+      FOR_EACH_VEC_ELT (excludes, di, bb)
+	bitmap_set_bit (excludes_bitmap, bb->index);
+    }
+
+  /* Don't parallelize the kernels region if it contains more than one outer
+     loop.  */
+  unsigned int nr_outer_loops = 0;
+  struct loop *loop;
+  FOR_EACH_LOOP (loop, 0)
+    {
+      if (loop_outer (loop) != current_loops->tree_root)
+	continue;
+
+      if (bitmap_bit_p (dominated_bitmap, loop->header->index)
+	  && !bitmap_bit_p (excludes_bitmap, loop->header->index))
+	nr_outer_loops++;
+    }
+  if (nr_outer_loops != 1)
+    return;
+
+  /* Mark the loops in the region.  */
+  FOR_EACH_LOOP (loop, 0)
+    if (bitmap_bit_p (dominated_bitmap, loop->header->index)
+	&& !bitmap_bit_p (excludes_bitmap, loop->header->index))
+      loop->in_oacc_kernels_region = true;
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -12483,6 +12538,9 @@  expand_omp_target (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
+  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    mark_loops_in_oacc_kernels_region (region->entry, region->exit);
+
   if (offloaded)
     {
       unsigned srcidx, dstidx, num;
-- 
1.9.1