Fix PR83017 (fortran part)

Message ID alpine.LSU.2.20.1711171009530.12252@zhemvz.fhfr.qr

Commit Message

Richard Biener Nov. 17, 2017, 9:13 a.m. UTC
This adds a new ANNOTATE_EXPR kind, annot_expr_parallel_kind, which
is stronger than ivdep: ivdep maps semantically to safelen=INT_MAX,
which alone doesn't tell us enough to auto-parallelize anything.

annot_expr_parallel_kind maps to the already existing loop flag
can_be_parallel, which can be used unconditionally by autopar.

This patch changes the Fortran frontend to annotate DO CONCURRENT
with parallel instead of ivdep.

The patch is not enough to show a runtime benefit here because of
some autopar costing issues, but in other cases it should enable
auto-parallelization of all DO CONCURRENT loops.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for the fortran
part?

Thanks,
Richard.

2017-11-17  Richard Biener  <rguenther@suse.de>

	PR fortran/83017
	* tree-core.h (enum annot_expr_kind): Add annot_expr_parallel_kind.
	* tree-pretty-print.c (dump_generic_node): Handle
	annot_expr_parallel_kind.
	* tree-cfg.c (replace_loop_annotate_in_block): Likewise.
	* gimplify.c (gimple_boolify): Likewise.

	fortran/
	* trans-stmt.c (gfc_trans_forall_loop): Annotate DO CONCURRENT
	loops with annot_expr_parallel_kind instead of just
	annot_expr_ivdep_kind.

Comments

Janne Blomqvist Nov. 17, 2017, 12:44 p.m. UTC | #1
On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener <rguenther@suse.de> wrote:
> This patch changes the Fortran frontend to annotate DO CONCURRENT
> with parallel instead of ivdep.
>
> The patch is not enough to enable a runtime benefit because of
> some autopar costing issues but for other cases it should enable
> auto-parallelization of all DO CONCURRENT loops.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for the fortran
> part?

I recall some years ago there was discussion whether DO CONCURRENT
should be handled as "Ok to vectorize" or "Ok to parallelize using
threads", and I believe back then it was decided to play it safe and
just vectorize. Has this consensus changed now? And with this change,
are we now using threads, or both vectors and threads? And is it
enabled always, or only with -fopenmp, or some
-ftree-loop-parallel-whatever? And if the second, how does it interact
with openmp parallelization?
Richard Biener Nov. 17, 2017, 1:03 p.m. UTC | #2
On Fri, 17 Nov 2017, Janne Blomqvist wrote:

> On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener <rguenther@suse.de> wrote:
> > This patch changes the Fortran frontend to annotate DO CONCURRENT
> > with parallel instead of ivdep.
> >
> > The patch is not enough to enable a runtime benefit because of
> > some autopar costing issues but for other cases it should enable
> > auto-parallelization of all DO CONCURRENT loops.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for the fortran
> > part?
> 
> I recall some years ago there was discussion whether DO CONCURRENT
> should be handled as "Ok to vectorize" or "Ok to parallelize using
> threads", and I believe back then it was decided to play it safe and
> just vectorize. Has this consensus changed now? And with this change,
> are we now using threads, or both vectors and threads? And is it
> enabled always, or only with -fopenmp, or some
> -ftree-loop-parallel-whatever? And if the second, how does it interact
> with openmp parallelization?

It is only parallelized with -ftree-parallelize-loops; OpenMP processing
comes first, so I think it doesn't interfere here.  The loops are still
marked with ivdep as well and thus should enable vectorization.  The
usual caveats may apply when trying to vectorize outlined loops
(loops that have been parallelized).

Was the "play safe" for correctness concerns or for optimization
concerns?

Richard.
Janne Blomqvist Nov. 17, 2017, 1:23 p.m. UTC | #3
On Fri, Nov 17, 2017 at 3:03 PM, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 17 Nov 2017, Janne Blomqvist wrote:
>
>> On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener <rguenther@suse.de> wrote:
>> > This patch changes the Fortran frontend to annotate DO CONCURRENT
>> > with parallel instead of ivdep.
>> >
>> > The patch is not enough to enable a runtime benefit because of
>> > some autopar costing issues but for other cases it should enable
>> > auto-parallelization of all DO CONCURRENT loops.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for the fortran
>> > part?
>>
>> I recall some years ago there was discussion whether DO CONCURRENT
>> should be handled as "Ok to vectorize" or "Ok to parallelize using
>> threads", and I believe back then it was decided to play it safe and
>> just vectorize. Has this consensus changed now? And with this change,
>> are we now using threads, or both vectors and threads? And is it
>> enabled always, or only with -fopenmp, or some
>> -ftree-loop-parallel-whatever? And if the second, how does it interact
>> with openmp parallelization?
>
> It is only parallelized with -ftree-parallelize-loops; OpenMP processing
> comes first, so I think it doesn't interfere here.  The loops are still
> marked with ivdep as well and thus should enable vectorization.

Ok, sounds good then. Ok for my part.

For OpenMP, the thing that came to mind was that if you had an outer
loop parallelized with "omp parallel do", then an inner DO CONCURRENT
loop, hopefully you don't then get N*N threads? That is, does
-ftree-parallelize-loops use the thread pool that libgomp creates, or
does it create its own?

> The
> usual caveats may apply when trying to vectorize outlined loops
> (loops that have been parallelized).
>
> Was the "play safe" for correctness concerns or for optimization
> concern?

IIRC it was an optimization concern: if users assume DO CONCURRENT
means "vectorize", they may use it for short loops where the
overhead of threads isn't worth it. Back then I recall it wasn't clear
what other compilers were going to do with DO CONCURRENT, so merely
vectorizing was the safer option at the time. It seems that nowadays at
least Intel attempts to use threads for DO CONCURRENT only with
-parallel (/Qparallel on Windows), which more or less matches what
you're proposing here. Which I guess is good for users in a
performance-portability sense.
Richard Biener Nov. 17, 2017, 1:34 p.m. UTC | #4
On Fri, 17 Nov 2017, Janne Blomqvist wrote:

> On Fri, Nov 17, 2017 at 3:03 PM, Richard Biener <rguenther@suse.de> wrote:
> > On Fri, 17 Nov 2017, Janne Blomqvist wrote:
> >
> >> On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener <rguenther@suse.de> wrote:
> >> > This patch changes the Fortran frontend to annotate DO CONCURRENT
> >> > with parallel instead of ivdep.
> >> >
> >> > The patch is not enough to enable a runtime benefit because of
> >> > some autopar costing issues but for other cases it should enable
> >> > auto-parallelization of all DO CONCURRENT loops.
> >> >
> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for the fortran
> >> > part?
> >>
> >> I recall some years ago there was discussion whether DO CONCURRENT
> >> should be handled as "Ok to vectorize" or "Ok to parallelize using
> >> threads", and I believe back then it was decided to play it safe and
> >> just vectorize. Has this consensus changed now? And with this change,
> >> are we now using threads, or both vectors and threads? And is it
> >> enabled always, or only with -fopenmp, or some
> >> -ftree-loop-parallel-whatever? And if the second, how does it interact
> >> with openmp parallelization?
> >
> > It is only parallelized with -ftree-parallelize-loops; OpenMP processing
> > comes first, so I think it doesn't interfere here.  The loops are still
> > marked with ivdep as well and thus should enable vectorization.
> 
> Ok, sounds good then. Ok for my part.
> 
> For OpenMP, the thing that came to mind was that if you had an outer
> loop parallelized with "omp parallel do", then an inner DO CONCURRENT
> loop, hopefully you don't then get N*N threads? That is, does
-ftree-parallelize-loops use the thread pool that libgomp creates, or
> does it create its own?

It uses the same thread pool.

> > The
> > usual caveats may apply when trying to vectorize outlined loops
> > (loops that have been parallelized).
> >
> > Was the "play safe" for correctness concerns or for optimization
> > concern?
> 
> IIRC it was an optimization concern: if users assume DO CONCURRENT
> means "vectorize", they may use it for short loops where the
> overhead of threads isn't worth it. Back then I recall it wasn't clear
> what other compilers were going to do with DO CONCURRENT, so merely
> vectorizing was the safer option at the time. It seems that nowadays at
> least Intel attempts to use threads for DO CONCURRENT only with
> -parallel (/Qparallel on Windows), which more or less matches what
> you're proposing here. Which I guess is good for users in a
> performance-portability sense.

Sure.  -ftree-parallelize-loops isn't enabled by default.  The patch
merely adds more precise dependence analysis hints for the middle-end.

Richard.

Patch

Index: gcc/tree-core.h
===================================================================
--- gcc/tree-core.h	(revision 254797)
+++ gcc/tree-core.h	(working copy)
@@ -853,6 +853,7 @@  enum annot_expr_kind {
   annot_expr_ivdep_kind,
   annot_expr_no_vector_kind,
   annot_expr_vector_kind,
+  annot_expr_parallel_kind,
   annot_expr_kind_last
 };
 
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 254797)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2638,6 +2640,9 @@  dump_generic_node (pretty_printer *pp, t
 	case annot_expr_vector_kind:
 	  pp_string (pp, ", vector");
 	  break;
+	case annot_expr_parallel_kind:
+	  pp_string (pp, ", parallel");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 254797)
+++ gcc/tree-cfg.c	(working copy)
@@ -287,6 +287,10 @@  replace_loop_annotate_in_block (basic_bl
 	  loop->force_vectorize = true;
 	  cfun->has_force_vectorize_loops = true;
 	  break;
+	case annot_expr_parallel_kind:
+	  loop->can_be_parallel = true;
+	  loop->safelen = INT_MAX;
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 254797)
+++ gcc/gimplify.c	(working copy)
@@ -3749,6 +3749,7 @@  gimple_boolify (tree expr)
 	case annot_expr_ivdep_kind:
 	case annot_expr_no_vector_kind:
 	case annot_expr_vector_kind:
+	case annot_expr_parallel_kind:
 	  TREE_OPERAND (expr, 0) = gimple_boolify (TREE_OPERAND (expr, 0));
 	  if (TREE_CODE (type) != BOOLEAN_TYPE)
 	    TREE_TYPE (expr) = boolean_type_node;
Index: gcc/fortran/trans-stmt.c
===================================================================
--- gcc/fortran/trans-stmt.c	(revision 254797)
+++ gcc/fortran/trans-stmt.c	(working copy)
@@ -3455,7 +3455,7 @@  gfc_trans_forall_loop (forall_info *fora
       if (forall_tmp->do_concurrent)
 	cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		       build_int_cst (integer_type_node,
-				      annot_expr_ivdep_kind));
+				      annot_expr_parallel_kind));
 
       tmp = build1_v (GOTO_EXPR, exit_label);
       tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node,