[OpenACC] Enable SIMD vectorization on vector loops

Message ID 72c9e096-70fc-bc55-200b-2b51eb5c2e0b@mentor.com
State New

Commit Message

Cesar Philippidis Sept. 13, 2017, 11:20 p.m. UTC
This patch enables SIMD vectorization on non-SIMT targets in acc vector
loops. It does so by setting the force_vectorize flag in a manner
similar to OpenMP SIMD loops. Unlike OpenMP, OpenACC gives the compiler
the flexibility to assign gang, worker and vector parallelism to
independent acc loops. At present, automatic parallelism is assigned
during the oacc device lower pass, specifically inside
oacc_loop_process. Consequently, this patch applies the force_vectorize
flag late, when the GOACC_LOOP internal functions are expanded into
target-specific code.
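
For illustration, here is a minimal sketch of the two kinds of loop
being compared. The function names saxpy_acc and saxpy_simd are
hypothetical and are not part of the patch or the testsuite. The intent
is that, on a non-SIMT target, the acc vector loop gets the same
vectorization hint that the OpenMP simd loop already receives. The code
also builds and runs serially when the pragmas are ignored.

```c
#include <stddef.h>

/* Hypothetical example: an OpenACC loop with explicit vector
   partitioning.  With this patch, a non-SIMT target is expected to
   mark the resulting loop with force_vectorize.  */
void
saxpy_acc (int n, float a, const float *x, float *y)
{
#pragma acc parallel loop vector copyin (x[0:n]) copy (y[0:n])
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}

/* The OpenMP counterpart, which already sets force_vectorize.  */
void
saxpy_simd (int n, float a, const float *x, float *y)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```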

Note that expand_oacc_for may construct two loops for each acc loop; the
outer loop represents the "chunking" factor, whereas the inner loop is
for the individual gang, worker and vector threads.
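
As a rough sketch of that shape only (this is hand-written C, not the
GIMPLE that expand_oacc_for emits, and chunked_sum is a hypothetical
name), the two loops can be pictured as follows. The patch marks both
the outer loop and, when present, its inner loop with force_vectorize.
The sketch assumes chunk_size is positive.

```c
/* Hypothetical example: sum V[0..N) by walking it in chunks of
   CHUNK_SIZE elements.  CHUNK_SIZE must be greater than zero.  */
long
chunked_sum (const int *v, int n, int chunk_size)
{
  long total = 0;

  /* Outer "chunking" loop: one iteration per chunk.  */
  for (int start = 0; start < n; start += chunk_size)
    {
      int bound = start + chunk_size;
      if (bound > n)
	bound = n;

      /* Inner loop: the iterations within one chunk.  */
      for (int i = start; i < bound; i++)
	total += v[i];
    }

  return total;
}
```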

Also note that OpenACC permits the user to apply any combination of
gang, worker and vector level parallelism to each loop, e.g., acc loop
gang vector. However, oacc_xform_loop does not strip-mine the acc loops
to take advantage of this on non-SIMT targets as it does for SIMT
targets. Therefore, the force_vectorize flag is only set when the acc
loop has been assigned only vector partitioning.

Is this patch OK for trunk?

Cesar

Comments

Jakub Jelinek Sept. 14, 2017, 7:15 a.m. UTC | #1
On Wed, Sep 13, 2017 at 04:20:32PM -0700, Cesar Philippidis wrote:
> 2017-09-13  Cesar Philippidis  <cesar@codesourcery.com>
> 
> 	gcc/
> 	* omp-offload.c (oacc_xform_loop): Enable SIMD vectorization on
> 	non-SIMT targets in acc vector loops.

Ok, thanks.

	Jakub
Patch

2017-09-13  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* omp-offload.c (oacc_xform_loop): Enable SIMD vectorization on
	non-SIMT targets in acc vector loops.


diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 2d4fd411680..9d5b8bef649 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -51,6 +51,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "cfgloop.h"
 
 /* Describe the OpenACC looping structure of a function.  The entire
    function is held in a 'NULL' loop.  */
@@ -370,6 +371,30 @@  oacc_xform_loop (gcall *call)
       break;
 
     case IFN_GOACC_LOOP_OFFSET:
+      /* Enable vectorization on non-SIMT targets.  */
+      if (!targetm.simt.vf
+	  && outer_mask == GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+	  /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
+	     the loop.  */
+	  && (flag_tree_loop_vectorize
+	      || !global_options_set.x_flag_tree_loop_vectorize))
+	{
+	  basic_block bb = gsi_bb (gsi);
+	  struct loop *parent = bb->loop_father;
+	  struct loop *body = parent->inner;
+
+	  parent->force_vectorize = true;
+	  parent->safelen = INT_MAX;
+
+	  /* "Chunking loops" may have inner loops.  */
+	  if (parent->inner)
+	    {
+	      body->force_vectorize = true;
+	      body->safelen = INT_MAX;
+	    }
+
+	  cfun->has_force_vectorize_loops = true;
+	}
       if (striding)
 	{
 	  r = oacc_thread_numbers (true, mask, &seq);