[gomp4] Optimize expand_omp_for_static_chunk for chunk_size one

Message ID 55DB2A03.90304@mentor.com
State New

Commit Message

Tom de Vries Aug. 24, 2015, 2:28 p.m. UTC
On 24-08-15 11:43, Jakub Jelinek wrote:
> On Mon, Jul 28, 2014 at 11:21:53AM +0200, Tom de Vries wrote:
>> Jakub,
>>
>> we're using expand_omp_for_static_chunk with a chunk_size of one to expand the
>> OpenACC loop construct.
>>
>> This results in an inner and outer loop being generated, with the inner loop
>> having a trip count of one, which means that the inner loop can be simplified to
>> just the inner loop body. However, subsequent optimizations do not manage to do
>> this simplification.
>>
>> This patch sets the loop exit condition to true if the chunk_size is one, to
>> ensure that the compiler will optimize away the inner loop.
>>
>> OK for gomp4 branch?
>>
>> Thanks,
>> - Tom
>
>> 2014-07-25  Tom de Vries  <tom@codesourcery.com>
>>
>> 	* omp-low.c (expand_omp_for_static_chunk): Remove inner loop if
>> 	chunk_size is one.
>
> If that is still the case on the trunk, the patch is ok for trunk after
> retesting it.  Please mention the PR tree-optimization/65468 in the
> ChangeLog entry and make sure there is some runtime testcase that tests
> that code path (both OpenMP and OpenACC one).
>

Committed attached patch to trunk.

I'll look into an OpenACC testcase for trunk.

Thanks,
- Tom
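
For readers following the thread, here is a minimal sketch of the loop shape
discussed above, assuming a plain round-robin assignment of chunks to threads.
It is a serial simulation in ordinary C, not the GIMPLE that
expand_omp_for_static_chunk emits; the names simulate_thread,
simulate_thread_chunk_one and simulate_all are hypothetical and exist only for
this illustration.

```c
/* Hypothetical illustration, not GCC code: serially simulates the
   iterations one thread would run under schedule(static, chunk) for
     for (i = 0; i < n; i++) sum += i;
   assuming chunks are handed out round-robin by thread id.  */
static int
simulate_thread (int tid, int nthreads, int n, int chunk)
{
  int sum = 0;

  /* Outer loop: step over the chunks that belong to this thread.  */
  for (int start = tid * chunk; start < n; start += nthreads * chunk)
    {
      int end = start + chunk < n ? start + chunk : n;

      /* Inner loop: iterate within a single chunk.  With chunk == 1
         the body runs exactly once per chunk, so the test that would
         continue this loop never succeeds.  */
      for (int i = start; i < end; i++)
        sum += i;
    }

  return sum;
}

/* The same work with the inner loop folded away.  Only valid when the
   chunk size is one.  */
static int
simulate_thread_chunk_one (int tid, int nthreads, int n)
{
  int sum = 0;

  for (int i = tid; i < n; i += nthreads)
    sum += i;

  return sum;
}

/* Combine the per-thread partial sums, as the reduction clause would.  */
static int
simulate_all (int nthreads, int n, int chunk)
{
  int total = 0;

  for (int tid = 0; tid < nthreads; tid++)
    total += simulate_thread (tid, nthreads, n, chunk);

  return total;
}
```

With three threads and ten iterations, as in the new testcases,
simulate_all (3, 10, 1) gives 45, and each simulate_thread (tid, 3, 10, 1)
matches simulate_thread_chunk_one (tid, 3, 10).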

Patch

Optimize expand_omp_for_static_chunk for chunk_size one

2015-08-24  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/65468
	* omp-low.c (expand_omp_for_static_chunk): Remove inner loop if
	chunk_size is one.

	* gcc.dg/gomp/static-chunk-size-one.c: New test.

	* testsuite/libgomp.c/static-chunk-size-one.c: New test.
---
 gcc/omp-low.c                                      | 11 ++++++++---
 gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c  | 18 +++++++++++++++++
 .../testsuite/libgomp.c/static-chunk-size-one.c    | 23 ++++++++++++++++++++++
 3 files changed, 49 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c
 create mode 100644 libgomp/testsuite/libgomp.c/static-chunk-size-one.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d181101..19f34ec 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -7204,9 +7204,14 @@  expand_omp_for_static_chunk (struct omp_region *region,
 	  assign_stmt = gimple_build_assign (vback, t);
 	  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
 
-	  t = build2 (fd->loop.cond_code, boolean_type_node,
-		      DECL_P (vback) && TREE_ADDRESSABLE (vback)
-		      ? t : vback, e);
+	  if (tree_int_cst_equal (fd->chunk_size, integer_one_node))
+	    t = build2 (EQ_EXPR, boolean_type_node,
+			build_int_cst (itype, 0),
+			build_int_cst (itype, 1));
+	  else
+	    t = build2 (fd->loop.cond_code, boolean_type_node,
+			DECL_P (vback) && TREE_ADDRESSABLE (vback)
+			? t : vback, e);
 	  gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
 	}
 
diff --git a/gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c b/gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c
new file mode 100644
index 0000000..e82de77
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/static-chunk-size-one.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O2 -fdump-tree-optimized -fno-tree-pre" } */
+
+int
+bar ()
+{
+  int a = 0, i;
+
+#pragma omp parallel for num_threads (3) reduction (+:a) schedule(static, 1)
+  for (i = 0; i < 10; i++)
+    a += i;
+
+  return a;
+}
+
+/* Two phis for reduction, one in loop header, one in loop exit.  One phi for iv
+   in loop header.  */
+/* { dg-final { scan-tree-dump-times "PHI" 3 "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/static-chunk-size-one.c b/libgomp/testsuite/libgomp.c/static-chunk-size-one.c
new file mode 100644
index 0000000..9ed7b83
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/static-chunk-size-one.c
@@ -0,0 +1,23 @@ 
+extern void abort ();
+
+int
+bar ()
+{
+  int a = 0, i;
+
+#pragma omp parallel for num_threads (3) reduction (+:a) schedule(static, 1)
+  for (i = 0; i < 10; i++)
+    a += i;
+
+  return a;
+}
+
+int
+main (void)
+{
+  int res;
+  res = bar ();
+  if (res != 45)
+    abort ();
+  return 0;
+}
-- 
1.9.1