diff mbox

[PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta

Message ID 56B86597.8070108@mentor.com
State New
Headers show

Commit Message

Tom de Vries Feb. 8, 2016, 9:53 a.m. UTC
Hi,

when compiling the fipa-pta tests in the libgomp testsuite 
(omp-nested-2.c, pr46032.c) with -flto -flto-partition=max, the tests 
fail in execution (PR69599).

The problem is related to the GOMP/GOACC_parallel optimization we do in 
fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a 
call foo._0 (data).

The problem is that this optimization is only legal in lto if both:
- foo containing the call GOMP_parallel (&foo._0, data) and
- foo._0
are contained in the same partition.

In the case of -flto-partition=max, foo is contained in it's own 
partition, and foo._0 is contained in another partition.  This means the 
data argument to the GOMP_parallel call appears unused, and the setting 
of the argument is optimized away, which causes the execution failure.

This patch fixes that by testing if foo and foo._0 are part of the same 
partition.

[ Note that the node_address_taken change in the patch has no effect, 
since nonlocal_p already tests for used_from_other_partition. But I 
thought it was clearer to state the conditions under which we are 
allowed to ignore node->address_taken explicitly. ]

Bootstrapped and reg-tested on x86_64.

Build for nvidia accelerator and reg-tested libgomp with various lto 
settings.

OK for trunk, stage4?

Thanks,
- Tom

Comments

Richard Biener Feb. 8, 2016, 10:54 a.m. UTC | #1
On Mon, 8 Feb 2016, Tom de Vries wrote:

> Hi,
> 
> when compiling the fipa-pta tests in the libgomp testsuite (omp-nested-2.c,
> pr46032.c) with -flto -flto-partition=max, the tests fail in execution
> (PR69599).
> 
> The problem is related to the GOMP/GOACC_parallel optimization we do in
> fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a call
> foo._0 (data).
> 
> The problem is that this optimization is only legal in lto if both:
> - foo containing the call GOMP_parallel (&foo._0, data) and
> - foo._0
> are contained in the same partition.
> 
> In the case of -flto-partition=max, foo is contained in it's own partition,
> and foo._0 is contained in another partition.  This means the data argument to
> the GOMP_parallel call appears unused, and the setting of the argument is
> optimized away, which causes the execution failure.
> 
> This patch fixes that by testing if foo and foo._0 are part of the same
> partition.
> 
> [ Note that the node_address_taken change in the patch has no effect, since
> nonlocal_p already tests for used_from_other_partition. But I thought it was
> clearer to state the conditions under which we are allowed to ignore
> node->address_taken explicitly. ]
> 
> Bootstrapped and reg-tested on x86_64.
> 
> Build for nvidia accelerator and reg-tested libgomp with various lto settings.
> 
> OK for trunk, stage4?

I don't like the in_lto_p checks, why's the check not working
for non-LTO?

Thanks,
Richard.

> 
> Thanks,
> - Tom
>
diff mbox

Patch

Fix GOMP/GOACC_parallel optimization in ipa-pta

2016-02-08  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/69599
	* tree-ssa-structalias.c (fndecl_maybe_in_other_partition): New
	function.
	(find_func_aliases_for_builtin_call, find_func_clobbers)
	(ipa_pta_execute):  Handle case that foo and foo._0 are not in same lto
	partition.

	* testsuite/libgomp.c/omp-nested-3.c: New test.
	* testsuite/libgomp.c/pr46032-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/parallel-2.c: New test.

---
 gcc/tree-ssa-structalias.c                         | 53 ++++++++++++++++++----
 libgomp/testsuite/libgomp.c/omp-nested-3.c         |  4 ++
 libgomp/testsuite/libgomp.c/pr46032-2.c            |  4 ++
 .../libgomp.oacc-c-c++-common/kernels-2.c          |  4 ++
 .../libgomp.oacc-c-c++-common/parallel-2.c         |  4 ++
 5 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index e7d0797..feebd8d 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4162,6 +4162,18 @@  find_func_aliases_for_call_arg (varinfo_t fi, unsigned index, tree arg)
     process_constraint (new_constraint (lhs, *rhsp));
 }
 
+/* Return true if FNDECL may be part of another lto partition.  */
+
+static bool
+fndecl_maybe_in_other_partition (tree fndecl)
+{
+  cgraph_node *fn_node = cgraph_node::get (fndecl);
+  if (fn_node == NULL)
+    return true;
+
+  return fn_node->in_other_partition;
+}
+
 /* Create constraints for the builtin call T.  Return true if the call
    was handled, otherwise false.  */
 
@@ -4537,6 +4549,11 @@  find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
 	      tree fnarg = gimple_call_arg (t, fnpos);
 	      gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
 	      tree fndecl = TREE_OPERAND (fnarg, 0);
+	      if (in_lto_p
+		  && fndecl_maybe_in_other_partition (fndecl))
+		/* Fallthru to general call handling.  */
+		break;
+
 	      tree arg = gimple_call_arg (t, argpos);
 
 	      varinfo_t fi = get_vi_for_tree (fndecl);
@@ -5113,6 +5130,11 @@  find_func_clobbers (struct function *fn, gimple *origt)
 	      tree fnarg = gimple_call_arg (t, fnpos);
 	      gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
 	      tree fndecl = TREE_OPERAND (fnarg, 0);
+	      if (in_lto_p
+		  && fndecl_maybe_in_other_partition (fndecl))
+		/* Fallthru to general call handling.  */
+		break;
+
 	      varinfo_t cfi = get_vi_for_tree (fndecl);
 
 	      tree arg = gimple_call_arg (t, argpos);
@@ -7505,9 +7527,14 @@  ipa_pta_execute (void)
 	 address_taken bit for function foo._0, which would make it non-local.
 	 But for the purpose of ipa-pta, we can regard the run_on_threads call
 	 as a local call foo._0 (data),  so we ignore address_taken on nodes
-	 with parallelized_function set.  */
-      bool node_address_taken = (node->address_taken
-				 && !node->parallelized_function);
+	 with parallelized_function set.
+	 Note: this is only safe, if foo and foo._0 are in the same lto
+	 partition.  */
+      bool node_address_taken = ((node->parallelized_function
+				  && !(in_lto_p
+				       && node->used_from_other_partition))
+				 ? false
+				 : node->address_taken);
 
       /* For externally visible or attribute used annotated functions use
 	 local constraints for their arguments.
@@ -7676,12 +7703,20 @@  ipa_pta_execute (void)
 		continue;
 
 	      /* Handle direct calls to functions with body.  */
-	      if (gimple_call_builtin_p (stmt, BUILT_IN_GOMP_PARALLEL))
-		decl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0);
-	      else if (gimple_call_builtin_p (stmt, BUILT_IN_GOACC_PARALLEL))
-		decl = TREE_OPERAND (gimple_call_arg (stmt, 1), 0);
-	      else
-		decl = gimple_call_fndecl (stmt);
+	      decl = gimple_call_fndecl (stmt);
+
+	      {
+		tree called_decl = NULL_TREE;
+		if (gimple_call_builtin_p (stmt, BUILT_IN_GOMP_PARALLEL))
+		  called_decl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0);
+		else if (gimple_call_builtin_p (stmt, BUILT_IN_GOACC_PARALLEL))
+		  called_decl = TREE_OPERAND (gimple_call_arg (stmt, 1), 0);
+
+		if (called_decl != NULL_TREE
+		    && !(in_lto_p
+			 && fndecl_maybe_in_other_partition (called_decl)))
+		  decl = called_decl;
+	      }
 
 	      if (decl
 		  && (fi = lookup_vi_for_tree (decl))
diff --git a/libgomp/testsuite/libgomp.c/omp-nested-3.c b/libgomp/testsuite/libgomp.c/omp-nested-3.c
new file mode 100644
index 0000000..7790c58
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/omp-nested-3.c
@@ -0,0 +1,4 @@ 
+// { dg-do run { target lto } }
+// { dg-additional-options "-fipa-pta -flto -flto-partition=max" }
+
+#include "omp-nested-1.c"
diff --git a/libgomp/testsuite/libgomp.c/pr46032-2.c b/libgomp/testsuite/libgomp.c/pr46032-2.c
new file mode 100644
index 0000000..1125f6e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr46032-2.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run { target lto } } */
+/* { dg-options "-O2 -ftree-vectorize -std=c99 -fipa-pta -flto -flto-partition=max" } */
+
+#include "pr46032.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
new file mode 100644
index 0000000..f76c926
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run { target lto } } */
+/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
+
+#include "kernels-1.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
new file mode 100644
index 0000000..d9fff6f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run { target lto } } */
+/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
+
+#include "parallel-1.c"