diff mbox

[PR69599] Fix GOMP/GOACC_parallel optimization in ipa-pta

Message ID 56B904B0.2080708@mentor.com
State New
Headers show

Commit Message

Tom de Vries Feb. 8, 2016, 9:12 p.m. UTC
On 08/02/16 11:54, Richard Biener wrote:
> On Mon, 8 Feb 2016, Tom de Vries wrote:
>
>> Hi,
>>
>> when compiling the fipa-pta tests in the libgomp testsuite (omp-nested-2.c,
>> pr46032.c) with -flto -flto-partition=max, the tests fail in execution
>> (PR69599).
>>
>> The problem is related to the GOMP/GOACC_parallel optimization we do in
>> fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a call
>> foo._0 (data).
>>
>> The problem is that this optimization is only legal in lto if both:
>> - foo containing the call GOMP_parallel (&foo._0, data) and
>> - foo._0
>> are contained in the same partition.
>>
>> In the case of -flto-partition=max, foo is contained in it's own partition,
>> and foo._0 is contained in another partition.  This means the data argument to
>> the GOMP_parallel call appears unused, and the setting of the argument is
>> optimized away, which causes the execution failure.
>>
>> This patch fixes that by testing if foo and foo._0 are part of the same
>> partition.
>>
>> [ Note that the node_address_taken change in the patch has no effect, since
>> nonlocal_p already tests for used_from_other_partition. But I thought it was
>> clearer to state the conditions under which we are allowed to ignore
>> node->address_taken explicitly. ]
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> Build for nvidia accelerator and reg-tested libgomp with various lto settings.
>>
>> OK for trunk, stage4?
>
> I don't like the in_lto_p checks, why's the check not working
> for non-LTO?
>

I was not sure if the partition flags were valid outside lto.

Updated patch removes the in_lto_p checks.

Bootstrapped on x86_64.

Build and reg-tested libgomp testsuite.

OK?

Thanks,
- Tom

Thanks,
- Tom

Comments

Richard Biener Feb. 9, 2016, 8:37 a.m. UTC | #1
On Mon, 8 Feb 2016, Tom de Vries wrote:

> On 08/02/16 11:54, Richard Biener wrote:
> > On Mon, 8 Feb 2016, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > when compiling the fipa-pta tests in the libgomp testsuite
> > > (omp-nested-2.c,
> > > pr46032.c) with -flto -flto-partition=max, the tests fail in execution
> > > (PR69599).
> > > 
> > > The problem is related to the GOMP/GOACC_parallel optimization we do in
> > > fipa-pta, where we interpret a call GOMP_parallel (&foo._0, data) as a
> > > call
> > > foo._0 (data).
> > > 
> > > The problem is that this optimization is only legal in lto if both:
> > > - foo containing the call GOMP_parallel (&foo._0, data) and
> > > - foo._0
> > > are contained in the same partition.
> > > 
> > > In the case of -flto-partition=max, foo is contained in it's own
> > > partition,
> > > and foo._0 is contained in another partition.  This means the data
> > > argument to
> > > the GOMP_parallel call appears unused, and the setting of the argument is
> > > optimized away, which causes the execution failure.
> > > 
> > > This patch fixes that by testing if foo and foo._0 are part of the same
> > > partition.
> > > 
> > > [ Note that the node_address_taken change in the patch has no effect,
> > > since
> > > nonlocal_p already tests for used_from_other_partition. But I thought it
> > > was
> > > clearer to state the conditions under which we are allowed to ignore
> > > node->address_taken explicitly. ]
> > > 
> > > Bootstrapped and reg-tested on x86_64.
> > > 
> > > Build for nvidia accelerator and reg-tested libgomp with various lto
> > > settings.
> > > 
> > > OK for trunk, stage4?
> > 
> > I don't like the in_lto_p checks, why's the check not working
> > for non-LTO?
> > 
> 
> I was not sure if the partition flags were valid outside lto.
> 
> Updated patch removes the in_lto_p checks.
> 
> Bootstrapped on x86_64.
> 
> Build and reg-tested libgomp testsuite.
> 
> OK?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
> 
> Thanks,
> - Tom
> 
>
diff mbox

Patch

Fix GOMP/GOACC_parallel optimization in ipa-pta

2016-02-08  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/69599
	* tree-ssa-structalias.c (fndecl_maybe_in_other_partition): New
	function.
	(find_func_aliases_for_builtin_call, find_func_clobbers)
	(ipa_pta_execute):  Handle case that foo and foo._0 are not in same lto
	partition.

	* testsuite/libgomp.c/omp-nested-3.c: New test.
	* testsuite/libgomp.c/pr46032-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/parallel-2.c: New test.

---
 gcc/tree-ssa-structalias.c                         | 49 ++++++++++++++++++----
 libgomp/testsuite/libgomp.c/omp-nested-3.c         |  4 ++
 libgomp/testsuite/libgomp.c/pr46032-2.c            |  4 ++
 .../libgomp.oacc-c-c++-common/kernels-2.c          |  4 ++
 .../libgomp.oacc-c-c++-common/parallel-2.c         |  4 ++
 5 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index e7d0797..d7a7dc5 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4162,6 +4162,18 @@  find_func_aliases_for_call_arg (varinfo_t fi, unsigned index, tree arg)
     process_constraint (new_constraint (lhs, *rhsp));
 }
 
+/* Return true if FNDECL may be part of another lto partition.  */
+
+static bool
+fndecl_maybe_in_other_partition (tree fndecl)
+{
+  cgraph_node *fn_node = cgraph_node::get (fndecl);
+  if (fn_node == NULL)
+    return true;
+
+  return fn_node->in_other_partition;
+}
+
 /* Create constraints for the builtin call T.  Return true if the call
    was handled, otherwise false.  */
 
@@ -4537,6 +4549,10 @@  find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
 	      tree fnarg = gimple_call_arg (t, fnpos);
 	      gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
 	      tree fndecl = TREE_OPERAND (fnarg, 0);
+	      if (fndecl_maybe_in_other_partition (fndecl))
+		/* Fallthru to general call handling.  */
+		break;
+
 	      tree arg = gimple_call_arg (t, argpos);
 
 	      varinfo_t fi = get_vi_for_tree (fndecl);
@@ -5113,6 +5129,10 @@  find_func_clobbers (struct function *fn, gimple *origt)
 	      tree fnarg = gimple_call_arg (t, fnpos);
 	      gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
 	      tree fndecl = TREE_OPERAND (fnarg, 0);
+	      if (fndecl_maybe_in_other_partition (fndecl))
+		/* Fallthru to general call handling.  */
+		break;
+
 	      varinfo_t cfi = get_vi_for_tree (fndecl);
 
 	      tree arg = gimple_call_arg (t, argpos);
@@ -7505,9 +7525,13 @@  ipa_pta_execute (void)
 	 address_taken bit for function foo._0, which would make it non-local.
 	 But for the purpose of ipa-pta, we can regard the run_on_threads call
 	 as a local call foo._0 (data),  so we ignore address_taken on nodes
-	 with parallelized_function set.  */
-      bool node_address_taken = (node->address_taken
-				 && !node->parallelized_function);
+	 with parallelized_function set.
+	 Note: this is only safe, if foo and foo._0 are in the same lto
+	 partition.  */
+      bool node_address_taken = ((node->parallelized_function
+				  && !node->used_from_other_partition)
+				 ? false
+				 : node->address_taken);
 
       /* For externally visible or attribute used annotated functions use
 	 local constraints for their arguments.
@@ -7676,12 +7700,19 @@  ipa_pta_execute (void)
 		continue;
 
 	      /* Handle direct calls to functions with body.  */
-	      if (gimple_call_builtin_p (stmt, BUILT_IN_GOMP_PARALLEL))
-		decl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0);
-	      else if (gimple_call_builtin_p (stmt, BUILT_IN_GOACC_PARALLEL))
-		decl = TREE_OPERAND (gimple_call_arg (stmt, 1), 0);
-	      else
-		decl = gimple_call_fndecl (stmt);
+	      decl = gimple_call_fndecl (stmt);
+
+	      {
+		tree called_decl = NULL_TREE;
+		if (gimple_call_builtin_p (stmt, BUILT_IN_GOMP_PARALLEL))
+		  called_decl = TREE_OPERAND (gimple_call_arg (stmt, 0), 0);
+		else if (gimple_call_builtin_p (stmt, BUILT_IN_GOACC_PARALLEL))
+		  called_decl = TREE_OPERAND (gimple_call_arg (stmt, 1), 0);
+
+		if (called_decl != NULL_TREE
+		    && !fndecl_maybe_in_other_partition (called_decl))
+		  decl = called_decl;
+	      }
 
 	      if (decl
 		  && (fi = lookup_vi_for_tree (decl))
diff --git a/libgomp/testsuite/libgomp.c/omp-nested-3.c b/libgomp/testsuite/libgomp.c/omp-nested-3.c
new file mode 100644
index 0000000..7790c58
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/omp-nested-3.c
@@ -0,0 +1,4 @@ 
+// { dg-do run { target lto } }
+// { dg-additional-options "-fipa-pta -flto -flto-partition=max" }
+
+#include "omp-nested-1.c"
diff --git a/libgomp/testsuite/libgomp.c/pr46032-2.c b/libgomp/testsuite/libgomp.c/pr46032-2.c
new file mode 100644
index 0000000..1125f6e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr46032-2.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run { target lto } } */
+/* { dg-options "-O2 -ftree-vectorize -std=c99 -fipa-pta -flto -flto-partition=max" } */
+
+#include "pr46032.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
new file mode 100644
index 0000000..f76c926
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run { target lto } } */
+/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
+
+#include "kernels-1.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
new file mode 100644
index 0000000..d9fff6f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
@@ -0,0 +1,4 @@ 
+/* { dg-do run { target lto } } */
+/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
+
+#include "parallel-1.c"