diff mbox

Avoid NULL cfun ICE in gcc/config/nvptx/nvptx.c:nvptx_libcall_value (was: [PATCH] Fix PR70760)

Message ID 87wpnixc4v.fsf@kepler.schwinge.homeip.net
State New
Headers show

Commit Message

Thomas Schwinge April 28, 2016, 10:26 a.m. UTC
Hi!

Richard's r235511 changes (quoted below) cause certain nvptx offloading
test cases to run into SIGSEGVs:

    [...]
    #4  0x0000000000d14193 in nvptx_libcall_value (mode=mode@entry=SImode)
        at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489
    #5  0x0000000000d17a20 in nvptx_function_value (type=0x7fc1fa359690, func=0x0, outgoing=<optimized out>)
        at [...]/source-gcc/gcc/config/nvptx/nvptx.c:512
    #6  0x00000000006ba220 in hard_function_value (valtype=valtype@entry=0x7fc1fa359690, func=func@entry=0x0, fntype=fntype@entry=0x0, 
        outgoing=outgoing@entry=0) at [...]/source-gcc/gcc/explow.c:1860
    #7  0x000000000073b0fa in aggregate_value_p (exp=exp@entry=0x7fc1fa41a048, fntype=0x0)
        at [...]/source-gcc/gcc/function.c:2086
    #8  0x0000000000bebc11 in find_func_aliases_for_call (t=0x1feac90, fn=0x7ffe448ca8a0)
        at [...]/source-gcc/gcc/tree-ssa-structalias.c:4644
    #9  find_func_aliases (fn=fn@entry=0x7fc1fa43a540, origt=origt@entry=0x7fc1fa43a7e0)
        at [...]/source-gcc/gcc/tree-ssa-structalias.c:4737
    #10 0x0000000000bf04eb in ipa_pta_execute ()
        at [...]/source-gcc/gcc/tree-ssa-structalias.c:7787
    #11 (anonymous namespace)::pass_ipa_pta::execute (this=<optimized out>)
        at [...]/source-gcc/gcc/tree-ssa-structalias.c:8035
    #12 0x0000000000940bed in execute_one_pass (pass=pass@entry=0x1f43770)
        at [...]/source-gcc/gcc/passes.c:2348
    #13 0x0000000000941972 in execute_ipa_pass_list (pass=0x1f43770)
        at [...]/source-gcc/gcc/passes.c:2778
    #14 0x0000000000607f1f in symbol_table::compile (this=0x7fc1fa359000)
        at [...]/source-gcc/gcc/cgraphunit.c:2435
    #15 0x000000000056ad48 in lto_main () at [...]/source-gcc/gcc/lto/lto.c:3328
    #16 0x0000000000a065df in compile_file () at [...]/source-gcc/gcc/toplev.c:474
    #17 0x000000000053753a in do_compile () at [...]/source-gcc/gcc/toplev.c:1998
    #18 toplev::main (this=this@entry=0x7ffe448caba0, argc=argc@entry=18, argv=0x1f1eec0, argv@entry=0x7ffe448caca8)
        at [...]/source-gcc/gcc/toplev.c:2106
    #19 0x00000000005391d7 in main (argc=18, argv=0x7ffe448caca8)
        at [...]/source-gcc/gcc/main.c:39

The immediate problem is that
gcc/config/nvptx/nvptx.c:nvptx_libcall_value is called in a context where
cfun is NULL, and it fails to handle that appropriately:

    (gdb) frame 4
    #4  0x0000000000d14193 in nvptx_libcall_value (mode=mode@entry=SImode)
        at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489
    489       if (!cfun->machine->doing_call)
    (gdb) print cfun
    $1 = (function *) 0x0

Looking at the backtrace, I see that in frame 7,
gcc/function.c:aggregate_value_p is called with a NULL fntype.  This
function is evidently prepared to handle that case, likewise for
gcc/explow.c:hard_function_value.  Does it thus follow that
gcc/config/nvptx/nvptx.c:nvptx_function_value and/or
gcc/config/nvptx/nvptx.c:nvptx_libcall_value need to be changed?  Is
something like the following sufficient (works in offloading testing, but
feels a bit like just "treating the symptoms"); for instance, should this
case rather be handled in gcc/config/nvptx/nvptx.c:nvptx_function_value
already?



For reference:

On Wed, 27 Apr 2016 13:07:36 +0200 (CEST), Richard Biener <rguenther@suse.de> wrote:
> The following patch fixes an issue in IPA PTA regarding to handling
> of DECL_BY_REFERENCE function results at the caller side.  The issue
> for the testcase in the PR is that we use the wrong function decl
> to look for DECL_RESULT for calls that are an alias (which get
> DECL_RESULT released).
> 
> But the issue is deeper in that the code also does not handle
> indirect calls correctly - to expose a testcase for this the
> patch also enables optimistic handling of functions escaping
> via their addresses, this is already handled fine after I added
> code to parse global initializers correctly.
> 
> LTO bootstrapped and tested on x86_64-unknown-linux-gnu with IPA PTA 
> enabled, inspected PTA result on the PRs testcase (I failed to create a 
> small reproducer).
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> 
> This is the trunk version of the fix, for the branch where the
> issue was reported against I will refrain from handling address-taken
> functions differently.
> 
> I hope I deciphered enough of the calls handling to assess that
> aggregate_value_p always matches DECL_BY_REFERENCE on DECL_RESULT.
> IPA PTA needs to know the GIMPLE representation of the callees
> DECL_RESULT (whether it's a pointer - at the caller side we
> still see the non-reference LHS).  And that needs to work for
> indirect calls as well.
> 
> Richard.
> 
> 2016-04-27  Richard Biener  <rguenther@suse.de>
> 
> 	PR ipa/70760
> 	* tree-ssa-structalias.c (find_func_aliases_for_call): Use
> 	aggregate_value_p to determine if a function result is
> 	returned by reference.
> 	(ipa_pta_execute): Functions having their address taken are
> 	not automatically nonlocal.
> 
> 	* g++.dg/ipa/ipa-pta-2.C: New testcase.
> 	* gcc.dg/ipa/ipa-pta-1.c: Adjust.
> 
> Index: gcc/tree-ssa-structalias.c
> ===================================================================
> *** gcc/tree-ssa-structalias.c	(revision 235478)
> --- gcc/tree-ssa-structalias.c	(working copy)
> *************** find_func_aliases_for_call (struct funct
> *** 4641,4652 ****
>   	  auto_vec<ce_s, 2> lhsc;
>   	  struct constraint_expr rhs;
>   	  struct constraint_expr *lhsp;
>   
>   	  get_constraint_for (lhsop, &lhsc);
>   	  rhs = get_function_part_constraint (fi, fi_result);
> ! 	  if (fndecl
> ! 	      && DECL_RESULT (fndecl)
> ! 	      && DECL_BY_REFERENCE (DECL_RESULT (fndecl)))
>   	    {
>   	      auto_vec<ce_s, 2> tem;
>   	      tem.quick_push (rhs);
> --- 4737,4747 ----
>   	  auto_vec<ce_s, 2> lhsc;
>   	  struct constraint_expr rhs;
>   	  struct constraint_expr *lhsp;
> + 	  bool aggr_p = aggregate_value_p (lhsop, gimple_call_fntype (t));
>   
>   	  get_constraint_for (lhsop, &lhsc);
>   	  rhs = get_function_part_constraint (fi, fi_result);
> ! 	  if (aggr_p)
>   	    {
>   	      auto_vec<ce_s, 2> tem;
>   	      tem.quick_push (rhs);
> *************** find_func_aliases_for_call (struct funct
> *** 4656,4677 ****
>   	    }
>   	  FOR_EACH_VEC_ELT (lhsc, j, lhsp)
>   	    process_constraint (new_constraint (*lhsp, rhs));
> - 	}
>   
> !       /* If we pass the result decl by reference, honor that.  */
> !       if (lhsop
> ! 	  && fndecl
> ! 	  && DECL_RESULT (fndecl)
> ! 	  && DECL_BY_REFERENCE (DECL_RESULT (fndecl)))
> ! 	{
> ! 	  struct constraint_expr lhs;
> ! 	  struct constraint_expr *rhsp;
>   
> ! 	  get_constraint_for_address_of (lhsop, &rhsc);
> ! 	  lhs = get_function_part_constraint (fi, fi_result);
> ! 	  FOR_EACH_VEC_ELT (rhsc, j, rhsp)
> ! 	    process_constraint (new_constraint (lhs, *rhsp));
> ! 	  rhsc.truncate (0);
>   	}
>   
>         /* If we use a static chain, pass it along.  */
> --- 4751,4769 ----
>   	    }
>   	  FOR_EACH_VEC_ELT (lhsc, j, lhsp)
>   	    process_constraint (new_constraint (*lhsp, rhs));
>   
> ! 	  /* If we pass the result decl by reference, honor that.  */
> ! 	  if (aggr_p)
> ! 	    {
> ! 	      struct constraint_expr lhs;
> ! 	      struct constraint_expr *rhsp;
>   
> ! 	      get_constraint_for_address_of (lhsop, &rhsc);
> ! 	      lhs = get_function_part_constraint (fi, fi_result);
> ! 	      FOR_EACH_VEC_ELT (rhsc, j, rhsp)
> ! 		  process_constraint (new_constraint (lhs, *rhsp));
> ! 	      rhsc.truncate (0);
> ! 	    }
>   	}
>   
>         /* If we use a static chain, pass it along.  */
> *************** ipa_pta_execute (void)
> *** 7686,7715 ****
>   
>         gcc_assert (!node->clone_of);
>   
> -       /* When parallelizing a code region, we split the region off into a
> - 	 separate function, to be run by several threads in parallel.  So for a
> - 	 function foo, we split off a region into a function
> - 	 foo._0 (void *foodata), and replace the region with some variant of a
> - 	 function call run_on_threads (&foo._0, data).  The '&foo._0' sets the
> - 	 address_taken bit for function foo._0, which would make it non-local.
> - 	 But for the purpose of ipa-pta, we can regard the run_on_threads call
> - 	 as a local call foo._0 (data),  so we ignore address_taken on nodes
> - 	 with parallelized_function set.
> - 	 Note: this is only safe, if foo and foo._0 are in the same lto
> - 	 partition.  */
> -       bool node_address_taken = ((node->parallelized_function
> - 				  && !node->used_from_other_partition)
> - 				 ? false
> - 				 : node->address_taken);
> - 
>         /* For externally visible or attribute used annotated functions use
>   	 local constraints for their arguments.
>   	 For local functions we see all callers and thus do not need initial
>   	 constraints for parameters.  */
>         bool nonlocal_p = (node->used_from_other_partition
>   			 || node->externally_visible
> ! 			 || node->force_output
> ! 			 || node_address_taken);
>         node->call_for_symbol_thunks_and_aliases (refered_from_nonlocal_fn,
>   						&nonlocal_p, true);
>   
> --- 7778,7790 ----
>   
>         gcc_assert (!node->clone_of);
>   
>         /* For externally visible or attribute used annotated functions use
>   	 local constraints for their arguments.
>   	 For local functions we see all callers and thus do not need initial
>   	 constraints for parameters.  */
>         bool nonlocal_p = (node->used_from_other_partition
>   			 || node->externally_visible
> ! 			 || node->force_output);
>         node->call_for_symbol_thunks_and_aliases (refered_from_nonlocal_fn,
>   						&nonlocal_p, true);
>   
> Index: gcc/testsuite/g++.dg/ipa/ipa-pta-2.C
> ===================================================================
> *** gcc/testsuite/g++.dg/ipa/ipa-pta-2.C	(revision 0)
> --- gcc/testsuite/g++.dg/ipa/ipa-pta-2.C	(working copy)
> ***************
> *** 0 ****
> --- 1,37 ----
> + // { dg-do run }
> + // { dg-options "-O2 -fipa-pta" }
> + 
> + extern "C" void abort (void);
> + 
> + struct Y { ~Y(); int i; };
> + 
> + Y::~Y () {}
> + 
> + static Y __attribute__((noinline)) foo ()
> + {
> +   Y res;
> +   res.i = 3;
> +   return res;
> + }
> + 
> + static Y __attribute__((noinline)) bar ()
> + {
> +   Y res;
> +   res.i = 42;
> +   return res;
> + }
> + 
> + static Y (*fn) ();
> + 
> + int a;
> + int main()
> + {
> +   if (a)
> +     fn = foo;
> +   else
> +     fn = bar;
> +   Y res = fn ();
> +   if (res.i != 42)
> +     abort ();
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c	(revision 235478)
> --- gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c	(working copy)
> *************** int main()
> *** 40,52 ****
>   }
>   
>   /* IPA PTA needs to handle indirect calls properly.  Verify that
> !    both bar and foo get a (and only a) in their arguments points-to sets.
> !    ???  As bar and foo have their address taken there might be callers
> !    not seen by IPA PTA (if the address escapes the unit which we only compute
> !    during IPA PTA...).  Thus the solution also includes NONLOCAL.  */
>   
>   /* { dg-final { scan-ipa-dump "fn_1 = { bar foo }" "pta2" } } */
> ! /* { dg-final { scan-ipa-dump "bar.arg0 = { NONLOCAL a }" "pta2" } } */
> ! /* { dg-final { scan-ipa-dump "bar.arg1 = { NONLOCAL a }" "pta2" } } */
> ! /* { dg-final { scan-ipa-dump "foo.arg0 = { NONLOCAL a }" "pta2" } } */
> ! /* { dg-final { scan-ipa-dump "foo.arg1 = { NONLOCAL a }" "pta2" } } */
> --- 40,49 ----
>   }
>   
>   /* IPA PTA needs to handle indirect calls properly.  Verify that
> !    both bar and foo get a (and only a) in their arguments points-to sets.  */
>   
>   /* { dg-final { scan-ipa-dump "fn_1 = { bar foo }" "pta2" } } */
> ! /* { dg-final { scan-ipa-dump "bar.arg0 = { a }" "pta2" } } */
> ! /* { dg-final { scan-ipa-dump "bar.arg1 = { a }" "pta2" } } */
> ! /* { dg-final { scan-ipa-dump "foo.arg0 = { a }" "pta2" } } */
> ! /* { dg-final { scan-ipa-dump "foo.arg1 = { a }" "pta2" } } */


Grüße
 Thomas

Comments

Richard Biener April 28, 2016, 10:43 a.m. UTC | #1
On Thu, 28 Apr 2016, Thomas Schwinge wrote:

> Hi!
> 
> Richard's r235511 changes (quoted below) cause certain nvptx offloading
> test cases to run into SIGSEGVs:
> 
>     [...]
>     #4  0x0000000000d14193 in nvptx_libcall_value (mode=mode@entry=SImode)
>         at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489
>     #5  0x0000000000d17a20 in nvptx_function_value (type=0x7fc1fa359690, func=0x0, outgoing=<optimized out>)
>         at [...]/source-gcc/gcc/config/nvptx/nvptx.c:512
>     #6  0x00000000006ba220 in hard_function_value (valtype=valtype@entry=0x7fc1fa359690, func=func@entry=0x0, fntype=fntype@entry=0x0, 
>         outgoing=outgoing@entry=0) at [...]/source-gcc/gcc/explow.c:1860
>     #7  0x000000000073b0fa in aggregate_value_p (exp=exp@entry=0x7fc1fa41a048, fntype=0x0)
>         at [...]/source-gcc/gcc/function.c:2086
>     #8  0x0000000000bebc11 in find_func_aliases_for_call (t=0x1feac90, fn=0x7ffe448ca8a0)
>         at [...]/source-gcc/gcc/tree-ssa-structalias.c:4644
>     #9  find_func_aliases (fn=fn@entry=0x7fc1fa43a540, origt=origt@entry=0x7fc1fa43a7e0)
>         at [...]/source-gcc/gcc/tree-ssa-structalias.c:4737
>     #10 0x0000000000bf04eb in ipa_pta_execute ()
>         at [...]/source-gcc/gcc/tree-ssa-structalias.c:7787
>     #11 (anonymous namespace)::pass_ipa_pta::execute (this=<optimized out>)
>         at [...]/source-gcc/gcc/tree-ssa-structalias.c:8035
>     #12 0x0000000000940bed in execute_one_pass (pass=pass@entry=0x1f43770)
>         at [...]/source-gcc/gcc/passes.c:2348
>     #13 0x0000000000941972 in execute_ipa_pass_list (pass=0x1f43770)
>         at [...]/source-gcc/gcc/passes.c:2778
>     #14 0x0000000000607f1f in symbol_table::compile (this=0x7fc1fa359000)
>         at [...]/source-gcc/gcc/cgraphunit.c:2435
>     #15 0x000000000056ad48 in lto_main () at [...]/source-gcc/gcc/lto/lto.c:3328
>     #16 0x0000000000a065df in compile_file () at [...]/source-gcc/gcc/toplev.c:474
>     #17 0x000000000053753a in do_compile () at [...]/source-gcc/gcc/toplev.c:1998
>     #18 toplev::main (this=this@entry=0x7ffe448caba0, argc=argc@entry=18, argv=0x1f1eec0, argv@entry=0x7ffe448caca8)
>         at [...]/source-gcc/gcc/toplev.c:2106
>     #19 0x00000000005391d7 in main (argc=18, argv=0x7ffe448caca8)
>         at [...]/source-gcc/gcc/main.c:39
> 
> The immediate problem is that
> gcc/config/nvptx/nvptx.c:nvptx_libcall_value is called in a context where
> cfun is NULL, and it fails to handle that appropriately:
> 
>     (gdb) frame 4
>     #4  0x0000000000d14193 in nvptx_libcall_value (mode=mode@entry=SImode)
>         at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489
>     489       if (!cfun->machine->doing_call)
>     (gdb) print cfun
>     $1 = (function *) 0x0
> 
> Looking at the backtrace, I see that in frame 7,
> gcc/function.c:aggregate_value_p is called with a NULL fntype.  This
> function is evidently prepared to handle that case, likewise for
> gcc/explow.c:hard_function_value.  Does it thus follow that
> gcc/config/nvptx/nvptx.c:nvptx_function_value and/or
> gcc/config/nvptx/nvptx.c:nvptx_libcall_value need to be changed?  Is
> something like the following sufficient (works in offloading testing, but
> feels a bit like just "treating the symptoms"); for instance, should this
> case rather be handled in gcc/config/nvptx/nvptx.c:nvptx_function_value
> already?
> 
> --- gcc/config/nvptx/nvptx.c
> +++ gcc/config/nvptx/nvptx.c
> @@ -484,7 +484,7 @@ nvptx_strict_argument_naming (cumulative_args_t cum_v)
>  static rtx
>  nvptx_libcall_value (machine_mode mode, const_rtx)
>  {
> -  if (!cfun->machine->doing_call)
> +  if (!cfun || !cfun->machine->doing_call)
>      /* Pretend to return in a hard reg for early uses before pseudos can be
>         generated.  */
>      return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);

Doing anything based on 'cfun' here is fishy at least for the
call context of aggregate_value_p as that is also used when
looking at the caller side of a call for example when expanding calls
where cfun is then the callers cfun and not the callees.

So I suggest to remove cfun->machine->doing_call and revisit the
reason why it was added for PTX.

Richard.

> 
> For reference:
> 
> On Wed, 27 Apr 2016 13:07:36 +0200 (CEST), Richard Biener <rguenther@suse.de> wrote:
> > The following patch fixes an issue in IPA PTA regarding to handling
> > of DECL_BY_REFERENCE function results at the caller side.  The issue
> > for the testcase in the PR is that we use the wrong function decl
> > to look for DECL_RESULT for calls that are an alias (which get
> > DECL_RESULT released).
> > 
> > But the issue is deeper in that the code also does not handle
> > indirect calls correctly - to expose a testcase for this the
> > patch also enables optimistic handling of functions escaping
> > via their addresses, this is already handled fine after I added
> > code to parse global initializers correctly.
> > 
> > LTO bootstrapped and tested on x86_64-unknown-linux-gnu with IPA PTA 
> > enabled, inspected PTA result on the PRs testcase (I failed to create a 
> > small reproducer).
> > 
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > 
> > This is the trunk version of the fix, for the branch where the
> > issue was reported against I will refrain from handling address-taken
> > functions differently.
> > 
> > I hope I deciphered enough of the calls handling to assess that
> > aggregate_value_p always matches DECL_BY_REFERENCE on DECL_RESULT.
> > IPA PTA needs to know the GIMPLE representation of the callees
> > DECL_RESULT (whether it's a pointer - at the caller side we
> > still see the non-reference LHS).  And that needs to work for
> > indirect calls as well.
> > 
> > Richard.
> > 
> > 2016-04-27  Richard Biener  <rguenther@suse.de>
> > 
> > 	PR ipa/70760
> > 	* tree-ssa-structalias.c (find_func_aliases_for_call): Use
> > 	aggregate_value_p to determine if a function result is
> > 	returned by reference.
> > 	(ipa_pta_execute): Functions having their address taken are
> > 	not automatically nonlocal.
> > 
> > 	* g++.dg/ipa/ipa-pta-2.C: New testcase.
> > 	* gcc.dg/ipa/ipa-pta-1.c: Adjust.
> > 
> > Index: gcc/tree-ssa-structalias.c
> > ===================================================================
> > *** gcc/tree-ssa-structalias.c	(revision 235478)
> > --- gcc/tree-ssa-structalias.c	(working copy)
> > *************** find_func_aliases_for_call (struct funct
> > *** 4641,4652 ****
> >   	  auto_vec<ce_s, 2> lhsc;
> >   	  struct constraint_expr rhs;
> >   	  struct constraint_expr *lhsp;
> >   
> >   	  get_constraint_for (lhsop, &lhsc);
> >   	  rhs = get_function_part_constraint (fi, fi_result);
> > ! 	  if (fndecl
> > ! 	      && DECL_RESULT (fndecl)
> > ! 	      && DECL_BY_REFERENCE (DECL_RESULT (fndecl)))
> >   	    {
> >   	      auto_vec<ce_s, 2> tem;
> >   	      tem.quick_push (rhs);
> > --- 4737,4747 ----
> >   	  auto_vec<ce_s, 2> lhsc;
> >   	  struct constraint_expr rhs;
> >   	  struct constraint_expr *lhsp;
> > + 	  bool aggr_p = aggregate_value_p (lhsop, gimple_call_fntype (t));
> >   
> >   	  get_constraint_for (lhsop, &lhsc);
> >   	  rhs = get_function_part_constraint (fi, fi_result);
> > ! 	  if (aggr_p)
> >   	    {
> >   	      auto_vec<ce_s, 2> tem;
> >   	      tem.quick_push (rhs);
> > *************** find_func_aliases_for_call (struct funct
> > *** 4656,4677 ****
> >   	    }
> >   	  FOR_EACH_VEC_ELT (lhsc, j, lhsp)
> >   	    process_constraint (new_constraint (*lhsp, rhs));
> > - 	}
> >   
> > !       /* If we pass the result decl by reference, honor that.  */
> > !       if (lhsop
> > ! 	  && fndecl
> > ! 	  && DECL_RESULT (fndecl)
> > ! 	  && DECL_BY_REFERENCE (DECL_RESULT (fndecl)))
> > ! 	{
> > ! 	  struct constraint_expr lhs;
> > ! 	  struct constraint_expr *rhsp;
> >   
> > ! 	  get_constraint_for_address_of (lhsop, &rhsc);
> > ! 	  lhs = get_function_part_constraint (fi, fi_result);
> > ! 	  FOR_EACH_VEC_ELT (rhsc, j, rhsp)
> > ! 	    process_constraint (new_constraint (lhs, *rhsp));
> > ! 	  rhsc.truncate (0);
> >   	}
> >   
> >         /* If we use a static chain, pass it along.  */
> > --- 4751,4769 ----
> >   	    }
> >   	  FOR_EACH_VEC_ELT (lhsc, j, lhsp)
> >   	    process_constraint (new_constraint (*lhsp, rhs));
> >   
> > ! 	  /* If we pass the result decl by reference, honor that.  */
> > ! 	  if (aggr_p)
> > ! 	    {
> > ! 	      struct constraint_expr lhs;
> > ! 	      struct constraint_expr *rhsp;
> >   
> > ! 	      get_constraint_for_address_of (lhsop, &rhsc);
> > ! 	      lhs = get_function_part_constraint (fi, fi_result);
> > ! 	      FOR_EACH_VEC_ELT (rhsc, j, rhsp)
> > ! 		  process_constraint (new_constraint (lhs, *rhsp));
> > ! 	      rhsc.truncate (0);
> > ! 	    }
> >   	}
> >   
> >         /* If we use a static chain, pass it along.  */
> > *************** ipa_pta_execute (void)
> > *** 7686,7715 ****
> >   
> >         gcc_assert (!node->clone_of);
> >   
> > -       /* When parallelizing a code region, we split the region off into a
> > - 	 separate function, to be run by several threads in parallel.  So for a
> > - 	 function foo, we split off a region into a function
> > - 	 foo._0 (void *foodata), and replace the region with some variant of a
> > - 	 function call run_on_threads (&foo._0, data).  The '&foo._0' sets the
> > - 	 address_taken bit for function foo._0, which would make it non-local.
> > - 	 But for the purpose of ipa-pta, we can regard the run_on_threads call
> > - 	 as a local call foo._0 (data),  so we ignore address_taken on nodes
> > - 	 with parallelized_function set.
> > - 	 Note: this is only safe, if foo and foo._0 are in the same lto
> > - 	 partition.  */
> > -       bool node_address_taken = ((node->parallelized_function
> > - 				  && !node->used_from_other_partition)
> > - 				 ? false
> > - 				 : node->address_taken);
> > - 
> >         /* For externally visible or attribute used annotated functions use
> >   	 local constraints for their arguments.
> >   	 For local functions we see all callers and thus do not need initial
> >   	 constraints for parameters.  */
> >         bool nonlocal_p = (node->used_from_other_partition
> >   			 || node->externally_visible
> > ! 			 || node->force_output
> > ! 			 || node_address_taken);
> >         node->call_for_symbol_thunks_and_aliases (refered_from_nonlocal_fn,
> >   						&nonlocal_p, true);
> >   
> > --- 7778,7790 ----
> >   
> >         gcc_assert (!node->clone_of);
> >   
> >         /* For externally visible or attribute used annotated functions use
> >   	 local constraints for their arguments.
> >   	 For local functions we see all callers and thus do not need initial
> >   	 constraints for parameters.  */
> >         bool nonlocal_p = (node->used_from_other_partition
> >   			 || node->externally_visible
> > ! 			 || node->force_output);
> >         node->call_for_symbol_thunks_and_aliases (refered_from_nonlocal_fn,
> >   						&nonlocal_p, true);
> >   
> > Index: gcc/testsuite/g++.dg/ipa/ipa-pta-2.C
> > ===================================================================
> > *** gcc/testsuite/g++.dg/ipa/ipa-pta-2.C	(revision 0)
> > --- gcc/testsuite/g++.dg/ipa/ipa-pta-2.C	(working copy)
> > ***************
> > *** 0 ****
> > --- 1,37 ----
> > + // { dg-do run }
> > + // { dg-options "-O2 -fipa-pta" }
> > + 
> > + extern "C" void abort (void);
> > + 
> > + struct Y { ~Y(); int i; };
> > + 
> > + Y::~Y () {}
> > + 
> > + static Y __attribute__((noinline)) foo ()
> > + {
> > +   Y res;
> > +   res.i = 3;
> > +   return res;
> > + }
> > + 
> > + static Y __attribute__((noinline)) bar ()
> > + {
> > +   Y res;
> > +   res.i = 42;
> > +   return res;
> > + }
> > + 
> > + static Y (*fn) ();
> > + 
> > + int a;
> > + int main()
> > + {
> > +   if (a)
> > +     fn = foo;
> > +   else
> > +     fn = bar;
> > +   Y res = fn ();
> > +   if (res.i != 42)
> > +     abort ();
> > +   return 0;
> > + }
> > Index: gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c
> > ===================================================================
> > *** gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c	(revision 235478)
> > --- gcc/testsuite/gcc.dg/ipa/ipa-pta-1.c	(working copy)
> > *************** int main()
> > *** 40,52 ****
> >   }
> >   
> >   /* IPA PTA needs to handle indirect calls properly.  Verify that
> > !    both bar and foo get a (and only a) in their arguments points-to sets.
> > !    ???  As bar and foo have their address taken there might be callers
> > !    not seen by IPA PTA (if the address escapes the unit which we only compute
> > !    during IPA PTA...).  Thus the solution also includes NONLOCAL.  */
> >   
> >   /* { dg-final { scan-ipa-dump "fn_1 = { bar foo }" "pta2" } } */
> > ! /* { dg-final { scan-ipa-dump "bar.arg0 = { NONLOCAL a }" "pta2" } } */
> > ! /* { dg-final { scan-ipa-dump "bar.arg1 = { NONLOCAL a }" "pta2" } } */
> > ! /* { dg-final { scan-ipa-dump "foo.arg0 = { NONLOCAL a }" "pta2" } } */
> > ! /* { dg-final { scan-ipa-dump "foo.arg1 = { NONLOCAL a }" "pta2" } } */
> > --- 40,49 ----
> >   }
> >   
> >   /* IPA PTA needs to handle indirect calls properly.  Verify that
> > !    both bar and foo get a (and only a) in their arguments points-to sets.  */
> >   
> >   /* { dg-final { scan-ipa-dump "fn_1 = { bar foo }" "pta2" } } */
> > ! /* { dg-final { scan-ipa-dump "bar.arg0 = { a }" "pta2" } } */
> > ! /* { dg-final { scan-ipa-dump "bar.arg1 = { a }" "pta2" } } */
> > ! /* { dg-final { scan-ipa-dump "foo.arg0 = { a }" "pta2" } } */
> > ! /* { dg-final { scan-ipa-dump "foo.arg1 = { a }" "pta2" } } */
> 
> 
> Grüße
>  Thomas
>
Alexander Monakov April 28, 2016, 11:15 a.m. UTC | #2
On Thu, 28 Apr 2016, Richard Biener wrote:
> Doing anything based on 'cfun' here is fishy at least for the
> call context of aggregate_value_p as that is also used when
> looking at the caller side of a call for example when expanding calls
> where cfun is then the callers cfun and not the callees.
> 
> So I suggest to remove cfun->machine->doing_call and revisit the
> reason why it was added for PTX.

It does look a bit strange.  My understanding is this: today, nvptx never
returns aggregates on registers, so either an aggregate value is returned on
stack, or a scalar value is returned on a register.  The backend wants to
record the RTL mode of the value being returned (hence the 'outgoing'
check in nvptx_function_value), but otherwise wants to behave as if callees
return values in a fixed register, but callers receive values in a pseudo
register (hence the 'doing_call' check in nvptx_libcall_value: it can be
called in a context when pseudos can't be generated, but if we aren't
currently expanding a call to RTL, we can give whatever hard reg to middle end
in the right mode; it's a hack).

So if my understanding is correct, additional !cfun check can be acceptable as
a fix along the existing hack.  Perhaps with a note about the nature of the
hack.

Thanks.
Alexander
Bernd Schmidt April 28, 2016, 11:42 a.m. UTC | #3
On 04/28/2016 01:15 PM, Alexander Monakov wrote:
> So if my understanding is correct, additional !cfun check can be acceptable as
> a fix along the existing hack.  Perhaps with a note about the nature of the
> hack.

Yes, I think Thomas' patch is ok.


Bernd
diff mbox

Patch

--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -484,7 +484,7 @@  nvptx_strict_argument_naming (cumulative_args_t cum_v)
 static rtx
 nvptx_libcall_value (machine_mode mode, const_rtx)
 {
-  if (!cfun->machine->doing_call)
+  if (!cfun || !cfun->machine->doing_call)
     /* Pretend to return in a hard reg for early uses before pseudos can be
        generated.  */
     return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);