diff mbox

ipa-cp heuristics fixes

Message ID 20151210073037.GA40772@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka Dec. 10, 2015, 7:30 a.m. UTC
Martin,
while looking into the ipa-cp dumps for bzip and Firefox I noticed few issues.
First of all, ipcp_cloning_candidate_p calls
 optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))
which can not be used at WPA time, becuase we have no DECL_STRUCT_FUNCTION
around.  I replaced it by node->optimize_for_size_p ().

Second we perform incredible number of clones because we do obtain some sort of
polymorphic call context for them.  In wast majority of cases this is useless
effort, because the functions in question do not contain virtual calls and do
not pass the parameter further.  For firefox about 40k out of 50k clones
created are created just because we found some context.

I changed the code to only clone if this immediately leads to devirtualization.
This do not cause any noticeable drop in number of devirtualized calls on
Firefox. I suppose we will miss the case where cloning a caller may allow
devirtualization in a clone of callee, but I do not think the heuristics for
context independent values can handle this as implemented right now and it
simply have way to many false positives.

What we can do is to devirtualize w/o cloning for local functions and
speculatively devirtualize in case we would otherwise clone.

Third problem I noticed is that
will_be_removed_from_program_if_no_direct_calls_p is used to decide if we can
ignore the function size when deciding about the code size impact.
This function is doing some analysis for inliner where it, for example, analyses
if a comdat which is going to be inlined consistently in the whole program
will be removed.

In the cloning case I do not see this to apply: we have no evidence that the
other units will pass the same constants to the function.  I think you
basically want to assume that the  function will be removed if it has no
address taken and it is not externally visibible. This is what local flag
is for.

I gathered some stats:

number of clones for all contexts: 49948->11102
number of clones: 4376->4383

good_cloning_opportunity_p is called about 70k times, I wonder if the
thresholds are not simply set too high.  For example, inliner does about 300k
inlines at Firefox.

number of param replacements: 13041-> 13056 + 5383 aggregate replacements (I do not have data on unpatched tree for this)
number of devirts: 956->933
number of devirts happening at inline: 781->868
number of indirect calls promoted: 512->512

Inliner stats from: Unit growth for small function inlining: 7965701->9130051 (14%)
to: Unit growth for small function inlining: 7965010->9138577

So it seems that except for large drop in number of clones there is no significant difference.

I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?

Honza

	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
	(good_cloning_opportunity_p): Likewise.
	(gather_context_independent_values): Do not return true when
	polymorphic call context is known or when we have known aggregate
	value of unused parameter.
	(estimate_local_effects): Try to create clone for all context
	when either some params are substituted or devirtualization is possible
	or some params can be removed; use local flag instead of
	node->will_be_removed_from_program_if_no_direct_calls_p.
	(identify_dead_nodes): Likewise.

Comments

Martin Jambor Dec. 10, 2015, 10:47 a.m. UTC | #1
Hi,

thanks for looking into this, I only have one question:

On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> Martin,
> while looking into the ipa-cp dumps for bzip and Firefox I noticed few issues.
> First of all, ipcp_cloning_candidate_p calls
>  optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))
> which can not be used at WPA time, becuase we have no DECL_STRUCT_FUNCTION
> around.  I replaced it by node->optimize_for_size_p ().
> 
> Second we perform incredible number of clones because we do obtain some sort of
> polymorphic call context for them.  In wast majority of cases this is useless
> effort, because the functions in question do not contain virtual calls and do
> not pass the parameter further.  For firefox about 40k out of 50k clones
> created are created just because we found some context.
> 
> I changed the code to only clone if this immediately leads to devirtualization.
> This do not cause any noticeable drop in number of devirtualized calls on
> Firefox. I suppose we will miss the case where cloning a caller may allow
> devirtualization in a clone of callee, but I do not think the heuristics for
> context independent values can handle this as implemented right now and it
> simply have way to many false positives.
> 
> What we can do is to devirtualize w/o cloning for local functions and
> speculatively devirtualize in case we would otherwise clone.
> 
> Third problem I noticed is that
> will_be_removed_from_program_if_no_direct_calls_p is used to decide if we can
> ignore the function size when deciding about the code size impact.
> This function is doing some analysis for inliner where it, for example, analyses
> if a comdat which is going to be inlined consistently in the whole program
> will be removed.
> 
> In the cloning case I do not see this to apply: we have no evidence that the
> other units will pass the same constants to the function.  I think you
> basically want to assume that the  function will be removed if it has no
> address taken and it is not externally visibible. This is what local flag
> is for.
> 
> I gathered some stats:
> 
> number of clones for all contexts: 49948->11102
> number of clones: 4376->4383
> 
> good_cloning_opportunity_p is called about 70k times, I wonder if the
> thresholds are not simply set too high.  For example, inliner does about 300k
> inlines at Firefox.
> 
> number of param replacements: 13041-> 13056 + 5383 aggregate replacements (I do not have data on unpatched tree for this)
> number of devirts: 956->933
> number of devirts happening at inline: 781->868
> number of indirect calls promoted: 512->512
> 
> Inliner stats from: Unit growth for small function inlining: 7965701->9130051 (14%)
> to: Unit growth for small function inlining: 7965010->9138577
> 
> So it seems that except for large drop in number of clones there is no significant difference.
> 
> I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?
> 
> Honza
> 
> 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> 	(good_cloning_opportunity_p): Likewise.
> 	(gather_context_independent_values): Do not return true when
> 	polymorphic call context is known or when we have known aggregate
> 	value of unused parameter.
> 	(estimate_local_effects): Try to create clone for all context
> 	when either some params are substituted or devirtualization is possible
> 	or some params can be removed; use local flag instead of
> 	node->will_be_removed_from_program_if_no_direct_calls_p.
> 	(identify_dead_nodes): Likewise.
> Index: ipa-cp.c
> ===================================================================
> --- ipa-cp.c	(revision 231477)
> +++ ipa-cp.c	(working copy)
> @@ -613,7 +613,7 @@ ipcp_cloning_candidate_p (struct cgraph_
>        return false;
>      }
>  
> -  if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +  if (node->optimize_for_size_p ())
>      {
>        if (dump_file)
>          fprintf (dump_file, "Not considering %s for cloning; "
> @@ -2267,7 +2267,7 @@ good_cloning_opportunity_p (struct cgrap
>  {
>    if (time_benefit == 0
>        || !opt_for_fn (node->decl, flag_ipa_cp_clone)
> -      || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +      || node->optimize_for_size_p ())
>      return false;
>  
>    gcc_assert (size_cost > 0);
> @@ -2387,12 +2387,14 @@ gather_context_independent_values (struc
>  	*removable_params_cost
>  	  += ipa_get_param_move_cost (info, i);
>  
> +      if (!ipa_is_param_used (info, i))
> +	continue;
> +

Is this really necessary, is it not enough to remove the assignment to
ret below?  If the parameter is not used, devirtualization time bonus,
which you then rely on estimate_local_effects, should be zero for it.

It is a very minor point, I suppose, but if the function gets cloned
for a different reason, it might still be beneficial to have as much
context-independent information for it as possible too, because that
can then be used in a callee (see the second call of
gather_context_independent_values).

Other than that, all the changes seem like a clear improvement.

Thanks,

Martin

>        ipcp_lattice<ipa_polymorphic_call_context> *ctxlat = &plats->ctxlat;
> +      /* Do not account known context as reason for cloning.  We can see
> +	 if it permits devirtualization.  */
>        if (ctxlat->is_single_const ())
> -	{
> -	  (*known_contexts)[i] = ctxlat->values->value;
> -	  ret = true;
> -	}
> +	(*known_contexts)[i] = ctxlat->values->value;
>  
>        if (known_aggs)
>  	{
> @@ -2494,7 +2496,9 @@ estimate_local_effects (struct cgraph_no
>  						    &known_contexts, &known_aggs,
>  						    &removable_params_cost);
>    known_aggs_ptrs = agg_jmp_p_vec_for_t_vec (known_aggs);
> -  if (always_const)
> +  int devirt_bonus = devirtualization_time_bonus (node, known_csts,
> +					   known_contexts, known_aggs_ptrs);
> +  if (always_const || devirt_bonus || removable_params_cost)
>      {
>        struct caller_statistics stats;
>        inline_hints hints;
> @@ -2505,8 +2509,7 @@ estimate_local_effects (struct cgraph_no
>  					      false);
>        estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
>  					 known_aggs_ptrs, &size, &time, &hints);
> -      time -= devirtualization_time_bonus (node, known_csts, known_contexts,
> -					   known_aggs_ptrs);
> +      time -= devirt_bonus;
>        time -= hint_time_bonus (hints);
>        time -= removable_params_cost;
>        size -= stats.n_calls * removable_params_cost;
> @@ -2515,8 +2518,7 @@ estimate_local_effects (struct cgraph_no
>  	fprintf (dump_file, " - context independent values, size: %i, "
>  		 "time_benefit: %i\n", size, base_time - time);
>  
> -      if (size <= 0
> -	  || node->will_be_removed_from_program_if_no_direct_calls_p ())
> +      if (size <= 0 || node->local.local)
>  	{
>  	  info->do_clone_for_all_contexts = true;
>  	  base_time = time;
> @@ -2544,6 +2546,10 @@ estimate_local_effects (struct cgraph_no
>  		     "max_new_size would be reached with %li.\n",
>  		     size + overall_size);
>  	}
> +      else if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file, "   Not cloning for all contexts because "
> +		 "!good_cloning_opportunity_p.\n");
> +	
>      }
>  
>    for (i = 0; i < count ; i++)
> @@ -4419,7 +4425,7 @@ identify_dead_nodes (struct cgraph_node
>  {
>    struct cgraph_node *v;
>    for (v = node; v ; v = ((struct ipa_dfs_info *) v->aux)->next_cycle)
> -    if (v->will_be_removed_from_program_if_no_direct_calls_p ()
> +    if (v->local.local
>  	&& !v->call_for_symbol_thunks_and_aliases
>  	     (has_undead_caller_from_outside_scc_p, NULL, true))
>        IPA_NODE_REF (v)->node_dead = 1;
Jan Hubicka Dec. 10, 2015, 4:56 p.m. UTC | #2
> Is this really necessary, is it not enough to remove the assignment to
> ret below?  If the parameter is not used, devirtualization time bonus,
> which you then rely on estimate_local_effects, should be zero for it.
> 
> It is a very minor point, I suppose, but if the function gets cloned
> for a different reason, it might still be beneficial to have as much
> context-independent information for it as possible too, because that
> can then be used in a callee (see the second call of
> gather_context_independent_values).
> 
> Other than that, all the changes seem like a clear improvement.

The cutoff is there mainly for the rest of the function:
      if (known_aggs)
        {
          vec<ipa_agg_jf_item, va_gc> *agg_items;
          struct ipa_agg_jump_function *ajf;

          agg_items = context_independent_aggregate_values (plats);
          ajf = &(*known_aggs)[i];
          ajf->items = agg_items;
          ajf->by_ref = plats->aggs_by_ref;
          ret |= agg_items != NULL;
        }
I did not want ret to become true if we manage to propagate into an unused
aggregate parameter.

Honza
Martin Jambor Dec. 11, 2015, 12:03 p.m. UTC | #3
On Thu, Dec 10, 2015 at 05:56:26PM +0100, Jan Hubicka wrote:
> > Is this really necessary, is it not enough to remove the assignment to
> > ret below?  If the parameter is not used, devirtualization time bonus,
> > which you then rely on estimate_local_effects, should be zero for it.
> > 
> > It is a very minor point, I suppose, but if the function gets cloned
> > for a different reason, it might still be beneficial to have as much
> > context-independent information for it as possible too, because that
> > can then be used in a callee (see the second call of
> > gather_context_independent_values).
> > 
> > Other than that, all the changes seem like a clear improvement.
> 
> The cutoff is there mainly for the rest of the function:
>       if (known_aggs)
>         {
>           vec<ipa_agg_jf_item, va_gc> *agg_items;
>           struct ipa_agg_jump_function *ajf;
> 
>           agg_items = context_independent_aggregate_values (plats);
>           ajf = &(*known_aggs)[i];
>           ajf->items = agg_items;
>           ajf->by_ref = plats->aggs_by_ref;
>           ret |= agg_items != NULL;
>         }
> I did not want ret to become true if we manage to propagate into an unused
> aggregate parameter.

I see, it makes sense.

Thanks,

Martin
Jakub Jelinek Dec. 11, 2015, 8:11 p.m. UTC | #4
On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?

I think this patch (just a guess, but certainly ipa-cp related during last
24 hours) significantly regressed guality/pr36728-*.c on x86_64.
Previously we have not turned foo into foo.constprop*, now we do, and pass
just arg7 instead of arg1..arg7.  That is fine, but we really should be
emitting the debug info stuff for that case, that was added to fix PR47858,
but for whatever reason it doesn't happen in this case.  Does it take some
other path in ipa-prop.c, or bypass ipa-prop, something different?

> 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> 	(good_cloning_opportunity_p): Likewise.
> 	(gather_context_independent_values): Do not return true when
> 	polymorphic call context is known or when we have known aggregate
> 	value of unused parameter.
> 	(estimate_local_effects): Try to create clone for all context
> 	when either some params are substituted or devirtualization is possible
> 	or some params can be removed; use local flag instead of
> 	node->will_be_removed_from_program_if_no_direct_calls_p.
> 	(identify_dead_nodes): Likewise.

	Jakub
Jan Hubicka Dec. 11, 2015, 8:40 p.m. UTC | #5
> On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> > I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?
> 
> I think this patch (just a guess, but certainly ipa-cp related during last
> 24 hours) significantly regressed guality/pr36728-*.c on x86_64.
> Previously we have not turned foo into foo.constprop*, now we do, and pass
> just arg7 instead of arg1..arg7.  That is fine, but we really should be

Yes, I changed the heuristics to consider it a win to drop the arugment even
if no constant propagation is done.

> emitting the debug info stuff for that case, that was added to fix PR47858,
> but for whatever reason it doesn't happen in this case.  Does it take some
> other path in ipa-prop.c, or bypass ipa-prop, something different?

No, it is the same clonning as any other. Just the decision heuristics changed.
We would get same issue qith guality/pr36728- if one of arg1...arg7 was
constant.  I will try to take a look at your patch for PR47858 and see if I can
work out what goes wrong.

Honza
> 
> > 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> > 	(good_cloning_opportunity_p): Likewise.
> > 	(gather_context_independent_values): Do not return true when
> > 	polymorphic call context is known or when we have known aggregate
> > 	value of unused parameter.
> > 	(estimate_local_effects): Try to create clone for all context
> > 	when either some params are substituted or devirtualization is possible
> > 	or some params can be removed; use local flag instead of
> > 	node->will_be_removed_from_program_if_no_direct_calls_p.
> > 	(identify_dead_nodes): Likewise.
> 
> 	Jakub
Jan Hubicka Dec. 11, 2015, 9:20 p.m. UTC | #6
Actually I added
      if (!ipa_is_param_used (info, i))                                         
        continue;                                                               
shortcut to gather_context_independent_values which prevents
us from recording context_independent_aggregate_values for unused
aggregate parameters. Perhaps that is causing the isssue?
We can simply record them and just avoid returning true if
all propagations happen to those.
Dominik Vogt Dec. 16, 2015, 9:15 a.m. UTC | #7
On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> 	(good_cloning_opportunity_p): Likewise.
> 	(gather_context_independent_values): Do not return true when
> 	polymorphic call context is known or when we have known aggregate
> 	value of unused parameter.
> 	(estimate_local_effects): Try to create clone for all context
> 	when either some params are substituted or devirtualization is possible
> 	or some params can be removed; use local flag instead of
> 	node->will_be_removed_from_program_if_no_direct_calls_p.
> 	(identify_dead_nodes): Likewise.

This commit breaks several guality tests on S/390x:

+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 y == 2
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 y == 2
 FAIL: gcc.dg/guality/pr36728-2.c   -O2  line 18 *x == (char) 25
 FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 18 *x == (char) 25
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
 FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 *x == (char) 25
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 y == 2
 FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 *x == (char) 25
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 y == 2
 FAIL: gcc.dg/guality/pr36728-2.c   -Os  line 18 *x == (char) 25
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg7 == 30
...
 FAIL: gcc.dg/guality/vla-1.c   -O1  line 17 sizeof (a) == 6
 FAIL: gcc.dg/guality/vla-1.c   -O2  line 17 sizeof (a) == 6
 FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 17 sizeof (a) == 6
+FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 i == 5
 FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 sizeof (a) == 6
+FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 i == 5
 FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 sizeof (a) == 6
 FAIL: gcc.dg/guality/vla-1.c   -Os  line 17 sizeof (a) == 6

What can I do to help fixing this?

Ciao

Dominik ^_^  ^_^
Richard Biener Dec. 16, 2015, 9:59 a.m. UTC | #8
On Wed, Dec 16, 2015 at 10:15 AM, Dominik Vogt <vogt@linux.vnet.ibm.com> wrote:
> On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
>>       * ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
>>       (good_cloning_opportunity_p): Likewise.
>>       (gather_context_independent_values): Do not return true when
>>       polymorphic call context is known or when we have known aggregate
>>       value of unused parameter.
>>       (estimate_local_effects): Try to create clone for all context
>>       when either some params are substituted or devirtualization is possible
>>       or some params can be removed; use local flag instead of
>>       node->will_be_removed_from_program_if_no_direct_calls_p.
>>       (identify_dead_nodes): Likewise.
>
> This commit breaks several guality tests on S/390x:
>
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2  line 18 *x == (char) 25
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -Os  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg7 == 30
> ...
>  FAIL: gcc.dg/guality/vla-1.c   -O1  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -O2  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 17 sizeof (a) == 6
> +FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 i == 5
>  FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 sizeof (a) == 6
> +FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 i == 5
>  FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -Os  line 17 sizeof (a) == 6
>
> What can I do to help fixing this?

Same on x86_64 btw.  If there isn't a bugreport already please open
one to track this issue.

Thanks,
Richard.

> Ciao
>
> Dominik ^_^  ^_^
>
> --
>
> Dominik Vogt
> IBM Germany
>
Jakub Jelinek Dec. 16, 2015, 10:08 a.m. UTC | #9
On Wed, Dec 16, 2015 at 10:59:10AM +0100, Richard Biener wrote:
> > What can I do to help fixing this?
> 
> Same on x86_64 btw.  If there isn't a bugreport already please open
> one to track this issue.

PR68860 covers this already and it has been discussed on this ml too
already.

	Jakub
Dominik Vogt Dec. 16, 2015, 10:21 a.m. UTC | #10
On Wed, Dec 16, 2015 at 10:59:10AM +0100, Richard Biener wrote:
> Same on x86_64 btw.  If there isn't a bugreport already please open
> one to track this issue.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68935

Ciao

Dominik ^_^  ^_^
Jan Hubicka Dec. 16, 2015, 4:24 p.m. UTC | #11
> On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> > 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> > 	(good_cloning_opportunity_p): Likewise.
> > 	(gather_context_independent_values): Do not return true when
> > 	polymorphic call context is known or when we have known aggregate
> > 	value of unused parameter.
> > 	(estimate_local_effects): Try to create clone for all context
> > 	when either some params are substituted or devirtualization is possible
> > 	or some params can be removed; use local flag instead of
> > 	node->will_be_removed_from_program_if_no_direct_calls_p.
> > 	(identify_dead_nodes): Likewise.
> 
> This commit breaks several guality tests on S/390x:

The patch changes ipa-cp heuristics in a way that it will clone in order to remove
unused parameter.  Previously it did know how to remove unused parameter but would
do so only if some of function's parameter (perhaps the unused one) was also constant.
This is inconsistent.  As a result we do more ipa-cp cloning and I suppose we are
more likely to hit them on guality for simplified testcases that often contain unused
code:
int __attribute__((noinline))
foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7)
{
  char *x = __builtin_alloca (arg7);
  int __attribute__ ((aligned(32))) y;

  y = 2;
  asm (NOP : "=m" (y), "=m" (b) : "m" (y));
  x[0] = 25;
  asm (NOP : "=m" (x[0]), "=m" (a) : "m" (x[0]), "m" (b));
  return y;
}

int
main ()
{
  int l = 0;
  asm ("" : "=r" (l) : "0" (l));
  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
  asm volatile ("" :: "r" (l));
  return 0;
}

I am trying to understand Jakub's debug code and perhaps it can be improved. But in
the case of optimized out unused parameters I think it is perfectly resonable to
say that the variable was optimized out.

As you can see, the testcase explicitely prevents ipa-cp constant propagation by the
asm statement.  We can just update the testcases to use the parameters and test
the original issue again.
> 
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2  line 18 *x == (char) 25
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -Os  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg7 == 30
> ...
>  FAIL: gcc.dg/guality/vla-1.c   -O1  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -O2  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 17 sizeof (a) == 6
> +FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 i == 5
>  FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 sizeof (a) == 6
> +FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 i == 5
>  FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -Os  line 17 sizeof (a) == 6
> 
> What can I do to help fixing this?
> 
> Ciao
> 
> Dominik ^_^  ^_^
> 
> -- 
> 
> Dominik Vogt
> IBM Germany
Jakub Jelinek Dec. 16, 2015, 4:28 p.m. UTC | #12
On Wed, Dec 16, 2015 at 05:24:25PM +0100, Jan Hubicka wrote:
> I am trying to understand Jakub's debug code and perhaps it can be improved. But in
> the case of optimized out unused parameters I think it is perfectly resonable to
> say that the variable was optimized out.

As long as the values that would be passed to the unused parameters are
constant or live in memory or registers that isn't clobbered by the call in
the caller, we have the ability to express that in the debug info now, and
we should.

> As you can see, the testcase explicitely prevents ipa-cp constant propagation by the
> asm statement.  We can just update the testcases to use the parameters and test
> the original issue again.

No, the testcase intentionally wants to test unused arguments, they happen
in quite a lot of code and "optimized out" is not really desirable answer if
we can provide the values some other way.

	Jakub
diff mbox

Patch

Index: ipa-cp.c
===================================================================
--- ipa-cp.c	(revision 231477)
+++ ipa-cp.c	(working copy)
@@ -613,7 +613,7 @@  ipcp_cloning_candidate_p (struct cgraph_
       return false;
     }
 
-  if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
+  if (node->optimize_for_size_p ())
     {
       if (dump_file)
         fprintf (dump_file, "Not considering %s for cloning; "
@@ -2267,7 +2267,7 @@  good_cloning_opportunity_p (struct cgrap
 {
   if (time_benefit == 0
       || !opt_for_fn (node->decl, flag_ipa_cp_clone)
-      || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
+      || node->optimize_for_size_p ())
     return false;
 
   gcc_assert (size_cost > 0);
@@ -2387,12 +2387,14 @@  gather_context_independent_values (struc
 	*removable_params_cost
 	  += ipa_get_param_move_cost (info, i);
 
+      if (!ipa_is_param_used (info, i))
+	continue;
+
       ipcp_lattice<ipa_polymorphic_call_context> *ctxlat = &plats->ctxlat;
+      /* Do not account known context as reason for cloning.  We can see
+	 if it permits devirtualization.  */
       if (ctxlat->is_single_const ())
-	{
-	  (*known_contexts)[i] = ctxlat->values->value;
-	  ret = true;
-	}
+	(*known_contexts)[i] = ctxlat->values->value;
 
       if (known_aggs)
 	{
@@ -2494,7 +2496,9 @@  estimate_local_effects (struct cgraph_no
 						    &known_contexts, &known_aggs,
 						    &removable_params_cost);
   known_aggs_ptrs = agg_jmp_p_vec_for_t_vec (known_aggs);
-  if (always_const)
+  int devirt_bonus = devirtualization_time_bonus (node, known_csts,
+					   known_contexts, known_aggs_ptrs);
+  if (always_const || devirt_bonus || removable_params_cost)
     {
       struct caller_statistics stats;
       inline_hints hints;
@@ -2505,8 +2509,7 @@  estimate_local_effects (struct cgraph_no
 					      false);
       estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
 					 known_aggs_ptrs, &size, &time, &hints);
-      time -= devirtualization_time_bonus (node, known_csts, known_contexts,
-					   known_aggs_ptrs);
+      time -= devirt_bonus;
       time -= hint_time_bonus (hints);
       time -= removable_params_cost;
       size -= stats.n_calls * removable_params_cost;
@@ -2515,8 +2518,7 @@  estimate_local_effects (struct cgraph_no
 	fprintf (dump_file, " - context independent values, size: %i, "
 		 "time_benefit: %i\n", size, base_time - time);
 
-      if (size <= 0
-	  || node->will_be_removed_from_program_if_no_direct_calls_p ())
+      if (size <= 0 || node->local.local)
 	{
 	  info->do_clone_for_all_contexts = true;
 	  base_time = time;
@@ -2544,6 +2546,10 @@  estimate_local_effects (struct cgraph_no
 		     "max_new_size would be reached with %li.\n",
 		     size + overall_size);
 	}
+      else if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "   Not cloning for all contexts because "
+		 "!good_cloning_opportunity_p.\n");
+	
     }
 
   for (i = 0; i < count ; i++)
@@ -4419,7 +4425,7 @@  identify_dead_nodes (struct cgraph_node
 {
   struct cgraph_node *v;
   for (v = node; v ; v = ((struct ipa_dfs_info *) v->aux)->next_cycle)
-    if (v->will_be_removed_from_program_if_no_direct_calls_p ()
+    if (v->local.local
 	&& !v->call_for_symbol_thunks_and_aliases
 	     (has_undead_caller_from_outside_scc_p, NULL, true))
       IPA_NODE_REF (v)->node_dead = 1;