diff mbox series

[RFC] ipa-cp: Feed results of IPA-CP into SCCVN

Message ID ri6fsmxaz11.fsf@suse.cz
State New
Headers show
Series [RFC] ipa-cp: Feed results of IPA-CP into SCCVN | expand

Commit Message

Martin Jambor April 1, 2022, 10:17 a.m. UTC
Hi,

PRs 68930 and 92497 show that when IPA-CP figures out constants in
aggregate parameters or when passed by reference but the loads happen
in an inlined function the information is lost.  This happens even
when the inlined function itself was known to have - or even cloned to
have - such constants in incoming parameters because the transform
phase of IPA passes is not run on them.  See discussion in the bugs
for reasons why.

Honza suggested that we can plug the results of IPA-CP analysis into
value numbering, so that FRE can figure out that some loads fetch
known constants.  This is what this patch attempts to do.

Although I spent quite some time reading tree-sccvn.c, it is complex
enough that I am sure I am not aware of various caveats and so I would
not be terribly surprised if there were some issues with my approach
that I am not aware of.  Nevertheless, it seems to work well for simple
cases and even passes bootstrap and testing (and LTO bootstrap) on
x86_64-linux.

I have experimented a little with using this approach instead of the
function walking parts of the IPA-CP transformation phase.  This would
mean that the known-constants would not participate in the passes after
IPA but before FRE - which are not many but there is a ccp and fwprop
pass among others.  For simple testcases like
gcc/testsuite/gcc.dg/ipa/ipcp-agg-*.c, it makes not assembly difference
at all.

What do you think?

Martin


gcc/ChangeLog:

2022-03-30  Martin Jambor  <mjambor@suse.cz>

	PR ipa/68930
	PR ipa/92497
	* ipa-prop.cc (ipcp_get_aggregate_const): New function.
	(ipcp_transform_function): Do not deallocate transformation info.
	* ipa-prop.h (ipcp_get_aggregate_const): Declare.
	* tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and
	ipa-prop.h.
	(vn_reference_lookup_2): When hitting default-def vuse, query
	IPA-CP transformation info for any known constants.

gcc/testsuite/ChangeLog:

2022-03-30  Martin Jambor  <mjambor@suse.cz>

	PR ipa/68930
	PR ipa/92497
	* gcc.dg/ipa/pr92497-1.c: New test.
	* gcc.dg/ipa/pr92497-2.c: Likewise.
---
 gcc/ipa-prop.cc                      | 43 ++++++++++++++++++++++++----
 gcc/ipa-prop.h                       |  2 ++
 gcc/testsuite/gcc.dg/ipa/pr92497-1.c | 26 +++++++++++++++++
 gcc/testsuite/gcc.dg/ipa/pr92497-2.c | 26 +++++++++++++++++
 gcc/tree-ssa-sccvn.cc                | 35 +++++++++++++++++++++-
 5 files changed, 126 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-1.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-2.c

Comments

Richard Biener April 1, 2022, 11:49 a.m. UTC | #1
On Fri, 1 Apr 2022, Martin Jambor wrote:

> Hi,
> 
> PRs 68930 and 92497 show that when IPA-CP figures out constants in
> aggregate parameters or when passed by reference but the loads happen
> in an inlined function the information is lost.  This happens even
> when the inlined function itself was known to have - or even cloned to
> have - such constants in incoming parameters because the transform
> phase of IPA passes is not run on them.  See discussion in the bugs
> for reasons why.
> 
> Honza suggested that we can plug the results of IPA-CP analysis into
> value numbering, so that FRE can figure out that some loads fetch
> known constants.  This is what this patch attempts to do.
> 
> Although I spent quite some time reading tree-sccvn.c, it is complex
> enough that I am sure I am not aware of various caveats and so I would
> not be terribly surprised if there were some issues with my approach
> that I am not aware of.  Nevertheless, it seems to work well for simple
> cases and even passes bootstrap and testing (and LTO bootstrap) on
> x86_64-linux.
> 
> I have experimented a little with using this approach instead of the
> function walking parts of the IPA-CP transformation phase.  This would
> mean that the known-constants would not participate in the passes after
> IPA but before FRE - which are not many but there is a ccp and fwprop
> pass among others.  For simple testcases like
> gcc/testsuite/gcc.dg/ipa/ipcp-agg-*.c, it makes not assembly difference
> at all.
> 
> What do you think?

Comments below

> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2022-03-30  Martin Jambor  <mjambor@suse.cz>
> 
> 	PR ipa/68930
> 	PR ipa/92497
> 	* ipa-prop.cc (ipcp_get_aggregate_const): New function.
> 	(ipcp_transform_function): Do not deallocate transformation info.
> 	* ipa-prop.h (ipcp_get_aggregate_const): Declare.
> 	* tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and
> 	ipa-prop.h.
> 	(vn_reference_lookup_2): When hitting default-def vuse, query
> 	IPA-CP transformation info for any known constants.
> 
> gcc/testsuite/ChangeLog:
> 
> 2022-03-30  Martin Jambor  <mjambor@suse.cz>
> 
> 	PR ipa/68930
> 	PR ipa/92497
> 	* gcc.dg/ipa/pr92497-1.c: New test.
> 	* gcc.dg/ipa/pr92497-2.c: Likewise.
> ---
>  gcc/ipa-prop.cc                      | 43 ++++++++++++++++++++++++----
>  gcc/ipa-prop.h                       |  2 ++
>  gcc/testsuite/gcc.dg/ipa/pr92497-1.c | 26 +++++++++++++++++
>  gcc/testsuite/gcc.dg/ipa/pr92497-2.c | 26 +++++++++++++++++
>  gcc/tree-ssa-sccvn.cc                | 35 +++++++++++++++++++++-
>  5 files changed, 126 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-2.c
> 
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index e55fe2776f2..a73a5d9ec1d 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -5748,6 +5748,44 @@ ipcp_modif_dom_walker::before_dom_children (basic_block bb)
>    return NULL;
>  }
>  
> +/* If IPA-CP discovered a constant in parameter PARM at OFFSET of a given SIZE
> +   - whether passed by reference or not is given by BY_REF - return that
> +   constant.  Otherwise return NULL_TREE.  */
> +
> +tree
> +ipcp_get_aggregate_const (tree parm, bool by_ref,
> +			  HOST_WIDE_INT offset, HOST_WIDE_INT size)

I'd prefer to pass in the function decl or struct function or
cgraph node.

> +{
> +  cgraph_node *cnode = cgraph_node::get (current_function_decl);
> +
> +  ipa_agg_replacement_value *aggval = ipa_get_agg_replacements_for_node (cnode);
> +  if (!aggval)
> +    return NULL_TREE;
> +
> +  int index = 0;
> +  for (tree p = DECL_ARGUMENTS (current_function_decl);
> +       p != parm; p = DECL_CHAIN (p))
> +    {
> +      index++;
> +      if (!p)
> +	return NULL_TREE;
> +    }
> +
> +  ipa_agg_replacement_value *v;
> +  for (v = aggval; v; v = v->next)
> +    if (v->index == index
> +	&& v->offset == offset)
> +      break;
> +  if (!v
> +      || v->by_ref != by_ref
> +      || maybe_ne (tree_to_poly_int64 (TYPE_SIZE (TREE_TYPE (v->value))),
> +		   size))
> +    return NULL_TREE;

two linear searches here - ugh.  I wonder if we should instead
pre-fill a hash-map from PARM_DECL to a ipa_agg_replacement_value *
vector sorted by offset which we can binary search?  That could be
done once when starting value-numbering (not on regions).  Is
there any reason the data structure is as it is?  It seems that
even ipcp_modif_dom_walker::before_dom_children will do a linear
walk and ipa_get_param_decl_index_1 also linearly walks parameters.

That looks highly sub-optimal to me, it's also done for each
mention of a parameter.

> +  return v->value;
> +}
> +
> +
>  /* Return true if we have recorded VALUE and MASK about PARM.
>     Set VALUE and MASk accordingly.  */
>  
> @@ -6055,11 +6093,6 @@ ipcp_transform_function (struct cgraph_node *node)
>      free_ipa_bb_info (bi);
>    fbi.bb_infos.release ();
>  
> -  ipcp_transformation *s = ipcp_transformation_sum->get (node);
> -  s->agg_values = NULL;
> -  s->bits = NULL;
> -  s->m_vr = NULL;
> -
>    vec_free (descriptors);
>    if (cfg_changed)
>      delete_unreachable_blocks_update_callgraph (node, false);
> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index 553adfc9f35..aa6fcb522ac 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -1181,6 +1181,8 @@ void ipa_dump_param (FILE *, class ipa_node_params *info, int i);
>  void ipa_release_body_info (struct ipa_func_body_info *);
>  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
>  bool ipcp_get_parm_bits (tree, tree *, widest_int *);
> +tree ipcp_get_aggregate_const (tree parm, bool by_ref,
> +			       HOST_WIDE_INT offset, HOST_WIDE_INT size);
>  bool unadjusted_ptr_and_unit_offset (tree op, tree *ret,
>  				     poly_int64 *offset_ret);
>  
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr92497-1.c b/gcc/testsuite/gcc.dg/ipa/pr92497-1.c
> new file mode 100644
> index 00000000000..eb8f2e75fd0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr92497-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-early-inlining"  } */
> +
> +struct a {int a;};
> +static int //__attribute__ ((noinline))
> +foo (struct a a)
> +{
> +  if (!__builtin_constant_p (a.a))
> +    __builtin_abort ();
> +  return a.a;
> +}
> +
> +static int __attribute__ ((noinline))
> +bar (struct a a)
> +{
> +  return foo(a);
> +}
> +
> +volatile int r;
> +
> +int main()
> +{
> +  struct a a={1};
> +  r = bar (a);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr92497-2.c b/gcc/testsuite/gcc.dg/ipa/pr92497-2.c
> new file mode 100644
> index 00000000000..56e611df8f9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr92497-2.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-early-inlining -fno-ipa-sra"  } */
> +
> +struct a {int a;};
> +static int //__attribute__ ((noinline))
> +foo (struct a *a)
> +{
> +  if (!__builtin_constant_p (a->a))
> +    __builtin_abort ();
> +  return a->a;
> +}
> +
> +static int __attribute__ ((noinline))
> +bar (struct a *a)
> +{
> +  return foo(a);
> +}
> +
> +volatile int r;
> +
> +int main()
> +{
> +  struct a a={1};
> +  r = bar (&a);
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 66b4fc21f5b..da558055054 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -74,6 +74,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-modref-tree.h"
>  #include "ipa-modref.h"
>  #include "tree-ssa-sccvn.h"
> +#include "alloc-pool.h"
> +#include "symbol-summary.h"
> +#include "ipa-prop.h"
>  
>  /* This algorithm is based on the SCC algorithm presented by Keith
>     Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
> @@ -2273,7 +2276,7 @@ vn_walk_cb_data::push_partial_def (pd_data pd,
>     with the current VUSE and performs the expression lookup.  */
>  
>  static void *
> -vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
> +vn_reference_lookup_2 (ao_ref *op, tree vuse, void *data_)
>  {
>    vn_walk_cb_data *data = (vn_walk_cb_data *)data_;
>    vn_reference_t vr = data->vr;
> @@ -2307,6 +2310,36 @@ vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
>        return *slot;
>      }
>  
> +  if (SSA_NAME_IS_DEFAULT_DEF (vuse))
> +    {
> +      HOST_WIDE_INT offset, size;
> +      tree v = NULL_TREE;
> +      if (op->base && TREE_CODE (op->base) == PARM_DECL
> +	  && op->offset.is_constant (&offset)
> +	  && op->size.is_constant (&size))

you probably also want to check

        op->max_size_known_p ()
        && known_eq (op->size, op->max_size)

I see data->partial_defs.is_empty () is already checked in this
function, but we won't currently ever call vn_reference_lookup_3
for default defs.

That said, ideally we'd handle also partial defs, but we can
start without.

> +	v = ipcp_get_aggregate_const (op->base, false, offset, size);
> +      else if (op->ref)
> +	{
> +	  HOST_WIDE_INT offset, size;
> +	  bool reverse;
> +	  tree base = get_ref_base_and_extent_hwi (op->ref, &offset,
> +						   &size, &reverse);
> +	  if (base && TREE_CODE (base) == PARM_DECL)

In which case do you arrive here but op->base is not set above?
Maybe you need to use ao_ref_base (op->base) since op->base is
filled lazily from op->ref.

> +	    v = ipcp_get_aggregate_const (base, false, offset, size);
> +	  else if (base
> +		   && TREE_CODE (base) == MEM_REF
> +		   && integer_zerop (TREE_OPERAND (base, 1))
> +		   && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
> +		   && SSA_NAME_IS_DEFAULT_DEF (TREE_OPERAND (base, 0))
> +		   && (TREE_CODE (SSA_NAME_VAR (TREE_OPERAND (base, 0)))
> +		       == PARM_DECL))

This code should also be in the op->base part.

I'd prefer if you pass DECL_BY_REFERENCE rather than second-guessing it
(or assert you guessed right).

> +	    v = ipcp_get_aggregate_const (SSA_NAME_VAR (TREE_OPERAND (base, 0)),
> +					  true, offset, size);
> +	}
> +      if (v)
> +	return data->finish (vr->set, vr->base_set, v);
> +    }
> +
>    return NULL;
>  }
>  
>
Martin Jambor April 1, 2022, 1:25 p.m. UTC | #2
Hi,

thanks for a very quick reply.

On Fri, Apr 01 2022, Richard Biener wrote:
> On Fri, 1 Apr 2022, Martin Jambor wrote:
>
>> Hi,
>> 
>> PRs 68930 and 92497 show that when IPA-CP figures out constants in
>> aggregate parameters or when passed by reference but the loads happen
>> in an inlined function the information is lost.  This happens even
>> when the inlined function itself was known to have - or even cloned to
>> have - such constants in incoming parameters because the transform
>> phase of IPA passes is not run on them.  See discussion in the bugs
>> for reasons why.
>> 
>> Honza suggested that we can plug the results of IPA-CP analysis into
>> value numbering, so that FRE can figure out that some loads fetch
>> known constants.  This is what this patch attempts to do.
>> 
>> Although I spent quite some time reading tree-sccvn.c, it is complex
>> enough that I am sure I am not aware of various caveats and so I would
>> not be terribly surprised if there were some issues with my approach
>> that I am not aware of.  Nevertheless, it seems to work well for simple
>> cases and even passes bootstrap and testing (and LTO bootstrap) on
>> x86_64-linux.
>> 
>> I have experimented a little with using this approach instead of the
>> function walking parts of the IPA-CP transformation phase.  This would
>> mean that the known-constants would not participate in the passes after
>> IPA but before FRE - which are not many but there is a ccp and fwprop
>> pass among others.  For simple testcases like
>> gcc/testsuite/gcc.dg/ipa/ipcp-agg-*.c, it makes not assembly difference
>> at all.
>> 
>> What do you think?
>
> Comments below
>
>> Martin
>> 
>> 
>> gcc/ChangeLog:
>> 
>> 2022-03-30  Martin Jambor  <mjambor@suse.cz>
>> 
>> 	PR ipa/68930
>> 	PR ipa/92497
>> 	* ipa-prop.cc (ipcp_get_aggregate_const): New function.
>> 	(ipcp_transform_function): Do not deallocate transformation info.
>> 	* ipa-prop.h (ipcp_get_aggregate_const): Declare.
>> 	* tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and
>> 	ipa-prop.h.
>> 	(vn_reference_lookup_2): When hitting default-def vuse, query
>> 	IPA-CP transformation info for any known constants.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2022-03-30  Martin Jambor  <mjambor@suse.cz>
>> 
>> 	PR ipa/68930
>> 	PR ipa/92497
>> 	* gcc.dg/ipa/pr92497-1.c: New test.
>> 	* gcc.dg/ipa/pr92497-2.c: Likewise.
>> ---
>>  gcc/ipa-prop.cc                      | 43 ++++++++++++++++++++++++----
>>  gcc/ipa-prop.h                       |  2 ++
>>  gcc/testsuite/gcc.dg/ipa/pr92497-1.c | 26 +++++++++++++++++
>>  gcc/testsuite/gcc.dg/ipa/pr92497-2.c | 26 +++++++++++++++++
>>  gcc/tree-ssa-sccvn.cc                | 35 +++++++++++++++++++++-
>>  5 files changed, 126 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-2.c
>> 
>> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
>> index e55fe2776f2..a73a5d9ec1d 100644
>> --- a/gcc/ipa-prop.cc
>> +++ b/gcc/ipa-prop.cc
>> @@ -5748,6 +5748,44 @@ ipcp_modif_dom_walker::before_dom_children (basic_block bb)
>>    return NULL;
>>  }
>>  
>> +/* If IPA-CP discovered a constant in parameter PARM at OFFSET of a given SIZE
>> +   - whether passed by reference or not is given by BY_REF - return that
>> +   constant.  Otherwise return NULL_TREE.  */
>> +
>> +tree
>> +ipcp_get_aggregate_const (tree parm, bool by_ref,
>> +			  HOST_WIDE_INT offset, HOST_WIDE_INT size)
>
> I'd prefer to pass in the function decl or struct function or
> cgraph node.

OK.

>
>> +{
>> +  cgraph_node *cnode = cgraph_node::get (current_function_decl);
>> +
>> +  ipa_agg_replacement_value *aggval = ipa_get_agg_replacements_for_node (cnode);
>> +  if (!aggval)
>> +    return NULL_TREE;
>> +
>> +  int index = 0;
>> +  for (tree p = DECL_ARGUMENTS (current_function_decl);
>> +       p != parm; p = DECL_CHAIN (p))
>> +    {
>> +      index++;
>> +      if (!p)
>> +	return NULL_TREE;
>> +    }
>> +
>> +  ipa_agg_replacement_value *v;
>> +  for (v = aggval; v; v = v->next)
>> +    if (v->index == index
>> +	&& v->offset == offset)
>> +      break;
>> +  if (!v
>> +      || v->by_ref != by_ref
>> +      || maybe_ne (tree_to_poly_int64 (TYPE_SIZE (TREE_TYPE (v->value))),
>> +		   size))
>> +    return NULL_TREE;
>
> two linear searches here - ugh.  I wonder if we should instead
> pre-fill a hash-map from PARM_DECL to a ipa_agg_replacement_value *
> vector sorted by offset which we can binary search?  That could be
> done once when starting value-numbering (not on regions).  Is
> there any reason the data structure is as it is?

Only that it is usually a very short list.  It is bounded by
param_ipa_cp_value_list_size (8 by default) times the number of
arguments and of course only few usually have any constants in them.

Having said that, changing the structure is something I am looking into
also for other reasons and I am very much opened to not being so lax
about the worst case.

>  It seems that
> even ipcp_modif_dom_walker::before_dom_children will do a linear
> walk and ipa_get_param_decl_index_1 also linearly walks parameters.
>
> That looks highly sub-optimal to me, it's also done for each
> mention of a parameter.
>
>> +  return v->value;
>> +}
>> +
>> +
>>  /* Return true if we have recorded VALUE and MASK about PARM.
>>     Set VALUE and MASk accordingly.  */
>>  

[...]

>> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
>> index 66b4fc21f5b..da558055054 100644
>> --- a/gcc/tree-ssa-sccvn.cc
>> +++ b/gcc/tree-ssa-sccvn.cc
>> @@ -74,6 +74,9 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "ipa-modref-tree.h"
>>  #include "ipa-modref.h"
>>  #include "tree-ssa-sccvn.h"
>> +#include "alloc-pool.h"
>> +#include "symbol-summary.h"
>> +#include "ipa-prop.h"
>>  
>>  /* This algorithm is based on the SCC algorithm presented by Keith
>>     Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
>> @@ -2273,7 +2276,7 @@ vn_walk_cb_data::push_partial_def (pd_data pd,
>>     with the current VUSE and performs the expression lookup.  */
>>  
>>  static void *
>> -vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
>> +vn_reference_lookup_2 (ao_ref *op, tree vuse, void *data_)
>>  {
>>    vn_walk_cb_data *data = (vn_walk_cb_data *)data_;
>>    vn_reference_t vr = data->vr;
>> @@ -2307,6 +2310,36 @@ vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
>>        return *slot;
>>      }
>>  
>> +  if (SSA_NAME_IS_DEFAULT_DEF (vuse))
>> +    {
>> +      HOST_WIDE_INT offset, size;
>> +      tree v = NULL_TREE;
>> +      if (op->base && TREE_CODE (op->base) == PARM_DECL
>> +	  && op->offset.is_constant (&offset)
>> +	  && op->size.is_constant (&size))
>
> you probably also want to check
>
>         op->max_size_known_p ()
>         && known_eq (op->size, op->max_size)

OK, thanks.

>
> I see data->partial_defs.is_empty () is already checked in this
> function, but we won't currently ever call vn_reference_lookup_3
> for default defs.
>
> That said, ideally we'd handle also partial defs, but we can
> start without.

I'll need to look into that a bit more but yes, sure!

>
>> +	v = ipcp_get_aggregate_const (op->base, false, offset, size);
>> +      else if (op->ref)
>> +	{
>> +	  HOST_WIDE_INT offset, size;
>> +	  bool reverse;
>> +	  tree base = get_ref_base_and_extent_hwi (op->ref, &offset,
>> +						   &size, &reverse);
>> +	  if (base && TREE_CODE (base) == PARM_DECL)
>
> In which case do you arrive here but op->base is not set above?
> Maybe you need to use ao_ref_base (op->base) since op->base is
> filled lazily from op->ref.

Right, this is something I wanted to ask in my email but forgot.  I
believe the answer is never, I just was not sure.  I'll delete this
condition.

>
>> +	    v = ipcp_get_aggregate_const (base, false, offset, size);
>> +	  else if (base
>> +		   && TREE_CODE (base) == MEM_REF
>> +		   && integer_zerop (TREE_OPERAND (base, 1))
>> +		   && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
>> +		   && SSA_NAME_IS_DEFAULT_DEF (TREE_OPERAND (base, 0))
>> +		   && (TREE_CODE (SSA_NAME_VAR (TREE_OPERAND (base, 0)))
>> +		       == PARM_DECL))
>
> This code should also be in the op->base part.

I'll remove that branch.

>
> I'd prefer if you pass DECL_BY_REFERENCE rather than second-guessing it
> (or assert you guessed right).

I am not sure I understand but by_ref in IPA-CP really means the
parameter is an explicit pointer (rather than DECL_BY_REFERENCE) and the
constants correspond to MEM_REFs through that pointer (as opposed to
direct loads from the PARM_DECL itself).  That the by_ref corresponds to
what IPA-CP arrived at is checked in the linear look-up, so in the case
of some LTO-induced type mismatch, no constant will be found.  Does that
make sense?

Thanks again for the quick feedback.  I'll address the concerns but am
very happy that the overall approach seems feasible.

Martin


>
>> +	    v = ipcp_get_aggregate_const (SSA_NAME_VAR (TREE_OPERAND (base, 0)),
>> +					  true, offset, size);
>> +	}
>> +      if (v)
>> +	return data->finish (vr->set, vr->base_set, v);
>> +    }
>> +
>>    return NULL;
>>  }
>>  
>> 
>
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
Richard Biener April 4, 2022, 6:28 a.m. UTC | #3
On Fri, 1 Apr 2022, Martin Jambor wrote:

> Hi,
> 
> thanks for a very quick reply.
> 
> On Fri, Apr 01 2022, Richard Biener wrote:
> > On Fri, 1 Apr 2022, Martin Jambor wrote:
> >
> >> Hi,
> >> 
> >> PRs 68930 and 92497 show that when IPA-CP figures out constants in
> >> aggregate parameters or when passed by reference but the loads happen
> >> in an inlined function the information is lost.  This happens even
> >> when the inlined function itself was known to have - or even cloned to
> >> have - such constants in incoming parameters because the transform
> >> phase of IPA passes is not run on them.  See discussion in the bugs
> >> for reasons why.
> >> 
> >> Honza suggested that we can plug the results of IPA-CP analysis into
> >> value numbering, so that FRE can figure out that some loads fetch
> >> known constants.  This is what this patch attempts to do.
> >> 
> >> Although I spent quite some time reading tree-sccvn.c, it is complex
> >> enough that I am sure I am not aware of various caveats and so I would
> >> not be terribly surprised if there were some issues with my approach
> >> that I am not aware of.  Nevertheless, it seems to work well for simple
> >> cases and even passes bootstrap and testing (and LTO bootstrap) on
> >> x86_64-linux.
> >> 
> >> I have experimented a little with using this approach instead of the
> >> function walking parts of the IPA-CP transformation phase.  This would
> >> mean that the known-constants would not participate in the passes after
> >> IPA but before FRE - which are not many but there is a ccp and fwprop
> >> pass among others.  For simple testcases like
> >> gcc/testsuite/gcc.dg/ipa/ipcp-agg-*.c, it makes not assembly difference
> >> at all.
> >> 
> >> What do you think?
> >
> > Comments below
> >
> >> Martin
> >> 
> >> 
> >> gcc/ChangeLog:
> >> 
> >> 2022-03-30  Martin Jambor  <mjambor@suse.cz>
> >> 
> >> 	PR ipa/68930
> >> 	PR ipa/92497
> >> 	* ipa-prop.cc (ipcp_get_aggregate_const): New function.
> >> 	(ipcp_transform_function): Do not deallocate transformation info.
> >> 	* ipa-prop.h (ipcp_get_aggregate_const): Declare.
> >> 	* tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and
> >> 	ipa-prop.h.
> >> 	(vn_reference_lookup_2): When hitting default-def vuse, query
> >> 	IPA-CP transformation info for any known constants.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >> 2022-03-30  Martin Jambor  <mjambor@suse.cz>
> >> 
> >> 	PR ipa/68930
> >> 	PR ipa/92497
> >> 	* gcc.dg/ipa/pr92497-1.c: New test.
> >> 	* gcc.dg/ipa/pr92497-2.c: Likewise.
> >> ---
> >>  gcc/ipa-prop.cc                      | 43 ++++++++++++++++++++++++----
> >>  gcc/ipa-prop.h                       |  2 ++
> >>  gcc/testsuite/gcc.dg/ipa/pr92497-1.c | 26 +++++++++++++++++
> >>  gcc/testsuite/gcc.dg/ipa/pr92497-2.c | 26 +++++++++++++++++
> >>  gcc/tree-ssa-sccvn.cc                | 35 +++++++++++++++++++++-
> >>  5 files changed, 126 insertions(+), 6 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-1.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92497-2.c
> >> 
> >> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> >> index e55fe2776f2..a73a5d9ec1d 100644
> >> --- a/gcc/ipa-prop.cc
> >> +++ b/gcc/ipa-prop.cc
> >> @@ -5748,6 +5748,44 @@ ipcp_modif_dom_walker::before_dom_children (basic_block bb)
> >>    return NULL;
> >>  }
> >>  
> >> +/* If IPA-CP discovered a constant in parameter PARM at OFFSET of a given SIZE
> >> +   - whether passed by reference or not is given by BY_REF - return that
> >> +   constant.  Otherwise return NULL_TREE.  */
> >> +
> >> +tree
> >> +ipcp_get_aggregate_const (tree parm, bool by_ref,
> >> +			  HOST_WIDE_INT offset, HOST_WIDE_INT size)
> >
> > I'd prefer to pass in the function decl or struct function or
> > cgraph node.
> 
> OK.
> 
> >
> >> +{
> >> +  cgraph_node *cnode = cgraph_node::get (current_function_decl);
> >> +
> >> +  ipa_agg_replacement_value *aggval = ipa_get_agg_replacements_for_node (cnode);
> >> +  if (!aggval)
> >> +    return NULL_TREE;
> >> +
> >> +  int index = 0;
> >> +  for (tree p = DECL_ARGUMENTS (current_function_decl);
> >> +       p != parm; p = DECL_CHAIN (p))
> >> +    {
> >> +      index++;
> >> +      if (!p)
> >> +	return NULL_TREE;
> >> +    }
> >> +
> >> +  ipa_agg_replacement_value *v;
> >> +  for (v = aggval; v; v = v->next)
> >> +    if (v->index == index
> >> +	&& v->offset == offset)
> >> +      break;
> >> +  if (!v
> >> +      || v->by_ref != by_ref
> >> +      || maybe_ne (tree_to_poly_int64 (TYPE_SIZE (TREE_TYPE (v->value))),
> >> +		   size))
> >> +    return NULL_TREE;
> >
> > two linear searches here - ugh.  I wonder if we should instead
> > pre-fill a hash-map from PARM_DECL to a ipa_agg_replacement_value *
> > vector sorted by offset which we can binary search?  That could be
> > done once when starting value-numbering (not on regions).  Is
> > there any reason the data structure is as it is?
> 
> Only that it is usually a very short list.  It is bounded by
> param_ipa_cp_value_list_size (8 by default) times the number of
> arguments and of course only few usually have any constants in them.
> 
> Having said that, changing the structure is something I am looking into
> also for other reasons and I am very much opened to not being so lax
> about the worst case.
> 
> >  It seems that
> > even ipcp_modif_dom_walker::before_dom_children will do a linear
> > walk and ipa_get_param_decl_index_1 also linearly walks parameters.
> >
> > That looks highly sub-optimal to me, it's also done for each
> > mention of a parameter.
> >
> >> +  return v->value;
> >> +}
> >> +
> >> +
> >>  /* Return true if we have recorded VALUE and MASK about PARM.
> >>     Set VALUE and MASk accordingly.  */
> >>  
> 
> [...]
> 
> >> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> >> index 66b4fc21f5b..da558055054 100644
> >> --- a/gcc/tree-ssa-sccvn.cc
> >> +++ b/gcc/tree-ssa-sccvn.cc
> >> @@ -74,6 +74,9 @@ along with GCC; see the file COPYING3.  If not see
> >>  #include "ipa-modref-tree.h"
> >>  #include "ipa-modref.h"
> >>  #include "tree-ssa-sccvn.h"
> >> +#include "alloc-pool.h"
> >> +#include "symbol-summary.h"
> >> +#include "ipa-prop.h"
> >>  
> >>  /* This algorithm is based on the SCC algorithm presented by Keith
> >>     Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
> >> @@ -2273,7 +2276,7 @@ vn_walk_cb_data::push_partial_def (pd_data pd,
> >>     with the current VUSE and performs the expression lookup.  */
> >>  
> >>  static void *
> >> -vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
> >> +vn_reference_lookup_2 (ao_ref *op, tree vuse, void *data_)
> >>  {
> >>    vn_walk_cb_data *data = (vn_walk_cb_data *)data_;
> >>    vn_reference_t vr = data->vr;
> >> @@ -2307,6 +2310,36 @@ vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
> >>        return *slot;
> >>      }
> >>  
> >> +  if (SSA_NAME_IS_DEFAULT_DEF (vuse))
> >> +    {
> >> +      HOST_WIDE_INT offset, size;
> >> +      tree v = NULL_TREE;
> >> +      if (op->base && TREE_CODE (op->base) == PARM_DECL
> >> +	  && op->offset.is_constant (&offset)
> >> +	  && op->size.is_constant (&size))
> >
> > you probably also want to check
> >
> >         op->max_size_known_p ()
> >         && known_eq (op->size, op->max_size)
> 
> OK, thanks.
> 
> >
> > I see data->partial_defs.is_empty () is already checked in this
> > function, but we won't currently ever call vn_reference_lookup_3
> > for default defs.
> >
> > That said, ideally we'd handle also partial defs, but we can
> > start without.
> 
> I'll need to look into that a bit more but yes, sure!
> 
> >
> >> +	v = ipcp_get_aggregate_const (op->base, false, offset, size);
> >> +      else if (op->ref)
> >> +	{
> >> +	  HOST_WIDE_INT offset, size;
> >> +	  bool reverse;
> >> +	  tree base = get_ref_base_and_extent_hwi (op->ref, &offset,
> >> +						   &size, &reverse);
> >> +	  if (base && TREE_CODE (base) == PARM_DECL)
> >
> > In which case do you arrive here but op->base is not set above?
> > Maybe you need to use ao_ref_base (op->base) since op->base is
> > filled lazily from op->ref.
> 
> Right, this is something I wanted to ask in my email but forgot.  I
> believe the answer is never, I just was not sure.  I'll delete this
> condition.
> 
> >
> >> +	    v = ipcp_get_aggregate_const (base, false, offset, size);
> >> +	  else if (base
> >> +		   && TREE_CODE (base) == MEM_REF
> >> +		   && integer_zerop (TREE_OPERAND (base, 1))
> >> +		   && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
> >> +		   && SSA_NAME_IS_DEFAULT_DEF (TREE_OPERAND (base, 0))
> >> +		   && (TREE_CODE (SSA_NAME_VAR (TREE_OPERAND (base, 0)))
> >> +		       == PARM_DECL))
> >
> > This code should also be in the op->base part.
> 
> I'll remove that branch.
> 
> >
> > I'd prefer if you pass DECL_BY_REFERENCE rather than second-guessing it
> > (or assert you guessed right).
> 
> I am not sure I understand but by_ref in IPA-CP really means the
> parameter is an explicit pointer (rather than DECL_BY_REFERENCE) and the
> constants correspond to MEM_REFs through that pointer (as opposed to
> direct loads from the PARM_DECL itself).  That the by_ref corresponds to
> what IPA-CP arrived at is checked in the linear look-up, so in the case
> of some LTO-induced type mismatch, no constant will be found.  Does that
> make sense?

I see, so are we then not possibly confused by

void foo (struct X *p)
{
  struct X *q = p;
  bar (&p, q);
}

that is, when 'p' has its address taken we will not have p as SSA name
but on the caller side it's going to be a pointer.  In foo above
we're then seeing a load from 'p' but not from '*p'.  How do you
avoid taking the [0, pointer-size] access as an access to *p?  Given
that you simply "strip" the SSA name and go for the PARM_DECL?

I thought you simply excluded this case by means of DECL_BY_REFERENCE
which would mean you cannot take the address of the pointer in the
callee since the pointer is not visible in the source.

Richard.

> Thanks again for the quick feedback.  I'll address the concerns but am
> very happy that the overall approach seems feasible.
> 
> Martin
> 
> 
> >
> >> +	    v = ipcp_get_aggregate_const (SSA_NAME_VAR (TREE_OPERAND (base, 0)),
> >> +					  true, offset, size);
> >> +	}
> >> +      if (v)
> >> +	return data->finish (vr->set, vr->base_set, v);
> >> +    }
> >> +
> >>    return NULL;
> >>  }
> >>  
> >> 
> >
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
>
diff mbox series

Patch

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index e55fe2776f2..a73a5d9ec1d 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5748,6 +5748,44 @@  ipcp_modif_dom_walker::before_dom_children (basic_block bb)
   return NULL;
 }
 
+/* If IPA-CP discovered a constant in parameter PARM at OFFSET of a given SIZE
+   - whether passed by reference or not is given by BY_REF - return that
+   constant.  Otherwise return NULL_TREE.  */
+
+tree
+ipcp_get_aggregate_const (tree parm, bool by_ref,
+			  HOST_WIDE_INT offset, HOST_WIDE_INT size)
+{
+  cgraph_node *cnode = cgraph_node::get (current_function_decl);
+
+  ipa_agg_replacement_value *aggval = ipa_get_agg_replacements_for_node (cnode);
+  if (!aggval)
+    return NULL_TREE;
+
+  int index = 0;
+  for (tree p = DECL_ARGUMENTS (current_function_decl);
+       p != parm; p = DECL_CHAIN (p))
+    {
+      index++;
+      if (!p)
+	return NULL_TREE;
+    }
+
+  ipa_agg_replacement_value *v;
+  for (v = aggval; v; v = v->next)
+    if (v->index == index
+	&& v->offset == offset)
+      break;
+  if (!v
+      || v->by_ref != by_ref
+      || maybe_ne (tree_to_poly_int64 (TYPE_SIZE (TREE_TYPE (v->value))),
+		   size))
+    return NULL_TREE;
+
+  return v->value;
+}
+
+
 /* Return true if we have recorded VALUE and MASK about PARM.
    Set VALUE and MASk accordingly.  */
 
@@ -6055,11 +6093,6 @@  ipcp_transform_function (struct cgraph_node *node)
     free_ipa_bb_info (bi);
   fbi.bb_infos.release ();
 
-  ipcp_transformation *s = ipcp_transformation_sum->get (node);
-  s->agg_values = NULL;
-  s->bits = NULL;
-  s->m_vr = NULL;
-
   vec_free (descriptors);
   if (cfg_changed)
     delete_unreachable_blocks_update_callgraph (node, false);
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 553adfc9f35..aa6fcb522ac 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -1181,6 +1181,8 @@  void ipa_dump_param (FILE *, class ipa_node_params *info, int i);
 void ipa_release_body_info (struct ipa_func_body_info *);
 tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
 bool ipcp_get_parm_bits (tree, tree *, widest_int *);
+tree ipcp_get_aggregate_const (tree parm, bool by_ref,
+			       HOST_WIDE_INT offset, HOST_WIDE_INT size);
 bool unadjusted_ptr_and_unit_offset (tree op, tree *ret,
 				     poly_int64 *offset_ret);
 
diff --git a/gcc/testsuite/gcc.dg/ipa/pr92497-1.c b/gcc/testsuite/gcc.dg/ipa/pr92497-1.c
new file mode 100644
index 00000000000..eb8f2e75fd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr92497-1.c
@@ -0,0 +1,26 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-early-inlining"  } */
+
+struct a {int a;};
+static int //__attribute__ ((noinline))
+foo (struct a a)
+{
+  if (!__builtin_constant_p (a.a))
+    __builtin_abort ();
+  return a.a;
+}
+
+static int __attribute__ ((noinline))
+bar (struct a a)
+{
+  return foo(a);
+}
+
+volatile int r;
+
+int main()
+{
+  struct a a={1};
+  r = bar (a);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/ipa/pr92497-2.c b/gcc/testsuite/gcc.dg/ipa/pr92497-2.c
new file mode 100644
index 00000000000..56e611df8f9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr92497-2.c
@@ -0,0 +1,26 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-early-inlining -fno-ipa-sra"  } */
+
+struct a {int a;};
+static int //__attribute__ ((noinline))
+foo (struct a *a)
+{
+  if (!__builtin_constant_p (a->a))
+    __builtin_abort ();
+  return a->a;
+}
+
+static int __attribute__ ((noinline))
+bar (struct a *a)
+{
+  return foo(a);
+}
+
+volatile int r;
+
+int main()
+{
+  struct a a={1};
+  r = bar (&a);
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 66b4fc21f5b..da558055054 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -74,6 +74,9 @@  along with GCC; see the file COPYING3.  If not see
 #include "ipa-modref-tree.h"
 #include "ipa-modref.h"
 #include "tree-ssa-sccvn.h"
+#include "alloc-pool.h"
+#include "symbol-summary.h"
+#include "ipa-prop.h"
 
 /* This algorithm is based on the SCC algorithm presented by Keith
    Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
@@ -2273,7 +2276,7 @@  vn_walk_cb_data::push_partial_def (pd_data pd,
    with the current VUSE and performs the expression lookup.  */
 
 static void *
-vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
+vn_reference_lookup_2 (ao_ref *op, tree vuse, void *data_)
 {
   vn_walk_cb_data *data = (vn_walk_cb_data *)data_;
   vn_reference_t vr = data->vr;
@@ -2307,6 +2310,36 @@  vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void *data_)
       return *slot;
     }
 
+  if (SSA_NAME_IS_DEFAULT_DEF (vuse))
+    {
+      HOST_WIDE_INT offset, size;
+      tree v = NULL_TREE;
+      if (op->base && TREE_CODE (op->base) == PARM_DECL
+	  && op->offset.is_constant (&offset)
+	  && op->size.is_constant (&size))
+	v = ipcp_get_aggregate_const (op->base, false, offset, size);
+      else if (op->ref)
+	{
+	  HOST_WIDE_INT offset, size;
+	  bool reverse;
+	  tree base = get_ref_base_and_extent_hwi (op->ref, &offset,
+						   &size, &reverse);
+	  if (base && TREE_CODE (base) == PARM_DECL)
+	    v = ipcp_get_aggregate_const (base, false, offset, size);
+	  else if (base
+		   && TREE_CODE (base) == MEM_REF
+		   && integer_zerop (TREE_OPERAND (base, 1))
+		   && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
+		   && SSA_NAME_IS_DEFAULT_DEF (TREE_OPERAND (base, 0))
+		   && (TREE_CODE (SSA_NAME_VAR (TREE_OPERAND (base, 0)))
+		       == PARM_DECL))
+	    v = ipcp_get_aggregate_const (SSA_NAME_VAR (TREE_OPERAND (base, 0)),
+					  true, offset, size);
+	}
+      if (v)
+	return data->finish (vr->set, vr->base_set, v);
+    }
+
   return NULL;
 }