diff mbox

[RFC,4/5] Handle constant-pool entries

Message ID 1440500777-25966-5-git-send-email-alan.lawrence@arm.com
State New
Headers show

Commit Message

Alan Lawrence Aug. 25, 2015, 11:06 a.m. UTC
This makes SRA replace loads of records/arrays from constant pool entries,
with elementwise assignments of the constant values, hence, overcoming the
fundamental problem in PR/63679.

As a first pass, the approach I took was to look for constant-pool loads as
we scanned through other accesses, and add them as candidates there; to build a
constant replacement_decl for any such accesses in completely_scalarize; and to
use any existing replacement_decl rather than creating a variable in
create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but
that does not allow addresses of labels, which can still end up in the constant
pool.)

Feedback as to the approach or how it might be better structured / fitted into
SRA, is solicited ;).

Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly increases the number of testcases in which this code is exercised!

Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, and sh.

gcc/ChangeLog:

	* tree-sra.c (create_access): Scan for uses of constant pool and add
	to candidates.
	(subst_initial): New.
	(scalarize_elem): Build replacement_decl using subst_initial.
	(create_access_replacement): Use replacement_decl if set.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
	sra-max-scalarization-size-Ospeed.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
 gcc/tree-sra.c                                | 56 +++++++++++++++++++++++++--
 2 files changed, 55 insertions(+), 8 deletions(-)

Comments

Jeff Law Aug. 25, 2015, 8:13 p.m. UTC | #1
On 08/25/2015 05:06 AM, Alan Lawrence wrote:
> This makes SRA replace loads of records/arrays from constant pool entries,
> with elementwise assignments of the constant values, hence, overcoming the
> fundamental problem in PR/63679.
>
> As a first pass, the approach I took was to look for constant-pool loads as
> we scanned through other accesses, and add them as candidates there; to build a
> constant replacement_decl for any such accesses in completely_scalarize; and to
> use any existing replacement_decl rather than creating a variable in
> create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but
> that does not allow addresses of labels, which can still end up in the constant
> pool.)
>
> Feedback as to the approach or how it might be better structured / fitted into
> SRA, is solicited ;).
>
> Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
> arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly increases the number of testcases in which this code is exercised!
>
> Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, and sh.
>
> gcc/ChangeLog:
>
> 	* tree-sra.c (create_access): Scan for uses of constant pool and add
> 	to candidates.
> 	(subst_initial): New.
> 	(scalarize_elem): Build replacement_decl using subst_initial.
> 	(create_access_replacement): Use replacement_decl if set.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
> 	sra-max-scalarization-size-Ospeed.
> ---
>   gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
>   gcc/tree-sra.c                                | 56 +++++++++++++++++++++++++--
>   2 files changed, 55 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index af35fcc..a3ff2df 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
>     else
>       ptr = false;
>
> +  /* FORNOW: scan for uses of constant pool as we go along.  */
I'm not sure why you have this marked as FORNOW.  If I'm reading all 
this code correctly, you're lazily adding items from the constant pool 
into the candidates table when you find they're used.  That seems better 
than walking the entire constant pool adding them all to the candidates.

I don't see this as fundamentally wrong or unclean.

The question I have is why this differs from the effects of patch #5. 
That would seem to indicate that there's things we're not getting into 
the candidate tables with this approach?!?



> @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>       }
>   }
>
> +static tree
> +subst_initial (tree expr, tree var)
Function comment.

I think this patch is fine with the function comment added and removing 
the FORNOW part of the comment in create_access.  It may be worth noting 
in create_access's comment that it can add new items to the candidates 
tables for constant pool entries.


Jeff
Richard Biener Aug. 26, 2015, 7:20 a.m. UTC | #2
On Tue, Aug 25, 2015 at 10:13 PM, Jeff Law <law@redhat.com> wrote:
> On 08/25/2015 05:06 AM, Alan Lawrence wrote:
>>
>> This makes SRA replace loads of records/arrays from constant pool entries,
>> with elementwise assignments of the constant values, hence, overcoming the
>> fundamental problem in PR/63679.
>>
>> As a first pass, the approach I took was to look for constant-pool loads
>> as
>> we scanned through other accesses, and add them as candidates there; to
>> build a
>> constant replacement_decl for any such accesses in completely_scalarize;
>> and to
>> use any existing replacement_decl rather than creating a variable in
>> create_access_replacement. (I did try using CONSTANT_CLASS_P in the
>> latter, but
>> that does not allow addresses of labels, which can still end up in the
>> constant
>> pool.)
>>
>> Feedback as to the approach or how it might be better structured / fitted
>> into
>> SRA, is solicited ;).
>>
>> Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
>> arm-none-linux-gnueabihf, including with the next patch (rfc), which
>> greatly increases the number of testcases in which this code is exercised!
>>
>> Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes
>> (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc,
>> sparc, avr, and sh.
>>
>> gcc/ChangeLog:
>>
>>         * tree-sra.c (create_access): Scan for uses of constant pool and
>> add
>>         to candidates.
>>         (subst_initial): New.
>>         (scalarize_elem): Build replacement_decl using subst_initial.
>>         (create_access_replacement): Use replacement_decl if set.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
>>         sra-max-scalarization-size-Ospeed.
>> ---
>>   gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
>>   gcc/tree-sra.c                                | 56
>> +++++++++++++++++++++++++--
>>   2 files changed, 55 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index af35fcc..a3ff2df 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
>>     else
>>       ptr = false;
>>
>> +  /* FORNOW: scan for uses of constant pool as we go along.  */
>
> I'm not sure why you have this marked as FORNOW.  If I'm reading all this
> code correctly, you're lazily adding items from the constant pool into the
> candidates table when you find they're used.  That seems better than walking
> the entire constant pool adding them all to the candidates.
>
> I don't see this as fundamentally wrong or unclean.
>
> The question I have is why this differs from the effects of patch #5. That
> would seem to indicate that there's things we're not getting into the
> candidate tables with this approach?!?
>
>
>
>> @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type,
>> HOST_WIDE_INT offset, tree ref)
>>       }
>>   }
>>
>> +static tree
>> +subst_initial (tree expr, tree var)
>
> Function comment.
>
> I think this patch is fine with the function comment added and removing the
> FORNOW part of the comment in create_access.  It may be worth noting in
> create_access's comment that it can add new items to the candidates tables
> for constant pool entries.

I'm happy seeing this code in SRA as I never liked that we already decide
at gimplification time which initializers to expand and which to init from
a constant pool entry.  So ... can we now "remove" gimplify_init_constructor
by _always_ emitting a constant pool entry and an assignment from it
(obviously only if the constructor can be put into the constant pool)?  Defering
the expansion decision to SRA makes it possible to better estimate whether
the code is hot/cold or whether the initialized variable can be replaced by
the constant pool entry completely (variable ends up readonly).

Oh, and we'd no longer create the awful split code at -O0 ...

So can you explore that a bit once this series is settled?  This is probably
also related to 5/5 as this makes all the target dependent decisions in SRA
now and thus the initial IL from gimplification should be the same for all
targets (that's always a nice thing to have IMHO).

Thanks,
Richard.

>
> Jeff
Martin Jambor Aug. 26, 2015, 2:04 p.m. UTC | #3
Hi,

On Tue, Aug 25, 2015 at 12:06:16PM +0100, Alan Lawrence wrote:
> This makes SRA replace loads of records/arrays from constant pool entries,
> with elementwise assignments of the constant values, hence, overcoming the
> fundamental problem in PR/63679.
> 
> As a first pass, the approach I took was to look for constant-pool loads as
> we scanned through other accesses, and add them as candidates there; to build a
> constant replacement_decl for any such accesses in completely_scalarize; and to
> use any existing replacement_decl rather than creating a variable in
> create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but
> that does not allow addresses of labels, which can still end up in the constant
> pool.)
> 
> Feedback as to the approach or how it might be better structured / fitted into
> SRA, is solicited ;).

I'm not familiar with constant pools very much, but I'll try:

> 
> Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
> arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly increases the number of testcases in which this code is exercised!
> 
> Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, and sh.
> 
> gcc/ChangeLog:
> 
> 	* tree-sra.c (create_access): Scan for uses of constant pool and add
> 	to candidates.
> 	(subst_initial): New.
> 	(scalarize_elem): Build replacement_decl using subst_initial.
> 	(create_access_replacement): Use replacement_decl if set.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
> 	sra-max-scalarization-size-Ospeed.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
>  gcc/tree-sra.c                                | 56 +++++++++++++++++++++++++--
>  2 files changed, 55 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> index 9eccdc9..b13d583 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
> +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
>  
>  int
>  foo ()
> @@ -17,7 +17,4 @@ foo ()
>  /* After late unrolling the above loop completely DOM should be
>     able to optimize this to return 28.  */
>  
> -/* See PR63679 and PR64159, if the target forces the initializer to memory then
> -   DOM is not able to perform this optimization.  */
> -
> -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
> +/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index af35fcc..a3ff2df 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
>    else
>      ptr = false;
>  
> +  /* FORNOW: scan for uses of constant pool as we go along.  */
> +  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
> +      && !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
> +    {
> +      gcc_assert (!write);
> +      bitmap_set_bit (candidate_bitmap, DECL_UID (base));
> +      tree_node **slot = candidates->find_slot_with_hash (base, DECL_UID (base),
> +							  INSERT);
> +      *slot = base;
> +    }
> +

I believe you only want to do this if (sra_mode ==
SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA).

The idea of candidates is that we gather them in find_var_candidates
and ten we only eliminate them, this has the benefit of not worrying
about disqualifying a candidate and then erroneously re-adding it
later.  So if you could find a way to structure your code this way, I'd
much happier.  If it is impossible without traversing the whole
function just for that purpose, we may need some mechanism to prevent
us from making a disqualified decl a candidate again.  Or, if we come
to the conclusion that constant pool decls do not ever get
disqualified, a gcc_assert making sure it actually does not happen in
disqualify_candidate.

And of course at find_var_candidates time we check that all candidates
pass simple checks in maybe_add_sra_candidate.  I suppose many of them
do not make sense for constant pool decls but at least please have a
look whether that is the case for all of them or not.

>    if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>      return NULL;
>  
> @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>      }
>  }
>  
> +static tree
> +subst_initial (tree expr, tree var)

This needs a comment and a better name.  A name that would make it
clear this is for constant pool stuff only.

> +{
> +  if (TREE_CODE (expr) == VAR_DECL)
> +    {
> +      gcc_assert (DECL_IN_CONSTANT_POOL (expr));
> +      gcc_assert (expr == var);
> +      return DECL_INITIAL (expr);
> +    }
> +  if (TREE_CODE (expr) == COMPONENT_REF)
> +    {
> +      gcc_assert (TREE_CODE (TREE_OPERAND (expr, 1)) == FIELD_DECL);
> +      gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
> +      return fold_build3 (COMPONENT_REF, TREE_TYPE (expr),
> +			  subst_initial (TREE_OPERAND (expr, 0), var),
> +			  TREE_OPERAND (expr, 1),
> +			  NULL_TREE);
> +    }
> +  if (TREE_CODE (expr) == ARRAY_REF)
> +    {
> +      gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
> +      gcc_assert (TREE_OPERAND (expr, 3) == NULL_TREE);
> +      return fold (build4 (ARRAY_REF, TREE_TYPE (expr),
> +			   subst_initial (TREE_OPERAND (expr, 0), var),
> +			   TREE_OPERAND (expr, 1),
> +			   NULL_TREE,
> +			   NULL_TREE));
> +    }
> +  gcc_unreachable ();
> +}
> +
>  static void
>  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
>  		 tree ref, tree type)
> @@ -1033,6 +1075,9 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
>    {
>      struct access *access = create_access_1 (base, pos, size);
>      access->expr = ref;
> +    if (TREE_CODE (base) == VAR_DECL
> +	&& DECL_IN_CONSTANT_POOL (base))
> +      access->replacement_decl = subst_initial (ref, base);

So replacement_decl ceases to be a decl and should be renamed to
replacement_expr.  (Such cleanups can be done as a followup, of
course).

Thanks,

Martin
Alan Lawrence Aug. 26, 2015, 3:37 p.m. UTC | #4
Jeff Law wrote:
>
> The question I have is why this differs from the effects of patch #5. 
> That would seem to indicate that there's things we're not getting into 
> the candidate tables with this approach?!?

I'll answer this first, as I think (Richard and) Martin have identified enough 
other issues with this patch that will take longer to address.... but if you 
look at the context to the hunk in patch 5, it is iterating through the 
candidates (from patch 4), and then filtering out any candidates bigger than 
max-scalarization-size, which filtering patch 5 removes.

--Alan
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index 9eccdc9..b13d583 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
 
 int
 foo ()
@@ -17,7 +17,4 @@  foo ()
 /* After late unrolling the above loop completely DOM should be
    able to optimize this to return 28.  */
 
-/* See PR63679 and PR64159, if the target forces the initializer to memory then
-   DOM is not able to perform this optimization.  */
-
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index af35fcc..a3ff2df 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -865,6 +865,17 @@  create_access (tree expr, gimple stmt, bool write)
   else
     ptr = false;
 
+  /* FORNOW: scan for uses of constant pool as we go along.  */
+  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
+      && !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
+    {
+      gcc_assert (!write);
+      bitmap_set_bit (candidate_bitmap, DECL_UID (base));
+      tree_node **slot = candidates->find_slot_with_hash (base, DECL_UID (base),
+							  INSERT);
+      *slot = base;
+    }
+
   if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
     return NULL;
 
@@ -1025,6 +1036,37 @@  completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
     }
 }
 
+static tree
+subst_initial (tree expr, tree var)
+{
+  if (TREE_CODE (expr) == VAR_DECL)
+    {
+      gcc_assert (DECL_IN_CONSTANT_POOL (expr));
+      gcc_assert (expr == var);
+      return DECL_INITIAL (expr);
+    }
+  if (TREE_CODE (expr) == COMPONENT_REF)
+    {
+      gcc_assert (TREE_CODE (TREE_OPERAND (expr, 1)) == FIELD_DECL);
+      gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
+      return fold_build3 (COMPONENT_REF, TREE_TYPE (expr),
+			  subst_initial (TREE_OPERAND (expr, 0), var),
+			  TREE_OPERAND (expr, 1),
+			  NULL_TREE);
+    }
+  if (TREE_CODE (expr) == ARRAY_REF)
+    {
+      gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
+      gcc_assert (TREE_OPERAND (expr, 3) == NULL_TREE);
+      return fold (build4 (ARRAY_REF, TREE_TYPE (expr),
+			   subst_initial (TREE_OPERAND (expr, 0), var),
+			   TREE_OPERAND (expr, 1),
+			   NULL_TREE,
+			   NULL_TREE));
+    }
+  gcc_unreachable ();
+}
+
 static void
 scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
 		 tree ref, tree type)
@@ -1033,6 +1075,9 @@  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
   {
     struct access *access = create_access_1 (base, pos, size);
     access->expr = ref;
+    if (TREE_CODE (base) == VAR_DECL
+	&& DECL_IN_CONSTANT_POOL (base))
+      access->replacement_decl = subst_initial (ref, base);
     access->type = type;
     access->grp_total_scalarization = 1;
     /* Accesses for intraprocedural SRA can have their stmt NULL.  */
@@ -2038,14 +2083,19 @@  sort_and_splice_var_accesses (tree var)
 }
 
 /* Create a variable for the given ACCESS which determines the type, name and a
-   few other properties.  Return the variable declaration and store it also to
-   ACCESS->replacement.  */
+   few other properties.  Return the variable declaration.  */
 
 static tree
 create_access_replacement (struct access *access)
 {
-  tree repl;
+  if (access->replacement_decl)
+    {
+      gcc_assert (!access->write);
+      gcc_assert (is_gimple_reg_type (TREE_TYPE (access->expr)));
+      return access->replacement_decl;
+    }
 
+  tree repl;
   if (access->grp_to_be_debug_replaced)
     {
       repl = create_tmp_var_raw (access->type);