diff mbox

[PR,80293] Don't totally-sRA char arrays

Message ID 20170413174129.h6mavi5jweqpzw3k@virgil.suse.cz
State New
Headers show

Commit Message

Martin Jambor April 13, 2017, 5:41 p.m. UTC
Hi,

On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
> On Wed, 12 Apr 2017, Martin Jambor wrote:
> 
> > Hi,
> > 
> > the patch below is an attempt to deal with PR 80293 as non-invasively
> > as possible.  Basically, it switches off total SRA scalarization of
> > any local aggregates which contains an array of elements that have one
> > byte (or less).
> > 
> > The logic behind this is that accessing such arrays element-wise
> > usually results in poor code and that such char arrays are often used
> > for non-statically-typed content anyway, and we do not want to copy
> > that byte per byte.
> > 
> > Alan, do you think this could impact your constant pool scalarization
> > too severely?
> 
> Hmm, isn't one of the issues that we have
> 
>         if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
>           {
>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>                 <= max_scalarization_size)
>               {
>                 create_total_scalarization_access (var);
> 
> which limits the size of scalarizable vars but not the number
> of accesses we create for total scalarization?

Well, yes, but at least I understood from your comment #4 in the bug
that you did not want to limit the number of replacements for gcc 7?

> 
> Is scalarizable_type_p only used in contexts where we have no hint
> of the actual accesses?

There are should_scalarize_away_bitmap and
cannot_scalarize_away_bitmap hints.

Total scalarization is a hack process where we chop up small
aggregates according to their types - as opposed to actual accesses,
meaning it is an alien process to the rest of SRA - knowing that they
will go completely away afterwards (that is ensured by
cannot_scalarize_away_bitmap) and that this will facilitate copy
propagation (this is indicated by should_scalarize_away_bitmap, which
has a bit set if there is a non-scalar assignment _from_ (a part of)
the aggregate).

> That is, for the constant pool case we
> usually have
> 
>   x = .LC0;
>   .. = x[2];
> 
> so we have a "hint" that accesses on x are those we'd want to
> optimize to accesses to .LC0.

You don't need total scalarization for this, just the existence of
read from x[2] would be sufficient but thanks for pointing me to the
right direction.  For constant pool decl scalarization, it is not
important to introduce artificial accesses for x but for .LC0.
Therefore, I have adapted the patch to allow char arrays for const
decls only and verified that it scalarizes a const-pool array of chars
on Aarch64.  The (otherwise yet untested) result is below.

What do you think?

Martin


2017-04-13  Martin Jambor  <mjambor@suse.cz>

	* tree-sra.c (scalarizable_type_p): New parameter const_decl, make
	char arrays not totally scalarizable if it is false.
	(analyze_all_variable_accesses): Pass correct value in the new
	parameter.

testsuite/
	* g++.dg/tree-ssa/pr80293.C: New test.
---
 gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 +++++++++++++++++++++++++++++++++
 gcc/tree-sra.c                          | 21 ++++++++++-----
 2 files changed, 60 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C

Comments

Richard Biener April 13, 2017, 6:48 p.m. UTC | #1
On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote:
>Hi,
>
>On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
>> On Wed, 12 Apr 2017, Martin Jambor wrote:
>> 
>> > Hi,
>> > 
>> > the patch below is an attempt to deal with PR 80293 as
>non-invasively
>> > as possible.  Basically, it switches off total SRA scalarization of
>> > any local aggregates which contains an array of elements that have
>one
>> > byte (or less).
>> > 
>> > The logic behind this is that accessing such arrays element-wise
>> > usually results in poor code and that such char arrays are often
>used
>> > for non-statically-typed content anyway, and we do not want to copy
>> > that byte per byte.
>> > 
>> > Alan, do you think this could impact your constant pool
>scalarization
>> > too severely?
>> 
>> Hmm, isn't one of the issues that we have
>> 
>>         if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
>>           {
>>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>>                 <= max_scalarization_size)
>>               {
>>                 create_total_scalarization_access (var);
>> 
>> which limits the size of scalarizable vars but not the number
>> of accesses we create for total scalarization?
>
>Well, yes, but at least I understood from your comment #4 in the bug
>that you did not want to limit the number of replacements for gcc 7?
>
>> 
>> Is scalarizable_type_p only used in contexts where we have no hint
>> of the actual accesses?
>
>There are should_scalarize_away_bitmap and
>cannot_scalarize_away_bitmap hints.
>
>Total scalarization is a hack process where we chop up small
>aggregates according to their types - as opposed to actual accesses,
>meaning it is an alien process to the rest of SRA - knowing that they
>will go completely away afterwards (that is ensured by
>cannot_scalarize_away_bitmap) and that this will facilitate copy
>propagation (this is indicated by should_scalarize_away_bitmap, which
>has a bit set if there is a non-scalar assignment _from_ (a part of)
>the aggregate).

OK, but for the copy x = y where x would go away it still depends on uses of x what pieces we actually want?  Or is total scalarization only done for x  = y; z = x;?
Thus no further accesses to x?

>> That is, for the constant pool case we
>> usually have
>> 
>>   x = .LC0;
>>   .. = x[2];
>> 
>> so we have a "hint" that accesses on x are those we'd want to
>> optimize to accesses to .LC0.
>
>You don't need total scalarization for this, just the existence of
>read from x[2] would be sufficient but thanks for pointing me to the
>right direction.  For constant pool decl scalarization, it is not
>important to introduce artificial accesses for x but for .LC0.
>Therefore, I have adapted the patch to allow char arrays for const
>decls only and verified that it scalarizes a const-pool array of chars
>on Aarch64.  The (otherwise yet untested) result is below.
>
>What do you think?

Why special case char arrays?  I'd simply disallow total scalarization of non-const arrays completely.

>Martin
>
>
>2017-04-13  Martin Jambor  <mjambor@suse.cz>
>
>	* tree-sra.c (scalarizable_type_p): New parameter const_decl, make
>	char arrays not totally scalarizable if it is false.
>	(analyze_all_variable_accesses): Pass correct value in the new
>	parameter.
>
>testsuite/
>	* g++.dg/tree-ssa/pr80293.C: New test.
>---
>gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45
>+++++++++++++++++++++++++++++++++
> gcc/tree-sra.c                          | 21 ++++++++++-----
> 2 files changed, 60 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>
>diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>new file mode 100644
>index 00000000000..7faf35ae983
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>@@ -0,0 +1,45 @@
>+// { dg-do compile }
>+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
>+
>+#include <array>
>+
>+// Return a copy of the underlying memory of an arbitrary value.
>+template <
>+    typename T,
>+    typename = typename
>std::enable_if<std::is_trivially_copyable<T>::value>::type
>+>
>+auto getMem(
>+    T const & value
>+) -> std::array<char, sizeof(T)> {
>+    auto ret = std::array<char, sizeof(T)>{};
>+    __builtin_memcpy(ret.data(), &value, sizeof(T));
>+    return ret;
>+}
>+
>+template <
>+    typename T,
>+    typename = typename
>std::enable_if<std::is_trivially_copyable<T>::value>::type
>+>
>+auto fromMem(
>+    std::array<char, sizeof(T)> const & buf
>+) -> T {
>+    auto ret = T{};
>+    __builtin_memcpy(&ret, buf.data(), sizeof(T));
>+    return ret;
>+}
>+
>+double foo1(std::uint64_t arg) {
>+    return fromMem<double>(getMem(arg));
>+}
>+
>+double foo2(std::uint64_t arg) {
>+    return *reinterpret_cast<double*>(&arg);
>+}
>+
>+double foo3(std::uint64_t arg) {
>+    double ret;
>+    __builtin_memcpy(&ret, &arg, sizeof(arg));
>+    return ret;
>+}
>+
>+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
>diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>index 02453d3ed9a..ab06be66131 100644
>--- a/gcc/tree-sra.c
>+++ b/gcc/tree-sra.c
>@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool
>write)
> 
>/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or
>fixed-length
>ARRAY_TYPE with fields that are either of gimple register types
>(excluding
>-   bit-fields) or (recursively) scalarizable types.  */
>+   bit-fields) or (recursively) scalarizable types.  CONST_DECL must
>be true if
>+   we are considering a decl from constant pool.  If it is false, char
>arrays
>+   will be refused.  */
> 
> static bool
>-scalarizable_type_p (tree type)
>+scalarizable_type_p (tree type, bool const_decl)
> {
>   gcc_assert (!is_gimple_reg_type (type));
>   if (type_contains_placeholder_p (type))
>@@ -970,7 +972,7 @@ scalarizable_type_p (tree type)
> 	    return false;
> 
> 	  if (!is_gimple_reg_type (ft)
>-	      && !scalarizable_type_p (ft))
>+	      && !scalarizable_type_p (ft, const_decl))
> 	    return false;
> 	}
> 
>@@ -978,10 +980,16 @@ scalarizable_type_p (tree type)
> 
>   case ARRAY_TYPE:
>     {
>+      HOST_WIDE_INT min_elem_size;
>+      if (const_decl)
>+	min_elem_size = 0;
>+      else
>+	min_elem_size = BITS_PER_UNIT;
>+
>       if (TYPE_DOMAIN (type) == NULL_TREE
> 	  || !tree_fits_shwi_p (TYPE_SIZE (type))
> 	  || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type)))
>-	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0)
>+	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size)
> 	  || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
> 	return false;
>       if (tree_to_shwi (TYPE_SIZE (type)) == 0
>@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type)
> 
>       tree elem = TREE_TYPE (type);
>       if (!is_gimple_reg_type (elem)
>-	 && !scalarizable_type_p (elem))
>+	  && !scalarizable_type_p (elem, const_decl))
> 	return false;
>       return true;
>     }
>@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void)
>       {
> 	tree var = candidate (i);
> 
>-	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
>+	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
>+						constant_decl_p (var)))
> 	  {
> 	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> 		<= max_scalarization_size)
Martin Jambor April 17, 2017, 7:35 p.m. UTC | #2
Hi,

On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote:
> On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote:
> >Hi,
> >
> >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
> >> On Wed, 12 Apr 2017, Martin Jambor wrote:
> >> 
> >> > Hi,
> >> > 
> >> > the patch below is an attempt to deal with PR 80293 as
> >non-invasively
> >> > as possible.  Basically, it switches off total SRA scalarization of
> >> > any local aggregates which contains an array of elements that have
> >one
> >> > byte (or less).
> >> > 
> >> > The logic behind this is that accessing such arrays element-wise
> >> > usually results in poor code and that such char arrays are often
> >used
> >> > for non-statically-typed content anyway, and we do not want to copy
> >> > that byte per byte.
> >> > 
> >> > Alan, do you think this could impact your constant pool
> >scalarization
> >> > too severely?
> >> 
> >> Hmm, isn't one of the issues that we have
> >> 
> >>         if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
> >>           {
> >>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> >>                 <= max_scalarization_size)
> >>               {
> >>                 create_total_scalarization_access (var);
> >> 
> >> which limits the size of scalarizable vars but not the number
> >> of accesses we create for total scalarization?
> >
> >Well, yes, but at least I understood from your comment #4 in the bug
> >that you did not want to limit the number of replacements for gcc 7?
> >
> >> 
> >> Is scalarizable_type_p only used in contexts where we have no hint
> >> of the actual accesses?
> >
> >There are should_scalarize_away_bitmap and
> >cannot_scalarize_away_bitmap hints.
> >
> >Total scalarization is a hack process where we chop up small
> >aggregates according to their types - as opposed to actual accesses,
> >meaning it is an alien process to the rest of SRA - knowing that they
> >will go completely away afterwards (that is ensured by
> >cannot_scalarize_away_bitmap) and that this will facilitate copy
> >propagation (this is indicated by should_scalarize_away_bitmap, which
> >has a bit set if there is a non-scalar assignment _from_ (a part of)
> >the aggregate).
> 
> OK, but for the copy x = y where x would go away it still depends on uses of x what pieces we actually want?  Or is total scalarization only done for x  = y; z = x;?
> Thus no further accesses to x?

Total scalarization adds artificial accesses only to y, but
(in both cases of total and "natural" scalarization) for all aggregate
assignments between SRA candidates, SRA attempts to recreate all
accesses covering RHS to LHS.  Transitively.  So the artificial
accesses created for y will then get created for x and z even if they
would not be candidates for total scalarization.  So e.g. if z cannot
go away because it is passed to a function, it will be loaded
piece-wise from y.

> 
> >> That is, for the constant pool case we
> >> usually have
> >> 
> >>   x = .LC0;
> >>   .. = x[2];
> >> 
> >> so we have a "hint" that accesses on x are those we'd want to
> >> optimize to accesses to .LC0.
> >
> >You don't need total scalarization for this, just the existence of
> >read from x[2] would be sufficient but thanks for pointing me to the
> >right direction.  For constant pool decl scalarization, it is not
> >important to introduce artificial accesses for x but for .LC0.
> >Therefore, I have adapted the patch to allow char arrays for const
> >decls only and verified that it scalarizes a const-pool array of chars
> >on Aarch64.  The (otherwise yet untested) result is below.
> >
> >What do you think?
> 
> Why special case char arrays?  I'd simply disallow total scalarization of non-const arrays completely.

Well, currently we will get element-wise copy propagation for things
like "int rgb[3];" (possibly embeded in a small struct).  If I remove
it, someone will complain.  Maybe the correct limit is SI mode size or
BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though.
I just wanted to be conservative at this point in the release cycle.

Martin

> 
> >Martin
> >
> >
> >2017-04-13  Martin Jambor  <mjambor@suse.cz>
> >
> >	* tree-sra.c (scalarizable_type_p): New parameter const_decl, make
> >	char arrays not totally scalarizable if it is false.
> >	(analyze_all_variable_accesses): Pass correct value in the new
> >	parameter.
> >
> >testsuite/
> >	* g++.dg/tree-ssa/pr80293.C: New test.
> >---
> >gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45
> >+++++++++++++++++++++++++++++++++
> > gcc/tree-sra.c                          | 21 ++++++++++-----
> > 2 files changed, 60 insertions(+), 6 deletions(-)
> > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> >
> >diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> >b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> >new file mode 100644
> >index 00000000000..7faf35ae983
> >--- /dev/null
> >+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> >@@ -0,0 +1,45 @@
> >+// { dg-do compile }
> >+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
> >+
> >+#include <array>
> >+
> >+// Return a copy of the underlying memory of an arbitrary value.
> >+template <
> >+    typename T,
> >+    typename = typename
> >std::enable_if<std::is_trivially_copyable<T>::value>::type
> >+>
> >+auto getMem(
> >+    T const & value
> >+) -> std::array<char, sizeof(T)> {
> >+    auto ret = std::array<char, sizeof(T)>{};
> >+    __builtin_memcpy(ret.data(), &value, sizeof(T));
> >+    return ret;
> >+}
> >+
> >+template <
> >+    typename T,
> >+    typename = typename
> >std::enable_if<std::is_trivially_copyable<T>::value>::type
> >+>
> >+auto fromMem(
> >+    std::array<char, sizeof(T)> const & buf
> >+) -> T {
> >+    auto ret = T{};
> >+    __builtin_memcpy(&ret, buf.data(), sizeof(T));
> >+    return ret;
> >+}
> >+
> >+double foo1(std::uint64_t arg) {
> >+    return fromMem<double>(getMem(arg));
> >+}
> >+
> >+double foo2(std::uint64_t arg) {
> >+    return *reinterpret_cast<double*>(&arg);
> >+}
> >+
> >+double foo3(std::uint64_t arg) {
> >+    double ret;
> >+    __builtin_memcpy(&ret, &arg, sizeof(arg));
> >+    return ret;
> >+}
> >+
> >+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
> >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> >index 02453d3ed9a..ab06be66131 100644
> >--- a/gcc/tree-sra.c
> >+++ b/gcc/tree-sra.c
> >@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool
> >write)
> > 
> >/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or
> >fixed-length
> >ARRAY_TYPE with fields that are either of gimple register types
> >(excluding
> >-   bit-fields) or (recursively) scalarizable types.  */
> >+   bit-fields) or (recursively) scalarizable types.  CONST_DECL must
> >be true if
> >+   we are considering a decl from constant pool.  If it is false, char
> >arrays
> >+   will be refused.  */
> > 
> > static bool
> >-scalarizable_type_p (tree type)
> >+scalarizable_type_p (tree type, bool const_decl)
> > {
> >   gcc_assert (!is_gimple_reg_type (type));
> >   if (type_contains_placeholder_p (type))
> >@@ -970,7 +972,7 @@ scalarizable_type_p (tree type)
> > 	    return false;
> > 
> > 	  if (!is_gimple_reg_type (ft)
> >-	      && !scalarizable_type_p (ft))
> >+	      && !scalarizable_type_p (ft, const_decl))
> > 	    return false;
> > 	}
> > 
> >@@ -978,10 +980,16 @@ scalarizable_type_p (tree type)
> > 
> >   case ARRAY_TYPE:
> >     {
> >+      HOST_WIDE_INT min_elem_size;
> >+      if (const_decl)
> >+	min_elem_size = 0;
> >+      else
> >+	min_elem_size = BITS_PER_UNIT;
> >+
> >       if (TYPE_DOMAIN (type) == NULL_TREE
> > 	  || !tree_fits_shwi_p (TYPE_SIZE (type))
> > 	  || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type)))
> >-	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0)
> >+	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size)
> > 	  || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
> > 	return false;
> >       if (tree_to_shwi (TYPE_SIZE (type)) == 0
> >@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type)
> > 
> >       tree elem = TREE_TYPE (type);
> >       if (!is_gimple_reg_type (elem)
> >-	 && !scalarizable_type_p (elem))
> >+	  && !scalarizable_type_p (elem, const_decl))
> > 	return false;
> >       return true;
> >     }
> >@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void)
> >       {
> > 	tree var = candidate (i);
> > 
> >-	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
> >+	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> >+						constant_decl_p (var)))
> > 	  {
> > 	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> > 		<= max_scalarization_size)
>
Richard Biener April 20, 2017, 11:07 a.m. UTC | #3
On Mon, 17 Apr 2017, Martin Jambor wrote:

> Hi,
> 
> On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote:
> > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote:
> > >Hi,
> > >
> > >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
> > >> On Wed, 12 Apr 2017, Martin Jambor wrote:
> > >> 
> > >> > Hi,
> > >> > 
> > >> > the patch below is an attempt to deal with PR 80293 as
> > >non-invasively
> > >> > as possible.  Basically, it switches off total SRA scalarization of
> > >> > any local aggregates which contains an array of elements that have
> > >one
> > >> > byte (or less).
> > >> > 
> > >> > The logic behind this is that accessing such arrays element-wise
> > >> > usually results in poor code and that such char arrays are often
> > >used
> > >> > for non-statically-typed content anyway, and we do not want to copy
> > >> > that byte per byte.
> > >> > 
> > >> > Alan, do you think this could impact your constant pool
> > >scalarization
> > >> > too severely?
> > >> 
> > >> Hmm, isn't one of the issues that we have
> > >> 
> > >>         if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
> > >>           {
> > >>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> > >>                 <= max_scalarization_size)
> > >>               {
> > >>                 create_total_scalarization_access (var);
> > >> 
> > >> which limits the size of scalarizable vars but not the number
> > >> of accesses we create for total scalarization?
> > >
> > >Well, yes, but at least I understood from your comment #4 in the bug
> > >that you did not want to limit the number of replacements for gcc 7?
> > >
> > >> 
> > >> Is scalarizable_type_p only used in contexts where we have no hint
> > >> of the actual accesses?
> > >
> > >There are should_scalarize_away_bitmap and
> > >cannot_scalarize_away_bitmap hints.
> > >
> > >Total scalarization is a hack process where we chop up small
> > >aggregates according to their types - as opposed to actual accesses,
> > >meaning it is an alien process to the rest of SRA - knowing that they
> > >will go completely away afterwards (that is ensured by
> > >cannot_scalarize_away_bitmap) and that this will facilitate copy
> > >propagation (this is indicated by should_scalarize_away_bitmap, which
> > >has a bit set if there is a non-scalar assignment _from_ (a part of)
> > >the aggregate).
> > 
> > OK, but for the copy x = y where x would go away it still depends on uses of x what pieces we actually want?  Or is total scalarization only done for x  = y; z = x;?
> > Thus no further accesses to x?
> 
> Total scalarization adds artificial accesses only to y, but
> (in both cases of total and "natural" scalarization) for all aggregate
> assignments between SRA candidates, SRA attempts to recreate all
> accesses covering RHS to LHS.  Transitively.  So the artificial
> accesses created for y will then get created for x and z even if they
> would not be candidates for total scalarization.  So e.g. if z cannot
> go away because it is passed to a function, it will be loaded
> piece-wise from y.

So what I was trying to figure out is whether there is anything that
fores us to totally scalarize using "natural" elements rather than
using larger accesses?  For

  x = y;
  z = x;

the best pieces to chop x to depends on the write accesses to y
and the read accesses from z (we eventually want to easily match them
up for CSE).

I see we first create total scalarization accesses and only then
doing propagate_all_subaccesses.

> > 
> > >> That is, for the constant pool case we
> > >> usually have
> > >> 
> > >>   x = .LC0;
> > >>   .. = x[2];
> > >> 
> > >> so we have a "hint" that accesses on x are those we'd want to
> > >> optimize to accesses to .LC0.
> > >
> > >You don't need total scalarization for this, just the existence of
> > >read from x[2] would be sufficient but thanks for pointing me to the
> > >right direction.  For constant pool decl scalarization, it is not
> > >important to introduce artificial accesses for x but for .LC0.
> > >Therefore, I have adapted the patch to allow char arrays for const
> > >decls only and verified that it scalarizes a const-pool array of chars
> > >on Aarch64.  The (otherwise yet untested) result is below.
> > >
> > >What do you think?
> > 
> > Why special case char arrays?  I'd simply disallow total scalarization of non-const arrays completely.
> 
> Well, currently we will get element-wise copy propagation for things
> like "int rgb[3];" (possibly embeded in a small struct).  If I remove
> it, someone will complain.  Maybe the correct limit is SI mode size or
> BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though.
> I just wanted to be conservative at this point in the release cycle.

Sure but this total scalarization was added to be able to defer
ctor "lowering" at gimplification time to SRA.  So it would be ok
(IMHO) to limit it to constant initializers.

But agreed for to be conservative at this point.  I'd say we do this
once stage1 opens on trunk and then backport for 7.2.

Maybe there's input from Alan as well then.

Thanks,
Richard.

> Martin
> 
> > 
> > >Martin
> > >
> > >
> > >2017-04-13  Martin Jambor  <mjambor@suse.cz>
> > >
> > >	* tree-sra.c (scalarizable_type_p): New parameter const_decl, make
> > >	char arrays not totally scalarizable if it is false.
> > >	(analyze_all_variable_accesses): Pass correct value in the new
> > >	parameter.
> > >
> > >testsuite/
> > >	* g++.dg/tree-ssa/pr80293.C: New test.
> > >---
> > >gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45
> > >+++++++++++++++++++++++++++++++++
> > > gcc/tree-sra.c                          | 21 ++++++++++-----
> > > 2 files changed, 60 insertions(+), 6 deletions(-)
> > > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> > >
> > >diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> > >b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> > >new file mode 100644
> > >index 00000000000..7faf35ae983
> > >--- /dev/null
> > >+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
> > >@@ -0,0 +1,45 @@
> > >+// { dg-do compile }
> > >+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
> > >+
> > >+#include <array>
> > >+
> > >+// Return a copy of the underlying memory of an arbitrary value.
> > >+template <
> > >+    typename T,
> > >+    typename = typename
> > >std::enable_if<std::is_trivially_copyable<T>::value>::type
> > >+>
> > >+auto getMem(
> > >+    T const & value
> > >+) -> std::array<char, sizeof(T)> {
> > >+    auto ret = std::array<char, sizeof(T)>{};
> > >+    __builtin_memcpy(ret.data(), &value, sizeof(T));
> > >+    return ret;
> > >+}
> > >+
> > >+template <
> > >+    typename T,
> > >+    typename = typename
> > >std::enable_if<std::is_trivially_copyable<T>::value>::type
> > >+>
> > >+auto fromMem(
> > >+    std::array<char, sizeof(T)> const & buf
> > >+) -> T {
> > >+    auto ret = T{};
> > >+    __builtin_memcpy(&ret, buf.data(), sizeof(T));
> > >+    return ret;
> > >+}
> > >+
> > >+double foo1(std::uint64_t arg) {
> > >+    return fromMem<double>(getMem(arg));
> > >+}
> > >+
> > >+double foo2(std::uint64_t arg) {
> > >+    return *reinterpret_cast<double*>(&arg);
> > >+}
> > >+
> > >+double foo3(std::uint64_t arg) {
> > >+    double ret;
> > >+    __builtin_memcpy(&ret, &arg, sizeof(arg));
> > >+    return ret;
> > >+}
> > >+
> > >+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
> > >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> > >index 02453d3ed9a..ab06be66131 100644
> > >--- a/gcc/tree-sra.c
> > >+++ b/gcc/tree-sra.c
> > >@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool
> > >write)
> > > 
> > >/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or
> > >fixed-length
> > >ARRAY_TYPE with fields that are either of gimple register types
> > >(excluding
> > >-   bit-fields) or (recursively) scalarizable types.  */
> > >+   bit-fields) or (recursively) scalarizable types.  CONST_DECL must
> > >be true if
> > >+   we are considering a decl from constant pool.  If it is false, char
> > >arrays
> > >+   will be refused.  */
> > > 
> > > static bool
> > >-scalarizable_type_p (tree type)
> > >+scalarizable_type_p (tree type, bool const_decl)
> > > {
> > >   gcc_assert (!is_gimple_reg_type (type));
> > >   if (type_contains_placeholder_p (type))
> > >@@ -970,7 +972,7 @@ scalarizable_type_p (tree type)
> > > 	    return false;
> > > 
> > > 	  if (!is_gimple_reg_type (ft)
> > >-	      && !scalarizable_type_p (ft))
> > >+	      && !scalarizable_type_p (ft, const_decl))
> > > 	    return false;
> > > 	}
> > > 
> > >@@ -978,10 +980,16 @@ scalarizable_type_p (tree type)
> > > 
> > >   case ARRAY_TYPE:
> > >     {
> > >+      HOST_WIDE_INT min_elem_size;
> > >+      if (const_decl)
> > >+	min_elem_size = 0;
> > >+      else
> > >+	min_elem_size = BITS_PER_UNIT;
> > >+
> > >       if (TYPE_DOMAIN (type) == NULL_TREE
> > > 	  || !tree_fits_shwi_p (TYPE_SIZE (type))
> > > 	  || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type)))
> > >-	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0)
> > >+	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size)
> > > 	  || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
> > > 	return false;
> > >       if (tree_to_shwi (TYPE_SIZE (type)) == 0
> > >@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type)
> > > 
> > >       tree elem = TREE_TYPE (type);
> > >       if (!is_gimple_reg_type (elem)
> > >-	 && !scalarizable_type_p (elem))
> > >+	  && !scalarizable_type_p (elem, const_decl))
> > > 	return false;
> > >       return true;
> > >     }
> > >@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void)
> > >       {
> > > 	tree var = candidate (i);
> > > 
> > >-	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
> > >+	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> > >+						constant_decl_p (var)))
> > > 	  {
> > > 	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> > > 		<= max_scalarization_size)
> > 
> 
>
Martin Jambor April 20, 2017, 12:59 p.m. UTC | #4
Hi,

On Mon, Apr 17, 2017 at 09:35:18PM +0200, Martin Jambor wrote:
> On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote:
> > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote:

...

> > >Therefore, I have adapted the patch to allow char arrays for const
> > >decls only and verified that it scalarizes a const-pool array of chars
> > >on Aarch64.  The (otherwise yet untested) result is below.
> > >
> > >What do you think?
> > 
> > Why special case char arrays?  I'd simply disallow total scalarization of non-const arrays completely.
> 
> Well, currently we will get element-wise copy propagation for things
> like "int rgb[3];" (possibly embeded in a small struct).  If I remove
> it, someone will complain.  Maybe the correct limit is SI mode size or
> BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though.
> I just wanted to be conservative at this point in the release cycle.
> 

To gather some data for this, I have looked at SRA statistics gathered
for SPEC 2006 benchmarks.  Complete tables are below, let me comment
on the diffs, though.  The most interesting column is probably the
third one (the second one with numbers).

This is the difference between my patch that special-cases char arrays
and disallowing any non-const-pool array total scalarization:

 | Benchmark - pass      | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts |
 |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------|
-| 403.gcc - esra        |                   138 |                           12 |                   458 |           1274 |            20 |
-| 403.gcc - sra         |                    42 |                           11 |                   141 |            457 |            13 |
+| 403.gcc - esra        |                   132 |                            4 |                   370 |           1274 |            13 |
+| 403.gcc - sra         |                    43 |                            8 |                   140 |            459 |            12 |
-| 447.dealII - esra     |                 12899 |                         7826 |                 16347 |          29046 |          1840 |
-| 447.dealII - sra      |                  1083 |                          567 |                  2298 |           5479 |           392 |
+| 447.dealII - esra     |                 12897 |                         7790 |                 16329 |          29046 |          1838 |
+| 447.dealII - sra      |                  1079 |                          553 |                  2262 |           5479 |           388 |
-| 450.soplex - sra      |                    42 |                            2 |                    87 |            263 |             2 |
+| 450.soplex - sra      |                    42 |                            0 |                    81 |            263 |             2 |
-| 453.povray - esra     |                   265 |                           10 |                   812 |           3422 |            12 |
-| 453.povray - sra      |                    76 |                            6 |                   217 |            786 |            57 |
+| 453.povray - esra     |                   264 |                            8 |                   806 |           3422 |            11 |
+| 453.povray - sra      |                    76 |                            5 |                   215 |            786 |            57 |
-| 465.tonto - esra      |                  4029 |                            5 |                  8269 |          21351 |             5 |
+| 465.tonto - esra      |                  4024 |                            0 |                  8233 |          21351 |             0 |

For reference, this is the difference between (2 week old) trunk and
my proposed patch:

 | Benchmark - pass      | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts |
 |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------|
-| 401.bzip2 - sra       |                     3 |                            2 |                    22 |             62 |             0 |
+| 401.bzip2 - sra       |                     3 |                            0 |                    22 |             62 |             0 |
-| 403.gcc - esra        |                   138 |                           13 |                   458 |           1274 |            20 |
-| 403.gcc - sra         |                    42 |                           12 |                   141 |            457 |            13 |
+| 403.gcc - esra        |                   138 |                           12 |                   458 |           1274 |            20 |
+| 403.gcc - sra         |                    42 |                           11 |                   141 |            457 |            13 |
-| 447.dealII - esra     |                 12899 |                         7830 |                 16387 |          29046 |          1840 |
+| 447.dealII - esra     |                 12899 |                         7826 |                 16347 |          29046 |          1840 |
-| 453.povray - esra     |                   266 |                           11 |                   822 |           3422 |            13 |
+| 453.povray - esra     |                   265 |                           10 |                   812 |           3422 |            12 |
-| 454.calculix - sra    |                    17 |                            2 |                    81 |            305 |             5 |
+| 454.calculix - sra    |                    15 |                            0 |                    71 |            305 |             0 |
-| 465.tonto - sra       |                    98 |                           36 |                  4652 |            374 |            37 |
+| 465.tonto - sra       |                    62 |                            0 |                   166 |            374 |             0 |
-| 483.xalancbmk - esra  |                 17844 |                        10648 |                 19824 |          39699 |          1404 |
+| 483.xalancbmk - esra  |                 17844 |                        10642 |                 19764 |          39699 |          1404 |


The following are the non-char non-const-pool arrays we totally
scalarize now in individual benchmarks with some notes:

GCC
  - struct output_state in diagnostic.c contains "int
    diagnostic_count[6]".  I briefly inspected of a few affected
    functions, in all we did perform copy propagation and got rid of a
    useless aggregate intermediary.  But that copy propagation was done
    integer-wise which is only 32bit.

  - ereal_negate in real.c has local "short unsigned int[6]" this
    actually looks harmful and that the size limit should not allow
    shorts either.  ereal_ldexp looks like having the same thing (but is
    much longer).

  - struct realvaluetype in in build_real_from_int_cst in has int[3] in
    it, resulting in a copy propagation with PHI nodes.

dealII
  - struct TableIndices is basically encapsulated int[2].  Total
    scalarization facilitates constant propagation of zeros quite a few
    times.  I have not checked the optimized dump to see if ti matters
    though.

  - struct _ValueType is int[8[ followed by a single char field.
    Scalarization also only facilitates propagation of a zero
    initializer... using 32bit stores.  struct CellData is the same thing.

soplex
  - UnitVector of four pointers.  Total scalarization does not change what
    normal SRA does.

povray 
  - struct BCYL_INT has two double[2] arrays.  Facilitates quite a lot
    of zero initializer propagation of the first array and propagation
    of undefinedness of the second array.

  - struct BBOX is similar in structure but total scalarization is
    probably unnecessary.

  - TempPixelD.26340 contains float[5] and seems to facilitate nice copy
    propagation.

tonto
  - struct array1_integer has a one element array of three long integers
    called stride, lbound and ubound.  Total scalarization facilitates
    simple copy propagation of the form local = *p1; *p2 = local;

  - ditto for struct array2_real which has descriptors of two
    dimensions.

  - only 5 cases overall but in ipa-sra so they might be inlined to lots
    of places.

So, I still think we should keep total scalarization of arrays with
WORD-sized elements and perhaps of int-sized elements too.

Martin


All data:

No patch:
| Benchmark - pass      | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts |
|-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------|
| 400.perlbench - esra  |                     7 |                            4 |                    54 |            104 |             4 |
| 400.perlbench - sra   |                     0 |                            0 |                     0 |              0 |             0 |
| 401.bzip2 - esra      |                     3 |                            0 |                     9 |             66 |             0 |
| 401.bzip2 - sra       |                     3 |                            2 |                    22 |             62 |             0 |
| 403.gcc - esra        |                   138 |                           13 |                   458 |           1274 |            20 |
| 403.gcc - sra         |                    42 |                           12 |                   141 |            457 |            13 |
| 410.bwaves - esra     |                     0 |                            0 |                     0 |              0 |             0 |
| 410.bwaves - sra      |                     0 |                            0 |                     0 |              0 |             0 |
| 416.gamess - esra     |                    48 |                            0 |                   246 |           1250 |             0 |
| 416.gamess - sra      |                    46 |                            0 |                   498 |           2208 |             0 |
| 429.mcf - esra        |                     0 |                            0 |                     0 |              0 |             0 |
| 429.mcf - sra         |                     0 |                            0 |                     0 |              0 |             0 |
| 433.milc - esra       |                    18 |                            3 |                    33 |            103 |             4 |
| 433.milc - sra        |                     6 |                            0 |                    18 |             87 |             0 |
| 434.zeusmp - esra     |                     3 |                            0 |                     7 |             87 |             0 |
| 434.zeusmp - sra      |                     0 |                            0 |                     0 |              0 |             0 |
| 435.gromacs - esra    |                   150 |                           22 |                   378 |           1048 |            23 |
| 435.gromacs - sra     |                    48 |                            0 |                    88 |            458 |             0 |
| 436.cactusADM - esra  |                    58 |                           30 |                   117 |            538 |            30 |
| 436.cactusADM - sra   |                     0 |                            0 |                     0 |              0 |             0 |
| 437.leslie3d - esra   |                    91 |                            0 |                   412 |           2370 |             0 |
| 437.leslie3d - sra    |                     1 |                            0 |                     1 |              5 |             0 |
| 444.namd - esra       |                   175 |                           11 |                   182 |            676 |            11 |
| 444.namd - sra        |                     4 |                            4 |                    12 |              0 |             4 |
| 445.gobmk - esra      |                     5 |                            1 |                    10 |             33 |             1 |
| 445.gobmk - sra       |                     1 |                            0 |                     3 |             13 |             0 |
| 447.dealII - esra     |                 12899 |                         7830 |                 16387 |          29046 |          1840 |
| 447.dealII - sra      |                  1083 |                          567 |                  2298 |           5479 |           392 |
| 450.soplex - esra     |                    64 |                           34 |                    68 |            128 |             6 |
| 450.soplex - sra      |                    42 |                            2 |                    87 |            263 |             2 |
| 453.povray - esra     |                   266 |                           11 |                   822 |           3422 |            13 |
| 453.povray - sra      |                    76 |                            6 |                   217 |            786 |            57 |
| 454.calculix - esra   |                    19 |                            0 |                   126 |            836 |             0 |
| 454.calculix - sra    |                    17 |                            2 |                    81 |            305 |             5 |
| 456.hmmer - esra      |                     0 |                            0 |                     0 |              0 |             0 |
| 456.hmmer - sra       |                     4 |                            0 |                     5 |             13 |             0 |
| 458.sjeng - esra      |                     7 |                            6 |                    38 |              3 |            12 |
| 458.sjeng - sra       |                     3 |                            0 |                    15 |             33 |             2 |
| 459.GemsFDTD - esra   |                   217 |                            1 |                   484 |           1805 |             1 |
| 459.GemsFDTD - sra    |                     0 |                            0 |                     0 |              0 |             0 |
| 462.libquantum - esra |                    17 |                            6 |                    44 |            126 |             6 |
| 462.libquantum - sra  |                     6 |                            0 |                     6 |             11 |             0 |
| 464.h264ref - esra    |                    16 |                            0 |                    92 |            393 |             0 |
| 464.h264ref - sra     |                     6 |                            0 |                    12 |             84 |             0 |
| 465.tonto - esra      |                  4029 |                            5 |                  8269 |          21351 |             5 |
| 465.tonto - sra       |                    98 |                           36 |                  4652 |            374 |            37 |
| 470.lbm - esra        |                     0 |                            0 |                     0 |              0 |             0 |
| 470.lbm - sra         |                     0 |                            0 |                     0 |              0 |             0 |
| 471.omnetpp - esra    |                   208 |                          152 |                   232 |            560 |            24 |
| 471.omnetpp - sra     |                    34 |                           17 |                    49 |            200 |            12 |
| 473.astar - esra      |                     4 |                            3 |                    12 |             25 |             3 |
| 473.astar - sra       |                    24 |                           19 |                    68 |            118 |            19 |
| 481.wrf - esra        |                  2113 |                           11 |                  2560 |           5778 |             0 |
| 481.wrf - sra         |                    16 |                            1 |                    40 |             57 |             1 |
| 482.sphinx3 - esra    |                     0 |                            0 |                     0 |              0 |             0 |
| 482.sphinx3 - sra     |                     0 |                            0 |                     0 |              0 |             0 |
| 483.xalancbmk - esra  |                 17844 |                        10648 |                 19824 |          39699 |          1404 |
| 483.xalancbmk - sra   |                   862 |                          426 |                  1634 |           3814 |           249 |


With my patch:

| Benchmark - pass      | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts |
|-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------|
| 400.perlbench - esra  |                     7 |                            4 |                    54 |            104 |             4 |
| 400.perlbench - sra   |                     0 |                            0 |                     0 |              0 |             0 |
| 401.bzip2 - esra      |                     3 |                            0 |                     9 |             66 |             0 |
| 401.bzip2 - sra       |                     3 |                            0 |                    22 |             62 |             0 |
| 403.gcc - esra        |                   138 |                           12 |                   458 |           1274 |            20 |
| 403.gcc - sra         |                    42 |                           11 |                   141 |            457 |            13 |
| 410.bwaves - esra     |                     0 |                            0 |                     0 |              0 |             0 |
| 410.bwaves - sra      |                     0 |                            0 |                     0 |              0 |             0 |
| 416.gamess - esra     |                    48 |                            0 |                   246 |           1250 |             0 |
| 416.gamess - sra      |                    46 |                            0 |                   498 |           2208 |             0 |
| 429.mcf - esra        |                     0 |                            0 |                     0 |              0 |             0 |
| 429.mcf - sra         |                     0 |                            0 |                     0 |              0 |             0 |
| 433.milc - esra       |                    18 |                            3 |                    33 |            103 |             4 |
| 433.milc - sra        |                     6 |                            0 |                    18 |             87 |             0 |
| 434.zeusmp - esra     |                     3 |                            0 |                     7 |             87 |             0 |
| 434.zeusmp - sra      |                     0 |                            0 |                     0 |              0 |             0 |
| 435.gromacs - esra    |                   150 |                           22 |                   378 |           1048 |            23 |
| 435.gromacs - sra     |                    48 |                            0 |                    88 |            458 |             0 |
| 436.cactusADM - esra  |                    58 |                           30 |                   117 |            538 |            30 |
| 436.cactusADM - sra   |                     0 |                            0 |                     0 |              0 |             0 |
| 437.leslie3d - esra   |                    91 |                            0 |                   412 |           2370 |             0 |
| 437.leslie3d - sra    |                     1 |                            0 |                     1 |              5 |             0 |
| 444.namd - esra       |                   175 |                           11 |                   182 |            676 |            11 |
| 444.namd - sra        |                     4 |                            4 |                    12 |              0 |             4 |
| 445.gobmk - esra      |                     5 |                            1 |                    10 |             33 |             1 |
| 445.gobmk - sra       |                     1 |                            0 |                     3 |             13 |             0 |
| 447.dealII - esra     |                 12899 |                         7826 |                 16347 |          29046 |          1840 |
| 447.dealII - sra      |                  1083 |                          567 |                  2298 |           5479 |           392 |
| 450.soplex - esra     |                    64 |                           34 |                    68 |            128 |             6 |
| 450.soplex - sra      |                    42 |                            2 |                    87 |            263 |             2 |
| 453.povray - esra     |                   265 |                           10 |                   812 |           3422 |            12 |
| 453.povray - sra      |                    76 |                            6 |                   217 |            786 |            57 |
| 454.calculix - esra   |                    19 |                            0 |                   126 |            836 |             0 |
| 454.calculix - sra    |                    15 |                            0 |                    71 |            305 |             0 |
| 456.hmmer - esra      |                     0 |                            0 |                     0 |              0 |             0 |
| 456.hmmer - sra       |                     4 |                            0 |                     5 |             13 |             0 |
| 458.sjeng - esra      |                     7 |                            6 |                    38 |              3 |            12 |
| 458.sjeng - sra       |                     3 |                            0 |                    15 |             33 |             2 |
| 459.GemsFDTD - esra   |                   217 |                            1 |                   484 |           1805 |             1 |
| 459.GemsFDTD - sra    |                     0 |                            0 |                     0 |              0 |             0 |
| 462.libquantum - esra |                    17 |                            6 |                    44 |            126 |             6 |
| 462.libquantum - sra  |                     6 |                            0 |                     6 |             11 |             0 |
| 464.h264ref - esra    |                    16 |                            0 |                    92 |            393 |             0 |
| 464.h264ref - sra     |                     6 |                            0 |                    12 |             84 |             0 |
| 465.tonto - esra      |                  4029 |                            5 |                  8269 |          21351 |             5 |
| 465.tonto - sra       |                    62 |                            0 |                   166 |            374 |             0 |
| 470.lbm - esra        |                     0 |                            0 |                     0 |              0 |             0 |
| 470.lbm - sra         |                     0 |                            0 |                     0 |              0 |             0 |
| 471.omnetpp - esra    |                   208 |                          152 |                   232 |            560 |            24 |
| 471.omnetpp - sra     |                    34 |                           17 |                    49 |            200 |            12 |
| 473.astar - esra      |                     4 |                            3 |                    12 |             25 |             3 |
| 473.astar - sra       |                    24 |                           19 |                    68 |            118 |            19 |
| 481.wrf - esra        |                  2113 |                           11 |                  2560 |           5778 |             0 |
| 481.wrf - sra         |                    16 |                            1 |                    40 |             57 |             1 |
| 482.sphinx3 - esra    |                     0 |                            0 |                     0 |              0 |             0 |
| 482.sphinx3 - sra     |                     0 |                            0 |                     0 |              0 |             0 |
| 483.xalancbmk - esra  |                 17844 |                        10642 |                 19764 |          39699 |          1404 |
| 483.xalancbmk - sra   |                   862 |                          426 |                  1634 |           3814 |           249 |

Disallowing any non-constant-pool arrays:

| Benchmark - pass      | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts |
|-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------|
| 400.perlbench - esra  |                     7 |                            4 |                    54 |            104 |             4 |
| 400.perlbench - sra   |                     0 |                            0 |                     0 |              0 |             0 |
| 401.bzip2 - esra      |                     3 |                            0 |                     9 |             66 |             0 |
| 401.bzip2 - sra       |                     3 |                            0 |                    22 |             62 |             0 |
| 403.gcc - esra        |                   132 |                            4 |                   370 |           1274 |            13 |
| 403.gcc - sra         |                    43 |                            8 |                   140 |            459 |            12 |
| 410.bwaves - esra     |                     0 |                            0 |                     0 |              0 |             0 |
| 410.bwaves - sra      |                     0 |                            0 |                     0 |              0 |             0 |
| 416.gamess - esra     |                    48 |                            0 |                   246 |           1250 |             0 |
| 416.gamess - sra      |                    46 |                            0 |                   498 |           2208 |             0 |
| 429.mcf - esra        |                     0 |                            0 |                     0 |              0 |             0 |
| 429.mcf - sra         |                     0 |                            0 |                     0 |              0 |             0 |
| 433.milc - esra       |                    18 |                            3 |                    33 |            103 |             4 |
| 433.milc - sra        |                     6 |                            0 |                    18 |             87 |             0 |
| 434.zeusmp - esra     |                     3 |                            0 |                     7 |             87 |             0 |
| 434.zeusmp - sra      |                     0 |                            0 |                     0 |              0 |             0 |
| 435.gromacs - esra    |                   150 |                           22 |                   378 |           1048 |            23 |
| 435.gromacs - sra     |                    48 |                            0 |                    88 |            458 |             0 |
| 436.cactusADM - esra  |                    58 |                           30 |                   117 |            538 |            30 |
| 436.cactusADM - sra   |                     0 |                            0 |                     0 |              0 |             0 |
| 437.leslie3d - esra   |                    91 |                            0 |                   412 |           2370 |             0 |
| 437.leslie3d - sra    |                     1 |                            0 |                     1 |              5 |             0 |
| 444.namd - esra       |                   175 |                           11 |                   182 |            676 |            11 |
| 444.namd - sra        |                     4 |                            4 |                    12 |              0 |             4 |
| 445.gobmk - esra      |                     5 |                            1 |                    10 |             33 |             1 |
| 445.gobmk - sra       |                     1 |                            0 |                     3 |             13 |             0 |
| 447.dealII - esra     |                 12897 |                         7790 |                 16329 |          29046 |          1838 |
| 447.dealII - sra      |                  1079 |                          553 |                  2262 |           5479 |           388 |
| 450.soplex - esra     |                    64 |                           34 |                    68 |            128 |             6 |
| 450.soplex - sra      |                    42 |                            0 |                    81 |            263 |             2 |
| 453.povray - esra     |                   264 |                            8 |                   806 |           3422 |            11 |
| 453.povray - sra      |                    76 |                            5 |                   215 |            786 |            57 |
| 454.calculix - esra   |                    19 |                            0 |                   126 |            836 |             0 |
| 454.calculix - sra    |                    15 |                            0 |                    71 |            305 |             0 |
| 456.hmmer - esra      |                     0 |                            0 |                     0 |              0 |             0 |
| 456.hmmer - sra       |                     4 |                            0 |                     5 |             13 |             0 |
| 458.sjeng - esra      |                     7 |                            6 |                    38 |              3 |            12 |
| 458.sjeng - sra       |                     3 |                            0 |                    15 |             33 |             2 |
| 459.GemsFDTD - esra   |                   217 |                            1 |                   484 |           1805 |             1 |
| 459.GemsFDTD - sra    |                     0 |                            0 |                     0 |              0 |             0 |
| 462.libquantum - esra |                    17 |                            6 |                    44 |            126 |             6 |
| 462.libquantum - sra  |                     6 |                            0 |                     6 |             11 |             0 |
| 464.h264ref - esra    |                    16 |                            0 |                    92 |            393 |             0 |
| 464.h264ref - sra     |                     6 |                            0 |                    12 |             84 |             0 |
| 465.tonto - esra      |                  4024 |                            0 |                  8233 |          21351 |             0 |
| 465.tonto - sra       |                    62 |                            0 |                   166 |            374 |             0 |
| 470.lbm - esra        |                     0 |                            0 |                     0 |              0 |             0 |
| 470.lbm - sra         |                     0 |                            0 |                     0 |              0 |             0 |
| 471.omnetpp - esra    |                   208 |                          152 |                   232 |            560 |            24 |
| 471.omnetpp - sra     |                    34 |                           17 |                    49 |            200 |            12 |
| 473.astar - esra      |                     4 |                            3 |                    12 |             25 |             3 |
| 473.astar - sra       |                    24 |                           19 |                    68 |            118 |            19 |
| 481.wrf - esra        |                  2113 |                           11 |                  2560 |           5778 |             0 |
| 481.wrf - sra         |                    16 |                            1 |                    40 |             57 |             1 |
| 482.sphinx3 - esra    |                     0 |                            0 |                     0 |              0 |             0 |
| 482.sphinx3 - sra     |                     0 |                            0 |                     0 |              0 |             0 |
| 483.xalancbmk - esra  |                 17844 |                        10642 |                 19764 |          39699 |          1404 |
| 483.xalancbmk - sra   |                   862 |                          426 |                  1634 |           3814 |           249 |
diff mbox

Patch

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
new file mode 100644
index 00000000000..7faf35ae983
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
@@ -0,0 +1,45 @@ 
+// { dg-do compile }
+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
+
+#include <array>
+
+// Return a copy of the underlying memory of an arbitrary value.
+template <
+    typename T,
+    typename = typename std::enable_if<std::is_trivially_copyable<T>::value>::type
+>
+auto getMem(
+    T const & value
+) -> std::array<char, sizeof(T)> {
+    auto ret = std::array<char, sizeof(T)>{};
+    __builtin_memcpy(ret.data(), &value, sizeof(T));
+    return ret;
+}
+
+template <
+    typename T,
+    typename = typename std::enable_if<std::is_trivially_copyable<T>::value>::type
+>
+auto fromMem(
+    std::array<char, sizeof(T)> const & buf
+) -> T {
+    auto ret = T{};
+    __builtin_memcpy(&ret, buf.data(), sizeof(T));
+    return ret;
+}
+
+double foo1(std::uint64_t arg) {
+    return fromMem<double>(getMem(arg));
+}
+
+double foo2(std::uint64_t arg) {
+    return *reinterpret_cast<double*>(&arg);
+}
+
+double foo3(std::uint64_t arg) {
+    double ret;
+    __builtin_memcpy(&ret, &arg, sizeof(arg));
+    return ret;
+}
+
+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 02453d3ed9a..ab06be66131 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -949,10 +949,12 @@  create_access (tree expr, gimple *stmt, bool write)
 
 /* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
    ARRAY_TYPE with fields that are either of gimple register types (excluding
-   bit-fields) or (recursively) scalarizable types.  */
+   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
+   we are considering a decl from constant pool.  If it is false, char arrays
+   will be refused.  */
 
 static bool
-scalarizable_type_p (tree type)
+scalarizable_type_p (tree type, bool const_decl)
 {
   gcc_assert (!is_gimple_reg_type (type));
   if (type_contains_placeholder_p (type))
@@ -970,7 +972,7 @@  scalarizable_type_p (tree type)
 	    return false;
 
 	  if (!is_gimple_reg_type (ft)
-	      && !scalarizable_type_p (ft))
+	      && !scalarizable_type_p (ft, const_decl))
 	    return false;
 	}
 
@@ -978,10 +980,16 @@  scalarizable_type_p (tree type)
 
   case ARRAY_TYPE:
     {
+      HOST_WIDE_INT min_elem_size;
+      if (const_decl)
+	min_elem_size = 0;
+      else
+	min_elem_size = BITS_PER_UNIT;
+
       if (TYPE_DOMAIN (type) == NULL_TREE
 	  || !tree_fits_shwi_p (TYPE_SIZE (type))
 	  || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type)))
-	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0)
+	  || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size)
 	  || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
 	return false;
       if (tree_to_shwi (TYPE_SIZE (type)) == 0
@@ -995,7 +1003,7 @@  scalarizable_type_p (tree type)
 
       tree elem = TREE_TYPE (type);
       if (!is_gimple_reg_type (elem)
-	 && !scalarizable_type_p (elem))
+	  && !scalarizable_type_p (elem, const_decl))
 	return false;
       return true;
     }
@@ -2653,7 +2661,8 @@  analyze_all_variable_accesses (void)
       {
 	tree var = candidate (i);
 
-	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
+	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
+						constant_decl_p (var)))
 	  {
 	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
 		<= max_scalarization_size)