diff mbox series

[PR,83141] Prevent SRA from removing type changing assignment

Message ID ri6indmgk7q.fsf@suse.cz
State New
Headers show
Series [PR,83141] Prevent SRA from removing type changing assignment | expand

Commit Message

Martin Jambor Dec. 4, 2017, 11:13 p.m. UTC
Hi,

this is a followup to Richi's
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
83141.  The basic idea is simple, be just as conservative about type
changing MEM_REFs as we are about actual VCEs.

I have checked how that would affect compilation of SPEC 2006 and (non
LTO) Mozilla Firefox and am happy to report that the difference was
tiny.  However, I had to make the test less strict, otherwise testcase
gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
and expects us to track values accross:

  int a[] = { 1, 2, 3 };
  /* ... */
  __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
				/* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
				/* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
				/* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
  
SRA is able to load replacement of a[0] directly from the temporary
array which is apparently necessary to generate proper debug info.  I
have therefore allowed the current transformation to go forward if the
source does not contain any padding or if it is a read-only declaration.
I would especially appreciate a review of the type_contains_padding_p
predicate as I am not sure it caches all cases.

The added call to contains_vce_or_bfcref_p in build_accesses_from_assign
is not strictly necessary, it is there to avoid attempting total
scalarization in vain.  This is also tested by the dump scan in the
added testcase.

A very similar patch has passed bootstrap and testing on x86_64, this
exact one is undergoing both now.  OK for trunk if it passes too?

Thanks,

Martin


2017-12-04  Martin Jambor  <mjambor@suse.cz>

	PR tree-optimization/83141

	* tree-sra.c (type_contains_padding_p): New function.
	(contains_vce_or_bfcref_p): Move up in the file, also test for
	MEM_REFs implicitely changing types with padding.  Remove inline
	keyword.
	(build_accesses_from_assign): Check contains_vce_or_bfcref_p
	before setting bit in should_scalarize_away_bitmap.

testsuite/
	* gcc.dg/tree-ssa/pr83141.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +++++++++
 gcc/tree-sra.c                          | 116 ++++++++++++++++++++++++++------
 2 files changed, 128 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c

Comments

Martin Jambor Dec. 5, 2017, 12:06 a.m. UTC | #1
On Tue, Dec 05 2017, Martin Jambor wrote:
Hi,

> Hi,
>
> this is a followup to Richi's
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
> 83141.  The basic idea is simple, be just as conservative about type
> changing MEM_REFs as we are about actual VCEs.
>
> I have checked how that would affect compilation of SPEC 2006 and (non
> LTO) Mozilla Firefox and am happy to report that the difference was
> tiny.  However, I had to make the test less strict, otherwise testcase
> gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
> and expects us to track values accross:
>
>   int a[] = { 1, 2, 3 };
>   /* ... */
>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
> 				/* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
> 				/* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
> 				/* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
>   
> SRA is able to load replacement of a[0] directly from the temporary
> array which is apparently necessary to generate proper debug info.  I
> have therefore allowed the current transformation to go forward if the
> source does not contain any padding or if it is a read-only declaration.

Ah, the read-only test is of course bogus, it was a last minute addition
when I was apparently already too tired to think it through.  Please
disregard that line in the patch (it has passed bootstrap and testing
without it).

Sorry for the noise,

Martin


> I would especially appreciate a review of the type_contains_padding_p
> predicate as I am not sure it caches all cases.
>
> The added call to contains_vce_or_bfcref_p in build_accesses_from_assign
> is not strictly necessary, it is there to avoid attempting total
> scalarization in vain.  This is also tested by the dump scan in the
> added testcase.
>
> A very similar patch has passed bootstrap and testing on x86_64, this
> exact one is undergoing both now.  OK for trunk if it passes too?
>
> Thanks,
>
> Martin
>
>
> 2017-12-04  Martin Jambor  <mjambor@suse.cz>
>
> 	PR tree-optimization/83141
>
> 	* tree-sra.c (type_contains_padding_p): New function.
> 	(contains_vce_or_bfcref_p): Move up in the file, also test for
> 	MEM_REFs implicitely changing types with padding.  Remove inline
> 	keyword.
> 	(build_accesses_from_assign): Check contains_vce_or_bfcref_p
> 	before setting bit in should_scalarize_away_bitmap.
>
> testsuite/
> 	* gcc.dg/tree-ssa/pr83141.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +++++++++
>  gcc/tree-sra.c                          | 116 ++++++++++++++++++++++++++------
>  2 files changed, 128 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> new file mode 100644
> index 00000000000..86caf66025b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -fdump-tree-esra-details" } */
> +
> +volatile short vs;
> +volatile long vl;
> +
> +struct A { short s; long i; long j; };
> +struct A a, b;
> +void foo ()
> +{
> +  struct A c;
> +  __builtin_memcpy (&c, &b, sizeof (struct A));
> +  __builtin_memcpy (&a, &c, sizeof (struct A));
> +
> +  vs = c.s;
> +  vl = c.i;
> +  vl = c.j;
> +}
> +int main()
> +{
> +  __builtin_memset (&b, 0, sizeof (struct A));
> +  b.s = 1;
> +  __builtin_memcpy ((char *)&b+2, &b, 2);
> +  foo ();
> +  __builtin_memcpy (&a, (char *)&a+2, 2);
> +  if (a.s != 1)
> +    __builtin_abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 866cff0edb0..0a9622e77f8 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1141,6 +1141,101 @@ contains_view_convert_expr_p (const_tree ref)
>    return false;
>  }
>  
> +/* Return false if we can determine that TYPE has no padding, otherwise return
> +   true.  */
> +
> +static bool
> +type_contains_padding_p (tree type)
> +{
> +  if (is_gimple_reg_type (type))
> +    return false;
> +
> +  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
> +    return true;
> +
> +  switch (TREE_CODE (type))
> +    {
> +    case RECORD_TYPE:
> +      {
> +	unsigned HOST_WIDE_INT pos = 0;
> +	for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> +	  if (TREE_CODE (fld) == FIELD_DECL)
> +	    {
> +	      tree ft = TREE_TYPE (fld);
> +
> +	      if (DECL_BIT_FIELD (fld)
> +		  || DECL_PADDING_P (fld)
> +		  || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
> +		return true;
> +
> +	      tree t = bit_position (fld);
> +	      if (!tree_fits_uhwi_p (t))
> +		return true;
> +	      unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
> +	      if (pos != bp)
> +		return true;
> +
> +	      pos += tree_to_uhwi (TYPE_SIZE (ft));
> +	      if (type_contains_padding_p (ft))
> +		return true;
> +	    }
> +
> +	if (pos != tree_to_uhwi (TYPE_SIZE (type)))
> +	  return true;
> +
> +	return false;
> +      }
> +
> +    case ARRAY_TYPE:
> +      return type_contains_padding_p (TYPE_SIZE (type));
> +
> +    default:
> +      return true;
> +    }
> +  gcc_unreachable ();
> +}
> +
> +/* Return true if REF contains a MEM_REF that performs type conversion from a
> +   type with padding or any VIEW_CONVERT_EXPR or a COMPONENT_REF with a
> +   bit-field field declaration.  */
> +
> +static bool
> +contains_vce_or_bfcref_p (const_tree ref)
> +{
> +  while (true)
> +    {
> +      if (TREE_CODE (ref) == MEM_REF)
> +	{
> +	  tree op0 = TREE_OPERAND (ref, 0);
> +	  if (TREE_CODE (op0) == ADDR_EXPR)
> +	    {
> +	      tree mem = TREE_OPERAND (op0, 0);
> +	      if ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
> +		   != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
> +		  && !((DECL_P (mem) && TREE_READONLY (mem))
> +		       || type_contains_padding_p (TREE_TYPE (mem))))
> +		return true;
> +
> +	      ref = mem;
> +	      continue;
> +	    }
> +	  else
> +	    return false;
> +	}
> +
> +      if (!handled_component_p (ref))
> +	return false;
> +
> +      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
> +	  || (TREE_CODE (ref) == COMPONENT_REF
> +	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
> +	return true;
> +      ref = TREE_OPERAND (ref, 0);
> +    }
> +
> +  return false;
> +}
> +
>  /* Search the given tree for a declaration by skipping handled components and
>     exclude it from the candidates.  */
>  
> @@ -1338,7 +1433,8 @@ build_accesses_from_assign (gimple *stmt)
>      {
>        racc->grp_assignment_read = 1;
>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
> -	  && !is_gimple_reg_type (racc->type))
> +	  && !is_gimple_reg_type (racc->type)
> +	  && !contains_vce_or_bfcref_p (rhs))
>  	bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>        if (storage_order_barrier_p (lhs))
>  	racc->grp_unscalarizable_region = 1;
> @@ -3416,24 +3512,6 @@ get_repl_default_def_ssa_name (struct access *racc)
>    return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
>  }
>  
> -/* Return true if REF has an VIEW_CONVERT_EXPR or a COMPONENT_REF with a
> -   bit-field field declaration somewhere in it.  */
> -
> -static inline bool
> -contains_vce_or_bfcref_p (const_tree ref)
> -{
> -  while (handled_component_p (ref))
> -    {
> -      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
> -	  || (TREE_CODE (ref) == COMPONENT_REF
> -	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
> -	return true;
> -      ref = TREE_OPERAND (ref, 0);
> -    }
> -
> -  return false;
> -}
> -
>  /* Examine both sides of the assignment statement pointed to by STMT, replace
>     them with a scalare replacement if there is one and generate copying of
>     replacements if scalarized aggregates have been used in the assignment.  GSI
> -- 
> 2.15.1
Martin Jambor Dec. 5, 2017, noon UTC | #2
On Tue, Dec 05 2017, Martin Jambor wrote:
> On Tue, Dec 05 2017, Martin Jambor wrote:
> Hi,
>
>> Hi,
>>
>> this is a followup to Richi's
>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
>> 83141.  The basic idea is simple, be just as conservative about type
>> changing MEM_REFs as we are about actual VCEs.
>>
>> I have checked how that would affect compilation of SPEC 2006 and (non
>> LTO) Mozilla Firefox and am happy to report that the difference was
>> tiny.  However, I had to make the test less strict, otherwise testcase
>> gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
>> and expects us to track values accross:
>>
>>   int a[] = { 1, 2, 3 };
>>   /* ... */
>>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
>> 				/* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
>> 				/* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
>> 				/* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
>>   
>> SRA is able to load replacement of a[0] directly from the temporary
>> array which is apparently necessary to generate proper debug info.  I
>> have therefore allowed the current transformation to go forward if the
>> source does not contain any padding or if it is a read-only declaration.
>
> Ah, the read-only test is of course bogus, it was a last minute addition
> when I was apparently already too tired to think it through.  Please
> disregard that line in the patch (it has passed bootstrap and testing
> without it).
>
> Sorry for the noise,
>
> Martin
>

And for the record, below is the actual patch, after a fresh round of
re-testing to double check I did not mess up anything else.  As before,
I'd like to ask for review, especially of the type_contains_padding_p
predicate and then would like to commit it to trunk.

Thanks,

Martin


2017-12-05  Martin Jambor  <mjambor@suse.cz>

	PR tree-optimization/83141
	* tree-sra.c (type_contains_padding_p): New function.
	(contains_vce_or_bfcref_p): Move up in the file, also test for
	MEM_REFs implicitely changing types with padding.  Remove inline
	keyword.
	(build_accesses_from_assign): Check contains_vce_or_bfcref_p
	before setting bit in should_scalarize_away_bitmap.

testsuite/
	* gcc.dg/tree-ssa/pr83141.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +++++++++
 gcc/tree-sra.c                          | 115 ++++++++++++++++++++++++++------
 2 files changed, 127 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
new file mode 100644
index 00000000000..86caf66025b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O -fdump-tree-esra-details" } */
+
+volatile short vs;
+volatile long vl;
+
+struct A { short s; long i; long j; };
+struct A a, b;
+void foo ()
+{
+  struct A c;
+  __builtin_memcpy (&c, &b, sizeof (struct A));
+  __builtin_memcpy (&a, &c, sizeof (struct A));
+
+  vs = c.s;
+  vl = c.i;
+  vl = c.j;
+}
+int main()
+{
+  __builtin_memset (&b, 0, sizeof (struct A));
+  b.s = 1;
+  __builtin_memcpy ((char *)&b+2, &b, 2);
+  foo ();
+  __builtin_memcpy (&a, (char *)&a+2, 2);
+  if (a.s != 1)
+    __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 866cff0edb0..df9f10f83b6 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1141,6 +1141,100 @@ contains_view_convert_expr_p (const_tree ref)
   return false;
 }
 
+/* Return false if we can determine that TYPE has no padding, otherwise return
+   true.  */
+
+static bool
+type_contains_padding_p (tree type)
+{
+  if (is_gimple_reg_type (type))
+    return false;
+
+  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
+    return true;
+
+  switch (TREE_CODE (type))
+    {
+    case RECORD_TYPE:
+      {
+	unsigned HOST_WIDE_INT pos = 0;
+	for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
+	  if (TREE_CODE (fld) == FIELD_DECL)
+	    {
+	      tree ft = TREE_TYPE (fld);
+
+	      if (DECL_BIT_FIELD (fld)
+		  || DECL_PADDING_P (fld)
+		  || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
+		return true;
+
+	      tree t = bit_position (fld);
+	      if (!tree_fits_uhwi_p (t))
+		return true;
+	      unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
+	      if (pos != bp)
+		return true;
+
+	      pos += tree_to_uhwi (TYPE_SIZE (ft));
+	      if (type_contains_padding_p (ft))
+		return true;
+	    }
+
+	if (pos != tree_to_uhwi (TYPE_SIZE (type)))
+	  return true;
+
+	return false;
+      }
+
+    case ARRAY_TYPE:
+      return type_contains_padding_p (TYPE_SIZE (type));
+
+    default:
+      return true;
+    }
+  gcc_unreachable ();
+}
+
+/* Return true if REF contains a MEM_REF that performs type conversion from a
+   type with padding or any VIEW_CONVERT_EXPR or a COMPONENT_REF with a
+   bit-field field declaration.  */
+
+static bool
+contains_vce_or_bfcref_p (const_tree ref)
+{
+  while (true)
+    {
+      if (TREE_CODE (ref) == MEM_REF)
+	{
+	  tree op0 = TREE_OPERAND (ref, 0);
+	  if (TREE_CODE (op0) == ADDR_EXPR)
+	    {
+	      tree mem = TREE_OPERAND (op0, 0);
+	      if ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
+		   != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
+		  && type_contains_padding_p (TREE_TYPE (mem)))
+		return true;
+
+	      ref = mem;
+	      continue;
+	    }
+	  else
+	    return false;
+	}
+
+      if (!handled_component_p (ref))
+	return false;
+
+      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
+	  || (TREE_CODE (ref) == COMPONENT_REF
+	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
+	return true;
+      ref = TREE_OPERAND (ref, 0);
+    }
+
+  return false;
+}
+
 /* Search the given tree for a declaration by skipping handled components and
    exclude it from the candidates.  */
 
@@ -1338,7 +1432,8 @@ build_accesses_from_assign (gimple *stmt)
     {
       racc->grp_assignment_read = 1;
       if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
-	  && !is_gimple_reg_type (racc->type))
+	  && !is_gimple_reg_type (racc->type)
+	  && !contains_vce_or_bfcref_p (rhs))
 	bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
       if (storage_order_barrier_p (lhs))
 	racc->grp_unscalarizable_region = 1;
@@ -3416,24 +3511,6 @@ get_repl_default_def_ssa_name (struct access *racc)
   return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
 }
 
-/* Return true if REF has an VIEW_CONVERT_EXPR or a COMPONENT_REF with a
-   bit-field field declaration somewhere in it.  */
-
-static inline bool
-contains_vce_or_bfcref_p (const_tree ref)
-{
-  while (handled_component_p (ref))
-    {
-      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
-	  || (TREE_CODE (ref) == COMPONENT_REF
-	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
-	return true;
-      ref = TREE_OPERAND (ref, 0);
-    }
-
-  return false;
-}
-
 /* Examine both sides of the assignment statement pointed to by STMT, replace
    them with a scalare replacement if there is one and generate copying of
    replacements if scalarized aggregates have been used in the assignment.  GSI
Richard Biener Dec. 5, 2017, 1:16 p.m. UTC | #3
On Tue, 5 Dec 2017, Martin Jambor wrote:

> On Tue, Dec 05 2017, Martin Jambor wrote:
> > On Tue, Dec 05 2017, Martin Jambor wrote:
> > Hi,
> >
> >> Hi,
> >>
> >> this is a followup to Richi's
> >> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
> >> 83141.  The basic idea is simple, be just as conservative about type
> >> changing MEM_REFs as we are about actual VCEs.
> >>
> >> I have checked how that would affect compilation of SPEC 2006 and (non
> >> LTO) Mozilla Firefox and am happy to report that the difference was
> >> tiny.  However, I had to make the test less strict, otherwise testcase
> >> gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
> >> and expects us to track values accross:
> >>
> >>   int a[] = { 1, 2, 3 };
> >>   /* ... */
> >>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
> >> 				/* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
> >> 				/* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
> >> 				/* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
> >>   
> >> SRA is able to load replacement of a[0] directly from the temporary
> >> array which is apparently necessary to generate proper debug info.  I
> >> have therefore allowed the current transformation to go forward if the
> >> source does not contain any padding or if it is a read-only declaration.
> >
> > Ah, the read-only test is of course bogus, it was a last minute addition
> > when I was apparently already too tired to think it through.  Please
> > disregard that line in the patch (it has passed bootstrap and testing
> > without it).
> >
> > Sorry for the noise,
> >
> > Martin
> >
> 
> And for the record, below is the actual patch, after a fresh round of
> re-testing to double check I did not mess up anything else.  As before,
> I'd like to ask for review, especially of the type_contains_padding_p
> predicate and then would like to commit it to trunk.

Comments below...

> Thanks,
> 
> Martin
> 
> 
> 2017-12-05  Martin Jambor  <mjambor@suse.cz>
> 
> 	PR tree-optimization/83141
> 	* tree-sra.c (type_contains_padding_p): New function.
> 	(contains_vce_or_bfcref_p): Move up in the file, also test for
> 	MEM_REFs implicitely changing types with padding.  Remove inline
> 	keyword.
> 	(build_accesses_from_assign): Check contains_vce_or_bfcref_p
> 	before setting bit in should_scalarize_away_bitmap.
> 
> testsuite/
> 	* gcc.dg/tree-ssa/pr83141.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +++++++++
>  gcc/tree-sra.c                          | 115 ++++++++++++++++++++++++++------
>  2 files changed, 127 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> new file mode 100644
> index 00000000000..86caf66025b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -fdump-tree-esra-details" } */
> +
> +volatile short vs;
> +volatile long vl;
> +
> +struct A { short s; long i; long j; };
> +struct A a, b;
> +void foo ()
> +{
> +  struct A c;
> +  __builtin_memcpy (&c, &b, sizeof (struct A));
> +  __builtin_memcpy (&a, &c, sizeof (struct A));
> +
> +  vs = c.s;
> +  vl = c.i;
> +  vl = c.j;
> +}
> +int main()
> +{
> +  __builtin_memset (&b, 0, sizeof (struct A));
> +  b.s = 1;
> +  __builtin_memcpy ((char *)&b+2, &b, 2);
> +  foo ();
> +  __builtin_memcpy (&a, (char *)&a+2, 2);
> +  if (a.s != 1)
> +    __builtin_abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 866cff0edb0..df9f10f83b6 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1141,6 +1141,100 @@ contains_view_convert_expr_p (const_tree ref)
>    return false;
>  }
>  
> +/* Return false if we can determine that TYPE has no padding, otherwise return
> +   true.  */
> +
> +static bool
> +type_contains_padding_p (tree type)
> +{
> +  if (is_gimple_reg_type (type))
> +    return false;
> +
> +  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
> +    return true;
> +
> +  switch (TREE_CODE (type))
> +    {
> +    case RECORD_TYPE:
> +      {
> +	unsigned HOST_WIDE_INT pos = 0;
> +	for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> +	  if (TREE_CODE (fld) == FIELD_DECL)
> +	    {
> +	      tree ft = TREE_TYPE (fld);
> +
> +	      if (DECL_BIT_FIELD (fld)
> +		  || DECL_PADDING_P (fld)
> +		  || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
> +		return true;
> +
> +	      tree t = bit_position (fld);
> +	      if (!tree_fits_uhwi_p (t))
> +		return true;
> +	      unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
> +	      if (pos != bp)
> +		return true;
> +
> +	      pos += tree_to_uhwi (TYPE_SIZE (ft));
> +	      if (type_contains_padding_p (ft))
> +		return true;
> +	    }
> +
> +	if (pos != tree_to_uhwi (TYPE_SIZE (type)))
> +	  return true;
> +
> +	return false;
> +      }
> +
> +    case ARRAY_TYPE:
> +      return type_contains_padding_p (TYPE_SIZE (type));
> +
> +    default:
> +      return true;
> +    }
> +  gcc_unreachable ();
> +}

The possibly repeated walk over all fields and subfields of an aggregate
type looks expensive - some caching on the main variant might be
advisable to me.

> +/* Return true if REF contains a MEM_REF that performs type conversion from a
> +   type with padding or any VIEW_CONVERT_EXPR or a COMPONENT_REF with a
> +   bit-field field declaration.  */
> +
> +static bool
> +contains_vce_or_bfcref_p (const_tree ref)
> +{
> +  while (true)
> +    {
> +      if (TREE_CODE (ref) == MEM_REF)
> +	{
> +	  tree op0 = TREE_OPERAND (ref, 0);
> +	  if (TREE_CODE (op0) == ADDR_EXPR)
> +	    {
> +	      tree mem = TREE_OPERAND (op0, 0);
> +	      if ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
> +		   != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
> +		  && type_contains_padding_p (TREE_TYPE (mem)))
> +		return true;
> +
> +	      ref = mem;
> +	      continue;
> +	    }
> +	  else
> +	    return false;
> +	}
> +
> +      if (!handled_component_p (ref))
> +	return false;
> +
> +      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
> +	  || (TREE_CODE (ref) == COMPONENT_REF
> +	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
> +	return true;
> +      ref = TREE_OPERAND (ref, 0);
> +    }

It is best to retain the old code and simply append

  if (TREE_CODE (ref) == MEM_REF)

after the while (handled_component_p ()) loop.  A MEM_REF can only
appear in innermost position.

I'm not sure that we really want to retain SRAing structures
based on their fields when we see a VCEd MEM_REF accessing them.
This might for example do floating-point accesses on data that
isn't floating-point and on some architectures may raise exceptions
(IA64 comes to my mind).  The memcpy folding code explicitely disallows
float modes and also BOOLEAN_TYPE and ENUMERAL_TYPE becuase they
might not match their precision:

      /* Make sure we are not copying using a floating-point mode or
         a type whose size possibly does not match its precision.  */
      if (FLOAT_MODE_P (TYPE_MODE (desttype))
          || TREE_CODE (desttype) == BOOLEAN_TYPE
          || TREE_CODE (desttype) == ENUMERAL_TYPE)
        desttype = bitwise_type_for_mode (TYPE_MODE (desttype));
      if (FLOAT_MODE_P (TYPE_MODE (srctype))
          || TREE_CODE (srctype) == BOOLEAN_TYPE
          || TREE_CODE (srctype) == ENUMERAL_TYPE)
        srctype = bitwise_type_for_mode (TYPE_MODE (srctype));

so - what's the reason you think you need to retain SRAing memcpied
structs?  Via total scalarization I mean - if we see the "real" accesses
besides the aggregate copy then we of course win.

Do we check anywhere when seeing an aggregate copy that lhs and rhs
have matching TYPE_MAIN_VARIANT?  GIMPLE allows matching TYPE_CANONICAL
which can have mismatched fields (LTO, cross-language) or even any
type if TYPE_STRUCTURAL_EQUALITY.  That is, what happens if we mark
two distinct enough decls for total scalarization that are copied
to each other?  Do we end up re-materializing them if their
"sturcture" doesn't match when transforming the assignment?

Thanks,
Richard.

> +  return false;
> +}
> +
>  /* Search the given tree for a declaration by skipping handled components and
>     exclude it from the candidates.  */
>  
> @@ -1338,7 +1432,8 @@ build_accesses_from_assign (gimple *stmt)
>      {
>        racc->grp_assignment_read = 1;
>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
> -	  && !is_gimple_reg_type (racc->type))
> +	  && !is_gimple_reg_type (racc->type)
> +	  && !contains_vce_or_bfcref_p (rhs))
>  	bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>        if (storage_order_barrier_p (lhs))
>  	racc->grp_unscalarizable_region = 1;
> @@ -3416,24 +3511,6 @@ get_repl_default_def_ssa_name (struct access *racc)
>    return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
>  }
>  
> -/* Return true if REF has an VIEW_CONVERT_EXPR or a COMPONENT_REF with a
> -   bit-field field declaration somewhere in it.  */
> -
> -static inline bool
> -contains_vce_or_bfcref_p (const_tree ref)
> -{
> -  while (handled_component_p (ref))
> -    {
> -      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
> -	  || (TREE_CODE (ref) == COMPONENT_REF
> -	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
> -	return true;
> -      ref = TREE_OPERAND (ref, 0);
> -    }
> -
> -  return false;
> -}
> -
>  /* Examine both sides of the assignment statement pointed to by STMT, replace
>     them with a scalare replacement if there is one and generate copying of
>     replacements if scalarized aggregates have been used in the assignment.  GSI
>
Martin Jambor Dec. 6, 2017, 1:37 p.m. UTC | #4
Hi,

On Tue, Dec 05 2017, Richard Biener wrote:
> On Tue, 5 Dec 2017, Martin Jambor wrote:
>
>> On Tue, Dec 05 2017, Martin Jambor wrote:
>> > On Tue, Dec 05 2017, Martin Jambor wrote:
>> > Hi,
>> >
>> >> Hi,
>> >>
>> >> this is a followup to Richi's
>> >> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
>> >> 83141.  The basic idea is simple, be just as conservative about type
>> >> changing MEM_REFs as we are about actual VCEs.
>> >>
>> >> I have checked how that would affect compilation of SPEC 2006 and (non
>> >> LTO) Mozilla Firefox and am happy to report that the difference was
>> >> tiny.  However, I had to make the test less strict, otherwise testcase
>> >> gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
>> >> and expects us to track values accross:
>> >>
>> >>   int a[] = { 1, 2, 3 };
>> >>   /* ... */
>> >>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
>> >> 				/* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
>> >> 				/* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
>> >> 				/* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
>> >>   
>> >> SRA is able to load replacement of a[0] directly from the temporary
>> >> array which is apparently necessary to generate proper debug info.  I
>> >> have therefore allowed the current transformation to go forward if the
>> >> source does not contain any padding or if it is a read-only declaration.
>> >
>> > Ah, the read-only test is of course bogus, it was a last minute addition
>> > when I was apparently already too tired to think it through.  Please
>> > disregard that line in the patch (it has passed bootstrap and testing
>> > without it).
>> >
>> > Sorry for the noise,
>> >
>> > Martin
>> >
>> 
>> And for the record, below is the actual patch, after a fresh round of
>> re-testing to double check I did not mess up anything else.  As before,
>> I'd like to ask for review, especially of the type_contains_padding_p
>> predicate and then would like to commit it to trunk.
>
> Comments below...
>
>> Thanks,
>> 
>> Martin
>> 
>> 
>> 2017-12-05  Martin Jambor  <mjambor@suse.cz>
>> 
>> 	PR tree-optimization/83141
>> 	* tree-sra.c (type_contains_padding_p): New function.
>> 	(contains_vce_or_bfcref_p): Move up in the file, also test for
>> 	MEM_REFs implicitely changing types with padding.  Remove inline
>> 	keyword.
>> 	(build_accesses_from_assign): Check contains_vce_or_bfcref_p
>> 	before setting bit in should_scalarize_away_bitmap.
>> 
>> testsuite/
>> 	* gcc.dg/tree-ssa/pr83141.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +++++++++
>>  gcc/tree-sra.c                          | 115 ++++++++++++++++++++++++++------
>>  2 files changed, 127 insertions(+), 19 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
>> 
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
>> new file mode 100644
>> index 00000000000..86caf66025b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
>> @@ -0,0 +1,31 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O -fdump-tree-esra-details" } */
>> +
>> +volatile short vs;
>> +volatile long vl;
>> +
>> +struct A { short s; long i; long j; };
>> +struct A a, b;
>> +void foo ()
>> +{
>> +  struct A c;
>> +  __builtin_memcpy (&c, &b, sizeof (struct A));
>> +  __builtin_memcpy (&a, &c, sizeof (struct A));
>> +
>> +  vs = c.s;
>> +  vl = c.i;
>> +  vl = c.j;
>> +}
>> +int main()
>> +{
>> +  __builtin_memset (&b, 0, sizeof (struct A));
>> +  b.s = 1;
>> +  __builtin_memcpy ((char *)&b+2, &b, 2);
>> +  foo ();
>> +  __builtin_memcpy (&a, (char *)&a+2, 2);
>> +  if (a.s != 1)
>> +    __builtin_abort ();
>> +  return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index 866cff0edb0..df9f10f83b6 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -1141,6 +1141,100 @@ contains_view_convert_expr_p (const_tree ref)
>>    return false;
>>  }
>>  
>> +/* Return false if we can determine that TYPE has no padding, otherwise return
>> +   true.  */
>> +
>> +static bool
>> +type_contains_padding_p (tree type)
>> +{
>> +  if (is_gimple_reg_type (type))
>> +    return false;
>> +
>> +  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
>> +    return true;
>> +
>> +  switch (TREE_CODE (type))
>> +    {
>> +    case RECORD_TYPE:
>> +      {
>> +	unsigned HOST_WIDE_INT pos = 0;
>> +	for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
>> +	  if (TREE_CODE (fld) == FIELD_DECL)
>> +	    {
>> +	      tree ft = TREE_TYPE (fld);
>> +
>> +	      if (DECL_BIT_FIELD (fld)
>> +		  || DECL_PADDING_P (fld)
>> +		  || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
>> +		return true;
>> +
>> +	      tree t = bit_position (fld);
>> +	      if (!tree_fits_uhwi_p (t))
>> +		return true;
>> +	      unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
>> +	      if (pos != bp)
>> +		return true;
>> +
>> +	      pos += tree_to_uhwi (TYPE_SIZE (ft));
>> +	      if (type_contains_padding_p (ft))
>> +		return true;
>> +	    }
>> +
>> +	if (pos != tree_to_uhwi (TYPE_SIZE (type)))
>> +	  return true;
>> +
>> +	return false;
>> +      }
>> +
>> +    case ARRAY_TYPE:
>> +      return type_contains_padding_p (TYPE_SIZE (type));
>> +
>> +    default:
>> +      return true;
>> +    }
>> +  gcc_unreachable ();
>> +}
>
> The possibly repeated walk over all fields and subfields of an aggregate
> type looks expensive - some caching on the main variant might be
> advisable to me.

It almost never happens but I guess you are right.  But see below.

>
>> +/* Return true if REF contains a MEM_REF that performs type conversion from a
>> +   type with padding or any VIEW_CONVERT_EXPR or a COMPONENT_REF with a
>> +   bit-field field declaration.  */
>> +
>> +static bool
>> +contains_vce_or_bfcref_p (const_tree ref)
>> +{
>> +  while (true)
>> +    {
>> +      if (TREE_CODE (ref) == MEM_REF)
>> +	{
>> +	  tree op0 = TREE_OPERAND (ref, 0);
>> +	  if (TREE_CODE (op0) == ADDR_EXPR)
>> +	    {
>> +	      tree mem = TREE_OPERAND (op0, 0);
>> +	      if ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
>> +		   != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
>> +		  && type_contains_padding_p (TREE_TYPE (mem)))
>> +		return true;
>> +
>> +	      ref = mem;
>> +	      continue;
>> +	    }
>> +	  else
>> +	    return false;
>> +	}
>> +
>> +      if (!handled_component_p (ref))
>> +	return false;
>> +
>> +      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
>> +	  || (TREE_CODE (ref) == COMPONENT_REF
>> +	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
>> +	return true;
>> +      ref = TREE_OPERAND (ref, 0);
>> +    }
>
> It is best to retain the old code and simply append
>
>   if (TREE_CODE (ref) == MEM_REF)
>
> after the while (handled_component_p ()) loop.  A MEM_REF can only
> appear in innermost position.

I hoped so but could not find any such check quickly in the tree-cfg.c
verifier.

>
> I'm not sure that we really want to retain SRAing structures
> based on their fields when we see a VCEd MEM_REF accessing them.
> This might for example do floating-point accesses on data that
> isn't floating-point and on some architectures may raise exceptions
> (IA64 comes to my mind).  The memcpy folding code explicitely disallows
> float modes and also BOOLEAN_TYPE and ENUMERAL_TYPE becuase they
> might not match their precision:
>
>       /* Make sure we are not copying using a floating-point mode or
>          a type whose size possibly does not match its precision.  */
>       if (FLOAT_MODE_P (TYPE_MODE (desttype))
>           || TREE_CODE (desttype) == BOOLEAN_TYPE
>           || TREE_CODE (desttype) == ENUMERAL_TYPE)
>         desttype = bitwise_type_for_mode (TYPE_MODE (desttype));
>       if (FLOAT_MODE_P (TYPE_MODE (srctype))
>           || TREE_CODE (srctype) == BOOLEAN_TYPE
>           || TREE_CODE (srctype) == ENUMERAL_TYPE)
>         srctype = bitwise_type_for_mode (TYPE_MODE (srctype));
>
> so - what's the reason you think you need to retain SRAing memcpied
> structs?  Via total scalarization I mean - if we see the "real" accesses
> besides the aggregate copy then we of course win.

Two reasons.  First, Jakub explicitely asked me to do it in
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02139.html.  I will openly
admit that at the time I was planning to silently ignore it and hope to
get away with it because I shared your caution.  (Now I see I forgot
about the second request which I did not want to ignore and will adjust
the testcase accordingly.)

Second is the testcase I described in my previous email.  When I saw

  FAIL: gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4

At all optimization levels, I grumbled about Jakub being right again and
duly decided to bite the bullet and do what he asked me to because it
fixes the issue.  But if you allow me to XFAIL the guality test, I will
happily remove the whole padding detection, I don't really like it
either.

The debug information is apparently lost because a[0] is never used from
that point on, as opposed to a[1] and a[2] for which the guality test
still passes.

>
> Do we check anywhere when seeing an aggregate copy that lhs and rhs
> have matching TYPE_MAIN_VARIANT?

No.  I can do some due diligence measurements but I think it would not
hurt to add it.

> GIMPLE allows matching TYPE_CANONICAL
> which can have mismatched fields (LTO, cross-language) or even any
> type if TYPE_STRUCTURAL_EQUALITY.  That is, what happens if we mark
> two distinct enough decls for total scalarization that are copied
> to each other?  Do we end up re-materializing them if their
> "sturcture" doesn't match when transforming the assignment?

Basically yes, although we try to do the rematerialization only on the
RHS or LHS.  In more detail, if an assignment is not considered fragile
by the condition

  if (modify_this_stmt
      || gimple_has_volatile_ops (stmt)
      || contains_vce_or_bfcref_p (rhs)
      || contains_vce_or_bfcref_p (lhs)
      || stmt_ends_bb_p (stmt))

the following happens:

  1) if each replacement of the LHS has a match on the RHS in terms of
     offset and size, scalar assignment between them is performed,
     possibly with a VIEW_CONVERT_EXPR if types do not match.  

  2) If for some LHS replacement that is not the case, then we flush all
     RHS replacements either to the LHS (hoping to make the original RHS
     redundant) if the RHS replacements contain all data there is in
     RHS, or to RHS otherwise, and load the LHS replacement from the LHS
     or RHS that we picked (really the same one, we do not rely on the
     original assignment to carry the data).

  3) Same flushing happens when some part of LHS is not scalarized,
     unless we know it is never read, then we don't care.

  Generally, if we can prove that we do not loose any data, the original
  statement is deleted.
  
But let me emphasize again that whenever correctness is the issue, the
question whether an SRA recorded access comes from total scalarization
or not is not important.  Users accessing the data in some other part of
the function can create them too.  Users equipped with placement new can
create all sorts of weird type accesses and because tree-sra is flow
insensitive, they can then force scalarization to such replacements even
at places where the data have wildly different types.

Martin
Richard Biener Dec. 7, 2017, 10:56 a.m. UTC | #5
On Thu, 7 Dec 2017, Martin Jambor wrote:

> Hi,
> 
> On Wed, Dec 06 2017, Richard Biener wrote:
> > On Wed, 6 Dec 2017, Martin Jambor wrote:
> 
> >> ...
> 
> >> Second is the testcase I described in my previous email.  When I saw
> >> 
> >>   FAIL: gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
> >> 
> >> At all optimization levels, I grumbled about Jakub being right again and
> >> duly decided to bite the bullet and do what he asked me to because it
> >> fixes the issue.  But if you allow me to XFAIL the guality test, I will
> >> happily remove the whole padding detection, I don't really like it
> >> either.
> >> 
> >> The debug information is apparently lost because a[0] is never used from
> >> that point on, as opposed to a[1] and a[2] for which the guality test
> >> still passes.
> >
> > XFAILing that is fine I think.
> >
> 
> Great, the updated and re-tested patch is below.  The only problem is
> that I did not figure out how to XFAIL a dg-final test only for
> optimized runs and so it now XPASSes at -O0.  Alternatively I can make
> a[0] not dead in the test, but that would hide the new regression which
> seems worse.

Works for me.  One could duplicate the test and dg-skip-if one for -O0
and one for anything besides -O0 ...

> >> ...
> 
> >> But let me emphasize again that whenever correctness is the issue, the
> >> question whether an SRA recorded access comes from total scalarization
> >> or not is not important.  Users accessing the data in some other part of
> >> the function can create them too.  Users equipped with placement new can
> >> create all sorts of weird type accesses and because tree-sra is flow
> >> insensitive, they can then force scalarization to such replacements even
> >> at places where the data have wildly different types.
> >
> > Yes, but SRA will never create loads or stores for the non-total
> > scalarization case it will only choose one (better non-FP if not
> > all accesses are FP - I think compare_access_positions guarantees that) 
> > scalar register for each replacement, right?
> 
> Yes.  My point was just that with placement new, the same aggregate decl
> can contain wildly different data in two different places of a function,
> and SRA might pick some from the first place and use it in the other.
> Thus, any testcase that miscompiles with total scalarization can be
> extended to one that miscompiles without it.

I doubt that - do you have something specific in mind?
I think placement-new also emits a CLOBBER, how does
SRA treat that at the moment?

> >
> > Basically it will replace _all_ accesses of a memory piece with
> > a register instead, making that memory piece unused?
> 
> Yes.  By the way, given that we are about to consider assignments with
> type-changing MEM_REFs fragile and will not delete them, the aggregate
> will not go away and that is why I added back the bit setting to
> cannot_scalarize_away bitmap.  After all, that is exactly what the
> bitmap is for, don't bother totally scalarizing, the aggregate will not
> disappear.

Good.

> Below is the updated and quite a bit simpler patch, which has passed
> bootstrap and testing on x86_64-linux (but suffers from the XFAILs and
> XPASSes dewscribed above).

Ok.

Thanks,
Richard.

> Martin
> 
> 
> 2017-12-06  Martin Jambor  <mjambor@suse.cz>
> 
> 	PR tree-optimization/83141
> 	* tree-sra.c (contains_vce_or_bfcref_p): Move up in the file, also
> 	test for MEM_REFs implicitely changing types with padding.  Remove
> 	inline keyword.
> 	(build_accesses_from_assign): Added contains_vce_or_bfcref_p checks.
> 
> testsuite/
> 	* gcc.dg/tree-ssa/pr83141.c: New test.
> 	* gcc.dg/guality/pr54970.c: XFAIL tests querying a[0].
> ---
>  gcc/testsuite/gcc.dg/guality/pr54970.c  | 10 +++---
>  gcc/testsuite/gcc.dg/tree-ssa/pr83141.c | 37 ++++++++++++++++++++++
>  gcc/tree-sra.c                          | 54 +++++++++++++++++++++------------
>  3 files changed, 77 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> 
> diff --git a/gcc/testsuite/gcc.dg/guality/pr54970.c b/gcc/testsuite/gcc.dg/guality/pr54970.c
> index a9b8c064d8b..1819d023e21 100644
> --- a/gcc/testsuite/gcc.dg/guality/pr54970.c
> +++ b/gcc/testsuite/gcc.dg/guality/pr54970.c
> @@ -24,23 +24,23 @@ main ()
>  				/* { dg-final { gdb-test 25 "*p" "13" } } */
>    asm volatile (NOP);		/* { dg-final { gdb-test 25 "*q" "12" } } */
>    __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
> -				/* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
> +				/* { dg-final { gdb-test 31 "a\[0\]" "4" { xfail { *-*-* } } } } */
>  				/* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
>  				/* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
>  				/* { dg-final { gdb-test 31 "*p" "6" } } */
>    asm volatile (NOP);		/* { dg-final { gdb-test 31 "*q" "5" } } */
> -  *p += 20;			/* { dg-final { gdb-test 36 "a\[0\]" "4" } } */
> +  *p += 20;			/* { dg-final { gdb-test 36 "a\[0\]" "4" { xfail { *-*-* } } } } */
>  				/* { dg-final { gdb-test 36 "a\[1\]" "5" } } */
>  				/* { dg-final { gdb-test 36 "a\[2\]" "26" } } */
>  				/* { dg-final { gdb-test 36 "*p" "26" } } */
>    asm volatile (NOP);		/* { dg-final { gdb-test 36 "*q" "5" } } */
> -  *q += 20;			/* { dg-final { gdb-test 45 "a\[0\]" "4" } } */
> +  *q += 20;			/* { dg-final { gdb-test 45 "a\[0\]" "4" { xfail { *-*-* } } } } */
>  				/* { dg-final { gdb-test 45 "a\[1\]" "25" } } */
>  				/* { dg-final { gdb-test 45 "a\[2\]" "26" } } */
>  				/* { dg-final { gdb-test 45 "*p" "26" } } */
>  				/* { dg-final { gdb-test 45 "p\[-1\]" "25" } } */
> -				/* { dg-final { gdb-test 45 "p\[-2\]" "4" } } */
> -				/* { dg-final { gdb-test 45 "q\[-1\]" "4" } } */
> +				/* { dg-final { gdb-test 45 "p\[-2\]" "4" { xfail { *-*-* } } } } */
> +				/* { dg-final { gdb-test 45 "q\[-1\]" "4" { xfail { *-*-* } } } } */
>  				/* { dg-final { gdb-test 45 "q\[1\]" "26" } } */
>    asm volatile (NOP);		/* { dg-final { gdb-test 45 "*q" "25" } } */
>    return 0;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> new file mode 100644
> index 00000000000..73ea45c613c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> @@ -0,0 +1,37 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -fdump-tree-esra-details" } */
> +
> +volatile short vs;
> +volatile long vl;
> +
> +struct A { short s; long i; long j; };
> +struct A a, b;
> +void foo ()
> +{
> +  struct A c;
> +  __builtin_memcpy (&c, &b, sizeof (struct A));
> +  __builtin_memcpy (&a, &c, sizeof (struct A));
> +
> +  vs = c.s;
> +  vl = c.i;
> +  vl = c.j;
> +}
> +
> +
> +int main()
> +{
> +  if ((sizeof (short) != 2)
> +      || (__builtin_offsetof (struct A, i) < 4))
> +    return 0;
> +
> +  __builtin_memset (&b, 0, sizeof (struct A));
> +  b.s = 1;
> +  __builtin_memcpy ((char *)&b+2, &b, 2);
> +  foo ();
> +  __builtin_memcpy (&a, (char *)&a+2, 2);
> +  if (a.s != 1)
> +    __builtin_abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 866cff0edb0..54f1c8d54d5 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1141,6 +1141,33 @@ contains_view_convert_expr_p (const_tree ref)
>    return false;
>  }
>  
> +/* Return true if REF contains a VIEW_CONVERT_EXPR or a MEM_REF that performs
> +   type conversion or a COMPONENT_REF with a bit-field field declaration.  */
> +
> +static bool
> +contains_vce_or_bfcref_p (const_tree ref)
> +{
> +  while (handled_component_p (ref))
> +    {
> +      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
> +	  || (TREE_CODE (ref) == COMPONENT_REF
> +	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
> +	return true;
> +      ref = TREE_OPERAND (ref, 0);
> +    }
> +
> +  if (TREE_CODE (ref) != MEM_REF
> +      || TREE_CODE (TREE_OPERAND (ref, 0)) != ADDR_EXPR)
> +    return false;
> +
> +  tree mem = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
> +  if (TYPE_MAIN_VARIANT (TREE_TYPE (ref))
> +      != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
> +    return true;
> +
> +  return false;
> +}
> +
>  /* Search the given tree for a declaration by skipping handled components and
>     exclude it from the candidates.  */
>  
> @@ -1339,7 +1366,14 @@ build_accesses_from_assign (gimple *stmt)
>        racc->grp_assignment_read = 1;
>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
>  	  && !is_gimple_reg_type (racc->type))
> -	bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
> +	{
> +	  if (contains_vce_or_bfcref_p (rhs))
> +	    bitmap_set_bit (cannot_scalarize_away_bitmap,
> +			    DECL_UID (racc->base));
> +	  else
> +	    bitmap_set_bit (should_scalarize_away_bitmap,
> +			    DECL_UID (racc->base));
> +	}
>        if (storage_order_barrier_p (lhs))
>  	racc->grp_unscalarizable_region = 1;
>      }
> @@ -3416,24 +3450,6 @@ get_repl_default_def_ssa_name (struct access *racc)
>    return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
>  }
>  
> -/* Return true if REF has an VIEW_CONVERT_EXPR or a COMPONENT_REF with a
> -   bit-field field declaration somewhere in it.  */
> -
> -static inline bool
> -contains_vce_or_bfcref_p (const_tree ref)
> -{
> -  while (handled_component_p (ref))
> -    {
> -      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
> -	  || (TREE_CODE (ref) == COMPONENT_REF
> -	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
> -	return true;
> -      ref = TREE_OPERAND (ref, 0);
> -    }
> -
> -  return false;
> -}
> -
>  /* Examine both sides of the assignment statement pointed to by STMT, replace
>     them with a scalare replacement if there is one and generate copying of
>     replacements if scalarized aggregates have been used in the assignment.  GSI
>
H.J. Lu Aug. 1, 2018, 1:29 a.m. UTC | #6
On Tue, Dec 5, 2017 at 4:00 AM, Martin Jambor <mjambor@suse.cz> wrote:
> On Tue, Dec 05 2017, Martin Jambor wrote:
>> On Tue, Dec 05 2017, Martin Jambor wrote:
>> Hi,
>>
>>> Hi,
>>>
>>> this is a followup to Richi's
>>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
>>> 83141.  The basic idea is simple, be just as conservative about type
>>> changing MEM_REFs as we are about actual VCEs.
>>>
>>> I have checked how that would affect compilation of SPEC 2006 and (non
>>> LTO) Mozilla Firefox and am happy to report that the difference was
>>> tiny.  However, I had to make the test less strict, otherwise testcase
>>> gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
>>> and expects us to track values accross:
>>>
>>>   int a[] = { 1, 2, 3 };
>>>   /* ... */
>>>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
>>>                              /* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
>>>                              /* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
>>>                              /* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
>>>
>>> SRA is able to load replacement of a[0] directly from the temporary
>>> array which is apparently necessary to generate proper debug info.  I
>>> have therefore allowed the current transformation to go forward if the
>>> source does not contain any padding or if it is a read-only declaration.
>>
>> Ah, the read-only test is of course bogus, it was a last minute addition
>> when I was apparently already too tired to think it through.  Please
>> disregard that line in the patch (it has passed bootstrap and testing
>> without it).
>>
>> Sorry for the noise,
>>
>> Martin
>>
>
> And for the record, below is the actual patch, after a fresh round of
> re-testing to double check I did not mess up anything else.  As before,
> I'd like to ask for review, especially of the type_contains_padding_p
> predicate and then would like to commit it to trunk.
>
> Thanks,
>
> Martin
>
>
> 2017-12-05  Martin Jambor  <mjambor@suse.cz>
>
>         PR tree-optimization/83141
>         * tree-sra.c (type_contains_padding_p): New function.
>         (contains_vce_or_bfcref_p): Move up in the file, also test for
>         MEM_REFs implicitely changing types with padding.  Remove inline
>         keyword.
>         (build_accesses_from_assign): Check contains_vce_or_bfcref_p
>         before setting bit in should_scalarize_away_bitmap.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86763

H.J.
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
new file mode 100644
index 00000000000..86caf66025b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
@@ -0,0 +1,31 @@ 
+/* { dg-do run } */
+/* { dg-options "-O -fdump-tree-esra-details" } */
+
+volatile short vs;
+volatile long vl;
+
+struct A { short s; long i; long j; };
+struct A a, b;
+void foo ()
+{
+  struct A c;
+  __builtin_memcpy (&c, &b, sizeof (struct A));
+  __builtin_memcpy (&a, &c, sizeof (struct A));
+
+  vs = c.s;
+  vl = c.i;
+  vl = c.j;
+}
+int main()
+{
+  __builtin_memset (&b, 0, sizeof (struct A));
+  b.s = 1;
+  __builtin_memcpy ((char *)&b+2, &b, 2);
+  foo ();
+  __builtin_memcpy (&a, (char *)&a+2, 2);
+  if (a.s != 1)
+    __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 866cff0edb0..0a9622e77f8 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1141,6 +1141,101 @@  contains_view_convert_expr_p (const_tree ref)
   return false;
 }
 
+/* Return false if we can determine that TYPE has no padding, otherwise return
+   true.  */
+
+static bool
+type_contains_padding_p (tree type)
+{
+  if (is_gimple_reg_type (type))
+    return false;
+
+  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
+    return true;
+
+  switch (TREE_CODE (type))
+    {
+    case RECORD_TYPE:
+      {
+	unsigned HOST_WIDE_INT pos = 0;
+	for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
+	  if (TREE_CODE (fld) == FIELD_DECL)
+	    {
+	      tree ft = TREE_TYPE (fld);
+
+	      if (DECL_BIT_FIELD (fld)
+		  || DECL_PADDING_P (fld)
+		  || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
+		return true;
+
+	      tree t = bit_position (fld);
+	      if (!tree_fits_uhwi_p (t))
+		return true;
+	      unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
+	      if (pos != bp)
+		return true;
+
+	      pos += tree_to_uhwi (TYPE_SIZE (ft));
+	      if (type_contains_padding_p (ft))
+		return true;
+	    }
+
+	if (pos != tree_to_uhwi (TYPE_SIZE (type)))
+	  return true;
+
+	return false;
+      }
+
+    case ARRAY_TYPE:
+      return type_contains_padding_p (TYPE_SIZE (type));
+
+    default:
+      return true;
+    }
+  gcc_unreachable ();
+}
+
+/* Return true if REF contains a MEM_REF that performs type conversion from a
+   type with padding or any VIEW_CONVERT_EXPR or a COMPONENT_REF with a
+   bit-field field declaration.  */
+
+static bool
+contains_vce_or_bfcref_p (const_tree ref)
+{
+  while (true)
+    {
+      if (TREE_CODE (ref) == MEM_REF)
+	{
+	  tree op0 = TREE_OPERAND (ref, 0);
+	  if (TREE_CODE (op0) == ADDR_EXPR)
+	    {
+	      tree mem = TREE_OPERAND (op0, 0);
+	      if ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
+		   != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
+		  && !((DECL_P (mem) && TREE_READONLY (mem))
+		       || type_contains_padding_p (TREE_TYPE (mem))))
+		return true;
+
+	      ref = mem;
+	      continue;
+	    }
+	  else
+	    return false;
+	}
+
+      if (!handled_component_p (ref))
+	return false;
+
+      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
+	  || (TREE_CODE (ref) == COMPONENT_REF
+	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
+	return true;
+      ref = TREE_OPERAND (ref, 0);
+    }
+
+  return false;
+}
+
 /* Search the given tree for a declaration by skipping handled components and
    exclude it from the candidates.  */
 
@@ -1338,7 +1433,8 @@  build_accesses_from_assign (gimple *stmt)
     {
       racc->grp_assignment_read = 1;
       if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
-	  && !is_gimple_reg_type (racc->type))
+	  && !is_gimple_reg_type (racc->type)
+	  && !contains_vce_or_bfcref_p (rhs))
 	bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
       if (storage_order_barrier_p (lhs))
 	racc->grp_unscalarizable_region = 1;
@@ -3416,24 +3512,6 @@  get_repl_default_def_ssa_name (struct access *racc)
   return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
 }
 
-/* Return true if REF has an VIEW_CONVERT_EXPR or a COMPONENT_REF with a
-   bit-field field declaration somewhere in it.  */
-
-static inline bool
-contains_vce_or_bfcref_p (const_tree ref)
-{
-  while (handled_component_p (ref))
-    {
-      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
-	  || (TREE_CODE (ref) == COMPONENT_REF
-	      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
-	return true;
-      ref = TREE_OPERAND (ref, 0);
-    }
-
-  return false;
-}
-
 /* Examine both sides of the assignment statement pointed to by STMT, replace
    them with a scalare replacement if there is one and generate copying of
    replacements if scalarized aggregates have been used in the assignment.  GSI