diff mbox series

store-merging: Fix coalesce_immediate_stores [PR93820]

Message ID 20200225233325.GP2155@tucnak
State New
Headers show
Series store-merging: Fix coalesce_immediate_stores [PR93820] | expand

Commit Message

Jakub Jelinek Feb. 25, 2020, 11:33 p.m. UTC
Hi!

The following testcase is miscompiled in 8+.
The problem is that check_no_overlap has a special case for INTEGER_CST
marked stores (i.e. stores of constants), if both all currenly merged stores
and the one under consideration for merging with them are marked that way,
it anticipates that other INTEGER_CST marked stores that overlap with those
and precede those (have smaller info->order) could be merged with those and
doesn't punt for them.
In PR86844 and PR87859 fixes I've then added quite large code that is
performed after check_no_overlap and tries to find out if we need and can
merge further INTEGER_CST marked stores, or need to punt.
Unfortunately, that code is there only in the overlapping case code and
the testcase below shows that we really need it even in the adjacent store
case.  After sort_by_bitpos we have:
bitpos	width	order	rhs_code
96	32	3	INTEGER_CST
128	32	1	INTEGER_CST
128	128	2	INTEGER_CST
192	32	0	MEM_REF
Because of the missing PR86844/PR87859-ish code in the adjacent store
case, we merge the adjacent (memory wise) stores 96/32/3 and 128/32/1,
and then we consider the 128-bit store which is in program-order in between
them, but in this case we punt, because the merging would extend the
merged store region from bitpos 96 and 64-bits to bitpos 96 and 160-bits
and that has an overlap with an incompatible store (the MEM_REF one).
The problem is that we can't really punt this way, because the 128-bit
store is in between those two we've merged already, so either we manage
to merge even that one together with the others, or would need to avoid
already merging the 96/32/3 and 128/32/1 stores together.
Now, rather than copying around the PR86844/PR87859 code to the other spot,
we can actually just use the overlapping code, merge_overlapping is really
a superset of merge_into, so that is what the patch does.  If doing
adjacent store merge for rhs_code other than INTEGER_CST, I believe the
current code is already fine, check_no_overlap in that case doesn't make
the exception and will punt if there is some earlier (smaller order)
non-mergeable overlapping store.  There is just one case that could be
problematic, if the merged_store has BIT_INSERT_EXPRs in them and the
new store is a constant store (INTEGER_CST rhs_code), then check_no_overlap
would do the exception and still would allow the special case.  But we
really shouldn't have the special case in that case, so this patch also
changes check_no_overlap to just have a bool whether we should have the
special case or not.

Bootstrapped/regtested on x86_64-linux and i686-linux on the trunk as well
as coprresponding backport to 8 branch, ok for trunk and release branches?

Note, as I said in the PR, for GCC11 we could consider performing some kind
of cheap DSE during the store merging (perhaps guarded with flag_tree_dse).
And another thing to consider is only consider as problematic non-mergeable
stores that not only have order smaller than last_order as currently, but
also have order larger than first_order, as in this testcase if we actually
ignored (not merged with anything at all) the 192/32/0 store, because it is
not in between the other stores we'd merge, it would be fine to merge the
other 3 stores, though of course the testcase can be easily adjusted by
putting the 192/32 store after the 128/32 store and then this patch would be
still needed.  Though, I think I'd need more time thinking this over.

2020-02-25  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/93820
	* gimple-ssa-store-merging.c (check_no_overlap): Change RHS_CODE
	argument to ALL_INTEGER_CST_P boolean.
	(imm_store_chain_info::try_coalesce_bswap): Adjust caller.
	(imm_store_chain_info::coalesce_immediate_stores): Likewise.  Handle
	adjacent INTEGER_CST store into merged_store->only_constants like
	overlapping one.

	* gcc.dg/pr93820.c: New test.


	Jakub

Comments

Richard Biener Feb. 26, 2020, 8:09 a.m. UTC | #1
On Wed, 26 Feb 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled in 8+.
> The problem is that check_no_overlap has a special case for INTEGER_CST
> marked stores (i.e. stores of constants), if both all currenly merged stores
> and the one under consideration for merging with them are marked that way,
> it anticipates that other INTEGER_CST marked stores that overlap with those
> and precede those (have smaller info->order) could be merged with those and
> doesn't punt for them.
> In PR86844 and PR87859 fixes I've then added quite large code that is
> performed after check_no_overlap and tries to find out if we need and can
> merge further INTEGER_CST marked stores, or need to punt.
> Unfortunately, that code is there only in the overlapping case code and
> the testcase below shows that we really need it even in the adjacent store
> case.  After sort_by_bitpos we have:
> bitpos	width	order	rhs_code
> 96	32	3	INTEGER_CST
> 128	32	1	INTEGER_CST
> 128	128	2	INTEGER_CST
> 192	32	0	MEM_REF
> Because of the missing PR86844/PR87859-ish code in the adjacent store
> case, we merge the adjacent (memory wise) stores 96/32/3 and 128/32/1,
> and then we consider the 128-bit store which is in program-order in between
> them, but in this case we punt, because the merging would extend the
> merged store region from bitpos 96 and 64-bits to bitpos 96 and 160-bits
> and that has an overlap with an incompatible store (the MEM_REF one).
> The problem is that we can't really punt this way, because the 128-bit
> store is in between those two we've merged already, so either we manage
> to merge even that one together with the others, or would need to avoid
> already merging the 96/32/3 and 128/32/1 stores together.
> Now, rather than copying around the PR86844/PR87859 code to the other spot,
> we can actually just use the overlapping code, merge_overlapping is really
> a superset of merge_into, so that is what the patch does.  If doing
> adjacent store merge for rhs_code other than INTEGER_CST, I believe the
> current code is already fine, check_no_overlap in that case doesn't make
> the exception and will punt if there is some earlier (smaller order)
> non-mergeable overlapping store.  There is just one case that could be
> problematic, if the merged_store has BIT_INSERT_EXPRs in them and the
> new store is a constant store (INTEGER_CST rhs_code), then check_no_overlap
> would do the exception and still would allow the special case.  But we
> really shouldn't have the special case in that case, so this patch also
> changes check_no_overlap to just have a bool whether we should have the
> special case or not.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux on the trunk as well
> as coprresponding backport to 8 branch, ok for trunk and release branches?

OK.

> Note, as I said in the PR, for GCC11 we could consider performing some kind
> of cheap DSE during the store merging (perhaps guarded with flag_tree_dse).
> And another thing to consider is only consider as problematic non-mergeable
> stores that not only have order smaller than last_order as currently, but
> also have order larger than first_order, as in this testcase if we actually
> ignored (not merged with anything at all) the 192/32/0 store, because it is
> not in between the other stores we'd merge, it would be fine to merge the
> other 3 stores, though of course the testcase can be easily adjusted by
> putting the 192/32 store after the 128/32 store and then this patch would be
> still needed.  Though, I think I'd need more time thinking this over.

When looking at the PR I thought that the issue must be with the sorting
since if there's no intermediate store breaking the chain we should be
able to "merge" all INTEGER_CSTs by simply native encoding them in
program order?  (similar to what VN does with partial defs)

Richard.

> 2020-02-25  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/93820
> 	* gimple-ssa-store-merging.c (check_no_overlap): Change RHS_CODE
> 	argument to ALL_INTEGER_CST_P boolean.
> 	(imm_store_chain_info::try_coalesce_bswap): Adjust caller.
> 	(imm_store_chain_info::coalesce_immediate_stores): Likewise.  Handle
> 	adjacent INTEGER_CST store into merged_store->only_constants like
> 	overlapping one.
> 
> 	* gcc.dg/pr93820.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.c.jj	2020-02-13 10:03:39.000000000 +0100
> +++ gcc/gimple-ssa-store-merging.c	2020-02-25 18:47:40.303092315 +0100
> @@ -2370,8 +2370,9 @@ gather_bswap_load_refs (vec<tree> *refs,
>  /* Check if there are any stores in M_STORE_INFO after index I
>     (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap
>     a potential group ending with END that have their order
> -   smaller than LAST_ORDER.  RHS_CODE is the kind of store in the
> -   group.  Return true if there are no such stores.
> +   smaller than LAST_ORDER.  ALL_INTEGER_CST_P is true if
> +   all the stores already merged and the one under consideration
> +   have rhs_code of INTEGER_CST.  Return true if there are no such stores.
>     Consider:
>       MEM[(long long int *)p_28] = 0;
>       MEM[(long long int *)p_28 + 8B] = 0;
> @@ -2394,13 +2395,13 @@ gather_bswap_load_refs (vec<tree> *refs,
>     the MEM[(long long int *)p_28 + 8B] = 0; would now be before it,
>     so we need to refuse merging MEM[(long long int *)p_28 + 8B] = 0;
>     into the group.  That way it will be its own store group and will
> -   not be touched.  If RHS_CODE is INTEGER_CST and there are overlapping
> +   not be touched.  If ALL_INTEGER_CST_P and there are overlapping
>     INTEGER_CST stores, those are mergeable using merge_overlapping,
>     so don't return false for those.  */
>  
>  static bool
>  check_no_overlap (vec<store_immediate_info *> m_store_info, unsigned int i,
> -		  enum tree_code rhs_code, unsigned int last_order,
> +		  bool all_integer_cst_p, unsigned int last_order,
>  		  unsigned HOST_WIDE_INT end)
>  {
>    unsigned int len = m_store_info.length ();
> @@ -2410,7 +2411,7 @@ check_no_overlap (vec<store_immediate_in
>        if (info->bitpos >= end)
>  	break;
>        if (info->order < last_order
> -	  && (rhs_code != INTEGER_CST || info->rhs_code != INTEGER_CST))
> +	  && (!all_integer_cst_p || info->rhs_code != INTEGER_CST))
>  	return false;
>      }
>    return true;
> @@ -2563,7 +2564,7 @@ imm_store_chain_info::try_coalesce_bswap
>    if (n.base_addr == NULL_TREE && !is_gimple_val (n.src))
>      return false;
>  
> -  if (!check_no_overlap (m_store_info, last, LROTATE_EXPR, last_order, end))
> +  if (!check_no_overlap (m_store_info, last, false, last_order, end))
>      return false;
>  
>    /* Don't handle memory copy this way if normal non-bswap processing
> @@ -2713,7 +2714,14 @@ imm_store_chain_info::coalesce_immediate
>  	       |---store 2---|
>  	 Overlapping stores.  */
>        else if (IN_RANGE (info->bitpos, merged_store->start,
> -			 merged_store->start + merged_store->width - 1))
> +			 merged_store->start + merged_store->width - 1)
> +	       /* |---store 1---||---store 2---|
> +		  Handle also the consecutive INTEGER_CST stores case here,
> +		  as we have here the code to deal with overlaps.  */
> +	       || (info->bitregion_start <= merged_store->bitregion_end
> +		   && info->rhs_code == INTEGER_CST
> +		   && merged_store->only_constants
> +		   && merged_store->can_be_merged_into (info)))
>  	{
>  	  /* Only allow overlapping stores of constants.  */
>  	  if (info->rhs_code == INTEGER_CST
> @@ -2725,8 +2733,7 @@ imm_store_chain_info::coalesce_immediate
>  	      unsigned HOST_WIDE_INT end
>  		= MAX (merged_store->start + merged_store->width,
>  		       info->bitpos + info->bitsize);
> -	      if (check_no_overlap (m_store_info, i, INTEGER_CST,
> -				    last_order, end))
> +	      if (check_no_overlap (m_store_info, i, true, last_order, end))
>  		{
>  		  /* check_no_overlap call above made sure there are no
>  		     overlapping stores with non-INTEGER_CST rhs_code
> @@ -2879,7 +2886,7 @@ imm_store_chain_info::coalesce_immediate
>  	      std::swap (info->ops[0], info->ops[1]);
>  	      info->ops_swapped_p = true;
>  	    }
> -	  if (check_no_overlap (m_store_info, i, info->rhs_code,
> +	  if (check_no_overlap (m_store_info, i, false,
>  				MAX (merged_store->last_order, info->order),
>  				MAX (merged_store->start + merged_store->width,
>  				     info->bitpos + info->bitsize)))
> --- gcc/testsuite/gcc.dg/pr93820.c.jj	2020-02-25 18:34:01.535154614 +0100
> +++ gcc/testsuite/gcc.dg/pr93820.c	2020-02-25 14:32:31.600207611 +0100
> @@ -0,0 +1,26 @@
> +/* PR tree-optimization/93820 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-tree-dse" } */
> +
> +typedef int v4si __attribute__((vector_size(4 * sizeof (int))));
> +int a[10];
> +
> +__attribute__((noipa)) int
> +foo (int *p)
> +{
> +  a[6] = *p;
> +  a[4] = 1;
> +  *(((v4si *)&a[0]) + 1) = (v4si) { 0, 0, 0, 0 };
> +  a[3] = 0;
> +}
> +
> +int
> +main ()
> +{
> +  int i = 0;
> +  foo (&i);
> +  for (i = 0; i < 10; i++)
> +    if (a[i])
> +      __builtin_abort ();
> +  return 0;
> +}
> 
> 	Jakub
> 
>
Jakub Jelinek Feb. 26, 2020, 8:24 a.m. UTC | #2
On Wed, Feb 26, 2020 at 09:09:11AM +0100, Richard Biener wrote:
> > Note, as I said in the PR, for GCC11 we could consider performing some kind
> > of cheap DSE during the store merging (perhaps guarded with flag_tree_dse).
> > And another thing to consider is only consider as problematic non-mergeable
> > stores that not only have order smaller than last_order as currently, but
> > also have order larger than first_order, as in this testcase if we actually
> > ignored (not merged with anything at all) the 192/32/0 store, because it is
> > not in between the other stores we'd merge, it would be fine to merge the
> > other 3 stores, though of course the testcase can be easily adjusted by
> > putting the 192/32 store after the 128/32 store and then this patch would be
> > still needed.  Though, I think I'd need more time thinking this over.
> 
> When looking at the PR I thought that the issue must be with the sorting
> since if there's no intermediate store breaking the chain we should be
> able to "merge" all INTEGER_CSTs by simply native encoding them in
> program order?  (similar to what VN does with partial defs)

They are actually merged in the program order, there is
  stores.qsort (sort_by_order);
at the start of apply_stores, it is just that for the decisions what to
coalesce together it is better to process them in the bitpos order to find
out easily what is adjacent, overlapping etc.

	Jakub
diff mbox series

Patch

--- gcc/gimple-ssa-store-merging.c.jj	2020-02-13 10:03:39.000000000 +0100
+++ gcc/gimple-ssa-store-merging.c	2020-02-25 18:47:40.303092315 +0100
@@ -2370,8 +2370,9 @@  gather_bswap_load_refs (vec<tree> *refs,
 /* Check if there are any stores in M_STORE_INFO after index I
    (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap
    a potential group ending with END that have their order
-   smaller than LAST_ORDER.  RHS_CODE is the kind of store in the
-   group.  Return true if there are no such stores.
+   smaller than LAST_ORDER.  ALL_INTEGER_CST_P is true if
+   all the stores already merged and the one under consideration
+   have rhs_code of INTEGER_CST.  Return true if there are no such stores.
    Consider:
      MEM[(long long int *)p_28] = 0;
      MEM[(long long int *)p_28 + 8B] = 0;
@@ -2394,13 +2395,13 @@  gather_bswap_load_refs (vec<tree> *refs,
    the MEM[(long long int *)p_28 + 8B] = 0; would now be before it,
    so we need to refuse merging MEM[(long long int *)p_28 + 8B] = 0;
    into the group.  That way it will be its own store group and will
-   not be touched.  If RHS_CODE is INTEGER_CST and there are overlapping
+   not be touched.  If ALL_INTEGER_CST_P and there are overlapping
    INTEGER_CST stores, those are mergeable using merge_overlapping,
    so don't return false for those.  */
 
 static bool
 check_no_overlap (vec<store_immediate_info *> m_store_info, unsigned int i,
-		  enum tree_code rhs_code, unsigned int last_order,
+		  bool all_integer_cst_p, unsigned int last_order,
 		  unsigned HOST_WIDE_INT end)
 {
   unsigned int len = m_store_info.length ();
@@ -2410,7 +2411,7 @@  check_no_overlap (vec<store_immediate_in
       if (info->bitpos >= end)
 	break;
       if (info->order < last_order
-	  && (rhs_code != INTEGER_CST || info->rhs_code != INTEGER_CST))
+	  && (!all_integer_cst_p || info->rhs_code != INTEGER_CST))
 	return false;
     }
   return true;
@@ -2563,7 +2564,7 @@  imm_store_chain_info::try_coalesce_bswap
   if (n.base_addr == NULL_TREE && !is_gimple_val (n.src))
     return false;
 
-  if (!check_no_overlap (m_store_info, last, LROTATE_EXPR, last_order, end))
+  if (!check_no_overlap (m_store_info, last, false, last_order, end))
     return false;
 
   /* Don't handle memory copy this way if normal non-bswap processing
@@ -2713,7 +2714,14 @@  imm_store_chain_info::coalesce_immediate
 	       |---store 2---|
 	 Overlapping stores.  */
       else if (IN_RANGE (info->bitpos, merged_store->start,
-			 merged_store->start + merged_store->width - 1))
+			 merged_store->start + merged_store->width - 1)
+	       /* |---store 1---||---store 2---|
+		  Handle also the consecutive INTEGER_CST stores case here,
+		  as we have here the code to deal with overlaps.  */
+	       || (info->bitregion_start <= merged_store->bitregion_end
+		   && info->rhs_code == INTEGER_CST
+		   && merged_store->only_constants
+		   && merged_store->can_be_merged_into (info)))
 	{
 	  /* Only allow overlapping stores of constants.  */
 	  if (info->rhs_code == INTEGER_CST
@@ -2725,8 +2733,7 @@  imm_store_chain_info::coalesce_immediate
 	      unsigned HOST_WIDE_INT end
 		= MAX (merged_store->start + merged_store->width,
 		       info->bitpos + info->bitsize);
-	      if (check_no_overlap (m_store_info, i, INTEGER_CST,
-				    last_order, end))
+	      if (check_no_overlap (m_store_info, i, true, last_order, end))
 		{
 		  /* check_no_overlap call above made sure there are no
 		     overlapping stores with non-INTEGER_CST rhs_code
@@ -2879,7 +2886,7 @@  imm_store_chain_info::coalesce_immediate
 	      std::swap (info->ops[0], info->ops[1]);
 	      info->ops_swapped_p = true;
 	    }
-	  if (check_no_overlap (m_store_info, i, info->rhs_code,
+	  if (check_no_overlap (m_store_info, i, false,
 				MAX (merged_store->last_order, info->order),
 				MAX (merged_store->start + merged_store->width,
 				     info->bitpos + info->bitsize)))
--- gcc/testsuite/gcc.dg/pr93820.c.jj	2020-02-25 18:34:01.535154614 +0100
+++ gcc/testsuite/gcc.dg/pr93820.c	2020-02-25 14:32:31.600207611 +0100
@@ -0,0 +1,26 @@ 
+/* PR tree-optimization/93820 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-tree-dse" } */
+
+typedef int v4si __attribute__((vector_size(4 * sizeof (int))));
+int a[10];
+
+__attribute__((noipa)) int
+foo (int *p)
+{
+  a[6] = *p;
+  a[4] = 1;
+  *(((v4si *)&a[0]) + 1) = (v4si) { 0, 0, 0, 0 };
+  a[3] = 0;
+}
+
+int
+main ()
+{
+  int i = 0;
+  foo (&i);
+  for (i = 0; i < 10; i++)
+    if (a[i])
+      __builtin_abort ();
+  return 0;
+}