
Fix store merging (PR tree-optimization/84503)

Message ID 20180221222836.GS5867@tucnak

Commit Message

Jakub Jelinek Feb. 21, 2018, 10:28 p.m. UTC
Hi!

The long comment above the new check_no_overlap function below should
hopefully explain the problem.  coalesce_immediate_stores splits the
stores from the same base into groups that we then optimize
individually; for each successfully merged group we emit the new stores
at the location of the last store from that group.  And
coalesce_immediate_stores processes the stores ordered primarily by
increasing bit position and only secondarily by their original order in
the basic block, roughly as in the sketch below.
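
A minimal sketch of that ordering (illustrative only; the real
comparator is sort_by_bitpos in gimple-ssa-store-merging.c and its
details here are simplified from memory, not copied):

     /* Sketch of a qsort comparator: primary key is the bit position,
        the original statement order only breaks ties.  */
     static int
     sort_by_bitpos_sketch (const void *x, const void *y)
     {
       store_immediate_info *const *a = (store_immediate_info *const *) x;
       store_immediate_info *const *b = (store_immediate_info *const *) y;
       if ((*a)->bitpos != (*b)->bitpos)
         return (*a)->bitpos < (*b)->bitpos ? -1 : 1;
       /* Two stores in the same chain never share an order.  */
       return (*a)->order < (*b)->order ? -1 : 1;
     }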
On the following testcases, we have the unfortunate ordering of:
     MEM[(long long int *)p_28] = 0;
     MEM[(long long int *)p_28 + 8B] = 0;
     MEM[(long long int *)p_28 + 16B] = 0;
     MEM[(long long int *)p_28 + 24B] = 0;
     _129 = (int) _130;
     MEM[(int *)p_28 + 8B] = _129;
     MEM[(int *)p_28].a = -1;
We already have
     MEM[(long long int *)p_28] = 0;
     MEM[(int *)p_28].a = -1;
in the first group (which is fine) and the following 2 stores to consider
are
     MEM[(long long int *)p_28 + 8B] = 0;
and
     MEM[(int *)p_28 + 8B] = _129;
We add the first store to the group, which has the effect of emitting
the merged stores at the location of the former MEM[(int *)p_28].a = -1;
store.  Then we process MEM[(int *)p_28 + 8B] = _129; (which was handled
just as a potential bswap possibility); as it overlaps with the previous
store and can't be merged with the next one either, it becomes its own
group and thus reuses the original stmt.  The net effect is that we've
reordered the MEM[(long long int *)p_28 + 8B] = 0; store from before the
MEM[(int *)p_28 + 8B] = _129; store to after it, but those two do alias.
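
For concreteness, with 64-bit long long and 32-bit int those two stores
cover the bit ranges [64, 128) and [64, 96) relative to *p_28, so they
clearly overlap.  The generic half-open interval test is (this helper
is made up for illustration; check_no_overlap below exploits the sort
order instead of testing pairs directly):

     /* Half-open bit ranges [s1, e1) and [s2, e2) overlap iff each
        one starts before the other ends.  */
     static bool
     ranges_overlap_p (unsigned HOST_WIDE_INT s1, unsigned HOST_WIDE_INT e1,
                       unsigned HOST_WIDE_INT s2, unsigned HOST_WIDE_INT e2)
     {
       return s1 < e2 && s2 < e1;
     }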

Instead of detecting this situation late, where we could only throw away
the whole group (which would mean we couldn't merge even the
MEM[(long long int *)p_28] = 0; and MEM[(int *)p_28].a = -1; stores),
this patch checks before calling merge_into whether the store overlaps
any later store that we won't be able to emit in the same group, and if
it does, it terminates the group right away.
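
Condensed from the coalesce_immediate_stores hunk below, the new gate at
the merge_into call site is:

     if (/* ... the existing compatibility checks ...  */
         && check_no_overlap (m_store_info, i, info->rhs_code,
                              MAX (merged_store->last_order, info->order),
                              MAX (merged_store->start + merged_store->width,
                                   info->bitpos + info->bitsize)))
       merged_store->merge_into (info);
     /* Otherwise we fall through and the group is terminated here.  */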

I also had to fix the merged_store_group::merge_into width computation:
it was mistakenly just summing the bits we've added to the group, but we
really need the distance between the first bit in the group and the last
bit, including any gaps in between, but excluding gaps before the start
and after the end (still within the bitregion).
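
A worked example of the difference (my numbers, not from the PR):
assume the group starts at bit 0 and already covers [0, 64) when a
store at bits [96, 128) is merged in:

     width += info->bitsize;                       /* old: 64 + 32 == 96  */
     width = info->bitpos + info->bitsize - start; /* new: 128 - 0 == 128 */

The old sum misses the [64, 96) gap and so understates the group's span.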

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

For 7.x I think we'll need something along the lines of PR82916 instead:
while the same testcase fails there, we only handle stores whose lhs is
a constant, so the problem there is that we aren't terminating the chain
when we should on the variable store.

2018-02-21  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/84503
	* gimple-ssa-store-merging.c (merged_store_group::merge_into): Compute
	width as info->bitpos + info->bitsize - start.
	(merged_store_group::merge_overlapping): Simplify width computation.
	(check_no_overlap): New function.
	(imm_store_chain_info::try_coalesce_bswap): Compute expected
	start + width and last_order of the group, fail if check_no_overlap
	fails.
	(imm_store_chain_info::coalesce_immediate_stores): Don't merge info
	to group if check_no_overlap fails.

	* gcc.dg/pr84503-1.c: New test.
	* gcc.dg/pr84503-2.c: New test.


	Jakub

Patch

--- gcc/gimple-ssa-store-merging.c.jj	2018-01-16 09:52:26.230235131 +0100
+++ gcc/gimple-ssa-store-merging.c	2018-02-21 20:13:45.239129599 +0100
@@ -1891,12 +1891,11 @@  merged_store_group::do_merge (store_imme
 void
 merged_store_group::merge_into (store_immediate_info *info)
 {
-  unsigned HOST_WIDE_INT wid = info->bitsize;
   /* Make sure we're inserting in the position we think we're inserting.  */
   gcc_assert (info->bitpos >= start + width
 	      && info->bitregion_start <= bitregion_end);
 
-  width += wid;
+  width = info->bitpos + info->bitsize - start;
   do_merge (info);
 }
 
@@ -1909,7 +1908,7 @@  merged_store_group::merge_overlapping (s
 {
   /* If the store extends the size of the group, extend the width.  */
   if (info->bitpos + info->bitsize > start + width)
-    width += info->bitpos + info->bitsize - (start + width);
+    width = info->bitpos + info->bitsize - start;
 
   do_merge (info);
 }
@@ -2304,6 +2303,55 @@  gather_bswap_load_refs (vec<tree> *refs,
     }
 }
 
+/* Check if there are any stores in M_STORE_INFO after index I
+   (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap
+   a potential group ending with END that have their order
+   smaller than LAST_ORDER.  RHS_CODE is the kind of store in the
+   group.  Return true if there are no such stores.
+   Consider:
+     MEM[(long long int *)p_28] = 0;
+     MEM[(long long int *)p_28 + 8B] = 0;
+     MEM[(long long int *)p_28 + 16B] = 0;
+     MEM[(long long int *)p_28 + 24B] = 0;
+     _129 = (int) _130;
+     MEM[(int *)p_28 + 8B] = _129;
+     MEM[(int *)p_28].a = -1;
+   We already have
+     MEM[(long long int *)p_28] = 0;
+     MEM[(int *)p_28].a = -1;
+   stmts in the current group and need to consider if it is safe to
+   add MEM[(long long int *)p_28 + 8B] = 0; store into the same group.
+   There is an overlap between that store and the MEM[(int *)p_28 + 8B] = _129;
+   store though, so if we add the MEM[(long long int *)p_28 + 8B] = 0;
+   into the group and merging of those 3 stores is successful, merged
+   stmts will be emitted at the latest store from that group, i.e.
+   LAST_ORDER, which is the MEM[(int *)p_28].a = -1; store.
+   The MEM[(int *)p_28 + 8B] = _129; store that originally follows
+   the MEM[(long long int *)p_28 + 8B] = 0; would now be before it,
+   so we need to refuse merging MEM[(long long int *)p_28 + 8B] = 0;
+   into the group.  That way it will be its own store group and will
+   not be touched.  If RHS_CODE is INTEGER_CST and there are overlapping
+   INTEGER_CST stores, those are mergeable using merge_overlapping,
+   so don't return false for those.  */
+
+static bool
+check_no_overlap (vec<store_immediate_info *> m_store_info, unsigned int i,
+		  enum tree_code rhs_code, unsigned int last_order,
+		  unsigned HOST_WIDE_INT end)
+{
+  unsigned int len = m_store_info.length ();
+  for (++i; i < len; ++i)
+    {
+      store_immediate_info *info = m_store_info[i];
+      if (info->bitpos >= end)
+	break;
+      if (info->order < last_order
+	  && (rhs_code != INTEGER_CST || info->rhs_code != INTEGER_CST))
+	return false;
+    }
+  return true;
+}
+
 /* Return true if m_store_info[first] and at least one following store
    form a group which store try_size bitsize value which is byte swapped
    from a memory load or some value, or identity from some value.
@@ -2375,6 +2423,7 @@  imm_store_chain_info::try_coalesce_bswap
   unsigned int last_order = merged_store->last_order;
   gimple *first_stmt = merged_store->first_stmt;
   gimple *last_stmt = merged_store->last_stmt;
+  unsigned HOST_WIDE_INT end = merged_store->start + merged_store->width;
   store_immediate_info *infof = m_store_info[first];
 
   for (unsigned int i = first; i <= last; ++i)
@@ -2413,25 +2462,23 @@  imm_store_chain_info::try_coalesce_bswap
 	}
       else
 	{
-	  if (n.base_addr)
+	  if (n.base_addr && n.vuse != this_n.vuse)
 	    {
-	      if (n.vuse != this_n.vuse)
-		{
-		  if (vuse_store == 0)
-		    return false;
-		  vuse_store = 1;
-		}
-	      if (info->order > last_order)
-		{
-		  last_order = info->order;
-		  last_stmt = info->stmt;
-		}
-	      else if (info->order < first_order)
-		{
-		  first_order = info->order;
-		  first_stmt = info->stmt;
-		}
+	      if (vuse_store == 0)
+		return false;
+	      vuse_store = 1;
 	    }
+	  if (info->order > last_order)
+	    {
+	      last_order = info->order;
+	      last_stmt = info->stmt;
+	    }
+	  else if (info->order < first_order)
+	    {
+	      first_order = info->order;
+	      first_stmt = info->stmt;
+	    }
+	  end = MAX (end, info->bitpos + info->bitsize);
 
 	  ins_stmt = perform_symbolic_merge (ins_stmt, &n, info->ins_stmt,
 					     &this_n, &n);
@@ -2452,6 +2499,9 @@  imm_store_chain_info::try_coalesce_bswap
   if (n.base_addr == NULL_TREE && !is_gimple_val (n.src))
     return false;
 
+  if (!check_no_overlap (m_store_info, last, LROTATE_EXPR, last_order, end))
+    return false;
+
   /* Don't handle memory copy this way if normal non-bswap processing
      would handle it too.  */
   if (n.n == cmpnop && (unsigned) n.n_ops == last - first + 1)
@@ -2633,7 +2683,13 @@  imm_store_chain_info::coalesce_immediate
 	       : !info->ops[0].base_addr)
 	      && (infof->ops[1].base_addr
 		  ? compatible_load_p (merged_store, info, base_addr, 1)
-		  : !info->ops[1].base_addr))
+		  : !info->ops[1].base_addr)
+	      && check_no_overlap (m_store_info, i, info->rhs_code,
+				   MAX (merged_store->last_order,
+					info->order),
+				   MAX (merged_store->start
+					+ merged_store->width,
+					info->bitpos + info->bitsize)))
 	    {
 	      merged_store->merge_into (info);
 	      continue;
--- gcc/testsuite/gcc.dg/pr84503-1.c.jj	2018-02-21 20:36:33.474226039 +0100
+++ gcc/testsuite/gcc.dg/pr84503-1.c	2018-02-21 20:20:53.144165116 +0100
@@ -0,0 +1,68 @@ 
+/* PR tree-optimization/84503 */
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+typedef __SIZE_TYPE__ size_t;
+typedef __UINTPTR_TYPE__ uintptr_t;
+
+struct S { int a; unsigned short b; int c, d, e; long f, g, h; int i, j; };
+static struct S *k;
+static size_t l = 0;
+int m;
+
+static int
+bar (void)
+{
+  unsigned i;
+  int j;
+  if (k[0].c == 0)
+    {
+      ++m;
+      size_t n = l * 2;
+      struct S *o;
+      o = (struct S *) __builtin_realloc (k, sizeof (struct S) * n);
+      if (!o)
+	__builtin_exit (0);
+      k = o;
+      for (i = l; i < n; i++)
+	{
+	  void *p = (void *) &k[i];
+	  int q = 0;
+	  size_t r = sizeof (struct S);
+	  if ((((uintptr_t) p) % __alignof__ (long)) == 0
+	      && r % sizeof (long) == 0)
+	    {
+	      long __attribute__ ((may_alias)) *s = (long *) p;
+	      long *t = (long *) ((char *) s + r);
+	      while (s < t)
+		*s++ = 0;
+	    }
+	  else
+	    __builtin_memset (p, q, r);
+	  k[i].c = i + 1;
+	  k[i].a = -1;
+	}
+      k[n - 1].c = 0;
+      k[0].c = l;
+      l = n;
+    }
+  j = k[0].c;
+  k[0].c = k[j].c;
+  return j;
+}
+
+int
+main ()
+{
+  k = (struct S *) __builtin_malloc (sizeof (struct S));
+  if (!k)
+    __builtin_exit (0);
+  __builtin_memset (k, '\0', sizeof (struct S));
+  k->a = -1;
+  l = 1;
+  for (int i = 0; i < 15; ++i)
+    bar ();
+  if (m != 4)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/pr84503-2.c.jj	2018-02-21 20:36:41.874226585 +0100
+++ gcc/testsuite/gcc.dg/pr84503-2.c	2018-02-21 20:36:59.875227757 +0100
@@ -0,0 +1,5 @@ 
+/* PR tree-optimization/84503 */
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-tree-vectorize -fno-ivopts" } */
+
+#include "pr84503-1.c"