diff mbox series

[committed] Backports to gcc-9

Message ID mptblpwmfyu.fsf@arm.com
State New
Headers show
Series [committed] Backports to gcc-9 | expand

Commit Message

Richard Sandiford Feb. 18, 2020, 8:56 a.m. UTC
The first patch is a backport of Prathamesh's aarch64 fix.  The other
three were approved for branches during the initial submission.

Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu. pushed.

Richard
diff mbox series

Patch

From da055b66ff0309c41b5f4d7db12e42e27123373b Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Mon, 27 Jan 2020 19:37:55 +0000
Subject: [PATCH 4/4] predcom: Fix invalid store-store commoning [PR93434]

predcom has the following code to stop one rogue load from
interfering with other store-load opportunities:

      /* If A is read and B write or vice versa and there is unsuitable
	 dependence, instead of merging both components into a component
	 that will certainly not pass suitable_component_p, just put the
	 read into bad component, perhaps at least the write together with
	 all the other data refs in it's component will be optimizable.  */

But when store-store commoning was added later, this had the effect
of ignoring loads that occur between two candidate stores.

There is code further up to handle loads and stores with unknown
dependences:

      /* Don't do store elimination if there is any unknown dependence for
	 any store data reference.  */
      if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb))
	  && (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
	      || DDR_NUM_DIST_VECTS (ddr) == 0))
	eliminate_store_p = false;

But the store-load code above skips loads for *known* dependences
if (a) the load has already been marked "bad" or (b) the data-ref
machinery knows the dependence distance, but determine_offsets
can't handle the combination.

(a) happens to be the problem in the testcase, but a different
sequence could have given (b) instead.  We have writes to individual
fields of a structure and reads from the whole structure.  Since
determine_offsets requires the types to be the same, it returns false
for each such read/write combination.

This patch records which components have had loads removed and
prevents store-store commoning for them.  It's a bit too pessimistic,
since there shouldn't be a problem if a "bad" load dominates all stores
in a component.  But (a) we can't AFAIK use pcom_stmt_dominates_stmt_p
here and (b) the handling for that case would probably need to be
removed again if we handled more exotic cases in future.

2020-02-18  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	Backport from mainline
	2020-01-28  Richard Sandiford  <richard.sandiford@arm.com>

	PR tree-optimization/93434
	* tree-predcom.c (split_data_refs_to_components): Record which
	components have had aliasing loads removed.  Prevent store-store
	commoning for all such components.

gcc/testsuite/
	PR tree-optimization/93434
	* gcc.c-torture/execute/pr93434.c: New test.
---
 gcc/testsuite/gcc.c-torture/execute/pr93434.c | 36 +++++++++++++++++++
 gcc/tree-predcom.c                            | 24 +++++++++++--
 2 files changed, 58 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr93434.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr93434.c b/gcc/testsuite/gcc.c-torture/execute/pr93434.c
new file mode 100644
index 00000000000..e786252794b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr93434.c
@@ -0,0 +1,36 @@ 
+typedef struct creal_T {
+  double re;
+  double im;
+} creal_T;
+
+#define N 16
+int main() {
+  int k;
+  int i;
+  int j;
+  creal_T t2[N];
+  double inval;
+
+  inval = 1.0;
+  for (j = 0; j < N; ++j) {
+    t2[j].re = 0;
+    t2[j].im = 0;
+  }
+
+  for (j = 0; j < N/4; j++) {
+    i = j * 4;
+    t2[i].re = inval;
+    t2[i].im = inval;
+    k = i + 3;
+    t2[k].re = inval;
+    t2[k].im = inval;
+    t2[i] = t2[k];
+    t2[k].re = inval;
+  }
+
+  for (i = 0; i < 2; ++i)
+    if (t2[i].re != !i || t2[i].im != !i)
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 8e83a715a24..dd66cdc12ca 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -767,6 +767,7 @@  split_data_refs_to_components (struct loop *loop,
   /* Don't do store elimination if loop has multiple exit edges.  */
   bool eliminate_store_p = single_exit (loop) != NULL;
   basic_block last_always_executed = last_always_executed_block (loop);
+  auto_bitmap no_store_store_comps;
 
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
@@ -838,9 +839,13 @@  split_data_refs_to_components (struct loop *loop,
       else if (DR_IS_READ (dra) && ib != bad)
 	{
 	  if (ia == bad)
-	    continue;
+	    {
+	      bitmap_set_bit (no_store_store_comps, ib);
+	      continue;
+	    }
 	  else if (!determine_offset (dra, drb, &dummy_off))
 	    {
+	      bitmap_set_bit (no_store_store_comps, ib);
 	      merge_comps (comp_father, comp_size, bad, ia);
 	      continue;
 	    }
@@ -848,9 +853,13 @@  split_data_refs_to_components (struct loop *loop,
       else if (DR_IS_READ (drb) && ia != bad)
 	{
 	  if (ib == bad)
-	    continue;
+	    {
+	      bitmap_set_bit (no_store_store_comps, ia);
+	      continue;
+	    }
 	  else if (!determine_offset (dra, drb, &dummy_off))
 	    {
+	      bitmap_set_bit (no_store_store_comps, ia);
 	      merge_comps (comp_father, comp_size, bad, ib);
 	      continue;
 	    }
@@ -908,6 +917,17 @@  split_data_refs_to_components (struct loop *loop,
       comp->refs.quick_push (dataref);
     }
 
+  if (eliminate_store_p)
+    {
+      bitmap_iterator bi;
+      EXECUTE_IF_SET_IN_BITMAP (no_store_store_comps, 0, ia, bi)
+	{
+	  ca = component_of (comp_father, ia);
+	  if (ca != bad)
+	    comps[ca]->eliminate_store_p = false;
+	}
+    }
+
   for (i = 0; i < n; i++)
     {
       comp = comps[i];
-- 
2.17.1