diff mbox series

[PR82726/PR70754,2/2] New fix by finding correct root reference in combined chains

Message ID DB5PR0801MB2742FAA88691784D0EC1331BE75D0@DB5PR0801MB2742.eurprd08.prod.outlook.com
State New
Headers show
Series [PR82726,1/2] Revert previous fixes for PR70754 and PR79663 | expand

Commit Message

Bin Cheng Nov. 3, 2017, 12:40 p.m. UTC
Hi,
As described in message of previous patch:

This patch set fixes both PRs in the opposite way: Instead of find dominance
insertion position for root reference, we resort zero-distance references of
combined chain by their position information so that new root reference must
dominate others.  This should be more efficient because we avoid function call
to stmt_dominates_stmt_p.
Bootstrap and test on x86_64 and AArch64 in patch set.  Is it OK?

Thanks,
bin
2017-11-02  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/82726
	PR tree-optimization/70754
	* tree-predcom.c (<map>, INCLUDE_ALGORITHM): New headers.
	(order_drefs_by_pos): New function.
	(combine_chains): Move code setting has_max_use_after to...
	(try_combine_chains): ...here.  New parameter.  Sort combined chains
	according to position information.
	(tree_predictive_commoning_loop): Update call to above function.
	(update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New.

gcc/testsuite
2017-11-02  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/82726
	* gcc.dg/tree-ssa/pr82726.c: New test.

Comments

Richard Biener Nov. 7, 2017, 10:53 a.m. UTC | #1
On Fri, Nov 3, 2017 at 1:40 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> As described in message of previous patch:
>
> This patch set fixes both PRs in the opposite way: Instead of find dominance
> insertion position for root reference, we resort zero-distance references of
> combined chain by their position information so that new root reference must
> dominate others.  This should be more efficient because we avoid function call
> to stmt_dominates_stmt_p.
> Bootstrap and test on x86_64 and AArch64 in patch set.  Is it OK?

+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */

you don't need avx2_runtime for -mavx2 so please instead use
{ target { x86_64-*-* i?86-*-* } }

+#include <map>
+#define INCLUDE_ALGORITHM /* std::sort */

can you please use GCCs own hash_map?  Btw...

+  /* Setup UID for all statements in dominance order.  */
+  std::map<gimple *, int> stmts_map;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      int uid = 0;
+      basic_block bb = bbs[i];
+
+      for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+          gsi_next (&bsi))
+       {
+         gimple *stmt = gsi_stmt (bsi);
+         if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt))))
+           stmts_map[stmt] = uid;

why don't you use gimple_[set_]uid ()?  Given you do a dominance check
you don't even need to do this in dominance order - usually passes just
number UIDs in all relevant BBs.  There is a helper for that as well,
renumber_gimple_stmt_uids_in_blocks which can be used on
the get_loop_body result.

+      /* Sort all ZERO distance references by position.  */
+      std::sort (&ch1->refs[0], &ch1->refs[0] + j, order_drefs_by_pos);
+

given ch1->refs is a vec you can use the new vec::qsort_block you added
instead of including algorithm and using std::sort.

Richard.

> Thanks,
> bin
> 2017-11-02  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/82726
>         PR tree-optimization/70754
>         * tree-predcom.c (<map>, INCLUDE_ALGORITHM): New headers.
>         (order_drefs_by_pos): New function.
>         (combine_chains): Move code setting has_max_use_after to...
>         (try_combine_chains): ...here.  New parameter.  Sort combined chains
>         according to position information.
>         (tree_predictive_commoning_loop): Update call to above function.
>         (update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New.
>
> gcc/testsuite
> 2017-11-02  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/82726
>         * gcc.dg/tree-ssa/pr82726.c: New test.
Bin.Cheng Nov. 10, 2017, 2:13 p.m. UTC | #2
On Tue, Nov 7, 2017 at 10:53 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Nov 3, 2017 at 1:40 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> As described in message of previous patch:
>>
>> This patch set fixes both PRs in the opposite way: Instead of find dominance
>> insertion position for root reference, we resort zero-distance references of
>> combined chain by their position information so that new root reference must
>> dominate others.  This should be more efficient because we avoid function call
>> to stmt_dominates_stmt_p.
>> Bootstrap and test on x86_64 and AArch64 in patch set.  Is it OK?
>
> +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
>
> you don't need avx2_runtime for -mavx2 so please instead use
> { target { x86_64-*-* i?86-*-* } }
>
> +#include <map>
> +#define INCLUDE_ALGORITHM /* std::sort */
>
> can you please use GCCs own hash_map?  Btw...
>
> +  /* Setup UID for all statements in dominance order.  */
> +  std::map<gimple *, int> stmts_map;
> +  basic_block *bbs = get_loop_body_in_dom_order (loop);
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      int uid = 0;
> +      basic_block bb = bbs[i];
> +
> +      for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
> +          gsi_next (&bsi))
> +       {
> +         gimple *stmt = gsi_stmt (bsi);
> +         if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt))))
> +           stmts_map[stmt] = uid;
>
> why don't you use gimple_[set_]uid ()?  Given you do a dominance check
> you don't even need to do this in dominance order - usually passes just
> number UIDs in all relevant BBs.  There is a helper for that as well,
> renumber_gimple_stmt_uids_in_blocks which can be used on
> the get_loop_body result.
Yea, I forgot gimple_[set_]uid interface when doing this.  All fixed now.
>
> +      /* Sort all ZERO distance references by position.  */
> +      std::sort (&ch1->refs[0], &ch1->refs[0] + j, order_drefs_by_pos);
> +
>
> given ch1->refs is a vec you can use the new vec::qsort_block you added
> instead of including algorithm and using std::sort.
Sorry, I haven't push that patch in.  In this updated patch, I fall
back to generic qsort so algorithm is not included.

Bootstrap and test on x86_64.  Is it OK?
Thanks,
bin

2017-11-10  Bin Cheng  <bin.cheng@arm.com>

PR tree-optimization/82726
PR tree-optimization/70754
* tree-predcom.c (order_drefs_by_pos): New function.
(combine_chains): Move code setting has_max_use_after to...
(try_combine_chains): ...here.  New parameter.  Sort combined chains
according to position information.
(tree_predictive_commoning_loop): Update call to above function.
(update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New.

gcc/testsuite
2017-11-10  Bin Cheng  <bin.cheng@arm.com>

PR tree-optimization/82726
* gcc.dg/tree-ssa/pr82726.c: New test.


>
> Richard.
>
>> Thanks,
>> bin
>> 2017-11-02  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/82726
>>         PR tree-optimization/70754
>>         * tree-predcom.c (<map>, INCLUDE_ALGORITHM): New headers.
>>         (order_drefs_by_pos): New function.
>>         (combine_chains): Move code setting has_max_use_after to...
>>         (try_combine_chains): ...here.  New parameter.  Sort combined chains
>>         according to position information.
>>         (tree_predictive_commoning_loop): Update call to above function.
>>         (update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New.
>>
>> gcc/testsuite
>> 2017-11-02  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/82726
>>         * gcc.dg/tree-ssa/pr82726.c: New test.
Bin.Cheng Nov. 10, 2017, 2:14 p.m. UTC | #3
Hmm, the patch...

Thanks,
bin

On Fri, Nov 10, 2017 at 2:13 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Nov 7, 2017 at 10:53 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Nov 3, 2017 at 1:40 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> As described in message of previous patch:
>>>
>>> This patch set fixes both PRs in the opposite way: Instead of find dominance
>>> insertion position for root reference, we resort zero-distance references of
>>> combined chain by their position information so that new root reference must
>>> dominate others.  This should be more efficient because we avoid function call
>>> to stmt_dominates_stmt_p.
>>> Bootstrap and test on x86_64 and AArch64 in patch set.  Is it OK?
>>
>> +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
>>
>> you don't need avx2_runtime for -mavx2 so please instead use
>> { target { x86_64-*-* i?86-*-* } }
>>
>> +#include <map>
>> +#define INCLUDE_ALGORITHM /* std::sort */
>>
>> can you please use GCCs own hash_map?  Btw...
>>
>> +  /* Setup UID for all statements in dominance order.  */
>> +  std::map<gimple *, int> stmts_map;
>> +  basic_block *bbs = get_loop_body_in_dom_order (loop);
>> +  for (i = 0; i < loop->num_nodes; i++)
>> +    {
>> +      int uid = 0;
>> +      basic_block bb = bbs[i];
>> +
>> +      for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
>> +          gsi_next (&bsi))
>> +       {
>> +         gimple *stmt = gsi_stmt (bsi);
>> +         if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt))))
>> +           stmts_map[stmt] = uid;
>>
>> why don't you use gimple_[set_]uid ()?  Given you do a dominance check
>> you don't even need to do this in dominance order - usually passes just
>> number UIDs in all relevant BBs.  There is a helper for that as well,
>> renumber_gimple_stmt_uids_in_blocks which can be used on
>> the get_loop_body result.
> Yea, I forgot gimple_[set_]uid interface when doing this.  All fixed now.
>>
>> +      /* Sort all ZERO distance references by position.  */
>> +      std::sort (&ch1->refs[0], &ch1->refs[0] + j, order_drefs_by_pos);
>> +
>>
>> given ch1->refs is a vec you can use the new vec::qsort_block you added
>> instead of including algorithm and using std::sort.
> Sorry, I haven't push that patch in.  In this updated patch, I fall
> back to generic qsort so algorithm is not included.
>
> Bootstrap and test on x86_64.  Is it OK?
> Thanks,
> bin
>
> 2017-11-10  Bin Cheng  <bin.cheng@arm.com>
>
> PR tree-optimization/82726
> PR tree-optimization/70754
> * tree-predcom.c (order_drefs_by_pos): New function.
> (combine_chains): Move code setting has_max_use_after to...
> (try_combine_chains): ...here.  New parameter.  Sort combined chains
> according to position information.
> (tree_predictive_commoning_loop): Update call to above function.
> (update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New.
>
> gcc/testsuite
> 2017-11-10  Bin Cheng  <bin.cheng@arm.com>
>
> PR tree-optimization/82726
> * gcc.dg/tree-ssa/pr82726.c: New test.
>
>
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>> 2017-11-02  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         PR tree-optimization/82726
>>>         PR tree-optimization/70754
>>>         * tree-predcom.c (<map>, INCLUDE_ALGORITHM): New headers.
>>>         (order_drefs_by_pos): New function.
>>>         (combine_chains): Move code setting has_max_use_after to...
>>>         (try_combine_chains): ...here.  New parameter.  Sort combined chains
>>>         according to position information.
>>>         (tree_predictive_commoning_loop): Update call to above function.
>>>         (update_pos_for_combined_chains, pcom_stmt_dominates_stmt_p): New.
>>>
>>> gcc/testsuite
>>> 2017-11-02  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         PR tree-optimization/82726
>>>         * gcc.dg/tree-ssa/pr82726.c: New test.
From f7b9b4ac78f33aee60ecd37ca515f2f8773f5561 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Wed, 1 Nov 2017 17:43:55 +0000
Subject: [PATCH 2/2] pr82726-20171110.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/pr82726.c |  26 ++++++
 gcc/tree-predcom.c                      | 158 ++++++++++++++++++++++++++++----
 2 files changed, 168 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82726.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c
new file mode 100644
index 0000000..22bc59d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param tree-reassoc-width=4" } */
+/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
+
+#define N 40
+#define M 128
+unsigned int in[N+M];
+unsigned short out[N];
+
+/* Outer-loop vectorization. */
+
+void
+foo (){
+  int i,j;
+  unsigned int diff;
+
+  for (i = 0; i < N; i++) {
+    diff = 0;
+    for (j = 0; j < M; j+=8) {
+      diff += in[j+i];
+    }
+    out[i]=(unsigned short)diff;
+  }
+
+  return;
+}
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 24d7c9c..3b6c2ea 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -1020,6 +1020,17 @@ order_drefs (const void *a, const void *b)
   return (*da)->pos - (*db)->pos;
 }
 
+/* Compares two drefs A and B by their position.  Callback for qsort.  */
+
+static int
+order_drefs_by_pos (const void *a, const void *b)
+{
+  const dref *const da = (const dref *) a;
+  const dref *const db = (const dref *) b;
+
+  return (*da)->pos - (*db)->pos;
+}
+
 /* Returns root of the CHAIN.  */
 
 static inline dref
@@ -2633,7 +2644,6 @@ combine_chains (chain_p ch1, chain_p ch2)
   bool swap = false;
   chain_p new_chain;
   unsigned i;
-  gimple *root_stmt;
   tree rslt_type = NULL_TREE;
 
   if (ch1 == ch2)
@@ -2675,31 +2685,55 @@ combine_chains (chain_p ch1, chain_p ch2)
       new_chain->refs.safe_push (nw);
     }
 
-  new_chain->has_max_use_after = false;
-  root_stmt = get_chain_root (new_chain)->stmt;
-  for (i = 1; new_chain->refs.iterate (i, &nw); i++)
-    {
-      if (nw->distance == new_chain->length
-	  && !stmt_dominates_stmt_p (nw->stmt, root_stmt))
-	{
-	  new_chain->has_max_use_after = true;
-	  break;
-	}
-    }
-
   ch1->combined = true;
   ch2->combined = true;
   return new_chain;
 }
 
-/* Try to combine the CHAINS.  */
+/* Recursively update position information of all offspring chains to ROOT
+   chain's position information.  */
+
+static void
+update_pos_for_combined_chains (chain_p root)
+{
+  chain_p ch1 = root->ch1, ch2 = root->ch2;
+  dref ref, ref1, ref2;
+  for (unsigned j = 0; (root->refs.iterate (j, &ref)
+			&& ch1->refs.iterate (j, &ref1)
+			&& ch2->refs.iterate (j, &ref2)); ++j)
+    ref1->pos = ref2->pos = ref->pos;
+
+  if (ch1->type == CT_COMBINATION)
+    update_pos_for_combined_chains (ch1);
+  if (ch2->type == CT_COMBINATION)
+    update_pos_for_combined_chains (ch2);
+}
+
+/* Returns true if statement S1 dominates statement S2.  */
+
+static bool
+pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2)
+{
+  basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
+
+  if (!bb1 || s1 == s2)
+    return true;
+
+  if (bb1 == bb2)
+    return gimple_uid (s1) < gimple_uid (s2);
+
+  return dominated_by_p (CDI_DOMINATORS, bb2, bb1);
+}
+
+/* Try to combine the CHAINS in LOOP.  */
 
 static void
-try_combine_chains (vec<chain_p> *chains)
+try_combine_chains (struct loop *loop, vec<chain_p> *chains)
 {
   unsigned i, j;
   chain_p ch1, ch2, cch;
   auto_vec<chain_p> worklist;
+  bool combined_p = false;
 
   FOR_EACH_VEC_ELT (*chains, i, ch1)
     if (chain_can_be_combined_p (ch1))
@@ -2721,6 +2755,98 @@ try_combine_chains (vec<chain_p> *chains)
 	    {
 	      worklist.safe_push (cch);
 	      chains->safe_push (cch);
+	      combined_p = true;
+	      break;
+	    }
+	}
+    }
+  if (!combined_p)
+    return;
+
+  /* Setup UID for all statements in dominance order.  */
+  basic_block *bbs = get_loop_body (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      unsigned uid = 0;
+      basic_block bb = bbs[i];
+
+      for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+	   gsi_next (&bsi))
+	{
+	  gimple *stmt = gsi_stmt (bsi);
+	  if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt))))
+	    gimple_set_uid (stmt, uid);
+	}
+
+      for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
+	   gsi_next (&bsi))
+	{
+	  gimple *stmt = gsi_stmt (bsi);
+	  if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt))
+	    gimple_set_uid (stmt, ++uid);
+	}
+    }
+  free (bbs);
+
+  /* Re-association in combined chains may generate statements different to
+     order of references of the original chain.  We need to keep references
+     of combined chain in dominance order so that all uses will be inserted
+     after definitions.  Note:
+       A) This is necessary for all combined chains.
+       B) This is only necessary for ZERO distance references because other
+	  references inherit value from loop carried PHIs.
+
+     We first update position information for all combined chains.  */
+  dref ref;
+  for (i = 0; chains->iterate (i, &ch1); ++i)
+    {
+      if (ch1->type != CT_COMBINATION || ch1->combined)
+	continue;
+
+      for (j = 0; ch1->refs.iterate (j, &ref); ++j)
+	ref->pos = gimple_uid (ref->stmt);
+
+      update_pos_for_combined_chains (ch1);
+    }
+  /* Then sort references according to newly updated position information.  */
+  for (i = 0; chains->iterate (i, &ch1); ++i)
+    {
+      if (ch1->type != CT_COMBINATION && !ch1->combined)
+	continue;
+
+      /* Find the first reference with non-ZERO distance.  */
+      if (ch1->length == 0)
+	j = ch1->refs.length();
+      else
+	{
+	  for (j = 0; ch1->refs.iterate (j, &ref); ++j)
+	    if (ref->distance != 0)
+	      break;
+	}
+
+      /* Sort all ZERO distance references by position.  */
+      qsort (&ch1->refs[0], j, sizeof (ch1->refs[0]), order_drefs_by_pos);
+
+      if (ch1->combined)
+	continue;
+
+      /* For ZERO length chain, has_max_use_after must be true since root
+	 combined stmt must dominates others.  */
+      if (ch1->length == 0)
+	{
+	  ch1->has_max_use_after = true;
+	  continue;
+	}
+      /* Check if there is use at max distance after root for combined chains
+	 and set flag accordingly.  */
+      ch1->has_max_use_after = false;
+      gimple *root_stmt = get_chain_root (ch1)->stmt;
+      for (j = 1; ch1->refs.iterate (j, &ref); ++j)
+	{
+	  if (ref->distance == ch1->length
+	      && !pcom_stmt_dominates_stmt_p (ref->stmt, root_stmt))
+	    {
+	      ch1->has_max_use_after = true;
 	      break;
 	    }
 	}
@@ -3058,7 +3184,7 @@ tree_predictive_commoning_loop (struct loop *loop)
   loop_closed_ssa = prepare_finalizers (loop, chains);
 
   /* Try to combine the chains that are always worked with together.  */
-  try_combine_chains (&chains);
+  try_combine_chains (loop, &chains);
 
   insert_init_seqs (loop, chains);
Bernhard Reutner-Fischer Nov. 11, 2017, 10:19 a.m. UTC | #4
On Fri, Nov 10, 2017 at 02:14:25PM +0000, Bin.Cheng wrote:
> Hmm, the patch...

+  /* Setup UID for all statements in dominance order.  */
+  basic_block *bbs = get_loop_body (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      unsigned uid = 0;
+      basic_block bb = bbs[i];
+
+      for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+	   gsi_next (&bsi))
+	{
+	  gimple *stmt = gsi_stmt (bsi);
+	  if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt))))
+	    gimple_set_uid (stmt, uid);
+	}
+
+      for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
+	   gsi_next (&bsi))
+	{
+	  gimple *stmt = gsi_stmt (bsi);
+	  if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt))
+	    gimple_set_uid (stmt, ++uid);
+	}

      for (gimple_stmt_iterator bsi = gsi_start_nondebug_after_labels_bb (bb);
	   !gsi_end_p (bsi);
	   gsi_next_nondebug (&bsi))
	 gimple_set_uid (gsi_stmt (bsi), ++uid);

thanks,

+    }
+  free (bbs);
Richard Biener Nov. 13, 2017, 1:20 p.m. UTC | #5
On Sat, Nov 11, 2017 at 11:19 AM, Bernhard Reutner-Fischer
<rep.dot.nop@gmail.com> wrote:
> On Fri, Nov 10, 2017 at 02:14:25PM +0000, Bin.Cheng wrote:
>> Hmm, the patch...
>
> +  /* Setup UID for all statements in dominance order.  */
> +  basic_block *bbs = get_loop_body (loop);
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      unsigned uid = 0;
> +      basic_block bb = bbs[i];
> +
> +      for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
> +          gsi_next (&bsi))
> +       {
> +         gimple *stmt = gsi_stmt (bsi);
> +         if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt))))
> +           gimple_set_uid (stmt, uid);
> +       }
> +
> +      for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
> +          gsi_next (&bsi))
> +       {
> +         gimple *stmt = gsi_stmt (bsi);
> +         if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt))
> +           gimple_set_uid (stmt, ++uid);
> +       }
>
>       for (gimple_stmt_iterator bsi = gsi_start_nondebug_after_labels_bb (bb);
>            !gsi_end_p (bsi);
>            gsi_next_nondebug (&bsi))
>          gimple_set_uid (gsi_stmt (bsi), ++uid);

Or even better instead of the whole loop

    renumber_gimple_stmt_uids_in_blocks (bbs, loop->num_nodes);

Ok with that change.

Thanks,
Richard.

> thanks,
>
> +    }
> +  free (bbs);
>
Bin.Cheng Nov. 15, 2017, 3:30 p.m. UTC | #6
On Mon, Nov 13, 2017 at 1:20 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Sat, Nov 11, 2017 at 11:19 AM, Bernhard Reutner-Fischer
> <rep.dot.nop@gmail.com> wrote:
>> On Fri, Nov 10, 2017 at 02:14:25PM +0000, Bin.Cheng wrote:
>>> Hmm, the patch...
>>
>> +  /* Setup UID for all statements in dominance order.  */
>> +  basic_block *bbs = get_loop_body (loop);
>> +  for (i = 0; i < loop->num_nodes; i++)
>> +    {
>> +      unsigned uid = 0;
>> +      basic_block bb = bbs[i];
>> +
>> +      for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
>> +          gsi_next (&bsi))
>> +       {
>> +         gimple *stmt = gsi_stmt (bsi);
>> +         if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt))))
>> +           gimple_set_uid (stmt, uid);
>> +       }
>> +
>> +      for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
>> +          gsi_next (&bsi))
>> +       {
>> +         gimple *stmt = gsi_stmt (bsi);
>> +         if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt))
>> +           gimple_set_uid (stmt, ++uid);
>> +       }
>>
>>       for (gimple_stmt_iterator bsi = gsi_start_nondebug_after_labels_bb (bb);
>>            !gsi_end_p (bsi);
>>            gsi_next_nondebug (&bsi))
>>          gimple_set_uid (gsi_stmt (bsi), ++uid);
>
> Or even better instead of the whole loop
>
>     renumber_gimple_stmt_uids_in_blocks (bbs, loop->num_nodes);
>
> Ok with that change.
Right, here is the updated patch.  Will commit it later.

Thanks,
bin
>
> Thanks,
> Richard.
>
>> thanks,
>>
>> +    }
>> +  free (bbs);
>>
From 28a21f4a86ed4e1b5a174b004c45bd4b8ede944f Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Wed, 1 Nov 2017 17:43:55 +0000
Subject: [PATCH 2/2] pr82726-20171111.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/pr82726.c |  26 ++++++
 gcc/tree-predcom.c                      | 138 ++++++++++++++++++++++++++++----
 2 files changed, 148 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82726.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c
new file mode 100644
index 0000000..22bc59d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param tree-reassoc-width=4" } */
+/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
+
+#define N 40
+#define M 128
+unsigned int in[N+M];
+unsigned short out[N];
+
+/* Outer-loop vectorization. */
+
+void
+foo (){
+  int i,j;
+  unsigned int diff;
+
+  for (i = 0; i < N; i++) {
+    diff = 0;
+    for (j = 0; j < M; j+=8) {
+      diff += in[j+i];
+    }
+    out[i]=(unsigned short)diff;
+  }
+
+  return;
+}
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 24d7c9c..28dac82 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -1020,6 +1020,17 @@ order_drefs (const void *a, const void *b)
   return (*da)->pos - (*db)->pos;
 }
 
+/* Compares two drefs A and B by their position.  Callback for qsort.  */
+
+static int
+order_drefs_by_pos (const void *a, const void *b)
+{
+  const dref *const da = (const dref *) a;
+  const dref *const db = (const dref *) b;
+
+  return (*da)->pos - (*db)->pos;
+}
+
 /* Returns root of the CHAIN.  */
 
 static inline dref
@@ -2633,7 +2644,6 @@ combine_chains (chain_p ch1, chain_p ch2)
   bool swap = false;
   chain_p new_chain;
   unsigned i;
-  gimple *root_stmt;
   tree rslt_type = NULL_TREE;
 
   if (ch1 == ch2)
@@ -2675,31 +2685,55 @@ combine_chains (chain_p ch1, chain_p ch2)
       new_chain->refs.safe_push (nw);
     }
 
-  new_chain->has_max_use_after = false;
-  root_stmt = get_chain_root (new_chain)->stmt;
-  for (i = 1; new_chain->refs.iterate (i, &nw); i++)
-    {
-      if (nw->distance == new_chain->length
-	  && !stmt_dominates_stmt_p (nw->stmt, root_stmt))
-	{
-	  new_chain->has_max_use_after = true;
-	  break;
-	}
-    }
-
   ch1->combined = true;
   ch2->combined = true;
   return new_chain;
 }
 
-/* Try to combine the CHAINS.  */
+/* Recursively update position information of all offspring chains to ROOT
+   chain's position information.  */
+
+static void
+update_pos_for_combined_chains (chain_p root)
+{
+  chain_p ch1 = root->ch1, ch2 = root->ch2;
+  dref ref, ref1, ref2;
+  for (unsigned j = 0; (root->refs.iterate (j, &ref)
+			&& ch1->refs.iterate (j, &ref1)
+			&& ch2->refs.iterate (j, &ref2)); ++j)
+    ref1->pos = ref2->pos = ref->pos;
+
+  if (ch1->type == CT_COMBINATION)
+    update_pos_for_combined_chains (ch1);
+  if (ch2->type == CT_COMBINATION)
+    update_pos_for_combined_chains (ch2);
+}
+
+/* Returns true if statement S1 dominates statement S2.  */
+
+static bool
+pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2)
+{
+  basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
+
+  if (!bb1 || s1 == s2)
+    return true;
+
+  if (bb1 == bb2)
+    return gimple_uid (s1) < gimple_uid (s2);
+
+  return dominated_by_p (CDI_DOMINATORS, bb2, bb1);
+}
+
+/* Try to combine the CHAINS in LOOP.  */
 
 static void
-try_combine_chains (vec<chain_p> *chains)
+try_combine_chains (struct loop *loop, vec<chain_p> *chains)
 {
   unsigned i, j;
   chain_p ch1, ch2, cch;
   auto_vec<chain_p> worklist;
+  bool combined_p = false;
 
   FOR_EACH_VEC_ELT (*chains, i, ch1)
     if (chain_can_be_combined_p (ch1))
@@ -2721,6 +2755,78 @@ try_combine_chains (vec<chain_p> *chains)
 	    {
 	      worklist.safe_push (cch);
 	      chains->safe_push (cch);
+	      combined_p = true;
+	      break;
+	    }
+	}
+    }
+  if (!combined_p)
+    return;
+
+  /* Setup UID for all statements in dominance order.  */
+  basic_block *bbs = get_loop_body (loop);
+  renumber_gimple_stmt_uids_in_blocks (bbs, loop->num_nodes);
+  free (bbs);
+
+  /* Re-association in combined chains may generate statements different to
+     order of references of the original chain.  We need to keep references
+     of combined chain in dominance order so that all uses will be inserted
+     after definitions.  Note:
+       A) This is necessary for all combined chains.
+       B) This is only necessary for ZERO distance references because other
+	  references inherit value from loop carried PHIs.
+
+     We first update position information for all combined chains.  */
+  dref ref;
+  for (i = 0; chains->iterate (i, &ch1); ++i)
+    {
+      if (ch1->type != CT_COMBINATION || ch1->combined)
+	continue;
+
+      for (j = 0; ch1->refs.iterate (j, &ref); ++j)
+	ref->pos = gimple_uid (ref->stmt);
+
+      update_pos_for_combined_chains (ch1);
+    }
+  /* Then sort references according to newly updated position information.  */
+  for (i = 0; chains->iterate (i, &ch1); ++i)
+    {
+      if (ch1->type != CT_COMBINATION && !ch1->combined)
+	continue;
+
+      /* Find the first reference with non-ZERO distance.  */
+      if (ch1->length == 0)
+	j = ch1->refs.length();
+      else
+	{
+	  for (j = 0; ch1->refs.iterate (j, &ref); ++j)
+	    if (ref->distance != 0)
+	      break;
+	}
+
+      /* Sort all ZERO distance references by position.  */
+      qsort (&ch1->refs[0], j, sizeof (ch1->refs[0]), order_drefs_by_pos);
+
+      if (ch1->combined)
+	continue;
+
+      /* For ZERO length chain, has_max_use_after must be true since root
+	 combined stmt must dominates others.  */
+      if (ch1->length == 0)
+	{
+	  ch1->has_max_use_after = true;
+	  continue;
+	}
+      /* Check if there is use at max distance after root for combined chains
+	 and set flag accordingly.  */
+      ch1->has_max_use_after = false;
+      gimple *root_stmt = get_chain_root (ch1)->stmt;
+      for (j = 1; ch1->refs.iterate (j, &ref); ++j)
+	{
+	  if (ref->distance == ch1->length
+	      && !pcom_stmt_dominates_stmt_p (ref->stmt, root_stmt))
+	    {
+	      ch1->has_max_use_after = true;
 	      break;
 	    }
 	}
@@ -3058,7 +3164,7 @@ tree_predictive_commoning_loop (struct loop *loop)
   loop_closed_ssa = prepare_finalizers (loop, chains);
 
   /* Try to combine the chains that are always worked with together.  */
-  try_combine_chains (&chains);
+  try_combine_chains (loop, &chains);
 
   insert_init_seqs (loop, chains);
diff mbox series

Patch

From 843cef544a46236e40063416cebc8037736ad18a Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Wed, 1 Nov 2017 17:43:55 +0000
Subject: [PATCH 2/2] pr82726-20171102.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/pr82726.c |  26 ++++++
 gcc/tree-predcom.c                      | 159 ++++++++++++++++++++++++++++----
 2 files changed, 169 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82726.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c
new file mode 100644
index 0000000..179f93a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82726.c
@@ -0,0 +1,26 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 --param tree-reassoc-width=4" } */
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
+
+#define N 40
+#define M 128
+unsigned int in[N+M];
+unsigned short out[N];
+
+/* Outer-loop vectorization. */
+
+void
+foo (){
+  int i,j;
+  unsigned int diff;
+
+  for (i = 0; i < N; i++) {
+    diff = 0;
+    for (j = 0; j < M; j+=8) {
+      diff += in[j+i];
+    }
+    out[i]=(unsigned short)diff;
+  }
+
+  return;
+}
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 24d7c9c..a243bce 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -201,6 +201,8 @@  along with GCC; see the file COPYING3.  If not see
    i * i with ii_last + 2 * i + 1), to generalize strength reduction.  */
 
 #include "config.h"
+#include <map>
+#define INCLUDE_ALGORITHM /* std::sort */
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -1020,6 +1022,14 @@  order_drefs (const void *a, const void *b)
   return (*da)->pos - (*db)->pos;
 }
 
+/* Compares two drefs A and B by their position.  Callback for std::sort.  */
+
+static bool
+order_drefs_by_pos (dref a, dref b)
+{
+  return a->pos < b->pos;
+}
+
 /* Returns root of the CHAIN.  */
 
 static inline dref
@@ -2633,7 +2643,6 @@  combine_chains (chain_p ch1, chain_p ch2)
   bool swap = false;
   chain_p new_chain;
   unsigned i;
-  gimple *root_stmt;
   tree rslt_type = NULL_TREE;
 
   if (ch1 == ch2)
@@ -2675,31 +2684,56 @@  combine_chains (chain_p ch1, chain_p ch2)
       new_chain->refs.safe_push (nw);
     }
 
-  new_chain->has_max_use_after = false;
-  root_stmt = get_chain_root (new_chain)->stmt;
-  for (i = 1; new_chain->refs.iterate (i, &nw); i++)
-    {
-      if (nw->distance == new_chain->length
-	  && !stmt_dominates_stmt_p (nw->stmt, root_stmt))
-	{
-	  new_chain->has_max_use_after = true;
-	  break;
-	}
-    }
-
   ch1->combined = true;
   ch2->combined = true;
   return new_chain;
 }
 
-/* Try to combine the CHAINS.  */
+/* Recursively update position information of all offspring chains to ROOT
+   chain's position information.  */
+
+static void
+update_pos_for_combined_chains (chain_p root)
+{
+  chain_p ch1 = root->ch1, ch2 = root->ch2;
+  dref ref, ref1, ref2;
+  for (unsigned j = 0; (root->refs.iterate (j, &ref)
+			&& ch1->refs.iterate (j, &ref1)
+			&& ch2->refs.iterate (j, &ref2)); ++j)
+    ref1->pos = ref2->pos = ref->pos;
+
+  if (ch1->type == CT_COMBINATION)
+    update_pos_for_combined_chains (ch1);
+  if (ch2->type == CT_COMBINATION)
+    update_pos_for_combined_chains (ch2);
+}
+
+/* Returns true if statement S1 dominates statement S2.  */
+
+static bool
+pcom_stmt_dominates_stmt_p (std::map<gimple *, int> &stmts_map,
+			    gimple *s1, gimple *s2)
+{
+  basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
+
+  if (!bb1 || s1 == s2)
+    return true;
+
+  if (bb1 == bb2)
+    return stmts_map[s1] < stmts_map[s2];
+
+  return dominated_by_p (CDI_DOMINATORS, bb2, bb1);
+}
+
+/* Try to combine the CHAINS in LOOP.  */
 
 static void
-try_combine_chains (vec<chain_p> *chains)
+try_combine_chains (struct loop *loop, vec<chain_p> *chains)
 {
   unsigned i, j;
   chain_p ch1, ch2, cch;
   auto_vec<chain_p> worklist;
+  bool combined_p = false;
 
   FOR_EACH_VEC_ELT (*chains, i, ch1)
     if (chain_can_be_combined_p (ch1))
@@ -2721,6 +2755,99 @@  try_combine_chains (vec<chain_p> *chains)
 	    {
 	      worklist.safe_push (cch);
 	      chains->safe_push (cch);
+	      combined_p = true;
+	      break;
+	    }
+	}
+    }
+  if (!combined_p)
+    return;
+
+  /* Setup UID for all statements in dominance order.  */
+  std::map<gimple *, int> stmts_map;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      int uid = 0;
+      basic_block bb = bbs[i];
+
+      for (gimple_stmt_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+	   gsi_next (&bsi))
+	{
+	  gimple *stmt = gsi_stmt (bsi);
+	  if (!virtual_operand_p (gimple_phi_result (as_a<gphi *> (stmt))))
+	    stmts_map[stmt] = uid;
+	}
+
+      for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
+	   gsi_next (&bsi))
+	{
+	  gimple *stmt = gsi_stmt (bsi);
+	  if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt))
+	    stmts_map[stmt] = ++uid;
+	}
+    }
+  free (bbs);
+
+  /* Re-association in combined chains may generate statements different to
+     order of references of the original chain.  We need to keep references
+     of combined chain in dominance order so that all uses will be inserted
+     after definitions.  Note:
+       A) This is necessary for all combined chains.
+       B) This is only necessary for ZERO distance references because other
+	  references inherit value from loop carried PHIs.
+
+     We first update position information for all combined chains.  */
+  dref ref;
+  for (i = 0; chains->iterate (i, &ch1); ++i)
+    {
+      if (ch1->type != CT_COMBINATION || ch1->combined)
+	continue;
+
+      for (j = 0; ch1->refs.iterate (j, &ref); ++j)
+	ref->pos = stmts_map[ref->stmt];
+
+      update_pos_for_combined_chains (ch1);
+    }
+  /* Then sort references according to newly updated position information.  */
+  for (i = 0; chains->iterate (i, &ch1); ++i)
+    {
+      if (ch1->type != CT_COMBINATION && !ch1->combined)
+	continue;
+
+      /* Find the first reference with non-ZERO distance.  */
+      if (ch1->length == 0)
+	j = ch1->refs.length();
+      else
+	{
+	  for (j = 0; ch1->refs.iterate (j, &ref); ++j)
+	    if (ref->distance != 0)
+	      break;
+	}
+
+      /* Sort all ZERO distance references by position.  */
+      std::sort (&ch1->refs[0], &ch1->refs[0] + j, order_drefs_by_pos);
+
+      if (ch1->combined)
+	continue;
+
+      /* For ZERO length chain, has_max_use_after must be true since root
+	 combined stmt must dominates others.  */
+      if (ch1->length == 0)
+	{
+	  ch1->has_max_use_after = true;
+	  continue;
+	}
+      /* Check if there is use at max distance after root for combined chains
+	 and set flag accordingly.  */
+      ch1->has_max_use_after = false;
+      gimple *root_stmt = get_chain_root (ch1)->stmt;
+      for (j = 1; ch1->refs.iterate (j, &ref); ++j)
+	{
+	  if (ref->distance == ch1->length
+	      && !pcom_stmt_dominates_stmt_p (stmts_map, ref->stmt, root_stmt))
+	    {
+	      ch1->has_max_use_after = true;
 	      break;
 	    }
 	}
@@ -3058,7 +3185,7 @@  tree_predictive_commoning_loop (struct loop *loop)
   loop_closed_ssa = prepare_finalizers (loop, chains);
 
   /* Try to combine the chains that are always worked with together.  */
-  try_combine_chains (&chains);
+  try_combine_chains (loop, &chains);
 
   insert_init_seqs (loop, chains);
 
-- 
1.9.1