diff mbox

A new reload-rewrite pattern recognizer for GCC vectorizer.

Message ID CAK=A3=0Ay6rg7TPRp6DD1ROgth1F9JRHwYhnxCAhsbEWbip+nA@mail.gmail.com
State New
Headers show

Commit Message

Cong Hou April 25, 2014, 12:32 a.m. UTC
In this patch a new reload-rewrite pattern detector is composed to
handle the following pattern in the loop being vectorized:

   x = *p;
   ...
   y = *p;

   or

   *p = x;
   ...
   y = *p;

In both cases, *p is reloaded because there may exist other defs to
another memref that may alias with p.  However, aliasing is eliminated
with alias checks.  Then we can safely replace the last statement in
above cases by y = x.

The following rewrite pattern is also detected:

   *p = x;
   ...
   *p = y;

The first write is redundant due to the fact that there is no aliasing
between p and other pointers.  In this case we don't need to vectorize
this write.  Here we replace it with a dummy statement z = x.

Bootstrapped and tested on x86-64.

OK for trunk?


thanks,
Cong

Comments

Jakub Jelinek April 26, 2014, 7:54 a.m. UTC | #1
On Thu, Apr 24, 2014 at 05:32:54PM -0700, Cong Hou wrote:
> In this patch a new reload-rewrite pattern detector is composed to
> handle the following pattern in the loop being vectorized:
> 
>    x = *p;
>    ...
>    y = *p;
> 
>    or
> 
>    *p = x;
>    ...
>    y = *p;
> 
> In both cases, *p is reloaded because there may exist other defs to
> another memref that may alias with p.  However, aliasing is eliminated
> with alias checks.  Then we can safely replace the last statement in
> above cases by y = x.

Not safely, at least not for #pragma omp simd/#pragma simd/#pragma ivdep
loops if alias analysis hasn't proven there is no aliasing.

So, IMNSHO you need to guard this with LOOP_VINFO_NO_DATA_DEPENDENCIES,
assuming it has been computed at that point already (otherwise you need to
do it elsewhere).

Consider:

void
foo (int *p, int *q)
{
  int i;
  #pragma omp simd safelen(16)
  for (i = 0; i < 128; i++)
    {
      int x = *p;
      *q += 8;
      *p = *p + x;
      p++;
      q++;
    }
}

It is valid to call the above with completely unrelated p and q, but
also e.g. p == q, or q == p + 16 or p == q + 16.
Your patch would certainly break it e.g. for p == q.

	Jakub
Cong Hou April 30, 2014, 8:28 p.m. UTC | #2
Thank you for reminding me the omp possibility. Yes, in this case my
pattern is incorrect, because I assume all aliases will be resolved by
alias checks, which may not be true with omp.

LOOP_VINFO_NO_DATA_DEPENDENCIES or LOOP_REQUIRES_VERSIONING_FOR_ALIAS
may not help here because vect_pattern_recog() is called prior to
vect_analyze_data_ref_dependences() in vect_analyze_loop_2().

So can we detect if omp is used in the pattern recognizer? Like
checking loop->force_vectorize? Is there any other case in which my
assumption does not hold?


thanks,
Cong


On Sat, Apr 26, 2014 at 12:54 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Apr 24, 2014 at 05:32:54PM -0700, Cong Hou wrote:
>> In this patch a new reload-rewrite pattern detector is composed to
>> handle the following pattern in the loop being vectorized:
>>
>>    x = *p;
>>    ...
>>    y = *p;
>>
>>    or
>>
>>    *p = x;
>>    ...
>>    y = *p;
>>
>> In both cases, *p is reloaded because there may exist other defs to
>> another memref that may alias with p.  However, aliasing is eliminated
>> with alias checks.  Then we can safely replace the last statement in
>> above cases by y = x.
>
> Not safely, at least not for #pragma omp simd/#pragma simd/#pragma ivdep
> loops if alias analysis hasn't proven there is no aliasing.
>
> So, IMNSHO you need to guard this with LOOP_VINFO_NO_DATA_DEPENDENCIES,
> assuming it has been computed at that point already (otherwise you need to
> do it elsewhere).
>
> Consider:
>
> void
> foo (int *p, int *q)
> {
>   int i;
>   #pragma omp simd safelen(16)
>   for (i = 0; i < 128; i++)
>     {
>       int x = *p;
>       *q += 8;
>       *p = *p + x;
>       p++;
>       q++;
>     }
> }
>
> It is valid to call the above with completely unrelated p and q, but
> also e.g. p == q, or q == p + 16 or p == q + 16.
> Your patch would certainly break it e.g. for p == q.
>
>         Jakub
Cong Hou May 23, 2014, 1:48 a.m. UTC | #3
Ping?


thanks,
Cong


On Wed, Apr 30, 2014 at 1:28 PM, Cong Hou <congh@google.com> wrote:
> Thank you for reminding me the omp possibility. Yes, in this case my
> pattern is incorrect, because I assume all aliases will be resolved by
> alias checks, which may not be true with omp.
>
> LOOP_VINFO_NO_DATA_DEPENDENCIES or LOOP_REQUIRES_VERSIONING_FOR_ALIAS
> may not help here because vect_pattern_recog() is called prior to
> vect_analyze_data_ref_dependences() in vect_analyze_loop_2().
>
> So can we detect if omp is used in the pattern recognizer? Like
> checking loop->force_vectorize? Is there any other case in which my
> assumption does not hold?
>
>
> thanks,
> Cong
>
>
> On Sat, Apr 26, 2014 at 12:54 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Apr 24, 2014 at 05:32:54PM -0700, Cong Hou wrote:
>>> In this patch a new reload-rewrite pattern detector is composed to
>>> handle the following pattern in the loop being vectorized:
>>>
>>>    x = *p;
>>>    ...
>>>    y = *p;
>>>
>>>    or
>>>
>>>    *p = x;
>>>    ...
>>>    y = *p;
>>>
>>> In both cases, *p is reloaded because there may exist other defs to
>>> another memref that may alias with p.  However, aliasing is eliminated
>>> with alias checks.  Then we can safely replace the last statement in
>>> above cases by y = x.
>>
>> Not safely, at least not for #pragma omp simd/#pragma simd/#pragma ivdep
>> loops if alias analysis hasn't proven there is no aliasing.
>>
>> So, IMNSHO you need to guard this with LOOP_VINFO_NO_DATA_DEPENDENCIES,
>> assuming it has been computed at that point already (otherwise you need to
>> do it elsewhere).
>>
>> Consider:
>>
>> void
>> foo (int *p, int *q)
>> {
>>   int i;
>>   #pragma omp simd safelen(16)
>>   for (i = 0; i < 128; i++)
>>     {
>>       int x = *p;
>>       *q += 8;
>>       *p = *p + x;
>>       p++;
>>       q++;
>>     }
>> }
>>
>> It is valid to call the above with completely unrelated p and q, but
>> also e.g. p == q, or q == p + 16 or p == q + 16.
>> Your patch would certainly break it e.g. for p == q.
>>
>>         Jakub
diff mbox

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 117cdd0..59a4388 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@ 
+2014-04-23  Cong Hou  <congh@google.com>
+
+	* tree-vect-patterns.c (vect_recog_reload_rewrite_pattern):
+	New function.
+	(vect_vect_recog_func_ptrs): Add new pattern.
+	* tree-vectorizer.h (NUM_PATTERNS):  Update the pattern count.
+
 2014-04-23  David Malcolm  <dmalcolm@redhat.com>
 
 	* is-a.h: Update comments to reflect the following changes to the
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 62b07f4..2116cd3 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@ 
+2014-04-23  Cong Hou  <congh@google.com>
+
+	* gcc.dg/vect/vect-reload-rewrite-pattern.c: New test.
+
 2014-04-23  Jeff Law  <law@redhat.com>
 
 	PR tree-optimization/60902
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reload-rewrite-pattern.c b/gcc/testsuite/gcc.dg/vect/vect-reload-rewrite-pattern.c
new file mode 100644
index 0000000..e75f969
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reload-rewrite-pattern.c
@@ -0,0 +1,61 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+#define N 1000
+int a[N];
+
+void test1 (int *b, int *c)
+{
+  int i;
+  for (i = 0; i < N; ++i)
+    {
+      a[i] = c[i];
+      /* Reload of c[i].  */
+      b[i] = c[i];
+    }
+}
+
+void test2 (int *b, int *c)
+{
+  int i;
+  for (i = 0; i < N; ++i)
+    {
+      c[i] = a[i] + 10;
+      /* Reload of a[i].  */
+      a[i]++;
+      /* Reload of c[i].  */
+      b[i] = c[i];
+    }
+}
+
+void test3 (int *b, int *c)
+{
+  int i;
+  for (i = 0; i < N; ++i)
+    {
+      c[i] = a[i] & 63;
+      /* Reload of a[i].  */
+      a[i]++;
+      /* Reload of c[i].  */
+      /* Rewrite to c[i].  */
+      c[i]--;
+    }
+}
+
+void test4 (_Complex int *b, _Complex int *c, _Complex int *d)
+{
+  int i;
+  for (i = 0; i < N; ++i)
+    {
+      b[i] = c[i] + d[i];
+      /* Reload of REALPART_EXPR (c[i]).  */
+      /* Reload of IMAGPART_EXPR (c[i]).  */
+      /* Reload of REALPART_EXPR (d[i]).  */
+      /* Reload of IMAGPART_EXPR (d[i]).  */
+      c[i] = c[i] - d[i];
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect"} } */
+/* { dg-final { scan-tree-dump-times "vect_recog_reload_rewrite_pattern: detected" 10 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 5daaf24..38a0fec 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -40,6 +40,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "ssa-iterators.h"
 #include "stringpool.h"
 #include "tree-ssanames.h"
+#include "tree-ssa-sccvn.h"
 #include "cfgloop.h"
 #include "expr.h"
 #include "optabs.h"
@@ -70,6 +71,7 @@  static gimple vect_recog_divmod_pattern (vec<gimple> *,
 static gimple vect_recog_mixed_size_cond_pattern (vec<gimple> *,
 						  tree *, tree *);
 static gimple vect_recog_bool_pattern (vec<gimple> *, tree *, tree *);
+static gimple vect_recog_reload_rewrite_pattern (vec<gimple> *, tree *, tree *);
 static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
 	vect_recog_widen_mult_pattern,
 	vect_recog_widen_sum_pattern,
@@ -81,6 +83,7 @@  static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
 	vect_recog_vector_vector_shift_pattern,
 	vect_recog_divmod_pattern,
 	vect_recog_mixed_size_cond_pattern,
+	vect_recog_reload_rewrite_pattern,
 	vect_recog_bool_pattern};
 
 static inline void
@@ -3019,6 +3022,160 @@  vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
     return NULL;
 }
 
+/* Function vect_recog_reload_rewrite_pattern
+
+   Try to find the following reload pattern:
+
+   x = *p;
+   ...
+   y = *p;
+
+   or
+
+   *p = x;
+   ...
+   y = *p;
+
+   In both cases, *p is reloaded because there may exist other defs to another
+   memref that may alias with p.  However, aliasing is eliminated with alias
+   checks.  Then we can safely replace the last statement in above cases by
+   y = x.
+
+   Also try to detect rewrite pattern:
+
+   *p = x;
+   ...
+   *p = y;
+
+   The first write is redundant due to the fact that there is no aliasing
+   between p and other pointers.  In this case we don't need to vectorize
+   the first write.  Here we replace it by a dummy statement z = x.
+
+   Input:
+
+   * STMTS: Contains a stmt from which the pattern search begins.
+
+   Output:
+
+   * TYPE_IN: The type of the input arguments to the pattern.
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern.  In this case it will be y = x;
+   for reload pattern, and z = x; for rewrite pattern.  */
+
+static gimple
+vect_recog_reload_rewrite_pattern (vec<gimple> *stmts, tree *type_in,
+				   tree *type_out)
+{
+  gimple last_stmt = (*stmts)[0];
+
+  /* We only consider this pattern for loops (alias checks are only
+     done for loops).  */
+  stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+  if (!loop_vinfo)
+    return NULL;
+
+  if (!is_gimple_assign (last_stmt))
+    return NULL;
+
+  /* Check if LAST_STMT is a load or write.  */
+  bool load = false;
+  tree lhs_oprnd = gimple_assign_lhs (last_stmt);
+  enum tree_code code = gimple_assign_rhs_code (last_stmt);
+
+  if (TREE_CODE (lhs_oprnd) == SSA_NAME
+      && (code == ARRAY_REF
+	  || code == BIT_FIELD_REF
+	  || code == INDIRECT_REF
+	  || code == COMPONENT_REF
+	  || code == IMAGPART_EXPR
+	  || code == REALPART_EXPR
+	  || code == MEM_REF))
+    load = true;
+  else if (TREE_CODE (lhs_oprnd) != ARRAY_REF
+	   && TREE_CODE (lhs_oprnd) != BIT_FIELD_REF
+	   && TREE_CODE (lhs_oprnd) != INDIRECT_REF
+	   && TREE_CODE (lhs_oprnd) != COMPONENT_REF
+	   && TREE_CODE (lhs_oprnd) != IMAGPART_EXPR
+	   && TREE_CODE (lhs_oprnd) != REALPART_EXPR
+	   && TREE_CODE (lhs_oprnd) != MEM_REF)
+    return NULL;
+
+  tree new_val = NULL_TREE;
+
+  if (load)
+    {
+      /* If LAST_STMT is a load, check if there is any load or write to
+	 the same memref before LAST_STMT.  */
+
+      tree memref = gimple_assign_rhs1 (last_stmt);
+      basic_block bb = gimple_bb (last_stmt);
+
+      for (gimple_stmt_iterator si = gsi_start_bb (bb);
+	   !gsi_end_p (si); gsi_next (&si))
+	{
+	  gimple stmt = gsi_stmt (si);
+	  if (stmt == last_stmt)
+	    break;
+	  if (!is_gimple_assign (stmt))
+	    continue;
+
+	  if (expressions_equal_p (memref, gimple_assign_rhs1 (stmt)))
+	    new_val = gimple_assign_lhs (stmt);
+	  else if (expressions_equal_p (memref, gimple_assign_lhs (stmt)))
+	    new_val = gimple_assign_rhs1 (stmt);
+	}
+    }
+  else
+    {
+      /* If LAST_STMT is a write, check if there is any write to
+	 the same memref after LAST_STMT.  */
+
+      tree memref = gimple_assign_lhs (last_stmt);
+      basic_block bb = gimple_bb (last_stmt);
+
+      gimple_stmt_iterator si = gsi_start_bb (bb);
+      while (gsi_stmt (si) != last_stmt)
+	gsi_next (&si);
+      gsi_next (&si);
+
+      for (; !gsi_end_p (si); gsi_next (&si))
+	{
+	  gimple stmt = gsi_stmt (si);
+	  if (is_gimple_assign (stmt)
+	      && expressions_equal_p (memref, gimple_assign_lhs (stmt)))
+	    {
+	      new_val = gimple_assign_rhs1 (last_stmt);
+	      break;
+	    }
+	}
+    }
+
+  if (new_val == NULL_TREE)
+    return NULL;
+
+  *type_in = *type_out = STMT_VINFO_VECTYPE (stmt_vinfo);
+
+  tree type = gimple_expr_type (last_stmt);
+  tree new_var = vect_recog_temp_ssa_var (type, NULL);
+  /* Pattern detected.  Create a stmt to be used to replace the pattern.  */
+  gimple pattern_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_var,
+						      new_val, NULL);
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "vect_recog_reload_rewrite_pattern: detected: ");
+      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
+      dump_printf (MSG_NOTE, "\n");
+      dump_gimple_stmt_loc (MSG_NOTE, vect_location, TDF_SLIM, pattern_stmt, 0);
+    }
+
+  return pattern_stmt;
+}
 
 /* Mark statements that are involved in a pattern.  */
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index c5cb037..e91474d 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1123,7 +1123,7 @@  extern void vect_slp_transform_bb (basic_block);
    Additional pattern recognition functions can (and will) be added
    in the future.  */
 typedef gimple (* vect_recog_func_ptr) (vec<gimple> *, tree *, tree *);
-#define NUM_PATTERNS 11
+#define NUM_PATTERNS 12
 void vect_pattern_recog (loop_vec_info, bb_vec_info);
 
 /* In tree-vectorizer.c.  */