diff mbox

[v2] Rerun loop-header-copying just before vectorization

Message ID 5584521A.7030205@arm.com
State New
Headers show

Commit Message

Alan Lawrence June 19, 2015, 5:32 p.m. UTC
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02139.html . 
Changes are:

    * Separate the two passes by descending from a common base class, allowing 
different predicates;
    * Test flag_tree_vectorize, and loop->force_vectorize/dont_vectorize - this 
fixes the test failing before;
    * Simplify the check for "code after exit edge";
    * Revert unnecessary changes to pass_tree_loop_init::execute;
    * Revert change to slp-perm-7 test (following fix by Marc Glisse)

Bootstrapped + check-gcc on aarch64 and x86_64 (linux).

gcc/ChangeLog:

	* tree-pass.h (make_pass_ch_vect): New.
	* passes.def: Add pass_ch_vect just before pass_if_conversion.

	* tree-ssa-loop-ch.c (pass_ch_base, pass_ch_vect, pass_data_ch_vect,
	pass_ch::process_loop_p): New.
	(pass_ch): Extend pass_ch_base.

	(pass_ch::execute): Move all but loop_optimizer_init/finalize to...
	(pass_ch_base::execute): ...here.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of x,y,z,w.
	of unsigned
	* gcc.dg/vect/vect-ifcvt-11.c: New.

Comments

Jeff Law June 25, 2015, 5:01 a.m. UTC | #1
On 06/19/2015 11:32 AM, Alan Lawrence wrote:
> This is a respin of
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02139.html . Changes are:
>
>     * Separate the two passes by descending from a common base class,
> allowing different predicates;
>     * Test flag_tree_vectorize, and loop->force_vectorize/dont_vectorize
> - this fixes the test failing before;
>     * Simplify the check for "code after exit edge";
>     * Revert unnecessary changes to pass_tree_loop_init::execute;
>     * Revert change to slp-perm-7 test (following fix by Marc Glisse)
So FWIW, if you don't want to make this a separate pass, you'd probably 
want the code which allows us to run the phi-only propagator as a 
subroutine to propagate and eliminate those degenerate PHIs.  I posted 
it a year or two ago, but went a different direction to solve whatever 
issue I was looking at.

I'm comfortable with this as a separate pass and relying on cfg cleanups 
to handle this stuff for us as this implementation of your patch 
currently does.


>
> Bootstrapped + check-gcc on aarch64 and x86_64 (linux).
>
> gcc/ChangeLog:
>
>      * tree-pass.h (make_pass_ch_vect): New.
>      * passes.def: Add pass_ch_vect just before pass_if_conversion.
>
>      * tree-ssa-loop-ch.c (pass_ch_base, pass_ch_vect, pass_data_ch_vect,
>      pass_ch::process_loop_p): New.
>      (pass_ch): Extend pass_ch_base.
>
>      (pass_ch::execute): Move all but loop_optimizer_init/finalize to...
>      (pass_ch_base::execute): ...here.
>
> gcc/testsuite/ChangeLog:
>
>      * gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of
> x,y,z,w.
>      of unsigned
>      * gcc.dg/vect/vect-ifcvt-11.c: New.
Can you add a function comment to ch_base::copy_headers.  I know it 
didn't have one before, but it really should have one.

I'd also add a comment to the execute methods.  pass_ch initializes and 
finalizes loop structures while pass_ch_vect::execute assumes the loop 
structures are already initialized and finalization is assumed to be 
handled earlier in the call chain.

I'd also suggest a comment to the process_loop_p method.

> +
> +  /* Apply copying if the exit block looks to have code after it.  */
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, exit->src->succs)
> +    if (!loop_exit_edge_p (loop, e)
> +	&& e->dest != loop->header
> +	&& e->dest != loop->latch)
> +      return true; /* Block with exit edge has code after it.  */
Don't put comments on the same line as code.  Instead I'd suggest 
describing the CFG pattern your looking for as part of the comment 
before the loop over the edges.


With those comment fixes, this is OK for the trunk.

jeff
Alan Lawrence July 2, 2015, 11:51 a.m. UTC | #2
> With those comment fixes, this is OK for the trunk.
> 
> jeff

Thank you for review - I've pushed r225311 with what I hope are appropriate 
comment fixes.

Cheers, Alan
diff mbox

Patch

diff --git a/gcc/passes.def b/gcc/passes.def
index 4690e23..5755035 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -247,6 +247,7 @@  along with GCC; see the file COPYING3.  If not see
 	  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
 	      NEXT_PASS (pass_expand_omp_ssa);
 	  POP_INSERT_PASSES ()
+	  NEXT_PASS (pass_ch_vect);
 	  NEXT_PASS (pass_if_conversion);
 	  /* pass_vectorize must immediately follow pass_if_conversion.
 	     Please do not add any other passes in between.  */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
new file mode 100644
index 0000000..7e32369
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
@@ -0,0 +1,36 @@ 
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+#define N 16
+
+extern void abort (void);
+
+int A[N] = {36, 39, 42, 45, 43, 32, 21, 12, 23, 34, 45, 56, 67, 78, 81, 11};
+int B[N] = {144,195,210,225,172,128,105,60, 92, 136,225,280,268,390,324,55};
+
+__attribute__((noinline))
+void foo ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      int m = (A[i] & i) ? 5 : 4;
+      A[i] = A[i] * m;
+    }
+}
+
+int main ()
+{
+
+  check_vect ();
+  foo ();
+  /* check results:  */
+  for (int i = 0; i < N; i++)
+    if (A[i] != B[i])
+      abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
index af33ed4..0be68b3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
@@ -21,7 +21,6 @@  main1 ()
   s *ptr = arr;
   s res[N];
   int i;
-  unsigned short x, y, z, w;
 
   for (i = 0; i < N; i++)
     {
@@ -35,6 +34,7 @@  main1 ()
 
   for (i = 0; i < N; i++)
     {
+      unsigned short x, y, z, w;
       x = ptr->b - ptr->a;
       y = ptr->d - ptr->c;
       res[i].c = x + y;
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 172bd82..083e771 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -380,6 +380,7 @@  extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 6ece78b..bd409ef 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -144,6 +144,17 @@  do_while_loop_p (struct loop *loop)
 
 namespace {
 
+class ch_base : public gimple_opt_pass
+{
+  protected:
+    ch_base (pass_data data, gcc::context *ctxt)
+      : gimple_opt_pass (data, ctxt)
+    {}
+
+  unsigned int copy_headers (function *fun);
+  virtual bool process_loop_p (struct loop *loop) = 0;
+};
+
 const pass_data pass_data_ch =
 {
   GIMPLE_PASS, /* type */
@@ -157,21 +168,61 @@  const pass_data pass_data_ch =
   0, /* todo_flags_finish */
 };
 
-class pass_ch : public gimple_opt_pass
+/* This pass calls loop_optimizer_init before it executes,
+   and loop_optimizer_finalize after.  */
+class pass_ch : public ch_base
 {
 public:
   pass_ch (gcc::context *ctxt)
-    : gimple_opt_pass (pass_data_ch, ctxt)
+    : ch_base (pass_data_ch, ctxt)
   {}
 
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_ch != 0; }
   virtual unsigned int execute (function *);
 
+protected:
+  /* ch_base method: */
+  virtual bool process_loop_p (struct loop *loop);
 }; // class pass_ch
 
+const pass_data pass_data_ch_vect =
+{
+  GIMPLE_PASS, /* type */
+  "ch_vect", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_CH, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+/* This is a more aggressive version, designed to run just before if-conversion
+   and vectorization, to put more loops into their required form.  */
+class pass_ch_vect : public ch_base
+{
+public:
+  pass_ch_vect (gcc::context *ctxt)
+    : ch_base (pass_data_ch_vect, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fun)
+  {
+    return flag_tree_ch != 0
+	   && (flag_tree_loop_vectorize != 0 || fun->has_force_vectorize_loops);
+  }
+  virtual unsigned int execute (function *);
+
+protected:
+  /* ch_base method: */
+  virtual bool process_loop_p (struct loop *loop);
+}; // class pass_ch_vect
+
 unsigned int
-pass_ch::execute (function *fun)
+ch_base::copy_headers (function *fun)
 {
   struct loop *loop;
   basic_block header;
@@ -181,13 +232,8 @@  pass_ch::execute (function *fun)
   unsigned bbs_size;
   bool changed = false;
 
-  loop_optimizer_init (LOOPS_HAVE_PREHEADERS
-		       | LOOPS_HAVE_SIMPLE_LATCHES);
   if (number_of_loops (fun) <= 1)
-    {
-      loop_optimizer_finalize ();
       return 0;
-    }
 
   bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
   copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
@@ -204,7 +250,7 @@  pass_ch::execute (function *fun)
 	 written as such, or because jump threading transformed it into one),
 	 we might be in fact peeling the first iteration of the loop.  This
 	 in general is not a good idea.  */
-      if (do_while_loop_p (loop))
+      if (!process_loop_p (loop))
 	continue;
 
       /* Iterate the header copying up to limit; this takes care of the cases
@@ -291,17 +337,76 @@  pass_ch::execute (function *fun)
       changed = true;
     }
 
-  update_ssa (TODO_update_ssa);
+  if (changed)
+    update_ssa (TODO_update_ssa);
   free (bbs);
   free (copied_bbs);
 
-  loop_optimizer_finalize ();
   return changed ? TODO_cleanup_cfg : 0;
 }
 
+unsigned int
+pass_ch::execute (function *fun)
+{
+  loop_optimizer_init (LOOPS_HAVE_PREHEADERS
+		       | LOOPS_HAVE_SIMPLE_LATCHES);
+
+  unsigned int res = copy_headers (fun);
+
+  loop_optimizer_finalize ();
+  return res;
+}
+
+unsigned int
+pass_ch_vect::execute (function *fun)
+{
+  return copy_headers (fun);
+}
+
+bool
+pass_ch::process_loop_p (struct loop *loop)
+{
+  return !do_while_loop_p (loop);
+}
+
+bool
+pass_ch_vect::process_loop_p (struct loop *loop)
+{
+  if (!flag_tree_vectorize && !loop->force_vectorize)
+    return false;
+
+  if (loop->dont_vectorize)
+    return false;
+
+  if (!do_while_loop_p (loop))
+    return true;
+
+ /* The vectorizer won't handle anything with multiple exits, so skip.  */
+  edge exit = single_exit (loop);
+  if (!exit)
+    return false;
+
+  /* Apply copying if the exit block looks to have code after it.  */
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, exit->src->succs)
+    if (!loop_exit_edge_p (loop, e)
+	&& e->dest != loop->header
+	&& e->dest != loop->latch)
+      return true; /* Block with exit edge has code after it.  */
+
+  return false;
+}
+
 } // anon namespace
 
 gimple_opt_pass *
+make_pass_ch_vect (gcc::context *ctxt)
+{
+  return new pass_ch_vect (ctxt);
+}
+
+gimple_opt_pass *
 make_pass_ch (gcc::context *ctxt)
 {
   return new pass_ch (ctxt);