diff mbox

[RFC] 69526 - ivopts candidate strangeness

Message ID 56EADBF7.8080104@linux.vnet.ibm.com
State New
Headers show

Commit Message

Robin Dapp March 17, 2016, 4:31 p.m. UTC
The attached patch is a first and somewhat hideous attempt to fix the
missed optimization discussed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526

where (paraphrasing) a spurious
n = n - 1
n = (sizetype) n
n = n + 1
is being created.

The patch tries to avoid this on tree level as well as on RTL level.
Both parts have to be addressed since the logic is essentially doubled
and e.g. loop-iv.c performs the same actions as tree-ssa-loop-niter.c to
obtain the number of iterations of a loop.

(On a side note, I find the name 'niter' rather dubious, as the variable
actually represents the number of backjumps/latch executions and not the
number of iterations in a normal sense. This put me off track for quite
a while.)

The n - 1 above is the number of "iterations"/backjumps. For loops where
the induction variable is incremented before the comparison, the number
of iterations has to be incremented as well (n + 1) to obtain the
correct number of backjumps.

Both operations (-1, +1) are separate from each other and don't know if
the other operation was performed or not. Therefore, the rationale of
this patch to detect the situation where the number of iterations is of
the form n - 1 and still must be incremented afterwards. In such a case,
the - 1 will be stripped and a flag is set to avoid performing the + 1
later on.

A major abomination in the patch is that in order to decide whether to
act or not in loop-doloop.c, loop-iv.c has to have prepared an
unmodified expression before which is why find_simple_exit() is actually
called a second time with a flag that prevents the - 1.

I am aware that it is neither optimal nor minimal hence I'd be grateful
for ideas on how to solve this in a more elegant way. I found no
regressions on x86-64 and s390x but did not yet perform bootstrapping
and more testing due to the premature nature of the patch.

Thanks
 Robin


gcc/ChangeLog:

2016-03-17  Robin Dapp  <rdapp@linux.vnet.ibm.com>

        * cfgloop.h (struct GTY): Add second number of iterations
        * loop-doloop.c (doloop_condition_get): Fix whitespace
        (doloop_modify): Use second number of iterations
        * loop-iv.c (iv_number_of_iterations): Add parameter
        (check_simple_exit): Add parameter
        (find_simple_exit): Add parameter
        (get_simple_loop_desc): Call find_simple_exit() a second time
        * loop-unroll.c (unroll_loop_constant_iterations): Copy niter_expr
        (unroll_loop_runtime_iterations): Copy niter_expr
        * tree-ssa-loop-ivopts.c (cand_value_at): Avoid creating n + 1 - 1
        (may_eliminate_iv): Call with may_be_zero
        (remove_unused_ivs): Fix whitespace

Comments

Richard Biener March 18, 2016, 10:03 a.m. UTC | #1
On Thu, Mar 17, 2016 at 5:31 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> The attached patch is a first and somewhat hideous attempt to fix the
> missed optimization discussed here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526
>
> where (paraphrasing) a spurious
> n = n - 1
> n = (sizetype) n
> n = n + 1
> is being created.
>
> The patch tries to avoid this on tree level as well as on RTL level.
> Both parts have to be addressed since the logic is essentially doubled
> and e.g. loop-iv.c performs the same actions as tree-ssa-loop-niter.c to
> obtain the number of iterations of a loop.
>
> (On a side note, I find the name 'niter' rather dubious, as the variable
> actually represents the number of backjumps/latch executions and not the
> number of iterations in a normal sense. This put me off track for quite
> a while.)
>
> The n - 1 above is the number of "iterations"/backjumps. For loops where
> the induction variable is incremented before the comparison, the number
> of iterations has to be incremented as well (n + 1) to obtain the
> correct number of backjumps.
>
> Both operations (-1, +1) are separate from each other and don't know if
> the other operation was performed or not. Therefore, the rationale of
> this patch to detect the situation where the number of iterations is of
> the form n - 1 and still must be incremented afterwards. In such a case,
> the - 1 will be stripped and a flag is set to avoid performing the + 1
> later on.
>
> A major abomination in the patch is that in order to decide whether to
> act or not in loop-doloop.c, loop-iv.c has to have prepared an
> unmodified expression before which is why find_simple_exit() is actually
> called a second time with a flag that prevents the - 1.
>
> I am aware that it is neither optimal nor minimal hence I'd be grateful
> for ideas on how to solve this in a more elegant way. I found no
> regressions on x86-64 and s390x but did not yet perform bootstrapping
> and more testing due to the premature nature of the patch.

As explained in the PR VRP has enough information to fix the IL.

Other than that remembering both the number of exit tests and the
number of latch executions might be a good idea.  Personally
I wouldn't worry about RTL level at all but instead do that on the
GIMPLE level only.  The vectorizer already does sth similar to
avoid running into some degenerate case where the number of
exit test invocations is zero (because # of latch executions + 1 overflows).
See PR59058.

Richard.

> Thanks
>  Robin
>
>
> gcc/ChangeLog:
>
> 2016-03-17  Robin Dapp  <rdapp@linux.vnet.ibm.com>
>
>         * cfgloop.h (struct GTY): Add second number of iterations
>         * loop-doloop.c (doloop_condition_get): Fix whitespace
>         (doloop_modify): Use second number of iterations
>         * loop-iv.c (iv_number_of_iterations): Add parameter
>         (check_simple_exit): Add parameter
>         (find_simple_exit): Add parameter
>         (get_simple_loop_desc): Call find_simple_exit() a second time
>         * loop-unroll.c (unroll_loop_constant_iterations): Copy niter_expr
>         (unroll_loop_runtime_iterations): Copy niter_expr
>         * tree-ssa-loop-ivopts.c (cand_value_at): Avoid creating n + 1 - 1
>         (may_eliminate_iv): Call with may_be_zero
>         (remove_unused_ivs): Fix whitespace
diff mbox

Patch

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 882861c..cc4fd23 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -415,6 +415,9 @@  struct GTY(()) niter_desc
 
   /* The number of iterations of the loop.  */
   rtx niter_expr;
+
+  /* The real number of iterations of the loop.  */
+  rtx niter2_expr;
 };
 
 extern void iv_analysis_loop_init (struct loop *);
@@ -424,9 +427,11 @@  extern bool iv_analyze_expr (rtx_insn *, rtx, machine_mode,
 			     struct rtx_iv *);
 extern rtx get_iv_value (struct rtx_iv *, rtx);
 extern bool biv_p (rtx_insn *, rtx);
+extern void find_simple_exit (struct loop *, struct niter_desc *, bool add_one);
 extern void find_simple_exit (struct loop *, struct niter_desc *);
 extern void iv_analysis_done (void);
 
+extern struct niter_desc *get_simple_loop_desc (struct loop *loop, bool add_one);
 extern struct niter_desc *get_simple_loop_desc (struct loop *loop);
 extern void free_simple_loop_desc (struct loop *loop);
 
diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index 940c966..49276b5 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -139,15 +139,15 @@  doloop_condition_get (rtx doloop_pat)
 	    return 0;
 	  cmp_arg1 = XEXP (SET_SRC (cmp_orig), 0);
           cmp_arg2 = XEXP (SET_SRC (cmp_orig), 1);
-	  if (cmp_arg2 != const0_rtx 
+	  if (cmp_arg2 != const0_rtx
 	      || GET_CODE (cmp_arg1) != PLUS)
 	    return 0;
 	  reg_orig = XEXP (cmp_arg1, 0);
-	  if (XEXP (cmp_arg1, 1) != GEN_INT (-1) 
+	  if (XEXP (cmp_arg1, 1) != GEN_INT (-1)
 	      || !REG_P (reg_orig))
 	    return 0;
 	  cc_reg = SET_DEST (cmp_orig);
-	  
+
 	  inc = XVECEXP (PATTERN (prev_insn), 0, 1);
 	}
       else
@@ -201,7 +201,7 @@  doloop_condition_get (rtx doloop_pat)
     return 0;
 
   if ((XEXP (condition, 0) == reg)
-      /* For the third case:  */  
+      /* For the third case:  */
       || ((cc_reg != NULL_RTX)
 	  && (XEXP (condition, 0) == cc_reg)
 	  && (reg_orig == reg))
@@ -474,7 +474,20 @@  doloop_modify (struct loop *loop, struct niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Get the expression for the number of iterations */
+      desc = get_simple_loop_desc (loop);
+
+      if (desc->niter2_expr != NULL &&
+	  !rtx_equal_p (desc->niter_expr, desc->niter2_expr))
+	{
+	  count = copy_rtx (desc->niter2_expr);
+	}
+      else
+	{
+	  count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+	}
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index fecaf8f..03263e3 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -2311,7 +2311,7 @@  determine_max_iter (struct loop *loop, struct niter_desc *desc, rtx old_niter)
 
 static void
 iv_number_of_iterations (struct loop *loop, rtx_insn *insn, rtx condition,
-			 struct niter_desc *desc)
+			 struct niter_desc *desc, bool add_one)
 {
   rtx op0, op1, delta, step, bound, may_xform, tmp, tmp0, tmp1;
   struct rtx_iv iv0, iv1;
@@ -2646,6 +2646,57 @@  iv_number_of_iterations (struct loop *loop, rtx_insn *insn, rtx condition,
 	 makes us able to do more involved computations of number of iterations
 	 than in other cases.  First transform the condition into shape
 	 s * i <> c, with s positive.  */
+      if (add_one)
+	{
+	  /* Check if base is of the form foo + iv.step).  */
+	  rtx optmp = NULL_RTX;
+	  enum rtx_code cond0;
+	  enum rtx_code cond1;
+
+	  struct rtx_iv ivtmp;
+	  ivtmp.mode = VOIDmode;
+	  ivtmp.base = NULL_RTX;
+	  ivtmp.step = NULL_RTX;
+
+	  cond0 = GET_CODE (iv0.base);
+	  cond1 = GET_CODE (iv1.base);
+
+	  bool changed = false;
+
+	  if (cond0 == PLUS)
+	    ivtmp = iv0;
+
+	  if (cond1 == PLUS)
+	    ivtmp = iv1;
+
+	  if (ivtmp.base != NULL_RTX)
+	    {
+	      optmp = XEXP (ivtmp.base, 1);
+	      rtx base = NULL_RTX;
+
+	      if (optmp != NULL_RTX && GET_CODE (optmp) == CONST_INT)
+		{
+		  int sw = INTVAL (optmp);
+		  int sz = INTVAL (ivtmp.step);
+		  if (sw == sz)
+		    {
+		      base = simplify_gen_binary (MINUS, comp_mode,
+						  ivtmp.base, ivtmp.step);
+		      changed = true;
+		    }
+		}
+
+	      if (changed)
+		{
+		  if (cond0 == PLUS)
+		    iv0.base = base;
+		  else if (cond1 == PLUS)
+		    iv1.base = base;
+		}
+	    }
+	}
+
+
       iv1.base = simplify_gen_binary (MINUS, comp_mode, iv1.base, iv0.base);
       iv0.base = const0_rtx;
       iv0.step = simplify_gen_binary (MINUS, comp_mode, iv0.step, iv1.step);
@@ -2871,11 +2922,18 @@  fail:
   return;
 }
 
+static void
+iv_number_of_iterations (struct loop *loop, rtx_insn *insn, rtx condition,
+			 struct niter_desc *desc)
+{
+  iv_number_of_iterations (loop, insn, condition, desc, false);
+}
+
 /* Checks whether E is a simple exit from LOOP and stores its description
    into DESC.  */
 
 static void
-check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
+check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc, bool add_one)
 {
   basic_block exit_bb;
   rtx condition;
@@ -2917,13 +2975,20 @@  check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
 
   /* Check that we are able to determine number of iterations and fill
      in information about it.  */
-  iv_number_of_iterations (loop, at, condition, desc);
+  iv_number_of_iterations (loop, at, condition, desc, add_one);
+}
+
+static void
+check_simple_exit (struct loop *loop, edge e, struct niter_desc *desc)
+{
+  check_simple_exit (loop, e, desc, false);
 }
 
+
 /* Finds a simple exit of LOOP and stores its description into DESC.  */
 
 void
-find_simple_exit (struct loop *loop, struct niter_desc *desc)
+find_simple_exit (struct loop *loop, struct niter_desc *desc, bool add_one)
 {
   unsigned i;
   basic_block *body;
@@ -2942,7 +3007,7 @@  find_simple_exit (struct loop *loop, struct niter_desc *desc)
 	  if (flow_bb_inside_loop_p (loop, e->dest))
 	    continue;
 
-	  check_simple_exit (loop, e, &act);
+	  check_simple_exit (loop, e, &act, add_one);
 	  if (!act.simple_p)
 	    continue;
 
@@ -2965,7 +3030,7 @@  find_simple_exit (struct loop *loop, struct niter_desc *desc)
 	}
     }
 
-  if (dump_file)
+  if (dump_file && !add_one)
     {
       if (desc->simple_p)
 	{
@@ -3008,22 +3073,34 @@  find_simple_exit (struct loop *loop, struct niter_desc *desc)
   free (body);
 }
 
+void
+find_simple_exit (struct loop *loop, struct niter_desc *desc)
+{
+  find_simple_exit (loop, desc, false);
+}
+
 /* Creates a simple loop description of LOOP if it was not computed
    already.  */
 
 struct niter_desc *
-get_simple_loop_desc (struct loop *loop)
+get_simple_loop_desc (struct loop *loop, bool add_one)
 {
   struct niter_desc *desc = simple_loop_desc (loop);
 
-  if (desc)
+  if (!add_one && desc)
     return desc;
 
   /* At least desc->infinite is not always initialized by
      find_simple_loop_exit.  */
   desc = ggc_cleared_alloc<niter_desc> ();
+  niter_desc *desc_old = ggc_cleared_alloc<niter_desc> ();
   iv_analysis_loop_init (loop);
-  find_simple_exit (loop, desc);
+  find_simple_exit (loop, desc, add_one);
+  find_simple_exit (loop, desc_old, true);
+  if (desc_old != NULL && desc_old->niter_expr != NULL)
+    {
+      desc->niter2_expr = copy_rtx (desc_old->niter_expr);
+    }
   loop->simple_loop_desc = desc;
 
   if (desc->simple_p && (desc->assumptions || desc->infinite))
@@ -3064,6 +3141,12 @@  get_simple_loop_desc (struct loop *loop)
   return desc;
 }
 
+struct niter_desc *
+get_simple_loop_desc (struct loop *loop)
+{
+  return get_simple_loop_desc (loop, false);
+}
+
 /* Releases simple loop description for LOOP.  */
 
 void
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 4d26e2f..330df8c 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -620,6 +620,7 @@  unroll_loop_constant_iterations (struct loop *loop)
     loop->nb_iterations_estimate
       = wi::udiv_trunc (loop->nb_iterations_estimate, max_unroll + 1);
   desc->niter_expr = GEN_INT (desc->niter);
+  desc->niter2_expr = GEN_INT (desc->niter);
 
   /* Remove the edges.  */
   FOR_EACH_VEC_ELT (remove_edges, i, e)
@@ -1054,6 +1055,7 @@  unroll_loop_runtime_iterations (struct loop *loop)
   desc->niter_expr =
     simplify_gen_binary (UDIV, desc->mode, old_niter,
 			 gen_int_mode (max_unroll + 1, desc->mode));
+  desc->niter2_expr = copy_rtx (desc->niter_expr);
   loop->nb_iterations_upper_bound
     = wi::udiv_trunc (loop->nb_iterations_upper_bound, max_unroll + 1);
   if (loop->any_estimate)
@@ -1063,6 +1065,7 @@  unroll_loop_runtime_iterations (struct loop *loop)
     {
       desc->niter_expr =
 	simplify_gen_binary (MINUS, desc->mode, desc->niter_expr, const1_rtx);
+      desc->niter2_expr = copy_rtx (desc->niter_expr);
       desc->noloop_assumptions = NULL_RTX;
       --loop->nb_iterations_upper_bound;
       if (loop->any_estimate
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 3faed93..0e90a68 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5094,7 +5094,7 @@  determine_use_iv_cost_address (struct ivopts_data *data,
 
 static void
 cand_value_at (struct loop *loop, struct iv_cand *cand, gimple *at, tree niter,
-	       aff_tree *val)
+	       tree may_be_zero_p, aff_tree *val)
 {
   aff_tree step, delta, nit;
   struct iv *iv = cand->iv;
@@ -5103,14 +5103,55 @@  cand_value_at (struct loop *loop, struct iv_cand *cand, gimple *at, tree niter,
   if (POINTER_TYPE_P (type))
     steptype = sizetype;
   steptype = unsigned_type_for (type);
+  bool may_be_zero = !integer_zerop (may_be_zero_p);
+  bool removed_offset = false;
 
   tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &step);
   aff_combination_convert (&step, steptype);
   tree_to_aff_combination (niter, TREE_TYPE (niter), &nit);
+
+  /* Try to get rid of a conversion like ((sizetype) (n - 1)) + 1.  */
+
+  // FIXME: may_be_zero is not enough, niter may still be zero :/
+  if (stmt_after_increment (loop, cand, at) && !may_be_zero && nit.offset != 0)
+    {
+      tree itertype = TREE_TYPE (niter);
+      aff_tree step_conv = step;
+      aff_combination_convert (&step_conv, itertype);
+
+      tree_code code = TREE_CODE (niter);
+
+      tree_code step_code = TREE_CODE (iv->step);
+      if (code == PLUS_EXPR && step_code == INTEGER_CST)
+	{
+	  tree arg1 = TREE_OPERAND (niter, 1);
+	  tree_code arg2_code = TREE_CODE (arg1);
+	  if (arg2_code == INTEGER_CST)
+	    {
+	      // FIXME: magic value for -1
+	      tree t = build_int_cstu (uint32_type_node, 4294967295);
+	      tree type2 = TYPE_SIZE_UNIT (nit.type);
+	      unsigned HOST_WIDE_INT w = tree_to_uhwi (type2);
+	      tree type3 = build_int_cstu (uint32_type_node, w);
+	      if (simple_cst_equal (arg1, t) &&
+		  simple_cst_equal (iv->step, type3))
+		{
+		  nit.offset = 0;
+		  removed_offset = true;
+		}
+	    }
+	}
+    }
+
   aff_combination_convert (&nit, steptype);
   aff_combination_mult (&nit, &step, &delta);
-  if (stmt_after_increment (loop, cand, at))
-    aff_combination_add (&delta, &step);
+
+  /* We might have already removed the offset/step.  */
+  if (!removed_offset)
+    {
+      if (stmt_after_increment (loop, cand, at))
+	aff_combination_add (&delta, &step);
+    }
 
   tree_to_aff_combination (iv->base, type, val);
   if (!POINTER_TYPE_P (type))
@@ -5450,7 +5491,7 @@  may_eliminate_iv (struct ivopts_data *data,
         }
     }
 
-  cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);
+  cand_value_at (loop, cand, use->stmt, desc->niter, desc->may_be_zero, &bnd);
 
   *bound = fold_convert (TREE_TYPE (cand->iv->base),
 			 aff_combination_to_tree (&bnd));
@@ -7405,7 +7446,7 @@  remove_unused_ivs (struct ivopts_data *data)
 	  && !info->preserve_biv)
 	{
 	  bitmap_set_bit (toremove, SSA_NAME_VERSION (info->iv->ssa_name));
-	  
+
 	  tree def = info->iv->ssa_name;
 
 	  if (MAY_HAVE_DEBUG_STMTS && SSA_NAME_DEF_STMT (def))