Message ID | 002a01cfa19e$8752d0d0$95f87270$@arm.com |
---|---|
State | New |
On Thu, Jul 17, 2014 at 11:07 AM, Bin Cheng <bin.cheng@arm.com> wrote:
> Hi,
> This is a series of three patches improving induction variable elimination.
> Currently GCC only eliminates iv for very specific case when the loop’s
> latch could run zero times, i.e., when may_be_zero field of loop niter
> information evaluates to true. In fact, it’s so specific that
> iv_elimination_compare_lt rarely succeeds during either GCC bootstrap or
> spec2000/spec2006 compilation. Though intrusive data shows these patches
> don’t help iv elimination that much for GCC bootstrap, they do capture
> 5%~15% more eliminations for compiling spec2000/2006. Detailed numbers are
> like:
>               2k/int    2k/fp    2k6/int    2k6/fp
>   improve     ~9.6%     ~4.8%    ~5.5%      ~14.4%
>
> All patches pass bootstrap and regression test on x86_64/x86. I will
> bootstrap and test them on aarch64/arm platforms too.
>
> The first patch turns to tree operand_equal_p to check the number of
> iterations in iv_elimination_lt. Though I think this change isn’t necessary
> for current code, it’s needed if we further relax iv elimination for cases
> in which sign/unsigned conversion is involved.

As said elsewhere this bug should be fixed in tree-affine.c. Do you have
a testcase?

Thanks,
Richard.

> Thanks,
> bin
>
> 2014-07-17  Bin Cheng  <bin.cheng@arm.com>
>
>     * tree-ssa-loop-ivopts.c (iv_elimination_compare_lt): Check number
>     of iteration using tree comparison.
On Fri, Jul 25, 2014 at 1:27 PM, Richard Biener <richard.guenther@gmail.com> wrote:
> On Thu, Jul 17, 2014 at 11:07 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>> Hi,
>> This is a series of three patches improving induction variable elimination.
>> Currently GCC only eliminates iv for very specific case when the loop's
>> latch could run zero times, i.e., when may_be_zero field of loop niter
>> information evaluates to true. In fact, it's so specific that
>> iv_elimination_compare_lt rarely succeeds during either GCC bootstrap or
>> spec2000/spec2006 compilation. Though intrusive data shows these patches
>> don't help iv elimination that much for GCC bootstrap, they do capture
>> 5%~15% more eliminations for compiling spec2000/2006. Detailed numbers are
>> like:
>>               2k/int    2k/fp    2k6/int    2k6/fp
>>   improve     ~9.6%     ~4.8%    ~5.5%      ~14.4%
>>
>> All patches pass bootstrap and regression test on x86_64/x86. I will
>> bootstrap and test them on aarch64/arm platforms too.
>>
>> The first patch turns to tree operand_equal_p to check the number of
>> iterations in iv_elimination_lt. Though I think this change isn't necessary
>> for current code, it's needed if we further relax iv elimination for cases
>> in which sign/unsigned conversion is involved.
>
> As said elsewhere this bug should be fixed in tree-affine.c. Do you have
> a testcase?
>

Sorry I don't have test case without patching GCC, I will revisit the
problem and try to understand whether it's necessary or in which part it
should be fixed.

Thanks,
bin
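For readers following the thread, the kind of rewrite that iv_elimination_compare_lt aims for can be sketched in plain C. This is only an illustration written at the source level, not GCC output and not code from the patch; the function names `sum_counter` and `sum_pointer` are hypothetical. The first function tests the integer counter `i` in the loop exit; the second drops `i` and compares the pointer against `p0 - a + b` instead, assuming both indices lie inside the same array.

```c
#include <assert.h>

/* Original shape: the exit test uses the integer counter i,
   so both i and p have to be kept alive in the loop.  */
long sum_counter(const int *base, int a, int b)
{
    assert(a >= 0 && b >= 0);   /* illustrative precondition */
    long s = 0;
    int i = a;
    const int *p = base + a;
    do {
        s += *p;
        p++;
        i++;
    } while (i < b);
    return s;
}

/* After elimination: i is gone and the exit test is expressed
   with the pointer iv, compared against p0 - a + b.  */
long sum_pointer(const int *base, int a, int b)
{
    assert(a >= 0 && b >= 0);   /* illustrative precondition */
    long s = 0;
    const int *p0 = base + a;
    const int *p = p0;
    const int *end = p0 - a + b;   /* same address as base + b */
    do {
        s += *p;
        p++;
    } while (p < end);
    return s;
}
```

Both functions execute the body once even when `a >= b`, which is why the compiler has to reason about the zero-latch-iteration case (the may_be_zero field mentioned above) before it can make this substitution.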
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 212387)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4605,7 +4605,7 @@ iv_elimination_compare_lt (struct ivopts_data *dat
 			   struct tree_niter_desc *niter)
 {
   tree cand_type, a, b, mbz, nit_type = TREE_TYPE (niter->niter), offset;
-  struct aff_tree nit, tmpa, tmpb;
+  struct aff_tree nit, tmp1, tmpa, tmpb;
   enum tree_code comp;
   HOST_WIDE_INT step;
 
@@ -4661,15 +4661,19 @@ iv_elimination_compare_lt (struct ivopts_data *dat
     return false;
 
   /* Expected number of iterations is B - A - 1.  Check that it matches
-     the actual number, i.e., that B - A - NITER = 1.  */
+     the actual number, i.e., that B - A = NITER + 1.  */
   tree_to_aff_combination (niter->niter, nit_type, &nit);
-  tree_to_aff_combination (fold_convert (nit_type, a), nit_type, &tmpa);
-  tree_to_aff_combination (fold_convert (nit_type, b), nit_type, &tmpb);
-  aff_combination_scale (&nit, -1);
-  aff_combination_scale (&tmpa, -1);
-  aff_combination_add (&tmpb, &tmpa);
-  aff_combination_add (&tmpb, &nit);
-  if (tmpb.n != 0 || tmpb.offset != 1)
+  aff_combination_const (&tmp1, nit_type, 1);
+  tree_to_aff_combination (b, TREE_TYPE (b), &tmpb);
+  aff_combination_add (&nit, &tmp1);
+  if (a != integer_zero_node)
+    {
+      tree_to_aff_combination (a, TREE_TYPE (b), &tmpa);
+      aff_combination_scale (&tmpa, -1);
+      aff_combination_add (&tmpb, &tmpa);
+    }
+  if (!operand_equal_p (aff_combination_to_tree (&nit),
+			aff_combination_to_tree (&tmpb), 0))
     return false;
 
   /* Finally, check that CAND->IV->BASE - CAND->IV->STEP * A does not
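
The relation in the updated comment, B - A = NITER + 1, can be checked numerically with a small C sketch. This is a concrete-value analogue only: the patch compares symbolic expression trees, whereas the code below simply runs the loop and counts. The helper names `latch_count` and `relation_holds` are hypothetical, and NITER is taken to mean the number of times the loop latch (back edge) executes, for the case a + 1 <= b.

```c
#include <assert.h>

/* Count how often the back edge is taken in
     i = a; do { i++; } while (i < b);
   which is the quantity called NITER here.  */
unsigned latch_count(int a, int b)
{
    unsigned niter = 0;
    int i = a;
    for (;;) {
        i++;
        if (!(i < b))
            break;      /* loop exits, latch not executed */
        niter++;        /* back edge taken once more */
    }
    return niter;
}

/* Check B - A == NITER + 1 in unsigned arithmetic.
   Only meaningful when a + 1 <= b, i.e. a < b.  */
int relation_holds(int a, int b)
{
    assert(a < b);
    return (unsigned)b - (unsigned)a == latch_count(a, b) + 1u;
}
```

For example, with a = 2 and b = 6 the latch runs 3 times and B - A is 4, so the relation holds. When a >= b the latch runs zero times and the relation generally fails, which is the case the surrounding code guards separately.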