Message ID | 001101cef6fe$b6ed3c90$24c7b5b0$@arm.com |
---|---|
State | New |
Headers | show |
On Thu, Dec 12, 2013 at 1:55 PM, bin.cheng <bin.cheng@arm.com> wrote: > > >> -----Original Message----- >> From: Richard Biener [mailto:richard.guenther@gmail.com] >> Sent: Wednesday, December 11, 2013 6:04 PM >> To: Jakub Jelinek >> Cc: Bin.Cheng; Jeff Law; Bin Cheng; gcc-patches List >> Subject: Re: [PATCH PR41488]Recognize more induction variables by >> simplifying PEELED chrec in scalar evolution >> >> On Wed, Dec 11, 2013 at 9:50 AM, Jakub Jelinek <jakub@redhat.com> wrote: >> > On Wed, Dec 11, 2013 at 04:31:55PM +0800, Bin.Cheng wrote: >> >> Thank both of you for being patient on this patch. >> >> I went through the documentation quickly and realized that I have to >> >> modify pointer-map structure to make it recognized by GC (maybe more >> > >> > Changing pointer_map at this point is IMHO not appropriate. >> > >> >> work suggested by Jakub). It seems I shouldn't include that task in >> >> this patch at this stage 3, I am thinking just call free_affine* >> >> function in the place it is created for SCEV. Of course, that makes >> >> it more expensive. >> > >> > Perhaps you could call free_affine_expand_cache say from >> > analyze_scalar_evolution, that is the innermost enclosing exported API >> > that indirectly calls your new code, but it seems it is also called >> > recursively from analyze_scalar_evolution_1 or functions it calls. >> > So maybe you'd need to rename analyze_scalar_evolution to >> > analyze_scalar_evolution_2 and adjust some calls to >> > analyze_scalar_evolution to the latter in tree-scalar-evolution.c, and >> > add analyze_scalar_evolution wrapper that calls >> analyze_scalar_evolution_2 and free_affine_expand_cache. >> > Whether this would work depends on detailed analysis of the >> > tree-scalar-evolution.c callgraph, there are tons of functions, most >> > of them static, and the question is if there is a single non-static >> > (or at most a few of them) function that cover all calls to your new >> > code in the static functions it (indirectly) calls, i.e. if there >> > isn't any other external entry point that might call your new code >> > without doing free_affine_expand_cache afterwards. >> > >> > If you can find that, I'd say it would be the safest thing to do. >> > But let's see what say Richard thinks about it too. >> >> I think keeping the SCEV cache live over multiple passes is fragile (we do > re- >> use SSA names and passes do not appropriately call scev_reset[_htab] in > all >> cases). >> >> With fixing that the easiest approach is to associate a affine-expand > cache >> with the scev info. >> >> Without that there isn't a very good solution other than discarding the > cache >> after each expansion at the point you use it. The SCEV cache should >> effectively make most of the affine expansion caching moot (but you can > try >> instrumenting that to verify it). >> >> That is, I suggest to try freeing the cache right after the second >> tree_to_aff_combination_expand. >> >> Btw, >> >> + /* Try harder to check if they are equal. */ >> + tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map); >> + tree_to_aff_combination_expand (step_val, type, &aff2, >> + &peeled_chrec_map); aff_combination_scale (&aff2, >> + double_int_minus_one); aff_combination_add (&aff1, &aff2); left = >> + fold_convert (type, aff_combination_to_tree (&aff1)); >> + >> + /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP >> + if "left" equals to "init + right". */ if (operand_equal_p >> + (left, integer_zero_node, 0)) >> >> you don't need aff_combination_to_tree, simply check >> >> aff1.n == 0 && aff1.offset.is_zero () >> >> (you may want to add aff_combination_zero_p or even >> aff_combination_equal_p) > > Hi, > This is the new version patch with bug 59445 fixed and calling free_affine > stuff just after it is used. I also added aff_combination_zero_p as > suggested. > The change in add_old_iv_candidates is unnecessary now, since the previous > version patch triggered the assertion, I just leave it here as an additional > check. > > At last, apology that I missed java in bootstrap before. > > Bootstrap and test on x86/x86_64, arm is still ongoing. Is it OK if tests > pass. Bootstrap/test on arm is good. Dominique d'Humieres helped that original issue on x86_64-apple-darwin13 is addressed too. Thanks, bin > > 2013-12-12 Bin Cheng <bin.cheng@arm.com> > > PR tree-optimization/58296 > PR tree-optimization/41488 > * tree-scalar-evolution.c: Include necessary header files. > (simplify_peeled_chrec): New function. > (analyze_evolution_in_loop): New static variable. > Call simplify_peeled_chrec. > * tree-ssa-loop-ivopts.c (mark_bivs): Don't mark peeled IV as biv. > (add_old_iv_candidates): Don't add candidate for peeled IV. > * tree-affine.h (aff_combination_zero_p): New function. > > gcc/testsuite/ChangeLog > 2013-12-12 Bin Cheng <bin.cheng@arm.com> > > PR tree-optimization/58296 > PR tree-optimization/41488 > * gcc.dg/tree-ssa/scev-7.c: New test. > * gcc.dg/pr41488.c: New test. > * g++.dg/pr59445.C: New test.
On Thu, Dec 12, 2013 at 6:55 AM, bin.cheng <bin.cheng@arm.com> wrote: > > >> -----Original Message----- >> From: Richard Biener [mailto:richard.guenther@gmail.com] >> Sent: Wednesday, December 11, 2013 6:04 PM >> To: Jakub Jelinek >> Cc: Bin.Cheng; Jeff Law; Bin Cheng; gcc-patches List >> Subject: Re: [PATCH PR41488]Recognize more induction variables by >> simplifying PEELED chrec in scalar evolution >> >> On Wed, Dec 11, 2013 at 9:50 AM, Jakub Jelinek <jakub@redhat.com> wrote: >> > On Wed, Dec 11, 2013 at 04:31:55PM +0800, Bin.Cheng wrote: >> >> Thank both of you for being patient on this patch. >> >> I went through the documentation quickly and realized that I have to >> >> modify pointer-map structure to make it recognized by GC (maybe more >> > >> > Changing pointer_map at this point is IMHO not appropriate. >> > >> >> work suggested by Jakub). It seems I shouldn't include that task in >> >> this patch at this stage 3, I am thinking just call free_affine* >> >> function in the place it is created for SCEV. Of course, that makes >> >> it more expensive. >> > >> > Perhaps you could call free_affine_expand_cache say from >> > analyze_scalar_evolution, that is the innermost enclosing exported API >> > that indirectly calls your new code, but it seems it is also called >> > recursively from analyze_scalar_evolution_1 or functions it calls. >> > So maybe you'd need to rename analyze_scalar_evolution to >> > analyze_scalar_evolution_2 and adjust some calls to >> > analyze_scalar_evolution to the latter in tree-scalar-evolution.c, and >> > add analyze_scalar_evolution wrapper that calls >> analyze_scalar_evolution_2 and free_affine_expand_cache. >> > Whether this would work depends on detailed analysis of the >> > tree-scalar-evolution.c callgraph, there are tons of functions, most >> > of them static, and the question is if there is a single non-static >> > (or at most a few of them) function that cover all calls to your new >> > code in the static functions it (indirectly) calls, i.e. if there >> > isn't any other external entry point that might call your new code >> > without doing free_affine_expand_cache afterwards. >> > >> > If you can find that, I'd say it would be the safest thing to do. >> > But let's see what say Richard thinks about it too. >> >> I think keeping the SCEV cache live over multiple passes is fragile (we do > re- >> use SSA names and passes do not appropriately call scev_reset[_htab] in > all >> cases). >> >> With fixing that the easiest approach is to associate a affine-expand > cache >> with the scev info. >> >> Without that there isn't a very good solution other than discarding the > cache >> after each expansion at the point you use it. The SCEV cache should >> effectively make most of the affine expansion caching moot (but you can > try >> instrumenting that to verify it). >> >> That is, I suggest to try freeing the cache right after the second >> tree_to_aff_combination_expand. >> >> Btw, >> >> + /* Try harder to check if they are equal. */ >> + tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map); >> + tree_to_aff_combination_expand (step_val, type, &aff2, >> + &peeled_chrec_map); aff_combination_scale (&aff2, >> + double_int_minus_one); aff_combination_add (&aff1, &aff2); left = >> + fold_convert (type, aff_combination_to_tree (&aff1)); >> + >> + /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP >> + if "left" equals to "init + right". */ if (operand_equal_p >> + (left, integer_zero_node, 0)) >> >> you don't need aff_combination_to_tree, simply check >> >> aff1.n == 0 && aff1.offset.is_zero () >> >> (you may want to add aff_combination_zero_p or even >> aff_combination_equal_p) > > Hi, > This is the new version patch with bug 59445 fixed and calling free_affine > stuff just after it is used. I also added aff_combination_zero_p as > suggested. > The change in add_old_iv_candidates is unnecessary now, since the previous > version patch triggered the assertion, I just leave it here as an additional > check. > > At last, apology that I missed java in bootstrap before. > > Bootstrap and test on x86/x86_64, arm is still ongoing. Is it OK if tests > pass. Ok. Thanks, Richard. > Thanks, > bin > > 2013-12-12 Bin Cheng <bin.cheng@arm.com> > > PR tree-optimization/58296 > PR tree-optimization/41488 > * tree-scalar-evolution.c: Include necessary header files. > (simplify_peeled_chrec): New function. > (analyze_evolution_in_loop): New static variable. > Call simplify_peeled_chrec. > * tree-ssa-loop-ivopts.c (mark_bivs): Don't mark peeled IV as biv. > (add_old_iv_candidates): Don't add candidate for peeled IV. > * tree-affine.h (aff_combination_zero_p): New function. > > gcc/testsuite/ChangeLog > 2013-12-12 Bin Cheng <bin.cheng@arm.com> > > PR tree-optimization/58296 > PR tree-optimization/41488 > * gcc.dg/tree-ssa/scev-7.c: New test. > * gcc.dg/pr41488.c: New test. > * g++.dg/pr59445.C: New test.
Index: gcc/tree-scalar-evolution.c =================================================================== --- gcc/tree-scalar-evolution.c (revision 205798) +++ gcc/tree-scalar-evolution.c (working copy) @@ -280,6 +280,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa.h" #include "cfgloop.h" #include "tree-chrec.h" +#include "pointer-set.h" +#include "tree-affine.h" #include "tree-scalar-evolution.h" #include "dumpfile.h" #include "params.h" @@ -1380,7 +1382,65 @@ follow_ssa_edge (struct loop *loop, gimple def, gi } +/* Simplify PEELED_CHREC represented by (init_cond, arg) in LOOP. + Handle below case and return the corresponding POLYNOMIAL_CHREC: + # i_17 = PHI <i_13(5), 0(3)> + # _20 = PHI <_5(5), start_4(D)(3)> + ... + i_13 = i_17 + 1; + _5 = start_4(D) + i_13; + + Though variable _20 appears as a PEELED_CHREC in the form of + (start_4, _5)_LOOP, it's a POLYNOMIAL_CHREC like {start_4, 1}_LOOP. + + See PR41488. */ + +static tree +simplify_peeled_chrec (struct loop *loop, tree arg, tree init_cond) +{ + aff_tree aff1, aff2; + tree ev, left, right, type, step_val; + pointer_map_t *peeled_chrec_map = NULL; + + ev = instantiate_parameters (loop, analyze_scalar_evolution (loop, arg)); + if (ev == NULL_TREE || TREE_CODE (ev) != POLYNOMIAL_CHREC) + return chrec_dont_know; + + left = CHREC_LEFT (ev); + right = CHREC_RIGHT (ev); + type = TREE_TYPE (left); + step_val = chrec_fold_plus (type, init_cond, right); + + /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP + if "left" equals to "init + right". */ + if (operand_equal_p (left, step_val, 0)) + { + if (dump_file && (dump_flags & TDF_SCEV)) + fprintf (dump_file, "Simplify PEELED_CHREC into POLYNOMIAL_CHREC.\n"); + + return build_polynomial_chrec (loop->num, init_cond, right); + } + + /* Try harder to check if they are equal. */ + tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map); + tree_to_aff_combination_expand (step_val, type, &aff2, &peeled_chrec_map); + free_affine_expand_cache (&peeled_chrec_map); + aff_combination_scale (&aff2, double_int_minus_one); + aff_combination_add (&aff1, &aff2); + + /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP + if "left" equals to "init + right". */ + if (aff_combination_zero_p (&aff1)) + { + if (dump_file && (dump_flags & TDF_SCEV)) + fprintf (dump_file, "Simplify PEELED_CHREC into POLYNOMIAL_CHREC.\n"); + + return build_polynomial_chrec (loop->num, init_cond, right); + } + return chrec_dont_know; +} + /* Given a LOOP_PHI_NODE, this function determines the evolution function from LOOP_PHI_NODE to LOOP_PHI_NODE in the loop. */ @@ -1392,6 +1452,7 @@ analyze_evolution_in_loop (gimple loop_phi_node, tree evolution_function = chrec_not_analyzed_yet; struct loop *loop = loop_containing_stmt (loop_phi_node); basic_block bb; + static bool simplify_peeled_chrec_p = true; if (dump_file && (dump_flags & TDF_SCEV)) { @@ -1442,7 +1503,19 @@ analyze_evolution_in_loop (gimple loop_phi_node, all the other iterations it has the value of ARG. For the moment, PEELED_CHREC nodes are not built. */ if (res != t_true) - ev_fn = chrec_dont_know; + { + ev_fn = chrec_dont_know; + /* Try to recognize POLYNOMIAL_CHREC which appears in + the form of PEELED_CHREC, but guard the process with + a bool variable to keep the analyzer from infinite + recurrence for real PEELED_RECs. */ + if (simplify_peeled_chrec_p && TREE_CODE (arg) == SSA_NAME) + { + simplify_peeled_chrec_p = false; + ev_fn = simplify_peeled_chrec (loop, arg, init_cond); + simplify_peeled_chrec_p = true; + } + } /* When there are multiple back edges of the loop (which in fact never happens currently, but nevertheless), merge their evolutions. */ Index: gcc/testsuite/gcc.dg/tree-ssa/scev-7.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/scev-7.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/scev-7.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-sccp-scev" } */ + +struct struct_t +{ + int* data; +}; + +void foo (struct struct_t* sp, int start, int end) +{ + int i; + + for (i = 1000; i+start > end; i--) + sp->data[i+start] = 0; +} + +/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp" } } */ +/* { dg-final { cleanup-tree-dump "sccp" } } */ Index: gcc/testsuite/gcc.dg/pr41488.c =================================================================== --- gcc/testsuite/gcc.dg/pr41488.c (revision 0) +++ gcc/testsuite/gcc.dg/pr41488.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-sccp-scev" } */ + +struct struct_t +{ + int* data; +}; + +void foo (struct struct_t* sp, int start, int end) +{ + int i; + + for (i = 0; i+start < end; i++) + sp->data[i+start] = 0; +} + +/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp" } } */ +/* { dg-final { cleanup-tree-dump "sccp" } } */ Index: gcc/testsuite/g++.dg/pr59445.C =================================================================== --- gcc/testsuite/g++.dg/pr59445.C (revision 0) +++ gcc/testsuite/g++.dg/pr59445.C (revision 0) @@ -0,0 +1,81 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +template <typename _Iterator> struct A; +template <typename _Tp> struct A<_Tp *> { + typedef _Tp value_type; + typedef int difference_type; +}; +template <typename _Compare> struct B {}; +template <typename _Compare> struct C { + _Compare _M_comp; + template <typename _Value, typename _Iterator> + int operator()(_Value &p1, _Iterator p2) { + return _M_comp(p1, *p2); + } +}; +template <typename _Compare> C<_Compare> __val_comp_iter(B<_Compare>); +template <typename _RandomAccessIterator, typename _Compare> +void __unguarded_linear_insert(_RandomAccessIterator p1, _Compare p2) { + typename A<_RandomAccessIterator>::value_type a; + _RandomAccessIterator b = p1; + --b; + while (p2(a, b)) { + *p1 = 0; + p1 = b; + --b; + } +} +template <typename _RandomAccessIterator, typename _Compare> +void __insertion_sort(_RandomAccessIterator, _Compare p2) { + for (_RandomAccessIterator c;; ++c) + __unguarded_linear_insert(c, __val_comp_iter(p2)); +} +template <typename _RandomAccessIterator, typename _Distance, typename _Compare> +void __chunk_insertion_sort(_RandomAccessIterator, _Distance, _Compare p3) { + _RandomAccessIterator d; + __insertion_sort(d, p3); +} +template <typename _RandomAccessIterator, typename _Pointer, typename _Compare> +void __merge_sort_with_buffer(_RandomAccessIterator p1, _Pointer, _Compare p3) { + __chunk_insertion_sort(p1, 0, p3); +} +template <typename _RandomAccessIterator, typename _Pointer, typename _Distance, + typename _Compare> +void __stable_sort_adaptive(_RandomAccessIterator, _Pointer, _Distance, + _Compare p4) { + _RandomAccessIterator e; + __merge_sort_with_buffer(e, 0, p4); +} +template <typename _RandomAccessIterator, typename _Compare> +void __stable_sort(_RandomAccessIterator p1, _Compare p2) { + __stable_sort_adaptive( + p1, 0, typename A<_RandomAccessIterator>::difference_type(), p2); +} +template <typename _RandomAccessIterator, typename _Compare> +void stable_sort(_RandomAccessIterator, _RandomAccessIterator p2, _Compare) { + B<_Compare> f; + __stable_sort(p2, f); +} +class D { +public: + void m_fn1(); +}; +class F { + struct G { + D MFI; + int operator()(int p1, int p2) { + if (p1) + return 0; + if (p2) + return 1; + MFI.m_fn1(); + } + }; + void m_fn1(int &p1) const; +}; +void F::m_fn1(int &p1) const { + int *g, *h; + stable_sort(h, g, G()); +} + Index: gcc/tree-ssa-loop-ivopts.c =================================================================== --- gcc/tree-ssa-loop-ivopts.c (revision 205798) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -1074,7 +1074,7 @@ find_bivs (struct ivopts_data *data) static void mark_bivs (struct ivopts_data *data) { - gimple phi; + gimple phi, def; tree var; struct iv *iv, *incr_iv; struct loop *loop = data->current_loop; @@ -1090,6 +1090,13 @@ mark_bivs (struct ivopts_data *data) continue; var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop)); + def = SSA_NAME_DEF_STMT (var); + /* Don't mark iv peeled from other one as biv. */ + if (def + && gimple_code (def) == GIMPLE_PHI + && gimple_bb (def) == loop->header) + continue; + incr_iv = get_iv (data, var); if (!incr_iv) continue; @@ -2526,11 +2533,19 @@ add_old_iv_candidates (struct ivopts_data *data, s /* Additionally record the possibility of leaving the original iv untouched. */ def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (data->current_loop)); - cand = add_candidate_1 (data, - iv->base, iv->step, true, IP_ORIGINAL, NULL, - SSA_NAME_DEF_STMT (def)); - cand->var_before = iv->ssa_name; - cand->var_after = def; + /* Don't add candidate if it's from another PHI node because + it's an affine iv appearing in the form of PEELED_CHREC. */ + phi = SSA_NAME_DEF_STMT (def); + if (gimple_code (phi) != GIMPLE_PHI) + { + cand = add_candidate_1 (data, + iv->base, iv->step, true, IP_ORIGINAL, NULL, + SSA_NAME_DEF_STMT (def)); + cand->var_before = iv->ssa_name; + cand->var_after = def; + } + else + gcc_assert (gimple_bb (phi) == data->current_loop->header); } } Index: gcc/tree-affine.h =================================================================== --- gcc/tree-affine.h (revision 205798) +++ gcc/tree-affine.h (working copy) @@ -80,3 +80,16 @@ bool aff_comb_cannot_overlap_p (aff_tree *, double /* Debugging functions. */ void debug_aff (aff_tree *); + +/* Return true if AFF is actually ZERO. */ +static inline bool +aff_combination_zero_p (aff_tree *aff) +{ + if (!aff) + return true; + + if (aff->n == 0 && aff->offset.is_zero ()) + return true; + + return false; +}