Message ID: 000401d0e52f$2c1917d0$844b4770$@arm.com
State: New
On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.cheng@arm.com> wrote:
> Hi,
> This patch is a new approach to fixing PR66388.  IVO today computes iv_use
> with an iv_cand which has at least the same type precision as the use.  On
> 64-bit platforms like AArch64, this results in a different iv_cand being
> created for each address type iv_use, and register pressure increases.  As
> a matter of fact, the BIV should be used for all iv_uses in some of these
> cases.  It is a latent bug, but it has recently been getting worse because
> of the overflow changes.
>
> The original approach at
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue
> except that it conflicts with IV elimination.  It seems impossible to me
> to mitigate that contradiction.
>
> This new approach fixes the issue by adding a sizetype iv_cand for BIVs
> directly.  In cases where the original BIV is preferred, the sizetype
> iv_cand will be chosen.  As for code generation, the sizetype iv_cand has
> the same effect as the original BIV.  Actually, it's better, because the
> BIV needs to be explicitly extended to sizetype to be used in address
> expressions on most targets.
>
> One shortcoming of this approach is that it may introduce more iv
> candidates.  To minimize the impact, this patch does sophisticated code
> analysis and adds a sizetype candidate for a BIV only if it is used as an
> index.  Moreover, it avoids adding a candidate of the original type if the
> BIV is only used as an index.  Statistics from compiling spec2k6 show the
> increase in candidate number is modest and can be ignored.
>
> There are two more patches following to fix corner cases revealed by this
> one.  Together they bring an obvious performance improvement for
> spec2k6/int on aarch64.
>
> Spec2k6/int
> 400.perlbench    3.44%
> 445.gobmk       -0.86%
> 456.hmmer       14.83%
> 458.sjeng        2.49%
> 462.libquantum  -0.79%
> GEOMEAN          1.68%
>
> There is also about a 0.36% improvement for spec2k6/fp, mostly because of
> 436.cactusADM.  I believe it can be further improved, but that should be
> another patch.
>
> I also collected benchmark data for x86_64.  Spec2k6/fp is not affected.
> As for spec2k6/int, though the geomean is improved slightly, 400.perlbench
> is regressed by ~3%.  I can see BIVs are chosen for some loops instead of
> address candidates.  Generally, the loop header will be simplified because
> iv elimination with the BIV is simpler; the number of instructions in the
> loop body isn't changed.  I suspect the regression comes from the
> different addressing modes.  With the BIV, a complex addressing mode like
> [base + index << scale + disp] is used, rather than [base + disp].  I
> guess the former has more micro-ops and is thus more expensive.  This
> guess can be confirmed by manually suppressing the complex addressing mode
> with a higher address cost.
> Now the problem becomes why the overall cost of the BIV is computed lower
> while its actual cost is higher.  I noticed that for most affected loops,
> the loop header is bloated because of iv elimination using the old address
> candidate.  The bloated loop header results in a much higher cost than the
> BIV.  As a result, the BIV is preferred.  I also noticed the bloated loop
> header can generally be simplified (I have a following patch for this).
> After applying that local patch, the old address candidate is chosen, and
> most of the regression is recovered.
> My conclusion is that the bloated loop header should be blamed for the
> regression, and that it can be resolved.
>
> Bootstrap and test on x86_64 and aarch64.  It fixes the failure of
> gcc.target/i386/pr49781-1.c, without new breakage.
>
> So what do you think?

The data above looks ok to me.

+static struct iv *
+find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
+{
+  aff_tree aff;
+  struct expand_data exp_data;
+
+  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
+    return iv;
+
+  /* Expand IV's ssa_name till the deriving biv is found.  */
+  exp_data.data = data;
+  exp_data.biv = NULL;
+  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
+                                  &aff, &data->name_expansion_cache,
+                                  stop_expand, &exp_data);
+  return exp_data.biv;

That's actually "abusing" tree_to_aff_combination_expand for simply walking
SSA uses and their defs' uses recursively until you hit "stop".  ISTR past
discussion about adding a generic walk_ssa_use interface for that.  Not sure
if it materialized with a name I can't remember or whether it didn't.

-  add_candidate (data, iv->base, iv->step, true, NULL);
+  /* Check if this biv is used in address type use.  */
+  if (iv->no_overflow && iv->have_address_use
+      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
+      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
+    {
+      tree type = unsigned_type_for (sizetype);

sizetype is unsigned.

The rest looks ok to me, but I really don't like the abuse of
tree_to_aff_combination_expand...

Thanks,
Richard.

> Thanks,
> bin
>
> 2015-08-31  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-affine.c (aff_combination_expand): New parameters.
>         (tree_to_aff_combination_expand): Ditto.
>         * tree-affine.h (aff_combination_expand): New declaration.
>         (tree_to_aff_combination_expand): Ditto.
>         * tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields.
>         (dump_iv): Dump no_overflow information.
>         (alloc_iv): Initialize new field for struct iv.
>         (struct expand_data): New struct for affine combination expanding.
>         (stop_expand): New callback func for affine combination expanding.
>         (find_deriving_biv_for_iv, record_biv_for_address_use): New
>         functions.
>         (idx_find_step): Call new functions above.
>         (find_depends, add_candidate): New parameter.
>         (add_iv_candidate_for_biv): Add sizetype cand for BIV.
>         (get_computation_aff): Simplify conversion of cand for BIV.
>         (get_computation_cost_at): Step cand's base if necessary.
On Wed, Sep 2, 2015 at 10:12 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>> Hi,
>> This patch is a new approach to fix PR66388.  [...]
>
> The data above looks ok to me.
>
> +static struct iv *
> +find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
> [...]
> +  return exp_data.biv;
>
> that's actually "abusing" tree_to_aff_combination_expand for simply walking
> SSA uses and their defs' uses recursively until you hit "stop".  ISTR past
> discussion about adding a generic walk_ssa_use interface for that.  Not
> sure if it materialized with a name I can't remember or whether it didn't.

Thanks for reviewing.  I didn't find an existing interface to walk up the
definition chains of ssa vars.  In this updated patch, I implemented a
simple function which meets the minimal requirement of walking up the
definition chains of BIV variables.  I also counted the number of
no_overflow BIVs that are not used in address type uses.  Since there are
generally only two BIVs in a loop, this can prevent us from visiting
definition chains most of the time.  Statistics show that the number of
calls to find_deriving_biv_for_expr plummets.

> -  add_candidate (data, iv->base, iv->step, true, NULL);
> +  /* Check if this biv is used in address type use.  */
> +  if (iv->no_overflow && iv->have_address_use
> +      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
> +      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
> +    {
> +      tree type = unsigned_type_for (sizetype);
>
> sizetype is unsigned.

Fixed.

Bootstrap and test on x86_64, is this OK?

Thanks,
bin

> the rest looks ok to me but I really don't like the abuse of
> tree_to_aff_combination_expand...
>
> Thanks,
> Richard.

>> 2015-08-31  Bin Cheng  <bin.cheng@arm.com>
>> [...]
On Tue, Sep 8, 2015 at 6:06 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Sep 2, 2015 at 10:12 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>>> Hi,
>>> This patch is a new approach to fix PR66388.  [...]
>>
>> The data above looks ok to me.
>> [...]
>
> Thanks for reviewing.  [...]
>
> Bootstrap and test on x86_64, is this OK?
>
> Thanks,
> bin

And here is the updated ChangeLog entry.

2015-09-08  Bin Cheng  <bin.cheng@arm.com>

        PR tree-optimization/66388
        * tree-ssa-loop-ivopts.c (struct iv, iv_cand, ivopts_data): New
        fields.
        (dump_iv): Dump no_overflow information.
        (alloc_iv): Initialize new field for struct iv.
        (mark_bivs): Count number of no_overflow bivs.
        (find_deriving_biv_for_expr, record_biv_for_address_use): New
        functions.
        (idx_find_step): Call new functions above.
        (add_candidate_1, add_candidate): New parameter.
        (add_iv_candidate_for_biv): Add sizetype cand for BIV.
        (get_computation_aff): Simplify conversion of cand for BIV.
        (get_computation_cost_at): Step cand's base if necessary.
Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c	(revision 227163)
+++ gcc/tree-affine.c	(working copy)
@@ -625,11 +656,14 @@ struct name_expansion
 };
 
 /* Expands SSA names in COMB recursively.  CACHE is used to cache the
-   results.  */
+   results.  If callback function STOP is specified, this function stops
+   expanding when it returns TRUE.  DATA points to a private structure
+   used by the callback function.  */
 
 void
 aff_combination_expand (aff_tree *comb ATTRIBUTE_UNUSED,
-			hash_map<tree, name_expansion *> **cache)
+			hash_map<tree, name_expansion *> **cache,
+			bool (*stop) (void *, void *), void *data)
 {
   unsigned i;
   aff_tree to_add, current, curre;
@@ -654,6 +688,11 @@ aff_combination_expand (aff_tree *comb ATTRIBUTE_U
       name = TREE_OPERAND (e, 0);
       if (TREE_CODE (name) != SSA_NAME)
	continue;
+
+      /* Don't expand further if STOP returns TRUE.  */
+      if (stop != NULL && (*stop) (data, name))
+	continue;
+
       def = SSA_NAME_DEF_STMT (name);
       if (!is_gimple_assign (def) || gimple_assign_lhs (def) != name)
	continue;
@@ -702,7 +741,8 @@ aff_combination_expand (aff_tree *comb ATTRIBUTE_U
	      if (e != name)
		rhs = fold_convert (type, rhs);
	    }
-	  tree_to_aff_combination_expand (rhs, comb->type, &current, cache);
+	  tree_to_aff_combination_expand (rhs, comb->type,
+					  &current, cache, stop, data);
	  exp->expansion = current;
	  exp->in_progress = 0;
	}
@@ -735,14 +775,19 @@ aff_combination_expand (aff_tree *comb ATTRIBUTE_U
      a1 = a0 + a0;
      a2 = a1 + a1;
      a3 = a2 + a2;
-     ...  */
+     ...
+
+   If callback function STOP is specified, this function stops expanding
+   when it returns TRUE.  DATA points to a private structure used by the
+   callback function.  */
 
 void
 tree_to_aff_combination_expand (tree expr, tree type, aff_tree *comb,
-				hash_map<tree, name_expansion *> **cache)
+				hash_map<tree, name_expansion *> **cache,
+				bool (*stop) (void *, void *), void *data)
 {
   tree_to_aff_combination (expr, type, comb);
-  aff_combination_expand (comb, cache);
+  aff_combination_expand (comb, cache, stop, data);
 }
 
 /* Frees memory occupied by struct name_expansion in *VALUE.  Callback for
Index: gcc/tree-affine.h
===================================================================
--- gcc/tree-affine.h	(revision 227163)
+++ gcc/tree-affine.h	(working copy)
@@ -77,9 +77,12 @@ void tree_to_aff_combination (tree, tree, aff_tree
 tree aff_combination_to_tree (aff_tree *);
 void unshare_aff_combination (aff_tree *);
 bool aff_combination_constant_multiple_p (aff_tree *, aff_tree *, widest_int *);
-void aff_combination_expand (aff_tree *, hash_map<tree, name_expansion *> **);
+void aff_combination_expand (aff_tree *, hash_map<tree, name_expansion *> **,
+			     bool (*)(void *, void *) = NULL, void * = NULL);
 void tree_to_aff_combination_expand (tree, tree, aff_tree *,
-				     hash_map<tree, name_expansion *> **);
+				     hash_map<tree, name_expansion *> **,
+				     bool (*)(void *, void *) = NULL,
+				     void * = NULL);
 tree get_inner_reference_aff (tree, aff_tree *, widest_int *);
 void free_affine_expand_cache (hash_map<tree, name_expansion *> **);
 bool aff_comb_cannot_overlap_p (aff_tree *, const widest_int &,
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 227163)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -147,6 +147,8 @@ struct iv
   bool biv_p;		/* Is it a biv?  */
   bool have_use_for;	/* Do we already have a use for it?  */
   bool no_overflow;	/* True if the iv doesn't overflow.  */
+  bool have_address_use;/* For biv, indicate if it's used in any address
+			   type use.  */
 };
 
 /* Per-ssa version information (induction variable descriptions, etc.).  */
@@ -251,6 +253,8 @@ struct iv_cand
			   where it is incremented.  */
   bitmap depends_on;	/* The list of invariants that are used in step of the
			   biv.  */
+  struct iv *orig_iv;	/* The original iv if this cand is added from biv with
+			   smaller type.  */
 };
 
 /* Loop invariant expression hashtable entry.  */
@@ -529,6 +533,9 @@ dump_iv (FILE *file, struct iv *iv, bool dump_name
   if (iv->biv_p)
     fprintf (file, "  is a biv\n");
+
+  if (iv->no_overflow)
+    fprintf (file, "  iv doesn't overflow wrto loop niter\n");
 }
 
 /* Dumps information about the USE to FILE.  */
@@ -1013,6 +1020,7 @@ alloc_iv (struct ivopts_data *data, tree base, tre
   iv->use_id = 0;
   iv->ssa_name = NULL_TREE;
   iv->no_overflow = no_overflow;
+  iv->have_address_use = false;
 
   return iv;
 }
@@ -1621,6 +1629,106 @@ expr_invariant_in_loop_p (struct loop *loop, tree
   return true;
 }
 
+/* Local data for affine combination expanding.  */
+
+struct expand_data
+{
+  struct ivopts_data *data;
+  struct iv *biv;
+};
+
+/* Callback function to tree_to_aff_combination_expand.
+
+   Return true if biv or loop invariant are encountered, false otherwise.  */
+
+static bool
+stop_expand (void *d1, void *d2)
+{
+  struct expand_data *exp_data = (struct expand_data *) d1;
+  tree ssa_name = (tree) d2;
+  struct ivopts_data *data;
+  struct iv *iv;
+
+  if (!exp_data || !ssa_name || TREE_CODE (ssa_name) != SSA_NAME)
+    return false;
+
+  data = exp_data->data;
+  gcc_assert (data != NULL);
+  iv = get_iv (data, ssa_name);
+  if (!iv || integer_zerop (iv->step))
+    return true;
+
+  if (iv->biv_p)
+    {
+      exp_data->biv = iv;
+      return true;
+    }
+
+  return false;
+}
+
+/* Return biv from which the IV is derived.  */
+
+static struct iv *
+find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
+{
+  aff_tree aff;
+  struct expand_data exp_data;
+
+  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
+    return iv;
+
+  /* Expand IV's ssa_name till the deriving biv is found.  */
+  exp_data.data = data;
+  exp_data.biv = NULL;
+  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
+				  &aff, &data->name_expansion_cache,
+				  stop_expand, &exp_data);
+  return exp_data.biv;
+}
+
+/* Record BIV, its predecessor and successor that they are used in
+   address type uses.  */
+
+static void
+record_biv_for_address_use (struct ivopts_data *data, struct iv *biv)
+{
+  unsigned i;
+  tree type, base_1, base_2;
+  bitmap_iterator bi;
+
+  if (!biv || !biv->biv_p || integer_zerop (biv->step)
+      || biv->have_address_use || !biv->no_overflow)
+    return;
+
+  type = TREE_TYPE (biv->base);
+  if (!INTEGRAL_TYPE_P (type))
+    return;
+
+  biv->have_address_use = true;
+  base_1 = fold_build2 (PLUS_EXPR, type, biv->base, biv->step);
+  EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, i, bi)
+    {
+      struct iv *iv = ver_info (data, i)->iv;
+
+      if (!iv || !iv->biv_p || integer_zerop (iv->step)
+	  || iv->have_address_use || !iv->no_overflow)
+	continue;
+
+      if (type != TREE_TYPE (iv->base)
+	  || !INTEGRAL_TYPE_P (TREE_TYPE (iv->base)))
+	continue;
+
+      if (!operand_equal_p (biv->step, iv->step, 0))
+	continue;
+
+      base_2 = fold_build2 (PLUS_EXPR, type, iv->base, iv->step);
+      if (operand_equal_p (base_1, iv->base, 0)
+	  || operand_equal_p (base_2, biv->base, 0))
+	iv->have_address_use = true;
+    }
+}
+
 /* Cumulates the steps of indices into DATA and replaces their values with the
    initial ones.  Returns false when the value of the index cannot be
   determined.  Callback for for_each_index.  */
@@ -1711,6 +1819,10 @@ idx_find_step (tree base, tree *idx, void *data)
   step = fold_build2 (MULT_EXPR, sizetype, step, iv_step);
   dta->step = fold_build2 (PLUS_EXPR, sizetype, dta->step, step);
 
+  if (!iv->biv_p)
+    iv = find_deriving_biv_for_iv (dta->ivopts_data, iv);
+
+  record_biv_for_address_use (dta->ivopts_data, iv);
   return true;
 }
 
@@ -2603,7 +2715,8 @@ find_depends (tree *expr_p, int *ws ATTRIBUTE_UNUS
 static struct iv_cand *
 add_candidate_1 (struct ivopts_data *data,
		 tree base, tree step, bool important, enum iv_position pos,
-		 struct iv_use *use, gimple incremented_at)
+		 struct iv_use *use, gimple incremented_at,
+		 struct iv *orig_iv = NULL)
 {
   unsigned i;
   struct iv_cand *cand = NULL;
@@ -2685,6 +2798,7 @@ add_candidate_1 (struct ivopts_data *data,
       else
	cand->ainc_use = NULL;
 
+      cand->orig_iv = orig_iv;
       if (dump_file && (dump_flags & TDF_DETAILS))
	dump_cand (dump_file, cand);
     }
@@ -2793,15 +2907,17 @@ add_autoinc_candidates (struct ivopts_data *data,
 
 static void
 add_candidate (struct ivopts_data *data,
-	       tree base, tree step, bool important, struct iv_use *use)
+	       tree base, tree step, bool important, struct iv_use *use,
+	       struct iv *orig_iv = NULL)
 {
   gcc_assert (use == NULL || use->sub_id == 0);
 
   if (ip_normal_pos (data->current_loop))
-    add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL);
+    add_candidate_1 (data, base, step, important,
+		     IP_NORMAL, use, NULL, orig_iv);
   if (ip_end_pos (data->current_loop)
       && allow_ip_end_pos_p (data->current_loop))
-    add_candidate_1 (data, base, step, important, IP_END, use, NULL);
+    add_candidate_1 (data, base, step, important, IP_END, use, NULL, orig_iv);
 }
 
 /* Adds standard iv candidates.  */
@@ -2836,8 +2952,24 @@ add_iv_candidate_for_biv (struct ivopts_data *data
   tree def;
   struct iv_cand *cand;
 
-  add_candidate (data, iv->base, iv->step, true, NULL);
+  /* Check if this biv is used in address type use.  */
+  if (iv->no_overflow && iv->have_address_use
+      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
+      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
+    {
+      tree type = unsigned_type_for (sizetype);
+      tree base = fold_convert (type, iv->base);
+      tree step = fold_convert (type, iv->step);
 
+      /* Add iv cand of same precision as index part in TARGET_MEM_REF.  */
+      add_candidate (data, base, step, true, NULL, iv);
+      /* Add iv cand of the original type only if it has nonlinear use.  */
+      if (iv->have_use_for)
+	add_candidate (data, iv->base, iv->step, true, NULL);
+    }
+  else
+    add_candidate (data, iv->base, iv->step, true, NULL);
 
   /* The same, but with initial value zero.  */
   if (POINTER_TYPE_P (TREE_TYPE (iv->base)))
     add_candidate (data, size_int (0), iv->step, true, NULL);
@@ -3358,6 +3490,28 @@ get_computation_aff (struct loop *loop,
   /* If the conversion is not noop, perform it.  */
   if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))
     {
+      if (cand->orig_iv != NULL && CONVERT_EXPR_P (cbase)
+	  && (CONVERT_EXPR_P (cstep) || TREE_CODE (cstep) == INTEGER_CST))
+	{
+	  tree inner_base, inner_step, inner_type;
+	  inner_base = TREE_OPERAND (cbase, 0);
+	  if (CONVERT_EXPR_P (cstep))
+	    inner_step = TREE_OPERAND (cstep, 0);
+	  else
+	    inner_step = cstep;
+
+	  inner_type = TREE_TYPE (inner_base);
+	  /* If candidate is added from a biv whose type is smaller than
+	     ctype, we know both candidate and the biv won't overflow.
+	     In this case, it's safe to skip the conversion in candidate.
+	     As an example, (unsigned short)((unsigned long)A) equals
+	     (unsigned short)A, if A has a type no larger than short.  */
+	  if (TYPE_PRECISION (inner_type) <= TYPE_PRECISION (uutype))
+	    {
+	      cbase = inner_base;
+	      cstep = inner_step;
+	    }
+	}
       cstep = fold_convert (uutype, cstep);
       cbase = fold_convert (uutype, cbase);
       var = fold_convert (uutype, var);
@@ -4525,6 +4681,13 @@ get_computation_cost_at (struct ivopts_data *data,
	   (ratio, mem_mode,
	    TYPE_ADDR_SPACE (TREE_TYPE (utype))))
     {
+      if (cstepi == 0 && stmt_is_after_inc)
+	{
+	  if (POINTER_TYPE_P (ctype))
+	    cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
+	  else
+	    cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
+	}
       cbase = fold_build2 (MULT_EXPR, ctype,
			   cbase, build_int_cst (ctype, ratio));
       cost = difference_cost (data,