Message ID | 20191019062731.GL2116@tucnak
---|---
State | New
Series | Improve debug info in ivopts optimized loops (PR debug/90231)
On Sat, Oct 19, 2019 at 08:27:31AM +0200, Jakub Jelinek wrote:
> And as questioned in the PR, are there other cases where we can safely
> assume no wrap (e.g. use it if use->iv->no_overflow && cand->iv->no_overflow
> without those extra precision checks)?

Like, is there a way to find out if an iv_cand has been created from pointer
arithmetics or similar and thus at least with -fno-wrapv-pointer shouldn't
wrap around?

	Jakub
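A minimal sketch of the kind of loop the question is about (a hypothetical
example written for illustration, not taken from the PR): the counter v is a
plain IV that ivopts may rewrite away, while the surviving candidate is the
pointer p, whose arithmetic is assumed not to wrap under the default
-fno-wrapv-pointer.

void
scale (const int *p, const int *end, long long *out)
{
  long long v = 0;                /* IV that ivopts may eliminate */
  for (; p != end; p++, v += 4)   /* candidate derived from pointer p */
    *out++ = v;
}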
On Sat, Oct 19, 2019 at 2:27 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> As mentioned in the PR, the following patch attempts to address two issues.
> In remove_unused_ivs we first find the best iv_cand (we prefer primarily the
> same step, next same mode and lastly constant base) and only then call
> get_computation_at to determine the replacement expression. Unfortunately,
> in various cases get_computation_at can return NULL_TREE and in that case we
> don't try any other candidate and just leave the vars for debug as is, which
> results in => NULL and the IVs <optimized away>.
>
> The following patch will only consider candidates for which
> get_computation_at succeeds, while it can be slower, it can handle more
> cases. Perhaps alternative would be to have two passes, pick up the best
> candidate without calling get_computation_at, calling it on the best
> candidate and if that fails, retry the best candidate search with calling
> get_computation_at.
>
> Another thing is that get_computation_at can only handle cases where
> use can be expressed as ubase + (var - cbase) * ratio for integral ratio.
> In the PR testcase we don't have any, one (the one we pick as best)
> has a ratio 1/15 and the other 1/4. Using a division in ivopts at runtime
> is hardly ever desirable, but for debug info we don't mind that, all we need
> to ensure is that we don't result in wrong-debug.
> The patch implements expressing use as ubase + (var - cbase) / ratio for
> these debug info only uses, but so far requires that if the use IV couldn't
> wrap around, that the candidate (var - cbase) can't wrap around either,
> which should be true whenever the candidate type is IMHO at least
> ceil_log2 (ratio) bits larger than the use type. Do I need to punt if
> !use->iv->no_overflow, or is
> ubase + (utype) ((var - cbase) / ratio) computation safe even if it wraps?
> And as questioned in the PR, are there other cases where we can safely
> assume no wrap (e.g. use it if use->iv->no_overflow && cand->iv->no_overflow
> without those extra precision checks)?
>
> Anyway, bootstrapped/regtested successfully on x86_64-linux and i686-linux.

Hi Jakub, thanks very much for working on this.

As for choosing the best candidate, I was thinking of reusing the existing
get_computation_cost interface, but it looks like that would require
non-trivial modification; your patch is simpler.

The overflow check is the difficult part.  IIUC, the checks on bit precision
in the patch are not enough?  Consider an unsigned 64-bit candidate whose
start value is unknown at compile time.  My original idea was to use the
no_overflow flag, but apparently that information is not well computed and
is inaccurate in most cases?

Lastly, for now there is no link between a candidate and its original iv.
IIRC, when a candidate is derived from a no_overflow pointer candidate, it's
not safe to mark the candidate as no_overflow.  I am not sure whether I
remember this wrong; unfortunately, I don't have examples either.

Thanks,
bin

>
> Comparing bootstrapped cc1plus without and with this patch (the former with
> the patch applied after bootstrap and stage3 rebuilt, so that .text etc.
is > identical) shows: > - [33] .debug_info PROGBITS 0000000000000000 22a4788 749275e 00 0 0 1 > - [34] .debug_abbrev PROGBITS 0000000000000000 9736ee6 204aad 00 0 0 1 > - [35] .debug_line PROGBITS 0000000000000000 993b993 1688464 00 0 0 1 > - [36] .debug_str PROGBITS 0000000000000000 afc3df7 6f65aa 01 MS 0 0 1 > - [37] .debug_loc PROGBITS 0000000000000000 b6ba3a1 71a2dde 00 0 0 1 > - [38] .debug_ranges PROGBITS 0000000000000000 1285d17f 16414d0 00 0 0 1 > - [39] .symtab SYMTAB 0000000000000000 13e9e650 166ff8 18 40 38193 8 > - [40] .strtab STRTAB 0000000000000000 14005648 2ad809 00 0 0 1 > - [41] .shstrtab STRTAB 0000000000000000 142b2e51 0001a0 00 0 0 1 > + [33] .debug_info PROGBITS 0000000000000000 22a4788 749365e 00 0 0 1 > + [34] .debug_abbrev PROGBITS 0000000000000000 9737de6 204a9f 00 0 0 1 > + [35] .debug_line PROGBITS 0000000000000000 993c885 1688f0c 00 0 0 1 > + [36] .debug_str PROGBITS 0000000000000000 afc5791 6f65aa 01 MS 0 0 1 > + [37] .debug_loc PROGBITS 0000000000000000 b6bbd3b 71cd404 00 0 0 1 > + [38] .debug_ranges PROGBITS 0000000000000000 1288913f 16414b0 00 0 0 1 > + [39] .symtab SYMTAB 0000000000000000 13eca5f0 166ff8 18 40 38193 8 > + [40] .strtab STRTAB 0000000000000000 140315e8 2ad809 00 0 0 1 > + [41] .shstrtab STRTAB 0000000000000000 142dedf1 0001a0 00 0 0 1 > so .debug_info is 3840 bytes larger and .debug_loc is 173606 bytes larger > (0.15%), so there are some changes with this, but not huge amount of them, > though .debug_loc size changed in 217 gcc/*.o files out of 474. > > 2019-10-18 Jakub Jelinek <jakub@redhat.com> > > PR debug/90231 > * tree-ssa-loop-ivopts.c (get_debug_computation_at): New function. > (remove_unused_ivs): Use it instead of get_computation_at. When > choosing best candidate, only consider candidates where > get_debug_computation_at actually returns non-NULL. > > --- gcc/tree-ssa-loop-ivopts.c.jj 2019-09-20 12:25:26.810718338 +0200 > +++ gcc/tree-ssa-loop-ivopts.c 2019-10-18 23:55:10.054026219 +0200 > @@ -4089,6 +4089,83 @@ get_computation_at (class loop *loop, gi > return fold_convert (type, aff_combination_to_tree (&aff)); > } > > +/* Like get_computation_at, but try harder, even if the computation > + is more expensive. Intended for debug stmts. */ > + > +static tree > +get_debug_computation_at (class loop *loop, gimple *at, > + struct iv_use *use, struct iv_cand *cand) > +{ > + if (tree ret = get_computation_at (loop, at, use, cand)) > + return ret; > + > + tree ubase = use->iv->base, ustep = use->iv->step; > + tree cbase = cand->iv->base, cstep = cand->iv->step; > + tree var; > + tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase); > + widest_int rat; > + > + /* We must have a precision to express the values of use. */ > + if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype)) > + return NULL_TREE; > + > + /* Try to handle the case that get_computation_at doesn't, > + try to express > + use = ubase + (var - cbase) / ratio. 
*/ > + if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep), > + &rat)) > + return NULL_TREE; > + > + bool neg_p = false; > + if (wi::neg_p (rat)) > + { > + if (TYPE_UNSIGNED (ctype)) > + return NULL_TREE; > + neg_p = true; > + rat = wi::neg (rat); > + } > + > + int bits = wi::exact_log2 (rat); > + if (bits == -1) > + bits = wi::floor_log2 (rat) + 1; > + if (TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype)) > + return NULL_TREE; > + > + var = var_at_stmt (loop, cand, at); > + > + if (POINTER_TYPE_P (ctype)) > + { > + ctype = unsigned_type_for (ctype); > + cbase = fold_convert (ctype, cbase); > + cstep = fold_convert (ctype, cstep); > + var = fold_convert (ctype, var); > + } > + > + ubase = unshare_expr (ubase); > + cbase = unshare_expr (cbase); > + if (stmt_after_increment (loop, cand, at)) > + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, > + unshare_expr (cstep)); > + > + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase); > + var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var, > + wide_int_to_tree (TREE_TYPE (var), rat)); > + if (POINTER_TYPE_P (utype)) > + { > + var = fold_convert (sizetype, var); > + if (neg_p) > + var = fold_build1 (NEGATE_EXPR, sizetype, var); > + var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var); > + } > + else > + { > + var = fold_convert (utype, var); > + var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype, > + ubase, var); > + } > + return var; > +} > + > /* Adjust the cost COST for being in loop setup rather than loop body. > If we're optimizing for space, the loop setup overhead is constant; > if we're optimizing for speed, amortize it over the per-iteration cost. > @@ -7523,6 +7600,7 @@ remove_unused_ivs (struct ivopts_data *d > struct iv_use dummy_use; > struct iv_cand *best_cand = NULL, *cand; > unsigned i, best_pref = 0, cand_pref; > + tree comp = NULL_TREE; > > memset (&dummy_use, 0, sizeof (dummy_use)); > dummy_use.iv = info->iv; > @@ -7543,20 +7621,22 @@ remove_unused_ivs (struct ivopts_data *d > ? 1 : 0; > if (best_cand == NULL || best_pref < cand_pref) > { > - best_cand = cand; > - best_pref = cand_pref; > + tree this_comp > + = get_debug_computation_at (data->current_loop, > + SSA_NAME_DEF_STMT (def), > + &dummy_use, cand); > + if (this_comp) > + { > + best_cand = cand; > + best_pref = cand_pref; > + comp = this_comp; > + } > } > } > > if (!best_cand) > continue; > > - tree comp = get_computation_at (data->current_loop, > - SSA_NAME_DEF_STMT (def), > - &dummy_use, best_cand); > - if (!comp) > - continue; > - > if (count > 1) > { > tree vexpr = make_node (DEBUG_EXPR_DECL); > > Jakub
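To make the division idea concrete, here is a small standalone simulation
(written for illustration only, not GCC code or the PR testcase): the use IV
stepping by 1 is assumed to have been removed, only a candidate stepping by 4
survives (mirroring the 1/4 ratio mentioned above), and the debug value is
recovered as ubase + (var - cbase) / ratio.

#include <stdio.h>

int
main (void)
{
  unsigned int ubase = 7, ustep = 1;    /* eliminated use IV */
  unsigned int cbase = 100, cstep = 4;  /* surviving candidate */
  unsigned int ratio = cstep / ustep;   /* use = ubase + (var - cbase) * r
                                           would need r = 1/4, not integral */

  for (unsigned int i = 0; i < 8; i++)
    {
      unsigned int var = cbase + i * cstep;  /* candidate value at iteration i */
      unsigned int use = ubase + i * ustep;  /* value the removed IV would have */
      unsigned int dbg = ubase + (var - cbase) / ratio;  /* debug reconstruction */
      printf ("iter %u: use=%u reconstructed=%u\n", i, use, dbg);
    }
  return 0;
}

As long as nothing wraps, the reconstructed value matches the eliminated IV on
every iteration; the wrap-around discussion below is about when that stops
being true.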
On Mon, Oct 21, 2019 at 06:41:46PM +0800, Bin.Cheng wrote:
> The overflow check is difficult, IIUC, checks on bit precision in the
> patch is not enough? Considering an unsigned 64-bit candidate with

I think it can cover some cases, especially on 64-bit targets, but will
leave other cases out.

I guess the first question is what iv->no_overflow means.  The IVs are
base + iteration * step; does no_overflow mean that neither
iteration * step nor base + iteration * step overflows?  For the debug
info and division, I guess we care primarily whether just the
iteration * step part doesn't overflow.

There are still cases where I fear even the precision check might not be
enough.  If ustep is a power of two and so is rat, then I think we should be
fine, or if use->iv->no_overflow, but otherwise I'm afraid of, say, an
unsigned char use and an unsigned short candidate, with ustep 3 and cstep 9
(i.e. rat 3).  If the loop has more than 65536 / 9 iterations, var - cbase
will overflow, so it will have the values 0, 9, ..., 0xfff0, 0xfff9, 2, 11, ...
and the corresponding use computed using the division (assuming ubase is 0),
0 + (unsigned char) ((var - cbase) / 3), will be 0, 3, ..., 0x50, 0x53, 0, 3, ...
which is incorrect.  Simple testcase, -O3 -g -fno-tree-dce:

void
foo (unsigned short c, unsigned char *p)
{
  unsigned char a = 0;
  unsigned short b = 0;
  for (; b != c; a += 3, b += 9)
    p[b]++;
}

where I think with the patch it will be wrong-debug on the 7282nd iteration
etc.  So I wonder if for correctness I don't need to add:

  if (!use->iv->no_overflow
      && !cand->iv->no_overflow
      && !integer_pow2p (cstep))
    return NULL_TREE;

with some of the above as a comment explaining why.

On the other side, if cand->iv->no_overflow, couldn't we bypass the extra
precision test?

And related to the first question above, perhaps incrementally, couldn't we
track separately whether the whole base + iteration * step overflows (i.e.
what we track right now) and also just whether iteration * step overflows,
in a separate bool?  Because when we subtract the base from the value, all
we care about is iteration * step.

	Jakub
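The wrap-around above can be checked with a standalone simulation of the same
numbers (an illustration only, not part of the patch): an unsigned short
candidate with step 9 and an unsigned char use with step 3, reconstructed as
(unsigned char) (b / 3) with ubase and cbase both 0.

#include <stdio.h>

int
main (void)
{
  unsigned char a = 0;   /* use IV, step 3 */
  unsigned short b = 0;  /* candidate IV, step 9, wraps after 65536 / 9 iterations */
  for (unsigned int i = 0; i < 8000; i++, a += 3, b += 9)
    {
      unsigned char dbg = (unsigned char) (b / 3);  /* division-based reconstruction */
      if (dbg != a)
        {
          printf ("iteration %u: a=%u, but reconstruction gives %u (b=%u)\n",
                  i, (unsigned) a, (unsigned) dbg, (unsigned) b);
          return 0;
        }
    }
  printf ("no mismatch\n");
  return 0;
}

This first reports a mismatch at iteration 7282 (a=86 while the reconstruction
gives 0), matching the estimate above.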
On Mon, Oct 21, 2019 at 01:24:30PM +0200, Jakub Jelinek wrote: > So I wonder if for correctness I don't need to add: > > if (!use->iv->no_overflow > && !cand->iv->no_overflow > && !integer_pow2p (cstep)) > return NULL_TREE; > > with some of the above as comment explaining why. > > On the other side, if cand->iv->no_overflow, couldn't we bypass the extra > precision test? Here are these two in patch form. 2019-10-22 Jakub Jelinek <jakub@redhat.com> PR debug/90231 * tree-ssa-loop-ivopts.c (get_debug_computation_at): New function. (remove_unused_ivs): Use it instead of get_computation_at. When choosing best candidate, only consider candidates where get_debug_computation_at actually returns non-NULL. --- gcc/tree-ssa-loop-ivopts.c.jj 2019-10-21 14:17:57.598198162 +0200 +++ gcc/tree-ssa-loop-ivopts.c 2019-10-22 09:30:09.782238157 +0200 @@ -4089,6 +4089,94 @@ get_computation_at (class loop *loop, gi return fold_convert (type, aff_combination_to_tree (&aff)); } +/* Like get_computation_at, but try harder, even if the computation + is more expensive. Intended for debug stmts. */ + +static tree +get_debug_computation_at (class loop *loop, gimple *at, + struct iv_use *use, struct iv_cand *cand) +{ + if (tree ret = get_computation_at (loop, at, use, cand)) + return ret; + + tree ubase = use->iv->base, ustep = use->iv->step; + tree cbase = cand->iv->base, cstep = cand->iv->step; + tree var; + tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase); + widest_int rat; + + /* We must have a precision to express the values of use. */ + if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype)) + return NULL_TREE; + + /* Try to handle the case that get_computation_at doesn't, + try to express + use = ubase + (var - cbase) / ratio. */ + if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep), + &rat)) + return NULL_TREE; + + bool neg_p = false; + if (wi::neg_p (rat)) + { + if (TYPE_UNSIGNED (ctype)) + return NULL_TREE; + neg_p = true; + rat = wi::neg (rat); + } + + /* If both IVs can wrap around and CAND doesn't have a power of two step, + it is unsafe. Consider uint16_t CAND with step 9, when wrapping around, + the values will be ... 0xfff0, 0xfff9, 2, 11 ... and when use is say + uint8_t with step 3, those values divided by 3 cast to uint8_t will be + ... 0x50, 0x53, 0, 3 ... rather than expected 0x50, 0x53, 0x56, 0x59. */ + if (!use->iv->no_overflow + && !cand->iv->no_overflow + && !integer_pow2p (cstep)) + return NULL_TREE; + + int bits = wi::exact_log2 (rat); + if (bits == -1) + bits = wi::floor_log2 (rat) + 1; + if (!cand->iv->no_overflow + && TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype)) + return NULL_TREE; + + var = var_at_stmt (loop, cand, at); + + if (POINTER_TYPE_P (ctype)) + { + ctype = unsigned_type_for (ctype); + cbase = fold_convert (ctype, cbase); + cstep = fold_convert (ctype, cstep); + var = fold_convert (ctype, var); + } + + ubase = unshare_expr (ubase); + cbase = unshare_expr (cbase); + if (stmt_after_increment (loop, cand, at)) + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, + unshare_expr (cstep)); + + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase); + var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var, + wide_int_to_tree (TREE_TYPE (var), rat)); + if (POINTER_TYPE_P (utype)) + { + var = fold_convert (sizetype, var); + if (neg_p) + var = fold_build1 (NEGATE_EXPR, sizetype, var); + var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var); + } + else + { + var = fold_convert (utype, var); + var = fold_build2 (neg_p ? 
MINUS_EXPR : PLUS_EXPR, utype, + ubase, var); + } + return var; +} + /* Adjust the cost COST for being in loop setup rather than loop body. If we're optimizing for space, the loop setup overhead is constant; if we're optimizing for speed, amortize it over the per-iteration cost. @@ -7523,6 +7611,7 @@ remove_unused_ivs (struct ivopts_data *d struct iv_use dummy_use; struct iv_cand *best_cand = NULL, *cand; unsigned i, best_pref = 0, cand_pref; + tree comp = NULL_TREE; memset (&dummy_use, 0, sizeof (dummy_use)); dummy_use.iv = info->iv; @@ -7543,20 +7632,22 @@ remove_unused_ivs (struct ivopts_data *d ? 1 : 0; if (best_cand == NULL || best_pref < cand_pref) { - best_cand = cand; - best_pref = cand_pref; + tree this_comp + = get_debug_computation_at (data->current_loop, + SSA_NAME_DEF_STMT (def), + &dummy_use, cand); + if (this_comp) + { + best_cand = cand; + best_pref = cand_pref; + comp = this_comp; + } } } if (!best_cand) continue; - tree comp = get_computation_at (data->current_loop, - SSA_NAME_DEF_STMT (def), - &dummy_use, best_cand); - if (!comp) - continue; - if (count > 1) { tree vexpr = make_node (DEBUG_EXPR_DECL); Jakub
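For what it's worth, the same kind of simulation (again only an illustration,
not part of the patch) suggests why a power-of-two candidate step is exempted
by the new !integer_pow2p (cstep) check: with cstep a power of two, rat is a
power of two as well, so dividing the wrapped difference still yields the low
bits of iteration * ustep, and truncating to the narrower use type recovers
the right value provided the precision test holds.  Below, cstep is 8, ustep
is 2, rat is 4, so bits = 2 and 8 + 2 <= 16 satisfies the precision check.

#include <stdio.h>

int
main (void)
{
  unsigned char a = 0;   /* use IV, step 2, 8-bit */
  unsigned short b = 0;  /* candidate IV, step 8 (power of two), 16-bit */
  for (unsigned int i = 0; i < 100000; i++, a += 2, b += 8)
    if ((unsigned char) (b / 4) != a)   /* reconstruction with ubase = cbase = 0 */
      {
        printf ("mismatch at iteration %u\n", i);
        return 1;
      }
  printf ("reconstruction stays exact across candidate wrap-arounds\n");
  return 0;
}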
On Tue, Oct 22, 2019 at 3:32 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Oct 21, 2019 at 01:24:30PM +0200, Jakub Jelinek wrote: > > So I wonder if for correctness I don't need to add: > > > > if (!use->iv->no_overflow > > && !cand->iv->no_overflow > > && !integer_pow2p (cstep)) > > return NULL_TREE; > > > > with some of the above as comment explaining why. > > > > On the other side, if cand->iv->no_overflow, couldn't we bypass the extra > > precision test? > > Here are these two in patch form. > > 2019-10-22 Jakub Jelinek <jakub@redhat.com> > > PR debug/90231 > * tree-ssa-loop-ivopts.c (get_debug_computation_at): New function. > (remove_unused_ivs): Use it instead of get_computation_at. When > choosing best candidate, only consider candidates where > get_debug_computation_at actually returns non-NULL. > > --- gcc/tree-ssa-loop-ivopts.c.jj 2019-10-21 14:17:57.598198162 +0200 > +++ gcc/tree-ssa-loop-ivopts.c 2019-10-22 09:30:09.782238157 +0200 > @@ -4089,6 +4089,94 @@ get_computation_at (class loop *loop, gi > return fold_convert (type, aff_combination_to_tree (&aff)); > } > > +/* Like get_computation_at, but try harder, even if the computation > + is more expensive. Intended for debug stmts. */ > + > +static tree > +get_debug_computation_at (class loop *loop, gimple *at, > + struct iv_use *use, struct iv_cand *cand) > +{ > + if (tree ret = get_computation_at (loop, at, use, cand)) > + return ret; > + > + tree ubase = use->iv->base, ustep = use->iv->step; > + tree cbase = cand->iv->base, cstep = cand->iv->step; > + tree var; > + tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase); > + widest_int rat; > + > + /* We must have a precision to express the values of use. */ > + if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype)) > + return NULL_TREE; > + > + /* Try to handle the case that get_computation_at doesn't, > + try to express > + use = ubase + (var - cbase) / ratio. */ > + if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep), > + &rat)) > + return NULL_TREE; > + > + bool neg_p = false; > + if (wi::neg_p (rat)) > + { > + if (TYPE_UNSIGNED (ctype)) > + return NULL_TREE; > + neg_p = true; > + rat = wi::neg (rat); > + } > + > + /* If both IVs can wrap around and CAND doesn't have a power of two step, > + it is unsafe. Consider uint16_t CAND with step 9, when wrapping around, > + the values will be ... 0xfff0, 0xfff9, 2, 11 ... and when use is say > + uint8_t with step 3, those values divided by 3 cast to uint8_t will be > + ... 0x50, 0x53, 0, 3 ... rather than expected 0x50, 0x53, 0x56, 0x59. */ Interesting, so we can still get correct debug info for iter in such special cases. > + if (!use->iv->no_overflow > + && !cand->iv->no_overflow > + && !integer_pow2p (cstep)) > + return NULL_TREE; > + > + int bits = wi::exact_log2 (rat); > + if (bits == -1) > + bits = wi::floor_log2 (rat) + 1; > + if (!cand->iv->no_overflow > + && TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype)) > + return NULL_TREE; The patch is fine for me. Just for the record, guess we may try to find (by recording information in early phase) the correct/bijection candidate in computing the iv in debuginfo in the future, then those checks would be unnecessary. 
Thanks, bin > + > + var = var_at_stmt (loop, cand, at); > + > + if (POINTER_TYPE_P (ctype)) > + { > + ctype = unsigned_type_for (ctype); > + cbase = fold_convert (ctype, cbase); > + cstep = fold_convert (ctype, cstep); > + var = fold_convert (ctype, var); > + } > + > + ubase = unshare_expr (ubase); > + cbase = unshare_expr (cbase); > + if (stmt_after_increment (loop, cand, at)) > + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, > + unshare_expr (cstep)); > + > + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase); > + var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var, > + wide_int_to_tree (TREE_TYPE (var), rat)); > + if (POINTER_TYPE_P (utype)) > + { > + var = fold_convert (sizetype, var); > + if (neg_p) > + var = fold_build1 (NEGATE_EXPR, sizetype, var); > + var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var); > + } > + else > + { > + var = fold_convert (utype, var); > + var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype, > + ubase, var); > + } > + return var; > +} > + > /* Adjust the cost COST for being in loop setup rather than loop body. > If we're optimizing for space, the loop setup overhead is constant; > if we're optimizing for speed, amortize it over the per-iteration cost. > @@ -7523,6 +7611,7 @@ remove_unused_ivs (struct ivopts_data *d > struct iv_use dummy_use; > struct iv_cand *best_cand = NULL, *cand; > unsigned i, best_pref = 0, cand_pref; > + tree comp = NULL_TREE; > > memset (&dummy_use, 0, sizeof (dummy_use)); > dummy_use.iv = info->iv; > @@ -7543,20 +7632,22 @@ remove_unused_ivs (struct ivopts_data *d > ? 1 : 0; > if (best_cand == NULL || best_pref < cand_pref) > { > - best_cand = cand; > - best_pref = cand_pref; > + tree this_comp > + = get_debug_computation_at (data->current_loop, > + SSA_NAME_DEF_STMT (def), > + &dummy_use, cand); > + if (this_comp) > + { > + best_cand = cand; > + best_pref = cand_pref; > + comp = this_comp; > + } > } > } > > if (!best_cand) > continue; > > - tree comp = get_computation_at (data->current_loop, > - SSA_NAME_DEF_STMT (def), > - &dummy_use, best_cand); > - if (!comp) > - continue; > - > if (count > 1) > { > tree vexpr = make_node (DEBUG_EXPR_DECL); > > > Jakub >