Message ID | CAHFci2-ZJdXrEbV4yQG6UjqswiPp-B8trwrgAn2NAzm4f1fdRw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Jul 1, 2014 at 10:32 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > Sorry for this late reply, I spent some time in understanding the problem. > > On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <bin.cheng@arm.com> wrote: > >>> expressions. It's possible to have iv_elimination_compare_lt to do some >>> undo transformation on may_be_zero, but I found it's difficult for cases >>> involving signed/unsigned conversion like case loop-41.c. Since I think >>> there is no obvious benefit to fold may_be_zero here (somehow because the >>> two operands are already in folded forms), this patch just calls build2_loc >>> instead. >> >> But it may fold to true/false, no? > You are right, it can be folded to false in many cases. Thus I > managed to check specific folded forms of may_be_zero, as in attached > patch. So far it works for tests I added, but there are some other > folded forms not handled. > >> > >>> >>> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in >>> iv_elimination_compare_lt are like below with i_1 of signed type: >>> B: i_1 + 1 >>> A: 0 >>> niter->niter: (unsigned int)i_1 >>> >>> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1". >>> Without this patch, it is considered equal to each other. >> >> just looking at this part. Do you have a testcase that exhibits a difference >> when just applying patch A? So I can have a look here? >> >> From the code in iv_elimination_compare_lt I can't figure why we'd >> end up with i_1 instead of (unsigned int)i_1 as we convert to a common >> type. >> >> I suppose the issue may be that at tree_to_aff_combination time >> we strip all nops with STRIP_NOPS but when operating on ->rest >> via convert/scale or add we do not strip them again. >> >> But then 'nit' should be i_1, not (unsigned int)i_1. So the analysis above >> really doesn't look correct. >> >> Just to make sure we don't paper over an issue in tree-affine.c. >> >> Thus - testcase? On x86 we don't run into this place in >> iv_elimination_compare_lt (on an unpatched tree). >> >> CCing Zdenek for the meat of patch B. > > I am more and more convinced that check of overflow in function > iv_elimination_compare_lt is not sufficient. Considering example > given by comments just before the function: > /* Tries to replace loop exit by one formulated in terms of a LT_EXPR > comparison with CAND. NITER describes the number of iterations of > the loops. If successful, the comparison in COMP_P is altered accordingly. > > We aim to handle the following situation: > > sometype *base, *p; > int a, b, i; > > i = a; > p = p_0 = base + a; > > do > { > bla (*p); > p++; > i++; > } > while (i < b); > > Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1. > We aim to optimize this to > > p = p_0 = base + a; > do > { > bla (*p); > p++; > } > while (p < p_0 - a + b); > > This preserves the correctness, since the pointer arithmetics does not > overflow. More precisely: > > 1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no > overflow in computing it or the values of p. > 2) if a + 1 > b, then we need to verify that the expression p_0 - a does not > overflow. To prove this, we use the fact that p_0 = base + a. */ > > The example seems wrong to me, because the statement only stands for > unsigned case like ivopts-lt.c: > > > #include "stdint.h" > > void > f1 (char *p, uintptr_t i, uintptr_t n) > { > p += i; > do > { > *p = '\0'; > p += 1; > i++; > } > while (i < n); > } > > If i/n are of signed type, the 2nd point for "a+1>b" scenario in the > comment isn't right, because of below facts: > a) niter of loop are calculated in unsigned type as in function > number_of_iterations_lt. So loop information for this case is like: > > Analyzing # of iterations of loop 1 > exit condition [i_4(D) + 1, + , 1](no_overflow) < n_12(D) > bounds on difference of bases: -4294967295 ... 4294967294 > result: > zero if i_4(D) >= n_12(D) > # of iterations ((unsigned int) n_12(D) - (unsigned int) i_4(D)) - > 1, bounded by 4294967294 > number of iterations ((unsigned int) n_12(D) - (unsigned int) > i_4(D)) - 1; zero if i_4(D) >= n_12(D) > > b) The (unsigned int)n_12(D) could overflow, the check in > iv_elimination_compare_lt doesn't take this fact into consideration. > > Tricky thing is GCC now doesn't eliminate "i < n" with point P for > signed version of case. The reason behind is again the may_be_zero > information. The original form of may_be_zero is like "i_4(D) + 1> > n_12(D)", which is folded into "i_4(D) >= n_12(D)" because both > i_4/n_12 are of signed type and don't wrap. In the end, function > iv_elimination_compare_lt only handles the original form of > may_be_zero. > So below case is the closest one to illustrate the problem: > > #include "stdint.h" > > void > f1 (char *p, int i, int n) > { > p += i; > do > { > *p = '\0'; > p += 1; > i++; > } > while (i < n); > } > > char arr[100] = "hello\n"; > > int main(void) > { > f1 (arr, 1, -1); > > return 0; > } > > If we build the original form of may_be_zero with change: > Index: gcc/tree-ssa-loop-niter.c > =================================================================== > --- gcc/tree-ssa-loop-niter.c (revision 211966) > +++ gcc/tree-ssa-loop-niter.c (working copy) > @@ -1153,8 +1153,13 @@ > condition. */ > > if (mpz_sgn (bnds->below) < 0) > +#if 0 > niter->may_be_zero = fold_build2 (LT_EXPR, boolean_type_node, > iv1->base, iv0->base); > +#else > + niter->may_be_zero = build2_loc (UNKNOWN_LOCATION, LT_EXPR, > boolean_type_node, > + iv1->base, iv0->base); > +#endif > niter->niter = delta; > niter->max = widest_int::from (wi::from_mpz (niter_type, > bnds->up, false), > TYPE_SIGN (niter_type)); > > The case will be mis-compiled. > > One the other handle, iv_elimination_compare_lt for loops with > may_be_zero can be further improved to handle cases other than pointer > candidate. My original patch is intended to handle unsigned type > candidates, but there are other cases. > > The problem here is a little complicated, especially if we want to > improve iv elimination, I may mis-read the code somehow. So any idea > about this? > BTW, for below gimple dump before ivopt: foo (int l) { long long int sum; long long int j; int i; int _8; <bb 2>: <bb 3>: # i_1 = PHI <0(2), i_4(4)> # j_2 = PHI <0(2), j_5(4)> # sum_3 = PHI <0(2), sum_6(4)> i_4 = i_1 + 1; j_5 = j_2 + 1; sum_6 = sum_3 + j_5; if (i_4 < l_7(D)) goto <bb 4>; else goto <bb 5>; <bb 4>: goto <bb 3>; <bb 5>: # sum_12 = PHI <sum_6(3)> _8 = (int) sum_12; return _8; } The loop information is like: ;; ;; Loop 1 ;; header 3, latch 4 ;; depth 1, outer 0 ;; nodes: 3 4 Processing loop 1 single exit 3 -> 5, exit condition if (i_4 < l_7(D)) Analyzing # of iterations of loop 1 exit condition [1, + , 1](no_overflow) < l_7(D) bounds on difference of bases: -2147483649 ... 2147483646 result: zero if l_7(D) < 1 # of iterations (unsigned int) l_7(D) + 4294967295, bounded by 2147483646 number of iterations (unsigned int) l_7(D) + 4294967295; zero if l_7(D) < 1 Question is, is it safe to eliminate "i_4 < l_7(D)" as below? Replacing exit test: if (i_4 < l_7(D)) foo (int l) { long long int sum; long long int j; int i; int _8; long long int _10; unsigned int _11; unsigned long _13; unsigned long _14; unsigned int _15; <bb 2>: _11 = (unsigned int) l_7(D); _15 = _11 + 4294967295; _14 = (unsigned long) _15; _13 = _14 + 1; _10 = (long long int) _13; <bb 3>: # j_2 = PHI <0(2), j_5(4)> # sum_3 = PHI <0(2), sum_6(4)> j_5 = j_2 + 1; sum_6 = sum_3 + j_5; if (j_5 < _10) goto <bb 4>; else goto <bb 5>; <bb 4>: goto <bb 3>; <bb 5>: # sum_12 = PHI <sum_6(3)> _8 = (int) sum_12; return _8; } Thanks, bin
On Tue, Jul 1, 2014 at 10:32 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > Sorry for this late reply, I spent some time in understanding the problem. > > On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <bin.cheng@arm.com> wrote: > >>> expressions. It's possible to have iv_elimination_compare_lt to do some >>> undo transformation on may_be_zero, but I found it's difficult for cases >>> involving signed/unsigned conversion like case loop-41.c. Since I think >>> there is no obvious benefit to fold may_be_zero here (somehow because the >>> two operands are already in folded forms), this patch just calls build2_loc >>> instead. >> >> But it may fold to true/false, no? > You are right, it can be folded to false in many cases. Thus I > managed to check specific folded forms of may_be_zero, as in attached > patch. So far it works for tests I added, but there are some other > folded forms not handled. > >> > >>> >>> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in >>> iv_elimination_compare_lt are like below with i_1 of signed type: >>> B: i_1 + 1 >>> A: 0 >>> niter->niter: (unsigned int)i_1 >>> >>> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1". >>> Without this patch, it is considered equal to each other. >> >> just looking at this part. Do you have a testcase that exhibits a difference >> when just applying patch A? So I can have a look here? >> >> From the code in iv_elimination_compare_lt I can't figure why we'd >> end up with i_1 instead of (unsigned int)i_1 as we convert to a common >> type. >> >> I suppose the issue may be that at tree_to_aff_combination time >> we strip all nops with STRIP_NOPS but when operating on ->rest >> via convert/scale or add we do not strip them again. >> >> But then 'nit' should be i_1, not (unsigned int)i_1. So the analysis above >> really doesn't look correct. >> >> Just to make sure we don't paper over an issue in tree-affine.c. >> >> Thus - testcase? On x86 we don't run into this place in >> iv_elimination_compare_lt (on an unpatched tree). >> >> CCing Zdenek for the meat of patch B. > For the record, here is my understanding about the problem. Considering below simple loop, and we don't restrict P to any specific type as GCC now does: some_type p, p_0; int a, b, i; i = a; p = p_0 do { use(p); p += step; i++; } while (i < b); We want to optimization it into below form, given there is no overflow/wrap behavior introduced, if NITER is the number of the loop. p = p_0; do { use(p); p += step; } while (p < p_0 + (NITER + 1) * step); For convenient, we assume positive step for now. I think it's safe (in other words, no new overflow or wrap), if below two conditions are satisfied: 1) When the original latch executes for N(>0) times, the new latch executes for same times. 2) When the original latch executes ZERO time, the new latch executes for ZERO time. For 1), expression "p_0 + (NITER + 1) * step" should not overflow/wrap upward. This is checked now by code snippet like: /* We need to know that the candidate induction variable does not overflow. While more complex analysis may be used to prove this, for now just check that the variable appears in the original program and that it is computed in a type that guarantees no overflows. */ cand_type = TREE_TYPE (cand->iv->base); if (cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type)) return false; GCC only checks if the variable appears originally and is of type not overflow/wrap. Just as the comment states, it can be improved (something like introduced in my patch). For 2), expression "p_0 + (NITER + 1) * step" should not overflow/wrap downward, in other words, "p >= p_0 + (NITER + 1) * step" should hold when may_be_zero (i.e, "a + 1 > b") holds. Since NITER is computed as (unsigned_type)B - (unsigned_type)A - 1, the new bound is effective in form of "p_0 + (unsigned_type)B * step - (unsigned_type)A * step". Due to the folded form of may_be_zero, GCC only handles cases in which B/A are of unsigned type, we only need to make sure that "p_0 - (unsigned_type)A * step" holds, given "(unsigned_type)B <= (unsigned_type)A" already holds. When comes to cases in which B is of signed type, we need to make sure that "(unsigned_type)B" doesn't overflow/wrap. So the conclusions are: 1) With the original patch B, patch A is needed because we relax GCC for cases in which B is of signed type. 2) With the updated patch B, patch A won't be needed. Thanks, bin
Index: gcc/tree-ssa-loop-niter.c =================================================================== --- gcc/tree-ssa-loop-niter.c (revision 211966) +++ gcc/tree-ssa-loop-niter.c (working copy) @@ -1153,8 +1153,13 @@ condition. */ if (mpz_sgn (bnds->below) < 0) +#if 0 niter->may_be_zero = fold_build2 (LT_EXPR, boolean_type_node, iv1->base, iv0->base); +#else + niter->may_be_zero = build2_loc (UNKNOWN_LOCATION, LT_EXPR, boolean_type_node, + iv1->base, iv0->base); +#endif niter->niter = delta; niter->max = widest_int::from (wi::from_mpz (niter_type,