From patchwork Wed Jul 20 16:23:13 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Bin.Cheng" X-Patchwork-Id: 650770 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3rvj0737w4z9sC3 for ; Thu, 21 Jul 2016 02:23:50 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=ijoAXiDV; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; q=dns; s=default; b=dyJEeY9RzjIrjvd mT7Ajvsvuas4xKN4MFRbWQV9Ibe0WVkIVWo5CruXYPMchmY2oRxi4biYPpqWnVmk DUKuZCPUS4f/1aXwoOlOI0S1IGpqpTq/5KOP7d5VBaBawlvqWoaB5R26Tot8imiT 5r5awy41+vpgN4js8aM1ZJwAO1LI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; s=default; bh=1zzHkKaxkGz7PkBWnPHvU Hplxr4=; b=ijoAXiDVs8oIN5qg6AKybAKsq9GpMFqjyIR6Pw6LBCBpKYEVNG0mH 5KWBzMy0dsqiapVkbZfaEAG/MUQdknDXzl/UWPdEN2Dx68OhI9OzkSFVRvTVApED UkkCitU1HqBefG/ekV6u+AdnBLYSDfR2YQtmKypMIFc6Q2gCv+0oJ0= Received: (qmail 26170 invoked by alias); 20 Jul 2016 16:23:27 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 26067 invoked by uid 89); 20 Jul 2016 16:23:26 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=BinChengarmcom, Bin.Cheng@arm.com, amkerchenggmailcom, amker.cheng@gmail.com X-HELO: mail-vk0-f49.google.com Received: from mail-vk0-f49.google.com (HELO mail-vk0-f49.google.com) (209.85.213.49) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 20 Jul 2016 16:23:16 +0000 Received: by mail-vk0-f49.google.com with SMTP id w127so76183364vkh.2 for ; Wed, 20 Jul 2016 09:23:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=KGoc61+jCgM1ciyQMbutSbh+JnvUBX1C5S1SUFNmRVE=; b=IPz6ybTForjV/8LTocECLZ4dS/7PsDsG12P+MfcX5kmwdG4+IEMhsKjpSG0WJ/F9Gi L3wCjvNxtUz5bk5XavgFCQiCKCnfIWbshIxlr+J4JGv+VgUUhcsfPoOefS+51jW16g0o ZCRPa0vPNQRCdDQ3xufYnFtNtgLsRhq3VeDyYgrwRT+ZBKCWzfO97VRqCU1qZTcfQ9o/ F5+QxzM3cYFkTkrvWIMFZLhrqD0titTBPl1uogaImjQc2DYwcoLr8ApNOha6rRF3uBXO 0/L7tx2Ikib0FVtQxRj2FUaydTd7QsJhgOoKWiHqkwggzpANtsPciSGN18hkCNoufhSd 20Lw== X-Gm-Message-State: ALyK8tJ6WxB/AdOya8dE0Ry4+7XbDpLEJ3KVq+HN8TScmY3jUda29I06RmCRmYFCPmKbjZB2SkYECCFxf1swWQ== X-Received: by 10.31.177.139 with SMTP id a133mr23805708vkf.151.1469031793739; Wed, 20 Jul 2016 09:23:13 -0700 (PDT) MIME-Version: 1.0 Received: by 10.103.116.144 with HTTP; Wed, 20 Jul 2016 09:23:13 -0700 (PDT) In-Reply-To: References: From: "Bin.Cheng" Date: Wed, 20 Jul 2016 17:23:13 +0100 Message-ID: Subject: Re: [PATCH GCC]Improve no-overflow check in SCEV using value range info. To: Richard Biener Cc: Bin Cheng , "gcc-patches@gcc.gnu.org" , nd X-IsSubscribed: yes On Wed, Jul 20, 2016 at 11:01 AM, Richard Biener wrote: > On Tue, Jul 19, 2016 at 6:15 PM, Bin.Cheng wrote: >> On Tue, Jul 19, 2016 at 1:10 PM, Richard Biener >> wrote: >>> On Mon, Jul 18, 2016 at 6:27 PM, Bin Cheng wrote: >>>> Hi, >>>> Scalar evolution needs to prove no-overflow for source variable when handling type conversion. This is important because otherwise we would fail to recognize result of the conversion as SCEV, resulting in missing loop optimizations. Take case added by this patch as an example, the loop can't be distributed as memset call because address of memory reference is not recognized. At the moment, we rely on type overflow semantics and loop niter info for no-overflow checking, unfortunately that's not enough. This patch introduces new method checking no-overflow using value range information. As commented in the patch, value range can only be used when source operand variable evaluates on every loop iteration, rather than guarded by some conditions. >>>> >>>> This together with patch improving loop niter analysis (https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00736.html) can help various loop passes like vectorization. >>>> Bootstrap and test on x86_64 and AArch64. Is it OK? >>> >>> @@ -3187,7 +3187,8 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta) >>> /* If access is not executed on every iteration, we must ensure that overlow >>> may not make the access valid later. */ >>> if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)) >>> - && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num), >>> + && scev_probably_wraps_p (NULL, >>> >>> use NULL_TREE for the null pointer constant of tree. >>> >>> + /* Check if VAR evaluates in every loop iteration. */ >>> + gimple *def; >>> + if ((def = SSA_NAME_DEF_STMT (var)) != NULL >>> >>> def is never NULL but it might be a GIMPLE_NOP which has a NULL gimple_bb. >>> Better check for ! SSA_DEFAULT_DEF_P (var) >>> >>> + if (TREE_CODE (step) != INTEGER_CST || !INTEGRAL_TYPE_P (TREE_TYPE (var))) >>> + return false; >>> >>> this looks like a cheaper test so please do that first. >>> >>> + step_wi = step; >>> + type = TREE_TYPE (var); >>> + if (tree_int_cst_sign_bit (step)) >>> + { >>> + diff = lower_bound_in_type (type, type); >>> + diff = minv - diff; >>> + step_wi = - step_wi; >>> + } >>> + else >>> + { >>> + diff = upper_bound_in_type (type, type); >>> + diff = diff - maxv; >>> + } >>> >>> this lacks a comment - it's not obvious to me what the gymnastics >>> with lower/upper_bound_in_type are supposed to achieve. >> >> Thanks for reviewing, I will prepare another version of patch. >>> >>> As VRP uses niter analysis itself I wonder how this fires back-to-back between >> I am not sure if I mis-understood the question. If the VRP >> information comes from loop niter, I think it will not change loop >> niter or VRP2 in back because that's the best information we got in >> the first place in niter. If the VRP information comes from other >> places (guard conditions?) SCEV and loop niter after vrp1 might be >> improved and thus VRP2. There should be no problems in either case, >> as long as GCC breaks the recursive chain among niter/scev/vrp >> correctly. > > Ok. > >>> VRP1 and VRP2? If the def of var dominates the latch isn't it enough to do >>> a + 1 to check whether VRP bumped the range up to INT_MAX/MIN? That is, >>> why do we need to add step if not for the TYPE_OVERFLOW_UNDEFINED case >>> of VRP handling the ranges optimistically? >> Again, please correct me if I mis-understood. Considering a variable >> whose type is unsigned int and scev is {0, 4}_loop, the value range >> could be computed as [0, 0xfffffffc], thus MAX + 1 is smaller than >> type_MAX, but the scev could be overflow. > > Yes. I was wondering about the case where VRP bumps the range to +INF > because it gave up during iteration or because overflow behavior is undefined. > Do I understand correctly that the code is mostly to improve the not > undefined-overflow case? Hi Richard, I think we resolved these on IRC, here are words for the record. The motivation case is for unsigned type loop counter, while the patch should work for signed type in theory. Considering a loop has signed char counter i and it's used in array_ref[i + 10], since front-end converts signed char addition into unsigned operation, we may need the range information to prove (unsigned char)i + 10 doesn't overflow, thus address of array reference is a scev. I am not sure if the signed case can be handled by current code, or there are other fallouts preventing this patch from working. > > Also I was wondering if the range DEF dominates the latch then why > do we necessarily need to add step to verify overflow? Can't we do better > if we for example see that the DEF is the loop header PHI? I don't think we can, and there is nothing special for loop header PHI in this case, right? Attachment is the updated patch with your comments resolved. Bootstrap and test on x86_64 and AArch64, is it OK? Thanks, bin diff --git a/gcc/tree-chrec.c b/gcc/tree-chrec.c index ee789a2..707a3aa 100644 --- a/gcc/tree-chrec.c +++ b/gcc/tree-chrec.c @@ -1162,16 +1162,17 @@ nb_vars_in_chrec (tree chrec) /* Converts BASE and STEP of affine scev to TYPE. LOOP is the loop whose iv the scev corresponds to. AT_STMT is the statement at that the scev is - evaluated. USE_OVERFLOW_SEMANTICS is true if this function should assume that - the rules for overflow of the given language apply (e.g., that signed - arithmetics in C does not overflow) -- i.e., to use them to avoid unnecessary - tests, but also to enforce that the result follows them. Returns true if the - conversion succeeded, false otherwise. */ + evaluated. USE_OVERFLOW_SEMANTICS is true if this function should assume + that the rules for overflow of the given language apply (e.g., that signed + arithmetics in C does not overflow) -- i.e., to use them to avoid + unnecessary tests, but also to enforce that the result follows them. + FROM is the source variable converted if it's not NULL. Returns true if + the conversion succeeded, false otherwise. */ bool convert_affine_scev (struct loop *loop, tree type, tree *base, tree *step, gimple *at_stmt, - bool use_overflow_semantics) + bool use_overflow_semantics, tree from) { tree ct = TREE_TYPE (*step); bool enforce_overflow_semantics; @@ -1230,7 +1231,7 @@ convert_affine_scev (struct loop *loop, tree type, must_check_rslt_overflow = false; if (must_check_src_overflow - && scev_probably_wraps_p (*base, *step, at_stmt, loop, + && scev_probably_wraps_p (from, *base, *step, at_stmt, loop, use_overflow_semantics)) return false; @@ -1258,7 +1259,8 @@ convert_affine_scev (struct loop *loop, tree type, if (must_check_rslt_overflow /* Note that in this case we cannot use the fact that signed variables do not overflow, as this is what we are verifying for the new iv. */ - && scev_probably_wraps_p (new_base, new_step, at_stmt, loop, false)) + && scev_probably_wraps_p (NULL_TREE, new_base, new_step, + at_stmt, loop, false)) return false; *base = new_base; @@ -1288,12 +1290,14 @@ chrec_convert_rhs (tree type, tree chrec, gimple *at_stmt) USE_OVERFLOW_SEMANTICS is true if this function should assume that the rules for overflow of the given language apply (e.g., that signed - arithmetics in C does not overflow) -- i.e., to use them to avoid unnecessary - tests, but also to enforce that the result follows them. */ + arithmetics in C does not overflow) -- i.e., to use them to avoid + unnecessary tests, but also to enforce that the result follows them. + + FROM is the source variable converted if it's not NULL. */ static tree chrec_convert_1 (tree type, tree chrec, gimple *at_stmt, - bool use_overflow_semantics) + bool use_overflow_semantics, tree from) { tree ct, res; tree base, step; @@ -1314,7 +1318,7 @@ chrec_convert_1 (tree type, tree chrec, gimple *at_stmt, step = CHREC_RIGHT (chrec); if (convert_affine_scev (loop, type, &base, &step, at_stmt, - use_overflow_semantics)) + use_overflow_semantics, from)) return build_polynomial_chrec (loop->num, base, step); /* If we cannot propagate the cast inside the chrec, just keep the cast. */ @@ -1347,7 +1351,7 @@ keep_cast: CHREC_LEFT (chrec)), fold_convert (utype, CHREC_RIGHT (chrec))); - res = chrec_convert_1 (type, res, at_stmt, use_overflow_semantics); + res = chrec_convert_1 (type, res, at_stmt, use_overflow_semantics, from); } else res = fold_convert (type, chrec); @@ -1395,14 +1399,16 @@ keep_cast: USE_OVERFLOW_SEMANTICS is true if this function should assume that the rules for overflow of the given language apply (e.g., that signed - arithmetics in C does not overflow) -- i.e., to use them to avoid unnecessary - tests, but also to enforce that the result follows them. */ + arithmetics in C does not overflow) -- i.e., to use them to avoid + unnecessary tests, but also to enforce that the result follows them. + + FROM is the source variable converted if it's not NULL. */ tree chrec_convert (tree type, tree chrec, gimple *at_stmt, - bool use_overflow_semantics) + bool use_overflow_semantics, tree from) { - return chrec_convert_1 (type, chrec, at_stmt, use_overflow_semantics); + return chrec_convert_1 (type, chrec, at_stmt, use_overflow_semantics, from); } /* Convert CHREC to TYPE, without regard to signed overflows. Returns the new diff --git a/gcc/tree-chrec.h b/gcc/tree-chrec.h index 383271c..877330e 100644 --- a/gcc/tree-chrec.h +++ b/gcc/tree-chrec.h @@ -59,7 +59,7 @@ enum ev_direction scev_direction (const_tree); extern tree chrec_fold_plus (tree, tree, tree); extern tree chrec_fold_minus (tree, tree, tree); extern tree chrec_fold_multiply (tree, tree, tree); -extern tree chrec_convert (tree, tree, gimple *, bool = true); +extern tree chrec_convert (tree, tree, gimple *, bool = true, tree = NULL); extern tree chrec_convert_rhs (tree, tree, gimple *); extern tree chrec_convert_aggressive (tree, tree, bool *); @@ -75,7 +75,7 @@ extern tree reset_evolution_in_loop (unsigned, tree, tree); extern tree chrec_merge (tree, tree); extern void for_each_scev_op (tree *, bool (*) (tree *, void *), void *); extern bool convert_affine_scev (struct loop *, tree, tree *, tree *, gimple *, - bool); + bool, tree = NULL); /* Observers. */ extern bool eq_evolutions_p (const_tree, const_tree); diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c index ae5e907..1ba26f3 100644 --- a/gcc/tree-scalar-evolution.c +++ b/gcc/tree-scalar-evolution.c @@ -1933,7 +1933,7 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt, } else chrec1 = analyze_scalar_evolution (loop, rhs1); - res = chrec_convert (type, chrec1, at_stmt); + res = chrec_convert (type, chrec1, at_stmt, true, rhs1); break; case BIT_AND_EXPR: diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 0723752..785580f 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -3187,7 +3187,8 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta) /* If access is not executed on every iteration, we must ensure that overlow may not make the access valid later. */ if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)) - && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num), + && scev_probably_wraps_p (NULL_TREE, + initial_condition_in_loop_num (ev, loop->num), step, data->stmt, loop, true)) upper = false; @@ -4252,6 +4253,104 @@ loop_exits_before_overflow (tree base, tree step, return false; } +/* VAR is scev variable whose evolution part is constant STEP, this function + proves that VAR can't overflow by using value range info. If VAR's value + range is [MIN, MAX], it can be proven by: + MAX + step doesn't overflow ; if step > 0 + or + MIN + step doesn't underflow ; if step < 0. + + We can only do this if var is computed in every loop iteration, i.e, var's + definition has to dominate loop latch. Consider below example: + + { + unsigned int i; + + : + + : + # RANGE [0, 4294967294] NONZERO 65535 + # i_21 = PHI <0(3), i_18(9)> + if (i_21 != 0) + goto ; + else + goto ; + + : + # RANGE [0, 65533] NONZERO 65535 + _6 = i_21 + 4294967295; + # RANGE [0, 65533] NONZERO 65535 + _7 = (long unsigned int) _6; + # RANGE [0, 524264] NONZERO 524280 + _8 = _7 * 8; + # PT = nonlocal escaped + _9 = a_14 + _8; + *_9 = 0; + + : + # RANGE [1, 65535] NONZERO 65535 + i_18 = i_21 + 1; + if (i_18 >= 65535) + goto ; + else + goto ; + + : + goto ; + + : + return; + } + + VAR _6 doesn't overflow only with pre-condition (i_21 != 0), here we + can't use _6 to prove no-overlfow for _7. In fact, var _7 takes value + sequence (4294967295, 0, 1, ..., 65533) in loop life time, rather than + (4294967295, 4294967296, ...). */ + +static bool +scev_var_range_cant_overflow (tree var, tree step, struct loop *loop) +{ + tree type; + wide_int minv, maxv, diff, step_wi; + enum value_range_type rtype; + + if (TREE_CODE (step) != INTEGER_CST || !INTEGRAL_TYPE_P (TREE_TYPE (var))) + return false; + + /* Check if VAR evaluates in every loop iteration. It's not the case + if VAR is default definition or does not dominate loop's latch. */ + basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); + if (!def_bb || !dominated_by_p (CDI_DOMINATORS, loop->latch, def_bb)) + return false; + + rtype = get_range_info (var, &minv, &maxv); + if (rtype != VR_RANGE) + return false; + + /* VAR is a scev whose evolution part is STEP and value range info + is [MIN, MAX], we can prove its no-overflowness by conditions: + + type_MAX - MAX >= step ; if step > 0 + MIN - type_MIN >= |step| ; if step < 0. + + Or VAR must take value outside of value range, which is not true. */ + step_wi = step; + type = TREE_TYPE (var); + if (tree_int_cst_sign_bit (step)) + { + diff = lower_bound_in_type (type, type); + diff = minv - diff; + step_wi = - step_wi; + } + else + { + diff = upper_bound_in_type (type, type); + diff = diff - maxv; + } + + return (wi::geu_p (diff, step_wi)); +} + /* Return false only when the induction variable BASE + STEP * I is known to not overflow: i.e. when the number of iterations is small enough with respect to the step and initial condition in order to @@ -4260,10 +4359,13 @@ loop_exits_before_overflow (tree base, tree step, USE_OVERFLOW_SEMANTICS is true if this function should assume that the rules for overflow of the given language apply (e.g., that signed - arithmetics in C does not overflow). */ + arithmetics in C does not overflow). + + If VAR is a ssa variable, this function also returns false if VAR can + be proven not overflow with value range info. */ bool -scev_probably_wraps_p (tree base, tree step, +scev_probably_wraps_p (tree var, tree base, tree step, gimple *at_stmt, struct loop *loop, bool use_overflow_semantics) { @@ -4300,6 +4402,11 @@ scev_probably_wraps_p (tree base, tree step, if (TREE_CODE (step) != INTEGER_CST) return true; + /* Check if var can be proven not overflow with value range info. */ + if (var && TREE_CODE (var) == SSA_NAME + && scev_var_range_cant_overflow (var, step, loop)) + return false; + if (loop_exits_before_overflow (base, step, at_stmt, loop)) return false; diff --git a/gcc/tree-ssa-loop-niter.h b/gcc/tree-ssa-loop-niter.h index 1b5d111..6fce752 100644 --- a/gcc/tree-ssa-loop-niter.h +++ b/gcc/tree-ssa-loop-niter.h @@ -46,7 +46,8 @@ extern bool estimated_stmt_executions (struct loop *, widest_int *); extern void estimate_numbers_of_iterations (void); extern bool stmt_dominates_stmt_p (gimple *, gimple *); extern bool nowrap_type_p (tree); -extern bool scev_probably_wraps_p (tree, tree, gimple *, struct loop *, bool); +extern bool scev_probably_wraps_p (tree, tree, tree, gimple *, + struct loop *, bool); extern void free_loop_control_ivs (struct loop *); extern void free_numbers_of_iterations_estimates_loop (struct loop *); extern void free_numbers_of_iterations_estimates (function *); diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 4333d60..b9ccb73 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -4156,8 +4156,8 @@ adjust_range_with_scev (value_range *vr, struct loop *loop, or decreases, ... */ dir == EV_DIR_UNKNOWN /* ... or if it may wrap. */ - || scev_probably_wraps_p (init, step, stmt, get_chrec_loop (chrec), - true)) + || scev_probably_wraps_p (NULL_TREE, init, step, stmt, + get_chrec_loop (chrec), true)) return; /* We use TYPE_MIN_VALUE and TYPE_MAX_VALUE here instead of diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-15.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-15.c new file mode 100644 index 0000000..a0d2e59 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-15.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-ldist" } */ + +void +foo (int *p) +{ + unsigned short i, j; + + for (i = 0; i < 100; i++) + for (j = 1; j < 101; j++) + { + unsigned int index = 100 * i + j; + p[index-1] = 0; + } +} + +/* Loop can be transformed into builtin memset since &p[...] is SCEV. */ +/* { dg-final { scan-tree-dump "builtin_memset" "ldist" } } */