From patchwork Tue Jul 28 09:36:02 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bin Cheng X-Patchwork-Id: 501101 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 5D411140D25 for ; Tue, 28 Jul 2015 19:36:35 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=Jwamh6xp; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=x7SF+53hskgKC4ahSISrPfLYVOldpODrlK0Z+Y0PPX1ZOVfcrnUVt qStDzr6PPk9TBAn/mfY20V/r0CcFRIDqr79UkuYngB01ugChaEbEMF6W3EjuTuMt INCh15ZdtgAMyUTr/Oxh1Zoa6OfBiwtg9v77l/hX4AR+2EJtYCcuVE= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; s= default; bh=assqvEaNnsI4lv3PKfu9XYjg6ug=; b=Jwamh6xpuvHZL8reN5lO Be1OYSchhSWPu7fLF3A9P0ss5oGeri5MaE1EiRTx9H9I4hRhd1lV/OIGgSwXVQD/ pUyWOWry88fs1NCF7xtiLiffkdrwMEX2ciX3XdObnssWiWrDD2evdbE+xJDVXONe 9js3EAz3sedgQn1LeK9qPM8= Received: (qmail 5646 invoked by alias); 28 Jul 2015 09:36:26 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 5631 invoked by uid 89); 28 Jul 2015 09:36:26 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.4 required=5.0 tests=AWL, BAYES_00, KAM_ASCII_DIVIDERS, SPF_PASS autolearn=no version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 28 Jul 2015 09:36:24 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-28-hFc3QJObQJih0y0nk_PrpQ-1; Tue, 28 Jul 2015 10:36:19 +0100 Received: from shawin233 ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 28 Jul 2015 10:36:17 +0100 From: "Bin Cheng" To: Subject: [PATCH GCC]Improve bound information in loop niter analysis Date: Tue, 28 Jul 2015 17:36:02 +0800 Message-ID: <000401d0c918$d7a2e780$86e8b680$@arm.com> MIME-Version: 1.0 X-MC-Unique: hFc3QJObQJih0y0nk_PrpQ-1 X-IsSubscribed: yes Hi, Loop niter computes inaccurate bound information for different loops. This patch is to improve it by using loop initial condition in determine_value_range. Generally, loop niter is computed by subtracting start var from end var in loop exit condition. Moreover, loop bound is computed using value range information of both start and end variables. Basic idea of this patch is to check if loop initial condition implies more range information for both start/end variables. If yes, we refine range information and use that to compute loop bound. With this improvement, more accurate loop bound information is computed for test cases added by this patch. Is it OK? Thanks, bin 2015-07-28 Bin Cheng * tree-ssa-loop-niter.c (refine_value_range_using_guard): New. (determine_value_range): Call refine_value_range_using_guard for each loop initial condition to improve value range. gcc/testsuite/ChangeLog 2015-07-28 Bin Cheng * gcc.dg/tree-ssa/loop-bound-1.c: New test. * gcc.dg/tree-ssa/loop-bound-3.c: New test. * gcc.dg/tree-ssa/loop-bound-5.c: New test. Index: gcc/testsuite/gcc.dg/tree-ssa/loop-bound-3.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/loop-bound-3.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-bound-3.c (revision 0) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ + +int *a; + +int +foo (unsigned char s, unsigned char l) +{ + unsigned char i; + int sum = 0; + + for (i = s; i > l; i -= 1) + { + sum += a[i]; + } + + return sum; +} + +/* Check loop niter bound information. */ +/* { dg-final { scan-tree-dump "bounded by 254" "ivopts" } } */ +/* { dg-final { scan-tree-dump-not "bounded by 255" "ivopts" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/loop-bound-5.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/loop-bound-5.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-bound-5.c (revision 0) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ + +int *a; + +int +foo (unsigned char s) +{ + unsigned char i; + int sum = 0; + + for (i = s; i > 0; i -= 1) + { + sum += a[i]; + } + + return sum; +} + +/* Check loop niter bound information. */ +/* { dg-final { scan-tree-dump "bounded by 254" "ivopts" } } */ +/* { dg-final { scan-tree-dump-not "bounded by 255" "ivopts" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/loop-bound-1.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/loop-bound-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-bound-1.c (revision 0) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ + +int *a; + +int +foo (unsigned char s, unsigned char l) +{ + unsigned char i; + int sum = 0; + + for (i = s; i < l; i += 1) + { + sum += a[i]; + } + + return sum; +} + +/* Check loop niter bound information. */ +/* { dg-final { scan-tree-dump "bounded by 254" "ivopts" } } */ +/* { dg-final { scan-tree-dump-not "bounded by 255" "ivopts" } } */ Index: gcc/tree-ssa-loop-niter.c =================================================================== --- gcc/tree-ssa-loop-niter.c (revision 225859) +++ gcc/tree-ssa-loop-niter.c (working copy) @@ -122,6 +122,233 @@ split_to_var_and_offset (tree expr, tree *var, mpz } } +/* From condition C0 CMP C1 derives information regarding the value range + of VAR, which is of TYPE. Results are stored in to BELOW and UP. */ + +static void +refine_value_range_using_guard (tree type, tree var, + tree c0, enum tree_code cmp, tree c1, + mpz_t below, mpz_t up) +{ + tree varc0, varc1, ctype; + mpz_t offc0, offc1; + mpz_t mint, maxt, minc1, maxc1; + wide_int minv, maxv; + bool no_wrap = nowrap_type_p (type); + bool c0_ok, c1_ok; + signop sgn = TYPE_SIGN (type); + + switch (cmp) + { + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + STRIP_SIGN_NOPS (c0); + STRIP_SIGN_NOPS (c1); + ctype = TREE_TYPE (c0); + if (!useless_type_conversion_p (ctype, type)) + return; + + break; + + case EQ_EXPR: + /* We could derive quite precise information from EQ_EXPR, however, + such a guard is unlikely to appear, so we do not bother with + handling it. */ + return; + + case NE_EXPR: + /* NE_EXPR comparisons do not contain much of useful information, + except for cases of comparing with bounds. */ + if (TREE_CODE (c1) != INTEGER_CST + || !INTEGRAL_TYPE_P (type)) + return; + + /* Ensure that the condition speaks about an expression in the same + type as X and Y. */ + ctype = TREE_TYPE (c0); + if (TYPE_PRECISION (ctype) != TYPE_PRECISION (type)) + return; + c0 = fold_convert (type, c0); + c1 = fold_convert (type, c1); + + if (operand_equal_p (var, c0, 0)) + { + mpz_t valc1; + + /* Case of comparing VAR with its below/up bounds. */ + mpz_init (valc1); + wi::to_mpz (c1, valc1, TYPE_SIGN (type)); + if (mpz_cmp (valc1, below) == 0) + cmp = GT_EXPR; + if (mpz_cmp (valc1, up) == 0) + cmp = LT_EXPR; + + mpz_clear (valc1); + } + else + { + /* Case of comparing with the bounds of the type. */ + if (TYPE_MIN_VALUE (type) + && operand_equal_p (c1, TYPE_MIN_VALUE (type), 0)) + cmp = GT_EXPR; + if (TYPE_MAX_VALUE (type) + && operand_equal_p (c1, TYPE_MAX_VALUE (type), 0)) + cmp = LT_EXPR; + } + + /* Quick return if no useful information. */ + if (cmp == NE_EXPR) + return; + + break; + + default: + return; + } + + mpz_init (offc0); + mpz_init (offc1); + split_to_var_and_offset (expand_simple_operations (c0), &varc0, offc0); + split_to_var_and_offset (expand_simple_operations (c1), &varc1, offc1); + + /* We are only interested in comparisons of expressions based on VAR. */ + if (operand_equal_p (var, varc1, 0)) + { + std::swap (varc0, varc1); + mpz_swap (offc0, offc1); + cmp = swap_tree_comparison (cmp); + } + else if (!operand_equal_p (var, varc0, 0)) + goto end_2; + + mpz_init (mint); + mpz_init (maxt); + get_type_static_bounds (type, mint, maxt); + mpz_init (minc1); + mpz_init (maxc1); + /* Setup range information for varc1. */ + if (integer_zerop (varc1)) + { + wi::to_mpz (integer_zero_node, minc1, TYPE_SIGN (type)); + wi::to_mpz (integer_zero_node, maxc1, TYPE_SIGN (type)); + } + else if (TREE_CODE (varc1) == SSA_NAME + && INTEGRAL_TYPE_P (type) + && get_range_info (varc1, &minv, &maxv) == VR_RANGE) + { + gcc_assert (wi::le_p (minv, maxv, sgn)); + wi::to_mpz (minv, minc1, sgn); + wi::to_mpz (maxv, maxc1, sgn); + } + else + { + mpz_set (minc1, mint); + mpz_set (maxc1, maxt); + } + + /* Compute valid range information for varc1 + offc1. Note nothing + useful can be derived if it overflows or underflows. Overflow or + underflow could happen when: + + offc1 > 0 && varc1 + offc1 > MAX_VAL (type) + offc1 < 0 && varc1 + offc1 < MIN_VAL (type). */ + mpz_add (minc1, minc1, offc1); + mpz_add (maxc1, maxc1, offc1); + c1_ok = (no_wrap + || mpz_sgn (offc1) == 0 + || (mpz_sgn (offc1) < 0 && mpz_cmp (minc1, mint) >= 0) + || (mpz_sgn (offc1) > 0 && mpz_cmp (maxc1, maxt) <= 0)); + if (!c1_ok) + goto end_1; + + if (mpz_cmp (minc1, mint) < 0) + mpz_set (minc1, mint); + if (mpz_cmp (maxc1, maxt) > 0) + mpz_set (maxc1, maxt); + + if (cmp == LT_EXPR) + { + cmp = LE_EXPR; + mpz_sub_ui (maxc1, maxc1, 1); + } + if (cmp == GT_EXPR) + { + cmp = GE_EXPR; + mpz_add_ui (minc1, minc1, 1); + } + + /* Compute range information for varc0. If there is no overflow, + the condition implied that + + (varc0) cmp (varc1 + offc1 - offc0) + + We can possibly improve the upper bound of varc0 if cmp is LE_EXPR, + or the below bound if cmp is GE_EXPR. + + To prove there is no overflow/underflow, we need to check below + four cases: + 1) cmp == LE_EXPR && offc0 > 0 + + (varc0 + offc0) doesn't overflow + && (varc1 + offc1 - offc0) doesn't underflow + + 2) cmp == LE_EXPR && offc0 < 0 + + (varc0 + offc0) doesn't underflow + && (varc1 + offc1 - offc0) doesn't overfloe + + In this case, (varc0 + offc0) will never underflow if we can + prove (varc1 + offc1 - offc0) doesn't overflow. + + 3) cmp == GE_EXPR && offc0 < 0 + + (varc0 + offc0) doesn't underflow + && (varc1 + offc1 - offc0) doesn't overflow + + 4) cmp == GE_EXPR && offc0 > 0 + + (varc0 + offc0) doesn't overflow + && (varc1 + offc1 - offc0) doesn't underflow + + In this case, (varc0 + offc0) will never overflow if we can + prove (varc1 + offc1 - offc0) doesn't underflow. + + Note we only handle case 2 and 4 in below code. */ + + mpz_sub (minc1, minc1, offc0); + mpz_sub (maxc1, maxc1, offc0); + c0_ok = (no_wrap + || mpz_sgn (offc0) == 0 + || (cmp == LE_EXPR + && mpz_sgn (offc0) < 0 && mpz_cmp (maxc1, maxt) <= 0) + || (cmp == GE_EXPR + && mpz_sgn (offc0) > 0 && mpz_cmp (minc1, mint) >= 0)); + if (!c0_ok) + goto end_1; + + if (cmp == LE_EXPR) + { + if (mpz_cmp (up, maxc1) > 0) + mpz_set (up, maxc1); + } + else + { + if (mpz_cmp (below, minc1) < 0) + mpz_set (below, minc1); + } + +end_1: + mpz_clear (mint); + mpz_clear (maxt); + mpz_clear (minc1); + mpz_clear (maxc1); +end_2: + mpz_clear (offc0); + mpz_clear (offc1); +} + /* Stores estimate on the minimum/maximum value of the expression VAR + OFF in TYPE to MIN and MAX. */ @@ -129,6 +356,9 @@ static void determine_value_range (struct loop *loop, tree type, tree var, mpz_t off, mpz_t min, mpz_t max) { + int cnt = 0; + mpz_t minm, maxm; + basic_block bb; wide_int minv, maxv; enum value_range_type rtype = VR_VARYING; @@ -183,35 +413,69 @@ determine_value_range (struct loop *loop, tree typ } } } - if (rtype == VR_RANGE) + mpz_init (minm); + mpz_init (maxm); + if (rtype != VR_RANGE) { - mpz_t minm, maxm; + mpz_set (minm, min); + mpz_set (maxm, max); + } + else + { gcc_assert (wi::le_p (minv, maxv, sgn)); - mpz_init (minm); - mpz_init (maxm); wi::to_mpz (minv, minm, sgn); wi::to_mpz (maxv, maxm, sgn); - mpz_add (minm, minm, off); - mpz_add (maxm, maxm, off); - /* If the computation may not wrap or off is zero, then this - is always fine. If off is negative and minv + off isn't - smaller than type's minimum, or off is positive and - maxv + off isn't bigger than type's maximum, use the more - precise range too. */ - if (nowrap_type_p (type) - || mpz_sgn (off) == 0 - || (mpz_sgn (off) < 0 && mpz_cmp (minm, min) >= 0) - || (mpz_sgn (off) > 0 && mpz_cmp (maxm, max) <= 0)) - { - mpz_set (min, minm); - mpz_set (max, maxm); - mpz_clear (minm); - mpz_clear (maxm); - return; - } + } + /* Now walk the dominators of the loop header and use the entry + guards to refine the estimates. */ + for (bb = loop->header; + bb != ENTRY_BLOCK_PTR_FOR_FN (cfun) && cnt < MAX_DOMINATORS_TO_WALK; + bb = get_immediate_dominator (CDI_DOMINATORS, bb)) + { + edge e; + tree c0, c1; + gimple cond; + enum tree_code cmp; + + if (!single_pred_p (bb)) + continue; + e = single_pred_edge (bb); + + if (!(e->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE))) + continue; + + cond = last_stmt (e->src); + c0 = gimple_cond_lhs (cond); + cmp = gimple_cond_code (cond); + c1 = gimple_cond_rhs (cond); + + if (e->flags & EDGE_FALSE_VALUE) + cmp = invert_tree_comparison (cmp, false); + + refine_value_range_using_guard (type, var, c0, cmp, c1, minm, maxm); + ++cnt; + } + + mpz_add (minm, minm, off); + mpz_add (maxm, maxm, off); + /* If the computation may not wrap or off is zero, then this + is always fine. If off is negative and minv + off isn't + smaller than type's minimum, or off is positive and + maxv + off isn't bigger than type's maximum, use the more + precise range too. */ + if (nowrap_type_p (type) + || mpz_sgn (off) == 0 + || (mpz_sgn (off) < 0 && mpz_cmp (minm, min) >= 0) + || (mpz_sgn (off) > 0 && mpz_cmp (maxm, max) <= 0)) + { + mpz_set (min, minm); + mpz_set (max, maxm); mpz_clear (minm); mpz_clear (maxm); + return; } + mpz_clear (minm); + mpz_clear (maxm); } /* If the computation may wrap, we know nothing about the value, except for