From patchwork Sat Oct 19 06:27:31 2019
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 1179748
Date: Sat, 19 Oct 2019 08:27:31 +0200
From: Jakub Jelinek
To: "Bin.Cheng" , Richard Biener , Alexandre Oliva
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)
Message-ID: <20191019062731.GL2116@tucnak>

Hi!

As mentioned in the PR, the following patch attempts to address two issues.
In remove_unused_ivs we first find the best iv_cand (preferring primarily the
same step, next the same mode and lastly a constant base) and only then call
get_computation_at to determine the replacement expression.  Unfortunately, in
various cases get_computation_at can return NULL_TREE, and in that case we
don't try any other candidate and just leave the vars for debug as is, which
results in the debug binding becoming => NULL and the value being lost from
the debug info even though some other IV could express it.

The following patch only considers candidates for which get_computation_at
succeeds; while that can be slower, it handles more cases.  Perhaps an
alternative would be to do two passes: pick the best candidate without calling
get_computation_at, call it on that best candidate, and only if that fails
retry the best candidate search with the get_computation_at calls.

Another thing is that get_computation_at can only handle cases where the use
can be expressed as ubase + (var - cbase) * ratio for an integral ratio.  In
the PR testcase no candidate has an integral ratio: the one we pick as best
has a ratio of 1/15 and the other 1/4.  Using a division in ivopts at runtime
is hardly ever desirable, but for debug info we don't mind it; all we need to
ensure is that we don't end up with wrong-debug.  The patch implements
expressing the use as ubase + (var - cbase) / ratio for these debug-info-only
uses, but so far requires that if the use IV can't wrap around, the candidate
(var - cbase) can't wrap around either, which IMHO should be true whenever the
candidate type is at least ceil_log2 (ratio) bits wider than the use type.

Do I need to punt if !use->iv->no_overflow, or is the
ubase + (utype) ((var - cbase) / ratio) computation safe even if it wraps?
And, as questioned in the PR, are there other cases where we can safely assume
no wrap (e.g. use it if use->iv->no_overflow && cand->iv->no_overflow without
those extra precision checks)?

Anyway, bootstrapped/regtested successfully on x86_64-linux and i686-linux.
Comparing bootstrapped cc1plus without and with this patch (the former with
the patch applied after bootstrap and stage3 rebuilt, so that .text etc. is
identical) shows:

- [33] .debug_info    PROGBITS  0000000000000000 22a4788  749275e 00      0   0  1
- [34] .debug_abbrev  PROGBITS  0000000000000000 9736ee6  204aad 00      0   0  1
- [35] .debug_line    PROGBITS  0000000000000000 993b993  1688464 00      0   0  1
- [36] .debug_str     PROGBITS  0000000000000000 afc3df7  6f65aa 01  MS  0   0  1
- [37] .debug_loc     PROGBITS  0000000000000000 b6ba3a1  71a2dde 00      0   0  1
- [38] .debug_ranges  PROGBITS  0000000000000000 1285d17f 16414d0 00      0   0  1
- [39] .symtab        SYMTAB    0000000000000000 13e9e650 166ff8 18     40 38193  8
- [40] .strtab        STRTAB    0000000000000000 14005648 2ad809 00      0   0  1
- [41] .shstrtab      STRTAB    0000000000000000 142b2e51 0001a0 00      0   0  1
+ [33] .debug_info    PROGBITS  0000000000000000 22a4788  749365e 00      0   0  1
+ [34] .debug_abbrev  PROGBITS  0000000000000000 9737de6  204a9f 00      0   0  1
+ [35] .debug_line    PROGBITS  0000000000000000 993c885  1688f0c 00      0   0  1
+ [36] .debug_str     PROGBITS  0000000000000000 afc5791  6f65aa 01  MS  0   0  1
+ [37] .debug_loc     PROGBITS  0000000000000000 b6bbd3b  71cd404 00      0   0  1
+ [38] .debug_ranges  PROGBITS  0000000000000000 1288913f 16414b0 00      0   0  1
+ [39] .symtab        SYMTAB    0000000000000000 13eca5f0 166ff8 18     40 38193  8
+ [40] .strtab        STRTAB    0000000000000000 140315e8 2ad809 00      0   0  1
+ [41] .shstrtab      STRTAB    0000000000000000 142dedf1 0001a0 00      0   0  1

so .debug_info is 3840 bytes larger and .debug_loc is 173606 bytes larger
(0.15%); there are some changes from this, but not a huge amount of them,
though the .debug_loc size changed in 217 gcc/*.o files out of 474.

2019-10-18  Jakub Jelinek

	PR debug/90231
	* tree-ssa-loop-ivopts.c (get_debug_computation_at): New function.
	(remove_unused_ivs): Use it instead of get_computation_at.  When
	choosing best candidate, only consider candidates where
	get_debug_computation_at actually returns non-NULL.
	Jakub

--- gcc/tree-ssa-loop-ivopts.c.jj	2019-09-20 12:25:26.810718338 +0200
+++ gcc/tree-ssa-loop-ivopts.c	2019-10-18 23:55:10.054026219 +0200
@@ -4089,6 +4089,83 @@ get_computation_at (class loop *loop, gi
   return fold_convert (type, aff_combination_to_tree (&aff));
 }
 
+/* Like get_computation_at, but try harder, even if the computation
+   is more expensive.  Intended for debug stmts.  */
+
+static tree
+get_debug_computation_at (class loop *loop, gimple *at,
+			  struct iv_use *use, struct iv_cand *cand)
+{
+  if (tree ret = get_computation_at (loop, at, use, cand))
+    return ret;
+
+  tree ubase = use->iv->base, ustep = use->iv->step;
+  tree cbase = cand->iv->base, cstep = cand->iv->step;
+  tree var;
+  tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
+  widest_int rat;
+
+  /* We must have a precision to express the values of use.  */
+  if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype))
+    return NULL_TREE;
+
+  /* Try to handle the case that get_computation_at doesn't,
+     try to express
+     use = ubase + (var - cbase) / ratio.  */
+  if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep),
+			     &rat))
+    return NULL_TREE;
+
+  bool neg_p = false;
+  if (wi::neg_p (rat))
+    {
+      if (TYPE_UNSIGNED (ctype))
+	return NULL_TREE;
+      neg_p = true;
+      rat = wi::neg (rat);
+    }
+
+  int bits = wi::exact_log2 (rat);
+  if (bits == -1)
+    bits = wi::floor_log2 (rat) + 1;
+  if (TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype))
+    return NULL_TREE;
+
+  var = var_at_stmt (loop, cand, at);
+
+  if (POINTER_TYPE_P (ctype))
+    {
+      ctype = unsigned_type_for (ctype);
+      cbase = fold_convert (ctype, cbase);
+      cstep = fold_convert (ctype, cstep);
+      var = fold_convert (ctype, var);
+    }
+
+  ubase = unshare_expr (ubase);
+  cbase = unshare_expr (cbase);
+  if (stmt_after_increment (loop, cand, at))
+    var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var,
+		       unshare_expr (cstep));
+
+  var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase);
+  var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var,
+		     wide_int_to_tree (TREE_TYPE (var), rat));
+  if (POINTER_TYPE_P (utype))
+    {
+      var = fold_convert (sizetype, var);
+      if (neg_p)
+	var = fold_build1 (NEGATE_EXPR, sizetype, var);
+      var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var);
+    }
+  else
+    {
+      var = fold_convert (utype, var);
+      var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype,
+			 ubase, var);
+    }
+  return var;
+}
+
 /* Adjust the cost COST for being in loop setup rather than loop body.
    If we're optimizing for space, the loop setup overhead is constant;
    if we're optimizing for speed, amortize it over the per-iteration cost.
@@ -7523,6 +7600,7 @@ remove_unused_ivs (struct ivopts_data *d
 	  struct iv_use dummy_use;
 	  struct iv_cand *best_cand = NULL, *cand;
 	  unsigned i, best_pref = 0, cand_pref;
+	  tree comp = NULL_TREE;
 
 	  memset (&dummy_use, 0, sizeof (dummy_use));
 	  dummy_use.iv = info->iv;
@@ -7543,20 +7621,22 @@ remove_unused_ivs (struct ivopts_data *d
 		     ? 1 : 0;
 	      if (best_cand == NULL || best_pref < cand_pref)
 		{
-		  best_cand = cand;
-		  best_pref = cand_pref;
+		  tree this_comp
+		    = get_debug_computation_at (data->current_loop,
+						SSA_NAME_DEF_STMT (def),
+						&dummy_use, cand);
+		  if (this_comp)
+		    {
+		      best_cand = cand;
+		      best_pref = cand_pref;
+		      comp = this_comp;
+		    }
 		}
 	    }
 
 	  if (!best_cand)
 	    continue;
 
-	  tree comp = get_computation_at (data->current_loop,
-					  SSA_NAME_DEF_STMT (def),
-					  &dummy_use, best_cand);
-	  if (!comp)
-	    continue;
-
 	  if (count > 1)
 	    {
 	      tree vexpr = make_node (DEBUG_EXPR_DECL);