From patchwork Sat Oct 19 06:27:31 2019
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 1179748
Date: Sat, 19 Oct 2019 08:27:31 +0200
From: Jakub Jelinek
To: "Bin.Cheng" , Richard Biener , Alexandre Oliva
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)
Message-ID: <20191019062731.GL2116@tucnak>

Hi!

As mentioned in the PR, the following patch attempts to address two issues.
In remove_unused_ivs we first find the best iv_cand (preferring primarily the
same step, next the same mode and lastly a constant base) and only then call
get_computation_at to determine the replacement expression.  Unfortunately, in
various cases get_computation_at can return NULL_TREE, and in that case we
don't try any other candidate and just leave the vars for debug as is, which
results in the debug binding becoming => NULL and the value being lost from
the debug info even though some other IV could express it.

The following patch only considers candidates for which get_computation_at
succeeds; while that can be slower, it handles more cases.  Perhaps an
alternative would be to do two passes: pick the best candidate without calling
get_computation_at, call it on that best candidate, and only if that fails
retry the best candidate search with the get_computation_at calls.

Another thing is that get_computation_at can only handle cases where the use
can be expressed as ubase + (var - cbase) * ratio for an integral ratio.  In
the PR testcase no candidate has an integral ratio: the one we pick as best
has a ratio of 1/15 and the other 1/4.  Using a division in ivopts at runtime
is hardly ever desirable, but for debug info we don't mind it; all we need to
ensure is that we don't end up with wrong-debug.  The patch implements
expressing the use as ubase + (var - cbase) / ratio for these debug-info-only
uses, but so far requires that if the use IV can't wrap around, the candidate
(var - cbase) can't wrap around either, which IMHO should be true whenever the
candidate type is at least ceil_log2 (ratio) bits wider than the use type.

Do I need to punt if !use->iv->no_overflow, or is the
ubase + (utype) ((var - cbase) / ratio) computation safe even if it wraps?
And, as questioned in the PR, are there other cases where we can safely assume
no wrap (e.g. use it if use->iv->no_overflow && cand->iv->no_overflow without
those extra precision checks)?

Anyway, bootstrapped/regtested successfully on x86_64-linux and i686-linux.
Comparing bootstrapped cc1plus without and with this patch (the former with
the patch applied after bootstrap and stage3 rebuilt, so that .text etc. is
identical) shows:

- [33] .debug_info    PROGBITS  0000000000000000 22a4788  749275e 00      0   0  1
- [34] .debug_abbrev  PROGBITS  0000000000000000 9736ee6  204aad 00      0   0  1
- [35] .debug_line    PROGBITS  0000000000000000 993b993  1688464 00      0   0  1
- [36] .debug_str     PROGBITS  0000000000000000 afc3df7  6f65aa 01  MS  0   0  1
- [37] .debug_loc     PROGBITS  0000000000000000 b6ba3a1  71a2dde 00      0   0  1
- [38] .debug_ranges  PROGBITS  0000000000000000 1285d17f 16414d0 00      0   0  1
- [39] .symtab        SYMTAB    0000000000000000 13e9e650 166ff8 18     40 38193  8
- [40] .strtab        STRTAB    0000000000000000 14005648 2ad809 00      0   0  1
- [41] .shstrtab      STRTAB    0000000000000000 142b2e51 0001a0 00      0   0  1
+ [33] .debug_info    PROGBITS  0000000000000000 22a4788  749365e 00      0   0  1
+ [34] .debug_abbrev  PROGBITS  0000000000000000 9737de6  204a9f 00      0   0  1
+ [35] .debug_line    PROGBITS  0000000000000000 993c885  1688f0c 00      0   0  1
+ [36] .debug_str     PROGBITS  0000000000000000 afc5791  6f65aa 01  MS  0   0  1
+ [37] .debug_loc     PROGBITS  0000000000000000 b6bbd3b  71cd404 00      0   0  1
+ [38] .debug_ranges  PROGBITS  0000000000000000 1288913f 16414b0 00      0   0  1
+ [39] .symtab        SYMTAB    0000000000000000 13eca5f0 166ff8 18     40 38193  8
+ [40] .strtab        STRTAB    0000000000000000 140315e8 2ad809 00      0   0  1
+ [41] .shstrtab      STRTAB    0000000000000000 142dedf1 0001a0 00      0   0  1

so .debug_info is 3840 bytes larger and .debug_loc is 173606 bytes larger
(0.15%); there are some changes from this, but not a huge amount of them,
though the .debug_loc size changed in 217 gcc/*.o files out of 474.

2019-10-18  Jakub Jelinek

	PR debug/90231
	* tree-ssa-loop-ivopts.c (get_debug_computation_at): New function.
	(remove_unused_ivs): Use it instead of get_computation_at.  When
	choosing best candidate, only consider candidates where
	get_debug_computation_at actually returns non-NULL.
	Jakub

--- gcc/tree-ssa-loop-ivopts.c.jj	2019-09-20 12:25:26.810718338 +0200
+++ gcc/tree-ssa-loop-ivopts.c	2019-10-18 23:55:10.054026219 +0200
@@ -4089,6 +4089,83 @@ get_computation_at (class loop *loop, gi
   return fold_convert (type, aff_combination_to_tree (&aff));
 }
 
+/* Like get_computation_at, but try harder, even if the computation
+   is more expensive.  Intended for debug stmts.  */
+
+static tree
+get_debug_computation_at (class loop *loop, gimple *at,
+			  struct iv_use *use, struct iv_cand *cand)
+{
+  if (tree ret = get_computation_at (loop, at, use, cand))
+    return ret;
+
+  tree ubase = use->iv->base, ustep = use->iv->step;
+  tree cbase = cand->iv->base, cstep = cand->iv->step;
+  tree var;
+  tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
+  widest_int rat;
+
+  /* We must have a precision to express the values of use.  */
+  if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype))
+    return NULL_TREE;
+
+  /* Try to handle the case that get_computation_at doesn't,
+     try to express
+     use = ubase + (var - cbase) / ratio.  */
+  if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep),
+			     &rat))
+    return NULL_TREE;
+
+  bool neg_p = false;
+  if (wi::neg_p (rat))
+    {
+      if (TYPE_UNSIGNED (ctype))
+	return NULL_TREE;
+      neg_p = true;
+      rat = wi::neg (rat);
+    }
+
+  int bits = wi::exact_log2 (rat);
+  if (bits == -1)
+    bits = wi::floor_log2 (rat) + 1;
+  if (TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype))
+    return NULL_TREE;
+
+  var = var_at_stmt (loop, cand, at);
+
+  if (POINTER_TYPE_P (ctype))
+    {
+      ctype = unsigned_type_for (ctype);
+      cbase = fold_convert (ctype, cbase);
+      cstep = fold_convert (ctype, cstep);
+      var = fold_convert (ctype, var);
+    }
+
+  ubase = unshare_expr (ubase);
+  cbase = unshare_expr (cbase);
+  if (stmt_after_increment (loop, cand, at))
+    var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var,
+		       unshare_expr (cstep));
+
+  var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase);
+  var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var,
+		     wide_int_to_tree (TREE_TYPE (var), rat));
+  if (POINTER_TYPE_P (utype))
+    {
+      var = fold_convert (sizetype, var);
+      if (neg_p)
+	var = fold_build1 (NEGATE_EXPR, sizetype, var);
+      var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var);
+    }
+  else
+    {
+      var = fold_convert (utype, var);
+      var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype,
+			 ubase, var);
+    }
+  return var;
+}
+
 /* Adjust the cost COST for being in loop setup rather than loop body.
    If we're optimizing for space, the loop setup overhead is constant;
    if we're optimizing for speed, amortize it over the per-iteration cost.
@@ -7523,6 +7600,7 @@ remove_unused_ivs (struct ivopts_data *d
 	  struct iv_use dummy_use;
 	  struct iv_cand *best_cand = NULL, *cand;
 	  unsigned i, best_pref = 0, cand_pref;
+	  tree comp = NULL_TREE;
 
 	  memset (&dummy_use, 0, sizeof (dummy_use));
 	  dummy_use.iv = info->iv;
@@ -7543,20 +7621,22 @@ remove_unused_ivs (struct ivopts_data *d
 		     ? 1 : 0;
 	      if (best_cand == NULL || best_pref < cand_pref)
 		{
-		  best_cand = cand;
-		  best_pref = cand_pref;
+		  tree this_comp
+		    = get_debug_computation_at (data->current_loop,
+						SSA_NAME_DEF_STMT (def),
+						&dummy_use, cand);
+		  if (this_comp)
+		    {
+		      best_cand = cand;
+		      best_pref = cand_pref;
+		      comp = this_comp;
+		    }
 		}
 	    }
 
 	  if (!best_cand)
 	    continue;
 
-	  tree comp = get_computation_at (data->current_loop,
-					  SSA_NAME_DEF_STMT (def),
-					  &dummy_use, best_cand);
-	  if (!comp)
-	    continue;
-
 	  if (count > 1)
 	    {
 	      tree vexpr = make_node (DEBUG_EXPR_DECL);