From patchwork Thu Dec 7 12:04:42 2017
X-Patchwork-Submitter: Alexandre Oliva
X-Patchwork-Id: 845539
From: Alexandre Oliva
To: gcc-patches@gcc.gnu.org
Cc: law@redhat.com, richard.guenther@gmail.com
Subject: [PR81165] discount killed stmts when sizing blocks for threading
Date: Thu, 07 Dec 2017 10:04:42 -0200
Message-ID: 
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
MIME-Version: 1.0

We limit the amount of copying for jump threading based on counting
stmts.  This counting is overly pessimistic, because we will very
often delete stmts as a consequence of jump threading: when the final
conditional jump of a block is removed, earlier SSA names computed
exclusively for use in that conditional are killed.  Furthermore, PHI
nodes in blocks with only two predecessors are trivially replaced
with their now-single values after threading.
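The situation described above can be sketched at the source level; the snippet below is an illustrative example (the identifiers are made up, not taken from the patch or PR), assuming the usual GCC jump threading at -O2:

```c
#include <assert.h>

/* Hypothetical sketch: both tests depend on the same computation, so
   when jump threading duplicates the second block along each outcome
   of the first test, the duplicated final conditional jump is
   statically resolved.  The stmts that exist only to compute its
   condition then die in the copies -- exactly the kind of stmts the
   patch tries to discount when sizing blocks for threading.  */
static int
example (int a, int b)
{
  int odd = (a & 1) != 0;	/* used only by the conditional jumps */
  if (odd)
    b += 2;

  /* In each threaded copy this test's outcome is known, so the
     recomputation of (a & 1) and the compare feeding the jump become
     dead code.  */
  if ((a & 1) != 0)
    b *= 3;

  return b;
}
```

With a naive stmt count, the recomputed condition makes the block look more expensive to duplicate than it really is; discounting the to-be-killed stmts lets threading proceed.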
This patch scans blocks to be copied in the path constructed so far
and estimates the number of stmts that will be removed in the copies,
bumping up the stmt count limit.

Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?

for  gcc/ChangeLog

	* tree-ssa-threadedge.c (uses_in_bb): New.
	(estimate_threading_killed_stmts): New.
	(estimate_threading_killed_stmts): New overload.
	(record_temporary_equivalences_from_stmts_at_dest): Add path
	parameter; adjust caller.  Expand limit when it's hit.

for  gcc/testsuite/ChangeLog

	* gcc.dg/pr81165.c: New.
---
 gcc/testsuite/gcc.dg/pr81165.c |  59 ++++++++++++
 gcc/tree-ssa-threadedge.c      | 189 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 245 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr81165.c

diff --git a/gcc/testsuite/gcc.dg/pr81165.c b/gcc/testsuite/gcc.dg/pr81165.c
new file mode 100644
index 000000000000..8508d893bed6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr81165.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not " \[/%\] " "optimized" } } */
+
+/* Testcase submitted for PR81165, with its main function removed as
+   it's turned into a compile test.  We want to make sure that all of
+   the divide/remainder computations are removed by tree optimizers.
+
+   We can figure out that we don't need to compute at runtime even the
+   condition to enter the loop: the initial i==0 would have to be
+   greater than the sum of two small unsigned values: 1U>>t1 is in the
+   range 0..1, whereas the char value is bounded by the range 0..127,
+   being 128 % a positive number (zero would invoke undefined
+   behavior, so we can assume it doesn't happen).  (We know it's
+   nonnegative because it's 10 times a number that has no more than
+   the bits for 16, 8 and 1 set.)
+
+   We don't realize that the loop is useless right away: jump
+   threading helps remove some of the complexity, particularly of the
+   computation within the loop: t1 is compared with 1, but it can
+   never be 1.  (We could assume as much, since its being 1 would
+   divide by zero, but we don't.)
+
+   If we don't enter the conditional block, t1 remains at 2; if we
+   do, it's set to -1.  If we jump thread at the end of the
+   conditional block, we can figure out the ranges exclude 1 and the
+   jump body is completely optimized out.  However, we used to fail
+   to consider the block for jump threading due to the amount of
+   computation in it, without realizing most of it would die in
+   consequence of the threading.
+
+   We now take the dying code into account when deciding whether or
+   not to try jump threading.  That might enable us to optimize the
+   function into { if (x2 != 0 || (x1 & 1) == 0) abort (); }.  At the
+   time of this writing, with the patch, we get close, but the test
+   on x2 only gets as far as ((1 >> x2) == 0).  Without the patch,
+   some of the loop remains.  */
+
+short x0 = 15;
+
+void func (){
+  volatile int x1 = 1U;
+  volatile char x2 = 0;
+  char t0 = 0;
+  unsigned long t1 = 2LU;
+  int i = 0;
+
+  if(1>>x2) {
+    t0 = -1;
+    t1 = (1&(short)(x1^8U))-1;
+  }
+
+  while(i > (int)((1U>>t1)+(char)(128%(10*(25LU&(29%x0)))))) {
+    i += (int)(12L/(1!=(int)t1));
+  }
+
+  if (t0 != -1) __builtin_abort();
+  if (t1 != 0L) __builtin_abort();
+}
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 536c4717b725..25ccac2a3ecc 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -170,6 +170,173 @@ threadedge_valueize (tree t)
   return t;
 }
 
+/* Return how many uses of T there are within BB, as long as there
+   aren't any uses outside BB.  If there are any uses outside BB,
+   return -1 if there's at most one use within BB, or -2 if there is
+   more than one use within BB.  */
+
+static int
+uses_in_bb (tree t, basic_block bb)
+{
+  int uses = 0;
+  bool outside_bb = false;
+
+  imm_use_iterator iter;
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_FAST (use_p, iter, t)
+    {
+      if (is_gimple_debug (USE_STMT (use_p)))
+	continue;
+
+      if (gimple_bb (USE_STMT (use_p)) != bb)
+	outside_bb = true;
+      else
+	uses++;
+
+      if (outside_bb && uses > 1)
+	return -2;
+    }
+
+  if (outside_bb)
+    return -1;
+
+  return uses;
+}
+
+/* Starting from the final control flow stmt in BB, assuming it will
+   be removed, follow uses in to-be-removed stmts back to their defs
+   and count how many defs are to become dead and be removed as
+   well.  */
+
+static int
+estimate_threading_killed_stmts (basic_block bb)
+{
+  int killed_stmts = 0;
+  hash_map<tree, int> ssa_remaining_uses;
+  auto_vec<gimple *, 4> dead_worklist;
+
+  /* If the block has only two predecessors, threading will turn phi
+     dsts into either src, so count them as dead stmts.  */
+  bool drop_all_phis = EDGE_COUNT (bb->preds) == 2;
+
+  if (drop_all_phis)
+    for (gphi_iterator gsi = gsi_start_phis (bb);
+	 !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gphi *phi = gsi.phi ();
+	tree dst = gimple_phi_result (phi);
+
+	/* We don't count virtual PHIs as stmts in
+	   record_temporary_equivalences_from_phis.  */
+	if (virtual_operand_p (dst))
+	  continue;
+
+	killed_stmts++;
+      }
+
+  if (gsi_end_p (gsi_last_bb (bb)))
+    return killed_stmts;
+
+  gimple *stmt = gsi_stmt (gsi_last_bb (bb));
+  if (gimple_code (stmt) != GIMPLE_COND
+      && gimple_code (stmt) != GIMPLE_GOTO
+      && gimple_code (stmt) != GIMPLE_SWITCH)
+    return killed_stmts;
+
+  dead_worklist.quick_push (stmt);
+  while (!dead_worklist.is_empty ())
+    {
+      stmt = dead_worklist.pop ();
+
+      ssa_op_iter iter;
+      use_operand_p use_p;
+      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
+	{
+	  tree t = USE_FROM_PTR (use_p);
+	  if (SSA_NAME_DEF_STMT (t)
+	      && (gimple_code (SSA_NAME_DEF_STMT (t)) == GIMPLE_ASSIGN
+		  || (gimple_code (SSA_NAME_DEF_STMT (t)) == GIMPLE_PHI
+		      && !virtual_operand_p (t)
+		      && !drop_all_phis))
+	      && gimple_bb (SSA_NAME_DEF_STMT (t)) == bb)
+	    {
+	      int *usesp = ssa_remaining_uses.get (t);
+	      int uses;
+
+	      if (usesp)
+		uses = *usesp;
+	      else
+		uses = uses_in_bb (t, bb);
+
+	      gcc_assert (uses);
+
+	      /* Don't bother recording the expected use count if we
+		 won't find any further uses within BB.  */
+	      if (!usesp && (uses < -1 || uses > 1))
+		{
+		  usesp = &ssa_remaining_uses.get_or_insert (t);
+		  *usesp = uses;
+		}
+
+	      if (uses < 0)
+		continue;
+
+	      --uses;
+	      if (usesp)
+		*usesp = uses;
+
+	      if (!uses)
+		{
+		  killed_stmts++;
+		  if (usesp)
+		    ssa_remaining_uses.remove (t);
+		  if (gimple_code (SSA_NAME_DEF_STMT (t)) != GIMPLE_PHI)
+		    dead_worklist.safe_push (SSA_NAME_DEF_STMT (t));
+		}
+	    }
+	}
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "threading bb %i kills %i stmts\n",
+	     bb->index, killed_stmts);
+
+  return killed_stmts;
+}
+
+/* Estimate killed statements over all blocks in the PATH, or in E.
+   If PATH is not empty, E must be its last entry.  */
+
+static int
+estimate_threading_killed_stmts (edge e, vec<jump_thread_edge *> *path)
+{
+  gcc_assert (path->length () == 0 || path->last ()->e == e);
+  if (path->length () == 0)
+    return estimate_threading_killed_stmts (e->dest);
+
+  int total = 0;
+  int i = 0;
+  jump_thread_edge *je;
+  FOR_EACH_VEC_ELT (*path, i, je)
+    switch (je->type)
+      {
+      case EDGE_START_JUMP_THREAD:
+      case EDGE_COPY_SRC_BLOCK:
+      case EDGE_COPY_SRC_JOINER_BLOCK:
+	total += estimate_threading_killed_stmts (je->e->dest);
+	break;
+
+      default:
+	gcc_unreachable ();
+
+      case EDGE_FSM_THREAD:
+      case EDGE_NO_COPY_SRC_BLOCK:
+	break;
+      }
+
+  return total;
+}
+
 /* Try to simplify each statement in E->dest, ultimately leading to
    a simplification of the COND_EXPR at the end of E->dest.
@@ -191,7 +358,8 @@ static gimple *
 record_temporary_equivalences_from_stmts_at_dest (edge e,
 						  const_and_copies *const_and_copies,
 						  avail_exprs_stack *avail_exprs_stack,
-						  pfn_simplify simplify)
+						  pfn_simplify simplify,
+						  vec<jump_thread_edge *> *path)
 {
   gimple *stmt = NULL;
   gimple_stmt_iterator gsi;
@@ -233,7 +401,22 @@ record_temporary_equivalences_from_stmts_at_dest (edge e,
      expansion, then do not thread through this block.  */
   stmt_count++;
   if (stmt_count > max_stmt_count)
-    return NULL;
+    {
+      /* If any of the stmts in the PATH's dests are going to be
+	 killed due to threading, grow the max count
+	 accordingly.  */
+      if (max_stmt_count
+	  == PARAM_VALUE (PARAM_MAX_JUMP_THREAD_DUPLICATION_STMTS))
+	{
+	  max_stmt_count += estimate_threading_killed_stmts (e, path);
+	  if (dump_file)
+	    fprintf (dump_file, "threading bb %i up to %i stmts\n",
+		     e->dest->index, max_stmt_count);
+	}
+      /* If we're still past the limit, we're done.  */
+      if (stmt_count > max_stmt_count)
+	return NULL;
+    }
 
   /* If this is not a statement that sets an SSA_NAME to a new
      value, then do not try to simplify this statement as it will
@@ -991,7 +1174,7 @@ thread_through_normal_block (edge e,
   gimple *stmt
     = record_temporary_equivalences_from_stmts_at_dest (e, const_and_copies,
 							avail_exprs_stack,
-							simplify);
+							simplify, path);
 
   /* There's two reasons STMT might be null, and distinguishing
      between them is important.