From patchwork Thu Aug 10 14:10:57 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jackson Woodruff X-Patchwork-Id: 800222 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-460189-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="PTiSwGpk"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3xSqnR2yt8z9t2k for ; Fri, 11 Aug 2017 00:11:39 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:cc:message-id:date:mime-version:content-type; q= dns; s=default; b=wEVmjSlBq0FuPaYpZYLe83UlR8yeAgOglm/EtnohQ3K5r8 8GOxJNFvBBVhwrNl1qZjopVeLzKu2WqC4CVTieWC+hO9Xh7I6EkGPf1yxBS5ca8d 9wGIh3PjgDyX4c9VJ7T+wOJisXeDVV5vO+O86xAlsdiBulwsrzwzBLN0TMAaM= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:cc:message-id:date:mime-version:content-type; s= default; bh=lYF+tQiIJKm2EOdCBBmtiG3Dto4=; b=PTiSwGpkO0q78FZI6JZ7 OpYX6Bp8yWBkrbRioXiQUSaI8B3UNcGvytdqkIKt+YDEoJBMH4J6P9UsJ+bNiu0j 5ovolZkesH/KFXAx2pS1oKBlF5YtiJAIp3EVu6k6OlxpnhAeCXoD4QNmIzWWczHB jxwuytZ3eOHLeIIZIUslrFA= Received: (qmail 58478 invoked by alias); 10 Aug 2017 14:11:06 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 57574 invoked by uid 89); 10 Aug 2017 14:11:04 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=Hx-languages-length:8585, rooted X-HELO: foss.arm.com Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 10 Aug 2017 14:11:00 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 61E8E80D; Thu, 10 Aug 2017 07:10:59 -0700 (PDT) Received: from [10.2.206.195] (e112997-lin.cambridge.arm.com [10.2.206.195]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6B9AE3F483; Thu, 10 Aug 2017 07:10:58 -0700 (PDT) To: wilco.dijkstra@arm.com, richard.guenther@gmail.com, kyrylo.tkachov@foss.arm.com, joseph@codesourcery.com From: Jackson Woodruff Subject: [PATCH] Factor out division by squares and remove division around comparisons (2/2) Cc: gcc-patches@gcc.gnu.org Message-ID: Date: Thu, 10 Aug 2017 15:10:57 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 X-IsSubscribed: yes Hi all, The patch implements the some of the division optimizations discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 . We now reassociate (as discussed in the bug report): x / (y * y) -> x * (1 / y) * (1 / y) If it is reasonable to do so. This is done with -funsafe-math-optimizations. Bootstrapped and regtested with part (1/2). OK for trunk? Jackson gcc/ 2017-08-03 Jackson Woodruff PR 71026/tree-optimization * tree-ssa-math-opts (is_division_by_square, is_square_of, insert_sqaure_reciprocals): New. (insert_reciprocals): Change to insert reciprocals before a division by a square. (execute_cse_reciprocals_1): Change to consider division by a square. gcc/testsuite 2017-08-03 Jackson Woodruff PR 71026/tree-optimization * gcc.dg/associate_division_1.c: New. diff --git a/gcc/testsuite/gcc.dg/associate_division_1.c b/gcc/testsuite/gcc.dg/associate_division_1.c new file mode 100644 index 0000000000000000000000000000000000000000..187d3597af8825a6a43f29bfaa68b089d2d5bbfe --- /dev/null +++ b/gcc/testsuite/gcc.dg/associate_division_1.c @@ -0,0 +1,46 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +float +extract_square (float x, float y, float *a, float *b) +{ + *a = 3 / (y * y); + *b = 5 / (y * y); + + return x / (y * y); +} + +/* Don't expect the 'powmult' (calculation of y * y) + to be deleted until a later pass, so look for one + more multiplication than strictly necessary. */ +float +extract_recip (float *w, float x, float y, float z) +{ + *w = 7 / y; + + return x / (y * y) + z / y; +} + +float +neg_recip (float x, float y, float z) +{ + return (x / y) + (z / -y); +} + +float +const_divisor (float *w, float x, float y, float z) +{ + *w = 5 / y; + return x / (y * 3.0f) + z / y; +} + +/* 4 For the pointers to a, b, w and w, 4 multiplications in 'extract_square', + 5 multiplications in 'extract_recip', 0 multiplications + in 'neg_recip', 3 multiplcations in 'const_divisor' expected. */ +/* { dg-final { scan-tree-dump-times " \\* " 14 "optimized" } } */ + +/* 1 division in 'extract_square', 1 division in 'extract_recip', + 1 division in 'neg_recip', 1 division in 'const_divisor'. */ +/* { dg-final { scan-tree-dump-times " / " 4 "optimized" } } */ + + diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 7ac1659fa0670b7080685f3f9513939807073a63..0db160f68001ffb90942c4002703625430128b9f 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -340,6 +340,38 @@ is_division_by (gimple *use_stmt, tree def) && gimple_assign_rhs1 (use_stmt) != def; } +/* Return whether USE_STMT is DEF * DEF. */ +static inline bool +is_square_of (gimple *use_stmt, tree def) +{ + if (gimple_code (use_stmt) == GIMPLE_ASSIGN + && gimple_assign_rhs_code (use_stmt) == MULT_EXPR) + { + tree op0 = gimple_assign_rhs1 (use_stmt); + tree op1 = gimple_assign_rhs2 (use_stmt); + + return op0 == op1 && op0 == def; + } + return 0; +} + +/* Return whether USE_STMT is a floating-point division by + DEF * DEF. */ +static inline bool +is_division_by_square (gimple *use_stmt, tree def) +{ + if (gimple_code (use_stmt) == GIMPLE_ASSIGN + && gimple_assign_rhs_code (use_stmt) == RDIV_EXPR) + { + tree denominator = gimple_assign_rhs2 (use_stmt); + if (TREE_CODE (denominator) == SSA_NAME) + { + return is_square_of (SSA_NAME_DEF_STMT (denominator), def); + } + } + return 0; +} + /* Walk the subset of the dominator tree rooted at OCC, setting the RECIP_DEF field to a definition of 1.0 / DEF that can be used in the given basic block. The field may be left NULL, of course, @@ -369,14 +401,16 @@ insert_reciprocals (gimple_stmt_iterator *def_gsi, struct occurrence *occ, build_one_cst (type), def); if (occ->bb_has_division) - { - /* Case 1: insert before an existing division. */ - gsi = gsi_after_labels (occ->bb); - while (!gsi_end_p (gsi) && !is_division_by (gsi_stmt (gsi), def)) + { + /* Case 1: insert before an existing division. */ + gsi = gsi_after_labels (occ->bb); + while (!gsi_end_p (gsi) + && (!is_division_by (gsi_stmt (gsi), def)) + && (!is_division_by_square (gsi_stmt (gsi), def))) gsi_next (&gsi); - gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); - } + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + } else if (def_gsi && occ->bb == def_gsi->bb) { /* Case 2: insert right after the definition. Note that this will @@ -403,6 +437,80 @@ insert_reciprocals (gimple_stmt_iterator *def_gsi, struct occurrence *occ, } +/* Walk the tree until we find the first division by a square. Insert + the definition of the reciprocal there. This returns that definition, + or if there is an alternate definition earlier, then it returns that + instead. */ + +static tree +insert_square_reciprocals (struct occurrence *occ, tree def) +{ + gimple_stmt_iterator gsi = gsi_after_labels (occ->bb); + + while (!gsi_end_p (gsi) + && !is_division_by (gsi_stmt (gsi), def) + /* Check to see if a calculation statement has already + been inserted. */ + && !is_square_of (gsi_stmt (gsi), occ->recip_def)) + gsi_next (&gsi); + + if (gsi_end_p (gsi)) + return NULL; + else if (is_square_of (gsi_stmt (gsi), occ->recip_def)) + return gimple_assign_lhs (gsi_stmt (gsi)); + else + { + tree type = TREE_TYPE (def); + tree recip_square_def = create_tmp_reg (type, "powmult_reciptmp"); + gassign *new_stmt = gimple_build_assign (recip_square_def, MULT_EXPR, + occ->recip_def, occ->recip_def); + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + return recip_square_def; + } +} + +/* Replace occurrences of expr / (x * x) with expr * ((1 / x) * (1 / x)). + Unlike replace_reciprocals, we expect use_p to be a square definition + (i.e. use_p is _ = x * x). Iterate though all uses of this square + and replace them. */ +static inline void +replace_reciprocal_squares (use_operand_p use_p) +{ + gimple *use_stmt = USE_STMT (use_p); + basic_block bb = gimple_bb (use_stmt); + struct occurrence *occ = (struct occurrence *) bb->aux; + imm_use_iterator use_iter; + tree def_name = gimple_assign_lhs (use_stmt); + use_operand_p square_use_p; + tree squared_reciprocal = insert_square_reciprocals (occ, def_name); + + if (optimize_bb_for_speed_p (bb) && squared_reciprocal + && occ->recip_def && use_stmt != occ->recip_def_stmt) + { + + /* Find all occurrences of the use name and replace + them by multiplications of this new inverse. */ + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, def_name) + { + FOR_EACH_IMM_USE_ON_STMT (square_use_p, use_iter) + { + gimple *square_use = USE_STMT (square_use_p); + + if (gimple_assign_rhs_code (square_use) == RDIV_EXPR) + { + gimple_assign_set_rhs_code (square_use, MULT_EXPR); + gimple_assign_set_rhs2 (square_use, squared_reciprocal); + SET_USE (square_use_p, squared_reciprocal); + + reciprocal_stats.rdivs_inserted++; + update_stmt (square_use); + } + } + } + } +} + + /* Replace the division at USE_P with a multiplication by the reciprocal, if possible. */ @@ -460,10 +568,12 @@ free_bb (struct occurrence *occ) static void execute_cse_reciprocals_1 (gimple_stmt_iterator *def_gsi, tree def) { - use_operand_p use_p; - imm_use_iterator use_iter; + use_operand_p use_p, square_use_p; + imm_use_iterator use_iter, square_use_iter; + tree square_def; struct occurrence *occ; int count = 0, threshold; + int square_recip_count = 0; gcc_assert (FLOAT_TYPE_P (TREE_TYPE (def)) && is_gimple_reg (def)); @@ -475,11 +585,26 @@ execute_cse_reciprocals_1 (gimple_stmt_iterator *def_gsi, tree def) register_division_in (gimple_bb (use_stmt)); count++; } + + if (is_square_of (use_stmt, def)) + { + square_def = gimple_assign_lhs (use_stmt); + FOR_EACH_IMM_USE_FAST (square_use_p, square_use_iter, square_def) + { + gimple *square_use_stmt = USE_STMT (square_use_p); + if (is_division_by (square_use_stmt, square_def)) + { + register_division_in (gimple_bb (square_use_stmt)); + square_recip_count++; + } + } + } } /* Do the expensive part only if we can hope to optimize something. */ threshold = targetm.min_divisions_for_recip_mul (TYPE_MODE (TREE_TYPE (def))); - if (count >= threshold) + if (count + square_recip_count >= threshold + && count >= 1) { gimple *use_stmt; for (occ = occ_head; occ; occ = occ->next) @@ -495,6 +620,11 @@ execute_cse_reciprocals_1 (gimple_stmt_iterator *def_gsi, tree def) FOR_EACH_IMM_USE_ON_STMT (use_p, use_iter) replace_reciprocal (use_p); } + else if (is_square_of (use_stmt, def)) + { + FOR_EACH_IMM_USE_ON_STMT (use_p, use_iter) + replace_reciprocal_squares (use_p); + } } }