From patchwork Wed May 9 23:41:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hans-Peter Nilsson X-Patchwork-Id: 911219 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-477464-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=axis.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="GvS+UoWa"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40hCZF6bZLz9s35 for ; Thu, 10 May 2018 09:42:11 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :message-id:from:to:subject:mime-version:content-type :content-transfer-encoding; q=dns; s=default; b=nKL1lzWIbvCrVu0J zAEH/gKbldeF+McsYWBxEmWkmQotjh+zZfKPszRT1xgHQOkTUXJcL77Sl9eEqUti 7p4WhY7rYcI2LdYInChIwLdlHgIn/nfBayoM0EElYKiBLZc6+o3uMTDo2BODG+nz lzwHMynUw0cR0xWQYKt9dKPxVIs= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :message-id:from:to:subject:mime-version:content-type :content-transfer-encoding; s=default; bh=saao4KuEIfdAVLfvM7XKOn kBfO4=; b=GvS+UoWaMH2Ppr7RDHNF4hapKl+HeyoGuzLHhgYaJRCjYKE3KulERt adjfRzvXEzzfNWdY2lGiR5T9CJ3+kqa0eO9XCLJPQyof5XxXAxQalIjTlljAMsWw 8DNtoPYXCvNH1QZ4BpSFJmYIg7ZEnZQqFMpxkCzkc7nEgYN0/TLJM= Received: (qmail 6808 invoked by alias); 9 May 2018 23:42:03 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 6789 invoked by uid 89); 9 May 2018 23:42:02 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.3 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=practically, hp, feeding, SIGNED X-HELO: bastet.se.axis.com Received: from bastet.se.axis.com (HELO bastet.se.axis.com) (195.60.68.11) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 09 May 2018 23:41:59 +0000 Received: from localhost (localhost [127.0.0.1]) by bastet.se.axis.com (Postfix) with ESMTP id 13EBC184F2 for ; Thu, 10 May 2018 01:41:57 +0200 (CEST) Received: from bastet.se.axis.com ([IPv6:::ffff:127.0.0.1]) by localhost (bastet.se.axis.com [::ffff:127.0.0.1]) (amavisd-new, port 10024) with LMTP id Og_vSqwvuZ8I for ; Thu, 10 May 2018 01:41:55 +0200 (CEST) Received: from boulder03.se.axis.com (boulder03.se.axis.com [10.0.8.17]) by bastet.se.axis.com (Postfix) with ESMTPS id A4C97184ED for ; Thu, 10 May 2018 01:41:54 +0200 (CEST) Received: from boulder03.se.axis.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7C1681E075 for ; Thu, 10 May 2018 01:41:54 +0200 (CEST) Received: from boulder03.se.axis.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6E7901E074 for ; Thu, 10 May 2018 01:41:54 +0200 (CEST) Received: from thoth.se.axis.com (unknown [10.0.2.173]) by boulder03.se.axis.com (Postfix) with ESMTP for ; Thu, 10 May 2018 01:41:54 +0200 (CEST) Received: from ignucius.se.axis.com (ignucius.se.axis.com [10.88.21.50]) by thoth.se.axis.com (Postfix) with ESMTP id 60D2E2CDC; Thu, 10 May 2018 01:41:54 +0200 (CEST) Received: from ignucius.se.axis.com (localhost [127.0.0.1]) by ignucius.se.axis.com (8.12.8p1/8.12.8/Debian-2woody1) with ESMTP id w49Nfsg8030642; Thu, 10 May 2018 01:41:54 +0200 Received: (from hp@localhost) by ignucius.se.axis.com (8.12.8p1/8.12.8/Debian-2woody1) id w49Nfr4o030638; Thu, 10 May 2018 01:41:53 +0200 Date: Thu, 10 May 2018 01:41:53 +0200 Message-Id: <201805092341.w49Nfr4o030638@ignucius.se.axis.com> From: Hans-Peter Nilsson To: gcc-patches@gcc.gnu.org Subject: Fix PR85726 (div-div suboptimization) and a rant on match.pd :s-flag MIME-Version: 1.0 X-TM-AS-GCONF: 00 Replacing a division feeding a division helps only when the second division is the only user, and "fusing" the divisions is downright bad if another user of the result of first division is a modulus of the same value as the second division, forming a divmod pair. See the test-case, where for the tested architectures (which all fail the test-case before the patch) the div and mod are implemented using the high-part of a widened multiplication and shift, emitted separately but combined as late as in rtl, with the multiplicaton and shift re-used. That of course does not happen if later passes see (y / 48; y % 3). While in match.pd, I noticed the corresponding mul-mul match, which I believe should be guarded the same way. I noticed this spot investigating performance regressions for mipsisa32r2el-linux-gnu comparing to gcc-4.7-era code generated for a compute-intensive library. I'd say the pattern in the test-case is common in cases like implementing a 3x3 filter using SIMD with a vector size 16. The suboptimization was more of an (the first, or second) eye-catcher in a hot degraded function than an actual cause though; fixing it seems to amount for no more than 2% (where there's a 13% regression) in that function. As a contrast, I see e.g. four times as many local call-saved registers used for some loop-intensive functions, but I can't dive into the larger problem, at least not right now. (And no, it's not LRA, AFAICT.) Tested using the gcc-8.1.0 release on aarch64-unknown-linux-gnu, powerpc64-unknown-linux-gnu, x86_64-pc-linux-gnu and partly a cross to mipsisa32r2el-linux-gnu. The patch is generated using "-w" because otherwise re-indentation for the single_use-wrapping makes the diff practically unreadable. The test-case uses rtl-scanning so the result can be verified closer to the actual code but still not restricting it to assembly code. Scanning just after the (last) match.pd pass would still be far from the divmod-combining effects done in rtl. Unfortunately, the pattern for the truncated multiplication is observable in various numbers depending on both the target and the bug, so to avoid littering I restrict it to my target of interest at the moment. At least the presence and absence of the "1/48"-constant is stable across the line with/without the bug with no false positives. This is a optimization regression counting from at least 4.9.2, most likely since r218211, so: ok for for all open branches? Please also see the ":s"-rant after the patch. gcc/testsuite: PR tree-optimization/85726 * pr85726.c: New test. gcc: * match.pd (div (div C1) C2): Guard with single_use of inner div. (mult (mult C1) C2): Similar. Now a rant on the match.pd ":s" flag, which reasonable people may reasonably suggest I should have used in the patch instead of the (if (single_use ...)). Initially, I got the match.pd-language (which deserves a proper name) all wrong. Then I read the documentation and still got it wrong. I "misunderstood" that the ":s" on an operand "O" was supposed to have the effect of conditionalize the replacement "R" by wrapping it in "(if (single_use (O)) R)" as in the suggested patch (above). To wit, this does not work; it will *not* stop the replacement as seen in the test-case (THIS IS NOT A SUGGESTED PATCH): (for div (trunc_div exact_div) (simplify - (div (div @0 INTEGER_CST@1) INTEGER_CST@2) + (div (div:s @0 INTEGER_CST@1) INTEGER_CST@2) (with { bool overflow_p; wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2), In PR69556, it seems other people seem to have read the documentation of ":s" the same way, but are corrected by other comments there, so I guess it's not my reading that's flawed. I suggest preferably (1) correcting the semantics of ":s" to do as the documentation says because I don't understand the explanation in PR69556 comment #4 that the replacement "is still allowed if it is a single operation as that replaces at least one other (the one we are simplifying)"; I see that as a complete nullification of the :s flag, making it a nop. Alternatively (2), if the :s is *not* a nop in some case add an example of that case and sufficient explanation to match-and-simplify.texi *and* the suggested ":S" flag (i.e. *really* conditionalize the replacement by a single-use of that operand). That's just a detail, I do like that language. brgds, H-P --- /dev/null Tue Oct 29 15:57:07 2002 +++ gcc.dg/pr85726.c Tue May 8 07:18:19 2018 @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-fwprop1" } */ + +/* There should just be one mult-as-div sequence, re-used for the mod, + not one for each of the y / 3 and y % 3 (as in due to suboptimal + simplification as y / 48 and y % 3) and there should no be a trace of + the constant used for the 1 / 48 mult-as-div. */ +int ww, vv; +int x(int y) +{ + int z = y / 16; + int w = z / 3; + int v = z % 3; + ww = w; + return v; +} +/* { dg-final { scan-rtl-dump-times "truncate:SI .lshiftrt:DI .mult:DI .sign_extend:DI" 1 "fwprop1" { target mips*-*-* } } } */ +/* { dg-final { scan-rtl-dump-times " .0x2aaaaaab" 0 "fwprop1" } } */ diff --git a/gcc/match.pd b/gcc/match.pd index 0de4432..e8ebeac 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -278,11 +278,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && TYPE_UNSIGNED (type)) (trunc_div @0 @1))) -/* Combine two successive divisions. Note that combining ceil_div - and floor_div is trickier and combining round_div even more so. */ +/* Combine two successive divisions, if the second is the only + user of the result of the first, or else we'll just end up + with two divisions anyway. Note that combining ceil_div and + floor_div is trickier and combining round_div even more so. */ (for div (trunc_div exact_div) (simplify - (div (div @0 INTEGER_CST@1) INTEGER_CST@2) + (div (div@3 @0 INTEGER_CST@1) INTEGER_CST@2) + (if (single_use (@3)) (with { bool overflow_p; wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2), @@ -292,12 +295,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (div @0 { wide_int_to_tree (type, mul); }) (if (TYPE_UNSIGNED (type) || mul != wi::min_value (TYPE_PRECISION (type), SIGNED)) - { build_zero_cst (type); }))))) + { build_zero_cst (type); })))))) /* Combine successive multiplications. Similar to above, but handling overflow is different. */ (simplify - (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2) + (mult (mult@3 @0 INTEGER_CST@1) INTEGER_CST@2) + (if (single_use (@3)) (with { bool overflow_p; wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2), @@ -306,7 +310,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Skip folding on overflow: the only special case is @1 * @2 == -INT_MIN, otherwise undefined overflow implies that @0 must be zero. */ (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type)) - (mult @0 { wide_int_to_tree (type, mul); })))) + (mult @0 { wide_int_to_tree (type, mul); }))))) /* Optimize A / A to 1.0 if we don't care about NaNs or Infinities. */