From patchwork Fri Aug 21 18:41:08 2020
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 1349468
From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [PATCH] hppa: PR middle-end/87256: Improved hppa_rtx_costs avoids synth_mult madness.
Date: Fri, 21 Aug 2020 19:41:08 +0100
Message-ID: <004701d677ea$a2fb2930$e8f17b90$@nextmovesoftware.com>
Cc: 'John David Anglin'

This is my proposed fix to PR middle-end/87256, where synth_mult takes
an unreasonable amount of CPU time determining an optimal sequence of
instructions to perform multiplications by (large) integer constants
on hppa.  One workaround, proposed in bugzilla, is to increase the size
of the hash table used to cache/reuse intermediate results.  This helps,
but it only works around the (more subtle) underlying problem.

The real issue is that the hppa_rtx_costs function is providing wildly
inaccurate values (estimates) to the middle-end.  For example,
(p*q)+(r*s) would appear to be cheaper than a single multiplication.
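To make that first example concrete, here is a minimal sketch in C of a
simplified model of the pre-patch behaviour; it is not the actual pa.c
code.  It assumes SImode operands on PA 1.1 with FP registers available,
and the names expr, old_target_costs and model_cost are hypothetical.
The point it illustrates is that the old PLUS case reported its cost as
complete, so the operands were never scanned.

```c
#include <assert.h>
#include <stddef.h>

/* Same scaling GCC uses: one "instruction" is four cost units.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical miniature of an RTL expression, just enough to model costs.  */
enum code { REG, PLUS, MULT };

struct expr
{
  enum code code;
  const struct expr *op0;   /* NULL for REG.  */
  const struct expr *op1;   /* NULL for REG.  */
};

/* Models the pre-patch SImode cases (assumption: PA 1.1 with FP registers).
   Returning 1 claims *TOTAL is the complete cost, so the caller does not
   scan the operands.  */
static int
old_target_costs (const struct expr *x, int *total)
{
  switch (x->code)
    {
    case MULT:
      *total = COSTS_N_INSNS (8);
      return 1;
    case PLUS:
      *total = COSTS_N_INSNS (1);
      return 1;
    default:
      *total = 0;
      return 0;
    }
}

/* Models the generic walker: trust a "complete" answer, otherwise add the
   costs of the operands.  */
static int
model_cost (const struct expr *x)
{
  int total = 0;
  assert (x != NULL);
  if (old_target_costs (x, &total))
    return total;
  if (x->op0)
    total += model_cost (x->op0);
  if (x->op1)
    total += model_cost (x->op1);
  return total;
}
```

With p, q, r and s as REG leaves, model_cost gives COSTS_N_INSNS (1) = 4
for (p*q)+(r*s) but COSTS_N_INSNS (8) = 32 for p*q alone, which is the
inversion described above.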
Another example is that "(ashiftrt:di regA regB)" is claimed to be only
COSTS_N_INSNS(1) when in fact the hppa backend actually generates:

ashrdi:
        ldi 32,%r28
        and %r24,%r28,%r28
        comib,= 0,%r28,.L6
        mtsar %r24
        subi 31,%r24,%r24
        extrs %r25,0,1,%r28
        mtsar %r24
        bv %r0(%r2)
        vextrs %r25,32,%r29
.L6:
        uaddcm %r0,%r24,%r28
        vshd %r0,%r26,%r29
        subi 31,%r28,%r28
        mtsar %r28
        zdep %r25,30,31,%r19
        subi 31,%r24,%r24
        zvdep %r19,32,%r19
        mtsar %r24
        or %r19,%r29,%r29
        bv %r0(%r2)
        vextrs %r25,32,%r28

which is slightly more than a single instruction.

It turns out that simply tightening up the logic in hppa_rtx_costs to
return more reasonable values dramatically reduces the number of
recursive invocations in synth_mult for the test case in PR87256, and
presumably also produces faster code (that should be observable in
benchmarks).

Unfortunately, once again this has only been tested via cross-compilers
to hppa-unknown-linux-gnu and hppa64-unknown-linux-gnu hosted on
x86_64-pc-linux-gnu, but I can confirm cross-compilation is much faster.
Many thanks in advance to anyone who can bootstrap and regression test
this on real hardware.

In an ideal world, changes to rtx_costs should be pretty safe: this
function can't introduce any bugs, only expose those that are already
present (but possibly latent) elsewhere in the compiler.  Ha.

2020-08-21  Roger Sayle

gcc/ChangeLog
        PR middle-end/87256
        * config/pa/pa.c (hppa_rtx_costs_shadd_p): New helper function
        to check for coefficients supported by shNadd and shladd,l.
        (hppa_rtx_costs): Rewrite to avoid using estimates based upon
        FACTOR and enable recursing deeper into RTL expressions.

Thanks again.
Roger
---
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/pa/pa.c b/gcc/config/pa/pa.c
index 07d3287..cc876e6 100644
--- a/gcc/config/pa/pa.c
+++ b/gcc/config/pa/pa.c
@@ -1492,6 +1492,33 @@ hppa_address_cost (rtx X, machine_mode mode ATTRIBUTE_UNUSED,
     }
 }
 
+/* Return true if X represents a (possibly non-canonical) shNadd pattern.
+   The machine mode of X is known to be SImode or DImode.  */
+
+static bool
+hppa_rtx_costs_shadd_p (rtx x)
+{
+  if (GET_CODE (x) != PLUS
+      || !REG_P (XEXP (x, 1)))
+    return false;
+  rtx op0 = XEXP (x, 0);
+  if (GET_CODE (op0) == ASHIFT
+      && CONST_INT_P (XEXP (op0, 1))
+      && REG_P (XEXP (op0, 0)))
+    {
+      unsigned HOST_WIDE_INT x = UINTVAL (XEXP (op0, 1));
+      return x == 1 || x == 2 || x == 3;
+    }
+  if (GET_CODE (op0) == MULT
+      && CONST_INT_P (XEXP (op0, 1))
+      && REG_P (XEXP (op0, 0)))
+    {
+      unsigned HOST_WIDE_INT x = UINTVAL (XEXP (op0, 1));
+      return x == 2 || x == 4 || x == 8;
+    }
+  return false;
+}
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
    cost has been computed, and false if subexpressions should be
    scanned.  In either case, *TOTAL contains the cost result.  */
@@ -1499,15 +1526,16 @@ hppa_address_cost (rtx X, machine_mode mode ATTRIBUTE_UNUSED,
 static bool
 hppa_rtx_costs (rtx x, machine_mode mode, int outer_code,
                 int opno ATTRIBUTE_UNUSED,
-                int *total, bool speed ATTRIBUTE_UNUSED)
+                int *total, bool speed)
 {
-  int factor;
   int code = GET_CODE (x);
 
   switch (code)
     {
     case CONST_INT:
-      if (INTVAL (x) == 0)
+      if (outer_code == SET)
+        *total = COSTS_N_INSNS (1);
+      else if (INTVAL (x) == 0)
         *total = 0;
       else if (INT_14_BITS (x))
         *total = 1;
@@ -1537,25 +1565,28 @@ hppa_rtx_costs (rtx x, machine_mode mode, int outer_code,
       if (GET_MODE_CLASS (mode) == MODE_FLOAT)
         {
           *total = COSTS_N_INSNS (3);
-          return true;
         }
-
-      /* A mode size N times larger than SImode needs O(N*N) more insns.  */
-      factor = GET_MODE_SIZE (mode) / 4;
-      if (factor == 0)
-        factor = 1;
-
-      if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT)
-        *total = factor * factor * COSTS_N_INSNS (8);
+      else if (mode == DImode)
+        {
+          if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT)
+            *total = COSTS_N_INSNS (32);
+          else
+            *total = COSTS_N_INSNS (80);
+        }
       else
-        *total = factor * factor * COSTS_N_INSNS (20);
-      return true;
+        {
+          if (TARGET_PA_11 && !TARGET_DISABLE_FPREGS && !TARGET_SOFT_FLOAT)
+            *total = COSTS_N_INSNS (8);
+          else
+            *total = COSTS_N_INSNS (20);
+        }
+      return REG_P (XEXP (x, 0)) && REG_P (XEXP (x, 1));
 
     case DIV:
       if (GET_MODE_CLASS (mode) == MODE_FLOAT)
         {
           *total = COSTS_N_INSNS (14);
-          return true;
+          return false;
         }
 
       /* FALLTHRU */
@@ -1563,34 +1594,107 @@ hppa_rtx_costs (rtx x, machine_mode mode, int outer_code,
     case MOD:
     case UMOD:
       /* A mode size N times larger than SImode needs O(N*N) more insns.  */
-      factor = GET_MODE_SIZE (mode) / 4;
-      if (factor == 0)
-        factor = 1;
-
-      *total = factor * factor * COSTS_N_INSNS (60);
-      return true;
+      if (mode == DImode)
+        *total = COSTS_N_INSNS (240);
+      else
+        *total = COSTS_N_INSNS (60);
+      return REG_P (XEXP (x, 0)) && REG_P (XEXP (x, 1));
 
     case PLUS: /* this includes shNadd insns */
     case MINUS:
       if (GET_MODE_CLASS (mode) == MODE_FLOAT)
+        *total = COSTS_N_INSNS (3);
+      else if (mode == DImode)
         {
-          *total = COSTS_N_INSNS (3);
-          return true;
+          if (TARGET_64BIT)
+            {
+              *total = COSTS_N_INSNS (1);
+              /* Handle shladd,l instructions.  */
+              if (hppa_rtx_costs_shadd_p (x))
+                return true;
+            }
+          else
+            *total = COSTS_N_INSNS (2);
         }
-
-      /* A size N times larger than UNITS_PER_WORD needs N times as
-         many insns, taking N times as long.  */
-      factor = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
-      if (factor == 0)
-        factor = 1;
-      *total = factor * COSTS_N_INSNS (1);
-      return true;
+      else
+        {
+          *total = COSTS_N_INSNS (1);
+          /* Handle shNadd instructions.  */
+          if (hppa_rtx_costs_shadd_p (x))
+            return true;
+        }
+      return REG_P (XEXP (x, 0))
+             && (REG_P (XEXP (x, 1))
+                 || CONST_INT_P (XEXP (x, 1)));
 
     case ASHIFT:
+      if (mode == DImode)
+        {
+          if (TARGET_64BIT)
+            *total = COSTS_N_INSNS (3);
+          else if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+            {
+              *total = COSTS_N_INSNS (2);
+              return true;
+            }
+          else if (speed)
+            *total = COSTS_N_INSNS (13);
+          else
+            *total = COSTS_N_INSNS (18);
+        }
+      else if (TARGET_64BIT)
+        *total = COSTS_N_INSNS (4);
+      else
+        *total = COSTS_N_INSNS (2);
+      return REG_P (XEXP (x, 0))
+             && (REG_P (XEXP (x, 1))
+                 || CONST_INT_P (XEXP (x, 1)));
+
     case ASHIFTRT:
+      if (mode == DImode)
+        {
+          if (TARGET_64BIT)
+            *total = COSTS_N_INSNS (3);
+          else if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+            {
+              *total = COSTS_N_INSNS (2);
+              return true;
+            }
+          else if (speed)
+            *total = COSTS_N_INSNS (14);
+          else
+            *total = COSTS_N_INSNS (19);
+        }
+      else if (TARGET_64BIT)
+        *total = COSTS_N_INSNS (4);
+      else
+        *total = COSTS_N_INSNS (2);
+      return REG_P (XEXP (x, 0))
+             && (REG_P (XEXP (x, 1))
+                 || CONST_INT_P (XEXP (x, 1)));
+
     case LSHIFTRT:
-      *total = COSTS_N_INSNS (1);
-      return true;
+      if (mode == DImode)
+        {
+          if (TARGET_64BIT)
+            *total = COSTS_N_INSNS (2);
+          else if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+            {
+              *total = COSTS_N_INSNS (2);
+              return true;
+            }
+          else if (speed)
+            *total = COSTS_N_INSNS (12);
+          else
+            *total = COSTS_N_INSNS (15);
+        }
+      else if (TARGET_64BIT)
+        *total = COSTS_N_INSNS (3);
+      else
+        *total = COSTS_N_INSNS (2);
+      return REG_P (XEXP (x, 0))
+             && (REG_P (XEXP (x, 1))
+                 || CONST_INT_P (XEXP (x, 1)));
 
     default:
       return false;