From patchwork Sat Jan 6 13:30:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1883213 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=oWSRUhZp; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4T6h6W6MLWz1yP9 for ; Sun, 7 Jan 2024 00:30:59 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2E7E7385829D for ; Sat, 6 Jan 2024 13:30:57 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 1D9E93858D20 for ; Sat, 6 Jan 2024 13:30:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1D9E93858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 1D9E93858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=162.254.253.69 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1704547839; cv=none; b=td4VyW04nLGp06JyxyQXwhUn9qu/zOeoWWFzILNb3r/ALYcYwyz3JkW3mXBotdnFX6Qda3uD+RAwj8LiaQnGciw70G5d8gP/keYpwYWTWlxS8sCQXFDu4I/nGIKTLjU3/9lAXdeJCtva8/eZ1/j+m+5tkC6/cOQdzUGimDLSmUc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1704547839; c=relaxed/simple; bh=yMWn6scKye5Smd2b79WpDxNW+6QtOl8QM6aDhbQBZh0=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=lAhctngD6wfOHexsyhqflCz55CzCF6szizMmnLxjjfonBoNH2IEiTX5A7Q2fPWTgtQIddaUNW/rL/Zn9e0s7b6rTx4p0Nb8m/+C1LZrGocpJ9w3KaKIppHyxwiqZ+MTFzncb5RkJe2nqT678/oYHOMYPziDVKtkRKO/dAnaiIFc= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=uyDPpzm35YW5rFFGBN0TK9WGLcfhjO7FBv1kbnTRPJs=; b=oWSRUhZp8S2u8Nvoe4bjo6c+B9 ealzmQPNJw9zaNkGnx5C/PoHBm1hMgJhnmW5FlongGZqQ48inF0Afh3Ye1gR1zNFpd66A13QY4wDe R3EPlA/Vp467ekwPKYRY72E+RQbjkesRMhVQlGzCjBio1QtqwFY3goFf78Ijkbv1SDEJ6M8uh8mcj 4sPsTHecrA1taUR0JWa6GlU7Iqrzs9aN4O31XNMsbX8tkBVJkq6LXIlpabPqAi2xeoxGgFi46cEl+ ThW58QyjxxwuBkTVlkqHX8oqnVTbiKz+0yNOBmrn243boQI0eecUUZP3We7/tm4YaBce1aUvkJM6X jy/W9svA==; Received: from host109-154-238-190.range109-154.btcentralplus.com ([109.154.238.190]:59937 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96.2) (envelope-from ) id 1rM6l2-0000aR-1g; Sat, 06 Jan 2024 08:30:36 -0500 From: "Roger Sayle" To: Cc: "'Uros Bizjak'" Subject: [x86 PATCH] PR target/113231: Improved costs in Scalar-To-Vector (STV) pass. Date: Sat, 6 Jan 2024 13:30:34 -0000 Message-ID: <03c401da40a4$8819fe30$984dfa90$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdpApDbB5NHW6ssNQtCM9MBaqshtwQ== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, LIKELY_SPAM_BODY, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org This patch improves the cost/gain calculation used during the i386 backend's SImode/DImode scalar-to-vector (STV) conversion pass. The current code handles loads and stores, but doesn't consider that converting other scalar operations with a memory destination, requires an explicit load before and an explicit store after the vector equivalent. To ease the review, the significant change looks like: /* For operations on memory operands, include the overhead of explicit load and store instructions. */ if (MEM_P (dst)) igain += !optimize_insn_for_size_p () ? (m * (ix86_cost->int_load[2] + ix86_cost->int_store[2]) - (ix86_cost->sse_load[sse_cost_idx] + ix86_cost->sse_store[sse_cost_idx])) : -COSTS_N_BYTES (8); however the patch itself is complicated by a change in indentation which leads to a number of lines with only whitespace changes. For architectures where integer load/store costs are the same as vector load/store costs, there should be no change without -Os/-Oz. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-01-06 Roger Sayle gcc/ChangeLog PR target/113231 * config/i386/i386-features.cc (compute_convert_gain): Include the overhead of explicit load and store (movd) instructions when converting non-store scalar operations with memory destinations. gcc/testsuite/ChangeLog PR target/113231 * gcc.target/i386/pr113231.c: New test case. Thanks again, Roger diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index 4ae3e75..3677aef 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -563,183 +563,195 @@ general_scalar_chain::compute_convert_gain () else if (MEM_P (src) && REG_P (dst)) igain += m * ix86_cost->int_load[2] - ix86_cost->sse_load[sse_cost_idx]; else - switch (GET_CODE (src)) - { - case ASHIFT: - case ASHIFTRT: - case LSHIFTRT: - if (m == 2) - { - if (INTVAL (XEXP (src, 1)) >= 32) - igain += ix86_cost->add; - /* Gain for extend highpart case. */ - else if (GET_CODE (XEXP (src, 0)) == ASHIFT) - igain += ix86_cost->shift_const - ix86_cost->sse_op; - else - igain += ix86_cost->shift_const; - } - - igain += ix86_cost->shift_const - ix86_cost->sse_op; + { + /* For operations on memory operands, include the overhead + of explicit load and store instructions. */ + if (MEM_P (dst)) + igain += !optimize_insn_for_size_p () + ? (m * (ix86_cost->int_load[2] + + ix86_cost->int_store[2]) + - (ix86_cost->sse_load[sse_cost_idx] + + ix86_cost->sse_store[sse_cost_idx])) + : -COSTS_N_BYTES (8); - if (CONST_INT_P (XEXP (src, 0))) - igain -= vector_const_cost (XEXP (src, 0)); - break; + switch (GET_CODE (src)) + { + case ASHIFT: + case ASHIFTRT: + case LSHIFTRT: + if (m == 2) + { + if (INTVAL (XEXP (src, 1)) >= 32) + igain += ix86_cost->add; + /* Gain for extend highpart case. */ + else if (GET_CODE (XEXP (src, 0)) == ASHIFT) + igain += ix86_cost->shift_const - ix86_cost->sse_op; + else + igain += ix86_cost->shift_const; + } - case ROTATE: - case ROTATERT: - igain += m * ix86_cost->shift_const; - if (TARGET_AVX512VL) - igain -= ix86_cost->sse_op; - else if (smode == DImode) - { - int bits = INTVAL (XEXP (src, 1)); - if ((bits & 0x0f) == 0) - igain -= ix86_cost->sse_op; - else if ((bits & 0x07) == 0) - igain -= 2 * ix86_cost->sse_op; - else - igain -= 3 * ix86_cost->sse_op; - } - else if (INTVAL (XEXP (src, 1)) == 16) - igain -= ix86_cost->sse_op; - else - igain -= 2 * ix86_cost->sse_op; - break; + igain += ix86_cost->shift_const - ix86_cost->sse_op; - case AND: - case IOR: - case XOR: - case PLUS: - case MINUS: - igain += m * ix86_cost->add - ix86_cost->sse_op; - /* Additional gain for andnot for targets without BMI. */ - if (GET_CODE (XEXP (src, 0)) == NOT - && !TARGET_BMI) - igain += m * ix86_cost->add; - - if (CONST_INT_P (XEXP (src, 0))) - igain -= vector_const_cost (XEXP (src, 0)); - if (CONST_INT_P (XEXP (src, 1))) - igain -= vector_const_cost (XEXP (src, 1)); - if (MEM_P (XEXP (src, 1))) - { - if (optimize_insn_for_size_p ()) - igain -= COSTS_N_BYTES (m == 2 ? 3 : 5); - else - igain += m * ix86_cost->int_load[2] - - ix86_cost->sse_load[sse_cost_idx]; - } - break; + if (CONST_INT_P (XEXP (src, 0))) + igain -= vector_const_cost (XEXP (src, 0)); + break; - case NEG: - case NOT: - igain -= ix86_cost->sse_op + COSTS_N_INSNS (1); + case ROTATE: + case ROTATERT: + igain += m * ix86_cost->shift_const; + if (TARGET_AVX512VL) + igain -= ix86_cost->sse_op; + else if (smode == DImode) + { + int bits = INTVAL (XEXP (src, 1)); + if ((bits & 0x0f) == 0) + igain -= ix86_cost->sse_op; + else if ((bits & 0x07) == 0) + igain -= 2 * ix86_cost->sse_op; + else + igain -= 3 * ix86_cost->sse_op; + } + else if (INTVAL (XEXP (src, 1)) == 16) + igain -= ix86_cost->sse_op; + else + igain -= 2 * ix86_cost->sse_op; + break; - if (GET_CODE (XEXP (src, 0)) != ABS) - { + case AND: + case IOR: + case XOR: + case PLUS: + case MINUS: + igain += m * ix86_cost->add - ix86_cost->sse_op; + /* Additional gain for andnot for targets without BMI. */ + if (GET_CODE (XEXP (src, 0)) == NOT + && !TARGET_BMI) igain += m * ix86_cost->add; - break; - } - /* FALLTHRU */ - - case ABS: - case SMAX: - case SMIN: - case UMAX: - case UMIN: - /* We do not have any conditional move cost, estimate it as a - reg-reg move. Comparisons are costed as adds. */ - igain += m * (COSTS_N_INSNS (2) + ix86_cost->add); - /* Integer SSE ops are all costed the same. */ - igain -= ix86_cost->sse_op; - break; - case COMPARE: - if (XEXP (src, 1) != const0_rtx) - { - /* cmp vs. pxor;pshufd;ptest. */ - igain += COSTS_N_INSNS (m - 3); - } - else if (GET_CODE (XEXP (src, 0)) != AND) - { - /* test vs. pshufd;ptest. */ - igain += COSTS_N_INSNS (m - 2); - } - else if (GET_CODE (XEXP (XEXP (src, 0), 0)) != NOT) - { - /* and;test vs. pshufd;ptest. */ - igain += COSTS_N_INSNS (2 * m - 2); - } - else if (TARGET_BMI) - { - /* andn;test vs. pandn;pshufd;ptest. */ - igain += COSTS_N_INSNS (2 * m - 3); - } - else - { - /* not;and;test vs. pandn;pshufd;ptest. */ - igain += COSTS_N_INSNS (3 * m - 3); - } - break; + if (CONST_INT_P (XEXP (src, 0))) + igain -= vector_const_cost (XEXP (src, 0)); + if (CONST_INT_P (XEXP (src, 1))) + igain -= vector_const_cost (XEXP (src, 1)); + if (MEM_P (XEXP (src, 1))) + { + if (optimize_insn_for_size_p ()) + igain -= COSTS_N_BYTES (m == 2 ? 3 : 5); + else + igain += m * ix86_cost->int_load[2] + - ix86_cost->sse_load[sse_cost_idx]; + } + break; - case CONST_INT: - if (REG_P (dst)) - { - if (optimize_insn_for_size_p ()) - { - /* xor (2 bytes) vs. xorps (3 bytes). */ - if (src == const0_rtx) - igain -= COSTS_N_BYTES (1); - /* movdi_internal vs. movv2di_internal. */ - /* => mov (5 bytes) vs. movaps (7 bytes). */ - else if (x86_64_immediate_operand (src, SImode)) - igain -= COSTS_N_BYTES (2); - else - /* ??? Larger immediate constants are placed in the - constant pool, where the size benefit/impact of - STV conversion is affected by whether and how - often each constant pool entry is shared/reused. - The value below is empirically derived from the - CSiBE benchmark (and the optimal value may drift - over time). */ - igain += COSTS_N_BYTES (0); - } - else - { - /* DImode can be immediate for TARGET_64BIT - and SImode always. */ - igain += m * COSTS_N_INSNS (1); - igain -= vector_const_cost (src); - } - } - else if (MEM_P (dst)) - { - igain += (m * ix86_cost->int_store[2] - - ix86_cost->sse_store[sse_cost_idx]); - igain -= vector_const_cost (src); - } - break; + case NEG: + case NOT: + igain -= ix86_cost->sse_op + COSTS_N_INSNS (1); - case VEC_SELECT: - if (XVECEXP (XEXP (src, 1), 0, 0) == const0_rtx) - { - // movd (4 bytes) replaced with movdqa (4 bytes). - if (!optimize_insn_for_size_p ()) - igain += ix86_cost->sse_to_integer - ix86_cost->xmm_move; - } - else - { - // pshufd; movd replaced with pshufd. - if (optimize_insn_for_size_p ()) - igain += COSTS_N_BYTES (4); - else - igain += ix86_cost->sse_to_integer; - } - break; + if (GET_CODE (XEXP (src, 0)) != ABS) + { + igain += m * ix86_cost->add; + break; + } + /* FALLTHRU */ + + case ABS: + case SMAX: + case SMIN: + case UMAX: + case UMIN: + /* We do not have any conditional move cost, estimate it as a + reg-reg move. Comparisons are costed as adds. */ + igain += m * (COSTS_N_INSNS (2) + ix86_cost->add); + /* Integer SSE ops are all costed the same. */ + igain -= ix86_cost->sse_op; + break; - default: - gcc_unreachable (); - } + case COMPARE: + if (XEXP (src, 1) != const0_rtx) + { + /* cmp vs. pxor;pshufd;ptest. */ + igain += COSTS_N_INSNS (m - 3); + } + else if (GET_CODE (XEXP (src, 0)) != AND) + { + /* test vs. pshufd;ptest. */ + igain += COSTS_N_INSNS (m - 2); + } + else if (GET_CODE (XEXP (XEXP (src, 0), 0)) != NOT) + { + /* and;test vs. pshufd;ptest. */ + igain += COSTS_N_INSNS (2 * m - 2); + } + else if (TARGET_BMI) + { + /* andn;test vs. pandn;pshufd;ptest. */ + igain += COSTS_N_INSNS (2 * m - 3); + } + else + { + /* not;and;test vs. pandn;pshufd;ptest. */ + igain += COSTS_N_INSNS (3 * m - 3); + } + break; + + case CONST_INT: + if (REG_P (dst)) + { + if (optimize_insn_for_size_p ()) + { + /* xor (2 bytes) vs. xorps (3 bytes). */ + if (src == const0_rtx) + igain -= COSTS_N_BYTES (1); + /* movdi_internal vs. movv2di_internal. */ + /* => mov (5 bytes) vs. movaps (7 bytes). */ + else if (x86_64_immediate_operand (src, SImode)) + igain -= COSTS_N_BYTES (2); + else + /* ??? Larger immediate constants are placed in the + constant pool, where the size benefit/impact of + STV conversion is affected by whether and how + often each constant pool entry is shared/reused. + The value below is empirically derived from the + CSiBE benchmark (and the optimal value may drift + over time). */ + igain += COSTS_N_BYTES (0); + } + else + { + /* DImode can be immediate for TARGET_64BIT + and SImode always. */ + igain += m * COSTS_N_INSNS (1); + igain -= vector_const_cost (src); + } + } + else if (MEM_P (dst)) + { + igain += (m * ix86_cost->int_store[2] + - ix86_cost->sse_store[sse_cost_idx]); + igain -= vector_const_cost (src); + } + break; + + case VEC_SELECT: + if (XVECEXP (XEXP (src, 1), 0, 0) == const0_rtx) + { + // movd (4 bytes) replaced with movdqa (4 bytes). + if (!optimize_insn_for_size_p ()) + igain += ix86_cost->sse_to_integer - ix86_cost->xmm_move; + } + else + { + // pshufd; movd replaced with pshufd. + if (optimize_insn_for_size_p ()) + igain += COSTS_N_BYTES (4); + else + igain += ix86_cost->sse_to_integer; + } + break; + + default: + gcc_unreachable (); + } + } if (igain != 0 && dump_file) { diff --git a/gcc/testsuite/gcc.target/i386/pr113231.c b/gcc/testsuite/gcc.target/i386/pr113231.c new file mode 100644 index 0000000..f9dcd9a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr113231.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ + +void foo(int *i) { *i *= 2; } +void bar(int *i) { *i <<= 2; } +void baz(int *i) { *i >>= 2; } + +/* { dg-final { scan-assembler-not "movd" } } */