From patchwork Mon Nov 1 16:45:26 2021
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 1549206
From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [PATCH] x86_64: Improved implementation of TImode rotations.
Date: Mon, 1 Nov 2021 16:45:26 -0000
Message-ID: <01b001d7cf3f$dfd47110$9f7d5330$@nextmovesoftware.com>

This simple patch improves the implementation of 128-bit (TImode)
rotations on x86_64 (a missed optimization opportunity spotted
during the recent V1TImode improvements).

Currently, the function:

unsigned __int128 rotrti3(unsigned __int128 x, unsigned int i) {
  return (x >> i) | (x << (128-i));
}

produces:

rotrti3:
        movq    %rsi, %r8
        movq    %rdi, %r9
        movl    %edx, %ecx
        movq    %rdi, %rsi
        movq    %r9, %rax
        movq    %r8, %rdx
        movq    %r8, %rdi
        shrdq   %r8, %rax
        shrq    %cl, %rdx
        xorl    %r8d, %r8d
        testb   $64, %cl
        cmovne  %rdx, %rax
        cmovne  %r8, %rdx
        negl    %ecx
        andl    $127, %ecx
        shldq   %r9, %rdi
        salq    %cl, %rsi
        xorl    %r9d, %r9d
        testb   $64, %cl
        cmovne  %rsi, %rdi
        cmovne  %r9, %rsi
        orq     %rdi, %rdx
        orq     %rsi, %rax
        ret

With this patch, GCC will now generate the much nicer:

rotrti3:
        movl    %edx, %ecx
        movq    %rdi, %rdx
        shrdq   %rsi, %rdx
        shrdq   %rdi, %rsi
        andl    $64, %ecx
        movq    %rdx, %rax
        cmove   %rsi, %rdx
        cmovne  %rsi, %rax
        ret

Even I wasn't expecting the optimizer's choice of the final three
instructions; a thing of beauty.  For rotations by 64 bits or more
(bit 6 of the count set), the lowpart and the highpart (%rax and %rdx)
are transposed, and it would be nice to have a conditional
swap/exchange.  The inspired solution the compiler comes up with is to
duplicate the same value in both %rax and %rdx, and then use
complementary conditional moves to update either the lowpart or the
highpart.  This cleverly avoids the potential decode-stage pipeline
stall (on some microarchitectures) from having multiple instructions
conditional on the same condition.  See X86_TUNE_ONE_IF_CONV_INSN, and
notice there are two such stalls in the original expansion of
rot[rl]ti3.

One quick question: does TARGET_64BIT (always) imply TARGET_CMOVE?

This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
and make -k check with no new failures.  Interestingly, the correct
behaviour is already tested by (amongst other tests)
sse2-v1ti-shift-3.c, which confirms that V1TImode rotates by constants
match rotlti3/rotrti3.  Ok for mainline?


2021-11-01  Roger Sayle

        * config/i386/i386.md (ti3): Provide expansion for rotations by
        non-constant amounts on TARGET_CMOVE architectures.

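For anyone who would rather read C than assembly, the new sequence can
be modelled roughly as below.  This is only an illustrative sketch and
not part of the patch: the names shrd64, rotr128_model and
check_rotr128_model are made up for this note, and it assumes a
compiler that provides unsigned __int128 (as GCC does on x86_64).  Each
half is produced by one double-word shift by the count modulo 64, and
bit 6 of the count then decides whether the two halves trade places.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling shrdq: shift LO right by N mod 64,
   filling the vacated high bits from the low bits of HI.  */
static uint64_t shrd64(uint64_t lo, uint64_t hi, unsigned n)
{
    n &= 63;                      /* the hardware masks the count to 6 bits */
    return n ? (lo >> n) | (hi << (64 - n)) : lo;
}

/* Illustrative model of the new rotrti3 sequence.  */
static unsigned __int128 rotr128_model(unsigned __int128 x, unsigned n)
{
    uint64_t lo = (uint64_t)x;
    uint64_t hi = (uint64_t)(x >> 64);

    uint64_t new_lo = shrd64(lo, hi, n);   /* shrdq %rsi, %rdx */
    uint64_t new_hi = shrd64(hi, lo, n);   /* shrdq %rdi, %rsi */

    /* andl $64, %ecx followed by the cmove/cmovne pair: when bit 6
       of the count is set, the halves are transposed.  */
    if (n & 64) {
        uint64_t t = new_lo;
        new_lo = new_hi;
        new_hi = t;
    }
    return ((unsigned __int128)new_hi << 64) | new_lo;
}

/* Compare the model against the C expression from the test case for
   every count where that expression is well defined (1..127).  */
static void check_rotr128_model(void)
{
    unsigned __int128 x =
        ((unsigned __int128)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
    for (unsigned n = 1; n < 128; n++)
        assert(rotr128_model(x, n) == ((x >> n) | (x << (128 - n))));
}
```

The comparison starts at a count of 1 because x << (128-i) is undefined
in C when i is zero; the model itself returns x unchanged in that case.
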
Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e733a40..2285c6c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12572,6 +12572,31 @@
   if (const_1_to_63_operand (operands[2], VOIDmode))
     emit_insn (gen_ix86_ti3_doubleword
                (operands[0], operands[1], operands[2]));
+  else if (TARGET_CMOVE)
+    {
+      rtx amount = force_reg (QImode, operands[2]);
+      rtx src_lo = gen_lowpart (DImode, operands[1]);
+      rtx src_hi = gen_highpart (DImode, operands[1]);
+      rtx tmp_lo = gen_reg_rtx (DImode);
+      rtx tmp_hi = gen_reg_rtx (DImode);
+      emit_move_insn (tmp_lo, src_lo);
+      emit_move_insn (tmp_hi, src_hi);
+      if ( == ROTATE)
+        {
+          emit_insn (gen_x86_64_shld (tmp_lo, src_hi, amount));
+          emit_insn (gen_x86_64_shld (tmp_hi, src_lo, amount));
+        }
+      else
+        {
+          emit_insn (gen_x86_64_shrd (tmp_lo, src_hi, amount));
+          emit_insn (gen_x86_64_shrd (tmp_hi, src_lo, amount));
+        }
+      rtx dst_lo = gen_lowpart (DImode, operands[0]);
+      rtx dst_hi = gen_highpart (DImode, operands[0]);
+      emit_move_insn (dst_lo, tmp_lo);
+      emit_move_insn (dst_hi, tmp_hi);
+      emit_insn (gen_x86_shiftdi_adj_1 (dst_lo, dst_hi, amount, tmp_lo));
+    }
   else
     FAIL;