From patchwork Sun Jun 26 15:54:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1648440 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=NUm9caqk; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LWFnb3S8Bz9tkT for ; Mon, 27 Jun 2022 01:55:46 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 11F4F3856086 for ; Sun, 26 Jun 2022 15:55:43 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 96DEE3856941 for ; Sun, 26 Jun 2022 15:54:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 96DEE3856941 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=psHHJvZQOnEJd5O3Hty7DbvCOT9oKFof5q/VyN/tQwo=; b=NUm9caqkbbROnqgn3vF3fbzP93 9fOmWnamzJGhSYfkI9NvES0RDiyPJ1JJ0q5hWrgJIJiQsKLIJOGh43x40uSnPMWwuqSJYbHNDjNLA hOeQOHNCofqTSWEFGq6nFzzSjcVB8hF9Vt2/iR+AdOeM24638CMyI7Vc8QCqn/8095kwBU0teX8pX 51SnzoqT97PPRWTsI2tezQJ8oYM/Wrtq1HgyawW98npwphHTLS/AB6RZ7dkxv8aYIuNVwxL6Bb3CU sZNA2YQzG0EuIpWTPKjsi3TErzVr99pqoHhfuBOwQbHDycH8ZL5PlFktZ4zNi6ppft7MapulepJqv xtiHKtIA==; Received: from host86-130-134-60.range86-130.btcentralplus.com ([86.130.134.60]:55584 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1o5UaH-0003MT-Ti; Sun, 26 Jun 2022 11:54:02 -0400 From: "Roger Sayle" To: Subject: [x86 PATCH] Use xchg for DImode double word rotate by 32 bits with -m32. Date: Sun, 26 Jun 2022 16:54:00 +0100 Message-ID: <000301d88974$f4870290$dd9507b0$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdiJdCS5Wij3XFVKQ0CDDnoThBPZ9A== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch was motivated by the investigation of Linus Torvalds' spill heavy cryptography kernels in PR 105930. The di3 expander handles all rotations by an immediate constant for 1..63 bits with the exception of 32 bits, which FAILs and is then split by the middle-end. This patch makes these 32-bit doubleword rotations consistent with the other DImode rotations during reload, which results in reduced register pressure, fewer instructions and the use of x86's xchg instruction when appropriate. In theory, xchg can be handled by register renaming, but even on micro-architectures where it's implemented by 3 uops (no worse than a three instruction shuffle), avoiding nominating a "temporary" register, reduces user-visible register pressure (and has obvious code size benefits). To effects are best shown with the new testcase: unsigned long long bar(); unsigned long long foo() { unsigned long long x = bar(); return (x>>32) | (x<<32); } for which GCC with -m32 -O2 currently generates: subl $12, %esp call bar addl $12, %esp movl %eax, %ecx movl %edx, %eax movl %ecx, %edx ret but with this patch now generates: subl $12, %esp call bar addl $12, %esp xchgl %edx, %eax ret With this patch, the number of lines of assembly language generated for the blake2b kernel (from the attachment to PR105930) decreases from 5626 to 5404. Although there's an impressive reduction in instruction count, there's no change/reduction in stack frame size. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures. Ok for mainline? 2022-06-26 Roger Sayle gcc/ChangeLog * config/i386/i386.md (swap_mode): Rename from *swap to provide gen_swapsi. (di3): Handle !TARGET_64BIT rotations via new gen_ix86_32di2_doubleword below. (ix86_32di2_doubleword): New define_insn_and_split that splits after reload as either a pair of move instructions or an xchgl (using gen_swapsi). gcc/testsuite/ChangeLog * gcc.target/i386/xchg-3.c: New test case. Thanks in advance, Roger diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index dd173f7..ab94866 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -2966,7 +2966,7 @@ (set_attr "memory" "load") (set_attr "mode" "")]) -(define_insn "*swap" +(define_insn "swap" [(set (match_operand:SWI48 0 "register_operand" "+r") (match_operand:SWI48 1 "register_operand" "+r")) (set (match_dup 1) @@ -13648,6 +13648,8 @@ else if (const_1_to_31_operand (operands[2], VOIDmode)) emit_insn (gen_ix86_di3_doubleword (operands[0], operands[1], operands[2])); + else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 32) + emit_insn (gen_ix86_32di2_doubleword (operands[0], operands[1])); else FAIL; @@ -13820,6 +13822,24 @@ split_double_mode (mode, &operands[0], 1, &operands[4], &operands[5]); }) +(define_insn_and_split "ix86_32di2_doubleword" + [(set (match_operand:DI 0 "register_operand" "=r,r") + (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "r,o") + (const_int 32)))] + "!TARGET_64BIT" + "#" + "&& reload_completed" + [(set (match_dup 0) (match_dup 3)) + (set (match_dup 2) (match_dup 1))] +{ + split_double_mode (DImode, &operands[0], 2, &operands[0], &operands[2]); + if (rtx_equal_p (operands[0], operands[1])) + { + emit_insn (gen_swapsi (operands[0], operands[2])); + DONE; + } +}) + (define_mode_attr rorx_immediate_operand [(SI "const_0_to_31_operand") (DI "const_0_to_63_operand")]) diff --git a/gcc/testsuite/gcc.target/i386/xchg-3.c b/gcc/testsuite/gcc.target/i386/xchg-3.c new file mode 100644 index 0000000..eec05f0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/xchg-3.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options "-O2" } */ + +unsigned long long bar(); + +unsigned long long foo() +{ + unsigned long long x = bar(); + return (x>>32) | (x<<32); +} + +/*{ dg-final { scan-assembler "xchgl" } } */