From patchwork Fri May 11 13:29:28 2018
From: Kyrill Tkachov
Date: Fri, 11 May 2018 14:29:28 +0100
Message-ID: <5AF59AB8.2080001@foss.arm.com>
To: "gcc-patches@gcc.gnu.org"
CC: Marcus Shawcroft, "Richard Earnshaw (lists)", James Greenhalgh
Subject: [PATCH][AArch64] Add combine pattern to fuse AESE/AESMC instructions

Hi all,

When the AESE/AESD and AESMC/AESIMC instructions are generated through the
appropriate arm_neon.h intrinsics, we really want to keep them together when
the AESE feeds into an AESMC and fusion is supported by the target CPU.
We have macro-fusion hooks and scheduling model forwarding paths defined to
facilitate that. That is, however, not always enough.

This patch adds another mechanism for doing that: when we can detect during
combine that the required dependency exists (AESE -> AESMC, AESD -> AESIMC),
we keep the two instructions together with a combine pattern for the rest of
compilation. We won't ever want to split them.
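As a minimal sketch of the dependency the new patterns target (the function
name here is made up for illustration, it is not part of the patch), the
result of vaeseq_u8 feeding straight into vaesmcq_u8 is exactly the
AESE -> AESMC pair we want fused:

  #include <arm_neon.h>

  /* One AES round: the vaeseq_u8 result feeds directly into
     vaesmcq_u8, forming the AESE -> AESMC dependency that the
     fused combine pattern in this patch matches and keeps
     back-to-back.  */
  uint8x16_t
  aes_round (uint8x16_t data, uint8x16_t key)
  {
    return vaesmcq_u8 (vaeseq_u8 (data, key));
  }

Built with crypto extensions enabled (e.g. -march=armv8-a+crypto) this should
produce an adjacent aese/aesmc pair.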
Each testcase generates 4 AESE(D) instructions in a block followed by 4
AES(I)MC instructions that consume the corresponding results, with a bunch of
vector arithmetic in between so that the AESE and AESMC instructions are not
trivially back-to-back, thus exercising the compiler's ability to bring them
together. With this patch all 4 pairs are fused, whereas before a couple of
fusions would be missed due to the intervening arithmetic and memory
instructions.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2018-05-11  Kyrylo Tkachov

	* config/aarch64/aarch64-simd.md (*aarch64_crypto_aese_fused):
	New pattern.
	(*aarch64_crypto_aesd_fused): Likewise.

2018-05-11  Kyrylo Tkachov

	* gcc.target/aarch64/crypto-fuse-1.c: New test.
	* gcc.target/aarch64/crypto-fuse-2.c: Likewise.
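Conceptually, what combine sees before merging is two separate sets (a
hand-written RTL sketch, simplified from what the compiler actually builds):

  (set (reg:V16QI x)
       (unspec:V16QI [(reg:V16QI x) (reg:V16QI key)] UNSPEC_AESE))
  (set (reg:V16QI x)
       (unspec:V16QI [(reg:V16QI x)] UNSPEC_AESMC))

When the intermediate AESE result has no other uses, combine substitutes the
first set into the second, producing the nested unspec that the new
*aarch64_crypto_aese_fused pattern below matches:

  (set (reg:V16QI x)
       (unspec:V16QI
         [(unspec:V16QI [(reg:V16QI x) (reg:V16QI key)] UNSPEC_AESE)]
         UNSPEC_AESMC))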
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7c166b6c8ec40475d1e01561b613b590b6690ad5..9a6ed304432af0ca23ec7d3797783a3128776a6e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5790,6 +5790,44 @@ (define_insn "aarch64_crypto_aesv16qi"
 	  (const_string "yes")])]
 )
 
+;; When AESE/AESMC fusion is enabled we really want to keep the two together
+;; and enforce the register dependency without scheduling or register
+;; allocation messing up the order or introducing moves in between.
+;; Mash the two together during combine.
+
+(define_insn "*aarch64_crypto_aese_fused"
+  [(set (match_operand:V16QI 0 "register_operand" "=&w")
+	(unspec:V16QI
+	  [(unspec:V16QI
+	     [(match_operand:V16QI 1 "register_operand" "0")
+	      (match_operand:V16QI 2 "register_operand" "w")] UNSPEC_AESE)
+	  ] UNSPEC_AESMC))]
+  "TARGET_SIMD && TARGET_AES
+   && aarch64_fusion_enabled_p (AARCH64_FUSE_AES_AESMC)"
+  "aese\\t%0.16b, %2.16b\;aesmc\\t%0.16b, %0.16b"
+  [(set_attr "type" "crypto_aese")
+   (set_attr "length" "8")]
+)
+
+;; When AESD/AESIMC fusion is enabled we really want to keep the two together
+;; and enforce the register dependency without scheduling or register
+;; allocation messing up the order or introducing moves in between.
+;; Mash the two together during combine.
+
+(define_insn "*aarch64_crypto_aesd_fused"
+  [(set (match_operand:V16QI 0 "register_operand" "=&w")
+	(unspec:V16QI
+	  [(unspec:V16QI
+	     [(match_operand:V16QI 1 "register_operand" "0")
+	      (match_operand:V16QI 2 "register_operand" "w")] UNSPEC_AESD)
+	  ] UNSPEC_AESIMC))]
+  "TARGET_SIMD && TARGET_AES
+   && aarch64_fusion_enabled_p (AARCH64_FUSE_AES_AESMC)"
+  "aesd\\t%0.16b, %2.16b\;aesimc\\t%0.16b, %0.16b"
+  [(set_attr "type" "crypto_aese")
+   (set_attr "length" "8")]
+)
+
 ;; sha1
 
 (define_insn "aarch64_crypto_sha1hsi"
diff --git a/gcc/testsuite/gcc.target/aarch64/crypto-fuse-1.c b/gcc/testsuite/gcc.target/aarch64/crypto-fuse-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..79fd6011ed946d746ed5f03d26c7fe661f3f8154
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/crypto-fuse-1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mcpu=cortex-a72+crypto -dp" } */
+
+#include <arm_neon.h>
+
+#define AESE(r, v, key) (r = vaeseq_u8 ((v), (key)));
+#define AESMC(r, i) (r = vaesmcq_u8 (i))
+
+uint8x16_t dummy;
+uint8x16_t a;
+uint8x16_t b;
+uint8x16_t c;
+uint8x16_t d;
+uint8x16_t e;
+
+void
+foo (void)
+{
+  AESE (a, a, e);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESE (b, b, e);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESE (c, c, e);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESE (d, d, e);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+
+  AESMC (a, a);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESMC (b, b);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESMC (c, c);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESMC (d, d);
+}
+
+/* { dg-final { scan-assembler-times "crypto_aese_fused" 4 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/crypto-fuse-2.c b/gcc/testsuite/gcc.target/aarch64/crypto-fuse-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed9eb69e803b24ec16a72075c46a9b6e6898c2fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/crypto-fuse-2.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mcpu=cortex-a72+crypto -dp" } */
+
+#include <arm_neon.h>
+
+#define AESE(r, v, key) (r = vaesdq_u8 ((v), (key)));
+#define AESMC(r, i) (r = vaesimcq_u8 (i))
+
+uint8x16_t dummy;
+uint8x16_t a;
+uint8x16_t b;
+uint8x16_t c;
+uint8x16_t d;
+uint8x16_t e;
+
+void
+foo (void)
+{
+  AESE (a, a, e);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESE (b, b, e);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESE (c, c, e);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESE (d, d, e);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+
+  AESMC (a, a);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESMC (b, b);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESMC (c, c);
+  dummy = vaddq_u8 (dummy, dummy);
+  dummy = vaddq_u8 (dummy, dummy);
+  AESMC (d, d);
+}
+
+/* { dg-final { scan-assembler-times "crypto_aesd_fused" 4 } } */
\ No newline at end of file
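A note on how the scan-assembler-times checks work: the -dp flag in
dg-options makes the compiler annotate each instruction in the assembly
output with a comment naming the insn pattern that produced it, so counting
"crypto_aese_fused" (or "crypto_aesd_fused") four times verifies that all
four pairs actually matched the fused pattern. For one fused pair the
annotated output should look roughly like this (a hand-written sketch with
made-up register numbers; the exact comment format depends on the GCC
version):

  aese    v0.16b, v4.16b
  aesmc   v0.16b, v0.16b          // *aarch64_crypto_aese_fused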